
What is SO(3)?

SO(3) is the special orthogonal group: the set of 3x3 matrices that transform a 3D point without changing the distance between any two points (an isometry), are invertible, and have determinant +1. Such a matrix is a proper rotation matrix. An isometry with determinant −1 is an improper rotation matrix, i.e., a reflection composed with a rotation.

A proper rotation matrix with determinant +1, denoted by R(n̂, θ), represents a counterclockwise rotation by an angle θ about a fixed axis n̂. An improper rotation with determinant −1, denoted by R(n̂, θ), represents a reflection through a plane that passes through the origin and is perpendicular to a unit vector n̂ (called the normal to the reflection plane), together with a counterclockwise rotation by θ about n̂. https://scipp.ucsc.edu/~haber/ph116A/rotation_11b.pdf

https://www.quora.com/What-is-the-relationship-between-rotation-matrices-and-reflection-matrices
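To make the two defining properties concrete, here is a quick NumPy check (my own sketch, not from the sources above; `rotation_matrix` via Rodrigues' formula is an assumed helper): a proper rotation is orthogonal with determinant +1, while a reflection is orthogonal with determinant −1.

```python
import numpy as np

def rotation_matrix(axis, theta):
    """Rodrigues' formula: R(n, theta) = I + sin(theta) K + (1 - cos(theta)) K^2,
    where K is the skew-symmetric cross-product matrix of the unit axis n."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    K = np.array([[0, -n[2], n[1]],
                  [n[2], 0, -n[0]],
                  [-n[1], n[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

R = rotation_matrix([0, 0, 1], np.pi / 3)
assert np.allclose(R.T @ R, np.eye(3))      # orthogonal: preserves distances
assert np.isclose(np.linalg.det(R), 1.0)    # proper rotation: det = +1

# A reflection through the plane with unit normal n is orthogonal but improper
n = np.array([0.0, 0.0, 1.0])
Ref = np.eye(3) - 2 * np.outer(n, n)
assert np.allclose(Ref.T @ Ref, np.eye(3))
assert np.isclose(np.linalg.det(Ref), -1.0)  # improper: det = -1
```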

https://github.com/hongsukchoi/3DCrowdNet_RELEASE/blob/79284d533cd88515aec08fa04ee0eb50f847c6af/common/utils/transforms.py#L36
import numpy as np

# Umeyama-style similarity alignment: find scale c, rotation R, translation t
# such that B ≈ c * R @ A + t, where A and B are (n, 3) corresponding point sets.
def rigid_transform_3D(A, B):
    n, dim = A.shape
    centroid_A = np.mean(A, axis=0)
    centroid_B = np.mean(B, axis=0)
    # Cross-covariance of the centered point sets
    H = np.dot(np.transpose(A - centroid_A), B - centroid_B) / n
    # Note: np.linalg.svd returns V already transposed (V here is really V^T)
    U, s, V = np.linalg.svd(H)
    R = np.dot(np.transpose(V), np.transpose(U))
    if np.linalg.det(R) < 0:
        # Fix an improper solution (reflection, det = -1) by flipping the last
        # singular value and the corresponding row of V^T
        s[-1] = -s[-1]
        V[2] = -V[2]
        R = np.dot(np.transpose(V), np.transpose(U))

    # Scale: sum of singular values over the variance of A
    varP = np.var(A, axis=0).sum()
    c = 1 / varP * np.sum(s)

    # Translation maps the scaled, rotated centroid of A onto the centroid of B
    t = -np.dot(c * R, np.transpose(centroid_A)) + np.transpose(centroid_B)
    return c, R, t
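A quick sanity check of the procedure on synthetic data (my own example, not from the linked repo): generate B from A with a known similarity transform and verify that the same Kabsch/Umeyama steps recover it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth similarity transform: B = c * A @ R.T + t
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
c_true, t_true = 2.0, np.array([1.0, -2.0, 0.5])

A = rng.normal(size=(100, 3))
B = c_true * A @ R_true.T + t_true

# Same steps as rigid_transform_3D above, written compactly
n = A.shape[0]
cA, cB = A.mean(axis=0), B.mean(axis=0)
H = (A - cA).T @ (B - cB) / n
U, s, Vt = np.linalg.svd(H)
R = Vt.T @ U.T
if np.linalg.det(R) < 0:
    s[-1] = -s[-1]
    Vt[2] = -Vt[2]
    R = Vt.T @ U.T
c = s.sum() / np.var(A, axis=0).sum()
t = cB - c * R @ cA

assert np.allclose(R, R_true)
assert np.isclose(c, c_true)
assert np.allclose(t, t_true)
```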

What is isometry?

An isometry is a transformation that maps elements of a metric space to the same or another space without changing the distance between any two points. In 2D or 3D Euclidean space, a direct isometry is a rigid motion (rotation, translation); an opposite isometry is a reflection. A composition of isometries is again an isometry.

https://en.wikipedia.org/wiki/Isometry
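A small numeric illustration (my own sketch): a rigid motion (direct isometry) and a reflection (opposite isometry) both preserve pairwise distances; the determinant of the linear part tells them apart.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = rng.normal(size=3), rng.normal(size=3)
d = np.linalg.norm(p - q)

# Direct isometry: rotation about z followed by a translation (rigid motion)
th = 0.9
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
t = np.array([1.0, 2.0, 3.0])
assert np.isclose(np.linalg.norm((R @ p + t) - (R @ q + t)), d)
assert np.isclose(np.linalg.det(R), 1.0)   # direct: det = +1

# Opposite isometry: reflection through the xy-plane
M = np.diag([1.0, 1.0, -1.0])
assert np.isclose(np.linalg.norm(M @ p - M @ q), d)
assert np.isclose(np.linalg.det(M), -1.0)  # opposite: det = -1
```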

Reference

What is SE(3)?

In 3D space, SE(3) is the group of rigid transformations: a rotation combined with a translation.

The group of affine rigid motions whose rotation R satisfies det(R) = +1 is called the special Euclidean group SE(3). (Affine: a transformation that preserves parallelism between lines; rigid (isometry): a transformation that preserves the distance between every pair of points.)

https://jinyongjeong.github.io/2016/06/07/se3_so3_transformation/
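SE(3) elements are commonly written as 4x4 homogeneous matrices. A minimal sketch (my own, following the usual convention) checking closure under composition and that the action preserves distances:

```python
import numpy as np

def se3(R, t):
    """4x4 homogeneous matrix [[R, t], [0, 1]] representing x -> R x + t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

th = 0.5
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
T1 = se3(R, [1.0, 0.0, 0.0])
T2 = se3(R.T, [0.0, 2.0, 0.0])

# Group closure: composing two SE(3) elements gives another SE(3) element
T = T1 @ T2
assert np.isclose(np.linalg.det(T[:3, :3]), 1.0)

# Acting on homogeneous points preserves distances (rigid motion)
p = np.array([1.0, 2.0, 3.0, 1.0])
q = np.array([-1.0, 0.5, 2.0, 1.0])
assert np.isclose(np.linalg.norm((T @ p - T @ q)[:3]),
                  np.linalg.norm((p - q)[:3]))
```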

What is Equivariance in computer vision, chemistry, and physical modeling?

https://ccr-cheng.github.io/blog/2022/tfn/

Substituting x = a 3D point cloud, f = a neural network, and g = a rotation transformation: what is the group homomorphism T here? What function is it? It depends on the case; it only has to satisfy the group homomorphism property. It could even be the identity function, for example for a neural network whose input is an image and whose output is a global human rotation.
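A toy example of the two extremes (my own sketch): for the centroid of a point cloud, T(g) is the rotation g itself; for a rotation-invariant output, T(g) is the identity for every g.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))        # toy 3D point cloud

th = 1.2
g = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])           # a rotation g in SO(3)

# f1: centroid. Equivariant with T(g) = g (T maps each rotation to itself).
f1 = lambda pts: pts.mean(axis=0)
assert np.allclose(f1(x @ g.T), g @ f1(x))

# f2: mean distance to the centroid. Invariant: T(g) = identity for every g.
f2 = lambda pts: np.linalg.norm(pts - pts.mean(axis=0), axis=1).mean()
assert np.isclose(f2(x @ g.T), f2(x))
```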

What is group theory? What is group homomorphism?

A group is a pair (set, operation). In mathematics, given two groups (G, ∗) and (H, ·), a group homomorphism from (G, ∗) to (H, ·) is a function h : G → H such that for all u and v in G, h(u ∗ v) = h(u) · h(v), where the group operation on the left side of the equation is that of G and on the right side that of H.

https://en.wikipedia.org/wiki/Group_homomorphism


I think the above description is not intuitive. This explanation from the TFN paper is better.

With respect to our goal, L is an equivariant neural network and D is the rotation.

https://slides.com/mariogeiger/e3nn_mrs_2021/#/18
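Two classic homomorphism examples, checked numerically (my own sketch):

```python
import numpy as np

# h(x) = exp(x) is a homomorphism from (R, +) to (positive reals, *):
u, v = 0.3, 1.7
assert np.isclose(np.exp(u + v), np.exp(u) * np.exp(v))

# det is a homomorphism from (invertible matrices, matrix product) to
# (nonzero reals, *): det(AB) = det(A) det(B)
rng = np.random.default_rng(3)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
```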


What are Wigner D-matrices?

Wigner D-matrices are irreducible representations of SO(3).

https://math.stackexchange.com/questions/4361717/a-question-about-irreducible-representations-of-so3-group

Note: SO(3) also acts on type-l vectors with l > 1, through the corresponding (2l+1)-dimensional Wigner D-matrices.
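A rough sketch of what a type-2 (l = 2) feature looks like, using the traceless-symmetric-matrix model of the 5-dimensional real irrep rather than the complex spherical basis (my own choice for illustration):

```python
import numpy as np

def rot_z(a):
    return np.array([[np.cos(a), -np.sin(a), 0],
                     [np.sin(a),  np.cos(a), 0],
                     [0, 0, 1]])

def rot_x(a):
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a),  np.cos(a)]])

# A type-2 (l = 2) feature can be modeled as a traceless symmetric 3x3 matrix S
# (5 independent components = 2l + 1). SO(3) acts on it by S -> R S R^T.
rng = np.random.default_rng(4)
S = rng.normal(size=(3, 3))
S = (S + S.T) / 2
S = S - np.trace(S) / 3 * np.eye(3)

R1, R2 = rot_z(0.7), rot_x(1.3)

# Representation (homomorphism) property: acting with R1 @ R2 equals
# acting with R2 first and then R1
lhs = (R1 @ R2) @ S @ (R1 @ R2).T
rhs = R1 @ (R2 @ S @ R2.T) @ R1.T
assert np.allclose(lhs, rhs)

# The action stays inside the 5-dimensional space: still symmetric and traceless
assert np.allclose(lhs, lhs.T) and np.isclose(np.trace(lhs), 0.0)
```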

What is spherical harmonics?

https://redstarhong.tistory.com/325

How are spherical harmonics related to equivariance?

In summary:

Spherical Harmonics are equivariant to SO(3).

Background:

Edge features in Tensor Field Networks are defined as follows:

Rotating the spherical harmonics transforms them into a linear combination of spherical harmonics of the same degree (Eq. (9)).

The rotation of a real spherical function with m = 0 and ℓ = 3. The coefficients are not equal to the Wigner D-matrices, since real functions are shown, but can be obtained by re-decomposing the complex functions. (https://en.wikipedia.org/wiki/Spherical_harmonics#Rotations)

Eq. (9) can be expressed in these two equivalent forms:

The radial network is a neural network function on Euclidean space Rⁿ whose value at each point depends only on the distance between that point and the origin (usually the Euclidean distance). Since that distance is unchanged by rotation, the edge feature is equivariant under rotation.
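For l = 1 this is easy to verify directly: up to normalization and component ordering, the real l = 1 spherical harmonics of a direction are just the components of the unit vector, so rotating the input mixes them by the rotation matrix itself (a minimal sketch, ignoring the usual constants):

```python
import numpy as np

# Up to a constant factor and component ordering, the real l = 1 spherical
# harmonics of a direction r are the components of the unit vector r / |r|.
def Y1(r):
    return r / np.linalg.norm(r)

th = 0.6
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
r = np.array([1.0, -2.0, 0.5])

# Equivariance: Y1(R r) = R Y1(r) -- the rotated harmonics are a linear
# combination of harmonics of the same degree (here the matrix is R itself)
assert np.allclose(Y1(R @ r), R @ Y1(r))

# The radial part sees only |r|, which a rotation leaves unchanged
assert np.isclose(np.linalg.norm(R @ r), np.linalg.norm(r))
```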

What is Tensor Field Network?

The input and output of each layer of a tensor field network is a finite set of points in R³ together with a vector in a representation of SO(3) associated with each point.

(I think) The vector x_a or y_a is a multidimensional array containing one block per degree l, each block of dimension 2l + 1, for the different representations (i.e., spherical tensors of different dimensions, https://ccr-cheng.github.io/blog/2022/tfn/). This multidimensional array is denoted as

What is a point convolution filter in TFN?

This seems to define the filter for a single neighbor node. Also, the spherical harmonics parameters appear to be fixed rather than learned -> spherical harmonics have well-defined mathematical forms, typically denoted Ylm(θ, ϕ), where l and m are integers representing the degree and order of the harmonic, and θ and ϕ are the spherical coordinates. These functions are fixed and do not change during the training process of the network. They are precomputed and used as part of the network's architecture to enforce rotational equivariance.

What is Tensor Product in TFN?

Our filters and layer input each inhabit representations of SO(3) (that is, they both carry l and m indices). In order to produce output that we can feed into downstream layers, we need to combine the layer input and filters in such a way that the output also transforms appropriately (by inhabiting a representation of SO(3)).

The output dimensions seem to be reduced (combined) using Clebsch-Gordan coefficients.
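For example, for two type-1 inputs, the Clebsch-Gordan decomposition 1 ⊗ 1 = 0 ⊕ 1 ⊕ 2 yields (up to normalization) a dot product (type 0), a cross product (type 1), and a traceless symmetric outer product (type 2). A quick equivariance check of the first two (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = rng.normal(size=3), rng.normal(size=3)

th = 1.0
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])

# Type-0 output (dot product) is invariant under rotation
assert np.isclose(np.dot(R @ a, R @ b), np.dot(a, b))

# Type-1 output (cross product) is equivariant: cross(Ra, Rb) = R cross(a, b)
# (this relies on det(R) = +1; a reflection would flip the sign)
assert np.allclose(np.cross(R @ a, R @ b), R @ np.cross(a, b))
```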

Self-interaction and the nonlinear layer will come later. Treating those as covered, one layer of a TFN is given by the following equation:

Main references:

https://ccr-cheng.github.io/blog/2022/tfn/

https://slides.com/mariogeiger/e3nn_mrs_2021/#/68/0/2

https://ablondegeek.github.io/pdfs/20181208_tensorfieldnetworks_neurips_mol_and_mat.pdf

Wikipedia, ChatGPT...

TFN paper: https://arxiv.org/pdf/1802.08219


07.17.2024, after reading the SE(3)-Transformer paper

What is weight-tying?

A positive corollary of equivariance is increased weight-tying within the model.

the translational weight-tying of convolutions

Judging from these sentences (from the SE(3)-Transformer paper), weight-tying seems to mean that the same weights handle transformed versions of the input.

What does "The attention weights add extra degrees of freedom to the TFN kernel in the angular direction" mean?

A limitation of TFN is that the spherical harmonics parameters are precomputed; since the attention weights are computed with learnable matrices, I think that is why the paper phrases it this way.

The TFN description given in the SE(3)-Transformer paper

