What is covariance and correlation?
Covariance is the average of the products of two centered variables. It measures the direction of the two variables' linear relationship.
Correlation is the standardized covariance: it equals the covariance divided by the product of the two variables' standard deviations. It measures both the direction and the strength of the two variables' linear relationship.
Saying that correlation captures the strength of the linear relationship means that the magnitude of the correlation value (r, Pearson's correlation coefficient) represents that strength: the closer r gets to -1 or to 1, moving away from 0, the stronger the relationship. Unlike correlation, the magnitude of the covariance is not meaningful on its own, because it is not standardized and depends on the scale of the data. For example, when all the data is multiplied by the same constant c, the covariance is multiplied by c squared, but the correlation does not change, and neither does the strength of the linear relationship.
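To make the scaling argument concrete, here is a minimal sketch in plain Python. The helper names `covariance` and `correlation` and the toy data are assumptions made for illustration; the covariance divides by n, matching the definition used in this post.

```python
from math import sqrt

def covariance(x, y):
    """Average of the products of the two centered variables (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

def correlation(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

# Multiply all the data by the same constant.
scale = 10.0
x_scaled = [scale * v for v in x]
y_scaled = [scale * v for v in y]

cov_before = covariance(x, y)
cov_after = covariance(x_scaled, y_scaled)
r_before = correlation(x, y)
r_after = correlation(x_scaled, y_scaled)

print("covariance :", cov_before, "->", cov_after)
print("correlation:", r_before, "->", r_after)
```

The covariance grows by a factor of `scale ** 2`, while the correlation stays the same.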
What are covariance matrix and correlation matrix? How are covariance and correlation matrix related?
Given an n x p data matrix, where n is the number of samples and p is the number of variables (e.g., the x and y coordinates of a sample, or the features of a data point), the covariance matrix is the p x p matrix obtained by multiplying the transpose of the centered data matrix by the centered data matrix itself and dividing by n. Centering means subtracting each variable's (column's) mean from that column (e.g., centering the xy locations of the samples). The correlation matrix is the standardized version of the covariance matrix: divide each column of the centered data matrix by that column's standard deviation, then take the same transpose product and divide by n. Equivalently, each entry of the covariance matrix is divided by the product of the two corresponding standard deviations. (Dividing by n - 1 instead of n gives the unbiased sample covariance.)
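The construction above can be sketched with NumPy as follows. The random data and the variable names (`X`, `Xc`, `Z`, `cov`, `corr`) are assumptions made for illustration; the code divides by n, as in the description above, and the cross-checks use `np.cov(..., bias=True)` so that NumPy also divides by n.

```python
import numpy as np

# Synthetic n x p data (n = 200 samples, p = 3 variables on very different scales).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
n = X.shape[0]

# Center each column by subtracting its mean.
Xc = X - X.mean(axis=0)

# Covariance matrix: transpose of the centered data times the centered data, divided by n.
cov = Xc.T @ Xc / n

# Correlation matrix: standardize each centered column first, then do the same product.
Z = Xc / X.std(axis=0)  # np.std uses ddof=0 by default, i.e. it divides by n
corr = Z.T @ Z / n

# Equivalent view: divide each covariance entry by the product of the two standard deviations.
std = np.sqrt(np.diag(cov))
corr_from_cov = cov / np.outer(std, std)

print(np.allclose(cov, np.cov(X, rowvar=False, bias=True)))
print(np.allclose(corr, np.corrcoef(X, rowvar=False)))
print(np.allclose(corr, corr_from_cov))
```

Both matrices are p x p and symmetric, and the diagonal of the correlation matrix is all ones.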
When do you use the covariance matrix and the correlation matrix, respectively?
Use the covariance matrix when the variables (e.g., x and y coordinates) are on similar scales, and the correlation matrix when the variables are on very different scales.
https://builtin.com/data-science/covariance-vs-correlation
Bernard Flury, in his excellent book introducing multivariate analysis, described this dependence on scale as an anti-property of principal components. It's actually worse than choosing between correlation or covariance. If you changed the units (e.g. US-style gallons and inches versus EU-style litres and centimetres), you would get substantively different projections of the data.
The argument against automatically using correlation matrices is that it is quite a brutal way of standardising your data. The problem with automatically using the covariance matrix, which is very apparent with that heptathlon data, is that the variables with the highest variance will dominate the first principal component (the variance-maximising property).
So the "best" method to use is based on a subjective choice, careful thought and some experience.
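The unit-change problem can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration: the synthetic height and weight data, the helper `first_pc`, and the choice to take principal components as eigenvectors of the covariance or correlation matrix via `np.linalg.eigh`.

```python
import numpy as np

def first_pc(matrix):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix."""
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)  # eigenvalues in ascending order
    v = eigenvectors[:, -1]
    # An eigenvector's sign is arbitrary; make its largest-magnitude entry positive.
    return v if v[np.argmax(np.abs(v))] > 0 else -v

# Synthetic, correlated data: height in centimetres and weight in kilograms.
rng = np.random.default_rng(0)
height_cm = rng.normal(170.0, 10.0, size=500)
weight_kg = 0.5 * (height_cm - 170.0) + rng.normal(70.0, 5.0, size=500)

X_cm = np.column_stack([height_cm, weight_kg])
X_m = X_cm * np.array([0.01, 1.0])  # same data, with height converted to metres

# Covariance-based first principal component changes with the units.
pc_cov_cm = first_pc(np.cov(X_cm, rowvar=False))
pc_cov_m = first_pc(np.cov(X_m, rowvar=False))

# Correlation-based first principal component does not.
pc_corr_cm = first_pc(np.corrcoef(X_cm, rowvar=False))
pc_corr_m = first_pc(np.corrcoef(X_m, rowvar=False))

print("covariance PC1  (cm):", pc_cov_cm)
print("covariance PC1  (m): ", pc_cov_m)
print("correlation PC1 (cm):", pc_corr_cm)
print("correlation PC1 (m): ", pc_corr_m)
```

With height in metres its variance becomes tiny, so the covariance-based first component points almost entirely along weight. The correlation-based component is the same in both unit systems, at the cost of forcing every variable to unit variance.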
Practical use
Related video
https://www.youtube.com/watch?v=PjeOmOz9jSY&list=PLMrJAkhIeNNSVjnsviglFoY2nXildDCcv&index=13
If I can remember all of this, I probably won't get flustered in an interview.