
What are covariance and correlation?

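Covariance is the average of the products of the two centered variables: cov(x, y) = (1/n) Σ (x_i - x̄)(y_i - ȳ). (The sample covariance divides by n - 1 instead of n.) Its sign gives the direction of the two variables' linear relationship: positive when they tend to increase together, negative when one tends to decrease as the other increases.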
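A minimal NumPy sketch of the definition above (Python/NumPy is my choice for illustration; `covariance` is a hypothetical helper name). It divides by n, so it matches `np.cov(x, y, bias=True)[0, 1]`; NumPy's default `np.cov` divides by n - 1.

```python
import numpy as np

def covariance(x, y):
    """Population covariance: mean of the products of the centered variables (divides by n)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center x
    yc = y - y.mean()  # center y
    return np.mean(xc * yc)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])
print(covariance(x, y))   # 2.5: positive, y tends to increase with x
print(covariance(x, -y))  # -2.5: negative, -y tends to decrease as x increases
```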

Correlation is the standardized covariance: the covariance divided by the product of the two variables' standard deviations, r = cov(x, y) / (σ_x σ_y). It measures both the direction and the strength of the two variables' linear relationship, and it always lies between -1 and 1.
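A small sketch of Pearson's r computed directly from that definition and checked against `np.corrcoef` (`correlation` is a hypothetical helper). Because covariance and standard deviations here both use the same 1/n normalization, the factor cancels, so the result is the same as with n - 1.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    cov = np.mean(xc * yc)              # population covariance (divides by n)
    return cov / (x.std() * y.std())   # np.std also divides by n by default (ddof=0)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 5.0])
r = correlation(x, y)
print(r)
print(np.corrcoef(x, y)[0, 1])  # NumPy's built-in should give the same value
```

A perfectly linear relationship such as `correlation(x, 2 * x + 1)` gives r = 1 (up to rounding).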

Saying that correlation captures the strength of the linear relationship means that the magnitude of the correlation value r (Pearson's correlation coefficient) represents that strength: the closer r is to -1 or 1, the stronger the relationship, and the closer it is to 0, the weaker. Unlike correlation, the magnitude of the covariance is not meaningful on its own, because covariance is not standardized and depends on the scale (units) of the data. For example, if all the data is multiplied by the same constant c, the covariance is multiplied by c², but the correlation does not change, because the strength of the linear relationship has not changed.
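A quick check of the scaling claim with synthetic (hypothetical) data: multiplying both variables by c = 10, as a change of units would, multiplies the covariance by c² but leaves the correlation unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(scale=0.5, size=100)  # y is linearly related to x, plus noise

c = 10.0  # e.g., a change of units applied to both variables
cov_before = np.cov(x, y)[0, 1]
cov_after = np.cov(c * x, c * y)[0, 1]
r_before = np.corrcoef(x, y)[0, 1]
r_after = np.corrcoef(c * x, c * y)[0, 1]

print(cov_after / cov_before)  # ≈ 100 (= c**2): covariance depends on the scale
print(r_before, r_after)       # equal up to floating-point rounding: correlation does not
```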

What are the covariance matrix and the correlation matrix? How are they related?

Given an n x p data matrix X, where n is the number of samples and p is the number of variables (e.g., the x and y coordinates of a sample, or the features of a data point), the covariance matrix is the p x p matrix obtained by multiplying the transpose of the centered data matrix by the centered data matrix itself and dividing by n: C = Xc^T Xc / n. Centering means subtracting each variable's (column's) mean from that column (e.g., centering the xy locations of the samples). The correlation matrix is the standardized version of the covariance matrix: divide each column of the centered data matrix by that column's standard deviation, then take the same transpose, multiply, and divide by n. Equivalently, R = D^-1 C D^-1, where D is the diagonal matrix of the variables' standard deviations, so each entry is R_ij = C_ij / (σ_i σ_j) and every diagonal entry of R is 1.
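A sketch of both matrices built exactly as described (center, optionally standardize, then Xᵀ X / n), plus a check of the relation R = D⁻¹ C D⁻¹. The helper names and the random data are hypothetical; `np.cov(..., bias=True)` and `np.corrcoef` are used only as cross-checks.

```python
import numpy as np

def covariance_matrix(X):
    """p x p covariance matrix of an n x p data matrix (divides by n, as in the text)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each column (variable)
    return Xc.T @ Xc / n

def correlation_matrix(X):
    """p x p correlation matrix: standardize each column, then the same product."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # np.std divides by n by default
    return Z.T @ Z / n

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])  # columns on very different scales

S = covariance_matrix(X)
R = correlation_matrix(X)

# Relationship: R = D^-1 S D^-1, with D = diag(standard deviations) = sqrt(diag(S))
D_inv = np.diag(1.0 / np.sqrt(np.diag(S)))
print(np.allclose(R, D_inv @ S @ D_inv))                   # True
print(np.allclose(S, np.cov(X, rowvar=False, bias=True)))  # True: matches NumPy
print(np.allclose(R, np.corrcoef(X, rowvar=False)))        # True
```

Note that `rowvar=False` is needed because NumPy otherwise treats rows, not columns, as variables.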

When do you use the covariance matrix and the correlation matrix, respectively?

Use the covariance matrix when the variables are on similar scales (e.g., x and y coordinates measured in the same units), and use the correlation matrix when the variables are on very different scales.

https://builtin.com/data-science/covariance-vs-correlation


On running PCA on the correlation vs. the covariance matrix (from the Stack Exchange thread linked below): Bernard Flury, in his excellent book introducing multivariate analysis, described this as an anti-property of principal components. It's actually worse than choosing between correlation or covariance. If you changed the units (e.g. US style gallons, inches etc. and EU style litres, centimetres) you will get substantively different projections of the data.

The argument against automatically using correlation matrices is that it is quite a brutal way of standardising your data. The problem with automatically using the covariance matrix, which is very apparent with the heptathlon data, is that the variables with the highest variance will dominate the first principal component (the variance maximising property).

So the "best" method to use is based on a subjective choice, careful thought and some experience.

https://stats.stackexchange.com/questions/53/pca-on-correlation-or-covariance#:~:text=Using%20the%20correlation%20matrix%20is,when%20the%20scales%20are%20different.
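A minimal sketch of both points from the quote, using synthetic, hypothetical height/weight data (not the heptathlon data from the thread; `first_pc` is a hypothetical helper). With height in metres, weight has by far the larger variance and dominates the first principal component of the covariance matrix; switching height to centimetres changes that component substantially, while the correlation-matrix component stays the same.

```python
import numpy as np

def first_pc(M):
    """Eigenvector of a symmetric matrix with the largest eigenvalue (first principal direction)."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigh returns eigenvalues in ascending order
    v = eigvecs[:, -1]
    return v if v[np.argmax(np.abs(v))] > 0 else -v  # fix the sign so results are comparable

rng = np.random.default_rng(2)
height_m = rng.normal(1.7, 0.1, size=500)  # hypothetical heights in metres
weight_kg = 60 + 50 * (height_m - 1.7) + rng.normal(0, 5, size=500)  # hypothetical weights

for name, h in [("metres", height_m), ("centimetres", 100 * height_m)]:
    X = np.column_stack([h, weight_kg])
    pc_cov = first_pc(np.cov(X, rowvar=False))
    pc_corr = first_pc(np.corrcoef(X, rowvar=False))
    print(name, "covariance PC1:", np.round(pc_cov, 3), "correlation PC1:", np.round(pc_corr, 3))
```

With only two positively correlated variables, the correlation-matrix PC1 is always about [0.707, 0.707], which also illustrates the "brutal" side of standardising: every variable is forced to contribute equal variance.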


 

 


 

Practical use

https://logicatcore.github.io/scratchpad/lidar/sensor-fusion/jupyter/2021/04/20/2D-Oriented-Bounding-Box.html

 

2D Oriented bounding boxes made simple (logicatcore.github.io): calculating 2D oriented bounding boxes. Oriented boxes are useful for avoiding obstacles and making the best use of the real navigable space that autonomous vehicles steer around.
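A minimal sketch of how the covariance matrix gives a 2D oriented bounding box: its eigenvectors are the principal axes of the point cloud, and an axis-aligned box in that rotated frame is an oriented box in the original frame. This is the common PCA-based approach, not necessarily the linked post's exact code; the function name and the rotated-rectangle test data are hypothetical.

```python
import numpy as np

def oriented_bounding_box(points):
    """PCA-based 2D oriented bounding box; returns the 4 corners as a 4 x 2 array."""
    P = np.asarray(points, dtype=float)
    mean = P.mean(axis=0)
    cov = np.cov(P - mean, rowvar=False)  # 2 x 2 covariance matrix of the point cloud
    _, eigvecs = np.linalg.eigh(cov)      # columns = principal axes (orthonormal)
    local = (P - mean) @ eigvecs          # coordinates of each point along the principal axes
    lo, hi = local.min(axis=0), local.max(axis=0)  # axis-aligned box in that frame
    corners_local = np.array([[lo[0], lo[1]], [hi[0], lo[1]], [hi[0], hi[1]], [lo[0], hi[1]]])
    return corners_local @ eigvecs.T + mean  # rotate back to the original frame

# Hypothetical LiDAR-like points: a 4 x 1 rectangle of points rotated by 30 degrees
rng = np.random.default_rng(3)
pts = rng.uniform([-2.0, -0.5], [2.0, 0.5], size=(300, 2))
theta = np.deg2rad(30)
Rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
pts = pts @ Rot.T + np.array([5.0, 3.0])
box = oriented_bounding_box(pts)
print(box)
```

The oriented box here should have an area close to 4, much smaller than the axis-aligned box around the same points, which is exactly why oriented boxes make better use of navigable space.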

 

Related video

https://www.youtube.com/watch?v=PjeOmOz9jSY&list=PLMrJAkhIeNNSVjnsviglFoY2nXildDCcv&index=13

If I remember all of this, I probably won't get flustered in an interview.
