Pearson’s Correlation Coefficient
======================================================
Introduction
Pearson’s Correlation Coefficient, usually denoted r, is a statistical measure of the strength and direction of the linear relationship between two continuous variables. It is widely used in fields such as economics, biology, physics, and the social sciences to analyze how pairs of variables vary together.
Definition
Pearson’s Correlation Coefficient is defined as the covariance of two variables divided by the product of their standard deviations. Dividing by the standard deviations removes the units of measurement, so r is a dimensionless number that captures the strength and direction of the linear relationship:
r = Cov(x, y) / (σx * σy)
where:
- r is the correlation coefficient
- Cov(x, y) is the covariance between x and y
- σx and σy are the standard deviations of x and y respectively
Formula
The formula for Pearson’s Correlation Coefficient is:
r = Σ[(xi - x̄)(yi - ȳ)] / (√Σ(xi - x̄)^2 * √Σ(yi - ȳ)^2)
where:
- xi and yi are individual data points of variables x and y
- x̄ and ȳ are the means of variables x and y respectively
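The sum formula translates almost line by line into code. The following is a minimal sketch in plain Python; the function name `pearson_r` and its error handling are illustrative choices, not part of any library.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the deviation-sum formula (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))

# Points on an exact increasing line give r equal to 1 up to rounding error.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

The explicit check for a constant variable is there because the denominator is zero in that case and r is undefined.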
Interpretation
The value of r is interpreted as follows:
- Range: The correlation coefficient ranges from -1 (perfect negative linear relationship) to 1 (perfect positive linear relationship).
- Magnitude: A value close to -1 or 1 indicates a strong linear relationship; a value close to 0 indicates a weak or absent linear relationship, although a non-linear relationship may still exist.
- Direction: If y tends to increase as x increases, r is positive; if y tends to decrease as x increases, r is negative.
- Outliers: The correlation coefficient is sensitive to outliers, so a single extreme point can change its value noticeably.
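A small numerical experiment can make these points concrete. The sketch below uses made-up data and NumPy's `np.corrcoef`; the outlier value of -100 is an arbitrary assumption chosen only for illustration.

```python
import numpy as np

x = np.arange(-5, 6, dtype=float)   # -5, -4, ..., 5 (symmetric around 0)

r_linear = np.corrcoef(x, 2 * x + 1)[0, 1]   # exact increasing line
r_negative = np.corrcoef(x, -3 * x)[0, 1]    # exact decreasing line
r_parabola = np.corrcoef(x, x ** 2)[0, 1]    # y depends on x, but not linearly

# Replace one point of the increasing line with an extreme value.
y_outlier = 2 * x + 1
y_outlier[-1] = -100.0                       # hypothetical corrupted measurement
r_outlier = np.corrcoef(x, y_outlier)[0, 1]

print(f"increasing line:  {r_linear:.3f}")
print(f"decreasing line:  {r_negative:.3f}")
print(f"parabola:         {r_parabola:.3f}")
print(f"line + 1 outlier: {r_outlier:.3f}")
```

In this constructed example the parabola yields r of essentially zero even though y is fully determined by x, and the single outlier is enough to turn a perfect positive correlation into a negative one.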
Properties
The Pearson’s Correlation Coefficient has several important properties:
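- Symmetry: The correlation coefficient is symmetric, r(X, Y) = r(Y, X), meaning that the order of the two variables does not matter.
- Non-additivity: Correlation coefficients are not additive. No general identity relates r1 + r2 to r1^2 + r2^2, and correlations obtained from different samples should not simply be summed.
- Invariance under linear transformation: r is not a linear function of its arguments. Instead, it is unchanged when either variable is shifted or rescaled by a positive factor, r(aX + b, cY + d) = r(X, Y) for a, c > 0, and it changes sign when ac < 0.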
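How r behaves when the variables are swapped, shifted, or rescaled can be checked numerically. The sketch below uses synthetic data with a fixed, arbitrary seed; the helper `r` is a hypothetical shorthand around `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, arbitrary choice
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(size=200)    # synthetic, partially correlated data

def r(a, b):
    """Pearson's r taken from NumPy's 2x2 correlation matrix."""
    return np.corrcoef(a, b)[0, 1]

base = r(x, y)

# Swapping the arguments leaves r unchanged.
assert np.isclose(base, r(y, x))
# Shifting and rescaling by positive factors leaves r unchanged.
assert np.isclose(base, r(3.0 * x + 2.0, 0.5 * y - 7.0))
# Negating one variable flips the sign of r.
assert np.isclose(-base, r(-x, y))
# r always lies in [-1, 1].
assert -1.0 <= base <= 1.0

print("all checks passed, r =", round(base, 3))
```

Passing assertions on one dataset illustrate these behaviours; they do not prove them.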
Applications
The Pearson’s Correlation Coefficient is widely used in various fields:
- Economics: to analyze the relationship between economic variables such as GDP, inflation rate, and employment rate
- Biology: to study the relationship between genetic traits and environmental factors
- Physics: to analyze the relationship between physical quantities such as temperature, pressure, and acceleration
- Social Sciences: to study the relationship between demographic variables and social outcomes
Examples
Example 1: Correlation between GDP and Inflation Rate in China
| Year | GDP (Trillions of USD) | Inflation Rate (%) |
|---|---|---|
| 2010 | 5.2 | 4.8 |
| 2011 | 6.5 | 3.9 |
| 2012 | 7.3 | 2.2 |
| 2013 | 8.5 | 1.5 |
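Taking x as GDP and y as the inflation rate, the means are x̄ = (5.2 + 6.5 + 7.3 + 8.5) / 4 = 6.875 and ȳ = (4.8 + 3.9 + 2.2 + 1.5) / 4 = 3.1. The three sums in the formula are:
Σ(xi - x̄)(yi - ȳ) = (-1.675)(1.7) + (-0.375)(0.8) + (0.425)(-0.9) + (1.625)(-1.6) = -6.13
Σ(xi - x̄)^2 = 2.805625 + 0.140625 + 0.180625 + 2.640625 = 5.7675
Σ(yi - ȳ)^2 = 2.89 + 0.64 + 0.81 + 2.56 = 6.9
r = -6.13 / (√5.7675 * √6.9) = -6.13 / 6.308 ≈ -0.97
This indicates a strong negative linear relationship between GDP and inflation rate over these four years. With only four data points, the result should be read as an illustration of the calculation rather than as evidence of a general economic relationship.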
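The table values can be substituted into the sum formula directly. The sketch below does so with NumPy; the variable names are illustrative.

```python
import numpy as np

# Values copied from the table above.
gdp = np.array([5.2, 6.5, 7.3, 8.5])         # trillions of USD
inflation = np.array([4.8, 3.9, 2.2, 1.5])   # percent

# Deviations from the means.
dx = gdp - gdp.mean()
dy = inflation - inflation.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(round(r, 2))  # -0.97
```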
Example 2: Correlation between Height and Weight in Humans
| Age | Height (cm) | Weight (kg) |
|---|---|---|
| 20 | 168 | 55 |
| 25 | 175 | 60 |
| 30 | 180 | 65 |
| … | … | … |
In this example, we can calculate the Pearson’s Correlation Coefficient using the following formula:
r = Cov(height, weight) / (σheight * σweight) = Σ[(xi - x̄)(yi - ȳ)] / (√Σ(xi - x̄)^2 * √Σ(yi - ȳ)^2)
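The table is truncated, so the coefficient for the full dataset cannot be reproduced here. For the three rows shown, Σ(xi - x̄)(yi - ȳ) = 60, Σ(xi - x̄)^2 ≈ 72.67, and Σ(yi - ȳ)^2 = 50, which gives:
r = 60 / √(72.67 * 50) ≈ 0.995
This indicates a strong positive linear relationship between height and weight: in these rows, taller individuals are also heavier.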
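The covariance-based definition can be applied to the rows that are visible in the table. The sketch below uses only those three rows (the elided rows are not included), and the helper `r_from_definition` is a hypothetical name. It also shows that dividing by n or by n - 1 gives the same r, provided the covariance and both standard deviations use the same convention.

```python
import numpy as np

# Only the three rows shown in the table; the remaining rows are not available.
height = np.array([168.0, 175.0, 180.0])   # cm
weight = np.array([55.0, 60.0, 65.0])      # kg

def r_from_definition(x, y, ddof):
    """r = Cov(x, y) / (sigma_x * sigma_y), using one ddof throughout."""
    cov_xy = ((x - x.mean()) * (y - y.mean())).sum() / (len(x) - ddof)
    return cov_xy / (x.std(ddof=ddof) * y.std(ddof=ddof))

r_sample = r_from_definition(height, weight, ddof=1)      # divide by n - 1
r_population = r_from_definition(height, weight, ddof=0)  # divide by n
print(round(r_sample, 3), round(r_population, 3))  # 0.995 0.995
```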
Conclusion
In conclusion, Pearson’s Correlation Coefficient is a widely used statistical measure of the linear relationship between two continuous variables. Its sign gives the direction of the relationship and its magnitude gives the strength, which makes it a standard tool for data analysis in many fields. The two examples above apply it to small datasets: GDP and inflation rate, and height and weight.
Code
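Here is example Python code that uses the NumPy library to calculate Pearson’s Correlation Coefficient:
    import numpy as np

    def pearson_correlation(x, y):
        """Return Pearson's r for two equal-length sequences."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # np.cov divides by n - 1 by default, so the standard deviations
        # must use ddof=1 as well; otherwise r can fall outside [-1, 1].
        cov = np.cov(x, y)[0, 1]
        std_x = np.std(x, ddof=1)
        std_y = np.std(y, ddof=1)
        return cov / (std_x * std_y)

    # Example usage: heights (cm) and weights (kg)
    x = [168, 175, 180, 185, 190]
    y = [55, 60, 65, 70, 75]
    r = pearson_correlation(x, y)
    print("Pearson's Correlation Coefficient:", r)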
This code calculates the Pearson’s Correlation Coefficient between two lists of data points (height and weight).
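As a cross-check, NumPy also provides `np.corrcoef`, which returns the full correlation matrix. The sketch below applies it to the same two lists; the off-diagonal entry is r.

```python
import numpy as np

x = [168, 175, 180, 185, 190]   # heights (cm)
y = [55, 60, 65, 70, 75]        # weights (kg)

# corr is a 2x2 matrix: ones on the diagonal, r(x, y) off the diagonal.
corr = np.corrcoef(x, y)
r = corr[0, 1]
print(round(r, 3))  # 0.997
```

Because `np.corrcoef` normalizes the covariance internally, it avoids any mismatch between the divisor used for the covariance and the one used for the standard deviations.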