The statistical methods discussed so far involve only one variable. In practice, however, many problems involve two or more variables. A distribution involving two variables is called a bivariate distribution. In this chapter, we discuss methods for determining whether a relationship exists between two variables, for example, between the amount of rainfall and the volume of production of a certain commodity, or between age and blood pressure.
Two variables are said to be correlated when they are so related that a change in the value of one is accompanied by a change in the value of the other. For example, an increase in the amount of rainfall is, to some extent, accompanied by an increase in the volume of production; a decrease in the price of a commodity is accompanied by an increase in the quantity demanded; and an increase in advertising expenditure is accompanied by an increase in sales. The measure of correlation, called the correlation coefficient, summarises in one figure the degree and direction of this movement. It is important to note, however, that correlation analysis only helps determine the extent to which two variables are correlated; it does not tell us about any cause-and-effect relationship. Even when two variables are highly correlated, one cannot say which is the cause and which is the effect.
Types of Correlation
Correlation may be of the following types:
Positive and Negative Correlation
If two variables vary in the same direction, i.e. an increase (or decrease) in the value of one variable is accompanied by an increase (or decrease) in the value of the other, the two variables are said to have a positive correlation. For example:
X: | 100 | 50 | 30 | 10 |
Y: | 8 | 5 | 3 | 2 |
X: | 10 | 20 | 25 | 50 |
Y: | 5 | 8 | 10 | 20 |
On the other hand, two variables are said to have a negative correlation if they move in opposite directions, i.e. if one variable increases (or decreases), the other decreases (or increases). For example:
X: | 10 | 20 | 25 | 50 |
Y: | 50 | 20 | 10 | 8 |
X: | 100 | 50 | 30 | 10 |
Y: | 1 | 2 | 3 | 7 |
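As a rough check on the four example series above, the direction of correlation can be read from the sign of the covariance, a quantity defined later in this chapter. The sketch below is only illustrative; the function names `covariance` and `direction` are assumptions introduced here, not part of the original note.

```python
def covariance(xs, ys):
    """Mean of the products of deviations from the two arithmetic means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def direction(xs, ys):
    """Classify the direction of correlation by the sign of the covariance."""
    cov = covariance(xs, ys)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


# The two positive examples from the text
print(direction([100, 50, 30, 10], [8, 5, 3, 2]))
print(direction([10, 20, 25, 50], [5, 8, 10, 20]))
# The two negative examples from the text
print(direction([10, 20, 25, 50], [50, 20, 10, 8]))
print(direction([100, 50, 30, 10], [1, 2, 3, 7]))
```

The sign only gives the direction; it says nothing about how strong the relationship is.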
Linear and non-linear correlation
The correlation between two variables is said to be linear when a unit change in one variable results in a constant change in the other variable over the entire range of values. For example:
X: | 1 | 2 | 3 | 4 |
Y: | 7 | 9 | 11 | 13 |
If a unit change in one variable does not correspond to a constant change in the other, the correlation is said to be non-linear. For example:
X: | 1 | 2 | 3 | 4 |
Y: | 7 | 10 | 11 | 20 |
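The distinction can be tested mechanically by comparing the change in Y per unit change in X from one pair to the next. The following is a minimal sketch, assuming the X values are listed in order with no two consecutive values equal; the helper name `is_linear` and the tolerance `tol` are hypothetical.

```python
def is_linear(xs, ys, tol=1e-9):
    """True if Y changes by the same amount per unit change in X at every step."""
    points = list(zip(xs, ys))
    rates = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]
    return all(abs(rate - rates[0]) <= tol for rate in rates)


print(is_linear([1, 2, 3, 4], [7, 9, 11, 13]))   # constant change of 2 per unit
print(is_linear([1, 2, 3, 4], [7, 10, 11, 20]))  # changes of 3, 1 and 9
```

With the two tables above, the first call reports a linear relationship and the second does not.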
Methods of studying correlation:
The following methods can be used to study the correlation between two variables.
Scatter diagram
The scatter diagram is a graphical method and the simplest way of ascertaining the correlation between two variables. Let X and Y be two variables, each having the same number of values. Points are plotted with the values of X as x-coordinates and the corresponding values of Y as y-coordinates, and each point is represented by a dot. The resulting set of dots is called the scatter diagram. From the scattering of the dots, one can form an idea of the degree and direction of the correlation between the two variables. The closer the dots lie to a straight line, the higher the correlation; the greater the scattering, the lower the correlation.
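To make the plotting procedure concrete, here is a rough character-grid version of a scatter diagram. It is only a sketch: the function name `text_scatter` and the `width` and `height` parameters are assumptions made for illustration, and it assumes that neither series is constant. A plotting library would normally be used instead.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Return a coarse scatter diagram as a list of text rows (top row first)."""
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column or row index inside the grid.
        col = round((x - min_x) / (max_x - min_x) * (width - 1))
        row = round((y - min_y) / (max_y - min_y) * (height - 1))
        grid[height - 1 - row][col] = "*"  # larger Y values are drawn higher up
    return ["".join(cells) for cells in grid]


for line in text_scatter([10, 20, 25, 50], [5, 8, 10, 20]):
    print("|" + line)
```

For these values the dots rise from the lower left to the upper right and lie close to a straight line, which suggests a high positive correlation.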
Karl Pearson’s Correlation Coefficient
The degree of association between two paired variables can be measured mathematically by Karl Pearson’s coefficient of correlation, developed by the famous British statistician Karl Pearson. It is one of the most widely used methods of calculating the correlation coefficient between two variables and is also known as the Pearson correlation coefficient. It is denoted by r and defined as
$$r=\frac{Cov(X,Y)}{\sqrt{var(X)}\sqrt{var(Y)}}$$
$$\text{where}\;Cov(X,Y)=\frac{1}{n}\;\Sigma\;(X-\overline{X})(Y-\overline{Y}),$$
\(\overline{X}\) and \(\overline{Y}\) being the arithmetic means of the X-series and Y-series respectively, and n the number of paired observations. The formula above can be put in the following forms:
$$r=\frac{\;\Sigma\;(X-\overline{X})(Y-\overline{Y})}{\sqrt{\Sigma\;(X-\overline{X})^2}\sqrt{\Sigma\;(Y-\overline{Y})^2}}$$
$$\text{If}\;x=X-\overline{X}\;\text{and}\;y=Y-\overline{Y},\;\text{then}$$
$$r=\frac{\Sigma\;xy}{\sqrt{\Sigma\;x^2}\sqrt{\Sigma\;y^2}}$$
$$\text{Also,}\;r=\frac{\Sigma\;xy}{n\;\sigma_x\;\sigma_y}$$
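The deviation form can be sketched in a few lines of code. The function name `pearson_r` is an assumption introduced here, and the data are the linear and non-linear example series from earlier in the chapter.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the arithmetic means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]  # x = X - mean of X
    dev_y = [y - mean_y for y in ys]  # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_x2 = sum(a * a for a in dev_x)
    sum_y2 = sum(b * b for b in dev_y)
    return sum_xy / (math.sqrt(sum_x2) * math.sqrt(sum_y2))


r_linear = pearson_r([1, 2, 3, 4], [7, 9, 11, 13])
r_nonlinear = pearson_r([1, 2, 3, 4], [7, 10, 11, 20])
print(round(r_linear, 4))     # close to 1: the points lie on a straight line
print(round(r_nonlinear, 4))  # roughly 0.92
```

For the second series the sums of deviations work out to Σxy = 20, Σx² = 5 and Σy² = 94, so r = 20/√470.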
On simplification:
$$r=\frac{n\Sigma\;XY-\Sigma\;X\Sigma\;Y}{\sqrt{n\Sigma\;X^2-(\Sigma\;X)^2}\sqrt{n\Sigma\;Y^2-(\Sigma\;Y)^2}}$$
Again,
$$r=\frac{\Sigma\;XY-n\overline{X}\overline{Y}}{\sqrt{\Sigma\;X^2-n\overline{X}^2}\sqrt{\Sigma\;Y^2-n\overline{Y}^2}}$$
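The form built from the raw sums ΣX, ΣY, ΣXY, ΣX² and ΣY² avoids computing deviations at all. The sketch below applies it to one of the negative-correlation examples given earlier; the function name `pearson_r_raw_sums` is hypothetical.

```python
import math


def pearson_r_raw_sums(xs, ys):
    """Pearson's r from the raw sums, without computing the means."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r_raw_sums([10, 20, 25, 50], [50, 20, 10, 8])
print(round(r, 2))  # roughly -0.77, a fairly strong negative correlation
```

Here n = 4, ΣX = 105, ΣY = 88, ΣXY = 1550, ΣX² = 3625 and ΣY² = 3064, so the numerator is 6200 − 9240 = −3040 and the denominator is √3475 · √4512.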
Computing the correlation coefficient with the product-moment formula is tedious when the arithmetic means are not whole numbers. To avoid this, we put \(u=\frac{X-a}{h}\) and \(v=\frac{Y-b}{k}\), where a, b, h and k are constants, with h and k positive. Then,
$$r=\frac{n\Sigma\;uv-\Sigma\;u\Sigma\;v}{\sqrt{n\Sigma\;u^2-(\Sigma\;u)^2}\sqrt{n\Sigma\;v^2-(\Sigma\;v)^2}}$$
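A quick numerical check shows that this change of origin and scale leaves r unchanged. The sketch below is illustrative only: the helper name `r_from_sums` and the particular constants a = 25, h = 5, b = 10, k = 1 are assumptions chosen to give whole-number values of u and v.

```python
import math


def r_from_sums(ps, qs):
    """Pearson's r from raw sums; works for (X, Y) and for (u, v) alike."""
    n = len(ps)
    sum_p, sum_q = sum(ps), sum(qs)
    sum_pq = sum(p * q for p, q in zip(ps, qs))
    sum_p2 = sum(p * p for p in ps)
    sum_q2 = sum(q * q for q in qs)
    numerator = n * sum_pq - sum_p * sum_q
    denominator = (
        math.sqrt(n * sum_p2 - sum_p ** 2) * math.sqrt(n * sum_q2 - sum_q ** 2)
    )
    return numerator / denominator


X = [10, 20, 25, 50]  # the mean, 26.25, is not a whole number
Y = [5, 8, 10, 20]

a, h = 25, 5  # assumed origin and scale for X
b, k = 10, 1  # assumed origin and scale for Y
u = [(x - a) / h for x in X]  # -3, -1, 0, 5
v = [(y - b) / k for y in Y]  # -5, -2, 0, 10

print(round(r_from_sums(X, Y), 4))
print(round(r_from_sums(u, v), 4))  # same value as the line above
```

With these constants Σu = 1, Σv = 3, Σuv = 67, Σu² = 35 and Σv² = 129, which are far easier to handle by hand than the sums of the original values.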
Because r is unaffected by such a change of origin and scale, this form is usually the most convenient one for calculating the correlation coefficient r by hand.
Reference taken from:
(Basic Mathematics Grade XII, A Foundation of Mathematics Volume II, and Wikipedia.com)