Correlation Analysis

Correlation analysis examines the relationship between two variables. The presence of correlation should not be interpreted as causation: two variables can be highly correlated without being related as cause and effect. For example, one could correlate the number of ice cream cones sold in a city with the number of drowning deaths. If a direct correlation is discovered, does this mean that drowning deaths could be reduced by cutting down the number of ice cream cones sold? Of course not. Some other variable, such as hot summer weather, is likely to account for the increase in both.

When examining cause and effect, it is important to consider three issues:

Does variable A come before variable B? For example, studying comes before taking a test, so an increase in test score could be caused by an increase in studying.

Are there any other variables that could account for the increase in test score? Suppose everyone takes the same test and receives the same lecture notes, so that only the amount of studying differs.

Are the two variables related to each other? This is where correlation analysis comes into the equation. If the variables Hours of Studying and Test Score are related then there could be a cause and effect relationship.

Considering all of this, the question may arise as to why someone would bother with correlation analysis. The answer is that correlation analysis allows the researcher to make better-than-chance predictions about relationships. If we know that events are correlated and we can determine the direction of the relationship, then we can possibly control future events. For example, suppose we discover that Hours of Studying and Test Score are related and that, as Hours of Studying increases, Test Score increases. Knowing this, we can increase our hours of studying to try to improve our grades.

There are three general forms of correlation:

Positive correlation – produced when high scores on one variable are associated with high scores on a second variable. Likewise, low scores on one variable are associated with low scores on the second variable.

Negative correlation – produced when high scores on one variable are associated with low scores on the second variable. Likewise, low scores on one variable are associated with high scores on the second variable.

Zero correlation – there is no relationship between the variables. High scores on one variable are just as likely to be paired with high scores as with low scores on the second variable.

How to Determine the Form of Correlation

The form of correlation is determined by examining a correlation value. When the variables are metric, the value used is Pearson’s r. Values for Pearson’s r range from –1.00 to +1.00. A value of +1.00 indicates a perfect positive correlation. For example, if studying and test scores were correlated at +1.00, then increases in studying hours would always be accompanied by increased test scores. Conversely, a value of –1.00 indicates a perfect negative correlation: increases in studying hours would always be accompanied by decreases in test scores. It is extremely rare to find a perfect correlation. A value of zero means that there is no linear correlation.

Strength of correlation increases as the absolute value of Pearson’s r approaches 1.00, regardless of whether the sign is positive or negative. The strength of correlation is as follows:
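As an illustration of how a value for Pearson’s r is obtained, the following is a minimal sketch in Python using only the standard library. The function name pearson_r and the study-hours data are hypothetical and are not taken from any real study; the formula is the usual one (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations).

```python
import math

def pearson_r(x, y):
    """Return Pearson's r for two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours of studying and test scores for five students
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 75, 80]

print(round(pearson_r(hours, scores), 2))   # 0.97, a strong positive correlation
print(pearson_r([1, 2, 3], [2, 4, 6]))      # 1.0, a perfect positive correlation
print(pearson_r([1, 2, 3], [6, 4, 2]))      # -1.0, a perfect negative correlation
```

The sign of the result gives the direction of the relationship and its absolute value gives the strength.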

Less than .20, the correlation is slight; almost negligible relationship

.20-.40, the correlation is low; definite but small relationship

.40-.70, the correlation is moderate; substantial relationship

.70-.90, the correlation is high; marked relationship

.90-1.00, the correlation is very high; very dependable relationship
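The ranges above can be sketched as a small lookup function. This is only an illustration: the function name correlation_strength is hypothetical, and because the listed ranges share their endpoints (.20, .40, .70, .90), the sketch assumes that a value falling exactly on a boundary belongs to the higher category.

```python
def correlation_strength(r):
    """Describe the strength of a correlation value using the ranges above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation value must lie between -1.00 and +1.00")
    size = abs(r)  # strength ignores the sign of the correlation
    # Assumption: boundary values (.20, .40, .70, .90) fall in the higher category
    if size < 0.20:
        return "slight"
    if size < 0.40:
        return "low"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "high"
    return "very high"

print(correlation_strength(0.15))    # slight
print(correlation_strength(-0.85))   # high (the sign does not affect strength)
```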

If the correlation is positive, the relationship is said to be direct. In this case, as one variable increases the other variable tends to increase as well.

If the correlation is negative, the relationship is said to be inverse. In this case, as one variable increases the other variable tends to decrease.

Once a value for Pearson’s r is calculated, the next step is to determine its significance. This testing is done to determine whether the correlation found in the sample can be generalized to the population. When testing the significance of Pearson’s r, we are testing the null hypothesis that there is no relationship between the two variables in the population and that any perceived relationship is due to chance or sampling error. The null hypothesis is written as follows:

H0: ρ = 0 (the correlation in the population is zero)

The absolute value of the calculated Pearson’s r is compared with a table of known critical values. If the calculated value is greater than the table value, then the null hypothesis is rejected. If the calculated value is less than the table value, then the null hypothesis is accepted (or, as some prefer to say, we fail to reject the null hypothesis).
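The comparison with a table value can be sketched in Python. Tables of critical values for Pearson’s r are derived from the t distribution with n – 2 degrees of freedom, so the sketch converts between the two. The function names and the sample values (r = 0.80 from n = 10 subjects) are hypothetical; the critical t of 2.306 is the standard two-tailed table value for a .05 significance level with 8 degrees of freedom, and a different sample size or significance level would need a different table value.

```python
import math

def t_from_r(r, n):
    """Convert a correlation r from n pairs into a t statistic with n - 2 df."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def critical_r(t_crit, df):
    """Convert a critical t value into the matching critical value of r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Hypothetical result: r = 0.80 from a sample of n = 10 subjects
r, n = 0.80, 10
df = n - 2
T_CRIT = 2.306  # two-tailed, .05 level, df = 8 (taken from a standard t table)

r_table = critical_r(T_CRIT, df)
print(round(r_table, 3))            # 0.632, the table value for df = 8
print(round(t_from_r(r, n), 2))     # 3.77

# Compare the absolute value of the calculated r with the table value
if abs(r) > r_table:
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```

With these values the calculated r exceeds the table value, so the null hypothesis is rejected. Comparing the t statistic with the critical t leads to the same decision.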

The requirements for using Pearson’s r include:

subjects have been randomly selected

the variables are normally distributed

the association between the variables is linear

measurements are at least interval data

If the measurements are not at least interval data, and are instead ordinal data (or dichotomous dummy-coded variables), then the statistical technique of choice is Spearman’s rho. Spearman’s rho is a nonparametric statistical technique, meaning that it does not require normal distributions or data on an interval or ratio scale.
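One common way to compute Spearman’s rho is to replace each score with its rank and then calculate Pearson’s r on the ranks. The following is a minimal Python sketch of that approach. The function names and the data are hypothetical, and tied scores are assumed to share the average of the ranks they occupy.

```python
import math

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r calculated on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))   # [1.0, 2.5, 2.5, 4.0]

# Hypothetical data: y always rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y))                # 1.0, because the rank order is identical
```

Because only the rank order matters, any relationship that always moves in the same direction gives a rho of +1.00 or –1.00, even when it is not linear.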

Spearman’s rho is interpreted in much the same manner as Pearson’s r, and the significance of the correlation is expressed in the same manner. The null hypothesis is that there is no relationship between the variables and that any perceived relationship is due to chance or sampling error. The hypotheses are written as follows:

H0: ρs = 0 (there is no relationship between the variables in the population)

H1: ρs ≠ 0 (there is a relationship between the variables in the population)

A correlation value is calculated and its absolute value is then compared to a table of critical values. If the calculated value is greater than the table value, then the null hypothesis is rejected. If the calculated value is less than the table value, then the null hypothesis is accepted (that is, we fail to reject it).

To calculate correlation between variables in SPSS:

Open a data file

Click Analyze

Select Correlate

Select Bivariate

Move the variables of interest into the "Variables" box

If the analysis conducted is Pearson’s r, then ensure that the box next to Pearson is checked. If the analysis conducted is Spearman’s rho, then ensure that the box next to Spearman is checked.

Click OK