ANOVA is used to determine whether there are any statistically significant differences between the means of three or more independent groups. With only two samples, we can use the t-test to compare the means, but with more than two samples, comparing every pair with a separate t-test becomes cumbersome and inflates the chance of a false positive. We are going to study the relationship between a quantitative dependent variable, returns, and a single qualitative independent variable, stock. We have five levels of stock: stock1, stock2, ..., stock5.
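To make the difficulty with repeated t-tests concrete, the short sketch below counts the pairwise comparisons needed for five stocks and estimates the chance of at least one false positive when each test is run at the 5% level. It assumes the tests are independent, which is a simplification for illustration only, and the variable names are ours:

```r
# Number of stock levels in the example
k <- 5

# Number of distinct pairs of groups that separate t-tests would have to compare
n_pairs <- choose(k, 2)

# Significance level used for each individual t-test
alpha <- 0.05

# Probability of at least one false positive across all pairwise tests,
# assuming (as a simplification) that the tests are independent
familywise_error <- 1 - (1 - alpha)^n_pairs

print(n_pairs)
print(round(familywise_error, 3))
```

With five groups there are 10 pairs to compare, and under this assumption the chance of at least one spurious rejection is roughly 40%, which is why a single ANOVA test is preferred.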
We can compare the five levels of stock by means of a box plot. First, we load the data and inspect the first few rows by executing the following code:
> DataANOVA = read.csv("C:/Users/prashant.vats/Desktop/Projects/BOOK R/DataAnova.csv")
> head(DataANOVA)
This displays a few lines of the data used for analysis in tabular format:
| | Returns | Stock |
|---|---|---|
| 1 | 1.64 | Stock1 |
| 2 | 1.72 | Stock1 |
| 3 | 1.68 | Stock1 |
| 4 | 1.77 | Stock1 |
| 5 | 1.56 | Stock1 |
| 6 | 1.95 | Stock1 |
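The CSV file itself is not reproduced here. For readers who want to run the remaining commands without it, the following is a minimal sketch that builds a stand-in data frame with the same two columns, Returns and Stock. The values are simulated; the group means, standard deviation, group size, and seed are arbitrary assumptions and not the book's data:

```r
# Simulated stand-in for DataAnova.csv (NOT the original data)
set.seed(42)                                   # arbitrary seed, for reproducibility only
stock_levels  <- paste0("Stock", 1:5)          # five levels, as in the text
n_per_stock   <- 20                            # assumed number of observations per stock
assumed_means <- c(1.7, 1.5, 1.4, 1.6, 1.5)    # hypothetical mean returns per stock

DataANOVA <- data.frame(
  Returns = round(rnorm(5 * n_per_stock,
                        mean = rep(assumed_means, each = n_per_stock),
                        sd = 0.1), 2),
  Stock = factor(rep(stock_levels, each = n_per_stock), levels = stock_levels)
)

head(DataANOVA)          # first six rows, in the same layout as the table above
table(DataANOVA$Stock)   # number of observations per level
```

Because the numbers are simulated, the plots and test results obtained from this stand-in will not match the figures in this section.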
> boxplot(DataANOVA$Returns ~ DataANOVA$Stock)
This produces the following box plot:
Figure 3.9: Box plot of different levels of stock
The preceding box plot suggests that one level of stock has higher returns than the others. If we repeat the procedure, we are most likely going to get different returns. It is possible that all the levels of stock give similar returns and we are just seeing random fluctuation in one set of returns. Let us assume that there is no difference in mean returns between the levels; this is our null hypothesis. Using ANOVA, let us test this hypothesis:
> oneway.test(Returns ~ Stock, data = DataANOVA, var.equal = TRUE)
Executing the preceding code gives the following outcome:
Figure 3.10: Output of ANOVA for different levels of stock
Since the p-value is less than 0.05, the null hypothesis is rejected. The mean returns are not the same across the levels of stock; at least one level differs from the others.
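For intuition about what the test computes, the sketch below reproduces the F statistic by hand and checks it against oneway.test. It runs on simulated data: the data frame sim and every number in it are hypothetical and are not the book's data set. Only tapply, pf, and oneway.test are standard R functions:

```r
# Simulated data (hypothetical values, not the book's data set)
set.seed(1)
sim <- data.frame(
  Returns = rnorm(50, mean = rep(c(1.7, 1.5, 1.4, 1.6, 1.5), each = 10), sd = 0.1),
  Stock   = factor(rep(paste0("Stock", 1:5), each = 10))
)

k <- nlevels(sim$Stock)   # number of groups
n <- nrow(sim)            # total number of observations

group_means <- tapply(sim$Returns, sim$Stock, mean)
group_sizes <- tapply(sim$Returns, sim$Stock, length)
grand_mean  <- mean(sim$Returns)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((sim$Returns - group_means[as.character(sim$Stock)])^2)

# F statistic: ratio of the between-group to the within-group mean square
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Same test via the built-in function, assuming equal variances
fit <- oneway.test(Returns ~ Stock, data = sim, var.equal = TRUE)

print(c(manual = f_manual, builtin = unname(fit$statistic)))
print(c(manual = p_manual, builtin = fit$p.value))
```

The manual and built-in values should agree up to floating-point error. A large F means the variation between the group means is large relative to the variation within the groups, which is what leads to a small p-value.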