Perform the following steps:
- First, we use a boxplot to compare mpg for all cars with mpg for automatic transmission cars (am == 0):
> boxplot(mtcars$mpg, mtcars$mpg[mtcars$am==0], ylab = "mpg",
names=c("overall","automatic"))
> abline(h=mean(mtcars$mpg),lwd=2, col="red")
> abline(h=mean(mtcars$mpg[mtcars$am==0]),lwd=2, col="blue")
The boxplot of mpg of the overall population and automatic transmission cars
- We then perform a one-sample t-test to validate whether the average mpg of automatic transmission cars is lower than the overall average mpg:
> mpg.mu = mean(mtcars$mpg)
> mpg_am = mtcars$mpg[mtcars$am == 0]
> t.test(mpg_am,mu = mpg.mu)
- Next, we compare automatic and manual transmission cars directly, starting with a boxplot:
> boxplot(mtcars$mpg~mtcars$am,ylab='mpg',names=c('automatic','manual'))
> abline(h=mean(mtcars$mpg[mtcars$am==0]),lwd=2, col="blue")
> abline(h=mean(mtcars$mpg[mtcars$am==1]),lwd=2, col="red")
The boxplot of mpg of automatic and manual transmission cars
- The preceding figure suggests that the mean mpg of automatic transmission cars is lower than that of manual transmission cars. We check this with a two-sample t-test:
> t.test(mtcars$mpg~mtcars$am)
Output:
Welch Two Sample t-test
data: mtcars$mpg by mtcars$am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
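The one-sample test in this recipe is two-sided, while the question posed was whether automatic cars have a lower mean mpg. The following is a minimal sketch of the one-sided variant, assuming the built-in mtcars dataset. The variable names (mpg_auto, two_sided, one_sided, by_am) are illustrative and do not appear in the recipe. Since t.test() returns an object of class "htest", the p-value, estimates, and confidence interval can be read from it directly instead of being copied from the printed output.

```r
# Illustrative names; only mtcars and t.test() come from the recipe above.
mpg_auto <- mtcars$mpg[mtcars$am == 0]   # am == 0 marks automatic transmission

# Two-sided one-sample test, as in the recipe
two_sided <- t.test(mpg_auto, mu = mean(mtcars$mpg))

# One-sided variant: the alternative is that the true mean is less than mu
one_sided <- t.test(mpg_auto, mu = mean(mtcars$mpg), alternative = "less")

# Two-sample Welch test, written with the formula interface and a data argument
by_am <- t.test(mpg ~ am, data = mtcars)

# Components of the returned "htest" objects
two_sided$p.value
one_sided$p.value
by_am$estimate
by_am$conf.int
```

Because the sample mean of the automatic cars lies below the overall mean, the one-sided p-value here is half the two-sided one.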
The next recipe will let you create your own data and perform the t-test. Think of 60 students divided into two equal divisions, A and B, of 30 students each. Because the marks are drawn at random with sample(), your output will differ from the values shown here. Perform the following steps in R:
> data = data.frame(marks=sample(40:100, 60, replace=TRUE),
division=c(rep('A',30), rep('B',30)))
> head(data)
> boxplot(data$marks, data$marks[data$division=='A'], ylab="Marks",
names=c("All Marks", "Div A"))
> abline(h=mean(data$marks), lwd=2, col="red")
> abline(h=mean(data$marks[data$division=='A']), lwd=2, col="blue")
> meanmarks = mean(data$marks)
> marksA = data$marks[data$division=='A']
> t.test(marksA, mu = meanmarks)
Output:
One Sample t-test
data: marksA
t = -0.80284, df = 29, p-value = 0.4286
alternative hypothesis: true mean is not equal to 72.5
95 percent confidence interval:
62.68524 76.78143
sample estimates:
mean of x
69.73333
> boxplot(data$marks~data$division, ylab="Marks", names=c("A", "B"))
> abline(h=mean(data$marks[data$division=="A"]), lwd=2, col="red")
> abline(h=mean(data$marks[data$division=="B"]), lwd=2, col="blue")
Boxplot for division A and B
> t.test(data$marks~data$division)
Output:
Welch Two Sample t-test
data: data$marks by data$division
t = -1.1407, df = 57.995, p-value = 0.2587
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-15.243379 4.176712
sample estimates:
mean in group A mean in group B
69.73333 75.26667