How it works...

Pearson's Chi-squared test is a statistical test used to determine whether there is a relationship between two categorical variables. It is best suited to unpaired data from large samples. Before conducting Pearson's Chi-squared test, make sure the input data satisfy two assumptions: first, both variables should be categorical; second, each variable should consist of two or more independent groups.

In Pearson's Chi-squared test, suppose we have two variables, A and B. The null and alternative hypotheses can then be stated as follows:

  • H0: Variable A and variable B are independent
  • H1: Variable A and variable B are not independent

To test the null hypothesis, the Chi-squared test takes the following steps.

It calculates the Chi-squared test statistic, χ²:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$

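Here, $r$ is the number of rows in the contingency table, $c$ is the number of columns, $O_{i,j}$ is the observed frequency count in row $i$ and column $j$, and $E_{i,j}$ is the expected frequency count for that cell if H0 holds, that is, (row $i$ total × column $j$ total) / grand total.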
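The statistic can be sketched in a few lines of standard-library Python. This is an illustration only, not the recipe's R code; the helper names `expected_counts` and `chi_squared_statistic` are hypothetical, and the counts are assumed to be the transmission (`am`) by forward gears (`gear`) cross-tabulation of R's built-in `mtcars` dataset.

```python
# A minimal sketch of Pearson's Chi-squared statistic for an r x c table.
# Helper names are hypothetical; only the standard library is used.

def expected_counts(observed):
    """E[i][j] = (row i total * column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: automatic, manual; columns: 3, 4, 5 forward gears.
# Assumed to match table(mtcars$am, mtcars$gear) in R.
observed = [
    [15, 4, 0],
    [0, 8, 5],
]
print(round(chi_squared_statistic(observed), 3))  # 20.945
```

Computing the expected counts from the marginal totals is what encodes the independence assumption of H0: each cell gets the share of the grand total it would receive if row and column membership were unrelated.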

It then determines the degrees of freedom, df, of the statistic:

$$df = (r - 1)(c - 1)$$

Here, r is the number of levels of one variable, and c is the number of levels of the other variable.
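As a worked example, assume the 2 × 3 table of this recipe (two transmission types by three possible numbers of forward gears). The formula then gives:

$$df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$$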

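Finally, it compares the test statistic with the critical value of the Chi-squared distribution with df degrees of freedom. If the statistic exceeds the critical value (equivalently, if the p-value falls below the chosen significance level), the null hypothesis is rejected.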
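The comparison step can be sketched in standard-library Python as follows. The function names are hypothetical, the upper-tail probability uses the closed-form series that holds for integer degrees of freedom, and the critical value is found by simple bisection rather than any library routine. The statistic 20.945 is assumed to be the value `chisq.test` reports for the transmission-by-gears table; it does not appear in the text above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ Chi-squared(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def chi2_critical_value(alpha, df, hi=1000.0):
    """Bisect for x where P(X >= x) = alpha; chi2_sf decreases in x."""
    lo = 0.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

alpha = 0.05
statistic = 20.945  # assumed X-squared for the am x gear table
df = 2
p_value = chi2_sf(statistic, df)
critical = chi2_critical_value(alpha, df)
print(f"p-value = {p_value:.3e}, critical value = {critical:.3f}")
if statistic > critical:
    print("Reject H0: the two variables do not appear to be independent")
else:
    print("Fail to reject H0")
```

With df = 2 the tail probability reduces to exp(-x/2), so the p-value comes out at roughly 2.83e-05, in line with the value quoted below, and the 5% critical value is about 5.991.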

In this recipe, we use a contingency table and a mosaic plot to illustrate the differences in counts. Both show that automatic transmission cars tend to have fewer forward gears than manual transmission cars.
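For readers outside R, a contingency table is just a cross-tabulation of paired observations. The sketch below is a hypothetical Python equivalent, not the recipe's code: the per-car records are assumed to reproduce the `am` and `gear` columns of `mtcars`, with `am` recoded as text labels. Printing the row proportions alongside the counts mirrors what the tile widths in a mosaic plot convey.

```python
from collections import Counter

# Hypothetical per-car records (transmission, forward gears); the counts are
# assumed to match R's built-in mtcars dataset.
records = (
    [("automatic", 3)] * 15 + [("automatic", 4)] * 4
    + [("manual", 4)] * 8 + [("manual", 5)] * 5
)

def contingency_table(pairs):
    """Cross-tabulate (row, column) pairs into a dict of dicts of counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = contingency_table(records)
for transmission, row in table.items():
    total = sum(row.values())
    shares = {gear: round(n / total, 2) for gear, n in row.items()}
    print(transmission, row, shares)
```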

Then, we perform Pearson's Chi-squared test on the contingency table to determine whether the number of forward gears is independent of the transmission type. The resulting p-value of 2.831e-05 is below 0.05, so we reject the null hypothesis and conclude that the number of forward gears differs between automatic and manual transmission cars. However, the output also contains a warning that the Chi-squared approximation may be incorrect. This is because some cells of the contingency table have expected counts below five, in which case the Chi-squared distribution may not approximate the test statistic well.
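The small-sample condition can be checked directly by listing the cells whose expected count falls below five, the usual rule of thumb for the Chi-squared approximation. This is a hypothetical standard-library sketch, again assuming the `mtcars` transmission-by-gears counts; `small_expected_cells` is an illustrative name, not an R or library function.

```python
def expected_counts(observed):
    """E[i][j] = (row i total * column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def small_expected_cells(observed, threshold=5.0):
    """Return (row, column, expected count) for cells below the threshold."""
    return [
        (i, j, e)
        for i, row in enumerate(expected_counts(observed))
        for j, e in enumerate(row)
        if e < threshold
    ]

# Rows: automatic, manual; columns: 3, 4, 5 forward gears (assumed mtcars counts).
observed = [[15, 4, 0], [0, 8, 5]]
for i, j, e in small_expected_cells(observed):
    print(f"cell ({i}, {j}): expected count {e:.3f}")
```

Three of the six cells fall below the threshold (expected counts of about 2.969, 4.875, and 2.031), which is consistent with the warning shown in the output.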
