Understanding linear regression

Consider a simple scenario: we would like to predict whether a student is likely to be accepted into a college undergraduate program (such as Princeton University) based on their GPA and SAT score. The sample data covers ten students, each with an SAT score, a high school GPA, and an acceptance or rejection decision.

To relate acceptance to a score that combines the SAT score and the GPA (just to illustrate an example; this does not resemble the actual admissions process), we will attempt to find a line of separation. The SAT scores, along the x axis, range from 2100 to 2400, so each candidate line runs between x = 2490 - 2i and x = 2150 + 2i. The GPA runs along the y axis; taking 3.3 and 5.0 as the extreme values, each line runs between y = 3.3 + 0.2i at one end and y = 5.0 - 0.2i at the other (a step size of 0.2). With i running from 1 to 4, this gives four candidate lines, each sloping down from high GPA and low SAT to low GPA and high SAT.
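
As a quick check of this arithmetic, the following plain-Python sketch lists the endpoints of each candidate line, using x from 2490 - 2i to 2150 + 2i and y from 3.3 + 0.2i to 5.0 - 0.2i for i = 1 to 4, together with its slope in GPA points per SAT point. The helper names `line_endpoints` and `slope` are hypothetical, introduced only for this illustration.

```python
def line_endpoints(i):
    """Endpoints (SAT, GPA) of the i-th candidate line (hypothetical helper)."""
    # High-GPA / low-SAT end: x = 2150 + 2i, y = 5.0 - 0.2i
    left = (2150 + 2 * i, 5.0 - 0.2 * i)
    # Low-GPA / high-SAT end: x = 2490 - 2i, y = 3.3 + 0.2i
    right = (2490 - 2 * i, 3.3 + 0.2 * i)
    return left, right


def slope(i):
    """Change in GPA per SAT point along the i-th candidate line."""
    (x1, y1), (x2, y2) = line_endpoints(i)
    return (y2 - y1) / (x2 - x1)


for i in range(1, 5):
    (x1, y1), (x2, y2) = line_endpoints(i)
    print(f"i={i}: ({x1}, {y1:.1f}) -> ({x2}, {y2:.1f}), slope {slope(i):.5f}")
```

Every slope is negative, and its magnitude shrinks as i grows (from -1.3/336 for i = 1 to -0.1/324 for i = 4), so the lines flatten out toward a nearly constant GPA cutoff.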

As a first look at the data, we will explore it with matplotlib and NumPy. In the following example, we plot the SAT scores on the x axis and the GPA on the y axis as a scatter plot, and overlay candidate lines to find the line of separation:

import matplotlib.pyplot as plt
import matplotlib as mpl
import numpy as np

mpl.rcParams['axes.facecolor'] = '#f8f8f8'
mpl.rcParams['grid.color'] = '#303030'
mpl.rcParams['lines.linestyle'] = '--'

# SAT scores
x = [2400, 2350, 2400, 2290, 2100, 2380, 2300, 2280, 2210, 2390]

# High school GPA
y = [4.4, 4.5, 4.2, 4.3, 4.0, 4.1, 3.9, 4.0, 4.3, 4.5]

# Marker colors: dark red for accepted, dark blue for rejected
a = '#6D0000'
r = '#00006F'
# Acceptance or rejection of each student, in the same order as x and y
z = [a, a, a, r, r, a, r, r, a, a]

plt.figure(figsize=(11,11))
plt.scatter(x,y,c=z,s=600)

# Candidate lines, to see where the separation might lie
for i in range(1, 5):
    X_plot = np.linspace(2490 - i*2, 2150 + i*2, 20)
    Y_plot = np.linspace(3.3 + i*0.2, 5 - 0.2*i, 20)
    plt.plot(X_plot, Y_plot, c='gray')

plt.grid(True) 

plt.xlabel('SAT Score', fontsize=18) 
plt.ylabel('GPA', fontsize=18) 
plt.title("Acceptance in College", fontsize=20) 

plt.show()

The preceding code does not perform any regression or classification; it only shows how the data looks. Drawing several lines of separation like these gives an intuitive understanding of how linear regression works.
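
To make the visual check concrete, the following sketch counts how many students fall on the correct side of each candidate line. It assumes (this is not stated in the original) that points above and to the right of a line, that is, higher combined scores, are predicted as accepted, and it uses the sign of a 2D cross product to decide the side. The function names `predict_accepted` and `count_correct` are hypothetical.

```python
# Sample data from the plot above
sat = [2400, 2350, 2400, 2290, 2100, 2380, 2300, 2280, 2210, 2390]
gpa = [4.4, 4.5, 4.2, 4.3, 4.0, 4.1, 3.9, 4.0, 4.3, 4.5]
accepted = [True, True, True, False, False, True, False, False, True, True]


def predict_accepted(i, x, y):
    """True if (x, y) lies above/right of the i-th candidate line (assumed rule)."""
    # The line runs from its high-GPA end (x1, y1) to its low-GPA end (x2, y2)
    x1, y1 = 2150 + 2 * i, 5.0 - 0.2 * i
    x2, y2 = 2490 - 2 * i, 3.3 + 0.2 * i
    # z-component of the cross product (P2 - P1) x (P - P1);
    # with this orientation it is positive for points above/right of the line
    cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
    return cross > 0


def count_correct(i):
    """Number of students whose predicted outcome matches the actual one."""
    return sum(predict_accepted(i, x, y) == actual
               for x, y, actual in zip(sat, gpa, accepted))


for i in range(1, 5):
    print(f"line {i}: {count_correct(i)} of {len(sat)} correct")
```

Each of the four lines gets 8 of the 10 students right. No straight line can do perfectly on this sample: the rejected student with SAT 2290 and GPA 4.3 lies inside the convex hull of the accepted students.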

With only ten students, there is not enough data to build an accurate predictor or to evaluate one on test data. If we collect more data and use well-known packages to apply machine learning algorithms, we can get a better understanding of the results. More features would also help, for instance extracurricular activities (such as sports and music).
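
As one example of applying a well-known package, the following minimal sketch fits an ordinary least-squares linear regression with NumPy's `np.linalg.lstsq`, treating acceptance as a 0/1 target and predicting acceptance when the fitted value exceeds 0.5. The 0/1 encoding and the 0.5 threshold are assumptions for illustration (a so-called linear probability model), not the method of the original text; logistic regression would be the more usual choice for a yes/no outcome.

```python
import numpy as np

# Sample data from the plot above
sat = np.array([2400, 2350, 2400, 2290, 2100, 2380, 2300, 2280, 2210, 2390])
gpa = np.array([4.4, 4.5, 4.2, 4.3, 4.0, 4.1, 3.9, 4.0, 4.3, 4.5])
# 1 = accepted, 0 = rejected (same order as the colors in the plot)
target = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 1])

# Design matrix: an intercept column plus the two features
X = np.column_stack([np.ones_like(gpa), sat, gpa])

# Ordinary least squares: minimize ||X @ w - target||^2
w, *_ = np.linalg.lstsq(X, target, rcond=None)
intercept, w_sat, w_gpa = w

# Fitted values, thresholded at 0.5 to get an accept/reject decision
fitted = X @ w
predicted = (fitted > 0.5).astype(int)

print(f"score = {intercept:.3f} + {w_sat:.5f} * SAT + {w_gpa:.3f} * GPA")
print("predicted:", predicted)
print("correct:", int((predicted == target).sum()), "of", len(target))
```

On these ten students the fit gets 9 right; the one it misses is the rejected student with SAT 2290 and GPA 4.3. Both fitted weights are positive, so the decision boundary (fitted value = 0.5) is a downward-sloping line, like the candidate lines drawn by hand.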
