A sports example

To see how one visualization method can work better than another, let us consider a different question: What are the top five touchdown records held by quarterbacks in American football as of February 2015? The original source of the data is the Len Dawson NFL and AFL Statistics page. (Data source: http://www.pro-football-reference.com/players/D/DawsLe00.htm.)

The data contains information about the top 22 quarterbacks: Peyton Manning, Brett Favre, Dan Marino, Drew Brees, Tom Brady, Fran Tarkenton, John Elway, Warren Moon, Johnny Unitas, Vinny Testaverde, Joe Montana, Dave Krieg, Eli Manning, Sonny Jurgensen, Dan Fouts, Philip Rivers, Ben Roethlisberger, Drew Bledsoe, Boomer Esiason, John Hadl, Y. A. Tittle, and Tony Romo.

Before we choose a visualization method, a little analysis is needed. These quarterbacks played in different time periods. For example, Brett Favre played from 1991 to 2010, and Dan Marino played from 1983 to 1999. The challenge is that a bar graph or a bubble chart would show the results in only one dimension (the touchdown totals) and would hide when those touchdowns were scored.

The first step is to parse the CSV file, and we have several options here. We can either use the pandas read_csv function or the csv module, which has some convenient functions such as DictReader:

import csv 
import matplotlib.pyplot as plt

# The CSV file has the columns Name, Year, Age, Cmp, Att, Yds, TD, Teams
with open('/Users/MacBook/java/qb_data.csv') as csvfile:
  reader = csv.DictReader(csvfile)
  for row in reader:
    name = row['Name']
    tds = row['TD']

The quarterback data was downloaded from the source listed previously in this section; the filtered data is also available at http://www.knapdata.com/python/qb_data.csv. The csv module includes classes for working with rows as dictionaries so that the fields can be named. The DictReader and DictWriter classes translate the rows to dictionaries instead of lists. Keys for the dictionary can be passed in or inferred from the first row in the input (where the row contains headers). Reading the contents of the CSV file is achieved via DictReader, where the column input values are treated as strings:

# Ways to call DictReader

# If the file has no header row, pass the field names explicitly
fieldnames = ['Name', 'Year', 'Age', 'Cmp', 'Att', 'Yds', 'TD', 'Teams']

reader = csv.DictReader(csvfile, fieldnames=fieldnames)

# If the first row of the CSV file is Name, Year, Age, Cmp, Att, Yds, TD, Teams,
#   we don't need to define fieldnames; the reader picks them up
#   automatically.
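
As a minimal sketch of both calling styles, the following snippet reads from an in-memory string instead of qb_data.csv. The two data rows are made-up placeholders, not real statistics, and the reduced column set is only for brevity.

```python
import csv
import io

# Placeholder rows for illustration only (not real statistics).
with_header = "Name,Year,Age,TD\nQB A,2001,25,20\nQB A,2002,26,31\n"
without_header = "QB A,2001,25,20\nQB A,2002,26,31\n"

# Case 1: the first row holds the headers, so the keys are inferred.
rows_inferred = list(csv.DictReader(io.StringIO(with_header)))

# Case 2: there is no header row, so the keys are passed in explicitly.
rows_explicit = list(
    csv.DictReader(io.StringIO(without_header),
                   fieldnames=['Name', 'Year', 'Age', 'TD'])
)

# Every value comes back as a string, including the numeric columns.
print(rows_inferred[0]['TD'], type(rows_inferred[0]['TD']))

# Both approaches yield the same rows.
print(rows_inferred == rows_explicit)
```

Note that if field names are passed explicitly for a file that does have a header row, the header row is returned as an ordinary data row.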

Because DictReader returns every value as a string, we need a function that converts a value to a number. We have also added the functions getcolors() and num() to prepare.py so that they can be reused in later examples:

# num(s) and getcolors() functions 
def num(s):
  try:
    return int(s)
  except ValueError:
    return 0  

def getcolors():
  colors = [(31, 119, 180), (255,0,0), (0,255,0), (148, 103, 189), (140, 86, 75), (218, 73, 174), (127, 127, 127), (140,140,26), (23, 190, 207), (65,200,100), (200, 65,100), (125,255,32), (32,32,198), (255,191,201), (172,191,201), (0,128,0), (244,130,150), (255, 127, 14), (128,128,0), (10,10,10), (44, 160, 44), (214, 39, 40), (206,206,216)]

  for i in range(len(colors)):
    r, g, b = colors[i]
    colors[i] = (r / 255. , g / 255. , b / 255.)
  return colors
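
To make the behavior of num() concrete, here is a short sketch that repeats the function and calls it with a few representative inputs. The input strings are illustrative choices, not values taken from the data file.

```python
def num(s):
    try:
        return int(s)
    except ValueError:
        return 0

print(num('42'))      # a plain integer string converts normally
print(num('Career'))  # non-numeric text falls back to 0
print(num(''))        # an empty field also falls back to 0
print(num('12.5'))    # int() rejects decimal strings, so this is 0 too
```

One consequence of this design is that a missing value and a genuine zero become indistinguishable. That is acceptable when summing touchdowns, but it is worth keeping in mind for other columns.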

Visually representing the results

Based on the field names in the input data, for every quarterback, their touchdown statistics or passing-yards statistics can be plotted on a timeline. Now that we know what to plot, the next step is to figure out how to plot it.

Simple X-Y plots with the fields (year, touchdown) or (touchdown, year) should be a good start. However, there are 252 quarterbacks so far in this input data file, and a majority of them are not relevant. Therefore, showing them all with different colors would not make sense. (Why? Do we have 252 different colors?) We can attempt to plot the top 7 or top 10 results, as seen in the following image:

[Figure: Cumulative Touch Downs by Quarter Backs, with Year on the x axis and cumulative touchdowns on the y axis]

The following Python program demonstrates how to use matplotlib to display the top 10 quarterbacks in terms of the number of touchdowns. The plot produced by this program is shown in the preceding image:

import csv
import matplotlib.pyplot as plt

# The following functions can be in a separate file
#  (if they are, you need to import them)
def num(s):
  try:
    return int(s)
  except ValueError:
    return 0  

def getcolors():
  colors = [(31, 119, 180), (255,0,0), (0,255,0), (148, 103, 189), (140, 86, 75), (218, 73, 174), (127, 127, 127), (140,140,26), (23, 190, 207), (65,200,100), (200, 65,100), (125,255,32), (32,32,198), (255,191,201), (172,191,201), (0,128,0), (244,130,150), (255, 127, 14), (128,128,0), (10,10,10), (44, 160, 44), (214, 39, 40), (206,206,216)]

  for i in range(len(colors)):
    r, g, b = colors[i]
    colors[i] = (r / 255. , g / 255. , b / 255.)
  return colors

def getQbNames():
  qbnames = ['Peyton Manning']
  name=''
  i=0
  with open('/Users/MacBook/java/qb_data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
      if ( row['Name'] != name and qbnames[i] != row['Name']):
        qbnames.append(row['Name'])
        i = i+1
  return qbnames

def readQbdata():
  resultdata = []
  with open('/Users/MacBook/java/qb_data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    resultdata = [row for row in reader]
  return resultdata

fdata=[]
prevysum=0

#    -- functions End -- 

qbnames = getQbNames()
fdata = readQbdata()

i=0
rank=0
prevysum=0
lastyr=0
highrank=244
colorsdata = getcolors() 

fig = plt.figure(figsize=(15,13))
# axisbg was removed in matplotlib 2.2; facecolor is its replacement
ax = fig.add_subplot(111, facecolor='white')

# limits for TD
plt.ylim(10, 744) 
plt.xlim(1940, 2021)

colindex=0
lastage=20

for qbn in qbnames:
  x=[]
  y=[]
  prevysum=0
  for row in fdata: 
    if ( row['Name'] == qbn and row['Year'] != 'Career'):
      yrval = num(row['Year'])
      lastage = num(row['Age'])
      prevysum += num(row['TD'])
      lastyr = yrval
      x += [yrval]
      y += [prevysum]

  if ( rank > highrank):
    plt.plot(x,y, color=colorsdata[colindex], label=qbn, linewidth=2.5)
    plt.legend(loc=0, prop={'size':10}) 
    colindex = (colindex+1)%22
    plt.text(lastyr-1, prevysum+2, qbn+"("+str(prevysum)+"):" +str(lastage), fontsize=9)

  else:
    plt.plot(x,y, color=colorsdata[22], linewidth=1.5) 
    rank = rank +1 
plt.xlabel('Year', fontsize=18)
plt.ylabel('Cumulative Touch Downs', fontsize=18)
plt.title("Cumulative Touch Downs by Quarter Backs", fontsize=20)
plt.show() 
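
The script above decides which lines to highlight through the rank and highrank counters, which appears to depend on the order of the quarterbacks in the input file. As an alternative sketch (not the method used in the script), the top N names can be selected explicitly from the summed touchdowns. The helper name top_quarterbacks and the sample rows are hypothetical, and the numbers are placeholders rather than real statistics.

```python
import csv
import io
from collections import defaultdict

# Placeholder data for illustration only (not real statistics).
sample = """Name,Year,Age,TD
QB A,2001,25,20
QB A,2002,26,30
QB A,Career,,50
QB B,2001,30,10
QB C,2001,24,25
QB C,2002,25,40
"""

def top_quarterbacks(csvfile, n):
    """Return the n names with the highest summed TD values."""
    totals = defaultdict(int)
    for row in csv.DictReader(csvfile):
        if row['Year'] == 'Career':   # skip summary rows, as the script does
            continue
        try:
            totals[row['Name']] += int(row['TD'])
        except ValueError:
            pass                      # ignore blank or non-numeric TD fields
    return sorted(totals, key=totals.get, reverse=True)[:n]

top = top_quarterbacks(io.StringIO(sample), 2)
print(top)
```

The returned list could then be used in the plotting loop to decide whether a quarterback gets a colored, labeled line or a gray one.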

When the plot is switched from (X, Y) to (Y, X), there is enough room to display the quarterback names. This requires a small change to the preceding code snippet.

[Figure: the cumulative touchdown plot with the x and y axes flipped]

If we flip the x and y axes, there is more room to display the name of each quarterback and the total touchdown score, as shown in the preceding plot. To accomplish this, we switch x and y and reposition the label text according to the new axes:

plt.xlim(10, 744)  
plt.ylim(1940, 2021)

# The remaining code is unchanged, except for:

y += [num(row['Year'])]
x += [prevysum]

# Don't forget to switch the x and y coordinates of the text label

plt.text(prevysum+2, lastyr-1, qbn+"("+str(prevysum)+"):" +str(lastage), fontsize=9)

At first glance, we can only make out the quarterbacks who lead in the number of career touchdowns (as of the 2014-2015 football season). Based on this visualization, you can try to analyze further what else we can infer from this data. The answer depends on the answers to the following questions:

  • Which quarterback has played the longest in their career?
  • Are there any quarterbacks today who can surpass Peyton Manning's touchdown records?

Age is one of the fields that we read from the input file. There are many ways to experiment with the starting value of Age when plotting the Age versus Touchdown statistics. To answer the first question, we have to keep track of Age instead of Year. The following snippet can either be placed in a separate function (if it is used often) or included in the main script:

maxage = 30

with open('/Users/MacBook/java/qb_data.csv') as csvfile:
  reader = csv.DictReader(csvfile)
  for row in reader:
    if ( num(row['Age']) > maxage ):
      maxage = num(row['Age'])

print(maxage)

Running the preceding block of code shows 44 as the maximum age of a quarterback while actively playing in the league. There were three such quarterbacks: Warren Moon, Vinny Testaverde, and Steve DeBerg. (Technically, George Blanda played until he was 48, which is the maximum age for any player, but he started as a quarterback and was also a kicker for some years.)
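
The snippet above reports only the maximum age, not who reached it. A minimal sketch that returns both is shown here; the helper name oldest_players and the sample rows are hypothetical, and the numbers are placeholders rather than real statistics.

```python
import csv
import io

# Placeholder data for illustration only (not real statistics).
sample = """Name,Year,Age,TD
QB A,2001,43,5
QB A,2002,44,3
QB B,2001,44,7
QB C,2001,38,12
"""

def oldest_players(csvfile):
    """Return (maximum age, sorted list of names that reached it)."""
    max_age_by_name = {}
    for row in csv.DictReader(csvfile):
        try:
            age = int(row['Age'])
        except ValueError:
            continue   # skip rows with a blank or non-numeric age
        name = row['Name']
        max_age_by_name[name] = max(age, max_age_by_name.get(name, 0))
    oldest = max(max_age_by_name.values())
    names = sorted(n for n, a in max_age_by_name.items() if a == oldest)
    return oldest, names

result = oldest_players(io.StringIO(sample))
print(result)
```

With the real data file, an open file object would be passed in place of the io.StringIO wrapper.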

In order to answer the other question, we plot the touchdown statistics against the quarterback age, as follows:

import csv
import matplotlib.pyplot as plt

# The following functions can be in a separate file
#    -- functions Begin -- 
def num(s):
  try:
    return int(s)
  except ValueError:
    return 0  

def getcolors():
  colors = [(31, 119, 180), (255,0,0), (0,255,0), (148, 103, 189), (140, 86, 75), (218, 73, 174), (127, 127, 127), (140,140,26), (23, 190, 207), (65,200,100), (200, 65,100), (125,255,32), (32,32,198), (255,191,201), (172,191,201), (0,128,0), (244,130,150), (255, 127, 14), (128,128,0), (10,10,10), (44, 160, 44), (214, 39, 40), (206,206,216)]

  for i in range(len(colors)):
    r, g, b = colors[i]
    colors[i] = (r / 255. , g / 255. , b / 255.)
  return colors

def getQbNames():
  qbnames = ['Peyton Manning']
  name=''
  i=0
  with open('/Users/MacBook/java/qb_data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
      if ( row['Name'] != name and qbnames[i] != row['Name']):
        qbnames.append(row['Name'])
        i = i+1
  return qbnames

def readQbdata():
  resultdata = []
  with open('/Users/MacBook/java/qb_data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    resultdata = [row for row in reader]
  return resultdata

fdata=[]
prevysum=0

#    -- functions End -- 

qbnames = getQbNames()
fdata = readQbdata()

i=0
rank=0
prevysum=0
lastyr=0
highrank=244
colorsdata = getcolors() 

fig = plt.figure(figsize=(15,13))
# axisbg was removed in matplotlib 2.2; facecolor is its replacement
ax = fig.add_subplot(111, facecolor='white')

# limits for TD
plt.ylim(10, 744)
#change xlimit to have age ranges 
plt.xlim(20, 50)

colindex=0
lastage=20

for qbn in qbnames:
  x=[]
  y=[]
  prevysum=0
  for row in fdata: 
    if ( row['Name'] == qbn and row['Year'] != 'Career'):
      yrval = num(row['Year'])
      lastage = num(row['Age'])
      prevysum += num(row['TD'])
      lastyr = yrval
      x += [lastage]
      y += [prevysum]

  if ( rank > highrank):
    if ( lastage == 44):
      plt.plot(x,y, color='red', label=qbn, linewidth=3.5)
    else:
      plt.plot(x,y, color=colorsdata[colindex], label=qbn, linewidth=2.5)
      plt.legend(loc=0, prop={'size':10}) 

    colindex = (colindex+1)%22
    plt.text(lastage-1, prevysum+2, qbn+"("+str(prevysum)+"):" +str(lastage), fontsize=9)

  else:
    if ( lastage == 44):
      plt.plot(x,y, color='red', label=qbn, linewidth=3.5)
      plt.text(lastage-1, prevysum+2, qbn+"("+str(prevysum)+"):" +str(lastage), fontsize=9)
    else:         
      plt.plot(x,y, color=colorsdata[22], linewidth=1.5) 
    rank = rank +1 

plt.xlabel('Age', fontsize=18)
plt.ylabel('Number of Touch Downs', fontsize=18)
plt.title("Touch Downs by Quarter Backs by Age", fontsize=20)
plt.show()

[Figure: Touch Downs by Quarter Backs by Age, with Age on the x axis and number of touchdowns on the y axis]

When you look at the plotting results, only two quarterbacks, Drew Brees and Tom Brady, have results comparable to Peyton Manning's at the age of 35. However, given the current age of Tom Brady and his accomplishments so far, it appears that only Drew Brees has a realistic chance of surpassing Peyton Manning's touchdown records.
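
The comparison at a fixed age can also be made numerically. The following sketch sums each quarterback's touchdowns up to and including a given age; the helper name touchdowns_by_age and the sample rows are hypothetical, and the numbers are placeholders rather than real statistics.

```python
import csv
import io
from collections import defaultdict

# Placeholder data for illustration only (not real statistics).
sample = """Name,Year,Age,TD
QB A,2010,34,30
QB A,2011,35,25
QB A,2012,36,40
QB B,2012,34,28
QB B,2013,35,33
"""

def touchdowns_by_age(csvfile, age_limit):
    """Sum TD per quarterback over seasons played at or below age_limit."""
    totals = defaultdict(int)
    for row in csv.DictReader(csvfile):
        try:
            age = int(row['Age'])
            td = int(row['TD'])
        except ValueError:
            continue   # skip summary rows or blank fields
        if age <= age_limit:
            totals[row['Name']] += td
    return dict(totals)

at_35 = touchdowns_by_age(io.StringIO(sample), 35)
print(at_35)
```

A quarterback with no seasons at or below the given age is simply absent from the result.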

This conclusion is shown in the following image, a simpler plot of the data up to the age of 35. Comparing the top four quarterbacks (Peyton Manning, Tom Brady, Drew Brees, and Brett Favre), we see that Drew Brees's achievement at the age of 35 is comparable to Peyton Manning's at the same age. Although the New York Times article titled "Why Peyton Manning's record will be hard to beat" concludes differently, the following plot at least suggests that Drew Brees might beat the record:

[Figure: touchdowns by age up to 35 for Peyton Manning, Tom Brady, Drew Brees, and Brett Favre]