Recap of the levels of data

Understanding the levels of data is a prerequisite for feature engineering. When it comes time to build new features, or fix old ones, we need a way to decide how to work with each column.

Here is a quick table to summarize what is and isn't possible at every level:

| Level of Measurement | Properties | Examples | Descriptive statistics | Graphs |
|---|---|---|---|---|
| Nominal | Discrete; Orderless | Binary responses (True or False); Names of people; Colors of paint | Frequencies/Percentages; Mode | Bar; Pie |
| Ordinal | Ordered categories; Comparisons | Likert scales; Grades on an exam | Frequencies; Mode; Median; Percentiles | Bar; Pie; Stem and leaf |
| Interval | Differences between ordered values have meaning | Deg. C or F; Some Likert scales (must be specific) | Frequencies; Mode; Median; Mean; Standard deviation | Bar; Pie; Stem and leaf; Box plot; Histogram |
| Ratio | Continuous; True 0 allows ratio statements (for example, $100 is twice as much as $50) | Money; Weight | Mean; Standard deviation | Histogram; Box plot |
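To make the interval versus ratio distinction concrete, here is a minimal sketch. It assumes Kelvin as a comparison scale, which is not part of the table above; the helper name `celsius_to_kelvin` is purely illustrative. It shows that a "twice as much" statement holds for money but falls apart for degrees Celsius once the arbitrary zero point is moved.

```python
def celsius_to_kelvin(deg_c: float) -> float:
    """Convert Celsius (interval level) to Kelvin (a scale with a true zero)."""
    return deg_c + 273.15


# Ratio level: money has a true zero, so ratio statements are meaningful.
price_ratio = 100 / 50  # $100 is twice as much as $50

# Interval level: differences between values are meaningful...
temp_difference = 100 - 50  # a 50-degree gap

# ...but ratios are not. The apparent "2x" depends on where zero happens to sit.
naive_temp_ratio = 100 / 50
kelvin_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

print(price_ratio)
print(naive_temp_ratio)
print(round(kelvin_ratio, 3))
```

The last two printed values differ even though they describe the same pair of temperatures, which is why ratio statements are reserved for the ratio level.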

The following is a table showing the types of statistics allowed at each level:

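| Statistic | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Mode | ✓ | ✓ | ✓ | Sometimes |
| Median | X | ✓ | ✓ | ✓ |
| Range, Min, Max | X | ✓ | ✓ | ✓ |
| Mean | X | X | ✓ | ✓ |
| SD | X | X | ✓ | ✓ |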
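One way to apply this table in practice is to compute only the statistics that make sense for a column's level. The following is a rough sketch using Python's standard library; the function name `describe`, the `LEVELS` list, and the sample columns are all hypothetical. It assumes ordinal values have already been coded as integers in rank order, and it always reports the mode even though the table marks it as only sometimes appropriate.

```python
import statistics
from collections import Counter

# Levels in increasing order of what they permit (assumed ordering for this sketch).
LEVELS = ["nominal", "ordinal", "interval", "ratio"]


def describe(values, level):
    """Return only the descriptive statistics that are meaningful at `level`."""
    rank = LEVELS.index(level)  # raises ValueError for an unknown level
    summary = {
        "frequencies": dict(Counter(values)),
        "mode": statistics.mode(values),
    }
    if rank >= 1:  # ordinal and above: order exists
        # median_low always returns an actual data point, so it never
        # averages two ordinal categories together.
        median = statistics.median_low if level == "ordinal" else statistics.median
        summary["median"] = median(values)
        summary["min"] = min(values)
        summary["max"] = max(values)
    if rank >= 2:  # interval and above: differences are meaningful
        summary["mean"] = statistics.mean(values)
        summary["sd"] = statistics.stdev(values)  # sample standard deviation
    return summary


# Made-up sample columns, for illustration only.
paint = ["red", "blue", "red", "green"]   # nominal
likert = [1, 2, 2, 4, 5, 5, 5]            # ordinal, coded 1-5
temps = [20.0, 22.0, 24.0]                # interval, degrees Celsius

print(describe(paint, "nominal"))
print(describe(likert, "ordinal"))
print(describe(temps, "interval"))
```

Keying the checks on the position of the level in an ordered list mirrors the way each level inherits what the levels below it allow.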

 

And finally, the following is a table showing purely the graphs that are and are not possible at each level:

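| Graph | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Bar/Pie | ✓ | ✓ | Sometimes | X |
| Stem and Leaf | X | ✓ | ✓ | ✓ |
| Boxplot | X | ✓ | ✓ | ✓ |
| Histogram | X | X | ✓ or Sometimes | ✓ or Sometimes |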
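To see why a bar chart suits nominal data while a histogram needs numeric data, consider this small text-only sketch. The function names `text_bar_chart` and `text_histogram`, and the sample values, are made up for illustration; a real project would more likely use a plotting library.

```python
from collections import Counter


def text_bar_chart(categories):
    """One bar per distinct category; works even for orderless (nominal) data."""
    counts = Counter(categories)
    return {label: "#" * n for label, n in counts.items()}


def text_histogram(values, bin_width):
    """One bar per numeric bin; requires values where distance is meaningful."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return {
        f"{start}-{start + bin_width}": "#" * bins[start]
        for start in sorted(bins)
    }


colors = ["red", "blue", "red"]   # nominal: only counting is possible
weights = [12.5, 18.0, 31.0]      # ratio: values can be grouped into ranges

print(text_bar_chart(colors))
print(text_histogram(weights, 10))
```

The histogram has to divide and compare values to place them in bins, an operation that has no meaning for labels such as paint colors.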

 

Whenever you are faced with a new dataset, here is a basic workflow to follow:

  1. Is the data organized or unorganized? Does our data exist in a tabular format with distinct rows and columns, or does it exist as a mess of text in an unstructured format?
  2. Is each column quantitative or qualitative? Are the values in the cells numbers that represent quantity, or strings that do not?
  3. At what level of data is each column? Are the values at the nominal, ordinal, interval, or ratio level?
  4. What graphs can I utilize to visualize my data—bar, pie, box, histogram, and so on?
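Steps 2 and 3 of this workflow can be partly sketched in code. The example below is only a first-pass heuristic under stated assumptions: the data is already organized as a mapping from column name to a list of cell values, and the names `is_quantitative` and `first_pass` are hypothetical. Treating numeric columns as interval-or-ratio candidates and string columns as nominal-or-ordinal candidates is also an assumption; numeric codes such as ZIP codes are a known exception, so the final call on the level still needs human judgment.

```python
from numbers import Number


def is_quantitative(values):
    """Step 2: are the cells numbers (booleans excluded) rather than strings?"""
    return bool(values) and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )


def first_pass(table):
    """Steps 2 and 3: narrow down the candidate levels for each column."""
    report = {}
    for name, values in table.items():
        if is_quantitative(values):
            report[name] = "quantitative: decide between interval and ratio"
        else:
            report[name] = "qualitative: decide between nominal and ordinal"
    return report


# Made-up organized (tabular) data, for illustration only.
table = {
    "city": ["Austin", "Boston"],
    "salary": [50000, 62000.5],
}

for column, verdict in first_pass(table).items():
    print(column, "->", verdict)
```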

Here is a visualization of this flow:
