Understanding the levels of data is a prerequisite for feature engineering. When the time comes to build new features or fix old ones, we need a way to decide how to work with each column.
Here is a quick table to summarize what is and isn't possible at every level:
| Level of Measurement | Properties | Examples | Descriptive statistics | Graphs |
| --- | --- | --- | --- | --- |
| Nominal | Discrete, orderless | Binary responses (True or False), names of people, colors of paint | Frequencies/percentages | Bar, pie |
| Ordinal | Ordered categories, comparisons | Likert scales, grades on an exam | Frequencies, mode, median, percentiles | Bar, pie, stem and leaf |
| Interval | Differences between ordered values have meaning | Deg. C or F, some Likert scales (must be specific) | Frequencies, mode, median, mean, standard deviation | Bar, box plot, histogram |
| Ratio | Continuous, true 0 allows ratio statements | Money, weight | Mean, standard deviation | Histogram, box plot |
The following is a table showing the types of statistics allowed at each level:
| Statistic | Nominal | Ordinal | Interval | Ratio |
| --- | --- | --- | --- | --- |
| Mode | √ | √ | √ | Sometimes |
| Median | X | √ | √ | √ |
| Range, min, max | X | √ | √ | √ |
| Mean | X | X | √ | √ |
| SD | X | X | √ | √ |
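As a rough illustration, the statistics table can be encoded as a lookup so that only the permitted statistics are ever computed for a column. This is a minimal sketch, not a definitive implementation: the names `ALLOWED_STATS` and `describe` are hypothetical, the "Sometimes" entry for the mode at the ratio level is treated as allowed, and the median is taken with `statistics.median_low` so that it is always an actual data value.

```python
import statistics

# Transcribed from the table above. Assumption: "Sometimes" (mode at the
# ratio level) is treated as allowed in this sketch.
ALLOWED_STATS = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median", "range"},
    "interval": {"mode", "median", "range", "mean", "sd"},
    "ratio": {"mode", "median", "range", "mean", "sd"},
}


def describe(values, level):
    """Return only the statistics the table permits for the given level."""
    allowed = ALLOWED_STATS[level]
    result = {}
    if "mode" in allowed:
        result["mode"] = statistics.mode(values)
    if "median" in allowed:
        # median_low returns a value that is present in the data
        result["median"] = statistics.median_low(values)
    if "range" in allowed:
        # reported as the two endpoints rather than their difference
        result["min"], result["max"] = min(values), max(values)
    if "mean" in allowed:
        result["mean"] = statistics.mean(values)
    if "sd" in allowed:
        result["sd"] = statistics.stdev(values)  # sample standard deviation
    return result


# Made-up example columns, one per level
paint_colors = ["red", "blue", "red", "green"]  # nominal
likert = [1, 2, 2, 4, 5]                        # ordinal
temps_c = [20.0, 22.0, 22.0, 26.0]              # interval

print(describe(paint_colors, "nominal"))
print(describe(likert, "ordinal"))
print(describe(temps_c, "interval"))
```

The level has to be supplied by the analyst. Nothing in the values themselves says whether `likert` is ordinal or interval, which is why the level is an explicit argument.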
And finally, the following table shows which graphs are and are not possible at each level:
| Graph | Nominal | Ordinal | Interval | Ratio |
| --- | --- | --- | --- | --- |
| Bar/Pie | √ | √ | Sometimes | X |
| Stem and Leaf | X | √ | √ | √ |
| Boxplot | X | √ | √ | √ |
| Histogram | X | X | Sometimes | √ |
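The graph table can be kept as a small lookup in the same spirit. The sketch below is only illustrative: `GRAPHS_BY_LEVEL` and `graphs_for` are hypothetical names, and representing "Sometimes" as `None` (left to the analyst's judgment) is an assumption of this example.

```python
# Transcribed from the table above.
# True = allowed, False = not allowed, None = "Sometimes".
GRAPHS_BY_LEVEL = {
    "bar/pie": {"nominal": True, "ordinal": True, "interval": None, "ratio": False},
    "stem and leaf": {"nominal": False, "ordinal": True, "interval": True, "ratio": True},
    "boxplot": {"nominal": False, "ordinal": True, "interval": True, "ratio": True},
    "histogram": {"nominal": False, "ordinal": False, "interval": None, "ratio": True},
}


def graphs_for(level, include_sometimes=False):
    """List the graph types the table permits for a given level."""
    chosen = []
    for graph, by_level in GRAPHS_BY_LEVEL.items():
        verdict = by_level[level]
        if verdict is True or (verdict is None and include_sometimes):
            chosen.append(graph)
    return chosen


print(graphs_for("nominal"))
print(graphs_for("interval"))
print(graphs_for("interval", include_sometimes=True))
```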
Whenever you are faced with a new dataset, here is a basic workflow to follow:
- Is the data organized or unorganized? Does our data exist in a tabular format with distinct rows and columns, or does it exist as a mess of text in an unstructured format?
- Is each column quantitative or qualitative? Are the values in the cells numbers that represent quantity, or strings that do not?
- At what level of data is each column? Are the values at the nominal, ordinal, interval, or ratio level?
- Which graphs can we use to visualize the data: bar, pie, box plot, histogram, and so on?
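The second question in this checklist lends itself to a quick first pass in code. The following is a minimal sketch under stated assumptions: the data is already tabular (a list of dict rows), the helper names `is_quantitative` and `survey_columns` are hypothetical, and the sample rows are made up. A numeric type is only a hint. Numeric codes such as zip codes would pass this check without representing a quantity, so the result still needs a human review before assigning a level.

```python
def is_quantitative(values):
    """Treat a column as quantitative when every value is an int or float.

    Booleans are excluded because they are binary (nominal) responses.
    """
    return all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )


def survey_columns(rows):
    """Map each column name to 'quantitative' or 'qualitative'."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {
        name: "quantitative" if is_quantitative(values) else "qualitative"
        for name, values in columns.items()
    }


# Made-up rows for illustration only
rows = [
    {"name": "Ann", "grade": "B", "salary": 52000.0, "remote": True},
    {"name": "Raj", "grade": "A", "salary": 61000.0, "remote": False},
]

print(survey_columns(rows))
```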
Here is a visualization of this flow: