Chapter 2: Data

2.1 Getting Data into JMP

2.2 The JMP Data Table

2.3 Data and Modeling Types

2.4 Cleaning and Formatting Data

2.5 Selecting, Highlighting, and Filtering Data: Row States

2.6 Adding Visual Dimension to Your Data

2.7 Shape Files and Background Maps

2.8 The Tables Menu

2.9 Summary

The first step in creating a graph or analysis is to get your data into JMP. With JMP, you can easily import data from many different sources such as Microsoft Excel or ODBC-compliant databases, or you can enter your data directly into a JMP data table. Because most readers already have data in one form or another, this section focuses on getting that data into JMP from another file format. Sometimes data is not in the best condition when you import it. Later in this chapter, we discuss what you can do to format data or deal with missing data. JMP also supports unstructured text and shape files (that can be used to create maps). We will describe the special requirements of using these file types.

As mentioned in the previous chapter, we use Windows as our default operating system to illustrate JMP and its native menus as shown in the below image. JMP instructions for Windows and Macintosh are basically the same, though some operating system differences are noted when they occur.

Figure 1.1 Some JMP Help Options

Example 2.1 Big Class Families

We will be using the Big Class Families.jmp data file to illustrate the steps in this section. This data set consists of 40 middle-school students and their image, name, height, weight, gender, age, and other miscellaneous information. You can access this data set in the Sample Data folder that is installed with JMP: File Open C: Program Files SAS JMP 15 Samples Data Big Class Families.jmp. Alternatively, you can select Help Sample Data Library Big Class Families.jmp.

2.1 Getting Data into JMP

Getting your data into JMP is a familiar process. Like many other desktop applications, you can simply select File Open to import your data into JMP. JMP can handle many different data formats. Table 2.1 shows the default formats JMP recognizes. Other previously installed applications could contain proprietary formats that might also appear as import options. You can import files with these formats as well.

In this section, we show you how to open JMP data tables and how to import Microsoft Excel spreadsheets and text files in JMP. Each of these file formats follows the same basic procedure, but each has special options that enable you to import exactly what you want. JMP interfaces with databases using Open DataBase Connectivity standard (ODBC). Through the Query Builder dialog box, you can easily set up queries of your data and automatically create SQL code that can be saved and repurposed. And you don’t need to learn SQL to do this. We illustrate only the essential connectivity here; more information about querying your data is available in the JMP documentation (Help Search SQL). At the end of this section, we show you how to create a new data table in JMP.

Table 2.1 Default File Formats Supported by JMP

File Type

File Extension

JMP Files

.jmp, .jsl, .jrn, .jrp, .jmpprj, .jmpmenu, .jmpaddin, .jmpapp, .jmpquery

Excel Files

.xls, .xlsx, .xlsm

Text Files

.txt, .csv, .dat, .tsv

SAS Data Sets

.sas7bdat, .sas7bxat, .xpt, .stx

SAS Program files

.sas

R Code

.r, .R

MATLAB Code

.m, .M

SPSS Data Files

.sav

Minitab Worksheet Files

.mtp

Shapefiles

.shp

HTML

.htm, .html

XML Data files

.xml

JSON

.json

SQLite 3.0 or Higher

.sqlite, .db, .sqlite3, .db3

Teradata Database

.trd

xBase Data Files

.dbf

Triple-S Files

.sss, .xml

Flow Cytometry v2 or 3

.fcs

Hierarchical Data Format v5

.h5

Note

JMP can be used to initialize or pull data from other third-party applications: Python, ODBC, R, MATLAB, and so on when such integrations are implemented in JMP. The third-party application must match in architecture. The term architecture is sometimes referred to as “bitness” (32-bit or 64-bit). One can verify the architecture of JMP via Help About JMP (Windows).

Opening a JMP File

Let’s start by opening a JMP data table. At the top left of the JMP window is the File menu:

1. Select File Open (see Figure 2.1). A dialog box opens. (On a Mac, select File Open and locate your file in the appropriate folder.)

Figure 2.1 Opening a File

Figure 1.1 Some JMP Help Options

We will use the Big Class Families data table described earlier. Click the Big Class Families.jmp file and select Open. (See Figure 2.2.)

Figure 2.2 Open File Dialog Box

Figure 1.1 Some JMP Help Options

Note

To locate this file for the first time, select File Open C: Program Files SAS JMP 15 Samples Data Big Class Families.jmp. Alternatively, you can also select Help Sample Data Open the Sample Data Directory Big Class Families.jmp.

These steps open the JMP data table Big Class Families. (See Figure 2.3.) With these simple steps, you are now ready to analyze or visualize this data.

Figure 2.3 The JMP Data Table

Figure 1.1 Some JMP Help Options

This spreadsheet-like table is referred to as the JMP data table, which is JMP’s common data format regardless of where the data comes from. Section 2.2 discusses the components of the data table.

Note

In the Big Class Families example, each row of the table contains information about a single student in the class and is, therefore, one observation. Each column of the table is one piece of information (or variable) collected on each student. This structured format of the data table is required for most graphs and analyses in JMP. The importing examples that follow assume that your data already exists in this format. Section 2.4 introduces some tools to use if your data does not conform to this structure.

Importing Data into JMP

Importing data into JMP from another file format is similar to opening a JMP file. Within the File Open pop-up window, the Files of Type drop-down menu indicates All JMP Files as the default.

If you are importing another file type, simply click on the down arrow and select the correct type. You can also select All Files from the drop-down menu. (See Figure 2.4.) Select the file that you want, and then click Open. On the Mac, select File Open and available files will be highlighted. On the Mac, files that JMP cannot open will be dimmed.

Figure 2.4 Selecting All File Types

Figure 1.1 Some JMP Help Options

Note

If you know the format of your data, first select the correct format from the Files of Type drop-down menu. You will see the available files of that type within the folder. Once you have located the right file, select the file and click Open.

Importing an Excel File

Importing an Excel file is easy if your variables are in columns and your cases or observations are in rows. Ideally, any variable names should appear in the row directly above the first row of data, as shown in Figure 2.5. The import process automatically opens and converts the data into a JMP data table and uses your variable names as column headings:

Figure 2.5 Importing an Excel File

Figure 1.1 Some JMP Help Options

1. Select File Open.

The Big Class.xls file, which is illustrated here, can be found by selecting C: Program Files SAS JMP 15 Samples Import Data Big Class.xls.

2. From the Files of Type drop-down menu, select Excel Files.

3. Select the file that you want, then select Open to launch the Excel Import Wizard dialog box with a view of your data. If it looks correctly structured, select Import.

Shortcut

If you have an Excel worksheet or workbook on your desktop, you can simply drag the file over the JMP shortcut icon on your desktop to launch JMP and the Excel import wizard.

The Excel Import Wizard

While the previous example was simple and straightforward, a common characteristic of Excel worksheets is that data does not always conform to the essential column/row structure that is required by JMP. For example, you might have multiple nested headers where one row might represent year and the next row contains months within that year. The Excel Import Wizard has made importing this worksheet and maintaining the month-within-year structure much easier.

This wizard also provides options to specify which rows should be headers for columns, to specify hidden or merged columns, and to replicate these settings or merge data from multiple worksheets within a workbook.

To illustrate this feature, let’s try another example with one of these characteristics. We will use the Team Results.xlsx worksheet from the Import Data folder. This worksheet has headers/column names that appear to be in the second row. (See Figure 2.6.) The Excel Import Wizard will help JMP decide how to import this data.

Figure 2.6 The Excel Import Wizard

Figure 1.1 Some JMP Help Options

1. Select File Open, then select an Excel workbook. In this example, we are using Team Results.xlsx from the C: Program Files SAS JMP 15 Samples Import Data folder.

2. Select Team Results.xls Open to launch the Excel Import Wizard with an initial display of your data in the window (Figure 2.6).

3. As you can see, the column headers begin in row 3 (when you include the note that appears as a header in the preview) and the first set of observations in row 4. To get this into the right format, adjust the Individual Worksheet Settings as we have done in Figure 2.7.

Figure 2.7 Adjusting the Worksheet Settings

Figure 1.1 Some JMP Help Options

4. Once you have adjusted these settings, you should see your data take its proper shape in the Data Preview panel. Once you are satisfied with the adjustments you have made, select Import.

Importing a Text File

If you are importing a text file, another handy wizard is included in the Data with Preview file option. Like the Excel Import Wizard, this wizard enables you to view your data and specify how you want it to appear before importing it into a JMP data table. It also provides options to convert your text file if it is delimited by commas, tabs, or spaces:

1. Select File Open.

The Big Class_L.txt file, illustrated here, can be found by selecting C: Program Files SAS JMP 15 Samples Import Data Big Class_L.txt.

2. Select Text Files (*.txt, *.csv, *.dat, *.tsv) in the Files of Type drop-down menu. Select the file.

3. Select Data (Using Preview) (Figure 2.8).

Figure 2.8 Data Using Preview

Figure 1.1 Some JMP Help Options

4. Select Open.

5. Choose the settings that you may need to structure your data for import. (See Figure 2.9.) Click Next and then Import.

Figure 2.9 Text Data Preview

Figure 1.1 Some JMP Help Options

Importing a Database File

Options to import data extracted from a database are available through ODBC within JMP. To access this data, first connect to the database (the data source should already be defined) and then specify the table of interest. You can also create custom queries of your data using Query Builder (explained in the next section) or the Advanced button. If you need more help defining your data source, select Help JMP Documentation Library Using JMP Ch. 3 Import Data from a Database.

1. Select File Database Open Table (Figure 2.10).

Figure 2.10 Database Open Menu

Figure 1.1 Some JMP Help Options

2. The Database Open Table window appears. (See Figure 2.11.) It prompts you to connect to your database and either open a data table or specify a query. Clicking the Connect button launches the Select Data Source window to locate and connect to your database.

Figure 2.11 Database Open Dialog Box

Figure 1.1 Some JMP Help Options

3. Locate the table of interest, highlight it, and click Open Table to import the data.

Note

Note: You have the option of directly importing database files by selecting File Open as previously discussed (assuming that these programs are installed). Using this more direct option enables you to import only a single table.

Creating a Database Query Using Query Builder

While importing a database file is a quick way to bring in a single table, there are times when you want to import from more than one file at a time. A database query enables you to combine data tables, aggregate data, sort data, and filter data. Once the query is created, it can be saved and reused for future queries.

In other software environments, you would need to know SQL (Structured Query Language) to create such queries. However, the Query Builder in JMP enables you to create a query in an easy, point-and-click environment that creates the SQL code for you. As with importing a single database file, you must first connect to the database. The data source should already be defined. Once the connection is established, you can choose the data tables of interest and specify the parts of the data tables that you want to extract into your resulting JMP table.

1. Select File Database Query Builder.

2. Choose New Connection and connect to your database (this example is for illustration purposes only since you will be unable to connect to this database).

For this example, you want to create a data table with information about historical SAT scores for states in the South and Southwestern regions. You would like to include the following information in your data table:

● From the data table SATByYear: the average SAT Math and SAT Verbal scores for each state.

● From the data table SATStateInfo: the Student/Faculty ratio for each state.

● From the data table StateToRegion: the Region each state is assigned to.

1. Select the SATByYear data table and assign it as the Primary table for the query.

2. Select SATStateInfo and StateToRegion data tables and assign them as the Secondary tables for the query. (See Figure 2.12.) Note that the column State appears in each data table and is designated as a “key” variable. Key variables are automatically used as the matching column to join data table in the query.

Figure 2.12 Database Query

Figure 1.1 Some JMP Help Options

3. Select Build Query.

4. In the Tables panel, select the SATByYear data table.

5. Choose the State, SAT Verbal, and SAT Math from the Available Columns panel and click the Add button. (See Figure 2.13.) (Because these are small data tables, you might want to turn on the Update preview automatically option. This option can slow performance down if the data tables are very large.)

Figure 2.13 Database Query Building

Figure 1.1 Some JMP Help Options

6. Since you are interested in the average SAT scores for each state, change the Aggregation for the SAT Math and SAT Verbal columns to Average. Note that Group By was automatically turned on for State. Examine the sample shown in the lower panel. You have one row for each state. (See Figure 2.14.)

Figure 2.14 Aggregating Data

Figure 1.1 Some JMP Help Options

7. Select the SATStateInfo data table in the Tables panel. Add the Student/Faculty Ratio column to the included columns.

8. Select the StateToRegion data table in the Tables panel. Add the Region column to the included columns. (See Figure 2.15.)

Figure 2.15 Defining Included Columns

Figure 1.1 Some JMP Help Options

9. To include only the states in the South and Southwestern regions, select Region in the Included Columns list and drag it to the Filter pane on the right side of the window.

10. Control-click to select both South and Southwest in the Filters list.

11. Click Save to save the query as a .jmpquery file for use later.

12. Click the Run Query button to create the new data table. (See Figure 2.16.)

Figure 2.16 Resulting Data Table

Figure 1.1 Some JMP Help Options

Creating a JMP Data Table from Scratch

1. Select File New Data Table to create a new data table (Figure 2.17):

Figure 2.17 Creating a New Data Table

Figure 1.1 Some JMP Help Options

2. Double-click on the first column’s heading and type the column name (the variable name).

3. Press Enter and type the data into the first cell directly below the heading. Press Enter again, type the data, and repeat as needed. Rows within JMP are consecutively numbered as observations or cases. (See Figure 2.18.)

Figure 2.18 Add a New Column

Figure 1.1 Some JMP Help Options

4. To create another column, double-click on the next column’s heading and enter the data as you did before.

If it is more practical for you to enter a series of data for each row as you build your data table, set up all of your column headings first and then use the Tab key to move from the left columns to the right. When each column has been filled, the Tab key moves down to the beginning of the next row.

Note

JMP will recognize the type of data you are entering and assign a data type to the column, either numeric or character. It also assigns an icon next to the columns (or variables) in the box on the left. These icons are the modeling type and are discussed in Section 2.3.

2.2 The JMP Data Table

The JMP Data Table looks very much like any spreadsheet. (See Figure 2.19.) In JMP, column headings indicate variables (what you have measured or counted), and rows indicate individual cases or observations. JMP requires your data to be structured in this way. If it is not, JMP can help you reformat your data. (See Section 2.4.)

Figure 2.19 The JMP Data Table

Figure 1.1 Some JMP Help Options

Data Table refers to the spreadsheet-like grid where your data resides.

The data grid can contain any number of columns (your variables) or rows (observations or cases). In this sense, we refer to data within the JMP data table as structured data.

In addition to the data grid, notice the three panels to the left of the data table. These panels contain information about your data (metadata). They provide vital information about your data as well as options to streamline and save your analyses.

The first and upper-most panel contains the name of the data table. (See Figure 2.20.) This panel stores references, notes, and/or scripts. Scripts enable you to save, automate, and customize analyses. If you perform a regular analysis or scheduled task, you will want to learn more about JMP scripts. (See the JMP Scripting Guide at Help JMP Documentation Library JMP Scripting Guide.)

Figure 2.20 The Table Panel

Figure 1.1 Some JMP Help Options

The Columns panel (see Figure 2.21) is where your column names (or variables) appear. Each column has an icon in front of it.

Figure 2.21 The Columns Panel

Figure 1.1 Some JMP Help Options

These icons correspond to the modeling type of the data in each column. As discussed in the next section, this is vitally important. JMP produces only the graphs or statistics that are appropriate for a column’s modeling type. In most cases, you can change the modeling type by simply clicking on the icon and selecting another appropriate type. The first number in parentheses represents the total number of column names or variables in the data table. The second number represents any of those columns that are currently selected or highlighted.

The bottom panel is the Rows panel. (See Figure 2.22.) The Rows panel indicates how many rows (observations) are in your data table. This panel also indicates the number of selected, hidden, or excluded rows, if any.

Figure 2.22 The Rows Panel

Figure 1.1 Some JMP Help Options

When rows are hidden, the observations are not included in graphs. When rows are excluded, they are not included in analyses. This row state is effective when you want to see or analyze a subset of your data. You can also both hide and exclude specific rows, which effectively removes the row(s) from your analyses and graphs, but not from your data table. Section 2.5 provides more information about row states including hiding and excluding rows.

Note

Multiple data tables can be open at any time, but only one active data table can be analyzed at a time. If you have multiple data tables open within JMP and you want to switch to another open data table, go to the Home Window (see Figure 2.23) and select the desired data table under Windows List.

Figure 2.23 The Home Window

Figure 1.1 Some JMP Help Options

A special type of data table is shape files, which are used to create thematic maps. These data tables consist of two tables including a “Name” and corresponding “Boundary” table. These are stored in the Maps folder: C: Program Files SAS JMP 15 Maps. Section 2.7 covers some of the basics about shape files.

Note

There is no practical limit on the size of the data table that you can analyze. However, because JMP runs in your computer’s local memory, the amount of RAM that you have determines the upper size limit of your data table. Your computer should be equipped with at least twice as much memory as the size of the data table. Thus, if you have just 8 GB of RAM, you can analyze a 4GB data table or about a 10-variable data set with 4 million rows! More details about JMP system requirements can be found at www.jmp.com.

2.3 Data and Modeling Types

One of JMP’s great features is the ability to produce graphs and statistics that make sense for the data that you are analyzing. This feature assumes that your data is correctly classified in the data table. So, what do we mean by data type and modeling type? Let’s define a few terms.

Data

refers to any values placed within a cell of a JMP data grid. Examples include numeric and/or text descriptions: 3.6, $2500, Female, Somewhat Likely, or 7/7/19.

Data type

refers to the nature of the data. The data type is usually either numeric (numbers) or character (often words and letters but sometimes also numbers). Other special purpose data types include expression (used for images and matrices) and row state.

Modeling type

refers to how the data within a column should be used in an analysis or a graph. JMP uses three distinct and primary modeling types: continuous, nominal, and ordinal. (JMP also includes three additional special purpose modeling types: Multiple response, Unstructured text, and None. Since these are less common, we will only summarize them in the note at the end of this section.)

Continuous data (also referred to as quantitative, ratio, or interval scale data) takes a numeric form and is often thought of as some type of measurement. For example, home selling prices, income earned, costs per square foot, and dates are all examples of continuous data. As a rule of thumb, continuous data can be used in calculations. For example, calculating the average cost per square foot would be meaningful.

Nominal data is categorical data (also referred to as qualitative, discrete, count, or attribute data) and can take on either a character or numeric form. Nominal data fits into categories or groups such as car type, gender, department, and sales territory and also includes indicator variables such as yes/no or 0/1. In nominal data, it is helpful to count the frequency of the occurrence of values, but otherwise, nominal data is not used in calculations. For example, calculating the average car type would not be meaningful.

Ordinal data is categorical data that has an inherent order or hierarchy. For instance, Likert scales (such as levels of satisfaction) in a survey and grade levels in school (freshman, sophomore, junior, senior) are examples of ordinal data. That is, they represent categories that have some sequence or order that should be retained in any analysis. Ordinal data is less common than continuous and nominal data, but there are a few analyses designed specifically for it. In most JMP analyses, nominal and ordinal data are treated the same way.

Table 2.2 Data Types Appropriate for Modeling Types

Modeling Type

Data Type

Numeric

Character

Continuous

Yes

No

Nominal

Yes

Yes

Ordinal

Yes

Yes

Note

Numeric data is right-justified in the data table, whereas character data is left-justified. This can be useful to check whether data contains errors.

In our example, Big Class contains five variables (or columns) representing each of these modeling types. (See Figure 2.24.) Let’s briefly explain why they are classified by their data and modeling types:

Name is nominal because it is a character data type and the student’s name is arbitrary.

Age is ordinal because the values are rounded down and we want to retain the six ordered age groups (12 to 17) in our analysis.

Sex is nominal because its data type is character (M or F) and it has no order.

Height and weight are continuous because they are both numeric and represent a measurement.

Note

Age could also be considered continuous because the values are numeric, but this would treat age differently and yield different results.

Figure 2.24 Understanding Modeling Types in the Data Table

Figure 1.1 Some JMP Help Options

Note

Let’s briefly review some of the more specialized data and modeling types and what they are used for.

Row State is a data type that enables you to store and manage information about a row of data. (See Section 2.5.)

Expression is a data type that enables you to store images or matrices in a column.

Multiple Response is a modeling type that is commonly used in surveys where one may be asked for more than one answer.

Unstructured Text is a modeling type that is used for documents such as customer reviews, wine tasting notes, or an entire book. This modeling type is used in text mining with JMP’s Text Explorer.

None is a modeling type that tells JMP not to use that column in an analysis. This might be used for a column that represents an identifier for each row of data, for example, a student ID number, or patient number.

For more information, select Help JMP Documentation Library Using JMP Ch. 5, About Modeling Types.

Changing the Modeling Type

When you import data, the JMP default selects and assigns one of two modeling types based on whether the data is numeric or character. Numeric data becomes continuous and character data becomes nominal. Sometimes you might want to change the default modeling type of your data to generate results that are more meaningful.

For example, if we imported the Big Class data from Excel, age as numeric data would be imported as a continuous column. We might want to change that to ordinal. Changing the modeling type is simple in JMP. Click the column’s corresponding icon in the Columns panel in the data table and select the correct type. (See Figure 2.25.)

Figure 2.25 Changing the Modeling Type

Figure 1.1 Some JMP Help Options

If the Continuous option is grayed out, your data type is classified as character. To change the data type, double-click on the column heading and change the data type to numeric. (See Figure 2.26.) In this window, you can also change the modeling type along with a host of other formatting options, which are described in the next section.

Figure 2.26 Changing the Data Type

Figure 1.1 Some JMP Help Options

For more information, select Help Books Using JMP Chapter 5, Set Column Properties About Data and Modeling Types.

2.4 Cleaning and Formatting Data

Sometimes data is not in the best shape or in the right form when it is imported. Fortunately, JMP has extensive column formatting abilities. This section focuses on the most common features, including:

● Cleaning up your data format, such as decimal places, dates, times, and currency. We will use the Column Info window to accomplish these tasks.

● Introducing the Formula Editor, which enables you to create new columns from old ones, add IF statements, and transform data using basic or more advanced functions. We will introduce a basic example in this section. For more information, select Help JMP Documentation Library.

● Learning to use the RECODE command, which is a handy way to merge similar categorical responses into a single category. For example, if you have Woman, Female, and Girl as responses, you can merge these into a single response: Female.

Example 2.2 Movie Rentals

We will use the Movies.jmp data table to illustrate the concepts in this section. This data table consists of the 277 top-grossing movies released between 1937 and 2003. The columns are:

Movie: name of movie

Type: genre/category of movie (for example, comedy, family)

Rating: US movie rating system (for example, general audience [G], adult [R])

Year: year of movie release (for example, 1937)

Domestic $: US domestic revenue in $ earned by the movie in that year

Worldwide $: Worldwide revenue in $ earned by the movie in that year

Director: director of movie

You can access this data table in the Sample Data folder that is installed with JMP by selecting File Open C: Program Files SAS JMP 15 Samples Data Movies.

Getting your data into a standard format is done through the Column Info window, which is accessed from the Cols menu. Options to format your data are driven by the data and modeling types specified for that column of data. You can change these types, if necessary, to meet the requirements of your analysis. Recall that changing these types affects the graphs or statistics that you can generate from that column. (See the previous section.) Let’s begin by opening the Movies.jmp data table:

1. Open the Movies.jmp data table.

2. Select the Domestic $ column, and then select Cols Column Info.

3. Because Domestic$ is a numeric value, you see the Format drop-down menu (see Figure 2.27), which leads to several options. It is also our starting point for the next items we will discuss. Note that if you select a Character column, the Format menu does not appear in the Column Info window.

Figure 2.27 Column Info Format Menu – Continuous Variables

Figure 1.1 Some JMP Help Options

Note

You can also either double-click on the column name as mentioned in the previous section, or right-click on the column header and select Column Info from the menu.

Formatting Decimal Places

To change the number of decimal places displayed in a column of data, do the following:

1. Click on the column of interest. In our example, it is Domestic$.

2. Select Cols Column Info. JMP will make a best guess on the format of the data; in our example, Currency was correctly specified. (See Figure 2.27.) You can easily change this format by selecting another format from the menu.

3. To the right of the Format menu are two boxes, Width and Dec. Width refers to the number of characters that can be in the column, and Dec refers to the number of decimals right of the point. In our example, type “0” in the Dec box, then select Apply and OK. (See Figure 2.28.)

Figure 2.28 Formatting Decimal Places

Figure 1.1 Some JMP Help Options

Note

Formatting decimal places affects what is viewed in the data table and in output but does not affect the precision of the data when used in calculations.

Formatting Dates, Times, and Durations

Dates are numeric values in JMP, which allows them to be transformed into other date formats and calculated for duration or elapsed time. If you are importing data that contains dates, ensure that the data type is numeric.

The Column Info (Cols Column Info) window provides several date format options. (See Figure 2.29) When a date is selected from the Format menu, a secondary drop-down menu for the display format appears, along with a similar drop-down menu for the input format of your imported data. The format of your imported data needs to match one of JMP’s input format options, which can then be transformed into any format among the display format options.

Figure 2.29 Column Info Window for a Date Column

Figure 1.1 Some JMP Help Options

Let’s walk through a new example, TechStock, to illustrate this concept.

Example 2.3 TechStock

We will use the TechStock data table to illustrate dates in this section. This data set contains the stock price of the NASDAQ 100 (QQQ) at the high, low, and close for each trading day during the period 11/27/2000 to 2/26/2001. You can access this data set in the Sample Data folder that is installed with JMP: File Open C: Program Files SAS JMP 15 Samples Data Techstock.jmp.

1. Open the TechStock.jmp data table.

2. Click the Date column name.

3. Select Cols Column Info. Open the Format drop-down menu and select Date, which displays how the dates will appear in the data table.

4. It is currently displayed as d/m/y (see Figure 2.26), as indicated by the check mark. Change the format to Monddyyyy. Click Apply or OK. The date is now displayed as abbreviated month, day, and year in the Date column. (See lower part of Figure 2.30.)

Figure 2.30 Changing the Display Format of Dates

Figure 1.1 Some JMP Help Options

You can also format time and duration from this window provided you have these types of columns.

Column Properties Menu

Column Properties, another useful tool in the Column Info window (see Figure 2.31), enables you to add formulas, check ranges of values for auditing, and assign customized ordering to the data, among other tasks.

Figure 2.31 Column Properties

Figure 1.1 Some JMP Help Options

Some of the more commonly used column properties are:

Missing Value Codes: used to identify values placed in columns that represent unknown (or missing) data. By default, JMP uses a blank cell for missing character data and a for missing cells in numeric columns. In cases where the original data used a different missing identifier (such as 999 or unknown), these values can be entered in the missing value codes property so that JMP recognizes the values as missing.

Value Labels: used when values are entered into the data table rather than the actual label. For example, an analyst might code Low, Medium, High as 1, 2, 3 for ease of data entry. However, in analysis results, the labels are preferred. Using this property will enable you to enter the data as a code, but provide the labels in the appropriate display.

Value Order: used to specify the order in which column values should be sorted. For example, using the order Poor, Good, Better, Best rather than the alphabetical ordering of these data values. An example is provided later in this section.

Axis: allows the specification of axis properties (such as formatting, minimum value, maximum value, increment, tick marks, and grid marks) to be used for all graphs of the column.

These functions are described in detail in the Using JMP book (Help JMP Documentation Library Using JMP Ch. 5 Set Column Properties).

Formula Editor

JMP’s formula editor is handy and flexible. Use it when you need to create a new column that contains values that are calculated or derived from existing columns in your data table. You can also transform your data, add conditional statements, and much more. Due to the advanced nature of these features, we will cover only the most basic features here. For more information, see Using JMP at Help JMP Documentation Library Using JMP Ch. 5 Assign Column Properties.

One of the common operations performed with the Formula Editor is creating a new column of data that contains a calculation from existing columns. To illustrate this feature, let’s return to our Movies.jmp data table. For example, suppose we want to obtain the international revenues from these movies by subtracting the domestic revenues (Domestic $) from the worldwide revenues (Worldwide $).

1. First, we need to create a new column. Double-click in the column head to the right of our last populated column (Director). (See Figure 2.32.) Type “International $” in the heading, and press Enter. Click in the column head to highlight.

Figure 2.32 Creating a New Column

Figure 1.1 Some JMP Help Options

2. With the new column selected, select Cols Column Info. The Formula window appears in the Columns Properties menu. (See Figure 2.33.) Select Formula and Edit Formula. You see a list of columns or variables on the left side of the window.

Figure 2.33 Opening the Formula Editor

Figure 1.1 Some JMP Help Options

3. Click the Worldwide $ column, then the “-” symbol in the upper palette, and then the Domestic $ column. You see your formula take shape in the preview window. (See Figure 2.34).

Figure 2.34 Creating a Formula

Figure 1.1 Some JMP Help Options

4. When you click OK in the formula editor window, the calculated values appear in a new column in your data table.

5. Click OK in the Column Info window and you will see your new column, International $, within the Columns panel with a “+” next to it indicating that the column was created with a formula.

Value Order

The value order column property enables you to specify an order to the values of a categorical column. JMP’s built-in defaults order common ordinal columns such as months of the year or days of the week, but there are other instances when you would like to arrange responses (values) in some logical order for graphs and analyses. For example, some surveys have a range of responses from “Not Satisfied” to “Very Satisfied,” with a few intermediate responses in between. For specific examples of the built-in ordering, see the discussion of Column Properties in Using JMP.

In the Movies.jmp data table, we want to reorder the rating of the movies to display in this order: R, PG-13, PG, and G.

1. Click the Rating column. Select Cols Column Info, and then select Column Properties.

2. Select Value Ordering. (See Figure 2.35.) A new window appears with the available responses from the Rating column. Select a response and move it up or down or reverse the order, whichever is appropriate. Select Reverse for this example. (See Figure 2.36.)

Figure 2.35 Value Order-Column Properties Menu

Figure 1.1 Some JMP Help Options

Figure 2.36 Specifying the Order of Categorical Variables

Figure 1.1 Some JMP Help Options

3. When you are satisfied with the order, click Apply and OK.

Now let’s see the results of this exercise by using the distribution platform, which is discussed in Chapters 3 and 5. Here’s a preview:

4. Go to Analyze Distribution (Figure 2.37).

Figure 2.37 Launching the Distribution Platform

Figure 1.1 Some JMP Help Options

5. In the Distribution window, select Rating, Type, and Year, and click Y, Columns. Click OK. (See Figure 2.38.)

Figure 2.38 The Distribution Launch Dialog Box

Figure 1.1 Some JMP Help Options

6. Three bar charts appear side by side. Although “R” was the first response listed in the Value Order window, it appears at the bottom of the Distribution graph. If we did not make this change, the order of these responses would be reversed.

7. Click on the green G bar under Rating. (See Figure 2.39.) The G responses are now highlighted in the G bar as well as those same responses reflected by Type and Year. This dynamic visual feature is available in all JMP graphs.

Figure 2.39 The Distribution Results Window

Figure 1.1 Some JMP Help Options

Recode

The Recode command is useful when you have a column of data containing values that you would like to rename or consolidate. For example, if you have data labeled Apple Music, Pandora, and Spotify, you might want to consolidate these into one response: Streaming Music. Recoding assigns a specified new value to all of the existing responses of the original name or value.

In the Movies.jmp data table, we want to replace PG-13 movies with a PG rating because many movies made before 1985 only contain ratings of G, PG, and R.

1. Select the column that you want to recode, Rating. Select Cols Recode. (See Figure 2.40.)

Figure 2.40 The Recode Option

Figure 1.1 Some JMP Help Options

2. This command generates an input window of current and unique responses, with an area to the right to specify a new value. In the box to the right of PG-13, type PG, click out of the edited box, then click Recode. (See Figure 2.41.)

Figure 2.41 The Recode Dialog Box

Figure 1.1 Some JMP Help Options

Note

Once you have selected a new value, you can replace that value in the same column, create a new column with these values, or even create a formula column. Be careful! If you select In Place, these values cannot be changed back because the Recode command replaces values in that column. The default in JMP 15 places recoded values in a new column.

2.5 Selecting, Highlighting, and Filtering Data: Row States

Thus far, we have focused on column properties. Let’s now look at rows or the observations in your data. In the process of exploring or analyzing data, it is often valuable to drill-down or to see and compare subsets of your data or rows. JMP makes this task seamless and simple through a concept called Row State that assigns one or more of the following six conditions to one or more rows (or sets of observations) within a data table: Selected, Hidden, Excluded, Labeled, Marked, or Colored (Figure 2.42).

Selected will appear as a highlighted row(s) that will correspond to a highlighted point or area in any corresponding graph. You can also easily subset highlighted rows.

Hidden means that the row will not appear in any graph. A blindfold icon appears next to hidden rows because you will not see that row in any graph.

Excluded means that the row will not be used in any calculated result. A red circle with a line through it next to its row number indicates an excluded row.

Labels provide the columns value when the observation is selected within an appropriate graph. Labels look like a price tag and appear next to the column in the column panel.

Markers are distinguishing symbols typically used to represent a group within a categorical variable or to highlight particular points of interest.

Colors are used to represent different groups in a categorical variable or as a gradient in a continuous variable within graphs. Both Colors and Markers are covered in the next section.

Figure 2.42 Row States Indicated in a Data Table

Figure 1.1 Some JMP Help Options

Hiding and Excluding Data: Using Data Filter

Hiding a row prevents that data from appearing in any graph (but is not excluded from any analysis). Conversely, excluding a row will remove the row from any calculated result but will still show the point in a graph. If you prefer to both hide and exclude a row, you can directly select a row or rows in your data table with your mouse, then go to Rows Hide and Exclude. This is an effective approach when you are dealing with few specific points like outliers.

When exploring data, it can be more efficient to hide and/or exclude entire groups or ranges within your data table, and rather than thinking about what you DON’T want to see or analyze, it is more natural to think about what you DO want to see. That is, to Show (rather than Hide) and Include (rather than Exclude).

JMP’s Data Filter tool (Rows Data Filter) provides this capability and can be applied to any graph or analysis platform in the active data table. The Data Filter enables you to dynamically show, include, or select groups within a column and toggle between them, or to specify a custom range within a continuous variable and create a slider to filter the graph or analysis. The Data Filter automatically hides and/or excludes values that are not selected.

In Figure 2.43, we have launched and asked the Data Filter to “Show” and “Include” the male students in our Big Class data graph. Notice that it has automatically hidden and excluded the female students in data table in the background. The Data Filter will be illustrated in Chapter 6.

Figure 2.43 Using the Data Filter to Select a Group

Figure 1.1 Some JMP Help Options

Note

There are two types of Data Filters in JMP: the Global Data Filter and the Local Data Filter. The Global Data Filter, which we explained here is accessed in the Rows menu, is applied to the data table and all analyses and graphs for that data table. The Local Data Filter is found within individual output windows in JMP (under the red triangle) and can be applied to filter values within that particular output window only. Examples of the Local Data Filter will be illustrated later in the book.

2.6 Adding Visual Dimension to Your Data

JMP is designed to be visual. Its many useful tools help you visualize or communicate your data effectively. For example, you can use colors or unique markers to signify a range or value of another column in any appropriate graph. Any color or marker assigned to your data can be saved and used in any number of graphs.

You can change these colors or markers at any time. We return to our Big Class.jmp data table to illustrate this feature:

1. The Rows menu provides access to these features. Another way to access these features is through the “Rows” red triangle in the upper left side of the data grid. (Red triangles appear throughout JMP and provide context-specific options.) In this section, we will use the red triangle to access these features, but note that it or the Rows menu will provide the same access. (See Figure 2.44.)

Figure 2.44 Color or Mark by Column

Figure 1.1 Some JMP Help Options

2. Select Color or Mark by Column. Select the column that you would like to distinguish with color (sex, in this example). You can see how JMP will express these values in color on the right side of the window. (See Figure 2.45.) Once you are satisfied, click OK and you will see colored markers preceding the row numbers in the data table.

Figure 2.45 Color by a Column

Figure 1.1 Some JMP Help Options

3. Alternatively, in the same window, you can also distinguish points by using unique markers (for example, symbols). Like colors, unique markers can be assigned to categorical or continuous columns. Click the Markers drop-down menu. (See Figure 2.46.) JMP provides many different marker types, and a submenu enables you to view and select the desired type.

Figure 2.46 Mark by a Column

Figure 1.1 Some JMP Help Options

Note

You can color by one column and use distinct markers for a second column by first completing this process to apply markers on one column, then repeating this process to apply color on another column.

Adding Labels to Data

Sometimes in the process of exploring your data, it is useful to identify a point by a name, territory, or product type rather than its row number in the data table. Adding labels enables you to see these identifiers in a graph by simply clicking on a point of interest. For example, in the Big Class.jmp data table, we want to see the name of the student (rather than a row number) in a graph:

1. First, select a column by clicking on the column name in the data table. This selection activates the Label option when selecting the Columns red triangle. Select Label/Unlabel. (See Figure 2.47.)

Figure 2.47 Adding Labels to a Column

Figure 1.1 Some JMP Help Options

2. You then see a label or what might look like a price tag next to that column in the Columns panel of the data table. (See Figure 2.48.) When creating a graph, labels with the name of the student (rather than the row number) are displayed when they are selected with the mouse.

Figure 2.48 A Column to Display Labels in Graphs

Figure 1.1 Some JMP Help Options

Note

You can add labels for more than one column. For example, we might want to have a label containing both the name and age of the student. Simply highlight all the columns you would like to label first or repeat this process and add a label to “age.”

2.7 Shape Files and Background Maps

Creating thematic maps is easy to do and is explained step-by-step in Chapters 3 and 4. In this section, we will describe a special type of data table required to create thematic maps called shape files.

JMP includes 12 common shape files such as States and Counties of the US, Countries of the World, and others. You can import or create new shape files such as sales territories, and these do not need to be geographic maps. Shape files could represent any space: for example, a football stadium, an assembly plant, or an office building. You can create these special purpose maps with the Custom Map Creator add-in available at community.jmp.com.

Shape files consist of two data tables: a boundary file and a name file that share a common Shape IDs column. The boundary file (or XY file) provides the outline of the shape as a polygon that corresponds to each Shape ID. The name file provides the name (or abbreviations of the name) of the shape ID. Figure 2.49 provides an illustration of these special data tables.

Figure 2.49 Shape Files Contain Both a Name and XY Data Table

Figure 1.1 Some JMP Help Options

Built-in shape files are included in the Maps folder: C: Program Files SAS JMP 15 Maps. Should you want to create or import new shape files, they must be placed in this same Maps folder.

When a column containing names (corresponding with a Shape ID) is dragged into the Shape/Map box in Graph Builder, it will render the boundaries/shapes. (See Figure 2.50.)

Figure 2.50 Using Shape to Create Thematic Maps

Figure 1.1 Some JMP Help Options

Note

In addition to shape files, JMP contains a variety of background or reference maps that provide the base map that you can plot data upon, provided that the points you want to plot on the background contain latitude and longitude information. OpenStreet Maps for example, enables you to show data on a map at the street address level. (See Section 4.2.)

2.8 The Tables Menu

The Tables menu is a collection of JMP tools that you will need to manage your data, whether you are sorting it, transposing it, or joining multiple data tables. Put another way, if your data is not structured in a manner that fits the JMP analysis framework, you need to use these commands to improve the structure. To keep things simple, we will cover just a few of these features, including sorting, joining, and dealing with missing data. In this section, we learn:

● How to structure your imported data into a form that you would like to see or that JMP will recognize.

● What to do when you have missing data.

Using the Big Class.jmp data table, let’s first take a quick look at the Summary option under the Tables menu. This command enables you to obtain a variety of summary statistics for any column.

1. Select Tables Summary. (See Figure 2.51.) Choose height from Select Columns and Mean from the Statistics menu. (See Figure 2.52.) Select age from Select Columns and place in the Group area, Click OK. This action will generate a new data table, Big Class By (age), with mean heights by age. (See Figure 2.53.)

Figure 2.51 Summary Platform from the Tables Menu

Figure 1.1 Some JMP Help Options

Figure 2.52 Summary Dialog Box

Figure 1.1 Some JMP Help Options

Figure 2.53 Summary Data Table

Figure 1.1 Some JMP Help Options

Sorting

You can sort numeric columns from highest to lowest or lowest to highest. With character columns, you can sort character data by alphabetical or reverse alphabetical order. Using JMP’s sorting option keeps the rows (your sets of observations) intact. Sorting also creates a new JMP data table with the sorted values (if you check Replace table, the sorted values replace the existing data table).

1. Returning to the Big Class data table, Select Tables Sort. In the resulting window, identify which column you want to sort. Select height and click By.

2. Click on the column(s) that you want to sort in the right window (height, in our example) to highlight the column.

3. Select the way you want to sort them, highest to lowest or lowest to highest, using the corresponding triangle icon. (See Figure 2.54.) Click OK. Each entire row is sorted according to the conditions you apply.

Figure 2.54 The Sort Launch Dialog Box

Figure 1.1 Some JMP Help Options

More information about sorting is available in Help JMP Documentation Library Using JMP Ch. 6 Reshape Data.

Joining

The Join option from the Tables menu enables you to combine or merge two or more different data tables into one. If some of the columns in your original data have the same name and type, this is a simple process. If not, there are some handy JMP tools to help you specify how two different data tables can be joined. Let’s look at a simple example:

1. First open the data tables that you would like to join. Select Tables Join.
We will use Trial1.jmp and Trial2.jmp for this example, which can be found at Help Sample Data Open the Sample Data Directory Trial1.jmp. Repeat for Trial2.jmp.

2. The window indicates your active data table (Trial1.jmp) and prompts you to select another data table that you want to merge or join (Trial2.jmp). Select the data table(s) you want to join. (See Figure 2.55.) The column headings of each appear in the Source Columns windows.

Figure 2.55 The Join Launch Dialog Box

Figure 1.1 Some JMP Help Options

3. Decide how you want to join the data under the Matching Specification drop-down menu.

a. By Row Number joins your data side-by-side by its row number.

b. If your data has different column headings or you want to select specific columns to match, use By Matching Columns. Click on a column from each of the Source Columns windows that you would like to match and click Match. You now see each of those selected columns in the Match columns window with an “=” symbol between them.

4. If you want to name the new data table, enter the name in the Output table name box (otherwise, it will be named Untitled), and click OK. (See Figure 2.56.) A new data table appears. (See Figure 2.57.)

Figure 2.56 Join by Row Number

Figure 1.1 Some JMP Help Options

Figure 2.57 The Joined Data Table

Figure 1.1 Some JMP Help Options

Join by Matching Columns

While joining by row number is very easy to do, it relies on the data in the two data tables being in the same row order with no items missing. This might not always be the case. Examine the data tables Candy Bar Sizes.jmp and Candy Nutrition Information.jmp shown below in Figure 2.58. These data tables are available at the author’s book page. As an analyst, you might want to have all this information for the candy bars in a single data table.

Figure 2.58 Data Tables to Be Joined by Matching Columns

Figure 1.1 Some JMP Help Options

Figure 1.1 Some JMP Help Options

A quick look at these data tables reveals that the data is sorted differently in the two tables. For example, Bit-O-Honey candy bar appears in row 7 in the Candy Bar Sizes data table, but in row 14 in the Candy Nutrition Information data table. Joining these data tables by row number would result in incorrect information. In situations such as these, you can join by Matching Columns. In this case, you can join by the name of the candy bar.

To join these two data tables requires a few more steps than joining by row number but will keep the data for each candy bar correctly together in a single row.

1. First, open the data tables, then choose Tables Join.

2. As before, the window indicates the active data table. Choose the data table that you want to join with the active data table (Figure 2.59).

Figure 2.59 Joining by Matching Columns

Figure 1.1 Some JMP Help Options

3. Be sure By Matching Columns is selected, then choose Name from the Candy Bar Sizes table columns and Candy Bar Name from the Candy Nutrition Information table columns.

4. Click the Match button. (See Figure 2.60.)

Figure 2.60 Join Dialog Box with Matching Columns Assigned

Figure 1.1 Some JMP Help Options

Note

If the two data tables do not contain the same number of candy bars or if there may be multiple rows for the same candy bar, you might need to choose whether to drop multiple rows or include non-matches in the resulting data table.

5. Click OK.

The final data table has all of the information for each candy bar. (See Figure 2.61.) You might want to delete the second column of candy bar names in the new data table.

Figure 2.61 Joined Data Table

Figure 1.1 Some JMP Help Options

Missing Data

The Missing Data Pattern window can help you identify the quantity of missing data or whether any patterns exist due to non- response, data importing, or data entry errors. The Missing Data Pattern feature under the Tables menu searches your specified columns and summarizes the frequencies of missing data. To explore this feature, some values from the Big Class.jmp data table have been removed.

1. Select Tables Missing Data Pattern. This generates the window on the right. (See Figure 2.62.)

Figure 2.62 Missing Data Pattern Dialog Box

Figure 1.1 Some JMP Help Options

2. Select the columns in the left panel that you want to search. Select Add Columns and then click OK.

3. This command generates a new Missing Data Pattern table that contains a count of rows that have missing values and a count of rows that have the same missing values among the same column(s). (See Figure 2.63.)

Figure 2.63 Missing Data Pattern Results

Figure 1.1 Some JMP Help Options

Note

You can proceed without addressing missing values, but JMP will by default ignore (or exclude) any rows containing missing values in most analysis platforms. JMP includes “Informative Missing” options in a few platforms that use rows that would otherwise be ignored (JMP Pro has this feature in several platforms). This is important if you have lots of missing data because more usable observations will generally lead to better statistical models.

One solution to missing continuous data is to impute them. Imputing analyzes similar values in other columns and rows to estimate the missing value. JMP has an imputation feature under the red triangle within the Multivariate window. To illustrate, some values from the height and weight columns in Big Class.jmp have been removed:

1. First, run the multivariate platform. Select Analyze Multivariate Methods Multivariate. Select height and weight (the continuous columns), in the Y, Columns window, and then click OK. (See Figure 2.64.)

Figure 2.64 The Multivariate Platform Offers Imputation

Figure 1.1 Some JMP Help Options

2. Click the red triangle next to Multivariate and select Impute Missing Data. (See Figure 2.65.) Note: Like many menu options in JMP, Impute Missing Data only appears in the menu when appropriate – in this case, when data is missing. A new data table is generated with the estimated missing values in place.

Figure 2.65 Launching the Imputation Option

Figure 1.1 Some JMP Help Options

3. Because you can only impute continuous values, cut and paste these columns into your original data table, which might contain other data types. (Alternatively, use Update from the Tables menu.)

Note

JMP provides several methods of imputing missing values. Additional functionality is available in the Analyze menu under Analyze Screening Explore Missing Values. Methods of handling missing values are best selected with the help of an expert.

2.9 Summary

In this chapter, we covered a wide range of topics on getting your data into JMP and learning how to manage it. Because the data table not only stores your data but also stores key information that drives the appropriate analysis and graphs, it serves as the critical starting point for all exploration and visualization within JMP.

Analyzing and visualizing data often requires special features, and there are many advanced features in JMP that we did not address. As we have indicated, your copy of JMP includes extensive documentation, which you can access through the JMP Documentation Library under the Help menu. We recommend the Using JMP book for a complete discussion of data, data tables, and the Tables menu.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset