Chapter 8
Informing Pharmaceutical Sales and Marketing

Six Sigma is often positioned as a project-based approach. Certainly, using identifiable and well-defined projects aligns well with approaches to managing change, and change management can be an important aspect in spreading the use of data within a company. But, insofar as Visual Six Sigma makes data analysis lean, it has an important role to play both within companies whose operations and organizations do not (or cannot) support the traditional project infrastructure, as well as for projects that, by their nature, do not require a formal project structure. This scenario looks at one such “project” set in one such company.

You have recently been hired as the sales manager for PharmaInc and are now responsible for their U.K. operations. PharmaInc markets drugs that address several therapeutic areas. Although you have a lot of sales experience, as a new hire you are still learning about how PharmaInc's own sales operations run and perform.

Executive management perceives that PharmaInc has lost ground recently. Consequently, you have been asked to look into the performance of your sales force. Specifically, you have been asked to:

  • Verify the claims of the marketing group that a limited promotion they ran for PharmaInc's major product from May through December of 2014 was very successful.
  • Determine whether there are regional differences in the sales performance of PharmaInc's major product.

In addition, you see this as an opportunity to learn more about what actually happens in the field.

To answer the questions posed above, you download company data on the monthly performance of sales representatives for May through December of 2014. The data table is fairly large, containing 95,224 rows and 16 columns. The data consist of a number of nominal demographic variables as well as a few continuous and ordinal variables. One of the variables, the number of prescriptions for PharmaInc's product, is the major Y of interest.

Your data is typical of large observational data sets. As such, you first spend some time assessing and dealing with the quality of the data. You move on to obtaining a better understanding of the deployment of your sales force using a bubble plot to display the location of physician's practices and see who has been selling to them. In answering the specific questions posed, you use summary tables, oneway analyses with comparison circles, regression plots, and even an animated bubble plot. (See Exhibit 8.1 for a full list of the JMP Platforms and Options you will encounter in this Chapter.)

Exhibit 8.1 Platforms and Options in This Case Study

Menus Platforms and Options
Tables Summary
Missing Data Pattern
Rows Color or Mark by Column
Cols Column Info
Column Properties
Formula
Hide/Unhide
Exclude/Unexclude
Columns Viewer
Group Columns
Analyze Distribution
    Histogram
    Frequency Distribution
Fit Y by X
    Bivariate Fit
        Fit Line
        Fit Polynomial
    Oneway
        Box Plots
        Compare Means
Tabulate
Fit Model
    Standard Least Squares
    Random Effects (REML)
Quality and Process Control Chart Builder
    Process Capability Analysis
Graph Graph Builder
Bubble Plot
Other Local Data Filter

After the investigation, you find that the 2014 marketing promotion was, indeed, successful, and you are able to identify some regional differences in sales effectiveness. You also uncover what seems to be an uncanny and suspicious adherence to an unstated operational rule that each physician should be visited once each month.

The upshot of your single-handed analysis is that you can easily answer the questions posed by your executive, using results and arguments supported by data and informed by visual displays that make it easy to get you r points across. In addition, you uncover a number of interesting facts that will help you to manage and deploy your sales force more effectively in the future.

Had you simply used a spreadsheet, by far the most common tool used for such ad hoc analyses, you would not have arrived at your conclusions so quickly, if at all. You certainly would have struggled to construct informative graphical displays and to apply appropriate statistical techniques, and it is very likely that you would not have noticed the one-visit-per-month working practice. In short, it is probable that you would not have gained the useful new insights that you obtained from your quick visual analysis. It would also be harder to reproduce a similar analysis at a later date.

The platforms and options used by you and your team are listed in Exhibit 8.1. The data sets are available at http://support.sas.com/visualsixsigma.

SETTING THE SCENE

You are the sales manager for PharmaInc, responsible for their U.K. operations. PharmaInc markets drugs that address several therapeutic areas, competing for market share and revenue with other companies doing the same thing. You have only been working for PharmaInc for three months, having previously been a senior account executive with a larger company. Although, as an industry veteran, you are very familiar with the general landscape, you have yet to come to terms with the nuances of exactly how PharmaInc's operations run and perform.

Even though the markets view PharmaInc as generally doing well, executive management thinks that the company has lost ground over the last couple of years. So, early in 2015, you are chartered to look into the recent performance of your sales force to shed some light on this perceived decline.

More specifically, your manager, the vice president (VP) of sales and marketing, has asked you to:

  • Verify the claims of the marketing group that a limited promotion they ran for PharmaInc's major product from May to December of 2014 was very successful.
  • Determine whether there are regional differences in the sales performance of PharmaInc's major product.

You also see this investigation as a chance to get better acquainted with exactly what happens in the field, and why.

Outside of work, you are a keen golfer, and regularly play against Arthur, an engineering manager in a high-tech manufacturing company. While you were in your previous position, Arthur had convinced you to use JMP for data analysis. Knowing how visually oriented JMP is, you see that it will be very useful in exploring the data relating to the task at hand. You also know that, if you need help, you can count on your golfing buddy to give you guidance!

COLLECTING THE DATA

Each PharmaInc sales rep works his or her own territory, which is an agglomeration of U.K. postal codes lying wholly within a specific region of the United Kingdom. A given territory contains physicians working in practices, and each physician has a particular medical specialty. The physicians write prescriptions for patients who visit their practices, aiming to provide them with the best possible care. A sales rep will periodically make calls on a prescribing physician to promote the use of PharmaInc's products, and reps from competing companies do the same thing. During a visit, if a promotion is running, the sales rep may leave behind a free promotional sample kit.

You download company data on the performance of the sales reps for May through December of 2014 into a Microsoft® Excel spreadsheet. You realize that there has likely been some turnover in the sales force in 2014, but that this was probably minor. For your purposes, you decide that you will only extract data about sales reps who worked for the entire eight-month period.

You easily import your Excel data into a single JMP table called PharmaSales_RawData.jmp. The resulting data table contains 95,224 rows and 16 columns. A partial view of the data table is given in Exhibit 8.2.

Image described by caption/surrounding text.

Exhibit 8.2 Partial View of PharmaSales_RawData.jmp

The columns are listed and described in Exhibit 8.3. Note that there are four pairs of ID and corresponding Name columns. The ID columns are a coding of the Name columns so that one of each pair is redundant.

Exhibit 8.3 Column Descriptions for PharmaSales_RawData.jmp

Column Name Description
Date Month and year of current record
SalesrepID Sales representative identifier
Salesrep Name Name of the sales representative
RegionID Region identifier
Region Name Name of the region
PhysicianID Physician identifier
Physician Name Name of the prescribing physician
Physician Specialty Specialty of physician
PracticeID Practice identifier
Practice Name Name of the practice
Postcode Location (postal code) of the practice
Practice Latitude Latitude of the practice
Practice Longitude Longitude of the practice
Visits Number of visits made by the sales rep to this physician this month
Visits with Samples Number of visits during which a promotional sample kit is left by the sales representative with this physician this month
Prescriptions Number of prescriptions for PharmaInc's product written by this physician this month

As indicated in Exhibit 8.3, each row in Exhibit 8.2 provides the number of events in a particular month (visits made, visits when a sample kit is left, prescriptions written). Each record or row is uniquely defined by the combined values of Date, Salesrep Name, and Physician Name. Note that we assume that each sales rep has more than enough sample kits, so if a kit is not left during a visit, it is not because they were out of supply.

Recall that we use Ys to represent responses and Xs to represent variables that might impact the Ys. Prescriptions is the main Y variable of interest. Visits and Visits with Samples may be considered as input or X variables (because they are under the control of the sales reps and are expected to have an impact on Prescriptions) or as Y variables (since they are the outcomes of past choices made by the sales reps). Aside from Practice Latitude and Practice Longitude, which will be used for mapping purposes, the other variables are either ancillary variables, or descriptive (stratifying) variables that relate to how your sales system is currently configured and operated and the environment in which it functions.

VALIDATING AND SCOPING THE DATA

Before starting to understand how the data can help answer the questions raised by your VP, your first goal is to get a feel for what is actually in the data.

Questions such as the following come to mind:

  • How many sales reps are there and where do they operate?
  • How many practices are there?
  • How many physicians are in each practice?

In your initial review of the data, you also want to look at the general quality of the data, since you have seen problems with this in the past and are reminded of the old saying, “Garbage in, garbage out.”

Preparing the Data Table

You decide to arrange the data table in a way that will facilitate your data analysis. You will document and rearrange your data table and save it in a new file called PharmaSales.jmp.

The steps described in this section illustrate some features of JMP that are useful for documenting and streamlining your work. For a quick-and-dirty analysis, the steps in this section are unnecessary. But when you are asked to conduct a similar analysis a year from now, you will see the value of having documented your work. That said, we leave you to judge whether to follow this section. Feel free to skip ahead to “Dynamic Visualization of Variables One and Two at a Time.” Otherwise, to follow along, open the file PharmaSales_RawData.jmp.

Defining the Notes Column Property

First, to document the data, you insert a Notes property in each column describing the contents of that column using the information in Exhibit 8.3.

Excluding and Hiding Columns

Next, you remember that the four ID columns are redundant. You do not want to delete them, so you hide them so that they do not appear in the data table grid and exclude them so they don't appear in variable selection lists for analyses.

For convenience, you group the four ID columns into a new IDs column group and move this to the bottom of the Columns panel.

Exhibit 8.4 is a partial view of the data table after defining the IDs column group.

Saving the Data Table

Finally, save the data table with the new name PharmaSales.jmp (select File > Save As, enter the desired table name, and press Save), or close it without saving and open the same table from the Journal.

Dynamic Visualization of Variables One and Two at a Time

We continue our analysis with PharmaSales.jmp, the data table that you just prepared for analysis. (You will notice that the table PharmaSales.jmp linked to in the book's Journal contains scripts for convenience. If you prefer to replicate this work on your own, then continue using your own instance of PharmaSales.jmp, the data table obtained from PharmaSales_RawData.jmp, after following the steps in the previous section, “Preparing the Data Table”.)

Using Column Viewer and Distribution

To get an idea of the complexity of the data, you first obtain the high-level view of all the variables in the table shown in Exhibit 8.5. Then you produce linked histograms and bar charts of all the variables that do not have a large number of categories. You remove the numeric summaries in Distribution to focus on the graphical displays. (Variables with a large number of categories are difficult to assess using a bar chart representation because there is one bar per category.)

Image described by caption/surrounding text.

Exhibit 8.5 A High-Level View of the Variables in PharmaSales.jmp

From the information in reports partially shown in Exhibits 8.5 and 8.6, you observe the following:

  • Date. JMP stores dates internally as the number of seconds since January 1, 1904. Any summary statistics such as Min, Max, Mean, and Std Dev reflect this internal representation and are not immediately interpretable unless you apply an appropriate format. Nonetheless, the histogram indicates that eight months of data are represented and that there are roughly equal numbers of rows for each month. You also see that there are no missing values, so each of the 95,224 rows has a value for date.
  • Salesrep Name. There are 102 sales reps represented.
  • Region Name. There are nine regions, and Northern England and Midlands have the most rows (the longest bars).
  • Physician Name. There are 11,833 physicians represented.
  • Physician Specialty. There are 15 specialties represented, and there are 368 rows for which this information is missing.
  • Practice Name. There are 1,149 practices.
  • Visits. There can be anywhere from 0 to 5 visits made to a physician in a given month. Typically, a physician receives one or no visits. There are 975 rows for which this information is missing.
  • Visits with Samples. There can be anywhere from 0 to 5 visits with samples made to a physician in a given month. Typically, a physician receives one or no such visits. There are 62,156 rows for which this information is missing.
  • Postcode, Practice Latitude, Practice Longitude. There is no missing data and the values seem to make sense.
  • Prescriptions. The distribution for the number of prescriptions for PharmaInc's product written by a physician in a given month is skewed, with a long “tail” of high values. The average number is 7.40, but the number can range from 0 to 58.

Note that Visits and Visits with Samples have been defined in the table with Ordinal Modeling Type in anticipation of the fact that both contain count values with relatively small counts. Even though Prescriptions also contains counts, the Continuous Modeling Type is used. For this variable, the use of Ordinal rather than Continuous Modeling Type would make for unwieldy displays given the relatively large number of values.

Using Local Data Filter for Descriptive Variables

Now that you have taken a preliminary look at all of data, you are ready to study some of the key descriptive variables more carefully. These include Date, Salesrep Name, Region Name, Practice Name, and Physician Specialty. Given the large number of physicians, you decide not to include Physician Name in this list.

You can think of these variables in two ways; use Region Name as an example. First, Region Name allows you to stratify the data, namely, to view all other variables in terms of the layers, or strata, defined by this variable. Using it as a stratification variable allows you to look for interesting and important differences in the data between the levels (for instance, how Prescriptions change between Midlands and Scotland).

However, you can also think of such variables as chunking variables.1 Use of this term emphasizes the fact that the data values arise from the outcomes of large, indiscriminate groupings of (presumably) very specific root causes active within each stratum. Each of these variables will require further investigation if it turns out to be of interest. For example, an identified difference between Midlands and Scotland could be caused by any combination of the following more specific causes within these regions: the educational, financial, or social level of patients; the knowledge level or specialties of physicians; the causes of underlying medical conditions; the hiring practices and management of sales representatives, or their training or attitude; and so on. Most importantly, you note that such causes are not usually represented directly in the data you have.

You will use the Local Data Filter, in conjunction with Distribution, to investigate the other four variables.

Using the report you have generated, you can investigate the relationships between the five chunking variables. You quickly discover, for example:

  • The names of the sales reps who work in Northern England
  • The names of the practices located in Northern England
  • The physician specialties represented in Northern England
  • The practices that handle the Internal Medicine specialty

Using the Local Data Filter for Response Variables

Having developed a feel for some of the chunking variables describing sales operations, you now turn your attention to the monthly outcomes. You will use the Local Data Filter to explore the three outcome variables by month.

By interacting with the report in Exhibit 8.8 for Visits, Visits with Samples, and Prescriptions, you easily find that:

  • The numbers of monthly Visits by sales reps run from 0 to 5.
  • About 23 percent of the time, a physician is not visited in a given month. About 65 percent of the time, a physician receives exactly one visit in a given month. However, about 12 percent of the time, a physician receives two or more visits in a given month.
  • There are 975 rows for which a value for Visits is missing.
  • There are 62,156 rows missing a value for Visits with Samples, probably because the promotion was limited in scope.
  • For the rows where Visits with Samples was reported, about 37 percent of the time, no sample kit was left.
  • The monthly numbers of Prescriptions written by each physician vary from 0 to 58.
  • Generally, physicians write relatively few prescriptions for PharmaInc's major product each month. They write six or fewer prescriptions 50 percent of the time.
Histograms for Exploring the Three Outcomes Variables Filtering by Month.

Exhibit 8.8 Exploring the Three Outcomes Variables Filtering by Month

In some cases, these findings are simply confirmation of what you discovered earlier. By adding the Local Data Filter, you can also gain a first impression of whether, and how, outcomes have changed over time. The updating report confirms to you that there is no large month-to-month variation in any of the outcomes.

Using Graph Builder

You can also confirm that there is no large month-to-month variation in the outcomes using the more conventional Graph Builder representation in Exhibit 8.9.

Snapshot of Graph Builder Plot for Three Outcomes by Month.

Exhibit 8.9 Graph Builder Plot for Three Outcomes by Month

Using the Local Data Filter to Find Relationships

To take a first look for any relationships between the five chunking variables and three outcomes, you combine Exhibits 8.7 and 8.8 into a single display and use dynamic linking. Given that Date (or Month) does not have a large effect, you choose to filter the report by Region Name, to produce Exhibit 8.10.

Histograms for Exploring Relationships between Region and Outcome Variables Using Distribution and Local Data Filter.

Exhibit 8.10 Exploring Relationships between Region and Outcome Variables Using Distribution and Local Data Filter

By clicking on the bars in the report and using the Local Data Filter, you can see, for instance, whether certain regions are associated with larger numbers of visits than others, or whether certain physician specialties tend to write more prescriptions. You note that Midlands is associated with a large proportion of Visits with Samples and that it has a bimodal distribution in terms of Prescriptions. But you do not see any other convincing relationships at this point and realize that you may need to aggregate the data in some way to better see relationships.

First, however, you want to get a better sense of the quality of the data. Then you will move on to your private agenda, namely, to try to understand how your sales force is deployed.

Missing Data Analysis

Earlier you noted that some variables have a significant number of missing values. You would like to get a better understanding of this phenomenon. Exhibit 8.11 is obtained using Tables > Missing Data Pattern and selecting all the columns (saved script is Missing Data Pattern).

Image described by caption/surrounding text.

Exhibit 8.11 Partial View of Missing Data Pattern Table

In the Missing Data Pattern table, consider the columns from Date to Prescriptions. Each of these columns contains the values 0 or 1, with a 0 indicating no missing data and a 1 indicating missing data. You observe that the Patterns column is a 12-digit string of 0s and 1s formed by concatenating the entries of these columns. A 0 or 1 is used to indicate if there are missing data values in the column corresponding to that digit's place in the string.

The table has eight rows, reflecting the fact that there are eight distinct missing data patterns. For example, row 1 of the Count column indicates that there are 32,917 rows that are not missing any data on any of the 12 variables. Row 2 of the Count column indicates that there are 60,967 rows where only the 11th variable, Visits with Samples, is missing. Row 3 indicates that there are 347 rows where only the two variables Visits and Visits with Samples are missing.

You notice the saved scripts Treemap and Cell Plot in the Missing Data Pattern table, but decide not to run them because the main message is already clear: You need to find out if data on Visits with Samples was only entered for those locations where the promotion was run.

Before picking up the phone, you decide to see if the data offer any further clues about this issue. Go to the data table PharmaSales.jmp and run Analyze > Distribution on Region Name. In the Missing Data Pattern data table, select rows 1 and 5. These are the rows where Visits with Samples is not missing. Back in the Distribution plot, you see that Visits with Samples is not missing for only three regions: Southern England, Northern Ireland, and Midlands.

You call the VP of sales and marketing, whose administrative assistant researches the question. Within a few hours, you receive a phone call confirming that the promotion was indeed run in only the three regions you identified. To record this in the data table you make a new column in PharmaSales.jmp called Promotion?.

Further, the IT associate who was responsible for loading data from the promotion confirms that, for those six regions where the promotion was not run, the Visits with Samples field was populated with missing values. You ask her to modify the data recording protocol so that, in the future, such values are logged as zero rather than missing. She also confirms that, in the three regions where the promotion was run, missing values appear when the sales reps did not report whether a sample kit was left on a given visit. Given what you have seen so far, you think this was a relatively rare occurrence, but decide to confirm this with a summary table (Exhibit 8.12).

Snapshot showing the Missing Values of Visits with Samples by Region Name.

Exhibit 8.12 Missing Values of Visits with Samples by Region Name

You feel much better upon seeing this result, since the largest number of missing values in any of the three regions where the promotion was run is only 218.

It's time for a little housekeeping. You select Window > Close All Reports to close any reports that are open, and close all open data tables other than PharmaSales.jmp. Finally, you deselect any selected rows by selecting Rows > Clear Row States (or click in the lower triangle at the upper left of the data grid).

UNCOVERING RELATIONSHIPS

Dynamic Visualization of Sales Reps and Practices Geographically

From the Distribution plots you have seen, you know that most of the sales rep activity is in Northern England and the Midlands. You are curious to see a geographical picture showing the practices being called on by your sales reps, and how these are grouped into regions. Thinking about the fact that sales reps serve several practices, and that each practice is only served by a single sales rep, you realize that a bubble plot may be a useful way to show this information, since it allows you to view hierarchical data (Exhibit 8.13).

Partial View of Sales Reps and Practices in the territory of Yvonne Taketa.

Exhibit 8.13 Sales Reps and Practices (Partial View)

Exhibit 8.13 shows you that Yvonne Taketa has a territory that appears to be rather wet. Select the bubble corresponding to Yvonne Taketa and press the Split button. This reveals all of her practices, which are indeed all on dry land! Splitting bubbles in this way allows you to drill down from the summary level data (Salesrep Name) to the detailed level data (Practice Name), so that you gain insights at both levels of the data hierarchy. You can see, for instance, that Lawrence Goetsche (in Wales) has some far-flung practices, so he probably enjoys driving.

Dynamic Visualization of Prescriptions with a Tabular Display

At this point, you decide to create a listing to show which physicians are prescribing PharmaInc's major product. Because a static list will be rather long and unmanageable, you use the Local Data Filter in conjunction with Tabulate to produce Exhibit 8.14.

Partial View of Prescriptions by Physicians at Practices, Filtered by Salesrep Name.

Exhibit 8.14 Prescriptions by Physicians at Practices, Filtered by Salesrep Name (Partial View)

The resulting table is partially shown in Exhibit 8.14. Scroll through the list of salesrep names to see that there are some sales reps who call on just a very few physicians, whereas some are calling on very many. You assume this is due to some sales reps working part time, and make a note to check this with the HR Department.

At this point, you conclude that you have a clean set of data and a good grasp of its content. You also feel that you have a better idea of how your sales force is deployed. Using JMP's visual and dynamic capabilities has been very helpful in this pursuit, and you have very quickly learned things you could never have learned using a spreadsheet. Now, finally, you are ready to roll up your sleeves and to address the business questions posed by your Executive.

INVESTIGATING PROMOTIONAL ACTIVITY

Your first task is to verify the marketing group's claims that the test promotion they ran in 2014 was successful. You know that the promotion was run in three regions, Midlands, Southern England, and Northern Ireland, and it was run for the full year of 2014.

Your first thought is to look at total prescriptions written per physician across the eight months for which you have data to see if the three promotion regions stand out. To accomplish this, you must first construct a summary table giving the sum for relevant variables over the eight-month period. You then intend to compare regions in terms of the 2014 physician totals, taking into account physician variability. But you are also interested in how prescription totals vary by region when considering the number of visits by region. Do the promotional regions stand out from the rest in terms of prescriptions written if we account for the numbers of visits per physician?

Preparing a Summary Table

Given your discovery that Region Name is one of the key chunking variables, you decide to color and mark rows in PharmaSales.jmp by this variable so that any further displays that show rows as points will color and mark these points appropriately. You do this by selecting Rows > Color or Mark by Column (saved script is Color and Mark by Region Name).

You construct the Summary table shown in Exhibit 8.15 and note that the row colors and markers are inherited from PharmaSales.jmp.

Snapshot showing the PharmaSales By (Physician Name, Region Name, Salesrep Name).

Exhibit 8.15 PharmaSales By (Physician Name, Region Name, Salesrep Name)

Uncovering Relationships: Prescriptions versus Region with Fit Y by X

Your first thought is to use Fit Y by X to investigate the relationship between Sum(Prescriptions) and Region Name to see if the promotion regions had significantly more prescriptions written than did the non-promotion regions (see Exhibit 8.16).

Snapshot showing the Oneway Plot of Sum(Prescriptions) against Region Name.

Exhibit 8.16 Oneway Plot of Sum(Prescriptions) against Region Name

You note that that Midlands and Northern England have comparatively large numbers of rows. But remember that Southern England, Midlands, and Northern Ireland are of particular interest because this is where the promotion was run. Looking at the report again, it does appear to you that more prescriptions were indeed being written in these three regions!

You think about how you can verify this finding more formally. Given that there are nine different regions, you decide that the All Pairs, Tukey HSD (Honestly Significant Difference) test should be used rather than Each Pair, Student's t because there will be pairwise comparisons of nine regions. There are (9 × 8) / 2 = 36 possible pairs to compare. You want to control the overall false alarm rate for the set of 36 tests to 0.05. If you had chosen the Each Pair, Student's t option, each individual comparison would have a 0.05 false alarm rate. Over 36 tests, these individual false alarm rates could combine to a very large overall false alarm rate (about 1 – (1 – 0.05)36 = 0.84, or 84 percent). Note that JMP refers to the false alarm rate as error rate. Using All Pairs, Tukey HSD also produces the comparison circles as an aid to the interpretation of the test results (the rightmost panel in Exhibit 8.16).

You remind yourself of how the comparison circles work. There is one circle for each value of the nominal variable (so here, there is one circle for each region). Each circle is centered vertically at the mean for the category to which it corresponds. When you click on one of the circles, it turns a bold red, as does its corresponding label on the x-axis. Each other circle either turns bold gray or normal red (but not bold red). The circles that turn gray correspond to categories that significantly differ from the category that corresponds to the chosen, bold red, circle. The other circles that turn red correspond to categories that do not significantly differ from the chosen category.

With this in mind, you click on the tiny topmost circle. You see, from the bold red label on the graph, that this circle corresponds to Midlands. (You infer that it is tiny because Midlands has so many observations.) Once that circle is selected, you see that all of the remaining circles (and graph labels) are gray. This means that Midlands differs significantly from all of the other regions. Technically stated, each of the eight pairwise tests comparing the mean of Sum(Prescriptions) for Midlands to the mean Sum(Prescriptions) for the other regions is significant using the Tukey procedure.

Next, you click on the big circle that is second from the top, shown in Exhibit 8.16. You are not surprised that this corresponds to Northern Ireland—the circle is large because there are so little data for Northern Ireland, compared with the other regions. All the other circles turn gray except for the very small circle contained within, but near the top of, Northern Ireland's circle.

That small circle corresponds to Southern England. This is easy to see on a computer monitor, where the label for Southern England is red on the graph. With gray-scale printed output, one must look carefully at the labels on the graph. Note that the label for Northern Ireland is bold. Labels that correspond to gray circles (significantly different from the bold label) are italicized. Those that do not differ significantly from the bold label are not italicized. Note that the label for Southern England is the only label, other than Northern Ireland's, that is not italicized.

You look around in the output under the Means Comparisons heading, and discover a table that summarizes the significant differences among regions (see Exhibit 8.17, where the table of interest is enclosed in a rectangle). You see that this table divides the nine regions into six groups, based on an associated letter, and that these groups differ significantly from the others. Midlands (letter A) has significantly more sales than all of the other groups. Next come Southern England and Northern Ireland, both of which are associated with the letter B and so can't be distinguished statistically; however, both of these have significantly more sales than the regions associated with the letters C, D, E, and F. The smallest numbers of sales are associated with Wales and Northern England (letter F).

Illustration showing the Significant Differences Summary Table for Sum(Prescriptions) by Region Name.

Exhibit 8.17 Significant Differences Summary Table for Sum(Prescriptions) by Region Name

You are excited about these results! You have learned that the three promotional regions differ significantly from the six nonpromotional regions in terms of total prescriptions written over the eight months. You have also learned that Midlands had significantly more prescriptions written than did Northern Ireland or Southern England, even though they all ran promotions. You are intrigued by why this would be the case and make a note to follow up at your next meeting with the sales reps.

You also note that there are significant differences in the nonpromotional regions as well. Northern England and Wales have the smallest numbers of prescriptions written, and you want to understand why this is the case. But, you keep firmly in mind that many factors could be driving such differences. It is all too easy to think that the sales reps in these two regions are not working hard or smart enough. But, for example, there could be cultural differences among patients and physicians, making them less likely to request or prescribe medications. There could be age, experience, or specialty differences among the physicians. You realize that many causal factors could be driving these regional differences.

Uncovering Relationships: Prescriptions versus Region with Fit Model

At your next golf outing, you discuss these results with your golfing buddy, Arthur. He indicates that there could be a problem with conducting these statistical tests on the data in your summary table. Each row represents the total number of prescriptions written by a given physician. But the number of prescriptions written by a given physician might (you hope!) be influenced by the physician's sales rep. This means that the values of Sum(Prescriptions) in each row may not be independent, as required by the Means Comparisons test that you utilized. Rather, they are potentially correlated: Sum(Prescriptions) within a sales rep are likely to be more similar than Sum(Prescriptions) between sales reps. In fact, your buddy points out that, in technical terms, the variable Physician Name defines a subsample of Salesrep Name.

Arthur tells you that there are two options if you want to be sure that the assumptions behind your statistical approach are more appropriate for these data: Summarize the data over sales reps and then utilize comparison circles as before, or construct a model that accounts for the subsampling. You are intrigued by this second approach and so ask Arthur to show you how to do this.

Arthur reminds you that because a given sales rep only works in one region, his or her effect on another region cannot be assessed. He points out that because the sales reps are nested within region in this way, only the variability contributed by sales reps can be estimated in a statistical model.

Arthur explains to you that one of the many things Fit Model can do is fit these so-called random effects models. He completes the Fit Model dialog for you as shown in Exhibit 8.18. After some thought, you realize that completing the dialog this way will mean that the individual observations for Physician Name will be treated as the data values for each sales rep. In other words, the role of Physician Name as defining subsamples of Salesrep Name is built into this model.

Snapshot of the Fit Model Dialog for Model with Subsampling.

Exhibit 8.18 Fit Model Dialog for Model with Subsampling

When you run the model, you obtain the output in Exhibit 8.19. The very small value for Prob > F in the Fixed Effect Tests report indicates that Region Name is significant. The REML Variance Component Estimates report indicates that Salesrep Name contributes only about 1 percent of the variation (with a variance component of about 2.1). It follows that, within a region, most of the variation is due to physician differences. This indicates that the earlier conclusions you drew from your simplified analysis in Fit Y by X are actually not misleading in this case.

Snapshot of the Fit Model Report.

Exhibit 8.19 Fit Model Report

Arthur points out that you can also obtain an analog of comparison circles from the Fit Model report, shown in Exhibit 8.20.

Illustration showing the Tukey HSD Pairwise Comparisons for Random Effects Model.

Exhibit 8.20 Tukey HSD Pairwise Comparisons for Random Effects Model

The test results are represented in matrix form as shown in Exhibit 8.20. The first row in each cell of the matrix gives the mean difference between the two groups. You note that some differences are large. Regions with results in gray (red on the screen) are significantly different (with a false alarm rate of 0.05). If the results are in bold (black on the screen), the difference is not statistically significant. The summary table below the matrix summarizes the differences. You note that these conclusions are exactly the same as obtained using your simplified approach. But, had the data contained more variability within region due to Salesrep Name, these conclusions may well have differed from those you obtained earlier.

With this, you thank Arthur for his help. You feel much more comfortable now that you have a more rigorous way to analyze your data. However, you reflect that you learned a lot from the simple, slightly incorrect, comparison circle analysis, which would also be much easier to explain to your VP. You close all report windows, but keep the summary table open.

Uncovering Relationships: Prescriptions versus Visits by Region

Reflecting on the regional differences, you start to wonder if the number of visits has an effect on a physician's prescribing habits. Do more visits tend to lead to more prescriptions for PharmaInc's major product? Or, do physicians tire of visits by sales representatives, leading perhaps to a negative effect? Do more sample kits lead to more prescriptions, as one might hope?

You proceed to get some insight on these questions by running Fit Y by X again, producing Exhibit 8.21.

Illustration of Linear Bivariate Fit of Sum(Prescriptions) versus Sum(Visits).

Exhibit 8.21 Linear Bivariate Fit of Sum(Prescriptions) versus Sum(Visits)

You remind yourself that the x-axis represents the total number of visits paid to the given physician by the sales rep over the eight-month period. It does appear that, at least to a point, more visits result in more prescriptions being written. Given the pattern of points on the graph and the fact that, at some point, additional visits will have diminishing returns, you suspect that there may be a little curvature to the relationship.

So, you start thinking about a quadratic fit. But, given the regional differences found already, you would like to see such a fit for each of the nine regions, that is, by the values of Region Name.

To generate the output shown in Exhibit 8.22, select Group By from the Bivariate Fit red triangle menu, select Region Name, and click OK. Then, from the Bivariate Fit red triangle menu, select Fit Polynomial > 2,quadratic (saved script is Bivariate Fit 1).

Snapshot showing the Quadratic Fits to Each of the Nine Regions (Southern England Selected).

Exhibit 8.22 Quadratic Fits to Each of the Nine Regions (Southern England Selected)

When you see this, you are immediately struck by the three curves that start out fairly low in comparison to the others, but which exceed the others as Sum(Visits) increases (within the range of most of the data). You can tell from the color-coding on the screen that these curves are for the three promotional regions. This suggests that leaving sample kits behind makes a big difference. You make a mental note to run this analysis with Sum(Visits with Samples) as X, Factor in a few minutes.

The other six regions may or may not see increased prescriptions with increased visits. You quickly peruse the Prob > F values for the model tests in the Analysis of Variance tables (realizing that these tests suffer from the same deficiency as did the HSD-tests in your previous Fit Y by X analysis and so are not technically correct). Models that are associated with Prob > F values less than 0.05, which are marked by asterisks, are considered significant. You see that for Eastern England, Southwest England, and Wales, the p-values do not indicate that the quadratic fits are significant (nor are linear fits, which you also check). This suggests that, in these regions, the sales reps may need to do something other than increase visit frequency if they wish to increase sales.

Picking up the thought about the efficacy (or otherwise) of leaving promotional kits behind with physicians, you quickly generate Exhibit 8.23 (saved script is Bivariate Fit 2).

Snapshot showing the Bivariate Plot for Promotional Regions, with Sum(Visits with Samples) on X Axis.

Exhibit 8.23 Bivariate Plot for Promotional Regions, with Sum(Visits with Samples) on X-Axis

Since only the three promotional regions have non-missing data for Visits with Samples, these are the only three regions for which fits are possible. All three polynomial fits are significant. It seems clear that, over the range of the number of visits studied, the more frequently the sales rep visits and leaves sample kits behind, the larger the number of prescriptions written. This analysis provides more evidence that the promotion was indeed successful!

Of course, it raises the question about tradeoffs: When does one reach the point of diminishing returns relative to the number of visits with sample kits being left and the costs of running the promotion? This is an important marketing question. You decide to do some thinking and reading about how to address this question before raising it with your VP. For now, though, you have strong evidence that the promotion was successful and that you can use Exhibits 8.17, 8.22, and 8.23 to communicate this to your VP. You close your summary table and all open reports.

A DEEPER UNDERSTANDING OF REGIONAL DIFFERENCES

Your next charge is to understand more about the regional differences in performance of sales of PharmaInc's major product. You realize that you have already gone a long way toward answering this question. Yes, there are statistical differences between regions, as indicated by the oneway comparison circles in the section “Uncovering Relationships: Prescriptions versus Region with Fit Y by X.” You can say that, in terms of total prescriptions written over the eight-month period, the regions fall into five or six groupings, as shown in Exhibits 8.17 and 8.20. The important questions now revolve around the practical importance of such differences and which chunk variables are related to the differences.

So at this point, you want to go beyond simply the counts of prescriptions written. Again, you have already done some of the work to understand the relationship between the number of prescriptions and the number of visits. Now you want to see if you can gain additional insight by seeing this from the perspective of your sales reps, and by taking into account behavior over time. In fact, when you think about this, you realize that you would like to see, on a year-to-date basis, how the number of prescriptions written by each sales rep's physicians is related to the total number of visits. But you would also like to be able to visualize the effect of the total number of physicians assigned to a sales rep and to easily identify the sales rep's region. In short, you want a more informative version of the scatterplots presented in the bivariate reports in “Uncovering Relationships: Prescriptions versus Visits by Region” section.

You have seen demonstrations of JMP's Bubble Plot using animation to show changes in the scatter of points over time. You quickly realize that you must do some data preparation before using such a visualization. So you summarize your data across sales reps and define year-to-date summaries of Prescriptions and Visits, producing the table partially shown in Exhibit 8.24.

Image described by caption/surrounding text.

Exhibit 8.24 Partial View of Summary Table with YTD Aggregation

You recall there are 102 sales reps (Exhibit 8.5). There should be eight YTD values for each sales rep, so the summary table you just made should contain 816 rows, which it does.

Now you are ready to construct an animated bubble plot. You decide to look at the relationship between Prescriptions YTD (Y) and Visits YTD (X), with bubbles sized by Number of Physicians and colored by Region Name, over the time period defined by Date. You enter Region Name and Salesrep Name as ID columns, in that order. This will allow you to use the drill-down capability of the bubble plot (see Exhibit 8.13). After completing the Bubble Plot dialog, you obtain Exhibit 8.25 (showing the relationship between Prescriptions YTD and Visits YTD for May 2014). The saved script is Bubble Plot 1.

Snapshot showing the Bubble Plot for Prescriptions YTD versus Visits YTD.

Exhibit 8.25 Bubble Plot for Prescriptions YTD versus Visits YTD, May 2014

Now you take a moment to figure out what is being plotted. The vertical center of each bubble is at the average of Prescriptions YTD for the given region. The horizontal center is at the average of Visits YTD for the given region. The sizes of the bubbles are proportional to the number of physicians in the regions. By using the Date slider at the bottom left of the plot, you can follow the relationship through the eight-month period to December 2014.

Notice that you can animate the plot by clicking the right-pointing arrow in the animation controls at the bottom left of the window. You do this and find it interesting that the two top regions, as shown in the December plot, achieve the same general average prescription totals, yet one requires many more visits, on average, than the other.

To help find out which regions these are, you select Label > All from the red triangle menu in the plot report. The labels appear on the plot and you see that the regions of interest are Midlands and Northern England. Now, Midlands was part of the promotion, while Northern England was not. You find it striking that Midlands achieved roughly the same mean number of prescriptions per sales rep as did Northern England with far fewer visits. But, you note, the Midlands, whose bubble is smaller than Northern England's, evidently has fewer physicians (it looks to be about half as many as in Northern England).

With this in mind, you click the right-pointing arrow to restart the animation. You notice that Midlands and Northern England tend to have the same general mean level of Prescriptions YTD over time. But, you think, Northern England does have more physicians, and so perhaps those sales reps have a larger physician workload than do the sales reps in Midlands, and so have to make more visits. Perhaps you should be looking at year-to-date prescriptions and year-to-date visits per physician?

But first, you want to look at this plot with the bubbles split by Salesrep Name. You go to the red triangle, select Label > None, and then go back to the red triangle menu and select Split All. Now, for each month, you see a single bubble for each sales rep. You animate the plot and observe what is happening. (Exhibit 8.26 shows the plot for December 2014.)

Snapshot showing the Bubble Plot for Prescriptions YTD versus Visits YTD, Split by Salesrep Name.

Exhibit 8.26 Bubble Plot for Prescriptions YTD versus Visits YTD, December 2014, Split by Salesrep Name

Looking at the plot over time, you find it interesting that the sales rep bubbles within a region stay tightly clustered, indicating that the aggregated number of visits is fairly homogeneous within regions, as are the aggregated prescription totals. You also notice that the circle sizes differ greatly from region to region, but are fairly uniform within regions, meaning that the number of physicians assigned to sales reps may differ quite radically for different regions, but that, within regions, the allocation is fairly consistent. It does appear that a typical sales rep in Northern England has more physicians than does a typical sales rep in Midlands. This plot provides very useful information and could help you in thinking about ways to realign your sales force in 2015.

Now, you go back to your idea of normalizing by Number of Physicians. You already created the two columns that you will need, Prescriptions YTD per Physician and Visits YTD per Physician. You complete the Bubble Plot dialog as before, but using the new variables, entering the first as Y and the second as X. From the red triangle menu, you select the options Trail Bubbles > All, Trial Lines > All, and Label > All. This generates Exhibit 8.27. The saved script is Bubble Plot 2.

Snapshot showing the Bubble Plot for Prescriptions YTD per Physician versus Visits YTD per Physician.

Exhibit 8.27 Bubble Plot for Prescriptions YTD per Physician versus Visits YTD per Physician, December 2014

You are struck by how similar the regions are in terms of Visits YTD per Physician. You realize that the norm for visits to physicians is one visit per month. But the data show this happening with almost uncanny regularity.

You wonder if, perhaps, sales reps make one visit per month to each practice, and then count this as a visit to all physicians at that practice even though they do not meet individually with all those physicians. In other words, what does it mean to “make a visit to a physician”? Does it mean that the sales rep talks with the physician face to face? Or that the rep talks with a secretary or technician? Or that the rep simply drops in and leaves a card? And, is it possible that the data are not quite representative of reality? You note that you need to discuss this with the reps when you next meet.

By selecting various bubbles, and holding the Control key to select more than one region, you are able to select Midlands and Northern England. Since you have enabled Trail Bubbles and Trail Lines, as you step through the months, the plot shows the bubbles for the preceding months. You step through the eight months, looking at these two regions. You see them drift further apart over time, with Midlands greatly exceeding Northern England in mean number of Prescriptions YTD per Physician. So, even accounting for number of physicians, Midlands is ahead. This is further confirmation that the promotion did enhance sales.

You deselect these two regions, and then deselect Label > All and select the Split All option to see the individual sales rep behavior. Once again, you animate the plot. You note that there is a little less regularity in the Visits YTD per Physician for individual sales reps, with Scotland showing the most variability. Again, the question of what the sales reps are recording comes to mind. Are the reps in Scotland unknowingly using different criteria?

You would like to show these results in a form that the reps can understand, but you realize that, typically, you do not carry your laptop in the field. Given the reps only have access to the usual office software and a browser, you decide to export Exhibit 8.27 as dynamic HTML, to produce Exhibit 8.28.

Snapshot showing the Bubble Plot for Prescriptions YTD per Physician versus Visits YTD per Physician, as Interactive HTML.

Exhibit 8.28 Bubble Plot for Prescriptions YTD per Physician versus Visits YTD per Physician, December 2014 as Interactive HTML

SUMMARY

You now gather your findings and recommendations from your investigations in relation to the two tasks you were given by your VP of sales and marketing:

  • Verify the claims of the Marketing Group that a limited promotion they ran for PharmaInc's major product from May through December of 2014 was very successful.
  • Determine if there are regional differences in the sales performance of PharmaInc's major product.

As for the first task, given the available evidence, the 2014 promotional activity in Midlands, Southern England, and Northern Ireland did have a positive impact, just as the Marketing Group had claimed. Physicians in these regions averaged 82.3, 76.9, and 72.9 prescriptions, respectively, over the eight-month period. Meanwhile, physicians in the best nonpromotional region, Greater London, averaged 67.3 prescriptions. In the worst nonpromotional regions, Wales and Northern England, physicians averaged 42.6 and 42.8 prescriptions over that eight-month period. Exhibit 8.16 (oneway plot) shows the picture by region with the data summarized by physician and Exhibit 8.26 (bubble plot) shows the picture summarized by sales rep.

You also learn that more visits by sales reps generally lead to more prescriptions for PharmaInc's major product. This effect is especially true when reps leave behind promotional sample kits.

Relating to the second task, your initial analysis of the data shows that some sales reps are in contact with many physicians, whereas others are in contact with far fewer physicians, and that this depends on region. Even though you do not have to account for sales reps' time, you would like to understand better why this difference occurs, because you might then be able to increase the yield of your sales force as a whole in 2015. But, more to the point, you have learned that there are large differences in sales between the different regions. Rather than simply attribute this to a failure of your sales force in some regions, you are prepared to at least consider that this may also be due to regional differences in the physician and population demographics.

One issue that your data analysis has surfaced is the extreme regularity of monthly visits over almost all regions. When data are this regular, you know that there is probably some underlying reason. You need to convey to the sales reps that there will not be negative consequences if they fail to adhere to a one-visit-per-month doctrine, so long as they show good performance overall.

As a first step, you decide to initiate interviews with some selected sales reps so that you can learn something about this issue, as well as why there is such large variation in number of physicians per sales rep between regions. You expect to need to realign your sales force for 2015 and would like to do this in a logical fashion. You also need to enlist the sales reps' help in more clearly defining measures such as “a visit to a physician” that will help to clarify the meaning of the data by which their performance will be measured.

CONCLUSION

Using visual techniques, you have been able to construct a good understanding of the operation of your sales force and to quickly answer the two questions posed by your VP of sales and marketing in a complete and compelling manner. Your work in the “Validating and Scoping the Data” section was largely directed toward your personal agenda, namely to obtain better knowledge of how your sales force is currently deployed. In the sections “Investigating Promotional Activity” and “A Deeper Understanding of Regional Differences,” you addressed the business questions posed by your executive. The analysis that led to the answers in these two sections was clean, quick, and compelling, and much more efficient than anything you could have done without JMP.

This case study is a good example of where force-fitting important business questions into a project framework or the traditional Six Sigma DMAIC methodology would be ill advised and probably counterproductive.

The analysis you have done in this scenario raises two further important, more general, points:

  1. If anything, the data used are probably too simplistic. Real-world data in such a setting are likely to be more, rather than less, complex and there are also likely to be more records. Questions relating to data quality, the distribution of missing values, and the balance of the data itself (which levels of descriptive variables occur jointly) become crucial. As this case study seeks to show, understanding these issues is a necessary step in any kind of sensible analysis. Once these patterns of variation are understood, the analysis itself is often relatively straightforward given a typical set of business objectives.
  2. A key requirement of software used in an analysis such as this one is that it should allow you to easily construct insights from real-world, unruly data. Answering one question typically leads to several other questions, and static reports with fixed hierarchies do not lend themselves to this pattern of use. The process of discovery is necessarily personal, so software should aspire to provide a wide repertoire of techniques that allow you to visualize patterns of variation in your data in an unfettered way. Your analysis took one direction, but there are many others that might have been taken.

NOTE

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset