Custom Data Integration

This chapter was contributed by Corey L. Koberg, Founder and Senior Partner at Cardinal Path, where he leads the data science, analysis, and product development teams. He is a well-known speaker, having keynoted and led sessions on advertising, analytics, and optimization at conferences and events across the globe. He has consulted for dozens of Fortune 500 companies, such as Google, Chevron, NBC, Papa John's, National Geographic, Time Warner, Universal Music, DeVry University, and others, to improve the effectiveness of their digital presence through results-oriented, data-driven optimization.

During Google Analytics' first few years there was really only one way to get data into it: the JavaScript tracking code that was embedded into website HTML. Additional data could be included with the data sent back to Google Analytics, such as a custom variable, campaign information, or transaction results. With time, Google Analytics has become a much more versatile platform that accepts data in four primary ways:

  • JavaScript tracking code for websites
  • Mobile SDK for apps
  • Measurement Protocol for hits from anywhere (such as a point of sale)
  • Data Import for enhancing data (CSV uploads)

Methods to Import Data into Google Analytics

The first three of the preceding methods generate new data. For example, a new user to a website initiates a session that sends data to Google Analytics via a JavaScript tag on the page. The Universal Analytics tracking script is extremely versatile and comprehensive. However, the Mobile App SDK and the Measurement Protocol are both important additions to an analytics toolkit as businesses try to capture all customer touch points—websites, mobile apps, and even offline elements such as point-of-sale terminals.

But despite sending data to Google Analytics, neither the JavaScript tracking code nor the SDKs are what you might consider a data import; they are the most common ways to track online behavior and send it back to Google Analytics. In this section, you learn more about two ways to import additional data into Google Analytics: the Measurement Protocol and Data Import. Following this overview, you'll see a few common examples of exactly how to conduct the import.

The Measurement Protocol

When you think about “importing” data into Google Analytics that you want to analyze, you are likely thinking about data related to events and interactions that you have stored elsewhere (perhaps in a spreadsheet or database) and want to upload to Google Analytics, even if those interactions happened in the past.

The Measurement Protocol is a completely flexible method for sending that data to Google Analytics directly, without the need to use JavaScript, SDK, or any other collection mechanism. All you have to do is format the data in the proper way (a simple HTTP GET or POST request), and Google will accept and process the data. Google will even join that data to existing sessions and backdate it if they receive it soon enough (within four hours of the event).
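As a rough illustration of what "formatting the data in the proper way" can look like, the following Python sketch builds the URL-encoded body for a single event hit. The parameter names (v, tid, cid, t, ec, ea, qt) and the /collect endpoint are taken from the Universal Analytics Measurement Protocol; the property ID, client ID, and event names are placeholders, and the sketch only builds the request body rather than sending it.

```python
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol endpoint.
COLLECT_URL = "https://www.google-analytics.com/collect"


def build_event_hit(tracking_id, client_id, category, action, age_ms=0):
    """Return the URL-encoded body for one Measurement Protocol event hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder value below)
        "cid": client_id,    # anonymous client ID that ties the hit to a user
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    }
    if age_ms > 0:
        # Queue time: how long ago (in milliseconds) the interaction occurred.
        params["qt"] = str(age_ms)
    return urlencode(params)


# Hypothetical call-center conversion that happened 30 minutes ago.
body = build_event_hit("UA-XXXXX-Y", "555", "call center", "purchase",
                       age_ms=30 * 60 * 1000)
```

The resulting string could then be sent as the body of an HTTP POST to COLLECT_URL, for example with urllib.request; keeping the queue time well under four hours matches the window described above.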

While the flexibility of the Measurement Protocol means that you can now theoretically send data to Google Analytics from theme park turnstiles, airplane transponders, and other “sensor data,” the most common use cases still involve digital marketing. For example, if a customer clicks on an ad and receives a quote online, but completes the purchase in a call center, that revenue (and attribution credit for the ad) would normally go untracked. However, if the call center software were to send the conversion data back to Google Analytics, the revenue would be tracked and potentially even connected all the way back to the original ad that drove the purchase.

A similar scenario unfolds when free trials automatically roll over into paid accounts if not canceled by a certain date. Normally that “conversion” would happen behind the scenes in the back-office order processing software and Google Analytics would never be able to distinguish between which visitors (and which marketing campaigns) canceled their trials and which ones went on to convert into customers. Using the Measurement Protocol, that software could process a report each night that sends the conversion data back to Google Analytics identifying which of the trials converted, as well as any auto renewals that may take place.
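The nightly report described above might be sketched as follows. This is only an illustration under stated assumptions: the trial records and their field names (client_id, order_id, revenue, converted) are hypothetical, and it assumes the back-office system stored the Google Analytics client ID when the trial began. The hit parameters (t=transaction, ti, tr) are standard Measurement Protocol ecommerce fields.

```python
from urllib.parse import urlencode


def trial_conversion_hits(trials, tracking_id):
    """Build one transaction hit body per trial that rolled over to paid."""
    hits = []
    for trial in trials:
        if not trial["converted"]:
            continue  # canceled trials generate no conversion hit
        hits.append(urlencode({
            "v": "1",
            "tid": tracking_id,
            "cid": trial["client_id"],        # client ID captured at signup
            "t": "transaction",               # ecommerce transaction hit type
            "ti": trial["order_id"],          # transaction ID
            "tr": f"{trial['revenue']:.2f}",  # transaction revenue
        }))
    return hits


# Hypothetical rows from the order-processing system.
trials = [
    {"client_id": "111.222", "order_id": "T-1001", "revenue": 29.0, "converted": True},
    {"client_id": "333.444", "order_id": "T-1002", "revenue": 29.0, "converted": False},
]
hits = trial_conversion_hits(trials, "UA-XXXXX-Y")
```

Only the converted trial produces a hit, which is what lets Google Analytics tell the two groups of visitors apart.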

Point-of-sale systems that process in-store pickup of online orders, coupon redemptions, and repeat purchases where the original purchase was online but follow-ups occurred offline are all common use cases that show the versatility of the Measurement Protocol. To learn more about the exact format required by the Measurement Protocol, see the Developer's Guide at http://goo.gl/5VfF9k.

NOTE See Chapter 8, “User Data Integration,” for a detailed example of how to use the Measurement Protocol to integrate customer usage behavior data across devices into Google Analytics.

Data Import

While the Measurement Protocol contains entirely new interaction hits that may be joined with existing sessions or start entirely new ones, some of the data you'll import to Google Analytics is specifically designed to supplement existing hits and simply provide more context.

For example, a news publisher might want to classify and analyze the content on the site by the page author and category in order to determine the most popular authors and topics. That data can be uploaded to Google Analytics via a spreadsheet and associated with those pages using the page URL as the key that links the existing data with the new, as shown in Figure 7-1. This is known as “widening” the existing dimensions to include new data fields, known as dimensions, that are now associated with the original data. That's why a previous version of this feature was called “dimension widening.”

Figure 7-1: Widening the page URL dimension using Data Import
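Conceptually, widening behaves like a key lookup that attaches extra columns to each matching hit. The short Python sketch below is not how Google Analytics implements the join; it simply illustrates the idea with hypothetical hits and a hypothetical uploaded table keyed by page URL.

```python
def widen(hits, imported, key="pagePath"):
    """Attach imported fields to every hit whose key matches an imported row."""
    widened = []
    for hit in hits:
        extra = imported.get(hit[key], {})  # no match: hit is left unchanged
        widened.append({**hit, **extra})
    return widened


# Hypothetical pageview hits and an uploaded author/category table.
hits = [
    {"pagePath": "/blog/bigquery.php", "pageviews": 120},
    {"pagePath": "/about.php", "pageviews": 40},
]
imported = {
    "/blog/bigquery.php": {"author": "CKoberg", "category": "bigdata"},
}
rows = widen(hits, imported)
```

The blog page gains the author and category fields, while the page with no matching key passes through untouched.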

Advantages of Data Import

Here are some of the most prominent advantages of using Data Import:

  • No code to write, maintain, or publish: A site owner could potentially include the author and category data in the previous example as Custom Dimensions sent back to Google Analytics included with the initial pageview tracking. However, this would require the tracking code on the site to be modified, which in many organizations can be a long and arduous process.
  • Data may not be available before the pageview hit is sent: In order to send the data to Google Analytics with the pageview, the information must be available at the time the pageview hit is sent, which is usually immediately upon page load. Oftentimes, the data you'll want to add is either not available to the web server (back-office and CRM data) or describes something that simply hasn't happened yet (delayed conversions and refunds).
  • Data may be confidential: Anything sent with the hit itself can be seen in plain text, but some information that is useful for analysis may be too confidential to display publicly. For example, perhaps you are analyzing results of a new page layout and want to understand not only the revenue and quantity of items sold, but also the margin and profitability of each item. While gross revenue is obvious to the purchaser (they know what they paid!), the profit margin is likely a confidential value and is better uploaded securely after the fact.
  • Can accommodate larger sizes: If the data is uploaded via Data Import rather than as part of the pageview, it avoids the size limitations that exist regarding what can be sent via the client's browser and also reduces the burden on their Internet connection.

Data Types and Use Cases

There are lots of business uses for Data Import, but they tend to fall into a few categories, some of which have specific formats and processes for uploading to Google. The following sections explain several examples.

Campaign Data: Normally with Google Analytics, you use campaign tagging to include information about the source, medium, campaign, and even ad creative and keywords as individual query string parameters, such as this:

http://cardinalpath.com/?utm_source=newsletter&utm_medium=email&utm_content=button2&utm_campaign=sitelaunch

But that long URL may pass information you don't want to reveal, requires Google-specific formatting, and is limited to a few dimension types. Many systems, such as marketing automation and ad networks, utilize a single campaign ID, which you can now expand to the full dataset via Data Import. An example of the previous URL formatted to utilize this single parameter campaign tracking may look like this:

http://cardinalpath.com/?utm_id=4567
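To make the idea of expanding a single campaign ID concrete, here is a minimal Python sketch. The lookup table stands in for the uploaded campaign dataset and is purely hypothetical; the values simply mirror the longer URL shown earlier.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical uploaded campaign table keyed by campaign ID.
CAMPAIGNS = {
    "4567": {"source": "newsletter", "medium": "email",
             "content": "button2", "campaign": "sitelaunch"},
}


def expand_campaign(url, campaigns=CAMPAIGNS):
    """Look up the full campaign record for the utm_id found in a landing URL."""
    query = parse_qs(urlparse(url).query)
    campaign_id = query.get("utm_id", [None])[0]
    return campaigns.get(campaign_id)  # None when the ID is missing or unknown


record = expand_campaign("http://cardinalpath.com/?utm_id=4567")
```

The visitor's URL reveals only the opaque ID, while the source, medium, content, and campaign values stay in the uploaded table.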

NOTE You can find a step-by-step guide showing how to import campaign data to your Property at http://goo.gl/yYneir.

CRM Data: Many companies have a wealth of customer data that can be highly illuminating when merged with online behavior data, such as what you would get from Google Analytics.

For example, a newspaper may want to indicate whether users are subscribers, if they subscribe to print and digital, how long they have been a subscriber, and so on. Another company may have already classified its customers into “high-value prospects” and want to confidentially add those fields to the customer record to enable that group to be analyzed. Lifetime value (LTV) calculations are often kept in CRMs and can be useful to import and attach to user records for analysis. Perhaps your company divides geographies into sales regions, such as “Southern Europe,” and wants to do analysis on performance for each region. B2B companies may classify visitors by their organization size (SMB versus enterprise) so they can understand how each consumes their site content.

The examples are endless and there is often a high business value for integrating this data. You learn more about this use case in Chapter 8, which deals specifically with user data integration.

Be sure to avoid pulling in names, phone numbers, emails, social security numbers, or any other personally identifiable information (PII).

NOTE You can find a step-by-step guide showing how to import user data to your Property at http://goo.gl/Glzg7c.

URLs and Query String Parameters: The initial example in this chapter pointed out that pages (identified by their unique URLs) may have other metadata, such as author, category, and section, that is useful for analysis. But if your site has a clear URL structure with logical subdirectories or utilizes query string variables to hold information that can be used for analysis, you can parse that data from the URL and include it as a dimension along with the pageview. You will find a step-by-step guide on how to perform this integration in the following examples.

Transactions: When a transaction takes place, there are many opportunities to supplement this data. Previously, we discussed adding margin data to products or transactions, but you could also include additional category information, suppliers, and cart options such as in-store pickup. You can also more accurately measure revenue by tracking refunds associated with transactions to get a true accounting of site, audience, product, and campaign performance.

NOTE You can find a step-by-step guide showing how to import refund data to your Property at http://goo.gl/1whJbe.

Cost Data for Marketing Campaigns: Google AdWords, DoubleClick, AdSense, and other Google properties easily integrate critical ad performance data, such as cost, clicks, and impressions, into Google Analytics. Utilizing the Cost Data Import feature, you can now upload the same kind of data from non-Google sources. You learn more about this use case in Chapter 9, which deals specifically with marketing campaign data integration.

Custom Data: The previous sections have outlined some common examples, but you are not limited to them. Google allows you to widen dozens of the data dimensions flowing into Google Analytics automatically, as well as the Custom Dimensions that you add to your site.

NOTE You can find a step-by-step guide showing how to import Custom Data to your Property at http://goo.gl/w7nxXj.

Real-World Examples

While the process of importing data is relatively straightforward, there are a few best practices and pitfalls you'll want to avoid. This section walks you through a few examples to show you exactly how to proceed.

Importing Content Data

The example shown previously in Figure 7-1 is useful for illustration purposes because it's easy to understand how the key joins those two items. But in the real world, this rarely works out so easily. Even if you did have only nice, clean URLs, you would have to upload a dataset for each new page published on the site. Most sites that are interested in analyzing author and content categories are updated frequently and would have to manually upload a new spreadsheet each time an article is published, which is often untenable.

If the only option is to match each specific URL on a 1:1 basis with the imported data, the Data Import feature available through the Management API can automate the upload process. But even then, you'll want to be careful that a single page isn't represented by multiple URLs. For example, the content on the page:

/blog/bigquery.php

May also be returned by URLs such as:

/blog/bigquery.php?char=utf-8
/blog/bigquery.php?char=utf-8&lang=EN
/blog/bigquery.php?affiliate=cj345&emailID=7893

In that case, you should utilize Regular Expression (RegEx) pattern matching to look specifically for the part of the URL that indicates the information and avoid the often useless info at the end. In the previous example, the RegEx pattern you would use is as follows:

/blog/([^/]+)\.php

This pattern matches just the initial part of the URL and ignores the trailing parameters. A strategy using automated API uploads and RegEx pattern checking would be far more efficient and robust than manual imports.
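The effect of this kind of pattern can be checked quickly in Python. The sketch below is an illustration rather than the matching engine Google Analytics uses; it anchors the pattern at the start of the path and escapes the dot before php so that it matches a literal period.

```python
import re

# Capture the article slug; the dot is escaped so it matches a literal ".".
ARTICLE = re.compile(r"^/blog/([^/]+)\.php")


def canonical_path(url_path):
    """Return the path without trailing parameters, or None if not an article."""
    match = ARTICLE.match(url_path)
    return match.group(0) if match else None


variants = [
    "/blog/bigquery.php",
    "/blog/bigquery.php?char=utf-8",
    "/blog/bigquery.php?char=utf-8&lang=EN",
    "/blog/bigquery.php?affiliate=cj345&emailID=7893",
]
keys = {canonical_path(v) for v in variants}
```

All four variants collapse to the single key /blog/bigquery.php, so one uploaded row can cover every form of the URL.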

But there may be a simpler way, as many Content Management Systems include data in the query string parameters of the URL. For example, if the previous URLs were written as:

/blog/bigquery.php?char=utf-8&lang=EN&author=CKoberg&cat=bigdata

You could ignore the format of the URL, avoid any regular expression pattern matching, and simply tell Google Analytics to populate the dimensions from the parameters author and cat. You can learn how to do this in the following step-by-step guide.
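For intuition, pulling those two parameters out of the URL amounts to the following. This Python sketch only demonstrates the parsing; inside Google Analytics the equivalent is configured through the dataset schema rather than written as code.

```python
from urllib.parse import urlparse, parse_qs


def extract_dimensions(url, names=("author", "cat")):
    """Return the requested query string parameters that are present in the URL."""
    query = parse_qs(urlparse(url).query)
    return {name: query[name][0] for name in names if name in query}


dims = extract_dimensions(
    "/blog/bigquery.php?char=utf-8&lang=EN&author=CKoberg&cat=bigdata")
```

The char and lang parameters are ignored, and a URL that lacks author or cat simply yields fewer entries.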

Step 1: Create the Custom Dimension Data Fields that Will Hold the New Data

In this case, you might call them AuthorID and CatID. This is done via the Admin section of Google Analytics by clicking Custom Definitions and then Custom Dimensions. Check out this guide from the Help Center: http://goo.gl/dLHx3a.

Step 2: Choose the Dataset Type

From the Admin section, under the Property menu, select Data Import, then + New Data Set, and then for this example choose Content Data, as shown in Figure 7-2.

Figure 7-2: Choosing the dataset type for Data Import

Step 3: Provide the Dataset Details

Choose an easily recognizable and logical name, such as AuthorID, and select the views you want to associate with the data import, as shown in Figure 7-3. Note that you need to import AuthorID and CatID separately.

Step 4: Create the Schema

This is a critical step where you tell Google Analytics which shared key will join the data (in this case, the Page dimension, which by default is set to the Page URL) and which Custom Dimension fields to map the new data into. In this example, you will specify not only that it is the URL, but specifically the query string parameter (see Figure 7-4). Click the Query Refinement link and enter author as the variable that will hold the value you're interested in matching. Next, select the Custom Dimension AuthorID as the field you want to populate with the data.

In this case, you aren't overwriting any data that would be part of the standard pageview, so the overwrite hit data option doesn't apply. Finally, select Save.

Figure 7-3: Naming the dataset and choosing availability across views

Figure 7-4: Creating the dataset schema to import custom data

Step 5: Download the Schema

After saving the previous step, Google will prepare a CSV file that you will need to populate with your data to import. Click the Get Schema button (see the bottom of Figure 7-4) to download the specially formatted file. The file will be an empty CSV that simply has the key to match and Custom Dimensions that hold the imported data as the headings. In this case, the dimensions would be ga:pagePath and ga:dimension1 (the number at the end depends on which Custom Dimension slot you are using).

Step 6: Populate the CSV with the Data to Import

In this example, you need to specify the author names that correspond to each value. Note that although this example uses author names in both cases, it doesn't import the data from the query string or URL; it simply uses that data as the key that determines which imported value is applied. See the sample populated CSV in Figure 7-5.

Figure 7-5: Sample populated CSV to upload
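If the author list lives in another system, the populated file can be generated rather than typed by hand. The following Python sketch assumes the two column headings described in Step 5 (ga:pagePath and ga:dimension1); the author keys and display names are hypothetical, and the slot number should match whichever Custom Dimension you created.

```python
import csv
import io


def build_import_csv(author_names, slot=1):
    """Return CSV text with the schema headings and one row per author key."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Headings must match the downloaded schema exactly.
    writer.writerow(["ga:pagePath", f"ga:dimension{slot}"])
    for key, name in author_names.items():
        writer.writerow([key, name])
    return buffer.getvalue()


# Hypothetical mapping from the author query string value to a display name.
csv_text = build_import_csv({"CKoberg": "Corey Koberg", "JDoe": "Jane Doe"})
```

The first column holds the value found in the author parameter, and the second holds the text that will appear in the Custom Dimension.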

Step 7: Upload the Data

Now that Google Analytics understands how to receive and process the data and you've populated the CSV file, it's time to upload the data. Click on Manage Uploads from the Data Import section (Figure 7-6) on the row that matches your new dataset and select your CSV file for upload. A progress bar will indicate upload status and whether your data file was imported successfully. Note that you may have to refresh the page.

Figure 7-6: Uploading the data to Google Analytics

Step 8: Create a Custom Report

That's it! Google Analytics now has your data and will join it with the pageview data. But since the data you uploaded is not part of the default Google Analytics dataset, there are no built-in reports that will display the data. Therefore, you will need to create a custom report by clicking Customization from the top navigation bar and then + New Custom Report, and then selecting your new Custom Dimensions along with any metrics you would like to analyze their performance by.

Importing Product Profit Margin Data

You learned earlier in this section that understanding profit alongside quantity and revenue would be very useful, but likely confidential, so it makes a great use case for direct Data Import. In this case, you follow the same steps, with some slight variations. When you create the Custom Dimension to store the data, you select a Product Scope dimension instead, as shown in Figure 7-7.

Figure 7-7: Creating a Product Scope custom dimension

The dataset schema will be similar, except you need to choose Product SKU as the key and the newly created Profit Margin dimension as the container, as shown in Figure 7-8.

Figure 7-8: Defining the dataset schema for profit margin

Following this step you should upload a file containing your product SKUs and their profit margins. Once this step is completed, you can create Custom Reports, as described in the previous example.

Importing Refund Data

Refunds happen. To get your metrics as accurate as possible, you should upload the refunds into the system if and when they do happen. The Enhanced Ecommerce feature of Universal Analytics allows you to import Refund Data. The process is the same as before, but there is a specific dataset type for refunds in Google Analytics. In the case of refunds, the system limits the choices to a predefined set of commerce dimensions and metrics rather than your own custom ones.

Once you choose Refund Data from the dataset types options, you can select one or more data elements as part of your upload, choosing from Product SKU, Product Price, Quantity Refunded, and Revenue (as shown in Figure 7-9).

Figure 7-9: Uploading refund data into Google Analytics

Download the schema and upload the data as described in the previous content data example.

Limitations and Best Practices

There are numerous advantages to using Data Import to improve the completeness and accuracy of your data. However, there are some things to keep in mind as you architect your solution:

  • You may upload 50 total datasets per property.
  • Data is limited to 50 uploads per day per property and 1GB max upload file size.
  • Data will be joined only for new hits as they are processed, except for refund and cost data that's treated as special cases. Google Analytics Premium users who select Query Time Processing have an additional option (see the sidebar at the end of this section).
  • Deleting a dataset means that future data won't be joined, not that previously joined data will be stripped from reports.
  • Filters will affect your imported data, so consider if and how they will affect your data. Also note that keys (such as URL) will be joined before any filter is applied.
  • Consider utilizing both Custom Dimensions and Custom Metrics.
  • You must have edit permissions to a Google Analytics property in order to upload data to it.
  • If you import an empty string, it will be interpreted as “not set” in the reports.
  • Real-time reports are not yet supported.
  • Not every dimension or metric is available. Specifically, custom variables, time-based dimensions (hour/minute/second), and geo-dimensions (country, city) are not available for widening.
  • No personally identifiable information (PII) may be uploaded under any circumstances.
  • Uploaded data needs to be processed before it can show up in reports. Once processing is complete, it may take up to 24 hours before the imported data will begin to be applied to incoming hit data.
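A few of these limitations can be checked before a file ever leaves your system. The Python sketch below is a hypothetical pre-upload check, not a Google tool: it reads the 1GB limit as 1024**3 bytes (an assumption), flags empty values that would report as "not set," and uses a deliberately crude pattern to catch email addresses as one example of PII.

```python
import csv
import io
import re

MAX_UPLOAD_BYTES = 1024 ** 3  # 1GB limit from the list above (byte count assumed)
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # crude PII heuristic


def check_upload(csv_text):
    """Return a list of warnings for a CSV that is about to be uploaded."""
    warnings = []
    if len(csv_text.encode("utf-8")) > MAX_UPLOAD_BYTES:
        warnings.append("file exceeds the 1GB upload limit")
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    for line_no, row in enumerate(rows, start=2):
        if any(cell == "" for cell in row):
            warnings.append(f"line {line_no}: empty value will report as not set")
        if any(EMAIL.search(cell) for cell in row):
            warnings.append(f"line {line_no}: possible email address (PII)")
    return warnings


sample = "ga:pagePath,ga:dimension1\nCKoberg,\nJDoe,jane@example.com\n"
problems = check_upload(sample)
```

A real check would need broader PII rules (names, phone numbers, and so on); the point is simply that the file can be screened automatically before each upload.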

QUERY TIME VERSUS PROCESSING TIME

Normally when a dataset is uploaded for import, the next time Google sees a hit that matches the key, that data is joined for that particular hit. Google Analytics Premium users have another desirable option called Query Time Processing, which immediately joins the matching data when a report requests that data. This means that historical data contained in the report is also joined, not just the new hits going forward. Also, data joined using Query Time will be reversed if the dataset is deleted.
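The difference between the two modes can be pictured with a small simulation. This Python sketch is a conceptual model only, using hypothetical hits with simple integer timestamps; it does not reflect how Google Analytics stores or processes data internally.

```python
def processing_time_join(hits, imported, uploaded_at):
    """Join only hits processed at or after the upload time."""
    return [
        {**hit, **imported.get(hit["key"], {})} if hit["time"] >= uploaded_at else dict(hit)
        for hit in hits
    ]


def query_time_join(hits, imported):
    """Join every matching hit at the moment the report is requested."""
    return [{**hit, **imported.get(hit["key"], {})} for hit in hits]


hits = [
    {"key": "/blog/bigquery.php", "time": 1},  # collected before the upload
    {"key": "/blog/bigquery.php", "time": 5},  # collected after the upload
]
imported = {"/blog/bigquery.php": {"author": "CKoberg"}}

at_processing = processing_time_join(hits, imported, uploaded_at=3)
at_query = query_time_join(hits, imported)
```

With the processing-time join, only the later hit carries the author; with the query-time join, both do, and removing the imported table would remove the author from both on the next report.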

At this time, Query Time Processing is only available to the campaign data, content data, and product data hit types and cannot be used with the following:

  • Remarketing audience exports
  • Datasets with date-based keys (for dimensions that change over time)
  • Unified segments
  • Multi-channel funnels
  • Cohort reporting
  • Real-time reports

Summary

In this chapter you learned about the different ways to import data into Google Analytics and how they can be used. However, most of the chapter focused on Data Import, as it has four important advantages over other methods:

  • There is no code to write, maintain, or publish.
  • Data may not be available before the pageview hit is sent.
  • Data may be confidential.
  • It can accommodate larger sizes.

Following a description of all types of data that can be imported through the Data Import feature, you learned about a few real-world use cases where you might want to implement them. The following examples were described in-depth:

  1. Importing content data
  2. Importing product profit margin data
  3. Importing refund data

Finally, you learned about some of the best practices and limitations when using Data Import to improve the completeness and accuracy of your data.
