6

Big Data in Your Organization

A Big Data study from 2013 by Tata Consultancy Services showed that 47 percent of the 1,217 firms surveyed had not yet undertaken a Big Data initiative.1 A similar research project by the SAS Institute in 2013 revealed that 21 percent of the 339 companies questioned did not know enough about Big Data, and 15 percent did not understand its benefits.2 Several other surveys paint more or less the same picture. Many organizations have no idea what Big Data is, even though all those brontobytes heading our way will change the way organizations operate and are managed. Big Data offers a lot of opportunities for organizations. An IBM 2010 Global CFO Study indicated that companies leveraging Big Data financially outperform their competitors by 20 percent or more, and McKinsey reported a potential increase of 60 percent in operating margins with Big Data.3,4

Although many organizations do not yet understand Big Data, it is pouring into all organizations from almost every angle imaginable. Every day, small and medium-sized enterprises can easily collect terabytes of data, while startups can effortlessly reach gigabytes and large multinationals can even generate petabytes without any problem. However, simply having massive amounts of data is not enough to become an information-centric organization that stays ahead of its competition.

Note that I deliberately do not call these organizations data-driven organizations, but rather information-centric. The difference might seem subtle, but in fact the two terms are very different. Data, after all, is useless without the right tools at hand and the right culture in place. Only when data is transformed into information can it become valuable for an organization. Information-centric companies have a culture that relies on data that is stored, analyzed, and visualized, and in which the results form an integral part of the company's strategic decision making.

According to Paul Kent, global vice president of Big Data at SAS, 37 percent of managers surveyed still base their decisions on gut feelings instead of data analytics.5 A precondition for an information-centric company is therefore a cultural shift that makes data, and the tools required to analyze and visualize it, accessible to large groups within the organization, so that decisions are no longer based on gut feelings or rough estimates.

Moving from a culture in which gut-feeling decisions or rough estimates are acceptable to one that truly incorporates Big Data is challenging. So, what is required to move from a product-centric organization to an information-centric organization, one in which decisions are based on hard data and extensive analyses? Where do you start? How do you convince your CEO, and which questions do you need to ask?

Many organizations have already successfully adjusted to the new Big Data reality. At Walmart, local stores are allowed to adjust their product assortment to match what local people are saying on social networks.6 The Dutch SatNav company TomTom takes Big Data to the extreme by capturing 5.5 billion new global position measurements daily to improve its products.

TOMTOM AND BIG DATA

TomTom is a Dutch manufacturer of automotive navigation systems that was founded in 1991.7 Apart from stand-alone units for cars, the company also develops mobile navigation apps and business solutions. On June 11, 2012, at the launch of iOS 6, it was announced that TomTom would be the main mapping provider for Apple's new Maps app.8,9 In 2012, TomTom's revenue was over $1 billion, and the company is massively into Big Data.10

On an average day, TomTom receives approximately 5.5 billion anonymous global positioning measurements from all its products. These include TomTom Home, the mobile app, In-Dash Navigation, Business Solutions, and Connected PND, which are used by over 60 million customers worldwide.11–15

In addition, TomTom has developed the MapShare community, which makes it possible for users of TomTom products to report changes on the roads as they are driving. These can be different speed limits, new street names, blocked roads, and new or altered traffic directions.16 The collection of this crowdsourced data began in 2005.

TomTom not only asks users to report traffic information; it also collects trip information from its SatNavs. Every time users docked their SatNav, their anonymized information was sent to TomTom. This has allowed the company to collect 5,000 billion (5,000,000,000,000) items of trip data. Big Data at TomTom is seriously big business.

Map Speeds Versus Actual Speeds

All the data that TomTom collects is used to provide an accurate, up-to-date picture of local traffic. First, because of the massive number of data points received each day, TomTom can show the difference between map speed and actual speed, that is, the maximum speed allowed on a certain road versus the speed at which people actually drive on it. Quite often, the two are different.

This same information is used to determine how long it will take to drive from point A to point B. This is very relevant, especially in The Netherlands, where traffic jams are monitored by the duration of the delay in addition to their length. TomTom can measure the duration of a traffic jam by monitoring the GPS speed of drivers on the road. Of course, this also requires massive amounts of data.

Selling Location Data

The collected data was not always used in favor of drivers. In The Netherlands, TomTom sold the speed data captured by its SatNavs to the local police for a short period of time.17 The police used this information to determine the location and frequency of speed traps, but the practice ended after complaints by Dutch citizens.

Open Map Data

In 2012, TomTom announced its latest endeavor in using crowd data to improve its products.18 The company made its vast mapping data available to app developers, thereby providing an alternative to companies that now use maps from, for example, Google. The software is free for a trial period, after which a fee is charged.

Taking Big Data to the Extreme

TomTom has been capturing incredibly vast amounts of data for a long time now. All this data is used to improve the driving experience of its customers in real time.

KEY CHARACTERISTICS OF INFORMATION-CENTRIC ORGANIZATIONS

Information-centric organizations know that data alone is useless. It becomes valuable only through careful analysis with the right algorithms to retrieve the information necessary to make the correct business decisions. Companies with a successful Big Data strategy have an information-centric culture, in which all employees are fully aware that well-analyzed and well-visualized information results in better decisions. They have made information available to everyone (which data is accessible depends, of course, on the employee's role), everywhere, and at any moment. A good example is US Xpress, where truckers have all the information they need at their fingertips via iPads while on the road. The entire organization revolves around the use of information to make the correct (business) decisions.

Information-centric organizations also stay ahead of the pack through innovation, which allows them to constantly reinvent themselves. These organizations do everything they can to lead the market. Because they are innovators and early adopters of new technologies, they have already implemented a Big Data strategy. Timing matters, because in 5 to 10 years Big Data will be a commodity. It will no longer be called Big Data; it will just be data again.

Another strong characteristic of Big Data organizations is that they collect information about absolutely everything: social media data, log data, sensor data, and so on. So, store it now, and decide later whether you need it. You can always decide to leave data out of your analysis, but you cannot analyze data you don't have. The price of storage should not be a barrier, as with Hadoop you can use commodity hardware to store unstructured and semistructured data in a raw format, and compressing the data can save a lot of storage on top of that. Store the data in a centralized location to prevent a balkanized IT infrastructure. Data that resides in silos across the organization is useless, as it cannot easily be combined in real time with other datasets. It is difficult to access and does not give the organization a helicopter view of what sort of data is available.

Obviously, information-centric organizations collect a lot of data—and many different types. Apart from common data streams, such as social media, CRM, websites, and logs, these organizations ensure that many of the products they offer can also collect data. For online products, this is easier to achieve, but more and more offline products can collect massive amounts of data as well. Automotive companies, for example, can include hundreds of sensors in their cars to monitor how they are doing and to plan maintenance checks before the car breaks down. And then there is John Deere, which equips its tractors with intelligent sensors to monitor the machine's operations, as well as ground and crop constituents.19 The more data that is collected, the better your Big Data strategy works. It does require out-of-the-box thinking to find data in new products; as discussed in Chapter 2, it is theoretically possible to turn even a cup of coffee into data.

Analyzing data can be a difficult task when you have terabytes of different types of data. Although many Big Data startups claim that their products do not require an expensive IT department (Big Data scientists are scarce and, thus, expensive), organizations implementing a Big Data strategy should at least train their IT staff to deal with Big Data and perform basic analyses. Larger organizations, of course, should focus on hiring Big Data employees. LinkedIn, for example, has over 100 data scientists on its payroll. Similarly, many of the 10,000 in-sourced IT staff at General Motors are capable of performing Big Data analyses. A well-trained data scientist can help you figure out the right questions to ask to get the right answers and take advantage of all the data available. Be sure to treat them well, because they are scarce and in high demand.

SOME GENERIC BIG DATA USES

Big Data has the potential to benefit organizations in any industry in any location across the globe. Big Data is much more than just a lot of data; combining different datasets will provide organizations with real insights that can be used in decision making and to improve the financial position of the company. Of course, for each industry and each individual type of organization, the possible uses will differ. There are, however, a few generic Big Data uses that show the possibilities it has for your organization. In Chapter 7, I will discuss different industries in more detail, including some examples.

1.  Truly get to know your customers, all of them in real time.

In the past, we used focus groups and questionnaires to get to know customers. This information was outdated the moment the results came in. With Big Data, this is no longer true: Big Data allows companies to completely map the DNA of their customers. Knowing your customers well is the key to selling to them effectively, but implement these strategies carefully so as not to cause privacy issues. A famous example is when Target found out about a teenager's pregnancy before her father even knew. The daughter received advertising for pregnancy products, which outraged the father. Later, they learned that Target had inferred this private information by analyzing which products the 16-year-old bought at the local Target store.20

If companies ensure that the privacy of customers is not threatened, Big Data can deliver personalized insights. Using interconnected social media data, mobile data, web analytics, and other Big Data analytics, it is possible to identify each customer, as well as what he or she wants and when, all in real time. Big Data enables a complete 360-degree view of all your customers, even if you have millions of them.

The benefit of such knowledge is that you can tailor recommendations or advertising to individual needs. Amazon has mastered this to perfection: its recommendation engine determines what products a user has bought in the past, which items users have in their virtual shopping carts, which items they've rated and how, and what other customers with similar profiles have viewed and purchased.21 Amazon's algorithm gives each customer a different webpage, and the strategy pays off. The company reported 27 percent sales growth to $13.18 billion during its third fiscal quarter of 2012, up from $9.6 billion during the same period in 2011.22
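To make the idea concrete (this is an illustrative toy, not Amazon's actual, proprietary algorithm), a recommendation engine can be sketched in a few lines: score the items a customer has not yet bought by how strongly they co-occur with the items he or she has bought. All names and purchase histories below are invented.

```python
# Illustrative sketch only: a toy item-based recommender, not Amazon's
# actual (proprietary) algorithm. Purchase histories are invented.
from math import sqrt

# Map each customer to the set of products they bought (hypothetical data).
purchases = {
    "ann":  {"book", "kettle", "lamp"},
    "bob":  {"book", "kettle"},
    "cara": {"kettle", "lamp", "mug"},
    "dan":  {"book", "mug"},
}

def item_similarity(a, b):
    """Cosine similarity between two items, based on who bought them."""
    buyers_a = {c for c, items in purchases.items() if a in items}
    buyers_b = {c for c, items in purchases.items() if b in items}
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))

def recommend(customer, top_n=2):
    """Score items the customer has not bought by similarity to items they own."""
    owned = purchases[customer]
    candidates = {i for items in purchases.values() for i in items} - owned
    scores = {c: sum(item_similarity(c, o) for o in owned) for c in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("bob"))  # → ['lamp', 'mug']
```

Real systems work the same way in spirit but on millions of customers and items, with ratings, views, and cart contents folded into the similarity scores.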

2.  Co-create, improve, and innovate your products in real time.

In the past, consumer panels discussed what they thought, what they wanted, and why they wanted it. Companies also used panels to show consumers new products and find out what they thought of them. If consumers did not like a product, companies potentially had to start all over again. With Big Data, such panels belong to the past.

Big Data analytics can help organizations gain a better understanding of what customers think of their products or services. What people say about a product on social media and blogs can give more information than a traditional questionnaire, and if it is measured in real time, companies can act immediately. Not only can the reaction to products be measured, but also how that reaction differs among demographic groups, across geographical locations, and over time.

Big Data also allows companies to run thousands of real-time simulations to test a new or improved product virtually. By combining scalable computing power with simulation algorithms, thousands of different variations can run and be tested simultaneously. The simulation program can combine all the minor improvement tweaks into one product.
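As a rough sketch of this idea, the toy Monte Carlo simulation below "tests" nine hypothetical product variants virtually and picks the most profitable one. The demand model and every figure in it are invented for illustration; a real simulation would be driven by observed customer data.

```python
# Hedged illustration: a tiny Monte Carlo sketch of testing product
# variants virtually. The demand model and all figures are made up.
import random

random.seed(42)

def simulate_demand(price, quality, trials=10_000):
    """Estimate expected profit per customer for one product variant."""
    profit = 0.0
    for _ in range(trials):
        # Hypothetical model: willingness to pay rises with quality, plus noise.
        willingness = quality * 10 + random.gauss(0, 5)
        if willingness >= price:
            profit += price - quality * 4   # quality drives unit cost
    return profit / trials

# Nine variants: every combination of three prices and three quality levels.
variants = [(price, quality) for price in (20, 30, 40) for quality in (3, 4, 5)]
best = max(variants, key=lambda v: simulate_demand(*v))
print("best (price, quality):", best)
```

With scalable computing power, the same pattern runs across thousands of variants and far richer models, which is exactly what makes virtual product testing feasible.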

3.  Determine how much risk your organization faces.

Determining risk is an important aspect of today's business. To define the potential risk of a customer or supplier, a detailed profile is created and placed in a specific category, each with its own risk level. Currently, this process is often too broad and vague to be helpful: customers and suppliers are regularly placed in the wrong category and thereby receive an incorrect risk profile. A risk profile that is too high may not be that harmful, although income will be lost, but a risk profile that is too low could seriously damage an organization. With Big Data, it is possible to determine the proper risk category for each individual customer or supplier, based on all their past and present data, in real time.

Especially in the insurance business, predictive analyses are used to determine how much money a customer will cost a company in the future. Insurers want to identify the right customer for the right product at the right price and lowest risk, in order to reduce claim costs and fraud. Using Big Data techniques such as pattern recognition, regression analysis, text analysis, social data aggregation, and sentiment analysis (via natural language processing or monitoring social media), a 360-degree view of a potential customer is formed. This complete and up-to-date representation of a customer can lower risk significantly. Such an analysis can, of course, also be used to determine the potential risk of a new or existing supplier. For many financial institutions, this is a top priority in the coming years.23
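To make this concrete, the sketch below combines a few such signals into a single risk score with a logistic function, a common building block of the regression techniques mentioned above. The weights, signals, and category thresholds are invented; a real insurer would estimate them from historical claims data.

```python
# Illustrative only: a minimal per-customer risk score in the spirit of
# the techniques named above. Weights and inputs are invented.
from math import exp

def risk_score(claims_last_5y, payment_delays, sentiment):
    """Map raw signals to a 0-1 risk probability with a logistic function.

    sentiment: -1 (negative) .. +1 (positive), e.g. from social media analysis.
    """
    # Hypothetical weights; a real insurer would fit these with regression.
    z = -2.0 + 0.9 * claims_last_5y + 0.5 * payment_delays - 0.8 * sentiment
    return 1 / (1 + exp(-z))

def risk_category(score):
    return "high" if score > 0.6 else "medium" if score > 0.3 else "low"

careful_customer = risk_score(claims_last_5y=0, payment_delays=0, sentiment=0.5)
risky_customer = risk_score(claims_last_5y=3, payment_delays=2, sentiment=-0.5)
print(risk_category(careful_customer), risk_category(risky_customer))  # low high
```

The point of the 360-degree view is that many more such signals feed the score, and that it is recomputed continuously as new data arrives rather than once, when the policy is written.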

4.  Personalize your website and pricing in real time toward individual customers.

Companies have used split tests and A/B tests for some years now to determine the best layout for their customers. With Big Data, this process will change forever. Many different web metrics can be analyzed constantly, in real time, and combined for additional results. This allows companies to have a fluid system, in which the look, feel, and layout change to reflect multiple influencing factors. It becomes possible to give each visitor a website tailored to his or her wishes and needs at that exact moment; a returning customer might see a different webpage a week or month later if his or her personal needs have changed.
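One simple way to move from fixed A/B tests toward this kind of continuous, real-time optimization is a so-called epsilon-greedy strategy: mostly show the layout that has performed best so far, but keep exploring the alternatives. The sketch below simulates this with invented click-through rates.

```python
# A hedged sketch of moving beyond fixed A/B tests: an epsilon-greedy
# bandit that keeps serving the best-performing layout while still
# exploring. The click probabilities below are simulated, not real data.
import random

random.seed(7)

layouts = ["A", "B", "C"]
true_click_rate = {"A": 0.05, "B": 0.12, "C": 0.08}  # unknown in practice
shows = {l: 0 for l in layouts}
clicks = {l: 0 for l in layouts}

def choose(epsilon=0.1):
    if random.random() < epsilon:            # explore occasionally
        return random.choice(layouts)
    # exploit: layout with the highest observed click-through rate so far
    return max(layouts, key=lambda l: clicks[l] / shows[l] if shows[l] else 0.0)

for _ in range(5_000):                       # simulated visitors
    layout = choose()
    shows[layout] += 1
    if random.random() < true_click_rate[layout]:
        clicks[layout] += 1

most_served = max(layouts, key=lambda l: shows[l])
print("most served layout:", most_served)
```

Unlike a classic A/B test, which splits traffic evenly until a winner is declared, this approach shifts traffic toward the winner while the test is still running, which is what makes per-visitor, real-time tailoring possible.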

Big Data can also affect prices. Yield management in ecommerce could take on a whole new meaning. Orbitz experimented with this by showing Apple users more expensive hotels than PC users.24 Orbitz had learned that Mac users spend, on average, $20 to $30 more a night on hotels than PC users.

Algorithms make it possible to react in real time to events in the market or actions of competitors and to adjust prices accordingly. Companies that have started using Big Data to personalize online offerings toward individual needs are enjoying an increase in sales and profits.
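As a minimal illustration of such a repricing algorithm (a deliberately simplified rule, not any particular company's actual system, with invented costs and thresholds), consider:

```python
# A simplified, hypothetical repricing rule of the kind described above:
# react to a competitor's price while protecting a minimum margin.
def reprice(our_price, competitor_price, unit_cost, min_margin=0.10):
    """Undercut the competitor slightly, but never sell below cost + margin."""
    floor = unit_cost * (1 + min_margin)
    target = competitor_price * 0.99         # undercut by 1 percent
    return round(max(floor, min(our_price, target)), 2)

print(reprice(our_price=50.0, competitor_price=45.0, unit_cost=30.0))  # → 44.55
print(reprice(our_price=50.0, competitor_price=31.0, unit_cost=30.0))  # → 33.0
```

In the second case, the margin floor stops the algorithm from following the competitor into an unprofitable price, which is exactly the kind of guardrail automated pricing needs.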

5.  Improve your service support for your customers.

With Big Data, it is possible to monitor machines from a (great) distance and check how they are performing. Using telematics, each part of a machine can be monitored in real time. Data is sent to the manufacturer and stored for real-time analysis. Each vibration, noise, or error is detected automatically, and when the algorithm detects a deviation from normal operation, service support can be alerted. The machine can even schedule its own maintenance for a time when it is not in use. When the engineer comes to fix the machine, he or she will know exactly what to do, because all the information is available. A good example is Nick Savko & Sons, Inc., a Columbus, Ohio, site-development company that already uses telematics to improve the efficiency of its operations.25 It uses GPS devices to monitor data such as idle time, cycle times, productivity, and more. These devices were installed on the equipment required to complete work on the SX Railroad's $175-million transshipping terminal. Because all this information could be monitored from a distance, the company was able to complete the project a month ahead of schedule.
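The deviation detection described above can be illustrated with a very simple statistical rule: flag any sensor reading that lies unusually far from the average. Real telematics platforms use far richer models; the readings and threshold below are invented.

```python
# Illustrative sketch: flagging deviations in machine telemetry with a
# simple standard-deviation check. The readings are invented.
from statistics import mean, stdev

def detect_anomalies(readings, threshold=2.0):
    """Return indices of readings more than `threshold` std devs from the mean."""
    mu, sigma = mean(readings), stdev(readings)
    return [i for i, r in enumerate(readings) if abs(r - mu) > threshold * sigma]

# Hourly vibration readings (mm/s) from one hypothetical sensor; one spikes.
vibration = [2.1, 2.0, 2.2, 1.9, 2.1, 2.0, 9.5, 2.2, 2.1, 2.0]
alerts = detect_anomalies(vibration)
print("alert at sample(s):", alerts)  # the 9.5 reading is flagged
```

In a real deployment, such a rule would run continuously on streaming data and trigger a service ticket, or a maintenance booking, the moment an alert fires.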

6.  Find new markets and new business opportunities by combining your own with public data.

Governments around the world are making their datasets public in an effort to stimulate innovation. In 2011, the European Union organized the Open Data Challenge,26 Europe's biggest open data competition, designed to stimulate startups to devise innovative solutions using the massive amounts of open data generated by governments. The Dutch government, for example, actively stimulates the reuse of open cultural datasets and organizes hackathons to come up with new solutions.27,28 By combining various datasets, companies can give new meaning to existing data and find new markets, target groups, or business opportunities.

Companies can also discover unmet customer desires. By doing pattern and/or regression analysis on your data, you might find needs and wishes of customers of which you were previously unaware. Big Data can also show companies where to market a product first or where to place a product. Vestas Wind Systems, a Danish energy company, used Big Data and analytics to select the best locations for wind turbines.29 With that information, the company was able to harvest the most energy at the lowest costs.

7.  Better understand your competitors and, more importantly, stay ahead of them.

What you can do for your own organization can also be done, more or less, for your competitors. Big Data will help organizations better understand their competition and where they stand relative to each other, and that can provide a valuable head start. Using Big Data analytics, algorithms can detect when, for example, a competitor changes its pricing, so that you can automatically adjust your prices as well. Organizations can also monitor the actions of the competition, such as new products or promotions (and how the market responds to them), and track how that response changes over time. Remember that much of what is done by you or your competitors is available as open data.

8.  Organize your company more effectively, and save money.

By analyzing all the data in your organization, you may find areas that can be improved and better organized. The logistics industry, for example, can become more efficient by using the new Big Data sources available in the supply chain and during transportation. Electronic on-board recorders in trucks can tell how fast they are driving, where they are driving, and so on. Sensors and RFID tags in trailers and distribution centers help load and unload trucks more efficiently. In addition, combining information about roads, traffic, and weather with the locations of clients can save substantial time and money.

Of course, these generic uses are just a small indication of the massive possibilities of Big Data, but they show that Big Data provides endless opportunities to add business value and help you stand out from your competition. Each organization has different needs and will require a specific Big Data approach.

NIKE'S OPTIMIZED SUPPLY CHAIN

Nike wanted to understand the footprint of all the materials used in its products. The company had no information about the 57,000 different materials used, because they came from vendors two to three steps removed in the supply chain.

To gain insights, the company collected and performed a lifecycle analysis on all data related to those materials. This information was placed into a central database that has helped its 600 designers make much smarter decisions. As a result, business, sustainability, quality, and cost were affected.

Then, Nike took a remarkable step. It decided to share that data with the rest of the industry, so that all companies could populate the database and use it to make better decisions. The objective is to build a “vendor index” that contains details concerning every supplier, including ratings and trustworthiness.30 The key is to turn Big Data into smart data at the point where the people in the supply chain who need to use it can actually have access to it. This is a great example of how opening up datasets can bring additional value to the entire supply chain.

For a company as large as Nike, it is, of course, difficult to move to an information-centric organization all at once. As with any company that starts to build a Big Data strategy, Nike had to deal with silos across the organization that contained valuable information. To use the data effectively, however, it first had to be identified and aggregated. According to Hannah Jones, VP of Sustainable Business and Insights at Nike, “Innovation lurks in the shadows of silos.”31 Nike therefore started by removing the silos between the data and then identified the key performance indicators and the key data needed from across the entire company. From that point forward, the company was able to create a platform that was truly valuable to Nike and its related companies.

BIG DATA AND RETURN ON INVESTMENT (ROI)

Understanding that Big Data offers a lot of value is an important starting point. However, as is common in almost any organization around the world, Big Data, like every new technology, needs to be sold to (senior) management in order to be executed, and management needs to understand what the return on the investment will be. Many organizations believe that a Big Data strategy requires a big investment with no guarantee of useful results. As discussed in Chapter 4, this is not the case. Although McKinsey reported that companies using Big Data can increase their operating margins by 60 percent and reduce expenditures by 8 percent, a lot of executives are reluctant to go ahead with Big Data because of the uncertainties.32

Those concerns are not entirely unfounded: a 2013 study by the Wikibon Consulting Group showed that an unfortunate 2 percent of the companies researched declared their Big Data deployments total failures, with no value achieved. According to the research, the reasons for failure were a lack of skilled Big Data practitioners, immature technology, and the lack of a compelling business use. To help ensure that this does not happen to your Big Data project, I will discuss a roadmap for the successful implementation of a Big Data strategy later in this chapter.33

As with any other new strategy, Big Data will affect the organization in unpredictable ways, and it will cost money to implement. How can an organization know what the return will be and how much budget should be allocated to developing Big Data? A 2012 study by Columbia Business School and the New York American Marketing Association showed that a staggering 57 percent of marketing budgets are based on past budgets rather than on the ROI of marketing efforts.34 Results from the past are no guarantee of future success, so why use previous budgets to determine next year's investment? Moreover, in most companies, past budgets for Big Data do not even exist.

The main cost involved in establishing a Big Data strategy is the operation and overall management (or integration) of Big Data into the organization. Good Big Data Scientists are as expensive as they are rare, and managing thousands of nodes within a data grid requires great skill. Luckily, there are Big Data startups that have developed efficient algorithms and/or data-as-a-platform solutions that provide all that is necessary to start with Big Data. Most have a transparent pricing plan that gives some insights into the expected costs.

However, determining the ROI will remain difficult, especially since no existing IT ROI model can be applied. Traditional IT ROI models are based on elements such as speed per transaction, energy savings from data centers, or minimizing data center equipment.35 Big Data does not work on a speed-per-transaction basis. Often, what can be expected from an analysis remains completely unknown until it is done, making traditional models useless. To develop a Big Data ROI, companies should start with the following steps.

  1. Understand why you want to use Big Data and what you want to achieve with it. Then, set your desired Big Data objective. For example, your objective might be a better understanding of your customers that allows you to give them a better experience. In fact, 86 percent of people are willing to pay more for a great customer experience with a brand.36 Selecting the right objective can therefore help determine the ROI.
  2. Select the tools needed to meet your objectives. Different Big Data startups offer different solutions at different prices. Open source tools are free, but most vendors offer a commercial support plan to help you implement the tool. Depending on the tool selected, commodity hardware or a cloud data storage solution may need to be bought. This gives some insight into the costs involved.
  3. Start with a pilot project that will achieve your objectives on a smaller scale. The smaller investment required is often easier for CFOs to approve, and the pilot's costs and returns will give far more valuable insight into the ROI than benchmarks or pre-Big-Data-era figures.
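The pilot-first approach in step 3 reduces to simple arithmetic once the pilot's costs and measured returns are known. All figures below are invented placeholders; plug in your own numbers.

```python
# The pilot-first ROI approach as arithmetic. All figures are invented
# placeholders for a hypothetical pilot project.
def pilot_roi(tooling, hardware, staff, measured_annual_return):
    """Simple first-year ROI: (return - cost) / cost."""
    cost = tooling + hardware + staff
    return (measured_annual_return - cost) / cost

roi = pilot_roi(tooling=25_000, hardware=15_000, staff=60_000,
                measured_annual_return=140_000)
print(f"first-year ROI: {roi:.0%}")  # → first-year ROI: 40%
```

The point is that these inputs come from a real, small-scale deployment rather than from benchmarks, which is what makes the resulting figure credible to a CFO.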

If implemented correctly, Big Data will certainly bring value to your organization. Value can take the form of a faster time to market, because you know exactly what customers want and what their buying patterns are (perhaps before they even know). Big Data can also help you learn what your competition is doing, provide a better understanding of where the market is heading, and enable more efficient resource utilization. The size of the ROI depends on the objectives set, the size of the organization, the (open source) tools selected, the hardware chosen, and the processes implemented. Many variables affect Big Data ROI, but if these are chosen wisely, Big Data will result in a positive ROI, and setting up a pilot project can give you the valuable insights needed.

BIG DATA ON THE BALANCE SHEET

At one time, IT was about saving money; Big Data, in contrast, is about making money and creating value. As such, it could be placed on the company's balance sheet. A 2012 study by SAS showed that approximately 20 percent of large companies in the United Kingdom already assign financial value to their data on their balance sheets, so it is clear that more companies are beginning to understand and appreciate the value of data.37 How, then, should organizations proceed with Big Data, and how should they account for it on their balance sheets?

A balance sheet shows the financial status of an enterprise at a given point in time, that is, its assets and liabilities. Assets can be tangible (e.g., machines and hardware) or intangible (e.g., trademarks, copyrights, and algorithms). Liabilities are all legal debts or obligations that arise during the course of business operations, such as loans and accounts payable.

Intangible assets can include data as well. In fact, AT&T valued data such as customer lists and relationships in 2011 at $2.7 billion on its balance sheet.38 If such data can be included, why not the derived value of data, Big Data, or information as an asset?

However, data can become a liability if poorly managed or secured. A good example is the Dutch data security firm DigiNotar, which went bankrupt after it was hacked because its data was not properly secured.39 It is an extreme example, but it shows that the security of Big Data is of major importance and that data can become a serious liability if not properly protected.

Let's assume Big Data is an asset for a company; indeed, people often say that “information is our greatest asset.” David Rajan, director of technology at Oracle, found that 77 percent of CIOs thought data should appear on the balance sheet as a key metric to define the value of a business.40

If that's the case, how do you value Big Data within an organization? Determining the cost of Big Data is at least easier: “simply” total how much it costs to create, update, store, retrieve, archive, and dispose of the data. Determining the return on Big Data projects, however, is a lot more difficult because of the many uncertainties.

Return on Data is a metric that expresses the value of Big Data within an organization. Better use of data could lead to more knowledge about your customers, which could lead to better products delivered in a shorter time span, which in turn could increase customer lifetime value (CLV). The difference between predicted CLV and current CLV could then be the value of Big Data. When the purpose is to reduce fuel consumption within a transport organization, the return on data can be the amount of fuel saved annually, less the cost of installing all the Big Data technology.
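The CLV-based definition of Return on Data sketched above amounts to straightforward arithmetic. All figures below are hypothetical.

```python
# A sketch of the "Return on Data" idea: the uplift in customer lifetime
# value (CLV) attributable to Big Data, less the cost of the technology.
# Every number here is a hypothetical placeholder.
def return_on_data(clv_before, clv_after, customers, technology_cost):
    """Value of the CLV uplift across the customer base, net of cost."""
    uplift_per_customer = clv_after - clv_before
    return uplift_per_customer * customers - technology_cost

value = return_on_data(clv_before=180.0, clv_after=195.0,
                       customers=50_000, technology_cost=400_000)
print(f"return on data: ${value:,.0f}")  # → return on data: $350,000
```

The fuel-saving example from the transport sector follows the same pattern: annual savings attributable to the data, minus the cost of the technology that produced them.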

Formally putting Big Data on a company's balance sheet is a big decision that should be well founded. An advantage of doing so is that it would drive better control and governance of that data and make people aware of the presence and value of data within the organization. This, in turn, could lead to better use of data and spur the acceptance of Big Data as a strategy within organizations.

INTERNAL BIG DATA OPPORTUNITIES

Big Data affects all industries, but how does it affect the different departments within an organization? Obviously, it has a positive effect on marketing, particularly in developing PR campaigns and Customer Relationship Management (CRM). But Big Data also affects how the Human Resources department is managed. In addition, many organizations have legacy systems that contain a lot of usable and needed data. Not using that data would be like solving a puzzle with a few pieces missing; the picture is not complete.

Big Data and Its Impact on Customer Relationship Management

CRM is defined as a company's interaction with existing and future customers. It consists of all customer contact moments combined and is analyzed to provide better service. CRM has always involved collecting data, but most of it used to be structured data, such as contact information, most recent point of contact, products bought, and so on. With Big Data techniques, it is feasible to process, store, and analyze massive amounts of unstructured data not supplied directly by the customer and to use this to gain additional insights into customer behavior. With Big Data technologies, CRM can become a true revenue driver.

Previously, CRM systems usually failed to meet expectations, because they only managed the customer relationship.41 Big Data CRM goes a lot further; it is all about serving the customer. By using Big Data, companies should be able to successfully resolve the problems with most standard CRM programs. Note that:

  • Having a CRM system in place is only one part of the deal; the second is that employees must be encouraged to use it effectively. As Big Data should be part of a complete cultural change within the organization, this problem should be solved.
  • The lack of clear (technical) objectives leads to a poor system and incorrect collection and storing of data. Big Data requires strict guidelines and processes for storing and using data to guarantee that it is all compatible.

A few steps are necessary when implementing a Big Data CRM program. Let's look at what serving the customer from a Big Data viewpoint actually means. Serving the customer effectively will yield better results, but it requires a lot more than just managing the relationship. Four important phases of Big Data CRM create a complete circle of customer service: managing the relationship, interacting with the customer, analyzing customer touch points, and truly getting to know the customer (see Figure 6-1).

  • Managing the customer with structured data, such as contact information, address, and latest contacts is one part. This is mainly an inside-out approach, in which the company “manages” the customer by sending messages and storing rudimentary information. This is done with predefined channels at set business hours. It is a company-defined process performed by designated departments without a lot of flexibility. Nonetheless, it is an important starting point.
  • Interacting with the customer using unstructured data, such as emails, tweets, Facebook posts, comments, and so on, is customer driven or an outside-in approach. This is a two-way communication process. The customer determines when to establish contact with the organization and expects to receive a quick response, even outside of traditional business hours. Everyone in the company should be included in such interactions.
  • Analyzing the customers’ activities with structured data, such as online visits, clicks, bounce rates, and so on, is a company-driven process, performed primarily by web analysts who perform an analysis when asked for an insight or deliver standardized reports to marketers on a regular basis. Big Data techniques will significantly change the role of the analyst, as he or she will be required to deliver results more proactively on a regular, preferably real-time, basis about more topics.
  • Knowing the customer is where it really becomes fascinating. Big Data Scientists use unstructured and structured data to build algorithms that can perform extensive analyses on the data, allowing the organization to know each customer individually on a real-time basis. They can deliver predictive models to develop/deliver products truly needed by the customer, resulting in an increased conversion rate and more satisfied customers.

Figure 6-1 Big Data CRM Phases


BIG DATA ENABLES THE INTERCONTINENTAL HOTEL GROUP TO BECOME A SERVICE-ORIENTED AND DATA-DRIVEN ORGANIZATION

With 4,602 hotels containing 675,982 rooms around the world in 2013, the InterContinental Hotel Group (IHG) is collecting massive amounts of data across its brands.42,43 For the last few years, IHG has embraced the use of advanced analytics. The company moved from a structured dataset with approximately 50 variables to a Big Data solution that analyzes both unstructured and structured data in real time. Today, the company uses up to 650 variables from different sources to gather information about its hotels, competitors, and guests, as well as other internal and external data.

IHG has become a truly data-driven organization. As a result, employees make better decisions. Not surprisingly, reservations reach 150 million room nights each year. Each reservation requires and creates a lot of data, including the channel of the reservation, time of booking, location, and information about the guests. Further, IHG uses the data from its loyalty program, Priority Club Rewards, to create a better experience for guests.44 Thus, the 71 million customers who are part of the world's largest hotel loyalty program, including alliances with 45 airlines, generate a lot of data.45

To streamline all data from the reservation system and the loyalty program, IHG decided to rebuild its reservation system a few years back.46 The new system operates in multiple languages and provides real-time access to the loyalty program. It uses a service-oriented architecture based on open standards to ensure easy integration with existing business processes and make it scalable for future needs. Today, IHG can personalize each web experience for each guest, ensuring high conversion rates and driving growth in its own booking channel.

To ensure satisfaction, IHG also surveys guests on a wide range of topics and combines the answers with industry performance and economic data to help benchmark how well the company is doing. This helps IHG understand all external factors that influence performance. The chain also collects a lot of general metadata about its hotels and staff, such as the number of hotel rooms, the age of a property, available amenities, location, and tenure and experience of the staff, but it also amasses less expected data, such as local demand drivers and the density of nearby competition.

All data is analyzed in real time and used to evaluate execution of marketing plans on a daily basis. Because of all this available data, IHG is able to make better decisions, especially in times of economic downturn. As Manish Shah, Director of Marketing Strategy and Analytics for IHG in the Americas, explains “[we] have re-evaluated the marketing mix and adjusted spending to better suit the business’ needs in the current economy.”47,48 IHG uses several types of analyses to evaluate and sift through the massive amounts of datasets:

  • Operational analytics are used to provide clear reporting to stakeholders within the organization on a tactical and operational level so they can make better decisions.
  • Regression and correlation analyses discover patterns in the data, show where trends are going, and indicate which trends should be followed.
  • Predictive analytics are used to predict guests’ purchasing and stay behaviors in the different online and mobile channels.

IHG's main goal is to use Big Data to create a company-wide view of data that can be analyzed using robust analytics so that key insights are identified and can be used to make better decisions. Using all this data, IHG can take a deep dive into the analytics of its particular brand in any one country to see what really drives the top-line performance for that hotel. It can go even further and use the insights to segment hotels across the group into clusters with common interests, regardless of the brand. Within these clusters, predictive analytics can help determine what drives each cluster.

David Schmitt, Director of Performance Strategy and Planning, offers three lessons he has learned over the years, as IHG moved to a data-driven organization.49,50 They are:

  1. Create demand from the bottom-up with a small test case. Success stories will spread and make the entire organization smarter. Starting big is very difficult.
  2. Perfection is unachievable, so don't wait for the next best tool to arrive. It is impossible to create the perfect dataset or the perfect model. New models and tools constantly appear on the market that can result in improvements, but waiting for such models will result in inertia. Start with the data you have, and move on from there.
  3. You need to tell a story, as most managers do not care about technical details and the math behind the models; they just want it to work. Visualizing the story is very important, and IHG uses visualizations to tell a story rather than dive into the technical details.

Service-oriented, data-driven organizations, such as IHG, are great examples of how Big Data is helping industries become better and more efficient, while improving customer experiences. All organizations can learn from these examples and should work toward an information-centric organization as well.

When combined, all aspects of Big Data bring real value to an organization and take traditional CRM to a higher level. Organizations can use Big Data tools and techniques to process massive data streams that will flow into the organization. With the right algorithms, it is possible to perform the following analyses to deliver better service to the customers:

  • Pattern analysis uncovers new patterns within a dataset or a combination of datasets. These can be structured data (rows of sequential demographic data) or unstructured data (tweets about products).
  • Sentiment analysis discovers what customers are saying about your products/service. It can help address issues before they become too widespread and can help to improve service.
  • Marketing analysis analyzes customer interactions not only with your organization but with each other to optimize marketing decisions and messages.
  • Recommendation analysis gives the best recommendation to your customers to increase the conversion rate. The better the recommendation fits with the needs of the customer, the higher the conversion rate.
  • Influence analysis determines which of your customers has the most influence over other customers. Knowing who influences whom will give organizations a big advantage and help better serve your customers.
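Of the analyses listed above, sentiment analysis is the easiest to sketch in a few lines. The following is a minimal lexicon-based toy, with tiny invented word lists; production systems use trained models and far richer vocabularies.

```python
# Minimal lexicon-based sketch of sentiment analysis. The word lists are
# tiny and invented for illustration only.

POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "terrible", "hate", "useless"}

def sentiment(text):
    """Classify a short message by counting positive vs. negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweets = [
    "Love the new app, support was helpful",
    "Checkout is broken and the site is slow",
]
print([sentiment(t) for t in tweets])  # ['positive', 'negative']
```

Run over a continuous stream of tweets, reviews, and comments, even a simple classifier like this reveals how sentiment about a product shifts over time; the other analyses (pattern, recommendation, influence) follow the same principle of scoring incoming data against a model.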

Collecting, processing, and storing the correct data are only part of a truly valuable Big Data CRM program. Analyzing with the right tools to achieve valuable insights is another. To really have a CRM program that increases customer satisfaction, the organization must become information-centric.

It is no longer acceptable not to know your customer. Consumers who contact organizations through whatever channel want to be recognized and served appropriately. Using Big Data technologies to collect, store, and analyze the necessary data will truly make your CRM manageable and valuable, thereby giving your organization a competitive advantage.

Big Data and Its Impact on Public Relations Campaigns

Public Relations (PR) entails managing the spread of information between an organization and the public. As all of that information is data, it can be analyzed and used to improve PR activities. Enter the world of Big Data and that information can be turned into valuable insights. PR is all about getting the stakeholders to maintain a certain point of view about the organization. With Big Data, it is possible to understand what that point of view is, how it changes over time, what it takes to improve it, and what the effects are of PR activities.

Big Data can affect PR in several ways. Big Data delivers insights about the stakeholders, including who they are, what their beliefs are, and where they are from. This information can be used to develop a message tailored to the characteristics of individual stakeholders. One of the stakeholders should be the influencer. Companies need to identify the influencers: Who are they, where are they from, and how can they be influenced positively? However, in the world of social media, any customer can ultimately become an influencer. Big Data determines the sentiment among customers and how it changes over time in relation to the company's activities. Companies that have a 360-degree overview of their customers will be able to profile them correctly and reach them with the right message to influence their sentiments. In addition, Big Data can provide the information and numbers needed to develop a PR story that resonates with the stakeholders. Finally, Big Data can save your organization in times of crisis. Let's discuss each of these aspects one by one.

The Influencers

Influencers are people who have a large network and are therefore able to spread a message widely and quickly. Influencers can help the organization if you know where to find them and how to approach them. As Michael Wu, Lithium's Principal Scientist of Analytics, described, six types of data can be used to find the influencers:51

  1. Involvement velocity data: How often someone shares information via social networks, or the number of tweets or blog posts on a certain topic. This tells you how involved the customer is with your company.
  2. Social equity data: The number of followers someone has on a social network or the number of unique visitors to a blog.
  3. Citation data: How often others cite someone in tweets, comments, or posts. The more someone is cited, the higher his or her credibility.
  4. Status data: Someone's status within a community, which says something, but not everything, about his or her credibility.
  5. Self-proclaimed data: The data someone posts about himself or herself, for example, on LinkedIn. Because the influencer posts this information himself, it is less reliable.
  6. Social graph data: The relationships of the influencer and how his or her network is constructed.
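One way to aggregate the six data types above into a ranking is a simple weighted score per candidate influencer. The weights and numbers below are invented for illustration; a real model would normalize each metric from raw platform data and calibrate the weights against campaign outcomes.

```python
# Hypothetical sketch: combining the six data types into one influence score.
# Weights are invented; note self-proclaimed data gets the lowest weight
# because it is the least reliable.

WEIGHTS = {
    "involvement_velocity": 0.25,  # posts/tweets per week on the topic
    "social_equity": 0.20,         # followers, unique blog visitors
    "citations": 0.25,             # how often others cite this person
    "status": 0.10,                # community status or rank
    "self_proclaimed": 0.05,       # self-reported, least reliable
    "social_graph": 0.15,          # network reach and structure
}

def influence_score(metrics):
    """Weighted sum of normalized (0-1) metrics for one candidate influencer."""
    return sum(WEIGHTS[k] * metrics.get(k, 0.0) for k in WEIGHTS)

candidates = {
    "alice": {"involvement_velocity": 0.9, "social_equity": 0.7,
              "citations": 0.8, "status": 0.6, "self_proclaimed": 0.9,
              "social_graph": 0.5},
    "bob":   {"involvement_velocity": 0.4, "social_equity": 0.9,
              "citations": 0.3, "status": 0.8, "self_proclaimed": 0.2,
              "social_graph": 0.6},
}

top = sorted(candidates, key=lambda c: influence_score(candidates[c]), reverse=True)
print(top)  # ['alice', 'bob']
```

The output is the organization's ranked list of top influencers, which can then feed the outreach step described next.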

With Big Data technologies, the above information can be aggregated, stored, analyzed, and visualized to find the list of top influencers for the organization. Once the influencers are identified, it is important to understand what they are saying and how they can be influenced positively.52

Customers

The top influencers can help spread your message. Knowledge about your customers will help you construct the right message for the right target group at the right moment. With Big Data, it is possible to understand who your customers are and how each customer group should be approached. When different sources of data, such as loyalty programs, CRM systems, reviews, and social media, are connected with each other, a true 360-degree customer overview will become apparent. This valuable information can help create a message that charms customers.

Tailored Stories with High Stickiness Factor

Understanding what information is best suited for which (potential) customers or influencers is only one aspect of the PR story. Great PR campaigns have two things in common: a high likeability and stickiness factor and the ability to go viral on the Internet. Although many people or companies claim to have the recipe for the perfect PR campaign, developing a viral PR campaign is difficult. Big Data, however, can make that process slightly easier. Combining the public and social data of people who see the campaign can turn a PR campaign into a very personal, and potentially fun, message that is likely to be shared across social networks. Big Data technologies enable you to find the right persons and use their public and social data in real time to create a tailored message that sticks.

Visualize the Message

Visualizing datasets or combinations of datasets into rich graphs or infographics can help explain a story clearly and spread information about a product or service. Infographics, in particular, are great narrative tools that include data that impacts your organization.

Fine Tune the Communication over Time

Successful PR campaigns should be continuously observed and tweaked. Tracking how campaigns develop is important if your organization is to learn what does and what does not work, as well as for whom it works. Big Data can assist in providing real-time insights from around the world about how the PR campaign is perceived, what the online sentiment is, and what people are saying about it. A company that receives the analyzed data in real time has the opportunity to adjust quickly if necessary.

Save Your Organization in Times of Crisis

With Big Data tools, it will be feasible to know instantly when a crisis is about to hit the company. With the right predictive algorithms, it will even be possible to anticipate a crisis before it happens. Algorithms can analyze all the data that flows in and out of the company, as well as all relevant internal and external data, and determine when negative messages are being spread, where they are being spread, and about what topic. Your objective is to find these negative messages before they reach the mass public and have the potential to go viral.

If a crisis hits your organization, Big Data can help your PR department by providing valuable insights that can limit the effects of the crisis. This is true not only for businesses, but also for disaster relief. Big Data came in handy after Hurricane Sandy in New York, as it was able to identify citizens most in need of food packages and other supplies. Jim Delaney, COO of Marketwire, gives an example of how the app Waze, the community-driven app that learns from users’ driving mannerisms to provide routing and real-time traffic updates, helped during that disaster: “The Federal Emergency Management Agency (FEMA) and the White House called upon Waze to determine where to send gasoline trucks in New Jersey.”53 And, “Based upon the data they found on Waze, FEMA and the White House informed the public which gas stations had fuel during the gas shortages and power outages.”

Measuring the Big Data Results

Defining KPIs in the Big Data era is possible, but it depends on what you want to achieve. Do you want to improve sentiment? Do you want to involve your customers, or do you want to increase customer satisfaction? PR in the world of Big Data is about (online) reach and sentiment and bringing the right message to the right person at the right time. With the right Big Data technologies, it is possible to analyze your actions in real time, adjust if necessary, and then improve the results before the PR campaign is over.

Big Data and Its Impact on Human Resources

Big Data impacts how we live, how we work, and consequently how we work together. Therefore, Big Data should be on the agenda of the Human Resources (HR) Manager. When HR applies Big Data properly, it can improve employee productivity, decrease costs, and increase incomes. With Big Data, organizations can create better and happier employees.

Analyze Workplace Behavior

First of all, Big Data can help companies better understand workplace behavior. Sensors installed on office furniture can give insights into how meeting rooms are used, by how many employees, for how long, and when.54 They can provide information about how often employees are away from their desks, how long meetings take, and who meets with whom. Collecting this data and combining it with other data sources, such as who works with whom, who emails whom, and who has what knowledge, can spur collaboration and increase efficiency. The data can help design offices and place employees who deal with each other frequently closer together.

Research at a tech company has revealed that large lunch tables that can seat up to 12 people result in increased productivity because of the greater social interaction.55 Another company, Cubist Pharmaceuticals, used data to reveal it had too many coffee machines. By reducing the number of coffee machines and creating centralized coffee spots, the company increased serendipitous interactions among employees.56,57 When employees feel they are more efficient and have more interactions with others, they are likely to be happier at work. Happier employees work harder and, therefore, it is wise to use Big Data to design the office in the best way possible.

The Available Information

Big Data can also reveal the type of information available within the organization. There are tools available in the market that can scan and analyze all documents, emails, phone calls, chat messages, intranet data, and other communications to understand who possesses what information. These tools can reveal which employee is an expert on what topic and which information is lacking. Even more, it can provide insights into what information is about to be lost—if, for example, the one employee with specific knowledge about a certain topic is about to retire.

The data can also reveal which people within an organization do not have that much knowledge but can be regarded as networkers. These “connectors,” as Malcolm Gladwell called them in Tipping Point, are very important within an organization, as they bring people together.58 Losing a connector to a competitor can do serious harm to an organization, so it is wise to know who these employees are within the organization.

In addition, if employees can simply question the Intranet when they need information about a certain topic, Big Data can bring them in contact faster with the right person across your organization, anywhere in the world. The available knowledge becomes visible to everyone rather than remaining in silos.

Hiring New Employees

Knowing what knowledge is present also reveals what knowledge is missing. Such information can help HR find the right employees for the right job.59 Even more, Big Data can also help evaluate job applicants. Tools can automatically scan résumés and social public data of applicants and score how well each applicant matches the job requirements. In addition, Big Data provides even more information during online assessments. The way an applicant answers questions, how long he or she takes per question, how many resources he or she uses, in which order the questions are answered, and so on, can provide more information about the applicant's behavior than the answers to the questions themselves. Test scores and college grade-point averages only say a little bit about the potential of an applicant. How he or she works is more important.

CATALYST IT SERVICES HIRES BIG DATA TO DO THE JOB

Catalyst IT Services is a technology outsourcing company in Baltimore that has screened more than 10,000 candidates.60 Screening by hand would have been a daunting process requiring many employees and costing a lot of money. The founder of Catalyst IT Services, Michael Rosenbaum, came up with a major plan to change the traditionally slow process of recruiting.61 Using technology and algorithms, the company built a program that screens applicants based on how an obligatory survey is completed. By successfully implementing a Big Data hiring strategy, the company managed to reduce employee turnover to 15 percent, compared to 30 percent for its U.S. competitors.62

Using Big Data, the process of hiring the right candidate for the right job has been much improved. Previously, the subjective opinions of hiring managers could result in the right person ending up in the wrong job. The data-crunching process now in place was developed around an online assessment that candidates need to complete. During the process of completing the assessment, the software gathers thousands of bits of job applicant data. This data about how a candidate completes the assessment is often more important than what the candidate actually answers.

For example, someone who comes across a difficult question can immediately skip it or he/she can ponder it for a long time, going back and forth. This information tells a lot about how someone deals with challenges. One candidate might be better at assignments requiring a methodical approach, while another candidate might be better in another setting.63 Catalyst IT Services likes to call it the “Moneyball-like model.”64 By using analytics based on data rather than personal perceptions, the company manages to recruit talent and assemble highly skilled teams that are custom tailored to its clients.

Analyzing these different attributes and matching them to specific situations can be done much better by a computer than a human, particularly because the method looks at many more variables than a human could ever do. Typically, the method looks at a couple of thousand data points,65 such as time on a page in the assessment, keystrokes, public domain data, social network data, interaction data during the application, and résumé data. Subsequently, the algorithms will calculate a probability score of how a candidate will perform over a certain period on a certain project. Candidates who reach the threshold are interviewed. A majority of those interviewed are brought on board.
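The scoring-and-threshold process described above can be sketched as a simple probabilistic model. The feature names, weights, and threshold below are entirely hypothetical; Catalyst's actual model uses thousands of data points and a proprietary method.

```python
# Hedged sketch of threshold-based candidate screening. All features,
# weights, and the threshold are invented for illustration.

import math

def screening_probability(features, weights, bias=-1.0):
    """Logistic score: estimated probability that a candidate performs well."""
    z = bias + sum(weights[k] * features.get(k, 0.0) for k in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical behavioral features extracted from the online assessment,
# each normalized to the 0-1 range
weights = {
    "methodical_navigation": 1.2,   # steady time spent per question
    "revisit_rate": 0.8,            # going back to hard questions
    "resource_usage": 0.5,          # external resources consulted
}

THRESHOLD = 0.6  # candidates above this score are invited to interview

candidate = {"methodical_navigation": 0.9, "revisit_rate": 0.7,
             "resource_usage": 0.4}
p = screening_probability(candidate, weights)
invite_to_interview = p >= THRESHOLD
```

The key design point is that only candidates above the threshold consume a hiring manager's time, which is how the approach scales to tens of thousands of applicants.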

Using Big Data in the recruitment process is definitely an interesting approach, as recruitment has long been perceived as a task that could not be done by computers. Of course, Catalyst IT Services still uses hiring managers to interview those candidates who score above the threshold, but they have much more information at hand. Consequently, they can make a more informed decision that eventually will save money as the right person ends up in the right job.

Big Brother

There are also downsides to using Big Data within HR. One of the most important is that employees can get the feeling that their boss is watching their every move. A “Big Brother is watching” feeling among personnel can seriously upset productivity and employee happiness. As always with Big Data, employers have to be transparent with employees about why specific data is used, analyzed, and made available internally. Employees need to shift their thinking so they understand that Big Data can help them and is not spying on them. Then again, for years, employees have needed ID badges with sensors to gain access to offices. As employees should be aware, the same technology can already be used to monitor employee behavior.

Legacy Systems and Big Data

Although 90 percent of the available data in the world was created in the last two years, there is still a lot of “old data.”66 In 2010 and 2011, we created a total of 3 zettabytes of data.67 By using a very simple calculation, we can see that the amount of “old data” is still approximately 0.3 zettabytes, or 300 exabytes. If we compare that to the 2.5 exabytes of data that we currently create every day, it looks like it is nothing to worry about. Unfortunately, that is wrong. Those 300 exabytes of data can give us headaches and sleepless nights, while costing a lot of energy and money.

Why? Because a large proportion of those 300 exabytes resides in legacy systems that are incompatible with modern technology. Switching those systems off or simply importing the data into modern Big Data platforms is difficult. In particular, insurance companies and banks have legacy systems, some of which have been in place for decades. As a result of the many mergers and acquisitions in the finance world, banks sometimes have dozens of different legacy systems.68 One bank even had 40 different legacy systems.69 These aging, cobbled-together legacy systems can often be found in payment and credit card systems, ATMs, and branch or channel solutions.70 Such legacy systems cause a lot of headaches for organizations. For example, Deutsche Bank had to postpone its Big Data plans due to its legacy systems.71

Banks are not the only organizations that have to deal with legacy systems. The car industry faces similar problems. Ford Motor Company has data centers that are running on software that is 30 or 40 years old.72 The pharmaceutical and travel industries, as well as the public sector, also have to deal with legacy systems.73,74 Replacing these legacy systems is almost impossible. Karl Flinders, an editor at ComputerWeekly, compares it to “changing the engines on a Boeing 747 while in flight.”75

Legacy systems consist of traditional, relational database management systems, often running on old and slow machines that cannot handle large amounts of data at once. Hence, most of these systems process data at night, and it can take a lot of time to query the required data. Real-time processing and analyzing of data in legacy systems is impossible.

One solution to this problem is to replace a company's entire legacy system. Apart from the massive risks involved in such an operation, there are also significant costs, so it is not very likely that many organizations will adopt this strategy. However, as hard as it may seem, it is not impossible, as the Commonwealth Bank of Australia has shown. In the past five years, it has replaced the bank's entire core system, moved most of its services into the cloud, and developed many apps and innovations that brought the bank to the forefront of the industry.76

It is therefore imperative to find a way for new, innovative technologies that allow real-time analysis of various datasets to coexist with legacy systems. Systems from the gigabyte or even megabyte era still contain valuable (historical) information. There are several ways to keep and use the historical data in the data warehouses.

  1. Data can be macrobatched into the new Big Data solutions on a periodic timescale, for example every night. This data can then be used together with the “new” data.
  2. One can send periodic summaries of the data from the legacy systems to the Big Data warehouse in order to use the data in those warehouses while preventing continuous querying of the legacy data.
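The second option above can be sketched as a nightly aggregation job: extract raw rows from the legacy system after hours, collapse them into summaries, and ship only the summaries to the Big Data warehouse. Table layout and field names below are hypothetical.

```python
# Illustrative sketch of option 2: periodic summaries of legacy data sent
# to the Big Data warehouse instead of continuously querying the legacy
# system. All names and figures are invented.

from collections import defaultdict
from datetime import date

def summarize_legacy_transactions(rows):
    """Aggregate raw legacy transaction rows into one summary per customer,
    so the warehouse can serve analytics without touching the legacy system."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for customer_id, amount in rows:
        summary[customer_id]["count"] += 1
        summary[customer_id]["total"] += amount
    return dict(summary)

# Nightly batch: rows extracted from the legacy system outside business hours
legacy_rows = [("c1", 120.0), ("c2", 35.5), ("c1", 80.0)]
nightly_summary = {
    "as_of": date.today().isoformat(),
    "customers": summarize_legacy_transactions(legacy_rows),
}
```

Because only the compact summaries travel to the warehouse, the slow legacy database is queried once per night instead of on every analytical request.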

Such solutions will enable the analysis of both new data and structured legacy data within a single integrated architectural framework. A platform like this will allow legacy data to remain within the existing data warehouses and, at the same time, enable near real-time analyses.

Using middleware to enhance systems and replace the hardware that supports them is, however, not ideal.77 Another problem for legacy systems is that a larger percentage of the IT budget will go to support Big Data projects, leaving less money available for the legacy systems. In addition, the employees who are able to work with legacy systems will become scarce and expensive.

If such a trend continues for too long, there is a danger that the legacy system will fail one day, placing the company in a lot of trouble. The later organizations start to replace legacy systems—or at least try to make them compatible with Big Data technologies—the more expensive and difficult it will become.

Real transformative insights can only come when all data is used, including the data from legacy systems with incompatible data formats.78 Therefore, eventually the data in legacy systems will need to be transferred to massively scalable storage systems, thereby replacing the critical search, calculation, and reporting functions of those legacy systems.79

In the end, the goal for any organization with legacy systems should be to truly retire these systems, as companies will not be able to support them forever. In the meantime, if they integrate legacy data into one platform for data aggregation, they can already reap the benefits of historical data and create valuable insights.

BIG DATA ROADMAP

The first step is knowing what Big Data is; the second is knowing how a Big Data strategy can benefit your organization; the third—and most difficult—is knowing how to implement that Big Data strategy. A lot of organizations perceive the third step as the hardest, and in large, process-directed organizations, that can be true. Convincing the board to proceed and defining where to start can be unnerving. In truth, however, the steps that need to be taken are clear and straightforward (see Figure 6-2).

First, organizations need to understand what Big Data is; otherwise, defining a strategy is impossible. Understanding what Big Data is can also help you get management buy-in within the organization. Quite often, Big Data is seen as an IT matter because, after all, you need hardware and software to implement the strategy, and these need to be developed by highly skilled technical Big Data employees.80 These experts, especially in the beginning and when developing a solution on-premises, will form a large part of the Big Data team, particularly when implementing Proof of Concepts and/or a Big Data strategy. There are also many blogs and events around the world that approach Big Data from a technical IT point of view, focusing on Big Data engineering, architecture, and analytics. This is not surprising, as Big Data has different IT requirements than earlier strategies, so sharing such information is important and valuable.

Figure 6-2 The Big Data Road Map


However, we should not forget that IT is merely a means to an end: achieving a strategy defined by the organization. This strategy could be “to increase customer satisfaction,” “to increase revenue,” or “to improve operational efficiency.” The route to achieving it could be Big Data or any other solution. If the strategy is “to increase customer satisfaction,” it would be strange to treat it as an IT matter or to have the IT Director be the sponsor. IT is simply too operational and supportive a function to lead or sponsor Big Data projects.

Finding a sponsor for a Big Data project is, however, vital for its success. The sponsor can be senior management or someone on the board, and he or she should be involved in and support the decision to move forward with Big Data. The reason is that starting a Big Data project is difficult and, in the beginning, the results can be uncertain.

Management buy-in ensures that the project is not stopped before any real results can be shown. The person in charge should be someone within the organization who understands all the different departments, has a helicopter view of the project, and is high enough in the company to direct and align the entire company.

Other critical requirements for a potential sponsor of Big Data projects are given below. The more of these requirements the sponsor meets, the more likely it is that the Big Data project will succeed.

  • He or she must have a large network within the organization to get all different divisions aligned for the Big Data strategy.
  • He or she must have some technical expertise to understand how Big Data works and how it will drive change.
  • He or she must be able to identify new business opportunities based on the available data spread out in silos.
  • He or she must be able to lead the team in the right direction, especially in uncertain situations.
  • He or she must be able to secure the necessary funds to start a Big Data strategy, while realizing that failures in the beginning are part of the deal.

With the above list in mind, a C-level executive sponsor would be the best choice to spur the acceptance of the Big Data strategy. He or she can speed up necessary cultural change and can ensure that the four ethical guidelines are enforced. A board that is involved in a Big Data strategy can guarantee that the project will not be stopped before any real results can be shown.

Once senior management or the board approves the decision to move forward and a C-level executive sponsors the project, it is important to assemble a multidisciplinary team from all different departments within the organization that have links with the Big Data project. Data tends to be kept in silos throughout an organization. Focusing only on one part of the company could result in the omission of valuable data sources. Therefore, many departments should be included: marketing needs to be involved because of the customer point of view; product management to know how data is gathered by the products or services offered; human resources to comprehend the effect of data gathering on employees; compliance and risk to guarantee that the organization sticks to the four ethical guidelines discussed in Chapter 5; finance to keep the budget under control; and IT to build the required hardware and software.

Including all departments has a major advantage when defining the possible business use cases of Big Data. Brainstorming sessions will be more successful when people from different disciplines are involved. Each member of the multidisciplinary Big Data team should be able to offer a distinct point of view. Together, a large pool of possible uses can be defined. It is important during this phase to accept all possible ideas brought up during the sessions (as in normal brainstorming sessions, “no” and “that's not possible” do not exist). It is essential to let creativity flow, as this will allow you to find new data sources previously not considered.

Once a few dozen possible uses have been defined, it is time to develop criteria to rank them. It helps to divide the uses into different categories, such as those that fix bottlenecks within the operation or those that improve efficiency. Then use criteria to rank all business use cases within the different categories. Criteria can be the impact on IT, the effort required to implement the solution, and/or a possible value proposition. It is not necessary to develop a scenario for each business use case, as there are too many unknowns at this stage.
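
This ranking step can be sketched in a few lines of code. The use cases, criteria ratings, and weights below are purely illustrative assumptions, not values from any real project; the point is only that a simple weighted score makes a long list of brainstormed ideas comparable.

```python
# Hypothetical scoring sketch: all use cases, ratings (1-5), and weights
# are invented for illustration.
use_cases = {
    "Predict churn from call-center logs": {"it_impact": 3, "effort": 4, "value": 5},
    "Optimize warehouse routing":          {"it_impact": 2, "effort": 3, "value": 4},
    "Real-time fraud alerts":              {"it_impact": 5, "effort": 5, "value": 5},
}

# Lower IT impact and effort are better, so they carry negative weights;
# the expected value proposition counts positively.
weights = {"it_impact": -1.0, "effort": -1.5, "value": 2.0}

def score(criteria):
    """Weighted sum of the criteria ratings for one use case."""
    return sum(weights[name] * rating for name, rating in criteria.items())

ranked = sorted(use_cases.items(), key=lambda item: score(item[1]), reverse=True)
for name, criteria in ranked:
    print(f"{score(criteria):6.1f}  {name}")
```

How the criteria are weighted is itself a judgment call for the multidisciplinary team; the mechanism simply forces that judgment to be made explicit.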

Based on the criteria and the selected categories, it is possible to select the Proof of Concepts that will be realized. The multidisciplinary Big Data team should be able to realize the Proof of Concepts with minimal effort. It is better to fail fast and fail often than to develop a complete solution only to discover at the end that something was wrong. While Big Data has the potential to bring a lot of positive results, this may not be evident from the start. Don't be afraid to fail and restart, as this is part of the learning curve in dealing with Big Data and in better understanding how your organization can benefit from it. For each organization, after all, the benefits will differ.

The moment the first results come in from the Proof of Concept, it is important to share them immediately with the entire organization. It will help get everyone involved in the Big Data effort, as for organizations to truly succeed with Big Data, an information-centric culture should be present. If the results of the Proof of Concepts are positive, it is time to expand the multidisciplinary Big Data team throughout the organization and to start more and larger projects. Extrapolate the lessons learned from the first projects and apply them to the new projects by better defining a possible ROI, IT impact, process implications, and other important criteria.

From there on, the entire process starts all over. Future projects should have a higher success rate and be implemented faster. In the end, Big Data projects should positively affect the net results of the organization, as long as they are implemented wisely and correctly. As an organization becomes more involved with Big Data, it is wise to train or hire Big Data Scientists. Some organizations train their own personnel, while others hire consultants during the pilot project and then use them to train in-house staff. Both options are valid, and the choice depends on the amount of time and money the company is willing to spend. When the right staff (trained or hired) is in place, building the right Big Data infrastructure and extrapolating the Big Data uses throughout the organization will happen faster. However, do not overspend during this phase: if your datasets amount to a mere 1 to 10 TB, you do not need to build a complete Hadoop system. A 2012 Microsoft study revealed an overenthusiastic use of Hadoop, which is a waste of resources.81

BIG DATA COMPETENCIES REQUIRED

The Big Data era will require employees with different skills doing jobs that until recently did not exist. Developing and implementing a winning Big Data strategy also involves hiring the right employees. What type of jobs should be present within an organization and what are the competencies of the different employees? The following are the seven most important Big Data employees an organization should have if it has moved thoroughly into Big Data. Of course, if a company starts small, not all job types are required.

  • Chief Data Officer
  • Big Data Scientist
  • Big Data Analyst
  • Big Data Visualizer
  • Big Data Manager
  • Big Data Engineer
  • Big Data Consultant

Chief Data Officer

The Chief Data Officer (CDO) should be responsible for the overall Big Data strategy within an organization. If Big Data is realized throughout the organization, at all different levels and departments, the CDO will have the responsibility to ensure the strategy is implemented correctly, the data is accurate and secure, and the customers’ privacy protected. As data-driven decision making is an important aspect of information-centric organizations, the CDO should be a member of the executive board, reporting directly to the CEO.

Data in itself does not bring any value. It only becomes valuable when analyzed and turned into information. Therefore, the role could also be titled Chief Information Officer. However, to stress the importance of data and the enterprise-wide governance and utilization of data as an asset, Chief Data Officer is more appropriate. It emphasizes the connection between the officer and data that will lead to an information-centric organization.

The Chief Data Officer should supervise all the different Big Data initiatives. He or she should craft and/or manage standard data operating procedures, data accountability policies, data quality standards, data privacy, and ethical policies, as well as understand how to combine different data sources within the organization. The CDO should have the total overview of what is going on within the organization related to data and represent this as a strategic business asset at board level.

The Chief Data Officer should also be responsible for defining the strategic Big Data priorities within the company. As the CDO has the best total overview, he or she can identify new business opportunities based on the available data and guide the different Big Data teams within the organization on which data to store, analyze, and use for what purposes. In the end, the CDO should be responsible for generating more revenue or decreasing costs through the use of data.

Another responsibility of the Chief Data Officer should be to adapt the four ethical guidelines discussed in Chapter 5. While driving the usage of data within the organization, the CDO should also be the data conscience of the company.

The Chief Data Officer should enforce radical transparency about what sort of data is collected and for what it is used. Consumers should be made aware of what data has been collected about them and allowed to delete it if the data is not stored anonymously. In addition, the CDO should enable users to easily adjust any privacy setting related to data collected. If this is done throughout the organization, it will build trust.

With all data collected and stored, security is crucial. Being hacked or having data stolen can seriously endanger the existence of a company. Therefore, this should be discussed at the highest level.

In the end, the CDO should be made accountable for whatever data is collected, as well as how it is stored, shared, sold, or analyzed. Big Data privacy and ethics are too important not to be discussed at the C-level.

The Chief Data Officer should also stipulate that all data within the organization be readily available to all departments (except, of course, sensitive data). Sharing data throughout the company can make the company more efficient and drive innovation. In organizations that store data and information in silos, the CDO should tackle the problem of separate data. So, the Chief Data Officer should share internal best practices and prevent the constant reinvention of the wheel.

Quite often, business and IT cannot agree on who owns what data within an organization. Therefore, the CDO should own all data because he or she also has the final responsibility for it. In the end, the Chief Data Officer should bridge the gap between IT and business. As Big Data should be a marketing and strategy matter, with support from IT, the CDO should be able to leverage and represent all different interests.

In addition to the above Chief Data Officer skill set, some other important qualifications should not be forgotten:

  • Strong leadership and C-suite board communication skills.
  • Experience in leading major information management programs in key business areas.
  • Expertise or familiarity with Big Data solutions, such as Hadoop, MapReduce, and/or HBase.
  • Experience in operationalizing data governance and data quality.
  • Familiarity with the major Big Data solutions and products available on the market.
  • Expertise in creating and deploying best practices and methodologies.
  • Knowledge about managing and leading technical Big Data teams across multiple departments.
  • Knowledge about building and supporting Big Data teams across the organization.
  • Knowledge about developing business cases for technical projects with a lot of uncertainties.
  • Familiarity with different modeling techniques, such as predictive modeling.

In the end, the Chief Data Officer should have a mix of technical and business backgrounds. He or she should not be too technical, as it could lead to excessive focusing on the bits and bytes instead of on the strategy. However, a CDO without technical background is not able to understand and talk to his or her Big Data team members. The balance between these skills is important so that the CDO will be able to smoothly navigate the technical and political hurdles.

The number of Chief Data Officers is growing, and more and more organizations are making a seat available for them in the boardroom. With the increasing importance of Big Data, more organizations should do so, as it will drive innovation, lead to an information-centric culture, and eventually impact profits positively.

Big Data Scientist

Big Data Scientist has been described as the sexiest job of the twenty-first century. Successful Big Data Scientists will be in high demand and will be able to earn lucrative salaries. But to be successful, they need a wide range of skills that until recently did not even fit into one department.

They need to be familiar with statistical, mathematical, and predictive modeling techniques, as well as with business strategy. They need to have the skills to build the algorithms necessary to ask the right questions and find the right answers. They must be able to communicate their findings orally and visually. They should also understand how products are developed and, even more important, as Big Data affects the privacy of consumers, they need to have a set of ethical responsibilities.

Apart from the skills that Big Data Scientists can learn at a college or university, they also should possess the following special set of personality traits. They need to be:

  • Curious. They should enjoy diving deep into material to find an answer to a yet unknown question, that is, a natural desire to go beneath the surface of a problem.
  • Thinkers, who ask the right (business) questions.
  • Confident and secure, as more often than not they will have to deal with situations with a lot of unknowns.
  • Patient, as finding the unknown in massive datasets will take a lot of time and developing the algorithm to uncover new insights will often occur by trial and error.82
  • Able to spot examples in totally different industries and adapt them to the problem at hand. For example, the Los Angeles Police Department uses an algorithm designed to predict earthquakes to anticipate where crimes are likely to happen.83

Big Data Scientists understand how to integrate multiple systems and datasets. They need to link and mash up distinctive datasets to discover new insights. This often requires connecting different types of datasets in different forms, as well as being able to work with potentially incomplete data sources and cleaning datasets.
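
As a minimal sketch of such a mash-up, assuming two invented record sets (a CRM extract and a web log), the snippet below outer-joins them on a shared key and cleans the gaps that either source leaves behind; every field name and record here is hypothetical.

```python
# Two incomplete, differently shaped sources; all fields are invented.
crm = [
    {"customer_id": 1, "name": "Alice", "segment": "SMB"},
    {"customer_id": 2, "name": "Bob",   "segment": None},   # incomplete record
]
weblog = [
    {"customer_id": 1, "visits": 42},
    {"customer_id": 3, "visits": 7},                        # unknown to the CRM
]

def mash_up(left, right, key="customer_id", default_segment="unknown"):
    """Outer-join two record lists on `key`, then fill the missing values."""
    merged = {row[key]: dict(row) for row in left}
    for row in right:
        merged.setdefault(row[key], {key: row[key]}).update(row)
    for row in merged.values():
        row.setdefault("visits", 0)            # no web activity recorded
        if not row.get("segment"):             # gap left by either source
            row["segment"] = default_segment
        row.setdefault("name", "unknown")
    return sorted(merged.values(), key=lambda r: r[key])

for row in mash_up(crm, weblog):
    print(row)
```

Real projects would of course use a proper data-wrangling library, but the core task is the same: link records across sources and make the incompleteness explicit rather than letting it silently distort the analysis.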

Of course, Big Data Scientists need to be able to program, preferably in different programming languages, such as Python, R, Java, Ruby, Clojure, Matlab, Pig, or SQL. They need to have an understanding of Hadoop, Hive, and/or MapReduce. In addition, they need to be familiar with many disciplines, such as:

  • Natural Language Processing: the interactions between computers and humans.
  • Machine learning: using computers to develop algorithms that improve automatically with data.
  • Conceptual modeling: the ability to articulate and share conceptual models.
  • Statistical analysis: to understand and work around possible limitations in models.
  • Predictive modeling: the ability to predict future outcomes of events.
  • Hypothesis testing: the ability to develop hypotheses and test them with careful experiments.

The exact background of Big Data Scientists is of less importance. Great Big Data Scientists can have backgrounds in different fields, including econometrics, physics, biostatistics, computer science, applied mathematics, and engineering. Most of the time, their educational background is a Master's Degree or even a PhD. However, to be successful, they should have at least some of the following capabilities:

  • Strong written and verbal communication skills.
  • The ability to work in a fast-paced multidisciplinary environment, as in a competitive landscape new data keeps flowing in rapidly and the world is constantly changing.
  • The ability to query databases and perform statistical analysis.
  • The ability to develop or program databases.
  • The ability to advise senior management in clear language about the implications of their work for the organization.
  • At least a basic understanding of how a business strategy works.
  • The ability to create examples, prototypes, and demonstrations to help management better understand the work.
  • A good understanding of design and architecture principles.
  • The ability to work autonomously.

In short, the Big Data Scientist needs an understanding of almost everything. Depending on the industry, the Big Data Scientist will need to specialize even further, as for example a marine Big Data Scientist requires a different set of skills than a historical Big Data Scientist.

Of course, the perfect Big Data Scientist who possesses all of the skills and capabilities described is extremely rare. Perhaps only a handful of Big Data Scientists have all the skills mentioned here. Organizations should pick and choose from this list what they deem most important in a Big Data Scientist, given the particular requirements of the job.

Big Data Analyst

If the Big Data Scientist is king, the Big Data Analyst is the servant. A Big Data Scientist requires a wide range of skills and capabilities in order to, among other things, mash up and analyze different data sources. The Big Data Analyst primarily analyzes data in a given system and helps the Big Data Scientist perform the necessary jobs.

The Big Data Analyst requires a specific set of skills and capabilities. In general, an analyst's next career step may be that of a Big Data Scientist. An analyst needs to be able to provide business and management with clear analyses of the data at hand. This includes data mining skills (including data auditing, aggregation, validation, and reconciliation), advanced modeling techniques, testing, and creating and explaining results in clear and concise reports.

The analyst should have a broad understanding of and experience with real-time analytics and business intelligence platforms, such as Tableau Software. He or she should be able to work with structured query language (SQL) databases and several programming languages and statistical software packages, such as R, Java, Matlab, and SPSS, and also needs knowledge of Hadoop and MapReduce. By using scripting languages, an analyst should be able to develop new insights from the available data.

The testing skills of a Big Data Analyst are particularly important. He or she should be able to perform A/B tests based on different hypotheses that directly and indirectly impact different KPIs. To perform such tests, as well as to build the reports that senior management needs, the analyst should have a certain business acumen: knowing what drives an organization, what influences a strategy, and how the available data within an organization can contribute to the success of that strategy.
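
A common way to evaluate such an A/B test is a two-proportion z-test on the conversion rates of the two variants. The sketch below uses only the standard library; the visitor and conversion counts are invented for illustration.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test: did variant B convert differently from A?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 2,000 visitors per variant, invented conversion counts.
z, p = two_proportion_z(conv_a=120, n_a=2000, conv_b=160, n_b=2000)
print(f"z = {z:.2f}, p = {p:.4f}")   # reject H0 at the 5% level if p < 0.05
```

Tying the tested hypothesis to a concrete KPI ("variant B lifts the conversion rate") is what turns this statistical routine into the kind of management-ready evidence the analyst is expected to deliver.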

The personality traits needed for a Big Data Analyst are similar to those of a Big Data Scientist. He or she needs the curiosity to dive into the available data and should enjoy searching for patterns that could indicate new insights. Analysts should also be confident and independent enough to work with very large datasets and come up with the questions that can help create management reports. Big Data Analysts generally have a Bachelor's Degree in subjects ranging from mathematics, statistics, and computer science to business administration, economics, and finance. In addition, a Big Data Analyst should have at least some of the following capabilities:

  • Strong interpersonal, oral and written communication, and presentation skills.
  • The ability to communicate complex findings and ideas in plain language.
  • The ability to work in teams toward a shared goal.
  • The ability to change direction quickly based on data analysis.
  • Enjoyment in discovering and solving problems.
  • The ability to proactively seek clarification of requirements and direction and to take responsibility.
  • The ability to work in stressful situations when insights into (new) datasets are required quickly.

The Big Data Analyst supports the business and the Big Data Scientist in delivering valuable insights. The analyst should therefore enjoy working with others and be willing to learn more. For each organization, an analyst will, of course, need different specialized skills, but the above-mentioned skills are a good starting point for finding the right Big Data Analyst.

Big Data Visualizer

One of the most important aspects of working with Big Data is the ability to visualize the information in a way that is understandable for (senior) management. Visualizing data helps them understand it and find new patterns and insights. Some Big Data startups are developing completely new ways of visualizing data. One of them is Ayasdi84; another is Synerscope.85 Both take a fresh approach and avoid old-fashioned and not-very-insightful graphs and pie charts. If your organization can gain valuable insights with interactive visuals, you can be one step ahead of your competition. A Big Data Visualizer can help in creating these important insights.

A Big Data Visualizer should be a creative thinker who understands user interface design, as well as other visualizations skills, such as typography, user experience design, and visual art design. These give the visualizer the ability to turn abstract information into appealing and understandable visualizations that clearly explain the results of the analyses.

There is, however, a potential problem. The Big Data Scientist is the person who best understands the results of the data and the story they tell, so when the results are handed over to the Big Data Visualizer, misinterpretation and biased presentation can occur. Therefore, a visualizer needs to understand how Big Data analyses are done, and he or she needs the programming skills to actually build accurate visualizations. A background in computer science can help a visualizer better understand what the data means.

A Big Data Visualizer should have a solid background in using source control and testing frameworks, as well as agile development practices, to create and build compelling data visualizations and the ability to lead and advise management on how the visualizations work. He or she should be able to tell an understandable and comprehensible story, one that can be understood by the decision makers within an organization.

Mapping data is the difficult process of transforming structured and unstructured data into graphics. A Big Data Visualizer should be able to use metadata and metrics, as well as color, size, and position, to highlight, group, and create hierarchy in the graphics. The visualizations should invite the user to play and interact with them.

As a Big Data Visualizer should be able to read the raw analyses, or even perform the analyses, as well as design, illustrate, and create the results, several skills are required:

  • In-depth knowledge of JavaScript, HTML, and CSS, as well as of statistical programming languages.
  • Familiarity with modern (JavaScript) visualization frameworks, such as Gephi, Processing, R, and/or d3js.
  • Experience with common web libraries, such as JQuery, LESS, and Functional Javascript.
  • Understanding of efficient and effective human–computer interaction.
  • Sharp analytical abilities and proven design skills.
  • A strong understanding of typography and how it can affect visualizations, principles of a good layout, a good use of space, and an inherent feel for motion.
  • Proficiency in Photoshop, Illustrator, and InDesign, as well as other Adobe Creative Suite products.
  • Excellent written and verbal communication skills, including the ability to explain the work in plain language to management people with no data experience.

In the end, the most important job of a Big Data Visualizer is to create compelling data visualizations out of abstract data that will help decision makers in their work. Exactly which skills are necessary depends, of course, on the type of job that needs to be done. A Big Data Visualizer, however, should always be able to select the best data visualization technique based on the characteristics of the underlying data to illustrate certainty, patterns, and other statistical concepts that will guide decision makers.

Big Data Manager

A team of Big Data Scientists, Big Data Analysts, and Big Data Visualizers, of course, needs to be managed. The Big Data Manager is the middleman between the technical team members and the strategic management of an organization. Therefore, he or she needs to understand both sides of the coin. Ideally, the Big Data Manager has an IT background combined with strategic experience.

Big Data Scientists and Analysts are great at building the necessary tools for an organization, but in general they tend to be less effective in leading a team and dealing with changes from the top down. Therefore, a manager needs to lead the team.

The Big Data Manager has to coordinate team efforts, reward and/or stimulate certain behaviors, and ensure that the team keeps moving in the right direction. He or she must create a culture of innovation and creativity within the team and guarantee that the members are comfortable with the rapid changes likely to occur in this new field. In addition, the manager needs to keep the team focused on what needs to be done, as it is easy for Big Data Scientists and Analysts to become distracted and lose focus while building great algorithms. Sometimes, such distractions are good for the company, but others do more harm than good. It is the manager's task to separate the good distractions from the bad.

As Big Data is most of all a marketing and strategic matter, the Big Data Manager needs to be able to explain the work done by the team members to the senior management of an organization. The manager coordinates the work of the stakeholders within the organization and ensures buy-in on time for the tasks at hand. As Big Data can affect any aspect or department within an organization, the manager needs good networking and strong communication skills. The manager should also be responsible for aligning the different data requests, as well as the different data sources within the organization.

As is always the case with new projects in a corporate environment, the expected ROI needs to be calculated. Although this is difficult in the beginning, it is the Big Data Manager's job to develop the business case, manage the planning and budget, decrease the risks involved, and ensure sufficient resource allocation. The team members should not be involved in project management. Rather, the manager should be a strong person who is able to stand firm in complex and fast-changing environments.

A Big Data Manager should, of course, also have the following core management skills.

  • The ability to communicate efficiently and effectively and be able to understand, interpret, and relate an organization's strategy and vision to the Big Data team.
  • The drive to build personal relationships with the team, as well as to promote a true Big Data culture.
  • The ability to be flexible in a changing environment and to explain changes correctly to the team members.
  • The ability to instill trust in team members, as well as help them grow in their role.

Another important skill is the ability to work with Big Data Scientists, Analysts, and Visualizers. Big Data Scientists, in particular, are quite often highly educated, and this requires a different management approach. A good manager will provide the best working environment for all team members, shielding them from complexities and administrative work and ensuring timely progress. As the manager will supervise technical projects, he or she should be familiar with several programming languages, including Python, R, Java, Ruby, Clojure, Matlab, Pig, and SQL, and have at least a basic understanding of Hadoop, Hive, and/or MapReduce. In addition, the manager needs to have at least some knowledge of the following disciplines.

  • Natural language processing
  • Machine learning
  • Conceptual modeling
  • Statistical analysis
  • Predictive modeling

The role of a Big Data Manager is a difficult one, so sufficient experience in management is advisable. If the manager is inexperienced, problems can develop because the tasks at hand are already complex and difficult.

Big Data Engineer

A Big Data Engineer builds what the Big Data solutions architect designs. Big Data Engineers develop, maintain, test, and evaluate Big Data solutions within organizations. Most of the time, they are also involved in the design of the solutions because of their experience with Hadoop-based technologies, such as MapReduce, Hive, MongoDB, and Cassandra. A Big Data Engineer builds large-scale data processing systems, is an expert in data warehousing solutions, and should be able to work with the latest (NoSQL) database technologies.

A Big Data Engineer should have experience with object-oriented design, coding, and testing patterns, as well as experience in engineering (commercial or open-source) software platforms and large-scale data infrastructures. Engineers should also have the ability to architect highly scalable distributed systems, using different open-source tools, and should have experience building high-performance algorithms.

A Big Data Engineer embraces the challenge of dealing with petabytes or even exabytes of data on a daily basis and understands how to apply technologies to solve Big Data problems and develop innovative solutions. To be able to do this, the engineer requires extensive knowledge of different programming and scripting languages, such as Java, C++, PHP, Ruby, Python, and R, as well as Linux, and of different (NoSQL or RDBMS) databases, such as MongoDB or Redis. Building data processing systems with Hadoop and Hive using Java or Python should be common knowledge to the engineer.

A Big Data Engineer generally works on implementing complex Big Data projects with a focus on collecting, parsing, managing, analyzing, and visualizing large sets of data to turn information into insights using multiple platforms. He or she should be able to decide on the required hardware and software design needs and develop prototypes and proofs of concept for the selected solutions.

Additional qualifications for the position include:

  • Enjoyment in being challenged and solving complex problems on a daily basis.
  • Excellent oral and written communication skills.
  • Proficiency in designing efficient and robust extract, transform, and load (ETL) workflows.
  • The ability to work in cloud computing environments.
  • A Bachelor's or Master's degree in computer science or software engineering.
  • The ability to work in teams and collaborate with others to clarify requirements.
  • The ability to assist in documenting requirements, as well as in resolving conflicts or ambiguities.
  • The ability to fine-tune Hadoop solutions to improve performance and the end-user experience.
  • Strong coordination and project management skills to handle complex projects.
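The extract, transform, and load (ETL) workflows mentioned in the qualifications above follow a simple three-stage pattern that can be sketched in Python. The record layout and the cleaning rule below are invented for illustration; a string stands in for the data source and a list for the target database:

```python
import csv
import io

# Extract: read raw rows from a CSV source.
raw = "order_id,amount\n1001,25.50\n1002,\n1003,40.00\n"

def extract(source):
    return list(csv.DictReader(io.StringIO(source)))

# Transform: drop rows with missing amounts and convert field types.
def transform(rows):
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # a robust workflow would log and skip bad records
        cleaned.append({"order_id": int(row["order_id"]),
                        "amount": float(row["amount"])})
    return cleaned

# Load: write the cleaned rows into the target store.
warehouse = []

def load(rows, target):
    target.extend(rows)

load(transform(extract(raw)), warehouse)
print(len(warehouse))  # 2 valid orders out of 3 raw rows
```

A production workflow adds scheduling, logging, and failure recovery around these same three stages, which is where the "efficient and robust" qualification comes in.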

Big Data Engineer is a technical role requiring substantial expertise in a wide range of software development and programming fields. He or she should have sufficient knowledge of Big Data solutions to be able to implement them on premises or in the cloud.

Big Data Consultant

The Big Data Consultant advises organizations about all aspects of Big Data, including how to formulate and implement a strategy and which technologies best fit the needs of the organization. A Big Data Consultant, therefore, should have business experience, as well as sound technical knowledge of a broad range of Big Data tools.

The Big Data Consultant designs strategies and programs to collect, store, analyze, and visualize data from various sources for specific projects. He or she should be able to lead a team and project, while ensuring quality and on-time delivery according to the specifications.

In addition, the consultant should produce accurate analyses of the datasets using smart algorithms and the latest Big Data technologies. He or she should be aware of the latest Big Data trends, as well as know which Big Data technologies are available on the market and would fit a chosen strategy. An understanding of the possibilities of public data is important as well. The consultant should be able to assess datasets and confirm the quality and correctness of the available data. Then, the consultant should be able to query the data, perform analyses, and present findings in clear and understandable language.

The Big Data Consultant should have sufficient technical knowledge. He or she needs to be able to program, preferably in several programming/scripting languages, such as Python, R, Java, Ruby, Clojure, Matlab, Pig, and SQL, as well as to understand Hadoop, Hive, HBase, MongoDB, and/or MapReduce. In addition, he or she needs to be familiar with disciplines such as text mining, clustering analysis, recommendation analysis, outlier detection, predictive analytics, and similarity searches, as well as different modeling techniques. The Big Data solution can be delivered on premises or in the cloud. Therefore, the consultant should also have experience with one of the large cloud-computing infrastructure solutions, such as Amazon Web Services or Elastic MapReduce.

The Big Data Consultant should enjoy working in teams, as he or she will need to interact with Big Data Scientists and Big Data Engineers to produce powerful data processes with real-time analytics and reporting applications, as well as to build the necessary hardware and software for the chosen solution.

The consultant also needs to communicate effectively at the executive level, as quite often he or she deals with executive management when advising an organization. It is important for the consultant to ask the right questions to understand the problem at hand and propose the right solution.

The goal is to improve business results using Big Data technology. Therefore, he or she needs to translate business issues into Big Data solutions that will help (senior) managers in their decision-making. A Big Data Consultant should also have, apart from technical expertise, the “standard” strategy consultant skills, including:

  • A Master's degree (or equivalent) from a leading university.
  • Excellent oral and written communication and interpersonal skills.
  • Commercial awareness and a natural curiosity in solving complex problems.
  • Enjoyment in working in a fast-changing and competitive environment.
  • The ability to handle multiple tasks and responsibilities.
  • The ability to work under pressure and meet regulatory deadlines.
  • Self-reliance and the capability to work both independently and as a team member.
  • The ability to deliver clear and concise presentations to (senior) management.
  • The ability to develop and review project plans, identifying and solving issues and communicating the status of assigned projects to users and managers.

The role of a Big Data Consultant is important for organizations that require help in understanding Big Data and how to apply it. In addition, a consultant can give organizations that already have a Big Data solution new insights, thereby improving business results. It is a difficult, but highly respected, role, as so many technical and business skills are required.

SMALL AND MEDIUM ENTERPRISES (SMEs) CAN ACHIEVE REMARKABLE RESULTS WITH BIG DATA

Quite often, I hear that SMEs cannot join the Big Data movement or cannot develop a Big Data strategy because they have too little data. A 2012 survey by SAP, however, shows that 76 percent of the interviewed C-level executives of SMEs view Big Data as an opportunity.86 Steve Lucas, EVP Business Analytics, Database & Technology at SAP, said: “Every company should be thinking about their Big Data strategy whether they are big or small.”87,88 Even companies with smaller amounts of data can develop a Big Data roadmap and become an information-centric organization. So, what are the Big Data opportunities for SMEs, and how can they leverage their “small data”?

By small data, I am not referring to IBM's definition of small data.89 IBM defines small data as low volumes, batch velocities, and structured varieties. Small data, however, can be any form of data, structured or unstructured, processed in real time or in batches. Small data simply refers to smaller volumes, that is, gigabytes and a few terabytes instead of petabytes or more.

It is true that some SMEs might not have that much data. However, even SMEs have suppliers or distributors. When these companies start to work together and share their data, the amount of available data increases many times. We also see this process happening at large multinationals. Nike, for example, shares data from all its suppliers with the rest of the industry.90 This allows other organizations in the supply chain to populate and use the database and make better decisions.

When SMEs start using and combining their data with that from suppliers and vendors, they suddenly have sufficient data to analyze, visualize, and use for improved decision making. They can also combine their existing “small data” with public datasets. These datasets are becoming increasingly available, and more and more public platforms exist from which SMEs can download datasets for free or buy additional ones. Merging their own data with public data increases the data available for analysis. An additional benefit is that combining existing data with new public datasets can create completely new results, such as finding new markets or target groups.

SMEs should not only look at the data they already have and collect but should also be open to new ways to collect data. Creativity is key in this matter, as in the end, any product can be turned into data once sensors join the game. Sensors are becoming cheaper every day, and adding sensors to existing products can deliver completely new datasets that yield unexpected insights.

Big Data is not only about volume and velocity but also about variety. The power of Big Data is the ability to combine unstructured and structured datasets to gain new insights. Unstructured data comes from a wide variety of sources, including social data, visuals, documents, emails, and even voice data. Combining several smaller datasets can deliver the same insights as combining large datasets. Gigabytes of data can therefore provide SMEs with the same insights as petabytes or exabytes do for large corporate multinationals.

As Jamie Turner from Real Business said: “with their limited resources […] flexibility and agility are crucial for SMEs.”91 They indeed have to look for solutions that fit their available resources. Instead of acquiring a complete Big Data solution from IBM, SAS, or HP, they can use cloud-based solutions created by the smaller, and thus more flexible, Big Data startups. In addition, they can build their own Big Data solution with open-source tools. Although the latter still requires specialized personnel, it does not have to cost a fortune anymore. Open-source tools are free of charge (though without any service or support), and commodity hardware becomes cheaper every day, as discussed in Chapter 4.

Big Data is certainly not only for large organizations. There are plenty of opportunities for SMEs to gain valuable insights from their existing data or from new datasets. SMEs do have to be a bit more creative to solve the Big Data puzzle. They have to think outside the box to see the data opportunities within their companies as well as outside their organizations. But, in the end, that is also true for large corporations if they want to take full advantage of Big Data.

So, small data can become Big Data by cleverly combining various datasets with different data formats. For example, combine weather data with your restaurant's sales data to discover the impact of rain on items sold and, as such, adjust your purchasing behavior. Combine your customer data with their sentiment online to surprise them and create long-lasting relationships. Track how your customers behave through your shop and combine it with your sales data to see how you can adjust and improve your floor plan. Or, combine online sales data with offline customer profiles to see how you can optimize your multichannel approach for your small retail shop. The opportunities are endless, and small data, too, can provide big insights.
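The restaurant example above boils down to a simple join of two small datasets on a shared key. The following Python sketch uses invented figures purely for illustration:

```python
# Hypothetical daily records, keyed by date: items sold and weather condition.
sales = {"2014-03-01": 120, "2014-03-02": 95, "2014-03-03": 180}
weather = {"2014-03-01": "rain", "2014-03-02": "rain", "2014-03-03": "sun"}

# Join the two datasets on date and average items sold per weather condition.
totals, days = {}, {}
for date, items_sold in sales.items():
    condition = weather.get(date)
    if condition is None:
        continue  # skip dates that have no matching weather record
    totals[condition] = totals.get(condition, 0) + items_sold
    days[condition] = days.get(condition, 0) + 1

averages = {condition: totals[condition] / days[condition] for condition in totals}
print(averages)  # rainy days average fewer items sold than sunny days
```

With a full season of data, the same join would reveal whether rain reliably depresses sales, which is exactly the kind of purchasing insight the example describes.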

GOVERNANCE FOR THE ACCURACY OF BIG DATA

With large volumes of data used as an asset within your organization comes the responsibility to ensure that the data, as well as the analyses performed on it, are correct. Big Data governance should therefore be an important aspect of your strategy. The data governance structure within your organization should be capable of dealing with high volumes of a wide variety of data, all of which needs to be checked and verified for correctness. This requires extensive and smart algorithms that perform dazzling analyses within a fraction of a second and deliver predictions and visualizations that are used to determine the course of an organization and that can affect many stakeholders.

It is vital that the data organizations collect, store, and analyze is 100 percent correct. For information-centric organizations that base decisions on algorithms, it is crucial that the algorithms and their (predictive) analyses are accurate. But who is capable of checking and controlling thousands of petabytes of data, or extensive and extremely complex algorithms that improve over time? How do we ensure that consumer data is kept secure and private and is not mistreated? How do we guarantee that the predictions made are based on the right variables? How do we know that green is really green and not perhaps red?

The era of Big Data will require a new form of governance consisting of auditing and control, of checks and balances and, perhaps, of quality labels for organizations as well. An ISO for Big Data? This could lead to a completely new field within the global Big Data industry. When organizations start to place Big Data on the balance sheet, auditing and regulatory organizations will pay very close attention to how the data is collected, stored, analyzed, and visualized, as it can make or break an organization.

The Data

Responsibility for the reliability and accuracy of the data collected and stored begins with the user, who provides the data through different applications, and extends to the organization that collects, stores, and analyzes it. Organizations will have to guarantee that users understand what data is collected, when, and for what purpose. They should inform users how the data will be used in the first instance, but also what secondary usage will be made of the data the moment this becomes clear. As discussed, this should be done through clear and understandable privacy policies and terms and conditions that can also be understood by the digital immigrant. Users should also be kept up to date via email when the usage of their data changes. Making these documents difficult to understand and/or changing them rapidly does not fit with an organization that wants to make the most of Big Data. Organizations should therefore make it extremely easy for users to adjust their privacy settings, as well as to delete or edit their data whenever they see fit. Organizations should not be allowed to push the responsibility onto the users.

Taking responsibility for how the data is used is only one aspect of data governance. Another is that organizations, as well as governments, should do everything to ensure that the data they collect is accurate. If this is not done correctly, it can go very wrong, as shown by the example of U.S. Senator Edward Kennedy, who was refused entrance at several airports in 2004 because his name somehow appeared on a terrorist watchlist.92 Most of the time, users do not know what information is collected, but more importantly, they do not always have access to their own data. Users should therefore be able to correct data if they discover it is incorrect or can be misread. This principle is incorporated into the Fair Credit Reporting Act (FCRA), which requires credit reporting agencies to provide consumers with access to their reports so they can have inaccuracies corrected. But this applies only to credit reporting agencies, and the law was originally passed in 1970, long before other industries started collecting massive amounts of data. So, nowadays, the general public has no clue which entities are collecting what data and what they are, or will be, doing with it. Of course, giving consumers the ability to adjust incorrect data should require several security measures, so as to prevent misuse by criminals.

Organizations that collect and store data should take the necessary security steps to ensure that data is stored securely and cannot be stolen by criminals. Just as banks do everything to protect the money they have received from consumers and organizations and indemnify consumers and organizations when a bank is robbed, organizations should protect the data they have collected and indemnify users when their data is stolen.

The Algorithms

Algorithms are capable of amazing analyses and can turn massive amounts of raw data into information. The first step is to ensure that the data used by the algorithms is correct. The second is to guarantee that the algorithms themselves are correct. How do managers and consumers know that an algorithm works properly? How do they know that green is really green? If major business decisions are based on an incorrect algorithm, it could have massive consequences for organizations as well as for consumers. Consumers who apply for a loan have to trust that the organizations using algorithms determine their risk profiles correctly, and that they will not be refused a loan or have to pay more because of inaccurate data.

Big Data technology vendors that have developed algorithms should receive a quality label that confirms their algorithms are working correctly and appropriately, serving the purposes for which they were designed. Organizations that use an algorithm developed by a Big Data technology vendor that already has such a quality label can be trusted more and are more likely to receive a positive assessment by the Big Data regulators.

Organizations that develop algorithms in-house should also have them checked to confirm that they meet local regulations. The authors of the book Big Data, Viktor Mayer-Schönberger and Kenneth Cukier, therefore predict the rise of “algorithmists,” who would be capable of, and allowed to, check any algorithm created by organizations.93 These “algorithmists” would be knowledgeable about the different Big Data techniques that are available and would specialize in different areas to be able to read and assess algorithms. As algorithms are private company information, these “algorithmists” should sign nondisclosure agreements, just as conventional accountants do.

Users will have more confidence in organizations that have had their algorithms checked and approved, as they know that their data is being analyzed correctly.

The Data Auditors

Data auditors, who can be internal or external, are responsible for ensuring:

  1. The correctness of the data and that it is properly secured.
  2. The correctness of the algorithms performing the analyses on the data.
  3. That the organization observes the four ethical guidelines.

Auditors can perform different levels of scrutiny. Organizations that deal with highly private personal information, such as health records or financial data, should undergo the strictest assessment, while organizations that use the data for innocent mobile applications can have less strict regulations. How these assessments or regulations look will differ by country but, in the end, a global set of data governance standards should be developed, similar to the International Financial Reporting Standards (IFRS) or the Generally Accepted Accounting Principles (GAAP).

TAKEAWAYS

Big Data will change how organizations are structured and managed. It will affect all departments, from those that deal with the core activities of an organization, such as operations or manufacturing of products, to supporting departments, such as human resources.

The challenge organizations will face in the coming years is how to become information-centric organizations that make decisions based on massive amounts of data collected in real time. Although the number of organizations currently taking full advantage of Big Data is still small, this will change in the future. The result will be that all companies, including SMEs, will be able to take full advantage of Big Data, regardless of their industry.

The roadmap to an information-centric organization is a long and difficult one, but it is worth pursuing. Research has shown that organizations that have successfully implemented a Big Data strategy outperform their competitors by 20 percent.94 The objective should be to eventually determine the return on data and put data on the balance sheet as an asset. As a result, organizations will be held accountable by data auditors, who will ensure that data is collected and stored correctly and securely and guarantee that the algorithms used are performing as intended.

In order to develop and implement a Big Data strategy, organizations will require several new types of employees with different competencies. The most important will be the Big Data Scientists, who have the ability to build the tools that give organizations the insights necessary to improve their bottom lines.

In the end, this roadmap can help your organization develop and implement a Big Data strategy that is good for the company, good for your customers, and good for society. Big Data is too important and has too many implications and advantages to be ignored.

Organizations will find many different uses for Big Data, including obtaining a 360-degree view of all their customers. Chapter 7 dives deeper into the different business use cases for different industries.
