
Introduction

This book was built for those of you who are searching. Those of you who are wondering. Searching and wondering what on earth big data will mean for your data world. It takes a different approach, however, than the litany of titles designed to spend hundreds of pages beating you over the head telling you that you need big data, that everyone is doing it, and that you have to be “cool,” too!

This author team wanted to create a go-to resource for moving on from your existing relational world, one that provides not only the roadmap forward but also practical experience for those of you who don't need the “click here, move the mouse to the left, and click again” level of instruction. We do explain some things in greater detail, but only where their newness or relative complexity requires it.

We are focused on making sure you can ease your transition to using these tools and technologies because we have been where you are. Your boss came back from a conference and said, “We need a big data solution.” When you inquire what he would like it to solve, he doesn't really know, but he knows how critical it is that the organization have one. You will become the responsible party for making these big data dreams come true.

Normally, this would entail training classes and long hours combing the Internet, like you did when they told you they needed a data warehouse or a cube, those other words once foreign to you. You will learn through this text that big data is really big (no pun intended). It can do big things, solve big problems, and is a big ecosystem of tools and platforms. However, like most other ecosystems (RDBMSs, programming languages, mobile, and cloud), there are really only a few foundational things, and if you can come up to speed on those, you will be rocking and rolling when you need to apply more advanced tools, automation, and so on.

Our Team

We have assembled a strong international team of authors to make sure that we can provide a sound perspective and knowledge transfer on the right topics. Those topics include:

  1. Accelerated overview of Big Data, Hadoop, NoSQL, and key industry knowledge
  2. Key problems people are trying to solve and how to identify them
  3. Delivering big data in a Microsoft world
  4. Tool and platform choice
  5. Installation, configuration, and exploration
  6. Storing and managing big data
  7. Working with, adding structure to, and cleansing your data
  8. Big data and SQL Server together
  9. Analytics in the big data world
  10. How this works in the cloud
  11. Case studies and real-world applications
  12. Moving your organization forward in this new world

This team includes members of Pragmatic Works, a global leader in information services, software, and training; Microsoft Research; Microsoft Consulting Services; Azure Customer Advisory Team; and some other industry firms making a big impact in this expanding space.

All Kidding Aside

Big data is coming on strong. You will have these solutions in your environment within 24 months, and you should be prepared. This book is designed to help you make the transition, with practical skills, from a relational to a more “evolved” view of the data world. This includes solutions that handle data that does not fit nicely into a tabular structure but is nonetheless, in some cases, just as important as or more important than the data that you have curated so carefully for so many years.

You will learn some new terms as well. This will be almost as much a vocabulary lesson as a technical lesson.

Who Is This Book For?

This book is for those data developers, power users, and executives looking to understand how these big data technologies will impact their world and how to properly approach solutions in this new ecosystem. Readers will need a basic understanding of data systems and a passion for learning new technologies and techniques. Some experience with developing database or application solutions will be helpful in some advanced topic areas.

What You Need to Use This Book

We have designed this book to make extensive use of cloud resources, so you will need a newer-model computer (PC or Mac) that can access the Internet reliably. In addition, you will want to be able to install additional programs and tools as advised by the authors, so please ensure you have that access on the machine you're using. Different chapters will have different tools or data sets, so please follow the authors' instructions in those chapters to get the most out of your experience. Access to a SQL Server database will be required in certain chapters, and if you wish to set up your environment on premises, a virtualization technology such as Hyper-V, VMware, or VirtualBox is recommended.

Chapter Overview

Now we'll go through the chapters in this text and discuss what you'll be learning from each one.

  • Chapter 1: Industry Needs and Solutions

    No book on big data would be complete without some coverage of the history, origins, and use cases in this ecosystem. We also need to discuss the industry players and platforms that are in scope for the book. Other books spend five to six chapters rehashing this information; we have done it efficiently for you so you can get to work on more fun topics!

  • Chapter 2: Microsoft's Approach to Big Data

    Doing this in a Microsoft world is a little different than the traditional UNIX or Linux deployment. We chose this approach because we feel it makes this technology more accessible to millions of Windows administrators, developers, and power users. Many folks were surveyed before this writing, and we heard overwhelmingly that we needed a Windows-focused solution to help the largest population of enterprise users access this new technology.

  • Chapter 3: Installing HDInsight

    In this chapter, you'll get started configuring your big data environment.

  • Chapter 4: HDFS, Hive, HBase and HCatalog

    These are some key data and metadata technologies. We'll make sure you understand when to use each one and how to get the most out of them.

  • Chapter 5: Storing and Managing Data in HDFS

    A distributed file system might be a new concept for most readers, so we are going to make sure we go through this core component of Hadoop and ensure you're prepared for designing with this incredible feature.

  • Chapter 6: Adding Structure with Hive

    We need to go deeper into Hive because you'll use it a lot. Let's dive in with this chapter to make sure you understand commands and the logic behind using Hive efficiently.

  • Chapter 7: Expanding Your Capability with HBase and HCatalog

    Dealing with large tables and metadata requires some new tools and techniques. HBase and HCatalog will help you manage these types of challenges, and we're going to take you through using them. Get ready to put the BIG in big data.

  • Chapter 8: Effective Big Data ETL with SSIS, Pig, and Sqoop

    We have to load this data, and there is no better way to do it than with our ETL expert authors. Come along while they take you through using favorite and familiar tools, along with some new ones, to load data quickly and effectively.

  • Chapter 9: Data Research and Advanced Data Cleansing with Pig and Hive

    Now we've installed, configured, explored, and loaded some data. Let's get busy researching and cleansing this data with our new tools and platform.

  • Chapter 10: Data Warehouses and Hadoop Integration

    How do SQL Server and business intelligence fit in with big data? Very closely. Most of the time they will work in tandem. We will show you when to use each solution and how they work together in scale-up and scale-out solutions.

  • Chapter 11: Visualizing Big Data with Microsoft BI

    Now that we have the analysis, how do we visualize this for our users? Do we have new tools? Do we use our familiar tools? Yes! Let's do this together so we can understand how to combine these solutions for the best results for our users and customers.

  • Chapter 12: Big Data Analytics

    You've heard about analytics. This chapter includes advanced statistical analysis, social sentiment analysis, forecasting, modeling, and much more! No PhD required.

  • Chapter 13: Big Data in the Cloud

    Do you need a lot of servers in your data center to do the things in this book? No way! We can do it in the cloud in an elastic and scalable fashion.

  • Chapter 14: SQL Server Big Data Case Examples

    How are other firms succeeding and failing in this ecosystem? We will take you through some of the best wins and losses, and why these outcomes happened, so you can model your own efforts after them or avoid them.

  • Chapter 15: Building and Executing Your Big Data Plan

    How do we take what we've done and make it real? This chapter will help you write your big data plan.

  • Chapter 16: Operational Big Data Management

    Administering these technologies and integrating them into your existing infrastructure will take planning and careful execution, just like your other critical systems. Let's plan this out together!

Features Used in This Book

The following features and icons are used in this book to help draw your attention to some of the most important or useful information in the book:

Be sure to take heed when you see one of these asides. When particular steps could cause damage to your electronics if performed incorrectly, you'll see one of these asides.

These asides contain quick hints about how to perform simple tasks that might prove useful for the task at hand.

These asides contain additional information that may be of importance to you, including links to videos and online material that will make it easier to follow along with the development of a particular project.

SAMPLE HEADING

These asides go into additional depth about the current topic or a related topic.
