Contents

Part I What Is Big Data?

Chapter 1 Industry Needs and Solutions

What's So Big About Big Data?

A Brief History of Hadoop

What Is Hadoop?

Derivative Works and Distributions

Hadoop Distributions

Core Hadoop Ecosystem

Important Apache Projects for Hadoop

The Future for Hadoop

Chapter 2 Microsoft's Approach to Big Data

A Story of “Better Together”

Competition in the Ecosystem

SQL on Hadoop Today

Hortonworks and Stinger

Cloudera and Impala

Microsoft's Contribution to SQL in Hadoop

Deploying Hadoop

Deployment Factors

Deployment Topologies

Deployment Scorecard

Part II Setting Up for Big Data with Microsoft

Chapter 3 Configuring Your First Big Data Environment

Getting Started

Getting the Install

Running the Installation

On-Premise Installation: Single-Node Installation

HDInsight Service: Installing in the Cloud

Windows Azure Storage Explorer Options

Validating Your New Cluster

Logging into HDInsight Service

Verify HDP Functionality in the Logs

Common Post-Setup Tasks

Loading Your First Files

Verifying Hive and Pig

Part III Storing and Managing Big Data

Chapter 4 HDFS, Hive, HBase, and HCatalog

Exploring the Hadoop Distributed File System

Explaining the HDFS Architecture

Interacting with HDFS

Exploring Hive: The Hadoop Data Warehouse Platform

Designing, Building, and Loading Tables

Configuring the Hive ODBC Driver

Exploring HCatalog: HDFS Table and Metadata Management

Exploring HBase: An HDFS Column-Oriented Database

Columnar Databases

Defining and Populating an HBase Table

Using Query Operations

Chapter 5 Storing and Managing Data in HDFS

Understanding the Fundamentals of HDFS

HDFS Architecture

NameNodes and DataNodes

Data Replication

Using Common Commands to Interact with HDFS

Interfaces for Working with HDFS

File Manipulation Commands

Administrative Functions in HDFS

Moving and Organizing Data in HDFS

Moving Data in HDFS

Implementing Data Structures for Easier Management

Rebalancing Data

Chapter 6 Adding Structure with Hive

Understanding Hive's Purpose and Role

Providing Structure for Unstructured Data

Enabling Data Access and Transformation

Differentiating Hive from Traditional RDBMS Systems

Working with Hive

Creating and Querying Basic Tables

Creating Databases

Creating Tables

Adding and Deleting Data

Querying a Table

Using Advanced Data Structures with Hive

Setting Up Partitioned Tables

Loading Partitioned Tables

Creating Indexes for Tables

Chapter 7 Expanding Your Capability with HBase and HCatalog

Creating HBase Tables

Loading Data into an HBase Table

Performing a Fast Lookup

Loading and Querying HBase

Managing Data with HCatalog

Working with HCatalog and Hive

Defining Data Structures

Creating Indexes

Creating Partitions

Integrating HCatalog with Pig and Hive

Using HBase or Hive as a Data Warehouse

Part IV Working with Your Big Data

Chapter 8 Effective Big Data ETL with SSIS, Pig, and Sqoop

Combining Big Data and SQL Server Tools for Better Solutions

Why Move the Data?

Transferring Data Between Hadoop and SQL Server

Working with SSIS and Hive

Connecting to Hive

Configuring Your Packages

Loading Data into Hadoop

Getting the Best Performance from SSIS

Transferring Data with Sqoop

Copying Data from SQL Server

Copying Data to SQL Server

Using Pig for Data Movement

Transforming Data with Pig

Using Pig and SSIS Together

Choosing the Right Tool

Use Cases for SSIS

Use Cases for Pig

Use Cases for Sqoop

Chapter 9 Data Research and Advanced Data Cleansing with Pig and Hive

Getting to Know Pig

When to Use Pig

Taking Advantage of Built-in Functions

Executing User-defined Functions

Building Your Own UDFs for Pig

Data Analysis with Hive

Types of Hive Functions

Extending Hive with Map-reduce Scripts

Creating a Custom Map-reduce Script

Creating Your Own UDFs for Hive

Part V Big Data and SQL Server Together

Chapter 10 Data Warehouses and Hadoop Integration

State of the Union

Challenges Faced by Traditional Data Warehouse Architectures

Technical Constraints

Business Challenges

Hadoop's Impact on the Data Warehouse Market

Keep Everything

Code First (Schema Later)

Model the Value

Throw Compute at the Problem

Introducing Parallel Data Warehouse (PDW)

Why Is PDW Important?

Project Polybase

Polybase Architecture

Business Use Cases for Polybase Today

Speculating on the Future for Polybase

Chapter 11 Visualizing Big Data with Microsoft BI

An Ecosystem of Tools

Reporting Services

Self-service Big Data with PowerPivot

Setting Up the ODBC Driver

Updating the Model

Adding Measures

Creating Pivot Tables

Rapid Big Data Exploration with Power View

Spatial Exploration with Power Map

Chapter 12 Big Data Analytics

Data Science, Data Mining, and Predictive Analytics

Predictive Analytics

Introduction to Mahout

Building a Recommendation Engine

Getting Started

Running a User-to-user Recommendation Job

Running an Item-to-item Recommendation Job

Chapter 13 Big Data and the Cloud

Defining the Cloud

Exploring Big Data Cloud Providers

Setting Up a Big Data Sandbox in the Cloud

Getting Started with Amazon EMR

Getting Started with HDInsight

Storing Your Data in the Cloud

Uploading Your Data

Exploring Big Data Storage Tools

Integrating Cloud Data

Other Cloud Data Sources

Chapter 14 Big Data in the Real World

Common Industry Analytics

IT/Hosting Optimization

Marketing Social Sentiment

Operational Analytics

A New Ecosystem of Technologies

Part VI Moving Your Big Data Forward

Chapter 15 Building and Executing Your Big Data Plan

Gaining Sponsor and Stakeholder Buy-In

Problem Definition

Scope Management

Stakeholder Expectations

Defining the Criteria for Success

Identifying Technical Challenges

Environmental Challenges

Challenges in Skillset

Identifying Operational Challenges

Planning for Setup/Configuration

Planning for Ongoing Maintenance

The HandOff to Operations

After Deployment

Chapter 16 Operational Big Data Management

Hybrid Big Data Environments: Cloud and On-Premise Solutions Working Together

Ongoing Data Integration with Cloud and On-Premise Solutions

Integration Thoughts for Big Data

Backups and High Availability in Your Big Data Environment

High Availability

Disaster Recovery

Big Data Solution Governance

Creating Operational Analytics

System Center Operations Manager for HDP

Installing the Ambari SCOM Management Pack

Monitoring with the Ambari SCOM Management Pack
