Part III: Advanced Topics
by Dionysios Logothetis, Roman Shaposhnik, Claudio Martella
Practical Graph Analytics with Apache Giraph
Cover
Title
Copyright
Contents at a Glance
Contents
About the Authors
About the Technical Reviewer
Introduction
Annotation Conventions
Part I: Giraph Building Blocks
Chapter 1: Introducing Giraph
Data, Data, Data
From Big Data to Big Graphs
Why Giraph?
Giraph and the Hadoop Ecosystem
Giraph and Other Graph-Processing Tools
Summary
Chapter 2: Modeling Graph Processing Use Cases
Graphs Are Everywhere
Modeling a Computer Network with a Simple, Undirected Graph
Modeling a Social Network and Relationships
Modeling Semantic Graphs with Multigraphs
Modeling Street Maps with Graphs and Weights
Comparing Online and Offline Computations
Fitting Giraph in an Application
Giraph at a Web-Search Company
Giraph at an E-Commerce Company
Giraph at an Online Social Networking Company
Summary
Chapter 3: The Giraph Programming Model
Simplifying Large-Scale Graph Processing
Hiding the Complexity of Parallel, Distributed Computing
Programming through a Graph-Specific Model Based on Iterations
A Vertex-centric Perspective
The Giraph Data Model
A Computation Based on Messages and Supersteps
Reducing Messages with a Combiner
Computing Global Functions with Aggregators
The Anatomy of a Giraph Computation
Computing In-Out-Degrees
Converting a Directed Graph to Undirected
Understanding the Bulk Synchronous Parallel Model
Summary
Chapter 4: Giraph Algorithmic Building Blocks
Designing Graph Algorithms That Scale
Exploring Connectivity
Computing Shortest Paths
Computing Connected Components
Ranking Important Vertices with PageRank
Ranking Web Pages
PageRank
Predicting Ratings to Compute Recommendations
Modeling Ratings with Graphs and Latent Vectors
Minimizing Prediction Error
Identifying Communities with Label Propagation
Characterizing Types of Graphs and Networks
Summary
Part II: Giraph Overview
Chapter 5: Working with Giraph
“Hello World” in Giraph
Defining the Twitter Followership Graph
Creating Your First Graph Application
Launching Your Application
Counting the Number of Twitter Connections
Turning Twitter into Facebook
Changing the Graph Structure
Sending and Combining Multiple Messages
Unit-Testing Your Giraph Application
Beyond a Single Vertex View: Enabling Global Computations
Using Aggregators
Aggregators and Master Compute
A Real-World Example: Shortest Path Finder
Summary
Chapter 6: Giraph Architecture
Genesis of Giraph
Giraph Building Blocks and Concepts
Masters
Workers
Coordinators
Bootstrapping Giraph Services
Anatomy of Giraph Services
Master Services
Worker Services
Coordination Services
Fault Tolerance
Disk Failure
Node Failure
Network Failure
Summary
Chapter 7: Graph IO Formats
Graph Representations
Input Formats
Vertex-Based Input Formats
Edge-Based Input Formats
Combining Input Formats
Input Filters
Output Formats
Vertex-Based Output Formats
Edge-Based Output Formats
Aggregator Writers
Summary
Chapter 8: Beyond the Basic API
Graph Mutations
The Mutation API
Direct Mutations
Mutation Requests
Mutation Through Messages
Resolving Mutation Conflicts
The Aggregator API
Centralized Algorithm Coordination
Halting the Computation
Using Aggregators for Coordination
Writing Modular Applications
Structuring an Algorithm into Phases
The Composable API
Summary
Part III: Advanced Topics
Chapter 9: Exposing Parallelism in Giraph
Worker Computations
Use Case: Sharing Data Across a Worker
Use Case: Per-Worker Performance Statistics
Thread Safety in Giraph
Controlling Graph Partitioning
The Importance of Partitioning
Implementing Custom Partitioners
Partition Balancing
Summary
Chapter 10: Advanced IO
Accessing Data in Hive
Reading Input Data
Writing Output Data
Accessing Data in Gora
Reading Input Data
Writing Output Data
Summary
Chapter 11: Tuning Giraph
Key Giraph Performance Factors
Giraph’s Requirements for Hadoop
Hardware-related Choices
Job-related Choices
Tuning Your Data Structures
The OutEdges Interface
The MessageStore Interface
Going Out-of-Core
Out-of-Core Graph
Out-of-Core Messages
Giraph Parameters
Summary
Chapter 12: Giraph in the Cloud
A Quick Introduction to Cloud Computing
Giraph on the Amazon Web Services Cloud
Before You Begin
Creating Your First Cluster on the Amazon Cloud
The Building Blocks of an EMR Cluster
The Composition of an EMR Cluster: Instance Groups
Deploying Giraph Applications onto an EMR Cluster
EMR Cluster Data Processing Steps
When Things Go Wrong: Debugging EMR Clusters
Where’s My Stuff? Data Migration to and from EMR Clusters
Putting It All Together: Ephemeral Graph Processing EMR Clusters
Getting the Most Bang for the Buck: Amazon EMR Spot Instances
One Size Doesn’t Fit All: Fine-Tuning Your EMR Clusters
Summary
Appendix A: Install and Configure Giraph and Hadoop
System Requirements
Hadoop Installation
Giraph Installation
Installing the Binary Release of Giraph
Installing Giraph As Part of a Packaged Hadoop Distribution
Installing Giraph by Building from Source Code
Fundamentals of Hadoop and Hadoop Ecosystem Projects Configuration
Configuring Giraph
Configuring Hadoop
Configuring Hadoop in Pseudo-Distributed Mode
Summary
Index
PART III
Advanced Topics