Table of Contents for Data Pipelines Pocket Reference
by James Densmore
Preface
Who This Book Is For
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
1. Introduction to Data Pipelines
What Are Data Pipelines?
Who Builds Data Pipelines?
SQL and Data Warehousing Fundamentals
Python and/or Java
Distributed Computing
Basic System Administration
A Business Goal Mentality
Why Build Data Pipelines?
How Are Pipelines Built?
2. A Modern Data Infrastructure
Diversity of Data Sources
Source System Ownership
Ingestion Interface and Data Structure
Data Volume
Data Cleanliness and Validity
Latency and Bandwidth of the Source System
Cloud Data Warehouses and Data Lakes
Data Ingestion Tools
Data Transformation and Modeling Tools
Workflow Orchestration Platforms
Directed Acyclic Graphs
Customizing Your Data Infrastructure
3. Common Data Pipeline Patterns
ETL and ELT
The Emergence of ELT over ETL
EtLT Subpattern
ELT for Data Analysis
ELT for Data Science
ELT for Data Products
4. Data Ingestion
Setting Up Your Python Environment
Setting Up Cloud File Storage
Configuring an Amazon Redshift Warehouse as a Destination
Configuring a Snowflake Warehouse as a Destination
Extracting Data from a MySQL Database
Full or Incremental MySQL Table Extraction
Binary Log Replication of MySQL Data
Extracting Data from a Postgres Database
Full or Incremental Postgres Table Extraction
Replicating Data Using the Write Ahead Log
Extracting Data from MongoDB
Extracting Data from a REST API
Streaming Data Ingestions with Kafka and Debezium
Loading Data into a Redshift Data Warehouse
Loading Raw Data Stored in CSV Files
Incremental vs Full Loads
Loading Data Extracted from a Change Data Capture Log
Loading Data into a Snowflake Data Warehouse
Using Your File Storage as a Data Lake
Open Source Frameworks
Commercial Alternatives
5. Transforming Data
Non-Contextual Transformations
Deduplicating Records in a Table
Parsing URLs
When to Transform? During or After Ingestion?
Data Modeling Foundations
Key Data Modeling Terms
Modeling Fully Refreshed Data
Slowly Changing Dimensions for Fully Refreshed Data
Modeling Incrementally Ingested Data
Modeling Append-Only Data
Modeling Change Capture Data