Book Description

Perform fast interactive SQL analytics against different data sources using the Presto distributed SQL query engine. With this practical book, you’ll learn how to conduct analytics on data where it lives, including Hive, Cassandra, relational databases, and proprietary data stores. Matt Fuller from Starburst Data and Presto cocreator Martin Traverso show analysts how to manage, use, and even develop with Presto.

Initially developed by Facebook, open source Presto is now used by Netflix, Airbnb, LinkedIn, Twitter, Uber, and many other companies. You’ll learn how a single Presto query can combine data from multiple sources to allow for analytics across your entire organization.

This book will help you:

  • Get started using Presto
  • Explore Presto architectural concepts
  • Learn best practices and tuning
  • Use Presto with various business intelligence and SQL analytical tools
  • Query data from different data sources, including query federation
  • Learn how to use Presto on Amazon Web Services, Microsoft Azure, and Google Cloud Platform

With Early Release ebooks, you get books in their earliest form (the authors' raw and unedited content as they write), so you can take advantage of these technologies long before the official release of these titles.

Table of Contents

  1. 1. Getting Started
    1. Downloading and Using Presto
      1. Using Presto in a Docker Container
      2. Detailed Installation
    2. Presto Command Line Interface
      1. Presto Statements
      2. Presto CLI Options
      3. Executing Queries via the Presto CLI
      4. Using Presto on a Cluster of Machines
      5. Other Installation Methods
      6. Cloud Installation
  2. 2. Data Organization in Hive Connector
    1. Overview
      1. Apache Hadoop and Hive
      2. Hive Connector for Presto
      3. Hive-Style Table Format
    2. File Formats and Compression
      1. Summary
  3. 3. Presto Architecture
    1. Presto Distributed Components
      1. Presto Coordinator
      2. Presto Worker
    2. Presto Connector Based Architecture
      1. Catalogs, Schemas and Tables
    3. Query Execution Model
  4. 4. Data Definition
    1. Overview
      1. Catalogs
      2. Schema Definition
      3. Table Definitions
      4. DDL with Connectors
      5. Data Types
  5. 5. Queries in Presto
    1. SELECT Statement Basics
      1. WHERE Clause
      2. GROUP BY and HAVING Clauses
      3. ORDER BY and LIMIT Clauses
      4. Joins
      5. UNION, INTERSECT, and EXCEPT Clauses
      6. Grouping Operations
      7. WITH Clause
      8. Subqueries
      9. Quantified Subquery
      10. Unnesting Complex Data Types in Presto
      11. Information Schema
      12. ANSI SQL compliance
  6. 6. Functions and Operators
    1. Scalar Functions and Operators
      1. Operators
      2. Boolean Operators
      3. [NOT] BETWEEN
      4. IS [NOT] NULL
      5. Mathematical Functions and Operators
      6. String Functions and Operators
      7. Unicode
      8. Regular Expressions
    2. JSON Functions
    3. Date and Time Functions and Operators
      1. Temporal Operators
      2. Time and Date Parsing
      3. Temporal Functions
    4. Aggregate Functions
      1. Map Aggregate Functions
    5. Approximate Aggregate Functions
  7. 7. Clients and Tools
    1. Presto Command Line Interface
    2. Presto Statements
      1. Examples
    3. Presto CLI Options
      1. Presto CLI Pager
      2. Presto CLI History
      3. Additional Diagnostics
    4. Executing Queries via the Presto CLI
      1. Output Formats
      2. Ignoring Errors
    5. Presto JDBC and ODBC Drivers
      1. Presto JDBC Driver
      2. Using the JDBC Driver in your Java Code
      3. Prepared Statements in Presto
      4. Presto ODBC Driver
    6. Using Microsoft Excel and Presto
    7. Presto’s RESTful API
  8. 8. Connectors
    1. Distributed Storage
    2. Relational Database Management Systems
      1. Query Pushdown
      2. Parallelism and Concurrency
      3. Presto RDBMS Connectors
      4. Security
    3. Non-Relational Data Sources
      1. Key Value Stores
      2. Streaming Systems and Document Stores
      3. Other Connectors
  9. 9. Federated Queries
    1. Query Federation in Presto
      1. Extract, Transform, Load
      2. Federated Architecture
      3. Security in Federation
  10. 10. Presto and Amazon Web Services
    1. Presto Architecture on AWS
      1. Presto on AWS EC2
      2. Presto Solutions on AWS
      3. Starburst
      4. Amazon EMR
      5. Persisting Data and Metadata
      6. AWS Glue Data Catalog
      7. AWS S3 Consistency
      8. AWS S3 Select Pushdown
      9. Cost Optimization on AWS
      10. Graceful Shutdown
      11. Amazon Athena
  11. 11. Security
    1. Authentication
      1. Password and LDAP Authentication
    2. Authorization
      1. Connector Access Control
      2. System Access Control
    3. Encryption
      1. Encrypting Presto Client to Coordinator Communication
        1. Creating Java Keystore and Java Truststore
        2. Encrypting Inter Presto Cluster Communication
        3. Certificate Authority vs. Self-Signed Certificates
      2. Hive Connector Security
        1. Hive Connector Authentication