Index
Symbols
- ! (exclamation mark), lines in Jupyter Notebook preceded by, Notebooks on Google Cloud Platform
- ! (logical negation) operator, Comparisons
- != (not-equals) comparison operator, Comparisons
- # (pound sign), comments beginning with, Retrieving Rows by Using SELECT
- $ (dollar sign), end of string matching in regular expressions, Regular Expressions
- % (percent sign)
- %%bigquery Magics (see Jupyter)
- & (bitwise AND) operator, Numeric Types and Functions, Comparisons
- () (parentheses)
- , (comma)
- - (hyphen), escaping in dataset name, Retrieving Rows by Using SELECT
- -- (double dash), comments beginning with, Retrieving Rows by Using SELECT
- ; (semicolon) separating statements in a script, A sequence of statements
- <, <=, >, >=, and != (or <>) comparison operators, Comparisons
- << (bitwise left-shift) operator, Numeric Types and Functions
- <> (not-equals) comparison operator, Comparisons
- >> (bitwise right-shift) operator, Numeric Types and Functions
- ? (question mark), in positional parameters, Positional parameters
- ?: (non-capturing group) in regular expressions, Regular Expressions
- @ (at symbol), marking named parameters, Named parameters
- @run_date parameter, Named timestamp parameters
- @run_time parameter, Named timestamp parameters
- [] (square brackets), array operator, A Brief Primer on Arrays and Structs
- \d matching digits in regular expressions, Regular Expressions
- \s matching spaces in regular expressions, Regular Expressions
- `` (backticks), escape character in dataset name, Retrieving Rows by Using SELECT
- | (bitwise OR) operator, Numeric Types and Functions, Comparisons
- ^ (caret), beginning of string matching in regular expressions, Regular Expressions
A
- access control
- access tokens, Table manipulation
- Access Transparency program, Access transparency
- ACID operations with BigQuery, Managed Storage
- admin role, Predefined roles
- administering BigQuery, Administering and Securing BigQuery, Administering BigQuery-Stackdriver monitoring and audit logging
- authorizing users, Authorizing Users
- availability, disaster recovery, and encryption, Availability, Disaster Recovery, and Encryption-Customer-Managed Encryption Keys
- continuous integration/continuous deployment, Continuous Integration/Continuous Deployment-Cost/Billing Exports
- cost/billing exports, Cost/Billing Exports-Dashboards, Monitoring, and Audit Logging
- dashboards, monitoring, and audit logging, Dashboards, Monitoring, and Audit Logging-Stackdriver monitoring and audit logging
- job management, Job Management
- regulatory compliance, Regulatory Compliance-Data Exfiltration Protection
- restoring deleted records and tables, Restoring Deleted Records and Tables
- Advanced Encryption Standard (AES-256), Infrastructure Security
- advanced queries (see queries)
- aggregates, Aggregates-A Brief Primer on Arrays and Structs
- aggregation functions, Numeric Types and Functions
- aggregations
- AI (artificial intelligence)
- aliasing
- allAuthenticatedUsers, access for, Identity
- Alpega Group, use of BigQuery, Data Processing Architectures
- ALTER TABLE SET OPTIONS statement, Data Management (DDL and DML), Labels and tags, Changing options
- analytic functions, Numeric Types and Functions, Window Functions
- (see also window functions)
- analytic window functions, Window Functions
- (see also window functions)
- analytics
- AND condition, combining categorical features into, Human insights and auxiliary data
- AND keyword, Filtering with WHERE
- ANY_VALUE, GIS Measures
- Apache Beam, Writing a Dataflow job, Using the Streaming API directly, Cloud Dataflow
- Apache Hive, loading and querying Hive partitions, Loading and querying Hive partitions
- Apache Spark, MapReduce Framework
- API gateway infrastructure, secured global, Infrastructure Security
- APIs
- application-default credentials, Table manipulation
- APPROX_* functions
- Apps Scripts client library, Incorporating BigQuery Data into Google Slides (in G Suite)
- architecture of BigQuery, Architecture of BigQuery-Summary
- arithmetic operations
- ARRAY type, Creating Arrays by Using ARRAY_AGG, Data Types, Functions, and Operators, Summary
- arrays, A Brief Primer on Arrays and Structs-Joining Tables
- adding entry using DML UPDATE, Updating row values
- ambiguities in Standard SQL, Advanced SQL
- ARRAY of STRUCT, Array of STRUCT
- ARRAY of tuples or anonymous struct, TUPLE
- array parameters, Array and struct parameters
- BigQuery support for, Powerful Analytics
- converting to structs for hybrid recommendation model, Training hybrid recommendation model
- creating using ARRAY type and ARRAY_AGG function, Creating Arrays by Using ARRAY_AGG
- experimenting with, A Brief Primer on Arrays and Structs
- finding length of and retrieving individual items, Working with Arrays
- in a script, Anatomy of a simple script
- NULL elements in, Creating Arrays by Using ARRAY_AGG
- storing data as arrays of structs, Storing data as arrays of structs-Storing data as arrays of structs
- string representations, Internationalization
- unnesting, UNNEST an Array
- working with, in advanced SQL, Working with Arrays-Window Functions
- ARRAY_AGG function, Creating Arrays by Using ARRAY_AGG, Numbering functions
- ARRAY_CONCAT function, Array functions
- ARRAY_LENGTH function, Working with Arrays, Using arrays for generating data
- ARRAY_TO_STRING function, Array functions
- artificial intelligence (see AI; machine learning)
- AS statement, aliasing column names with, Aliasing Column Names with AS-Filtering with WHERE
- audit logging, Stackdriver monitoring and audit logging
- authorization tokens, Step 1: HTTP POST
- authorized views, Authorized views
- authorizing users, Authorizing Users
- AUTO partitioning mode, Loading and querying Hive partitions
- AutoML, Bulk reads using BigQuery Storage API
- auxiliary data for regression model, Human insights and auxiliary data
- availability, Availability, Disaster Recovery, and Encryption-Regional failures
- availability zones, Storage Data
- averages, computing, Named timestamp parameters, Aggregate analytic functions
- AVG function, Computing Aggregates by Using GROUP BY, Creating Arrays by Using ARRAY_AGG, Aggregate analytic functions
- Avro files, ETL, EL, and ELT
B
- backups, Durability, Backups, and Disaster Recovery
- bag of words, Unstructured data
- balancing classes in machine learning, Balancing Classes
- bandwidth, dynamic provisioning with BigQuery networking infrastructure, Storage and Networking Infrastructure
- bash
- batch data, ingest of, support by BigQuery, Powerful Analytics
- BATCH job priority, Batch Queries
- batch queries, Batch Queries
- Beam (see Apache Beam)
- BI Engine, accelerating queries with, Accelerating queries with BI Engine
- BigQuery
- bigrquery library from CRAN, Working with BigQuery from R
- BigQuery Mate, Estimating per-query cost
- .bigqueryrc file, Executing Queries
- BigQueryReader, TensorFlow’s BigQueryReader
- binary classification problems, Classification, Summary of model types
- bitwise shift operations (<< and >>), Numeric Types and Functions
- BOOL type, Data Types, Functions, and Operators, Summary
- Boolean expressions
- Booleans, Working with BOOL-String Functions
- boosted decision trees, Gradient-boosted trees
- boosted_tree_classifier model type, Training
- boosted_tree_regressor model type, Gradient-boosted trees
- Borg container management system, Worker Shard
- bq command-line tool, Loading from a Local Source, Bash Scripting with BigQuery
- adding a label to a dataset, Labels and tags
- --batch flag, Batch Queries
- bq extract command, Extracting data
- bq load command, Loading and inserting data
- bq query command, Executing Queries
- bq wait, Copying datasets
- checking if a dataset exists with bq ls, Checking whether a dataset exists
- copying datasets using bq cp, Copying datasets
- copying tables using bq cp, Data Management (DDL and DML)
- creating a dataset in a different project with bq mk, Creating a dataset in a different project
- creating a table with bq mk --table, Creating a table
- creating a transfer job, Create a transfer job
- creating datasets using bq mk and specifying the location, Creating Datasets and Tables
- creating table definition using bq mkdef, How to Use Federated Queries
- deleting a table or view as a whole, Data Management (DDL and DML)
- --dry_run option, Estimating per-query cost
- examining information from query statistics, Scan-filter-count query
- initiating cross-region dataset copy via bq mk, Cross-region dataset copy
- listing BigQuery objects with bq ls, BigQuery Objects
- making external table definition with bq mk, How to Use Federated Queries
- previewing a table using bq head, Previewing data
- showing details of BigQuery objects with bq show, Showing details
- specifying Hive partition mode to bq load, Loading and querying Hive partitions
- SQL dialect used by, Executing Queries
- updating details of tables, datasets, and other objects with bq update, Updating
- using wildcards in file path for bq mkdef and bq load, Wildcards
- BREAK statement, Looping
- broadcast JOIN query, Broadcast JOIN query-Broadcast JOIN query
- broadcast joins, Broadcast JOIN query
- bucketizing variables, Human insights and auxiliary data
- bulk reads, using BigQuery Storage API, Bulk reads using BigQuery Storage API
- business intelligence (BI) tools, using on data held in BigQuery, Powerful Analytics
- BY HASH directive, Stage 0
- BYTES type, Data Types, Functions, and Operators, Summary
- BYTE_LENGTH function, Internationalization
C
- caching
- calendar, extracting parts from timestamps, Extracting Calendar Parts
- cannibalization, What’s Being Clustered?
- Capacitor, Storage Data, Storage format: Capacitor-Storage format: Capacitor
- cardinality, Storage format: Capacitor
- casting
- cast as bytes, Internationalization
- DATETIME to TIMESTAMP, Date, Time, and DateTime
- of Booleans, using COUNTIF to avoid, Using COUNTIF to Avoid Casting Booleans
- of strings to FLOAT64, Loading from a Local Source
- requiring explicit use of CAST function, Casting and Coercion
- string as INT64 or FLOAT64 to parse it, using CAST function, Printing and Parsing
- categorical_weights, Examining Model Weights
- centroid of an aggregate of geometries, Geometry transformations and aggregations
- charts, Saving query results to pandas
- CHAR_LENGTH function, Internationalization
- classification, Classification
- client API functions and SQL alternatives, Table manipulation
- client libraries, Summary
- Cloud AI Platform (CAIP), Deep Neural Networks
- Cloud Bigtable, SQL Queries on Data in Cloud Bigtable-Improving performance
- Cloud Catalog, Integration with Google Cloud Platform
- Cloud Client API (Python), Parameterized Queries
- Cloud Composer, Integration with Google Cloud Platform, Loading from a Local Source
- Cloud Console
- Cloud Data Labeling Service, Clustering
- Cloud Dataflow, Bulk reads using BigQuery Storage API
- Cloud Dataproc, Integration with Google Cloud Platform, Bulk reads using BigQuery Storage API
- Cloud Functions, Integration with Google Cloud Platform, Loading from a Local Source
- Cloud Natural Language, Unstructured data
- Cloud Pub/Sub, Minimizing Network Overhead
- using for streaming inserts into BigQuery, File Loads
- Cloud Scheduler, Integration with Google Cloud Platform
- Cloud Shell
- Cloud Vision API, Unstructured data
- clustering, Clustering
- clustering (in machine learning), Clustering
- clustering ratio, Reclustering
- COALESCE function, using to evaluate expressions until non-NULL value is obtained, Cleaner NULL-Handling with COALESCE
- coalesce stage, Broadcast JOIN query
- coercion, Casting and Coercion
- Coldline Storage, Setting up life cycle management on staging buckets
- Colossus File System, Storage and Networking Infrastructure, Step 5: Returning the query results, Storage Data
- column stores, How BigQuery Came About
- column-oriented stores, Storage format: Capacitor, Clustering
- columnar files, Loading Data Efficiently
- columnar storage formats
- comma cross joins, CROSS JOIN
- comments, lines beginning with -- or #, Retrieving Rows by Using SELECT
- committed state (storage sets), Storage sets
- community-developed, open source UDFs, Public UDFs
- comparisons
- compliance, Security and Compliance
- (see also regulatory compliance)
- compression of files, Loading from a Local Source
- computation, moving to the data, How BigQuery Came About
- compute
- compute_fit method, Cloud Dataflow
- CONCAT function, String Functions, String Manipulation Functions, Building queries dynamically
- concatenation of arrays, Array functions
- conda environment for Jupyter, Working with BigQuery from R
- conditional expressions, Conditional Expressions
- constants, defining, Defining constants
- container management system (Borg), Worker Shard
- CONTINUE statement, Looping
- continuous integration/continuous deployment (CI/CD), Continuous Integration/Continuous Deployment-Cost/Billing Exports
- correlated CROSS JOINs, Using arrays for generating data
- correlated subqueries, Correlated subquery
- correlation coefficients, Correlation
- correlation, functions for, Correlation
- costs
- COUNT function, counting records with, Counting Records by Using COUNT
- COUNTIF function, using to avoid casting Booleans, Using COUNTIF to Avoid Casting Booleans
- COUNT_STAR operator, Stage 0
- CRAN, bigrquery library from, Working with BigQuery from R
- CREATE FUNCTION IF NOT EXISTS, Persistent UDFs
- CREATE FUNCTION statements, Persistent UDFs
- CREATE IF NOT EXISTS statement, Setting up destination table
- CREATE MODEL statement, Training and Evaluating the Model
- CREATE OR REPLACE FUNCTION, Persistent UDFs
- CREATE OR REPLACE PROCEDURE statement, Stored procedures
- CREATE OR REPLACE TABLE statement, Setting up destination table
- CREATE TABLE AS SELECT statement, Data Management (DDL and DML), Step 5: Returning the query results
- CREATE TABLE statement, Copying into a New Table, Setting up destination table
- CreateDisposition and WriteDisposition, controlling load of pandas DataFrame, Loading a pandas DataFrame
- CROSS JOIN statement, CROSS JOIN, Using arrays for generating data
- cross-entropy loss measure in classification, Evaluation
- cross-region dataset copies, Cross-region dataset copy
- cross-selling of product groups, improving, What’s Being Clustered?
- CRUD operations
- crypto-shredding, Crypto-shredding
- cryptography
- CSV files, ETL, EL, and ELT, Loading from a Local Source
- curl utility
- CURRENT_TIMESTAMP function, Query History and Caching, Parsing and Formatting Timestamps
- custom roles, Custom roles
- customer information, security of, Infrastructure Security
- customer segmentation, What’s Being Clustered?
- customer targeting, Summary of model types, What’s Being Clustered?, Customer targeting
- Customer-Managed Encryption Keys (CMEK), Customer-Managed Encryption Keys, CMEK
D
- dashboards, tables accessed from, using BI Engine with, Accelerating queries with BI Engine
- data
- Data Catalog, searching for tables with specific label, Creating a table
- Data Definition Language (DDL), DDL-DML
- data exfiltration protection, Data Exfiltration Protection
- data locality, Data Locality
- data loss prevention, Data Loss Prevention-Data Loss Prevention
- data management (DDL and DML), Data Management (DDL and DML)-Data Management (DDL and DML)
- Data Manipulation Language (DML), DML, Caching the Results of Previous Queries, DML-MERGE statement
- BigQuery and very-high-frequency DML updates, DML
- deleting rows with DELETE WHERE, Deleting rows
- INSERT SELECT, Insert SELECT
- INSERT VALUES, Insert VALUES
- INSERT VALUES with subquery SELECT, Insert VALUES with subquery SELECT
- MERGE statement, MERGE statement
- removing all transactions related to a single individual, DML
- statements, Data Management (DDL and DML)
- statements forcing a recluster, Reclustering
- support by BigQuery, How BigQuery Came About
- updating row values, Updating row values
- data marketplace, How BigQuery Came About
- data processing architectures, Data Processing Architectures-BigQuery: A Serverless, Distributed SQL Engine
- data science tools, accessing BigQuery from, Accessing BigQuery from Data Science Tools-Incorporating BigQuery Data into Google Slides (in G Suite)
- Cloud Dataflow, Cloud Dataflow-Cloud Dataflow
- incorporating BigQuery data into Google Slides, Incorporating BigQuery Data into Google Slides (in G Suite)-Incorporating BigQuery Data into Google Slides (in G Suite)
- JDBC/ODBC drivers, JDBC/ODBC drivers
- notebooks on Google Cloud Platform, Notebooks on Google Cloud Platform-Working with BigQuery, pandas, and Jupyter
- working with BigQuery from R, Working with BigQuery from R-Cloud Dataflow
- working with BigQuery, pandas, and Jupyter, Working with BigQuery, pandas, and Jupyter-Working with BigQuery, pandas, and Jupyter
- Data Sheets (BigQuery), Exploring BigQuery tables as a data sheet in Google Sheets
- data skew, Data skew
- data split, controlling in BigQuery ML, Controlling Data Split
- Data Studio, Integration with Google Cloud Platform
- Data Transfer Service (BigQuery), Data Transfer Service-Cross-region dataset copy, Data Migration Methods
- data types, Data Types, Functions, and Operators-Summary
- Booleans, working with, Working with BOOL-String Functions
- geographic, Geographic types
- Geography functions, Working with GIS Functions
- numeric types and functions, Numeric Types and Functions-Precise Decimal Calculations with NUMERIC
- strings and string functions, String Functions-Working with TIMESTAMP
- strongly typed managed storage with BigQuery, Managed Storage
- supported by BigQuery, Data Types, Functions, and Operators
- TIMESTAMP, working with, Working with TIMESTAMP-Date, Time, and DateTime
- data warehouses
- data-driven decisions, making with k-means clustering, Data-Driven Decisions
- dataEditor role, Predefined roles
- DataFrames (see pandas)
- dataOwner role, Predefined roles
- datasets, Metadata
- access to, on BigQuery, Loading from a Local Source
- checking if a dataset exists with bq ls, Checking whether a dataset exists
- copying using bq cp, Copying datasets
- creating in a different project, Creating a dataset in a different project
- creating to load into BigQuery, Loading from a Local Source
- creating using bq mk, Creating Datasets and Tables
- creating using Google Cloud Client Library, Creating a dataset
- cross-region dataset copy via Data Transfer Service, Cross-region dataset copy
- deleting a dataset using Google Cloud Client Library, Deleting a dataset
- deriving insights across, Deriving Insights Across Datasets
- determining those involved in query requests, Step 2: Routing
- information on, using Google Cloud Client Library, Dataset information
- joining Google Sheets data with dataset in BigQuery, Joining Sheets data with a large dataset in BigQuery
- manipulation through HTTP request to BigQuery REST API URL, Dataset manipulation
- manipulation via Google Cloud Client library for BigQuery, Dataset manipulation
- modifying attributes using Google Cloud Client Library, Modifying attributes of a dataset
- names of, Retrieving Rows by Using SELECT
- names, key components of, Retrieving Rows by Using SELECT
- permissions to access, Predefined roles
- primitive roles providing access to, Primitive roles
- providing for Identity and Access Management (IAM), Retrieving Rows by Using SELECT
- training dataset for regression model, creating, Creating a Training Dataset
- dataViewer role, Predefined roles, Resource
- DATE type, Date, Time, and DateTime, Summary
- dates and time, working with timestamps, Working with TIMESTAMP-Date, Time, and DateTime
- DATETIME type, Data Types, Functions, and Operators, Date, Time, and DateTime, Summary
- Davies-Bouldin index, Hyperparameter tuning using scripting
- decision trees, Gradient-boosted trees
- Deep Learning Virtual Machine, Notebooks on Google Cloud Platform
- deep neural networks, Deep Neural Networks-Deep Neural Networks
- DELETE statement, Data Management (DDL and DML), DML, DML
- deletions
- denormalization, Denormalization, Joining with precomputed values
- DENSE_RANK function, Numbering functions
- descriptive analytics, powerful, performing with BigQuery, Powerful Analytics
- developing with BigQuery, Developing with BigQuery-Summary
- dictionary encoding, Storage format: Capacitor
- disaster recovery, Durability, Backups, and Disaster Recovery
- disks
- DISTINCT, finding unique values with, Finding Unique Values by Using DISTINCT
- division
- DNN (see deep neural networks)
- dnn_classifier model type, Training
- dnn_regressor model, Deep Neural Networks, Training hybrid recommendation model
- draining the zone, Zonal failures
- drains or failovers of compute clusters, Step 3: Job Server
- Dremel (SQL engine), How BigQuery Came About, Step 4: Query engine
- Dremel query engine, Query Engine (Dremel)-Hash join query
- DROP FUNCTION statement, Persistent UDFs
- DROP TABLE statement, Data Management (DDL and DML)
- --dry_run option, running parameterized queries with, Array and struct parameters
- dry runs for queries, Dry run
- dsinfo object, Dataset information
- durability, Durability, Backups, and Disaster Recovery
- dynamic SQL queries, Building queries dynamically
E
- EL (extract and load), ETL, EL, and ELT
- ELT (extract, load, and transform), ETL, EL, and ELT
- empty tables, Empty table
- encoding (storage), Physical storage: Colossus
- encryption, Simplicity of Management, Privacy and Encryption-Customer-Managed Encryption Keys
- ENDS_WITH function, String Manipulation Functions
- entity extraction, Summary of model types
- envelope encryption, CMEK
- equality, not-equals, using != or <> operator, Comparisons
- erasure encoding, Storage Data, Physical storage: Colossus
- errors, inserting rows into a table, Inserting rows into a table
- etags, Updating a table’s schema
- ETL (extract, transform, and load)
- evaluating machine learning models
- EXCEPT, using with SELECT, SELECT *, EXCEPT, REPLACE
- execution plans (see query plans)
- execution stages (queries), Scan-filter-count query
- EXISTS operator, Using arrays to store repeated fields
- expensive computations, reducing number of, Reducing the number of expensive computations
- experimenting with BigQuery, using sandbox, Estimating per-query cost
- expiration
- cached tables expiring, Caching the Results of Previous Queries
- changing for a table after creating it, Changing options
- for partitions, Partitioned tables
- specifying for partitions, Partitioning, Partitioned tables
- specifying for tables, Loading from a Local Source, Data Management (DDL and DML), Creating Datasets and Tables, Creating a table, Options list
- system event logged when table or partition expires, Stackdriver monitoring and audit logging
- temporary tables holding query results, Query History and Caching
- explicit conversion, Casting and Coercion
- exports of data from BigQuery
- extensions to SQL in BigQuery supporting data analytics, Powerful Analytics
- extensions, invoking in Jupyter Notebook, Notebooks on Google Cloud Platform
- external data sources
F
- failover processes, Step 3: Job Server
- failure handling, BigQuery and Failure Handling-Regional failures
- FARM fingerprint algorithm, Fingerprint function
- feature engineering, Exploring the Dataset to Find Features
- features (in machine learning), Formulating a Machine Learning Problem
- federated queries, ETL, EL, and ELT, Integration with Google Cloud Platform
- file compression, Loading from a Local Source
- file loads, File Loads
- fingerprint function, Fingerprint function
- FIRST_VALUE function, Navigation functions
- FLOAT64 type, Data Types, Functions, and Operators, Summary
- floating-point numbers, standard-compliant floating-point division, Standard-Compliant Floating-Point Division
- folium package, plotting a map with, Working with BigQuery, pandas, and Jupyter
- FORMAT function, Printing and Parsing
- FORMAT_DATE function, Printing and Parsing
- FORMAT_TIMESTAMP function, Printing and Parsing, Parsing and Formatting Timestamps
- FROM clause
- from_items, The JOIN Explained
- functions, Numeric Types and Functions
G
- G Suite, Interactive Exploration and Querying of Data in Google Sheets, Incorporating BigQuery Data into Google Slides (in G Suite)
- Gamma distribution fit, computing parameters of, Cloud Dataflow
- GARBAGE, marking old storage sets as, Storage sets, DML
- gcloud command-line tool, Notebooks on Google Cloud Platform
- GCP (see Google Cloud Platform)
- GCP Cloud Console (see Cloud Console)
- GCS (see Google Cloud Storage)
- generational (storage system), Storage optimization
- Geo Viz (BigQuery), Geometry transformations and aggregations
- Geographic Information Systems (GIS), BigQuery Geographic Information Systems-Geometry transformations and aggregations
- geographic types, Geographic types
- GEOGRAPHY type, Data Types, Functions, and Operators, Working with GIS Functions, Summary
- geohash, Creating Polygons
- GeoJSON geospatial data, Geographic types
- GET requests (HTTP), Table manipulation
- GitHub repository for this book, Table manipulation
- Global Positioning System (GPS), Working with GIS Functions
- Google Apps Script, Incorporating BigQuery Data into Google Slides (in G Suite)
- Google BigQuery (see BigQuery)
- Google Cloud Client Library, Developing Programmatically, Google Cloud Client Library-Parameterized queries, Notebooks on Google Cloud Platform
- browsing rows of a table, Browsing the rows of a table
- copying a table, Copying a table
- creating a dataset, Creating a dataset
- creating an empty table, Creating an empty table
- creating an empty table with schema, Creating an empty table with schema
- dataset information from dsinfo object, Dataset information
- dataset manipulation, Dataset manipulation
- deleting a dataset, Deleting a dataset
- deleting a table, Deleting a table
- extracting data from a table, Extracting data from a table
- inserting rows into a table, Inserting rows into a table
- installing BigQuery client library, Google Cloud Client Library
- instantiating a Client, Google Cloud Client Library
- loading a BigQuery table directly from Google Cloud URI, Loading from a URI
- loading a BigQuery table from a local file, Loading from a local file
- loading a pandas DataFrame, Loading a pandas DataFrame
- modifying attributes of a dataset, Modifying attributes of a dataset
- querying with, Querying-Parameterized queries
- table management with, Table management
- updating a table's schema, Updating a table’s schema
- Google Cloud Data Loss Prevention API, Integration with Google Cloud Platform
- Google Cloud Identity and Access Management (see Identity and Access Management)
- Google Cloud Platform (GCP)
- BigQuery interacting with, using bq tool, Loading from a Local Source
- custom machine learning models in, Custom Machine Learning Models on GCP-Predicting with TensorFlow models
- Google Cloud Storage or Cloud Pub/Sub, Minimizing Network Overhead
- integration of BigQuery with, Integration with Google Cloud Platform
- notebooks on, Notebooks on Google Cloud Platform-Working with BigQuery, pandas, and Jupyter
- Pricing Calculator, Estimating per-query cost
- security features provided by, Administering and Securing BigQuery
- Google Cloud Software Development Kit (SDK), Table manipulation, Bash Scripting with BigQuery
- Google Cloud Storage (GCS), MapReduce Framework, Minimizing Network Overhead
- Google File System (GFS), Physical storage: Colossus
- Google Front-End (GFE) servers, Step 2: Routing
- Google Sheets, When to Use Federated Queries and External Data Sources, Interactive Exploration and Querying of Data in Google Sheets
- Google Slides, incorporating BigQuery data into, Incorporating BigQuery Data into Google Slides (in G Suite)-Incorporating BigQuery Data into Google Slides (in G Suite)
- gradient-boosted trees, Gradient-boosted trees
- Gradle build tool, installing, Measuring Query Speed Using BigQuery Workload Tester
- Gray, Jim, How BigQuery Came About
- GROUP BY
- gsutil cp command, Impact of compression and staging via Google Cloud Storage, Data Migration Methods
- gzip file compression, Loading from a Local Source
H
- Hadoop, MapReduce Framework
- hash algorithms, Hash Algorithms-Summary
- hash join query, Hash join query-Hash join query
- hash joins, Hash join query
- hashes
- about, Stage 0
- BY HASH directive in scan-filter-aggregate query, Stage 0
- HAVING clause, Anatomy of a simple script
- Heartbleed vulnerability, Infrastructure Security
- heredoc syntax in Bash, Querying
- hidden_units, Deep Neural Networks
- history of queries, Query History and Caching
- Hive partitions, loading and querying, Loading and querying Hive partitions
- HLL functions, HLL functions
- HTTP requests
- batching requests to BigQuery REST API, Batching multiple requests
- BigQuery REST API documentation specifying details of, Dataset manipulation
- DELETE request to BigQuery REST API URL, Dataset manipulation, Table manipulation
- GET request to BigQuery REST API URL, Table manipulation
- GET, POST, PUT, PATCH, and DELETE methods, Dataset manipulation
- getting status of jobId using REST API with GET request, Limitations
- POST request for a query, Step 1: HTTP POST
- POST request to BigQuery REST API URL with JSON request embedded, Querying
- to BigQuery REST API, Accessing BigQuery via the REST API
- HTTPS, Accessing BigQuery via the REST API
- human insights in regression model, Human insights and auxiliary data
- HyperLogLog++ (HLL++) algorithm, HLL functions
- hyperparameter tuning, Hyperparameter Tuning-Hyperparameter tuning using AI Platform
I
- I/O, minimizing for queries, Minimizing I/O-Reducing the number of expensive computations
- Identity and Access Management (IAM), Simplicity of Management, Administering and Securing BigQuery, Identity and Access Management-Resource
- IEEE_Divide function, Standard-Compliant Floating-Point Division
- IF conditions, Looping
- IF function, Conditional Expressions
- IF statement, using on Booleans, Using COUNTIF to Avoid Casting Booleans
- IFNULL function, Cleaner NULL-Handling with COALESCE
- image captioning, Summary of model types
- image classification, Summary of model types
- implicit conversion, Casting and Coercion
- in-memory filesystem, Worker Shard
- (see also Colossus File System)
- increasing query speed, Increasing Query Speed-Optimizing How Data Is Stored and Accessed
- indexes (array), Using arrays for generating data
- indexing, not needed in BigQuery, Simplicity of Management
- infinite loops, avoiding with SQL, How BigQuery Came About
- INFORMATION_SCHEMA view, Table manipulation, Obtaining table properties, Building queries dynamically
- infrastructure provisioning, not needed with BigQuery, Simplicity of Management
- INNER JOIN statement, INNER JOIN, CROSS JOIN
- INSERT SELECT statement, Insert SELECT
- INSERT statement, Data Management (DDL and DML), Step 5: Returning the query results, DML
- INSERT VALUES statement, Data Management (DDL and DML), Insert VALUES
- Institute of Electrical and Electronics Engineers (IEEE), Standard-Compliant Floating-Point Division
- INT64 type, Data Types, Functions, and Operators, Summary
- INTEGER type, detection by AUTO partitioning mode, Loading and querying Hive partitions
- internationalization of strings, Internationalization
- intersection of geography types, Geometry transformations and aggregations
- IS NOT NULL operator, Finding Unique Values by Using DISTINCT
- IS NULL operator, Finding Unique Values by Using DISTINCT
- IS operator
- isolation between jobs, Simplicity of Management
J
- Java Database Connectivity (JDBC), JDBC/ODBC drivers
- JavaScript
- JDBC/ODBC drivers, JDBC/ODBC drivers, Step 5: Returning the query results
- job management, Job Management
- job priority, BATCH, Batch Queries
- job servers, Step 3: Job Server
- JobConfig flags, Loading from a URI
- jobIds, Limitations, Step 5: Returning the query results
- jobUser role, Predefined roles, Resource
- job_config, Parameterized queries
- join+ stage
- joins, Joining Tables-Saving and Sharing
- broadcast and hash, Broadcast JOIN query
- broadcast JOIN query, Broadcast JOIN query-Broadcast JOIN query
- complex, support by BigQuery, Powerful Analytics
- CROSS JOIN, CROSS JOIN
- for cases seeming to require a script, A sequence of statements
- hash join query, Hash join query-Hash join query
- INNER JOIN, INNER JOIN
- JOIN statement, The JOIN Explained
- joining user table and machine learning weights, Creating input features
- OUTER JOIN, OUTER JOIN
- performing efficient joins, Performing Efficient Joins-JOIN versus denormalization
- queries doing JOIN operations, Query Engine (Dremel)
- summary of types of joins and their output, OUTER JOIN
- JSON, ETL, EL, and ELT
- arrays, Creating Arrays by Using ARRAY_AGG
- compressed files, loading into BigQuery, Impact of compression and staging via Google Cloud Storage
- converting arrays to JSON strings, Array functions
- creating JSON strings for dataset schema, Specifying a Schema
- creating table definition of data stored in newline-delimited JSON for Hive partition, Loading and querying Hive partitions
- GeoJSON, Geographic types
- JSON request in body of HTTP POST sent to BigQuery REST API URL, Querying
- JSON/REST interface, Accessing BigQuery via the REST API
- loading files into BigQuery, Loading from a Local Source
- newline-delimited files, extract format using Google Cloud Client Library, Extracting data from a table
- response from HTTP POST request to BigQuery REST API URL, Querying
- transformation of JSON HTTP request to Protobufs, Step 2: Routing
- writing rows to insert into tables as newline-delimited JSON, Loading and inserting data
- Jupiter Networking, Storage and Networking Infrastructure
- Jupyter
- Jupyter Notebooks, Geometry transformations and aggregations
L
- L1 and L2 regularization, Regularization
- labels, Labels and tags
- LAG function, Navigation functions
- LAST_VALUE function, Navigation functions
- layers in deep neural networks, Deep Neural Networks
- LEAD function, Navigation functions
- LEFT JOIN statement, Using arrays for generating data
- LENGTH function, String Functions
- life cycle management on staging buckets, Setting up life cycle management on staging buckets
- LIKE operator, SELECT *, EXCEPT, REPLACE
- LIME (model explainability package), Examining Model Weights
- LIMIT clause, Approximate top
- linear regression models
- lines, Geographic types
- literate programming, Accessing BigQuery from Data Science Tools
- loading data into BigQuery, Loading Data into BigQuery-Summary
- localities for data, Data locality
- localities for datasets, Creating a dataset, Creating Datasets and Tables
- locations
- LOG function, prefixing with SAFE, SAFE Functions
- logical operations, Boolean AND, OR, and NOT, Logical Operations
- logistic regression, Examining Model Weights
- logs, ELT in SQL for experimentation
- longitude and latitude, Geographic types
- LOOP statement, Looping
- looping, Looping
- LOWER function, String Functions
- LPAD function, Transformation Functions
- LTRIM function, Transformation Functions
M
- machine failures, Machine failures
- machine learning, Machine Learning in BigQuery
- AutoML Tables and AutoML Text, creating models from data in BigQuery tables, Integration with Google Cloud Platform
- building a classification model, Building a Classification Model-Choosing the Threshold
- building a regression model, Building a Regression Model-Human insights and auxiliary data
- creating learning models and carrying out batch predictions with BigQuery, Powerful Analytics
- custom models in GCP, Custom Machine Learning Models on GCP-Predicting with TensorFlow models
- customizing BigQuery ML, Customizing BigQuery ML-Regularization
- formulating a problem, Formulating a Machine Learning Problem-Types of Machine Learning Problems
- geographic locations in, Creating Polygons
- Google Cloud Platform APIs integrated with BigQuery, Integration with Google Cloud Platform
- in Google Sheets, automatic chart creation, Exploring BigQuery tables using Sheets
- k-means clustering, k-Means Clustering-Data-Driven Decisions
- recommender systems, Recommender Systems-Training hybrid recommendation model
- supervised, Machine Learning in BigQuery
- types of problems, Types of Machine Learning Problems-Building a Regression Model
- using BigQuery, AutoML
- magic numbers, Defining constants
- Magics, invoking in Jupyter Notebook, Notebooks on Google Cloud Platform
- magnitude or sign of model weights, Examining Model Weights
- managed storage, Managed Storage
- management, simplicity of, using BigQuery, Simplicity of Management
- MapReduce framework, MapReduce Framework
- maps, interactive, creating with folium, Working with BigQuery, pandas, and Jupyter
- MATCHED, NOT MATCHED BY TARGET, NOT MATCHED BY SOURCE, MERGE statement
- materialized views
- mathematical functions, Mathematical Functions
- matrix factorization, Matrix Factorization-Matrix Factorization
- matrix_factorization model, What’s Being Clustered?, Matrix Factorization
- MAX function, Navigation functions
- --maximum_bytes_billed option, Estimating per-query cost
- MD5 hashing algorithm, MD5 and SHA
- measuring and troubleshooting queries, Measuring and Troubleshooting-Visualizing the query plan information
- MEDIAN function, user-defined, Public UDFs
- memory
- MERGE statement, Data Management (DDL and DML), DML, Reclustering, Deleting rows, MERGE statement
- metadata, Metadata-Meta-File
- clustering, Clustering
- DML (Data Manipulation Language), DML
- meta-file, Query Master, Meta-File
- metadataViewer role, Predefined roles
- partitioning, Partitioning
- performance optimizations with clustered tables, Performance optimizations with clustered tables
- storage optimization, Storage optimization
- storage sets, Storage sets
- table, Table Metadata-Time travel
- time travel, Time travel
- migration of data, moving on-premises data to Google Cloud Storage, Data Migration Methods
- ML.BUCKETIZE function, Bucketizing the hour of day
- ML.EVALUATE function, Evaluating the model
- ML.FEATURE_CROSS function, Human insights and auxiliary data
- ML.FEATURE_INFO function, Gradient-boosted trees
- ML.PREDICT function, Predicting with the Model, Prediction
- ML.RECOMMEND function, Batch predictions for all users and movies
- ML.WEIGHTS function, Examining Model Weights, Obtaining user and product factors
- models (machine learning)
- monitoring resources using Stackdriver, Stackdriver monitoring and audit logging
- multiclass classification problems, Classification, Summary of model types
- multipart/mixed content type, Batching multiple requests
- multiregions, Zones, Regions, and Multiregions, Regional failures
- multitenant queries, Simplicity of Management
- MySQL, Relational Database Management System
N
- named parameters, Named parameters
- NaN (Not-a-Number), Standard-Compliant Floating-Point Division
- Natural Language API, Unstructured data
- navigation functions, Navigation functions
- Nearline Storage, Setting up life cycle management on staging buckets
- nested fields, Storing data as arrays of structs
- networking
- nodes in deep neural networks, Deep Neural Networks
- nondeterministic behavior, queries exhibiting, Caching the Results of Previous Queries
- NoSQL
- NOT keyword, Filtering with WHERE
- NOT MATCHED BY TARGET or NOT MATCHED BY SOURCE, MERGE statement
- Not-a-Number (see NaN)
- notebooks, Accessing BigQuery from Data Science Tools
- NP-hard problems, Storage format: Capacitor
- NTH_VALUE function, Navigation functions
- NULL values
- cleaner handling with COALESCE, Cleaner NULL-Handling with COALESCE
- CROSS JOIN excluding rows with empty or NULL arrays, Using arrays for generating data
- filtering for in WHERE clause, Finding Unique Values by Using DISTINCT
- in comparisons, Comparisons, Logical Operations
- in dataset CSV file loaded into BigQuery, Loading from a Local Source
- making scalar functions return, SAFE Functions
- NULL elements in arrays, Creating Arrays by Using ARRAY_AGG
- replacing privacy-suppressed values with, Specifying a Schema
- returning NULL from casting, not an error, Casting and Coercion
- numbering functions, Numbering functions
- NUMERIC type, Data Types, Functions, and Operators, Summary
- numeric types
- numeric_weights, Examining Model Weights
- num_clusters option, Carrying Out Clustering
- num_factors option, Matrix Factorization
O
- OAuth2 tokens, Step 1: HTTP POST
- objects (BigQuery)
- OFFSET function, Using arrays for generating data
- ogr2ogr tool, converting Shapefiles to GeoJSON, Geographic types
- on-demand pricing, Controlling Cost
- online transaction processing (OLTP) databases, relational, Relational Database Management System
- Open Database Connectivity (ODBC), JDBC/ODBC drivers
- operators
- <, <=, >, >=, and != (or <>) comparison operators, Comparisons
- optimization, Optimizing Performance and Cost
- Optimized Row Columnar (ORC) files, Loading Data Efficiently, Storage format: Capacitor
- OPTIONS list
- OR keyword, Filtering with WHERE
- ORDER BY
- ordering, preserving using arrays, Using arrays to preserve ordering
- ORDINAL indexing of arrays, Using arrays for generating data
- OUTER JOIN statement, summary of, OUTER JOIN
- OVER clause, Aggregate analytic functions, Navigation functions
- overfitting, Training
P
- pandas
- parallelization of query execution in BigQuery, Simplicity of Management
- parameterized queries, Parameterized queries, Parameterized Queries-Array and struct parameters
- Parquet files, Storage format: Capacitor
- PARSE_TIMESTAMP function, Parsing and Formatting Timestamps
- parsing strings, Printing and Parsing
- PARTITION BY, Aggregate analytic functions
- partitioning, Partitioning
- partitioning mode, specifying for bq load, Loading and querying Hive partitions
- partitions, Partitioning
- PATTERN variable, Anatomy of a simple script
- Pearson correlation coefficient, Number of bicycles
- Pending state, Storage sets
- per-query costs, Controlling Cost
- performance and cost, optimizing, Optimizing Performance and Cost-Checklist
- permissions, Security and Compliance
- (see also Identity and Access Management)
- for access to user-defined functions, Persistent UDFs
- persistent user-defined functions, Persistent UDFs
- personas, What’s Being Clustered?
- points, Geographic types
- polygons, Geographic types
- positional parameters, Positional parameters
- POST requests (HTTP), Querying, Step 1: HTTP POST
- PostgreSQL, Relational Database Management System
- precision, Choosing the Threshold
- predicate functions (GIS), GIS predicate functions
- predictions, Powerful Analytics
- preprocessing functions
- Pricing Calculator (GCP), Estimating per-query cost
- pricing plans, Controlling Cost
- primitive roles, Primitive roles
- primitives, geographic data in, Geographic types
- printing strings, Printing and Parsing
- privacy and encryption, Privacy and Encryption-Customer-Managed Encryption Keys
- probability threshold, choosing for classification model, Choosing the Threshold
- product features, getting for movies data, Creating input features
- product groups, What’s Being Clustered?
- product recommendations, What’s Being Clustered?
- programmatic development
- accessing BigQuery via Google Cloud Client Library, Google Cloud Client Library-Parameterized queries
- browsing rows of a table, Browsing the rows of a table
- copying a table, Copying a table
- creating a dataset, Creating a dataset
- creating an empty table with schema, Creating an empty table with schema
- creating empty table, Creating an empty table
- dataset information, Dataset information
- dataset manipulation, Dataset manipulation
- deleting a dataset, Deleting a dataset
- deleting a table, Deleting a table
- extracting data from a table, Extracting data from a table
- inserting rows into a table, Inserting rows into a table
- loading a pandas DataFrame, Loading a pandas DataFrame
- loading from a Google Cloud URI, Loading from a URI
- loading from a local file, Loading from a local file
- modifying attributes of a dataset, Modifying attributes of a dataset
- obtaining table properties, Obtaining table properties
- querying, Querying-Parameterized queries
- table management, Table management
- updating a table's schema, Updating a table’s schema
- accessing BigQuery via REST API, Developing Programmatically-Limitations
- programming languages
- project ID, Retrieving Rows by Using SELECT
- projects
- protocol buffers (protobufs), How BigQuery Came About, Step 2: Routing
- public user-defined functions, Public UDFs
- Python
Q
- quantiles, Quantiles
- queries, Query Essentials-Summary, Query Engine (Dremel)
- (see also Dremel query engine)
- advanced, Advanced Queries-Summary
- aggregates, Aggregates-A Brief Primer on Arrays and Structs
- batch, Batch Queries
- executing using bq query and specifying the query, Executing Queries
- execution by Dremel, Query Execution-Hash join query
- joining tables, Joining Tables-Saving and Sharing
- life of a query request, Life of a Query Request-Step 5: Returning the query results
- performance, key drivers of, Key Drivers of Performance
- primer on arrays and structs, A Brief Primer on Arrays and Structs-Joining Tables
- querying BigQuery using Jupyter Magics and saving results to pandas DataFrame, Working with BigQuery, pandas, and Jupyter
- querying with Google Cloud Client Library, Querying-Parameterized queries
- running from Jupyter notebook on GCP
- running within notebooks, Jupyter Magics
- saving and sharing, Saving and Sharing-Summary
- scheduling in BigQuery, Scheduled queries
- simple, Simple Queries-Sorting with ORDER BY
- aliasing column names with AS, Aliasing Column Names with AS-Filtering with WHERE
- filtering SELECT results with WHERE, Filtering with WHERE
- retrieving rows using SELECT, Retrieving Rows by Using SELECT-Retrieving Rows by Using SELECT
- SELECT *, EXCEPT, REPLACE, SELECT *, EXCEPT, REPLACE
- sorting with ORDER BY, Sorting with ORDER BY
- subqueries using WITH, Subqueries with WITH
- query engine, distributed (Dremel), Query Engine (Dremel)-Hash join query
- Query Masters, Step 4: Query engine, Query Master
- query plans, Query Master
- QUERY_TEXT variable, Querying, Executing Queries
- question answering, Summary of model types
R
- r (raw) prefix for string literals, Regular Expressions
- R language, working with BigQuery from, Working with BigQuery from R-Cloud Dataflow
- race conditions, preventing in table schema updates, Updating a table’s schema
- RAND function, Query History and Caching, Random number generator
- random number generator, Random number generator
- RANGE, Aggregate analytic functions
- RANK function, Numbering functions
- readSessionUser role, Predefined roles
- recall, Choosing the Threshold
- reclustering, Reclustering
- recommender systems, Recommender, Summary of model types, Recommender Systems-Training hybrid recommendation model
- record-oriented stores, How BigQuery Came About, Storage format: Capacitor
- Reed-Solomon encoding, Physical storage: Colossus
- (see also erasure encoding)
- REGEXP_CONTAINS function, Regular Expressions
- REGEXP_EXTRACT function, Regular Expressions
- REGEXP_EXTRACT_ALL function, Regular Expressions
- REGEXP_REPLACE function, Regular Expressions
- regions, Zones, Regions, and Multiregions
- regression, Regression, Summary of model types
- regular expressions
- regularization in BigQuery ML, Regularization
- regulatory compliance, Regulatory Compliance-Data Exfiltration Protection
- relational database management systems, Relational Database Management System
- remote procedure call (RPC) interface exposed by worker shards, Worker Shard
- repeated fields, Storing data as arrays of structs
- REPLACE, using with SELECT, SELECT *, EXCEPT, REPLACE
- replicated encoding, Physical storage: Colossus
- reservations, Step 2: Routing
- resources
- REST APIs
- restoring deleted records and tables, Restoring Deleted Records and Tables
- restoring deleted tables, Deleting a table
- restricting access to subsets of data, Restricting Access to Subsets of Data-Dynamic filtering based on user
- reusable queries, Reusable Queries-Defining constants
- REVERSE function, Transformation Functions
- roles, Role-Custom roles
- ROUND function, Mathematical Functions
- ROW_NUMBER function, Limiting large sorts, Numbering functions
- RPAD function, Transformation Functions
- RTRIM function, Transformation Functions
- run-length encoding, Storage format: Capacitor
S
- SAFE functions, SAFE Functions
- sandbox, using to experiment with BigQuery, Estimating per-query cost
- saving queries, Saved Queries
- scalar functions, Numeric Types and Functions
- scalar query parameters, Array and struct parameters
- scan-filter-aggregate query example, Scan-filter-aggregate query-Stage 2
- scan-filter-aggregate query with high cardinality, Scan-filter-aggregate query with high cardinality-Broadcast JOIN query
- scan-filter-count query example, Scan-filter-count query-Stage 1
- scatter plots, drawing in pandas from saved query results, Saving query results to pandas, Working with BigQuery, pandas, and Jupyter
- scheduler, Query Master
- scheduling of queries, Scheduled queries
- schemas
- authoritative schema for managed storage, Managed Storage
- changing to use arrays, Using arrays to store repeated fields
- complex, using JSON file for, Complex schema
- creating empty table with schema, Creating an empty table with schema
- examining details of insert job to ascertain the schema, Troubleshooting Workloads Using Stackdriver
- for dataset tables loaded into BigQuery, Loading from a Local Source
- in external table definitions for CSV and JSON files, Temporary table
- information, Building queries dynamically
- not specifying for Parquet and ORC files, Loading and querying Parquet and ORC
- schema of imported TensorFlow model, Predicting with TensorFlow models
- specifying for dataset loaded into BigQuery, Specifying a Schema-Specifying a Schema
- star schemas applied to clustered tables, Side benefits of clustering
- updating table schema using Google Cloud Client Library, Updating a table’s schema
- scipy package (Python), Cloud Dataflow
- scripting, Scripting-Advanced Functions
- security
- BigQuery features supporting, Simplicity of Management
- Cloud Security Command Center, Cloud Security Command Center
- GCP features providing security for BigQuery, Security and Compliance
- infrastructure provided by public cloud services, Administering and Securing BigQuery
- infrastructure security for BigQuery, Infrastructure Security-Infrastructure Security
- managing access control for BigQuery using IAM, Administering and Securing BigQuery
- managing access control for BigQuery with IAM, Identity and Access Management-Resource
- privacy and encryption, Privacy and Encryption-Customer-Managed Encryption Keys
- verifying effectiveness of, Dashboards, Monitoring, and Audit Logging
- SELECT * ... LIMIT 10, Side benefits of clustering
- SELECT * EXCEPT statement, Be purposeful in SELECT
- SELECT * LIMIT statement, Be purposeful in SELECT
- SELECT * REPLACE statement, Storing data as geography types
- SELECT * statement, selecting all columns in a table, SELECT *, EXCEPT, REPLACE
- SELECT statement, Query Essentials
- being purposeful in, Be purposeful in SELECT
- combining with UNION ALL, A Brief Primer on Arrays and Structs
- conditional expressions using Booleans, Conditional Expressions
- filtering with WHERE clause, Filtering with WHERE
- from UNNEST, UNNEST an Array
- in CREATE OR REPLACE MODEL, data split in, Controlling Data Split
- in WITH clause, Numbering functions
- INSERT VALUES with SELECT subquery, Insert VALUES with subquery SELECT
- leading commas in SELECT clause, Creating Arrays by Using ARRAY_AGG
- limits on results for SELECT queries, Step 5: Returning the query results
- preparing training dataset, Training and Evaluating the Model
- reducing data being read, Reducing data being read
- retrieving rows with, Retrieving Rows by Using SELECT-Retrieving Rows by Using SELECT
- SELECT DISTINCT, Finding Unique Values by Using DISTINCT
- within a loop, Looping
- self-joins
- sentiment analysis, Summary of model types
- serverless (BigQuery), BigQuery: A Serverless, Distributed SQL Engine
- SESSION_USER function, Dynamic filtering based on user
- SHA hashing algorithms, MD5 and SHA
- Shapefiles, geospatial data in, Geographic types
- shards
- sharing queries
- shuffle sinks, Scheduler, Shuffle
- shuffles, Storage and Networking Infrastructure
- slots in BigQuery, Separation of Compute and Storage, Step 4: Query engine, Worker Shard
- slowly-changing dimensions, The Basics
- Software as a Service (SaaS) applications, loading data into BigQuery, Data Transfer Service
- sorting
- Spanner, Step 5: Returning the query results
- database index (IDX), helping find storage sets within a range, Partitioning
- Spark, MapReduce Framework
- SPLIT function, A Brief Primer on Arrays and Structs
- split points for distributed sort, Distributed sort
- splittable files, Loading from a Local Source
- Spotify, use of BigQuery, Data Processing Architectures
- SQL (Structured Query Language), Relational Database Management System
- advanced, Advanced SQL
- ambiguities of Standard SQL, Advanced SQL
- BigQuery's full-featured support for SQL:2011, Powerful Analytics
- BigQuery, serverless distributed SQL engine, BigQuery: A Serverless, Distributed SQL Engine
- creating string containing SQL to be executed by BigQuery, Querying
- creating tables in, Setting up destination table
- deleting a table or view from BigQuery, Data Management (DDL and DML)
- dialect used in bq command-line tool, Executing Queries
- DML (Data Manipulation Language), DML
- execution by worker shard, Worker Shard
- for computation of data in the cloud, reasons for choosing, How BigQuery Came About
- legacy SQL used by Dremel, Simple Queries
- queries on data in Cloud Bigtable, SQL Queries on Data in Cloud Bigtable-Improving performance
- queries on distributed datasets, Hadoop running Spark, MapReduce Framework
- SQL/MM 3 specification for spatial functions, Working with GIS Functions
- SQL:2011, BigQuery: A Serverless, Distributed SQL Engine
- standard SQL used by BigQuery, Simple Queries
- support for standard SQL in BigQuery, launch of, How BigQuery Came About
- user-defined functions, SQL User-Defined Functions-Public UDFs
- using instead of client API to access BigQuery programmatically, Table manipulation
- using to automate schema creation, Specifying a Schema
- SQL injection attacks, Parameterized queries
- SSL 3.0 exploit, Infrastructure Security
- SSL/TLS channels, access to API gateway infrastructure, Infrastructure Security
- Stackdriver, Integration with Google Cloud Platform
- standardize_features option, Carrying Out Clustering
- star schemas, Side benefits of clustering
- STARTS_WITH function, String Manipulation Functions
- statistical functions, Useful Statistical Functions-Correlation
- storage, Storage-Meta-File
- BigQuery storage system providing table and file abstractions, How BigQuery Came About
- choosing efficient storage format, Choosing an Efficient Storage Format-Storing data as geography types
- managed, in BigQuery, Managed Storage
- metadata, Metadata-Meta-File
- of intermediate query results, Scheduler
- physical storage in Colossus, Physical storage: Colossus-Physical storage: Colossus
- separation from compute in BigQuery, ETL, EL, and ELT, Separation of Compute and Storage
- storage format, Capacitor, Storage format: Capacitor-Storage format: Capacitor
- storing data as arrays, Working with Arrays
- Storage API (BigQuery), bulk reads using, Bulk reads using BigQuery Storage API
- storage encoding (see encoding)
- storage sets, Storage sets
- stored procedures, Insert VALUES with subquery SELECT
- streaming data
- string functions, String Functions-Working with TIMESTAMP
- STRING type, Data Types, Functions, and Operators, Summary
- strings
- arrays of, Array functions
- casting to FLOAT64, Loading from a Local Source
- creating query doing string formatting, security risks of, Parameterized queries
- explicitly converting to INT64, Casting and Coercion
- geographic data in, Geographic types
- in schema autodetection by BigQuery, Specifying a Schema
- NUMERIC types ingested into BigQuery as strings, Precise Decimal Calculations with NUMERIC
- query provided in, Executing Queries
- representing as array of Unicode characters, array of bytes, or array of Unicode code points, Internationalization
- SPLIT function, A Brief Primer on Arrays and Structs
- STRPOS function, String Functions, String Manipulation Functions
- STRUCT keyword
- STRUCT type, Data Types, Functions, and Operators, Summary
- structures
- ST_AsGeoJSON function, Geographic types
- ST_AsText function, Geographic types
- ST_CENTROID_AGG function, Geometry transformations and aggregations
- ST_Contains function, Working with GIS Functions, GIS predicate functions
- ST_CoveredBy function, GIS predicate functions
- ST_Distance function, GIS Measures
- ST_DWithin function, GIS predicate functions
- ST_GeogFromGeoJSON function, Geographic types
- ST_GeogFromText function, Geographic types
- ST_GeogPoint function, Geographic types
- ST_GeoHash function, Creating Polygons, Human insights and auxiliary data
- ST_Intersects function, GIS predicate functions
- ST_MakeLine function, Creating Polygons
- ST_MakePolygon function, Creating Polygons
- ST_SnapToGrid function, GIS Measures
- ST_UNION function, Geometry transformations and aggregations
- subqueries, Query Engine (Dremel)
- SUBSTR function, String Functions, String Manipulation Functions
- suffixes (table), Antipattern: Table suffixes and wildcards
- SUM function, using NUMERIC type, Precise Decimal Calculations with NUMERIC
- superQuery, Estimating per-query cost
- supervised machine learning, Machine Learning in BigQuery
- SYSTEM_TIME AS OF, Restoring Deleted Records and Tables
T
- table-valued functions, Numeric Types and Functions
- tables, Metadata
- avoiding creation of tables with same name, Deleting a table
- browsing rows using Google Cloud Client Library, Browsing the rows of a table
- clustered, performance optimizations with, Performance optimizations with clustered tables
- copying between datasets using bq cp, Copying datasets
- copying between datasets using Google Cloud Client Library, Copying a table
- creating empty table using Google Cloud Client Library, Creating an empty table
- creating empty table with schema, using Google Cloud Client Library, Creating an empty table with schema
- creating in SQL, Setting up destination table
- creating staging table for updates to apply, DML
- creating with bq mk --table, Creating a table
- creating with complex schema, Complex schema
- deleting a table using Google Cloud Client Library, Deleting a table
- extracting data from using bq extract, Extracting data
- extracting data from, using Google Cloud Client Library, Extracting data from a table
- inserting rows into with bq insert, Loading and inserting data
- inserting rows using Google Cloud Client Library, Inserting rows into a table
- joining, Joining Tables-Saving and Sharing
- management using Google Cloud Client Library, Table management
- manipulating through HTTP requests to BigQuery REST API, Table manipulation
- metadata, Table Metadata-Time travel
- obtaining properties using Google Cloud Client Library, Obtaining table properties
- query results functionally equivalent to, Step 5: Returning the query results
- recovering deleted tables, Restoring Deleted Records and Tables
- structured storage at table level, Managed Storage
- table/view in dataset names, Retrieving Rows by Using SELECT
- updating schema using Google Cloud Client Library, Updating a table’s schema
- tagging
- temporary tables
- TensorFlow, Bulk reads using BigQuery Storage API, Machine Learning in BigQuery, Support for TensorFlow-Predicting with TensorFlow models
- text classification, Summary of model types
- text editors, Specifying a Schema
- text summarization, Summary of model types
- text, Well Known Text (WKT) format for geographic strings, Geographic types
- threshold (probability), choosing for classification model, Choosing the Threshold
- time functions prefixed with SAFE, SAFE Functions
- time travel
- TIME type, Date, Time, and DateTime, Summary
- time utility, Measuring Query Speed Using REST API
- time zones, Parsing and Formatting Timestamps, Date, Time, and DateTime
- time-insensitive use cases, Time-Insensitive Use Cases-File Loads
- TIMESTAMP type, Data Types, Functions, and Operators, Working with TIMESTAMP-Date, Time, and DateTime, Summary
- timestamps
- TIMESTAMP_MILLIS function, Extracting Calendar Parts
- Titan chip, Infrastructure Security
- tools for direct reads from BigQuery Storage API, Bulk reads using BigQuery Storage API
- TO_JSON_STRING function, Specifying a Schema, Array functions
- training datasets, creating for regression model, Creating a Training Dataset
- training models
- Transfer Appliance, Data Migration Methods
- transfers of data into BigQuery, Transfers and Exports-Cross-region dataset copy
- transformations
- TRIM function, Transformation Functions
- tuples, TUPLE
- Twitter, use of BigQuery, Data Processing Architectures
U
- UDFs (see user-defined functions)
- undoing deletions of records and tables, Restoring Deleted Records and Tables
- Unicode strings in BigQuery, Internationalization
- UNION ALL, using with SELECT, A Brief Primer on Arrays and Structs
- union of geography types, Geometry transformations and aggregations
- Unix epoch, number of seconds from, Extracting Calendar Parts
- Unix shell, using bash to get access tokens, Table manipulation
- UNIX_MILLIS function, Extracting Calendar Parts
- UNIX_SECONDS function, Aggregate analytic functions
- UNNEST function, A Brief Primer on Arrays and Structs, UNNEST an Array, Storing data as arrays of structs
- unstructured data, Unstructured data, Summary of model types
- UPDATE statement, DML
- updates, BigQuery not designed for very-high-frequency DML updates, DML
- upgrades to BigQuery, BigQuery Upgrades
- URIs
- URLs
- user role, Predefined roles
- user-defined functions, Numeric Types and Functions
- users
- UTF-8 encoding, Internationalization
- UUIDs (universally unique identifiers), UUID
V
- variables
- versions (BigQuery), Accessing BigQuery via the REST API
- views
- Virtual Private Cloud Service Controls (VPC-SC), Security and Compliance, Virtual Private Cloud Service Controls
- visualizations
- drawing scatter plot in pandas from saved query results, Saving query results to pandas, Working with BigQuery, pandas, and Jupyter
- of geospatial data, Geometry transformations and aggregations
- plotting interactive map using Python folium package, Working with BigQuery, pandas, and Jupyter
- visualizing query plan information, Visualizing the query plan information-Visualizing the query plan information
- visualizing the billing report, Visualizing the billing report
W
- web UI (BigQuery)
- weights
- Well Known Text (WKT), Geographic types
- WGS84 ellipsoid, Working with GIS Functions, Geographic types
- What-If tool, Examining Model Weights
- WHERE clause
- Boolean expressions in, Logical Operations
- casting in, Loading from a Local Source
- comparisons and NULL values, Comparisons
- correlated subqueries in, Correlated subquery
- filtering for NULL values in, Finding Unique Values by Using DISTINCT
- filtering results returned by SELECT, Filtering with WHERE
- GIS predicate functions in, GIS predicate functions
- LIKE operator, SELECT *, EXCEPT, REPLACE
- partitioning and clustering tables in, Insert SELECT
- using GROUP BY instead of, Computing Aggregates by Using GROUP BY
- WHILE loop, Looping
- wildcards
- window functions, Window Functions-Table Metadata
- WITH clause
- worker shards
- Workload Tester, using to measure query speed, Measuring Query Speed Using BigQuery Workload Tester-Measuring Query Speed Using BigQuery Workload Tester
- workloads, troubleshooting using Stackdriver, Troubleshooting Workloads Using Stackdriver-Troubleshooting Workloads Using Stackdriver