Contributors
by Suresh Kumar Mukhiya, Tao Wei, James Lee
Hands-On Big Data Modeling
Title Page
Copyright and Credits
Hands-On Big Data Modeling
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Big Data and Data Management
The concept of big data 
Interesting insights regarding big data
Characteristics of big data
Sources and types of big data
Challenges of big data
Introduction to big data modeling
Uses of models
Introduction to managing big data
Importance and implications of big data modeling and management
Benefits of big data management
Challenges in big data management 
Setting up big data modeling platforms
Getting started on Windows
Getting started on macOS
Summary
Further reading
Data Modeling and Management Platforms
Big data management
Data ingestion
Data storage
Data quality
Data operations
Data scalability and security
Big data management services
Data cleansing
Data integration
Big data management vendors
Big data storage and data models
Storage models
Block-based storage
File-based storage 
Object-based storage
Data models
Relational stores (SQLs)
Scalable relational systems
Database as a Service (DaaS)
NoSQL stores
Document stores
Key-value stores
Extensible-record stores
Big data programming models
MapReduce
MapReduce functionality
Hadoop
Features of Hadoop frameworks
Yet Another Resource Negotiator 
Functional programming
Spark
Reasons to choose Apache Spark
Flink
Advantages of Flink
SQL data models
Hive Query Language (HQL)
Cassandra Query Language (CQL)
Spark SQL
Apache Drill
Getting started with Python and R
Python on macOS
Python on Windows
R on macOS
R on Windows
Summary
Further reading
Defining Data Models
Data model structures
Structured data
Unstructured data
Sources of unstructured data
Comparing structured and unstructured data
Data operations
Subsetting
Union
Projection
Join
Data constraints
Types of constraints
Value constraints
Uniqueness constraints
Cardinality constraints
Type constraints
Domain constraints
Structural constraints
A unified approach to big data modeling and data management
Summary
Further reading
Categorizing Data Models
Levels of data modeling
Conceptual data modeling
Logical data modeling
Benefits of constructing LDMs
Physical data modeling
Features of the physical data model
Types of data model
Hierarchical database models
Relational models
Advantages of the relational data model
Network models
Object-oriented database model
Entity-relationship models
Object-relational models
Summary
Further reading
Structures of Data Models
Semi-structured data models
Exploring the semi-structured data model of JSON data
Installing Python and the Tweepy library
Getting authorization credentials to access the Twitter API
VSM with Lucene
Lucene
Graph-data models
Graph-data models with Gephi
Summary 
Further reading
Modeling Structured Data
Getting started with structured data
NumPy
Operations using NumPy
Pandas
Matplotlib
Seaborn
IPython
Modeling structured data using Python
Visualizing the location of houses based on latitude and longitude
Factors that affect the price of houses
Visualizing more than one parameter
Gradient-boosting regression
Summary
Further reading
Modeling with Unstructured Data
Getting started with unstructured data
Tools for intelligent analysis
New methods of data processing
Tools for analyzing unstructured data
Weka
KNIME
Characteristics of KNIME
The R language
Unstructured text analysis using R
Data ingestion
Data cleaning and transformations
Data visualization
Improving the model
Summary
Further reading
Modeling with Streaming Data
Data stream and data model versus data format
Why is streaming data different?
Use cases of stream processing
What is a data stream?
Data streaming systems
How streaming works
Data harvesting
Data processing
Data analytics
Importance and implications of streaming data
Needs for stream processing
Challenges with streaming data
Streaming data solutions
Exploring streaming sensor data from the Twitter API
Analyzing the streaming data
Summary
Further reading
Streaming Sensor Data
Sensor data
Data lakes
Differences between data lakes and data warehouses
How a data lake works
Exploring streaming sensor data from a weather station
Summary
Further study
Concept and Approaches of Big-Data Management
Non-DBMS-based approach to big data
Filesystems
Problems with processing files
DBMS-based approach to big data
Advantages of the DBMS
Declarative Query Language (DQL)
Data independence
Controlling data redundancy
Centralized data management and concurrent access
Data integrity
Data availability
Efficient access through optimization
Parallel and distributed DBMS
Parallel DBMS
Motivations for parallel DBMS
Architectures for parallel databases
Distributed DBMS
Features of a distributed DBMS
Merits of a distributed DBMS
DBMS and MapReduce-style systems
Summary
Further reading
DBMS to BDMS
Characteristics of BDMS
BASE properties
Exploring data management with Redis
Getting started with Redis on macOS
Advanced key-value stores
Redis and Hadoop
Aerospike
Aerospike technology
AsterixDB
Data models
The Asterix query language
Getting started with AsterixDB
Unstructured data in AsterixDB
Inserting into datasets
Querying in AsterixDB
Summary
Further reading
Modeling Bitcoin Data Points with Python
Introduction to Bitcoin data
Theory
Importing Bitcoin data into IPython
Importing required libraries
Preprocessing and model creation
Predicting Bitcoin price using Recurrent Neural Network
Importing packages
Importing datasets
Preprocessing
Constructing the RNN model
Prediction
Summary
Further reading
Modeling Twitter Feeds Using Python
Importing Twitter feed data
Modeling Twitter feeds
The frequency of the tweets
Sentiment analysis
Installing TextBlob
Parts of speech
Noun-phrase extraction
Tokenization
Bag of words
Summary
Further reading
Modeling Weather Data Points with Python
Introduction to weather data
Importing data
Forecasting Nepal's temperature change
Modeling with data
Persistence model forecast
Weather statistics by country
Linear regression to predict the temperature of a city
Summary
Further reading
Modeling IMDb Data Points with Python
Introduction to IMDb data
Episode data
Rating data
Theory 
Modeling with the IMDb dataset
Starting the platform
Importing the required libraries
Importing a file
Data cleansing
Clustering
Summary
Further reading
Other Books You May Enjoy
Leave a review - let other readers know what you think