Data Wrangling
by M. Niranjanamurthy, Kavita Sheoran, Geetika Dhand, Prabhjot Kaur
Table of Contents
Cover
Series Page
Title Page
Copyright Page
1 Basic Principles of Data Wrangling
1.1 Introduction
1.2 Data Workflow Structure
1.3 Raw Data Stage
1.4 Refined Stage
1.5 Produced Stage
1.6 Steps of Data Wrangling
1.7 Do’s for Data Wrangling
1.8 Tools for Data Wrangling
References
2 Skills and Responsibilities of Data Wrangler
2.1 Introduction
2.2 Role as an Administrator (Data and Database)
2.3 Skills Required
2.4 Responsibilities as Database Administrator
2.5 Concerns for a DBA [12]
2.6 Data Mishandling and Its Consequences
2.7 The Long-Term Consequences: Loss of Trust and Diminished Reputation
2.8 Solution to the Problem
2.9 Case Studies
2.10 Conclusion
References
3 Data Wrangling Dynamics
3.1 Introduction
3.2 Related Work
3.3 Challenges: Data Wrangling
3.4 Data Wrangling Architecture
3.5 Data Wrangling Tools
3.6 Data Wrangling Application Areas
3.7 Future Directions and Conclusion
References
4 Essentials of Data Wrangling
4.1 Introduction
4.2 Holistic Workflow Framework for Data Projects
4.3 The Actions in Holistic Workflow Framework
4.4 Transformation Tasks Involved in Data Wrangling
4.5 Description of Two Types of Core Profiling
4.6 Case Study
4.7 Quantitative Analysis
4.8 Graphical Representation
4.9 Conclusion
References
5 Data Leakage and Data Wrangling in Machine Learning for Medical Treatment
5.1 Introduction
5.2 Data Wrangling and Data Leakage
5.3 Data Wrangling Stages
5.4 Significance of Data Wrangling
5.5 Data Wrangling Examples
5.6 Data Wrangling Tools for Python
5.7 Data Wrangling Tools and Methods
5.8 Use of Data Preprocessing
5.9 Use of Data Wrangling
5.10 Data Wrangling in Machine Learning
5.11 Enhancement of Express Analytics Using Data Wrangling Process
5.12 Conclusion
References
6 Importance of Data Wrangling in Industry 4.0
6.1 Introduction
6.2 Steps in Data Wrangling
6.3 Data Wrangling Goals
6.4 Tools and Techniques of Data Wrangling
6.5 Ways for Effective Data Wrangling
6.6 Future Directions
References
7 Managing Data Structure in R
7.1 Introduction to Data Structure
7.2 Homogeneous Data Structures
7.3 Heterogeneous Data Structures
References
8 Dimension Reduction Techniques in Distributional Semantics: An Application Specific Review
8.1 Introduction
8.2 Application Based Literature Review
8.3 Dimensionality Reduction Techniques
8.4 Experimental Analysis
8.5 Conclusion
References
9 Big Data Analytics in Real Time for Enterprise Applications to Produce Useful Intelligence
9.1 Introduction
9.2 The Internet of Things and Big Data Correlation
9.3 Design, Structure, and Techniques for Big Data Technology
9.4 Aspiration for Meaningful Analyses and Big Data Visualization Tools
9.5 Big Data Applications in the Commercial Surroundings
9.6 Big Data Insights’ Constraints
9.7 Conclusion
References
10 Generative Adversarial Networks: A Comprehensive Review
List of Abbreviations
10.1 Introduction
10.2 Background
10.3 Anatomy of a GAN
10.4 Types of GANs
10.5 Shortcomings of GANs
10.6 Areas of Application
10.7 Conclusion
References
11 Analysis of Machine Learning Frameworks Used in Image Processing: A Review
11.1 Introduction
11.2 Types of ML Algorithms
11.3 Applications of Machine Learning Techniques
11.4 Solution to a Problem Using ML
11.5 ML in Image Processing
11.6 Conclusion
References
12 Use and Application of Artificial Intelligence in Accounting and Finance: Benefits and Challenges
12.1 Introduction
12.2 Uses of AI in Accounting & Finance Sector
12.3 Applications of AI in Accounting and Finance Sector
12.4 Benefits and Advantages of AI in Accounting and Finance
12.5 Challenges of AI Application in Accounting and Finance
12.6 Suggestions and Recommendation
12.7 Conclusion and Future Scope of the Study
References
13 Obstacle Avoidance Simulation and Real-Time Lane Detection for AI-Based Self-Driving Car
13.1 Introduction
13.2 Simulations and Results
13.3 Conclusion
References
14 Impact of Suppliers Network on SCM of Indian Auto Industry: A Case of Maruti Suzuki India Limited
14.1 Introduction
14.2 Literature Review
14.3 Methodology
14.4 Findings
14.5 Discussion
14.6 Conclusion
References
About the Editors
Index
Also of Interest
End User License Agreement
List of Tables
Chapter 4
Table 4.1 Movement of data through various stages.
Chapter 7
Table 7.1 Classified view of data structures in R.
Chapter 8
Table 8.1 Research papers and the tools and application areas covered by them.
Table 8.2 Results of red-wine quality dataset.
Table 8.3 Results of Wisconsin breast cancer quality dataset.
Chapter 11
Table 11.1 Difference between SL, UL, and RL.
Chapter 14
Table 14.1 Indian economy driving sectors Real Gross Value Added (GVA) growth ...
Table 14.2 Stats during FY 19’-20’ reflecting effect on sales.
Table 14.3 Stats during Mar’21 and Feb’21 reflecting effect on sales.
List of Illustrations
Chapter 1
Figure 1.1 Actions in the raw data stage.
Figure 1.2 Actions in the refined stage.
Figure 1.3 Actions in the produced stage.
Figure 1.4 Data value funnel.
Figure 1.5 Steps for data wrangling process.
Chapter 2
Figure 2.1 Power BI collaborative environment.
Figure 2.2 Power BI’s various components.
Figure 2.3 UBER’s working model.
Figure 2.4 UBER’s trip description in a week.
Figure 2.5 UBER’s city operations map visualizations.
Figure 2.6 UBER’s separate trips and UBER-Pool trips.
Figure 2.7 Analysis area wise in New York.
Figure 2.8 Analysis area wise in spatiotemporal format.
Chapter 3
Figure 3.1 Graphical depiction of the data wrangling architecture.
Figure 3.2 Image of the Excel tool filling the missing values using the random...
Figure 3.3 Image of the graphical user interface of Altair tool showing the in...
Figure 3.4 Pictorial representation of the features and advantages of Anzo Sma...
Figure 3.5 Image representing the interface to extract the data files in .pdf ...
Figure 3.6 Image representing the transformation operation in Trifacta tool.
Figure 3.7 Graphical representation for accepting the input from various heter...
Figure 3.8 Image depicting the graphical user interface of Paxata tool perform...
Figure 3.9 Image depicting data preparation process using Talend tool where su...
Chapter 4
Figure 4.1 Actions performed in the raw data stage.
Figure 4.2 Actions performed in refined data stage.
Figure 4.3 Actions performed in production data stage.
Figure 4.4 This is what the dataset looks like. It consists of a number of record...
Figure 4.5 Snippet of libraries included in the code.
Figure 4.6 Snippet of dataset used.
Figure 4.7 Snippet of manipulations on dataset.
Figure 4.8 The order of the columns has been changed and the datatype of “Numb...
Figure 4.9 Top 10 records of the dataset.
Figure 4.10 Result—Bottom 10 records of the dataset.
Figure 4.11 Here we can get the count, unique, top, freq, mean, std, min, quar...
Figure 4.12 Maximum number of fires is 998 and was reported in the month of Se...
Figure 4.13 The data is grouped by state, and we can get the total number of fi...
Figure 4.14 Maximum of total fires recorded were 51118, and this was for State...
Figure 4.15 Code snippet for line graph.
Figure 4.16 Line graph.
Figure 4.17 Code snippet for creating pie graph.
Figure 4.18 Pie chart.
Figure 4.19 Code snippet for creating bar graph.
Figure 4.20 Bar graph.
Chapter 5
Figure 5.1 Task of data wrangling.
Figure 5.2 Pandas (is a software library that was written for Python programmi...
Figure 5.3 NetworkX.
Figure 5.4 Geopandas.
Figure 5.5 Data processing in Python.
Figure 5.6 Various types of machine learning algorithms.
Chapter 6
Figure 6.1 Turning messy data into useful statistics.
Figure 6.2 Organized data using data wrangling.
Chapter 7
Figure 7.1 Data structure in R.
Chapter 8
Figure 8.1 Overview of procedure of dimensionality reduction.
Figure 8.2 Dimension reduction techniques and their application areas.
Figure 8.3 Five variances acquired by PCs.
Figure 8.4 Two class LDA.
Figure 8.5 Choosing the best centroid for maximum separation among various cat...
Figure 8.6 Kernel principal component analysis.
Figure 8.7 Results of de-noising handwritten digits.
Figure 8.8 Casting the structure of Swiss Roll into lower dimensions.
Figure 8.9 Working of LLE.
Chapter 9
Figure 9.1 Architecture for large-scale data computing in standard.
Figure 9.2 Important decision-making considerations.
Figure 9.3 To show the overall design of sensing connection with machinery.
Figure 9.4 To show the basic layout of the IoT interconnected in a production-...
Figure 9.5 Signal transmission from multiple equipment toward a data acquisiti...
Figure 9.6 To show the analogue-to-digital conversion.
Figure 9.7 To show the overall organization of information acquirer operations...
Figure 9.8 To show the standard machine condition.
Figure 9.9 To show the overall working efficiency of the production devices.
Figure 9.10 To show the operating condition of every individual device.
Figure 9.11 To show the correlation of a top company’s revenues per year ago v...
Figure 9.12 To show the product-specific revenues.
Figure 9.13 To show material-related data available on the cloud.
Figure 9.14 To show systems that comprise a conventional manufacturing busines...
Chapter 10
Figure 10.1 Increasingly realistic faces generated by GANs [27].
Figure 10.2 Architecture of GAN.
Figure 10.3 Architecture of cGAN.
Figure 10.4 DCGAN architecture.
Chapter 11
Figure 11.1 Types of ML algorithms.
Figure 11.2 Personal assistants [8].
Figure 11.3 Apps used for navigations and cab booking [9].
Figure 11.4 Social media using phone [10].
Figure 11.5 Fraud detection [10].
Figure 11.6 Google translator [10].
Figure 11.7 Product recommendations [9].
Figure 11.8 Surveillance with video [10].
Figure 11.9 Data science problem categories [20].
Figure 11.10 Anomaly detection in red color person [21].
Figure 11.11 Data clustering [21].
Figure 11.12 Workflow of image processing using ML data clustering [22].
Chapter 12
Figure 12.1 AI applications in finance.
Figure 12.2 Use of AI applications in finance.
Chapter 13
Figure 13.1 Self-driving car UI.
Figure 13.2 Simple Maze with no to-fro loops involved.
Figure 13.3 Teaching hair-pin bends.
Figure 13.4 A more difficult path to cope with looping paths.
Figure 13.5 Plan of attack to achieve the desired goal.
Figure 13.6 Lane.
Figure 13.7 Lane detection from video clips.
Figure 13.8 Depiction of pixel value.
Figure 13.9 Setting area of interest on the frame.
Figure 13.10 (a) Masked image (b) Image after thresholding.
Figure 13.11 Shows lane detection in various frames of the video.
Chapter 14
Figure 14.1 Automobile Production trends 2015–2021.
Figure 14.2 Domestic sales growth for four-wheelers segment.
Figure 14.3 Flowchart of the research methodology.
Figure 14.4 Global impact of COVID-19 on automotive sector.
Figure 14.5 Sales percentage of vehicles according to their type.
Figure 14.6 Market shares of different automotive sector players.