Preface

Big data is a term used to describe data sets that are so large or complex that traditional data processing software is inadequate to deal with them. The challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term “big data” often refers simply to the use of predictive analytics, user behaviour analytics or other advanced data analytics methods that extract meaningful value from data, without concern for the size of the data set. As data sets continue to grow in size and complexity, scientists encounter these limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology and environmental research.

Big data analytics is the process of examining large and varied data sets, i.e., big data, to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful information that can help organizations make more informed business decisions. Big data analytics applications enable data scientists, predictive modellers, statisticians and other analytics professionals to analyse growing volumes of structured transaction data, plus other forms of data that are often left untapped by conventional business intelligence (BI) and analytics programs: a mix of semi-structured and unstructured data. On a broad scale, data analytics technologies and techniques provide a means of analysing data sets and drawing conclusions from them to help organizations make informed business decisions. BI queries answer basic questions about business operations and performance. Big data analytics is a form of advanced analytics that involves complex applications with elements such as predictive models and statistical algorithms, powered by high-performance analytics systems.

Note that unstructured and semi-structured data of these types typically do not fit well in traditional data warehouses, which are based on relational databases oriented towards structured data sets. Furthermore, data warehouses may not be able to handle the processing demands posed by sets of big data that need to be updated frequently, or even continually, as in the case of real-time data on stock trading, the online activities of website visitors or the performance of mobile applications. As a result, many organizations that collect, process and analyse big data turn to Hadoop and its companion tools, such as YARN, MapReduce, Spark, HBase, Hive, Kafka and Pig, as well as NoSQL databases. In some cases, Hadoop clusters and NoSQL systems are used primarily as landing pads and staging areas for data before they are loaded into a data warehouse or analytical database, usually in a summarized form that is more conducive to relational structures. Of late, scientists and researchers have resorted to machine intelligence for analysing big data, thereby extending conventional BI. It is well known that data in any form exhibit varying degrees of ambiguity and imprecision. Machine learning tools and strategies are adept at handling these uncertainties and, hence, at extracting relevant and meaningful information from data.
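To make the staging-area pattern concrete, the following minimal PySpark sketch shows semi-structured events being summarized before bulk loading into a relational warehouse; the paths, column names and schema are hypothetical illustrations, not a pipeline taken from any chapter of this volume.

```python
# Illustrative sketch of the "staging area" pattern described above:
# raw semi-structured events are summarized in Spark before being
# loaded into a relational warehouse. Paths and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("staging-demo").getOrCreate()

# Raw clickstream JSON landing in HDFS (hypothetical path and schema).
raw = spark.read.json("hdfs:///landing/clickstream/2018-04-01/")

# Summarize to a shape that suits a relational warehouse table.
summary = (raw.groupBy("user_id", F.to_date("ts").alias("day"))
              .agg(F.count("*").alias("events"),
                   F.countDistinct("page").alias("pages_seen")))

# Write the compact summary out for bulk load into the warehouse.
summary.write.mode("overwrite").parquet("hdfs:///staging/click_summary/")
```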

This volume comprises six contributed chapters reporting the latest findings on applications of machine learning for big data analytics.

Chapter 1 provides a hands-on introduction to psychometric trait analysis and presents a scalable infrastructure solution as a proof of concept for two key requirements: the efficient handling of enormous amounts of available data and the demand for micro-targeting. The authors discuss two use cases and show how psychometric information, which could, for example, be used for targeted political messages, can be derived from Facebook data. Finally, potential further developments are outlined that could serve as starting points for future research.

Video summarization is an important field of research in content-based video retrieval. One of the major aims in this domain is to generate summaries of videos in the shortest possible time. In Chapter 2, the primary aim is to rapidly select keyframes from the shots composing a video so as to generate a storyboard in a minimal amount of time. The time taken to produce the storyboard is directly proportional to the number of correlations computed between the frames of a shot. To reduce this time, user input is obtained specifying the fraction of correlations actually to be computed. Keyframes are then selected from each shot by generating an approximate minimum spanning tree and computing the density around each frame of the shot by means of an automatic threshold based on the statistical distribution of the correlation values.
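The following Python sketch illustrates the general flavour of such density-based keyframe selection for a single shot. The frame features, the pair-sampling scheme and the mean-plus-standard-deviation threshold are our own simplifying assumptions for illustration, not the exact algorithm of Chapter 2.

```python
# Hypothetical sketch of density-based keyframe selection for one shot.
# Features, sampling and thresholding are illustrative assumptions.
import numpy as np

def select_keyframe(frames, fraction=0.3, rng=None):
    """Pick one keyframe from a shot given per-frame feature vectors.

    frames   : (n_frames, n_features) array, e.g. colour histograms
    fraction : share of the n*(n-1)/2 frame pairs actually correlated
    """
    rng = np.random.default_rng(rng)
    n = len(frames)
    # Enumerate all frame pairs, then correlate only a chosen fraction.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    sample = rng.choice(len(pairs), size=max(1, int(fraction * len(pairs))),
                        replace=False)
    corr = np.zeros((n, n))
    for k in sample:
        i, j = pairs[k]
        corr[i, j] = corr[j, i] = np.corrcoef(frames[i], frames[j])[0, 1]
    # Automatic threshold from the distribution of computed correlations.
    values = corr[corr != 0]
    threshold = values.mean() + values.std()
    # Density of a frame = number of frames it is strongly correlated
    # with; the densest frame is taken as the shot's keyframe.
    density = (corr > threshold).sum(axis=1)
    return int(density.argmax())

# Example: 40 random "frames" described by 64-bin histograms.
shot = np.random.default_rng(0).random((40, 64))
print(select_keyframe(shot, fraction=0.3, rng=1))
```

Computing only a fraction of the correlations is what makes the storyboard time tunable: the user trades accuracy of the density estimate for speed.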

Most techniques for image processing involve custom-built algorithms that lack flexibility, making them poorly adapted to the data being processed. In Chapter 3, the authors elaborate upon various methodologies within the domain of image processing. They chronologically demonstrate the role of learning techniques in image super-resolution, image upsampling, image quality assessment and parallel computing techniques. Further, an in-depth explanation is provided of deep neural architectures as an impressive tool for performing multiple image processing tasks.

Chapter 4 focuses on connected cities in terms of smart transportation. A connected city manages available resources in such a way that it can efficiently improve economic and societal outcomes. Vast amounts of data are generated by people, systems and things in cities; the data generated from these various resources are thus considered the most scalable asset of a connected city. Such heterogeneous data are difficult to organize, interpret and analyse. Generally, the data generated from various sources are very large as well as heterogeneous, because they originate from heterogeneous environments such as water, traffic, energy and buildings. Hence, techniques from multiple communities, including databases, data mining, artificial intelligence and distributed systems, are useful for dealing with the challenges of big data in connected cities.

A new hybrid structure of neuro-fuzzy networks is proposed and studied in Chapter 5, combining a fuzzy cellular Kohonen neural network and a radial basis function (RBF) neural network through a fuzzy clustering layer. The proposed model offers a high degree of self-organization of neurons, improving the separation properties of the network in the case of overlapping clusters; automatic adjustment of the parameters of the radially symmetric functions; a single hidden layer, sufficient for modelling pronounced non-linear dependencies; a simple algorithm for optimizing the weight coefficients; and a high learning speed. The model can be used to solve a wide range of problems: clustering, approximation and classification (recognition) of multidimensional, semi-structured data.
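A minimal numpy sketch of the underlying idea follows: cluster centres parameterize radially symmetric hidden units, and the single output layer is fitted by linear least squares, which accounts for the fast training. Plain k-means stands in here for the chapter's fuzzy cellular Kohonen layer, so this is a simplified illustration rather than the proposed model itself.

```python
# Simplified RBF-with-clustered-centres sketch; plain k-means stands in
# for the fuzzy cellular Kohonen layer of the chapter's actual model.
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, y, n_centres=10, seed=0):
    # 1) Self-organizing stage: place centres by clustering the inputs.
    km = KMeans(n_clusters=n_centres, n_init=10, random_state=seed).fit(X)
    centres = km.cluster_centers_
    # Width of the Gaussian units from the average centre spread
    # (includes zero self-distances; good enough for a sketch).
    sigma = np.mean([np.linalg.norm(c1 - c2)
                     for c1 in centres for c2 in centres]) + 1e-9

    # 2) Single hidden layer of radially symmetric (Gaussian) activations.
    def hidden(X):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        return np.exp(-(d / sigma) ** 2)

    # 3) Output weights by linear least squares: no backpropagation needed.
    W, *_ = np.linalg.lstsq(hidden(X), y, rcond=None)
    return lambda Xnew: hidden(Xnew) @ W

# Example: approximate a noisy sine.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
model = fit_rbf(X, y, n_centres=12)
print(float(np.mean((model(X) - y) ** 2)))  # training MSE
```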

Image fusion combines multiple input images into a single fused image that conveys more information than any of the inputs. The approach discussed here is based on the discrete wavelet transform (DWT) and sparse sampling; sparse sampling allows signals to be reconstructed from fewer samples than the Nyquist rate requires. Among the various techniques, the DWT offers many advantages: it yields higher quality, requires less storage and has low cost, which is very useful in image applications. Image-related applications face constraints such as limited data storage and limited communication bandwidth over satellite links, which results in the capture of low-quality images. To overcome this problem, image fusion has proven to be a powerful tool for remote sensing applications, combining panchromatic and multispectral images to produce a composite image with both higher spatial and higher spectral resolution. Research in this area goes back a couple of decades; the diverse approaches and methodologies proposed so far by researchers in the field are discussed in Chapter 6.
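For readers new to the topic, the following sketch shows the textbook DWT fusion baseline using the PyWavelets library: average the low-frequency approximation bands, and keep the larger-magnitude detail coefficients from either source. Chapter 6 surveys many more sophisticated variants; this is only the classic rule, with synthetic inputs.

```python
# Baseline one-level 2-D DWT fusion: average the approximation band,
# keep the larger detail coefficients. Illustrative only; the chapter
# surveys many more refined schemes.
import numpy as np
import pywt

def dwt_fuse(img_a, img_b, wavelet="db2"):
    """Fuse two equally sized single-channel images via one-level DWT."""
    ca_a, details_a = pywt.dwt2(img_a, wavelet)
    ca_b, details_b = pywt.dwt2(img_b, wavelet)
    # Low-frequency band: averaging preserves overall spectral content.
    ca = 0.5 * (ca_a + ca_b)
    # High-frequency bands: take the coefficient with larger magnitude
    # to keep the sharper spatial detail from either source image.
    details = tuple(np.where(np.abs(da) >= np.abs(db), da, db)
                    for da, db in zip(details_a, details_b))
    return pywt.idwt2((ca, details), wavelet)

# Example with two synthetic 64x64 images.
rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))
print(dwt_fuse(a, b).shape)
```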

This volume is intended to be used as a reference by undergraduate and postgraduate students in the fields of computer science, electronics and telecommunication, information science and electrical engineering as part of their curriculum.

April 2018

Kolkata, India

Siddhartha Bhattacharyya

Hrishikesh Bhaumik

Anirban Mukherjee

Sourav De
