Chapter 9

Advancing with Higher-Order Functions

IN THIS CHAPTER

Defining the kinds of data manipulation

Changing dataset size using slicing and dicing

Changing dataset content using mapping and filtering

Organizing your data

Previous chapters in this book spend a lot of time looking at how to perform basic application tasks and viewing data to see what it contains in various ways. Just viewing the data won’t do you much good, however. Data rarely comes in the form you need, and even if it does, you still want the option to mix it with other data to create yet newer views of the real world. Gaining the ability to shape data in certain ways, throw out what you don’t need, refine its appearance, change its type, and otherwise condition it to meet your needs is the essential goal of this chapter.

Shaping, in the form of slicing and dicing, is the most common kind of manipulation. Data analysis can take hours, days, or even weeks at times. Anything you can do to refine the data to match specific criteria is important in getting answers fast. Obtaining answers quickly is essential in today’s world. Yes, you need the correct answer, but if someone else gets the correct answer first, you may find that the answer no longer matters. You lose your competitive edge.

Also essential is having the right data. The use of data mapping enables you to correlate data between information systems so that you can draw new conclusions. In addition, information overload, especially the wrong kind of information, is never productive, so filtering is essential as well. The combination of mapping and filtering lets you control the dataset content without changing the dataset truthfulness. In short, you get a new view of the same old information.

Data presentation — that is, its organization — is also important. The final section of this chapter discusses the issue of how to organize data to better see the patterns it contains. Given that there isn’t just one way to organize data, one presentation may show one set of patterns, and another presentation could display other patterns. The goal of all this data manipulation is to see something in the data that you haven’t seen before. Perhaps the data will give you an idea for a new product or help you market products to a new group of users. The possibilities are nearly limitless.

Considering Types of Data Manipulation

When you mention the term data manipulation, you convey different information to different people, depending on their particular specialty. An overview of data manipulation may include the term CRUD, which stands for Create, Read, Update, and Delete. A database manager may view data solely from this low-level perspective that involves just the mechanics of working with data. However, a database full of data, even accurate and informative data, isn’t particularly useful, even if you have all the best CRUD procedures and policies in place. Consequently, just defining data manipulation as CRUD isn’t enough, but it’s a start.

To make really huge datasets useful, you must transform them in some manner. Again, depending on whom you talk to, transformation can take on all sorts of meanings. The one meaning that you won’t see in this book is the modification of data such that it implies one thing when it actually said something else at the outset (think of this as spin doctoring the data). In fact, it’s a good idea to avoid this sort of data manipulation entirely because you can end up with completely unpredictable results when performing analysis, even if those results initially look promising and even say what you feel they should say.

Another kind of data transformation actually does something worthwhile. In this case, the meaning of the data doesn’t change; only the presentation of the data changes. You can separate this kind of transformation into a number of methods that include (but aren’t necessarily limited to) tasks such as the following:

Cleaning: As with anything else, data gets dirty. You may find that some of it is missing information and some of it may actually be correct but outdated. In fact, data becomes dirty in many ways, and you always need to clean it before you can use it. Machine Learning For Dummies, by John Paul Mueller and Luca Massaron (Wiley), discusses the topic of cleaning in considerable detail.
Verification: Establishing that data is clean doesn’t mean that the data is correct. A dataset may contain many entries that seem correct but really aren’t. For example, a birthday may be in the right form and appear to be correct until you determine that the person in question is more than 200 years old. A part number may appear in the correct form, but after checking, you find that your organization never produced a part with that number. The act of verification helps ensure the veracity of any analysis you perform and generates fewer outliers to skew the results.
Data typing: Data can appear to be correct and you can verify it as true, yet it may still not work. A significant problem with data is that the type may be incorrect or it may appear in the wrong form. For example, one dataset may use integers for a particular column (feature), while another uses floating-point values for the same column. Likewise, some datasets may use local time for dates and times, while others might use GMT. The transformation of the data from various datasets to match is an essential task, yet the transformation doesn’t actually change the data’s meaning.
Form: Datasets come with many form issues. For example, one dataset may use a single column for people’s names, while another might use three columns (first, middle, and last), and another might use five columns (prefix, first, middle, last, and suffix). The three datasets are correct, but the form of the information is different, so a transformation is needed to make them work together.
Range: Some data is categorical or uses specific ranges to denote certain conditions. For example, probabilities range from 0 to 1. In some cases, there isn’t an agreed-upon range. Consequently, you find data appearing in different ranges even though the data refers to the same sort of information. Transforming all the data to match the same range enables you to perform analysis by using data from multiple datasets.
Baseline: You hear many people talk about dB when considering audio output in various scenarios. However, a decibel is simply a logarithmic ratio, as described at http://www.animations.physics.unsw.edu.au/jw/dB.htm. Without a reference value or a baseline, determining what the dB value truly means is impossible. For audio signal levels, one common reference is 1 volt, which gives the unit dBV, as described at http://www.sengpielaudio.com/calculator-db-volt.htm. Because the reference is standard, it’s usually implied, even though few people actually know that a reference is involved. Now, imagine the chaos that would result if some people used 1 volt for a reference and others used 2 volts. dBV would become meaningless as a unit of measure. Many kinds of data form a ratio or other value that requires a reference. Transformations can adjust the reference or baseline value as needed so that the values can be compared in a meaningful way.
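The Range and Baseline transformations are easy to see in code. The following Python sketch is an illustration rather than part of the original text, and the helper names rescale and to_db are hypothetical. rescale applies min-max scaling, which moves values into a common range while keeping their order and relative spacing; to_db expresses a voltage in decibels against an explicit reference, so you can see how the same voltage reads differently when the baseline changes.

```python
import math


def rescale(values, new_min=0.0, new_max=1.0):
    """Min-max rescale values into [new_min, new_max] (hypothetical helper).

    Order and relative spacing are preserved; only the range changes.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Every value is identical, so map them all to the lower bound.
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def to_db(voltage, reference=1.0):
    """Express a voltage as a decibel ratio against an explicit reference.

    With reference=1.0 the result is in dBV; a different reference gives a
    different, incompatible scale (hypothetical helper).
    """
    return 20 * math.log10(voltage / reference)


print(rescale([10, 20, 30]))       # [0.0, 0.5, 1.0]
print(round(to_db(2.0), 2))        # 6.02 (2 V relative to 1 V)
print(to_db(2.0, reference=2.0))   # 0.0 (same voltage, 2 V baseline)
```

The second and third calls use the same 2-volt reading; only the baseline differs, which is exactly why mixing references makes the numbers incomparable.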

You can come up with many other transformations. The point of this section is that the method used determines the kind of transformation that occurs, and you must perform certain kinds of transformations to make data useful. Applying an incorrect transformation or the correct transformation in the wrong way will result in useless output even when the data itself is correct.

Performing Slicing and Dicing

Slicing and dicing are two ways to control the size of a dataset. Slicing occurs when you use a subset of the dataset along a single axis. For example, you may want only certain records (also called cases) or you may want only certain columns (also called features). Dicing occurs when you perform slicing in multiple directions. When working with two-dimensional data, you select certain rows and certain columns from those rows. You see dicing used more often with three-dimensional or higher data, when you want to restrict the x-axis and the y-axis but keep the entire z-axis (as an example). The following sections describe slicing and dicing in more detail and demonstrate how to perform this task using both Haskell and Python.

REAL-WORLD SLICING AND DICING

The examples in this chapter are meant to demonstrate techniques used with the functional programming paradigm in the simplest manner possible. With this in mind, the examples rely on native language capabilities whenever possible. In the real world, when working with large applications rather than experimenting, you use libraries to make the task easier — especially when working with immense datasets. For example, Python developers often rely on NumPy (http://www.numpy.org/) or pandas (https://pandas.pydata.org/) when performing this task. Likewise, Haskell developers often use hmatrix (https://hackage.haskell.org/package/hmatrix), repa (https://hackage.haskell.org/package/repa), and vector (https://hackage.haskell.org/package/vector) to perform the same tasks. The libraries vary in functionality, provide language-specific features, and make it tough to compare code. Consequently, when you're initially discovering how to perform a technique, it’s often best to rely on native capability and then add library functionality to augment the language.

Keeping datasets controlled

Datasets can become immense. The data continues to accumulate from various sources until it becomes impossible for the typical human to comprehend it all. So slicing and dicing might at first seem to be a means for making data more comprehensible. It can do that, but making the data comprehensible isn’t the point. Too much data can even overwhelm a computer — not in the same way as a human gets overwhelmed, because a computer doesn’t understand anything, but to the point where processing proceeds at a glacial pace. As the cliché says, time is money, which is precisely why you want to control dataset size. The more focused you can make any data analysis, the faster the analysis will proceed.

Sometimes you must use slicing and dicing to break the data down into training and testing units for computer technologies such as machine learning. You use the training set to help an algorithm perform the correct processing in the correct way through examples. The testing set then verifies that the training went as planned. Even though machine learning is the most prominent technology today that requires breaking data into groups, you can find other examples. Many database managers work better when you break data into pieces and perform batch processing on it, for example.

Slicing and dicing can give you a result that doesn't actually reflect the realities of the data as a whole. If the data isn’t randomized, one piece of the data may contain more of some items than the other piece. Consequently, you must sometimes randomize (shuffle) the dataset before using slicing and dicing techniques on it.
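Here is a minimal Python sketch of shuffling before splitting, assuming the data is a simple list of records; the helper name shuffled_split is hypothetical and not from the original text. It shuffles a copy with the standard random module, so the caller's list stays untouched, and then slices the copy into training and testing parts. Passing a seed makes the split repeatable, which helps when you need to compare runs.

```python
import random


def shuffled_split(data, train_fraction=0.75, seed=None):
    """Shuffle a copy of data, then slice it into (train, test) lists (hypothetical helper)."""
    rng = random.Random(seed)   # private generator; a seed makes the shuffle repeatable
    shuffled = list(data)       # copy so the original ordering is preserved
    rng.shuffle(shuffled)       # in-place shuffle of the copy only
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(10))
train, test = shuffled_split(records, train_fraction=0.8, seed=42)
print(len(train), len(test))  # 8 2
```

The slicing itself is the same positional operation shown later in this chapter; the shuffle simply ensures that each slice is a fair sample of the whole.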

Focusing on specific data

Slicing and dicing techniques can also help you improve the focus of a particular analysis. For example, you may not actually require all the columns (features) in a dataset. Removing the extraneous columns can actually make the data easier to use and provide results that are more reliable.

Likewise, you may need to remove unneeded information from the dataset. For example, a dataset that contains entries from the last three years requires slicing or dicing when you need to analyze only the results from one year. Even though you could use various techniques to ignore the extra entries in code, eliminating the unwanted years from the dataset using slicing and dicing techniques makes more sense.

Be sure to keep slicing and dicing separate from filtering. Slicing and dicing focuses on groups of randomized data for which you don’t need to consider individual data values. Slicing out a particular year from a dataset containing sales figures is different from filtering the sales produced by a particular agent. Filtering looks for specific data values regardless of which group contains that value. The “Filtering Data” section, later in this chapter, discusses filtering in more detail, but just keep in mind that the two techniques are different.
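To make the distinction concrete, here is a small Python sketch that uses made-up sales records (the data and its field layout are hypothetical). Slicing picks rows by position, so it works here only because the records are already grouped by year; filtering tests the content of each record, wherever that record sits.

```python
# Hypothetical sales records: (year, agent, amount), grouped by year.
sales = [
    (2021, "Ann", 100), (2021, "Bob", 150),
    (2022, "Ann", 120), (2022, "Bob", 90),
    (2023, "Ann", 200), (2023, "Bob", 110),
]

# Slicing: select by position. Rows 2 and 3 happen to hold the 2022 group.
sales_2022 = sales[2:4]

# Filtering: select by value, checking the agent field of every record.
ann_sales = [row for row in sales if row[1] == "Ann"]

print(sales_2022)  # [(2022, 'Ann', 120), (2022, 'Bob', 90)]
print(ann_sales)   # [(2021, 'Ann', 100), (2022, 'Ann', 120), (2023, 'Ann', 200)]
```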

Slicing and dicing with Haskell

Haskell slicing and dicing requires a bit of expertise to understand because you don’t directly access the slice through indexing, as you might with other languages. Of course, there are libraries that encapsulate the process, but this section reviews a native language technique that does the job using the take and drop functions. Slicing can be a single-step process if you have the correct code. The examples begin with a one-dimensional list, let myList = [1, 2, 3, 4, 5].

-- Display the first two elements.
take 2 myList
-- Display the remaining three elements.
drop 2 myList
-- Display a data slice of just the center element.
take 1 $ drop 2 myList

The slice created by the last statement begins by dropping the first two elements using drop 2 myList, leaving [3,4,5]. The $ operator (function application with the lowest precedence) passes that result to take 1 without the need for parentheses, producing an output of [3]. Using this little experiment, you can easily create a slice function that looks like this:

slice xs x y = take y $ drop x xs

To obtain just the center element from myList, you would call slice myList 2 1, where 2 is the zero-based starting index and 1 is the length of the output you want. Figure 9-1 shows how this sequence works.

“Screen capture of WinGHCi with codes let myList = [1, 2, 3, 4, 5]; and take 2 myList, drop 2 myList, take y $ drop x xs, slice myList 2 1 with outputs [1,2], [3,4,5], [3], [3].” — FIGURE 9-1: Use the `slice` function to obtain just a slice of `myList`.
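Because Haskell’s take and drop are lazy, slice also works on infinite lists such as [1..]. A rough Python analogue, offered as a sketch rather than part of the original example, uses itertools.islice, which likewise skips and then takes elements from any iterable without building the whole sequence first. The name take_drop_slice is hypothetical.

```python
from itertools import count, islice


def take_drop_slice(xs, start, length):
    """Mimic Haskell's take length $ drop start xs for any iterable (hypothetical helper)."""
    # islice(xs, start, stop) skips `start` items, then yields items until `stop`.
    return list(islice(xs, start, start + length))


print(take_drop_slice([1, 2, 3, 4, 5], 2, 1))  # [3]
print(take_drop_slice(count(1), 2, 3))         # [3, 4, 5], taken from an infinite counter
```

As with take and drop, asking for more elements than exist simply returns a shorter (possibly empty) list rather than raising an error.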

Of course, slicing that works only on one-dimensional lists isn’t particularly useful. You can test the slice function on a two-dimensional list by starting with a new list, let myList2 = [[1,2],[3,4],[5,6],[7,8],[9,10]]. Try the same call as before, slice myList2 2 1, and you see the expected output of [[5,6]]. So, slice works fine even with a two-dimensional list.

Dicing is similar, but not quite the same. To test the dice function, begin with a slightly more robust list, let myList3 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]]. Because you’re now dealing with the inner values rather than the lists contained within a list, you must repeat the slice operation for each inner list. The “Defining the need for repetition” section of Chapter 8 introduces you to the forM function, which repeats a particular code segment. The following code shows a simplified, but complete, dicing sequence.

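import Control.Monad
let myList3 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]]
-- Take y elements after dropping the first x.
slice xs x y = take y $ drop x xs
-- Apply slice to every inner list (row) of lst.
dice lst x y = forM lst (\i -> do return (slice i x y))
-- Keep rows 1 through 3 (zero-based).
lstr = slice myList3 1 3
lstr
-- From those rows, keep only column 1.
lstc = dice lstr 1 1
lstc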

To use forM, you must import Control.Monad. The slice function is the same as before; it appears again so that the sequence is complete. The dice function uses forM to examine every element within the input list and then slice it as required. What you’re doing is slicing the list within the list. The remaining lines first slice myList3 into rows, and then into columns. The output is as you would expect: [[5],[8],[11]]. Figure 9-2 shows the sequence of events.

“Screen capture of WinGHCi with code starting import Control.Monad and codes lstr = slice myList3 1 3, lstc = dice lstr 1 1 with outputs [[4,5,6],[7,8,9], [10,11,12]]; [[5],[8],[11]]” — FIGURE 9-2: Dicing is a two-step process.

Slicing and dicing with Python

In some respects, slicing and dicing is considerably easier in Python than in Haskell. For one thing, you use indexes to perform the task. Also, Python offers more built-in functionality. Consequently, the one-dimensional list example looks like this:

myList = [1, 2, 3, 4, 5]
print(myList[:2])
print(myList[2:])
print(myList[2:3])

The use of indexes enables you to write the code succinctly and without using special functions. The output is as you would expect:

[1, 2]
[3, 4, 5]
[3]

Slicing a two-dimensional list is every bit as easy as working with a one-dimensional list. Here's the code and output for the two-dimensional part of the example:

myList2 = [[1,2],[3,4],[5,6],[7,8],[9,10]]
print(myList2[:2])
print(myList2[2:])
print(myList2[2:3])

[[1, 2], [3, 4]]
[[5, 6], [7, 8], [9, 10]]
[[5, 6]]

Notice that the Python functionality matches that of Haskell’s take and drop functions; you simply perform the task using indexes instead. Dicing does require using a special function, but the function is concise in this case and doesn't require multiple steps:

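def dice(lst, rb, re, cb, ce):
    # Slice the rows first (rows rb up to, but not including, re).
    lstr = lst[rb:re]
    lstc = []
    # Then slice each remaining row down to columns cb through ce - 1.
    for i in lstr:
        lstc.append(i[cb:ce])
    return lstc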

In this case, you can’t really use a lambda function, or not easily, at least. The code slices the incoming list first and then dices it, just as in the Haskell example, but everything occurs within a single function. Notice that this function repeats the slicing with a standard for loop instead of relying on recursion. The disadvantage of this approach is that the loop relies on state (it appends to lstc as it goes), which means that you can’t really use it in a fully functional setting. Here’s the test code for the dicing part of the example:

myList3 = [[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]]
print(dice(myList3, 1, 4, 1, 2))

[[5], [8], [11]]
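One way to avoid the mutable accumulator, going beyond the original example, is a list comprehension: it builds the diced result as a single expression, so it can even live inside a lambda. The sketch below assumes the same myList3 data, and the name dice_expr is hypothetical. It produces the same [[5], [8], [11]] result as the dice call above.

```python
# Same sample data as the dice example.
myList3 = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]

# Slice the rows first (lst[rb:re]), then slice each remaining row (row[cb:ce]).
# The comprehension builds a new list without mutating an accumulator.
dice_expr = lambda lst, rb, re, cb, ce: [row[cb:ce] for row in lst[rb:re]]

print(dice_expr(myList3, 1, 4, 1, 2))  # [[5], [8], [11]]
```

Neither version changes myList3; the difference is only whether the result is assembled step by step or described in one expression.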

Mapping Your Data

You can find a number of extremely confusing references to the term map in computer science. For example, a map is associated with database management (see https://en.wikipedia.org/wiki/Data_mapping), in which data elements are mapped between two distinct data models. However, for this chapter, mapping refers to the process of using a higher-order function (such as map) to apply a function to each member of a list. Because the function is applied to every member of the list, the relationships among list members are unchanged. Many reasons exist to perform mapping, such as ensuring that the range of the data falls within certain limits. The following sections of the chapter help you better understand the uses for mapping and demonstrate the technique using the two languages supported in this book.

Understanding the purpose of mapping

The main idea behind mapping is to apply a function to all members of a list or similar structure. Using mapping can help you adjust the range of the values or prepare the values for particular kinds of analysis. Functional languages originated the idea of mapping, but mapping now sees use in most programming languages that support first-class functions.

The goal of mapping is to apply the function or functions to a series of numbers equally to achieve specific results. For example, squaring the numbers can rid the series of any negative values. Of course, you can just as easily take the absolute value of each number. You may need to convert a probability between 0 and 1 to a percentage between 0 and 100 for a report or other output. The relationship between the values will stay the same, but the range won’t. Mapping enables you to obtain specific data views.
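As a quick illustration of two examples from this paragraph, the following Python sketch maps a probability-to-percentage conversion and an absolute-value function over lists. The sample values are made up. In the first case the order of the values stays the same and only the range changes.

```python
probabilities = [0.1, 0.25, 0.5, 0.9]
readings = [-3, 1, -2, 4]

# Scale each probability from the 0-1 range to the 0-100 range.
percentages = list(map(lambda p: p * 100, probabilities))

# Remove the sign from every reading with the built-in abs function.
magnitudes = list(map(abs, readings))

print(percentages)
print(magnitudes)  # [3, 1, 2, 4]
```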

Performing mapping tasks with Haskell

In Haskell, the name map can refer to more than one thing, so the first map you encounter isn’t necessarily the one you want. For example, the map associated with Data.Map.Strict, Data.Map.Lazy, and Data.IntMap works with the creation and management of dictionaries, not the application of a consistent function to all members of a list (see https://haskell-containers.readthedocs.io/en/latest/map.html and http://hackage.haskell.org/package/containers-0.5.11.0/docs/Data-Map-Strict.html for details). What you want instead is the map function that appears as part of the base Prelude, so that you can access map without importing any libraries.

The map function accepts a function as input, along with one or more values in a list. You might create a function, square, that outputs the square of the input value: square x = x * x. A list of values, items = [0, 1, 2, 3, 4], serves as input. Calling map square items produces an output of [0,1,4,9,16]. Of course, you could easily create another function: double x = x + x, with a map double items output of [0,2,4,6,8]. The output you receive clearly depends on the function you use as input (as expected).

You can easily get overwhelmed trying to create complex functions to modify the values in a list. Fortunately, you can use the composition operator (., or dot) to combine them. Haskell actually applies the second function first. Consequently, map (square.double) items produces an output of [0,4,16,36,64] because Haskell doubles the numbers first, and then squares them. Likewise, map (double.square) items produces an output of [0,2,8,18,32] because squaring occurs first, followed by doubling.
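Python has no built-in composition operator, but a small helper can mimic Haskell’s (.) for comparison. The compose function below is a hypothetical helper, not a standard library function; like Haskell’s operator, it applies the right-hand function first.

```python
def compose(f, g):
    """Return a function equivalent to Haskell's (f . g): apply g first, then f."""
    return lambda x: f(g(x))


square = lambda x: x * x
double = lambda x: x + x
items = [0, 1, 2, 3, 4]

print(list(map(compose(square, double), items)))  # [0, 4, 16, 36, 64]
print(list(map(compose(double, square), items)))  # [0, 2, 8, 18, 32]
```

The outputs match the two Haskell compositions, which confirms that argument order, not just the set of functions, determines the result.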

The apply operator ($) is also important to mapping. You can use it to apply a single argument to a list of functions. As shown in Figure 9-3, the section ($4) is a function that applies whatever function it receives to the argument 4, so map ($4) [double, square] applies each function in the list to 4. The output is a list with one element for each function, which is [8,16] in this case. Using recursion would allow you to apply a list of numbers to a list of functions.

“Screen capture of WinGHCi with code square x = x * x and codes map square items, map double items, map (square.double) items, map (double.square) items with outputs [0,1,4,9,16]; [0,2,4,6,8]; [0,4,16,36,64]; [0,2,8,18,32].” — FIGURE 9-3: You can apply a single value to a list of functions.

Performing mapping tasks with Python

Python performs many of the same mapping tasks as Haskell, but often in a slightly different manner. Look, for example, at the following code:

square = lambda x: x**2
double = lambda x: x + x
items = [0, 1, 2, 3, 4]
print(list(map(square, items)))
print(list(map(double, items)))

You obtain the same output as you would with Haskell using similar code. However, note that you must convert the map object to a list object before printing it. Given that Python is an impure language, creating code that processes a list of inputs against two or more functions is relatively easy, as shown in this code:

funcs = [square, double]
for i in items:
    # Apply each function f in funcs to the current value i.
    value = list(map(lambda f: f(i), funcs))
    print(value)

Note that, as with the Haskell code, you're actually applying individual list values against the list of functions. However, Python requires a lambda function to get the job done. Figure 9-4 shows the output from the example.

Screen capture of FPD_09_Higher_Order_Functions Jupyter screen with codes for square, double, and funcs with items described and outputs given. — FIGURE 9-4: Using multiple paradigms in Python makes mapping tasks easier.

Filtering Data

Most programming languages provide specialized functions for filtering data today. Even when the language doesn’t provide a specialized function, you can use common methods to perform filtering manually. The following sections discuss what filtering is all about and how to use the two target languages to perform the task.

Understanding the purpose of filtering

Data filtering is an essential tool in removing outliers from datasets, as well as selecting specific data based on one or more criteria for analysis. While slicing and dicing selects data regardless of specific content, data filtering makes specific selections to achieve particular goals. Consequently, the two techniques aren’t mutually exclusive; you may well employ both on the same dataset in an effort to locate the particular data needed for an analysis. The following sections discuss details of filtering use and provide examples of simple data filtering techniques for both of the languages used in this book.

Developers often apply slicing and dicing, mapping, and filtering together to shape data in a manner that doesn’t change the inherent relationships among data elements. In all three cases, the data’s organization remains unchanged, and an element that is twice the size of another element tends to remain in that same relationship. Modifying data range, the number of data elements, and other factors in a dataset that don’t modify the data’s content — its relationship to the environment from which it was taken — is common in data science in preparation for performing tasks such as analysis and comparison, along with creating single, huge datasets from numerous smaller datasets. Filtering enables you to ensure that the right data is in the right place and at the right time.

Using Haskell to filter data

Haskell relies on a filter function to remove unwanted elements from lists and other dataset structures. The filter function accepts two inputs: a predicate that describes the elements you want to keep (every element for which the predicate returns False is removed) and the list of elements to filter. The predicate comes in three forms:

Predefined functions, such as odd and even
Operator sections that perform simple comparisons, such as (> 3)
Lambda functions, such as \x -> mod x 3 == 0

To see how this all works, you could create a list such as items = [0, 1, 2, 3, 4, 5]. Figure 9-5 shows the results of each of the filtering scenarios.

“Screen capture of WinGHCi with code items = [0, 1, 2, 3, 4, 5] and codes, outputs: filter (odd) items, [1,3,5]; filter (>3) items, [4,5]; filter (\x -> mod x 3 == 0) items, [0,3].” — FIGURE 9-5: Filtering descriptions take three forms in Haskell.

You want to carefully consider the use of Haskell operators when performing any task, but especially filtering. For example, at first look, rem and mod might not seem much different. Using rem 5 3 produces the same output as mod 5 3 (an output of 2). However, as noted at https://stackoverflow.com/questions/5891140/difference-between-mod-and-rem-in-haskell, a difference arises when working with a negative number. In this situation, mod 3 (-5) produces an output of -2, while rem 3 (-5) produces an output of 3.
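The same trap exists in Python, and it matters for the filtering examples that follow. Python’s % operator gives a result with the sign of the divisor (like Haskell’s mod), while math.fmod gives a result with the sign of the dividend (like Haskell’s rem) and always returns a float. A quick check:

```python
import math

# % follows the sign of the divisor, like Haskell's mod.
print(3 % -5)            # -2
print(-5 % 3)            # 1

# math.fmod follows the sign of the dividend, like Haskell's rem.
print(math.fmod(3, -5))  # 3.0
print(math.fmod(-5, 3))  # -2.0

# Consequence: x % 2 == 1 identifies odd numbers even when x is negative.
print([x for x in [-3, -2, -1, 0, 1] if x % 2 == 1])  # [-3, -1, 1]
```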

Using Python to filter data

Python doesn’t provide a few of the niceties that Haskell does when it comes to filtering. For example, you don’t have access to predefined predicates such as odd or even. Instead, the examples in this section express every filter condition as a lambda function (although filter accepts any function, including named ones). Consequently, to obtain the same results for the three cases in the previous section, you use code like this:

items = [0, 1, 2, 3, 4, 5]
print(list(filter(lambda x: x % 2 == 1, items)))
print(list(filter(lambda x: x > 3, items)))
print(list(filter(lambda x: x % 3 == 0, items)))

Notice that you must convert the filter output using a function such as list. You don’t have to use list; you could use any data structure, including set and tuple. As with Haskell, the lambda function you create should produce a true-or-false result; Haskell insists on a Bool, whereas Python accepts any value it can treat as true or false. Figure 9-6 shows the output from this example.

Screen capture of FPD_09_Higher_Order_Functions Jupyter screen with print(list(filter(...))) calls using the conditions x % 2 == 1, x > 3, and x % 3 == 0, with outputs [1, 3, 5]; [4, 5]; [0, 3]. — FIGURE 9-6: Python lacks some of the Haskell special filtering features.
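Because Python’s filter keeps the elements for which the function returns a true value, removing the matches calls for the opposite test. The standard itertools.filterfalse function does exactly that. This sketch also shows collecting filter output into a tuple instead of a list; the predicate name is_odd is just an illustrative choice.

```python
from itertools import filterfalse

items = [0, 1, 2, 3, 4, 5]
is_odd = lambda x: x % 2 == 1

# filter keeps the elements for which the predicate is true...
print(tuple(filter(is_odd, items)))      # (1, 3, 5)

# ...while filterfalse keeps those for which it is false, removing the matches.
print(list(filterfalse(is_odd, items)))  # [0, 2, 4]
```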

Organizing Data

None of the techniques discussed so far changes the organization of the data directly. All these techniques can indirectly change organization through a process of data selection, but that's not the goal of the methods applied. However, sometimes you do need to change the organization of the data. For example, you might need it sorted or grouped based on specific criteria. In some cases, organizing the data can also mean to randomize it in some manner to ensure that an analysis reflects the real world. The following sections discuss the kinds of organization that most people apply to data; also covered is how you can implement sorting using the two languages that appear in this book.

Considering the types of organization

Organization (the forming of any object based on a particular pattern) is an essential part of working with data for humans. The coordination of elements within a dataset based on a particular need is usually the last step in making the data useful, except when other parts of the cleaning process require organization to work properly. How something is organized affects the way in which humans view it, and organizing the object in some other manner will change the human perspective, so often people find themselves organizing datasets one way and then reorganizing them in another. No right or wrong way to organize data exists; you just want to use the approach that works best for viewing the information in a way that helps you see the desired pattern.

You can think about organization in a number of ways. Sometimes the best organization is disorganization. Seeing patterns in seemingly random data finds a place in many areas of life, including art (see the stereograms at http://www.vision3d.com/sghidden.html as an example). A pattern is what you make of it, so sometimes thinking about what you want to see, rather than making things neat and tidy, is the best way to achieve your objectives. The following list provides some ideas on organization, most of which you have thought about, but some of which you likely haven’t. The list is by no means exhaustive.

Sorting: One of the most common ways to organize data is to sort it, with the alphanumeric sort being the most common. However, sorts need not be limited to ordering the data by the alphabet or computer character number. For example, you could sort according to value length or by commonality. In fact, the idea of sorting simply means placing the values in an order from greatest to least (or vice versa) according to whatever criteria the sorter deems necessary.
Grouping: Clustering data such that the data with the highest degree of commonality is together is another kind of sorting. For example, you might group data by value range, with each range forming a particular group. As with sorting, grouping criteria can be anything. You might choose to group textual data by the number of vowels contained in each element. You might group numeric data according to an algorithm of some sort. Perhaps you want all the values that are divisible by 3 in one bin and those that are divisible by 7 in another, with a third bin holding values that can’t be divided by either.
Categorizing: Analyzing the data and placing data that has the same properties together is another method of organization. The properties can be anything. Perhaps you need to find values that match specific colors, or words that impart a particular kind of meaning. The values need not hold any particular commonality; they just need to have the same properties.
Shuffling: Disorganization can be a kind of organization. Chaos theory (see https://fractalfoundation.org/resources/what-is-chaos-theory/ for an explanation) finds use in a wide variety of everyday events. In fact, many of today’s sciences rely heavily on the effects of chaos. Data shuffling often enhances the output of algorithms and creates conditions that enable you to see unexpected patterns. Creating a kind of organization through the randomization of data may seem counter to human thought, but it works nonetheless.
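The Grouping example (one bin for values divisible by 3, another for values divisible by 7, and a third for the rest) can be sketched in Python as follows. The function and bin names are hypothetical, and the sketch assumes that a value divisible by both 3 and 7 (such as 21) belongs in the divisible-by-3 bin, because the text doesn’t say where such values go.

```python
def group_by_divisor(values):
    """Place each value in one of three bins (hypothetical helper).

    Assumption: values divisible by both 3 and 7 go into the "by3" bin,
    because the first matching test wins.
    """
    bins = {"by3": [], "by7": [], "other": []}
    for v in values:
        if v % 3 == 0:
            bins["by3"].append(v)
        elif v % 7 == 0:
            bins["by7"].append(v)
        else:
            bins["other"].append(v)
    return bins


groups = group_by_divisor(range(1, 22))
print(groups["by3"])  # [3, 6, 9, 12, 15, 18, 21]
print(groups["by7"])  # [7, 14]
```

Changing the order of the two tests changes where overlapping values land, which is one more reminder that grouping criteria are whatever you decide they are.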

Sorting data with Haskell

Haskell provides a wide variety of sorting mechanisms, such that you probably won’t have to resort to doing anything of a custom nature unless your data is unique and your requirements are unusual. However, getting the native functionality that's found in existing libraries can prove a little daunting at times unless you think the process through first. To start, you need a list that’s a little more complex than others used in this chapter: original = [(1, "Hello"), (4, "Yellow"), (5, "Goodbye"), (2, "Yes"), (3, "No")]. Use the following code to perform a basic sort:

import Data.List as Dl
sort original

The output is ordered by the first member of each tuple (Haskell compares tuples member by member, so the second member would break any ties): [(1,"Hello"),(2,"Yes"),(3,"No"),(4,"Yellow"),(5,"Goodbye")]. If you want to perform a reverse sort, you can use the following call instead:

(reverse . sort) original

Notice how the reverse and sort function calls appear in this example: the composition operator (.) combines them into a single function that sorts first and then reverses. Putting spaces around the operator, as shown, keeps the expression readable and avoids confusion with qualified names such as Dl.sort. The problem with this approach is that Haskell must go through the list twice: once to sort it and once to reverse it. An alternative is to use the sortBy function, as shown here:

sortBy (\x y -> compare y x) original

The sortBy function lets you use any comparison function needed to obtain the desired result. For example, you might not be interested in sorting by the first member of the tuple but instead prefer to sort by the second member. In this case, combine the snd function (defined in Data.Tuple and exported by Prelude, so it’s always available) with the comparing function from Data.Ord (which you must import), as shown here:

import Data.Ord as Do
sortBy (comparing $ snd) original

Notice how the call applies comparing to snd using the apply operator ($). Here, comparing snd would work just as well; the operator becomes essential when the argument is a more complex expression, as in the next example. The results are as you would expect: [(5,"Goodbye"),(1,"Hello"),(3,"No"),(4,"Yellow"),(2,"Yes")]. However, you might not want a straight sort. You may really want to sort by the length of the words in the second member of the tuple. In this case, you can make the following call:

sortBy (comparing $ length . snd) original

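The call applies comparing to the composition length . snd, which applies snd first and then length (essentially, the length of the second tuple member). Because the apply operator ($) has the lowest precedence, the whole composition becomes the argument to comparing; without ($), Haskell would read the expression as (comparing length) . snd, which fails to type-check. The output reflects the change in comparison: [(3,"No"),(2,"Yes"),(1,"Hello"),(4,"Yellow"),(5,"Goodbye")]. The point is that you can sort in almost any manner needed using relatively simple statements in Haskell, unless you work with complex data.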

Sorting data with Python

The examples in this section use the same list as the previous section, original = [(1, "Hello"), (4, "Yellow"), (5, "Goodbye"), (2, "Yes"), (3, "No")], and perform essentially the same sorts, but from a Python perspective. To understand these examples, you need to know the difference between the sort method and the sorted function. The sort method sorts a list in place, changing the original list, which may not be what you want. In addition, sort works only with lists, while sorted works with any iterable. The sorted function returns a new list and leaves the original unchanged. Consequently, if you want to maintain your original list, you use the following call:

sorted(original)

The output is sorted by the first member of the tuple: [(1, 'Hello'), (2, 'Yes'), (3, 'No'), (4, 'Yellow'), (5, 'Goodbye')], but the original list remains intact. To sort in reverse order, use the reverse keyword argument, as shown here:

sorted(original, reverse=True)
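The following sketch contrasts the sorted function with the sort method and shows reverse=True in action. It reuses the chapter’s original list; the pairs list is a made-up example added only to show how elements with equal keys behave.

```python
original = [(1, "Hello"), (4, "Yellow"), (5, "Goodbye"), (2, "Yes"), (3, "No")]

# sorted() returns a new list and leaves its input untouched.
ascending = sorted(original)
descending = sorted(original, reverse=True)

# list.sort() reorders the list itself and returns None,
# so sort a copy if you still need the original order.
working = original[:]
result = working.sort(reverse=True)

# sorted() accepts any iterable (here a tuple and a string)
# and always returns a list.
from_tuple = sorted((3, 1, 2))
from_string = sorted("cab")

# reverse=True keeps the sort stable: items with equal keys stay in
# their input order. Reversing an ascending sort afterward flips them.
pairs = [("b", 1), ("a", 2), ("c", 1)]
stable_desc = sorted(pairs, key=lambda p: p[1], reverse=True)
flipped = list(reversed(sorted(pairs, key=lambda p: p[1])))
```

With the chapter’s list, every first member is distinct, so reverse=True and reversing a sorted copy give the same result; the difference only shows up when keys tie, as in pairs.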

Both Haskell and Python make use of lambda functions to perform special sorts. For example, to sort by the second element of the tuple, you use the following code:

sorted(original, key=lambda x: x[1])
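Haskell’s sortBy takes a two-argument comparison function, while Python’s key expects a one-argument function. If you’d rather port a comparison function directly, functools.cmp_to_key converts one into a key. This is a sketch; the compare helper is a hypothetical stand-in for Haskell’s compare that returns -1, 0, or 1 instead of LT, EQ, or GT.

```python
from functools import cmp_to_key

original = [(1, "Hello"), (4, "Yellow"), (5, "Goodbye"), (2, "Yes"), (3, "No")]


def compare(a, b):
    """Hypothetical three-way comparison: -1 if a < b, 0 if equal, 1 if a > b."""
    return (a > b) - (a < b)


# Rough counterpart of Haskell's sortBy (\x y -> compare y x):
# swapping the arguments produces a descending sort.
descending = sorted(original, key=cmp_to_key(lambda x, y: compare(y, x)))

# Rough counterpart of sortBy (comparing snd): compare second members.
by_word = sorted(original, key=cmp_to_key(lambda x, y: compare(x[1], y[1])))
```

In everyday Python, a plain key function is usually the simpler choice, and Python calls it only once per element; cmp_to_key is mainly useful when a two-argument comparison is the natural way to express the order.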

The key keyword is extremely flexible, and you can use it in several ways. For example, key=str.lower performs a case-insensitive sort of a list of strings. The operator module also provides ready-made functions that replace some common lambda functions. For example, you could also sort by the second element of the tuple using this code:

from operator import itemgetter
sorted(original, key=itemgetter(1))
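Here is a short sketch of both key options just described. The words and records lists are made-up sample data; the rest uses the chapter’s original list and only standard library calls.

```python
from operator import itemgetter

original = [(1, "Hello"), (4, "Yellow"), (5, "Goodbye"), (2, "Yes"), (3, "No")]
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# The default string order compares code points, so uppercase letters sort first.
plain = sorted(words)
# key=str.lower compares lowercase versions but returns the original strings.
case_insensitive = sorted(words, key=str.lower)

# itemgetter(1) builds a callable that behaves like lambda x: x[1].
second = itemgetter(1)
first_word = second(original[0])
by_word = sorted(original, key=second)

# Given several indexes, itemgetter returns a tuple, which makes a
# multi-level key: the second member first, then the first member.
records = [(2, "No"), (1, "No"), (3, "Yes")]
multi = sorted(records, key=itemgetter(1, 0))
```

Because the sort is stable, "Apple" stays ahead of "apple" (and "banana" ahead of "Banana") in the case-insensitive result: their lowercase keys tie, so they keep their input order.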

You can also create complex sorts. For example, you can sort by the length of the second tuple element by using this code:

sorted(original, key=lambda x: len(x[1]))
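Python’s sort is stable, so elements whose keys tie keep their input order. The sketch below adds a hypothetical sixth tuple, (6, "Hey"), so that two words share a length, and then shows how returning a tuple from the key breaks the tie alphabetically. The first result matches the Haskell output shown earlier.

```python
original = [(1, "Hello"), (4, "Yellow"), (5, "Goodbye"), (2, "Yes"), (3, "No")]

# Sort by the length of the second member, as in the text.
by_length = sorted(original, key=lambda x: len(x[1]))

# Hypothetical extra tuple: "Hey" has the same length as "Yes".
more = original + [(6, "Hey")]

# Stable sort: "Yes" stays ahead of "Hey" because it came first.
stable = sorted(more, key=lambda x: len(x[1]))

# A tuple key sorts by length first, then alphabetically to break ties.
tie_broken = sorted(more, key=lambda x: (len(x[1]), x[1]))
```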

Notice that key must be a function that Python can call on each element; for a custom sort like this one, a lambda function is usually the simplest way to supply it. For example, trying this code results in an error:

sorted(original, key=len(itemgetter(1)))

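The problem is that len runs immediately on the itemgetter object itself, which has no length, rather than on the second element of each tuple, so Python raises a TypeError before sorting even begins. To use the length of each tuple’s second element, the key function must receive the tuple and work with it directly, as the lambda function in the previous example does.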
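The following sketch reproduces the error and shows three ways to fix it. The word_length and compose helpers are hypothetical names introduced for illustration; compose mimics Haskell’s length . snd composition, which Python doesn’t provide as a built-in operator.

```python
from operator import itemgetter

original = [(1, "Hello"), (4, "Yellow"), (5, "Goodbye"), (2, "Yes"), (3, "No")]

# len() runs on the itemgetter object before sorted() is even called,
# so this raises TypeError.
error_message = None
try:
    sorted(original, key=len(itemgetter(1)))
except TypeError as err:
    error_message = str(err)

# Fix 1: a lambda receives each tuple and returns the length of its second member.
fix_lambda = sorted(original, key=lambda x: len(x[1]))


# Fix 2: a named function works too; key accepts any one-argument callable.
def word_length(item):
    return len(item[1])


fix_def = sorted(original, key=word_length)


# Fix 3: a hypothetical compose helper, f(g(x)), similar to Haskell's length . snd.
def compose(f, g):
    return lambda x: f(g(x))


fix_compose = sorted(original, key=compose(len, itemgetter(1)))
```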
