Chapter 7. Mapping and Set Types

In this chapter, we take a look at Python’s mapping and set types. As in earlier chapters, an introduction is followed by a discussion of the applicable operators and factory and built-in functions (BIFs) and methods. We then go into more specific usage of each data type.

7.1 Mapping Type: Dictionaries

Dictionaries are the sole mapping type in Python. Mapping objects have a one-to-many correspondence between hashable values (keys) and the objects they represent (values). They are similar to Perl hashes and can be generally considered as mutable hash tables. A dictionary object itself is mutable and is yet another container type that can store any number of Python objects, including other container types. What makes dictionaries different from sequence type containers like lists and tuples is the way the data are stored and accessed.

Sequence types use numeric keys only (numbered sequentially as indexed offsets from the beginning of the sequence). Mapping types may use most other object types as keys; strings are the most common. Unlike sequence type keys, mapping keys are often, if not directly, associated with the data value that is stored. But because we are no longer using “sequentially ordered” keys with mapping types, we are left with an unordered collection of data. As it turns out, this does not hinder our use because mapping types do not require a numeric value to index into a container to obtain the desired item. With a key, you are “mapped” directly to your value, hence the term “mapping type.” The reason why they are commonly referred to as hash tables is because that is the exact type of object that dictionaries are. Dictionaries are one of Python’s most powerful data types.

Core Note: What are hash tables and how do they relate to dictionaries?

image

Sequence types use sequentially ordered numeric keys as index offsets to store your data in an array format. The index number usually has nothing to do with the data value that is being stored. There should also be a way to store data based on another associated value such as a string. We do this all the time in everyday living. You file people’s phone numbers in your address book based on last name, you add events to your calendar or appointment book based on date and time, etc. For each of these examples, an associated value to a data item was your key.

Hash tables are a data structure that does exactly what we described. They store each piece of data, called a value, based on an associated data item, called a key. Together, these are known as key-value pairs. The hash table algorithm takes your key, performs an operation on it, called a hash function, and based on the result of the calculation, chooses where in the data structure to store your value. Where any one particular value is stored depends on what its key is. Because of this randomness, there is no ordering of the values in the hash table. You have an unordered collection of data.

The only kind of ordering you can obtain is by taking either a dictionary’s set of keys or values. The keys() or values() method returns lists, which are sortable. You can also call items() to get a list of keys and values as tuple pairs and sort that. Dictionaries themselves have no implicit ordering because they are hashes.

Hash tables generally provide good performance because lookups occur fairly quickly once you have a key.

Python dictionaries are implemented as resizeable hash tables. If you are familiar with Perl, then we can say that dictionaries are similar to Perl’s associative arrays or hashes.

We will now take a closer look at Python dictionaries. The syntax of a dictionary entry is key:value Also, dictionary entries are enclosed in braces ( { } ).

How to Create and Assign Dictionaries

Creating dictionaries simply involves assigning a dictionary to a variable, regardless of whether the dictionary has elements or not:

image

image

In Python versions 2.2 and newer, dictionaries may also be created using the factory function dict(). We discuss more examples later when we take a closer look at dict(), but here’s a sneak peek for now:

image

image

In Python versions 2.3 and newer, dictionaries may also be created using a very convenient built-in method for creating a “default” dictionary whose elements all have the same value (defaulting to None if not given), fromkeys():

image

How to Access Values in Dictionaries

To traverse a dictionary (normally by key), you only need to cycle through its keys, like this:

image

image

Beginning with Python 2.2, you no longer need to use the keys() method to extract a list of keys to loop over. Iterators were created to simplify accessing of sequence-like objects such as dictionaries and files. Using just the dictionary name itself will cause an iterator over that dictionary to be used in a for loop:

image

To access individual dictionary elements, you use the familiar square brackets along with the key to obtain its value:

image

Dictionary dict1 defined above is empty while dict2 has two data items. The keys in dict2 are 'name' and 'port', and their associated value items are 'earth' and 80, respectively. Access to the value is through the key, as you can see from the explicit access to the 'name' key.

If we attempt to access a data item with a key that is not part of the dictionary, we get an error:

image

In this example, we tried to access a value with the key 'server' which, as you know from the code above, does not exist. The best way to check if a dictionary has a specific key is to use the in or not in operators starting with version 2.2. The has_key() method is obsolete and removed in Python 3.0, so it is best to just use in or not in.

image

The in and not in operators are Boolean, returning True if a dictionary has that key and False otherwise. (In Python versions preceding Boolean constants [older than 2.3], the values returned are 1 and 0, respectively.)

image

Here is another dictionary example mixing the use of numbers and strings as keys:

image

Rather than adding each key-value pair individually, we could have also entered all the data for dict3 at the same time:

image

Creating the dictionary with a set key-value pair can be accomplished if all the data items are known in advance (obviously). The goal of the examples using dict3 is to illustrate the variety of keys that you can use. If we were to pose the question of whether a key for a particular value should be allowed to change, you would probably say, “No.” Right?

Not allowing keys to change during execution makes sense if you think of it this way: Let us say that you created a dictionary element with a key and value. Somehow during execution of your program, the key changed, perhaps due to an altered variable. When you went to retrieve that data value again with the original key, you got a KeyError (since the key changed), and you had no idea how to obtain your value now because the key had somehow been altered. For this reason, keys must be hashable, so numbers and strings are fine, but lists and other dictionaries are not. (See Section 7.5.2 for why keys must be hashable.)

How to Update Dictionaries

You can update a dictionary by adding a new entry or element (i.e., a key-value pair), modifying an existing entry, or deleting an existing entry (see below for more details on removing an entry).

image

If the key does exist, then its previous value will be overridden by its new value. The print statement above illustrates an alternative way of using the string format operator ( % ), specific to dictionaries. Using the dictionary argument, you can shorten the print request somewhat because naming of the dictionary occurs only once, as opposed to occurring for each element using a tuple argument.

You may also add the contents of an entire dictionary to another dictionary by using the update() built-in method. We will introduce this method in Section 7.4.

How to Remove Dictionary Elements and Dictionaries

Removing an entire dictionary is not a typical operation. Generally, you either remove individual dictionary elements or clear the entire contents of a dictionary. However, if you really want to “remove” an entire dictionary, use the del statement (introduced in Section 3.5.5). Here are some deletion examples for dictionaries and dictionary elements:

image

Core Tip: Avoid using built-in object names as identifiers for variables!

image

For those of you who began traveling in the Python universe before version 2.3, you may have once used dict as an identifier for a dictionary. However, because dict() is now a type and factory function, overriding it may cause you headaches and potential bugs. The interpreter will allow such overriding—hey, it thinks you seem smart and look like you know what you are doing! So be careful. Do NOT create variables with built-in names like: dict, list, file, bool, str, input, or len!

7.2 Mapping Type Operators

Dictionaries will work with all of the standard type operators but do not support operations such as concatenation and repetition. Those operations, although they make sense for sequence types, do not translate to mapping types. In the next two subsections, we introduce you to the operators you can use with dictionaries.

7.2.1 Standard Type Operators

The standard type operators were introduced in Chapter 4. Here are some basic examples using some of those operators:

image

How are all these comparisons performed? Like lists and tuples, the process is a bit more complex than it is for numbers and strings. The algorithm is detailed in Section 7.3.1.

7.2.2 Mapping Type Operators

Dictionary Key-Lookup Operator ( [ ] )

The only operator specific to dictionaries is the key-lookup operator, which works very similarly to the single element slice operator for sequence types.

For sequence types, an index offset is the sole argument or subscript to access a single element of a sequence. For a dictionary, lookups are by key, so that is the argument rather than an index. The key-lookup operator is used for both assigning values to and retrieving values from a dictionary:

image

(Key) Membership (in, not in)
image

Beginning with Python 2.2, programmers can use the in and not in operators to check key membership instead of the has_key() method:

image

7.3 Mapping Type Built-in and Factory Functions

7.3.1 Standard Type Functions [type(), str(), and cmp()]

The type() factory function, when applied to a dict, returns, as you might expect, the dict type, “<type 'dict'>”. The str() factory function will produce a printable string representation of a dictionary. These are fairly straightforward.

In each of the last three chapters, we showed how the cmp() BIF worked with numbers, strings, lists, and tuples. So how about dictionaries? Comparisons of dictionaries are based on an algorithm that starts with sizes first, then keys, and finally values. However, using cmp() on dictionaries isn't usually very useful.

The next subsection goes into further detail about the algorithm used to compare dictionaries, but this is advanced reading, and definitely optional since comparing dictionaries is not very useful or very common.

*Dictionary Comparison Algorithm

In the following example, we create two dictionaries and compare them, then slowly modify the dictionaries to show how these changes affect their comparisons:

image

In the first comparison, dict1 is deemed smaller because dict2 has more elements (2 items vs. 0 items). After adding one element to dict1, it is still smaller (2 vs. 1), even if the item added is also in dict2.

image

After we add the second element to dict1, both dictionaries have the same size, so their keys are then compared. At this juncture, both sets of keys match, so comparison proceeds to checking their values. The values for the 'host' keys are the same, but when we get to the 'port' key, dict1 is deemed larger because its value is greater than that of dict2’s 'port' key (8080 vs. 80). When resetting dict2’s 'port' key to the same value as dict1’s 'port' key, then both dictionaries form equals: They have the same size, their keys match, and so do their values, hence the reason that 0 is returned by cmp().

image

As soon as an element is added to one of the dictionaries, it immediately becomes the “larger one,” as in this case with dict1. Adding another key-value pair to dict2 can tip the scales again, as both dictionaries' sizes match and comparison progresses to checking keys and values.

image

Our final example reminds us that cmp() may return values other than -1, 0, or 1. The algorithm pursues comparisons in the following order.

(1) Compare Dictionary Sizes

If the dictionary lengths are different, then for cmp (dict1, dict2), cmp() will return a positive number if dict1 is longer and a negative number if dict2 is longer. In other words, the dictionary with more keys is greater, i.e.,

len(dict1) > len(dict2) ⇒ dict1 > dict2

(2) Compare Dictionary Keys

If both dictionaries are the same size, then their keys are compared; the order in which the keys are checked is the same order as returned by the keys() method. (It is important to note here that keys that are the same will map to the same locations in the hash table. This keeps key-checking consistent.) At the point where keys from both do not match, they are directly compared and cmp() will return a positive number if the first differing key for dict1 is greater than the first differing key of dict2.

(3) Compare Dictionary Values

If both dictionary lengths are the same and the keys match exactly, the values for each key in both dictionaries are compared. Once the first key with non-matching values is found, those values are compared directly. Then cmp() will return a positive number if, using the same key, the value in dict1 is greater than the value in dict2.

(4) Exact Match

If we have reached this point, i.e., the dictionaries have the same length, the same keys, and the same values for each key, then the dictionaries are an exact match and 0 is returned.

Figure 7-1 illustrates the dictionary compare algorithm we just outlined.

Figure 7-1. How dictionaries are compared

image

7.3.2 Mapping Type Related Functions

dict ( )

The dict() factory function is used for creating dictionaries. If no argument is provided, then an empty dictionary is created. The fun happens when a container object is passed in as an argument to dict().

If the argument is an iterable, i.e., a sequence, an iterator, or an object that supports iteration, then each element of the iterable must come in pairs. For each pair, the first element will be a new key in the dictionary with the second item as its value. Taking a cue from the official Python documentation for dict():

image

If it is a(nother) mapping object, i.e., a dictionary, then dict() will just create a new dictionary and copy the contents of the existing one. The new dictionary is actually a shallow copy of the original one and the same results can be accomplished by using a dictionary’s copy() built-in method. Because creating a new dictionary from an existing one using dict() is measurably slower than using copy(), we recommend using the latter.

Starting in Python 2.3, it is possible to call dict() with an existing dictionary or keyword argument dictionary (** function operator, covered in Chapter 11):

image

We remind viewers that the dict9 example is only an exercise in understanding the calling semantics of dict() and not a realistic example. It would be wiser (and better performance-wise) to execute something more along the lines of:

image

len()

The len() BIF is flexible. It works with sequences, mapping types, and sets (as we will find out later on in this chapter). For a dictionary, it returns the total number of items, that is, key-value pairs:

image

We mentioned earlier that dictionary items are unordered. We can see that above, when referencing dict2, the items are listed in reverse order from which they were entered into the dictionary.

hash()

The hash() BIF is not really meant to be used for dictionaries per se, but it can be used to determine whether an object is fit to be a dictionary key (or not). Given an object as its argument, hash() returns the hash value of that object. The object can only be a dictionary key if it is hashable (meaning this function returns a[n integer] value without errors or raising an exception). Numeric values that are equal (when pitted against each other using a comparison operator) hash to the same value (even if their types differ). A TypeError will occur if an unhashable type is given as the argument to hash() (and consequently if an attempt is made to use such an object as the key when assigning a value to a dictionary):

image

In Table 7.1, we summarize these three mapping type related functions.

Table 7.1. Mapping Type Related Functions

image

7.4 Mapping Type Built-in Methods

Dictionaries have an abundance of methods to help you get the job done, as indicated in Table 7.2.

Table 7.2. Dictionary Type Methods

image image

Below, we showcase some of the more common dictionary methods. We have already seen has_key() and its replacements in and not in at work above. Attempting to access a nonexistent key will result in an exception (KeyError) as we saw in Section 7.1.

Basic dictionary methods focus on their keys and values. These are keys(), which returns a list of the dictionary's keys, values(), which returns a list of the dictionary's values, and items(), which returns a list of (key, value) tuple pairs. These are useful when you wish to iterate through a dictionary's keys or values, albeit in no particular order.

image

The keys() method is fairly useful when used in conjunction with a for loop to retrieve a dictionary’s values as it returns a list of a dictionary’s keys. However, because its items (as with any keys of a hash table) are unordered, imposing some type of order is usually desired.

image

In Python versions prior to 2.4, you would have to call a dictionary’s keys() method to get the list of its keys, then call that list’s sort() method to get a sorted list to iterate over. Now a built-in function named sorted(), made especially for iterators, exists, which returns a sorted iterator:

image

The update() method can be used to add the contents of one dictionary to another. Any existing entries with duplicate keys will be overridden by the new incoming entries. Nonexistent ones will be added. All entries in a dictionary can be removed with the clear() method.

image

The copy() method simply returns a copy of a dictionary. Note that this is a shallow copy only. Again, see Section 6.20 regarding shallow and deep copies. Finally, the get() method is similar to using the key-lookup operator ( [ ] ), but allows you to provide a default value returned if a key does not exist. If a key does not exist and a default value is not given, then None is returned. This is a more flexible option than just using key-lookup because you do not have to worry about an exception being raised if a key does not exist.

image

image

The built-in method, setdefault(), added in version 2.0, has the sole purpose of making code shorter by collapsing a common idiom: you want to check if a dictionary has a key. If it does, you want its value. If the dictionary does not have the key you are seeking, you want to set a default value and then return it. That is precisely what setdefault() does:

image

Earlier, we took a brief look at the fromkeys() method, but here are a few more examples:

image

Currently, the keys(), items(), and values() methods return lists. This can be unwieldy if such data collections are large, and the main reason why iteritems(), iterkeys(), and itervalues() were added to Python in 2.2. They function just like their list counterparts only they return iterators, which by lazier evaluation, are more memory-friendly. In Python 3, each of the iter*() methods replaces their non-iterator counterparts and the iter*() return views—these are like a combination of a set and an iterator. You can iterate through them one-at-a-time, but each such collection behaves like a set.

image

7.5 Dictionary Keys

Dictionary values have no restrictions. They can be any arbitrary Python object, i.e., from standard objects to user-defined objects. However, the same cannot be said of keys.

7.5.1 More Than One Entry per Key Not Allowed

One rule is that you are constrained to having only one entry per key. In other words, multiple values per the same key are not allowed. (Container objects such as lists, tuples, and other dictionaries are fine.) When key collisions are detected (meaning duplicate keys encountered during assignment), the last (most recent) assignment wins.

image

Rather than producing an error, Python does not check for key collisions because that would involve taking up memory for each key-value pair assigned. In the above example where the key 'foo' is given twice on the same line, Python applies the key-value pairs from left to right. The value 789 may have been set at first, but is quickly replaced by the string 'xyz'. When assigning a value to a nonexistent key, the key is created for the dictionary and value added, but if the key does exist (a collision), then its current value is replaced. In the above example, the value for the key 'foo' is replaced twice; in the final assignment, 'xyz' is replaced by 123.

7.5.2 Keys Must Be Hashable

As we mentioned earlier in Section 7.1, most Python objects can serve as keys; however they have to be hashable objects—mutable types such as lists and dictionaries are disallowed because they cannot be hashed.

All immutable types are hashable, so they can definitely be used as keys. One caveat is numbers: Numbers of the same value represent the same key. In other words, the integer 1 and the float 1.0 hash to the same value, meaning that they are identical as keys.

Also, there are some mutable objects that are (barely) hashable, so they are eligible as keys, but there are very few of them. One example would be a class that has implemented the __hash__() special method. In the end, an immutable value is used anyway as __hash__() must return an integer.

Why must keys be hashable? The hash function used by the interpreter to calculate where to store your data is based on the value of your key. If the key was a mutable object, its value could be changed. If a key changes, the hash function will map to a different place to store the data. If that was the case, then the hash function could never reliably store or retrieve the associated value. Hashable keys were chosen for the very fact that their values cannot change. (This question can also be found in the Python FAQ.)

We know that numbers and strings are allowed as keys, but what about tuples? We know they are immutable, but in Section 6.17.2, we hinted that they might not be as immutable as they could be. The clearest example of that was when we modified a list object that was one of our tuple elements. To allow tuples as valid keys, one more restriction must be enacted: Tuples are valid keys only if they only contain immutable arguments like numbers and strings.

We conclude this chapter on dictionaries by presenting a program (userpw.py as in Example 7.1) that manages usernames and passwords in a mock login entry database system. This script accepts new users given that they provide a login name and a password. Once an “account” has been set up, an existing user can return as long as the user gives the login and correct password. New users cannot create an entry with an existing login name.

Example 7.1. Dictionary Example (userpw.py)

This application manages a set of users who join the system with a login name and a password. Once established, existing users can return as long as they remember their login and password. New users cannot create an entry with someone else’s login name.

image

image

Line-by-Line Explanation

Lines 1–3

After the Unix-startup line, we initialize the program with an empty user database. Because we are not storing the data anywhere, a new user database is created every time this program is executed.

Lines 5–15

The newuser() function is the code that serves new users. It checks to see if a name has already been taken, and once a new name is verified, the user is prompted for his or her password (no encryption exists in our simple program), and his or her password is stored in the dictionary with his or her user name as the key.

Lines 17–24

The olduser() function handles returning users. If a user returns with the correct login and password, a welcome message is issued. Otherwise, the user is notified of an invalid login and returned to the menu. We do not want an infinite loop here to prompt for the correct password because the user may have inadvertently entered the incorrect menu option.

Lines 26–51

The real controller of this script is the showmenu() function. The user is presented with a friendly menu. The prompt string is given using triple quotes because it takes place over multiple lines and is easier to manage on multiple lines than on a single line with embedded ' ' symbols. Once the menu is displayed, it waits for valid input from the user and chooses which mode of operation to follow based on the menu choice. The try-except statements we describe here are the same as for the stack.py and queue.py examples from the last chapter (see Section 6.14.1).

Lines 53–54

This is the familiar code that will only call showmenu() to start the application if the script was involved directly (not imported). Here is a sample execution of our script:

image

7.6 Set Types

In mathematics, a set is any collection of distinct items, and its members are often referred to as set elements. Python captures this essence in its set type objects. A set object is an unordered collection of hashable values. Yes, set members would make great dictionary keys. Mathematical sets translate to Python set objects quite effectively and testing for set membership and operations such as union and intersection work in Python as expected.

Like other container types, sets support membership testing via in and not in operators, cardinality using the len() BIF, and iteration over the set membership using for loops. However, since sets are unordered, you do not index into or slice them, and there are no keys used to access a value.

There are two different types of sets available, mutable (set) and immutable (frozenset). As you can imagine, you are allowed to add and remove elements from the mutable form but not the immutable. Note that mutable sets are not hashable and thus cannot be used as either a dictionary key or as an element of another set. The reverse is true for frozen sets, i.e., they have a hash value and can be used as a dictionary key or a member of a set.

Sets became available in Python 2.3 via the sets module and accessed via the ImmutableSet and Set classes. However, it was decided that having them as built-in types was a better idea, so these classes were then ported to C along with some improvements and integrated into Python 2.4. The sets module is deprecated in Python 2.6. You can read more about those improvements as well as set types in general in PEP 218 at http://python.org/peps/pep-0218.html.

image

Although sets are now an official Python type, they have often been seen in many Python applications (as user-defined classes), a wheel that has been reinvented many times over, similar to complex numbers (which eventually became a Python type way back in 1.4). Until current versions of Python, most users have tried to shoehorn set functionality into standard Python types like lists and dictionaries as proxies to a real set type (even if they were not the perfect data structure for their applications). Now users have more options, including a “real” set type.

Before we go into detail regarding Python set objects, we have to mentally translate the mathematical symbols to Python (see Table 7.3) so that we are clear on terminology and functionality.

Table 7.3. Set Operation and Relation Symbols

image

How to Create and Assign Set Types

There is no special syntax for sets like there is for lists ( [ ] ) and dictionaries ( { } ) for example. Lists and dictionaries can also be created with their corresponding factory functions list() and dict(), and that is also the only way sets can be created, using their factory functions set() and frozenset():

image

How to Access Values in Sets

You are either going to iterate through set members or check if an item is a member (or not) of a set:

image

How to Update Sets

You can add and remove members to and from a set using various built-in methods and operators:

image

As mentioned before, only mutable sets can be updated. Any attempt at such operations on immutable sets is met with an exception:

image

How to Remove Set Members and Sets

We saw how to remove set members above. As far as removing sets themselves, like any Python object, you can let them go out of scope or explicitly remove them from the current namespace with del. If the reference count goes to zero, then it is tagged for garbage collection.

>>> del s
>>>

7.7 Set Type Operators

7.7.1 Standard Type Operators (all set types)

Membership (in, not in)

As for sequences, Python’s in and not in operators are used to determine whether an element is (or is not) a member of a set.

image

Set Equality/Inequality

Equality (or inequality) may be checked between the same or different set types. Two sets are equal if and only if every member of each set is a member of the other. You can also say that each set must be a(n improper) subset of the other, e.g., both expressions s <= t and s >= t are true, or (s <= t and s >= t) is True. Equality (or inequality) is independent of set type or ordering of members when the sets were created—it is all based on the set membership.

image

Subset Of/Superset Of

Sets use the Python comparison operators to check whether sets are subsets or supersets of other sets. The “less than” symbols (<, <= ) are used for subsets while the “greater than” symbols (>, >= ) are used for supersets.

Less-than and greater-than imply strictness, meaning that the two sets being compared cannot be equal to each other. The equal sign allows for less strict improper subsets and supersets.

Sets support both proper (< ) and improper (<= ) subsets as well as proper (> ) and improper (>= ) supersets. A set is “less than” another set if and only if the first set is a proper subset of the second set (is a subset but not equal), and a set is “greater than” another set if and only if the first set is a proper superset of the second set (is a superset but not equal).

image

7.7.2 Set Type Operators (All Set Types)

Union ( | )

The union operation is practically equivalent to the OR (or inclusive disjunction) of sets. The union of two sets is another set where each element is a member of at least one of the sets, i.e., a member of one set or the other. The union symbol has a method equivalent, union().

image

Intersection ( & )

You can think of the intersection operation as the AND (or conjunction) of sets. The intersection of two sets is another set where each element must be a member of at both sets, i.e., a member of one set and the other. The intersection symbol has a method equivalent, intersection().

image

Difference/Relative Complement ( - )

The difference, or relative complement, between two sets is another set where each element is in one set but not the other. The difference symbol has a method equivalent, difference().

image

Symmetric Difference ( ^ )

Similar to the other Boolean set operations, symmetric difference is the XOR (or exclusive disjunction) of sets. The symmetric difference between two sets is another set where each element is a member of one set but not the other. The symmetric difference symbol has a method equivalent, symmetric_difference().

image

Mixed Set Type Operations

In the above examples, s is a set while t is a frozenset. Note that each of the resulting sets from using the set operators above result in sets. However note that the resulting type is different when the operands are reversed:

image

If both types are sets or frozensets, then the type of the result is the same type as each of the operands, but if operations are performed on mixed types (set and frozenset, and vice versa), the type of the resulting set is the same type as the left operand, which we can verify in the above.

And no, the plus sign is not an operator for the set types:

image

7.7.3 Set Type Operators (Mutable Sets Only)

(Union) Update ( | = )

The update operation adds (possibly multiple) members from another set to the existing set. The method equivalent is update().

image

Retention/Intersection Update ( & = )

The retention (or intersection update) operation keeps only the existing set members that are also elements of the other set. The method equivalent is intersection_update().

image

Difference Update ( - = )

The difference update operation returns a set whose elements are members of the original set after removing elements that are (also) members of the other set. The method equivalent is difference_update().

image

Symmetric Difference Update ( ^ = )

The symmetric difference update operation returns a set whose members are either elements of the original or other set but not both. The method equivalent is symmetric_difference_update().

image

7.8 Built-in Functions

7.8.1 Standard Type Functions

len()

The len() BIF for sets returns cardinality (or the number of elements) of the set passed in as the argument.

image

7.8.2 Set Type Factory Functions

set() and frozenset()

The set() and frozenset() factory functions generate mutable and immutable sets, respectively. If no argument is provided, then an empty set is created. If one is provided, it must be an iterable, i.e., a sequence, an iterator, or an object that supports iteration such as a file or a dictionary.

image

7.9 Set Type Built-in Methods

7.9.1 Methods (All Set Types)

We have seen the operator equivalents to most of the built-in methods, summarized in Table 7.4.

Table 7.4. Set Type Methods

image

The one method without an operator equivalent is copy(). Like the dictionary method of the same name, it is faster to create a copy of the object using copy() than it is using a factory function like set(), frozenset(), or dict().

7.9.2 Methods (Mutable Sets Only)

Table 7.5 summarizes all of the built-in methods that only apply to mutable sets, and similar to the methods above, we have already seen most of their operator equivalents.

Table 7.5. Mutable Set Type Methods

image

The new methods here are add(), remove(), discard(), pop(), and clear(). For the methods that take an object, the argument must be hashable.

7.9.3 Using Operators versus Built-in Methods

As you can see, there are many built-in methods that have near-equivalents when using operators. By “near-equivalent,” we mean that there is one major difference: when using the operators, both operands must be sets while for the methods, objects can be iterables too. Why was it implemented this way? The official Python documentation states that “[this] precludes error-prone constructions like set('abc') [and] 'cbs' in favor of the more readable set('abc').intersection('cbs').”

7.10 Operator, Function/Method Summary Table for Set Types

In Table 7.6, we summarize all of the set type operators, functions, and methods.

Table 7.6. Set Type Operators, Functions, and Methods

image image image

7.11 Related Modules

The sets module became available in 2.3 and may be useful if you wish to subclass the Set or ImmutableSet classes. Although set types were integrated into Python 2.4, there are currently no plans to deprecate the module.

Some general online references for sets which you may find useful include:

http://en.wikipedia.org/wiki/Set

http://www.geocities.com/basicmathsets/set.html

http://www.math.uah.edu/stat/foundations/Sets.xhtml

7.12 Exercises

7-1. Dictionary Methods. What dictionary method would we use to combine two dictionaries together?

7-2. Dictionary Keys. We know that dictionary values can be arbitrary Python objects, but what about the keys? Try using different types of objects as the key other than numbers or strings. What worked for you and what didn't? As for the failures, why do you think they didn't succeed?

7-3. Dictionary and List Methods.

(a) Create a dictionary and display its keys alphabetically.

(b) Now display both the keys and values sorted in alphabetical order by the key.

(c) Same as part (b), but sorted in alphabetical order by the value. (Note: This has no practical purpose in dictionaries or hash tables in general because most access and ordering [if any] is based on the keys. This is merely an exercise.)

7-4. Creating Dictionaries. Given a pair of identically sized lists, say, [1, 2, 3,...], and ['abc', 'def', 'ghi',...], process all that list data into a single dictionary that looks like: { 1:'abc', 2:'def', 3:'ghi',...}.

7-5. userpw2.py. The following problems deal with the program in Example 7.1, a manager of a database of name-password key-value pairs.

(a) Update the script so that a timestamp (see the time module) is also kept with the password indicating date and time of last login. This interface should prompt for login and password and indicate a successful or failed login as before, but if successful, it should update the last login timestamp. If the login occurs within four hours of the last login, tell the user, “You already logged in at: <last_ login_timestamp>.”

(b) Add an “administration” menu to include the following two menu options: (1) remove a user and (2) display a list of all users in the system and their passwords

(c) The passwords are currently not encrypted. Add password-encryption if so desired (see the getpass, md5, crypt [Unix-only], hashlib [2.5], and other cryptographic modules).

(d) *Add a GUI interface, i.e., Tkinter, on top of this application.

(e) Allow usernames to be case-insensitive.

(f) Restrict usernames by not allowing symbols or whitespace.

(g) Merge the “new user” and “old user” options together. If a new user tries to log in with a nonexistent username, prompt if they are new and if so, do the proper setup. Otherwise, they are an existing user so log in as normal.

7-6. Lists and Dictionaries. Create a crude stock portfolio database system. There should be at least four data columns: stock ticker symbol, number of shares, purchase price, and current price—you can add more if you wish, such as percentage gain(loss), 52-week high/low, beta, etc.

Have the user input values for each column to create a single row. Each row should be created as list. Another all-encompassing list will hold all these rows. Once the data is entered, prompt the user for one column to use as the sort metric. Extract the data values of that column into a dictionary as keys, with their corresponding values being the row that contains that key. Be mindful that the sort metric must have non-coincidental keys or else you will lose a row because dictionaries are not allowed to have more than one value with the same key. You may also choose to have additional calculated output, such as percentage gain/loss, current portfolio values, etc.

7-7. Inverting Dictionaries. Take a dictionary as input and return one as output, but the values are now the keys and vice versa.

7-8. Human Resources. Create a simple name and employee number dictionary application. Have the user enter a list of names and employee numbers. Your interface should allow a sorted output (sorted by name) that displays employee names followed by their employee numbers. Extra credit: Come up with an additional feature that allows for output to be sorted by employee numbers.

7-9. Translations.

(a) Create a character translator (that works similar to the Unix tr command). This function, which we will call tr(), takes three strings as arguments: source, destination, and base strings, and has the following declaration:

def tr(srcstr, dststr, string)

srcstr contains the set of characters you want “translated,” dststr contains the set of characters to translate to, and string is the string to perform the translation on. For example, if srcstr == 'abc', dststr == 'mno', and string == 'abcdef', then tr() would output 'mnodef'. Note that len(srcstr) == len(dststr). For this exercise, you can use the chr() and ord() BIFs, but they are not necessary to arrive at a solution.

(b) Add a new flag argument to this function to perform case-insensitive translations.

(c) Update your solution so that it can process character deletions. Any extra characters in srcstr that are beyond those that could be mapped to characters in dststr should be filtered. In other words, these characters are mapped to no characters in dststr, and are thus filtered from the modified string that is returned. For example, if srcstr == 'abcdef', dststr == 'mno', and string == 'abcdefghi', then tr() would output 'mnoghi'. Note now that len(srcstr) >= len(dststr).

7-10. Encryption. Using your solution to the previous problem, and create a “rot13” translator. “rot13” is an old and fairly simplistic encryption routine whereby each letter of the alphabet is rotated 13 characters. Letters in the first half of the alphabet will be rotated to the equivalent letter in the second half and vice versa, retaining case. For example, a goes to n and X goes to K. Obviously, numbers and symbols are immune from translation.

(b) Add an application on top of your solution to prompt the user for strings to encrypt (and decrypt on reapplication of the algorithm), as in the following examples:

image

7-11. Definitions. What constitutes valid dictionary keys? Give examples of valid and invalid dictionary keys.

7-12. Definitions. (a) What is a set in the mathematical sense? (b) What is a set type as it relates to Python?

7-13. Random Numbers. The next problems use a customization of Exercise 5-17: use randint() or randrange() in the random module to generate a set of numbers: generate between 1 to 10 random numbers numbered randomly between 0 and 9 (inclusive). These values constitute a set A (A can be mutable or otherwise). Create another random set B in a similar manner. Display A | B and A & B each time sets A and B are generated.

7-14. User Validation. Alter the previous problem where instead of displaying A | B and A & B, ask the user to input solutions to A | B and A & B, and let the user know if his or her solution was right or wrong. If it is not correct, give the user the ability to correct and revalidate his or her answers. Display the correct results if three incorrect answers are submitted. Extra credit: Use your knowledge of sets to generate potential subsets and ask the user whether they are indeed subsets (or not), and provide corrections and answers as necessary as in the main part of this problem.

7-15. Set Calculator. This exercise is inspired by Exercise 12.2 in the free online Java textbook located at http://math.hws.edu/javanotes. Create an application that allows users to input a pair of sets, A and B, and allow users to give an operation symbol, i.e., in, not in, &, |, ^, <, <=, >, >=, ==, !=, etc. (For sets, you define the input syntax—they do not have to be enclosed in brackets as the Java example.) Parse the entire input string and execute the operation on the input sets as requested by the user. Your solution should require fewer lines of Python than the one in Java.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset