In the previous chapter, you learned that one of the essential strategies behind mitigating operational cost risk in your software is to optimize your code for performance efficiency. Imagine, for example, that you want to create a new social media company. Successful social media companies have potentially billions of users with complicated connections between them. If you want people to use your service, you will need to ensure consistent uptime and responsive retrieval of information across the network. This is an extremely difficult engineering challenge.
In order to ensure an optimal experience in your software, you first need to determine how to measure the performance of a program using Big O notation. As you might recall, Big O notation is a measure of the order of magnitude of operations performed on an input data set of length n. In the case of a social media company, the input data will likely be either the number of connections in a particular user’s network or the amount of activity in their feed. But the important thing to note is that performance is always measured based on an input data set, or group of data.
You were previously introduced to a variety of basic data types, some of which, like Lists and Maps, are groups of data. Data structures are also groups of data, available in varying implementations, that we can introduce to our programs in order to maximize their performance. Each of the following data structures, or groups of data, supports a very similar set of operations. We will analyze each operation using Big O notation in order to compare and contrast their varying performance. Once you have a grasp on what each data structure can do, you can start to apply them to your software projects to optimize your program’s algorithmic efficiency. Again, picking the right data structures could determine whether your software is profitable, or even possible.
Arrays
You were briefly introduced to the notion of an array in a previous chapter. Some languages consider lists and arrays to be synonymous, or at least, they abstract the implementation of this type of grouped data away from the user so that the user is not aware of the difference. But an array is a very particular arrangement of grouped data.
Instantiating an array in Scala
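One way this could look is sketched below; the value assigned at the end of the array is purely illustrative.

```scala
// Instantiate an array of 10 positions that will hold String data.
val users = new Array[String](10)

// Fill positions using the index accessor; index 0 is the first position.
users(0) = "me"

// The last position of a 10-element array is index 9.
users(9) = "them"
```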
It is important to identify the type of the array that you wish to instantiate; otherwise, Scala will infer that you want the array to be a group of data of type Nothing. This means you cannot fill it with any meaningful data, which wouldn’t be very useful. In this example, we told Scala that we want this array to contain 10 positions and that those positions will be filled with String type data.
Once you have initialized the array, you can fill it with data using the index accessor. In this example, we added a user, represented by the string "me", to the group’s first position, index 0. Because a reference to the array was stored in the variable users, the computer was able to immediately find where this array was stored in memory. From there, the computer simply adds the requested number of positions to that starting memory location to find the position where you want to store the data. In this example, we are asking the computer to store the value "me" in the starting memory position of the users array plus 0 positions. If we wanted to store a user at the end of the array, we would ask the computer to find the array’s starting memory location plus 9 positions, or users(9). Because accessing memory in this way is just a simple arithmetic operation, finding or assigning data in an array is immediate. The computer doesn’t have to look all over its memory to find where the data is stored; it has an instant reference point. Because the operation is immediate no matter how big the array is, assigning to an index position of an array is of constant algorithmic complexity, or O(1).
If you recall, O(1) complexity is the fastest we can hope to achieve in our software. So, if this is the fastest group of data, why don’t we use it for everything? Well, what if you don’t know how big your group of data is going to be? How could you possibly hope to guess? You might just try to instantiate an array so large that it would never be filled, but that would be a terrible waste of memory. Not to mention, determining the length of the data in your array would not be efficient. For example, determining the length of a full array is O(1) time since that information is decided upon instantiation. However, if you have an array of length 10,000 that holds only 100 actual items while the rest are null, you would have to traverse the entire array to determine how many items it really contains. That would mean you would have to do 1 operation for each index position, which is considered O(n) time complexity.
So, it should be easy to see that the disadvantage of an array is that, in order to realize its full benefits, you must know the size of your data ahead of time. If your data grows too big for the array, you can always instantiate a new array with more positions and move all your items into the new array. However, that would require one assignment operation for each data point, which is O(n) time. So, in situations where you don’t know the size of your group of data, or it needs to grow and shrink fairly dynamically, you might consider using a linked list.
Linked Lists
A linked list is a group of data in which each node keeps track of its own data value and a reference pointer to the next node in the list. To insert a new node between the first and second nodes of an existing list, all that is required is to:
1. Assign the reference pointer of the first node to the new node.
2. Assign the reference pointer of the new node to the second node.
In a full array, an insertion of this nature would require you to instantiate a new array, traverse the existing array to assign all its values to the new array, pause where the insertion should occur to insert the new value, and then continue on assigning the remaining values. Deletion from an array follows the same procedure, except that the removed value is skipped rather than a new value being inserted. Insertion and deletion on an array, therefore, are of O(n) complexity. In a linked list that is free to grow and shrink, both insertion and deletion are simple operations of reassigning pointers once you have reached the point of insertion. Therefore, insertion and deletion on linked lists are considered O(1) complexity operations. This is the major advantage of linked lists over arrays and is especially important when your data structures are required to maintain a certain order, because adding new items to a list that needs to stay sorted requires, by its nature, inserting them at a specified position rather than at the end of the group of data.
The disadvantage of linked lists is that accessing an item in the list is O(n) complexity. Consider a linked list with ten nodes. If you want to access and use the seventh node in the list, you must first access the head and follow its reference pointer to the second node in the list. From the second node, you follow its reference pointer to the third node and so on until you reach the seventh node. In the best case scenario, the item you want to access is the head of the list and it only requires one operation. In the worst case scenario, the item you want to access is at the very end of the list and requires you to traverse the entire list of n nodes, hence the O(n) complexity. We always measure the complexity of data structures based on the worst case scenario since we cannot rely on the best case when assessing risk in our software.
An implementation of a Linked List
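A minimal sketch of such an implementation, with class and method names chosen for illustration, might look like this:

```scala
// Each node keeps track of its own data value and a reference to the next node.
class Node[T](var data: T, var next: Node[T] = null)

class LinkedList[T] {
  var head: Node[T] = null

  // Adding to the end requires traversing the list to find the tail: O(n).
  def add(data: T): Unit = {
    val newNode = new Node(data)
    if (head == null) {
      head = newNode
    } else {
      var current = head
      while (current.next != null) {
        current = current.next
      }
      current.next = newNode
    }
  }
}

val friends = new LinkedList[String]()
friends.add("Kim")
```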
As you can see in this example, all that is required of the individual node in a linked list is to keep track of its own data value and the reference to the next node. This makes reassignment to its surrounding nodes a fairly trivial operation. You’ll notice that we’ve created a method for adding data to the end of the linked list which requires traversal of the list to find the tail. This is an O(n) operation since it is seeking the last node. In an array, finding the last node would be an O(1) operation. Deciding whether to use a linked list or an array depends on what type of operations will be performed most frequently on the group of input data.
You might also notice some new syntax in this example. In both the LinkedList class and the Node class, there is a reference to a generic type [T]. A generic type parameter lets the downstream developer decide what type of data the linked list will use. If I were to instantiate this new linked list, I would assign it to a variable of type LinkedList[String] in order to tell the linked list that it expects all of its Nodes to contain String types in their data property.
It is worth noting that you do not need to write your own linked list implementation for each project. Scala has built-in linked lists that are optimized for performance. The List data type that you were introduced to earlier in this book is an implementation of an immutable linked list. The scala.collection.mutable package also provides list-like structures, such as ListBuffer, if you need a mutable alternative (the older mutable.LinkedList has been deprecated and removed in recent versions of Scala).
Exercise 12-1
Consider which of the data structures covered so far would best fit each of the following scenarios:
1. You are creating a program that keeps track of users in alphabetical order. More users are added to the group every hour.
2. You need to design a program that can list every tenth person from a list for statistical analysis on a regular basis.
Queues and Stacks
Queues and stacks are both specialized implementations of a linked list. The big restriction that both of them share is that neither data structure allows insertion into the middle of the group. However, adding and removing items from a queue or a stack are both O(1) operations. Given this, these data structures might be considered fairly efficient, albeit limited.
A queue is a first-in, first-out (FIFO) structure that follows three rules:
1. Items can only be added to the queue at the tail.
2. Items cannot be inserted into the middle of the queue.
3. Items can only be removed from the queue at the head.
Adding and removing items from a queue
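A sketch of this using scala.collection.mutable.Queue, with illustrative names, might look like the following:

```scala
import scala.collection.mutable

val line = mutable.Queue[String]()

// Items can only be added at the tail of the queue.
line.enqueue("Kim")
line.enqueue("Bob")
line.enqueue("Eric")
println(line.length) // 3

// dequeue removes the item at the head and returns it for further use.
val firstPerson = line.dequeue()
println(firstPerson) // Kim
println(line.length) // 2
```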
As you can see from this example, to add items to a mutable queue in Scala, you use the enqueue method, and to remove items from the queue, you use the dequeue method. The dequeue method not only removes the item from the queue, but it also returns that item so that you can perform operations with it. You’ll notice that we have stored the first item to be removed from this queue in the firstPerson variable. Just like a linked list, you can check the length at any point. Before the dequeuing operation occurs, the length is three and after it is two.
A stack is a last-in, first-out (LIFO) structure that follows three similar rules:
1. Items can only be added to the stack at the head.
2. Items cannot be inserted into the middle of the stack.
3. Items can only be removed from the stack at the head.
Adding and removing items from a stack
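A sketch using scala.collection.mutable.Stack, with illustrative page names, might look like this:

```scala
import scala.collection.mutable

val history = mutable.Stack[String]()

// Items can only be added at the head (top) of the stack.
history.push("home")
history.push("profile")
history.push("settings")
println(history.length) // 3

// pop removes the item at the head and returns it for further use.
val lastViewed = history.pop()
println(lastViewed)     // settings
println(history.length) // 2
```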
Just like the queue, you can add and remove items to and from the stack. When you add an item, you use the push method, and when you remove items, you use the pop method. Items that are popped off the stack are returned from the pop method so that you can use them in your program. In this case we have stored that item in the lastViewed variable for later reference. As you can see, the length of the stack before the pop operation was 3, and after the operation, it is now of length 2.
As you can see, arrays, linked lists, queues, and stacks all have their advantages and disadvantages when it comes to inserting, deleting, and accessing known items in the group of data. But what if you have no idea what values a group of data contains and you want to search for an arbitrary value? All of these data structures perform “search” operations in O(n) time. Might we be able to improve upon that? If so, at what cost? Hash tables and trees may provide an answer to these questions.
Hash Table
A hash table is a data structure that uses a hashing function to determine the location of each item in the group of data. The hash function can have various implementations, ranging from extremely simple to increasingly complex. However, just like any function, a hashing function takes an input (the item to be added, removed, or looked up in the group of data) and returns a single output (the location of the item in the group).
An O(n) search operation on an array
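Without hashing, checking whether a group of friends contains a particular name means potentially examining every position, as in this sketch (the names are illustrative):

```scala
val friends = Array("Amy", "Kim", "Eric")

// In the worst case, every index position must be checked: O(n).
def contains(items: Array[String], target: String): Boolean = {
  for (item <- items) {
    if (item == target) return true
  }
  false
}

println(contains(friends, "Eric")) // true
println(contains(friends, "Bob"))  // false
```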
Modulo division hashing function
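A very simple hashing function of this kind might use the length of the name, wrapped around the capacity of the backing array with modulo division (the capacity of 10 here is an assumption):

```scala
val capacity = 10

// Hash a name to an index position based on its length.
def hash(item: String): Int = item.length % capacity

println(hash("Bob")) // 3
println(hash("Kim")) // 3 -- the same index as "Bob"
```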
Hashing items before insertion and upon searching for items
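Putting the two together, a sketch of hashing on insertion and on lookup might look like this:

```scala
val capacity = 10
val friends = new Array[String](capacity)

def hash(item: String): Int = item.length % capacity

// Store each friend at the index position produced by the hashing function.
def insert(item: String): Unit = friends(hash(item)) = item

// To check for a friend, hash the search term and look at that single index: O(1).
def contains(item: String): Boolean = friends(hash(item)) == item

insert("Kim")
insert("Eric")

println(contains("Eric")) // true
println(contains("Bob"))  // false: index 3 holds "Kim", so "Bob" is not in the group
```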
As you can see, by inserting each friend into the array according to the result of their hash function, we can guarantee an O(1) operation when searching to see if the array “contains” a friend, as demonstrated by the contains function. In the second call to the contains function in this example, we perform the hash function on the input string “Bob” which evaluates to 3. We immediately check in the array at index 3 to see if Bob is there. Since that index position contains the value “Kim,” we can automatically determine that “Bob” is not in the list of friends without needing to search the rest of the array.
You might notice that this methodology will not work if we try to add two friends whose names are the same length. If we wanted to add “Bob” to our list of friends, his hash code would be index 3, which is already occupied by “Kim.” In this situation we encounter what is called a hash collision. To deal with a hash collision, we can insert a list into the index position where the collision has occurred and store any name of that same length in the new inner list. So, index 3 would contain an inner list like so: List("Kim", "Bob"). Then, when performing the contains function, we can iterate through each item in the list at that index position to see if the search term exists in the inner list. This is slightly worse than O(1) time complexity since there is a bit of traversal going on within the inner list. However, it is still far more performant than traversing the entire array. More complicated hashing functions minimize the probability of hash code collisions.
It is important to note that, in order to ensure the uniqueness of hash code values, data structures that use hashes to store values must contain only unique values. You cannot store any duplicate data in hash tables. Also, given how the hash function determines where the data should be stored, it might be obvious to you that data stored via hashing cannot be kept in any kind of sorted order. If you attempt to sort a hash table, you will forfeit the benefits of its O(1) search times.
In the Scala mutable collections library, you can see that the implementation of the HashTable is written as a trait. Therefore, you cannot use a hash table directly in Scala. You can, however, mix the HashTable trait into your own data structure if you desire. That being said, the HashTable trait is used in various implementations of both Maps and Sets, both of which have native Scala collections, which we will cover later in this chapter.
Trees
If you wish to have increased performance when searching for an item in a group of data but you must also mandate that the list maintain some kind of sorted order, a Binary Search Tree data structure might be what you are looking for. The binary search tree will not provide you the same O(1) time complexity achieved when searching for items in a hash table; however, it does allow for better than O(n) search time. There are many types of trees that can be used in software engineering, but the binary search tree is fairly common for searching items in sorted order.
In order to allow for increased search performance, values inserted into the tree must be placed so that the tree remains in sorted order. At no point can a binary search tree ever be unsorted. When adding an item to the tree, if there are no items in the tree, then a root node is created and the value of the item is added to this root node. When the next item is added to the tree, if its value is less than the value of the root node, it is added as a node that the root’s leftChild property points to. If its value is greater than the value of the root node, it is added as a node that the root’s rightChild property points to. If leftChild or rightChild already have a node assigned to them, then the new item will be compared to the value of that node. The new item will continue to traverse the tree of nodes, comparing to the left and to the right, until it finds a null reference point to assign itself to. If the value of the node being inserted ends up being equal to the value of a node that already exists in the tree, then the new node will be ignored. Just like hash tables, the values in a tree must be unique in order to obtain the performance benefits that a tree can provide.
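A sketch of this insertion logic, using Int values for simplicity, might look like the following:

```scala
// Each node holds a value and references to a left and right child.
class TreeNode(var value: Int,
               var leftChild: TreeNode = null,
               var rightChild: TreeNode = null)

// Compare left and right down the tree until a null reference is found.
def insert(node: TreeNode, value: Int): TreeNode = {
  if (node == null) new TreeNode(value)
  else {
    if (value < node.value) node.leftChild = insert(node.leftChild, value)
    else if (value > node.value) node.rightChild = insert(node.rightChild, value)
    // A value equal to an existing node is ignored.
    node
  }
}

var root: TreeNode = null
root = insert(root, 8)
root = insert(root, 3)   // becomes the root's leftChild
root = insert(root, 10)  // becomes the root's rightChild
```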
Note
There are edge cases in tree data structures where the items being added are already in sorted order. In this case, a straight-line tree will occur, where either every leftChild or every rightChild is null, which essentially creates a linked list, nullifying the benefits of the binary nature of the tree. To handle this, many tree structures have built-in balancing algorithms that reorganize the tree to maintain performance. For further study, investigate red-black trees or weight-balanced binary trees.
Just like hash tables, a tree is typically not used alone in Scala. In other words, like the hash table, a tree is an abstract data structure. Instead of using a tree on its own, its organizing principles are generally extended by either sets or maps. If you desire, you can also create your own data structures that extend the capabilities of a tree as well.
Sets
A set, in both mathematics and computer science, is a collection of items that are inherently unique or distinct, meaning that there are no duplicates. Sound familiar? One of the main side effects of the performance gains of both hash tables and trees is that the items contained within them must be unique. Thus, the abstraction of hash tables and trees can be applied to the Set data type.
An example of using a HashSet from the Scala mutable collections library
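A sketch of this example might look like the following:

```scala
import scala.collection.mutable

val friends = mutable.HashSet[String]()

friends += "Kim"
friends += "Bob"   // same length as "Kim"; the built-in hashing handles the collision
friends += "Eric"
friends += "Eric"  // the duplicate is ignored by the set

println(friends)   // order is determined by the hashing function, not insertion order
```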
Notice in this example that we’ve added two names to the set that have the same length, “Kim” and “Bob.” The hashing function that the native HashSet implementation of Scala uses handles the hashing collision that we would have run into if we were using our own modulo division implementation of the hashing function. We also attempt to add “Eric” to the set twice, which is ignored by the set, as you can see when the final set is printed to the terminal. You might also notice that the order of the items the set ultimately prints is not the same order in which they were added. This is confirmation that they are being positioned according to the hashing function rather than according to the order in which they were inserted. Remember, data structures that implement hashes, while gaining O(1) access and searching performance, cannot maintain any kind of sorted order.
An example of using a TreeSet from the Scala mutable collections library
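A sketch of this example, with illustrative numbers, might look like this:

```scala
import scala.collection.mutable

val numbers = mutable.TreeSet[Int]()

numbers += 7
numbers += 2
numbers += 10
numbers += 5
numbers += 10  // the duplicate 10 is ignored

println(numbers)             // TreeSet(2, 5, 7, 10) -- stored in sorted order
println(numbers.contains(5)) // true
```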
In this example, we add a collection of numbers to a set in random order. We also attempt to add the number 10 twice, which, per the properties of a set, is ignored. When printing out the resulting TreeSet, you can see that the items are stored in sorted order. Also, when printing whether or not the set of numbers contains the number 5, we receive a true value that was determined in O(log(n)) time.
Maps
Demonstration of a HashMap and a TreeMap from the Scala mutable collections library
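Like sets, maps are collections of key/value pairs that can be backed by either a hash table or a tree. A sketch of this demonstration, with illustrative names and ages, might look like the following:

```scala
import scala.collection.mutable

val hashAges = mutable.HashMap[String, Int]()
hashAges += ("Kim" -> 28)
hashAges += ("Bob" -> 34)
hashAges += ("Eric" -> 41)

println(hashAges)                 // keys appear in no particular order
println(hashAges.contains("Bob")) // true, matched by key in O(1) time
println(hashAges("Kim"))          // 28, accessed by key in O(1) time

val treeAges = mutable.TreeMap[String, Int]()
treeAges += ("Kim" -> 28)
treeAges += ("Bob" -> 34)
treeAges += ("Eric" -> 41)

println(treeAges)                 // keys stored in sorted order: Bob, Eric, Kim
println(treeAges.contains("Bob")) // true, found in O(log(n)) time
println(treeAges("Kim"))          // 28, accessed by key in O(log(n)) time
```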
This example provides a collection of key/value pairs of the names and ages of the friends in our program. Notice in the terminal output that the items stored in the HashMap appear to be in no particular order. The contains method of the HashMap searches for any matching keys in O(1) time. Accessing an item by key in the HashMap will also occur in O(1) time since the key itself is stored based on the hashing function.
In the TreeMap output, you can see that the keys are stored in sorted order. Determining whether the age of a friend exists in the Map by key is performed in O(log(n)) time. Accessing any individual friend by key is also performed in O(log(n)) time as each friend is stored in the sorted tree structure.
Exercise 12-2
Consider which data structure would best fit each of the following scenarios:
1. Your program needs to access the medical records of a patient by their id as quickly as possible.
2. You have a program that generates error logs based on the order of events, and you want to be able to search the logs for events or display them in order.
3. You want to create a search engine that indexes the entire Internet, and you should be able to search for the URL of each page as quickly as possible.
Performance Reference
A reference guide of Big O algorithmic efficiency for each data structure
| Data Structure | Access | Search | Insertion/Deletion |
|---|---|---|---|
| Array | O(1) | O(n) | O(n) |
| Linked List | O(n) | O(n) | O(1) |
| Stack/Queue | O(1) | O(n) | O(1) |
| Hash Tables∗+ | O(1) | O(1) | O(1) |
| Trees∗ | O(log(n)) | O(log(n)) | O(log(n)) |

∗ Values stored must be unique.
+ Items cannot be kept in any sorted order.
Summary
In this chapter, you learned how to optimize the performance of your programs by weighing the pros and cons of the different data structures that you might introduce to your code. You learned that arrays are optimal for accessing items by index position but inflexible when it comes to inserting or deleting items. In order to gain efficiency in programs that need flexibility in that area, you learned that linked lists can grow and shrink with relative ease. From there, you were introduced to queues and stacks, which are specialized implementations of the linked list. Finally, to gain performance in searching for items in a collection, you learned that Hashes and Trees can be applied to either Sets or Maps to give you optimal algorithmic efficiency.