This chapter presents solutions to the previous exercise and continues the discussion of analysis of algorithms.
My implementation of indexOf
is the following code snippet. Read through it and see if you can identify its order of growth before you read the explanation:
public int indexOf(Object target) { Node node = head; for (int i=0; i<size; i++) { if (equals(target, node.data)) { return i; } node = node.next; } return -1; }
Initially node
gets a copy of head
, so they both refer to the same Node
. The loop variable, i
, counts from 0 to size-1
. Each time through the loop, we use equals
to see if we’ve found the target. If so, we return i
immediately. Otherwise we advance to the next Node
in the list.
Normally we would check to make sure the next Node
is not null
, but in this case it is safe because the loop ends when we get to the end of the list (assuming size
is consistent with the actual number of nodes in the list).
If we get through the loop without finding the target, we return -1
.
So what is the order of growth for this method?
Each time through the loop we invoke equals
, which is constant time (it might depend on the size of target
or data
, but it doesn’t depend on the size of the list). The other operations in the loop are also constant time.
The loop might run n times, because in the worse case, we might have to traverse the whole list.
So the runtime of this method is proportional to the length of the list.
Next, here is my implementation of the two-parameter add
method. Again, you should try to classify it before you read the explanation:
public void add(int index, E element) { if (index == 0) { head = new Node(element, head); } else { Node node = getNode(index-1); node.next = new Node(element, node.next); } size++; }
If index==0
, we’re adding the new Node
at the beginning, so we handle that as a special case. Otherwise, we have to traverse the list to find the element at index-1
. We use the helper method getNode
:
private Node getNode(int index) { if (index < 0 || index >= size) { throw new IndexOutOfBoundsException(); } Node node = head; for (int i=0; i<index; i++) { node = node.next; } return node; }
getNode
checks whether index
is out of bounds; if so, it throws an exception. Otherwise it traverses the list and returns the requested Node
.
Jumping back to add
, once we find the right Node
, we create the new Node
and put it between node
and node.next
. You might find it helpful to draw a diagram of this operation to make sure you understand it.
So, what’s the order of growth for add
?
getNode
is similar to indexOf
, and it is linear for the same reason.
In add
, everything before and after getNode
is constant time.
So all together, add
is linear.
Finally, let’s look at remove
:
public E remove(int index) { E element = get(index); if (index == 0) { head = head.next; } else { Node node = getNode(index-1); node.next = node.next.next; } size--; return element; }
remove
uses get
to find and store the element at index
. Then it removes the Node
that contained it.
If index==0
, we handle that as a special case again. Otherwise we find the node at index-1
and modify it to skip over node.next
and link directly to node.next.next
. This effectively removes node.next
from the list, and it can be garbage collected.
Finally, we decrement size
and return the element we retrieved at the beginning.
So, what’s the order of growth for remove
? Everything in remove
is constant time except get
and getNode
, which are linear. So remove
is linear.
When people see two linear operations, they sometimes think the result is quadratic, but that only applies if one operation is nested inside the other. If you invoke one operation after the other, the runtimes add. If they are both in O(n), the sum is also in O(n).
The following table summarizes the differences between MyLinkedList
and MyArrayList
, where 1
means O(1) or constant time and n means O(n) or linear:
MyArrayList |
MyLinkedList | ||
add (at the end) |
1 |
n | |
add (at the beginning) |
n |
1 | |
add (in general) |
n |
n | |
get / set |
1 |
n | |
indexOf / lastIndexOf |
n |
n | |
isEmpty / size |
1 |
1 | |
remove (from the end) |
1 |
n | |
remove (from the beginning) |
n |
1 | |
remove (in general) |
n |
n |
The operations where MyArrayList
has an advantage are adding at the end, removing from the end, getting and setting.
The operations where MyLinkedList
has an advantage are adding at the beginning and removing from the beginning.
For the other operations, the two implementations are in the same order of growth.
Which implementation is better? It depends on which operations you are likely to use the most. And that’s why Java provides more than one implementation, because it depends.
For the next exercise I provide a class called Profiler
that contains code that runs a method with a range of problem sizes, measures runtimes, and plots the results.
You will use Profiler
to classify the performance of the add
method for the Java implementations of ArrayList
and LinkedList
.
Here’s an example that shows how to use the profiler:
public static void profileArrayListAddEnd() { Timeable timeable = new Timeable() { List<String> list; public void setup(int n) { list = new ArrayList<String>(); } public void timeMe(int n) { for (int i=0; i<n; i++) { list.add("a string"); } } }; String title = "ArrayList add end"; Profiler profiler = new Profiler(title, timeable); int startN = 4000; int endMillis = 1000; XYSeries series = profiler.timingLoop(startN, endMillis); profiler.plotResults(series); }
This method measures the time it takes to run add
on an ArrayList
, which adds the new element at the end. I’ll explain the code and then show the results.
In order to use Profiler
, we need to create a Timeable
object that provides two methods: setup
and timeMe
. The setup
method does whatever needs to be done before we start the clock; in this case it creates an empty list. Then timeMe
does whatever operation we are trying to measure; in this case it adds n elements to the list.
The code that creates timeable
is an anonymous class that defines a new implementation of the Timeable
interface and creates an instance of the new class at the same time. If you are not familiar with anonymous classes, you can read about them here: http://thinkdast.com/anonclass.
But you don’t need to know much for the next exercise; even if you are not comfortable with anonymous classes, you can copy and modify the example code.
The next step is to create the Profiler
object, passing the Timeable
object and a title as parameters.
The Profiler
provides timingLoop
which uses the Timeable
object stored as an instance variable. It invokes the timeMe
method on the Timeable
object several times with a range of values of n. timingLoop
takes two parameters:
startN
is the value of n the timing loop should start at.
endMillis
is a threshold in milliseconds. As timingLoop
increases the problem size, the runtime increases; when the runtime exceeds this threshold, timingLoop
stops.
When you run the experiments, you might have to adjust these parameters. If startN
is too low, the runtime might be too short to measure accurately. If endMillis
is too low, you might not get enough data to see a clear relationship between problem size and runtime.
This code is in ProfileListAdd.java
, which you’ll run in the next exercise. When I ran it, I got this output:
4000, 3 8000, 0 16000, 1 32000, 2 64000, 3 128000, 6 256000, 18 512000, 30 1024000, 88 2048000, 185 4096000, 242 8192000, 544 16384000, 1325
The first column is problem size, n; the second column is runtime in milliseconds. The first few measurements are pretty noisy; it might have been better to set startN
around 64000.
The result from timingLoop
is an XYSeries
that contains this data. If you pass this series to plotResults
, it generates a plot like the one in Figure 4-1.
The next section explains how to interpret it.
Based on our understanding of how ArrayList
works, we expect the add
method to take constant time when we add elements to the end. So the total time to add n elements should be linear.
To test that theory, we could plot total runtime versus problem size, and we should see a straight line, at least for problem sizes that are big enough to measure accurately. Mathematically, we can write the function for that line:
where a is the intercept of the line and b is the slope.
On the other hand, if add
is linear, the total time for n adds would be quadratic. If we plot runtime versus problem size, we expect to see a parabola. Or mathematically, something like:
With perfect data, we might be able to tell the difference between a straight line and a parabola, but if the measurements are noisy, it can be hard to tell. A better way to interpret noisy measurements is to plot runtime and problem size on a log-log scale.
Why? Let’s suppose that runtime is proportional to nk, but we don’t know what the exponent k is. We can write the relationship like this:
For large values of n, the term with the largest exponent is the most important, so:
where ≈ means “approximately equal”. Now, if we take the logarithm of both sides of this equation:
This equation implies that if we plot runtime versus n on a log-log scale, we expect to see a straight line with intercept log (c) and slope k. We don’t care much about the intercept, but the slope indicates the order of growth: if k = 1, the algorithm is linear; if k = 2, it’s quadratic.
Looking at the figure in the previous section, you can estimate the slope by eye. But when you call plotResults
it computes a least squares fit to the data and prints the estimated slope. In this example:
Estimated slope = 1.06194352346708
which is close to 1; and that suggests that the total time for n adds is linear, so each add is constant time, as expected.
One important point: if you see a straight line on a graph like this, that does not mean that the algorithm is linear. If the runtime is proportional to nk for any exponent k, we expect to see a straight line with slope k. If the slope is close to 1, that suggests the algorithm is linear. If it is close to 2, it’s probably quadratic.
In the repository for this book you’ll find the source files you need for this exercise:
Profiler.java
contains the implementation of the Profiler
class described in the previous section. You will use this class, but you don’t have to know how it works. But feel free to read the source.
ProfileListAdd.java
contains starter code for this exercise, including the example in the previous section. You will modify this file to profile a few other methods.
Also, in the code
directory, you’ll find the Ant build file build.xml
.
Run ant ProfileListAdd
to run ProfileListAdd.java
. You should get results similar to Figure 4-1, but you might have to adjust startN
or endMillis
. The estimated slope should be close to 1, indicating that performing n add operations takes time proportional to n raised to the exponent 1; that is, it is in O(n).
In ProfileListAdd.java
, you’ll find an empty method named profileArrayListAddBeginning
. Fill in the body of this method with code that tests ArrayList.add
, always putting the new element at the beginning. If you start with a copy of profileArrayListAddEnd
, you should only have to make a few changes. Add a line in main
to invoke this method.
Run ant ProfileListAdd
again and interpret the results. Based on our understanding of how ArrayList
works, we expect each add operation to be linear, so the total time for n adds should be quadratic. If so, the estimated slope of the line, on a log-log scale, should be near 2. Is it?
Now let’s compare that to the performance of LinkedList
. Fill in the body of profileLinkedListAddBeginning
and use it to classify LinkedList.add
when we put the new element at the beginning. What performance do you expect? Are the results consistent with your expectations?
Finally, fill in the body of profileLinkedListAddEnd
and use it to classify LinkedList.add
when we put the new element at the end. What performance do you expect? Are the results consistent with your expectations?
I’ll present results and answer these questions in the next chapter.