11 Java Collections

Collections provide the ability to manage groups of objects. A collection can be ordered or unordered, contain duplicates or not, depending upon the particular implementation chosen. Collections can also contain key-value pairs to provide a facility for maintaining in memory an index of objects.

What’s wonderful about collections for the COBOL programmer is the automatic memory management that lies behind the group of objects. Unlike COBOL arrays, space does not need to be allocated in advance for a fixed number of objects in this collection. You can simply define the collection and add objects to it as needed, without any concern for running out of space. The only limitations are those of the physical memory of the machine or operating system.

COLLECTIONS BACKGROUND

Before Java 1.2, the java.util package had two primary classes that implemented functionality similar to the functionality of the Java Collections framework: Vector and Hashtable. A Vector may be thought of as an array that can expand or contract dynamically as objects are added to it or taken from it. A Hashtable may be thought of as a group of objects, each of which is paired with or indexed by a key.

If you ever work with programs written before Java 1.2, you will probably find extensive usage of Vectors and Hashtables. Vectors are indispensable in situations in which you needed to maintain in memory a list of objects but could not predict the exact number of objects you would have.

Java 1.2 created a much more powerful framework for handling groups of objects. The Collections framework is an extensible assortment of supporting classes and interfaces. Vector and Hashtable still exist and operate much the way they used to in prior versions of Java. Code written with Vector and Hashtable in Java 1.1 or before will compile and work the same in Java 1.2.

Vector and Hashtable stand out from the rest of the Collections framework because they are legacy classes. The methods for adding and retrieving objects from them are different from those for the rest of the framework. They are also different in another big way: The methods for manipulating the objects in Vector and Hashtable are synchronized. This means that you can add and remove objects from them from multiple threads at the same time, and the class maintains its internal consistency. That sounds good, but it can be costly; synchronized code carries a heavy performance penalty. If you are writing a single-thread application or are manipulating the objects in the Vector or Hashtable from a single thread, you don’t need to add that overhead.

To a large extent, the newer ArrayList collection class mimics the functionality of Vector, and HashMap mimics the functionality of Hashtable. These collection classes do not use synchronized methods, so they perform much better when synchronization isn’t required.

One useful way to understand collections is to compare them to traditional COBOL files. COBOL programs can traverse a file in a particular order, looking for specific records, and then modify or perhaps delete those records. In order to examine or modify any particular record, the program must first read it into a record structure. Standard functions exist to read, update, delete, and reorder (sort) the file. Further, the file has a determinate number of records, and the program can detect when it has reached the end of the file.

All COBOL programmers know how to process files. Data stored in files is readily available to a COBOL program. If you need to build a program that processes this file, all you need to know is where the file is stored and its record layout. The COBOL developer can concentrate on the business problem to be solved and efficiently use the syntax appropriate to process a COBOL file.

The preceding descriptions are (more or less) the characteristics of Java collections. It is also helpful to note that collections have the following differences:

  • Java collections are stored in memory, not on disk (although you can certainly extend the basic collection class to provide for disk storage).

  • Multiple users (that is, programs) can simultaneously access files, whereas collections are contained in a single runtime instance.

  • Java collections can be passed as parameters to functions and can be returned as the return item from a method. This would be similar to passing the file selection and file definition to a subroutine, allowing the subroutine to read and update the file. (Actually, some COBOL compilers do allow this!)

  • Java collections store references to objects, not the objects themselves. When you add an object to a collection, you are simply adding a reference to the object. You are not cloning the object and creating a copy of it.

A collection is a set of related objects or elements. One collection may be ordered in some fashion, and others may be unordered. A collection type might allow duplicates, whereas another implementation might not.

The basic collection interface supports a few bulk operations, or methods that can be used to move many elements at one time. As an added bonus, the contents of a collection can be easily moved into arrays and out of arrays.

These are some of the interface definitions in Java’s Collection framework:

Set: An interface for implementing classes that support groups of elements that cannot contain duplicates. Elements in a Set are not ordered in any particular fashion.

List: An interface for a class that supports groups of ordered elements. Elements in a List can contain duplicates. Elements in a List are often accessed using an index (or subscript) in a manner very similar to arrays.

Map: An interface for a class that maps key elements to values (i.e., other objects). It is roughly analogous to an indexed (ISAM) file in COBOL, where a key is mapped to a record (a file that is contained in memory).

SortedSet: An extension of the Set interface. Elements in a SortedSet collection are automatically ordered in some fashion, either through the natural ordering of the elements or because an explicit Comparator object provides for an ordering algorithm.

Queue: A collection that is designed to manage objects that are to be “processed” in some fashion. Typically the objects in a Queue are processed in first-in-first-out (FIFO) order, although there are other orders available.

The preceding interfaces are just definitions; it is up to an actual implementation to construct a class that supports these definitions. The Java Collections frameworks provide some very useful basic collection implementations. These include some Set collection types, a few List collection types, and two Map collection types, as well as a number of Queue types.

These collection implementations can, in many cases, be used as is by your application.

ORDERED COLLECTIONS: VECTORS AND ARRAYLISTS

The most commonly used ordered List is the ArrayList or its older cousin, the Vector. Both classes support a growable array of objects that can be accessed with an integer index.

The Vectors and ArrayLists are Java constructs that are similar to Java arrays in many ways, but they implement additional capabilities.

  • They are explicitly implemented as a class in the java.util package.

  • They can grow and shrink dynamically as required.

  • They can hold only Objects (that is, Vectors and ArrayLists cannot contain primitive numeric types).

A program can create a Vector or ArrayList using this syntax:

Vector errorMsgsVector = new Vector ();
ArrayList errorMsgsList = new ArrayList ();

The following syntax, introduced with Java 1.5, uses the concepts of Generics to tell the compiler what type of object you want to store in your collections. The compiler will evaluate the type of each object that is placed into the collection and report any invalid object types as compile-time errors.

Vector< ErrorMsg > errorMsgsVector = new Vector< ErrorMsg > ();
ArrayList< ErrorMsg > errorMsgsList = new ArrayList< ErrorMsg > ();

Both Vector and ArrayList have constructors that pass an integer for the capacity of the collection. The capacity value is merely the initial capacity and, if the constructor without it is used, a default value for initial capacity is used.

Vectors and ArrayLists have a size attribute. Size is the number of elements currently in the Vector. The sample Vector so far has a size of zero (no elements in the Vector). The values of this attribute can be accessed with the size() methods.

To add elements to the Vector, use the addElement() method. To add elements to the ArrayList, use the add() method. This places a new element in the Vector in the next available position and increases the size of the Vector by 1.

// Create some objects.
     ErrorMsg myErrorMsg = new ErrorMsg ();
     ErrorMsg myotherErrorMsg = new ErrorMsg ("Some Text");
     ErrorMsg mythirdErrorMsg = new ErrorMsg ("Third One");
// Place them in the Vector.
     errorMsgsVector.addElement (myErrorMsg);
     errorMsgsVector.addElement (myotherErrorMsg);
     errorMsgsVector.addElement (mythirdErrorMsg);
// Place them in the ArrayList.
     errorMsgsList.add(myErrorMsg);
     errorMsgsList.add(myotherErrorMsg);
     errorMsgsList.add(mythirdErrorMsg);

Now the Vector and ArrayList both have a size of three.

A Vector and ArrayList will automatically expand when their capacities are exceeded. If you were to add 11 items to the Vector, then the Vector would resize itself (and probably relocate itself in memory). The default behavior is to double in size whenever the current capacity is exceeded. Defining an increment value when the Vector is first created will control how the Vector is expanded:

Vector errorMsgs = new Vector (10, 20);

This Vector will have an initial capacity of 10 and will increment by 20 element positions whenever its current capacity is exceeded (that is, first 10, then 30, then 50 elements).

A performance penalty is incurred when a Vector’s capacity is exceeded. So under some circumstances it might be advantageous to allocate capacity for a Vector, especially one that will hold large numbers of elements (more than 50 or so). The default for a Vector is to allocate space for 10 elements and to increment by 10.

Actually, using Vectors and ArrayLists in your program is a little more complex than it would seem. This is because you have to use the proper method when you add, change, or retrieve objects from a Vector. Furthermore, Vectors and ArrayLists store only objects, not specific class types. Subsequently, any object retrieved from them must be cast back into the correct class before it can be properly used, if you do not store the same Generic type definition specified when the collection was created.

Let’s contrast the statements necessary to access elements in an array with the way you would manage elements in an ArrayList:

// Define an array of five error messages.
     ErrorMsg[] errorMsgs = new ErrorMsg[5];
// Add some elements to the array.
     errorMsgs[0] = myErrorMsg;
     errorMsgs[1] = myotherErrorMsg;
// Modify the first element in the array.
     errorMsgs[0] = mythirdErrorMsg;
// Retrieve the first element in the array.
     myErrorMsg = errorMsgs[0];
// Create a potential problem because errorMsgs may not be large enough to
// contain all of the error messages in ErrorMsgIO.
     for (int x = 0; x < errorMsgIO.length; x++) {
          errorMsgs[x] = errorMsgIO[x]
               ...
     }
// Retrieve the number of elements in errorMsgs.
     int y = errorMsgs.length;

This is how you could code ErrorMsgs as an ArrayList:

// Define an ArrayList that will contain ErrorMsgs.
     ArrayList<ErrorMsg> errorMsgs = new ArrayList<ErrorMsg> ();
// Add some elements to the list.
     errorMsgs.add (myErrorMsg);
     errorMsgs.add (myotherErrorMsg);
// Modify the first element in the list.
     errorMsgs.set (0, mythirdErrorMsg);
// Retrieve the first element from the list.
     myErrorMsg = errorMsgs.get (0);
// This loop will never be a problem because ErrorMsgs will grow as required
// to contain all the error messages in ErrorMsgIO.
     for (int x = 0; x < errorMsgIO.length; x++) {
         errorMsgs.add (errorMsgIO[x]);
                ...
     }
// Retrieve the number of elements in errorMsgs.
     int y = errorMsgs.size();

Since Vectors and ArrayLists can contain only objects, and not primitive data types, numeric items cannot be directly placed into Vectors. Instead, numbers must be cast first into their object wrappers before they can be placed into a Vector array.

   // Add an Integer element to an ArrayList.
   ArrayList<Integer> myArrayList = new ArrayList< Integer > ();
   int x = 5;
Integer myInteger = new Integer (x);
myArrayList.add (myInteger);

The autoboxing feature introduced with Java 1.5 can create these object wrappers for you automatically. However, under the covers, the compiler is still creating the integer object.

   // Use Autoboxing to add an Integer element to the ArrayList.
   int x = 5;
myArrayList.add (x);

Another difficulty with integer objects is that they are immutable, just like string objects. That is, once an integer object is created, there is no way to change its value. Instead, a new integer must be created to contain a new value. As a result, groups of numeric values are often managed with Java arrays (which can contain primitive data types), instead of with Vectors or ArrayLists.

You can shrink a Vector or ArrayList in order to conserve memory. The trim-ToSize() method will truncate the Vector’s capacity to the current size. You should perform this function only if you are sure the Vector is not likely to grow, since any additions to the Vector will immediately cause it to be reallocated.

Following is a table showing the most commonly used ArrayList methods with their corresponding Vector methods.

KEYED COLLECTIONS: HASHTABLE AND HASHMAP

The most commonly used keyed list is the HashMap or its older cousin, the Hashtable. Java 1.5 introduces a high-performance implementation of HashMap: ConcurrentHashMap. All classes support a growable array of objects that can be accessed with a key.

These classes are used for managing objects and providing direct access to the objects. Like ArrayLists and Vectors, you can only place objects into them. When using Generics, the compiler will ensure that the object returned is the type specified for the collection. If you need to, you can cast the returned object to the correct type in order to be able to use it.

From a COBOL perspective, maps can be viewed as similar to the indexed file type. The index of the file is analogous to the keys in a Map collection, and the record area is analogous to the values in a Map collection. To be more precise, an indexed sequential (ISAM) file type is very similar to the TreeMap collection, and an indexed file accessed via a hashed key is very similar to the HashMap collection.

A program can create a Hashtable or HashMap using this syntax:

Hashtable errorMsgsTable = new Hashtable ();
HashMap errorMsgsMap = new HashMap ();

Like Vector and ArrayList, Hashtable and HashMap have constructors that pass an integer for the capacity of the collection and a size attribute. Also, you should specify the object types that your collection will contain using the <ObjectType, ObjectType> qualifier.

Hashtable<String, ErrorMsg> errorMsgsTable = new Hashtable<String, ErrorMsg>
(20);
HashMap<String, ErrorMsg> errorMsgsMap = new HashMap<String, ErrorMsg> (50);

To add elements to the Hashtable or HashMap, use the put() method. The put() method takes two arguments, both of which are objects. The first argument is the key and the second is the value or the object you wish to retrieve with the key.

// Create some objects.
     ErrorMsg myErrorMsg = new ErrorMsg ();
     ErrorMsg myotherErrorMsg = new ErrorMsg ("Some Text");
     ErrorMsg mythirdErrorMsg = new ErrorMsg ("Third One");
// Place them in the HashMap.
     errorMsgsMap.put ("error one", myErrorMsg);
     errorMsgsMap.put ("error two", myotherErrorMsg);
     errorMsgsMap put ("error three", mythirdErrorMsg);
// Place them in the Hashtable.
     errorMsgsTable.put ("error one", myErrorMsg);
     errorMsgsTable.put ( "error two", myotherErrorMsg);
     errorMsgsTable.put ( "error three", mythirdErrorMsg);

Now the Hashtable and HashMap have a size of three. You retrieve the values from the Hashtable or HashMap using the get() method.

// Get them from the HashMap.
     ErrorMsg myErrorMsg = (ErrorMsg ) errorMsgsMap.get ("error one");
     myErrorMsg = errorMsgsMap.get ("error two");
     myErrorMsg = errorMsgsMap.get ("error three");
// Get them from the Hashtable.
     myErrorMsg = errorMsgsTable.get ("error one");
     myErrorMsg = errorMsgsTable.get ( "error two");
     myErrorMsg = errorMsgsTable.get ( "error three");

OTHER COLLECTIONS

Java provides some other collection types that are quite useful in particular situations.

HashSet: A general-purpose yet efficient implementation of the basic Set interface. Elements in a HashSet are not ordered in any particular way, but iteration (that is, sequential access) to these elements is optimized compared to TreeSets.

TreeSet: A basic implementation of the SortedSet interface. Elements in a TreeSet will be ordered based on its comparator. The default comparator orders elements based on their natural order, but you can override this behavior with your own comparator.

A word of caution about the Set collection type. The Set interface defines good semantics that allow a program to add and remove elements from the collection, but there is no reliable way to manage elements that have changed while they are part of a set. This is because the modification is assigned to the element itself, not the collection (that is, the collection doesn’t see the modification). The proper sequence in this case is to get the element, remove it from the set, modify it, and then add it into the set. To extend this programming model to the COBOL file processing analogy, you would need to read in the record, modify it, delete the record, and then write the record.

AbstractSequentialList: The basic implementation of the List interface, appropriate for sequential access to the members of the list. This class is provided in order to simplify the task of a developer who wishes to implement a custom list collection, one that supports sequential access (for example, a LinkedList).

LinkedList: The LinkedList implementation of the List interface. This uses many of the methods provided by the AbstractSequentialList class. It also adds methods to conveniently add, delete, and retrieve elements from the beginning or end of a list. LinkedLists are most appropriate when you want to optimize write performance, compared to read performance.

ITERATORS

Iterators are objects that support sequential access to the elements in a collection. The iterators() method of Vector and ArrayList returns an Iterator object.

     Iterator iterator = errorMsgsVector. iterator ();
     while (iterator.hasNext()) {
              ErrorMsg myErrorMsg = (ErrorMsg ) iterator.next ();
}

To obtain an iterator for a HashMap or Hashtable, you must obtain a reference to the iterator for either the values or the keys. Once you have the one you want, you can then use it to iterate either of them.

Iterator iterator = errorMsgsMap.values().iterator ();
while (iterator.hasNext()) {
         ErrorMsg myErrorMsg = (ErrorMsg ) iterator.next ();
}

Java 1.5 modified the processing associated with the for loop in order to simplify the use of collections. Essentially, if you are examining the contents of a collection, you can use the { for : each } syntax to examine the contents of a collection in sequential order, without explicitly defining or managing an iterator.

for (ErrorMsg myErrorMsg : errorMsgsVector)
         System.out.println ("Error message: " + myErrorMsg );

Read this statement as “for each ErrorMsg myErrorMsg in errorMsgsVector.” Notice that this statement places the current value (that is, the value pointed to by the implied Iterator) into the variable myErrorMsg. This syntax also uses Generics to simplify and clarify the statement.

ORDERING AND COMPARISON FUNCTIONS

The Collections class contains several static methods that provide useful functions on collections, such as sorting and searching. These powerful (and polymorphic) functions demonstrate some of the benefits provided by collections. They are standard mechanisms available to any developer who needs to manage a group of related items. Some examples of the collection algorithms provided with Java are as follows.

sort: Organize a list based on the natural order of the objects in the list, or based on a comparator (a user-defined ordering method). Sort always orders a list in ascending order.

reverse: The same function as sort(), but the elements are organized in descending order.

fill: Populate a list with copies of the same element. Existing elements in the list will be overwritten, so this method is very useful when you want to reinitialize a list.

replaceAll: Replaces all instances of a particular value with another.

copy: Copy elements from a list into another list. The target list must be large enough to hold all the elements, but if the target list is larger, then the extra elements will not be affected.

swap: Swaps some of the elements in a list.

binarySearch: Examine a list to find a particular element. If the element is contained in the list, then its index position will be returned. Otherwise, the negative value of the position it can be inserted in is returned.

min and max: Examine a collection to find the element with the lowest or highest value. As with the sort algorithm, the natural ordering of the elements can be used, or a user-defined comparator can be used to compare one element to another.

Ordering: Elements in a list-type collection are always ordered in some fashion. By default, this is the natural order of the objects. Java defines an interface (Comparable) that defines how objects of a given class should be compared. Classes that implement this interface use the compareTo() method to compare two elements in a collection. Many of the original Java classes have been retrofitted to support this method.

A specific collection can also define an ordering method unique to that collection type. A user-defined implementation of the Comparator interface, with its compare() and equal() methods, can be passed to the Collection.sort() function. In this case, the user-defined compare function will be used to evaluate elements during any sort operations.

Suppose you have a class named Pupil, and it has two properties, name and grade. When you place a Pupil object into a collection, you may want to make sure they are sorted by name. To do this, you can implement the Comparator interface in Pupil.

public class Pupil implements Comparator
{
     public String name;
     public String grade;
     public int compareTo (Object o) {
          Pupil p = (Pupil) o;
           //  Compare the name property of this Pupil,
          //  with another, and return the result.
           return name.compareTo (p.name);
     }
  ...

The SortedSet implementation extends the basic Set implementation. This collection type orders its elements in some fashion but does not allow duplicate elements. The default Comparator is used to compare two elements (and therefore provide ordering), but you can write a specialized Comparator unique to your requirements.

The existence of a Comparator implies that all elements in a SortedSet collection must be comparable with each other. To do this, either the compareTo method of the element (that is, element1.compareTo (element2)) or the compare() method of the Comparator (that is, Comparator.compare()) will be used.

EXERCISES: JAVA COLLECTIONS

It’s time to visit the example classes and try out all these new ideas. You’ll start with the Set collection type and then visit the List and the Map type.

  1. Create a new class called Pupil that looks like the following class. You will use this class for the exercises with ArrayList and HashMap.

    public class Pupil
    {
         protected String name;
         protected String grade;

         public Pupil(String name, String grade)       {
              this.name = name;
              this.grade = grade;
         }

         public String getName()       {
              return name;
         }

         public String getGrade()       {
              return grade;
         }

         public void setName(String name)       {
              this.name = name;
         }

         public void setGrade(String grade)       {
              this.grade = grade;
         }
    }

  2. Create an ArrayList and three instances of Pupil.

    // Create an ArrayList.
                          ArrayList<Pupil> list = new ArrayList<Pupil> ();

    // Create three instances of Pupil.
              Pupil pupilOne = new Pupil("Susan", "A");
              Pupil pupilTwo = new Pupil("Tom", "B");
              Pupil pupilThree = new Pupil("Mary", "C");

  3. Let’s add the instance of Pupil to the ArrayList.

    // Add the three instances of Pupil to the ArrayList.
              list.add(pupilOne);
              list.add(pupilTwo);
              list.add(pupilThree);

  4. Let’s create a HashMap.

    // Create a HashMap.
         HashMap<String, Pupil> map = new HashMap<String, Pupil> ();

  5. Let’s retrieve the objects from the ArrayList. As you do so, you are going to display their names and grades, and then you are going to add them to the HashMap using their names as a key.

              for (int i = 0; i < list.size(); i++)
              {
    // Retrieve the pupils from the list, print their names.
                   Pupil pupil = list.get(i);
                  System.out.println(pupil.getName() + " earned a "
    + pupil.getGrade());
    // Add them to the HashMap.
                   map.put(pupil.getName(), pupil);
              }

  6. You are now going to retrieve Tom’s record and change his grade.

    // Retrieves Tom from the HashMap and changes his grade.
              Pupil pupilTom = map.get("Tom");
              pupilTom.setGrade("A");

  7. You are now going to redisplay the list.

    // Redisplay the list.
              for (int i = 0; i < list.size(); i++)
              {
                   Pupil pupil = list.get(i);
                   System.out.println(pupil.getName() + " earned a "
    +
                        pupil.getGrade());
               }

  8. Compile the program and execute it. Your output should look like this:

    Susan earned a A
    Tom earned a B
    Mary earned a C
    Susan earned a A
    Tom earned a A
    Mary earned a C

  9. You are now going to redisplay the list using the for...each syntax.

    // Redisplay the list using the simpler for : each syntax
              for (Pupil pupil : list)
              {
                   System.out.println(pupil.getName() + " earned a "
    +
                      pupil.getGrade());
               }

  10. Compile the program and execute it. Your output should look like this:

    Susan earned a A
    Tom earned a B
    Mary earned a C
    Susan earned a A
    Tom earned a A
    Mary earned a C
    Susan earned a A
    Tom earned a A
    Mary earned a C

REVIEWING THE EXERCISES

ArrayList and HashMap (and the older legacy classes, Vector and Hashtable) are used quite commonly in Java development. These classes and the other collection classes provide an easy-to-use facility for managing objects in memory. Collection objects, like ArrayList and HashMap, really contain references to objects. ArrayList references objects in an ordered fashion, whereas HashMap references objects in the keyed manner.

An important thing to realize from this exercise is that both collection objects, the ArrayList and the HashMap, contain just the references to the Pupil objects. They do not contain copies of the objects. So when you retrieve the Tom Pupil object from the HashMap and change its grade, you are changing the one and only copy of this object, as it exists in memory. When you display the list the second time, Tom’s grade is now an A because the ArrayList is referencing the same object that you retrieved from the HashMap.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset