This chapter introduces Java’s interpretation of fundamental data structures, known as the Java Collections. These abstractions are core to many (if not most) programming types, and form an essential part of any programmers basic toolkit. Accordingly, this is one of the most important chapters of the entire book, and provides a toolkit that is essential to virtually all Java programmers.
In this chapter, we will introduce the fundamental interfaces and the type hierarchy, show how to use them, and discuss aspects of their overall design. Both the “classic” approach to handling the Collections and the newer approach (using the Streams API and the lambda expressions functionality introduced in Java 8) will be covered.
The Java Collections are a set of generic interfaces that describe the most common forms of data structure. Java ships with several implementations of each of the classic data structures, and because the types are represented as interfaces, it is very possible for development teams to develop their own, specialized implementations of the interfaces for use in their own projects.
The Java Collections define two fundamental types of data structures. A Collection
is a grouping of objects, while a Map
is a set of mappings, or associations, between objects. The basic layout of the Java Collections is shown in Figure 8-1.
Within this basic description, a Set
is a type of Collection
with no duplicates, and a List
is a Collection
in which the elements are ordered (but may contain duplicates).
SortedSet
and SortedMap
are specialized sets and maps that maintain their elements in a sorted order.
Collection
, Set
, List
, Map
, SortedSet
, and SortedMap
are all interfaces, but the java.util
package also defines various concrete implementations, such as lists based on arrays and linked lists, and maps and sets based on hash tables or binary trees. Other important interfaces are Iterator
and Iterable
, which allow you to loop through the objects in a collection, as we will see later on.
Collection<E>
is a parameterized interface that represents a generalized grouping of objects of type E
. Methods are defined for adding and removing objects from the group, testing an object for membership in the group, and iterating through all elements in the group. Additional methods return the elements of the group as an array and return the size of the collection.
Collection
may or may not allow duplicate elements and may or may not impose an ordering on the elements.The Java Collections Framework provides Collection
because it defines the features common to all common forms of data structure. The JDK ships Set
, List
, and Queue
as subinterfaces of Collection
. The following code illustrates the operations you can perform on Collection
objects:
// Create some collections to work with.
Collection
<
String
>
c
=
new
HashSet
<>();
// An empty set
// We'll see these utility methods later Be aware that there are
// some subtleties to watch out for when using them
Collection
<
String
>
d
=
Arrays
.
asList
(
"one"
,
"two"
);
Collection
<
String
>
e
=
Collections
.
singleton
(
"three"
);
// Add elements to a collection. These methods return true
// if the collection changes, which is useful with Sets that
// don't allow duplicates.
c
.
add
(
"zero"
);
// Add a single element
c
.
addAll
(
d
);
// Add all of the elements in d
// Copy a collection: most implementations have a copy constructor
Collection
<
String
>
copy
=
new
ArrayList
<
String
>(
c
);
// Remove elements from a collection.
// All but clear return true if the collection changes.
c
.
remove
(
"zero"
);
// Remove a single element
c
.
removeAll
(
e
);
// Remove a collection of elements
c
.
retainAll
(
d
);
// Remove all elements that are not in e
c
.
clear
();
// Remove all elements from the collection
// Querying collection size
boolean
b
=
c
.
isEmpty
();
// c is now empty, so true
int
s
=
c
.
size
();
// Size of c is now 0.
// Restore collection from the copy we made
c
.
addAll
(
copy
);
// Test membership in the collection. Membership is based on the equals
// method, not the == operator.
b
=
c
.
contains
(
"zero"
);
// true
b
=
c
.
containsAll
(
d
);
// true
// Most Collection implementations have a useful toString() method
System
.
out
.
println
(
c
);
// Obtain an array of collection elements. If the iterator guarantees
// an order, this array has the same order. The array is a copy, not a
// reference to an internal data structure.
Object
[]
elements
=
c
.
toArray
();
// If we want the elements in a String[], we must pass one in
String
[]
strings
=
c
.
toArray
(
new
String
[
c
.
size
()]);
// Or we can pass an empty String[] just to specify the type and
// the toArray method will allocate an array for us
strings
=
c
.
toArray
(
new
String
[
0
]);
Remember that you can use any of the methods shown here with any Set
, List
, or Queue
. These subinterfaces may impose membership restrictions or ordering constraints on the elements of the collection but still provide the same basic methods.
add()
, remove()
, clear()
, and retainAll()
that alter the collection were conceived of as optional parts of the API. Unfortunately, they were specified a long time ago, when the received wisdom was to indicate the absence of an optional method by throwing UnsupportedOperation
Exception
. Accordingly, some implementations (notably read-only forms) may throw this unchecked exception.Collection
, Map
, and their subinterfaces do not extend the Cloneable
or
interfaces. All of the collection and map implementation classes provided in the Java Collections Framework, however, do implement these interfaces.
Some collection implementations place restrictions on the elements that they can contain. An implementation might prohibit null
as an element, for example. And EnumSet
restricts membership to the values of a specified enumerated type.
Attempting to add a prohibited element to a collection always throws an unchecked exception such as NullPointerException
or ClassCastException
. Checking whether a collection contains a prohibited element may also throw such an exception, or it may simply return false
.
A set is a collection of objects that does not allow duplicates: it may not contain two references to the same object, two references to null
, or references to two objects a
and b
such that a.equals(b)
. Most general-purpose Set
implementations impose no ordering on the elements of the set, but ordered sets are not prohibited (see
and LinkedHashSet
). Sets are further distinguished from ordered collections like lists by the general expectation that they have an efficient contains
method that runs in constant or logarithmic time.
Set
defines no additional methods beyond those defined by Collection
but places additional restrictions on those methods. The add()
and addAll()
methods of a Set
are required to enforce the no-duplicates rules: they may not add an element to the Set
if the set already contains that element. Recall that the add()
and addAll()
methods defined by the Collection
interface return true
if the call resulted in a change to the collection and false
if it did not. This return value is relevant for Set
objects because the no-duplicates restriction means that adding an element does not always result in a change to the set.
Table 8-1 lists the implementations of the Set
interface and summarizes their internal representation, ordering characteristics, member restrictions, and the performance of the basic add()
, remove()
, and contains
operations as well as iteration performance. You can read more about each class in the reference section. Note that CopyOnWriteArraySet
is in the java.util.concurrent
package; all the other implementations are part of java.util
. Also note that java.util.BitSet
is not a Set
implementation. This legacy class is useful as a compact and efficient list of boolean
values but is not part of the Java Collections Framework.
Class | Internal representation | Since | Element order | Member restric-tions | Basic opera-tions | Iteration perfor-mance | Notes |
---|---|---|---|---|---|---|---|
|
Hashtable |
1.2 |
None |
None |
O(1) |
O(capacity) |
Best general-purpose implementation. |
|
Linked hashtable |
1.2 |
Insertion order |
None |
O(1) |
O(n) |
Preserves insertion order. |
|
Bit fields |
5.0 |
Enum declaration |
Enum values |
O(1) |
O(n) |
Holds non- |
|
Red-black tree |
1.2 |
Sorted ascending |
Comparable |
O(log(n)) |
O(n) |
|
|
Array |
5.0 |
Insertion order |
None |
O(n) |
O(n) |
Threadsafe without synchronized methods. |
The TreeSet
implementation uses a red-black tree data structure to maintain a set that is iterated in ascending order according to the natural ordering of Comparable
objects or according to an ordering specified by a Comparator
object. TreeSet
actually implements the SortedSet
interface, which is a subinterface of Set
.
The SortedSet
interface offers several interesting methods that take advantage of its sorted nature. The following code illustrates:
public
static
void
testSortedSet
(
String
[]
args
)
{
// Create a SortedSet
SortedSet
<
String
>
s
=
new
TreeSet
<>(
Arrays
.
asList
(
args
));
// Iterate set: elements are automatically sorted
for
(
String
word
:
s
)
{
System
.
out
.
println
(
word
);
}
// Special elements
String
first
=
s
.
first
();
// First element
String
last
=
s
.
last
();
// Last element
// all elements but first
SortedSet
<
String
>
tail
=
s
.
tailSet
(
first
+
' '
);
System
.
out
.
println
(
tail
);
// all elements but last
SortedSet
<
String
>
head
=
s
.
headSet
(
last
);
System
.
out
.
println
(
head
);
SortedSet
<
String
>
middle
=
s
.
subSet
(
first
+
' '
,
last
);
System
.
out
.
println
(
middle
);
}