Single-level grouping

All grouping collectors take a classification function, that is, the function that assigns each element of the stream to a group. Typically, this is an instance of the Function<T, R> functional interface.

Each element of the stream (of the T type) is passed through this function, and the return value is a classifier object (of the R type). The returned R values become the keys (K) of a Map<K, V>, and each group is a value in this Map<K, V>.

In other words, the key (K) is the value returned by the classification function, and the value (V) is the list of stream elements for which the function returned that key. So, with the simplest flavor of groupingBy(), the final result is of the Map<K, List<T>> type.
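To make the Map<K, List<T>> shape concrete, the following sketch mimics by hand what the simplest grouping does conceptually. It is not the JDK implementation; the ManualGrouping class, the group() helper, and the Melon record are hypothetical names introduced only for illustration.

```java
import java.util.*;
import java.util.function.Function;

class ManualGrouping {
  // Hypothetical stand-in for the chapter's Melon class
  record Melon(String type, int weight) {}

  // Conceptual equivalent of groupingBy(classifier): each element is added
  // to the list stored under the key returned by the classifier.
  static <T, K> Map<K, List<T>> group(
      List<T> items, Function<? super T, ? extends K> classifier) {
    Map<K, List<T>> groups = new HashMap<>();
    for (T item : items) {
      K key = classifier.apply(item);
      groups.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
    }
    return groups;
  }

  public static void main(String[] args) {
    List<Melon> melons = List.of(
      new Melon("Gac", 3000), new Melon("Hemi", 1600), new Melon("Gac", 1200));
    // Keys are the melon types; values are the melons of each type
    System.out.println(group(melons, Melon::type));
  }
}
```

Here, T is Melon, the classifier returns a String, and the result is therefore a Map<String, List<Melon>>.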

Let's look at an example to shed some light on this rather abstract explanation. This example relies on the simplest flavor of groupingBy(), that is, groupingBy(Function<? super T,? extends K> classifier).

So, let's group Melon by type:

Map<String, List<Melon>> byTypeInList = melons.stream()
  .collect(groupingBy(Melon::getType));

The output will be as follows:

{
  Crenshaw = [Crenshaw(1200 g)],
  Apollo = [Apollo(2600 g)],
  Gac = [Gac(3000 g), Gac(1200 g), Gac(3000 g)],
  Hemi = [Hemi(2600 g), Hemi(1600 g), Hemi(2600 g)],
  Horned = [Horned(1700 g)]
}

We can also group Melon by weight:

Map<Integer, List<Melon>> byWeightInList = melons.stream()
  .collect(groupingBy(Melon::getWeight));

The output will be as follows:

{
  1600 = [Hemi(1600 g)],
  1200 = [Crenshaw(1200 g), Gac(1200 g)],
  1700 = [Horned(1700 g)],
  2600 = [Hemi(2600 g), Apollo(2600 g), Hemi(2600 g)],
  3000 = [Gac(3000 g), Gac(3000 g)]
}

This grouping is shown in the following diagram. More precisely, this is a snapshot of the moment when Gac(1200 g) passes through the classification function (Melon::getWeight):

So, in the melon-classification example, a key is the weight of Melon, and its value is a list containing all the Melon objects of that weight.

The classification function can be a method reference or any other lambda.
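As a small illustration of a lambda classifier, the following sketch groups melons into two weight classes. The 2000 g threshold, the "heavy"/"light" labels, the Melon record, and the sample list (reconstructed from the outputs shown in this section) are all assumptions made for the example.

```java
import java.util.*;
import static java.util.stream.Collectors.groupingBy;

class LambdaClassifier {
  // Hypothetical stand-in for the chapter's Melon class
  record Melon(String type, int weight) {
    int getWeight() { return weight; }
  }

  // Sample data reconstructed from the outputs in this section (assumption)
  static final List<Melon> MELONS = List.of(
    new Melon("Crenshaw", 1200), new Melon("Gac", 3000), new Melon("Hemi", 2600),
    new Melon("Hemi", 1600), new Melon("Gac", 1200), new Melon("Apollo", 2600),
    new Melon("Horned", 1700), new Melon("Gac", 3000), new Melon("Hemi", 2600));

  // The classifier is a lambda that computes the key instead of reading a property
  static Map<String, List<Melon>> byWeightClass() {
    return MELONS.stream()
      .collect(groupingBy(m -> m.getWeight() >= 2000 ? "heavy" : "light"));
  }
}
```

The key type of the resulting map follows from whatever the lambda returns, which is a String in this case.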

One issue with the preceding approach is the presence of unwanted duplicates. This happens because the values are collected in a List (for example, 3000=[Gac(3000 g), Gac(3000 g)]). We can fix this by relying on another flavor of groupingBy(), that is, groupingBy(Function<? super T,? extends K> classifier, Collector<? super T,A,D> downstream).

This time, we can specify the desired downstream collector as the second argument. So, besides the classification function, we have a downstream collector as well.

If we wish to reject duplicates, we can use Collectors.toSet(), as follows:

Map<String, Set<Melon>> byTypeInSet = melons.stream()
  .collect(groupingBy(Melon::getType, toSet()));

The output is as follows:

{
  Crenshaw = [Crenshaw(1200 g)],
  Apollo = [Apollo(2600 g)],
  Gac = [Gac(1200 g), Gac(3000 g)],
  Hemi = [Hemi(2600 g), Hemi(1600 g)],
  Horned = [Horned(1700 g)]
}

We can also do this by weight:

Map<Integer, Set<Melon>> byWeightInSet = melons.stream()
  .collect(groupingBy(Melon::getWeight, toSet()));

The output will be as follows:

{
  1600 = [Hemi(1600 g)],
  1200 = [Gac(1200 g), Crenshaw(1200 g)],
  1700 = [Horned(1700 g)],
  2600 = [Hemi(2600 g), Apollo(2600 g)],
  3000 = [Gac(3000 g)]
}
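Rejecting duplicates via toSet() works only if the grouped objects define value-based equals() and hashCode(). The following sketch contrasts two hypothetical melon types (neither is the chapter's Melon class): a record, which gets both methods generated, and a plain class that inherits identity-based equality from Object.

```java
import java.util.*;
import java.util.stream.Stream;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toSet;

class SetGroupingEquality {
  // Hypothetical melon with value-based equality (generated by the record)
  record ValueMelon(String type, int weight) {}

  // Hypothetical melon WITHOUT equals()/hashCode(): two instances are
  // equal only if they are the same object
  static final class IdentityMelon {
    private final String type;
    private final int weight;
    IdentityMelon(String type, int weight) { this.type = type; this.weight = weight; }
    String type() { return type; }
  }

  // Two equal melons collapse into a single set entry
  static int gacCountWithValueEquality() {
    Map<String, Set<ValueMelon>> byType = Stream.of(
        new ValueMelon("Gac", 3000), new ValueMelon("Gac", 3000))
      .collect(groupingBy(ValueMelon::type, toSet()));
    return byType.get("Gac").size();
  }

  // Two look-alike melons stay separate because they are distinct objects
  static int gacCountWithIdentityEquality() {
    Map<String, Set<IdentityMelon>> byType = Stream.of(
        new IdentityMelon("Gac", 3000), new IdentityMelon("Gac", 3000))
      .collect(groupingBy(IdentityMelon::type, toSet()));
    return byType.get("Gac").size();
  }
}
```

The first method returns 1 and the second returns 2, so the duplicate-free outputs above presuppose that Melon overrides equals() and hashCode().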

Of course, in this case, distinct() can be used as well:

Map<String, List<Melon>> byTypeInList = melons.stream()
  .distinct()
  .collect(groupingBy(Melon::getType));

The same goes for doing this by weight:

Map<Integer, List<Melon>> byWeightInList = melons.stream()
  .distinct()
  .collect(groupingBy(Melon::getWeight));

Well, there are no more duplicates, but the results are not ordered. It would be nice to have this map ordered by keys, and the default map (a HashMap in practice) does not guarantee any order. If we could specify a TreeMap instead, then the problem would be solved. We can do this via another flavor of groupingBy(), that is, groupingBy(Function<? super T,? extends K> classifier, Supplier<M> mapFactory, Collector<? super T,A,D> downstream).

The second argument of this flavor allows us to provide a Supplier object that provides a new empty Map into which the results will be inserted:

Map<Integer, Set<Melon>> byWeightInSetOrdered = melons.stream()
  .collect(groupingBy(Melon::getWeight, TreeMap::new, toSet()));

Now, the output is ordered:

{
  1200 = [Gac(1200 g), Crenshaw(1200 g)],
  1600 = [Hemi(1600 g)],
  1700 = [Horned(1700 g)],
  2600 = [Hemi(2600 g), Apollo(2600 g)],
  3000 = [Gac(3000 g)]
}
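The map factory is not limited to TreeMap::new; any Supplier of an empty Map can be passed. The sketch below shows two variations under stated assumptions: a TreeMap with a reversed comparator for descending keys, and a LinkedHashMap that keeps keys in first-seen order. The Melon record and the sample list (reconstructed from the outputs in this section) are hypothetical.

```java
import java.util.*;
import static java.util.stream.Collectors.*;

class MapFactoryVariants {
  // Hypothetical stand-in for the chapter's Melon class
  record Melon(String type, int weight) {
    String getType() { return type; }
    int getWeight() { return weight; }
  }

  // Sample data reconstructed from the outputs in this section (assumption)
  static final List<Melon> MELONS = List.of(
    new Melon("Crenshaw", 1200), new Melon("Gac", 3000), new Melon("Hemi", 2600),
    new Melon("Hemi", 1600), new Melon("Gac", 1200), new Melon("Apollo", 2600),
    new Melon("Horned", 1700), new Melon("Gac", 3000), new Melon("Hemi", 2600));

  // Keys sorted from heaviest to lightest
  static Map<Integer, Set<String>> byWeightDescending() {
    return MELONS.stream().collect(groupingBy(Melon::getWeight,
      () -> new TreeMap<Integer, Set<String>>(Comparator.reverseOrder()),
      mapping(Melon::getType, toSet())));
  }

  // Keys kept in the order in which each type first appears in the stream
  static Map<String, Long> countByTypeInEncounterOrder() {
    return MELONS.stream()
      .collect(groupingBy(Melon::getType, LinkedHashMap::new, counting()));
  }
}
```

Which factory to use depends on whether the keys should be sorted, kept in encounter order, or left unordered.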

Grouping is also handy for splitting a list into chunks. Suppose we have a List<Integer> that will hold the weights of 100 melons:

List<Integer> allWeights = new ArrayList<>(100);

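We want to split this list into 10 lists of 10 weights each. We can obtain this via grouping, as follows. Note that the classifier is stateful because it depends on a shared counter; with parallelStream(), each chunk still receives 10 elements, but which elements land in which chunk is no longer deterministic: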

final AtomicInteger count = new AtomicInteger();
Collection<List<Integer>> chunkWeights = allWeights.stream()
  .collect(Collectors.groupingBy(c -> count.getAndIncrement() / 10))
  .values();
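The following sketch makes the chunking runnable and adds a stateless, index-based alternative that does not depend on a shared counter. The ChunkWeights class, its method names, and the generated sample weights are hypothetical; the second method uses List.subList() rather than grouping.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ChunkWeights {
  // Counter-based chunking as in the text (sequential stream)
  static Collection<List<Integer>> chunkWithCounter(List<Integer> all, int size) {
    AtomicInteger count = new AtomicInteger();
    return all.stream()
      .collect(Collectors.groupingBy(c -> count.getAndIncrement() / size))
      .values();
  }

  // Stateless alternative: chunk i is the view [i * size, (i + 1) * size)
  static List<List<Integer>> chunkByIndex(List<Integer> all, int size) {
    int chunks = (all.size() + size - 1) / size; // round up
    return IntStream.range(0, chunks)
      .mapToObj(i -> all.subList(i * size, Math.min((i + 1) * size, all.size())))
      .collect(Collectors.toList());
  }

  // Hypothetical sample data: 100 weights, 1000, 1010, ..., 1990
  static List<Integer> sampleWeights() {
    return IntStream.range(0, 100)
      .map(i -> 1000 + i * 10)
      .boxed()
      .collect(Collectors.toList());
  }
}
```

The index-based variant returns views backed by the original list, so copy each chunk if the source list may change later.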

Now, let's tackle another issue. By default, Stream<Melon> is divided into a set of List<Melon> groups. But what can we do to divide Stream<Melon> into groups of String values, where each group holds only the types of the melons, not the Melon instances?

Well, transforming elements of a stream is commonly the job of map(). But inside groupingBy(), this is the job of Collectors.mapping() (more details can be found in the Filtering, flattening, and mapping collectors section of this chapter):

Map<Integer, Set<String>> byWeightInSetOrdered = melons.stream()
  .collect(groupingBy(Melon::getWeight, TreeMap::new,
    mapping(Melon::getType, toSet())));

This time, the output is exactly what we wanted:

{
  1200 = [Crenshaw, Gac],
  1600 = [Hemi],
  1700 = [Horned],
  2600 = [Apollo, Hemi],
  3000 = [Gac]
}
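The order of the types inside each group above comes from the Set created by toSet(), which makes no ordering promise. If sorted groups are needed, one option is to swap in Collectors.toCollection(TreeSet::new) as the downstream of mapping(). The sketch below assumes a hypothetical Melon record and a sample list reconstructed from the outputs in this section.

```java
import java.util.*;
import static java.util.stream.Collectors.*;

class SortedTypesByWeight {
  // Hypothetical stand-in for the chapter's Melon class
  record Melon(String type, int weight) {
    String getType() { return type; }
    int getWeight() { return weight; }
  }

  // Sample data reconstructed from the outputs in this section (assumption)
  static final List<Melon> MELONS = List.of(
    new Melon("Crenshaw", 1200), new Melon("Gac", 3000), new Melon("Hemi", 2600),
    new Melon("Hemi", 1600), new Melon("Gac", 1200), new Melon("Apollo", 2600),
    new Melon("Horned", 1700), new Melon("Gac", 3000), new Melon("Hemi", 2600));

  // Keys sorted by TreeMap, types inside each group sorted by TreeSet
  static Map<Integer, TreeSet<String>> typesByWeight() {
    return MELONS.stream()
      .collect(groupingBy(Melon::getWeight, TreeMap::new,
        mapping(Melon::getType, toCollection(TreeSet::new))));
  }

  public static void main(String[] args) {
    System.out.println(typesByWeight());
  }
}
```

With this variant, both the keys and the contents of each group have a guaranteed order.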

OK, so far, so good! Now, let's focus on the fact that two of the three flavors of groupingBy() accept a downstream collector as an argument (for example, toSet()). This can be any collector. For example, we may want to group melons by type and count them. For this, Collectors.counting() is very helpful (more details can be found in the Summarization collectors section):

Map<String, Long> typesCount = melons.stream()
  .collect(groupingBy(Melon::getType, counting()));

The output will be as follows:

{Crenshaw=1, Apollo=1, Gac=3, Hemi=3, Horned=1}

We can also do this by weight:

Map<Integer, Long> weightsCount = melons.stream()
  .collect(groupingBy(Melon::getWeight, counting()));

The output will be as follows:

{1600=1, 1200=2, 1700=1, 2600=3, 3000=2}
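Since the downstream can be any collector, other summarization collectors plug in the same way. As an illustration, the following sketch computes the total weight per type via Collectors.summingInt(). The Melon record and the sample list (reconstructed from the outputs in this section) are assumptions.

```java
import java.util.*;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summingInt;

class TotalWeightByType {
  // Hypothetical stand-in for the chapter's Melon class
  record Melon(String type, int weight) {
    String getType() { return type; }
    int getWeight() { return weight; }
  }

  // Sample data reconstructed from the outputs in this section (assumption)
  static final List<Melon> MELONS = List.of(
    new Melon("Crenshaw", 1200), new Melon("Gac", 3000), new Melon("Hemi", 2600),
    new Melon("Hemi", 1600), new Melon("Gac", 1200), new Melon("Apollo", 2600),
    new Melon("Horned", 1700), new Melon("Gac", 3000), new Melon("Hemi", 2600));

  // Each group is reduced to the sum of the weights of its melons
  static Map<String, Integer> totalWeightByType() {
    return MELONS.stream()
      .collect(groupingBy(Melon::getType, summingInt(Melon::getWeight)));
  }
}
```

For this sample list, the three Gac melons add up to 7200 g and the three Hemi melons to 6800 g.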

Can we group the lightest and heaviest melons by type? Of course we can! We can do this via Collectors.minBy() and maxBy(), which were presented in the Summarization collectors section:

Map<String, Optional<Melon>> minMelonByType = melons.stream()
  .collect(groupingBy(Melon::getType,
    minBy(comparingInt(Melon::getWeight))));

The output will be as follows (notice that minBy() returns an Optional):

{
  Crenshaw = Optional[Crenshaw(1200 g)],
  Apollo = Optional[Apollo(2600 g)],
  Gac = Optional[Gac(1200 g)],
  Hemi = Optional[Hemi(1600 g)],
  Horned = Optional[Horned(1700 g)]
}

We can also find the heaviest melon of each type via maxBy():

Map<String, Optional<Melon>> maxMelonByType = melons.stream()
  .collect(groupingBy(Melon::getType,
    maxBy(comparingInt(Melon::getWeight))));

The output will be as follows (notice that maxBy() returns an Optional):

{
  Crenshaw = Optional[Crenshaw(1200 g)],
  Apollo = Optional[Apollo(2600 g)],
  Gac = Optional[Gac(3000 g)],
  Hemi = Optional[Hemi(2600 g)],
  Horned = Optional[Horned(1700 g)]
}

The minBy() and maxBy() collectors take a Comparator as an argument. In these examples, we have used the built-in Comparator.comparingInt() function. Starting with JDK 8, the java.util.Comparator interface was enriched with several new static and default methods, including the thenComparing() flavors for chaining comparators.
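As a sketch of comparator chaining inside a grouping, the example below picks the lightest melon of each weight class and breaks ties by type name via thenComparing(). The weight classes (with an arbitrary 2000 g threshold), the Melon record, and the sample list (reconstructed from the outputs in this section) are assumptions.

```java
import java.util.*;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.minBy;

class MinWithTieBreak {
  // Hypothetical stand-in for the chapter's Melon class
  record Melon(String type, int weight) {
    String getType() { return type; }
    int getWeight() { return weight; }
  }

  // Sample data reconstructed from the outputs in this section (assumption)
  static final List<Melon> MELONS = List.of(
    new Melon("Crenshaw", 1200), new Melon("Gac", 3000), new Melon("Hemi", 2600),
    new Melon("Hemi", 1600), new Melon("Gac", 1200), new Melon("Apollo", 2600),
    new Melon("Horned", 1700), new Melon("Gac", 3000), new Melon("Hemi", 2600));

  static Map<String, Optional<Melon>> lightestPerClass() {
    // Compare by weight first; equal weights are ordered by type name
    Comparator<Melon> byWeightThenType = Comparator
      .comparingInt(Melon::getWeight)
      .thenComparing(Melon::getType);

    return MELONS.stream()
      .collect(groupingBy(m -> m.getWeight() >= 2000 ? "heavy" : "light",
        minBy(byWeightThenType)));
  }
}
```

In the "light" class, Crenshaw and Gac both weigh 1200 g, so the tie-breaker selects Crenshaw; in the "heavy" class, Apollo wins over Hemi at 2600 g.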

The issue here is the Optional wrappers, which we would like to remove. More generally, this category of issues is about adapting the result returned by a collector to a different type.

Well, especially for these kinds of tasks, we have the collectingAndThen(Collector<T,A,R> downstream, Function<R,RR> finisher) factory method. This method takes a function that will be applied to the final result of the downstream collector (finisher). It can be used as follows:

Map<String, Integer> minMelonByType = melons.stream()
  .collect(groupingBy(Melon::getType,
    collectingAndThen(minBy(comparingInt(Melon::getWeight)),
      m -> m.orElseThrow().getWeight())));

The output will be as follows:

{Crenshaw=1200, Apollo=2600, Gac=1200, Hemi=1600, Horned=1700}

The same approach works with maxBy():

Map<String, Integer> maxMelonByType = melons.stream()
  .collect(groupingBy(Melon::getType, 
    collectingAndThen(maxBy(comparingInt(Melon::getWeight)),
      m -> m.orElseThrow().getWeight())));

The output will be as follows:

{Crenshaw=1200, Apollo=2600, Gac=3000, Hemi=2600, Horned=1700}
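When only the minimum or maximum weight per type is needed, a different technique avoids Optional altogether: Collectors.toMap() with a merge function. This is an alternative to the collectingAndThen() approach above, not a replacement for it. The Melon record and the sample list (reconstructed from the outputs in this section) are assumptions.

```java
import java.util.*;
import static java.util.stream.Collectors.toMap;

class WeightExtremesByType {
  // Hypothetical stand-in for the chapter's Melon class
  record Melon(String type, int weight) {
    String getType() { return type; }
    int getWeight() { return weight; }
  }

  // Sample data reconstructed from the outputs in this section (assumption)
  static final List<Melon> MELONS = List.of(
    new Melon("Crenshaw", 1200), new Melon("Gac", 3000), new Melon("Hemi", 2600),
    new Melon("Hemi", 1600), new Melon("Gac", 1200), new Melon("Apollo", 2600),
    new Melon("Horned", 1700), new Melon("Gac", 3000), new Melon("Hemi", 2600));

  // When two melons share a type, keep the smaller weight
  static Map<String, Integer> minWeightByType() {
    return MELONS.stream()
      .collect(toMap(Melon::getType, Melon::getWeight, Integer::min));
  }

  // When two melons share a type, keep the larger weight
  static Map<String, Integer> maxWeightByType() {
    return MELONS.stream()
      .collect(toMap(Melon::getType, Melon::getWeight, Integer::max));
  }
}
```

Unlike the minBy()/maxBy() versions, this keeps only the weights, so it does not help when the Melon instance itself is needed.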

We may also want to group melons by type into a Map<String, Melon[]>. Again, we can rely on collectingAndThen() for this, as follows:

Map<String, Melon[]> byTypeArray = melons.stream()
  .collect(groupingBy(Melon::getType, collectingAndThen(
    Collectors.toList(), l -> l.toArray(Melon[]::new))));

Alternatively, we can create a generic collector and call it, as follows:

private static <T> Collector<T, ?, T[]>
    toArray(IntFunction<T[]> func) {

  return Collectors.collectingAndThen(
    Collectors.toList(), l -> l.toArray(func.apply(l.size())));
}

Map<String, Melon[]> byTypeArray = melons.stream()
  .collect(groupingBy(Melon::getType, toArray(Melon[]::new)));
