Filtering Collections using Guava – Performance Analysis

Last updated on 2014-09-07

As I showed in my previous post, Guava Predicates let you write code that is clearer (at leas IMHO). As explained in the Guava site, the most basic use of Predicates is for collection filtering, which is also where I am using them. But since there is no such thing as a free beer, I wondered whether this came with a price tag.

To solve this question, I created a mini-benchmark to check the performance of Guava filters. The typical use of a collection filter is when you are doing something to part of the elements in the collection – so this is what I did:

Create a collection of random integers in a given range (the collection has 10*range items).
Create a predicate that filter the elements in the collection by randomly selecting and integer and filtering using less than comparison.
Get the filtered collection (using both guava and collection copying) and do something with it (otherwise a very smart JIT may simply ignore the code if it has no visible results…). Time how much this takes.

Steps 2-3 were repeated 1000 times so that outliers are averaged. I executed these tests changing the number of element in the source list and changing the range of the elements in the list. The results were divided by counting both the time it took to filter the elements and the time it took to access them.

These are the results:

#Elements	Value Range	Copy Filter	Copy Access	Total Copy	Guava Filter	Guava Access	Total Guava
100	10	12	10	22	11	40	51
100	100	1	0	1	1	1	2
100	1000	4	1	5	1	5	6
100	10000	4	1	5	1	0	1
1000	10	33	0	33	1	13	14
1000	100	29	0	29	6	23	29
1000	1000	22	1	23	3	22	25
1000	10000	31	0	31	0	21	21
10000	10	223	3	226	14	160	174
10000	100	211	2	213	11	182	193
10000	1000	235	0	235	14	201	215
10000	10000	242	1	243	11	169	180

From the results:

The rage of the filter does not matter much when doing a one-pass access.
Total time is very similar for small collections, but Guava is slightly better for large collections.
Since the actual filter in Guava is done when the collections is accessed, the time payment is incurred later.

I also checked what happened if the filter class was accessed 10 times instead of only one time. Here the results are very different:

#Elements	Value Range	Multiple Copy Access	Multiple Guava Access
100	10	125	196
100	100	3	20
100	1000	3	24
100	10000	2	23
1000	10	13	248
1000	100	20	254
1000	1000	25	283
1000	10000	18	204
10000	10	116	1759
10000	100	128	1822
10000	1000	167	1877
10000	10000	198	1899

Here we can see the hidden cost of the Guava filter. Since the collection is filtered as you access it, every time we do an iteration on the filtered collection we are actually iterating over the full collection and filtering on-the-fly. For a large number of elements the cost is even tenfold.

So using Guava uses in places when you are filtering a collection for a one time use is definitely OK. If you are going to use the filtered collection more times, then refrain from directly using the collection you received and copy the filtered collection (using either Java’s ArrayList.new(Collection) constructor or Guava’s Lists.newArrayList(Iterable)).

The code for the performance test can be found here. Your comments are welcome. Happy coding!.

Filtering Collections using Guava – Performance Analysis

Like this:

Related

Be First to Comment

Leave a ReplyCancel reply

Filtering Collections using Guava – Performance Analysis

Share this:

Like this:

Related

Be First to Comment

Leave a ReplyCancel reply

Discover more from Musings of a Strange Loop