Java Set interface
When you need to keep only unique elements within a collection, to get rid of duplicates in a sequence, or if you intend to perform some mathematical operations, you may use a set.
Note that a set is a collection of elements like a mathematical set. A set is significantly different from an array or a list since it's impossible to get an element by its index.
In this topic, we will consider mutable and immutable sets and their differences. All our examples will use strings and numbers, since storing objects of custom classes as elements has some significant points. It will be considered in other topics.
The Set interface
The Java Collections framework provides the Set<E>
interface to represent a set as an abstract data type. It inherits all the methods from the Collection<E>
interface but doesn't add any new ones. The most widely used methods include contains
, add
, addAll
, remove
, removeAll
, size
, and others we've considered in the earlier topic on the Collections framework.
The add
and addAll
methods add elements to the set only if those elements are not already in the set. A set always contains only unique elements.
One method is worth special attention when talking about Set<E>
interface, as it is often used with sets: retainAll(Collection<E> coll)
. It retains only those set elements that are contained in the specified collection.
To start using a set, you need to instantiate one of its implementations: HashSet
, TreeSet
, and LinkedHashSet
. These are mutable sets and they use different rules for ordering elements and have some additional methods. They are also optimized for different types of operations. There are also immutable sets, whose names are not important for programmers. They also implement the Set<E>
interface.
As an addition, there is a high-performance implementation EnumSet
for enum
types. We will not consider it in this topic.
Immutable sets
The simplest way to create a set is to invoke the of
method of the Set
interface.
Set<String> emptySet = Set.of();
Set<String> people = Set.of("Larry", "Kenny", "Sabrina");
Set<Integer> numbers = Set.of(100, 200, 300, 400);
It returns an immutable set containing either all the passed elements or an empty set. Using the of
method is convenient when creating set constants or testing some code.
The order of elements of immutable sets is not fixed:
System.out.println(emptySet); // []
System.out.println(people); // [Kenny, Larry, Sabrina] or another order
System.out.println(numbers); // [400, 200, 300, 100] or another order
One of the most widely used set operations is checking whether a set contains an element. Here is an example:
System.out.println(emptySet.contains("hello")); // false
System.out.println(people.contains("Sabrina")); // true
System.out.println(people.contains("John")); // false
System.out.println(numbers.contains(300)); // true
For immutable sets, it's only possible to invoke contains
, size
, and isEmpty
methods. All others will throw UnsupportedOperationException
since they try to change the set. If you'd like to add / remove elements, use one of HashSet
, TreeSet
or LinkedHashSet
.
Next, we will consider three primary mutable implementations of the Set
interface.
HashSet
The HashSet
class represents a set backed by a hash table. It uses hash codes of elements to effectively store them. It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time.
The following code snippet demonstrates creating a HashSet
and adding countries to it (with a duplicate). The output result does not contain duplicate elements.
Set<String> countries = new HashSet<>();
countries.add("India");
countries.add("Japan");
countries.add("Switzerland");
countries.add("Japan");
countries.add("Brazil");
System.out.println(countries); // [Japan, Brazil, Switzerland, India]
System.out.println(countries.contains("Switzerland")); // true
Although technically the order of HashSet
is somewhat determined by hashCode
, it is a bad practice to rely on such features, since the dependency is quite complicated. HashSet
should be treated as an unordered set.
You must not rely on the order of elements in this set even with the for-each loop.
The HashSet
class offers constant time O(1)
performance for the basic operations (add
, remove
, and contains
), assuming the hash function disperses the elements properly among the buckets.
In practice, sets are often used to check whether some elements belong to them. The HashSet
class is especially recommended for such cases since its contains
operation is highly optimized.
TreeSet
The TreeSet
class represents a set that gives us guarantees on the order of the elements. It corresponds to the sorting order of the elements determined either by their natural order (if they implement the Comparable
interface) or by specific Comparator
implementation.
The order in which the elements would be sorted is the same as if you used a sort algorithm on an array or list containing these elements.
The TreeSet
class implements the SortedSet
interface which extends the base Set
interface. The SortedSet
interface provides some new methods related to comparisons of elements:
Comparator<? super E> comparator()
returns the comparator used to order elements in the set ornull
if the set uses the natural ordering of its elements;SortedSet<E> headSet(E toElement)
returns a subset containing elements that are strictly less thantoElement
;SortedSet<E> tailSet(E fromElement)
returns a subset containing elements that are greater than or equal tofromElement
;SortedSet<E> subSet(E fromElement, E toElement)
returns a subset containing elements in the rangefromElement
(inclusive)toElement
(exclusive);E first()
returns the first (lowest) element in the set;E last()
returns the last (highest) element in the set.
The following example demonstrates some of the listed methods:
SortedSet<Integer> sortedSet = new TreeSet<>();
sortedSet.add(10);
sortedSet.add(15);
sortedSet.add(13);
sortedSet.add(21);
sortedSet.add(17);
System.out.println(sortedSet); // [10, 13, 15, 17, 21]
System.out.println(sortedSet.headSet(15)); // [10, 13]
System.out.println(sortedSet.tailSet(15)); // [15, 17, 21]
System.out.println(sortedSet.subSet(13,17)); // [13, 15]
System.out.println(sortedSet.first()); // minimum is 10
System.out.println(sortedSet.last()); // maximum is 21
Note, while HashSet
is much faster than TreeSet
: constant-time versus log-time for most operations, it offers no ordering guarantees. If you need to use the operations from the SortedSet
interface or if the value-ordered iteration is required, use TreeSet
; otherwise, HashSet
would be a better choice.
LinkedHashSet
The LinkedHashSet
class represents a set with linked elements. It differs from HashSet
by guaranteeing that the order of the elements is the same as the order they were inserted to the set. Reinserting an element that is already in the LinkedHashSet
does not change this order.
In some sense, LinkedHashSet
is something intermediate between HashSet
and TreeSet
. Implemented as a hash table with a linked list running through it, this set provides insertion-ordered iteration and runs nearly as fast as HashSet
.
The following example demonstrates this.
Set<Character> characters = new LinkedHashSet<>();
for (char c = 'a'; c <= 'k'; c++) {
characters.add(c);
}
System.out.println(characters); // [a, b, c, d, e, f, g, h, i, j, k]
In this code, the order of characters is always the same and matches the order in which they are inserted into the set.
The LinkedHashSet
implementation spares its clients from the chaotic ordering provided by HashSet
without incurring the increased time cost of operations associated with TreeSet
. But LinkedHashSet
requires more memory for storing elements.
Set operations
You have already seen some operations on sets. Now let's look at operations that are usually called set theoretic operations that come from math. It's funny that in Java they are common for all collections, not only for sets.
Here is an example of such operations. First of all, we create a mutable set. Then, we apply operations to it, changing the elements.
// getting a mutable set from an immutable one
Set<String> countries = new HashSet<>(List.of("India", "Japan", "Switzerland"));
countries.addAll(List.of("India", "Germany", "Algeria"));
System.out.println(countries); // [Japan, Algeria, Switzerland, Germany, India]
countries.retainAll(List.of("Italy", "Japan", "India", "Germany"));
System.out.println(countries); // [Japan, Germany, India]
countries.removeAll(List.of("Japan", "Germany", "USA"));
System.out.println(countries); // [India]
After performing addAll
, the set countries
does not contain duplicate countries. The retainAll
and removeAll
operations affect only those elements which are specified in the passed sets. It is also possible to use any class that implements the Collection
interface. for these methods (e.g. ArrayList
).
In math and other programming languages, the demonstrated set operations are known as union (addAll
), intersection (retainAll
) and difference (removeAll
).
There is also a method that allows us to check whether a set is a subset of (i.e. contained in) another set.
Set<String> countries = new HashSet<>(List.of("India", "Japan", "Algeria"));
System.out.println(countries.containsAll(Set.of())); // true
System.out.println(countries.containsAll(Set.of("India", "Japan"))); // true
System.out.println(countries.containsAll(Set.of("India", "Germany"))); // false
System.out.println(countries.containsAll(Set.of("Algeria", "India", "Japan"))); // true
As you can see, this method returns true
even for an empty set and a set that is fully equal to the initial set.
Set equality
Last but not least is how sets are compared. Two sets are equal when they contain the same elements. Equality does not depend on the types of sets themselves.
Objects.equals(Set.of(1, 2, 3), Set.of(1, 3)); // false
Objects.equals(Set.of(1, 2, 3), Set.of(1, 2, 3)); // true
Objects.equals(Set.of(1, 2, 3), Set.of(1, 3, 2)); // true
Set<Integer> numbers = new HashSet<>();
numbers.add(1);
numbers.add(2);
numbers.add(3);
Objects.equals(numbers, Set.of(1, 2, 3)); // true
We assume that the given examples do not need any comments.
Summary
We finished the consideration of the Set
interface and common features for all sets. Remember, there are immutable and mutable sets. They can be both unordered and ordered and never contain duplicates. The HashSet
is highly optimized for some operations by effectively storing elements, but it does not guarantee their order. TreeSet
guarantees the order of elements based on their value, but the time cost of operations is higher. LinkedHashSet
keeps the order of the elements as they were inserted and requires more memory for storing. Each type of set is best suitable for a specific purpose. We hope you get the basic understanding of how to use them effectively.