This topic introduces HashSet, the most commonly used implementation of the Set interface. The class embodies a mathematical set and is designed for storing unique elements. We will cover its features, frequently used methods, constructors, and limitations.
HashSet behavior
HashSet is a collection designed for storing unique objects while offering high-performance access. It's not suitable for storing duplicate elements. The example below illustrates basic HashSet functionality.
import java.util.HashSet;
import java.util.Set;
public class HashSetDemo {
public static void main(String[] args) {
Set<String> set = new HashSet<>();
set.add("J. Gosling");
set.add("J. Bloch");
set.add("J. Gosling");
set.add("B. Kernighan");
set.add("J. Bloch");
set.add("J. Gosling");
System.out.println(set); // output: [J. Gosling, B. Kernighan, J. Bloch]
}
}
Let's discuss some practical and essential characteristics of HashSet. Elements are placed according to their hash codes, so HashSet does not maintain insertion order. Lookups are fast, but the class offers no sorting method. Assuming the hash function spreads elements evenly, HashSet provides constant-time performance for basic operations such as add, remove, contains, and size. HashSet contains only unique elements, which may include a single null value. HashSet is not synchronized and is therefore not thread-safe. While it's possible to make HashSet thread-safe, that topic is beyond the scope of this discussion; for further details, consult the official documentation. To summarize, here is a list of essential facts about HashSet:
HashSet contains only unique elements; no duplicates are allowed.
A set can include a single null value.
Elements are stored based on hashing.
The insertion order is not maintained — unordered.
There is no sort method — unsorted.
Search operations are fast.
Not synchronized — hence, not thread-safe.
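The first two facts in the list can be checked with a minimal sketch like the one below. The class name HashSetFacts and the helper method sizeAfterDuplicates are illustrative names chosen for this example, not part of the HashSet API.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetFacts {

    // Adds the same string twice and null twice, then reports the resulting size.
    public static int sizeAfterDuplicates() {
        Set<String> set = new HashSet<>();
        set.add("Mars");
        set.add("Mars"); // ignored: "Mars" is already present
        set.add(null);   // a single null value is accepted
        set.add(null);   // ignored: null is already present
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterDuplicates()); // 2
    }
}
```

Only one "Mars" and one null end up in the set, so the reported size is 2.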
Constructors
The two fundamental constructors for HashSet are as follows:
public HashSet()
public HashSet(Collection<? extends E> c)
The first constructor creates an empty set. Since no elements have been added yet, printing the set displays []. To populate the set, use the add or addAll methods.
Set<String> set = new HashSet<>();
System.out.println(set); // output: []
For the second constructor, which can be described as a copy or conversion constructor, a new set is created that contains the elements from a specified collection. In the example below, the collection is a list. Since a set cannot contain duplicate elements, adding items that are already present will have no effect.
List<String> list = List.of("Mars","Earth", "Jupiter", "Mars");
Set<String> set = new HashSet<>(list);
System.out.println(set); // output: [Earth, Mars, Jupiter]
Alternatively, you can reach the same result in two steps: create an empty set, then call the addAll method. The parameterized constructor shown above is generally preferable because it is shorter. The longer version is shown below for reference.
List<String> list = List.of("Mars","Earth", "Jupiter", "Mars");
Set<String> set = new HashSet<>();
set.addAll(list);
System.out.println(set); // output: [Earth, Mars, Jupiter]
Methods
Now let's look at the primary HashSet methods that you can use in your applications. Methods that locate an element by its hash code, such as add(), contains(), and remove(), typically run in O(1) time. The methods we'll review are as follows:
add(E e): Adds the element if it's not already present and returns true; otherwise returns false.
contains(Object o): Returns true if the set contains the specified element.
remove(Object o): Removes the element from the set if it is present and returns true; otherwise returns false.
isEmpty(): Returns true if the set has no elements.
size(): Returns the number of elements in the set.
toArray(): Returns all the elements in the set as an array.
clear(): Removes all elements from the set.
iterator(): Returns an iterator for the elements in this set.
To utilize the add() method, start by creating a set. You can then add unique elements to this new set. Attempting to add a duplicate element will result in a false return value.
Set<String> set = new HashSet<>();
set.add("Mars"); // true
set.add("Earth"); // true
set.add("Jupiter"); // true
set.add("Mars"); // false
// set contains [Mars, Earth, Jupiter]
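The boolean returned by add() is useful in its own right. As a minimal sketch, the hypothetical helper firstDuplicate below (the class and method names are assumptions made for this example) uses that return value to detect the first repeated element in a list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DuplicateFinder {

    // Returns the first element that occurs more than once, or null if all are unique.
    public static String firstDuplicate(List<String> items) {
        Set<String> seen = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) { // add() returns false when the item is already in the set
                return item;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> planets = List.of("Mars", "Earth", "Jupiter", "Mars");
        System.out.println(firstDuplicate(planets)); // Mars
    }
}
```

Because each add() call is expected to take constant time, the whole check is a single pass over the list.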
Beyond using String, you can use HashSet with various wrapper classes:
Set<String> set1 = new HashSet<>();
set1.add("Venus");
Set<Boolean> set2 = new HashSet<>();
set2.add(true);
Set<Character> set3 = new HashSet<>();
set3.add('G');
Set<Byte> set4 = new HashSet<>();
set4.add((byte) 7);
Set<Short> set5 = new HashSet<>();
set5.add((short) 5);
Set<Integer> set6 = new HashSet<>();
set6.add(9123);
Set<Long> set7 = new HashSet<>();
set7.add(34000L);
Set<Float> set8 = new HashSet<>();
set8.add(4.5F);
Set<Double> set9 = new HashSet<>();
set9.add(345678923D);
Calling add() on a HashSet isn't the only way to build a set. Starting from Java 9, you can use the static method of() from the Set interface to create an immutable set. Note that the result is not a HashSet, and attempting to add an element to it throws an UnsupportedOperationException:
Set<String> set = Set.of("Mars","Earth", "Jupiter");
set.add("Venus"); // UnsupportedOperationException
Passing duplicate elements to of() throws an IllegalArgumentException.
Set<String> set = Set.of("Mars","Earth", "Jupiter", "Mars"); // IllegalArgumentException
HashSet doesn't provide a method to get an element. Instead, there is a method that checks for the presence of an element. We use the contains() method, as seen below. This example uses Set.of() to build the HashSet.
Set<String> set = new HashSet<>(Set.of("J. Gosling"));
set.contains("J. Gosling"); // returns true
Owing to its hash-based internal structure, the contains() method of a HashSet can be significantly faster than its counterpart in List implementations. For instance, contains() in an ArrayList scans the elements one by one until it finds a match, giving a time complexity of O(n). The contains() method of a HashSet, in contrast, generally locates the element in constant time, or O(1). The same performance advantage applies to the remove() method of a HashSet.
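One common way to benefit from this is to copy a list into a HashSet once and then run many membership checks against the set. The sketch below illustrates the idea; the class MembershipFilter and the method keepAllowed are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MembershipFilter {

    // Keeps only the candidates that also appear in the allowed collection.
    public static List<String> keepAllowed(List<String> candidates, List<String> allowed) {
        Set<String> lookup = new HashSet<>(allowed); // one-time copy into a set
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (lookup.contains(candidate)) { // hash-based lookup instead of a linear scan
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> authors = List.of("J. Gosling", "B. Eckel", "J. Bloch");
        List<String> allowed = List.of("J. Bloch", "J. Gosling");
        System.out.println(keepAllowed(authors, allowed)); // [J. Gosling, J. Bloch]
    }
}
```

The result is a list, so it keeps the order of the candidates; only the lookups go through the set.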
Moving on, let's discuss how HashSet handles the removal of elements. It offers two convenient methods. The first is remove(), which removes a specific element if it is present in the set. The second, clear(), removes all elements from the set. The example below illustrates both methods together with size() and isEmpty().
We can always check the current size of a set with the size() method, which returns the size as an integer value. Also, checking for an empty set is just as easy with the isEmpty() method, which returns true when the set is empty.
Set<String> set = new HashSet<>();
set.add("J. Gosling");
set.add("J. Bloch");
set.add("B. Eckel");
set.remove("J. Gosling"); // output: true
set.size(); // output: 2
set.clear(); // removes all elements
set.isEmpty(); // output: true
The next method we'll examine is toArray(), which converts a set into an array. It comes in two flavors. The first takes no arguments and returns an Object[]. The second accepts an array argument whose runtime type determines the type of the returned array. The example code below demonstrates both usages: the first call returns Object[], and the second returns String[], matching the array passed in. The same approach works for sets of other element types, such as the wrapper classes.
import java.util.HashSet;
import java.util.Set;
class Set2Array {
public static void main(String[] args) {
Set<String> set = new HashSet<>();
set.add("J. Gosling");
set.add("J. Bloch");
set.add("B. Eckel");
Object[] object = set.toArray();
System.out.println("Array type: Object");
for (Object name : object) {
System.out.println(name);
}
String[] specified = set.toArray(new String[0]);
System.out.println("\nArray type <T> T[]: String ");
for (String name : specified) {
System.out.println(name);
}
}
}
The program prints output like the following for the two versions of the toArray method. The element order may differ, because HashSet does not guarantee any particular iteration order.
Array type: Object
J. Gosling
B. Eckel
J. Bloch
Array type <T> T[]: String
J. Gosling
B. Eckel
J. Bloch
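Since HashSet has no sort method, one way to get a predictable order is to sort the array returned by toArray(). The sketch below uses the standard Arrays.sort method; the class SortedArrayFromSet and the method toSortedArray are illustrative names, not part of the original example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class SortedArrayFromSet {

    // Copies the set into a String[] and sorts it in natural (lexicographic) order.
    public static String[] toSortedArray(Set<String> set) {
        String[] names = set.toArray(new String[0]); // typed array, as in the example above
        Arrays.sort(names);
        return names;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("J. Gosling");
        set.add("J. Bloch");
        set.add("B. Eckel");
        System.out.println(Arrays.toString(toSortedArray(set)));
        // [B. Eckel, J. Bloch, J. Gosling]
    }
}
```

The set itself is left unchanged; only the copied array is sorted.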
The iterator() method returns an iterator over the elements in a set. Note that the elements are returned in no specific order. Additionally, the iterator is "fail-fast," meaning it throws a ConcurrentModificationException if the collection is modified during iteration. This exception is thrown to indicate that the collection should not be changed while being iterated over, except through the iterator's own remove() method. The example program below demonstrates this behavior: it operates as expected when simply printing output, but using the HashSet's remove() method in conjunction with its iterator() method will result in a ConcurrentModificationException.
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
class HashSetIterator {
public static void main(String[] args) {
Set<Integer> integers = new HashSet<>();
integers.add(1);
integers.add(2);
integers.add(3);
Iterator<Integer> iterator = integers.iterator();
while (iterator.hasNext()) {
Integer number = iterator.next();
System.out.print(number + " "); // output: 1 2 3
//integers.remove(number); // ConcurrentModificationException
}
}
}
From JavaDoc: The iterators returned by this class's iterator method are fail-fast. Specifically, if the set is modified at any point after the iterator has been created—except through the iterator's own remove method—the Iterator will throw a ConcurrentModificationException. This ensures that in the event of concurrent modification, the iterator fails both quickly and cleanly, instead of risking unpredictable, non-deterministic behavior at some undetermined point in the future.
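For completeness, here is a minimal sketch of the permitted way to remove elements during iteration: calling remove() on the iterator itself rather than on the set. The class IteratorRemoval and the method dropEven are hypothetical names used only for this illustration.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class IteratorRemoval {

    // Removes even numbers in place using the iterator's own remove() method.
    public static Set<Integer> dropEven(Set<Integer> numbers) {
        Iterator<Integer> iterator = numbers.iterator();
        while (iterator.hasNext()) {
            Integer number = iterator.next();
            if (number % 2 == 0) {
                iterator.remove(); // allowed: does not trigger ConcurrentModificationException
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        Set<Integer> integers = new HashSet<>();
        integers.add(1);
        integers.add(2);
        integers.add(3);
        System.out.println(dropEven(integers).size()); // 2
    }
}
```

After the loop, the set holds only 1 and 3, and no exception is thrown.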
Limitations
Up to this point, we have used HashSet exclusively with predefined classes such as String and Integer. However, many use cases require storing objects of user-defined classes. We have yet to address this topic, in part because of its added complexity. Storing such objects is safe only if the class overrides equals() and hashCode() consistently, because HashSet relies on both methods to decide whether two elements are the same. Thread safety is another issue: if multiple threads need to access the HashSet, you must implement synchronization mechanisms. Lastly, HashSet depends on the hash codes of its elements. If those hash codes are poorly distributed, many elements end up in the same bucket and performance degrades.
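As a rough illustration of the first limitation, the sketch below stores a user-defined class in a HashSet. Point is a hypothetical class invented for this example; it overrides both equals() and hashCode() so that two points with the same coordinates count as one element.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class PointDemo {

    // Hypothetical user-defined class, used only for illustration.
    static final class Point {
        private final int x;
        private final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Point)) return false;
            Point p = (Point) other;
            return x == p.x && y == p.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y); // equal points produce equal hash codes
        }
    }

    // Adds two equal points and one different point, then reports the set size.
    public static int distinctPoints() {
        Set<Point> points = new HashSet<>();
        points.add(new Point(1, 2));
        points.add(new Point(1, 2)); // equal to the first point, so not added
        points.add(new Point(3, 4));
        return points.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctPoints()); // 2
    }
}
```

Without the two overrides, the default identity-based implementations inherited from Object would treat the two new Point(1, 2) instances as different elements, and the set would hold three points.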
Conclusion
We have explored the nature of the HashSet class and summarized the essential information you need to effectively utilize this collection. We have covered both its advantages and disadvantages. HashSet is the most commonly used implementation of the Set interface, and its primary benefit is that it prevents the addition of duplicate values. While HashSet has internal mechanisms that are beyond the scope of this discussion, you can use it effectively without delving into those complexities. The constructors and methods described in this topic equip you with the tools you'll need to incorporate HashSet into your work.