July 06, 2024

#Collections: Part 2- A Guide to HashMap in Java


What is Hash Map in Java?

In Java, a HashMap is a data structure that implements the Map interface and stores key-value pairs. It allows you to store and retrieve elements based on keys rather than indexes.

HashMap allows a single null key and multiple null values. This means one key can be null, but duplicate keys are not permitted, as keys must be unique.

HashMap maintains performance even with an increasing number of elements by automatically resizing itself when the number of elements exceeds a certain threshold (the load factor). This ensures effective memory utilization.

The sequence in which entries in a hash map are iterated is not fixed and can vary over time, particularly if the structure is altered by adding or deleting members. If order is important, consider using LinkedHashMap or TreeMap.

Can we use HashMap to implement caches?

Yes, we can utilize HashMaps to implement simple caching mechanisms where frequently accessed data is stored temporarily. This avoids redundant computations or database calls improving code performance.

Data corruption may result from concurrent access to a HashMap by several threads without synchronization. Use Collections.synchronizedMap(map) or ConcurrentHashMap for thread-safe situations.

How Hashmap work internally in Java?

HashMap in Java works on the hashing principle. It is a data structure which allows us to store object and retrieve it in constant time O(1) provided we know the key.

To understand Hashing, we first need to understand the Hash Function, Hash Value and Bucket?
  • hashCode() is a hash function which is present in the Object class and returns an integer value.
  • The Hash value is the int value returned by the hash function.
  • A bucket is used to store key-value pairs and can have multiple key-value pairs. In the hash map, the bucket used a simple linked list to store objects.
In hashing, hash functions are used to link key and value in HashMap. Objects are stored by calling put(key, value) method of HashMap and retrieved by calling get(key) method.

When we call the put method, hashcode() method of the key object is called so that the hash function of the map can find a bucket location to store the value object, which is actually an index of the internal array, known as the table.

HashMap internally stores mapping in the form of Map.Entry object which contains both key and value object. (Entry object stores in the bucket like this (hash,key,value,bucketindex)

When we want to retrieve the object, we need to call the get() method and pass the key object. This time again key object generates the same hash code (it's mandatory for it to do so to retrieve the object and that's why HashMap keys are immutable e.g. String) and we end up at the same bucket location. If there is only one object then it is returned and that's your value object which you have stored earlier. Things get a little tricky when collisions occur.

Since the internal array of HashMap is of fixed size, and if you keep storing objects, at some point in time hash function will return the same bucket location for two different keys, this is called collision in HashMap. (two unequal objects in Java can have the same hashcode). In this case, a linked list is formed at that bucket location and a new entry is stored as the next node.

If we try to retrieve an object from this linked list, we need an extra check to search correct value, this is done by the equals() method. Since each node contains an entry, HashMap keeps comparing the entry's key object with the passed key using equals() and when it returns true, Map returns the corresponding value.

What is the difference in Hashmap implementation in pre-Java 8 and Java 8?

In Java 8, the implementation of HashMap underwent significant changes primarily aimed at improving performance, particularly in scenarios involving hash collisions.

The alternative String hash function added in Java 7 has been removed
Buckets containing a large number of colliding keys will store their entries in a balanced tree instead of a linked list after certain threshold is reached.

These changes ensure the performance of O(log(n)) in worst-case scenarios when the hash function is not distributing keys properly and O(1) with good hashCode().

Which methods do you need to override to use any object as a key in HashMap?

equals() and hashCode()

What happens On HashMap in Java if the size of the HashMap exceeds a given threshold defined by load factor?

If the size of the Map exceeds a given threshold defined by load-factor e.g. if the load factor is .75 it will act to re-size the map once it fills 75%. Similar to other collection classes like ArrayList, Java HashMap re-size itself by creating a new bucket array of size twice the previous size of HashMap and then start putting every old element into that new bucket array. This process is called rehashing because it also applies the hash function to find new bucket location.

Does resizing of HashMap in Java create any problems?

Yes, there is a potential race condition that exists while resizing HashMap in Java, if two threads at the same time find that now HashMap needs resizing they both try to resize. In the process of resizing HashMap, the element in the bucket which is stored in the linked list gets reversed in order during their migration to the new bucket because Java HashMap doesn't append the new element at the tail instead it appends a new element at the head to avoid tail traversing. If a race condition happens then you will end up with an infinite loop.

-K Himaanshu Shuklaa..

No comments:

Post a Comment