April 02, 2020

#Collections: Part 4 (How ConcurrentHashMap Works Internally in Java?)

What is ConcurrentHashMap?
ConcurrentHashMap class is introduced in JDK 1.5, which implements ConcurrentMap as well as Serializable interface also.

It is a thread-safe implementation of the Map interface provided by Java's concurrency package (java.util.concurrent). It allows concurrent access to the map without the need for external synchronization.

Segment, which is internal data structure and part of map is locked while adding or updating the map. Due to this the ConcurrentHashMap allows concurrent threads to read the value without locking at all. This data structure was introduced to improve performance.

Difference between HashMap and ConcurrentHashMap?
  • HashMap is non-Synchronized in nature i.e. HashMap is not thread-safe whereas ConcurrentHashMap is thread-safe.
  • HashMap performance is relatively high because it is non-synchronized in nature and any number of threads can perform simultaneously. But ConcurrentHashMap performance is low sometimes because sometimes Threads are required to wait on ConcurrentHashMap.
  • While one thread is Iterating the HashMap object, if other thread try to add/modify the contents of Object then we will get Run-time exception saying ConcurrentModificationException.Whereas In ConcurrentHashMap we wont get any exception while performing any modification at the time of Iteration.
What is Concurrency-Level, Load-Factor and Initial Capacity of ConcurrentHashMap?

Concurrency-Level defines the number which is an estimated number of concurrently updating threads. The implementation performs internal sizing to try to accommodate this many threads. 

Initial Capacity performs internal sizing to accommodate these many elements.

The load factor is the measure that decides when to increase the capacity of the Map. It is a floating-point number usually between 0 and 1 (inclusive). The default load factor is 75% of the capacity. This means the HashMap can be filled up to 75% of its capacity before it will be resized.

When the number of elements in the HashMap exceeds the product of the current capacity and load factor, the HashMap is resized (doubled in capacity by default) and all existing elements are rehashed into the new larger array. This is done to maintain the load factor within acceptable limits and to ensure efficient access times.

How does ConcurrentHashMap handle resizing?
ConcurrentHashMap allows concurrent read and write operations even during resizing. When resizing occurs (due to adding more elements), it locks only a portion of the map (not the entire map), minimizing contention and allowing other threads to operate on other segments.

What is a Segment in ConcurrentHashMap?
A ConcurrentHashMap has internal final class called Segment. The ConcurrentHashMap is internally divided in segments of size 16, so at maximum 16 threads can work at a time. It means each thread can work on a each segment during high concurrency and at-most 16 threads can operate at maximum which simply maintains 16 locks to guard each bucket of the ConcurrentHashMap.

Key points of ConcurrentHashMap
  • ConcurrentHashMap class is thread-safe
  • The underlined data structure for ConcurrentHashMap is Hashtable.
  • At a time any number of threads are applicable for read operation without locking the ConcurrentHashMap object which is not there in HashMap.
  • In ConcurrentHashMap, the Object is divided into number of segments according to the concurrency level. Default concurrency-level of ConcurrentHashMap is 16.
  • In ConcurrentHashMap, at a time any number of threads can perform retrieval operation but for updation in object, thread must lock the particular segment in which thread want to operate. This type of locking mechanism is known as Segment locking or bucket locking. At a time 16 updation operations can be performed by threads.
  • null insertion is not possible in ConcurrentHashMap as key or value. If you attempt to insert a null key or null value, it will throw a NullPointerException.
Why we need ConcurrentHashMap when we already have Hashtable?
Hashtable also provides concurrent access to the Map. Entry objects by locking the entire map to perform any sort of operation (update, delete, read, create). In ConcureentHashMap instead of blocking the entire map, Segment's are blocked. This improves the performance.

Constructors Of ConcureentHashMap
ConcurrentHashMap map=new ConcurrentHashMap(); Creates a new, empty map with a default initial capacity (16), load factor (0.75) and concurrencyLevel (16).

ConcurrentHashMap map=new ConcurrentHashMap(int initialCapacity); Creates a new, empty map with the specified initial capacity, and with default load factor (0.75) and concurrencyLevel (16).

ConcurrentHashMap map=new ConcurrentHashMap(int initialCapacity, float loadFactor); Creates a new, empty map with the specified initial capacity and load factor and with the default concurrencyLevel (16).

ConcurrentHashMap map=new ConcurrentHashMap(int initialCapacity, float loadFactor, int concurrencyLevel); Creates a new, empty map with the specified initial capacity, load factor and concurrency level.

ConcurrentHashMap map=new ConcurrentHashMap(Map m); Creates a new map with the same mappings as the given map.

Can you explain the difference between putIfAbsent and computeIfAbsent in ConcurrentHashMap?
putIfAbsent(key, value) adds the key-value pair to the map if the key does not already exist. computeIfAbsent(key, mappingFunction) computes the value for the key using the provided mapping function and adds it to the map if the key is not already present. Both operations are atomic.

How ConcurrentHashMap Works Internally in Java?
The segment in CHM(ConcurrentHashMap) is a specialized hash table. If you observe segment is using Reentrant lock. Because of this any write operation(remove/put/clear) will work in 3 steps:
1). Wait until it gets the lock on that Segment.
2). Perform the operation.
3). Release the lock.

Segments are the core part of CHM, since it is responsible for write operations, wherein it obtains the lock, performs the write operation, and releases the lock after it. Simultaneously many different segments can do the same operation, each taking its own lock.

The CHM has the array of these segments, signifying the degree of synchronization it can handle. The default size is 16, meaning max 16 threads can work at a time.

All the segments are integrated, so that whenever CHM methods are called, they act in accordance with different segments, supporting multi-level write operations. Each segment will obtain its own lock and doing its operation independently of other segments.

Each thread can work on each segment during high concurrency and at most 16 threads can operate at max which simply maintains 16locks to guard each bucket of the ConcurrentHashMap. For every Write operation, ConcurrentHashMap methods will find which segment to go to and will call that segment’s method.

For the write operations, the Segment(inner class) has real implementations, where the CHM methods internally call the segment methods. But for the read operations it does not require any locks, so the CHM itself has their implementation.

-K Himaanshu Shuklaa..

No comments:

Post a Comment