April 02, 2020

#Collections: Part 5 (Hashmap implementation in pre-Java 8 and Java 8)


What is the difference in Hashmap implementation in pre-Java 8 and Java 8?

There is no change to how the Java developer uses and implements HashMap < K, V > . In Java 8, they have made some internal change in HashMap, well that changes not only affect HashMap but also, LinkedHashMap, and ConcurrentHashMap.

a). The alternative String hash function added in Java 7 has been removed in Java 8. The default hash formula on strings was optimized to being both faster and allow for less clashes on average. Thus if your key is a string (some someone’s name in a contacts list), it should calculate that hash faster than before and the likelihood that it would calculate a hash pointing to the same bucket is less.
b). Buckets containing a large number of colliding keys will store their entries in a balanced tree instead of a linked list after certain threshold is reached.

HashMap use the hashCode() and equals() method of keys to split values between buckets. As per the convention, the number of buckets should be slightly higher than the number of entries in a map, so that each bucket holds only few value. When looking up by key, we very quickly determine bucket (using hashCode() modulo number_of_buckets) and our item is available at constant time.

But hash collisions occurs when multiple hashCode() values end up in the same bucket, because of which values are placed in an ad-hoc linked list. In worst case, when all keys are mapped to the same bucket, thus degenerating hash map to linked list - from O(1) to O(n) lookup time.

Prior Java 8 after calculating hash from hash function if more then one element has same hash than they are searched by linear search and the complexity is O(n). In Java 8 that search is performed by binary search so the complexity will become O(Log N).

Till Java 7 the bucket was just a linked list, i.e. if you did have hash collisions then looking between them was a normal linear search on that bucket. In Java 8, if there’s only a handful of clashing items in a bucket it’s still the same. But once there are more than a predetermined amount (i.e threshold) of clashing items in the same bucket, it is turned into a balance binary tree instead.

The threshold is described in JEP180, when a single bucket/bin has more than TREEIFY_THRESHOLD=8 entries it will be converted into a tree.

Interestingly Java 8 is on average 20% faster than Java 7 in simple HashMap.get().


-K Himaanshu Shuklaa..

No comments:

Post a Comment