Stability of Java's String.hashCode() across all machines in a cluster

Question


I asked a similar question about the string.GetHashCode() method in .NET. From that, I learned that we cannot rely on the implicit hash code implementation for built-in types if it is used across different machines. I therefore assume that Java's String.hashCode() is likewise unstable across different hardware configurations and may behave differently across virtual machines (not forgetting different VM implementations).

We are currently discussing a way to safely convert strings to numbers in Java by hashing. The hash algorithm must be stable across the nodes of the cluster and fast to evaluate, since it will be called at high frequency. My teammates insist on the built-in hashCode method, and I need some solid arguments to get them to consider a different approach. So far I can only think of differences between machine configurations (x86 vs. x64), possibly different JVM vendors on some machines (hardly applicable in our case), and byte order differences depending on the machine the code runs on. Character encoding probably needs to be considered as well.

While all of these come to mind, I am not 100% sure that any of them is a strong enough reason, and I would appreciate your expertise and experience in this area. It would help me build stronger arguments for writing a custom hashing algorithm. I would also be grateful for advice on what to avoid when implementing it.

+5

java hashcode cluster-computing

Ivaylo slavov Mar 28 '13 at 22:49


2 answers


Here is the implementation from the JRE source:

public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }

        hash = h;
    }

    return h;
}

+3

user949300 Mar 28 '13 at 22:57

Louis Wasserman · Accepted Answer · 2013-03-28T22:55:24+0000

The implementation of String.hashCode() is specified in the documentation, so it is guaranteed to be consistent:

The hash code for a String object is computed as
  s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
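To make the documented formula concrete, here is a minimal sketch that evaluates it by hand and compares the result with the built-in method. The class name DocumentedStringHash and the method documentedHash are made up for this illustration and are not part of the JDK.

```java
// Hypothetical helper, written only to illustrate the documented formula.
final class DocumentedStringHash {

    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using
    // Horner's rule. Java int arithmetic wraps on overflow in the same
    // way on every platform, so no masking is needed.
    static int documentedHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // "ab" -> 'a' * 31 + 'b' = 97 * 31 + 98 = 3105
        String[] samples = {"", "a", "ab", "cluster-node-42", "caf\u00e9"};
        for (String s : samples) {
            int byFormula = documentedHash(s);
            int builtIn = s.hashCode();
            System.out.println("\"" + s + "\": formula=" + byFormula
                    + " builtIn=" + builtIn
                    + " equal=" + (byFormula == builtIn));
        }
    }
}
```

Because the result depends only on the sequence of char values and on int arithmetic, both of which the Java language defines independently of the hardware, the two columns should agree on any compliant JVM.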


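That said, make sure the String itself is obtained in the same way on every node, and specify the Charset explicitly. (For example, when converting a byte[] to a String.)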
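The question lists character encoding as a concern. The following is a rough sketch of why the Charset matters when the string arrives as bytes: the same byte[] decoded with two different charsets yields two different strings, and therefore two different hash codes. The class CharsetStableHash and its methods are hypothetical names for this example, and UTF-8 is only an assumed wire encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, written only to illustrate explicit decoding.
final class CharsetStableHash {

    // Decodes the bytes with the given charset, then hashes the result.
    static int hashOfBytes(byte[] bytes, Charset charset) {
        return new String(bytes, charset).hashCode();
    }

    // Assumes the bytes are UTF-8. Naming the charset here means the
    // platform default encoding of the node never comes into play.
    static int hashOfUtf8Bytes(byte[] bytes) {
        return hashOfBytes(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two bytes below are U+00E9 (e with acute) encoded as UTF-8.
        byte[] wire = {(byte) 0xC3, (byte) 0xA9};

        // Decoded as UTF-8: one char, U+00E9.
        System.out.println(hashOfUtf8Bytes(wire));

        // Decoded as ISO-8859-1: two chars, U+00C3 and U+00A9.
        System.out.println(hashOfBytes(wire, StandardCharsets.ISO_8859_1));
    }
}
```

Calling new String(bytes) without a charset argument uses the platform default, which can differ between nodes, so that is the call to avoid if the hash has to match across the cluster.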
