How to calculate a good hash code for a huge list of strings?

What is the best way to calculate a hash code based on the values of these strings in a single pass?

By good, I mean it should be:

1 - fast: I need to get a hash code for a huge list (10^3 to 10^8 elements) of short strings.

2 - identifying for the whole list: many lists that differ in perhaps only a couple of strings should still have different hash codes.

How can I do this in Java?

Perhaps there is a way to use the strings' existing hash codes, but how should the hash codes calculated for the individual strings be combined?

Thanks.

+5
1 answer

You could use CRC32. For example:

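import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.zip.CRC32;

public class HugeStringCollection {
    private final Collection<String> strings;

    public HugeStringCollection(Collection<String> strings) {
        this.strings = strings;
    }

    // Single pass: feed every string's bytes into one running CRC32 checksum.
    @Override
    public int hashCode() {
        CRC32 crc = new CRC32();
        for (String string : strings) {
            // Explicit charset so the result does not depend on the platform default.
            crc.update(string.getBytes(StandardCharsets.UTF_8));
        }

        // getValue() returns the 32-bit checksum in a long; the cast keeps all 32 bits.
        return (int) crc.getValue();
    }
}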
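
The question also asks how per-string hash codes could be combined. Below is a minimal sketch of two options, not a definitive implementation. The class and method names (`ListHashSketch`, `combinedHash`, `delimitedCrc`) are made up for illustration. `combinedHash` folds each `String.hashCode()` into a running value with the same `31 * h + elementHash` rule that `java.util.List.hashCode()` is specified to use. `delimitedCrc` is a variant of the CRC32 approach that feeds each string's byte length before its bytes, on the assumption that lists such as ["ab", "c"] and ["a", "bc"] should be told apart; without a delimiter both produce the same byte stream and therefore the same checksum.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ListHashSketch {

    // Combines the strings' own hash codes in one pass, using the rule
    // specified for java.util.List.hashCode(): h = 31 * h + elementHash.
    public static int combinedHash(Iterable<String> strings) {
        int h = 1;
        for (String s : strings) {
            h = 31 * h + (s == null ? 0 : s.hashCode());
        }
        return h;
    }

    // CRC32 over the UTF-8 bytes of every string. The 4-byte length prefix
    // (an assumption of this sketch) makes element boundaries part of the input.
    public static long delimitedCrc(Iterable<String> strings) {
        CRC32 crc = new CRC32();
        for (String s : strings) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            int n = bytes.length;
            // update(int) consumes only the low eight bits of its argument.
            crc.update(n >>> 24);
            crc.update(n >>> 16);
            crc.update(n >>> 8);
            crc.update(n);
            crc.update(bytes);
        }
        return crc.getValue();
    }
}
```

Both methods read the list once. Either result is still only 32 bits wide, so with very many lists some collisions are to be expected; the value can narrow down candidates but cannot prove that two lists are equal.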


+8
