What is an efficient method for handling word abbreviations using Java?

I have a list of words in a file. They may contain contractions such as "who's", "didn't", etc. While reading from the file, I need to expand them to "who is" and "did not". This should be done in Java, and it should not cost much time.

This is for handling such queries during a search that uses Solr.

Below is the code that I tried, using a hash map:

import java.util.HashMap;
import java.util.Map;

Map<String, String> con = new HashMap<String, String>();
con.put("'s", " is");
con.put("'d", " would");
con.put("'re", " are");
con.put("'ll", " will");
con.put("n't", " not");
con.put("'nt", " not");

String str = "where'd you're you'll would'nt hello";

String[] words = str.split(" ");
for (int i = 0; i < words.length; i++) {
    int index = words[i].lastIndexOf('\'');
    if (index < 0) {
        continue; // no apostrophe: skip this word instead of ending the loop
    }
    // The "n't" suffix starts one character before the apostrophe.
    if (index > 0 && words[i].startsWith("n't", index - 1)) {
        index--;
    }
    String suffix = words[i].substring(index);
    if (con.containsKey(suffix)) {
        words[i] = words[i].substring(0, index) + con.get(suffix);
    }
    System.out.println(words[i]);
}
+3
3 answers

You could look at using a Stemmer for this.

It can be configured in solr. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Edit:
SnowballPorterFilterFactory is one option to consider.

+3

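Following up on @James Jithin's remark:

  • "s" → "is" is not always correct, for example when the word is a possessive.
  • "d" → "would" is not always correct, since "d" can also stand for "ed".
  • "nt" → "not" is not always correct either. (Consider, for example, "wo'nt" ...)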
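One way to act on the caveats above is to check whole-word exceptions before applying any suffix rule, and to leave the ambiguous suffixes ("'s", "'d") alone. This is only a rough sketch: the class name `SafeContractions`, its methods, and the entries in both tables are made up for illustration, and it assumes words are separated by single spaces.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SafeContractions {
    // Whole-word exceptions, checked before any suffix rule (illustrative entries).
    private static final Map<String, String> IRREGULAR = new HashMap<>();
    // Suffix rules that are unambiguous enough to apply mechanically.
    private static final Map<String, String> SUFFIXES = new HashMap<>();
    static {
        IRREGULAR.put("won't", "will not");
        IRREGULAR.put("can't", "cannot");
        IRREGULAR.put("shan't", "shall not");
        SUFFIXES.put("n't", " not");
        SUFFIXES.put("'re", " are");
        SUFFIXES.put("'ll", " will");
        // "'s" and "'d" are deliberately left out: they are ambiguous.
    }

    public static String expandWord(String word) {
        String irregular = IRREGULAR.get(word.toLowerCase(Locale.ROOT));
        if (irregular != null) {
            return irregular;
        }
        for (Map.Entry<String, String> e : SUFFIXES.entrySet()) {
            String suffix = e.getKey();
            int start = word.length() - suffix.length();
            // Require a non-empty stem and a case-insensitive suffix match.
            if (start > 0 && word.regionMatches(true, start, suffix, 0, suffix.length())) {
                return word.substring(0, start) + e.getValue();
            }
        }
        return word; // unknown or ambiguous: leave untouched
    }

    public static String expand(String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(expandWord(word));
        }
        return out.toString();
    }
}
```

With these tables, a possessive such as "script's" passes through unchanged, while "won't" becomes "will not" rather than "wo not".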

+1

The code can be written as

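import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Map<String, String> con = new HashMap<String, String>();
con.put("'s", " is");
con.put("'d", " would");
con.put("'re", " are");
con.put("'ll", " will");
con.put("n't", " not");
con.put("'nt", " not");

String str = "where'd you're you'll would'nt hello";

// Replace each suffix only where it ends a word (\b is a word boundary).
for (Map.Entry<String, String> e : con.entrySet()) {
    str = str.replaceAll(Pattern.quote(e.getKey()) + "\\b",
                         Matcher.quoteReplacement(e.getValue()));
}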

with your logic. But suppose "script's" is a word that shows possession; changing it to "script is" changes the meaning.
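Since the question is about speed, one option is to compile a single pattern once and make one pass over the text, instead of calling replaceAll once per key. Here is a minimal sketch using the same suffix table; the class name `ContractionExpander` and the method `expand` are made up for illustration, and the possessive caveat above still applies to "'s".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContractionExpander {
    // Same suffix table as in the question.
    private static final Map<String, String> SUFFIXES = new LinkedHashMap<>();
    static {
        SUFFIXES.put("n't", " not");
        SUFFIXES.put("'nt", " not");
        SUFFIXES.put("'re", " are");
        SUFFIXES.put("'ll", " will");
        SUFFIXES.put("'d", " would");
        SUFFIXES.put("'s", " is");
    }

    // One alternation of all suffixes followed by a word boundary, compiled once.
    private static final Pattern PATTERN = buildPattern();

    private static Pattern buildPattern() {
        StringBuilder sb = new StringBuilder("(");
        for (String key : SUFFIXES.keySet()) {
            if (sb.length() > 1) {
                sb.append('|');
            }
            sb.append(Pattern.quote(key));
        }
        return Pattern.compile(sb.append(")\\b").toString());
    }

    public static String expand(String input) {
        Matcher m = PATTERN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Look up the replacement for whichever suffix matched.
            m.appendReplacement(out, Matcher.quoteReplacement(SUFFIXES.get(m.group(1))));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("where'd you're you'll would'nt hello"));
    }
}
```

The pattern is built once in a static field, so repeated calls to `expand` only pay for the scan itself.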

0
