Efficient substring search in a large text file containing 100 million lines (no duplicate line)

Question

I have a large text file (1.5 GB) containing 100 million lines (no duplicate strings), one string per line. I want to build a web application in Java so that when the user enters a keyword (a substring), they get a count of all the lines in the file that contain that keyword. I already know of one technique, Lucene. Is there any other way to do this? I want a result within 3-4 seconds. My system has 4 GB of RAM and a dual-core CPU. I need to do this in Java only.

+5

java file mysql search lucene

Vinay soni Jan 31 '13 at 19:09

4 answers

Hemant · Answer 1 · 2013-02-01T05:16:14+0000

Try using hash tables. Another option is a method like MapReduce. What I mean is that you can try an inverted index; Google uses the same technique. You can also create a stop-word file listing words that can be ignored, for example: I, am, a, an, in, on, etc.

This is the only thing that I think is possible. I read somewhere that you can search for arrays.
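
The inverted-index idea with a stop-word list could be sketched roughly as follows. This is a minimal in-memory illustration, not a tuned implementation: the class name `InvertedIndexSketch`, the tokenization on non-word characters, and the hard-coded stop words are all assumptions made for the example. Note that it matches whole words only, not arbitrary substrings, and an index over 100 million lines built this way would probably not fit in a 4 GB machine without further work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class InvertedIndexSketch {
    // Hypothetical stop-word list, taken from the examples in the answer.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("i", "am", "a", "an", "in", "on"));

    // word -> numbers of the lines that contain it, in the order they were added
    private final Map<String, List<Integer>> postings = new HashMap<>();

    public void addLine(int lineNumber, String line) {
        // Assumption: words are runs of letters, digits or underscores.
        for (String token : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(token, k -> new ArrayList<>());
            // Do not record the same line twice when a word repeats within it.
            if (list.isEmpty() || list.get(list.size() - 1) != lineNumber) {
                list.add(lineNumber);
            }
        }
    }

    public List<Integer> lookup(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addLine(0, "I am searching a large file");
        index.addLine(1, "A large index in memory");
        System.out.println(index.lookup("large")); // [0, 1]
        System.out.println(index.lookup("a"));     // [] because "a" is a stop word
    }
}
```

The count the question asks for is then `lookup(word).size()`.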

justinvf · Answer 2 · 2013-02-01T05:42:36+0000

Are your keywords expected to have many matches? If so, you can keep a hash map from the keyword (String) to the file locations (ArrayList). You cannot store all the lines themselves in memory, given the per-object overhead.
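
A rough sketch of such a map, assuming "file location" means the byte offset at which a line starts, might look like this. The class name `OffsetIndexSketch`, the whitespace tokenization, and the use of `RandomAccessFile` are choices made for the illustration, not something the answer specifies. `RandomAccessFile.readLine` is unbuffered and treats each byte as one character, so the sketch assumes ASCII text and would be slow on a 1.5 GB file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OffsetIndexSketch {
    // keyword -> byte offsets of the lines that contain it
    private final Map<String, ArrayList<Long>> offsets = new HashMap<>();
    private final Path file;

    public OffsetIndexSketch(Path file) throws IOException {
        this.file = file;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long start = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                // Assumption: keywords are whitespace-separated tokens.
                for (String word : line.split("\\s+")) {
                    if (word.isEmpty()) {
                        continue;
                    }
                    ArrayList<Long> list = offsets.computeIfAbsent(word, k -> new ArrayList<>());
                    // Skip the offset if this line was already recorded for the word.
                    if (list.isEmpty() || list.get(list.size() - 1) != start) {
                        list.add(start);
                    }
                }
                start = raf.getFilePointer();
            }
        }
    }

    // Number of lines containing the keyword; only the map is consulted.
    public int count(String keyword) {
        return offsets.getOrDefault(keyword, new ArrayList<>()).size();
    }

    // Fetches the matching lines from disk by seeking to each stored offset.
    public List<String> linesFor(String keyword) throws IOException {
        List<String> result = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (long offset : offsets.getOrDefault(keyword, new ArrayList<>())) {
                raf.seek(offset);
                result.add(raf.readLine());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("lines", ".txt");
        Files.write(tmp, Arrays.asList("alpha beta", "gamma", "beta delta"),
                StandardCharsets.US_ASCII);
        OffsetIndexSketch index = new OffsetIndexSketch(tmp);
        System.out.println(index.count("beta"));    // 2
        System.out.println(index.linesFor("beta")); // [alpha beta, beta delta]
        Files.delete(tmp);
    }
}
```

Only the offsets live on the heap; the line text stays on disk and is read on demand, which is the point of mapping to locations rather than to the strings themselves.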

, , , . . , . - , Reddis.

phatfingers · Answer 3 · 2013-02-01T06:31:45+0000

A directory structure along these lines:

/A
/A/AA
/A/AB
/A/AC
...
/Z/ZU
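
One possible reading of the listing above is that entries are bucketed by their first letter and then by their first two letters. Under that assumption only, the bucket for a keyword could be computed as in the sketch below; the class name `PrefixBucketSketch`, the method `bucketFor`, and the upper-casing are hypothetical and not taken from the answer.

```java
import java.util.Locale;

public class PrefixBucketSketch {
    // Assumption: a keyword is filed under /<first letter>/<first two letters>,
    // upper-cased, mirroring paths such as /A/AB and /Z/ZU in the listing.
    static String bucketFor(String keyword) {
        String k = keyword.toUpperCase(Locale.ROOT);
        if (k.isEmpty()) {
            throw new IllegalArgumentException("empty keyword");
        }
        String first = k.substring(0, 1);
        if (k.length() == 1) {
            return "/" + first; // a single letter only reaches the top level
        }
        return "/" + first + "/" + k.substring(0, 2);
    }

    public static void main(String[] args) {
        System.out.println(bucketFor("above")); // /A/AB
        System.out.println(bucketFor("zulu"));  // /Z/ZU
        System.out.println(bucketFor("a"));     // /A
    }
}
```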

Nabou · Answer 4 · 2013-02-03T16:13:55+0000

, , . Trie ; , , .
