Similar subscript fast search

I need to find a substring similar to this pattern in a huge string. A huge source string can be up to 100 MB in length. The pattern is quite short (10-100 characters). The problem is that I need to find not only exact substrings, but also similar substrings that differ from the pattern in several characters (the maximum allowable error counter is provided as a parameter).

Is there any idea how to speed up the algorithm?

+3
source share
3 answers

1) There are many algorithms associated with finding strings. One of them is the famous Knut-Morris-Pratt algorithm .

2) ( " " ) , . "" .

. [Java]

String pat = "Home";
String source = "IgotanewHwme";

for(int i = 0; i < pat.length(); i++){
    //split around i .. not including char i itself .. instead, replace it with [a-zA-Z] and match using this new pattern.
    String new_pat = "("+pat.substring(0, i)+")"+ "[a-zA-Z]" + "("+pat.substring(i+1, pat.length())+")";
    System.out.println(new_pat);
    System.out.println(source.matches("[a-zA-Z]*"+new_pat+"[a-zA-Z]*"));
}

, .

+1

, / . , , .

0

, Needleman- Wunsch -

, ( , , ..). .

.

0

All Articles