Checking Word Segmentation

This is the next question of this answer and the pseudo-code algorithm that was published by the user. I did not comment on this question because of his age. I'm only interested in checking whether it is possible to split the string into words. The algorithm should not actually split the string. This is the answer to a related question:

Let S [1..length (w)] be a table with Boolean elements. S [i] is true if the word w [1..i] can be broken. Then set S [1] = isWord (w [1]) and for i = 2 to the length (w) calculate

S [i] = (isWord [w [1..i] or for any j from {2..i}: S [j-1] and isWord [j..i]).

I translated this algorithm into simple python code, but I'm not sure if I understand it correctly. The code:

def is_all_words(a_string, dictionary)):
    str_len = len(a_string)
    S = [False] * str_len
    S[0] = is_word(a_string[0], dictionary)
    for i in range(1, str_len):
        check = is_word(a_string[0:i], dictionary)
        if (check):
            S[i] = check
        else:
            for j in range(1, str_len):
                check = (S[j - 1] and is_word(a_string[j:i]), dictionary)
                if (check):
                    S[i] == True
                    break
    return S

. 1) Python, , 2) , S, , , ? is_word - , . .

: , , . :

def is_all_words(a_string, dictionary)):
    str_len = len(a_string)
    S = [False] * str_len
    S[0] = is_word(a_string[0], dictionary)
    for i in range(1, str_len):
        check = is_word(a_string[0:i], dictionary)
        if (check):
            S[i] = check
        else:
            for j in range(1, i): #THIS LINE WAS UPDATED
                check = (S[j - 1] and is_word(a_string[j:i]), dictionary)
                if (check):
                    S[i] == True
                    break
    return S

a_string = "carrotforever"
S = is_all_words(a_string, dictionary)
print(S[len(S) - 1]) #prints FALSE

a_string = "hello"
S = is_all_words(a_string, dictionary)
print(S[len(S) - 1]) #prints TRUE

True .

+3
3

, . , ( 1) python ( 0), S [0] S [1], , S [L-1] . , S. , S [3] , S [2] "". , , , .

def is_all_words(a_string, dictionary):
    str_len = len(a_string)
    S = [False] * (str_len)
# I replaced is_word function by a simple list lookup, 
# feel free to replace it with whatever function you use. 
# tries or suffix tree are best for this.
    S[0] = (a_string[0] in dictionary) 
    for i in range(1, str_len):
        check = a_string[0:i+1] in dictionary # i+1 instead of i
        if (check):
            S[i] = check
    else:
        for j in range(0,i+1): # i+1 instead of i
        if (S[j-1] and (a_string[j:i+1] in dictionary)): # i+1 instead of i
            S[i] = True
            break


    return S

a_string = "carrotforever"
S = is_all_words(a_string, ["a","car","carrot","for","eve","forever"])
print(S[len(a_string)-1) #prints FALSE

a_string = "helloworld"
S = is_all_words(a_string, ["hello","world"])
print(S[len(a_string)-1) #prints TRUE
+2

1) , . : for j in range(1, str_len): for j in range(1, i):

2), S [str_len-1] == true, .

S [i] iff

  • 0
  • S [j-1] == true j<i, [j: i] -

S [str_len-1] ,

+1

, , Python wordsegment module. , , . score, .

pip:

$ pip install wordsegment

segment :

>>> import wordsegment
>>> wordsegment.segment('carrotfever')
['carrot', 'forever']
+1

All Articles