How can I apply the Levenshtein distance to a set of target strings?

  • Let TARGET be the set of strings that I expect to be spoken.
  • Let SOURCE be the set of strings returned by a speech recognizer (that is, the possible sentences it has heard).

I need a way to select a string from TARGET. I have read about the Levenshtein distance and the Damerau-Levenshtein distance, which return the distance between a source string and a target string, i.e. the number of edits needed to transform the source string into the target string.
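For reference, a minimal sketch of the plain Levenshtein distance in Python, assuming unit cost for every insertion, deletion and substitution. The function name `levenshtein` is just an illustrative choice, and the Damerau variant (which also counts transpositions) is not covered here.

```python
def levenshtein(source: str, target: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn `source` into `target` (unit costs assumed)."""
    # previous[j] is the distance between the prefix of `source` processed
    # so far and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i source characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

For example, `levenshtein("kitten", "sitting")` evaluates to 3: two substitutions and one insertion.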

But how can I apply this algorithm to a set of target strings?

I thought I would use the following method:

  • For each string in TARGET, I calculate the distance from each string in SOURCE. This gives an m-by-n matrix, where m is the cardinality of TARGET and n is the cardinality of SOURCE. The ith row represents the similarity of the sentences heard by the speech recognizer with respect to the ith target.
  • By calculating the average of the values in each row, I get the average distance between the ith target and the output of the speech recognizer. Let me call it average_on_row(i), where i is the row index.
  • Finally, for each row, I calculate the standard deviation of all the values in the row. For each row, I also compute the sum of all standard deviations. The result is a column vector in which each element (let it be called stadard_deviation_sum(i)) refers to a string in TARGET.
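The steps above could be sketched roughly as follows in Python. The helper names (`distance_matrix`, `row_statistics`, `closest_target`) are hypothetical, and picking the target with the smallest average distance is only an assumed selection rule for illustration, not something the method above prescribes.

```python
from statistics import mean, pstdev


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j - 1] + (ca != cb),  # substitution
                               previous[j] + 1,               # deletion
                               current[j - 1] + 1))           # insertion
        previous = current
    return previous[-1]


def distance_matrix(targets, sources):
    """Row i holds the distances from every source string to targets[i]."""
    return [[levenshtein(s, t) for s in sources] for t in targets]


def row_statistics(matrix):
    """(average, population standard deviation) for each row."""
    return [(mean(row), pstdev(row)) for row in matrix]


def closest_target(targets, sources):
    """Assumed rule: return the target with the smallest average distance."""
    stats = row_statistics(distance_matrix(targets, sources))
    best = min(range(len(targets)), key=lambda i: stats[i][0])
    return targets[best]


targets = ["turn left", "turn right"]
sources = ["turn left", "turn lift"]  # made-up recognizer hypotheses
print(distance_matrix(targets, sources))
print(closest_target(targets, sources))
```

The standard deviation is returned next to the average so that it can be used as a tie-breaker or a confidence signal; how to combine the two is left open here.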

, stadard_deviation_sum, , . , ? ? , , , , , TARGET.


: , . log .

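Let pi be the probability of an insertion, pd the probability of a deletion and ps the probability of a substitution, with pp = 1-ps-pd.

The corresponding costs are log (pi/pp/k), log (pd/pp) and log (ps/pp/(k-1)), where k is the size of the alphabet.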

, , . ( --), , -- ( AKA EM).

. - (, , k , ...).


, . , , TARGET , . -, , - .

:

  • SOURCE TARGET,
  • SOURCE ,
  • TARGET ,
  • SOURCE TARGET , ,

, p SOURCE, q TARGET, (p, q) . , , , . . .

: , . , .

[, , ]

, "". - "", "". P ( ) P ( | ). , . , , , , , d ( "c", "s" ) < d ( "c", "q" ). ( c s, c q). , .

- P ( | ) P ( | ). , . , . , . .

