I want to get roughly accurate timestamps for each word in an audio file. I also have the source text for the audio, which can be used as a cross-reference. This is similar to “intelligent search”, which, as far as I understand, works from the audio input alone, whereas here I have both the audio and the text.
Ideally, I would like to do this with open source software and to accept most major languages as input (e.g. English, French, German, Spanish and, ideally, Russian and Mandarin).
I would even settle for a solution that can match timestamps for only some of the words (for example, if the transcription is not completely accurate). Cross-referencing the source text with that partial output would then make it easier to fill in the rest.
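
To make the cross-referencing idea concrete, here is a rough sketch of what I have in mind, using only Python's standard `difflib`. The recognizer output format (a list of `(word, start_seconds)` pairs) and the helper names `align_words` and `fill_gaps` are my own assumptions for illustration, not the API of any particular tool, and the linear interpolation is just a crude guess for the words that were not recognized.

```python
from difflib import SequenceMatcher


def align_words(source_words, recognized):
    """Attach timestamps from a partial recognition result to the source text.

    recognized is assumed to be a list of (word, start_seconds) pairs from
    some speech recognizer (hypothetical format). Words of the source text
    that the recognizer missed or got wrong are given a timestamp of None.
    """
    src_norm = [w.lower() for w in source_words]
    rec_norm = [w.lower() for w, _ in recognized]
    times = [None] * len(source_words)
    matcher = SequenceMatcher(a=src_norm, b=rec_norm, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.a + k] = recognized[block.b + k][1]
    return list(zip(source_words, times))


def fill_gaps(aligned):
    """Linearly interpolate timestamps between matched neighbours.

    Words before the first match or after the last match stay None,
    because there is nothing to interpolate from on one side.
    """
    words = [w for w, _ in aligned]
    times = [t for _, t in aligned]
    known = [i for i, t in enumerate(times) if t is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            times[i] = times[left] + frac * (times[right] - times[left])
    return list(zip(words, times))


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog".split()
    # Made-up recognizer output that missed "brown" and the second "the".
    recognized = [("the", 0.0), ("quick", 0.4), ("fox", 1.2), ("jumps", 1.6),
                  ("over", 2.0), ("lazy", 2.8), ("dog", 3.2)]
    for word, start in fill_gaps(align_words(source, recognized)):
        print(word, start)
```

This only covers the text side of the problem; I still need something that produces the partial word timestamps from the audio in the first place.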