I already use HTK (the Hidden Markov Model Toolkit) to recognize a set of voice commands that control my Android application. With this approach, however, I need to transfer the voice data to the server, and that can take a while.
To avoid this delay, I am thinking about using pocketsphinx to recognize the speech locally, inside the Android application, so that I do not need to send the audio to the server.
If this is a good idea, is it easy to learn pocketsphinx from scratch? Also, what are the advantages and disadvantages of the two approaches (server-based and local speech recognition), and which one is better?