If you use the Google Directions API, the response comes back as JSON and includes the waypoints, route length, expected trip duration, and step-by-step instructions. You can parse out the instructions and then use text-to-speech to read each one aloud when the user reaches the point where it applies.
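As a rough sketch of that idea (not a definitive implementation), the Python below reads the steps out of a response and picks the instruction that is due at the current position. The sample data is made up, but it follows the `routes` / `legs` / `steps` layout of a Directions API response with `html_instructions` and `start_location` on each step. The helper names (`extract_steps`, `instruction_due`, `speak`) and the 30 m trigger radius are my own assumptions, and `speak` is only a placeholder for whatever text-to-speech engine your platform provides.

```python
import html
import json
import math
import re

# Trimmed sample shaped like a Directions API response (values are made up).
SAMPLE_RESPONSE = json.loads("""
{
  "status": "OK",
  "routes": [{
    "legs": [{
      "distance": {"text": "1.2 km", "value": 1200},
      "duration": {"text": "4 mins", "value": 240},
      "steps": [
        {"html_instructions": "Head <b>north</b> on <b>Main St</b>",
         "start_location": {"lat": 52.5200, "lng": 13.4050},
         "distance": {"text": "0.5 km", "value": 500}},
        {"html_instructions": "Turn <b>right</b> onto <b>Oak Ave</b>",
         "start_location": {"lat": 52.5245, "lng": 13.4050},
         "distance": {"text": "0.7 km", "value": 700}}
      ]
    }]
  }]
}
""")


def extract_steps(response):
    """Return (lat, lng, plain_text) for every step of the first route."""
    steps = []
    for leg in response["routes"][0]["legs"]:
        for step in leg["steps"]:
            # html_instructions contains markup such as <b>; strip it for speech.
            text = html.unescape(re.sub(r"<[^>]+>", "", step["html_instructions"]))
            loc = step["start_location"]
            steps.append((loc["lat"], loc["lng"], text))
    return steps


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two lat/lng points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def instruction_due(steps, lat, lng, radius_m=30.0):
    """Return the instruction whose start point is within radius_m, else None."""
    for s_lat, s_lng, text in steps:
        if haversine_m(lat, lng, s_lat, s_lng) <= radius_m:
            return text
    return None


def speak(text):
    """Placeholder: swap in a real text-to-speech call here."""
    print(text)
```

In a real app you would call `instruction_due` on every location update, pass a non-`None` result to `speak`, and remember which steps were already announced so they are not repeated.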