I usually find the simplest way to give the user on-demand audio on a quiz slide is to insert a rollover caption or rollover image and attach the audio to the caption or image that appears. The audio starts as soon as the user rolls over the object and stops as soon as they roll out. If you use a caption, you can put the voiceover transcript in the caption, so it doubles as closed-caption (CC) text at the same time.
I don't really see the point of setting up actions and toggle buttons when a far simpler solution is available. Of course, if you are publishing to HTML5, rollover objects are not supported, and you DO need to consider the more complex solutions such as actions and toggle buttons.