Hello,
I have a working example. I created four text containers T_one, T_Two, T_three, T_Four and grouped them in Gr_T.
Created a user variable v_track, with an initial value of 1 (because the first text is visible at the start).
Created buttons Bt_Next (visible) and Bt_Back (initially invisible) with a pausing point at 1.5 sec (yours will be later if you want the audio clip to play first).
Created button Bt_Cont (invisible) with pausing at 2sec (later than the other two).
Here is the timeline:
For the Next button, I created a conditional advanced action with three decisions. Here is the first one; it must come first in the logical sequence, and it covers the case where the third caption is visible, v_track=3:
You can send me a private message with an email address if you want to see the other ones.
For the Back button, the sequence is reversed. Here is the first one:
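For anyone who wants to follow the logic without the advanced action dialog, here is a rough model of both actions in plain Python. It is not Captivate syntax and only a sketch under assumptions: the helper names (new_state, show_text, on_next, on_back) are made up for illustration, the commands inside each decision (hide Gr_T, show one text, swap Bt_Next for Bt_Cont on the last text) are my guess at what the decisions do, and only the v_track=3 decision for Next comes from the description above. The separate `if` statements assume that every decision is checked in turn, which is why the order matters: with the v_track=1 decision first, one click could cascade through all the texts.

```python
TEXTS = ["T_one", "T_Two", "T_three", "T_Four"]  # members of group Gr_T


def new_state():
    # Start of the slide: first text and Bt_Next visible, v_track = 1.
    return {"v_track": 1, "visible": {"T_one", "Bt_Next"}}


def show_text(state, n):
    # Assumed commands: hide the whole group Gr_T, show text n, track it.
    state["visible"] -= set(TEXTS)
    state["visible"].add(TEXTS[n - 1])
    state["v_track"] = n


def on_next(state):
    # Separate ifs (not elif): each decision is evaluated in sequence.
    if state["v_track"] == 3:          # first decision, as in the post
        show_text(state, 4)
        state["visible"].discard("Bt_Next")   # assumption: no Next on last text
        state["visible"].add("Bt_Cont")       # assumption: Continue appears here
    if state["v_track"] == 2:
        show_text(state, 3)
    if state["v_track"] == 1:
        show_text(state, 2)
        state["visible"].add("Bt_Back")       # assumption: Back appears from text 2


def on_back(state):
    # Reversed sequence: lowest value is checked first (assumed).
    if state["v_track"] == 2:
        show_text(state, 1)
        state["visible"].discard("Bt_Back")
    if state["v_track"] == 3:
        show_text(state, 2)
    if state["v_track"] == 4:
        show_text(state, 3)
        state["visible"].discard("Bt_Cont")   # assumption
        state["visible"].add("Bt_Next")
```

Calling on_next three times on a fresh state walks from T_one to T_Four one text per click; swapping the order of the `if` statements in on_next makes a single click jump straight to the last text.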
Lilybiri