This was a known issue with Captivate 5 versions. Captivate tends to clip off any pure silence, so you need to make sure there is at least some low-volume audio at the beginning and the end of each clip.
If the voiceover was recorded by a real person rather than generated from text, try copying some audio from the pauses in the voiceover and pasting this "dirty silence" into the areas before and after the waveform.
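If you have a lot of clips, or the voiceover is text-to-speech and has no natural pauses to copy from, you could pad the files in bulk before importing them. Here is a rough Python sketch of the idea, using only the standard library. The function names, the 0.25 second pad length and the noise level are my own placeholders, not anything from Captivate, and it assumes 16-bit PCM WAV files. I have not confirmed how quiet the noise can be before Captivate treats it as silence, so you may need to raise the amplitude, and real room tone copied from the recording will probably sound more natural than generated noise.

```python
import random
import struct
import wave


def dirty_silence(num_frames, channels=1, amplitude=8, seed=0):
    """Return num_frames of very quiet 16-bit PCM noise as raw bytes."""
    rng = random.Random(seed)
    count = num_frames * channels
    # Tiny random sample values: audible as faint hiss at most, but not digital zero.
    samples = [rng.randint(-amplitude, amplitude) for _ in range(count)]
    return struct.pack("<%dh" % count, *samples)


def pad_wav(src_path, dst_path, pad_seconds=0.25, amplitude=8):
    """Write a copy of src_path with quiet noise added before and after the audio."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        audio = src.readframes(params.nframes)

    pad_frames = int(params.framerate * pad_seconds)
    lead = dirty_silence(pad_frames, params.nchannels, amplitude, seed=1)
    tail = dirty_silence(pad_frames, params.nchannels, amplitude, seed=2)

    with wave.open(dst_path, "wb") as dst:
        # Same channel count, sample width and rate as the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(lead + audio + tail)
```

You would call it as something like pad_wav("slide01.wav", "slide01_padded.wav") for each clip (file names here are just examples) and then import the padded files into Captivate.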