It is the same workflow IMO, but you use a text caption whereas I prefer a highlight box, for which I have a dedicated style. I think you find my blog post more complicated because I also explained how to control the audio by showing/hiding the object it is attached to. In your case you only use them for their timing functionality. In your original question you didn't mention TTS... something I never use myself, I prefer doing VO.
Lilybiri