Welcome to a behind-the-scenes look at how to create subtitles with the OpenAI API and embed them directly into a video.
This functionality is a core part of Dugongi, and in this tutorial I’ll walk you through building it yourself. You’ll need FFmpeg installed, an OpenAI account, and a Node.js environment.
FFmpeg converts audio and video from one format to another. In this tutorial, we first use it to extract the audio from a video file. Then, once OpenAI has returned the subtitles, we use it again to embed them into the video.
Assuming you have a video file named myvideo.webm, the first step is to extract the audio. The API limits how much audio you can upload in one request, so compress the audio as much as possible.
ffmpeg -i myvideo.webm -ac 1 -b:a 16k -map a output.webm
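FFmpeg parameters explained:
-i myvideo.webm: the input file.
-ac 1: downmix the audio to a single (mono) channel.
-b:a 16k: set the audio bitrate to 16 kbit/s.
-map a: include only the audio streams in the output, dropping the video.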
The result is a compressed audio file named output.webm, ready for subtitle generation.
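To see why the low bitrate matters, here is a rough back-of-the-envelope sketch. It assumes a 25 MB upload limit for the transcription endpoint (an assumption not stated in this tutorial; check the current OpenAI documentation) and ignores container overhead. The function name maxAudioSeconds is hypothetical.

```javascript
// Rough estimate: how many seconds of audio fit under an upload limit
// at a given constant bitrate. Container overhead is ignored.
// ASSUMPTION: a 25 MB limit; verify it against the current OpenAI docs.
const ASSUMED_UPLOAD_LIMIT_BYTES = 25 * 1024 * 1024;

function maxAudioSeconds(bitrateBitsPerSecond, limitBytes = ASSUMED_UPLOAD_LIMIT_BYTES) {
  // bytes -> bits, divided by bits per second, rounded down to whole seconds
  return Math.floor((limitBytes * 8) / bitrateBitsPerSecond);
}

// 16 kbit/s is the bitrate passed to FFmpeg with -b:a 16k.
console.log(maxAudioSeconds(16000)); // 13107 seconds, roughly 3.6 hours
```

At 128 kbit/s the same assumed limit would be reached after about 27 minutes, which is why aggressive compression is worthwhile for longer recordings.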
The OpenAI API offers a straightforward endpoint for audio inputs. Although there are libraries for several programming languages, here's an example in Node.js:
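A minimal sketch of such a script, not necessarily the one Dugongi runs. It assumes Node.js 18 or newer (for the built-in fetch, FormData, and Blob), an API key in the OPENAI_API_KEY environment variable, and the whisper-1 transcription model; the function names buildTranscriptionForm and createSubtitles are hypothetical.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// ASSUMPTION: the transcription endpoint and model name below match the
// current OpenAI API reference; adjust them if the API has changed.
const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

// Build the multipart form: the audio file, the model, and the output format.
function buildTranscriptionForm(audioBuffer, fileName) {
  const form = new FormData();
  form.append("file", new Blob([audioBuffer]), fileName);
  form.append("model", "whisper-1");
  form.append("response_format", "vtt"); // ask for WebVTT instead of JSON
  return form;
}

// Read the audio file, send it to OpenAI, and save the returned VTT text.
async function createSubtitles(audioPath, vttPath, apiKey = process.env.OPENAI_API_KEY) {
  const audio = await fs.readFile(audioPath);
  const response = await fetch(TRANSCRIPTION_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio, path.basename(audioPath)),
  });
  if (!response.ok) {
    throw new Error(`Transcription failed: ${response.status} ${await response.text()}`);
  }
  await fs.writeFile(vttPath, await response.text());
  return vttPath;
}

// Example call:
// createSubtitles("output.webm", "subtitles.vtt").catch(console.error);
```

Using the built-in fetch keeps the sketch free of dependencies; the official openai npm package would work equally well if you prefer a client library.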
This script reads the audio file, sends it to OpenAI, and retrieves the subtitles in VTT format, which are then saved to subtitles.vtt.
To embed the subtitles into the video, use the FFmpeg subtitles filter:
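A minimal sketch of that step, driven from Node.js so it stays in the same environment as the rest of the tutorial. It assumes an FFmpeg build with libass (which the subtitles filter requires) and file names without characters that need escaping in a filter argument. The output name myvideo-subtitled.mp4 and the function names are hypothetical.

```javascript
const { execFile } = require("node:child_process");

// Build the FFmpeg argument list. Equivalent command line:
//   ffmpeg -i myvideo.webm -vf subtitles=subtitles.vtt myvideo-subtitled.mp4
// ASSUMPTION: subtitlePath contains no ':' , '\' or quote characters,
// which would need escaping inside the filter expression.
function buildBurnArgs(videoPath, subtitlePath, outputPath) {
  return ["-i", videoPath, "-vf", `subtitles=${subtitlePath}`, outputPath];
}

// Run FFmpeg and resolve with the output path once encoding has finished.
function burnSubtitles(videoPath, subtitlePath, outputPath) {
  return new Promise((resolve, reject) => {
    execFile("ffmpeg", buildBurnArgs(videoPath, subtitlePath, outputPath), (error) => {
      if (error) {
        reject(error);
      } else {
        resolve(outputPath);
      }
    });
  });
}

// Example call:
// burnSubtitles("myvideo.webm", "subtitles.vtt", "myvideo-subtitled.mp4").catch(console.error);
```

Passing the arguments as an array through execFile avoids shell quoting issues with file names that contain spaces.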
This process re-encodes the video with the subtitles burned into the picture, which may take some time.
All these steps can be streamlined using Dugongi. Simply record a new video or upload an existing one to the Dugongi cloud. Then click the "create subtitles" icon under the video to generate subtitles automatically. Finally, click the "mp4" icon and select "burn subtitles" to embed them directly into your video.