Captions

A caption is text that appears on a video while it is playing. Captions present all of the spoken words and sounds in the video, synchronized with playback.

A Jack Russell terrier faces a box of dog toys while a baseball game plays in the background. The closed captioning at the bottom of the screen from the baseball game reads: there starting pitcher to get a chance to actually see him throwing and.
Closed captions on a YouTube video of my dog Kaylee dragging a toy box around a room

Captions should identify who is speaking and what they said, as well as environmental sounds such as applause or laughter.

  • Closed captions are captions that you can turn on and off.
  • Open captions are captions you can’t turn on and off, because they’re permanently burned into the video.
  • Subtitles are language-specific captions. In other words, if you’re watching a movie performed in French and the captions are written in English, those are subtitles.
  • A transcript is a written presentation of everything that happened in the audio or video source, including both audible and visual elements. Sometimes transcripts are synced to the presentation of the audio or video file, but more often they’re a separate page or file.

How to write captions

Warning!
Captions take many hours to produce, more than you’d think unless you’re experienced at it, so plan a huge block of time. No, more than that. Probably double your initial estimate or more.

Captions are required to conform to WCAG success criteria 1.2.2 Captions (Prerecorded) – Level A and 1.2.4 Captions (Live) – Level AA.

There are two ways to create captions: automatic speech recognition (ASR) and manual creation. ASR captions may be the fastest route, but they include only speech, so they can quickly lose meaning for the end user. WCAG-compliant transcripts and captions require both speech and non-speech audio information, including sound effects, speaker identification, and differentiation of layered recordings.

For example, a podcast in the format of an old-style radio show about a murder mystery will involve a cast of characters playing different parts. It will also involve sound effects (the scream of a woman who finds a body, the sound of a gunshot) that are required to understand what’s going on.
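
To make the murder-mystery example concrete, here is a minimal sketch of how speaker identification and sound effects might be written into a caption file. It assumes WebVTT as the caption format, which is one common choice and not something this article prescribes. The `Cue` class, the helper names, and the dialogue are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cue:
    """One caption cue (hypothetical structure, for illustration only)."""
    start: float                   # seconds from the start of the media
    end: float                     # seconds from the start of the media
    text: str                      # spoken words, or a description of a sound
    speaker: Optional[str] = None  # None marks a non-speech sound


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape(text: str) -> str:
    """Escape the characters that have special meaning in WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def to_webvtt(cues: List[Cue]) -> str:
    """Render cues as WebVTT text, marking speakers and sound effects."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        if cue.speaker is None:
            # Non-speech audio: square brackets are a common convention.
            payload = f"[{escape(cue.text)}]"
        else:
            # Speech: a WebVTT voice span names the speaker.
            payload = f"<v {cue.speaker}>{escape(cue.text)}</v>"
        blocks.append(f"{timing}\n{payload}")
    return "\n\n".join(blocks) + "\n"


# Invented sample cues in the spirit of the radio-show example.
cues = [
    Cue(0.0, 2.5, "Did you hear something upstairs?", speaker="Detective"),
    Cue(2.5, 3.5, "gunshot"),
    Cue(3.5, 5.0, "woman screams"),
]
vtt = to_webvtt(cues)
print(vtt)
```

An ASR-only workflow would typically produce just the first cue, without the speaker name. The two bracketed sound cues are the kind of non-speech information that usually has to be added by hand.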

Most companies outsource transcript and caption writing to services. If you are going to do your own captioning, you may need captioning software. Captions / Subtitles by the W3C Web Accessibility Initiative gives an excellent overview of when to add captions, how to create captions, and how to position and style captions.

Additional resources