All about captions

First, let me explain what captions are. Captions are a text version of content that is presented audibly. By including captions for all of your audio-only or multimedia (audio + video) content, you ensure that everyone can access it. People who find captions helpful include:

  • Those who are hard of hearing
  • Those who are deaf
  • Those who are not fluent in the language
  • Those who process information better by reading it than by hearing it

Captions Vs. Subtitles

Captions and subtitles are often used interchangeably, but they are not the same thing. Both are text synchronized with the dialog or narration of a video, but only captions are specifically intended for those who are deaf or hard of hearing.

Subtitles are intended to provide translation from one language to another.

Benefits of Captions

  • Search Engine Optimization (SEO) - Search engines can crawl the caption text and index the information in it
  • Second language learners - Captions help those who are unfamiliar with the language in the audio follow along and comprehend it
  • Watching or "listening to" video/audio with the sound off - If the viewer is somewhere they cannot listen to the audio, they can use captions to follow along instead

Open vs Closed captions

  • Open captions: Are always on. They are "burned" into the video during production.
  • Closed captions: Are time-stamped text files that are loaded into the media player and displayed when captions are toggled on

Closed captions are beneficial because they allow the user to toggle them on and off, but they depend on the video player's support for caption files. They are fine for online videos as long as you ensure that the player you are using supports captions.

The benefit of open captions is that no matter what player a viewer uses, the captions will always be there. They are recommended when showing videos at a conference, so that the presenter doesn't have to worry about the player, the caption files, or whether the captions are toggled on.

What should be included in captions

  • Conventional spelling - avoid writing words phonetically (unless the phonetic spelling is essential to the meaning of the content). Captions should be high in quality and accuracy, and presented as clearly as possible
    • Highly recommended to meet WCAG 2.1 AA conformance
  • Descriptions of sounds in terms of the sounds themselves, not the actions causing the sounds
    • Highly recommended to meet WCAG 2.1 AA conformance
  • Must be verbatim for scripted content (except when intentionally creating simplified captioning for a relevant target audience)
    • Include all the ums and stuttering
    • This is required to meet WCAG 2.1 AA conformance
  • Should be verbatim for unscripted or live content
    • Broadcasts, documentaries, interviews, etc
    • Use discretion with filler words so they do not hurt readability
    • Should be quick and easy to read
    • Optional to meet WCAG 2.1 AA conformance
  • Background sounds must be conveyed in the captions
    • In [brackets] or (parentheses)
    • Do not clutter captions with too many descriptive words; this can be distracting to your viewer
    • This is required to meet WCAG 2.1 AA conformance
  • Any speech that is spoken off-screen must be captured in the caption(s)
    • This is required to meet WCAG 2.1 AA conformance
  • You must identify who is speaking
    • When it is not obvious who is speaking
    • When the speaker is off screen
    • When there are multiple speakers present and it's not obvious who is speaking
    • This is required to meet WCAG 2.1 AA conformance
  • Captions should use punctuation to convey emphasis whenever possible
    • Avoid writing extra text to explain emphasis
    • Provides clarity
    • Facilitates ease of reading
    • Highly recommended to meet WCAG 2.1 AA conformance
  • Captions must not reveal intentionally withheld information in the content before the appropriate time
    • Revealing information too early can ruin the experience for your user
    • This is required to meet WCAG 2.1 AA conformance
  • Music should be identified by title and the artist whenever possible - unless doing so would be inappropriate
    • Highly recommended to meet WCAG 2.1 AA conformance
  • Important lyrics should be included in captions (if they are relevant)
    • Use discretion when including lyrics
    • Should be presented verbatim and set off with music notes
    • Highly recommended to meet WCAG 2.1 AA conformance
  • When speech is inaudible or difficult to perceive clearly, captions should say so using neutral language
    • Avoid using words like "unintelligible"
    • Highly recommended to meet WCAG 2.1 AA conformance
  • Strong language should be retained and not edited out of captions whenever possible
    • Should only be bleeped or muted when that is needed to match the style/content requirements of the intended audience
    • Highly recommended to meet WCAG 2.1 AA conformance
  • Captions should indicate when the speech is whispered or mouthed
    • Can be done by adding a label before the dialog
      • The label can be placed in parentheses and followed by a colon, for example (whispering):
    • Highly recommended to meet WCAG 2.1 AA conformance
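
Several of the text conventions above (bracketed sound descriptions, lyrics set off with music notes, labels for speakers and for whispered speech) are mechanical enough to sketch in code. The Python helpers below are a minimal illustration, not a standard: the function names are hypothetical, and the "Name (manner): text" layout for combining a speaker name with a manner label is an assumption made for this example.

```python
from typing import Optional


def sound(description: str) -> str:
    """Wrap a background-sound description in square brackets."""
    return f"[{description.strip()}]"


def lyric(line: str) -> str:
    """Set a verbatim lyric off with music notes."""
    return f"♪ {line.strip()} ♪"


def spoken(text: str, speaker: Optional[str] = None,
           manner: Optional[str] = None) -> str:
    """Prefix speech with an optional speaker name and manner label.

    Assumption for this sketch: the speaker name comes first, then the
    manner (e.g. "whispering") in parentheses, then a colon.
    """
    parts = []
    if speaker:
        parts.append(speaker.strip())
    if manner:
        parts.append(f"({manner.strip()})")
    label = " ".join(parts)
    return f"{label}: {text}" if label else text


if __name__ == "__main__":
    print(sound("door slams"))
    print(lyric("Happy birthday to you"))
    print(spoken("Did you hear that?", speaker="Ana", manner="whispering"))
    print(spoken("I think so."))
```

Only add a speaker label when it is not obvious who is speaking; the last call shows speech passed through unlabelled.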

How captions should be visually presented

  • Should not exceed three lines on a screen at a time
    • Makes it easier and faster to read
    • Keeps captions from blocking essential visual components when accompanied by video
  • Caption line breaks, when necessary, should be inserted at logical points between phrases rather than in the middle of them
  • Longer sentences should be broken at logical or grammatical breaks, starting the new line or caption with an article, conjunction, or preposition
  • When using descriptive wording, break captions so that the descriptive wording stays with the wording it describes
  • Should be typed in mixed case
    • Increases readability
  • Default font should be sans-serif
    • Helps enhance readability
  • Maximum number of characters per line should not exceed 32
  • Should remain on the screen for a minimum of 2 seconds
    • May need to take the number of words into account; give the user 0.3 seconds per word if possible
  • Captions should be positioned to not obscure on-screen text, faces, or other important information
  • Captions should be precisely synchronized to the audio
    • Except for when this makes captions hard to read
  • Default color combo should be black background with white text
    • Provides high contrast
  • Default contrast ratio between the font and background color must be a minimum of 3:1 for 18pt font
  • Default font-weight should be normal
    • Avoid bold fonts - they make it more difficult to read
  • Color should not be used as the only way to convey meaning in captions
    • If color alone is used to convey meaning, users who are colorblind may miss its significance
  • Use punctuation and italics to imply emphasis
    • You can use italics or all caps when punctuation doesn't convey full meaning
  • Quotation marks and mixed-case capitalization should be used to designate titles when appropriate
    • Movies
    • Books
  • Last caption on the frame should be removed from the screen during long silent intervals
    • Remove after 4-5 seconds
    • Leave a minimum of 1.5 seconds between the captions to help prevent a jerky visual effect
  • Periods of silence should be noted when visual content gives the impression that there may be important sounds or speech
  • Additional screen time for a caption frame should be added when the words are unfamiliar or uncommon
  • Additional screen time for captions should be added when there is a lot going on visually on screen
  • Media players should allow for visual customization of captions
  • Multiple caption formats should be provided
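
A few of the numeric guidelines above can be checked automatically. The Python sketch below covers three of them: wrapping text to 32 characters with at most three lines, a minimum display time of 2 seconds or 0.3 seconds per word, and the contrast ratio between text and background. The function names are hypothetical, the 32-character limit is treated as a per-line limit (an assumption), and the contrast calculation follows the relative-luminance formula that WCAG defines for sRGB colors. Note that `textwrap` only breaks at whitespace, so it does not pick the logical phrase breaks recommended above; those still need a human eye.

```python
import textwrap

MAX_CHARS_PER_LINE = 32   # assumed to be a per-line limit
MAX_LINES = 3
MIN_SECONDS = 2.0
SECONDS_PER_WORD = 0.3


def wrap_caption(text: str) -> list[str]:
    """Wrap text at whitespace so no line exceeds 32 characters."""
    return textwrap.wrap(text, width=MAX_CHARS_PER_LINE)


def fits_on_screen(text: str) -> bool:
    """True if the wrapped caption needs three lines or fewer."""
    return len(wrap_caption(text)) <= MAX_LINES


def min_duration(text: str) -> float:
    """Seconds to keep a caption up: at least 2 s, or 0.3 s per word."""
    return max(MIN_SECONDS, len(text.split()) * SECONDS_PER_WORD)


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    # WCAG text has used 0.03928 as the threshold; for 8-bit channel
    # values both thresholds select the same branch.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """Contrast ratio between two colors, from 1 (none) up to 21."""
    lum_fg, lum_bg = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    text = "Captions should be quick and easy to read for every viewer."
    for line in wrap_caption(text):
        print(line)
    print(fits_on_screen(text), min_duration(text))
    # White text on a black background, the default suggested above.
    print(contrast_ratio((255, 255, 255), (0, 0, 0)) >= 3.0)
```

White on black gives the highest possible ratio, which is why it works well as a default; the same function can be used to vet any custom color pair against the 3:1 minimum.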

Caption file formats

A caption file will contain:

  • All the spoken words
  • All the descriptions conveyed through video/audio
  • Time-codes of when the captions should appear

There are multiple caption formats available:

  • Basic
    • SubRip (.srt)
    • SubViewer (.sbv, .sub)
    • LRC (.lrc)
  • Advanced
    • WebVTT (.vtt)
    • SAMI (.smi or .sami)
    • TTML (.ttml)

Different media players support different caption files, so it's important to know which ones your player supports. Most players support the basic SubRip file format, but more and more are starting to support the advanced WebVTT format. The WebVTT format allows for more flexibility in the styling and positioning of the captions.

When providing caption files, make sure at least one of them is a WebVTT file. This format allows users to set caption styles in their operating system, and those settings will then be consistent across all videos and browsers that support WebVTT.
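
To make the difference between a basic and an advanced format concrete, here is a minimal Python sketch that writes the same cues as both SubRip and WebVTT text. The `Cue` class and function names are hypothetical, made up for this example. It only covers the basics of each format: SubRip numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header line and uses a period. Styling, positioning, and escaping of characters such as `&` and `<` in WebVTT cue text are left out.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str     # caption text; may contain line breaks


def _timestamp(seconds: float, millis_sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{millis_sep}{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """SubRip: numbered cues, comma before the milliseconds."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        timing = f"{_timestamp(cue.start, ',')} --> {_timestamp(cue.end, ',')}"
        blocks.append(f"{number}\n{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


def to_webvtt(cues: list[Cue]) -> str:
    """WebVTT: 'WEBVTT' header, period before the milliseconds."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{_timestamp(cue.start, '.')} --> {_timestamp(cue.end, '.')}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    cues = [
        Cue(1.5, 4.0, "[door slams]"),
        Cue(4.5, 7.0, "(whispering): Did you hear that?"),
    ]
    print(to_srt(cues))
    print(to_webvtt(cues))
```

Keeping the cues in one neutral structure and exporting from it is one way to provide multiple caption formats without maintaining each file by hand.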