Transcribing is the process of converting audio into text. A transcript should include any significant background noises and other auditory information.

The following information has been adapted from the Described and Captioned Media Program's Captioning Key and Khan Academy.

Example:

This guide contains example captions in gold boxes. 

General Considerations

The Described and Captioned Media Program's (DCMP) Captioning Key describes the following crucial elements for quality captioning:

  • Accurate – errorless captions
  • Clear – complete textual representation of audio and non-speech information
  • Consistent – consistency in style and presentation throughout a video
  • Readable – captions are displayed in sync with audio and with ample time to read
  • Equal – meaning and intention are completely preserved from audio to captions

Verbatim

In general, it is recommended to transcribe using clean verbatim; however, transcribers need to make important decisions on what is crucial auditory information for equal access.

Transcription should never fix a speaker's grammar. It should accurately reflect what is said.

Similarly, transcription should never censor harsh or offensive language. Coarse language should be transcribed as heard.

  • If it is not bleeped, the language should be included in the transcription.
  • If it is bleeped, it should be noted as such with a descriptive caption.

Clean verbatim

Clean verbatim is defined by Verbit, a captioning vendor, as a style of transcription "that still captures all speech but removes audio components that may impede readability." This is not a summarization of information, but a removal or cleaning up of confusing or unnecessary elements such as:

  • Stutters
  • Filler speech, including “um,” “uh,” etc.
  • The repetitive use of “like,” “actually,” “sort of,” “kind of,” etc.
  • Interjections made by an interviewer, such as “yeah” and “mm-hmm”
  • False starts or redirects
  • Run-on sentences

True verbatim

True verbatim is transcribing everything as said, including stammers, repetitive speech, false starts, etc. It should be used when content requires these elements for full understanding. Examples of content requiring true verbatim may include:

  • Speech Language Pathology
  • Study of language or language development
  • Baby development
  • Analysis of speeches or presentations
  • Jokes

Inaudible and Indiscernible Audio

There are times when audio is not able to be heard or discerned for transcription. In these cases, use the following descriptive caption either inline with text or (for larger chunks of audio) as its own caption cell:

Example:

[Inaudible]

Transcription Containing Citations or Extra Information

Transcription should never contain citations or additional information that is not actually spoken or directly related to the perception of the auditory content.

Speaker Identification & Description

On-Screen Single Speaker Identification

When a speaker changes (including the first time someone talks in a video), denote this by using two chevrons and a space.

Example:

>> This is a correct example
of noting a speaker change.

Off-Screen Speaker Identification

When speakers are not on screen, or it is unclear who is speaking, identify/describe the speaker. A speaker’s identification should be after the double chevrons, capitalized, in parentheses, and followed by a colon and space.

Use the name or title of person speaking, if known.

Example:

>> (Beyonce): I am incredible.

If name and/or title are unknown, identify the speaker using the same information a hearing viewer has. Avoid the use of language that may unnecessarily gender or stereotype a speaker.

Example:

>> (Narrator): I am talking about this.

Example:

>> (Speaker 1): I heard Harrisonburg
is a great place to live.

Multiple Speaker Identification

When multiple people (two or more) are speaking in unison, denote this with three chevrons, a speaker description in parentheses (see above), and a colon and space. Unlike with single speaker identification, you should always include the speaker identification/description for multiple speakers.

Example:

>>> (All): Good morning!

Example:

>>> (Minions): Oooh! Ahhh!
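
The chevron and label rules above can be sketched as a small helper. This is only an illustration: the function name speaker_caption and its parameters are hypothetical, not part of any captioning tool, and the caller is assumed to pass a label that is already capitalized.

```python
def speaker_caption(text, speaker=None, unison=False):
    """Prefix caption text with chevrons and an optional speaker label."""
    if unison and not speaker:
        # The guide says unison captions always carry a speaker description.
        raise ValueError("unison captions need a speaker description")
    chevrons = ">>>" if unison else ">>"
    # The label goes in parentheses, followed by a colon and a space.
    label = f" ({speaker}):" if speaker else ""
    return f"{chevrons}{label} {text}"


print(speaker_caption("This is a correct example"))
print(speaker_caption("I am talking about this.", speaker="Narrator"))
print(speaker_caption("Good morning!", speaker="All", unison=True))
```

Raising an error for an unlabeled unison caption is a design choice of this sketch; a real workflow might instead fall back to a generic label such as "All".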

Tone & Volume

When a speaker makes a notable shift in tone or volume that is important to the context of the video, it should be noted in parentheses followed by a space (capitalization and colons are not used). Descriptive language for these captions usually ends in -ing or -ly.

Example:

>> (whispering) Where are we?

Example:

>> (angrily) That is mahogany!

Descriptive Captions & Music

Important and relevant sounds and music in the context of a video should be noted in captions. These descriptive captions should be in square brackets [ ] and in their own caption cell. Chevrons are not used. Capitalize the first letter of the first word of the descriptive text, like a sentence.

Sounds

Describing sounds should be informative, specific, and concise. 

Never use the past tense when describing sounds. Captions should be synchronized with the sound and are therefore in the present tense.

Example:

[Laughing]

Example:

[Dings]

Sounds should include the source of the sound if not shown on screen.

Example:

[Leaves crunching]

Example:

[Audience groaning]

Music

Any music lasting longer than five seconds should be identified in a descriptive caption. Music descriptions should be noted in square brackets and the first letter should be capitalized.

Music captions should not take precedence over dialogue captions. If the presentation rate does not permit enough time for both, prioritize the dialogue captions.

Instrumental Music

The following simple caption is sufficient in most situations:

Example:

[Music]

You may describe the music, avoiding subjective language like "beautiful" or "ugly" (unless it's important to the context of the video).

Example:

[Soft music]

Example:

[Ominous piano music]

If the instrumental music playing is identifiable, state the performer/composer and title of the piece in the descriptive caption. The title of the piece should be capitalized (unless officially stylized differently) and in quotation marks.

Example:

[Beethoven’s “Symphony No. 5” plays]

Music With Lyrics

Music with audible lyrics should start with a descriptive caption with the performer/composer and title of the piece, if known/identifiable.

Example:

[Alex singing “La Vie En Rose”]

Example:

["Wheels on the Bus"
by Lil' John]

Transcribe lyrics when possible. If music is playing during dialogue, prioritize dialogue.

Start with the identifying descriptive caption. Then use double chevrons and “(singing)” followed by a space to transcribe the lyrics. As the music ends, add a descriptive caption indicating the end of the music.

Example:

["Baby Love" by The Supremes]

>> (singing) Baby, love
my baby love,

I need you,
oh, how I need you

[Music fades]

Punctuation

Use the following sparingly: ellipses, semicolons, and exclamation points.

Abbreviations

Abbreviations should contain periods unless they are acronyms pronounced as words or are typically written without periods.

  • U.S.A.
  • NASA
  • HTML

Quotation Marks

Quotation marks should be used for quotes or titles of media. Italics may also be used for titles, though some captioning software (including YouTube & Canvas Studio) does not support this extra formatting.

For quotes, beginning quotation marks should be used for each caption of quoted material except for the last caption. The last caption should have only the ending quotation mark.

Example:

>> Maya Angelou once said,
"My mission in life

"is not merely to survive,
but to thrive;

"and to do so with some passion,
some compassion,

some humor, and some style."
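
As a rough sketch of the rule above, the following helper adds the marks to the cells that hold quoted material. The name quote_across_captions is hypothetical, the input is assumed to be a list of cell texts containing only the quoted words (the lead-in such as "Maya Angelou once said," is handled separately), and wrapping a single-cell quote in both marks is an assumption the guide does not state.

```python
def quote_across_captions(cells):
    """Add quotation marks to a quote that spans several caption cells."""
    if not cells:
        return []
    if len(cells) == 1:
        # Assumption: a quote that fits in one cell gets both marks.
        return [f'"{cells[0]}"']
    # Every cell except the last gets only an opening mark.
    quoted = [f'"{cell}' for cell in cells[:-1]]
    # The last cell gets only the closing mark.
    quoted.append(f'{cells[-1]}"')
    return quoted


cells = [
    "My mission in life",
    "is not merely to survive,\nbut to thrive;",
    "and to do so with some passion,\nsome compassion,",
    "some humor, and some style.",
]
for cell in quote_across_captions(cells):
    print(cell)
    print()
```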

For titles of media, place the quotes around the title, making sure the title is properly capitalized.

Example:

>> He loved “The Notebook.”

When quotations are used at the end of a sentence or phrase, punctuation should always go inside quotation marks (unless it would change the meaning of a title).

Example:

>> When did you first watch
“The Golden Girls?”

Example:

>> Did you hear about
that new movie “Help!”?

Commas

Use the Oxford comma.

Example:

>> I bought butter, eggs, and milk
from the store.

Hyphens & Dashes

Use a hyphen/dash to link parts of a compound word and compound modifiers.

  • Mexican-American
  • Thirty-four people
  • Guitar-shaped

In mathematics, a hyphen is sometimes needed between a variable and another noun.

  • x-value
  • y-axis

Hyphens & Dashes in Age

Hyphens are used when typing out age when the age is a noun or is modifying a noun.

  • Well, that’s a 16-year-old for you.
  • My two-year-old brother wants a puppy.

Hyphens are not used when the age comes after a verb.

  • I’m turning 23 years old in March.
  • He is two years old.

Em dashes & Double Dashes

An em dash or double dash is used when a speaker is interrupted in the middle of speaking.

You can use a double dash (--) for this or hold down the “Alt” key and type 0151 to create a real em dash (—).

Example:

>> I think we should--

>> No, I don't think so.

Ellipses

Use ellipses (…) sparingly. The most common use is when there is a significant pause within a caption or when a speaker trails off.

Do not use ellipses to just indicate that a sentence continues into the next caption.

Capitalization

Proper capitalization is essential for context.

You should capitalize:

  • Proper nouns (e.g., names, titles, days of the week, months)
  • Names of religions, races, peoples, and languages.
  • The first letter of elemental abbreviations.

Do not capitalize:

  • The spelled-out names of elements/chemical compounds.
  • The seasons of the year.
  • Generic drug names.

Numbers (General)

For numbers in the context of mathematics, please see the Numbers (Mathematics) section.

Numbers zero through 999,999:

  • Spell out whole numbers zero through nine.
  • Use numerals for numbers 10 through 999,999.
  • Numbers with five or more digits should include commas.
  • Do not start a sentence with a numeral; spell out the number instead (except for years).

Example:

>> Three people showed up.

Example:

>> Seven thousand people
attended the march.

Example:

>> 2024 was the warmest year
on record.
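
The mid-sentence rules for zero through 999,999 can be sketched as follows. The names SMALL_WORDS and format_whole_number are hypothetical, and this sketch deliberately leaves out the sentence-initial case (which needs full number-to-words conversion) and years.

```python
SMALL_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def format_whole_number(n):
    """Format a whole number that does not begin a sentence."""
    if not 0 <= n <= 999_999:
        raise ValueError("this sketch only covers zero through 999,999")
    if n <= 9:
        # Zero through nine are spelled out.
        return SMALL_WORDS[n]
    if n >= 10_000:
        # Five or more digits take commas.
        return f"{n:,}"
    # 10 through 9999 are plain numerals without commas.
    return str(n)


for value in (3, 10, 1000, 10000, 958342):
    print(format_whole_number(value))
```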

Numbers one million and above:

  • For round numbers, apply the general number rules to the leading figure, then spell out “million,” “billion,” etc.
    • One million
    • 500 trillion
  • Use numerals for the entire number if the number is very specific.
    • 2,532,865

Decimals

All decimals are written in numerals, as spoken.

There should always be a number to the left of the decimal point. If a speaker says “point three one,” you should insert a zero before the decimal (0.31).

If a decimal is spoken as “32 hundredths,” it should be captioned as “32 hundredths.”
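
A minimal sketch of the leading-zero rule, assuming the decimals have already been typed as numerals; the function name add_leading_zero is hypothetical. The pattern only touches a decimal point that is followed by a digit and not preceded by one, so sentence-ending periods are left alone.

```python
import re


def add_leading_zero(text):
    """Insert a zero before any decimal point that has no digit to its left."""
    return re.sub(r"(?<!\d)\.(?=\d)", "0.", text)


print(add_leading_zero(".31"))
print(add_leading_zero("It weighed .5 grams."))
print(add_leading_zero("Negative -.2 and 0.34 grams."))
```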

Consistency

Be consistent with numbers throughout an entire sentence.

Example:

>> Eight people wore blue shirts
and fourteen wore red shirts.

If a sentence contains numbers zero through nine as well as numbers 10 and above and/or decimals, use numerals.

Example:

>> The measurements were
9 kilograms, 0.34 grams,

and 958,342 pounds.

Lists & Ranges

If the list or range spans numbers zero through nine as well as numbers 10 and above and/or decimals, use numerals.

  • “One, five, 25, 125, 1 thousand, 10 million” should be “1, 5, 25, 125, 1000, 10 million.”
  • “Questions one through 10” should be “Questions 1 through 10.”

For ranges of numbers, you should write out the word used between numbers.

  • “657-784” should be “657 through 784.”
  • “Ages 21-29” should be “Ages 21 through 29.”
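
The range rule can be approximated with a regular expression, as sketched below. The function name spell_out_range and its word parameter are hypothetical; the default "through" is only a placeholder, since the caption should use whichever word the speaker actually says. The pattern is naive and would also rewrite other digit-hyphen-digit strings such as phone numbers.

```python
import re


def spell_out_range(text, word="through"):
    """Replace a hyphen or en dash between two numbers with a spoken word."""
    return re.sub(r"(\d+)\s*[-–]\s*(\d+)", rf"\1 {word} \2", text)


print(spell_out_range("657-784"))
print(spell_out_range("Ages 21-29"))
print(spell_out_range("Pages 5-7", word="to"))
```

Hyphenated ages such as "16-year-old" are not affected, because the hyphen there is not followed by a digit.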

Years

Years should be expressed as follows:

  • Single year:
    • If referred to as “Ninety-five”: ’95
    • If referred to as “1995”: 1995
  • Decades:
    • If referred to as the “sixties”: Sixties or ’60s
    • If referred to as “1960s”: 1960s

Dates

Use numerals for dates.

When only the day and month are mentioned (no year), it is necessary to use “th,” “st,” “nd,” or “rd.”

  • Their anniversary is May 22nd.
  • Labor Day is on September 3rd this year.

If day, month, and year are spoken, use only numerals:

  • Their anniversary is May 22, 1986.
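
The two date forms can be sketched with a small helper. The names ordinal_suffix and format_date are hypothetical, and the month is assumed to be passed in already spelled out.

```python
def ordinal_suffix(day):
    """Return the English ordinal suffix for a day of the month."""
    if 11 <= day % 100 <= 13:
        # 11th, 12th, and 13th are exceptions to the last-digit rule.
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


def format_date(month, day, year=None):
    """Use an ordinal suffix only when no year is spoken."""
    if year is None:
        return f"{month} {day}{ordinal_suffix(day)}"
    return f"{month} {day}, {year}"


print(format_date("May", 22))
print(format_date("September", 3))
print(format_date("May", 22, 1986))
```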

Time

Time should always be written numerically, except when expressed as “noon” or “midnight.” 

Example:

>> It was 3:30 in the morning
by the time they arrived.

Money

Sums of money in U.S. currency may be written with a dollar sign or the word “dollar(s).” If using the dollar sign, use numerals.

  • $25
  • 37 dollars
  • $100 million

Temperatures

Temperatures should be written as spoken.

  • Negative two degrees Celsius.
  • It’s 85 degrees in January.

Numbers (Mathematics)

Numbers in math and science content build upon the same guidelines as general numbers (i.e., numerals vs. words, lists & ranges, decimals).

Percentages

Percentages may be written with a percentage sign or the word “percent.” If using the percentage sign, use numerals.

  • 25%
  • 25 percent
  • 1%
  • One percent

Fractions

Fractions should be written as spoken. Fractions with numbers 10+ or decimals should be written in numerals.

  • ¼, depending on how the speaker says it, could be captioned as:
    • One over four
    • One-quarter
    • A fourth
    • One-fourth
    • One divided by four
  • 372/8 could be captioned as:
    • 372 over 8
    • 372 divided by 8
  • (2x-5)/9 could be captioned as:
    • 2x minus 5 over 9
    • Open parenthesis 2x minus 5 closed parenthesis divided by 9.

Graphing

For graphing terms, write them out as the speaker says them:

  • (-10, 3) could be captioned as:
    • Negative 10 comma 3
    • Negative 10, 3

Quadrants are written with Roman numerals (e.g., quadrant III). Only capitalize quadrant if it begins a sentence.

Axes and coordinate references require a hyphen:

  • x-coordinate
  • y-axis

Units of Measurement

Spell out all units of measurement (inch, foot, joule, gram, ampere, volt, meter, pascal, kelvin, hertz, coulomb, and newton).

Functions

Spell out functions such as “f of x” instead of f(x).

Operators & Symbols

Most symbols should be spelled out as they are spoken.

  • 3x12 could be captioned as:
    • 3 times 12
    • 3 multiplied by 12
  • -9 could be captioned as:
    • Negative nine
    • Minus nine
  • -.2 could be captioned as:
    • Negative 0.2
    • Negative 2 tenths

Symbols such as pi should have spaces between them and the next variable or term. For example, “Two pi r” NOT “Two pir” (or if the speaker says “Two times pi times r,” reflect that). Try to be as clear and consistent as possible, using spaces as needed to avoid confusion such as pi being mistaken for p times i.

Powers & Exponents

Treat powers and exponents like other symbols and caption them as spoken.

  • X² could be captioned as:
    • X-squared
    • X to the second power
    • X to the power of two
    • X raised to the two
    • X to the two

If a power/exponent is said with nothing between the base and the exponent (e.g., “x two”), add in “sub” for subscript or “to the” for superscript.

Equations & Expressions

Equations and expressions should be written out as spoken following basic number conventions.

  • 1+1=2 could be captioned as:
    • One plus one equals two
    • One and one is two

Variables

When a number is attached to/paired with a variable, the number should be captioned as a numeral and there should not be a space between the two. Variables are typically kept lowercase unless beginning a sentence or appearing in the video/example as a capital letter.

  • 2x² + 3 = 8 could be captioned as:
    • 2x squared plus 3 equals 8.
  • X+2 is captioned as:
    • X plus two
