There are two important differences between ASR (automatic speech recognition) captioning, also referred to as machine-generated captioning, and human-generated captioning: the quality of the captions and the time required to generate them.
Machine-generated captioning produces captions very quickly. Typically, captions can be created in about one-quarter of the total video length. For example, an hour-long video could be captioned using ASR in approximately 15 minutes.
ASR captions are typically 70-75% accurate depending on the audio quality in the recording. As a result, machine-generated captions are primarily intended to enable inside-video search, and by default, they aren’t added to the video as closed captions. Instead, the text is stored in the video platform’s database for use with the video search engine.
Of course, ASR also provides a starting point from which people can manually create 100% accurate captions. In video platforms like Panopto, text generated by ASR can be added to the video as closed captions, which people can then edit.
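To illustrate what that editing workflow operates on: closed captions are commonly stored as timed-text sidecar files, such as the WebVTT format used by many web video players. The sketch below is a minimal, hypothetical example (the segment data and function names are illustrative, not any platform's actual API) that converts a list of timed ASR transcript segments into a WebVTT document.

```python
# Sketch: convert ASR transcript segments into a minimal WebVTT caption file.
# The segment data below is hypothetical; real ASR services return richer output.

def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    millis = round((seconds - int(seconds)) * 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

def segments_to_webvtt(segments):
    """Build a WebVTT document from (start, end, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)

# Example ASR output: a couple of timed segments (values are illustrative).
segments = [
    (0.0, 2.5, "Welcome to today's lecture."),
    (2.5, 6.0, "We'll start with a quick recap of last week."),
]

print(segments_to_webvtt(segments))
```

Because the captions live in a plain text file like this, correcting ASR errors is simply a matter of editing the cue text, which is why ASR output makes a practical starting point for fully accurate human-edited captions.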
Human-generated captions take substantially longer to produce but provide results that are at least 99% accurate. In some cases, human-generated captions can be turned around in 24 hours, but typically, you can expect a 2-5 day turnaround.
In addition to making video content more accessible to viewers who are deaf or hard of hearing, captioning can actually improve the effectiveness of video:
- Captions improve comprehension by both native and foreign-language speakers. A 2006 Ofcom study found that 80% of people who use video captions do not have a hearing impairment.
- Captions help compensate for poor audio quality or background noise within a video.
- Captions make video useful when a person is watching with the sound off or viewing in a noisy environment that obscures the sound.
- Captions provide viewers with one way to search inside of videos. The link gives an example using the University Replay lecture capture.
Machine-generated captions can be produced in roughly a quarter of the time it takes to play the video. Depending on the service used and the length of the material, turnaround can be well within 24 hours.
Human-generated captions are typically generated within two to five days, depending on the requested turnaround time and service options.
Yes. Captions can be added to video content while a live event is taking place, but this typically involves a third-party supplier who provides machine- or human-generated captions.
The latest version of Zoom will generate captions while the event is taking place, and individual users can choose whether captions are displayed and at what size.
The captions are then displayed either superimposed over the video content (typically at the bottom of the screen) or as a separate web browser page so that more than one line of text is visible at a time. The latter is useful for viewers who may need to see captions in a larger font size or need longer to read them if someone is speaking quickly.
It is also possible to provide subtitling (simultaneous translation) in this manner; however, cost is an important factor in live captioning and translation.
The Web Content Accessibility Guidelines (WCAG) are the most detailed and widely adopted guide for creating accessible web content.
WCAG 2.1 asks that online content meet four design principles that improve accessibility for people with disabilities and also achieve a defined level of conformance. Both are summarized below:
WCAG Design Principles:
- Perceivable: All relevant information in your content must be presented in ways the user can perceive.
- Operable: Users must be able to operate interface components and navigation successfully.
- Understandable: Users must be able to understand both the information in your content and how to operate the user interface.
- Robust: Content must be robust enough to be interpreted reliably by a wide variety of user agents, including assistive technologies (such as screen readers).
WCAG Compliance Levels for Online Video:
- Level A: Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such.
- Level AA: In addition to Level A compliance, captions are provided for all live audio content in synchronized media.
- Level AAA: In addition to Levels A and AA compliance, sign language interpretation is provided for all prerecorded audio content in synchronized media.
To learn more about the WCAG 2.1 guidelines, visit the W3C website (w3.org).
Automatic Captions using YouTube
YouTube provides a facility for automatically captioning videos uploaded to its channels.
Be aware that this only applies to videos that have been recorded and uploaded to the channel - it does not cover live streams. Providing captions (or subtitles) for live streams will involve a third-party supplier, who will charge for their services.
Captions using Panopto
As of August 2020, the University automatically captions all material recorded in Panopto. The PDLT team has produced a one-page guide to assist with embedding and editing captions.
In addition to that, there are links below which provide further information:
Setting up third-party integration - the example uses 3PlayMedia as a provider of caption services.
Third party suppliers
AI Media - currently used by the University's Equality & Diversity team for events and INCLUDE meetings
3PlayMedia - used by Audio Visual & Digital Marketing for one-off captioning requests
Third-party suppliers will always charge for their services, and there may also be costs incurred for cancellation at short notice.
There is some further information here from Digital Marketing & Communications relating to YouTube captions.
Sourced from 'Update to Academic Departments Newsletter #8 2 November 2020'
Additional support to departments
Executive summary: Additional funding is available to pay students to support captioning work outside of GTA budgets.
We recognise that the need to provide captions for our recorded online lectures in order to meet legal requirements is burdensome. The University is exploring ways to support departments better and expects to have a medium-term solution in place from the start of the next calendar year; this will remain in place while we look for a longer-term fix. In the short term, the following measures have been agreed, with the aim of easing some of the current problems related to engaging casual staff and carrying out the training and supervision needed to support them in this important work.
Short-term approach, with immediate effect until the end of the 2020 calendar year
Departments are asked to continue seeking creative solutions to do this work from existing resources. However, where additional spend on casual staffing is required, there will be a streamlined approval process to authorise this spend as business critical. Please note that no additional salaried-staff spend is permitted for this work. Departments are asked not to make retrospective bookings using this new approach.
For the remainder of this year Departments should:
- Book casual staff in the normal way;
- Charge spend to a faculty work-order, details of which have been shared with departmental administrators, which will allow spend to be tracked for the remainder of the term;
- Note that the spend will not form part of GTA budgets.
Departments are asked to use the Casual Office Assistant rate of £8.72 per hour so that we are consistent. However, where the work requires specialist subject knowledge, departments may use the ‘Captioning (Specialist Knowledge)’ rate of £11.31 per hour. There should not be any additional preparation time.
These changes will be effective immediately, but do not need to be applied retrospectively to any existing bookings.
Non-specialist captioning work should be offered to all students, including undergraduates. We expect specialist captioning to be done by postgraduate students with a cognate academic background.