August 30, 2005
By Judith Markowitz, Principal, J. Markowitz, Consultants
Forward Thinking

Listening to the Court of Appeals

My recent article on transcription (STM March/April 2005) prompted appeals for more information about voice writing. Voice writing is employed by court reporters and other highly trained transcriptionists to generate accurate, verbatim transcriptions of human-human interactions, such as courtroom and legislative proceedings, corporate meetings, and real-time closed captioning for hearing-impaired individuals. It’s of interest to the speech-processing industry because voice writing can employ automatic speech recognition (ASR) technology. Here are answers to some of your questions.

Why can’t you just tape record everything and have someone transcribe later?

According to Linda Drake, president of the National Verbatim Reporters Association (NVRA), "In the judicial and deposition setting, it is imperative to have a person as the custodian of the record: to take the record, prepare the transcript, certify its authenticity, and be responsible as the ultimate authority for what transpired during the proceeding." The custodian’s role extends to marking things a tape recorder cannot capture (e.g., a nod of assent, parties entering or leaving the room, the times the proceedings went off or on the record), capturing spoken material that an unattended recording cannot be relied upon to pick up (e.g., mumbling witnesses, attorneys shouting over each other, speakers positioned far from the microphone or with their backs turned to it), and identifying all of the voices on the tape. Voice writers may spend whole days taking testimony.

"The voice track that I make is my record, not the actual record or digital recording," explains Jennifer Smith, NVRA’s president-elect. "It can be days after the proceeding took place before we can finalize a given transcript, so there would be no way to keep all of that straight with only an un-annotated recording as my source."

How does the technology used differ from Dragon NaturallySpeaking or ViaVoice?

Voice writers need a speech silencer (mask), a laptop with a backup digital recorder and microphone, USB processors to carry the voice to the screen, and post-transcription production capabilities to create a final transcript. The mask eliminates background noise and muffles the voice writer’s voice so it doesn’t interfere with the proceedings. ASR isn’t necessary unless a real-time transcript is generated, although a growing number of voice writers use it for all their work.

How can you keep up with everything that’s said plus the annotations?

"Focus is everything," explains Drake. "The words go through my ears to my brain and out of my mouth, and I listen to the phrasing so that I know where to insert punctuation." Smith agrees. "There have been times in a deposition where the attorney has asked ME a question, but I don't realize it until the room gets really quiet and I then realize everyone is looking at me!"

The process isn’t as easy as Smith and Drake make it sound. Voice writing requires intense concentration. You must learn to breathe properly while using a mask and become proficient at letting the information flow directly from your ear to your mouth. I tried it, and after five minutes of voice writing just one speaker, I was exhausted.

Voice writers use macros called "voice briefs" as shortcuts for frequently occurring phrases. For example, saying "fed-mack" generates "Federal Bureau of Investigation" and saying "med-cert" produces "to a reasonable degree of medical certainty." They also employ special models for speech faster than 200 words per minute (wpm). Although the highest level of certification is at 300 wpm, the NVRA holds an annual competition that includes courtroom-like dialogues that move as quickly as 350 wpm. "Not all legal proceedings are that fast-paced," explains Smith. "But you often have attorneys and witnesses talking over one another. The court reporter still must take down every word spoken, even words spoken simultaneously."
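For readers who think in code, a voice brief behaves much like a lookup-table text macro. The sketch below is only an illustration of that idea, not how any voice-writing product actually implements briefs: the function name `expand_briefs` and the table `VOICE_BRIEFS` are hypothetical, and the only entries are the two examples quoted above.

```python
import re

# Hypothetical brief table. The two entries are the examples quoted in the
# article; a working voice writer's dictionary would be far larger.
VOICE_BRIEFS = {
    "fed-mack": "Federal Bureau of Investigation",
    "med-cert": "to a reasonable degree of medical certainty",
}


def expand_briefs(text, briefs=VOICE_BRIEFS):
    """Replace each recognized brief in `text` with its full phrase."""
    if not briefs:
        return text
    # Try longer briefs first so a short brief never shadows a longer one.
    keys = sorted(briefs, key=len, reverse=True)
    # Match a brief only when it stands alone, not inside a longer
    # word or hyphenated token.
    pattern = re.compile(
        r"(?<![\w-])(" + "|".join(re.escape(k) for k in keys) + r")(?![\w-])",
        re.IGNORECASE,
    )
    # Table keys are lowercase, so normalize the match before the lookup.
    return pattern.sub(lambda m: briefs[m.group(1).lower()], text)


recognized = "The fed-mack agent testified med-cert."
print(expand_briefs(recognized))
```

Here the recognizer's raw output is assumed to arrive as plain text containing the spoken brief; the expansion step simply swaps in the full phrase before the transcript is displayed or saved.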

How widespread is speech recognition-based voice writing?

Most voice writers are based in the United States, but interest in voice writing, including the use of ASR, is increasing globally. One indication of this is that Intersteno (International Federation for Information Processing) invited NVRA to become a member. Intersteno covers those who use pen shorthand, stenotype, keyboard, and voice. According to Smith, "Even though improvements in the capabilities of the speech engines are needed to accommodate the speed and nuance of language, voice writing and ASR are at the forefront of everyone's minds these days. What an exciting time to be a voice writer!"

Judith Markowitz is the technology editor of Speech Technology Magazine and is a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
