Groupe TVA Inc.: STDirect, live closed-captioning of newscasts using speech recognition
Since 1995, the Canadian Radio-television and Telecommunications Commission (CRTC), as part of its commitment to improving service for viewers who are deaf or hard of hearing, has made specific captioning requirements a condition for granting or renewing broadcast licences. To retain its broadcasting licence, Groupe TVA will need to provide closed captioning for 90 per cent of its televised broadcasts by 2008.

Challenge
Captioning relies on text that appears on the television screen as a written transcript of a program's dialogue. "Open captioning" is visible to all viewers, while "closed captioning" is visible only to viewers whose television has a decoder chip that reads the caption signals embedded in the broadcast signal. "Pre-packaged captioning" is prepared before broadcast, while "live" or "real-time" captioning is produced simultaneously with the broadcast images.

Traditionally, Groupe TVA had used closed-caption editing software to caption only pre-packaged segments. As a result, deaf and hard-of-hearing viewers had subtitles for the packaged news items only; the live portions of a newscast had no captions at all.

Objective
To adequately meet the needs of Québec's 750,000 deaf and hard-of-hearing people, Groupe TVA needed a way to generate live closed captions. Live captioning usually relies on a stenographer who types the captions as the images unfold onscreen. Without time for editing, the captions can suffer from inaccuracies, spelling errors, missed dialogue and lost content, because the action on screen can outpace the stenographer. Technical problems with the broadcast or captioning equipment can also compromise on-the-fly captioning. Groupe TVA ruled this option out and, in search of a better solution, turned to the Centre de Recherche Informatique de Montréal (CRIM).
With offices in Montreal and Quebec City, CRIM is a technology accelerator offering leading-edge services in research and development, software testing, security and other advanced areas of technology, including voice recognition. CRIM proposed developing a customized tool, in conjunction with RQST (Regroupement Québécois pour le Sous-Titrage), as the foundation for Groupe TVA's planned STDirect (Sous-titrage en direct) live-captioning system. The project was awarded to CRIM in June 2002 with a budget of $500,000.

As its first order of business, the speech recognition team set out to build an automatically or semi-automatically renewable dictionary, a monumental task that meant integrating every available collection of language data: hundreds of millions of words from Quebec sources, five years of TVA newscast archives and dozens of hours of hand-transcribed content. Separate models were developed for specific subject areas: global, national, regional, weather, traffic, culture and financial. The resulting dictionary contains more than 48,000 words.

Solution
During a live newscast, a dedicated announcer in the studio repeats the content to be captioned into the microphone of a specially equipped workstation. Software captures the announcer's voice, communicates with the speech recognition server and displays the closed captions as they are transmitted to TVA's broadcast equipment. The software also communicates with TVA's database, which lets it follow the planned flow of the newscast as it airs. "Thanks to the close collaboration of the TVA teams, the system was adapted to the technological environment of the newsroom and integrated with the daily tasks of the existing closed-captioning employees," explains Serge Bellerose, senior vice-president, speciality channels and business development at Groupe TVA.
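The live workflow described above (follow the newscast plan, pick the right subject model, recognize the announcer's respoken audio, emit captions) can be sketched in a few lines. The article does not document STDirect's internals, so everything here is a hypothetical illustration: the `RundownItem` record, the use of the article's subject names as model keys, the assumption that "global" serves as the fallback model, the `recognize` callback standing in for the speech recognition server, and the 32-character row width (the row limit of EIA/CEA-608 captions, which the article does not say STDirect used).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Subject areas named in the article; using them as model keys is an assumption.
SUBJECT_MODELS = {"global", "national", "regional", "weather",
                  "traffic", "culture", "financial"}
FALLBACK_MODEL = "global"  # assumed general-purpose model
CAPTION_ROW_WIDTH = 32     # EIA/CEA-608 row width; an assumption, not from the article


@dataclass
class RundownItem:
    """Hypothetical record for one story read from the newsroom database."""
    slug: str
    subject: str


def model_for(item: RundownItem) -> str:
    """Choose the subject-specific model for a story, or fall back."""
    return item.subject if item.subject in SUBJECT_MODELS else FALLBACK_MODEL


def wrap_caption(text: str, width: int = CAPTION_ROW_WIDTH) -> list[str]:
    """Greedily pack words into caption rows of at most `width` characters.

    A single word longer than `width` is placed on its own row unchanged.
    """
    rows: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= width:
            current = candidate
        else:
            if current:
                rows.append(current)
            current = word
    if current:
        rows.append(current)
    return rows


def caption_segment(item: RundownItem, audio: bytes,
                    recognize: Callable[[bytes, str], str]) -> tuple[str, list[str]]:
    """Recognize one segment of respoken audio and format it as caption rows.

    `recognize(audio, model)` stands in for a request to the speech
    recognition server; its signature is hypothetical.
    """
    model = model_for(item)
    return model, wrap_caption(recognize(audio, model))


if __name__ == "__main__":
    def fake_recognizer(audio: bytes, model: str) -> str:
        # Stand-in for the recognition server; ignores its inputs.
        return "Neige abondante attendue ce soir dans la région de Québec"

    story = RundownItem(slug="meteo-1", subject="weather")
    model, rows = caption_segment(story, b"", fake_recognizer)
    print(model)
    for row in rows:
        print(row)
```

Passing `recognize` in as a parameter keeps the sketch testable without a real recognizer; in the system the article describes, that step would instead go over the network to the Linux speech recognition server.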
Simply put, the solution is a technological breakthrough that not only meets TVA's requirements but is already being tested for commercial use elsewhere. Since the STDirect project was completed, CRIM has successfully tested the technology at the House of Commons in Ottawa and the Legislative Assembly of British Columbia. "TVA, CRIM and RQST truly believed it was possible to develop such a system in the short term and that many would benefit from it," Bellerose remarks. "Without a doubt, it was team spirit that most contributed to the success of this project. During the past three years, everyone involved with it invested time, energy and a tremendous effort."

Innovative Use of Technology
The prototype software is divided into two main functional blocks: a server (the speech recognition server) and a client. The server performs speech recognition on the speaker's voice. The client gives the user an interface for sampling speech, communicating with the INEWS server and controlling the speech recognition server.

The closed-captioning system's communication protocol allows commands or data to be transmitted simultaneously with voice, all running over TCP/IP. Because the protocol is asynchronous, either the client or the server can send or receive events at any time.

The client workstation runs Windows 2000 and has a network connection that gives the client access to the INEWS server and the speech recognition server. It also includes a USB port, required for the microphone, and a sound card with a line input so that the speaker and the broadcast can be recorded into separate files at the same time. The speech recognition server runs Linux and uses its network connection to serve the client stations.
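The article says the client/server protocol carries commands or data alongside voice over TCP/IP and is asynchronous, but it does not publish the wire format. As a purely hypothetical illustration, the Python sketch below assumes a simple framing in which every message starts with a 1-byte type (command, data or voice) and a 4-byte big-endian payload length. Some such length prefix is needed because TCP delivers a byte stream with no message boundaries, so the receiver must reassemble frames from whatever chunks arrive. Since neither side waits for a reply before sending, the same decoder could run on both the client and the server. The names `FrameType`, `encode_frame` and `FrameDecoder` are illustrative, not taken from STDirect.

```python
from __future__ import annotations

import struct
from enum import IntEnum


class FrameType(IntEnum):
    """Hypothetical message kinds multiplexed on one connection."""
    COMMAND = 1  # control messages, e.g. start or stop recognition
    DATA = 2     # text payloads, e.g. recognized caption text
    VOICE = 3    # raw audio from the announcer's microphone


# Assumed header: 1-byte frame type, 4-byte big-endian payload length.
HEADER = struct.Struct("!BI")


def encode_frame(kind: FrameType, payload: bytes) -> bytes:
    """Prefix a payload with its type and length so it can share a TCP stream."""
    return HEADER.pack(kind, len(payload)) + payload


class FrameDecoder:
    """Reassembles whole frames from arbitrarily split TCP reads."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> list[tuple[FrameType, bytes]]:
        """Buffer `chunk` and return every frame that is now complete."""
        self._buf.extend(chunk)
        frames: list[tuple[FrameType, bytes]] = []
        while len(self._buf) >= HEADER.size:
            kind, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # payload not fully received yet
            frames.append((FrameType(kind), bytes(self._buf[HEADER.size:end])))
            del self._buf[:end]
        return frames


if __name__ == "__main__":
    stream = (encode_frame(FrameType.COMMAND, b"START weather")
              + encode_frame(FrameType.VOICE, bytes(16))
              + encode_frame(FrameType.DATA, "Météo".encode("utf-8")))
    decoder = FrameDecoder()
    # Deliver the stream 3 bytes at a time to mimic TCP fragmentation.
    for i in range(0, len(stream), 3):
        for kind, payload in decoder.feed(stream[i:i + 3]):
            print(kind.name, len(payload))
```

Feeding the decoder a few bytes at a time shows that frames survive being split at arbitrary points. A production protocol would also need error handling for unknown frame types and a cap on payload length; both are left out of this sketch.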
The speech recognition server has the following specifications: a 2.8 GHz Pentium 4 processor, 3 GB of RAM, a 19" monitor, an HP DVD200i burner and enough disk space to support the intended number of users. The speaker interface program was developed in the C++Builder environment under Windows 2000. The speech recognition server was written in C/C++ on Linux and compiled with GCC 2.95.3.

A 2005 CIPA Winner!
For its exceptional and innovative application of information technology to solve real-world business problems and bring greater benefit to all its stakeholders, Groupe TVA Inc. has been awarded a 2005 CIPA Silver Award of Excellence in the Customer Care, For Profit category.