William Tucker, Meryl Glaser and Jason Penton
Consider making a telephone call to your mother. You pick up the phone, dial the number and talk to her about the weather, what you had for dinner, etc. Now imagine that you are one of the 4 million people in South Africa who are hearing impaired or Deaf. You are Deaf, cannot speak, but use sign language. Your mother is not Deaf. She can sign, too, but she lives on the other side of town, and you need to talk to her - now. You cannot simply pick up the phone and call Mom. You have to get someone that can talk to her on your behalf - someone who can relay your sign language in speech to Mom over the phone, and translate Mom's reply from the phone into sign language for you. In order to have a synchronous exchange with anyone on the Public Switched Telephone Network (PSTN), you must use a relay. There is an alternative, however - a Deaf Telephone.
In South Africa, Telkom has built a Deaf Telephone called the Teldem. The Teldem is a text telephone that converts keypad characters to tones using Baudot encoding, transmits the tones over a normal PSTN connection, and then converts the tones on the other end back into characters for display on another Teldem. Even though the Teldem has several major drawbacks, it is a usable device and almost 800 users in South Africa have them installed. They pay a modest R14/month rental in addition to the normal call charges. Because a Teldem can only exchange characters with another Teldem, that user community remains small. You can only Teldem to Mom if she also has one.
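To make the character-to-code step concrete, here is a minimal sketch of Baudot-style encoding in Python. It assumes the widely published ITA2 "letters" table; the article does not say which Baudot variant the Teldem uses, so the exact code values, and the names `LETTERS`, `encode_letters` and `decode_letters`, are illustrative assumptions. Digits, punctuation and the generation of the tones themselves are left out.

```python
# ITA2 "letters" shift table, indexed by 5-bit code value (assumed variant).
# None marks codes that are not printable letters (null, figure shift, letter shift).
LETTERS = [
    None, "E", "\n", "A", " ", "S", "I", "U",
    "\r", "D", "R", "J", "N", "F", "C", "K",
    "T", "Z", "L", "W", "H", "Y", "P", "Q",
    "O", "B", "G", None, "M", "X", "V", None,
]

# Reverse lookup: character -> 5-bit code.
ENCODE = {ch: code for code, ch in enumerate(LETTERS) if ch is not None}


def encode_letters(text):
    """Convert text to a list of 5-bit codes (letters shift only)."""
    codes = []
    for ch in text.upper():  # Baudot has a single case
        if ch not in ENCODE:
            raise ValueError(f"no letters-shift code for {ch!r}")
        codes.append(ENCODE[ch])
    return codes


def decode_letters(codes):
    """Convert 5-bit codes back to text, skipping non-printing codes."""
    return "".join(LETTERS[c] for c in codes if LETTERS[c] is not None)


codes = encode_letters("call mom")
print(codes)
print(decode_letters(codes))
```

Each 5-bit code would then be sent as a short sequence of tones on the line, and the receiving device would run the reverse lookup to display the characters.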
Other alternatives come to mind. What about using the Short Message Service (SMS) on a cellphone? There are about 9 million cellphones in South Africa, roughly double the number of South African PSTN landlines. You could SMS to Mom, but how will you know that she has received your SMS? SMS is fundamentally an asynchronous technology. Mom could reply via SMS instantly to your SMS, but if her cellphone is engaged, turned off or the battery is flat, you do not receive any feedback telling you that is the case. What if it is an emergency? SMS is just not synchronous - so it is not useful when you have to know that Mom is actually communicating with you.
You could also use the Internet with a Personal Computer (PC), and use e-mail or Instant Messaging (IM) instead of SMS. E-mail suffers from the same asynchronous disadvantages as SMS. IM has promise because you can tell if Mom is online or not. If she is, the volley of text messages is nearly instantaneous, despite inherent latency as the text travels across the Internet. But what if, like most South Africans, neither you nor Mom has a PC at home? In addition, South Africa still has one of the most expensive Internet access environments in the world. To make matters worse, because you are Deaf, you are likely to have very basic literacy skills, not to mention limited computer literacy.
There is another way. What if we could build a bridge between a Deaf user's Deaf Telephone and a normal PSTN or cellular handset? A collaborative effort between an audiologist at the University of Cape Town (UCT) and a pair of computer scientists at the University of the Western Cape (UWC) and Rhodes University has come up with a very interesting solution. Together, these researchers have designed and developed a prototype of an automated voice/text relay that uses the Internet, Text to Speech (TTS) and Speech to Text (STT) technologies and the PSTN (see diagram below). The prototype is called Telgo323 (Teldem Goes H.323), and here is how it works.
You, the Teldem user, initiate a call to your Mom by dialling into a Teldem Gateway. The gateway prompts you (via text) for your Mom's phone number and sets up the call to her over the PSTN via another Internet Protocol (IP)/PSTN gateway called the Telephone Gateway. Mom's phone rings and she picks up the phone. You type on the Teldem, and the Teldem Gateway decodes the Teldem's Baudot encoded character tones into ASCII characters. These characters are buffered until the Teldem user has finished typing the message, after which the text string is converted to speech using an open source TTS tool such as Festival. The speech is then sent to the PSTN user via the Telephone Gateway. As a result Mom essentially 'hears' your typed text. She replies, and the chain of events is reversed. The Telephone Gateway transforms Mom's speech into text with an STT tool, sends the text over the Internet to the Teldem Gateway and finally, that gateway encodes the text into tones that the Teldem can understand, and you read what Mom has said.
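The buffering step on the Teldem-to-voice leg can be sketched as follows. This is an illustration under stated assumptions, not the Telgo323 implementation: the article does not say how the gateway detects that the user has finished typing, so the newline terminator is an assumption, and `MessageBuffer` and its `speak` callback are hypothetical names. In a real gateway the callback would hand the string to a TTS tool such as Festival; here it simply records the message.

```python
from typing import Callable, List


class MessageBuffer:
    """Collects decoded characters until an end-of-message marker arrives."""

    def __init__(self, speak: Callable[[str], None], terminator: str = "\n") -> None:
        self._speak = speak            # hypothetical hook into a TTS tool
        self._terminator = terminator  # assumed end-of-message marker
        self._chars: List[str] = []

    def feed(self, ch: str) -> None:
        """Accept one decoded character from the Teldem side."""
        if ch == self._terminator:
            self.flush()
        else:
            self._chars.append(ch)

    def flush(self) -> None:
        """Pass the buffered message on as one string and reset the buffer."""
        message = "".join(self._chars).strip()
        self._chars.clear()
        if message:  # do not "speak" empty messages
            self._speak(message)


# Stand-in for TTS: record what would have been spoken to the PSTN user.
spoken: List[str] = []
buffer = MessageBuffer(speak=spoken.append)
for ch in "HELLO MOM\nHOW ARE YOU\n":
    buffer.feed(ch)
print(spoken)
```

Speaking whole messages rather than individual characters lets the TTS tool produce natural phrasing, at the cost of a pause while the Teldem user types.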
The Telgo323 idea was sparked by an interesting sequence of events. The first event involved Meryl Glaser, a Lecturer of Audiology at UCT, and William Tucker, a Lecturer of Computer Science at UWC. Both Glaser and Tucker were members of a Telkom/Siemens/THRIP Centre of Excellence based in the Western Cape. Tucker heard Glaser present a talk on a Deaf community field trial of the Teldem at the South African Telecommunication Networks and Applications Conference (SATNAC) in 2000. At the time, Tucker was working with IP telephony and noted the similarities between Teldem texting and Internet-based chat. They soon began collaborating and at SATNAC 2001, they delivered a paper that mapped out a series of deaf telephony bridges, starting with a human relay and finishing with a fully automated relay. That talk inspired Jason Penton to build the Telgo323 prototype as part of his Computer Science Master's research at Rhodes University. His research focus is building H.323 applications. It is interesting to note that Tucker realised that one of Penton's other prototypes, a system that reads out email over the phone, is applicable to another disabled community - blind people.
The Telgo323 prototype currently works in only one direction - from you to Mom. The main reason is that TTS technology actually works pretty well right now, but STT is another matter. It is tough to train an STT tool over the phone, and the open source STT tools really do not work very well, even on a PC with a sophisticated sound card. That is why Telgo323 is designed to "plug and play" the STT tool so that as the technology improves, we can slot a new tool into the Telgo323 architecture. The generalised design also allows us to port the application to various domains. We are currently porting Telgo323 to the Session Initiation Protocol (SIP) at UWC.
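One way to picture the "plug and play" idea is a small interface that any STT tool can satisfy, so the relay never depends on a particular engine. The sketch below is an assumption about how such a design could look, not the actual Telgo323 code: `SpeechToText`, `FixedPhraseSTT` and `VoiceToTextLeg` are hypothetical names, and the stand-in engine returns a canned phrase instead of recognising audio.

```python
from typing import Protocol


class SpeechToText(Protocol):
    """Anything with a transcribe() method can be slotted into the relay."""

    def transcribe(self, audio: bytes) -> str: ...


class FixedPhraseSTT:
    """Stand-in engine for testing: ignores the audio, returns a canned phrase."""

    def __init__(self, phrase: str) -> None:
        self._phrase = phrase

    def transcribe(self, audio: bytes) -> str:
        return self._phrase


class VoiceToTextLeg:
    """The voice-to-Teldem direction of the relay, independent of the engine."""

    def __init__(self, engine: SpeechToText) -> None:
        self._engine = engine

    def swap_engine(self, engine: SpeechToText) -> None:
        """Slot in a better STT tool without touching the rest of the relay."""
        self._engine = engine

    def relay(self, audio: bytes) -> str:
        # Upper-case the result, since Baudot text has a single case.
        return self._engine.transcribe(audio).upper()


leg = VoiceToTextLeg(FixedPhraseSTT("hello dear"))
first = leg.relay(b"\x00\x01")  # placeholder bytes standing in for audio
leg.swap_engine(FixedPhraseSTT("the weather is fine"))
second = leg.relay(b"\x00\x01")
print(first)
print(second)
```

Because the relay only calls `transcribe`, replacing the engine is a one-line change, which is the property the architecture relies on as STT tools improve.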
The reason for the SIP port, TelgoSIP, is that we want to open up the accessibility to a range of voice users on the Internet. Because of the inherent capabilities of H.323 and SIP gateways (entities that bridge between the PSTN and Voice over IP (VoIP) worlds), Telgo323 and TelgoSIP will not only bridge between the Teldem and the PSTN, but also between the Teldem and any IP-situated "softphone". A softphone is a VoIP-enabled chat tool like Dialpad or OpenPhone. Terminating a VoIP call is still mostly illegal in South Africa. See the Department of Communications' website for more details. In the future, though, we expect VoIP to be legalised, and also to completely overtake and replace the PSTN as we know it. Therefore, the Telgo323 architecture not only scales to future advances in STT and TTS technologies, but it also scales to the ongoing convergence in the telecommunications industry.
This leads to another fundamental advantage of our Deaf Telephony bridge. In effect, if we can manage to bridge between a Deaf and a hearing user with this technology, we can also use it to bridge between text and voice users on the Internet and the PSTN, regardless of how or where they are connected. Theoretically, we could bridge between a cellphone SMS sender and a voice user on the Internet or the PSTN. Likewise, we could bridge a Personal Digital Assistant (PDA) on a wireless Local Area Network (LAN) to an Internet-connected IM user using voice or text. We could also bridge IM to a voice or text user on a landline or cellphone (yes, landline phones can already support some degree of texting). The possibilities are endless, and each of these bridges needs to be built and then tested in an actual user community.
The first bridge to trial is the Deaf Telephony bridge, Telgo323/SIP. However, we still need to establish and trial the manual bridge (human relay as described above) as a baseline for several reasons. The human relay call centres are already in place throughout the developed world, subsidised mostly by government and the relevant telco. As yet, Telkom is resistant to providing this service, even on a small scale for research purposes. We feel that STT technology is currently limited to domain-oriented vocabulary, e.g. weather and cities, and that Telgo323 requires a general purpose open vocabulary system that is just not feasible at the moment. Aside from the research aspect, there also needs to be market take-up. The 4 million hearing impaired/Deaf people in South Africa will not take up an automated system that does not work well (due to poor generalised STT) no matter how cheap (or free) it is. Therefore, our approach would be to utilise a human relay call centre in order to 1) establish a market, 2) assess how manual relay is used in order to incorporate requirements into the automated Telgo323/SIP system, and 3) use the human relay as a benchmark against which to measure the automated bridge.
The trial outcomes will obviously feed into the research and development cycles of the Deaf Telephony Bridge, and the bridges to follow. In the end, the goals are to find out if Mom and you are really going to use this system or not, and to learn how to build usable, scalable and billable bridges. The collaboration continues . . . . The work has been, and continues to be, partially sponsored by the Telkom/Siemens/THRIP Centre of Excellence in ATM and Broadband Networks and their Applications at UCT, the Telkom/DiData/Lucent/THRIP Centre of Excellence in Distributed Multimedia at Rhodes and the recently launched Telkom/Cisco/THRIP Centre of Excellence in IP and Internet Computing at UWC. A website with links to all of Telkom's Centres of Excellence can be found at www.botany.uwc.ac.za/coe