SpeechWeb

A SpeechWeb is a collection of hyperlinked speech applications, accessed remotely by speech browsers running on end-user devices. Links are activated through spoken commands.

History

The idea of surfing the web by voice dates back at least to the work of Hemphill and Thrift in 1995,^[1] who developed a system in which HTML pages were downloaded and processed on client-side computers, enabling voice access to web page content and activation of hyperlinks through spoken commands.

Also in the mid-1990s, researchers at AT&T were discussing the development of a new markup language that would enable the web to be accessed through regular phones. From 1995 to 1999, AT&T, Lucent, Motorola, and IBM all developed their own versions of phone and speech markup languages. These companies created the VoiceXML Forum and jointly designed the Voice Extensible Markup Language (VoiceXML, or VXML), which was accepted by the W3C in 2000. VXML is typically used to create hyperlinked speech applications.^[2] VXML pages include commands for prompting user speech input, invoking recognition grammars, outputting synthesized voice, iterating through blocks of code, calling local JavaScript, and hyperlinking to other remote VXML pages, which are downloaded in a manner similar to the linking of HTML pages in the conventional Web.
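As an illustration of the page structure described above, the following Python sketch embeds a small VXML page and lists the remote pages it links to. The page itself is a made-up example rather than one taken from the cited sources: the URLs, the grammar file name, and the helper function names are all hypothetical, and only standard VoiceXML 2.0 elements (prompt, grammar, filled, if, goto) are used.

```python
import xml.etree.ElementTree as ET
from typing import List

VXML_NS = "http://www.w3.org/2001/vxml"

# A hypothetical VXML page: it prompts the user, applies a recognition
# grammar, and hyperlinks to one of two other (hypothetical) VXML pages.
PAGE = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main">
    <field name="topic">
      <prompt>Say weather or news.</prompt>
      <grammar src="topics.grxml" type="application/srgs+xml"/>
      <filled>
        <if cond="topic == 'weather'">
          <goto next="http://example.org/weather.vxml"/>
        <else/>
          <goto next="http://example.org/news.vxml"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>"""


def hyperlink_targets(page: str) -> List[str]:
    """Return the URLs of the VXML pages that this page can jump to."""
    root = ET.fromstring(page)
    return [el.get("next") for el in root.iter(f"{{{VXML_NS}}}goto")]


def prompts(page: str) -> List[str]:
    """Return the text that the page asks the platform to speak."""
    root = ET.fromstring(page)
    return [el.text for el in root.iter(f"{{{VXML_NS}}}prompt")]


links = hyperlink_targets(PAGE)
spoken = prompts(PAGE)
```

In this sketch the goto elements play the role that anchor elements play in HTML: each one names another page to fetch and interpret.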

Around the same time as the emergence of VXML, a research group at the University of Windsor in Canada was developing an alternative approach. In this approach, speech applications deployed on the web are accessed by client-side speech browsers, which provide the speech-recognition capability. Recognition is tailored to each application by downloading an application-specific recognition grammar from the remote speech application's web site. Input that is recognized by the client-side browser is sent to the remote server, which processes it and returns a text result to the browser for output as synthesized voice. The term SpeechWeb was used in 1999^[3] to describe the collection of hyperlinked speech applications in this architecture. The first SpeechWeb browser was demonstrated at the AAAI Sixteenth National Conference on Artificial Intelligence.^[4]
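The division of labour in this architecture can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the Windsor group's software: the class names, the `speech://` addresses, and the use of an exact-match phrase set as a "grammar" are all hypothetical, and real speech recognition and synthesis are replaced by plain strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set


@dataclass
class SpeechApplication:
    """Stand-in for a remote speech application (hypothetical structure)."""
    grammar: Set[str]                 # application-specific recognition grammar
    answer: Callable[[str], str]      # server-side processing of recognized input
    links: Dict[str, str] = field(default_factory=dict)  # spoken command -> address


class SpeechBrowser:
    """Client-side browser: recognizes locally, leaves processing to the server."""

    def __init__(self, web: Dict[str, SpeechApplication], start: str) -> None:
        self.web = web
        self.address = start
        self.grammar = self._download_grammar()

    def _download_grammar(self) -> Set[str]:
        # Recognition is tailored to the current application by its grammar.
        app = self.web[self.address]
        return set(app.grammar) | set(app.links)

    def hear(self, utterance: str) -> Optional[str]:
        phrase = utterance.strip().lower()
        if phrase not in self.grammar:
            return None               # not recognized: nothing is sent to the server
        app = self.web[self.address]
        if phrase in app.links:       # a spoken hyperlink: move to the other application
            self.address = app.links[phrase]
            self.grammar = self._download_grammar()
            return f"Now at {self.address}"
        return app.answer(phrase)     # text result, to be output as synthesized voice


# Two hypothetical hyperlinked applications.
web = {
    "speech://solar": SpeechApplication(
        grammar={"which moons orbit mars"},
        answer=lambda q: "Phobos and Deimos",
        links={"go to jokes": "speech://jokes"},
    ),
    "speech://jokes": SpeechApplication(
        grammar={"tell me a joke"},
        answer=lambda q: "Knock knock.",
        links={"go to solar": "speech://solar"},
    ),
}
browser = SpeechBrowser(web, "speech://solar")
```

Because the browser only holds the grammar of the application it is currently visiting, a phrase such as "tell me a joke" is rejected until the spoken link "go to jokes" has been followed.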

The term "speechweb" has also been used since the 1990s in a different context, to describe a web-based network of information on speech, language, and speech-language pathology. That network was also intended to provide a meeting place for professionals and for people affected by communication disorders. In addition, the term "speechWeb" has been trademarked by the company PipeBeach, now owned by HP, where it refers to a software product that bridges telephone networks and conventional web servers.

In 2005, it was recognized that very few voice applications were available to the public through the Internet, despite the maturity of VXML at that time. It was also observed that nearly all of the VXML applications that were available had been constructed by people working in commerce and industry. This was in stark contrast to the rapid growth of the conventional web, and the extensive involvement of the public in creating regular web pages, only a few years after the development of HTML. This observation led to the call for a Public-Domain SpeechWeb,^[5] which would be accessible to the public through existing web browsers (with speech plugins) and would contain hyperlinked speech applications created and deployed by the public in a manner analogous to the creation and deployment of HTML pages on the conventional web. A browser for the Public-Domain SpeechWeb was demonstrated at the 16th International World Wide Web Conference, held in Banff, Canada, in 2007.^[6] The browser is a small X+V page that is executed by the freely available Opera browser with the free IBM speech-recognition plugin.

Research groups

Two research groups are developing software to facilitate the construction and deployment of SpeechWeb applications by non-experts:

The "MySpeechWeb" research group at the University of Windsor has developed documentation and software to help people who want to access or create SpeechWeb applications. The group has also created a prototype Public-Domain SpeechWeb containing examples of speech applications, which are available through a portal.
The "w3voice skeleton" research group at the Auditory Media Laboratory, Wakayama University in Japan has created software that facilitates the construction and deployment of speech applications for the Japanese language.

References

^ Hemphill, C. T. and Thrift, P. R. "Surfing the Web by Voice." Proceedings of the Third ACM International Multimedia Conference, San Francisco, 1995, pp. 215–222.
^ Lucas, B. "VoiceXML for Web-based distributed conversational applications." Commun. ACM 43, 9, 2000, pp. 53–57.
^ Frost, R. A. and Chitte, S. "A New Approach for Providing Natural-Language Speech Access to Large Knowledge Bases." Proc. of PACLING ’99, The Conference of the Pacific Association for Computational Linguistics, University of Waterloo, Ontario, Canada, 1999, pp. 82–90.
^ Frost, R. A. "A Natural-Language Speech Interface Constructed Entirely as a Set of Executable Specifications." Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, Orlando, Florida, USA, 1999, pp. 908–909.
^ Frost, R. A. "A call for a public-domain SpeechWeb." Commun. ACM 48, 11, 2005, pp. 45–49.
^ Frost, R. A., Ma, X. and Shi, Y. "A browser for a public-domain SpeechWeb." World Wide Web Conference, Banff, Canada, 2007, pp. 1307–1308.

External links

MySpeechWeb - research group at the University of Windsor
Video demonstration of Public Domain SpeechWeb
