Multilingual access to large spoken archives

Douglas W. Oard

doi:10.3115/1067807.1067809

JOURNAL ARTICLE

Year: 2003 Vol: 1 Pages: 1-1

Abstract

Spoken word collections promise access to unique and compelling content, and most of the technology needed to realize that promise is now in place. Decreasing storage costs, increasing network capacity, and the availability of software to encode and exchange digital audio make possible physical access to spoken word collections at a previously unimaginable scale. Effective support for intellectual access (the problem of finding what you are looking for) is much more challenging, however. In this talk I will briefly describe work that has been done on this problem at the Text Retrieval Conferences, the Topic Detection and Tracking evaluations, and in individual research projects around the world. I will then describe a unique resource, a collection of 116,000 hours of oral history interviews recorded in 32 languages in 57 countries that has been assembled by the Survivors of the Shoah Visual History Foundation. Nearly 10,000 hours of this audio has been manually segmented, summarized and indexed, making this an unrivaled resource with which we can explore a broad array of data-driven techniques. My main focus will be to explain how we are leveraging this exceptional resource to develop the ability to index similar materials automatically.

Keywords:

Computer science, Resource (disambiguation), Focus (optics), World Wide Web, Digital library, ENCODE, Index (typography), Data access, Multimedia, Information retrieval, Data science, Database, Linguistics

Topics

Natural Language Processing Techniques

Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

Access to large spoken archives: Uses and technology. Sponsored by SIG VIS

Disclosing Spoken Culture: User Interfaces for Access to Spoken Word Archives

Situated Interaction in a Multilingual Spoken Information Access Framework

Robust named entity extraction from large spoken archives

Automated transcription and topic segmentation of large spoken archives