VLDB Journal Volume 13, Issue 1, January 2004, Pages 71-85
a Department of Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688, United States
b Department of Computer Science, University of Southern California, Los Angeles, CA 90088, United States
c Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, United States
Abstract
Technology in the field of digital media generates
huge amounts of nontextual information (audio, video, and images), along
with more familiar textual information. The potential for exchange and
retrieval of information is vast and daunting. The key problem in
achieving efficient and user-friendly retrieval is the development of a
search mechanism that delivers as little irrelevant information as
possible (high precision) while ensuring that relevant information is not
overlooked (high recall). The traditional solution employs
keyword-based search. The only documents retrieved are those containing
user-specified keywords. But many documents convey desired semantic
information without containing these keywords. This limitation is
frequently addressed through query expansion mechanisms based on the
statistical co-occurrence of terms. Recall is increased, but at the
expense of deteriorating precision. One can overcome this problem by
indexing documents according to context and meaning rather than
keywords, although this requires a method of converting words to
meanings and the creation of a meaning-based index structure. We have
solved the problem of an index structure through the design and
implementation of a concept-based model using domain-dependent
ontologies. An ontology is a collection of concepts and their
interrelationships that provide an abstract view of an application
domain. With regard to converting words to meaning, the key issue is to
identify appropriate concepts that both describe and identify documents
as well as language employed in user requests. This paper describes an
automatic mechanism for selecting these concepts. An important novelty
is a scalable disambiguation algorithm that prunes irrelevant concepts
and allows relevant ones to associate with documents and participate in
query generation. We also propose an automatic query expansion
mechanism that deals with user requests expressed in natural language.
This mechanism generates database queries with appropriate and relevant
expansion through knowledge encoded in ontology form. Focusing on audio
data, we have constructed a demonstration prototype. We have
experimentally and analytically shown that our model, compared to
keyword search, achieves a significantly higher degree of precision and
recall. The techniques employed can be applied to the problem of
information selection in all media types.
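
As an illustration of the contrast drawn above between exact matching and ontology-based expansion, the following is a minimal sketch, not the authors' implementation. The toy ontology, the document index, the relevance judgments, and all function names are hypothetical and chosen only to show how expanding a request through narrower concepts can raise recall without lowering precision.

```python
# Hypothetical toy ontology: concept -> narrower concepts (not from the paper).
ONTOLOGY = {
    "sport": ["basketball", "football"],
    "basketball": ["nba"],
    "football": [],
    "nba": [],
}

# Hypothetical documents, already indexed by concept rather than by keyword.
DOC_CONCEPTS = {
    "d1": {"nba"},
    "d2": {"football"},
    "d3": {"election"},
    "d4": {"basketball"},
}


def expand(concept):
    """Return the concept plus all of its descendants in the toy ontology."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ONTOLOGY.get(current, []))
    return seen


def retrieve(concepts):
    """Return ids of documents indexed under at least one of the concepts."""
    return {doc for doc, tags in DOC_CONCEPTS.items() if tags & concepts}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed ground truth for a request about basketball.
relevant = {"d1", "d4"}

exact = retrieve({"basketball"})              # match on the single concept only
expanded = retrieve(expand("basketball"))     # also match narrower concepts

print("exact:   ", precision_recall(exact, relevant))
print("expanded:", precision_recall(expanded, relevant))
```

In this toy setting the exact match misses the document indexed under the narrower concept, so recall is lower; the expanded request picks it up. Real ontologies would also need the disambiguation step the abstract describes, which this sketch omits.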