Machine-readable dictionaries provide the raw material from which to construct computationally useful representations of the generic vocabulary they contain. Many sublanguages, however, are poorly represented in on-line dictionaries, if represented at all. Vocabularies geared to specialized domains are necessary for many applications, such as text categorization and information retrieval. In this paper I describe research devoted to developing techniques for building sublanguage lexicons via syntactic and statistical corpus analysis coupled with analytic techniques based on the tenets of a generative lexicon.
Takehito Utsuro, Yūji Matsumoto, Makoto Nagao
Sabine Schulte im Walde, Stefan Müller
Antoni Oliver, Irene Castellón Masalles, Lluís Màrquez