Review: Hundt, Nesselhauf and Biewer (eds, 2006) Corpus Linguistics and the Web. Amsterdam/New York: Rodopi

Marina Santini

doi:10.3366/e1749503209000318

Year: 2009 Journal: Corpora Vol: 4 (2) Pages: 209-211 Publisher: Edinburgh University Press

Abstract

Corpus Linguistics and the Web is an edited collection of articles based on papers presented at the symposium ‘Corpus Linguistics – Perspectives for the Future’, held in Heidelberg in 2004, together with articles commissioned from leading scholars (p. 4). The book is a comprehensive, insightful and well-structured compendium of the advantages and disadvantages of using web data for linguistic description and corpus compilation. Its main message, taken as a whole, is that traditional corpora and web data can complement each other. The book is a good resource for corpus linguists who find traditional corpora too small, or insufficiently representative, for their research. It can also be useful to computational linguists and information scientists interested in linguistic and textual features.
The book begins with a short introduction by the three editors, Hundt, Nesselhauf and Biewer, that summarises the main issues and perspectives. The volume is divided into four parts, each containing a different number of articles.
The first part, ‘Accessing the web as corpus’, describes the benefits and pitfalls of using data from the web. On the one hand, shortcomings such as the impossibility of replication and the absence of metadata (Lüdeling et al.; Fletcher) must be kept in mind when assessing findings based on web data. On the other hand, the richness and freshness of web material (Fletcher; Renouf et al.) seem to outweigh the downsides, and encourage the development of web-as-corpus applications and similar initiatives (e.g., WaCky, WebKWIC and WebCorp). One major drawback of the web-as-corpus approach, however, is its reliance on commercial search engines such as Google, which are linguistically unsophisticated. These engines select the pages relevant to a query by opaque criteria, so the results they return need tedious refinement.
The second part, ‘Compiling corpora from the internet’, focusses on the construction of corpora from the web, an unrivalled textual reservoir in terms of both size and new genres or registers. For instance, Hoffmann takes advantage of the wealth of publicly available CNN transcripts to create a specialised corpus of spoken English. Similarly, Claridge builds a corpus of public message board postings and examines how interaction and stance markers are distributed in this genre of computer-mediated