JOURNAL ARTICLE

Review: Hundt, Nesselhauf and Biewer (eds, 2006) Corpus Linguistics and the Web. Amsterdam/New York: Rodopi

Marina Santini

Year: 2009   Journal: Corpora   Vol: 4 (2)   Pages: 209-211   Publisher: Edinburgh University Press

Abstract

Corpus Linguistics and the Web is an edited collection of articles based on papers presented at the symposium ‘Corpus Linguistics – Perspectives for the Future’, held in Heidelberg in 2004, together with articles commissioned from leading scholars (p. 4). The book is a comprehensive, insightful and well-structured compendium of the advantages and disadvantages of using web data for linguistic description and corpus compilation. Its main message, taken as a whole, is that traditional corpora and web data can complement each other. The book is a good resource for corpus linguists who find traditional corpora too small, or not sufficiently representative, for their research. It can also be useful for computational linguists and information scientists who are interested in linguistic and textual features.
The book begins with a short introduction by the three editors, Hundt, Nesselhauf and Biewer, that summarises the main issues and perspectives. The volume is divided into four parts, each containing a variable number of articles.
The first part, ‘Accessing the web as corpus’, describes the benefits and the pitfalls of using data from the web. On the one hand, shortcomings such as the impossibility of replication and the absence of meta-data (Lüdeling et al.; Fletcher) must be kept in mind when assessing findings from web data. On the other hand, the richness and freshness of web material (Fletcher; Renouf et al.) seem to outweigh the downside, and encourage the development of web-as-corpus applications and similar initiatives (e.g., WaCky, WebKWIC and WebCorp). One major drawback of the web-as-corpus approach, however, is its reliance on commercial search engines (like Google), whose linguistic sensibilities are very rough. These engines select the relevant pages for a search using opaque criteria, so the results they return require tedious refinement.
The second part, ‘Compiling corpora from the internet’, focusses on the construction of corpora from the web – an unrivalled textual reservoir in terms of size and new genres or registers. For instance, Hoffmann takes advantage of the plenitude of publicly available CNN transcripts in order to create a specialised corpus of spoken English. Similarly, Claridge builds a corpus of public message board postings and examines how interaction and stance markers are distributed in this genre of computer-mediated

Keywords:
Corpus linguistics; Computer science; Compendium; Linguistics; Computational linguistics; Complement (music); World Wide Web; Applied linguistics; Impossibility; Artificial intelligence; Natural language processing; Political science; Philosophy

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.07

Topics

Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence
Lexicography and Language Studies
Social Sciences →  Arts and Humanities →  Language and Linguistics
