Spoken Corpus Comes To Life
A The compiling of dictionaries has historically been the province of studious professorial types, usually bespectacled, who love to pore over weighty tomes and make pronouncements on the finer nuances of meaning. They were probably good at crosswords and definitely knew a lot of words, but the image was always rather dry and dusty. The latest technology, and simple technology at that, is revolutionising the content of dictionaries and the way they are put together.
B For the first time, dictionary publishers are incorporating real, spoken English into their data. It gives lexicographers (people who write dictionaries) access to a more vibrant, up-to-date vernacular language which has never really been studied before. In one project, 150 volunteers each agreed to discreetly tie a Walkman recorder to their waist and leave it running for up to two weeks. Every conversation they had was recorded. By the time all the data had been collected, the total length of tape was 35 times the depth of the Atlantic Ocean. Teams of audio typists transcribed the tapes to produce a computerised database of ten million words.
C This has been the basis - along with an existing written corpus - for the Language Activator dictionary, described by lexicographer Professor Randolph Quirk as “the book the world has been waiting for”. It shows advanced foreign learners of English how the language is really used. In the dictionary, key words such as “eat” are followed by related phrases such as “wolf down” or “be a picky eater”, allowing the student to choose the appropriate phrase.
D “This kind of research would be impossible without computers,” said Delia Summers, a director of dictionaries. “It has transformed the way lexicographers work. If you look at the word ‘like’, you may intuitively think that its first and most frequent meaning is the verb, as in ‘I like swimming’. It is not. It is the preposition, as in ‘she walked like a duck’. Just because a word or phrase is used doesn’t mean it ends up in a dictionary. The sifting-out process is as vital as ever. But the database does allow lexicographers to search for a word and find out how frequently it is used, something that could only be guessed at intuitively before.”
E Researchers have found that written English works in a very different way from spoken English. The phrase “say what you like” literally means “feel free to say anything you want”, but the evidence shows that in reality it is used to stop the other person from voicing disagreement. The phrase “it’s a question of” crops up on the database over and over again. It has nothing to do with enquiry, and it is one of the most frequent phrases in English, yet it had never appeared in a language learner’s dictionary before. Now it does.
F The Spoken Corpus computer shows how inventive and humorous people are in their use of language, twisting familiar phrases for effect. It also reveals the power of the pauses and noises we use to play for time and to convey emotion, doubt and irony.
G For the moment, those benefiting most from the Spoken Corpus are foreign learners. “Computers allow lexicographers to search quickly through more examples of real English,” said Professor Geoffrey Leech of Lancaster University. “They allow dictionaries to be more accurate and give a feel for how language is being used.” The Spoken Corpus is part of the larger British National Corpus, an initiative carried out by several groups involved in the production of language-learning materials: publishers, universities and the British Library.