User-generated content (UGC) in online English dictionaries

Lew, Robert
My University, 2013
Keywords: user-generated content, UGC, user-created content, UCC, bottom-up lexicography, dictionaries, online dictionaries, English, Web 2.0, New media, Wiktionary, Urban Dictionary, Wordnik, Wikipedia
Date: 2013-03-08
Language: en
Type: Preprint
URI: http://hdl.handle.net/10593/5011
Note: Preprint of an article due to appear in 2013 in: OPAL - Online publizierte Arbeiten zur Linguistik. (Issue edited by Annette Klosa and Andrea Abel.)

In what is referred to as Web 2.0, the next interactive stage in internet experience, web users are no longer passive recipients of packaged content. Increasingly, users actively contribute to the creation and provision of self-made content. This also means that their social roles become blurred. In the context of online lexicography, the tendency translates into dictionary users becoming home-grown lexicographers. While a UGC-driven model, perhaps best known from Wikipedia, has been a resounding success in terms of encyclopedic content (factual data), it does not seem to translate well into lexicographic endeavours, such as Wiktionary. Wikipedia rests on the principle that somewhere out there in the world there are experts on every little bit of knowledge willing to give of their time to freely share their expertise with other people. While this model works surprisingly well for the reporting of encyclopedic facts, it is much less robust when it comes to the job of describing words and expressions of a language: their meaning, pronunciation, morphology, syntax, word combination (collocation and colligation), and usage.
To put it simply, while it is plausible that somewhere out there is an expert on a piece of specialized knowledge (say, a rare species of nettle) who is willing to share their expertise with the world, there is normally no such expert on the meanings of a particular everyday word (say, the word field) who would be capable of teasing apart the senses and providing a nuanced treatment of the many combinations and uses of the word. Rather, we could say that there are too many self-proclaimed language experts with a willingness to share, but their best efforts cannot match the output of professional, trained lexicographers. This is because the quality of lexicographic description depends heavily on a good grasp of lexicographic principles, procedures, and tools of the trade (such as skilled use of corpus data or structural markup). This is the situation for general language, though things do get a bit more blurred at the interface of encyclopedic and linguistic description: specialized vocabulary and terminology. Of the few prestigious online dictionaries for learners of English, Macmillan English Dictionary has been the boldest in embracing the UGC model: the dictionary site invites users to contribute entries to what it calls Open Dictionary. Presumably, the publisher believes in the importance of the sense of community generated when users are given a chance, on the one hand, to contribute their own entries and, on the other hand, to read entries edited by their peers. In this way, casual dictionary users may become more meaningfully (pun intended) involved. Dictionaries also experiment with user-generated content available elsewhere. A prime example of this is Wordnik (wordnik.com), which relies quite heavily on citations from Twitter and images from Flickr. I discuss the potential and dangers of using and reusing user-generated content and issues of quality control, with illustrations from current online dictionaries of English.