Sampling techniques in metalexicographic research
A careful examination of lexicographic papers reveals that sampling techniques are generally neglected by metalexicographers. Authors rarely document, still less discuss, the sampling schemes used. This is surprising, given that many researchers do sample when they wish to make generalizations about the whole dictionary text, which is usually too large to be studied in its entirety. Samples consisting of a single stretch, usually selected judgmentally, are not infrequently used to draw inferences about the whole dictionary text and serve as a basis for statistical analysis, which produces results of uncontrolled reliability. This study aims both at exposing the pitfalls of currently used sampling techniques and at proposing probability sampling instead. Two basic probability sampling schemes were examined: simple random and stratified selection of pages. Additionally, systematic sampling was evaluated empirically. Censuses of three dictionaries, with three characteristics examined in each, confirmed my concerns regarding single-stretch sampling. Simple random selection of pages and systematic sampling produced, as expected, far more satisfactory results in virtually all cases. These results can, however, be improved by stratification in the case of entry-based characteristics in larger dictionaries. The mean number of entries per page, which constitutes a page-based characteristic in this study, did not benefit from stratification. The smallest of my dictionaries presented a range of problems, mostly connected with stratified sampling. Furthermore, empirical evaluation of the sampling techniques proposed in Coleman and Ogilvie (2009) demonstrated that randomization within strata is also crucial.
This is the full, unpublished version of a paper published under the same title in the proceedings of the XIV EURALEX International Congress.
Sampling, Dictionaries, Lexicography, Metalexicography, Sample, Statistics
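
The three page-selection schemes named in the abstract (simple random, systematic, and stratified with randomization within strata) can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the 1000-page dictionary, the sample size of 20, and the four equal contiguous strata are all assumptions chosen for illustration.

```python
import random


def simple_random_pages(n_pages, sample_size, rng):
    """Draw pages without replacement; every page is equally likely."""
    return sorted(rng.sample(range(1, n_pages + 1), sample_size))


def systematic_pages(n_pages, sample_size, rng):
    """Take pages at a fixed interval after a random start in the first interval."""
    step = n_pages / sample_size
    start = rng.random() * step  # random start in [0, step)
    # Clamp guards against floating-point rounding at the upper end.
    return [min(int(start + i * step) + 1, n_pages) for i in range(sample_size)]


def stratified_pages(strata, per_stratum, rng):
    """Draw pages at random within each stratum.

    `strata` is a list of (first_page, last_page) ranges; how the dictionary
    is divided into strata is an assumption left to the caller.
    """
    pages = []
    for first, last in strata:
        pages.extend(rng.sample(range(first, last + 1), per_stratum))
    return sorted(pages)


# Hypothetical 1000-page dictionary, 20 sampled pages, fixed seed for repeatability.
rng = random.Random(0)
srs = simple_random_pages(1000, 20, rng)
systematic = systematic_pages(1000, 20, rng)
strata = [(1, 250), (251, 500), (501, 750), (751, 1000)]
stratified = stratified_pages(strata, 5, rng)
```

By contrast, a single-stretch sample would correspond to taking, say, pages 1 to 20 only, so that no other part of the dictionary has any chance of selection.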