Measuring Keyness by S. Evert
On February 28, 2023, our FAU MoD member, Prof. Dr. Stephanie Evert, will give a keynote lecture on “Measuring Keyness” at the “Workshop: Using and Developing Software for Keyness Analysis”, organised within the scope of the project “Zeta and Company” at Trier University (Germany).
Abstract. In corpus linguistics, the notion of keywords refers to words (and sometimes also multiword units, semantic categories or lexico-grammatical constructions) that “occur with unusual frequency in a given text” (Scott 1997: 236) or a text collection, i.e. a corpus. Keywords are deemed to represent the characteristic vocabulary of the target text or corpus and thus have many applications in corpus linguistics, digital humanities and computational social science. They can capture the aboutness of a text, the terminology of a text genre or technical domain, important aspects of literary style, linguistic and cultural differences, etc.; they give insight into historical perspectives and provide a basis for measuring the similarity of text collections. Keywords are also an important starting point for corpus-based discourse analysis, where manually formed clusters of keywords represent central topics, actors, metaphors, and framings. Since this process is guided from the outset by human understanding, it provides a more interpretable alternative to topic models in hermeneutic text analysis.
Keywords are usually operationalised in terms of a statistical frequency comparison between the target corpus and a reference corpus. Different research questions can be addressed depending on the particular constellation of target T and reference R, e.g. (i) T = a single text vs. R = a text collection (➞ aboutness), (ii) T and R = collections of articles on the same topic in left-leaning and right-leaning newspapers (➞ contrastive framings), or (iii) T = texts from a given domain or genre vs. R = a large general-language reference corpus (➞ terminology).
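The abstract leaves the choice of keyness measure open; as a concrete illustration of such a frequency comparison, here is a minimal sketch of one widely used measure, Dunning's log-likelihood ratio (G²), computed from a word's frequency in the target and reference corpora. The function name and the example frequencies are illustrative, not taken from the talk.

```python
import math

def log_likelihood(f_target, n_target, f_ref, n_ref):
    """Log-likelihood ratio (G2) for a word occurring f_target times
    in a target corpus of n_target tokens and f_ref times in a
    reference corpus of n_ref tokens."""
    # Expected frequencies under the null hypothesis that the word
    # has the same relative frequency in both corpora
    p = (f_target + f_ref) / (n_target + n_ref)
    e_target = n_target * p
    e_ref = n_ref * p
    g2 = 0.0
    if f_target > 0:
        g2 += f_target * math.log(f_target / e_target)
    if f_ref > 0:
        g2 += f_ref * math.log(f_ref / e_ref)
    return 2 * g2

# Hypothetical counts: 150 hits in a 1M-token target corpus vs.
# 100 hits in a 10M-token reference corpus
score = log_likelihood(150, 1_000_000, 100, 10_000_000)
```

A large G² score indicates that the word's frequency in the target corpus deviates strongly from what the reference corpus would predict; candidate keywords are then ranked by this score.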
Although keyword analysis is a well-established approach and has been implemented in many standard corpus-linguistic software tools such as WordSmith, AntConc, SketchEngine, and CQPweb, it is still unclear what the “right” way of measuring keyness is. In this talk, I will discuss the different operationalisations of keyword analysis and survey widely used keyness measures. I will show how the mathematical differences between such measures can be understood intuitively with the help of a topographic map visualisation. I will address the difficulties of evaluating keyness measures and present a comparative evaluation on the task of corpus-based discourse analysis. Finally, I will summarise open questions and problems and speculate on directions that future research may take.
On-site / online