Data modelling in corpus linguistics: How low may we go?

Authors: Marjolein H. van Velzen^a,1 (marjoleinvvelzen@gmail.com); Luca Nanetti^a,1; Peter P. de Deyn^a,b,c
Keywords: Akaike Information Criterion; Probable Alzheimer's Disease; Linguistic analysis; Linguistic profiling; Data modelling
Journal: Cortex
Publication date: June 2014
Year: 2014
Volume: 55
Issue: Complete
Pages: 192-201
Full-text size: 1573 K

Abstract

Corpus linguistics allows researchers to process millions of words. However, the more words we analyse, i.e., the more data we acquire, the more urgent the call for correct data interpretation becomes. In recent years, a number of studies have attempted to profile the linguistic decline of some prolific authors, linking this decline to pathological conditions such as Alzheimer's Disease (AD). However, in line with the nature of the (literary) work that was analysed, numbers alone do not suffice to ‘tell the story’. The one and only objective of using statistical methods for the analysis of research data is to tell a story – what happened, when, and how.

In the present study we describe a computerised but individualised approach to linguistic analysis – we propose a unifying approach, with firm grounds in Information Theory, that, independently of the specific parameter being investigated, guarantees to produce a robust model of the temporal dynamics of an author's linguistic richness over his or her lifetime. We applied this methodology to six renowned authors with an active writing life of four decades or more: Iris Murdoch, Gerard Reve, Hugo Claus, Agatha Christie, P.D. James, and Harry Mulisch. The first three were diagnosed with probable Alzheimer's Disease, confirmed post-mortem for Iris Murdoch; this same condition was hypothesized for Agatha Christie. Our analysis reveals different evolutive patterns of lexical richness, in turn plausibly correlated with the authors' different conditions.
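
The abstract does not spell out the modelling procedure, so the following is only a minimal sketch of how the Akaike Information Criterion (listed among the keywords) is commonly used to choose between candidate models of a richness measure over time. It is not the authors' implementation. The function names (`fit_polynomial`, `aic`, `best_degree`), the restriction to polynomial models, the least-squares form of the AIC, and the `years` and `richness` values are all assumptions made for illustration; the data are synthetic and do not come from any of the six authors.

```python
import math

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [[sum(x ** (i + j) for x in xs) for j in range(m)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(m):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    # Coefficients, lowest order first.
    return [rows[i][m] / rows[i][i] for i in range(m)]

def aic(xs, ys, coeffs):
    """AIC for a least-squares fit assuming Gaussian errors: n*ln(RSS/n) + 2k."""
    n = len(xs)
    rss = sum((y - sum(c * x ** p for p, c in enumerate(coeffs))) ** 2
              for x, y in zip(xs, ys))
    k = len(coeffs) + 1  # assumed convention: coefficients plus error variance
    return n * math.log(rss / n) + 2 * k

def best_degree(xs, ys, max_degree=2):
    """Return the polynomial degree with the lowest AIC, plus all scores."""
    scores = {d: aic(xs, ys, fit_polynomial(xs, ys, d))
              for d in range(max_degree + 1)}
    return min(scores, key=scores.get), scores

# Synthetic, hypothetical data: one richness value per book, indexed by
# years since the first publication.
years = list(range(10))
noise = [0.004, -0.003, 0.002, -0.005, 0.003, -0.002, 0.005, -0.004, 0.001, -0.001]
richness = [0.50 - 0.01 * x + e for x, e in zip(years, noise)]

degree, scores = best_degree(years, richness)
print("selected degree:", degree)
```

The penalty term `2 * k` is what keeps the selection from always favouring the most flexible curve: a higher-degree model is chosen only if it lowers the residual error enough to pay for its extra parameter.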
