Tambov
All-Russian academic journal
“Issues of Cognitive Linguistics”

IDENTIFICATION OF GENDER OF AUTHORS OF WRITTEN TEXTS BASED ON QUANTITATIVE PARAMETERS: COGNITIVE APPROACH

Authors:  T.A. Litvinova, O.V. Zagorovskaya, P.V. Seredin

Affiliation:  National Research Centre Kurchatov Institute

Abstract:  A text is a product of speech and cognition and therefore reflects the mental activity of its author, including the author's gender. The article examines how the gender of the author of a written text can be identified by analyzing quantitative parameters that are present in any text regardless of genre and subject matter and that the author can neither consciously control nor imitate. This property is of particular importance for forensic linguistics. The authors show that researchers in Russia have collected a large amount of data on the differences between texts written by men and by women, but little effort has been devoted to identifying the gender of a text's author. We propose a fairly accurate approach to identifying the gender of authors of Russian texts. It relies on a representative text corpus, methods of mathematical statistics, and automatic text processing. Using this approach, we identified a number of features of the cognitive activity of men and women that are reflected in the quantitative parameters of their texts. The approach is shown to become more effective when a wider range of linguistic parameters is used and when gender-indicative features are examined in light of the authors' individual traits (femininity/masculinity, lateral brain organization, etc.), which gives a more complete picture of the gender-specific cognitive features of text authors.
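The abstract names the ingredients of the approach (a corpus, statistical methods, automatic text processing) but not the concrete parameters or classifier. The following sketch is purely illustrative and rests on assumptions that do not come from the article: three hypothetical quantitative parameters (mean word length, type-token ratio, and the share of words from a small hand-picked list of Russian function words) and a nearest-centroid classifier over z-scored features, used here as a simple stand-in for the authors' statistical methods. It only shows the general pattern of turning a text into topic-independent numbers and assigning a class by comparison with labelled training texts; it is not the authors' method.

```python
import re
from statistics import mean, pstdev

# HYPOTHETICAL list of frequent Russian function words (conjunctions,
# prepositions, particles, pronouns). It is NOT the parameter set of the article.
FUNCTION_WORDS = {"и", "в", "не", "на", "я", "что", "он", "она", "с", "а",
                  "но", "как", "к", "по", "это"}

WORD_RE = re.compile(r"[а-яёa-z]+(?:-[а-яёa-z]+)*")


def tokenize(text):
    """Split a text into lowercase word tokens (Cyrillic or Latin letters)."""
    return WORD_RE.findall(text.lower())


def features(text):
    """Compute three topic-independent quantitative parameters of a text."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    n = len(tokens)
    return [
        mean(len(t) for t in tokens),                  # mean word length
        len(set(tokens)) / n,                          # type-token ratio
        sum(t in FUNCTION_WORDS for t in tokens) / n,  # share of function words
    ]


def fit(texts, labels):
    """Z-score the features and store one centroid per class label."""
    rows = [features(t) for t in texts]
    columns = list(zip(*rows))
    mu = [mean(c) for c in columns]
    sd = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance

    def standardize(vec):
        return [(x - m) / s for x, m, s in zip(vec, mu, sd)]

    centroids = {}
    for label in set(labels):
        z_rows = [standardize(r) for r, lab in zip(rows, labels) if lab == label]
        centroids[label] = [mean(c) for c in zip(*z_rows)]
    return standardize, centroids


def predict(model, text):
    """Assign the label whose centroid is nearest (squared Euclidean distance)."""
    standardize, centroids = model
    vec = standardize(features(text))

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))

    return min(centroids, key=distance)
```

Given a labelled training corpus, `model = fit(texts, labels)` followed by `predict(model, new_text)` returns the label of the nearest class centroid. Standardizing the parameters first keeps one with a large numeric range, such as mean word length, from dominating the distance.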

Keywords:  corpus, corpus linguistics, gender, authorship profiling, mathematical linguistics.

References:  Boldyrev N.N. Ocenochnaya metaprezentaciya: problemy izucheniya i opisaniya // Kognitivnye issledovaniya yazyka. 2009. Vyp. 5. S. 43-51.
Gomon T. V. Issledovanie dokumentov s deformirovannoj vnutrennej strukturoj: avtoref. dis. ... kand. yuridicheskix nauk. M., 1990.
Goroshko E.I. Osobennosti muzhskogo i zhenskogo verbal'nogo povedeniya (psixolingvisticheskij analiz): dis. ... kand. filol. nauk, M., 1996.
Engalychev V.F., Belyanin V.P., Konstantinova E.S., Oshhepkova E.S. Psixolingvisticheskie osobennosti «muzhskogo» i «zhenskogo» yazykov // Trudy regional'nogo konkursa nauchnyx proektov v oblasti gumanitarnyx nauk. Kaluga, 2001. S. 177-187.
Ermolova E.I. Problema opredeleniya pola i vozrasta avtora anonimnogo dokumenta po priznakam pis'mennoj rechi // E'kspert-kriminalist. 2008. № 4. S. 16-18.
Zagorovskaya O.V., Litvinova T.A., Litvinova O.A. E'lektronnyj korpus studencheskix e'sse na russkom yazyke i ego vozmozhnosti dlya sovremennyx gumanitarnyx issledovanij // Mir nauki, kul'tury i obrazovaniya. 2012. № 3 (34). S. 387-389.
Kirillina A.V. Gendernye aspekty yazyka i kommunikacii: dis. ... d-ra filol. nauk. M., 2000.
Litvinova T.A. Profilirovanie avtora pis'mennogo teksta // Yazyk i kul'tura. 2013a. № 3 (23). S. 64-72.
Litvinova T.A. Formal'no-grammaticheskie korrelyaty lichnostnyx osobennostej avtora pis'mennogo teksta // Filologicheskie nauki. Voprosy teorii i praktiki. 2013b. № 12 (30). Ch. 1. S. 132-135.
Litvinova T.A., Litvinova O.A. Identifikaciya i diagnostirovanie lichnosti avtora pis'mennogo teksta. Voronezh, 2015.
Lyashevskaya O.N., Sharov S.A. Chastotnyj slovar' sovremennogo russkogo yazyka (na materialax Nacional'nogo korpusa russkogo yazyka). M., 2009.
Oshhepkova E.S. Identifikaciya pola avtora po pis'mennomu tekstu (leksiko-grammaticheskij aspekt): dis. ... kand. filol. nauk. M., 2003.
Rezanova Z.I., Romanov A.S., Meshheryakov R.V. Zadachi avtorskoj atribucii teksta v aspekte gendernoj prinadlezhnosti // Vestnik Tomskogo gosudarstvennogo universiteta. 2013. № 370. S. 24-28.
Rekomendacii Mezhdunarodnoj nauchno-prakticheskoj konferencii «Teoriya i praktika sudebnoj e'kspertizy i kriminalistiki». Xar'kov, 2002.
Miranda García A., Calle Martín J. Function words in authorship attribution studies // Literary and Linguistic Computing. 2007. Vol. 22 (1). P. 49-66.
Argamon S., Koppel M., Pennebaker J., Schler J. Automatically profiling the author of an anonymous text // Communications of the ACM. 2009. Vol. 52 (2). P. 119-123.
Blum D. Sex on the Brain: The Biological Differences between Men and Women. Viking Press, 1997.
Heylighen F., Dewaele J. Variation in the contextuality of language: an empirical measure // Foundations of Science. 2002. Vol. 6. P. 293-340.
Kimura D. Sex and cognition. Cambridge, 2000.
Koppel M., Argamon S., Shimoni A. Automatically categorizing written texts by author gender // Literary and Linguistic Computing. 2002. Vol. 17 (4). P. 401-412.
Litvinova T.A., Seredin P.V., Litvinova O.A. Using Part-of-Speech Sequences Frequencies in a Text to Predict Author Personality: a Corpus Study // Indian Journal of Science and Technology. 2015. Vol. 8 (9). P. 93-97.
Miller D.I., Halpern D.F. The new science of cognitive sex differences // Trends in Cognitive Sciences. 2014. Vol. 18 (1). P. 37-45.
Mulac A., Lundell T.L. Effects of gender-linked language differences in adults’ written discourse: Multivariate tests of language effects // Language & Communication. 1994. Vol. 14 (3). P. 299-309.
Newman M.L., Groom C.J., Handelman L.D., Pennebaker J.W. Gender differences in language use: An analysis of 14,000 text samples // Discourse Processes. 2008. Vol. 45 (3). P. 211-236.
Nini A. Authorship Profiling in a Forensic Context. PhD thesis. Aston University, 2014.
Rangel F. et al. Overview of the 3rd Author Profiling Task at PAN 2015 // CLEF 2015 Labs and Workshops, Notebook Papers. Toulouse, France, 8-11 September 2015.
Säily T., Siirtola H., Nevalainen T. Variation in noun and pronoun frequencies in a sociohistorical corpus of English // Literary and Linguistic Computing. 2011. Vol. 26 (2).
Schler J., Koppel M., Argamon S., Pennebaker J. Effects of Age and Gender on Blogging // Proceedings of the AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs. 2006.

Pages:  51-59
