Word embeddings are biased: But whose bias are they reflecting?

D. Petreski*, I.C. Hashim

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

From curriculum vitae (CV) parsing to web search and recommendation systems, Word2Vec and other word embedding techniques are increasingly present in everyday interactions in human society. Biases such as gender bias have been thoroughly researched and shown to be present in word embeddings. Most of this research focuses on discovering and mitigating gender bias within the vector space itself. However, whose bias is reflected in word embeddings has not yet been investigated. Besides discovering and mitigating gender bias, it is also important to examine whether a feminine or a masculine-centric view is represented in the biases of word embeddings. In this way, we not only gain more insight into the origins of the aforementioned biases, but also present a novel approach to investigating biases in Natural Language Processing systems. Based on previous research in the social sciences and gender studies, we hypothesize that masculine-centric, otherwise known as androcentric, biases are dominant in word embeddings. To test this hypothesis, we used the largest publicly available English word association test data set. We compared the distance of the responses of male and female participants to cue words in a word embedding vector space. We found that the word embedding is biased towards a masculine-centric viewpoint, predominantly reflecting the worldviews of the male participants in the word association test data set. By conducting this research, we therefore aim to unravel another layer of bias to be considered when examining fairness in algorithms.
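The comparison described above (the distance of male and female participants' responses to a cue word in an embedding space) can be illustrated with a minimal sketch. The abstract does not state which distance measure or aggregation the authors used, so this example assumes cosine distance averaged over each group's responses to a cue, followed by a simple count of the cues for which one group's responses lie closer. The vectors and word associations are made-up toy values, not data from the study, and all function names (`cosine`, `mean_distance`, `closer_group`, `tally`) are hypothetical. A real analysis would load vectors from a trained model such as Word2Vec.

```python
import math
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical toy embedding. These 3-d vectors are invented for illustration
# only; a real analysis would load vectors from a trained model (e.g. Word2Vec).
TOY_EMBEDDING: Dict[str, List[float]] = {
    "apple": [1.0, 0.0, 0.0],
    "fruit": [0.9, 0.1, 0.0],
    "tree": [0.2, 0.9, 0.1],
    "river": [0.0, 0.2, 1.0],
    "boat": [0.1, 0.1, 0.9],
    "water": [0.0, 0.1, 1.0],
}

# Hypothetical association data: cue -> (male responses, female responses).
TOY_ASSOCIATIONS: Dict[str, Tuple[List[str], List[str]]] = {
    "apple": (["fruit"], ["tree"]),
    "river": (["boat", "oar"], ["water"]),  # "oar" is out of vocabulary
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_distance(emb, cue, responses) -> Optional[float]:
    """Mean cosine distance (1 - similarity) from a cue to its responses.

    Out-of-vocabulary responses are skipped; returns None if the cue is
    missing or no response is left to compare (an assumed handling rule).
    """
    if cue not in emb:
        return None
    dists = [1.0 - cosine(emb[cue], emb[r]) for r in responses if r in emb]
    return mean(dists) if dists else None


def closer_group(emb, cue, male_responses, female_responses) -> Optional[str]:
    """Report whose responses lie closer to the cue in the embedding space."""
    d_male = mean_distance(emb, cue, male_responses)
    d_female = mean_distance(emb, cue, female_responses)
    if d_male is None or d_female is None:
        return None  # this cue cannot be compared
    if d_male < d_female:
        return "male"
    if d_female < d_male:
        return "female"
    return "tie"


def tally(emb, associations) -> Dict[str, int]:
    """Count, over all comparable cues, which group's responses were closer."""
    counts = {"male": 0, "female": 0, "tie": 0}
    for cue, (male, female) in associations.items():
        group = closer_group(emb, cue, male, female)
        if group is not None:
            counts[group] += 1
    return counts


if __name__ == "__main__":
    # Prints {'male': 1, 'female': 1, 'tie': 0} for the toy data above.
    print(tally(TOY_EMBEDDING, TOY_ASSOCIATIONS))
```

A per-cue count is only one possible aggregation. Drawing a conclusion like the one reported above would also require a statistical test across many cues, which this sketch does not attempt.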
Original language: English
Pages (from-to): 975-982
Number of pages: 8
Journal: AI and Society
Volume: 38
Early online date: 26 May 2022
Publication status: Published - Apr 2023

Keywords

  • Word embeddings
  • Androcentrism
  • Gender bias
  • Word association test
