DISSERTATION

EXPLORING GENDER BIAS IN LARGE LANGUAGE MODELS

Abstract

Large language models (LLMs) play a significant role in modern human-computer interaction. They have surged in popularity recently and are now widely used for a variety of tasks. However, concerns persist regarding potential biases within these models. This project investigates gender bias in four popular LLMs: GPT-3.5, GPT-4, Gemini, and Llama. The first part of our study analyzes bias using ambiguous sentences across three languages: English, Malayalam, and Tamil. We evaluate whether the LLMs associate occupations with commonly held gender stereotypes by embedding specific professions in our test sentences. By including two low-resource languages, this work expands on prior research conducted exclusively in English. We also compare how the biases differ across the three languages. In the second part of our study, we use GPT-3.5, Gemini, and Llama to generate letters of evaluation for various professions, for both genders. We then analyze the generated content for differences between the two genders, examining factors such as word count, vocabulary size, lexical diversity, readability, and lexical content. In addition, we generate personality profiles for good and bad employees and ask the LLMs to write letters of evaluation for these employees while assigning them a gender. Our findings for part one suggest that strong gender biases exist in all the LLMs across all three languages. In part two, we find differences in the lexical content of letters written for males and females; the findings also suggest the LLMs assign the male gender to bad employees more often than to good employees. This study helps us better understand the biases in large language models and better equips us to use AI in a way that mitigates bias.

Keywords:
Computer science; Linguistics; Natural language processing; Psychology; Philosophy

