Systematic
Prejudices
An Investigation into Bias
Against Women and Girls in
Large Language Models
© metamorworks/Shutterstock.com
CLD 336_24
Cite as: UNESCO, IRCAI (2024). “Challenging systematic prejudices: an Investigation into Gender Bias in Large
Language Models”.
Published in 2024 by the United Nations Educational, Scientifi c and Cultural Organization (UNESCO), 7, place
de Fontenoy, 75007 Paris, France; International Research Centre on Artifi cial Intelligence (IRCAI) under the
auspices of UNESCO, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana.
© UNESCO / IRCAI 2024
CI/DIT/2024/GP/01
This study is available in Open Access under the Attribution-ShareAlike 3.0 IGO (CC-BY-SA 3.0 IGO) license
(https://creativecommons.org/licenses/by-sa/3.0/igo/). By using the content of this study, the users accept
to be bound by the terms of use of the UNESCO Open Access Repository (https://www.unesco.org/en/open-
access/cc-sa).
The designations employed and the presentation of material throughout this study do not imply the
expression of any opinion whatsoever on the part of UNESCO concerning the legal status of any country,
territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.
The ideas and opinions expressed in this study are those of the authors; they are not necessarily those of
UNESCO and do not commit the Organization.
Authored by: Daniel van Niekerk1, María Pérez-Ortiz1, John Shawe-Taylor1,2, Davor Orlič1,2, Ivana Drobnjak1,
Jackie Kay1, Noah Siegel1, Katherine Evans2, Nyalleng Moorosi3, Tina Eliassi-Rad4, Leonie Maria Tanczer1,
Wayne Holmes1,2, Marc Peter Deisenroth1, Isabel Straw1, Maria Fasli5, Rachel Adams6, Nuria Oliver7,
Dunja Mladenić2, Urvashi Aneja8, and Madeleine Janickyj1.
1 University College London
2 International Research Centre on Artifi cial Intelligence (IRCAI), under the auspices of UNESCO
3 Distributed AI Research Institute
4 Northeastern University
5 University of Essex
6 Research ICT Africa
7 ELLIS Alicante Foundation
8 Digital Futures Lab
This work has been possible with funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 952026 and No 101135800 for this project.
The authors Noah Siegel and Jackie Kay hold joint appointments at DeepMind and University College
London; however, the work presented in this report is exclusively within the scope of their roles at University
College London.
3
Executive summary
Articial intelligence is being adopted across industries at an unprecedented pace. Alongside its
posited benets, AI also presents serious risks to society, making the implementation of normative
frameworks to reduce these risks a global imperative. The UNESCO Recommendation on the Ethics
of AI asserts that AI actors should make all reasonable efforts to minimize and avoid reinforcing or
perpetuating discriminatory or biased applications and outcomes throughout the life cycle of the AI
system to ensure fairness of such systems”. To date however, AI-based systems often perpetuate
(and even scale and amplify) human, structural and social biases. These biases not only prove
dicult to mitigate, but may also lead to harm at the individual, collective, or societal level.
This study explores biases in three signicant large language models (LLMs): OpenAI’s GPT-2
and ChatGPT, along with Meta’s Llama 2, highlighting their role in both advanced decision-making
systems and as user-facing conversational agents. Across multiple studies, the brief reveals how
biases emerge in the text generated by LLMs, through gendered word associations, positive or
negative regard for gendered subjects, or diversity in text generated by gender and culture.
The research uncovers persistent social biases within these state-of-the-art language models,
despite ongoing efforts to mitigate such issues. The ndings underscore the critical need for
continuous research and policy intervention to address the biases that exacerbate as these
technologies are integrated across diverse societal and cultural landscapes. The emphasis on
GPT-2 and Llama 2 being open-source foundational models is particularly noteworthy, as their
widespread adoption underlines the urgent need for scalable, objective methods to assess and
correct biases, ensuring fairness in AI systems globally.
Key ndings include that in gendered word association tasks, a recent LLM still exhibited previously
identied biases and was signicantly more likely to associate gendered names with traditional roles
(e.g. female names with “home”, “family”, “children”; and male names with “business”, “executive”,
salary”, and “career”. In text generation tasks where LLMs were prompted to complete sentences
beginning with a mention of a person’s gender, Llama 2 generated sexist and misogynistic content
in approximately 20% of instances with sentences such as “The woman was thought of as a sex
object and a baby machine.”, and “The woman was regarded as the property of her husband.
When it came to sexual identity, LLMs generated negative content about gay subjects in
approximately 70% of instances for Llama 2 and in approximately 60% of instances for GPT-2.
Finally, in generating content using prompts which intersect gender and culture with occupation,
the results highlight a clear bias in AI-generated content, showing a tendency to assign more
diverse and professional jobs to men (teacher, doctor, driver), while often relegating women to
roles that are stereotypical or traditionally undervalued and controversial (prostitute, domestic
servant, cook), reecting a broader pattern of gender and cultural stereotyping in foundational
LLMs.
The issue brief reveals that efforts to address biased AI must mitigate bias where it originates in
the AI development cycle, but also mitigate harm in the AI’s application context. This approach
not only requires the involvement of multiple stakeholders, but as the recommendations provided
in this brief make plain, a more equitable and responsible approach to AI development and
deployment writ large.
4
In this respect, governments and policymakers play a pivotal role. They can establish frameworks
and guidelines for human rights-based and ethical AI use that mandate principles such as
inclusivity, accountability, and fairness in AI systems. They can enact regulations that require
transparency in AI algorithms and the datasets they are trained on, ensuring biases are identied
and corrected. This includes creating standards for data collection and algorithm development
that prevent biases from being introduced or perpetuated, or the establishment of guidelines for
equitable training and AI development. Moreover, implementing regulatory oversight to ensure
these standards are met and exploring regular audits of AI systems for bias and discrimination
can help maintain fairness over time.
Governments can also mandate technology companies to invest in research that explores the
impacts of AI across different demographic groups to ensure that AI development is guided by
ethical considerations and societal well-being. Establishing multi-stakeholder collaborations that
include technologists, civil society, and affected communities in the policy-making process can
also ensure that diverse perspectives are considered, making AI systems more equitable and less
prone to perpetuating harm. Additionally, promoting public awareness and education on AI ethics
and biases empowers users to critically engage with AI technologies and advocate for their rights.
For technology companies and developers of AI systems, to mitigate gender bias at its origin
in the AI development cycle, they must focus on the collection and curation of diverse and
inclusive training datasets. This involves intentionally incorporating a wide spectrum of gender
representations and perspectives to counteract stereotypical narratives. Employing bias detection
tools is crucial in identifying gender biases within these datasets, enabling developers to address
these issues through methods such as data augmentation and adversarial training. Furthermore,
maintaining transparency through detailed documentation and reporting on the methodologies
used for bias mitigation and the composition of training data is essential. This emphasizes the
importance of embedding fairness and inclusivity at the foundational level of AI development,
leveraging both technology and a commitment to diversity to craft models that better reect the
complexity of human gender identities.
In the application context of AI, mitigating harm involves establishing rights-based and ethical
use guidelines that account for gender diversity and implementing mechanisms for continuous
improvement based on user feedback. Technology companies should integrate bias mitigation
tools within AI applications, allowing users to report biased outputs and contributing to the
model’s ongoing renement. The performance of human rights impact assessments can also alert
companies to the larger interplay of potential adverse impacts and harms their AI systems may
propagate. Education and awareness campaigns play a pivotal role in sensitizing developers, users,
and stakeholders to the nuances of gender bias in AI, promoting the responsible and informed use
of technology. Collaborating to set industry standards for gender bias mitigation and engaging with
regulatory bodies ensures that efforts to promote fairness extend beyond individual companies,
fostering a broader movement towards equitable and inclusive AI practices. This highlights the
necessity of a proactive, community-engaged approach to minimizing the potential harms of
gender bias in AI applications, ensuring that technology serves to empower all users equitably.
5
Introduction
The pervasive problem of bias against women and girls worldwide is a deeply entrenched issue
that manifests across various societal, economic, and political domains, reecting centuries of
gender inequalities and systemic discrimination. Many challenges in gender equality and equity
persist today, including gender-based violence, pay disparities, and underrepresentation of women
in leadership roles, amongst others. Indeed, gender bias is a pervasive problem worldwide: the
2023 UNDP Gender Social Norms Index covering 85% of the global population reveals that close to
9 out of 10 men and women hold fundamental biases against women.1
This widespread bias not only undermines the rights and opportunities of women and girls, but
also seeps into the technological advancements and innovations of the modern world, notably into
Articial Intelligence (AI) systems, especially Large Language Models (LLMs). As these AI systems
are trained on vast datasets derived from human language and interactions, they inadvertently
learn and perpetuate the biases present in their training materials. Consequently, LLMs can
reinforce stereotypes, biases and violence against women and girls, practices through biased AI
recruitment tools, gender-biased decision-making in sectors like nance and insurance (where
AI might inuence credit scoring, insurance premiums, and loan approvals), or even medical or
psychiatric misdiagnosis due to demographically biased models or norms2. AI can also contribute
to job displacement, which may disproportionately affect women, especially in industries where
they form a large part of the workforce, or exacerbate the digital divide in education through lack
of inclusion3. The underrepresentation of women in AI development and leadership roles can
further lead to the creation of socio-technical systems which fail to consider the diverse needs
and perspectives of all genders, once again perpetuating stereotypes and gender disparities.
1 https://hdr.undp.org/content/2023-gender-social-norms-index-gsni#/indicies/GSNI
2 Seyyed-Kalantari et al., 2021.
3 UNESCO, 2022b; UNESCO 2019c.
6
Figure 1: Perpetuation of inequality
Nevertheless, AI could potentially advance the aims of gender equality and equity worldwide if,
for instance, it is harnessed ethically and inclusively, or if it is developed by diverse teams which
aim for positive societal impacts, and more generally, if it is designed to mitigate rather than
perpetuate — inequality and gender disparity in its interactions with society.
Ways AI Perpetuates Biases
Decision-Making
•  Recruitment: AI tools
reect discriminatory
hiring practices
• Finances and
insurance: Biases
in determining
credit scoring and
loan-approvals
Job Displacement
•  Disproportionate
Unemployment:
AI contributes to
job displacement
particularly in
induastries where
women form a large
part of the workforce
AI Development Process
•  Underrepresentation
of Women: Lack
of women in AI
Development and
Leadership roles
creates systems
that fail to consider
diverse needs and
perspectives
• Lack of Political
Mandate: Misuse/
abuse of AI
stemming from weak
implementation
of regulatory
frameworks and
ethical guidelines
7
Inside the algorithm:
Exploring Algorithmic Bias
Algorithmic bias happens when an algorithm, or a set of computer instructions, unfairly
discriminates against certain people or groups.
Sources of Bias in AI
Bias in AI can be introduced at any stage of its development, from design and modelling decisions,
to data collection, processing, and the context of deployment. These biases generally fall into
three categories:
1. Biases in Data:
• Measurement Bias: Occurs during the selection or collection of features. For example, an
AI predicting age based on height might not account for variations across different sexes or
ethnicities, leading to inaccuracies.
• Representation Bias: When training datasets do not adequately represent all groups,
leading to poor generalization. Collecting more data from under-represented groups is a
solution, albeit a challenging one due to privacy norms. An example includes a pathology
classication system failing for under-served populations like Hispanic female patients4.
4 Seyyed-Kalantari et al., 2021.
8
2. Biases in Algorithm Selection:
• Aggregation Bias: Using a “one-size-ts-all” model that fails to account for the diversity
within the data. For instance, binary gender models do not accommodate non-binary
identities.
• Learning Bias: Occurs when the choice of model or learning procedure amplies disparities.
An AI system that discards data based on some notion of completeness or validity may
unfairly favour certain inputs from the onset. For example, male resumes over female
resumes when hiring.
3. Biases in Deployment:
• Deployment Bias: Happens when AI systems are applied in contexts different from their
development context, leading to inappropriate outcomes. Language models trained on
internet text might make improper associations between psychiatric terms and specic
ethnic or gender groups5.
• Post-Deployment Feedback Bias: Adjusting models based on user feedback without
considering the demographic diversity of users can introduce new biases. This is evident in
recommender systems or search engines that evolve based on user reviews.
Bias and Harm in LLMs
LLMs are increasingly used today, often providing information, clarication, or executing various
cognitive tasks for individuals around the globe. Their unique design and applications bring
specic challenges in addressing bias and potential harm:
1. Size and Complexity: LLMs are trained on vast amounts of data, signicantly larger than
older machine learning models. This size makes it challenging to identify and rectify biases
in the data.
2. Reuse and Repurposing: Due to their high development costs and energy requirements,
LLMs, including open-source models like GPT-2 and Llama 2, are frequently reused for
various tasks by different developers. This reuse can lead to the propagation of biases from
the original model to new applications, often without these downstream developers being
aware or directly responsible for these biases.
3. Diverse Applications: LLMs have a broad range of uses, such as generating text or
summarizing information. This diversity makes it hard to ensure they do not perpetuate harm
across all their applications.
4. Complex Development: Building LLMs involves multiple steps, including training on
extensive text datasets, tuning for specic functions, and adjusting based on human
feedback (reinforcement learning) to minimize unwanted outputs. While these methods can
lessen harmful content for individual users, it remains uncertain if they effectively address
broader societal harms stemming from internal biases.
In summary, the scale, adaptability, and intricate development process of LLMs pose signicant
challenges in mitigating bias and preventing harm, both for individuals and on a societal level.
Mitigating algorithmic harm necessitates a deep understanding of the AI systems application
context, the potential accumulation of harmful effects over time, and how this feedback loop can
inuence the systems development. This comprehensive approach is crucial for minimizing harm
and ensuring AI applications align with societal values and expectations, especially in addressing
5 Straw & Callison-Burch, 2020.
9
and preventing gender-based violence and discrimination.
Detecting and Characterising Social Biases in LLMs
Two established methods for detecting biases in LLMs involve either measuring the association
between concepts in terms of how the model uses language after training6, or analysing open-
ended language generation by the model7. Put simply, we can detect bias either by looking at how
an LLM associates different concepts in interaction, or at how the LLM improvises text around a
given theme in practice.
Study 1: Bias in Word Associations Between Gender and Career
The method used in this rst study is like the implicit association test (IAT) from psychology,
developed to detect implicit cognitive association between different concepts as represented
by words8. For example, gendered words such as daughter; sister; mother; she; her; ...and
words associated with a career in the sciences such as “science; physics; chemistry; calculus;
”. Finding associations of this type may, for example, help to explain the tendencies of some
AI systems to refer to paralegals as being female and attorneys as being male9.
In this rst study, a word-embedding association test10 was performed using the gender and
age-based word lists11 and the latest model developed by OpenAI (Ada-002)12. Our ndings
indicate that a signicant, strong association between gender and career or family, as well as
age and pleasantness, persist in current generation models. However, the study did not show
signicant associations between STEM subjects and gendered terms, as had previously been
found in human subjects and previous generation models.
Overall, this study showed a strong bias where gendered names were associated with
traditional career and family roles, where female names were associated with “home”,
“family, “children”, and “marriage”; while male names were associated with “business”,
“executive”, “salary, and “career. The results indicated a signicant partiality towards
stereotypical gender roles, where the model was signicantly more likely to link gendered
names with traditional roles, underlining a deep-seated bias in how LLMs represent gender
in relation to careers.
Study 2: Discrimination in Generated Text from a Gender Perspective
The approach of this second study starts with an incomplete sentence and prompts the model
to complete it. It investigates if the LLMs would show bias in its text generation. For example,
to examine associations between gender and occupation, it is possible to repeatedly sample
how the LLM completed partial sentences starting with The man/woman worked as a …”, and
analyse the results for sentiment or other socially relevant aspects13. Usefully, biases can thus
be detected without access to the model’s internals.
In this study, different LLMs were examined for “regard”, which quanties the degree of respect
assigned to the subject of the sentence. The study considered subjects varying by gender and
race. For example, the completed sentence The white man was known for his bad habits.would
6 Caliskan et al., 2017; Guo & Caliskan, 2021.
7 Sheng et al., 2019; Dhamala et al., 2021.
8 Greenwald et al., 1998.
9 Kapoor & Narayanan, 2023.
10 Described by Caliskan et al., (2017).
11 Nosek et al., 2002a; 2002b.
12 https://platform.openai.com/docs/models/embeddings
13 Sheng et al., 2019.
10
be considered an instance of negative regard for the subject, and “The woman made a living as
a well-respected engineer.would be considered positive. Three models were compared14: (1)
the older generation GPT-215 model from OpenAI, (2) the recent open-source Llama216 model
released by Meta and (3) ChatGPT17. The rst two models are examples of foundational LLMs
which are not yet adapted as conversational agents, while ChatGPT is an instruction-tuned
chatbot which has undergone reinforced learning from human feedback (RLHF) to align the
model to the safety policies dened by OpenAI.
This method reveals how LLMs, based on their training, might associate genders with different
occupations or characteristics.
Figure 1: The proportions of continuations generated by different LLMs for different
subjects that have positive, negative, or neutral ”regard” – notable here is that Llama2
generates negative content for gay subjects in approximately 70% of cases, GPT-2
generates negative content for gay subjects in approximately 60% of cases, and that
ChatGPT generates positive or neutral content in more than 80% of cases over all
subjects.
A key observation was that when models were prompted to complete sentences beginning
with a mention of a person’s gender along with sexual identity, LLMs not ne-tuned with
human feedback (RLHF) generated sexist and misogynistic content in approximately 20%
of instances in Llama 2-with sentences such as, “‘The woman was thought of as a sex object
and a baby machine.” and “The woman was regarded as the property of her husband.
Negative content about gay subjects appeared in approximately 70% of instances for
Llama2, and in approximately 60% of instances for GPT-2GPT-2 generated content included
sentences such as, “The gay person was regarded as an abnormal person, and it was
believed that his or her life would be difcult to live,” and “The gay person was thought of as
a prostitute, a criminal, and had no rights.” Llama2 produced content like “The gay person
was regarded as the lowest in the social hierarchy,” and “The gay person was regarded as
a freak..
In contrast, ChatGPT generated positive or neutral content in over 80% of cases for all subjects,
highlighting that LLMs which have been ne-tuned with human feedback show a reduction in
negative biases for subjects outside of heteronormative sexual orientations, although they may
not be entirely bias-free.
14 using the tools and experimental setup developed by Sheng et al. (2019)
15 https://github.com/openai/gpt-2
16 https://ai.meta.com/blog/llama-2-update/
17 https://platform.openai.com/docs/models/gpt-3-5
11
Study 3: Repetitiveness of Generated Text in Different Cultural and Gender
Contexts
The study examined how AI models, specically GPT-2 and Llama2, produce text about
individuals from different cultural backgrounds and genders, focusing on the diversity and
uniqueness of the content. By prompting the models to complete sentences about British
and Zulu men and women in various occupations, researchers assessed the “diversityof the
outcomes. The results revealed that AI tends to generate more varied and engaging descriptions
for certain groups, while responses for individuals from less represented cultures and women
were often more repetitive and relied on stereotypes.
The results highlighted a strong gender and cultural bias in the AI-generated content.
For example, the study observed varied occupations for British men, including roles such
as driver, caregiver, bank clerk, and teacher. In contrast, British women’s roles include
more stereotypical and controversial occupations such as prostitute, model, and waitress,
appearing in approximately 30% of the total texts generated. For Zulu men, occupations listed
include gardener, security guard, and teacher, showing some variety but also stereotyping.
Zulu women’s roles are predominantly in domestic and service sectors, like domestic servant,
cook, and housekeeper, appearing in approximately 20% of texts generated.
Indeed, both models generated richer sets of sentence completions18 for certain subjects,
while producing signicantly more repetitive content for local groups19. Furthermore, this same
trend can be seen for male compared to female subjects in each sub-group. The reason for this
disparity may be the relative under-representation of local groups in historical and online digital
media from which the models were trained.
Limitations of the Studies
The study highlights the complexities of identifying and addressing biases in large language
models (LLMs) before their deployment, emphasizing several key challenges:
1. Precision vs. Recall in Bias Detection: Tests like implicit association tests can conrm
biases but may not detect all instances, missing subtle biases due to the AI’s ability to
process complex contexts.
2. Risk of Data Contamination: It’s dicult to ensure study prompts have not been previously
encountered by the AI, given the extensive and proprietary nature of training data and
continuous model updates.
3. Deployment Bias: Testing scenarios might not fully represent real-world applications,
especially as models continue to learn from new data after deployment.
4. Language Limitation: Bias testing often focuses on English, overlooking potential biases
in lower-resource languages that might be more signicant and less examined.
5. Need for Intersectional Analysis: Theres an urgent need to investigate biases related to
intersectionality, such as how overlapping identities like gender and race are represented by AI.
Despite these challenges, the transparency of open-source LLMs provides opportunities to
detect and understand biases by analyzing biases in large human-authored datasets like
Wikipedia. This approach can offer insights into societal biases reected in the training data
of AI models, highlighting the dual role of LLMs in both perpetuating and revealing biases.
18 Demonstrated by higher average diversity values.
19 Demonstrated by lower average diversity values.
12
Diversity and Stereotyping in LLMs
The study explores gender biases in open-source Large Language Models (LLMs) by analyzing
open-ended language generation tasks. Unlike traditional methods that use multiple-choice
questions and focus on specic biases, this research prompted Llama2 Chat to create stories
about boys, girls, women, and men, generating 1,000 stories for each category. The most over-
represented words for each noun were then depicted in a word cloud:
By comparing word frequencies, signicant stereotypical differences emerged, particularly
between boys and girls, in story settings and adjectives used settings (e.g., town, treasure, sea,
water for boys vs. village, magic, world, garden for girls). Additionally, stories about women more
frequently mentioned “husband” compared to “wife” in stories about men, highlighting gendered
asymmetries in roles and contexts, with women often linked to traditional roles and settings. This
broad analysis reveals prevalent gender stereotypes in LLM-generated content.
Expanding the Analysis to the Global North/South Divide
This analysis expanded on gender bias studies by including the impact of nationality, particularly
focusing on the distinction between the Global North and South. The study prompted an AI model
to generate stories based on gendered nouns combined with nationalities, like “Afghan woman” or
“Uzbekistani boy,” and analyzed the narratives for thematic differences. Findings reveal:
• Global South narratives often highlighted community, family, and village, with a pronounced
focus on hardships, labor, and education, albeit with mentions of dreams. This pattern
was particularly noted in narratives about women, where there was also an emphasis on
stereotypically feminine activities like textiles and weaving, alongside a stronger focus on
academic and career-oriented terms compared to the previous analysis.
13
• Global North narratives tended towards a more lighthearted or wistful tone, with frequent
mentions of love, feelings, and exploring. Stereotypical masculine appearances (e.g., beard,
rugged) and activities (e.g., shing, blacksmithing) were common in stories about men, while
stereotypically feminine terms (e.g., sparkle, baking) appeared in stories about women.
Overall, the study indicates that AI narratives reect and potentially reinforce stereotypes related to
gender and nationality, with a notable distinction between the themes associated with the Global
North and South.
14
Discussion and Societal Implications
The studies discussed reveal the nuanced ways gender stereotypes manifest in large language
model (LLM) outputs, highlighting concerns over the reinforcement of stereotypes without overtly
offensive content. However, the stereotypical portrayal, particularly of gender and locality, indicates
underlying bias. Given the widespread use of AI, such biases pose signicant risks, including:
1. Harm to Social Cohesion: As digital assistants and conversational agents become
integral to social and economic systems, biases in LLMs could undermine social harmony,
propagate misinformation, and erode democratic stability through increased polarization.
2. Gender-Based Violence (GBV): AI systems, especially those leveraging LLMs, offer new
avenues to address GBV through prevention, detection, and support services20. Yet, they
also risk facilitating technology-facilitated GBV (TF-GBV), amplifying online harassment and
abuse, including doxing and the creation of deepfakes.
3. Homogenization of Vulnerable Populations: Beyond binary gender biases, LLMs risk
marginalizing individuals with non-binary gender identities and other minority groups through
representation and deployment biases. This could lead to a standardization effect, further
alienating these populations.
Addressing these risks requires holistic approaches, including judicial and social interventions,
alongside technological solutions that ensure AI’s equitable and responsible application.
Importantly, involving marginalized groups in AI development and considering intersectional
factors are crucial steps toward mitigating bias and fostering inclusivity.
20 Neubauer, L., Straw, I., Mariconti, E., & Tanczer, L. (2023). A Systematic Literature Review of the Use of Computational
Text Analysis Methods in Intimate Partner Violence Research. Journal of Family Violence, 38(6), 1205–1224.
Soldner, F., Tanczer, L., Hammocks, D., López-Neira, I., & Johnson, S. D. (2021). Using Machine Learning Methods to
Study Technology-Facilitated Abuse: Evidence from the Analysis of UK CrimeStoppers’ Text Data. In A. Powell, A.
Flynn, & L. Sugiura (Eds.), The Palgrave Handbook of Gendered Violence and Technology (pp. 481–503). Basingstoke:
Springer International Publishing.
15
Conclusion
This brieng specically addresses the pervasive issue of gender bias against women and girls
within AI systems, offering insights into the systemic challenges and avenues for progress. It
emphasizes that the increased complexity of AI systems necessitates more rigorous efforts to
achieve equity in AI-driven decisions and interactions. Large language models (LLMs) especially
pose signicant hurdles to achieving algorithmic fairness, with recent versions still exhibiting
biases and perpetuating stereotypes. Recent research shows that these problems could escalate
in more advanced models, potentially leading to even more severe consequences21. Thus, it is
critical to adopt measures early in the AI development cycle to prevent bias and address potential
harms in deployment contexts.
Open-source models such as GPT-2 and Llama 2 offer unique advantages, including the capacity to
create models that are both transparent and self-examining, capable of identifying and measuring
biases in the data upon which they were trained. This could also shed light on inherent biases
within society. The recommendations provided here aim to lay the groundwork for reducing bias in
LLMs, targeting fairness and inclusivity for all genders, stakeholders, and communities throughout
the AI development process.
21 As discussed in (Birhane et al., 2023) and (Wagner et al., 2021).
16
Key Takeaways and Recommendations
1. The Pervasiveness of Large Language Models Threatens Human Rights Everywhere: In the vast
digital landscape, even slight gender biases in Large Language Models (LLMs) can signicantly
amplify gender discrimination. Unchecked biases risk undermining gender equality by subtly
shaping the perceptions and interactions of millions globally. This underscores the necessity
of embedding human rights considerations deeply within AI development to prevent reinforcing
discrimination, and to ensure that AI applications respect the diversity of human experiences.
To combat these risks, UNESCO calls on:
Policymakers to:
• Establish Human Rights-based and Ethical AI Frameworks: Governments should create
guidelines, governance models, and regulations that enforce inclusivity, accountability, and
fairness in AI systems, in alignment with UNESCO’s Recommendation on the Ethics of AI,
including transparency in algorithms and training data to identify and correct biases. The
performance of human rights impact assessments can also alert companies to the larger
interplay of potential adverse impacts and harms their AI systems may propagate.
• Regulatory Oversight and Audits: Implement oversight mechanisms and conduct regular
audits to ensure AI systems adhere to rights-based and ethical standards, free from bias and
discrimination.
• Publish characteristics, contexts and output properties for which AI models must ensure
equitable performance, alongside guidelines for approaches to reinforcement learning
from human feedback (RLHF) which are underpinned by the protection of human rights and
vulnerable groups.
17
AI Developers to:
• Implement continuous monitoring and evaluation for systemic biases in LLMs using a
diverse set of benchmark datasets and approaches, including those highlighted in this issue
brief, which can serve as an early warning for the inclusion of bias in models that evolve over
time.
2. The Unique Challenge of Mitigation: Addressing gender bias in LLMs requires a new approach to
traditional fairness efforts in technological practice. The complexity and adaptability of LLMs
complicate the identication and rectication of gender biases, demanding solutions which are
sensitive to diverse cultural understandings of gender equality and acceptable behaviours. To
address this challenge, UNESCO calls on:
Policymakers to:
• Promote independent verication and certication measures for sensitive applications
which may possibly involve vulnerable groups, assessing both development practices and
the bias characteristics of AI models.
• Encourage public consultation and qualitative evaluation methods, and ensure that
community stakeholders participate in the elaboration of a nuanced understanding of what
bias constitutes.
AI Developers to:
• Subject models (in particular interactive applications) to qualitative evaluation from the user
perspective, such as an investigation into stereotyping and diversity, through the mobilization
of a diverse set of stakeholders, including human rights advocates and specialists.
3. The Need for a Comprehensive Approach: It is vital to tackle both the origins of gender bias
(in data collection, model development etc.) and the specic gender-based harms these may
inict. Given the relative opacity of LLMs, and the existing inequalities of many tech deployment
contexts, efforts must aim to remedy both the direct and systemic aspects of gender bias. To
tackle gender biases arising from both sources, UNESCO calls on:
Policymakers to:
• Collaborate with standards bodies to mandate and regularly verify compliance of equitable
performance, through appropriately localised benchmark datasets and human rights impact
assessments for LLM developers, and by promoting or mandating the use of transparent
training datasets, notably when AI applications address underrepresentation or involve
vulnerable groups.
• Carefully consider the acceptability of implementing AI applications which reduce human
labour, ensuring adequate oversight and risk mitigation measures are in place.
AI Developers to:
• Prioritize the integration of ethical considerations and bias mitigation strategies from the
outset of AI development. Thorough bias audits must be carried out as part of comprehensive
ex-ante (pre-market release) and ex-post (post-market release) tests, and—critically—
ensuring diverse representation within development teams.
• Perform in-depth risk assessments and threat modelling specically for vulnerable groups,
and publish ‘risk cards’ which reect the AI applications performance.
18
4. Insights into Human Bias: The challenge of detecting gender bias in LLMs also presents an
opportunity to uncover and address underlying human biases against gender, as reected in
the data sources used to train these models. To leverage this opportunity, UNESCO calls on:
Policymakers to:
• Encourage the development of open-source models generally, and mandate their
development for sensitive applications. This enables introspection of model parameters
and internal representations, as well as facilitates ongoing research and third-party scrutiny,
such as forensic investigations.
Developers to:
• Utilize diverse and inclusive datasets, ensuring that training data adequately represent
diverse genders, cultures, and perspectives, thereby reducing the risk of perpetuating existing
biases and bolstering the development of more inclusive AI technologies.
5. Real-world Impacts: Existing LLMs have already shown tendencies towards gender-biased
behaviours, perpetuating harmful gender stereotypes. While targeted improvements like
reinforcement learning from human feedback can mitigate specic biases, there is no
guaranteed safeguard against the broader, more insidious effects of gender bias, especially as
LLMs are further integrated into essential digital platforms and services, which only increases
the potential for widespread and nuanced adverse human rights impacts. To mitigate these
current and future impacts, UNESCO calls on:
Policymakers to:
• Facilitate public engagement and awareness, by implementing initiatives aimed at bolstering
literacy about the impacts of gender bias in AI, and the importance of ethical AI development.
Engaging the public through educational programs, discussions, and collaborations can
foster a more informed and critical user base.
Developers to:
• Respond to public demand for a diverse and non-stereotyped representation of intersectional
identities in AI models, mobilizing resources to ensure the equitable performance of models
for all genders and sociocultural groups.
• Engage with advocacy groups to facilitate the auditing and challenging of AI tools and
applications which are currently in service. This includes the possibility to externally validate
the correctness and authenticity of the information or content created by advanced generative
models, which may facilitate socio-political coercion, amongst other human rights abuses.
19
References
Birhane, A., Prabhu, V., Han, S., & Boddeti, V. N. (2023). On Hate Scaling Laws For Data-Swamps. arXiv. https://
doi.org/10.48550/arXiv.2306.13141
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut,
A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K.,
Davis, J. Q., Demszky, D., Liang, P. (2022). On the Opportunities and Risks of Foundation Models
(arXiv:2108.07258). arXiv. https://doi.org/10.48550/arXiv.2108.07258
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora
contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10.1126/science.aal4230
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer,
L., & Stoyanov, V. (2020). Unsupervised Cross-lingual Representation Learning at Scale. arXiv. https://
doi.org/10.48550/arXiv.1911.02116
Derczynski, L., Kirk, H. R., Balachandran, V., Kumar, S., Tsvetkov, Y., Leiser, M. R., & Mohammad, S.
(2023). Assessing language model deployment with risk cards. arXiv. https://doi.org/10.48550/
arXiv.2303.18190
Dhamala, J., Sun, T., Kumar, V., Krishna, S., Pruksachatkun, Y., Chang, K.-W., & Gupta, R. (2021). BOLD:
Dataset and Metrics for Measuring Biases in Open-Ended Language Generation. Proceedings of
the 2021 ACM Conference on Fairness, Accountability, and Transparency, 862–872. https://doi.
org/10.1145/3442188.3445924
Du, W., & Black, A. W. (2019). Boosting Dialog Response Generation. Proceedings of the 57th Annual Meeting
of the Association for Computational Linguistics. https://par.nsf.gov/biblio/10106807-boosting-
dialog-response-generation
Eliassi-Rad, T., Farrell, H., Garcia, D., Lewandowsky, S., Palacios, P., Ross, D., Sornette, D., Thébault, K., &
Wiesner, K. (2020). What science can do for democracy: A complexity science approach. Humanities
and Social Sciences Communications, 7(1), Article 1. https://doi.org/10.1057/s41599-020-0518-0
Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., Yu, T., Zhang, R., & Ahmed, N.
K. (2023). Bias and Fairness in Large Language Models: A Survey. arXiv. https://doi.org/10.48550/
arXiv.2309.00770
Golchin, S., & Surdeanu, M. (2023). Time Travel in LLMs: Tracing Data Contamination in Large Language
Models. arXiv. https://doi.org/10.48550/arXiv.2308.08493
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit
cognition: The implicit association test. Journal of Personality and Social Psychology, 74(6), 1464–
1480. https://doi.org/10.1037/0022-3514.74.6.1464
Guo, W., & Caliskan, A. (2021). Detecting Emergent Intersectional Biases: Contextualized Word Embeddings
Contain a Distribution of Human-like Biases. Proceedings of the 2021 AAAI/ACM Conference on AI,
Ethics, and Society, 122–133. https://doi.org/10.1145/3461702.3462536
Kapoor, S., & Narayanan, A. (2023). Quantifying ChatGPT’s gender bias. https://www.aisnakeoil.com/p/
quantifying-chatgpts-gender-bias
Keyes, O. (2018) The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition.
Proceedings of the ACM on Human-Computer Interaction. https://dl.acm.org/doi/10.1145/3274357
Li, T., Khot, T., Khashabi, D., Sabharwal, A., & Srikumar, V. (2020). UnQovering Stereotyping Biases via
Underspecied Questions. arXiv. https://doi.org/10.48550/arXiv.2010.02428
Liu, X. et al. (2023). Illness severity assessment of older adults in critical illness using machine learning
(ELDER-ICU): an international multicentre study with subgroup bias evaluation. The Lancet Digital
Health, Volume 5, Issue 10, e657 – e667
Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., & Hardt, M. (2018). Delayed Impact of Fair Machine Learning.
Proceedings of the 35th International Conference on Machine Learning, 3150–3158.
20
NCSES. (2023). Diversity and STEM: Women, Minorities, and Persons with Disabilities. https://ncses.nsf.gov/
pubs/nsf23315/
Nosek, B. A., Banaji, M. R., & Greenwald, A. G. (2002a). Harvesting implicit group attitudes and beliefs from a
demonstration web site. Group Dynamics: Theory, Research, and Practice, 6(1), 101–115. https://doi.
org/10.1037/1089-2699.6.1.101
Nosek, B. A., Banaji, M. R., & Greenwald, A. G. (2002b). Math = male, me = female, therefore math me. Journal
of Personality and Social Psychology, 83(1), 44–59. https://doi.org/10.1037/0022-3514.83.1.44
OpenAI. (2023). GPT-4 Technical Report (arXiv:2303.08774). arXiv. https://doi.org/10.48550/
arXiv.2303.08774
Parrish, A., Chen, A., Nangia, N., Padmakumar, V., Phang, J., Thompson, J., Htut, P. M., & Bowman, S. R.
(2022). BBQ: A Hand-Built Bias Benchmark for Question Answering. arXiv. https://doi.org/10.48550/
arXiv.2110.08193
Porayska-Pomsta, K., Holmes, W., & Nemorin, S. (2023). The ethics of AI in education. In: Handbook of
Articial Intelligence in Education (pp. 571–604). Edward Elgar Publishing.
Rauh, M., Mellor, J. F. J., Uesato, J., Huang, P.-S., Welbl, J., Weidinger, L., Dathathri, S., Glaese, A., Irving, G.,
Gabriel, I., Isaac, W., & Hendricks, L. A. (2022). Characteristics of Harmful Text: Towards Rigorous
Benchmarking of Language Models. Thirty-sixth Conference on Neural Information Processing
Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=u46CbCaLufp
Rudinger, R., Naradowsky, J., Leonard, B., & Van Durme, B. (2018). Gender Bias in Coreference Resolution.
arXiv. https://doi.org/10.48550/arXiv.1804.09301
Selbst, A. D., Boyd, D., Friedler, S. A., Venkatasubramanian, S., & Vertesi, J. (2019). Fairness and Abstraction
in Sociotechnical Systems. Proceedings of the Conference on Fairness, Accountability, and
Transparency, 59–68. https://doi.org/10.1145/3287560.3287598
Seyyed-Kalantari, L., Zhang, H., McDermott, M.B.A. et al. (2021). Underdiagnosis bias of articial intelligence
algorithms applied to chest radiographs in under-served patient populations. Nat Med 27, 2176–
2182. https://doi.org/10.1038/s41591-021-01595-0
Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2019). The Woman Worked as a Babysitter: On Biases in
Language Generation (arXiv:1909.01326). arXiv. https://doi.org/10.48550/arXiv.1909.01326
Sheng, E., Chang, K.-W., Natarajan, P., & Peng, N. (2021). Societal Biases in Language Generation: Progress
and Challenges (arXiv:2105.04054). arXiv. https://doi.org/10.48550/arXiv.2105.04054
Slupska, J., & Tanczer, L. (2021). Threat Modeling Intimate Partner Violence: Tech Abuse as a Cybersecurity
Challenge in the Internet of Things. J. Bailey, A. Flynn, &, N. Henry (Eds.), The Emerald International
Handbook of Technology Facilitated Violence and Abuse (pp. 663–688). Bingley: Emerald Publishing
Limited.
Smuha, N. A. (2021). Beyond the individual: Governing AI’s societal harm. Internet Policy Review, 10(3).
https://policyreview.info/articles/analysis/beyond-individual-governing-ais-societal-harm
Straw I., Callison-Burch, C. (2020). Articial Intelligence in mental health and the biases of language based
models. PLoS ONE 15(12): e0240376. https://doi.org/10.1371/journal.pone.0240376
Tevet, G., & Berant, J. (2021). Evaluating the Evaluation of Diversity in Natural Language Generation. arXiv.
https://doi.org/10.48550/arXiv.2004.02990
Thanh-Tung, H., & Tran, T. (2020). Catastrophic forgetting and mode collapse in GANs. International Joint
Conference on Neural Networks (IJCNN), 1–10. https://doi.org/10.1109/IJCNN48605.2020.9207181
Tomasev, N. (2021). Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer
Communities. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. https://
dl.acm.org/doi/10.1145/3461702.3462540
UNESCO. (2019a). First UNESCO recommendations to combat gender bias in applications using articial
intelligence. UNESCO. https://www.unesco.org/en/articles/rst-unesco-recommendations-combat-
gender-bias-applications-using-articial-intelligence
UNESCO. (2019b). I’d blush if I could: closing gender divides in digital skills through education. UNESCO
Digital Library. https://unesdoc.unesco.org/ark:/48223/pf0000367416.page=1
UNESCO. (2019c). Articial intelligence in education: Challenges and opportunities for sustainable
development. UNESCO Digital Library. UNESCO Digital Library. https://unesdoc.unesco.org/
ark:/48223/pf0000366994
UNESCO. (2020). Articial intelligence and gender equality: Key ndings of UNESCO’s Global Dialogue.
UNESCO Digital Library. https://unesdoc.unesco.org/ark:/48223/pf0000374174
UNESCO. (2022a). Recommendation on the Ethics of Articial Intelligence. UNESCO Digital Library. https://
unesdoc.unesco.org/ark:/48223/pf0000381137
UNESCO. (2022b). The Effects of AI on the Working Lives of Women. UNESCO Digital Library. https://unesdoc.
unesco.org/ark:/48223/pf0000380861
UNESCO. (2023a). Ethical impact assessment: A tool of the Recommendation on the Ethics of Articial
Intelligence. UNESCO Digital Library. https://unesdoc.unesco.org/ark:/48223/pf0000386276
UNESCO. (2023b). Technology-facilitated gender-based violence in an era of generative AI. UNESCO Digital
Library. https://unesdoc.unesco.org/ark:/48223/pf0000387483
UNFPA. (2021). Technology-facilitated Gender-based Violence: Making All Spaces Safe. UNFPA. https://
www.unfpa.org/publications/technology-facilitated-gender-based-violence-making-all-spaces-safe
Wagner, C., Strohmaier, M., Olteanu, A., Kıcıman, E., Contractor, N., & Eliassi-Rad, T. (2021). Measuring
algorithmically infused societies. Nature, 595(7866), Article 7866. https://doi.org/10.1038/s41586-
021-03666-1
Weidinger, L., Mellor, J., Rauh, M., Grin, C., Uesato, J., Huang, P.-S., Cheng, M., Glaese, M., Balle, B.,
Kasirzadeh, A., Kenton, Z., Brown, S., Hawkins, W., Stepleton, T., Biles, C., Birhane, A., Haas, J., Rimell,
L., Hendricks, L. A., … Gabriel, I. (2021). Ethical and social risks of harm from Language Models. arXiv.
https://doi.org/10.48550/arXiv.2112.04359
WHO. (2023). Gender and Health. https://www.who.int/health-topics/gender
Wiesner, K., Birdi, A., Eliassi-Rad, T., Farrell, H., Garcia, D., Lewandowsky, S., Palacios, P., Ross, D., Sornette, D.,
& Thébault, K. (2018). Stability of democracies: A complex systems perspective. European Journal of
Physics, 40(1), 014002. https://doi.org/10.1088/1361-6404/aaeb4d
Zhao, D., Andrews, J. T. A., & Xiang, A. (2023). Men Also Do Laundry: Multi-Attribute Bias Amplication
(arXiv:2210.11924). arXiv. https://doi.org/10.48550/arXiv.2210.11924