
References
Birhane, A., Prabhu, V., Han, S., & Boddeti, V. N. (2023). On Hate Scaling Laws For Data-Swamps. arXiv. https://doi.org/10.48550/arXiv.2306.13141
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E., Buch, S., Card, D., Castellon, R., Chatterji, N., Chen, A., Creel, K., Davis, J. Q., Demszky, D., … Liang, P. (2022). On the Opportunities and Risks of Foundation Models. arXiv. https://doi.org/10.48550/arXiv.2108.07258
Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186. https://doi.org/10.1126/science.aal4230
Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzmán, F., Grave, E., Ott, M., Zettlemoyer, L., & Stoyanov, V. (2020). Unsupervised Cross-lingual Representation Learning at Scale. arXiv. https://doi.org/10.48550/arXiv.1911.02116
Derczynski, L., Kirk, H. R., Balachandran, V., Kumar, S., Tsvetkov, Y., Leiser, M. R., & Mohammad, S. (2023). Assessing language model deployment with risk cards. arXiv. https://doi.org/10.48550/arXiv.2303.18190
Dhamala, J., Sun, T., Kumar, V., Krishna, S., Pruksachatkun, Y., Chang, K.-W., & Gupta, R. (2021). BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 862–872. https://doi.org/10.1145/3442188.3445924
Du, W., & Black, A. W. (2019). Boosting Dialog Response Generation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. https://par.nsf.gov/biblio/10106807-boosting-dialog-response-generation
Eliassi-Rad, T., Farrell, H., Garcia, D., Lewandowsky, S., Palacios, P., Ross, D., Sornette, D., Thébault, K., & Wiesner, K. (2020). What science can do for democracy: A complexity science approach. Humanities and Social Sciences Communications, 7(1), Article 1. https://doi.org/10.1057/s41599-020-0518-0
Gallegos, I. O., Rossi, R. A., Barrow, J., Tanjim, M. M., Kim, S., Dernoncourt, F., Yu, T., Zhang, R., & Ahmed, N. K. (2023). Bias and Fairness in Large Language Models: A Survey. arXiv. https://doi.org/10.48550/arXiv.2309.00770
Golchin, S., & Surdeanu, M. (2023). Time Travel in LLMs: Tracing Data Contamination in Large Language Models. arXiv. https://doi.org/10.48550/arXiv.2308.08493
Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74(6), 1464–1480. https://doi.org/10.1037/0022-3514.74.6.1464
Guo, W., & Caliskan, A. (2021). Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 122–133. https://doi.org/10.1145/3461702.3462536
Kapoor, S., & Narayanan, A. (2023). Quantifying ChatGPT’s gender bias. AI Snake Oil. https://www.aisnakeoil.com/p/quantifying-chatgpts-gender-bias
Keyes, O. (2018). The Misgendering Machines: Trans/HCI Implications of Automatic Gender Recognition. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW), Article 88. https://dl.acm.org/doi/10.1145/3274357
Li, T., Khot, T., Khashabi, D., Sabharwal, A., & Srikumar, V. (2020). UnQovering Stereotyping Biases via Underspecified Questions. arXiv. https://doi.org/10.48550/arXiv.2010.02428
Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., & Hardt, M. (2018). Delayed Impact of Fair Machine Learning. Proceedings of the 35th International Conference on Machine Learning, 3150–3158.
Liu, X., et al. (2023). Illness severity assessment of older adults in critical illness using machine learning (ELDER-ICU): An international multicentre study with subgroup bias evaluation. The Lancet Digital Health, 5(10), e657–e667.