NAACL 2024 · SRW

Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4

Seungyoon Lee, Dongjun Kim, Dahyun Jung, Chanjun Park, Heuiseok Lim

Large Language Models (LLMs) have significantly impacted various fields requiring advanced linguistic understanding, yet concerns regarding their inherent biases and ethical considerations have also increased. Notably, LLMs have been critiqued for perpetuating stereotypes against diverse groups based on race, sexual orientation, and other attributes. However, most research analyzing these biases has predominantly focused on communities where English is the primary language, neglecting to consider the cultural and linguistic nuances of other societies. In this paper, we aim to explore the inherent biases and toxicity of LLMs, specifically within the social context of Korea. We devise a set of prompts that reflect major societal issues in Korea and assign varied personas to both ChatGPT and GPT-4 to assess the toxicity of the generated sentences. Our findings indicate that certain personas or prompt combinations consistently yield harmful content, highlighting the potential risks associated with specific persona-issue alignments within the Korean cultural framework. Furthermore, we discover that GPT-4 can produce more than twice as much toxic content as ChatGPT under certain conditions.

Background
  • LLM bias and toxicity are widely studied, but almost entirely in English-speaking contexts
  • Rigorous analysis within the Korean sociocultural context was missing
Problem
Quantify and compare the toxicity of ChatGPT and GPT-4 outputs across Korean social issues and assigned personas
Method
Design prompts reflecting major Korean social issues with varied personas → collect model outputs → compute automated toxicity metrics with human validation → analyze bias patterns by condition
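The pipeline above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the persona and issue names, the prompt template, and both function names are hypothetical placeholders, and the toxicity scores are assumed to come from some external scorer that returns a value in [0, 1].

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Hypothetical placeholders; the paper's actual personas and issues are not listed here.
PERSONAS = ["persona_a", "persona_b"]
ISSUES = ["issue_x", "issue_y"]


def build_prompts(personas, issues):
    """Return one (persona, issue, prompt) triple per persona-issue pair."""
    # The template wording is an assumption made for illustration only.
    return [
        (p, i, f"Speak as {p}. Share your opinion on {i}.")
        for p, i in product(personas, issues)
    ]


def mean_toxicity_by_condition(records):
    """Average toxicity per (persona, issue) condition.

    records: iterable of (persona, issue, score) tuples, where score is
    assumed to be a toxicity value in [0, 1] from an external scorer.
    """
    buckets = defaultdict(list)
    for persona, issue, score in records:
        buckets[(persona, issue)].append(score)
    return {cond: mean(scores) for cond, scores in buckets.items()}
```

Keeping prompt construction separate from aggregation means the same grid of conditions can be sent to each model, so the per-condition averages are directly comparable.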
Results
  • Certain persona–issue combinations consistently yield harmful content
  • Under some conditions GPT-4 produces more than twice the toxicity of ChatGPT
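One way a "more than twice" comparison could be checked per condition is sketched below. The function name, the dictionary layout, and the default threshold of 2.0 are assumptions for illustration; the paper's exact comparison procedure is not described here.

```python
def toxicity_ratios(gpt4_scores, chatgpt_scores, threshold=2.0):
    """Return {condition: ratio} where GPT-4 exceeds threshold x ChatGPT.

    Both arguments map a condition key to a mean toxicity score.
    Conditions missing from either model, or with a non-positive
    ChatGPT baseline, are skipped because the ratio is undefined.
    """
    flagged = {}
    for cond, base in chatgpt_scores.items():
        if cond not in gpt4_scores or base <= 0:
            continue
        ratio = gpt4_scores[cond] / base
        if ratio > threshold:
            flagged[cond] = ratio
    return flagged
```

A ratio is only meaningful when the baseline is not close to zero, so in practice it would be sensible to report absolute scores alongside it.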
Role
  • Analyzed and organized results across persona and issue conditions
  • Built the visualizations and co-wrote the paper