Dongjun Kim
AI and LLM Interpretability Researcher
Hi, I'm Jun, a researcher at Korea University with a deep commitment to advancing our understanding of Large Language Models. My research focuses on mechanistic interpretability, where I aim to reverse-engineer the algorithms and structures within LLMs to reveal how they process information, make decisions, and exhibit emergent behaviors. By making these systems more transparent and predictable, I strive to enhance their reliability, safety, and alignment with human values.
I am driven by the curiosity to understand the principles of Deep Neural Networks. My work seeks to bridge the gap between theoretical advancements in AI and their practical applications, contributing to the development of AI systems that are not only highly capable but also aligned with ethical standards and societal goals. Through rigorous research and collaboration, I aim to help build a future where AI technologies are both transformative and responsible.
As a researcher, I am dedicated to pushing boundaries in interpretability, safety, and reasoning within LLMs, in the belief that understanding the mechanisms behind intelligence is essential for unlocking its potential while mitigating its risks. If you share this interest or have ideas in interpretability or safety, feel free to reach out; I am always eager to collaborate with like-minded researchers.
Research Interests
Mechanistic Interpretability
My primary focus is on reverse-engineering LLMs to uncover their internal circuits, algorithms, and decision-making processes. This includes:
- Analyzing transformer architectures to map dependencies between attention heads, layers, and emergent behaviors
- Developing sparse autoencoders as tools for isolating interpretable features in high-dimensional latent spaces
- Investigating causal relationships between model components and specific capabilities through probing techniques
By understanding these mechanisms, I aim to make LLMs more transparent while providing insights into their strengths and limitations.
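As a minimal illustration of the sparse-autoencoder idea above, the sketch below (a toy NumPy example with random, untrained weights; the dimensions and names are illustrative, not from any specific model) shows the core mechanics: an overcomplete dictionary of ReLU features encodes model activations, and the training loss trades reconstruction error against an L1 sparsity penalty on the features.

```python
import numpy as np

rng = np.random.default_rng(0)

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """One forward pass of a sparse autoencoder over activations x.
    ReLU features; sparsity is encouraged by an L1 penalty in the loss."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # feature activations
    x_hat = f @ W_dec + b_dec                # reconstruction of x
    return f, x_hat

def sae_loss(x, x_hat, f, l1_coeff=1e-3):
    recon = np.mean((x - x_hat) ** 2)        # reconstruction term
    sparsity = l1_coeff * np.mean(np.abs(f)) # L1 sparsity term
    return recon + sparsity

d_model, d_feat, batch = 16, 64, 8           # overcomplete: d_feat > d_model
x = rng.normal(size=(batch, d_model))        # stand-in for residual-stream activations
W_enc = rng.normal(scale=0.1, size=(d_model, d_feat))
W_dec = rng.normal(scale=0.1, size=(d_feat, d_model))
f, x_hat = sae_forward(x, W_enc, np.zeros(d_feat), W_dec, np.zeros(d_model))
loss = sae_loss(x, x_hat, f)
```

In practice the encoder and decoder would be trained on activations collected from a real model, and the learned feature directions are what one then tries to interpret.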
AI Safety
Ensuring the safe deployment of advanced AI systems is critical to my work. My research in AI safety focuses on:
- Developing scalable frameworks for aligning model behavior with human values through fine-tuning and reinforcement learning
- Detecting and mitigating biases in language models using interpretability-driven methods
- Building robust anomaly detection systems that identify harmful or unexpected behaviors during deployment
My goal is to create methodologies that safeguard against risks while enabling reliable and ethical applications of AI.
Mechanistic Anomaly Detection
Mechanistic anomaly detection involves identifying unexpected or harmful behaviors in LLMs by analyzing their internal mechanisms. I focus on:
- Developing tools that trace causal pathways within neural networks to diagnose sources of anomalous outputs
- Designing self-monitoring models capable of detecting deviations from intended behavior during real-world use
- Applying interpretability techniques to improve system robustness under adversarial or high-stakes conditions
This research aims to keep AI systems reliable even when they face complex or unpredictable environments.
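The causal-pathway tracing mentioned above can be sketched with activation patching on a toy network (this is an illustrative NumPy example, not the actual method used in my research; the two-layer MLP stands in for a transformer component). The idea: run the model on a clean input and an anomalous one, patch each hidden unit's clean activation into the anomalous run, and rank units by how much of the output gap each patch recovers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-layer network standing in for one transformer component
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Forward pass; optionally overwrite one hidden unit (activation patching)."""
    h = np.tanh(x @ W1)
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value
    return h @ W2

x_clean = rng.normal(size=4)
x_anom = x_clean + rng.normal(scale=2.0, size=4)  # "anomalous" input

h_clean = np.tanh(x_clean @ W1)
base_gap = np.abs(forward(x_anom) - forward(x_clean)).sum()

# Causal effect of each hidden unit: how much of the output gap is
# recovered when its clean activation is patched into the anomalous run?
effects = []
for i in range(8):
    patched = forward(x_anom, patch=(i, h_clean[i]))
    effects.append(base_gap - np.abs(patched - forward(x_clean)).sum())
```

Units with the largest recovered gap are the most plausible causal sources of the anomalous output, which is the diagnostic signal a mechanistic anomaly detector would act on.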
Reasoning Models & Agent Systems
Understanding how LLMs reason and interact as agents is an emerging focus of my work. This includes:
- Investigating multi-step reasoning processes within transformer-based architectures
- Developing agent systems that leverage LLMs for planning, decision-making, and interactive problem-solving tasks
- Exploring how compositionality in neural networks enables structured reasoning across diverse domains
By advancing reasoning models, I aim to enable LLMs to perform complex tasks reliably while maintaining interpretability.
Retrieval-Augmented Generation (RAG)
RAG combines retrieval systems with generative models to enhance factual accuracy and groundedness. My work in this area focuses on:
- Designing retrieval pipelines optimized for domain-specific knowledge integration
- Exploring hybrid architectures that improve factual consistency in generative outputs
- Reducing hallucinations by embedding retrieval mechanisms directly into transformer workflows
Through RAG techniques, I aim to bridge the gap between generative capabilities and real-world reliability.
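A stripped-down sketch of the RAG loop described above (pure-stdlib Python with a toy bag-of-words "embedding"; a real pipeline would use a dense encoder, a vector index, and an LLM call in place of the returned prompt):

```python
from collections import Counter
import math

# Toy document store; in practice this would be a vector database
docs = {
    "doc1": "sparse autoencoders isolate interpretable features",
    "doc2": "retrieval augmented generation grounds model outputs",
    "doc3": "agent systems plan and act over multiple steps",
}

def embed(text):
    """Bag-of-words stand-in for a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=1):
    """Rank documents by similarity to the query and return the top k ids."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(docs[d])), reverse=True)
    return ranked[:k]

def build_prompt(query, k=1):
    """Ground the generator by prepending retrieved context to the query."""
    context = "\n".join(docs[d] for d in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("how does retrieval augmented generation help?")
```

The grounding step is the key design choice: the generator only ever sees the query together with retrieved evidence, which is what reduces hallucination relative to closed-book generation.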
Computational Neuroscience
Computational neuroscience bridges the gap between biological neural systems and artificial intelligence, offering insights into how natural intelligence can inspire better AI systems. My work in this area focuses on:
- Studying biological neural mechanisms to inform the development of more efficient and interpretable AI architectures
- Drawing parallels between attention mechanisms in transformers and cognitive processes observed in the brain
- Exploring how memory, reasoning, and learning in biological systems can be modeled computationally to improve LLM performance
By leveraging principles from neuroscience, I aim to enhance our understanding of both natural and artificial intelligence, driving innovations in AI research.
Education and Research Journey
My academic path reflects an evolving fascination with complex systems—from spatial computing to neural architectures—driven by the fundamental question: "How do intelligent systems truly work?" This journey has crystallized into a focused mission to reverse-engineer AI while ensuring its safe and ethical development.
Korea University, Seoul, South Korea
As a Master's researcher in the NLP&AI Lab under Dr. Heui-Seok Lim, I have been involved in multiple government and industry-funded projects, including collaborations with the Ministry of Food and Drug Safety and KT Gen AI Lab. My work focuses on advancing LLM interpretability, AI safety, and retrieval-augmented generation (RAG) systems. Key contributions include:
- Developing a novel knowledge editing method for domain-specific applications, enabling precise updates to LLMs without retraining
- Designing automatic attack detection frameworks (harmfulness/bias detection) for AI safety through automated red-teaming techniques
- Creating an advanced RAG agent system capable of dynamic knowledge retrieval and integration for real-time decision-making tasks
These projects integrate cutting-edge interpretability techniques with practical applications, bridging theoretical AI research with real-world deployment challenges. Recent work includes a new probing method for tracing causal pathways in transformer architectures, currently under review at ACL 2025.
University of South Florida, Tampa, FL
During my B.S. in Computer Science, I gained foundational expertise in spatial computing and agent-based modeling through research under distinguished mentors:
- Worked with Dr. Edwin Michael on building a city-scale digital twin for Hillsborough County, developing agent-based models to simulate pandemic dynamics and inform public health policies
- Collaborated with Dr. Wanwan Li on automatic room mapping in augmented reality (AR) systems, focusing on view direction-driven SLAM algorithms for dynamic environments
- Explored neural scene representation networks for AR applications, optimizing spatial intelligence systems for real-time use cases
These experiences solidified my passion for understanding complex architectures and their applications in real-world scenarios. My contributions resulted in co-authored publications and practical tools that supported public health planning during the COVID-19 pandemic.
Recent Publications
Exploring Coding Spot: Understanding Parametric Contributions to LLM Coding Performance
Kim, D., Kim, M., Chun, Y., Park, C., Lim, H. arXiv preprint arXiv:2412.07113, 2024.
Read Paper
Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4
Lee, S., Kim, D., Jung, D., Park, C., Lim, H. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 93–104, 2024.
Read Paper
CitySEIRCast: an agent-based city digital twin for pandemic analysis and simulation
Bilal, S., Zaatour, W., Alonso Otano, Y., Saha, A., Newcomb, K., Kim, S., Kim, J., Ginjala, R., Groen, D., Michael, E. Complex & Intelligent Systems, 11(1), pp. 1–29, 2025.
Read Paper