JOURNAL ARTICLE

Measuring Human-AI Value Alignment in Large Language Models

Hakim Norhashim, Jungpil Hahn

Year: 2024 · Journal: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society · Vol: 7 · Pages: 1063-1073

Abstract

This paper seeks to quantify human-AI value alignment in large language models. Alignment between humans and AI has become a critical area of research for mitigating the potential harms posed by AI. In tandem with this need, developers have incorporated a values-based approach to model development in which ethical principles are integrated from a model's inception. However, ensuring that these values are reflected in outputs remains a challenge. In addition, studies have noted that models lack consistency when producing outputs, which in turn can affect their function. Such variability in responses would impact human-AI value alignment as well, particularly where consistent alignment is critical. Fundamentally, the task of uncovering a model's alignment is one of explainability: understanding how these complex models behave is essential to assessing their alignment. This paper examines the problem through a case study of GPT-3.5. By repeatedly prompting the model with scenarios based on a dataset of moral stories, we aggregate the model's alignment with human values to produce a human-AI value alignment metric. Moreover, by using a comprehensive taxonomy of human values, we uncover the latent value profile represented by these outputs, thereby determining the extent of human-AI value alignment.
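The measurement procedure the abstract describes (repeatedly prompting a model with moral scenarios, comparing responses to human labels, and aggregating by value category) can be illustrated with a minimal sketch. This is not the authors' implementation: the scenario data, the `stub_model` stand-in, and all function names are hypothetical, and a real study would call an LLM API with sampling enabled rather than a deterministic stub.

```python
from collections import defaultdict

# Hypothetical scenario set: each entry pairs a prompt with the
# human-preferred judgment and a value category from some taxonomy.
SCENARIOS = [
    {"prompt": "Return a lost wallet?", "human_label": "moral", "value": "honesty"},
    {"prompt": "Break a promise for convenience?", "human_label": "immoral", "value": "loyalty"},
]

def stub_model(prompt: str) -> str:
    """Stand-in for an LLM call; a real study would query the model here."""
    return "moral" if "wallet" in prompt else "immoral"

def alignment_metric(model, scenarios, n_trials: int = 5):
    """Fraction of responses matching the human label, overall and per value."""
    per_value = defaultdict(lambda: [0, 0])  # value -> [matches, total]
    for s in scenarios:
        for _ in range(n_trials):  # repeated prompting captures response variability
            match = model(s["prompt"]) == s["human_label"]
            per_value[s["value"]][0] += int(match)
            per_value[s["value"]][1] += 1
    profile = {v: m / t for v, (m, t) in per_value.items()}
    overall = sum(m for m, _ in per_value.values()) / sum(t for _, t in per_value.values())
    return overall, profile

overall, profile = alignment_metric(stub_model, SCENARIOS)
print(f"overall alignment: {overall:.2f}")
print(profile)  # the per-value breakdown is the "latent value profile"
```

With a stochastic model, the per-value fractions would fall below 1.0 wherever responses disagree with human labels across trials, and the `profile` dictionary plays the role of the latent value profile the abstract mentions.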

Keywords:
Computer science · Value (mathematics) · Natural language processing · Artificial intelligence · Machine learning

Metrics

Cited By: 3
FWCI (Field Weighted Citation Impact): 1.92
Refs: 0
Citation Normalized Percentile: 0.84


Topics

Topic Modeling (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)

