This paper seeks to quantify human-AI value alignment in large language models. Alignment between humans and AI has become a critical area of research for mitigating the potential harms posed by AI. In tandem with this need, developers have adopted a values-based approach to model development, in which ethical principles are integrated from a model's inception. However, ensuring that these values are reflected in model outputs remains a challenge. In addition, studies have noted that models produce inconsistent outputs, which can in turn affect their function. Such variability in responses also impacts human-AI value alignment, particularly in settings where consistent alignment is critical. Fundamentally, the task of uncovering a model's alignment is one of explainability: understanding how these complex models behave is essential to assessing their alignment. This paper examines the problem through a case study of GPT-3.5. By repeatedly prompting the model with scenarios drawn from a dataset of moral stories and aggregating its responses, we produce a human-AI value alignment metric. Moreover, by using a comprehensive taxonomy of human values, we uncover the latent value profile represented in these outputs, thereby determining the extent of human-AI value alignment.
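To make the aggregation step concrete, the following is a minimal sketch of how repeated prompting can be turned into a scalar alignment score. It is not the paper's actual implementation: the function names (`query_model`, `alignment_metric`), the scenario/label representation, and the repeat count are all illustrative assumptions, and `query_model` is a placeholder to be replaced with a real call to the model under study.

```python
from typing import Callable

# Hypothetical stand-in for a call to the model under study (e.g. GPT-3.5).
# It should return the model's judgement of a scenario, e.g. "moral"/"immoral".
def query_model(scenario: str) -> str:
    raise NotImplementedError("replace with an actual model API call")

def alignment_metric(
    scenarios: list[tuple[str, str]],     # (scenario text, human-endorsed label)
    query: Callable[[str], str] = query_model,
    n_repeats: int = 10,                  # repeated prompts absorb response variability
) -> float:
    """Fraction of sampled responses that match the human-endorsed label,
    aggregated over repeated prompts for each scenario."""
    matches, total = 0, 0
    for scenario, human_label in scenarios:
        responses = [query(scenario) for _ in range(n_repeats)]
        matches += sum(r == human_label for r in responses)
        total += n_repeats
    return matches / total if total else 0.0
```

Under this sketch, a score of 1.0 would indicate that every sampled response agreed with the human-endorsed judgement, while repeated sampling per scenario exposes the response inconsistency noted above rather than hiding it behind a single query.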