How Utilitarian Are OpenAI’s Models Really? Replicating and Reinterpreting Pfeffer, Krügel, and Uhl (2025)

Himmelreich, Johannes

Publication · 2026

arXiv preprint, 2026

Abstract

This paper replicates a prior study (Pfeffer, Krügel, and Uhl, 2025) claiming that OpenAI’s reasoning model o1-mini produces more utilitarian responses than GPT-4o on moral dilemmas. The trolley problem finding does not hold: GPT-4o’s lower utilitarian response rate stems from safety refusals rather than philosophical commitments. When prompts ask “Is it morally permissible...?” instead of “Should I...?”, GPT-4o reaches 99% utilitarian responses. All models converge on utilitarian answers when prompt confounds are removed. The footbridge dilemma finding partially survives, though reasoning models frequently refuse to answer or provide non-utilitarian responses. The study concludes that single-prompt evaluations of LLM moral reasoning are unreliable and advocates for multi-prompt robustness testing as standard practice when assessing language model behavior.
