User language distorts ChatGPT information on armed conflicts, study shows


Quantitative results on the Arabic/Hebrew dyad. (a) Geographic distribution of airstrikes. (b) Recorded fatalities (military plus civilian) for each event: blue shows fatalities reported when asking in Hebrew, orange when asking in Arabic. Error bars denote the standard deviation of the mean; evasive answers were excluded. (c) Total fatalities (fat.), injured (inj.) and civilians killed (civ.), averaged over all events. Credit: Journal of Peace Research (2024). DOI: 10.1177/00223433241279381

When asked in Arabic about the number of civilians killed in the Middle East conflict, ChatGPT gives significantly higher casualty figures than when the same question is asked in Hebrew, a new study by the Universities of Zurich and Konstanz shows. These systematic discrepancies can reinforce biases in armed conflicts and foster information bubbles.

Every day, millions of people engage with and seek information from ChatGPT and other large language models (LLMs). But how are the responses given by these models shaped by the language in which they are asked? Does it make a difference whether the same question is asked in English or German, Arabic or Hebrew?

Christoph Steinert, a postdoc at the Department of Political Science of the University of Zurich (UZH), and physicist Daniel Kazenwadel from the University of Konstanz, Germany, have now conducted a systematic analysis of this question. The results are published in the Journal of Peace Research.

Quantitative results on the Kurdish/Turkish dyad. (a) Geographic distribution of airstrikes. (b) Recorded fatalities (military plus civilian) for each airstrike: red shows fatalities reported when asking in Turkish, green when asking in Kurdish. (c) Total fatalities (fat.), injured (inj.) and civilians killed (civ.), averaged over all events. Credit: Journal of Peace Research (2024). DOI: 10.1177/00223433241279381

Information shapes armed conflicts

The researchers explored the issue in the contentious context of the Israeli–Palestinian and Turkish–Kurdish conflicts. They used an automated query procedure to ask ChatGPT the same questions in different languages. For example, the researchers repeatedly prompted ChatGPT in Hebrew and Arabic about the number of people killed in 50 randomly chosen airstrikes, including the Israeli attack on the Nuseirat refugee camp on 21 August 2014.

"We found that ChatGPT systematically provided higher fatality numbers when asked in Arabic compared to questions in Hebrew. On average, fatality estimates were 34% higher," Steiner says. When asked about Israeli airstrikes on Gaza, ChatGPT mentions civilian casualties more than twice as often and killed children six times more often in the Arabic version. The same pattern emerged when the researchers queried the chatbot about Turkish airstrikes against Kurdish targets and asked the same questions in Turkish and Kurdish.

The phrase "The first casualty when war comes is truth" is often attributed to U.S. senator Hiram Johnson (1866–1945). Throughout history, selective information policies, propaganda and misinformation have influenced numerous armed conflicts. What sets current conflicts apart is the availability of an unprecedented number of information sources—including ChatGPT.

Exaggerated in one language, played down in the other

The results show that ChatGPT provides higher casualty figures when asked in the language of the attacked group. In addition, ChatGPT is more likely to report on children and women killed in the language of the attacked group, and to describe the airstrikes as indiscriminate. "Our results also show that ChatGPT is more likely to deny the existence of such airstrikes in the language of the attacker," adds Steinert.

The researchers believe this has profound social implications, as ChatGPT and other LLMs play an increasingly important role in how information is disseminated. Integrated into search products through assistants such as Google's Gemini or Microsoft's Bing chat, they fundamentally shape the information delivered in response to search queries on a wide range of topics.

"If people who speak different languages obtain different information through these technologies, it has a crucial influence on their perception of the world," Christoph Steinert says. Such language biases could lead people in Israel to perceive airstrikes on Gaza as causing fewer casualties based on information provided by LLMs, compared to Arabic speakers.

Unlike traditional media, which may also distort the news, the systematic language-related biases of LLMs are difficult for most users to detect. "There is a risk that the increasing implementation of large language models in search engines reinforces different perceptions, biases and information bubbles along linguistic divides," says Steinert, who believes this could in future fuel armed conflicts such as the one in the Middle East.

More information: Christoph Valentin Steinert et al, How user language affects conflict fatality estimates in ChatGPT, Journal of Peace Research (2024). DOI: 10.1177/00223433241279381

Provided by University of Zurich