Cognitive biases in healthcare: How generative AI could help improve treatment


Human cognitive biases can particularly affect decision-making when speed is of the essence, such as when lives are at stake in a medical emergency. A research team from Inserm and the University of Bordeaux has tested an advanced method of generative artificial intelligence (AI) [1], trained on patient records from 480,000 visits to the Bordeaux University Hospital Emergency Department. Its findings, presented at the Machine Learning for Health symposium in Vancouver and published in Proceedings of Machine Learning Research, show that the AI tested is likely to reproduce and measure caregiver biases relating to patient gender during triage. These results form a case study of how new generative AI algorithms can be used to identify and understand human cognitive biases.

In emergency care settings that demand rapid decision-making, human cognitive biases, particularly "judgment biases", can critically impact medical decisions and patient prognosis. These "cognitive shortcuts" occur when people are required to form opinions or make decisions based on incomplete or over-generalized information. Decision-making can therefore be unconsciously affected by such biases (related, for example, to sex/gender, age or ethnicity), leading to under- or overestimating the severity of a patient's condition.

So how can we better identify these biases and reduce their impact? One answer could lie in AI, and particularly in the generative AI known as "large language models" (LLMs), such as ChatGPT, which are capable of imitating human decision-making thanks to their mastery of human language. These models can effectively interpret the "free-text" [2] that accounts for a large proportion of the clinical data collected by healthcare staff, particularly in hospital emergency departments.

A team led by Inserm Research Director Emmanuel Lagarde [3] at the Bordeaux Population Health Research Center (Inserm/University of Bordeaux) was interested in the potential of these LLMs to detect and quantify gender bias in a rapid decision-making setting. The context used to evaluate this method was the triage [4] of patients in emergency departments. Accurate triage is critical: underestimating an emergency, and thereby delaying treatment, can worsen the prognosis, while overestimating the severity of a patient's condition can lead to the overuse of resources, which is particularly harmful when many other patients also require attention.

The scientists used an innovative approach in which an AI model was trained to triage patients based on the texts contained in their medical records, thereby reproducing any cognitive biases of the nursing staff who performed the triage. The data used for this training comprised over 480,000 visits to the Emergency Department of Bordeaux University Hospital between January 2013 and December 2021.
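By way of illustration, the sketch below shows what such a training setup could look like, assuming a French transformer encoder (here camembert-base) fine-tuned to predict a triage score from free-text notes. The model choice, example records, column names and five-level scale are assumptions made for the example, not details taken from the study.

```python
# Illustrative sketch only: fine-tuning a text classifier to predict triage
# scores from free-text emergency notes. The model name, example records and
# five-level triage scale are assumptions, not details from the study.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy training data: each record is a free-text note plus the nurse-assigned
# triage score (here 0 = most urgent ... 4 = least urgent).
records = Dataset.from_dict({
    "text": ["Chest pain radiating to the left arm, BP 160/95 ...",
             "Twisted ankle while running, mild swelling ..."],
    "label": [0, 3],
})

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "camembert-base", num_labels=5)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256,
                     padding="max_length")

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="triage-model", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=records.map(tokenize, batched=True),
)
trainer.train()
```

Trained in this way, the classifier acts as a stand-in for the nurse: given a record's free text, it produces a triage score that reflects the patterns, including any biases, present in the historical decisions.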

Once trained, the model was capable of assigning a triage score (evaluating the severity of the patient's condition) from the text of a record, as the nurse would. The record was then altered to change the patient gender references in the clinical text, and the model assigned a new score. The difference between these two scores, one produced from the original record and the other from the altered record, provided an estimate of the cognitive bias.
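A minimal sketch of this counterfactual comparison is shown below, assuming a simple word swap on English text and a generic `predict_score` callable (for instance, the fine-tuned classifier from the previous sketch); the study's actual alteration of French clinical notes is more involved.

```python
# Illustrative sketch only: estimate gender bias by re-scoring a record after
# swapping gendered terms. The word list and scoring callable are simplified
# placeholders, not the method used in the study.
import re

GENDER_SWAPS = {"she": "he", "her": "his", "woman": "man", "female": "male",
                "he": "she", "his": "her", "man": "woman", "male": "female"}
_PATTERN = re.compile(r"\b(" + "|".join(GENDER_SWAPS) + r")\b", re.IGNORECASE)

def swap_gender(text: str) -> str:
    """Replace each gendered word with its counterpart, preserving capitals."""
    def repl(match):
        word = match.group(0)
        swapped = GENDER_SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, text)

def gender_bias(record_text: str, predict_score) -> int:
    """Score difference between the gender-swapped and original record.

    `predict_score` is any callable mapping free text to a triage score.
    A non-zero difference indicates that the assigned score depends on the
    patient's gender rather than on the clinical content alone.
    """
    return predict_score(swap_gender(record_text)) - predict_score(record_text)
```

Averaged over many records, these per-record differences give an estimate of the direction and magnitude of the gender bias learned from the nurses' historical decisions.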

The results showed the AI to be significantly biased against women. Based on identical clinical records, the severity of women's conditions tended to be underestimated relative to men's (with around 5% classified as "less critical" and 1.81% classified as "more critical"). Conversely, the severity of men's conditions tended to be slightly overestimated (with 3.7% deemed "more critical" versus 2.9% deemed "less critical"). The bias was more pronounced when triage had been performed by less experienced nursing staff.

"This research shows how large language models can help detect and anticipate human cognitive biases - in this case regarding the goal of fairer and more effective management of medical emergencies," explains Lagarde. "The method used shows that, in this context, LLMs are able to identify and reproduce the biases that guide human decision-making from the clinical data collected by nursing staff," adds Ariel Guerra-Adames, doctoral candidate and first author of this research [5] .

The team will now evaluate biases related to other patient characteristics, such as age and ethnic group. Ultimately, it should also be possible to refine the system by introducing non-verbal variables (facial expressions, tone of voice) which, while not necessarily appearing in the written data, can nevertheless be critical in decision-making.

[1] Generative artificial intelligence is an AI system that is able to create content, be it text, images, sounds, videos or other forms of data.

[2] In a medical context, free-text refers to information recorded as unstructured text, i.e. without rigid organization or a predefined format. This includes texts produced directly by healthcare professionals to describe observations, diagnoses, treatments or medical history, often in natural language.

[3] In collaboration with Cédric Gil-Jardiné from the University of Bordeaux Hospital Emergency Department and Marta Avalos from the Inria center at the University of Bordeaux

[4] Triage in medical emergencies consists of classifying patients by the severity of their condition, so as to optimize the order of care and save as many people as possible. It is performed by dedicated nurses who collect various information from each patient (reason for visit, vital signs, medical history, etc.) and assign an "emergency" score according to a validated scale.

[5] Ariel Guerra-Adames received the Best Paper Award when presenting this work at the Machine Learning for Health Symposium in Vancouver.