Chatbot Underperformance in Biology and Image-Based Questions in Medical Education

Rizzi, Joyce Santana [UNESP]; Requena, Lorraine Silva [UNESP]; Bicudo, Angelica Maria; Filho, Pedro Tadao Hamamoto [UNESP]; Ferretti, Renato [UNESP]

doi:10.1080/28338073.2025.2596550

dc.contributor.author	Rizzi, Joyce Santana [UNESP]
dc.contributor.author	Requena, Lorraine Silva [UNESP]
dc.contributor.author	Bicudo, Angelica Maria
dc.contributor.author	Filho, Pedro Tadao Hamamoto [UNESP]
dc.contributor.author	Ferretti, Renato [UNESP]
dc.date.accessioned	2026-04-09T12:45:33Z
dc.date.issued	2025-12-04
dc.description.abstract	AI chatbots have shown variable performance across biological disciplines in medical education, particularly in multiple-choice and image-based assessments. However, their performance on discipline-specific and image-based questions in biology remains unexamined. This study evaluated the accuracy and reliability of chatbots in answering biology questions from the Progress Test, a medical assessment administered across ten universities. We conducted an observational cross-sectional study by inputting 180 questions into the chatbots and categorising the questions according to morphology, function, and aggression. Each question was assessed for correctness across multiple chatbot attempts, and logistic regression and hierarchical clustering were applied to identify performance patterns. Although the chatbots answered functional and morphological questions accurately (from 85% for Gemini to 91.7% for ChatGPT-4), their accuracy decreased significantly for questions involving biological aggression and visual content. Agreement between chatbot responses remained weak, with Co-pilot displaying the lowest concordance. Logistic regression confirmed that the presence of images reduced the odds of a correct answer by up to 17.6% (ChatGPT-4). Hierarchical clustering distinguished two distinct response patterns, further validating these findings. These results highlight the potential of chatbots in medical education while emphasising their limitations in handling image-based and aggression-related content.
dc.description.affiliation	Laboratory of Muscle Biology, Department of Structural and Functional Biology, Institute of Bioscience of Botucatu, São Paulo State University (UNESP), Botucatu, Brazil
dc.description.affiliation	School of Medical Sciences, University of Campinas (UNICAMP), Campinas, Brazil
dc.description.affiliation	Botucatu Medical School, Department of Neurosciences and Mental Health, São Paulo State University (UNESP), Botucatu, Brazil
dc.description.affiliationUnesp	Laboratory of Muscle Biology, Department of Structural and Functional Biology, Institute of Bioscience of Botucatu, São Paulo State University (UNESP), Botucatu, Brazil
dc.description.affiliationUnesp	Botucatu Medical School, Department of Neurosciences and Mental Health, São Paulo State University (UNESP), Botucatu, Brazil
dc.identifier	https://app.dimensions.ai/details/publication/pub.1195793335
dc.identifier.dimensions	pub.1195793335
dc.identifier.doi	10.1080/28338073.2025.2596550
dc.identifier.issn	2161-4083
dc.identifier.issn	2833-8073
dc.identifier.pmcid	PMC12679838
dc.identifier.uri	https://hdl.handle.net/11449/320929
dc.publisher	Taylor & Francis
dc.relation.ispartof	Journal of CME; n. 1; v. 14; p. 2596550
dc.rights.accessRights	Open access	en
dc.rights.sourceRights	oa_all
dc.rights.sourceRights	gold
dc.source	Dimensions
dc.title	Chatbot Underperformance in Biology and Image-Based Questions in Medical Education
dc.type	Article	en
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	a3cdb24b-db92-40d9-b3af-2eacecf9f2ba
relation.isOrgUnitOfPublication	ab63624f-c491-4ac7-bd2c-767f17ac838d
relation.isOrgUnitOfPublication.latestForDiscovery	a3cdb24b-db92-40d9-b3af-2eacecf9f2ba
unesp.campus	Universidade Estadual Paulista (UNESP), Instituto de Biociências, Botucatu	pt
unesp.campus	Universidade Estadual Paulista (UNESP), Faculdade de Medicina, Botucatu

Files

Original bundle

Name: Chatbot Underperformance in Biology and Image-Based Questions in Medical Education.pdf
Size: 869.62 KB
Format: Adobe Portable Document Format

Collections

Botucatu - IBB - Instituto de Biociências
Botucatu - FMB - Faculdade de Medicina