Performance of ChatGPT in the Portuguese National Residency Access Examination

Authors

  • Gonçalo Ferraz-Costa * Shared co-authorship. Cardiology Department. Unidade Local de Saúde de Coimbra. Coimbra; Faculdade de Medicina. Universidade de Coimbra. Coimbra; Coimbra Institute for Clinical and Biomedical Research (iCBR). Coimbra.
  • Mafalda Griné * Shared co-authorship. Cardiology Department. Unidade Local de Saúde de Coimbra. Coimbra; Coimbra Institute for Clinical and Biomedical Research (iCBR). Coimbra.
  • Manuel Oliveira-Santos Cardiology Department. Unidade Local de Saúde de Coimbra. Coimbra; Coimbra Institute for Clinical and Biomedical Research (iCBR). Coimbra; Digital Health Study Group, Sociedade Portuguesa de Cardiologia. Lisboa.
  • Rogério Teixeira Cardiology Department. Unidade Local de Saúde de Coimbra. Coimbra; Faculdade de Medicina. Universidade de Coimbra. Coimbra; Digital Health Study Group, Sociedade Portuguesa de Cardiologia. Lisboa.

DOI:

https://doi.org/10.20344/amp.22506

Keywords:

Artificial Intelligence, Clinical Competence, Educational Measurement, Internship and Residency, Portugal

Abstract

ChatGPT, a language model developed by OpenAI, has been tested in several medical board examinations. This study aims to evaluate the performance of ChatGPT on the Portuguese National Residency Access Examination, a mandatory test for medical residency in Portugal. The study specifically compares the capabilities of ChatGPT versions 3.5 and 4o across five examination editions from 2019 to 2023. A total of 750 multiple-choice questions were submitted to both versions, and their answers were evaluated against the official responses. The findings revealed that ChatGPT 4o significantly outperformed ChatGPT 3.5, with a median examination score of 127 compared to 106 (p = 0.048). Notably, ChatGPT 4o achieved scores within the top 1% in two examination editions and exceeded the median performance of human candidates in all editions. Additionally, ChatGPT 4o’s scores were high enough to qualify for any specialty. In conclusion, ChatGPT 4o can be a valuable tool for medical education and decision-making, but human oversight remains essential to ensure safe and accurate clinical practice.
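The abstract reports a comparison of median examination scores between the two model versions across five editions. A minimal sketch of that kind of comparison is given below, using an exact two-sided Mann-Whitney U test implemented from scratch; the per-edition score lists are illustrative placeholders, not the study's data (only the medians, 106 and 127, and the p-value, 0.048, appear in the abstract, and the authors' exact statistical procedure is not described here).

```python
# Sketch: comparing two small groups of exam scores with an exact
# two-sided Mann-Whitney U test, enumerating all possible group splits.
from itertools import combinations
from statistics import median

def mann_whitney_exact(a, b):
    """Exact two-sided Mann-Whitney U test for two small samples."""
    def u_stat(x, y):
        # U = number of (x_i, y_j) pairs with x_i > y_j (+0.5 for ties)
        return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)

    observed = u_stat(a, b)
    pooled = a + b
    mean_u = len(a) * len(b) / 2
    count = total = 0
    # Enumerate every way of assigning len(a) pooled scores to group a.
    for idx in combinations(range(len(pooled)), len(a)):
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        u = u_stat(grp_a, grp_b)
        total += 1
        # Two-sided: count splits at least as far from the mean U.
        if abs(u - mean_u) >= abs(observed - mean_u):
            count += 1
    return observed, count / total

# Hypothetical per-edition scores (placeholders, NOT the study's data):
scores_35 = [100, 104, 106, 110, 115]
scores_4o = [118, 124, 127, 131, 136]

u, p = mann_whitney_exact(scores_35, scores_4o)
print(f"median 3.5 = {median(scores_35)}, median 4o = {median(scores_4o)}")
print(f"U = {u}, exact two-sided p = {p:.4f}")
```

With two samples of five, an exact enumeration over all 252 splits is cheap and avoids the normal approximation, which is unreliable at this sample size.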



Published

2024-12-20

How to Cite

Ferraz-Costa G, Griné M, Oliveira-Santos M, Teixeira R. Performance of ChatGPT in the Portuguese National Residency Access Examination. Acta Med Port [Internet]. 2024 Dec. 20 [cited 2024 Dec. 22]. Available from: https://actamedicaportuguesa.com/revista/index.php/amp/article/view/22506

Section

Short Reports