TY - JOUR
T1 - Chatbot Performance in Defining and Differentiating Palliative Care, Supportive Care, Hospice Care
AU - Kim, Min Ji
AU - Admane, Sonal
AU - Chang, Yuchieh Kathryn
AU - Shih, Kaoswi Karina
AU - Reddy, Akhila
AU - Tang, Michael
AU - De La Cruz, Maxine
AU - Taylor, Terry Pham
AU - Bruera, Eduardo
AU - Hui, David
N1 - Publisher Copyright:
© 2024 American Academy of Hospice and Palliative Medicine
PY - 2024/5
Y1 - 2024/5
N2 - Context: Artificial intelligence (AI) chatbot platforms are increasingly used by patients as sources of information. However, there are limited data on the performance of these platforms, especially regarding palliative care terms. Objectives: We evaluated the accuracy, comprehensiveness, reliability, and readability of three AI platforms in defining and differentiating “palliative care,” “supportive care,” and “hospice care.” Methods: We asked ChatGPT, Microsoft Bing Chat, and Google Bard to define and differentiate “palliative care,” “supportive care,” and “hospice care” and provide three references. Outputs were randomized and assessed by six blinded palliative care physicians using 0–10 scales (10 = best) for accuracy, comprehensiveness, and reliability. Readability was assessed using Flesch-Kincaid Grade Level and Flesch Reading Ease scores. Results: The mean (SD) accuracy scores for ChatGPT, Bard, and Bing Chat were 9.1 (1.3), 8.7 (1.5), and 8.2 (1.7), respectively; for comprehensiveness, the scores for the three platforms were 8.7 (1.5), 8.1 (1.9), and 5.6 (2.0), respectively; for reliability, the scores were 6.3 (2.5), 3.2 (3.1), and 7.1 (2.4), respectively. Despite generally high accuracy, we identified some major errors (e.g., Bard stated that supportive care had “the goal of prolonging life or even achieving a cure”). We found several major omissions, particularly with Bing Chat (e.g., no mention of interdisciplinary teams in palliative care or hospice care). References were often unreliable. Readability scores did not meet recommended levels for patient educational materials. Conclusion: We identified important concerns regarding the accuracy, comprehensiveness, reliability, and readability of outputs from AI platforms. Further research is needed to improve their performance.
AB - Context: Artificial intelligence (AI) chatbot platforms are increasingly used by patients as sources of information. However, there are limited data on the performance of these platforms, especially regarding palliative care terms. Objectives: We evaluated the accuracy, comprehensiveness, reliability, and readability of three AI platforms in defining and differentiating “palliative care,” “supportive care,” and “hospice care.” Methods: We asked ChatGPT, Microsoft Bing Chat, and Google Bard to define and differentiate “palliative care,” “supportive care,” and “hospice care” and provide three references. Outputs were randomized and assessed by six blinded palliative care physicians using 0–10 scales (10 = best) for accuracy, comprehensiveness, and reliability. Readability was assessed using Flesch-Kincaid Grade Level and Flesch Reading Ease scores. Results: The mean (SD) accuracy scores for ChatGPT, Bard, and Bing Chat were 9.1 (1.3), 8.7 (1.5), and 8.2 (1.7), respectively; for comprehensiveness, the scores for the three platforms were 8.7 (1.5), 8.1 (1.9), and 5.6 (2.0), respectively; for reliability, the scores were 6.3 (2.5), 3.2 (3.1), and 7.1 (2.4), respectively. Despite generally high accuracy, we identified some major errors (e.g., Bard stated that supportive care had “the goal of prolonging life or even achieving a cure”). We found several major omissions, particularly with Bing Chat (e.g., no mention of interdisciplinary teams in palliative care or hospice care). References were often unreliable. Readability scores did not meet recommended levels for patient educational materials. Conclusion: We identified important concerns regarding the accuracy, comprehensiveness, reliability, and readability of outputs from AI platforms. Further research is needed to improve their performance.
KW - Artificial intelligence
KW - definitions
KW - hospice care
KW - palliative care
KW - supportive care
UR - http://www.scopus.com/inward/record.url?scp=85183556641&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85183556641&partnerID=8YFLogxK
U2 - 10.1016/j.jpainsymman.2024.01.008
DO - 10.1016/j.jpainsymman.2024.01.008
M3 - Article
C2 - 38219964
AN - SCOPUS:85183556641
SN - 0885-3924
VL - 67
SP - e381
EP - e391
JO - Journal of Pain and Symptom Management
JF - Journal of Pain and Symptom Management
IS - 5
ER -