Open access peer-reviewed article
This Article is part of the special issue ADVANCED BIOMEDICAL SENSING AND MACHINE LEARNING TECHNIQUES IN HUMAN HEALTH MONITORING led by Prof. Lulu Wang from Shenzhen Technology University, China
Article Type: Research Paper
Date of acceptance: August 2023
Date of publication: September 2023
DOI: 10.5772/dmht.19
Copyright: ©2023 The Author(s), Licensee IntechOpen, License: CC BY 4.0
ChatGPT-4, BARD, and YOU.com are AI large language models (LLMs); ChatGPT-4 was developed by OpenAI on the GPT-3/GPT-4 architecture, and BARD by Google. They were trained using unsupervised learning, which allows them to learn from vast amounts of text data without requiring explicit human labels. ChatGPT-4 was exposed to training information up to September 2021. By presenting identical prompts (queries) to ChatGPT-4, BARD, and YOU.com, including a typical case presentation (vignette) of a new patient with squamous cell tonsillar cancer, we uncovered several specific issues that raise concerns about applying this early phase of advanced LLM AI technology to clinical medicine. Comparing the responses of the three LLMs reveals several flaws in each AI that, if taken as factual, would affect clinical therapeutic suggestions and possibly survival. All three LLMs are readily available for free trial, allowing comparison of results. We observed responses to unchanging prompts that changed frequently over just hours and days, both within the same LLM and between LLMs; critical errors in guideline-recommended drug therapy; and several AI-supplied references that were bogus, with DOI and/or PMID identifiers that were either nonexistent or led to completely irrelevant manuscripts on other subjects.
AI
ChatGPT-4
BARD
YOU.com
OpenAI
neural
tonsillar
cancer
There have been several so-called AI winters over the years. These are periods when the hype surrounding artificial intelligence (AI) (or augmented intelligence to some) fizzled into disappointing failure. Presumably, we are now in a newer and deeper learning period for AI. Newscasters have set the world on fire with excitement and explosive, if not irrational, exuberance for OpenAI’s ChatGPT [1–3]. OpenAI claims human-level performance in academic and professional settings, which includes passing the bar exam [4] with scores in the top 10% of test takers. GPT-4 is touted to have improved performance in understanding the nuances of language, including its content, tone, and meaning. We, therefore, resolved to evaluate the accuracy of medical information gleaned from not only ChatGPT-4 but also two of its competitors, BARD and YOU.com. Many sources state in general terms that ChatGPT-4, though universally available for a free trial, should not be used for clinical medical purposes at this time. However, AI is currently being used to help interpret mammograms [5], chest CTs [6], chest X-rays [7], pathology slides [8] and the like, so we investigated using ChatGPT-4, BARD, and YOU.com for clinical decision-making on an accurate clinical vignette of a patient with a new diagnosis of squamous cell cancer of the tonsil. Instead of generalizations, we attempted to uncover real-life, detailed weaknesses of ChatGPT-4 and included two other available large language models, BARD and YOU.com, for comparison. We anticipated that though most current AIs decline to offer patients specific medical advice, they do suggest therapeutic solutions to users identified as physicians. This manuscript takes a deep dive into a new case of tonsillar cancer, predicated on the necessity of knowing the patient’s clinical tumor staging results.
For those wanting to familiarize themselves with general therapeutic principles of head and neck cancers, including squamous cell cancer of the tonsil, we provide an excellent general review of the subject [9].
The oral mucosa is exposed to the infectious human papillomavirus (HPV) generally from vaginal-oral sex (22.4%), anal-oral sex (4.4%), excessive alcohol (3.5%), and unknown causes (70%). Viral persistence can lead to a progression to invasive squamous cell cancer of the oropharynx. Persistent HPV infections may progress to invasive cancer within ten years. It is estimated that 60% of all oropharyngeal cancers are caused by HPV infections. The most common high-risk HPV types are 16 and 18, but they also include the less prevalent types 31, 33, 45, 52, and 58. However, most infections are naturally cleared in 12 to 24 months. The incidence of HPV-associated oropharyngeal squamous cell carcinoma (OPSCC) is quickly increasing and is rarely discussed among the general population. In the United States, the incidence of HPV-positive OPSCC was 4.62 per 100,000 persons, and it now represents the sixth most common nonskin cancer among white male patients younger than 65 years. HPV-associated tonsillar tumors carry a much better prognosis than tumors negative for the virus. In the United States, about 80% of oropharyngeal cancer cases in men and about 65% of cases in women are attributable to smoking and tobacco use.
To verify results from the three AI sources that appeared inaccurate, we used general Google searches and/or obtained the published manuscripts.
Before starting, we asked ChatGPT-4 through the following prompt (MY PROMPT 1) to explain itself.
MY PROMPT 1 (the first query submitted to ChatGPT-4): Where can I find a simple summary explanation of how ChatGPT-4 works?
ChatGPT-4 returned the following textual response to MY PROMPT 1:
A simple summary of how ChatGPT-4 works can be found below:
ChatGPT-4 is an AI language model developed by OpenAI based on the GPT-4 architecture. It is trained using unsupervised learning, which allows it to learn from vast amounts of text data without requiring explicit human labels.
The model works in two main steps: pre-training and fine-tuning. In the pre-training phase, ChatGPT-4 learns to predict the next word in a sentence from a large dataset containing parts of the internet. This helps it understand grammar, facts about the world, and some reasoning abilities. However, it might also learn some biases present in the training data.
In the fine-tuning phase, the model is refined on a narrower dataset with specific tasks and goals, often generated with the help of human reviewers. This phase helps ChatGPT-4 understand the context and respond more accurately to user inputs.
ChatGPT-4 uses a deep neural network called a Transformer, which has a large number of layers and attention mechanisms. This enables it to process and generate human-like text effectively. When given a text prompt or a question, ChatGPT-4 generates a response by predicting the most likely sequence of words based on the knowledge it has acquired during training.
Recall that ChatGPT-4 might generate plausible-sounding but incorrect or nonsensical answers (RS interjects: termed neural net hallucinations by some software experts [10]), as its primary goal is to predict text and not to reason deeply. It is essential to verify the information provided by the model and treat it as a starting point rather than a definitive answer. We have also noted that, at times, its responses may simulate what psychiatrists term confabulation, or repeating nonsense without malicious intent.
END
We then tested ChatGPT-4, BARD, and YOU.com with submitted prompts (some identical) to the different AIs and evaluated the accuracy and possible consequences of inaccurate and inappropriate responses. These prompts had been pretested with several similar textual versions of prompts that appeared to get a comprehensive (but rarely accurate) response from all of the AIs.
The following is a representative sampling of the variation of prompt responses concentrating on ChatGPT-4.
MY PROMPT 2-A submitted to ChatGPT-4 Could you stage the following tonsillar tumor? A 64-year-old Caucasian gentleman had noted a fullness in the left throat and consulted an ENT physician. CT of the neck with contrast revealed a slight interval increase in the size of the enhancing lesion involving the left palatine tonsil from the prior CT, consistent with the patient’s known squamous cell carcinoma of the left tonsil. There was a slight interval increase in the size of the enlarged left metastatic level IIA cervical lymph node containing solid and cystic components. This node abuts the posterior margin of the left submandibular gland. CT with the contrast of the chest, abdomen, and pelvis revealed the absence of metastatic disease. The patient underwent a transoral robotic resection (TORS) of the oropharyngeal tonsillar lesion and a left neck dissection. Pathology revealed a 2.7 cm left tonsillar squamous cell carcinoma absent special features. Notably, the tumor was HPV-associated, and specimen margins were negative for tumor; however, there were close margins: 1–2 mm (deep lateral margin at the inferior aspect of tumor at 2 mm (inferior mucosa), 3 mm (deep inferomedial) and there was an additional 3–4 mm close margin. There was an absence of lymphovascular or perineural invasion. A level II–IV left cervical lymph node dissection revealed 1 of 32 removed lymph nodes involved with the tumor. The abnormal node was at level II, 4 cm in size, and contained SCC without extranodal extension.
END
ChatGPT-4 returned the following textual response to MY PROMPT 2-A:
Based on the provided information, we can stage the tonsillar tumor using the TNM (Tumor, Node, Metastasis) classification system by the American Joint Committee on Cancer (AJCC).
Tumor (T): The primary tumor is 2.7 cm in size and was in the left palatine tonsil. According to the AJCC 8th Edition, this would be classified as T2 (T2: tumor >2 cm but ≤4 cm in greatest dimension).
Node (N): There is one involved lymph node at level II, which is 4 cm in size and contains squamous cell carcinoma (SCC) without extranodal extension (ENE). This would be classified as N1 (N1: Metastasis in a single ipsilateral lymph node, ≤6 cm in greatest dimension, ENE (–)). (Here, ChatGPT-4 uses the AJCC 8th Ed system, that is HPV-positive to define N1 GT ).
Metastasis (M): CT scans with contrast of the chest, abdomen, and pelvis revealed an absence of metastatic disease. This would be classified as M0 (M0: No distant metastasis).
So, the overall stage of the tonsillar tumor based on the TNM classification system would be Stage IVA (T2 N1 M0).
END
WRONG
The correct answer is Clinical Stage I (T2N1M0)
MY PROMPT 2-B Are you sure this is an AJCC 8th ED TNM Stage IVA tonsillar tumor?
ChatGPT-4 Response to 2-B
I apologize for the confusion in my previous response. Upon closer evaluation, the correct staging of the tonsillar tumor should be:
Tumor (T): T2 (T2: tumor >2 cm but ≤4 cm in greatest dimension)
Node (N): N1 (N1: Metastasis in a single ipsilateral lymph node, ≤6 cm in greatest dimension, ENE (–))
Metastasis (M): M0 (M0: No distant metastasis)
According to the AJCC 8th Edition for Oropharyngeal Cancer (p16-positive), the correct overall stage of this tonsillar tumor should be Stage II (T2 N1 M0) and not Stage IVA as previously mentioned. I apologize for the confusion, and I hope this clears up any misunderstandings.
END
WRONG
Correct answer is Clinical Stage I (T2N1M0)
MY PROMPT 3-A This prompt is identical to the prior MY PROMPT 2-A (see above)
ChatGPT-4 RESPONSE to PROMPT 3-A (This is the correct response for staging)
Based on the information you provided and using the American Joint Committee on Cancer (AJCC) 8th Edition TNM staging system for oropharyngeal squamous cell carcinoma (as of September 2021, which is when my knowledge was last updated), we can break down the tumor staging as follows:
T (Tumor): The tumor size is 2.7 cm and is in the left palatine tonsil. It is mentioned that the specimen margins were negative for the tumor. There’s no mention of the tumor invading nearby structures beyond the tonsil. Given this information, it could be classified as T2 (a tumor more than 2 cm but not more than 4 cm in greatest dimension).
N (Nodes): One of 32 lymph nodes is involved with the tumor, and this lymph node is located at level II and measures 4 cm in size. This might be classified as N2b (a single ipsilateral node larger than 3 cm but not larger than 6 cm).
M (Metastasis): The information states that CT scans of the chest, abdomen, and pelvis revealed the absence of metastatic disease, which could classify as M0 (no distant metastasis).
Since the tumor is HPV-associated, this can also impact staging. The AJCC 8th edition has separate staging criteria for HPV-positive oropharyngeal tumors compared to HPV-negative tumors. Putting this together, the tumor might be classified as T2 N2b M0 according to the AJCC 8th edition. This could correspond to: Stage I for HPV-positive oropharyngeal cancer or Stage III/IVA for HPV-negative oropharyngeal cancer, depending on additional factors.
However, as mentioned before, it is important to recognize that I am an AI and not capable of making clinical decisions or diagnoses. Staging of cancer should be conducted by a qualified healthcare professional who can evaluate all of the clinical data. Also, the TNM staging criteria may undergo updates, so it’s important to consult the latest guidelines.
END
Correct for Clinical Stage I (HPV+); however, the N stage is wrong.
The correct N designation is N1 (T2 N1 M0), not N2b.
ChatGPT-4 MYPROMPT 2-C, which is identical to MYPROMPT 2-A and MYPROMPT 2-B
MY PROMPT RESPONSE 2-C
Based on this information, the T category could likely be T1 (tumor ≤4 cm in most significant dimension).
N (Nodes): One of 32 lymph nodes is involved with the tumor, and this lymph node is located at level II and is 4 cm in size. This might be classified as N2b (single ipsilateral node larger than 3 cm but not larger than 6 cm).
M (Metastasis): The information states that there is no distant metastasis, which would classify as M0.
Using the TNM categories and considering the tumor is HPV-positive, this tumor could be classified as T1N2bM0. According to the AJCC 8th Edition for HPV-positive oropharyngeal cancers, this corresponds to Stage II.
Please note that the clinical staging of a tumor should be performed by a qualified healthcare professional who has access to all the relevant data and can interpret it in the context of the most up-to-date guidelines and the patient’s survival clinical picture. Always consult a medical professional for definitive cancer staging and treatment recommendations.
END
WRONG
Correct answer is Clinical Stage I (T2N1M0)
MY PROMPT 3 Are you absolutely sure this is an AJCC 8th ED TNM Stage IVA tonsillar tumor?
ChatGPT-4 Response 3
I apologize for the confusion in my previous response. Upon closer evaluation, the correct staging of the tonsillar tumor should be:
Tumor (T): T2 (T2: tumor >2 cm but ≤4 cm in greatest dimension)
Node (N): N1 (N1: Metastasis in a single ipsilateral lymph node, ≤6 cm in greatest dimension, ENE (–))
Metastasis (M): M0 (M0: No distant metastasis)
According to the AJCC 8th Edition for Oropharyngeal Cancer (p16-positive), the correct overall stage of this tonsillar tumor should be Stage II (T2 N1 M0) and not Stage IVA as previously mentioned. I apologize for the confusion, and I hope this clears up any misunderstandings.
END
WRONG
Correct answer is Clinical Stage I (T2N1M0)
MY PROMPT 4 Let’s try again; I wonder whether this tumor is Stage I or Stage II.
ChatGPT-4 Response
I apologize for the confusion in my previous responses. Based on the provided information and considering the tumor is HPV-associated (p16-positive), the correct staging according to the AJCC 8th Edition for Oropharyngeal Cancer should be:
Tumor (T): T2 (T2: tumor >2 cm but ≤4 cm in greatest dimension)
Node (N): N1 (N1: Metastasis in a single ipsilateral lymph node, ≤6 cm in greatest dimension, ENE (–))
Metastasis (M): M0 (M0: No distant metastasis)
The correct overall stage of the tonsillar tumor for HPV-associated oropharyngeal cancer should be Stage I (T2 N1 M0). The staging system for HPV-associated oropharyngeal cancer differs from the staging for non-HPV-associated cancers, which is why my previous responses were incorrect. I apologize for the confusion and any inconvenience this may have caused.
END
Correct answer is Clinical Stage I (T2N1M0)
Considering BARD PROMPTS and its responses
BARD PROMPT Could you stage the following tonsillar tumor? A 64-year-old Caucasian gentleman had noted a fullness in the left throat and consulted an ENT physician. CT of the neck with contrast revealed a slight interval increase in the size of the enhancing lesion involving the left palatine tonsil from the prior CT, consistent with the patient’s known squamous cell carcinoma of the left tonsil. There was a slight interval increase in the size of the enlarged left metastatic level IIA cervical lymph node containing solid and cystic components. This node abuts the posterior margin of the left submandibular gland. CT with contrast of the chest, abdomen, and pelvis revealed the absence of metastatic disease. The patient underwent a transoral robotic resection (TORS) of the oropharyngeal tonsillar lesion and a left neck dissection. Pathology revealed a 2.7 cm left tonsillar squamous cell carcinoma absent special features. Notably, the tumor was HPV-associated, and specimen margins were negative for tumor; however, there were close margins: 1–2 mm (deep lateral margin at the inferior aspect of tumor at 2 mm (inferior mucosa), 3 mm (deep inferomedial) and there was an additional 3–4 mm close margin. There was an absence of lymphovascular or perineural invasion. A level II–IV left cervical lymph node dissection revealed 1 of 32 removed lymph nodes involved with the tumor. The abnormal node was at level II, 4 cm in size, and contained SCC without extranodal extension.
BARD RESPONSE
Sure, I can help you stage the tonsillar tumor.
The patient’s tumor is 2.7 cm in size, and there is one positive lymph node that is 4 cm in size. The lymph node does not have any extranodal extension. The patient does not have any distant metastasis.
The TNM staging system is used to stage cancer. The TNM system classifies cancer based on the size of the tumor (T), the extent of lymph node involvement (N), and the presence of distant metastasis (M).
In this case, the patient’s tumor is T2 (2.7 cm). There is one positive lymph node, which is classified as N1a. There is no distant metastasis, so the patient’s stage is M0.
The palatine’s stage is, therefore, T2N1aM0, which is a stage IIA cancer.
The patient’s tumor is HPV-associated, which means that it is caused by the human papillomavirus. HPV-associated tumors are typically less aggressive than tumors that are not HPV-associated.
The patient’s margins were close, which means that there was a small amount of tumor that was not removed with the surgery. This can increase the risk of recurrence.
The patient’s lymph node dissection was negative for other involved lymph nodes, which is good news. This means that the cancer has not spread to other lymph nodes in the neck.
The patient’s survival prognosis is good. Stage IIA cancers are typically very treatable, and the patient’s tumor is HPV-associated, which also makes it less aggressive.
However, the patient’s close margins increase the risk of recurrence, so the patient will need to be monitored closely for any signs of cancer returning.
END
WRONG
Correct answer is Clinical Stage I (T2N1M0)
YOU.com
YOU.com PROMPT Same prompt as given to BARD above.
YOU.com RESPONSE
Based on the information provided, the tonsillar tumor would be staged as follows:
T: T2 (size of 2.7 cm)
N: N1 (involvement of one lymph node at level II)
M: M0 (absence of metastatic disease)
Overall stage: Stage II (based on the TNM staging system)
END
WRONG
Correct answer is Clinical Stage I (T2N1M0)
The same MYPROMPT 2A was fed to BARD and YOU.com, and they both answered stage II. It is recalled that the correct answer is stage I.
ChatGPT-4 initially responded with three incorrect staging evaluations, followed by two correct ones (Stage IVA, Stage III/IVA, Stage II, Stage I, Stage I). Responses to the same prompt varied even though the AJCC 8th Edition has been around since 2017–2018, and that edition defined the newest staging criteria for tonsillar cancer based on tumor HPV+ or HPV- status.
ChatGPT-4, first try: Stage IVA (Wrong)
ChatGPT-4, second try: Stage III/IVA (Wrong)
ChatGPT-4, third try: Stage II (Wrong)
ChatGPT-4, fourth try: Stage I (Correct)
ChatGPT-4, fifth try: Stage I (Correct)
BARD: Stage II (Wrong)
YOU.com: Stage II (Wrong)
∗NOTE: The above variation of responses depended on the day and even the hour the same prompt was fed to ChatGPT-4 and, in some instances, on the follow-up questioning/prompt of ChatGPT-4 after reading an incorrect response. Some have claimed that LLMs can hallucinate [10]. Both BARD and YOU.com were correct for the nodal stage of N1 but incorrect for the final clinical stage.
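The tallies above can be summarized with a few lines of Python. This is only an illustrative sketch: the stage labels are copied from the listing above, while the names `responses`, `CORRECT_STAGE`, and `summarize` are hypothetical and not part of the study.

```python
from collections import Counter

# Overall stages reported in the text for the same vignette.
# "III/IVA" is kept as the single label given for the second try.
CORRECT_STAGE = "I"
responses = {
    "ChatGPT-4": ["IVA", "III/IVA", "II", "I", "I"],
    "BARD": ["II"],
    "YOU.com": ["II"],
}


def summarize(stages, correct=CORRECT_STAGE):
    """Return (fraction of correct answers, number of distinct answers)."""
    counts = Counter(stages)
    accuracy = counts[correct] / len(stages)
    return accuracy, len(counts)


for model, stages in responses.items():
    accuracy, distinct = summarize(stages)
    print(f"{model}: {accuracy:.0%} correct, {distinct} distinct answer(s)")
```

For ChatGPT-4 this reports two correct answers out of five, spread over four distinct stage labels, which is the inconsistency the note describes.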
When we rely on today’s LLM AI in medicine, in some respect, we are dealing with a multitude of unknowns within a black box, and it is close to impossible for users to apply logical reason(s) for the discrepancies and inaccuracies delivered by LLM AIs. After all, AJCC’s 7th and 8th editions were fully available for AI access during their essential training period years before our prompts were submitted. The staging errors are unacceptable because all therapeutic recommendations are stage related. Were such gross errors a manifestation of what some AI scientists call a neural net hallucination [10], a sophisticated apology for a wrong answer?
The true AJCC 8th Ed [11] stage for this patient’s tumor is Clinical Stage I (T2N1M0).
We conjecture that ChatGPT-4 may have partially ignored or misinterpreted our initial vignette, MY PROMPT 2-A, revealing that the patient’s tumor was HPV-associated, but we are unsure. The mechanistic complexity of multilayered neural network “thinking” must be left to the experts [12].
Tonsillar squamous cell cancer is a successfully treatable disease when diagnosed early and associated with the human papillomavirus (HPV positivity). However, advanced-stage illness, especially in the presence of HPV-negativity, continues to have a poor prognosis. AJCC 8th Ed Stage IV disease that is HPV-negative carries a 5-year survival of only 50% or less, while 5-year survival of HPV-positive tonsillar cancer reaches 65% [13]. In addition to the poor prognosis of advanced Stage IV disease, treatment of such high stages is associated with increased patient morbidity. Toxic therapeutic effects may include sore, dry mouth, dental issues, difficulty chewing and swallowing, excessive mucus accumulation, voice changes, coughing/choking, etc. [14]. Surgery and chemoradiotherapy alike can lead to significant patient impairment necessitating reliance on tracheostomy and gastrostomy tube placement to prevent aspiration and maintain nutrition. This can have a pronounced negative impact on a patient’s quality of life and, for some, may not be reversible [14, 15] (Figures 4–8).
Recall that several prompt submissions were required for ChatGPT-4 to present the correct AJCC 8th Ed
Figure 2A and B reveal that the 5-year survival for Stage IV (p16-) tonsillar cancer as staged by the AJCC 7th Ed or 8th Ed is approximately 58%. Sharing such inaccurate staging with a patient could precipitate undue high anxiety.
It is essential to understand that ChatGPT, BARD, and YOU.com are all large language models (LLMs). ChatGPT is based on OpenAI’s GPT-3 language model and was trained on a massive amount of data to generate human-like text in response to prompts. It is trained only to guess words and does not contain any knowledge about medicine or any other domain. The LLM powers the AI chatbot and interprets natural language queries, allowing for a more human-like and intuitive search experience. Another AI, Med-PaLM 2, is an LLM developed by Google that is designed to provide high-quality answers to medical questions. It is the second iteration and has been developed with a focus on safety and avoiding biases. Med-PaLM 2 has been trained by Google’s health research teams with medical knowledge and provides answers to a variety of complex medical issues. It is designed to give healthcare professionals access to pretested, accurate medical information. Access to the model is limited to a select group of Google Cloud customers.
Had ChatGPT-4 applied the downstaging modulation of HPV-positivity to the incorrect stage IVA, it would have improved the 5-year clinical prognosis to approximately 78% [13] casting more confusion into the process (see Figure 2D). Had we taken ChatGPT-4’s original output as factual, our therapeutic assumptions would be erroneous since we would be building on an incorrect
Acceptance of other ChatGPT-4 wrong responses, including those made by BARD and YOU.com of
Considering ChatGPT-4’s correct response,
In Figure 1A, the 5-year survival of actual Stage I HPV-positive or (p16+) patients would approximate that of Stage II (p16+) patients; however, in the AJCC 8th Ed, there was a statistically significant difference in the 5-year overall survival for stage I and Stage II disease generally varying in different analyses [18, 19]. As an example, van Gysen reported the difference to be between (96.9% for Stage I vs. 77.1% for Stage II respectively;
What if we acted upon the initial incorrect Stage IVA suggestion and the accompanying decreased 5-year survival suggested by ChatGPT-4? As revealed in Figure 4, radiation, chemotherapy, and possibly immunotherapy [21, 22] might have been recommended with more significant associated risks of acute and perhaps chronic toxicity for this patient.
Notably, a relatively new study by Yoshida
NOTE: As a supplement for those not routinely involved in therapy for oropharyngeal cancer, we include a display of the latest NCCN Guidelines Version 1.2023 for Cancer of the Oropharynx (p16 [HPV]-positive). Figures 4–8 reveal how treatment intensity, which is generally related to higher stages of tonsillar cancer, varies with progressive clinical stages [24, 25].
Accurate staging is an important aspect of cancer care. Careful assessment by imaging, surgical anatomy, pertinent biomarkers (p16±), and the surgical and pathology findings of patients harboring the cancer is crucial to understanding prognosis [18] and making appropriate treatment recommendations.
We have evaluated three AIs’ responses to prompts or queries asking for help delineating our patient’s tumor stage and observed ChatGPT-4’s comments. MY PROMPT 2 presented our patient’s clinical vignette and asked ChatGPT-4 to return to the patient’s clinical stage. It first responded to Stage IVA, then Stage II, and finally with the correct answer, Stage I, but only after several tries. The other two AIs agreed on the incorrect stage II answer.
As can be seen in Figures 1–3, each clinical stage is associated with a differing anticipated 5-year survival, and each stage generally implicates different therapeutic modalities and associated toxicities (Figures 4–8). In addition, therapeutic decisions are frequently nuanced by intermediate factors: histopathological features, including the status of tumor margins and the presence of lymphovascular and perineural invasion [26]; extranodal extension (ENE positive or negative) [27]; and social and other coexistent health factors. We had planned to query ChatGPT-4 about those factors; however, we stopped short when we saw it could not accurately evaluate the patient’s primary clinical stage with confidence.
At first blush, it appears that the ChatGPT-4, BARD, and YOU.com LLMs missed their mark for this patient and may not have been trained on enough pertinent data. When ChatGPT-4 was asked to try again after delivering an incorrect answer, it frequently returned a response starting with “I apologize; however, I am restricted to the information I am trained on up to September 2021.” We should recall that AJCC stages generally represent prognoses, not specific treatment recommendations! However, as the clinical stage increases, so does the intensity of recommended radiation [28] and/or chemoradiation [22], and, at times, immunotherapy [29] is added for higher stages or resistant disease (Figures 4–8).
The authors believe that the three AIs used here (ChatGPT-4, BARD and YOU.com) should not be allowed to go unbridled in clinical medicine, and importantly these AIs are very conscious of that and are constantly reminding the users of that fact.
The choice of surgery, chemotherapy, and radiation intensity depends on accurate clinical staging, at the least. Clinical staging was a complete failure, as noted in our first attempts with ChatGPT-4. As we introduced similar prompts to the BARD and YOU.com AIs, there were similar failures despite intense refining of prompts to yield complete responses related to OPSCC staging. Prompt engineering is a process of refining interactions with AI systems, such as chatbots, to produce optimal responses. We anticipate that many physicians will not have the time to become prompt engineers; however, a useful prompt should not produce false references, a common issue with all three AIs discussed.
At this stage of AI development, accurate evaluations in the field of clinical medicine should not be counted on. We observed that response accuracy varied and depended heavily upon the text presented within the prompt. Medical professional users must always be prepared to separate fact from fiction whenever using AI!
More often than we liked, we were surprised to find that BARD or either of the other two AIs tested would, when asked, suggest literature references that at first blush looked prim and proper, with both DOI and PMID numbers. However, both numbers would frequently lead to entirely unrelated manuscripts or could not be found in any of the medical libraries we searched. This problem recurred in all three AIs tested and was very disappointing. When a reference offered by one AI that could not be found (through DOI, PMID, or paper title) was presented to another AI, the second AI surprisingly stated that the reference was made up and did not exist. This issue of nonexistent, unfindable references was universal among the AIs tested! When BARD was directly confronted with this problem, it apologized and admitted that BARD had a reference problem that was being worked on. The problem is that all three AIs usually presented their references as though they were correct, when frequently they were either incorrect or floating in a cloud somewhere.
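The manual check described here (resolve the identifier, then compare what it points to against what the AI claimed) can be sketched in Python. This is a minimal illustration, not the procedure the authors used: `check_reference`, `titles_match`, the 0.85 similarity threshold, and the caller-supplied `lookup` function are all hypothetical, and the identifiers and titles in the usage lines are placeholders rather than real citations.

```python
from difflib import SequenceMatcher


def titles_match(claimed, registered, threshold=0.85):
    """True when two titles are near-identical after case/space normalization."""
    def norm(text):
        return " ".join(text.lower().split())
    ratio = SequenceMatcher(None, norm(claimed), norm(registered)).ratio()
    return ratio >= threshold


def check_reference(claimed_title, identifier, lookup):
    """Classify an AI-supplied reference.

    `lookup` is a hypothetical caller-supplied function that maps a DOI or
    PMID to the registered article title, or returns None if nothing is found.
    """
    registered = lookup(identifier)
    if registered is None:
        return "identifier not found"
    if not titles_match(claimed_title, registered):
        return "identifier resolves to a different article"
    return "verified"


# Placeholder registry standing in for a real DOI/PMID lookup service.
registry = {"10.0000/example.1": "Example Title About Staging"}
print(check_reference("example title about staging", "10.0000/example.1", registry.get))
print(check_reference("A completely unrelated subject", "10.0000/example.1", registry.get))
print(check_reference("Anything at all", "10.0000/missing", registry.get))
```

Passing the lookup in as a parameter keeps the comparison logic separate from whichever bibliographic service is actually queried; the two failure labels correspond to the two problems reported above, identifiers that do not exist and identifiers that lead to unrelated manuscripts.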
We submitted a prompt to BARD recently asking for the latest 2023 ASCO guidelines for the treatment of male breast cancer, and it came back with “…
We submitted an additional prompt to ChatGPT-4 asking it to return the AJCC 7th Ed TNM staging for oropharyngeal cancer. MY PROMPT: Please send me the AJCC 7th Ed of the staging manual for oropharyngeal cancer. Note below that ChatGPT-4 correctly returned the AJCC 7th Ed staging, which, accurately, does not incorporate the influence of HPV (human papillomavirus, p16) positivity.
ChatGPT-4 answer:
(AJCC) 7th edition Primary Tumor (T):
TX: Primary tumor cannot be assessed
T0: No evidence of primary tumor
Tis: Carcinoma in situ
T1: Tumor ≤2 cm in greatest dimension
T2: Tumor >2 cm but ≤4 cm in greatest dimension
T3: Tumor >4 cm in greatest dimension
T4a: Tumor invades the larynx, extrinsic muscle of the tongue, medial pterygoid, hard palate, or mandible.
T4b: Tumor invades the lateral pterygoid muscle, pterygoid plates, or skull base and/or encases the internal carotid artery.
Regional Lymph Nodes (N):
NX: Regional lymph nodes cannot be assessed.
N0: No regional lymph node metastasis
N1: Metastasis in a single ipsilateral lymph node, ≤3 cm in greatest dimension
N2:
N2a: Metastasis in a single ipsilateral lymph node, >3 cm but ≤6 cm in greatest dimension
N2b: Metastasis in multiple ipsilateral lymph nodes, none >6 cm in greatest dimension
N2c: Metastasis in bilateral or contralateral lymph nodes, none >6 cm in greatest dimension
N3: Metastasis in a lymph node >6 cm in greatest dimension
Distant Metastasis (M):
MX: Distant metastasis cannot be assessed.
M0: No distant metastasis
M1: Distant metastasis
These factors are then combined to create the overall stage groupings:
Stage I: T1, N0, M0
Stage II: T2, N0, M0
Stage III: T1–T3, N1, M0 OR T3, N0, M0
Stage IVA:
T4a, N0–N1, M0
T1–T4a, N2, M0
Stage IVB:
T4b, Any N, M0
Any T, N3, M0
Stage IVC: Any T, Any N, M1
Please note that this information is based on the AJCC 7th edition, and there may be updates or changes in the more recent editions. Always consult the latest edition for the most accurate information. END
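To make the 7th edition listing above concrete, the following Python sketch encodes its stage groupings and applies them to the vignette (2.7 cm primary, single ipsilateral 4 cm node, no distant metastasis). The function names are hypothetical, only the T1–T4b and N0–N3 categories from the listing are handled, and the node helper covers just the single-ipsilateral-node case. It illustrates how a staging scheme that ignores HPV status groups this patient; it is not a claim about how any LLM arrived at its answer.

```python
def n_single_ipsilateral_7th(size_cm):
    """N category for ONE ipsilateral node, per the 7th edition listing above."""
    if size_cm <= 3:
        return "N1"
    if size_cm <= 6:
        return "N2a"
    return "N3"


def stage_group_7th(t, n, m="M0"):
    """Overall stage from the 7th edition groupings quoted above."""
    valid_t = {"T1", "T2", "T3", "T4a", "T4b"}
    valid_n = {"N0", "N1", "N2", "N2a", "N2b", "N2c", "N3"}
    if t not in valid_t or n not in valid_n or m not in {"M0", "M1"}:
        raise ValueError(f"unsupported category: {t} {n} {m}")
    if m == "M1":
        return "IVC"                      # Any T, Any N, M1
    if t == "T4b" or n == "N3":
        return "IVB"                      # T4b Any N, or Any T N3
    if t == "T4a" or n.startswith("N2"):
        return "IVA"                      # T4a N0-N1, or T1-T4a N2
    if t == "T3" or n == "N1":
        return "III"                      # T1-T3 N1, or T3 N0
    return "II" if t == "T2" else "I"     # T2 N0 or T1 N0


# Vignette: T2 primary with a single ipsilateral 4 cm node, M0.
node = n_single_ipsilateral_7th(4.0)
print(node, stage_group_7th("T2", node))
```

Under these HPV-agnostic rules the vignette comes out as N2a and Stage IVA, whereas the AJCC 8th Ed p16-positive criteria place the same findings in Stage I.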
Tonsil malignancy is staged as oropharyngeal cancer according to the AJCC TNM classification of malignant tumors. The following is the
(AJCC) 8th edition T Classification Oropharyngeal Cancers
T1: Tumor 2 cm or less
T2: Tumor more than 2 cm but less than 4 cm
T3: Tumor greater than 4 cm or extension into the lingual surface of epiglottis
p16 negative tumors
T4a: Tumor invades the larynx, deep/extrinsic muscle of the tongue, medial pterygoid, hard palate, or mandible.
T4b: Tumor invades lateral pterygoid muscle, pterygoid plates, lateral nasopharynx, skull base; or encases carotid artery
p16 positive tumors
T4: Larynx, deep/extrinsic muscle of tongue, medial pterygoid, hard palate, mandible, lateral pterygoid muscle, pterygoid plates, lateral nasopharynx, skull base, or encases carotid artery.
N Classification p16 Negative
N0: No regional lymph node metastasis
N1: Single ipsilateral node less than 3 cm
N2
N2a: Single ipsilateral node greater than 3 cm but less than 6 cm
N2b: Multiple ipsilateral nodes less than 6 cm
N2c: Bilateral and contralateral nodes less than 6 cm
N3
N3a: Single node greater than 6 cm
N3b: Single or multiple nodes with extracapsular spread
N Classification p16 Positive
N0: No regional lymph node metastasis
N1: Unilateral nodes all less than 6 cm
N2: Contralateral or bilateral nodes, all less than 6 cm
N3: Metastasis greater than 6 cm
M Classification
M0: No distant metastasis
M1: Distant metastasis
Clinical stages for HPV-associated or (p16+) oropharyngeal Squamous Cell Cancer (SCC) of the tonsil [11].
Stage I: T0, T1 or T2; N0 or N1; M0
Stage II: T0, T1 or T2; N2; M0
Stage II: T3; N1 or N2; M0
Stage III: T0, T1, T2, T3 or T4; N3; M0
Stage III: T4; N0, N1, N2 or N3; M0
Stage IV: Any T; Any N; M1
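The p16-positive clinical stage table above can be expressed as a short Python sketch and applied to the vignette. The function names are hypothetical, and three simplifications are assumptions rather than statements from the table: T category is assigned from size alone (no invasion criteria), a node of exactly 6 cm is treated as not exceeding the cutoff, and T3 N0 M0, which the table does not list, is grouped with Stage II.

```python
def t_category_by_size(size_cm):
    """T1-T3 from tumor size only; invasion (T4) is not modeled here."""
    if size_cm <= 2:
        return "T1"
    if size_cm <= 4:
        return "T2"
    return "T3"


def n_category_p16_positive(largest_node_cm, bilateral=False):
    """N category for p16-positive disease; None means no involved nodes."""
    if largest_node_cm is None:
        return "N0"
    if largest_node_cm > 6:
        return "N3"
    return "N2" if bilateral else "N1"


def clinical_stage_p16_positive(t, n, m="M0"):
    """Clinical stage group for p16-positive oropharyngeal cancer."""
    if t not in {"T0", "T1", "T2", "T3", "T4"} or n not in {"N0", "N1", "N2", "N3"}:
        raise ValueError(f"unsupported category: {t} {n}")
    if m == "M1":
        return "IV"
    if t == "T4" or n == "N3":
        return "III"
    if t == "T3" or n == "N2":   # T3 N0 placed here by assumption
        return "II"
    return "I"                   # T0-T2 with N0 or N1


# Vignette: 2.7 cm primary, one ipsilateral 4 cm node, no distant metastasis.
t = t_category_by_size(2.7)
n = n_category_p16_positive(4.0, bilateral=False)
print(t, n, "Stage", clinical_stage_p16_positive(t, n))
```

For the vignette this yields T2 N1 M0, Clinical Stage I, in agreement with the correct answer stated throughout the text.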
Additional discussion of AJCC 8th Ed for staging oropharyngeal cancers:
The authors declare no conflict of interest.
© The Author(s) 2023. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.