Vol 3, No 3 (2025): Current Issue (Volume 3, Issue 3), 2025
Editorial
From Resource-Limited to Research-Rich: Unlocking the Scientific Potential of Developing Nations
Zuhair Dahham Hammood
For too long, the scientific narrative has been dominated by voices from wealthier nations. While their contributions are invaluable, the imbalance has left a vast reservoir of untapped knowledge and innovation in the developing world. Today, the time has come to shift the paradigm—from viewing developing countries as mere recipients of scientific progress to recognizing them as active producers of valuable, context-specific knowledge.
From Resource-Limited to Research-Rich is not a rhetorical flourish—it is a vision, a goal, and a challenge. It reflects a belief that scientific excellence is not the exclusive property of nations with abundant financial resources, but rather, a pursuit driven by curiosity, commitment, and community.
Developing countries, despite limited infrastructure and funding, are home to some of the most pressing health challenges—from endemic infectious diseases and rising non-communicable burdens to unique environmental and sociopolitical contexts. These challenges demand local insight, homegrown data, and context-sensitive solutions. The answers will not come from imported models alone. They must arise from within [1].
In this transformation, medical journals have a profound responsibility—not just as gatekeepers of knowledge, but as platforms for empowerment. Barw Medical Journal stands committed to this mission: to provide a voice to researchers working under constraints, to mentor and guide early-career scientists, and to uphold the integrity and quality of regional scholarship.
Success stories are already emerging. Across Africa, Asia, the Middle East, and Latin America, we are witnessing a rise in high-quality research led by local scientists. These efforts, often fueled by personal passion more than institutional support, prove that scientific ingenuity thrives even where resources are scarce [2].
However, more must be done. Governments must prioritize funding for health research. International agencies must listen more and dictate less. And academic partnerships must be based on equity, not extraction.
The path from resource-limited to research-rich is not paved overnight. It requires intentional investment, strategic collaboration, and relentless belief in the intellectual power of every nation. As we look ahead, let us remember: the next breakthrough in global health may very well come from a modest lab, in a hospital like ours, led by minds that simply needed a chance to be heard.
At Barw Medical Journal, we are here to amplify those voices.
Original Articles

The Effect of Clinical Knee Measurement in Children with Genu Varus
Kamal Jamil, Chong YT, Ahmad Fazly Abd Rasid, Abdul Halim Abdul Rashid, Lawand Ahmed
Abstract
Introduction
Children with genu varus need frequent assessment and follow-up, which may require several radiographs. This study investigates the effectiveness of the clinical assessment of genu varus in comparison to the radiological assessment.
Methods
This study examined the relationship between clinical and radiographic assessments of genu varus (bow leg) in children. It focused on intercondylar distance (ICD) and clinical tibiofemoral angle (cTFA) as clinical measures, compared with the mechanical tibiofemoral angle (mTFA) obtained via scanogram, the radiographic gold standard for assessing lower limb deformity. Clinical measurements (ICD and cTFA) were gathered along with the mTFA from scanogram radiographs. Reliability was tested between two observers, and Spearman’s correlation coefficient was used to evaluate the relationships between the clinical and radiographic measurements.
Results
The study involved 36 children with an average age of 6.3 years. There was strong intra-rater reliability for both observers (ICC 0.87 for observer 1, ICC 0.97 for observer 2) and excellent inter-observer agreement (ICC 0.97). Positive correlations were found between cTFA and mTFA (r² = 0.67, p < 0.001), between ICD and cTFA (r² = 0.53, p < 0.001), and between ICD and mTFA (r² = 0.62, p < 0.001).
Conclusion
This study supports the idea that clinical methods may be sufficient for evaluation, minimizing the need for radiation exposure and offering a reliable alternative to radiography.
Introduction
Genu varus, also known as bow-leggedness, is defined as any separation of the medial surfaces of the knees when the medial malleoli are in contact and the patient is standing in the anatomical position [1]. The prevalence of genu varus ranges from 11.4% to 14.5% [2,3], and it is more prevalent in boys than in girls [2]. Genu varus may be physiological or pathological. Several clinical and radiological methods aid in its screening and diagnosis. Clinical methods such as intercondylar distance (ICD) and tibiofemoral angle measurement have been used to screen for and assess the degree of genu varus. However, an imaging modality such as a long-leg AP radiograph or scanogram is considered the gold standard for assessing lower limb deformity.
Many studies on genu varus in children have used either clinical or radiological lower limb measurements to describe the progression of the tibiofemoral angle in normal children, the normal ranges of knee angle in relation to age, and the transition time from varus to valgus in different populations and ethnic groups [4-10]. A recent systematic review proposed that children above the age of 18 months with genu varus should be closely monitored clinically using ICD or cTFA, and that an ICD of more than 4 cm should be investigated for a pathologic cause [11]. However, the reliability of these clinical measures has not been confirmed.
Hence, serial assessment might be needed to manage children with genu varus. Clinical methods of assessment are preferable to radiography because they involve no radiation exposure, but they may be inaccurate or unreliable [12]. We therefore set out to determine the correlation between the radiological and clinical assessments.
Methods
Study design and setting
This was a single-center cohort study conducted in the orthopaedic clinic of a tertiary hospital. Children aged 1 to 17 years who were diagnosed with genu varus by orthopaedic specialists and had a long-leg radiograph were included. We excluded children with a previous fracture of the lower limb or with any knee swelling, tumour, or contracture. Consent was obtained from the parents before enrolment in the study. This study was conducted in accordance with the Declaration of Helsinki and was approved by the Universiti Kebangsaan Malaysia Institutional Ethical Committee (JEP-2020-194).
Procedure
Baseline data such as age, gender, weight, height, and underlying diagnosis were recorded. The knee intercondylar distance was measured using a measuring tape, with the child standing and both medial malleoli touching. The centre of each medial femoral condyle was identified by palpating the most prominent part of the distal femur. The distance between the condyles was measured following the method described by Heath et al [13] and recorded in centimetres as the intercondylar distance. The clinical tibiofemoral angle (cTFA) was measured with a goniometer, following the method described by Arazi et al [14]. With the child standing, the anterior superior iliac spine, centre of the patella, and midpoint of the ankle joint were marked with a pen. After the tibiofemoral axis was marked, the angle was measured and recorded in degrees. Figure 1 illustrates the method of measurement on a patient.
A standardized long-leg anterior-posterior radiograph (scanogram) of the lower limbs was obtained from the hospital radiological database. The angle formed between the mechanical axis of the femur and the mechanical axis of the tibia was recorded as the mechanical tibiofemoral angle (mTFA). The mTFA was determined from the digital radiograph using the measuring tool in Medweb software (Medweb, Inc, San Francisco, CA). In bilateral cases, the limb with the worst measured angle was chosen for analysis.
The clinical and radiological measurements were performed by a single researcher (CYT), who was trained on the measurement technique. For the radiographic measurements, a prior intra- and inter-observer reliability study was performed on 10 radiographs by two main researchers (CYT and KJ) on the same children at two different intervals.
Data analysis
The intra- and inter-observer reliability of tibiofemoral angle measurement was assessed using the intraclass correlation coefficient (ICC), with 95% confidence intervals to gauge the precision of the ICCs [15]. Correlations between clinical tibiofemoral angle (cTFA), mechanical tibiofemoral angle (mTFA), and intercondylar distance (ICD) were tested using Spearman’s correlation test. Differences between cTFA and mTFA were investigated using a paired-sample t-test and Bland-Altman 95% limits of agreement. All statistical analyses were performed using SPSS (v24, IBM, NY, USA). Statistical significance was set at p < 0.05.
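As a rough illustration of the rank correlation named above, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of the ranks, with tied values sharing the mean of their ranks. The function names and the angle values are hypothetical and are not the study's data or its SPSS procedure.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    mean_x, mean_y = mean(rank_x), mean(rank_y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    spread_x = sum((a - mean_x) ** 2 for a in rank_x)
    spread_y = sum((b - mean_y) ** 2 for b in rank_y)
    return covariance / (spread_x * spread_y) ** 0.5


# Hypothetical angles in degrees, invented for illustration only
ctfa_degrees = [8.0, 12.5, 15.0, 10.0, 20.0, 6.5]
mtfa_degrees = [13.0, 16.0, 21.5, 17.0, 24.0, 12.0]
print(f"rho = {spearman_rho(ctfa_degrees, mtfa_degrees):.2f}")
```

Because the coefficient depends only on rank order, it is insensitive to a constant offset between the two measurement methods.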
Results
Thirty-six children were included, with a mean age of 6.3 years. Thirty-two were Malay (88.8%), three were Indian (8.3%), and one was Chinese (2.7%). Twenty-two children were male (61%) and 14 were female (38%). Five had unilateral and 31 had bilateral genu varus. Eleven children had Blount disease, 13 had rickets, and the remaining 12 were managed as physiological genu varus.
The reliability study performed between two observers for the tibiofemoral angle measurements revealed good intra-rater reliability for observer 1 (ICC 0.87) and excellent intra-rater reliability for observer 2 (ICC 0.97). Inter-observer agreement was also excellent (ICC 0.97).
All thirty-six children (mean age 6.6 ± 5.7) were examined in the standing position. The association between the radiological mTFA and clinical TFA measurements was assessed. Our findings revealed a moderate correlation between cTFA and mTFA (r² = 0.67, p < 0.001) (Figure 2).
Subsequently, the associations between ICD and clinical TFA and between ICD and radiological mTFA were assessed. We found a moderate positive correlation between ICD and cTFA (r² = 0.53, p < 0.001) and between ICD and mTFA (r² = 0.62, p < 0.001) (Figure 3).
A paired t-test revealed a mean difference of -4.67 degrees between the cTFA and mTFA. The difference was statistically significant (reported p = 0.00). The limits of agreement were -7.02 degrees (lower) and -2.34 degrees (upper) (Figure 4).
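For readers unfamiliar with the Bland-Altman approach, the sketch below shows one common way to compute the bias and 95% limits of agreement between two paired measurement methods: the mean of the paired differences plus or minus 1.96 times their standard deviation. The function name and the sample angles are hypothetical, and the study's own calculation in SPSS may differ in detail. Limits of agreement describe the spread of individual differences and are distinct from a confidence interval for the mean difference.

```python
from statistics import mean, stdev


def bland_altman_limits(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    differences = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(differences)
    spread = stdev(differences)  # sample standard deviation of the differences
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical paired angles in degrees, invented for illustration only
ctfa_degrees = [10.0, 12.0, 14.0, 9.0, 18.0]
mtfa_degrees = [14.0, 17.5, 18.0, 15.0, 21.0]
bias, lower, upper = bland_altman_limits(ctfa_degrees, mtfa_degrees)
print(f"bias = {bias:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

A negative bias in this arrangement means the second method reads higher on average than the first.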
Discussion
We examined the correlation between clinical and radiographic TFA measurements of the lower extremities in 36 children with genu varus who had been referred to our centre. We found a significant correlation between radiological mTFA and clinical TFA. This result is in line with other studies [16,17]. Navali et al concluded that goniometer measurement appears to be a valid alternative to the mechanical axis on a full-leg radiograph for determining frontal plane knee alignment [17]. Kraus et al also concluded that knee alignment assessed clinically by goniometer or measured on a knee radiograph is correlated with the angle measured on the full-limb radiograph [17]. However, both studies were carried out in adult populations with knee osteoarthritis. Our study determined the correlation between radiological and clinical TFA specifically in a paediatric population with genu varus.
Another significant finding in this study is that ICD has a moderate correlation with cTFA and mTFA. Several correlation studies on ICD have been reported. Saini et al found a fair degree of correlation between ICD and tibiofemoral angle (TFA) measured clinically with a goniometer [8]. A similar relationship between ICD and TFA was seen in other studies [6-11]. This suggests that the two measurements can complement each other in monitoring genu varus. The importance of ICD measurement has been highlighted by other authors. In 1995, Cahuzac et al established normal values for the varus profile of the legs in normal children between 10 and 16 years of age, whereby an ICD of more than 5 cm is considered abnormal [18]. This is supported by other investigators [14-19]. For younger children aged at least 18 months, an ICD of 4 cm should be closely monitored [11].
The different degrees of correlation across studies might be influenced by differences in measurement methods. Mathew et al found that clinical measurement using ICD had minimal intra-observer variability [6]. However, a standardized method of measurement and patient positioning is important for consistent findings. Obtaining a proper standing radiograph in a young child can prove challenging, so other measures such as a footprint drawn on the floor have been suggested [11].
We also found that the difference between the cTFA and mTFA measurements was significant. mTFA consistently produced a higher value, with a mean difference of around 5 degrees, indicating that the two techniques do not yield the same angles. However, as mentioned earlier, the two measurements correlated with each other. This means that although the clinical method does not reproduce the radiographic value (mTFA) exactly, it can still show a similar trend of deformity and is therefore useful for monitoring change or progress.
There were some limitations in our study. Firstly, our sample was relatively small, with a wide age range (1-17 years). Secondly, we performed an observer reliability study only for the radiographic measurement. However, the clinical measurements were done by a single researcher, who was trained to perform the measurement following the standard protocol.
Conclusion
Clinical measurement of tibiofemoral angle and ICD showed moderate to good correlation with radiological measurement when performed with the child in the standing position. Therefore, for monitoring purposes or serial alignment assessment, these methods are adequate.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: The study's ethical approval was obtained from the Universiti Kebangsaan Malaysia Institutional Ethical Committee (JEP-2020-194).
Patient consent (participation and publication): Verbal informed consent was obtained from patients for publication.
Source of Funding: Universiti Kebangsaan Malaysia
Role of Funder: The funder remained independent, refraining from involvement in data collection, analysis, or result formulation, ensuring unbiased research free from external influence.
Acknowledgements: None to be declared.
Authors' contributions: KJ and CYT conceptualized and designed the study, drafted the initial manuscript, and reviewed and revised the manuscript. CYT designed the data collection instruments, collected data and carried out the initial analyses. AFAR and AHAR coordinated and supervised data collection, and critically reviewed the manuscript for important intellectual content. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
Use of AI: AI was not used in the drafting of the manuscript, the production of graphical elements, or the collection and analysis of data.
Data availability statement: Not applicable.

Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Talar Sabir Ahmed, Rawa M. Ali, Ari M. Abdullah, Hadeel A. Yasseen, Ronak S. Ahmed, Ameer M....
Abstract
Introduction
The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, benefits, biases, and limitations of LLMs in diagnosing dermatologic conditions within pathology.
Methods
A pathologist compiled 60 real histopathology case scenarios of skin conditions from a hospital database. Two other pathologists reviewed each patient’s demographics, clinical details, histopathology findings, and original diagnosis. These cases were presented to ChatGPT-3.5, Gemini, and an external pathologist. Each response was classified as complete agreement, partial agreement, or no agreement with the original pathologist’s diagnosis.
Results
ChatGPT-3.5 had 29 (48.4%) complete agreements, 14 (23.3%) partial agreements, and 17 (28.3%) no agreements. Gemini showed 20 (33%), 9 (15%), and 31 (52%) complete agreement, partial agreement, and no agreement responses, respectively. The external pathologist had 36 (60%), 17 (28%), and 7 (12%) complete agreement, partial agreement, and no agreement responses, respectively, in relation to the pathologists’ diagnosis. Significant differences in diagnostic agreement were found between the LLMs and the pathologist (P < 0.001).
Conclusion
In certain instances, ChatGPT-3.5 and Gemini may provide an accurate diagnosis of skin pathologies when presented with relevant patient history and descriptions of histopathological reports. However, their overall performance is insufficient for reliable use in real-life clinical settings.
Introduction
The healthcare sector is undergoing significant transformation with the emergence of large language models (LLMs), which have the potential to revolutionize patient care and outcomes. In November 2022, OpenAI introduced a natural language model called Chat Generative Pre-Trained Transformer (ChatGPT). It is renowned for its ability to generate responses that approximate human interaction in various tasks. Gemini, developed by Google, is a text-based AI conversational tool that utilizes machine learning and natural language understanding to address complex inquiries. These models generate new data by identifying structures and patterns from existing data, demonstrating their versatility in producing content across different domains. Generative LLMs rely on sophisticated deep learning methodologies and neural network architectures to scrutinize, comprehend, and produce content that closely resembles human-created outputs. Both ChatGPT and Gemini have gained global recognition for their unprecedented ability to emulate human conversation and cognitive abilities [1-3].
ChatGPT offers a notable advantage in medical decision-making due to its proficiency in analyzing complex medical data. It is a valuable resource for healthcare professionals, providing quick insights derived from patient records, medical research, and clinical guidelines [1,4]. Moreover, ChatGPT can play a crucial role in the differential diagnostic process by synthesizing information from symptoms, medical history, and risk factors, and comprehensively processing this data to present a range of potential medical diagnoses, thereby assisting medical practitioners in their assessments. This has the potential to improve diagnostic accuracy and reduce instances of misdiagnosis or delays [4].
The integration of ChatGPT and Gemini into the medical decision-making landscape has generated interest from various medical specialties. Multiple disciplines have published articles highlighting the significance and potential applications of ChatGPT and Gemini in their respective fields [2,5]. Despite the growing number of these models used in diagnostics, patient management, preventive medicine, and genomic analysis across medicine, the integration of LLMs in dermatology remains limited. This study emphasizes the exploration of large language models, highlighting their less common yet promising role in advancing dermatologic diagnostics and patient care [6].
This study aims to explore the role of LLMs and their decision-making capabilities in the field of pathology, specifically in dermatologic conditions. It focuses on ChatGPT-3.5 and Gemini and compares their accuracy and concordance with the diagnoses of human pathologists. The study also investigates the potential advantages, biases, and constraints of integrating LLM tools into pathology decision-making processes.
Methods
Case Selection
A pathologist selected 60 real case scenarios, with half being neoplastic conditions and the other half non-neoplastic, from a hospital’s medical database. The cases involved patients who had undergone biopsy and histopathological examination for skin conditions. The records included information on age, sex, and the chief complaint of the patients, in addition to a detailed description of the histopathology reports (clinical and microscopic description without the diagnosis).
Consensus Diagnosis
Two additional board-certified pathologists reviewed each case, reaching a collaborative consensus diagnosis through a meticulous review of clinical and microscopic descriptions. This process ensured diagnostic accuracy and reliability while minimizing individual biases.
Eligibility Criteria
The study included cases that had complete and relevant histopathological reports and comprehensive patient demographic information. Specifically, cases were included if they provided a definitive diagnosis in the histopathological report and contained detailed patient data such as age, gender, and clinical history. Cases were excluded if the histopathological report was incomplete, lacked critical patient information, or if the diagnosis could not be definitively made based solely on the textual description.
Sampling Method
The selection process involved a systematic review of available cases from the hospital's medical database to ensure a representative sample of different dermatologic diagnoses. A random sampling method was employed to minimize selection bias and to ensure the sample was representative of the broader population of dermatologic conditions within the database. The selected cases span a range of common and less common dermatologic conditions, enhancing the generalizability of the study’s findings.
Evaluation by AI Systems and External Pathologist
In March 2023, these cases were evaluated using two LLM systems, namely ChatGPT-3.5 and Gemini. In addition, an external board-certified pathologist was tested similarly to the AI systems, receiving only the necessary histopathology report descriptions (without histopathological images) to ensure a fair comparison between the LLM systems and the external pathologist.
Pathologists’ Experience
The pathologists involved in the study had a minimum of eight years of experience in their respective specialties, handling an average of 30 cases per month. This level of experience ensured a deep familiarity with a wide range of case scenarios. Crucially, the pathologists who conducted the assessments were fully informed of the study design, including the comparative analysis with AI systems. Their expertise and understanding were vital in upholding the integrity and reliability of the diagnostic evaluations throughout the study.
AI Prompting Strategy
The LLM systems were initially greeted with the prompt “Hello,” followed by a standardized instruction: “Please provide the most accurate diagnoses from the texts that will be given below.” Each case was then presented individually by copying and pasting it from a Word document, and each system was asked to provide a diagnosis of the case scenario based on the information presented. The first response of each system was documented. If no diagnosis was given, the following prompt was repeated until a diagnosis was obtained: “Please, based on the histopathological report information given above, provide the most likely disease that causes it.” In some cases, after a diagnosis was provided, an additional question was asked to specify the histologic subtype of the condition (e.g., if the diagnosis was “seborrheic keratosis”, the system was asked to specify the histologic subtype). The board-certified external pathologist was given the same questions and asked for the correct diagnosis.
Response Categorization
The responses from both systems and the external pathologist were categorized into three groups: complete agreement with the original diagnosis by the human pathologists, partial agreement, or no agreement. The criteria for these categories were based on the distinction between general and specific diagnostic classifications. For instance, when the original diagnosis provided a detailed type and subtype (e.g., "Seborrheic keratosis, irritated type"), an AI tool's or external pathologist's response was classified as "complete agreement" if it accurately identified both the general diagnosis ("Seborrheic keratosis") and the specific subtype ("irritated type"). This classification acknowledges that accurate identification of both components reflects a thorough understanding of, and alignment with, the original diagnosis. Conversely, a response was categorized as "partial agreement" if it correctly identified the general diagnosis but inaccurately specified the subtype. A response was classified as "no agreement" when both the general diagnosis and the subtype provided by the AI tool or external pathologist were incorrect. These classification criteria draw upon established methodologies in diagnostic agreement studies, emphasizing the importance of distinguishing between different levels of agreement based on the precision and correctness of diagnostic outputs [7].
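The three-level rule described above can be sketched as a small function. This is only an illustration under stated assumptions: in the study the categorization was a judgment made by pathologists, whereas the sketch uses exact text matching after normalizing case and whitespace. The function names are hypothetical, and two cases that the criteria do not spell out are handled by assumption: a reference diagnosis with no subtype counts as complete agreement when the general diagnosis matches, and a wrong general diagnosis counts as no agreement regardless of subtype.

```python
def _normalize(text):
    """Lower-case and collapse whitespace; None or empty becomes ''."""
    return " ".join(text.lower().split()) if text else ""


def classify_agreement(ref_general, ref_subtype, resp_general, resp_subtype):
    """Return 'complete', 'partial', or 'none' for one response."""
    if _normalize(ref_general) != _normalize(resp_general):
        # Assumption: a wrong general diagnosis is 'none' whatever the subtype
        return "none"
    if not _normalize(ref_subtype):
        # Assumption: nothing further to match when the reference has no subtype
        return "complete"
    if _normalize(ref_subtype) == _normalize(resp_subtype):
        return "complete"
    return "partial"


# Example using the diagnosis quoted in the text; the responses are hypothetical
print(classify_agreement("Seborrheic keratosis", "irritated type",
                         "seborrheic keratosis", None))
```

In practice, synonyms and near-equivalent terms would defeat exact matching, which is one reason expert review is needed for this step.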
Data Processing and Statistical Analysis
The initial processing of the acquired data involved several steps before statistical analysis. First, the data were entered into Microsoft Excel 2019. Subsequently, they were transferred to Statistical Package for the Social Sciences software (SPSS) 27.0 and the DATA tab for further analysis. Fleiss' kappa was used to measure agreement among ChatGPT, the external pathologist, and Gemini. Additionally, Chi-square tests were applied to investigate associations between the two LLMs and the external pathologist. In this study, significance was defined as a p-value of < 0.05. A literature review was performed for the study, selectively considering papers from reputable journals while excluding those from predatory sources based on established criteria [8].
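To make the agreement statistic concrete, here is a minimal sketch of Fleiss' kappa for a fixed number of raters, each assigning one category per case. It follows the standard textbook formula and is not the software routine used in the study. The function name, category labels, and example ratings are hypothetical.

```python
from collections import Counter

CATEGORIES = ("complete", "partial", "none")


def fleiss_kappa(ratings):
    """ratings: one tuple per case, holding each rater's category label."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(case) for case in ratings]

    # Share of all assignments that fall in each category
    category_share = [
        sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in CATEGORIES
    ]
    # Observed agreement within each case, then averaged over cases
    per_case = [
        (sum(c[cat] ** 2 for cat in CATEGORIES) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_case) / n_cases
    expected = sum(share ** 2 for share in category_share)
    # Undefined (division by zero) if every rating is in a single category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings by three raters for four cases, for illustration only
example = [
    ("complete", "complete", "complete"),
    ("complete", "partial", "none"),
    ("none", "none", "partial"),
    ("partial", "partial", "partial"),
]
print(f"kappa = {fleiss_kappa(example):.2f}")
```

A value of 1 indicates perfect agreement, and values near 0 indicate agreement no better than chance.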
Results
ChatGPT-3.5 provided 29 (48.4%) complete agreement, 14 (23.3%) partial agreement, and 17 (28.3%) no agreement responses for the scenarios presented. In contrast, Gemini offered 20 (33%), 9 (15%), and 31 (52%) complete agreement, partial agreement, and no agreement responses, respectively, for the same scenarios. The external pathologist provided 36 (60%) complete agreement, 17 (28%) partial agreement, and 7 (12%) no agreement responses (Table 1). The complete details of the scenarios, including the diagnoses from the pathologists, ChatGPT, Gemini, and the external pathologist, are available in Supplement 1.
Variables | Frequency/percentage
Pathological classification |
Neoplastic | 30 (50%)
Non-neoplastic | 30 (50%)
Neoplastic |
Benign | 19 (31.7%)
Malignant | 11 (18.3%)
Non-neoplastic |
Dermatosis | 9 (15%)
Infectious, pilosebaceous | 2 (3.3%)
Connective tissue disease | 2 (3.3%)
Infectious | 2 (3.3%)
Granulomatous | 2 (3.3%)
Vascular | 2 (3.3%)
Epidermal maturation/keratinization disorder | 2 (3.3%)
Dermatosis, pilosebaceous | 2 (3.3%)
Pilosebaceous | 2 (3.3%)
Panniculitis | 1 (1.7%)
Dermatosis, infectious | 1 (1.7%)
Dermatosis, pigmentation disorder | 1 (1.7%)
Granulomatous, panniculitis | 1 (1.7%)
Bullous | 1 (1.7%)
External pathologist |
Complete agreement | 36 (60%)
Partial agreement | 17 (28%)
No agreement | 7 (12%)
ChatGPT |
Complete agreement | 29 (48.4%)
Partial agreement | 14 (23.3%)
No agreement | 17 (28.3%)
Gemini |
Complete agreement | 20 (33%)
Partial agreement | 9 (15%)
No agreement | 31 (52%)
The agreement among ChatGPT, the external pathologist, and Gemini was assessed using Fleiss' kappa, which was statistically significant (p < 0.001) and indicated slight to moderate agreement with respect to the original diagnosis made by the pathologists. Of the 29 cases in which ChatGPT completely agreed with the original diagnosis, only 12 (41.4%) also received complete agreement from both Gemini and the external pathologist (Table 2).
Variables | Response | ChatGPT: Complete agreement | ChatGPT: Partial agreement | ChatGPT: No agreement | Measurement of agreement (Fleiss) | Significance level
Gemini | Complete agreement | 12 (41.4%) | 1 (7.1%) | 2 (11.8%) | 0.25 | <0.001
Gemini | Partial agreement | 3 (10.4%) | 1 (7.1%) | 0 (0.0%) | |
Gemini | No agreement | 1 (3.4%) | 0 (0.0%) | 0 (0.0%) | |
 | Complete agreement | 2 (7%) | 3 (21.4%) | 0 (0.0%) | |
 | Partial agreement | 1 (3.4%) | 3 (21.4%) | 0 (0.0%) | |
 | No agreement | 5 (17.2%) | 2 (14.4%) | 9 (53%) | |
Total | | 29 | 14 | 17 | |
When the agreement among ChatGPT, the external pathologist, and Gemini was assessed using the external pathologist as the reference, the external pathologist showed complete agreement with the original diagnosis in 36 cases. Among these, ChatGPT achieved complete agreement in 19 cases (52.7%), while Gemini achieved complete agreement in 15 cases (41.7%). The external pathologist showed no agreement with the original diagnosis in only 7 cases. Among these, ChatGPT showed no agreement in 5 cases (71.4%), while Gemini showed no agreement in 6 cases (85.7%). Statistical analysis indicated significant differences in agreement levels between the AI tools (ChatGPT and Gemini) and the external pathologist, with a P-value of <0.001 (Table 3).
AI tools | Response | External pathologist: Complete agreement | External pathologist: Partial agreement | External pathologist: No agreement | P-value
ChatGPT | Complete agreement | 19 (52.7%) | 8 (47.1%) | 2 (28.6%) | <0.001
ChatGPT | Partial agreement | 6 (16.7%) | 8 (47.1%) | 0 (0%) |
ChatGPT | No agreement | 11 (30.6%) | 1 (5.8%) | 5 (71.4%) |
Gemini | Complete agreement | 15 (41.7%) | 4 (23.5%) | 1 (14.3%) | <0.001
Gemini | Partial agreement | 5 (13.9%) | 4 (23.5%) | 0 (0%) |
Gemini | No agreement | 16 (44.4%) | 9 (53%) | 6 (85.7%) |
Total | | 36 (100%) | 17 (100%) | 7 (100%) |
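As an illustration of how a Chi-square test of association works on a cross-tabulation like the one above, the following sketch computes the test statistic and degrees of freedom by hand. The counts are the ChatGPT-by-external-pathologist cells listed in Table 3. The function name is hypothetical, and this is not necessarily the exact procedure the authors ran in SPSS. Converting the statistic to a p-value requires the chi-square distribution, which is left out here.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Rows: ChatGPT complete / partial / no agreement
# Columns: external pathologist complete / partial / no agreement
chatgpt_vs_external = [[19, 8, 2], [6, 8, 0], [11, 1, 5]]
stat, dof = chi_square_statistic(chatgpt_vs_external)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```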
In addition, the agreement among the external pathologist, ChatGPT, and Gemini was assessed separately for neoplastic and non-neoplastic cases. Statistical analysis revealed significant differences in agreement levels between the LLMs and the external pathologist, with a P-value of <0.001 (Tables 4 and 5).
AI tools | Response | External pathologist: Complete agreement | External pathologist: Partial agreement | External pathologist: None agreement | P-value
ChatGPT | Complete agreement | 11 (61.1%) | 2 (40%) | 4 (57.1%) | <0.001
ChatGPT | Partial agreement | 3 (16.7%) | 3 (60%) | 1 (14.3%) |
ChatGPT | None agreement | 4 (22.2%) | 0 (0%) | 2 (28.6%) |
Gemini | Complete agreement | 9 (50%) | 1 (20%) | 1 (14.3%) | <0.001
Gemini | Partial agreement | 7 (38.9%) | 4 (80%) | 4 (57.1%) |
Gemini | None agreement | 2 (11.1%) | 0 (0%) | 2 (28.6%) |
Total non-neoplastic cases | | 18 (100%) | 5 (100%) | 7 (100%) |
AI tools | Response | External pathologist: Complete agreement | External pathologist: Partial agreement | External pathologist: None agreement | P-value
ChatGPT | Complete agreement | 8 (44.4%) | 0 (0%) | 4 (40%) | <0.001
ChatGPT | Partial agreement | 8 (44.4%) | 2 (100%) | 0 (0%) |
ChatGPT | None agreement | 2 (11.1%) | 0 (0%) | 6 (60%) |
Gemini | Complete agreement | 6 (33.3%) | 0 (0%) | 3 (30%) | <0.001
Gemini | Partial agreement | 9 (50%) | 2 (100%) | 5 (50%) |
Gemini | None agreement | 3 (16.7%) | 0 (0%) | 2 (20%) |
Total neoplastic cases | | 18 (100%) | 2 (100%) | 10 (100%) |
Discussion
Despite being in existence for over five decades, LLM has recently garnered substantial attention in the public sphere. The increased focus on LLMs in the medical field has led to speculation about the potential replacement of doctors by these systems. However, LLMs are more likely to serve as a complementary tool, aiding clinicians in efficiently processing data and making clinical decisions. This is substantiated by the fact that LLMs can "learn" from extensive collections of medical data. Modern systems are also noted for their self-correcting capabilities. As electronic medical records become more prevalent, there is a growing reservoir of stored patient data. While having access to more data is undoubtedly advantageous, scanning through patient charts can be challenging. Algorithms have been developed to sift through patient notes and detect individuals with specific risk factors, diagnoses, or outcomes. This capability is particularly valuable because, in theory, a LLM system could be developed to review and extract data from medical charts, including pathology reports, and promptly identify patients at highest risk for conditions that could cause significant morbidity or mortality if missed by the physician [6,9].
The field of pathology is no exception to the adoption of LLMs and the utilization of these technological advancements. Various studies in recent years have assessed the accuracy, potential use, and associated limitations of LLMs. For instance, a study by Vaidyanathaiyer et al. evaluated ChatGPT's proficiency in pathology through thirty clinical case scenarios. These cases were evenly distributed across three primary subcategories: hematology, histopathology, and clinical pathology, with ten cases from each category. The researchers reported that ChatGPT received a high grade of “A” on nearly three-quarters of the questions and “B” grades on the remaining questions. They found that ChatGPT demonstrated moderate proficiency in these subcategories, excelling in rapid data analysis and providing fundamental insights, though it had limitations in generating thorough and elaborate information [10]. Furthermore, Passby et al. demonstrated the capacity of ChatGPT to address multiple-choice inquiries in the Specialty Certificate Examination of dermatology, with ChatGPT-4 outperforming ChatGPT-3.5, scoring 90% versus 63%, respectively, compared to an approximate passing score of 70% [11]. In an investigation by Delsoz et al., twenty corneal pathologies with their respective case descriptions were provided to ChatGPT-3.5 and ChatGPT-4. ChatGPT-4 performed better, correctly answering 85% of the questions, whereas ChatGPT-3.5 answered only 60% correctly [12]. The current study found that ChatGPT-3.5 performed similarly in the percentage of correct responses. However, this study further evaluated the LLM responses and found that 23.3% and 15% of ChatGPT and Gemini answers, respectively, were fair but still had inaccuracies. This highlights areas where these systems can improve, as they sometimes answer almost correctly but not fully.
For instance, when a histopathology report of squamous cell carcinoma in situ was given to ChatGPT-3.5, it answered with squamous cell carcinoma. On further prompting, the system favored an invasive squamous cell carcinoma over an in-situ one, even when it was suggested that an in-situ lesion was more appropriate for that scenario. Similarly, in the case of guttate psoriasis, Gemini answered with only “psoriasis” and did not specify the type, while ChatGPT-3.5 responded with “psoriasis vulgaris”. In a study by Rahsepar et al. on pulmonary malignancies, Google Bard (the former name of Gemini) provided 9.2% partially correct answers, similar to Gemini's 15% partially correct responses in this study. However, ChatGPT-3.5 answered 17.5% of lung cancer questions incorrectly, whereas in the present study, ChatGPT-3.5’s incorrect answers were nearly twice as frequent. This may be due to ChatGPT's broader access to data and medical information on lung cancer compared to the dermatological conditions tested in this study, highlighting the limitations and risks of relying on these systems for rarer diseases [13].
Although existing language models have access to extensive medical data, they often lack a nuanced understanding of individual diseases or specific patient cases. They have not undergone specialized training for medical tasks, relying solely on the provided data and information. The unclear methodology behind the LLM's diagnostic process leads to skepticism regarding the reliability of LLM-generated diagnoses. Consequently, their ability to accurately diagnose complex or unique cases may be limited, as demonstrated in the current study on skin histopathology cases. Notably, in a few cases, LLMs declined to provide a diagnosis on the initial prompt, citing concerns about giving medical advice, and only issued a diagnosis after repeated prompting with the same scenario. Despite their ability to offer insights based on existing knowledge, LLMs may lack a complete understanding of the intricate details and visual indicators crucial for pathologists' diagnosis. In the current study, the pathologist initially examined the histopathology slides and then provided the report to the AI systems. Another issue is that preserving the integrity of LLMs and safeguarding the confidentiality of associated data from unauthorized access is critical, particularly in scenarios involving sensitive patient information [14,15]. The case scenarios in this study did not include specific patient identifiers. Additionally, failure to evolve the LLM tools utilized in the pathological assessment alongside advancements in clinical practice and treatment poses the risk of stagnation and adherence to outdated methodologies. Although it is possible to manually update LLM algorithms to align with new protocols, their efficacy depends heavily on the availability of pertinent data, which might not be readily accessible during transitional periods. 
Such adaptations could introduce errors, particularly in pathology, through misclassifications of entities as classification and staging systems undergo revisions. Another concern is automation bias, which refers to the tendency of clinicians to regard LLM-based predictions as flawless or to adhere to them without questioning their validity. This bias often emerges soon after exposure to new technology and may stem from concerns about the legal consequences of disregarding an algorithm's output. Research across various fields has shown that automation bias can reduce clinician accuracy, affecting areas such as electrocardiogram interpretation and dermatologic diagnoses. Clinicians at all proficiency levels, including experts, are susceptible to this phenomenon [3,14-16].
LLMs have numerous applications in the medical field, with various technologies being developed at an unprecedented pace. For example, in the field of epilepsy, Empatica has created a wearable monitor called Embrace, which detects the onset of seizures in patients with epilepsy and notifies designated family members or trusted physicians. This innovation, which received FDA approval six years ago, enhances safety and facilitates early management of such cases [17]. Additionally, one of the earliest uses of LLMs was for the detection of atrial fibrillation. The AliveCor mobile application, which facilitates ECG monitoring and atrial fibrillation detection using a mobile phone, is FDA-approved. Recent findings from the REHEARSE-AF study indicated that traditional care methods are less effective at detecting atrial fibrillation in ambulatory individuals compared to remote ECG monitoring using Kardia [17,18]. Another example is the artificial immune recognition system, which has demonstrated remarkable accuracy in diagnosing tuberculosis by using support vector machine classifiers. These advanced systems significantly outperform traditional methods, making them a robust tool for identifying tuberculosis cases with high reliability. This underscores the potential of these models to enhance diagnostic processes in infectious diseases [19]. The advancements across various medical disciplines render the application of LLMs in histopathological diagnostics increasingly viable and anticipated for future clinical implementation. This progress motivates further research by scientists and numerous companies, as the focus has shifted from whether LLMs will be used in pathology to when and how precisely these models will be utilized.
One limitation of this study is that the aforementioned LLM systems were not evaluated for their ability and accuracy in directly reaching a diagnosis from histopathological images. Instead, the study relied on providing necessary information from the histopathological reports in text form, which imposes practical constraints and still requires an expert pathologist. Future studies focusing on both histopathological images and texts are necessary to further evaluate the comprehensive capabilities of LLM tools in this domain.
Conclusion
In certain instances, ChatGPT-3.5 and Gemini may provide an accurate diagnosis of skin conditions when provided with pertinent patient history and descriptions of histopathological reports. Specifically, Gemini showed higher accuracy in diagnosing non-neoplastic cases, while ChatGPT-3.5 demonstrated better performance in neoplastic cases. However, despite these strengths, the overall performance of both models is insufficient for reliable use in real-life clinical settings.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable.
Patient consent (participation and publication): Not applicable.
Funding: The present study received no financial support.
Acknowledgements: None to be declared.
Authors' contributions: RMA and AMA were significant contributors to the conception of the study and the literature search for related studies. DSH and SHM were involved in the literature review, study design, and manuscript writing. TSA, HAY, RSA, and AMS were involved in the literature review, the study's design, the critical revision of the manuscript, and data collection. RMA and DSH confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: ChatGPT-3.5 was used to assist in language editing and improving the clarity of the introduction section. All content was reviewed and verified by the authors. Authors are fully responsible for the entire content of their manuscript.
Data availability statement: Not applicable.
Early View Articles
Annotations on Indeterminate Cytology of Thyroid Nodules in Thyroidology: Novi Sub Sole?
Ilker Sengul, Demet Sengul
Letter to the Editor
Dear Editor,
Indeterminate cytology (IC) remains the most challenging issue for health professionals working in thyroidology [1-4]. We read with great interest the article by Ali et al. [5], entitled "Clinicopathological Features of Indeterminate Thyroid Nodules: A Single-center Cross-sectional Study," published in the 3rd volume of Barw Medical Journal. This study addresses a challenging and crucial issue by examining the characteristics and malignancy rates of thyroid nodules with IC, the most controversial category of The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC). The authors evaluated the clinicopathological features of thyroid nodules with Category III, TBSRTC, in a single-center cross-sectional study [5].
One of the strengths of the article is its focus on the challenges in managing IC. Ali and colleagues [5] thoroughly examine comprehensive data, including demographic details, medical history, laboratory tests, preoperative imaging, cytologic evaluation, and histopathological diagnosis. The results indicate a notable malignancy rate in Category III, TBSRTC. Furthermore, the study points out that patients with malignant nodules tended to be younger, while benign nodules were significantly larger than malignant ones. The study also found a significant association of malignant nodules with Thyroid Imaging Reporting and Data System (TI-RADS) categories 4 and 5 and of benign nodules with TI-RADS 2 and 3. These findings align with some existing literature, providing valuable insights into the clinical assessment of IC.
However, several limitations of the study warrant consideration. Firstly, its single-center and retrospective design may limit the generalizability of the findings to diverse populations and settings. As the authors acknowledge, the retrospective data collection might have resulted in missing crucial information. While TI-RADS scoring was provided, more specific ultrasound features of thyroid nodules could have been beneficial. Of note, does including or excluding noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP), which is considered a low-risk entity by the current understanding, affect and/or alter the overall results and the assessment of diagnostic performance and study outcome(s)? [2-4] Furthermore, which caliber of needle was utilized throughout the study, with or without local and/or topical anesthetic agent(s), and would the use of thicker or finer needles to obtain cytologic samples, with or without any local and/or topical anesthesia, alter the outcome(s) of this study? [2] Moreover, which edition of TBSRTC was used for the work? We would stress the up-to-date 3rd edition of TBSRTC [3], considering that the novel and crucial subdivisions of Category III might affect the study’s relevant outcome(s) [3,4]. Another point of attention is the relatively short data collection period compared to the publication. Finally, while the discussion section compares the findings with various studies in the literature, a more in-depth exploration of the methodological differences and potential discrepancies in results could have been provided. For instance, the conflicting views in the literature regarding the relationship between nodule size and malignancy risk could have been further contrasted with the study's findings. The authors also acknowledge the small sample size as a limitation.
For future research, multi-center and prospective studies with detailed imaging, such as elastography and contrast-enhanced sonography, and investigations into the role of molecular markers in thyroid nodules with Category III could improve diagnostic accuracy and potentially reduce unnecessary surgical interventions.
In conclusion, this study significantly contributes to the evaluation of IC in thyroidology despite its limitations. However, considering the noted limitations, further research with more comprehensive and methodologically robust studies in this area is warranted. This issue merits further investigation.
Sincerely,
The Barw Medical Journal is an online multidisciplinary, open-access journal with an extensive and transparent peer review process that covers a wide range of medical aspects. This Journal offers a distinct and progressive service to assist scholars in publishing high-quality works across a wide range of medical disciplines and aspects, as well as delivering the most recent and reliable scientific updates to the readers.
To maintain the quality of the Barw Medical Journal’s contents, a double-blind, unbiased peer-review process has been established and followed to only publish works that adhere to the scientific, technical, ethical, and standard guidelines. Barw Medical Journal focuses especially on the research output from developing countries and encourages authors from these regions of the world to contribute actively and effectively to the construction of medical literature.
We accept original articles, systematic reviews, meta-analyses, review articles, case reports and case series, editorials, letters to the editor, and brief reports/commentaries/perspectives/short communications/correspondence. The journal mainly focuses on the following areas:
- Evidence-based medicine.
- Public health and healthcare policies.
- Current diseases’ diagnosis and management.
- Biomedicine, including physiology, genetics, molecular biology, pharmacology, pathology, and pathophysiology.
- Clinical and applied studies like surgery and innovative techniques, internal medicine, gastroenterology, obstetrics, gynecology, pediatrics, and otorhinolaryngology.
Latest Articles
Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Talar Sabir Ahmed, Rawa M. Ali, Ari M. Abdullah, Hadeel A. Yasseen, Ronak S. Ahmed, Ameer M....
Abstract
Introduction
The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, benefits, biases, and limitations of LLMs in diagnosing dermatologic conditions within pathology.
Methods
A pathologist compiled 60 real histopathology case scenarios of skin conditions from a hospital database. Two other pathologists reviewed each patient’s demographics, clinical details, histopathology findings, and original diagnosis. These cases were presented to ChatGPT-3.5, Gemini, and an external pathologist. Each response was classified as complete agreement, partial agreement, or no agreement with the original pathologist’s diagnosis.
Results
ChatGPT-3.5 had 29 (48.4%) complete agreement, 14 (23.3%) partial agreement, and 17 (28.3%) no agreement responses. Gemini showed 20 (33%), 9 (15%), and 31 (52%) complete agreement, partial agreement, and no agreement responses, respectively. Additionally, the external pathologist had 36 (60%), 17 (28%), and 7 (12%) complete agreement, partial agreement, and no agreement responses, respectively, in relation to the pathologists’ diagnosis. Significant differences in diagnostic agreement were found between the LLMs and the pathologist (P < 0.001).
Conclusion
In certain instances, ChatGPT-3.5 and Gemini may provide an accurate diagnosis of skin pathologies when presented with relevant patient history and descriptions of histopathological reports. However, their overall performance is insufficient for reliable use in real-life clinical settings.
Introduction
The healthcare sector is undergoing significant transformation with the emergence of large language models (LLMs), which have the potential to revolutionize patient care and outcomes. In November 2022, OpenAI introduced a natural language model called Chat Generative Pre-Trained Transformer (ChatGPT). It is renowned for its ability to generate responses that approximate human interaction in various tasks. Gemini, developed by Google, is a text-based AI conversational tool that utilizes machine learning and natural language understanding to address complex inquiries. These models generate new data by identifying structures and patterns from existing data, demonstrating their versatility in producing content across different domains. Generative LLMs rely on sophisticated deep learning methodologies and neural network architectures to scrutinize, comprehend, and produce content that closely resembles human-created outputs. Both ChatGPT and Gemini have gained global recognition for their unprecedented ability to emulate human conversation and cognitive abilities [1-3].
ChatGPT offers a notable advantage in medical decision-making due to its proficiency in analyzing complex medical data. It is a valuable resource for healthcare professionals, providing quick insights derived from patient records, medical research, and clinical guidelines [1,4]. Moreover, ChatGPT can play a crucial role in the differential diagnostic process by synthesizing information from symptoms, medical history, and risk factors, and comprehensively processing this data to present a range of potential medical diagnoses, thereby assisting medical practitioners in their assessments. This has the potential to improve diagnostic accuracy and reduce instances of misdiagnosis or delays [4].
The integration of ChatGPT and Gemini into the medical decision-making landscape has generated interest from various medical specialties. Multiple disciplines have published articles highlighting the significance and potential applications of ChatGPT and Gemini in their respective fields [2,5]. Despite the growing number of these models used in diagnostics, patient management, preventive medicine, and genomic analysis across medicine, the integration of LLMs in dermatology remains limited. This study emphasizes the exploration of large language models, highlighting their less common yet promising role in advancing dermatologic diagnostics and patient care [6].
This study aims to explore the role of LLMs and their decision-making capabilities in the field of pathology, specifically in dermatologic conditions. It focuses on ChatGPT-3.5 and Gemini and compares their accuracy and concordance with the diagnoses of human pathologists. The study also investigates the potential advantages, biases, and constraints of integrating LLM tools into pathology decision-making processes.
Methods
Case Selection
A pathologist selected 60 real case scenarios, with half being neoplastic conditions and the other half non-neoplastic, from a hospital’s medical database. The cases involved patients who had undergone biopsy and histopathological examination for skin conditions. The records included information on age, sex, and the chief complaint of the patients, in addition to a detailed description of the histopathology reports (clinical and microscopic description without the diagnosis).
Consensus Diagnosis
Two additional board-certified pathologists reviewed each case, reaching a collaborative consensus diagnosis through a meticulous review of clinical and microscopic descriptions. This process ensured diagnostic accuracy and reliability while minimizing individual biases.
Eligibility Criteria
The study included cases that had complete and relevant histopathological reports and comprehensive patient demographic information. Specifically, cases were included if they provided a definitive diagnosis in the histopathological report and contained detailed patient data such as age, gender, and clinical history. Cases were excluded if the histopathological report was incomplete, lacked critical patient information, or if the diagnosis could not be definitively made based solely on the textual description.
Sampling Method
The selection process involved a systematic review of available cases from the hospital's medical database to ensure a representative sample of different dermatologic diagnoses. A random sampling method was employed to minimize selection bias and to ensure the sample was representative of the broader population of dermatologic conditions within the database. The selected cases span a range of common and less common dermatologic conditions, enhancing the generalizability of the study’s findings.
Evaluation by AI Systems and External Pathologist
In March 2023, these cases were evaluated using two LLM systems, namely ChatGPT-3.5 and Gemini. In addition, an external board-certified pathologist was tested similarly to the AI systems, receiving only the necessary histopathology report descriptions (without histopathological images) to ensure a fair comparison between the LLM systems and the external pathologist.
Pathologists’ Experience
The pathologists involved in the study had a minimum of eight years of experience in their respective specialties, handling an average of 30 cases per month. This level of experience ensured a deep familiarity with a wide range of case scenarios. Crucially, the pathologists who conducted the assessments were fully informed of the study design, including the comparative analysis with AI systems. Their expertise and understanding were vital in upholding the integrity and reliability of the diagnostic evaluations throughout the study.
AI Prompting Strategy
The LLM systems were initially greeted with a prompt saying “Hello,” followed by a standardized inquiry: “Please provide the most accurate diagnoses from the texts that will be given below.” Each case was individually presented by copy-pasting it from a Word document and requesting each system to provide a diagnosis of the case scenario based on the information presented. The first response of each system to the inquiry was documented. If no diagnosis was given, the following prompt was repeated until a diagnosis was obtained: “Please, based on the histopathological report information given above, provide the most likely disease that causes it.” In some cases, after a diagnosis was provided, an additional question was asked to specify the histologic subtype of the condition (e.g., if the diagnosis was “seborrheic keratosis”, the system was asked to specify the histologic subtype). Furthermore, the board-certified external pathologist was tested with the same questions and asked for the correct diagnosis.
Response Categorization
The responses from both systems and the external pathologist were categorized into three subtypes: complete agreement with the original diagnosis by the human pathologists, partial agreement, or no agreement. The criteria for categorizing agreement levels into "complete," "partial," and "no agreement" were based on the distinction between general and specific diagnostic classifications. For instance, when the original diagnosis provided a detailed type and subtype (e.g., "Seborrheic keratosis, irritated type"), an AI tool's or external pathologist's response was classified as demonstrating "complete agreement" if it accurately identified both the general diagnosis ("Seborrheic keratosis") and the specific subtype ("irritated type"). This classification acknowledges that accurate identification of both components reflects a thorough understanding and alignment with the original diagnosis. Conversely, a response was categorized as "partial agreement" if it correctly identified the general diagnosis but inaccurately specified the subtype. Furthermore, a response was classified as demonstrating "no agreement" when both the general diagnosis and the subtype provided by the AI tool or external pathologist were incorrect. These classification criteria draw upon established methodologies in diagnostic agreement studies, emphasizing the importance of distinguishing between different levels of agreement based on the precision and correctness of diagnostic outputs [7].
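The three-level rule described above can be written as a small decision function. The sketch below is only an illustration of the stated criteria, not the authors' procedure: the names `Agreement` and `categorize` are hypothetical, the two boolean inputs are assumed to come from a human grader who has already compared the response with the original diagnosis, and treating "general diagnosis wrong but subtype right" as no agreement is an assumption, since the text does not cover that case.

```python
from enum import Enum


class Agreement(Enum):
    """Agreement levels named in the text (hypothetical enum for illustration)."""
    COMPLETE = "complete agreement"
    PARTIAL = "partial agreement"
    NONE = "no agreement"


def categorize(general_correct: bool, subtype_correct: bool) -> Agreement:
    """Map a grader's two judgments to one agreement level.

    general_correct: the response names the right general diagnosis.
    subtype_correct: the response names the right specific subtype.
    """
    if general_correct and subtype_correct:
        return Agreement.COMPLETE
    if general_correct:
        # Right general diagnosis, wrong or unspecified subtype.
        return Agreement.PARTIAL
    # Assumption: a wrong general diagnosis counts as no agreement,
    # regardless of the subtype judgment.
    return Agreement.NONE


# Example from the text: original diagnosis "Seborrheic keratosis, irritated type"
print(categorize(True, True).value)    # both parts identified
print(categorize(True, False).value)   # general diagnosis only
print(categorize(False, False).value)  # neither part identified
```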
Data Processing and Statistical Analysis
The initial processing of the acquired data involved several steps before statistical analysis. First, the data were entered into Microsoft Excel 2019. Subsequently, they were transferred to the Statistical Package for the Social Sciences (SPSS) software, version 27.0, and the DATA tab for further analysis. Fleiss' kappa was utilized to measure agreement among ChatGPT, the external pathologist, and Gemini. Additionally, Chi-square tests were applied to investigate associations between the two LLMs and the external pathologist. In this study, significance was defined as a p-value of < 0.05. A literature review was performed for the study, selectively considering papers from reputable journals while excluding those from predatory sources based on established criteria [8].
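For readers unfamiliar with Fleiss' kappa, the following is a minimal sketch of the standard formula in plain Python. It is not the SPSS procedure used in the study; the function name `fleiss_kappa`, the label codes "C", "P", and "N", and the four toy cases are all hypothetical and chosen only to show the calculation for three raters and three agreement categories.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of cases, each rated once by every rater.

    ratings: list of lists; ratings[i] holds one category label per rater.
    categories: all possible category labels.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(case) for case in ratings]

    # Observed agreement for each case, then averaged over cases.
    per_case = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Agreement expected by chance, from overall category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_cases * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: C = complete, P = partial, N = no agreement.
# Each inner list holds the labels for (ChatGPT, external pathologist, Gemini).
toy_ratings = [
    ["C", "C", "C"],
    ["C", "C", "P"],
    ["P", "N", "N"],
    ["N", "N", "N"],
]
kappa = fleiss_kappa(toy_ratings, ["C", "P", "N"])
print(round(kappa, 3))
```

The value is 1 when all raters give the same label on every case and falls toward 0 as observed agreement approaches what chance alone would produce.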
Results
ChatGPT-3.5 provided 29 (48.4%) complete agreement, 14 (23.3%) partial agreement, and 17 (28.3%) no agreement responses for the scenarios presented. In contrast, Gemini offered 20 (33%), 9 (15%), and 31 (52%) complete agreement, partial agreement, and no agreement responses, respectively, for the same scenarios. Moreover, the external pathologist provided 36 (60%) complete agreement, 17 (28%) partial agreement, and 7 (12%) no agreement responses (Table 1). The complete details of the scenarios, including the diagnoses from the pathologists, ChatGPT, Gemini, and the external pathologist, are available in Supplement 1.
| Variables | Frequency/percentage |
|---|---|
| Pathological classification | |
| Neoplastic | 30 (50%) |
| Non-neoplastic | 30 (50%) |
| Neoplastic | |
| Benign | 19 (31.7%) |
| Malignant | 11 (18.3%) |
| Non-neoplastic | |
| Dermatosis | 9 (15%) |
| Infectious, pilosebaceous | 2 (3.3%) |
| Connective tissue disease | 2 (3.3%) |
| Infectious | 2 (3.3%) |
| Granulomatous | 2 (3.3%) |
| Vascular | 2 (3.3%) |
| Epidermal maturation/keratinization disorder | 2 (3.3%) |
| Dermatosis, pilosebaceous | 2 (3.3%) |
| Pilosebaceous | 2 (3.3%) |
| Panniculitis | 1 (1.7%) |
| Dermatosis, infectious | 1 (1.7%) |
| Dermatosis, pigmentation disorder | 1 (1.7%) |
| Granulomatous, panniculitis | 1 (1.7%) |
| Bullous | 1 (1.7%) |
| External Pathologist | |
| Complete agreement | 36 (60%) |
| Partial agreement | 17 (28%) |
| None agreement | 7 (12%) |
| ChatGPT | |
| Complete agreement | 29 (48.4%) |
| Partial agreement | 14 (23.3%) |
| None agreement | 17 (28.3%) |
| Gemini | |
| Complete agreement | 20 (33%) |
| Partial agreement | 9 (15%) |
| None agreement | 31 (52%) |
The agreement between ChatGPT, the external pathologist, and Gemini was assessed using Fleiss' kappa, which was statistically significant (p < 0.001) and indicated slight to moderate agreement with respect to the original diagnosis made by the pathologists. Of the 29 questions where ChatGPT agreed with the original diagnosis, only 12 (41.4%) also received complete agreement from both Gemini and the external pathologist (Table 2).
| Variables | | External pathologist: Complete agreement | External pathologist: Partial agreement | External pathologist: No agreement | Measurement of agreement (Fleiss) | Significance level |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini | Complete agreement | 12 (41.4%) | 1 (7.1%) | 2 (11.8%) | 0.25 | <0.001 |
| | Partial agreement | 3 (10.4%) | 1 (7.1%) | 0 (0.0%) | | |
| | No agreement | 1 (3.4%) | 0 (0.0%) | 0 (0.0%) | | |
| | Complete agreement | 2 (7%) | 3 (21.4%) | 0 (0.0%) | | |
| | Partial agreement | 1 (3.4%) | 3 (21.4%) | 0 (0.0%) | | |
| | No agreement | 5 (17.2%) | 2 (14.4%) | 9 (53%) | | |
| Total | | 29 | 14 | 17 | | |
When the external pathologist was used as the reference, the external pathologist showed complete agreement with the original diagnosis in 36 cases. Among these, ChatGPT achieved complete agreement in 19 cases (52.7%), while Gemini achieved complete agreement in 15 cases (41.7%). The external pathologist showed no agreement with the original diagnosis in only 7 cases. Among these, ChatGPT showed no agreement in 5 cases (71.4%), while Gemini showed no agreement in 6 cases (85.7%). Statistical analysis indicated significant differences in agreement levels between the AI tools (ChatGPT and Gemini) and the external pathologist, with a p-value of <0.001 (Table 3).
| AI tools | | External pathologist: Complete agreement | External pathologist: Partial agreement | External pathologist: No agreement | P-value |
| --- | --- | --- | --- | --- | --- |
| ChatGPT | Complete agreement | 19 (52.7%) | 8 (47.1%) | 2 (28.6%) | <0.001 |
| | Partial agreement | 6 (16.7%) | 8 (47.1%) | 0 (0%) | |
| | No agreement | 11 (30.6%) | 1 (5.8%) | 5 (71.4%) | |
| Gemini | Complete agreement | 15 (41.7%) | 4 (23.5%) | 1 (14.3%) | <0.001 |
| | Partial agreement | 5 (13.9%) | 4 (23.5%) | 0 (0%) | |
| | No agreement | 16 (44.4%) | 9 (53%) | 6 (85.7%) | |
| Total | | 36 (100%) | 17 (100%) | 7 (100%) | |
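The Chi-square test of association used for cross-tabulations of this kind can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the function names are hypothetical, the counts are invented rather than taken from the tables, and the p-value helper uses a closed form that holds only when the degrees of freedom are even (as in a 3 x 3 table, where there are 4).

```python
import math


def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if row and column classifications were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-square probability; this closed form is valid for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("the closed form used here needs a positive, even dof")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical 3 x 3 agreement table (rows: AI tool, columns: reference rater)
toy_table = [[12, 5, 3], [4, 9, 7], [2, 6, 12]]
stat, dof = chi_square_independence(toy_table)
print(stat, dof, chi2_sf_even_dof(stat, dof))
```

When several expected counts fall below 5, as can happen with small columns, the chi-square approximation becomes less reliable and an exact test may be preferable.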
In addition, the agreement between the external pathologist, ChatGPT, and Gemini was assessed separately for neoplastic and non-neoplastic cases. Statistical analysis revealed significant differences in agreement levels between the LLMs and the external pathologist in both groups, with a p-value of <0.001 (Tables 4 and 5).
| AI tools | | External pathologist: Complete agreement | External pathologist: Partial agreement | External pathologist: No agreement | P-value |
| --- | --- | --- | --- | --- | --- |
| ChatGPT | Complete agreement | 11 (61.1%) | 2 (40%) | 4 (57.1%) | <0.001 |
| | Partial agreement | 3 (16.7%) | 3 (60%) | 1 (14.3%) | |
| | No agreement | 4 (22.2%) | 0 (0%) | 2 (28.6%) | |
| Gemini | Complete agreement | 9 (50%) | 1 (20%) | 1 (14.3%) | <0.001 |
| | Partial agreement | 7 (38.9%) | 4 (80%) | 4 (57.1%) | |
| | No agreement | 2 (11.1%) | 0 (0%) | 2 (28.6%) | |
| Total non-neoplastic cases | | 18 (100%) | 5 (100%) | 7 (100%) | |
| AI tools | | External pathologist: Complete agreement | External pathologist: Partial agreement | External pathologist: No agreement | P-value |
| --- | --- | --- | --- | --- | --- |
| ChatGPT | Complete agreement | 8 (44.4%) | 0 (0%) | 4 (40%) | <0.001 |
| | Partial agreement | 8 (44.4%) | 2 (100%) | 0 (0%) | |
| | No agreement | 2 (11.1%) | 0 (0%) | 6 (60%) | |
| Gemini | Complete agreement | 6 (33.3%) | 0 (0%) | 3 (30%) | <0.001 |
| | Partial agreement | 9 (50%) | 2 (100%) | 5 (50%) | |
| | No agreement | 3 (16.7%) | 0 (0%) | 2 (20%) | |
| Total neoplastic cases | | 18 (100%) | 2 (100%) | 10 (100%) | |
Discussion
Despite being in existence for over five decades, LLMs have only recently garnered substantial attention in the public sphere. The increased focus on LLMs in the medical field has led to speculation about the potential replacement of doctors by these systems. However, LLMs are more likely to serve as a complementary tool, aiding clinicians in efficiently processing data and making clinical decisions. This is substantiated by the fact that LLMs can "learn" from extensive collections of medical data. Modern systems are also noted for their self-correcting capabilities. As electronic medical records become more prevalent, there is a growing reservoir of stored patient data. While having access to more data is undoubtedly advantageous, scanning through patient charts can be challenging. Algorithms have been developed to sift through patient notes and detect individuals with specific risk factors, diagnoses, or outcomes. This capability is particularly valuable because, in theory, an LLM system could be developed to review and extract data from medical charts, including pathology reports, and promptly identify patients at highest risk for conditions that could cause significant morbidity or mortality if missed by the physician [6,9].
The field of pathology is no exception to the adoption of LLMs and the utilization of these technological advancements. Various studies in recent years have assessed the accuracy, potential use, and associated limitations of LLMs. For instance, a study by Vaidyanathaiyer et al. evaluated ChatGPT's proficiency in pathology through thirty clinical case scenarios. These cases were evenly distributed across three primary subcategories: hematology, histopathology, and clinical pathology, with ten cases from each category. The researchers reported that ChatGPT received a high grade of “A” on nearly three-quarters of the questions and “B” grades on the remaining questions. They found that ChatGPT demonstrated moderate proficiency in these subcategories, excelling in rapid data analysis and providing fundamental insights, though it had limitations in generating thorough and elaborate information [10]. Furthermore, Passby et al. demonstrated the capacity of ChatGPT to address multiple-choice questions in the Specialty Certificate Examination of dermatology, with ChatGPT-4 outperforming ChatGPT-3.5, scoring 90% versus 63%, respectively, compared to an approximate passing score of 70% [11]. In an investigation by Delsoz et al., twenty corneal pathologies with their respective case descriptions were provided to ChatGPT-3.5 and ChatGPT-4. ChatGPT-4 performed better, correctly answering 85% of the questions, whereas ChatGPT-3.5 answered only 60% correctly [12]. The current study found that ChatGPT-3.5 performed similarly in the percentage of correct responses. However, this study further evaluated the LLM responses and found that nearly 23.3% and 15% of ChatGPT and Gemini answers, respectively, were fair but still had inaccuracies. This highlights areas where these systems can improve, as they sometimes come close to the correct answer without fully reaching it.
For instance, when a histopathology report of squamous cell carcinoma in situ was given to ChatGPT-3.5, it answered with squamous cell carcinoma. On further prompting, the system favored an invasive squamous cell carcinoma over an in-situ one, even when it was asked whether an in-situ lesion was more appropriate for that scenario. Similarly, in the case of guttate psoriasis, Gemini answered only with “psoriasis” and did not specify the type, while ChatGPT-3.5 responded with “psoriasis vulgaris”. In a study by Rahsepar et al. on pulmonary malignancies, Google Bard (the former name of Gemini) provided 9.2% partially correct answers, similar to Gemini's 15% partially correct responses in this study. However, ChatGPT-3.5 answered 17.5% of lung cancer questions incorrectly, whereas in the present study, ChatGPT-3.5’s incorrect answers were nearly twice as frequent. This may be due to ChatGPT's broader access to data and medical information on lung cancer compared to the dermatological conditions tested in this study, highlighting the limitations and risks of relying on these systems for rarer diseases [13].
Although existing language models have access to extensive medical data, they often lack a nuanced understanding of individual diseases or specific patient cases. They have not undergone specialized training for medical tasks, relying solely on the provided data and information. The unclear methodology behind the LLM's diagnostic process leads to skepticism regarding the reliability of LLM-generated diagnoses. Consequently, their ability to accurately diagnose complex or unique cases may be limited, as demonstrated in the current study on skin histopathology cases. Notably, in a few cases, LLMs declined to provide a diagnosis on the initial prompt, citing concerns about giving medical advice, and only issued a diagnosis after repeated prompting with the same scenario. Despite their ability to offer insights based on existing knowledge, LLMs may lack a complete understanding of the intricate details and visual indicators crucial for pathologists' diagnosis. In the current study, the pathologist initially examined the histopathology slides and then provided the report to the AI systems. Another issue is that preserving the integrity of LLMs and safeguarding the confidentiality of associated data from unauthorized access is critical, particularly in scenarios involving sensitive patient information [14,15]. The case scenarios in this study did not include specific patient identifiers. Additionally, failure to evolve the LLM tools utilized in the pathological assessment alongside advancements in clinical practice and treatment poses the risk of stagnation and adherence to outdated methodologies. Although it is possible to manually update LLM algorithms to align with new protocols, their efficacy depends heavily on the availability of pertinent data, which might not be readily accessible during transitional periods. 
Such adaptations could introduce errors, particularly in pathology, through misclassifications of entities as classification and staging systems undergo revisions. Another concern is automation bias, which refers to the tendency of clinicians to regard LLM-based predictions as flawless or to adhere to them without questioning their validity. This bias often emerges soon after exposure to new technology and may stem from concerns about the legal consequences of disregarding an algorithm's output. Research across various fields has shown that automation bias can reduce clinician accuracy, affecting areas such as electrocardiogram interpretation and dermatologic diagnoses. Clinicians at all proficiency levels, including experts, are susceptible to this phenomenon [3,14-16].
LLMs have numerous applications in the medical field, with various technologies being developed at an unprecedented pace. For example, in the field of epilepsy, Empatica has created a wearable monitor called Embrace, which detects the onset of seizures in patients with epilepsy and notifies designated family members or trusted physicians. This innovation, which received FDA approval six years ago, enhances safety and facilitates early management of such cases [17]. Additionally, one of the earliest uses of LLMs was for the detection of atrial fibrillation. The AliveCor mobile application, which facilitates ECG monitoring and atrial fibrillation detection using a mobile phone, was FDA-approved. Recent findings from the REHEARSE-AF study indicated that traditional care methods are less effective at detecting atrial fibrillation in ambulatory individuals compared to remote ECG monitoring using Kardia [17,18]. Another example is the artificial immune recognition system, which has demonstrated remarkable accuracy in diagnosing tuberculosis by using support vector machine classifiers. These advanced systems significantly outperform traditional methods, making them a robust tool in identifying tuberculosis cases with high reliability. This underscores the potential of these models to enhance diagnostic processes in infectious diseases [19]. The advancements across various medical disciplines render the application of LLMs in histopathological diagnostics increasingly viable and anticipated for future clinical implementation. This progress motivates further research by scientists and numerous companies, as the focus has shifted from questioning whether LLMs will be used in pathology to when and how precisely these models will be utilized.
One limitation of this study is that the aforementioned LLM systems were not evaluated for their ability and accuracy in directly reaching a diagnosis from histopathological images. Instead, the study relied on providing necessary information from the histopathological reports in text form, which imposes practical constraints and still requires an expert pathologist. Future studies focusing on both histopathological images and texts are necessary to further evaluate the comprehensive capabilities of LLM tools in this domain.
Conclusion
In certain instances, ChatGPT-3.5 and Gemini may provide an accurate diagnosis of skin conditions when provided with pertinent patient history and descriptions of histopathological reports. Specifically, Gemini showed higher accuracy in diagnosing non-neoplastic cases, while ChatGPT-3.5 demonstrated better performance in neoplastic cases. However, despite these strengths, the overall performance of both models is insufficient for reliable use in real-life clinical settings.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable.
Patient consent (participation and publication): Not applicable.
Funding: The present study received no financial support.
Acknowledgements: None to be declared.
Authors' contributions: RMA and AMA were significant contributors to the conception of the study and the literature search for related studies. DSH and SHM were involved in the literature review, study design, and manuscript writing. TSA, HAY, RSA, and AMS were involved in the literature review, the study's design, the critical revision of the manuscript, and data collection. RMA and DSH confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: ChatGPT-3.5 was used to assist in language editing and improving the clarity of the introduction section. All content was reviewed and verified by the authors. Authors are fully responsible for the entire content of their manuscript.
Data availability statement: Not applicable.

The Effect of Clinical Knee Measurement in Children with Genu Varus
Kamal Jamil, Chong YT, Ahmad Fazly Abd Rasid, Abdul Halim Abdul Rashid, Lawand Ahmed
Abstract
Introduction
Children with genu varus need frequent assessment and follow-up, which may require several radiographs. This study investigates the effectiveness of the clinical assessment of genu varus in comparison to the radiological assessment.
Methods
This study examined the relationship between clinical and radiographic assessments of genu varus (bow leg) in children. The intercondylar distance (ICD) and clinical tibiofemoral angle (cTFA) served as clinical measures and were compared with the mechanical tibiofemoral angle (mTFA) obtained via scanogram, the radiographic gold standard for assessing lower limb deformity. Clinical measurements (ICD and cTFA) were gathered along with the mTFA from scanogram radiographs. Reliability was tested between two observers, and Spearman’s correlation coefficient was used to evaluate the relationships between the clinical and radiographic measurements.
Results
The study involved 36 children with an average age of 6.3 years. There was strong intra-rater reliability for both observers (ICC 0.87 for observer 1, ICC 0.97 for observer 2) and excellent inter-observer agreement (ICC 0.97). Positive correlations were found between cTFA and mTFA (r² = 0.67, p < 0.001), between ICD and cTFA (r² = 0.53, p < 0.001), and between ICD and mTFA (r² = 0.62, p < 0.001).
Conclusion
This study supports the idea that clinical methods may be sufficient for evaluation, minimizing the need for radiation exposure and offering a reliable alternative to radiography.
Introduction
Genu varus, also known as bow-leggedness, is defined as any separation of the medial surfaces of the knees when the medial malleoli are in contact and the patient is standing in the anatomical position [1]. The prevalence of genu varus ranges from 11.4% to 14.5% [2,3]. It is more prevalent in boys than in girls [2]. Genu varus may be physiological or pathological. There are multiple ways to aid in the screening and diagnosis of genu varus, which include clinical and radiological methods. Clinical methods such as intercondylar distance (ICD) and tibiofemoral angle measurement have been used to screen and assess the degree of genu varus. However, an imaging modality such as a long-leg AP radiograph or scanogram is considered the gold standard assessment for lower limb deformity.
Many studies on genu varus in children have utilized either clinical or radiological lower limb measurements to describe the tibiofemoral angle progression in normal children, the normal ranges of knee angle in relation to age, and the transition time from varus to valgus in different populations and ethnic groups [4-10]. A recent systematic review proposed that children above the age of 18 months with genu varus should be closely monitored clinically using ICD or cTFA, whereby an ICD of more than 4 cm should be investigated for a pathologic cause [11]. However, reliability has not been confirmed.
Hence, serial assessment might be needed to manage children with genu varus. Clinical methods of assessment are preferable to radiographs because they involve no exposure to radiation, but they may be inaccurate or unreliable [12]. We therefore aimed to determine the correlation between the radiological and clinical assessments.
Methods
Study design and setting
This was a single-center cohort study conducted in an orthopaedic clinic of a tertiary hospital. Children aged 1 to 17 years who were diagnosed with genu varus by orthopaedic specialists and had a long-leg radiograph were included. We excluded children who had a previous fracture of the lower limb or any knee swelling, tumour, or contracture. Consent was obtained from the parents before enrolment in the study. This study was conducted in accordance with the Declaration of Helsinki and was approved by the Universiti Kebangsaan Malaysia Institutional Ethical Committee (JEP-2020-194).
Procedure
Baseline data such as age, gender, weight/height, and underlying diagnosis were recorded. The knee intercondylar distance was measured using a measuring tape, with the child standing and both medial malleoli touching. The centre of the medial femoral condyles was identified by palpation of the most prominent part of the distal femur. The measurement between the condyles was performed following the method described by Heath et al [13]. The reading was recorded in centimetres as the intercondylar distance. The clinical tibiofemoral angle (cTFA) was measured with a goniometer, following the method described by Arazi et al [14]. With the child standing, the anterior superior iliac spine, centre of the patella, and midpoint of the ankle joint were marked with a pen. After the tibiofemoral axis was marked, the angle was measured and recorded in degrees. Figure 1 illustrates the method of measurement on a patient.
A standardized long-leg anterior-posterior radiograph (scanogram) of the lower limbs was obtained from the hospital radiological database. The angle formed between the mechanical axis of the femur and the mechanical axis of the tibia was recorded as the mechanical tibiofemoral angle (mTFA). The mTFA was determined from the digital radiograph using the measuring tool in the Medweb software (Medweb, Inc, San Francisco, CA). In bilateral cases, the limb with the worst measured angle was chosen for analysis.
The clinical and radiological measurements were performed by a single researcher (CYT), who was trained on the measurement technique. For the radiographic measurements, a prior intra- and inter-observer reliability study was performed on 10 radiographs by two main researchers (CYT and KJ) on the same children at two different intervals.
Data analysis
The intra- and inter-observer reliability of tibiofemoral angle measurement was assessed using intraclass correlation coefficients (ICCs), with 95% confidence intervals to gauge the precision of the ICCs [15]. Correlations between clinical tibiofemoral angle (cTFA), mechanical tibiofemoral angle (mTFA), and intercondylar distance (ICD) were tested using Spearman’s correlation test. Differences between cTFA and mTFA were investigated using the paired sample t-test and Bland-Altman 95% limits of agreement. All statistical analyses were performed using SPSS (v24, IBM, NY, USA). Statistical significance was set at a cut-off of p<0.05.
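As an illustration of the correlation analysis described above, the following minimal Python sketch computes Spearman's rank correlation by ranking both series (ties receive their average rank) and then applying Pearson's formula to the ranks. The function names and the angle values are hypothetical and are not drawn from the study data.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical clinical and radiographic angles in degrees (not study data)
ctfa = [8.0, 12.0, 15.0, 10.0, 20.0, 18.0]
mtfa = [13.0, 16.0, 21.0, 14.0, 24.0, 25.0]
print(round(spearman_rho(ctfa, mtfa), 3))
```

The sketch returns the coefficient only; a p-value such as those reported by SPSS would need an additional significance test.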
Results
Thirty-six children were included, with a mean age of 6.3 years. Thirty-two were Malay (88.9%), three were Indian (8.3%), and one was Chinese (2.8%). Twenty-two children were male (61%) and 14 were female (39%). There were five unilateral and 31 bilateral cases of genu varus. Eleven children had Blount disease, 13 had rickets, and the remaining 12 were managed as physiological genu varus.
The reliability study performed between two observers for the tibiofemoral angle measurements revealed good intra-rater reliability for observer 1 (ICC 0.87) and excellent intra-rater reliability for observer 2 (ICC 0.97). Excellent inter-observer agreement (ICC 0.97) was also shown.
All thirty-six children (mean age 6.6 ± 5.7) were examined in the standing position. The association between the radiological mTFA and clinical TFA measurements was assessed. Our findings revealed a moderate correlation between cTFA and mTFA (r² = 0.67, p < 0.001) (Figure 2).
Subsequently, the associations between ICD and clinical TFA and between ICD and radiological mTFA were assessed. We also found a moderate positive correlation between ICD and cTFA (r² = 0.53, p < 0.001) and between ICD and mTFA (r² = 0.62, p < 0.001) (Figure 3).
The paired t-test revealed a mean difference of -4.67 degrees between the cTFA and mTFA. The difference was statistically significant (p < 0.01). The limits of agreement were -7.02 degrees (lower) and -2.34 degrees (upper) (Figure 4).
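The reliability coefficients above can be illustrated with a short Python sketch of the intraclass correlation coefficient derived from a two-way analysis of variance. The ICC model used in the study is not stated, so the sketch returns two common single-measurement forms as an assumption: absolute agreement, often written ICC(2,1), and consistency, often written ICC(3,1). The function name and the angle readings are hypothetical.

```python
def icc_two_way(data):
    """Return (ICC(2,1), ICC(3,1)) where data[i][j] is rater j's score for subject i."""
    n = len(data)
    k = len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with at least 2 subjects and 2 raters")

    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]

    # Two-way ANOVA sums of squares: subjects (rows), raters (columns), residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement penalizes systematic differences between raters
    icc_agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Consistency ignores a constant offset between raters
    icc_consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_agreement, icc_consistency


# Hypothetical angle readings (degrees) by two observers on five radiographs
readings = [[10.0, 11.0], [15.0, 15.5], [22.0, 21.0], [8.0, 9.0], [18.0, 18.5]]
print(icc_two_way(readings))
```

The 95% confidence intervals mentioned in the data analysis section are not computed in this sketch.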
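The comparison of two measurement methods on the same children can be sketched as follows. This minimal Python example computes the mean difference (bias), the paired t statistic, and Bland-Altman 95% limits of agreement taken as bias ± 1.96 standard deviations of the differences. The function name, dictionary keys, and angle values are illustrative assumptions, not study data.

```python
import math
from statistics import mean, stdev


def paired_differences_summary(a, b):
    """Bias, paired t statistic, and Bland-Altman 95% limits of agreement for a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences

    # The t statistic is undefined when all differences are identical (sd == 0)
    t_stat = bias / (sd / math.sqrt(len(diffs))) if sd > 0 else None

    return {
        "bias": bias,
        "sd": sd,
        "t_statistic": t_stat,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }


# Hypothetical clinical and radiographic angles in degrees (not study data)
ctfa = [8.0, 12.0, 15.0, 10.0, 20.0, 18.0]
mtfa = [13.0, 16.0, 21.0, 14.0, 24.0, 25.0]
print(paired_differences_summary(ctfa, mtfa))
```

Limits of agreement describe the range in which most individual differences are expected to fall, which is wider than a confidence interval for the mean difference.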
Discussion
We examined the correlation between clinical and radiographic TFA measurements of the lower extremities in 36 children with genu varus who had been referred to our centre. We found a significant correlation between radiological mTFA and clinical TFA. This result is in line with other studies [16,17]. Navali et al concluded that goniometer measurement appears to be a valid alternative to the mechanical axis on full-leg radiograph for determining frontal plane knee alignment [17]. Kraus et al also concluded that knee alignment assessed clinically by goniometer or measured on a knee radiograph is correlated with the angle measured on the full-limb radiograph [17]. However, both studies were carried out in adult populations with knee osteoarthritis. Our study determined the correlation between radiological and clinical TFA specifically in a paediatric population with genu varus.
Another significant finding in this study is that ICD has a moderate correlation with cTFA and mTFA. Several correlation studies on ICD have been reported. Saini et al found a fair degree of correlation between ICD and tibiofemoral angle (TFA) measured clinically by a goniometer [8]. A similar relationship between ICD and TFA was seen in other studies [6-11]. This suggests that both measurements can complement each other in monitoring genu varus. The importance of ICD measurement has been highlighted by other authors. In 1995, Cahuzac et al established normal values for the varus profile of the legs in normal children between 10 and 16 years of age, whereby an ICD of more than 5 cm is considered abnormal [18]. This is supported by other investigators [14-19]. For younger children aged at least 18 months, an ICD of 4 cm should be closely monitored [11].
The different degrees of correlation in various studies might be influenced by the different methods of measurement. Mathew et al found the clinical measurement of ICD to have minimal intra-observer variability [6]. However, a standardized method of measurement and patient positioning is important to obtain consistent findings. Obtaining a proper standing radiograph in a young child can prove challenging, so other measures, such as a footprint drawn on the floor, have been suggested [11].
We also found that the difference between cTFA and mTFA measurements was significant. mTFA consistently produced a higher value, with a mean difference of around 5 degrees, indicating that the angles were not similar between the two techniques. However, as mentioned earlier, both measurements correlated with each other. This means that although the clinical method is not as accurate as the radiographic measurement (mTFA), it can still show a similar trend of deformity and is therefore useful for monitoring change or progress.
There were some limitations in our study. Firstly, our sample population was relatively small with a wide age range (1-17 years). Secondly, we only performed observer reliability study for the radiographic measurement. However, the clinical measurements were done by a single researcher, who was trained to perform the measurement following the standard protocol.
Conclusion
Clinical measurement of tibiofemoral angle and ICD showed good correlation with radiological measurement when performed with the child in the standing position. Therefore, for monitoring purposes or serial alignment assessment, these methods are adequate.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: The study's ethical approval was obtained from the Universiti Kebangsaan Malaysia Institutional Ethical Committee (JEP-2020-194).
Patient consent (participation and publication): Verbal informed consent was obtained from patients for publication.
Source of Funding: Universiti Kebangsaan Malaysia
Role of Funder: The funder remained independent, refraining from involvement in data collection, analysis, or result formulation, ensuring unbiased research free from external influence.
Acknowledgements: None to be declared.
Authors' contributions: KJ and CYT conceptualized and designed the study, drafted the initial manuscript, and reviewed and revised the manuscript. CYT designed the data collection instruments, collected data and carried out the initial analyses. AFAR and AHAR coordinated and supervised data collection, and critically reviewed the manuscript for important intellectual content. All authors approved the final manuscript as submitted and agree to be accountable for all aspects of the work.
Use of AI: AI was not used in the drafting of the manuscript, the production of graphical elements, or the collection and analysis of data.
Data availability statement: Not applicable.
From Resource-Limited to Research-Rich: Unlocking the Scientific Potential of Developing Nations
Zuhair Dahham Hammood
For too long, the scientific narrative has been dominated by voices from wealthier nations. While their contributions are invaluable, the imbalance has left a vast reservoir of untapped knowledge and innovation in the developing world. Today, the time has come to shift the paradigm—from viewing developing countries as mere recipients of scientific progress to recognizing them as active producers of valuable, context-specific knowledge.
From Resource-Limited to Research-Rich is not a rhetorical flourish—it is a vision, a goal, and a challenge. It reflects a belief that scientific excellence is not the exclusive property of nations with abundant financial resources, but rather, a pursuit driven by curiosity, commitment, and community.
Developing countries, despite limited infrastructure and funding, are home to some of the most pressing health challenges—from endemic infectious diseases and rising non-communicable burdens to unique environmental and sociopolitical contexts. These challenges demand local insight, homegrown data, and context-sensitive solutions. The answers will not come from imported models alone. They must arise from within [1].
In this transformation, medical journals have a profound responsibility—not just as gatekeepers of knowledge, but as platforms for empowerment. Barw Medical Journal stands committed to this mission: to provide a voice to researchers working under constraints, to mentor and guide early-career scientists, and to uphold the integrity and quality of regional scholarship.
Success stories are already emerging. Across Africa, Asia, the Middle East, and Latin America, we are witnessing a rise in high-quality research led by local scientists. These efforts, often fueled by personal passion more than institutional support, prove that scientific ingenuity thrives even where resources are scarce [2].
However, more must be done. Governments must prioritize funding for health research. International agencies must listen more and dictate less. And academic partnerships must be based on equity, not extraction.
The path from resource-limited to research-rich is not paved overnight. It requires intentional investment, strategic collaboration, and relentless belief in the intellectual power of every nation. As we look ahead, let us remember: the next breakthrough in global health may very well come from a modest lab, in a hospital like ours, led by minds that simply needed a chance to be heard.
At Barw Medical Journal, we are here to amplify those voices.

Gastric Pyloric Schwannoma: A Case Report and Review of the Literature
Rebaz M. Ali, Shkar O. Arif, Rezhen J. Rashid, Rawa M. Ali, Abubakar Ibrahim, Dana T. Gharib,...
Abstract
Introduction
Schwannomas are slow-growing, subclinical neoplasms rarely found in the gastrointestinal tract. This study reports a schwannoma in the pyloric region of the stomach.
Case presentation
A 50-year-old female presented with a one-week history of epigastric pain, dark tarry stools, and nausea. Endoscopic examination and biopsy confirmed the diagnosis of gastric schwannoma. The patient underwent surgical resection of the tumor. Histopathological examination showed benign spindle cells with strong S100 positivity, confirming schwannoma. Post-operative follow-up included treatment for H. pylori infection and monitoring for recurrence or complications. No recurrence was reported after six months.
Literature Review
Gastric schwannoma is challenging to distinguish from other submucosal tumors preoperatively. Reviews of recent case reports indicate the importance of detailed imaging in diagnosis, and surgical resection remains the treatment of choice, with an excellent prognosis and low recurrence rates.
Conclusion
Schwannoma is rare in the stomach, especially in the pyloric region. Definitive diagnosis may require immunohistochemical analysis. Appropriate follow-up after treatment can be essential to identify emerging complications and ensure timely intervention.
Introduction
Schwannomas, also referred to as neurilemmomas or neurinomas, are uncommon and non-cancerous spindle cell tumors that originate from excessive proliferation of myelin-producing Schwann cells in the nerve sheath and remain localized in their primary location [1,2]. They are frequently slow-growing and rarely found in the gastrointestinal tract (GIT). Gastric schwannoma (GS) represents only 2–6% of mesenchymal tumors within the GIT and 0.2% of all gastric tumors [3,4]. Although most schwannomas occur alone, GS is often part of neurofibromatosis type 2 and has an association with other tumors. There is a gender predilection towards females [3]. Preoperatively, GS is often challenging to accurately distinguish from gastric submucosal or other stromal tumors due to physicians' limited recognition of GS [5,6]. Herein, a case of schwannoma originating from the pyloric region of the stomach is reported. The references’ eligibility has been verified, and the report has been structured in accordance with CaReL guidelines [7,8].
Case Presentation
Patient information
A 50-year-old female presented with epigastric pain for one week. The pain was associated with melena and nausea, with no constipation, diarrhea, or fever. Her past medical history was negative for any chronic disease. She had a thyroid lobectomy two years ago and was on thyroxin 100 mcg/day.
Clinical findings
Only epigastric tenderness was noted on physical examination, with no other systemic abnormalities.
Diagnostic assessment
An abdominal ultrasound (U/S) showed a well-defined hypoechoic mass (30 mm) in the pyloric region of the stomach. A contrast-enhanced computed tomography (CT) scan of the abdomen revealed a well-defined, smooth outline lesion measuring 35x30 mm in the pyloric region with mild mucosal wall thickening (7 mm) and relative proximal dilation of the stomach, without lymphadenopathy. A dynamic magnetic resonance imaging (MRI) of the abdomen revealed a well-defined, 33 x 27 x 27 mm, space-occupying lesion in the epigastric region between the lesser curvature of the stomach and the left hepatic lobe. The mass exhibited T1 hypointensity, T2 hyperintensity, restricted diffusion on diffusion-weighted imaging, and diffuse early enhancement with retained contrast in the delayed phase. The mass was attached to the stomach wall (Figure 1). An esophagogastroduodenoscopy (EGD) revealed a large subepithelial lesion, approximately 4 cm, located in the incisura and extending to the lower body on the lesser curvature side, with an antral nipple sign. The overlying mucosa was normal, but there was nodular antral gastropathy.
Therapeutic intervention
A resection of the anterior gastric wall near the incisura angularis was performed to remove the mass. Histopathological examination of the lesion revealed hypo- and hypercellular areas of spindled cells arranged in loose fascicles, with neural-type, lightly eosinophilic and clear cytoplasm and spindled and buckled nuclei with fine chromatin. Scattered hyalinized blood vessels were present within the lesion. These findings indicated a benign spindle cell lesion suggestive of schwannoma. The tumor had a mitotic rate of less than five mitoses per 50 high-power fields. There was no necrosis or vascular invasion, and the resection margins were free. Four lymph nodes were examined and found to be negative for metastasis (Figure 2). Immunohistochemistry of the tumor revealed positivity for S100, characterized by strong and diffuse cytoplasmic and nuclear staining. Weak and focal cytoplasmic staining was observed for desmin and smooth muscle actin, while the tumor was negative for CD117 and CD34.
Follow-up and Outcome
Six months postoperatively, the patient developed lower back pain. Lumbosacral MRI showed mild disc indentation of the thecal sac at L3-4 and L4-5, causing mild bilateral foraminal narrowing, and a focal bone lesion at the acetabular roof. Pelvic MRI indicated early osteoarthritis changes in the right hip joint with subchondral pseudocystic changes at the acetabular roof. It also showed a thin-walled unilocular cystic lesion in the left ovary measuring 40 x 20 mm. The patient received analgesics for disc problems and osteoarthritis. A follow-up EGD one year later revealed a long transverse scar in the incisura area and mucosal congestion. Multiple biopsies from the area showed chronic active H. pylori pangastritis without metaplastic or dysplastic changes. The patient was referred to a gastroenterologist and started a regimen for H. pylori eradication. Subsequently, a CT scan revealed thickening of the gastric outlet and pyloric wall. The rest of the stomach was distended and fluid-filled, raising suspicion of a gastric ulcer. An endoscopic biopsy from the gastric mucosa revealed mild chronic gastritis with mild glandular atrophy, ulceration, and fibromuscular hyperplasia, with no evidence of H. pylori microorganisms, intestinal metaplasia, or dysplasia. A biopsy was also taken from a small mid-esophageal ulcer, which showed no pathologic findings.
Discussion
Gastric schwannomas are rare benign mesenchymal tumors, comprising 0.2% of all gastric tumors [4]. Malignant transformation of GS is extremely rare [9]. GS can develop at any age, but it is most commonly found in individuals in their fifties and sixties, with a higher incidence in women. Although many GSs are discovered incidentally, they may cause nonspecific symptoms such as pain or GIT bleeding [6]. In reviewing eight cases of GS, four cases presented with abdominal pain or discomfort, two had a palpable abdominal mass, one had hematochezia, and another one had weight loss and early satiety (Table 1). The present case was a 50-year-old female who complained of epigastric pain for one week.
| Author, year [Reference] | Study design | No. of cases | Age (years) | Sex | Clinical presentation | Medical history | Surgical history | Mass location / detection method | Disease appearance | Mass size (mm) | Immunohistochemical findings | Treatment | Follow-up time | Recurrence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sorial et al., 2024 [1] | Case report | 1 | 82 | M | Abdominal pain | N/A | N/A | GC of the stomach / CT | No lesions or ulcers | 32 | S100 (+). CD34, CD117, DOG-1, DS, & SMA (-). Ki67 (2-3%) | SWGR during laparoscopy | N/A | N/A |
| Majdoubi et al., 2024 [3] | Case report | 1 | 50 | M | Persistent postprandial pain, melena, anorexia, asthenia | None | None | Fundic region along the GC / CT, GI endoscopy | Peptic ulcer and lesion | 33 | Fusiform cells. CD117 (-), S100 (+), DOG (-), DS (-), AML (-) | Laparoscopic gastric atypical resection | 8 months | No |
| Manjesh et al., 2024 [4] | Case report | 1 | 62 | F | Early satiety, weight loss | N/A | N/A | Fundus / USG, upper GI endoscopy, CECT | Hypoechoic mass | 83 | Spindle cells. S100, GFAP, & P16 (+). CD117, DOG-1, SMA, & CD34 (-) | Exploratory laparotomy & excision | N/A | N/A |
| Huang et al., 2024 [5] | Case report | 1 | 72 | F | Abdominal distension | N/A | N/A | Anterior lower stomach near GC / EUS, CT, EG | White ulcer scar | 55 | S-100, SDHB, & SOX-10 (+). CD34, CD117, DOG-1, & DS (-). Ki67 (1%) | EFTR | 1 week | No |
| Kostovski et al., 2024 [6] | Case report | 1 | 68 | M | Lower abdominal discomfort | None | None | Antral region along the GC / CECT | Submucosal gastric lesion | 62 | Spindle cells. S100, SDHB, & SOX10 (+). CD34, CD117, DOG-1, & DS (-). Ki-67 (<1%) | Supraumbilical median laparotomy & excision | 1 month | No |
| Kormann et al., 2024 [10] | Case report | 1 | 67 | M | Progressive fatigue, paleness, hematochezia | None | N/A | Gastric antrum / Upper EUS | Submucosal lesion | 9 | S100 (+) | Hybrid ESD-EFTR | 1st: 6 years; 2nd: 3 months after intervention | No |
| Lin et al., 2024 [11] | Case report | 1 | 66 | F | Protruding subepithelial mass | N/A | N/A | Gastric fundus / EGD, EUS, CT | Subepithelial lesion | 25 | Short spindle-shaped cells. S100 (+) | Laparoscopic surgery | N/A | N/A |
| Huang et al., 2024 [12] | Case report | 1 | 31 | F | Palpable abdominal mass | N/A | N/A | Posterior wall of the GLC / USG, CT | Mucosal ulcer | 64 | N/A | Laparoscopic gastric lesion resection | 2 weeks | No |

M: male; F: female; IDA: iron-deficiency anemia; N/A: not available; GC: greater curvature; CT: computed tomography; GLC: gastric lesser curvature; USG: ultrasonography; EUS: endoscopic ultrasound; EGD: esophagogastroduodenoscopy; GI: gastrointestinal; CECT: contrast-enhanced computed tomography; EG: electrocoagulation; mm: millimeter; (+): positive; (-): negative; GFAP: glial fibrillary acidic protein; DS: desmin; SMA: smooth muscle actin; ESD: endoscopic submucosal dissection; EFTR: endoscopic full-thickness resection; SWGR: stapled wedge gastric resection.
Diagnosing GSs presents several challenges due to their rare occurrence and nonspecific symptoms. Because of overlapping clinical, radiological, and endoscopic features, these tumors are often misidentified as other submucosal tumors, such as gastrointestinal stromal tumors, leiomyomas, or other mesenchymal tumors, which may delay appropriate therapy. Endoscopic ultrasound is a valuable tool for differentiation but is not definitive. Schwannomas are typically diagnosed postoperatively through histopathological and immunohistochemical analysis: positivity for S100 and SOX-10 supports the diagnosis, while negativity for markers such as CD117 and smooth muscle actin helps exclude other differential diagnoses [5,10]. Among the reviewed cases, CT was one of the most common methods used for the early characterization of the lesion. It was used in 87.5% of the cases, providing detailed imaging that helped identify the tumor's size, location, and relation to surrounding structures. Ultrasonography was used in 62.5% of cases, 60% of which were endoscopic U/S; this aided in characterizing the lesion and determining its layer of origin, which is essential for planning the therapeutic approach. Other methods, like EGD and upper gastrointestinal endoscopy, were less commonly used but played supportive roles in the diagnostic process [4,11]. In the current case, an abdominal U/S initially identified a well-defined hypoechoic mass in the pyloric region of the stomach. This was followed by a contrast-enhanced CT scan of the abdomen, which confirmed a smooth-contoured lesion and mild mucosal wall thickening. Dynamic MRI showed a 33 x 27 x 27 mm lesion in the epigastric region between the lesser curvature of the stomach and the left hepatic lobe. An EGD revealed a large subepithelial lesion with an antral nipple sign.
Primarily, GS occurs in the fundus and body of the stomach, as documented in multiple case reports [1,3-5,11,12]. In three cases, including those by Majdoubi et al., Lin et al., and Manjesh et al., the tumor was located in the fundus [3,4,11], while in three other cases, the tumor was located in the gastric body [1,5,12].
The antrum can also be the site of tumor origin, as described in two cases by Kormann et al. and Kostovski et al. [6,10]. None of the reviewed cases arose in the cardia or pylorus. In the present case, however, the tumor originated from the pyloric region.
In the current case, a surgical resection of the anterior gastric wall near the incisura angularis was performed, and a 4.2-cm tumor was successfully removed. Histopathological examination showed a neural-type, low-grade spindle cell tumor with a low mitotic rate, no necrosis or vascular invasion, and negative margins. Immunohistochemical analysis revealed strong S100 positivity, weak and focal staining for desmin and SMA, and negativity for CD117 and CD34, confirming the diagnosis of schwannoma. This approach aligns with other studies that reported successful surgical resection outcomes and similar histological and immunohistochemical profiles [1,4-6].
Post-surgical monitoring was crucial in the current case. The need to address new health concerns, such as H. pylori eradication, aligns with the follow-up practices noted by Kormann et al., who emphasized the importance of routine imaging and endoscopy to monitor for recurrence and manage complications [10]. However, unlike the present case, Lin et al. reported no significant post-surgical complications or new health issues during follow-up, suggesting a variation in the clinical outcomes [11]. Although detailed imaging and histopathological analyses were conducted in the current case, genetic profiling of the tumor was not performed. This could have provided valuable information about the tumor’s potential for malignant behavior and recurrence.
Conclusion
Schwannoma is rare in the stomach, especially in the pyloric region. Definitive diagnosis may require immunohistochemical analysis. Appropriate follow-up after treatment can be essential to identify emerging complications and ensure timely intervention.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable.
Patient consent (participation and publication): Written informed consent was obtained from the parent of the patient for publication.
Funding: The present study received no financial support.
Acknowledgements: None to be declared.
Authors' contributions: RMA and SOA were significant contributors to the conception of the study and the literature search for related studies. AI, DTG, AHA, BAM and SMA were involved in the literature review, the study's design, and the critical revision of the manuscript, and they participated in data collection. MAG, DAI and HRA were involved in the literature review, study design, and manuscript writing. RJR was the radiologist who performed the assessment of the case. RMA was the pathologist who performed the diagnosis of the case. RMA and MAG confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: AI was not used in the drafting of the manuscript, the production of graphical elements, or the collection and analysis of data.
Data availability statement: Not applicable.

Kikuchi-Fujimoto Disease Coexistent with Papillary Thyroid Carcinoma: A Report of Two Cases
Ari M. Abdullah, Aras J. Qaradakhy, Yadgar A. Saeed, Aso S. Muhialdeen, Rebaz O. Mohammed, Hiwa...
Abstract
Introduction
Kikuchi-Fujimoto Disease (KFD), characterized by histiocytic necrotizing lymphadenitis, is a rare condition of unknown etiology. Diagnosis is dependent on lymph node biopsy. Despite its self-limiting nature, accurate identification is essential to exclude more serious conditions. This paper reports on two cases of KFD coexisting with papillary thyroid carcinoma (PTC).
Case presentation
Two cases of KFD related to papillary thyroid carcinoma (PTC) are described. In Case 1, a 25-year-old woman experienced submental swelling, fever, and exhaustion. Subsequent tests revealed a thyroid lesion and cervical lymphadenopathy, which were confirmed as PTC and KFD. In Case 2, a 39-year-old female patient had right neck swelling, prompting a complete thyroidectomy that revealed papillary thyroid cancer with KFD in cervical lymph nodes.
Conclusion
KFD should be considered in the differential diagnosis of cervical lymphadenopathy, given its masquerading nature and the unusual scenario of its coexistence with PTC.
Introduction
Kikuchi-Fujimoto Disease (KFD), or histiocytic necrotizing lymphadenitis, is a relatively rare medical condition characterized by painful cervical lymphadenitis and fever. It was first identified in Japan in 1972 [1,2] and typically manifests as a benign and self-limiting disorder [3] with documented cases primarily in Asian countries [4]. Although both genders can be affected, there is a slight predilection towards females. The etiology of KFD remains unknown [3]. The disease exhibits a higher incidence in adults aged 20 to 35 [1]. Due to its rarity, KFD is often not considered in the initial differential diagnosis and its diagnosis relies on histopathologic examination (HPE) of lymph node biopsies. Despite its benign nature, accurate diagnosis is crucial to exclude other causes of lymphadenopathy such as lymphoma, tuberculous adenitis, and systemic lupus erythematosus [5]. There is no specific treatment for KFD; however, supportive care with analgesics, antipyretics, and corticosteroids can alleviate symptoms. In refractory cases, treatment with immunoglobulins or hydroxychloroquine may be considered [3]. While PTC is the most common type of thyroid cancer, its association with KFD is seldom emphasized [6]. Although metastatic lymphadenopathy can occur in cancer patients, simultaneous occurrence with other conditions in the same lymph node is unusual [7]. The current study aims to present two cases of KFD associated with PTC.
Case Presentations
Case 1
Patient information
A 25-year-old female presented with a one-month history of submental swelling, accompanied by fever and fatigue. She had no significant past medical history except for a tonsillectomy and rhinoplasty.
Clinical findings
Thyroid examination revealed a palpable submental lymph node classified as Grade 0. Other systemic examinations were unremarkable.
Diagnostic assessment
Routine laboratory tests showed normal thyroid-stimulating hormone (TSH) levels at 2.083 mIU/L and free T4 (FT4) levels at 15.13 ng/dL, indicating normal thyroid function. Neck ultrasound (U/S) revealed a well-defined, solid hypoechoic nodule with an irregular surface, measuring 10 x 9 x 7.8 mm, in the mid-upper third, categorized as TI-RADS 5 (TR5).
Multiple bilateral cervical lymphadenopathies were noted with well-defined margins, round to oval shape, loss of hilar echogenicity, and mild vascularity. The largest lymph node, measuring 12 x 9 x 8 mm, was located submentally, and another measuring 10 x 6 mm was found in the left level III group, suggesting potential pathological involvement. Fine needle aspiration (FNA) confirmed PTC VI.
Therapeutic intervention
Under general anesthesia, total thyroidectomy, left central and lateral lymph node dissection, and submental lymph node biopsy were performed via a collar incision. Both recurrent laryngeal nerves and the parathyroid glands were preserved. Hemostasis was achieved, and the wound was closed in layers with a drain on the left side. A total of 37 lymph nodes were evaluated from the left central and lateral cervical groups during the procedure. Among these, three lymph nodes were involved by papillary thyroid carcinoma. The submental lymph node biopsy revealed histiocytic necrotizing lymphadenitis, confirmed by immunohistochemistry (Figure 1). Staining was performed with mouse monoclonal antibodies for CD15 (pH 9), CD20 (pH 9), and CD30 (pH 6), and a rabbit antibody for CD68 (pH 6). CD68 exhibited predominant cytoplasmic positivity in histiocytes localized within the necrotic areas, while CD20, CD15, and CD30 demonstrated negative staining within the necrotic regions, indicative of the absence of B-cell lymphocytic infiltrates and granulocytes, respectively. Scattered positive cells for CD15 and CD30 were observed both within and outside the necrotic foci.
Follow-up and Outcome
Post-operatively, the patient received levothyroxine 100 mcg daily for thyroid hormone replacement therapy and was placed on regular follow-up. Three months later, neck U/S showed no focal lesions or signs of recurrence, with recovery supported by symptomatic care.
Case 2
Patient information
A 39-year-old female presented with right-sided neck swelling and no prior medical or surgical history.
Clinical findings
Examination revealed cervical lymphadenopathy without additional clinical complaints.
Diagnostic assessment
Routine laboratory tests indicated normal thyroid function with thyroid-stimulating hormone (TSH) levels of 3.82 mIU/L and free T4 (FT4) levels of 12.6 ng/dL. Anti-thyroid peroxidase (ATPO) levels were elevated at 600 IU/ml. Neck U/S revealed multiple bilateral cervical lymphadenopathies, predominantly on the right side, characterized by well-defined hypoechoic, mildly vascular lymph nodes with loss of hilum echogenicity. The largest lymph node in the right group III measured 17×8mm and was pathologically significant. The thyroid gland appeared normal, with small nodules <3mm in the right lobe, and the largest measuring 13×12×10mm in the left lower third, classified as TR4 with solid isoechoic features and microcalcifications. The FNA confirmed PTC VI and benign lymphoid cells in the left lymph node.
Therapeutic intervention
Under general anesthesia, total thyroidectomy with excision of left central and right posterior cervical lymph nodes was performed through a collar incision. Both recurrent laryngeal nerves and the parathyroid glands were preserved. Hemostasis was achieved, and the wound was closed in layers with a drain on the left side. A total of five central lymph nodes were evaluated during the thyroidectomy, all of which were tumor-free. Additionally, two right posterior cervical lymph nodes were sampled, both showing histological features consistent with Kikuchi disease (Figure 2), confirmed by immunohistochemistry showing CD68 positivity in histiocytic cells, CD20 negativity, CD15 and CD30 negativity within the necrotic area, and sporadic CD15 and CD30 positivity outside the necrotic regions.
Follow-up and Outcome
Post-operatively, the patient was stable and started on levothyroxine 100 mcg daily for thyroid hormone replacement therapy. Three months later, U/S showed no focal lesions, indicating recovery under supportive care.
Discussion
KFD is a rare, benign lymphadenopathy predominantly affecting cervical lymph nodes, although cases involving axillary and supraclavicular nodes have been documented [1]. Initially identified in Japan, KFD has been reported globally across Europe, America, Asia, and the Middle East [5], with a higher prevalence among women under 40 years of age [9]. The exact cause of KFD remains unclear, with theories suggesting infectious and autoimmune origins. Associations with herpes viruses and Epstein-Barr virus have been noted, although evidence remains inconclusive. Concurrent autoimmune diseases like systemic lupus erythematosus also suggest an autoimmune component [3]. While lymph nodes as large as 5 to 6 cm have been reported, typical KFD-associated lymphadenopathy is less than 3 cm. Fever episodes lasting from one to seven weeks with temperatures ranging from 38.6°C to 40.5°C are common, with variable tenderness on palpation. Additional symptoms may include chills, headaches, splenomegaly, arthralgia, vomiting, night sweats, fatigue, and malaise [1]. The disease onset is acute or subacute, progressing over 1-3 weeks and resolving spontaneously within 1-4 months [3]. In the present study, two females (25 and 39 years old) presented with submental and right-sided neck swelling, respectively.
In a study conducted by MD et al., an 11-year-old female presented with three weeks of multiple lymph node enlargement and one week of fever without systemic or oropharyngeal infection [9]. Maruyama et al. reported a case of a 48-year-old man who initially presented with a tongue lesion. Despite initial negative findings on examination and imaging for lymphadenopathy, subsequent biopsy revealed squamous cell carcinoma. Following tumor reduction surgery, lymphadenopathy developed [10]. In the current study, the first case presented with submental swelling and fever, while the second case presented with right-sided neck swelling without fever.
No specific laboratory tests are pathognomonic for the diagnosis of KFD. Reported findings are variable and include increased lactate dehydrogenase (LDH), leukopenia or leukocytosis, anemia, elevated erythrocyte sedimentation rate, raised C-reactive protein levels, and elevated transaminases. Leukopenia is observed in 25% to 58% of cases, while leukocytosis occurs in approximately 2% to 5% [1]. Diagnostic workup typically includes imaging with U/S and/or CT scans. Definitive diagnosis is established through excisional biopsy and HPE [1]. Radiologically, KFD lacks a distinct appearance and can resemble various nodal conditions with necrosis, including lymphoma, metastases, and tuberculosis. A retrospective CT study by Kwon et al. identified predominantly homogeneous lymphadenopathies involving levels II to V, with most nodes measuring less than 2.5 cm, distinguishing them from lymphoma, which often presents with fewer but larger nodes, perinodal infiltration, and necrosis [9]. Garg et al. reported cases of females presenting with neck swelling in whom ultrasound and FNA revealed PTC [9]. Similarly, in the current study, both patients exhibited normal lab tests. Ultrasound revealed cervical lymphadenopathy and a thyroid nodule. FNA of the first patient's TR5 nodule confirmed PTC VI, while FNA of the second patient's LN and TR4 nodule suggested Kikuchi disease and PTC.
Three histological types have been proposed: proliferative, necrotizing, and xanthomatous. Notably, the absence of granulocytes distinguishes the xanthomatous variant, although differentiation from conditions like SLE, lymphoma, drug-induced lymphadenopathy, or Kawasaki disease poses challenges [11]. Immunohistochemistry plays a crucial role in resolving overlaps in histopathological findings [12]. Typically self-limiting, KFD resolves within one to four months without specific therapy, although recurrent cases, seen in 3–4% of patients, necessitate monitoring. No hereditary predisposition has been reported. Supportive care includes analgesics, NSAIDs, and antipyretics for symptom relief. Corticosteroids are beneficial for neurological involvement, while hydroxychloroquine, immunoglobulins, and minocycline have shown efficacy in selected cases [13]. In the current study, patients with papillary thyroid carcinoma (PTC) and suspicious lymph nodes underwent total thyroidectomy and neck dissection, with subsequent HPE revealing concurrent PTC and KFD in the submental lymph node of the first case and the right cervical lymph node of the second. Both patients recovered with resolution of lymphadenopathy. Synchronous PTC with KFD is rare, documented in only a few reports in the literature, including those by Park et al. and Garg et al. [8,9]. In the current study, HPE of thyroid tumors revealed papillary structures with fibrovascular cores and nuclear features consistent with PTC VI classification based on the Bethesda system, without necrosis, a hallmark of well-differentiated papillary carcinomas. Conversely, non-tumoral tissues, particularly lymph nodes affected by KFD, exhibited histiocytic necrotizing lymphadenitis with necrotic foci surrounded by CD68-positive histiocytes, distinguishing it from PTC and emphasizing the role of HPE in differentiating these conditions.
The clinical diagnosis of KFD and PTC presents several limitations and challenges. Accurate diagnosis is crucial yet often hindered by the overlapping clinical and histopathological features of KFD and other conditions. To improve diagnostic precision, it is essential to utilize more detailed histopathologic images at both low and high magnifications. These enhanced imaging techniques can provide clearer insights into the cellular and structural characteristics of the lesions, thereby facilitating more accurate differentiation between KFD and other lymphadenopathies or neoplastic conditions.
Conclusion
The simultaneous presence of KFD and PTC highlights complex diagnostic challenges. Surgical intervention underscores the crucial role of detailed histopathological examination in achieving accurate diagnosis and tailored treatment strategies for these rare concurrent conditions.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable.
Patient consent (participation and publication): Written informed consent was obtained from the parent of the patient for publication.
Funding: The present study received no financial support.
Acknowledgements: None to be declared.
Authors' contributions: AMA and FHK were significant contributors to the conception of the study and the literature search for related studies. YAS, ASM, ROM, HOB and AMS were involved in the literature review, the study's design, and the critical revision of the manuscript, and they participated in data collection. AAQ and FHK were involved in the literature review, study design, and manuscript writing. AJQ and RJR were the radiologists who performed the assessment of the case. AAQ and AMS confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: AI was not used in the drafting of the manuscript, the production of graphical elements, or the collection and analysis of data.
Data availability statement: Not applicable.

Asymptomatic Osteonecrosis of the Trochlea in an Adolescent: A Case Report
Abdullah K. Ghafour, Soran S. Raoof, Soran H. Tahir, Rezheen J. Rashid, Dyari Q. Hamad, Pshdar H....
Abstract
Introduction
Osteonecrosis, also known as avascular necrosis, aseptic necrosis, or ischemic necrosis, results from a temporary or permanent halt in blood flow to a portion of bone. This lack of blood supply can eventually cause the affected bone to collapse. Osteonecrosis around the elbow is not frequently observed, and its occurrence in the trochlea, known as Hegemann's disease, is even rarer. Incidence rates of trochlear osteonecrosis have been reported to vary from 0.27% to less than 0.001% across different studies.
Case presentation
A 14-year-old male presented with severe right shoulder pain and swelling, along with mild right lateral-sided elbow pain, following a fall to the ground. The radiograph of the right shoulder revealed a proximal humeral metaphyseal greenstick fracture. Additionally, the radiograph of the right elbow incidentally revealed osteonecrosis of the distal humeral trochlea. The affected shoulder was immobilized, and conservative management was selected for treating the trochlear osteonecrosis.
Conclusion
Trochlear avascular necrosis is a rare condition that might cause mild discomfort or even be asymptomatic, potentially being diagnosed incidentally through radiographs. Typically, it can be managed with conservative treatment methods.
Introduction
Osteonecrosis, alternatively termed avascular necrosis (AVN), aseptic necrosis, or ischemic necrosis, occurs due to a temporary or permanent interruption of blood flow to a section of bone.
This deprivation of blood supply can lead to the eventual collapse of the affected bone [1]. This condition often manifests as joint pain, bone damage, and reduced function. Commonly affected areas include the ends of long bones like the femur and humerus, as well as regions like the knee's femoral condyles, the tibial plateau, and the small bones in the hands and feet [2,3]. While AVN around the elbow is not frequently observed, its occurrence in the trochlea is even rarer compared to other elbow regions such as the capitellum, radial head, and olecranon [4].
The term osteochondrosis encompasses over 50 various conditions that affect the developing skeleton. In 1951, Dr. Gerd Hegemann documented the radiographic alterations observed in the humeral trochlea of young adults. Hence, osteochondrosis specifically affecting the humeral trochlea is referred to as Hegemann's disease [5].
Hegemann's disease can arise from either traumatic or non-traumatic causes. Traumatic cases often involve elbow contusions or fractures as contributing factors [6]. Osteonecrosis in general commonly develops in individuals who have certain risk factors, including high-dose corticosteroid therapy, excessive alcohol consumption, injury, malignancy, systemic lupus erythematosus, and hematologic disorders like sickle cell disease, along with certain infectious causes [1,7].
Osteonecrosis of the trochlea is an extremely rare condition affecting the lower end of the humerus. Incidence rates have been reported to vary from 0.27% to less than 0.001% across different studies [6,8].
This report presents a rare case of trochlear osteonecrosis in an adolescent. All of the references cited in this report were evaluated for eligibility [9].
Case Presentation
Patient information
A 14-year-old male was brought to the emergency department of our hospital with severe right shoulder pain and swelling, along with mild right lateral-sided elbow pain. These symptoms had started approximately two hours after he fell to the ground. Before the fall, the patient did not complain of any pain or limitation of range of motion in either joint. The patient's parents reported two previous traumas. The first, at the age of eleven, involved a fall on an outstretched hand, resulting in mild elbow pain for approximately three days, which resolved without medical intervention. The second incident occurred one year prior in a road traffic accident, resulting in a right distal tibial greenstick fracture. However, there were no concurrent upper limb complaints during this episode, and the fracture was managed conservatively with long leg casting.
Clinical findings
The patient had severe tenderness over the proximal humerus, with mild swelling and limitation of shoulder range of motion due to pain. He also had mild tenderness over the lateral side of the right elbow, with a normal elbow range of motion and no elbow deformity. The neurovascular examination of the limb was normal.
Diagnostic assessment
The radiograph of the right shoulder revealed a proximal humeral metaphyseal greenstick fracture (Figure 1). Additionally, the radiograph of the right elbow incidentally revealed osteonecrosis of the distal humeral trochlea, with no other superimposed findings noted. Notably, the carrying angle was measured at 12 degrees in valgus.
Therapeutic intervention
A sling and swathe were applied to immobilize the affected shoulder, and the patient was given analgesics. Conservative management was selected for the trochlear osteonecrosis, consisting of range-of-motion exercises introduced after the proximal humerus fracture had fully healed. Close follow-up was arranged to monitor his progress. Subsequently, he was discharged from the hospital.
Follow-up and Outcome
During the follow-up, the patient had no complaints regarding his elbow.
Discussion
The exact causes of Hegemann's disease remain unidentified. Nevertheless, various traumatic and non-traumatic factors have been conclusively associated with trochlear osteonecrosis. These include acute or past trauma such as fractures, persistent repetitive microtrauma, and contusions. Additionally, in some cases, the condition may arise without an identifiable cause and is classified as idiopathic [8,10,11]. Certain risk factors have also been associated with osteonecrosis, including corticosteroid therapy, alcohol consumption, bone injuries, and systemic conditions such as malignancy, lupus erythematosus, sickle cell disease, Gaucher's disease, caisson disease, gout, vasculitis, osteoarthritis, and osteoporosis, as well as radiation therapy, chemotherapy, and organ transplantation, particularly renal transplantation [7]. Rarely, infections such as HIV and meningococcemia leading to disseminated intravascular coagulation have been implicated [12,13]. Nonetheless, a notable proportion of cases remain idiopathic [7]. In the present case, the patient had a history of two previous traumas, followed by a recent fall to the ground.
The ossification center of the trochlear epiphysis typically becomes visible after the age of five, progressing in development between 8 and 13 years in boys. Fusion with the humeral metaphysis occurs between the ages of 13 and 16 [8]. Two vessels enter the posterior aspect of the lateral humeral condyle and traverse an extended path through the lateral condylar ossification center, ultimately reaching the lateral section of the trochlea. The trochlea itself is nourished by these lateral vessels, along with a distinct vessel that penetrates the medial, nonarticular portion of the trochlea [6]. The presence of these two blood supplies gives rise to a watershed area within the trochlear groove. Disruption of this distinctive blood supply can occur during the injury, as well as during closed or open reduction maneuvers or internal fixation procedures [6,14].
Trochlear AVN can manifest either partially or entirely. In Type A cases, where there is partial involvement, the apex or lateral segment of the trochlear medial crista is typically affected. Patients in this category typically show no symptoms and do not exhibit angular deformities. Radiologically, they display a central deficiency in the distal humeral epiphysis. Conversely, in Type B cases, where there is complete involvement, the entire trochlear metaphysis is affected. These patients often experience a gradual onset of elbow varus deformity and a notable reduction in range of motion [8].
According to Schumacher et al., Hegemann's disease progresses through five distinct stages as observed on radiographs [15]. In Stage 1, there is an initial decrease in density followed by plaque-like sclerosis in the center of epiphyseal ossification. Stage 2 is characterized by a decrease in size and increased condensation of the ossification center. In Stage 3, loosening occurs along with the emergence of new ossification. Stage 4 is marked by regeneration and enlargement of the ossification center. Finally, Stage 5 represents the ultimate stage, which may involve either complete or partial recovery [11,16].
Uhrmacher et al. were the pioneers in identifying Hegemann's disease in two children aged 7 and 9 years. The primary symptoms observed were swelling and limited range of motion in the elbow [17]. In the current case, the patient presented with significant discomfort characterized by severe right shoulder pain and swelling attributed to a proximal humeral metaphyseal greenstick fracture incurred from a fall, accompanied by mild discomfort localized to the right lateral aspect of the elbow. The elbow exhibited a normal range of motion, with no observable deformity noted. Notably, preceding the recent accident, the patient had been asymptomatic for trochlear AVN.
Hegemann's disease is frequently identified through radiographic examination months or even years following trauma, leading to potential confusion with a condition known as fishtail deformity. This deformity, uncommonly encountered, typically arises as a complication after a distal humeral fracture during childhood [5]. Hegemann’s disease was initially identified before the availability of computed tomography (CT) scans or magnetic resonance imaging (MRI) techniques. Consequently, the fishtail deformity might have been considered a subsequent stage of Hegemann’s disease, which is typically benign following a mild vascular disorder. However, complete AVN could potentially develop following traumatic incidents. Another perspective suggests that Hegemann’s disease could represent a benign, self-limiting phase of the fishtail deformity after unrecognized injury or repetitive micro-trauma. Characterized by irregularity of the trochlea and sclerosis, Hegemann’s disease presents distinct clinical features [5]. However, Beyer et al. showed that trochlear aseptic necrosis exhibits a low-intensity signal on T1-weighted MRI images. They also emphasized MRI's utility in diagnosing Hegemann's disease and confirming recovery [11]. In the current report, the radiograph of the right elbow incidentally revealed osteonecrosis of the distal humeral trochlea, with no other superimposed findings noted.
The objective of treating AVN is to enhance the functionality of the affected joint, prevent further deterioration of the bone, and secure the survival of both bone and joint structures. Identifying and addressing the underlying cause of AVN is imperative whenever feasible [7]. A review conducted by Claessen et al. observed that all eight documented cases of Hegemann's disease underwent conservative treatment, involving rest and modifications in activity. Among the five patients with recorded clinical progress, four experienced complete alleviation of pain following conservative management, while the fifth patient continued to experience intermittent pain [5]. However, surgical treatment options such as arthroscopic debridement, core decompression, vascularized bone grafting, and bone reconstruction are recommended when symptoms persist and signs of collapse become apparent [1]. In the present case, the affected shoulder was immobilized using a sling and swathe, and the patient received pain relief medication. Conservative treatment was chosen for the trochlear osteonecrosis, including range-of-motion exercises once the proximal humerus fracture had healed.
Conclusion
Trochlear AVN is a rare condition that might cause mild discomfort or even be asymptomatic, potentially being diagnosed incidentally through radiographs. Typically, it can be managed with conservative treatment methods.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable.
Patient consent (participation and publication): Written informed consent was obtained from the parent of the patient for publication.
Funding: The present study received no financial support.
Acknowledgements: None to be declared.
Authors' contributions: AKG was a significant contributor to the conception of the study and the literature search for related studies. SSR, RJR, DQH, BJR and PHR were involved in the literature review, the study's design, and the critical revision of the manuscript, and they participated in data collection. HAN and KKM were involved in the literature review, study design, and manuscript writing. SHT was the radiologist who performed the assessment of the case. HAN and AKG confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: AI was not used in the drafting of the manuscript, the production of graphical elements, or the collection and analysis of data.
Data availability statement: Not applicable.

The Hidden Problem of Cross-Reactivity: Challenges in HIV Testing During the COVID-19 Era: A Systematic Review
Berun A. Abdalla, Meer M. Abdulkarim, Shvan H. Mohammed, Rewas Ali Azeez, Talar Sabir Hameed,...
Abstract
Introduction
Surface glycoproteins of human immunodeficiency virus (HIV) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) show similarities, including shared epitope motifs. This may lead to false-positive HIV results due to cross-reactivity between the two viruses. This study presents a systematic review of the published studies on their cross-reactivity.
Methods
A systematic review of the published studies of HIV and SARS-CoV-2 cross-reactivity was conducted. Studies that met the following criteria were included: 1) Studies in the English language. 2) Studies in which the title included the required keywords. 3) Studies in which false-positive results were obtained and confirmed. 4) Studies investigating the possibility of cross-reactivity between HIV and SARS-CoV-2.
Results
A total of 11 studies and 466,140 patients were analyzed. Among participants whose sex was specified, 363,786 (82.1%) were male. A total of 707 false-positive HIV results were recorded, of which 122 (17.3%) occurred in patients with detectable coronavirus disease 2019 (COVID-19) antibodies. The remaining 585 (82.7%) false positives occurred in patients who were either healthy or had recovered from COVID-19 with no detectable COVID-19 antibodies. Twenty-five distinct tests were used as initial and confirmatory tests for both COVID-19 and HIV. Six (24%) unique fourth-generation HIV antigen/antibody combination tests, six (24%) HIV-specific molecular tests, and four (16%) HIV immunoassays were used.
Conclusion
COVID-19 should be considered a potential cause of false-positive results in HIV tests, due to the cross-reactivity between the antibodies or antigens from both viruses.
Introduction
Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global pandemic, leading to widespread illness and high mortality rates. This infectious disease exhibits a wide range of clinical manifestations, from no symptoms or mild cases to severe respiratory distress and multi-organ failure [1]. COVID-19 was first identified in individuals exposed to a seafood market in Wuhan City, China, in December 2019. Its rapid spread led the World Health Organization (WHO) to declare it a public health emergency of international concern on January 30, 2020, and it was officially classified as a pandemic on March 11, 2020 [2].
Since the first commercial approval of HIV testing in 1985, significant advancements have been made in the field. However, false positive results are often linked to infections with other pathogens such as Epstein-Barr virus, influenza, and Mycobacterium tuberculosis. Additionally, instances of false positive HIV test results have been reported in conjunction with infections caused by SARS-CoV-2 [3].
Surface glycoproteins of HIV and SARS-CoV-2 exhibit similarities, including shared epitope motifs. As a result, false-positive HIV screening results have been reported in 2020 and 2021 among individuals with acute or previous SARS-CoV-2 infections. False-positive results in HIV enzyme-linked immunosorbent assay (ELISA) tests were also observed during COVID-19 vaccine trials conducted in Australia [4]. These findings emphasize the need to consider recent SARS-CoV-2 infections when interpreting HIV test results. Clinicians should remain vigilant about this association and may need repeated testing to confirm accurate diagnoses. This study aims to add to the available literature through a thorough investigation and comprehensive review of the causes, correlations, and considerations regarding this topic.
Methods
Study design
This systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [5].
Data sources and search strategy
Several strategies were used in the search process. PubMed and Google Scholar were searched first using the following keywords: (HIV OR human immunodeficiency virus) AND (COVID-19 OR SARS-CoV2) AND (Cross-reactivity) AND (False-positive). Citations in the retrieved studies were also used to identify additional papers. The AI tools “Perplexity” and “Consensus” were also used to strengthen the search by finding similar documents.
Eligibility criteria
The studies with the following specifications were included in the study: 1) Studies in the English language. 2) Studies in which the title included the required keywords. 3) Studies in which false positive results were achieved and confirmed. 4) Studies investigating the possibility of cross-reactivity between HIV and COVID-19. Studies published in non-peer-reviewed journals [6] and those failing to meet the inclusion criteria were excluded from the review.
Selection and extraction of data
The titles and abstracts of identified studies were first screened, followed by a thorough full-text review to assess eligibility. Key data, including study design, number of patients, patient demographics, COVID-19 status, HIV status, testing techniques, and test results were extracted from the included studies.
Data analysis
Microsoft Excel (2019) was used to collect and organize the extracted data. The Statistical Package for Social Sciences (SPSS) version 27.0 was employed for descriptive statistical analysis. The results are presented as frequencies, percentages, medians, and means with standard deviations.
Results
A total of 43 studies were retrieved from the search. Before any screening, four were excluded for being unretrievable and one for being written in a language other than English. During the initial screening, the titles of 19 studies did not meet the inclusion criteria. Six more studies were then excluded because their abstracts did not meet the inclusion criteria. After a thorough assessment for eligibility, two more studies were removed because they were from non-peer-reviewed journals. Ultimately, 11 studies were included and analyzed [3,4,7-9,12,15-19] (Figure 1).
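The selection counts reported above can be checked with simple subtraction. The following is a minimal Python sketch that uses only the numbers stated in the text; the function and variable names are illustrative and not part of the review's methods.

```python
# Study-selection counts as reported in the Results text.
retrieved = 43
exclusions = [
    ("unretrievable", 4),
    ("non-English language", 1),
    ("title did not meet inclusion criteria", 19),
    ("abstract did not meet inclusion criteria", 6),
    ("non-peer-reviewed journal", 2),
]


def remaining_after_each_step(start, steps):
    """Return (reason, studies_left) for every exclusion step, in order."""
    left = start
    trail = []
    for reason, count in steps:
        left -= count
        trail.append((reason, left))
    return trail


trail = remaining_after_each_step(retrieved, exclusions)
for reason, left in trail:
    print(f"after excluding {reason}: {left} studies remain")

included = trail[-1][1]  # studies left after the final exclusion step
```

The final value matches the 11 included studies, so the reported flow is internally consistent.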
A total of 466,140 patients were analyzed. Among participants whose sex was specified, 363,786 (82.14%) were male. A total of 707 false-positive HIV results were recorded, of which 122 (17.3%) occurred in patients with detectable COVID-19 antibodies. The remaining 585 (82.7%) false positives occurred in patients who were either healthy or had recovered from COVID-19 with no detectable antibodies. One case of false-positive COVID-19 in a patient with acute HIV infection was also recorded (Table 1).
| Author/Year | Study design | Number of patients | Age* | Sex (M) | Sex (F) | COVID-19 status | HIV status | Initial testing technique | Confirmatory testing technique | Final results |
|---|---|---|---|---|---|---|---|---|---|---|
| Alfie et al./2023 [4] | Cohort | 921 | Median age 41 (IQR 32-54) | 277 | 397 | Detectable Covid antibodies = 674 | True +ve in 3 patients, -ve in 671 patients | Genscreen Ultra HIV Ag-Ab & COVIDAR kit | ELISA, RecomLine HIV-1 & HIV-2 IgG, & Abbott m2000 RealTime PCR | False +ve HIV in 12 (1.8%) patients |
| | | | 43 (IQR 34-56) | 90 | 110 | Previously diagnosed with COVID with no detectable antibody = 200 | -ve | | | No false +ve HIV results |
| | | | 42 (IQR 36-57) | 18 | 29 | Vaccinated = 47 | -ve | | | No false +ve HIV results |
| Shallal et al./2022 [9] | Cross-sectional | 23,278 | N/A | N/A | N/A | Total = 167, +ve = 12, -ve = 155 | True +ve in 167 patients | Elecsys HIV Duo & PCR test | HIV-1 and 2 antibody tests & Quantitative HIV RNA test | No +ve HIV tests |
| | | | | | | Total = 70, +ve = 16, -ve = 54 | | | | False +ve HIV in 70 patients, of which 16 (22.9%) were +ve for Covid |
| | | | | | | Total = 23,041, +ve = 0, -ve = 23,041 | | | | No false +ve HIV tests |
| Hayat et al./2021 [15] | Cross-sectional | 2,593 | Median age 21.5 | 2,361 | 232 | Recovered with detectable antibodies | True +ve in one patient | Electrochemiluminescence immunoassay & polymerase chain reaction | Line immunoassay | False +ve HIV in 68 (1.84%) donations |
| | | 407,363 | 27 | 350,724 | 56,639 | Healthy | True +ve in 49 patients | | | False +ve HIV in 461 donations |
| Gudipati et al./2023 [8] | Cross-sectional | 31,910 | Mean age 37.13 | 10,295 | 21,615 | True +ve in 229 patients | True +ve in 248 patients | SARS-CoV-2 Real-Time PCR Test & HIV Fourth-Generation Ag/Ab Assay | HIV-1/HIV-2 Antibody Differentiation Immunoassay & HIV-1 Nucleic Acid Amplification Test | False +ve HIV in 87 patients, of which 17 (19.54%) were +ve for Covid |
| Elsner et al./2023 [16] | Cohort | 65 | Median age 51 (IQR 19) | 13 | 42 | Previously diagnosed with covid | -ve | Elecsys HIV combi PT & Architect HIV Ag/Ab Combo | INNO-LIA HIV I/II Score | No false +ve HIV results |
| | | 1 | 32 | | 1 | +ve | -ve | Elecsys HIV combi PT, INNO-LIA HIV I/II Score | Architect HIV Ag/Ab Combo, INNO-LIA HIV I/II Score, HIV-1 qPCR | Repeated false +ve HIV for 3 months with subsequent resolution |
| Hakobyan et al./2023 [17] | Case report | 2 | 69 | 1 | | +ve | -ve | Fourth-generation HIV combination test | ELISA, HIV-1 genotype testing, Western blot & HIV integrase genotype test | False +ve HIV |
| | | | 80 | 1 | | +ve | -ve | Fourth-generation HIV combination test | ELISA, Viral load test | False +ve HIV |
| Tan et al./2020 [18] | Case report | 2 | Early 20s | 1 | | +ve | -ve | Chemiluminescent immunoassay | VIDAS HIV duo assay & MP Biomedicals HIV immunoblot | False +ve HIV |
| | | | Early 70s | 1 | | +ve | -ve | Chemiluminescent immunoassay | VIDAS HIV duo assay & MP Biomedicals HIV immunoblot | False +ve HIV |
| Srivastava et al./2022 [19] | Case report | 2 | 69 | 1 | | +ve | -ve | HIV DUO ULTRA, 4th generation assay | TRI-DOT Rapid HIV flow-through test | False +ve HIV |
| | | | 9 | 1 | | +ve | -ve | HIV DUO ULTRA, 4th generation assay | TRI-DOT Rapid HIV flow-through test | False +ve HIV |
| Salih et al./2021 [7] | Case report | 1 | 32 | | 1 | +ve | -ve | HIV immunoassay test | RN PCR | False +ve HIV |
| Balasubramanian et al./2023 [3] | Case report | 1 | 20 | 1 | | +ve | -ve | 4th Generation HIV 1 and 2 antibody/antigen testing | HIV antibody testing | False +ve HIV |
| Yamaniha et al./2021 [12] | Case report | 1 | 39 | 1 | | -ve | +ve | Rapid Antigen Test for SARS-CoV-2 & Rapid Antigen/Antibody Test for HIV | Real-Time Polymerase Chain Reaction, Chemiluminescent Immunoassay, Western Blot Assay & HIV-RNA | False +ve Covid in a patient with acute HIV infection |

*Age was not given in a uniform manner among the different studies. N/A: not applicable, +ve: Positive, -ve: Negative, Cp: Convalescent plasma
Twenty-five distinct tests were used as initial and confirmatory tests for both COVID-19 and HIV. Six (24%) unique fourth-generation HIV antigen/antibody combination tests, six (24%) HIV-specific molecular tests, and four (16%) HIV-specific antibody tests were used (Table 2).
| Variables | Frequency (%) |
|---|---|
| Sex* | Number of patients (442,852) |
| Male | 363,786 (82.1%) |
| Female | 79,066 (17.9%) |
| Age* | Number of patients (442,852) |
| Combined mean | 46.89 ± 8.48 |
| Combined median | 38.65 |
| Age variance | 19.94 |
| Testing techniques | Total unique tests (25) |
| HIV Antibody/Antigen (4th Generation) Test | 6 (24%) |
| HIV-Specific Molecular Tests | 6 (24%) |
| HIV Antibody-Specific Tests | 4 (16%) |
| HIV Immunoassays | 4 (16%) |
| Rapid tests | 2 (8%) |
| SARS-CoV-2-Specific Tests | 2 (8%) |
| HIV Differentiation Tests | 1 (4%) |
| False-positive HIV results | Total (707) |
| Detectable COVID-19 antibodies | 122 (17.3%) |
| Idiopathic false-positives | 585 (82.7%) |
| Idiopathic false-positive HIV results | Total (585) |
| 4th Generation HIV Ag/Ab Test | 124 (21.2%) |
| Enzyme-linked immunosorbent assay | 461 (78.8%) |

*The sex and age of 23,278 patients from Shallal et al. were not mentioned
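The pooled false-positive figures can be re-derived from the per-study rows of Table 1. The Python sketch below is illustrative only. The split of each study's false positives into those with and without evidence of COVID-19 follows one reading of the table and is an assumption, as are all variable names.

```python
# (study group, false-positive HIV results, of which with detectable
# COVID-19 antibodies or a positive COVID-19 status), read from Table 1.
# The second count per row is an assumption based on the table's wording.
false_positives = [
    ("Alfie 2023, detectable antibodies", 12, 12),
    ("Shallal 2022", 70, 16),
    ("Hayat 2021, recovered donors", 68, 68),
    ("Hayat 2021, healthy donors", 461, 0),
    ("Gudipati 2023", 87, 17),
    ("Elsner 2023", 1, 1),
    ("Hakobyan 2023", 2, 2),
    ("Tan 2020", 2, 2),
    ("Srivastava 2022", 2, 2),
    ("Salih 2021", 1, 1),
    ("Balasubramanian 2023", 1, 1),
]

total = sum(n for _, n, _ in false_positives)
with_covid = sum(k for _, _, k in false_positives)
idiopathic = total - with_covid


def percent(count, whole):
    """Percentage rounded to one decimal place, as in the tables."""
    return round(100 * count / whole, 1)


print(total, with_covid, percent(with_covid, total))
print(idiopathic, percent(idiopathic, total))
```

Under this reading, the per-study rows reproduce the totals of 707, 122 (17.3%), and 585 (82.7%) reported in the text and in Table 2.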
Discussion
As a systemic illness, COVID-19 affects multiple body systems, and a minority of patients may also develop additional microbial co-infections that worsen their condition. Approximately 7.2% of cases are reported to involve co-infections with other bacterial, fungal, or viral pathogens, which can influence both patient outcomes and treatment strategies. However, instances of false-positive results for co-infections and misdiagnoses have been documented in the context of COVID-19. For example, cross-reactivity between SARS-CoV-2 and certain pathogens, such as the Dengue virus, has been occasionally reported in the literature [7]. During the 2003 severe acute respiratory syndrome (SARS) pandemic, it was demonstrated through sequence analysis that the viral proteins of HIV and SARS-CoV-1 shared sequence motifs that contributed to forming their active conformation [8]. In the current review, 17.3% of the false positives were of patients with detectable COVID-19 antibodies, showing a high possibility of cross-reactivity. Shallal et al. analyzed 23,278 medical charts and found that false-positive HIV was significantly higher in patients with COVID-19 [9].
Alfie et al. showed that, compared with the Centers for Disease Control and Prevention (CDC) false-positive rate for HIV screening of 0.4%, the rate is significantly higher, at 1.8%, when COVID-19 antibodies are detectable. When considering samples only from people previously diagnosed with COVID-19, the rate is again significantly higher at 1.4% [4]. In a cross-sectional study of 31,910 medical records, Gudipati et al. showed that, after accounting for all covariates, only false-positive HIV was significantly linked to COVID-19 [8].
While exploring the cross-reactivity of antibodies targeting HIV-1 with the SARS-CoV-2 spike protein, Mannar et al. identified 2G12, PGT128, and PGT126, three glycan-reactive antibodies that exhibited various levels of cross-reactivity with the SARS-CoV-2 spike protein [10]. In a similar investigation, Perween et al. demonstrated that antibodies targeting the SARS-CoV-2 spike protein could cross-react with HIV-1 envelope proteins, particularly gp41; however, these antibodies did not neutralize HIV-1. Conversely, antibodies against the HIV-1 envelope protein gp140 also exhibited cross-reactivity with the SARS-CoV-2 spike protein but lacked neutralizing capability against SARS-CoV-2 [11]. This bidirectional cross-reactivity was further illustrated by Yamaniha et al., who reported a case of false-positive COVID-19 in a 39-year-old male with acute HIV infection [12]. Zhang et al. contributed to this discourse by confirming that four specific insertions in the spike protein of SARS-CoV-2 share similarities with HIV-1 proteins [13]. They also observed that the spike protein contained short insertions made up of 6-8 amino acid segments. However, they posited that while these similarities suggest potential cross-reactivity between antigens of both viruses, they may also result from convergent evolution or shared structural features across different viral families.
In the current review, 585 (82.7%) of the false positives were idiopathic, of which 124 (21.2%) were tested with 4th generation HIV assays, which use distinct, simultaneous reactions to identify HIV antigen (p24) and HIV-1/2 antibodies. The system converts cut-off index (COI) values into qualitative results, reporting them as nonreactive (COI < 1.0) or reactive (COI ≥ 1.0) [8]. Zhang et al. suggested that, due to the nature of the test, exact amino acid sequence homology to HIV is not required to yield a false-positive result; only enough antigenic similarity to produce a detectable false signal is needed [13]. The absence of strict homology and the short length of the insertions may help to explain the idiopathic occurrence of false-positive HIV results in some individuals. While antigenic homology may play a key role, the connection to SARS-CoV-2 antigens remains unclear. Yang et al. published the results of an HIV screening program that used a 4th generation HIV assay and reported that, of the 578 participants who screened positive for HIV, 13.3% were positive for both antigen and antibody, 77.7% were positive for antibodies only, and 9.0% were positive for antigens only [14]. Further research is needed to build models that offer empirical evidence for these hypotheses.
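The COI reporting rule described above amounts to a simple threshold. The following is a minimal Python sketch, assuming the cut-off of 1.0 stated in the text. The function names, the input validation, and the sample readings are hypothetical and do not represent any assay vendor's software.

```python
def interpret_coi(coi: float, cutoff: float = 1.0) -> str:
    """Map a cut-off index (COI) to the qualitative result described in the text."""
    if coi < 0:
        raise ValueError("COI cannot be negative")
    return "reactive" if coi >= cutoff else "nonreactive"


def needs_confirmatory_testing(coi: float) -> bool:
    """Treat a reactive screen as a trigger for confirmatory testing, not a diagnosis."""
    return interpret_coi(coi) == "reactive"


# Hypothetical COI readings, for illustration only.
for value in (0.2, 0.99, 1.0, 3.7):
    print(value, interpret_coi(value), needs_confirmatory_testing(value))
```

Because the qualitative result depends only on whether the signal crosses the cut-off, any cross-reactive signal strong enough to reach a COI of 1.0 is reported as reactive, which is why confirmatory testing matters.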
While conducting the review, certain limitations were identified. Firstly, the variation in data presentation across the papers hindered the ability to maintain uniformity when finalizing the data. The retrospective nature of the studies made it difficult to create a true correlation between the variables.
Conclusion
Human immunodeficiency virus and COVID-19 exhibit cross-reactivity at several levels. Although the exact mechanisms and models have not been established yet, the findings highlight the importance of considering recent SARS-CoV-2 infections when interpreting HIV test results and implementing confirmatory tests to achieve true results.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable, as systematic reviews do not require ethical approval.
Patient consent (participation and publication): Not applicable.
Funding: The present study received no financial support.
Acknowledgements: None to be declared.
Authors' contributions: BAA, RQS and DAH significantly contributed to the study's conception and the literature search for related studies. MMA, SHM, SLE, and REA were involved in the literature review, manuscript writing, and data analysis and interpretation. RAA, TSH, NHM, KKM, SJI, DQH and BHB were involved in the literature review, the study's design, and the manuscript's critical revision. BAA and MMA confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: Perplexity AI v3.2.0 and Consensus AI were used in the literature review; the author assumes full responsibility for the content of the paper.
Data availability statement: Not applicable.

Hydatid Cyst of The Orbit: A Systematic Review with Meta-Data
Eli Pradhan, Saif Hassan Alrasheed, Oksana I. Melnyk, Habibullah Azimi, Sina Raeisi, Amirhossein...
Abstract
Introduction
Orbital hydatid cysts (HCs) constitute less than 1% of all cases of hydatidosis, yet their occurrence is often linked to severe visual complications. This study presents a systematic review of reported cases of orbital HCs.
Methods
A systematic review of the published studies of orbital HCs was conducted. Studies that met the following criteria were included: 1) The presence of the infection was confirmed through diagnostic methods, surgical findings, or histopathology. 2) The study provided a detailed case presentation.
Results
Thirty-two studies (56 cases) met the inclusion criteria. Ten patients were from Afghanistan (17.9%). There was no gender predilection; the distribution was almost equal. The ages ranged from three to 80 years. The most common presenting symptoms were proptosis of the affected eye (98.2%) and visual impairment (64.3%). The therapeutic approach to orbital HC was primarily surgical removal of the cyst accompanied by anthelmintic drugs in 41 (73.2%) cases. Concurrent HC was reported in two cases (3.6%), and recurrence with subsequent recovery was reported in four (7.1%) cases.
Conclusion
Orbital HC is a rare condition, primarily diagnosed using MRI, with surgery as the definitive treatment. Concurrent hydatidosis increases the risk of recurrence, requiring thorough and ongoing follow-up.
Introduction
Hydatidosis or hydatid cyst (HC) is a commonly recognized zoonotic disease caused by the larval form of the tapeworm Echinococcus granulosus. Humans act as intermediate hosts for this parasite, acquiring infection through direct contact with host animals (e.g., sheep, goats, cattle, dogs) or by consuming contaminated food or water [1].
The global incidence of hydatidosis varies, with higher rates observed in regions where livestock farming is widespread. Key risk factors for contracting hydatidosis include close contact with dogs, livestock-related activities, and residence in areas where the disease is endemic. These cysts typically occur in the liver (50-70%) and lungs (20-30%). The global burden of HC is significant, with an estimated 2 to 3 million cases reported worldwide [2]. However, orbital HC is uncommon, representing less than 1% of all cases, accounting for 19.8% in endemic countries [3].
The World Health Organization (WHO) has classified echinococcosis as one of the 20 neglected tropical diseases that pose significant public health concerns. To ensure consistent global monitoring, the WHO Informal Working Group on Echinococcosis has categorized echinococcal cysts into five distinct types, grouped into three main categories: CE1 and CE2 indicate active infection, CE3 represents an intermediate stage, and CE4 and CE5 are associated with inactive cysts [4].
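The grouping of cyst types into categories can be expressed as a small lookup table. The Python sketch below follows the five-type grouping exactly as stated in the text; the dictionary and function names are illustrative assumptions.

```python
# Cyst type -> category, as grouped in the text (CE1-CE5).
CE_STAGE_CATEGORY = {
    "CE1": "active",
    "CE2": "active",
    "CE3": "intermediate",
    "CE4": "inactive",
    "CE5": "inactive",
}


def cyst_category(stage: str) -> str:
    """Return the category for a cyst type label such as 'CE3'."""
    key = stage.strip().upper()
    if key not in CE_STAGE_CATEGORY:
        raise ValueError(f"unknown cyst stage: {stage!r}")
    return CE_STAGE_CATEGORY[key]


for label in ("CE1", "ce3", " CE5 "):
    print(label.strip(), "->", cyst_category(label))
```

Keeping the mapping in one table makes it easy to tabulate cases by category when case reports state only the cyst type.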
In endemic regions, environmental and climatic conditions play a crucial role in the survival of parasite eggs and the living conditions of livestock and stray dogs. For example, Echinococcus granulosus eggs remain viable in water and damp sand for up to three weeks at 30°C, 4.5 weeks at 10–21°C, and 32 weeks at 6°C. They can also survive for several months in green pastures and gardens [4]. Although the WHO classifies hydatidosis as a neglected disease, it continues to be a significant public health concern due to its status as the second most impactful foodborne parasitic disease, its endemic presence in certain regions, and its potential to cause substantial morbidity. The WHO prioritizes the control and prevention of hydatidosis, particularly given its impact on human health, animals, and the food supply chain.
Orbital HC, although rare, is often linked to severe visual complications. As of the date of the current review, the available literature on orbital HC primarily consists of case reports and case series, with no reviews currently available. This study aims to provide and analyze a collection of data through a systematic review and a meta-data presentation.
Methods
Study design
This systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [5].
Data sources and search strategy
A systematic review of the published studies of orbital HCs was conducted using Google Scholar and PubMed. Boolean operators (OR/AND) were used to refine the results. The keywords that were used in the search included: (eye OR orbital OR intraorbital OR ocular) AND (hydatid OR echinococcosis OR hydatidosis).
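The Boolean search string above combines synonyms with OR inside each concept and joins the concepts with AND. As a minimal Python sketch of that construction (the helper names `or_group` and `build_query` are hypothetical, and no database API is called):

```python
def or_group(terms):
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*concepts):
    """Join the concept groups with AND so every concept must be matched."""
    return " AND ".join(or_group(terms) for terms in concepts)


location_terms = ["eye", "orbital", "intraorbital", "ocular"]
disease_terms = ["hydatid", "echinococcosis", "hydatidosis"]

query = build_query(location_terms, disease_terms)
print(query)
```

Building the string from term lists keeps the search reproducible and makes it straightforward to document added or removed synonyms.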
Eligibility criteria
Studies in languages other than English, as well as those not related to humans, were excluded either before or during the initial screening process. All studies on orbital HCs that met the following criteria were included: 1) The presence of the infection was confirmed through diagnostic methods, surgical findings, or histopathology. 2) The case presentation was detailed in the study. Studies published in non-peer-reviewed journals [6] or those failing to meet inclusion criteria were excluded.
Selection and extraction of data
The titles and abstracts of identified studies were first screened, followed by a thorough full-text review to assess eligibility. Key data were extracted from the included studies, including study design, country of origin, patient demographics (age, gender, residence), symptoms, history of HC, serological tests, diagnosis, management strategies, follow-up details, and recurrence rates.
Data analysis
Microsoft Excel (2019) was used to gather and organize the extracted data, while the Statistical Package for Social Sciences (SPSS) version 27.0 was utilized for data analysis (descriptive statistics). The findings were displayed as frequencies, percentages, ranges, and means with standard deviations.
Results
A total of 146 studies were retrieved. One was excluded as a duplicate, 14 were non-English, and 62 were unretrievable. After title and abstract screening, 21 studies did not meet the inclusion criteria. The remaining 48 underwent full-text review, with seven more excluded. Of the 41 studies assessed for eligibility, nine were excluded for being from non-peer-reviewed journals or preprints. Ultimately, 32 studies [3,7-37] (56 cases) met the inclusion criteria (Figure 1).
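The selection counts above can be checked with simple bookkeeping. The following is a minimal sketch in Python; the variable names are illustrative, and the numbers are those reported in the preceding paragraph.

```python
# Study-selection counts as reported in the text.
retrieved = 146
removed_before_screening = {"duplicate": 1, "non-English": 14, "unretrievable": 62}

screened = retrieved - sum(removed_before_screening.values())
full_text_reviewed = screened - 21             # excluded on title and abstract
assessed_for_eligibility = full_text_reviewed - 7   # excluded at full-text review
included = assessed_for_eligibility - 9        # non-peer-reviewed journals or preprints

print(screened, full_text_reviewed, assessed_for_eligibility, included)
# prints: 69 48 41 32
```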
Of the included studies [3,7-37], 28 (87.5%) were case reports, while the remaining 4 (12.5%) were case series (Table 1). The largest number of patients came from Afghanistan (10, 17.9%), followed by India (8, 14.3%), Azerbaijan (8, 14.3%), and Morocco and Turkey (6 each, 10.7%). Patient ages ranged from 3 to 80 years, with a mean age of 27.45 ± 19.57 years. Most cases occurred between the first and fifth decades of life (47, 83.9%). The right side was affected in 33 cases (58.9%), and no case had bilateral HC. Sixteen patients (28.6%) were from rural areas, and 13 (23.2%) reported contact with dogs, sheep, or other livestock (Table 2).
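The age summary can be reproduced from the per-patient ages listed in Table 1. The sketch below uses Python's standard `statistics` module as a stand-in for SPSS, which is a substitution rather than the authors' workflow. The source does not state which standard deviation convention was used, so both are computed; with these ages, the population formula gives a value close to the reported 19.57.

```python
import statistics

# Ages (years) of the 56 patients, transcribed row by row from Table 1.
ages = [
    21, 13, 67, 43, 10, 15, 15, 3, 17, 28,
    19, 20, 6, 6, 5, 65, 58, 31, 46, 3,
    5, 8, 27, 7, 11, 52, 24, 52, 14, 24,
    13, 18, 62, 33, 44, 26, 22, 24, 80, 57,
    47, 60, 45, 44, 13, 11, 41, 39, 23, 5,
    16, 5, 42, 30, 14, 8,
]

mean_age = statistics.mean(ages)
sd_population = statistics.pstdev(ages)  # divides by n
sd_sample = statistics.stdev(ages)       # divides by n - 1

print(f"n = {len(ages)}, range = {min(ages)} to {max(ages)} years")
print(f"mean = {mean_age:.2f}")
print(f"SD (population) = {sd_population:.2f}, SD (sample) = {sd_sample:.2f}")
```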
Author | Type of study | Country of the patients | N. of patients | Age | Sex | Symptoms | Affected side | Cyst size (cm) | Surgical approach | Cyst removal approach | Adjuvant therapy | Outcome | Follow-up (months)
Abouassi et al. [3] | Case report | Syria | 1 | 21 | F | Proptosis & visual impairment | Right | 4.2 | Fronto-orbitozygomatic orbitotomy | Cystectomy | Albendazole | Recovered | 3
Ilhami et al. [7] | Case series | Morocco | 3 | 13 | F | Proptosis & decreased visual acuity | Right | 4.2 | Internal paracanthal orbitotomy | Cystectomy | Albendazole | Recovered | N/A
Ilhami et al. [7] | Case series | Morocco | 3 | 67 | F | Proptosis, pain, headache & chemosis | Left | 3.5 | Superolateral orbitotomy | Enucleation cystectomy | Albendazole | Recovered | N/A
Ilhami et al. [7] | Case series | Morocco | 3 | 43 | F | Proptosis | Left | 2.9 | Internal paracanthal orbitotomy | Enucleation cystectomy | Albendazole | Recovered | N/A
Alabdullah et al. [8] | Case report | Syria | 1 | 10 | M | Proptosis, diplopia & decreased vision | Left | 2.7 | Subperiosteal orbitotomy | Lynch method | Albendazole | Recovered | N/A
Khan et al. [9] | Case series | Pakistan | 11 | 15 | F | Proptosis & visual impairment | Left | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 15 | M | Proptosis & visual impairment | Right | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 3 | F | Proptosis & visual impairment | Left | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 17 | F | Proptosis & visual impairment | Left | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 28 | F | Proptosis & visual impairment | Right | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 19 | M | Proptosis & visual impairment | Left | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 20 | F | Proptosis & visual impairment | Left | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 6 | M | Proptosis & visual impairment | Right | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 6 | M | Proptosis & visual impairment | Right | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 5 | M | Proptosis & visual impairment | Left | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Khan et al. [9] | Case series | Pakistan | 11 | 65 | M | Proptosis & visual impairment | Left | N/A | Orbitotomy | Unspecified | Albendazole | Recovered | *
Bamashmus et al. [18] | Case report | Yemen | 1 | 58 | M | Proptosis, impaired vision & chemosis | Right | N/A | Transconjunctival & lateral orbitotomy | PAIR method | Mebendazole | Recovered | N/A
Assimakopoulos et al. [19] | Case report | Greece | 1 | 31 | F | Proptosis & impaired vision | Left | N/A | Lateral orbitotomy | Modified cystectomy | Albendazole | Recovered | 3
Berradi et al. [20] | Case report | Morocco | 1 | 46 | M | Proptosis | Left | 4.2 | Unspecified | Modified PAIR method | None | Recovered | 3
Chitra et al. [21] | Case report | Morocco | 1 | 3 | F | Proptosis & impaired vision | Left | 2.8 | Extradural frontal orbitotomy | Barrett’s technique | Albendazole | Recovered | 24
Elkrimi et al. [22] | Case report | Morocco | 1 | 5 | M | Proptosis | Left | 3.1 | Combined approach (endoscopy & supraorbital incision) | Partial cystectomy | Albendazole | Recovered | 6
Hosaini et al. [23] | Case report | Afghanistan | 1 | 8 | M | Proptosis, chemosis, reduced vision & headache | Right | 5 | Transconjunctival orbitotomy | Modified cystectomy | Albendazole | Recovered | N/A
Jaffar et al. [24] | Case report | Pakistan | 1 | 27 | M | Proptosis, visual impairment, reduced ocular motion & discharge | Left | 5 | Unspecified | Unspecified | None | Recovered | N/A
Kars et al. [25] | Case report | Turkey | 2 | 7 | M | Proptosis & impaired vision | Left | N/A | Transcranial orbitotomy | Unspecified | None | Had recurrence, recovered after a second surgery | 24
Kars et al. [25] | Case report | Turkey | 2 | 11 | F | Proptosis, impaired vision & limited ocular motility | Right | N/A | Transcranial orbitotomy | Unspecified | None | Recovered | 6
Das et al. [26] | Case report | India | 1 | 52 | M | Proptosis | Left | 4 | Orbitotomy | Unspecified | Albendazole | N/A | N/A
Motlagh et al. [27] | Case report | Iran | 1 | 24 | M | Proptosis & diplopia | Right | N/A | Frontotemporal craniotomy & superior orbitotomy | Partial cystectomy with saline irrigation | Albendazole, antibiotics & steroid | Recovered | N/A
Özek et al. [28] | Case report | Turkey | 1 | 52 | F | Proptosis, visual loss & orbital pain | Right | N/A | Lateral orbitotomy | Cystectomy with saline irrigation | Mebendazole | Recovered | 7
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 14 | M | Proptosis | Right | N/A | Lateral orbitotomy | Unspecified | Albendazole | Recovered | **
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 24 | M | Proptosis | Right | N/A | Medial orbitotomy | Unspecified | Albendazole | Recovered | **
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 13 | M | Proptosis | Right | N/A | Superior orbitotomy | Unspecified | Albendazole | Recovered | **
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 18 | F | Proptosis | Left | N/A | Lateral orbitotomy | Unspecified | Albendazole | Recovered | **
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 62 | F | Proptosis | Left | N/A | Lateral orbitotomy | Unspecified | Albendazole | Recovered | **
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 33 | F | Proptosis | Right | N/A | Lateral orbitotomy | Unspecified | Albendazole | Recovered | **
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 44 | F | Proptosis | Left | N/A | Inferior orbitotomy | Unspecified | Albendazole | Recovered | **
Rajabi et al. [29] | Case series | Azerbaijan | 8 | 26 | M | Proptosis | Left | N/A | Lateral orbitotomy | Unspecified | Albendazole | Recovered | **
Haydar et al. [10] | Case report | Afghanistan | 1 | 22 | M | Proptosis, decreased vision & pain | Left | 3.6 | Inferior transconjunctival orbitotomy | Aspiration and excision | Albendazole | Recovered | 10
Sendul et al. [11] | Case report | Turkey | 1 | 24 | F | Proptosis & visual impairment | Right | 2.2 | Medial transconjunctival orbitotomy | Cystectomy with aspiration | Albendazole | Had recurrence, recovered after a second surgery | N/A
Mathad et al. [12] | Case report | India | 1 | 80 | F | Proptosis & visual impairment | Left | 3 | Lateral orbitotomy | Cystectomy | None | Recovered | N/A
Öztekin et al. [13] | Case report | Turkey | 1 | 57 | M | Proptosis & visual impairment | Right | 1.5 | Unspecified | Unspecified | None | Recovered | N/A
Kumar et al. [14] | Case report | India | 1 | 47 | F | Proptosis, headache, pain & visual impairment | Left | 3.7 | Orbitotomy | Modified cystectomy | Albendazole | Recovered | 12
Debela et al. [15] | Case report | Ethiopia | 1 | 60 | F | Proptosis & visual impairment | Left | 2.6 | Medial anterior orbitotomy | Modified cystectomy | Albendazole | Recovered | 3 weeks
Anandpara et al. [16] | Case report | India | 1 | 45 | F | Gradual loss of vision & proptosis | Left | 3.7 | Lateral orbitotomy | Unspecified | Albendazole | Recovered | 10
Awad et al. [17] | Case series | Egypt | 5 | 44 | F | Proptosis & diminished visual acuity | Right | N/A | Transconjunctival incision | Endocystectomy | Topical antibiotics, steroid eye drops & NSAIDs | Recovered | 58
Awad et al. [17] | Case series | Egypt | 5 | 13 | M | Proptosis, pain & diminished visual acuity | Left | N/A | Transconjunctival incision | Endocystectomy | Topical antibiotics, steroid eye drops & NSAIDs | Recovered | 42
Awad et al. [17] | Case series | Egypt | 5 | 11 | M | Proptosis & diminished visual acuity | Left | N/A | Transconjunctival incision | Endocystectomy | Topical antibiotics, steroid eye drops & NSAIDs | Recovered | 31
Awad et al. [17] | Case series | Egypt | 5 | 41 | M | Proptosis & diminished visual acuity | Left | N/A | Transconjunctival incision | Endocystectomy | Topical antibiotics, steroid eye drops & NSAIDs | Recovered | 23
Awad et al. [17] | Case series | Egypt | 5 | 39 | F | Proptosis, pain & diminished visual acuity | Left | N/A | Transconjunctival incision | Endocystectomy | Topical antibiotics, steroid eye drops & NSAIDs | Recovered | 11
Rajabi et al. [30] | Case report | Iran | 1 | 23 | F | Severe proptosis | Right | N/A | Lateral orbitotomy | Total resection | Albendazole | Recovered | 48
Turgut et al. [31] | Case report | Turkey | 1 | 5 | M | Proptosis | Left | N/A | Transcranial approach | Cystectomy with saline irrigation | Mebendazole | Had recurrence, recovered after conservative approach | 36
Arora et al. [32] | Case report | India | 1 | 16 | M | Impaired vision & dull headache | Left | N/A | Curette evacuation | Unspecified | None | Recovered | N/A
Lenztzsch et al. [33] | Case report | Germany | 1 | 5 | F | Proptosis, downward displacement of the eye | Left | N/A | Lateral transosseous orbitotomy | Unspecified | Albendazole | Recovered | N/A
Al-muala et al. [34] | Case report | Iraq | 1 | 42 | F | Swelling, proptosis, visual impairment & headache | Right | 3 | Lateral rhinotomy | Cystectomy | Albendazole | Recovered | 8
Ahluwalla et al. [35] | Case report | India | 1 | 30 | F | Proptosis & headache | Right | 2.5 | Anterior orbitotomy with lateral extension | Unspecified | None | Recovered | N/A
Sihota et al. [36] | Case report | India | 1 | 14 | M | Recurrent proptosis | Left | N/A | No surgery was performed | N/A | Albendazole | Had recurrence & recovered | 24
Huilgol et al. [37] | Case report | India | 1 | 8 | F | Proptosis, pain & diminished vision | Right | N/A | Exenteration of the orbit | N/A | None | Recovered | N/A
N/A: Not applicable, M: Male, F: Female, cm: Centimeter.
*Khan et al. give a range of follow-up periods of 3 to 12 months without specifying the exact period for each patient.
**Rajabi et al. give a range of follow-up periods of 2 to 6 years without specifying the exact period for each patient.
Variables | Frequency (%)/mean ± SD
Mean age (years) | 27.45 ± 19.57
Age group (years) | Number of patients (56)
0-9 | 11 (19.6%)
10-19 | 14 (25%)
20-29 | 9 (16.1%)
30-39 | 5 (8.9%)
40-49 | 8 (14.3%)
50-59 | 4 (7.1%)
60-69 | 4 (7.1%)
80-89 | 1 (1.8%)
Gender | Number of patients (56)
Male | 27 (48.2%)
Female | 29 (51.8%)
Country of patients | Number of patients (56)
Afghanistan | 10 (17.9%)
India | 8 (14.3%)
Azerbaijan | 8 (14.3%)
Morocco | 6 (10.7%)
Turkey | 6 (10.7%)
Egypt | 5 (8.9%)
Iran | 3 (5.4%)
Pakistan | 3 (5.4%)
Syria | 2 (3.6%)
Yemen | 1 (1.8%)
Greece | 1 (1.8%)
Ethiopia | 1 (1.8%)
Germany | 1 (1.8%)
Iraq | 1 (1.8%)
Affected side | Number of patients (56)
Right side | 33 (58.9%)
Left side | 23 (41.1%)
Area of residency | Number of patients (56)
Urban | 1 (1.8%)
Rural | 16 (28.6%)
N/A | 39 (69.6%)
Contact with sheep and dogs | Number of patients (56)
Reported | 13 (23.2%)
N/A | 43 (76.8%)
Proptosis was present in 55 cases (98.2%), while visual impairment was reported in 36 cases (64.3%). Magnetic resonance imaging (MRI) was used for diagnosis in 38 cases (67.8%), while computed tomography (CT) was used in 28 cases (50%). Laboratory tests were conducted in 37 cases (66.1%), with 32 (86.5%) yielding normal results. The primary treatment for orbital HC was surgical removal of the cyst combined with anthelmintic therapy, used in 41 cases (73.2%). Surgery alone was performed in 14 cases (25%), while a conservative approach was used in one case (1.8%). Among the 55 patients who underwent surgery, orbitotomy was the most common approach for accessing the cyst (41 cases, 74.5%). Cystectomy was the most common removal method, performed in 20 cases (36.4%), while the PAIR method (puncture, aspiration, injection, and re-aspiration) was used in 2 cases (3.6%). Follow-up durations ranged from 3 weeks to 72 months. Concurrent HC was reported in 2 cases (3.6%), while recurrence followed by recovery occurred in 4 cases (7.1%) (Table 3).
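The percentages in this paragraph use different denominators: 56 for all patients, 55 for patients who underwent surgery, and 42 for patients who received an anthelmintic drug. A minimal Python sketch makes that explicit; the helper `pct` is a hypothetical name introduced only for illustration.

```python
def pct(count, total):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


ALL_PATIENTS = 56    # every included case
SURGICAL = 55        # cases that underwent surgery
ANTHELMINTIC = 42    # cases that received an anthelmintic drug

print(pct(55, ALL_PATIENTS))   # proptosis -> 98.2
print(pct(41, ALL_PATIENTS))   # surgery plus anthelmintic therapy -> 73.2
print(pct(41, SURGICAL))       # orbitotomy -> 74.5
print(pct(20, SURGICAL))       # cystectomy -> 36.4
print(pct(39, ANTHELMINTIC))   # albendazole -> 92.9
```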
Variables | Frequency (%)
Presentation | Number of patients (56)
Symptomatic | 56 (100%)
Asymptomatic | 0
Common symptoms | Symptomatic patients (56)
Proptosis | 55 (98.2%)
Visual impairment | 36 (64.3%)
Imaging modalities | Number of patients (56)
MRI | 38 (67.8%)
CT scan | 28 (50%)
Laboratory tests | Number of patients (37)
Positive | 5 (13.5%)
Negative | 32 (86.5%)
Mean cyst size (cm) ± SD | 3.25 ± 0.9
Therapeutic approach | Number of patients (56)
Surgery & anthelmintic drugs | 41 (73.2%)
Surgery alone | 14 (25%)
Conservative approach | 1 (1.8%)
Surgical technique for accessing the orbit | Number of patients (55)
Orbitotomy | 41 (74.5%)
Transconjunctival incision | 5 (9.1%)
Unspecified | 3 (5.5%)
Combined approach | 2 (3.6%)
Lateral rhinotomy | 1 (1.8%)
Exenteration of the orbit | 1 (1.8%)
Curette evacuation | 1 (1.8%)
Transcranial approach | 1 (1.8%)
Surgical technique for cyst removal | Number of patients (55)
Cystectomy | 20 (36.4%)
Unspecified | 28 (50.9%)
PAIR method | 2 (3.6%)
Lynch method | 1 (1.8%)
Aspiration and excision | 1 (1.8%)
Barrett’s technique | 1 (1.8%)
Total resection | 1 (1.8%)
Aspiration and excision | 1 (1.8%)
Anthelmintic drug of choice | Number of patients (42)
Albendazole | 39 (92.9%)
Mebendazole | 3 (7.1%)
Outcome | Number of patients (56)
Recovery | 55 (98.2%)
N/A | 1 (1.8%)
Discussion
Hydatid disease is a parasitic infection endemic in many regions worldwide. While traditionally attributed to Echinococcus granulosus, recent studies have identified five causative Echinococcus species with ten distinct genotypes (G1–G10), including E. oligarthrus, E. equinus, E. granulosus sensu stricto, E. canadensis, and E. felidis [4]. Orbital HCs are typically primary and occur unilaterally [7]. In endemic regions, HCs are the second most common cystic orbital lesions (25.8%), following dermoid cysts (29.7%) [8,9].
The clinical manifestations of HCs primarily result from their mass effect on surrounding structures, especially in confined areas like the orbit. The predominant clinical manifestation of intra-orbital hydatid cysts, as observed in the present review, is a gradually progressive, unilateral proptosis, which may be axial or non-axial. The proptosis is generally painless, irreducible, and non-pulsatile, and it lacks blowing characteristics. If the cyst ruptures, it can cause inflammation. Additional symptoms of orbital HCs may include ocular pain, diplopia, headache on the affected side, blurred vision, vision loss, chemosis, eyelid edema, restriction of extraocular movements, and orbital cellulitis. In more advanced stages, signs may include optic disc swelling, optic atrophy with abnormal pupillary defects, retinal vein engorgement, orbital bone erosion, hypopyon, and further eyelid edema [10]. The findings of the current review indicate no evident sex predilection, as males and females were affected at comparable rates. This observation aligns with existing literature; for instance, Khan et al. reported a case series in which 45.45% of the patients were female [9]. Although some authors suggest that the left side may be more prone to involvement because of the course of the left carotid artery [10], the right side was affected more often in the current review (58.9%), and no definitive factor has been identified that determines which side will be involved.
Children and young adults are the most commonly affected age groups; however, the condition is not limited to them. In the present review, the age of affected individuals ranged from three to 80 years, demonstrating the wide age distribution of the disease. Younger individuals may be more exposed to environments or activities that increase their risk of ingesting Echinococcus eggs, such as direct contact with infected animals (particularly dogs) or consumption of contaminated food or water. Additionally, they may be exposed to these risk factors for a longer duration, allowing sufficient time for HCs to form and grow before the disease develops. Cysts grow at an average rate of about 1–1.5 cm per year. Currently, there is no definitive categorization of “giant” HCs in the literature. Due to the limited space in the orbital cavity, patients typically develop symptoms within two years [10], and orbital HCs are therefore often diagnosed early in children. The diagnosis of orbital HCs requires a combination of approaches, including laboratory tests, imaging, and histopathology for confirmation. Although various serological tests are available for the diagnosis of echinococcosis, their sensitivity is often limited in cases of orbital hydatid cysts. This limitation is evident in the present review, where only five out of 37 serological tests produced positive results. Serological sensitivity is also lower for orbital cysts than for cysts in other organs, as the parasitic proteins are less exposed to the immune system in the orbit [11].
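Because the serology figure rests on only 37 tests, the uncertainty around the 13.5% positive rate is wide. As an illustration, not an analysis performed in the source, the sketch below computes a 95% Wilson score interval for 5 positives out of 37; the function name `wilson_interval` and the choice of the Wilson method are assumptions made here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Five positive serological tests out of 37, as reported in this review.
low, high = wilson_interval(5, 37)
print(f"positive rate = {5 / 37:.1%}, 95% CI roughly {low:.1%} to {high:.1%}")
```

The interval spans roughly 6% to 28%, which is still consistent with the low sensitivity described in the text.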
Imaging tests, particularly MRI and CT, are the most commonly used modalities for diagnosing orbital HCs, a trend observed in the current review. On CT imaging, the lesion appears hypodense, unilocular, well-defined, and thin-walled, with a homogeneous mass featuring a hyperdense rim and capsular enhancement. On orbital MRI, the cyst demonstrates low signal intensity on T1-weighted images and high signal intensity on T2-weighted images, with contrast enhancement of the capsule [12]. MRI is superior to other imaging modalities as it provides more detailed information and can differentiate the cyst from other lesions and surrounding tissue. The differential diagnosis should include other cystic mass lesions, such as abscesses, mucoceles, intra-orbital hematomas, lacrimal tumors or cysts, and lymphangiomas [13].
Regarding treatment, surgical removal of the cyst without rupture is preferred. However, this is not always feasible due to the anatomical complexity of the orbit. The complex structure and thin walls of orbital HCs make them prone to rupture. Rupture may also result in the persistence of residual cyst wall fragments or cause secondary implantation of the parasite [14]. The PAIR method has emerged as a minimally invasive alternative for treating intra-abdominal HCs. However, for orbital HCs, as demonstrated in the cases reported by Bamashmus et al. and Berradi et al., the PAIR method has been used out of necessity, primarily due to the anatomical constraints of the surgical area and the accidental rupture of the cyst [18,20]. Based on the results of the current review, orbitotomy is the preferred surgical approach for accessing and exploring the cyst in the orbit. However, various other techniques can be employed, with the choice of approach largely determined by the cyst's location, size, and the surgeon's expertise. Elkrimi et al. utilized a combined endoscopic and supraorbital incision approach to access a 3.1 cm cyst [22], while Mathad et al. and Al-Muala et al. each accessed a 3 cm cyst using lateral orbitotomy and lateral rhinotomy, respectively [12,34]. The findings of the current review suggest that cystectomy is the preferred surgical technique for cyst removal. However, intraoperative complications can force immediate modification of the surgical approach, as reported by Sendul et al. [11].
Preoperative anthelmintic therapy, particularly with albendazole, is crucial for preventing parasite spread and reducing the risk of anaphylactic reactions in case of cyst rupture during surgery [12]. Postoperative administration of albendazole or mebendazole is also recommended to reduce the likelihood of relapse. Albendazole is commonly preferred due to its superior systemic absorption and better ability to penetrate cysts [10]. In the current review, albendazole was used in 92.9% of the cases that received anthelmintic therapy (39 of 42). Additionally, postoperative therapy included the use of steroids, NSAIDs, and antibiotics to manage symptoms, as shown by Awad et al. [17].
Regarding recurrence, the findings of this review suggest a higher likelihood of recurrence in cases with concurrent hydatidosis. The increased parasitic burden in these cases may be a significant contributing factor to disease recurrence. Preventing recurrence can be achieved by improving basic hygiene practices, such as handwashing after contact with dogs and sheep, enhancing livestock slaughter hygiene, ensuring continuous deworming of dogs, and promoting public education.
During the course of this review, several limitations were identified. Firstly, most of the included papers, as well as the majority of the available literature, are case reports and case series. Additionally, a large amount of data was unretrievable during the search process.
Conclusion
Orbital HC is a rare condition, primarily diagnosed using MRI, with surgery as the definitive treatment. Concurrent hydatidosis increases the risk of recurrence, requiring thorough and ongoing follow-up.
Declarations
Conflicts of interest: The author(s) have no conflicts of interest to disclose.
Ethical approval: Not applicable, as systematic reviews do not require ethical approval.
Patient consent (participation and publication): Not applicable.
Funding: The present study received no financial support.
Acknowledgments: None to be declared.
Authors' contributions: EP, SHA, and OIM significantly contributed to the study's conception and the literature search for related studies. MMA, HAN, REM, and YMM were involved in the literature review, manuscript writing, and data analysis and interpretation. HA, SR, AA, and BA were involved in the literature review, the study's design, and the manuscript's critical revision. HAN and MMA confirm the authenticity of all the raw data. All authors approved the final version of the manuscript.
Use of AI: AI was not used in the drafting of the manuscript, the production of graphical elements, or the collection and analysis of data.
Data availability statement: Not applicable