Angiogenesis, Inflammation & Therapeutics | Online ISSN  2207-872X
RESEARCH ARTICLE   (Open Access)

Enhancing Radiological Biomedical Natural Language Processing Tasks with Radiology-Specific Word Encodings: A Comparative Analysis of Word Embeddings Sources

Kamlesh Kumar Yadav 1*, Dhablia Dharmesh Kirit 1

Journal of Angiotherapy 8(9) 1-6 https://doi.org/10.25163/angiotherapy.899873

Submitted: 15 July 2024  Revised: 28 August 2024  Published: 05 September 2024 

Abstract

Background: Machine Learning (ML)-based Biomedical Natural Language Processing (BNLP) techniques have attracted growing attention in radiology. However, these models typically depend on Word Encodings (WE) trained on generic datasets, because radiology-specific word libraries are limited.

Objective: This study investigated the potential of Radiopaedia as a comprehensive database for generating Radiology-Specific Word Encodings (RSWE) that improve the performance of BNLP tasks, especially in processing radiological texts.

Methods: A systematic evaluation was conducted using WE derived from four sources: medical records, biomedical journals, Wikipedia, and news. Unstructured Electronic Medical Record (EMR) data from the Mayo Clinic and PubMed Central publications were used to train WE for the medical-specific sources, while GloVe and Google News represented publicly available pre-trained WE for the generic sources. Analytical evaluation employed medical keywords in three categories (illness, symptoms, drugs), and a 2-D graphical plot was created for 380 medical words. Numerical evaluation consisted of internal and external assessments.

Results: The WE derived from EMR and PubMed Central outperformed the generic WE: they captured the meanings of medical words better, identified medically essential terms, and aligned more closely with expert assessments.

Conclusion: The study demonstrates the value of Radiopaedia as a radiology-specific resource for generating RSWE, with promising implications for improving BNLP in radiology.

Keywords: Radiology-specific word encodings (RSWE), Biomedical natural language processing (BNLP), Word embeddings (WE), Radiopaedia dataset, Electronic health records (EHR)
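
The abstract reports that the domain-specific encodings aligned more closely with expert assessments but does not spell out how that alignment was scored. A common way to run such an internal assessment is to compare the cosine similarity of word-vector pairs against expert similarity ratings using Spearman rank correlation. The sketch below illustrates that idea only and is not the authors' procedure: the vectors in `toy_vectors`, the ratings in `expert_pairs`, and all function names are hypothetical and chosen purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 3-dimensional word vectors (real encodings have far more dimensions).
toy_vectors = {
    "pneumonia": [0.9, 0.1, 0.0],
    "consolidation": [0.8, 0.2, 0.1],
    "fracture": [0.1, 0.9, 0.0],
    "aspirin": [0.0, 0.1, 0.9],
}

# Hypothetical expert similarity ratings for word pairs (higher = more similar).
expert_pairs = [
    ("pneumonia", "consolidation", 3.5),
    ("pneumonia", "fracture", 1.0),
    ("pneumonia", "aspirin", 0.5),
    ("fracture", "aspirin", 0.8),
]

model_scores = [
    cosine_similarity(toy_vectors[a], toy_vectors[b]) for a, b, _ in expert_pairs
]
expert_scores = [rating for _, _, rating in expert_pairs]

rho = spearman(model_scores, expert_scores)
print(f"Spearman correlation with expert ratings: {rho:.3f}")
```

Under this scheme, a set of encodings whose similarity ranking tracks the expert ranking more closely receives a higher correlation, which gives one way to compare encodings trained on different sources against the same rated word pairs.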
