This chapter aims to present a classification model for categorizing textual clinical records of breast magnetic resonance imaging, based on lexical, syntactic and semantic analysis of clinical reports according to the Breast Imaging-Reporting and Data System (BI-RADS) classification, using Deep Learning and Natural Language Processing (NLP). The model was developed from transfer learning based on the pre-trained BERTimbau model, BERT model (Bidirectional Encoder Representations from Transformers) trained in Brazilian Portuguese. The dataset is composed of medical reports in Brazilian Portuguese classified into six categories: Inconclusive; Normal or Negative; Certainly Benign Findings; Probably Benign Findings; Suspicious Findings; High Risk of Cancer; Previously Known Malignant Injury. The following models were implemented and compared: Random Forest, SVM, Naïve Bayes, BERTimbau with and without finetuning. The BERTimbau model presented better results, with better performance after finetuning.
Part of the book: Machine Learning and Data Mining Annual Volume 2023