09/18/2024


We make the code publicly available at https://github.com/bionlplab/AMD_prognosis_amia2021.

Despite the impressive success of machine learning algorithms in clinical natural language processing (cNLP), rule-based approaches still have a prominent role. In this paper, we introduce medspaCy, an extensible, open-source cNLP library based on the spaCy framework that allows flexible integration of rule-based and machine learning-based algorithms adapted to clinical text. MedspaCy includes a variety of components that meet common cNLP needs such as context analysis and mapping to standard terminologies. By utilizing spaCy's clear and easy-to-use conventions, medspaCy enables development of custom pipelines that integrate easily with other spaCy-based modules. Our toolkit includes several core components and facilitates rapid development of pipelines for clinical text.

Brigham and Women's Hospital has received funding from the Centers for Medicare and Medicaid Services to develop a novel electronic clinical quality measure to assess the risk-standardized major bleeding and venous thromboembolism (VTE) rate following elective total hip and/or knee arthroplasty. There are currently no measures that evaluate both bleeding and VTE events following total joint arthroplasty (TJA). Our novel composite measure was tested within two academic health systems with 17 clinician groups meeting the inclusion criteria. Following risk adjustment, the overall adjusted bleeding rate was 3.87% and ranged from 1.99% to 5.66%. The unadjusted VTE rate was 0.39% and ranged from 0% to 2.65%. The overall VTE/Bleeding composite score was 2.15 and ranged from 1.15 to 3.19.
This measure seeks to provide clinician groups with a tool to assess their patient bleeding and VTE rates and compare them to their peers, ultimately providing an evidence-based quality metric assessing orthopedic practices.

Opioid Use Disorder (OUD) is a public health crisis costing the US billions of dollars annually in healthcare, lost workplace productivity, and crime. Analyzing longitudinal healthcare data is critical in addressing many real-world problems in healthcare. Leveraging real-world longitudinal healthcare data, we propose a novel multi-stream transformer model called MUPOD for OUD identification. MUPOD is designed to simultaneously analyze multiple types of healthcare data streams, such as medications and diagnoses, by attending to segments within and across these data streams. Tested on data from 392,492 patients with long-term back pain problems, our model showed significantly better performance than traditional models and recently developed deep learning models.

We develop various AI models to predict hospitalization on a large cohort (over 110k) of US patients who tested positive for COVID-19, sourced from March 2020 to February 2021. Models range from Random Forest to Neural Network (NN) and Time Convolutional NN, where the data modalities (tabular and time-dependent) are combined at different stages (early vs. model fusion). Despite high data imbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and F1-score 0.97-0.98 (0.79-0.83) on the non-hospitalized (or hospitalized) class. Performance does not drop significantly even when selected lists of features are removed to study model adaptability to different scenarios. However, a systematic study of the SHAP feature importance values for the developed models in the different scenarios shows a large variability across models and use cases.
This calls for even more complete studies of several explainability methods before their adoption in high-stakes scenarios.

Burn wounds are most commonly evaluated through visual inspection to determine surgical candidacy, taking into account burn depth and individualized patient factors. This process, though cost effective, is subjective and varies by provider experience. Deep learning models can assist in burn wound surgical candidacy with predictions based on the wound and patient characteristics. To this end, we present a multimodal deep learning approach and a complementary mobile application, DL4Burn, for predicting burn surgical candidacy, to emulate the multi-factored approach used by clinicians. Specifically, we propose a ResNet50-based multimodal model and validate it using retrospectively obtained patient burn images, demographic data, and injury data.

Sentence boundary detection (SBD) is a fundamental building block in the Natural Language Processing (NLP) pipeline. Incorrect SBD may impact subsequent processing stages, resulting in decreased performance. In well-behaved corpora, a few simple rules based on punctuation and capitalization are sufficient for successfully detecting sentence boundaries. However, a corpus like MEDLINE citations presents challenges for SBD due to several syntactic ambiguities, e.g., abbreviation-periods, capital letters in first words of sentences, etc. In this manuscript we present an algorithm to address these challenges based on majority voting among three SBD engines (Python NLTK, pySBD, and Syntok) followed by custom post-processing algorithms that rely on spaCy part-of-speech tagging, abbreviation and capital letter detection, and general sentence statistics.
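The majority-voting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the three regex-based splitters are hypothetical stand-ins for NLTK, pySBD, and Syntok, the `min_votes=2` threshold is an assumed reading of "majority", and the custom post-processing rules are not shown. Each engine reports candidate sentence-end offsets, and an offset is kept only when enough engines agree on it.

```python
import re
from collections import Counter
from typing import Callable, List, Sequence, Set

# A splitter maps text to the set of character offsets where sentences end.
Splitter = Callable[[str], Set[int]]


def regex_splitter(pattern: str) -> Splitter:
    """Build a toy engine: the end of every regex match is a boundary offset."""
    compiled = re.compile(pattern)

    def split(text: str) -> Set[int]:
        return {match.end() for match in compiled.finditer(text)}

    return split


# Hypothetical stand-ins for the three real engines (assumptions, not their logic).
naive_engine = regex_splitter(r"[.!?]")                    # every terminator
capital_engine = regex_splitter(r"[.!?](?=\s+[A-Z])")      # terminator before a capital
abbrev_engine = regex_splitter(                            # skips a few abbreviations
    r"(?<!e\.g)(?<!i\.e)(?<!vs)[.!?](?=\s|$)"
)
ENGINES: Sequence[Splitter] = (naive_engine, capital_engine, abbrev_engine)


def majority_vote(text: str, splitters: Sequence[Splitter],
                  min_votes: int = 2) -> List[str]:
    """Split text at offsets proposed by at least min_votes splitters."""
    votes: Counter = Counter()
    for split in splitters:
        votes.update(split(text))

    cuts = sorted(offset for offset, count in votes.items() if count >= min_votes)

    sentences: List[str] = []
    start = 0
    for cut in cuts:
        sentence = text[start:cut].strip()
        if sentence:
            sentences.append(sentence)
        start = cut
    tail = text[start:].strip()  # text after the last agreed boundary, if any
    if tail:
        sentences.append(tail)
    return sentences


TEXT = "Tumors (e.g. gliomas) were imaged. Results were clear."

if __name__ == "__main__":
    for sentence in majority_vote(TEXT, ENGINES):
        print(sentence)
```

In this toy input the naive engine alone would cut inside "e.g.", but those offsets receive only one vote and are discarded, which is the kind of abbreviation-period ambiguity the voting scheme is meant to absorb.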
Experiments on several thousand MEDLINE citations show that our proposed approach for combining multiple SBD engines and post-processing rules performs better than each individual engine.

Social Determinants of Health (SDoH) are an increasingly important part of the broader research and public health efforts in understanding individuals' physical and mental well-being. Despite this, non-clinical factors affecting health are poorly recorded in electronic health databases and techniques to study how SDoH might relate to population outcomes are lacking. This paper proposes an approach to systematically identify and quantify associations between SDoH and health-related outcomes in a specific cohort of people by (1) leveraging published evidence from literature to build a knowledge graph of health and social factor associations and (2) analysing a large dataset of claims and medical records where those associations may be found. This work demonstrates how the proposed approach could be used to generate hypotheses and inform further research on SDoH in a data-driven manner.

Computerized clinical decision support (CDS) will be essential to ensuring the safety and efficiency of new care delivery models, such as the patient-centered medical home. CDS will help empower non-physician team members, coordinate overall team efforts, and facilitate physician oversight. In this article, we discuss common clinical scenarios that could benefit from CDS optimized for team-based healthcare, including (1) low-acuity episodic illness, (2) diagnostic workup of new onset symptoms, (3) chronic care, (4) preventive care, and (5) care coordination.
CDS that maximally supports teams may be one of biomedical informatics' best opportunities to decrease health care costs, improve quality, and increase clinical capacity.

Supported by the Centers for Medicare & Medicaid Services (CMS), Brigham and Women's Hospital (BWH) has retooled the existing claims-based measures NQF1550 and NQF3493 into an electronic clinical quality measure (eCQM) to assess the risk-standardized complication rate (RSCR) following elective primary total hip (THA) and knee arthroplasty (TKA) at the clinician group level. This novel eCQM includes risk adjustment for social determinants of health, includes all adult patients from all payers, leverages electronic health records (EHRs) rather than claims-based data, and includes both inpatient and outpatient procedures and complications, which offers benefits compared to existing metrics. Following testing in two geographically different healthcare systems, the overall risk-standardized complication rate within 90 days following THA and TKA at the two sites was 3.60% (Site 1) and 3.70% (Site 2). This measure is designed for use in the Merit-Based Incentive Payment System (MIPS).

Radiology reports are a rich resource for advancing deep learning applications for medical images, facilitating the generation of large-scale annotated image databases. However, the ambiguity and subtlety of natural language pose a significant challenge to information extraction from radiology reports. Thyroid Imaging Reporting and Data Systems (TI-RADS) has been proposed as a system to standardize ultrasound imaging reports for thyroid cancer screening and diagnosis, through the implementation of structured templates and a standardized thyroid nodule malignancy risk scoring system; however, there remains significant variation in radiologist practice when it comes to diagnostic thyroid ultrasound interpretation and reporting.
In this work, we propose a computerized approach using a contextual embedding and fusion strategy for the large-scale inference of TI-RADS final assessment categories from narrative ultrasound (US) reports. The proposed model has achieved high accuracy on an internal data set, and high performance scores on an external validation dataset.

Epilepsy is a neurological disorder characterized by recurrent epileptic seizures. While it is crucial to characterize pre-ictal brain electrical activities, the problem remains computationally challenging. Using brain signal acquisition and advances in deep learning technology, we aim to classify pre-ictal signals and characterize the brain waveforms of patients with epilepsy during the pre-ictal period. We develop a novel machine learning model called Pre-ictal Signal Classification (PiSC) for pre-ictal signal classification and for identifying brain waveform patterns critical for early detection of seizure onset. In PiSC, a unique preprocessing procedure is developed to convert the stereo-electroencephalography (sEEG) signals to data blocks ready for pre-ictal signal classification. Also, a novel deep learning framework is developed to integrate deep neural networks and meta-learning to effectively mitigate patient-to-patient variances and to fine-tune a trained classification model for new patients. The unique network architecture ensures model stability and generalization in sEEG data modeling. The experimental results on a real-world patient dataset show that PiSC improved the accuracy and F1 score by 10% compared with the existing models. Two types of sEEG patterns were discovered to be associated with seizure development in nocturnal epileptic patients.

The COVID-19 pandemic challenged how healthcare systems provided care in socially distanced formats.
We hypothesized that the COVID-19 era changes in clinical care delivery models contributed to increased Electronic Health Record (EHR) related work. To evaluate the changes in time and volume metrics of EHR usage, we segregated EHR audit log metric data into Pre-COVID 2019 (March/April/May), initial COVID 2020 (March/April/May), and late COVID 2021 (March/April/May) periods for 1262 physician providers. We discovered significant and pragmatically meaningful increases in the total average time providers spent in the EHR, in minutes, mean (SD): Pre-COVID 2019 = 1958 (1576), Mid-COVID 2020 = 1709 (1473), Late-COVID 2021 = 2007 (1563). Differences in total time in the EHR were significant for Pre-Mid (p-value < 0.001), but not for Pre-Late (p = 0.439). Total number of messages received across all specialties increased significantly, mean (SD): Pre-COVID = 459 (389), Mid-COVID = 400 (362), Late-COVID = 521 (423); Pre-Mid p-value < 0.001 and Pre-Late p-value < 0.