Zhou, Siddiqui, Seliger, Blumenthal, Kang, Doerfler, Fink (2019) Text preprocessing for improving hypoglycemia detection from clinical notes - A case study of patients with diabetes International journal of medical informatics 129() 374-380


Hypoglycemia is a common safety event when attempting to optimize glycemic control in diabetes (DM). While electronic medical records provide a natural ground for detecting and analyzing hypoglycemia, ICD codes used in the databases may be invalid, insensitive or non-specific in detecting new hypoglycemic events. We developed text preprocessing methods to improve automatic detection of hypoglycemia from analysis of clinical encounter text notes. We set out to improve hypoglycemia detection from clinical notes by introducing three preprocessing methods: stop word filtering, medication signaling, and ICD narrative enrichment. To test the proposed methods, we selected clinical notes from VA Maryland Healthcare System, based on various combinations of three criteria that are suggestive of hypoglycemia, including ICD-9 code of diabetes and hypoglycemia, laboratory glucose values < 70 md/dL, and text reference to a proximate hypoglycemia event. In addition, we constructed one dataset of 395 clinical notes from year 2009 and another of 460 notes from year 2014 to test the generality of the proposed methods. For each of the datasets, two physician judges manually reviewed individual clinical notes to determine whether hypoglycemia was present or absent. A third physician judge served as a final adjudicator for disagreements. Each of the proposed preprocessing methods contributed to the performance of hypoglycemia detection by significantly increasing the F1 score in the range of 5.3∼7.4% on one dataset (p < .01). Among the methods, stop word filtering contributed most to the performance improvement (7.4%). Combining all the preprocessing methods led to greater performance gain (p < .001) compared with using each method individually. Similar patterns were observed for the other dataset with the F1 score being increased in the range of 7.7%∼9.4% by individual methods (p < .001). Nevertheless, combining the three methods did not yield additional performance gain. The proposed text preprocessing methods improved the performance of hypoglycemia detection from clinical text notes. Stop word filtering achieved the most performance improvement. ICD narrative enrichment boosted the recall of detection. Combining the three preprocessing methods led to additional performance gains. Published by Elsevier B.V.