A gentle introduction to using machine learning and NLP for health surveillance


Presented November 27, 2018.

Unstructured data such as chief complaints and provider notes are an important component of effective health surveillance. Applying machine learning (ML) and natural language processing (NLP) to unstructured data can often improve surveillance performance over traditional keyword search methods. This presentation explains how health surveillance practitioners can begin applying basic ML and NLP methods to unstructured data, and provides common-sense guidelines for which methods give the greatest improvement for a given level of effort. Lastly, the presentation gives an overview of how more involved approaches, such as deep learning neural networks, frequently offer even greater performance gains.
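
The keyword-versus-ML contrast described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the method from the presentation: the chief complaints, the influenza-like illness (ILI) keyword list, and the helper names (`tokenize`, `keyword_match`, `NaiveBayes`) are all invented for this example. Multinomial naive Bayes is simply one common baseline classifier chosen here, written with the standard library only.

```python
import math
import re
from collections import Counter

# Hypothetical labeled chief complaints (1 = influenza-like illness, 0 = other).
TRAIN_TEXTS = [
    "fever and cough",
    "flu like symptoms",
    "feverish with body aches and sore throat",
    "cough congestion chills",
    "ankle injury after fall",
    "chest pain",
    "laceration to hand",
    "fell off ladder back pain",
]
TRAIN_LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical keyword list for a traditional ILI keyword query.
ILI_KEYWORDS = {"flu", "influenza", "fever"}


def tokenize(text):
    """Lowercase a free-text chief complaint and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_match(text):
    """Keyword baseline: flag a record if any token is on the keyword list."""
    return any(tok in ILI_KEYWORDS for tok in tokenize(text))


class NaiveBayes:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            docs = [tokenize(t) for t, y in zip(texts, labels) if y == c]
            self.log_priors[c] = math.log(len(docs) / len(texts))
            self.counts[c] = Counter(tok for doc in docs for tok in doc)
            self.totals[c] = sum(self.counts[c].values())
            self.vocab.update(self.counts[c])
        return self

    def predict(self, text):
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_priors[c]
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.counts[c][tok] + 1) / (self.totals[c] + v))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_TEXTS, TRAIN_LABELS)
    for complaint in ["chills and cough", "back pain after fall"]:
        print(complaint, keyword_match(complaint), model.predict(complaint))
    # chills and cough False 1
    # back pain after fall False 0
```

On this toy data the keyword query misses "chills and cough" because none of its words is on the list, while the classifier flags it from word patterns learned in training. Eight examples say nothing about real-world accuracy; in practice the classifier would be trained and evaluated on a much larger set of labeled records.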


Drew Levin, PhD, Technical Staff, Sandia National Laboratories

Event/Publication Date: November 2018

November 30, 2018


National Syndromic Surveillance Program

Centers for Disease Control and Prevention


The National Syndromic Surveillance Program (NSSP) is a collaboration among states and public health jurisdictions that contribute data to the BioSense Platform, public health practitioners who use local syndromic surveillance systems, CDC programs, other federal agencies, partner organizations, hospitals, healthcare professionals, and academic institutions.
