Talk Title:
Learning to summarize medical evidence
Talk Summary:
Decisions about patient care should be supported by data. But most clinical evidence — from notes in electronic health records to published reports of clinical trials — is stored as unstructured text and is therefore not readily accessible. The body of such unstructured evidence is already vast and continues to grow at a breakneck pace. Physicians are overwhelmed by this torrent of data, making it impossible to inform treatment decisions on the basis of all currently relevant evidence. Natural language processing (NLP) methods offer a potential means of helping physicians make better use of this data, ultimately improving patient care.
In this talk I will focus specifically on the task of generating summaries of evidence. I will consider two settings. In the first, the aim is to design models that can synthesize all published evidence from randomized trials addressing a particular clinical question; here the objective is to train a model to automatically generate narrative synopses of that evidence. I will discuss challenges inherent to designing summarization models for this task, including ensuring that summaries remain faithful to the underlying content. The second setting concerns designing and training models that can provide extractive summaries of notes in patient electronic health record (EHR) data to aid radiologists performing imaging diagnosis. I will discuss the design and evaluation of a distantly supervised extractive summarization system that surfaces snippets from a patient's EHR that might support a given diagnostic query.