MSK MIND (Multimodal Integration of Data) is a new strategic initiative aimed at accelerating research and discovery through advanced analytics. Fusing diagnostic modalities of radiologic, histologic, genomic, molecular, laboratory, and clinical data, MSK MIND will pursue a novel integrative analysis to improve cancer care. Our guiding hypothesis is that clinical and scientific research centered on patient stratification, diagnosis, and biomarker discovery will be enhanced and accelerated by analyses that span multiple data modalities.

MSK MIND’s mission is to build a computational platform and data infrastructure for multi-modal data integration, and to develop and deploy advanced computational methods that synthesize these data to improve patient diagnosis, stratification, and treatment options.

MSK MIND has three main objectives:

  1. Support and advance clinical and scientific research questions reliant on multi-modal data
  2. Develop a standardized and centralized institutional software and data infrastructure to facilitate integrative analysis
  3. Promote academic research excellence in high-dimensional data and machine learning-driven patient stratification, diagnosis, and biomarker discovery

 

Projects

Colorectal Cancer: Mithat Gönen, PhD

Colorectal cancer affects over a million people in the US, half of whom develop metastases to the liver and require surgery, which is often accompanied by severe side effects. A team led by biostatistician Mithat Gönen will integrate genomic, radiomic, and clinical data to distinguish the people most likely to benefit from surgery from those expected to experience post-operative liver failure or early recurrences of metastases. The latter groups can then be offered alternative treatment.

Lung Cancer: Matthew Hellmann

Some people with non-small cell lung cancer (NSCLC) have greatly benefited from the introduction of immunotherapy, but most do not respond. Medical oncologists Kathryn Arbour and Matthew Hellmann and their team will build on ongoing efforts to annotate the clinical, pathologic, and molecular features of NSCLC to design a model that can predict the response to immunotherapy in a pre-treatment setting. They will also develop a large patient database as a foundation for assessing patients with advanced disease who are likely to benefit from a combination of chemotherapy and immunotherapy.

Breast Cancer: Elizabeth Morris and Pedram Razavi

Breast tumors show large variability in their molecular characteristics. Current clinical models do not fully take this heterogeneity into account, so they cannot optimally assign patients to the most suitable treatment groups. Co-principal investigators Elizabeth Morris and Pedram Razavi will develop a machine learning model that combines whole tumor imaging data with pathologic and genomic information, as well as clinical variables, to better predict treatment response, recurrence-free survival, and disease-free survival.

Gynecologic Cancer: Yuliya Lakhman

High-grade serous ovarian cancer is the most common and also the most lethal gynecologic malignancy. Radiologist Yuliya Lakhman and colleagues will use machine learning techniques to outline and annotate tumors in medical images and integrate these annotations with the tumors’ molecular profiles. Their goal is to define multi-modal predictors of tumor progression and to stratify patients into the appropriate treatment groups.

Powered by the radiology archive: Harnessing historical image annotations for automated tumor segmentation: Nathaniel Swinburne and Robert Young

The scalable harmonization of multimodal cancer data requires accurate, automated segmentation of tumors on radiologic images. At MSK, the major impediment to achieving automated tumor segmentation using deep learning is the lack of large annotated tumor image datasets representative of our unique patient population. Our hypothesis is that massive existing image annotation repositories can be harnessed in a hybrid object detection and segmentation framework to enable fully automated tumor segmentation. Our model will be developed for brain tumors, but the proposed data pipeline is generalizable (extendable to any anatomy, solid tumor type, or radiologic imaging modality). It would allow MSK’s existing massive archive of image annotations to be used to train deep learning models for tumor computer vision tasks, including but not limited to segmentation, and will be foundational for a multi-modal, multi-omic research platform.

Automated retrieval of clinical data elements for the identification of genomic predictors of outcome and treatment response in cancer: Nikolaus Schultz, John Philip, and Steven Maron

Retrieving clinical annotations of tumor samples and patients presents a major challenge for data integration, as the current approach of manual abstraction from largely unstructured electronic medical records (EMR) is difficult to scale. However, clinical text classification using natural language processing (NLP) and advanced machine learning methods has the potential to unlock information embedded in clinical narratives. Here, a multidisciplinary team of data scientists and cancer biologists led by Nikolaus Schultz is creating a hybrid NLP system that leverages structured and unstructured EMR data to identify patient- and sample-specific attributes. They hypothesize that this work will yield a robust, large-scale system for enhanced integration of clinical data with genomic databases, which can be used to predict outcome and treatment response of individual cancer patients.

MSK MIND Co-Chairs

  • Sohrab Shah, PhD

    Chief, Computational Oncology
    Director, MSK MIND
  • Peter D. Stetson, MD, MA

    Chief Health Informatics Officer
  • Paul Sabbatini, MD

    Deputy Physician-in-Chief for Clinical Research

MSK MIND Data Engineering

  • Essam Elsherif

    Bioinformatics Software Engineer Leader
  • Benjamin Gross

    Bioinformatics Software Engineer Lead I
  • Arfath Pasha

    Bioinformatics Software Engineer V
  • Christopher Fong

    Bioinformatics Software Engineer III
  • Doori Rose

    Bioinformatics Engineer III
  • Druv Patel

    Bioinformatics Software Engineer I
  • Andy Aukerman

    Bioinformatics Engineer I

MSK MIND Scholars

  • Pegah Khosravi, PhD

    MSK MIND Scholar
  • Karl Pichotta

    Senior Computational Biologist II
  • Rami Vanguri, PhD

    MSK MIND Scholar
  • Justin Jee

    Clinical Fellow
  • Kevin Boehm

    MD-PhD Student
    Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD-PhD Program

MSK MIND Administration

  • Ederlinda Paraiso

    VP, Research Operations
    HOPP & CRC
  • Christie Park

    Lead Project Portfolio Manager
  • Anika Begum

    Project Coordinator

Contact Anika Begum, Project Coordinator, with any inquiries.