Project Team

Max Qian, PhD, Associate Professor of Computer Science and Computational Biology, the J. Craig Venter Institute

Renee Yun, PhD, Assistant Professor of Bioinformatics, the J. Craig Venter Institute

Michael Peluso, MD, Assistant Professor, Division of HIV, Infectious Diseases, and Global Medicine, Department of Medicine, University of California San Francisco

Steven Deeks, MD, Professor, Division of HIV, Infectious Diseases, and Global Medicine, Department of Medicine, University of California San Francisco

David Putrino, PhD, Director of Rehabilitation Innovation for the Mount Sinai Health System, Associate Professor of Rehabilitation Medicine at the Icahn School of Medicine at Mount Sinai

Amy Proal, PhD, President/Chief Scientific Officer, PolyBio Research Foundation


Project Summary:

Max Qian is leading the project at the J. Craig Venter Institute to identify LongCOVID endotypes (groups of patients with common symptoms) via a combination of computational modeling and machine learning methods. Detailed longitudinal LongCOVID clinical and research data for the project are being obtained from two central sites that together treat thousands of LongCOVID patients: 1) the UCSF LIINC study and 2) the Mount Sinai Cohen Center for Recovery from Complex Chronic Illness.

Project Background:

Over the past several years, hundreds of scientific papers have documented a wide range of biological abnormalities in patients suffering from LongCOVID, from blood clotting issues to hormonal imbalances, energy dysfunction, and viral persistence. However, because LongCOVID is a broad diagnostic label, each abnormality is usually documented in only a subset of patients. This suggests that patients can likely be grouped into subtypes, or endotypes, based on common symptoms or biological drivers of disease. Delineating these endotypes could improve LongCOVID diagnosis and treatment. Endotype identification is particularly important for clinical trials, where the ability of researchers to assign patients to treatment groups based on common symptoms or characteristics improves the chance that a given therapeutic will show efficacy.

The project team members are experts in machine learning and computational modeling. To identify LongCOVID endotypes, they will leverage electronic health record data from two LongCOVID cohorts (at UCSF and Mount Sinai). They will implement a machine learning method for classifying patients into groups based on questionnaire data (with a focus on longitudinal symptom patterns). The goal is to identify not only LongCOVID endotypes (patient groups) but also the corresponding signature symptoms (topics). A second phase of the project will incorporate patient research data (e.g., spike protein in blood, T cell activation) into the endotype identification process. All data for the project will be entered into LabKey, a customizable data visualization tool that allows for easy interpretation and analysis of complex data.