My group works on both the foundations of statistical machine learning and its applications in biomedicine and healthcare. We develop new technologies that make ML more accountable to humans, more reliable and robust, and better able to reveal core scientific insights.


We want our ML to be impactful and beneficial, and as such, we are deeply motivated by transformative applications in biotech and health. We collaborate with and advise many academic and industry groups, including Genentech, Google, Anthem, Virtusa, Accenture, and InterVenn. 


Some active research areas (please see Publications for more):


Accountable AI


We have led research in detecting and reducing harmful biases and stereotypes in AI systems (e.g. NeurIPS'16, Nature'18, PNAS'18, Nature'19). We have also developed some of the first methods to efficiently delete personal data from trained ML models (NeurIPS'19) and to assign algorithmic responsibility. Our methods are used by many companies.  

Data valuation

Limited data is the biggest challenge in applying ML. We have developed methods to quantify which data are more or less valuable, and to amplify the more useful data (e.g. ICML'19). We are also working on automatically correcting complex errors in data annotations.

Combining the best of deep learning and statistics

We have developed new methods that combine the best of modern ML (end-to-end differentiable learning, flexible models) with desirable statistical properties (rigorous false discovery control, sparsity, visualization). See, e.g., Nature Communications'19, Nature Communications'18, ICML'19a, ICML'19b, and AISTATS'19.

ML for new biotechnologies (genome editing, single cells, etc.) 

We use ML to improve the precision of genome editing, to infer dynamical systems from single-cell RNA-seq, to model spatial transcriptomics, and to interpret disease-causing mutations. We are excited about combining new ML with breakthroughs in genomic technologies to develop powerful platforms (e.g. Nature Biotech'19, Nature Genetics'18, Nature Machine Intell'19).

Computer vision and NLP for health


We have developed state-of-the-art computer vision algorithms for analyzing cardiac function from ultrasound videos. We also have the best-performing NLP methods for learning structured diseases and phenotypes from clinical notes (Nature Digital Medicine'18, Nature Digital Medicine'19). In general, we develop ML systems to augment clinical capabilities, and we leverage EHR data to improve clinical trials.