Research

My group works on AI for science and medicine. We are currently working on making generative AI agents more capable and reliable, and applying them to tackle important problems in healthcare, biology, chemistry and medicine.

We collaborate with and advise many companies, including Accenture, Adela Bio, Amazon, Anthem, Collinear.AI, Enable Medicine, Exai, Fidocure, Genentech, Genmab, Google, Gradio, Greenstone Bio, Intervenn Bio, InVision, Nuclei.io, Together AI and Virtusa (several of these companies grew out of our group's research). We are excited to scale and deploy our research with industry partners.

Some active research areas (please see Publications for more):

Generative AI agents

We develop versatile frameworks to design and optimize AI agents. We developed TextGrad to optimize any LLM agent by backpropagating text gradients. We pioneered the mixture-of-agents framework that harnesses synergies across multiple agents. We have also developed new architectures for multi-modal language models, such as Dragonfly.

Data-centric AI

Data limitation is the biggest challenge in applying ML. We have developed Data Shapley to quantify which data is more or less valuable, and to amplify the more useful data (e.g. ICML'19, AISTATS'21, AISTATS'22). We are also working on methods to audit and clean datasets (Nature MI'22) and data scaling laws (ICML'24).

Trustworthy AI

We have led research in detecting and reducing harmful biases and stereotypes in AI systems (e.g. NeurIPS'16, Nature'18, PNAS'18, Nature'19, Nature MI'21, Nature Med'21, ICML'22, Science Advances'22). We have also developed some of the first methods to efficiently delete personal data from trained ML models (NeurIPS'19) and to assign algorithmic responsibility (AIES'19). Our methods are used by many companies.

ML for new biotechnologies (spatial biology, single cells, etc.)

We use ML to make genome editing safer (Nature Biotech'19), to model spatial omics (Nature BME'20), to integrate single-cell multi-omics (PNAS'21), and to generate new drugs (Nature MI'19). We are excited about combining new ML with breakthroughs in genomic technologies to study human diseases (Nature Genetics'18).

Computer vision and NLP for healthcare

We have developed state-of-the-art computer vision algorithms for analyzing heart diseases from cardiac ultrasound videos (Nature'20), for improving telehealth (PSB'21), and for digital pathology (Nature BME'20). We also have the best-performing NLP methods for analyzing clinical notes (Nature Digital Medicine'18, Nature Digital Medicine'19). Many of these systems are now being used by hospitals and large insurance companies. EchoNet received FDA approval in April 2024.

Precision medicine for all

We pioneered methods that combine EHR data with AI to make clinical trials more inclusive (Nature'21) and to recommend the best treatment for cancer patients based on their mutations (Nature Medicine'22). Our work was recognized as a Top Ten Clinical Research Achievement in 2022.