Cell Type Inference Using Large Language Models in Single-Cell Data Analysis

Abstract

Single-cell RNA sequencing enables high-resolution analysis of cellular heterogeneity, but a key analysis step, cell type identification, remains a labor-intensive process that requires manual inspection of marker genes. Large Language Models (LLMs) offer a promising solution for automating this critical step. We present a case study using CytoAnalyst, a web-based platform that integrates LLMs for automated cell type annotation. Using a bone marrow organoid dataset, we compared multiple state-of-the-art LLMs in their ability to predict cell types from marker genes identified through differential expression analysis. Our structured prompting approach yielded accurate predictions for common cell types across all models, while performance varied for rare or specialized populations. This work demonstrates that LLMs can significantly reduce manual effort in scRNA-seq analysis, though further improvements are needed for more accurate and robust annotations. Our web-based platform and method are freely available at: https://cytoanalyst.tinnguyen-lab.com/.
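The structured prompting step mentioned above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the prompt format CytoAnalyst actually uses: the function names, the prompt wording, and the "cluster_id: cell type" reply format are all hypothetical, and the marker genes in the usage note are generic textbook examples rather than results from the bone marrow organoid dataset. The call to the LLM itself is deliberately left out.

```python
from typing import Dict, List


def build_annotation_prompt(
    markers: Dict[str, List[str]], tissue: str, top_n: int = 10
) -> str:
    """Format per-cluster marker genes into one structured prompt.

    The layout is an assumption made for illustration only.
    """
    lines = [
        f"You are annotating cell types in a single-cell RNA-seq dataset from {tissue}.",
        "For each cluster, name the most likely cell type based on its marker genes.",
        "Answer with one line per cluster in the form 'cluster_id: cell type'.",
        "",
    ]
    for cluster_id, genes in markers.items():
        # Keep only the top-ranked markers so the prompt stays short.
        lines.append(f"{cluster_id}: {', '.join(genes[:top_n])}")
    return "\n".join(lines)


def parse_annotation_response(response: str) -> Dict[str, str]:
    """Parse 'cluster_id: cell type' lines and ignore anything else."""
    labels: Dict[str, str] = {}
    for line in response.splitlines():
        if ":" not in line:
            continue
        cluster_id, cell_type = line.split(":", 1)
        if cluster_id.strip() and cell_type.strip():
            labels[cluster_id.strip()] = cell_type.strip()
    return labels
```

As a usage example, calling build_annotation_prompt({"0": ["CD3D", "CD3E"], "1": ["CD79A", "MS4A1"]}, "bone marrow organoid") produces a prompt with one marker line per cluster, and parse_annotation_response turns a reply in the requested format back into a cluster-to-label mapping. Asking for a fixed reply format keeps the parsing simple and makes predictions from different models easier to compare.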

Publication
International Conference on Advances in Information and Communication Technology
Duy Tran
PhD Student