Abstract
We introduce starTracer, a novel R package designed to improve the specificity and efficiency of marker gene identification in single-cell RNA-seq data analysis. The package consists of two primary functional modules: “searchMarker” and “filterMarker”. The “searchMarker” module operates as an independent pipeline and accepts a variety of input file types. Its primary output is a marker gene matrix in which, for each cluster, genes are sorted by their potential to serve as cluster-specific markers, with the strongest candidates at the top. In contrast, the “filterMarker” module is designed as a complementary pipeline to the Seurat “FindAllMarkers” function: it takes the Seurat results and provides a more accurate marker gene list for each cluster. Benchmark analyses demonstrate that starTracer not only achieves excellent specificity in identifying marker genes compared to Seurat but is also significantly faster. Across three independent datasets, the speed improvement over Seurat was 1 to 2 orders of magnitude, and it increased with larger data volumes. starTracer also excels at identifying markers in smaller clusters. Furthermore, the “filterMarker” reordering process considerably enhances the specificity of Seurat’s marker matrix. These advantages make starTracer a valuable tool for researchers working with single-cell RNA-seq data, combining robust accuracy with exceptional speed.
Publisher
Cold Spring Harbor Laboratory