Affiliation:
1. University of Victoria
2. University of Saskatchewan
3. National Research Council of Canada
Abstract
With the rise of single-cell transcriptome sequencing technology, a growing number of studies focus on single-cell-based disease diagnosis and treatment. Cell type annotation is the first and most critical step in analyzing single-cell genomic data. Traditional marker-gene-based annotation approaches require extensive domain knowledge and subjective human decisions, which makes annotation time-consuming and produces inconsistent cell identities. In the past few years, multiple automated cell type identification tools have been developed, leveraging the large number of reference cells that have accumulated. All of these methods are extensions or revisions of standard supervised machine learning methods. However, complex models have four potential disadvantages: (1) they may require more model assumptions, which may not hold in real-world problems; (2) they may involve many model parameters that need tuning; (3) they may be harder to interpret; and (4) they may require more computational resources. In this work, we propose PCLDA, a method built on two of the simplest statistical models, principal component analysis and linear discriminant analysis, which do not suffer from the problems mentioned above. We show that, on real data, PCLDA's performance is not inferior to that of more complex methods. The key message of this work is to use simple statistical methods when they can solve the problem, avoiding unnecessary complexity.
Publisher
Research Square Platform LLC