Machine learning

Partial least square regression for marker gene identification in scRNAseq data

This is an extension of my last blog post marker gene selection using logistic regression and regularization for scRNAseq. Let’s use the same PBMC single-cell RNAseq data as an example. Load libraries library(Seurat) library(tidyverse) library(tidymodels) library(scCustomize) # for plotting library(patchwork) Preprocess the data

Load the PBMC dataset pbmc.data <- Read10X(data.dir = "~/blog_data/filtered_gene_bc_matrices/hg19/") # Initialize the Seurat object with the raw (non-normalized data). pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.

marker gene selection using logistic regression and regularization for scRNAseq

why this blog post? I saw a biorxiv paper titled A comparison of marker gene selection methods for single-cell RNA sequencing data Our results highlight the efficacy of simple methods, especially the Wilcoxon rank-sum test, Student’s t-test and logistic regression I am interested in using logistic regression to find marker genes and want to try fitting the model in the tidymodel ecosystem and using different regularization methods.