Bioinformatics

transpose single-cell cell x gene dataframe to gene x cell

Single cell matrix is often represented as gene x cell in R/Seurat, but it is represented as cell x gene in python/scanpy. Let’s use a real example to show how to transpose between the two formats. The GEO accession page is at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE154763 Download the data We can use command line to download the count matrix at ftp: https://ftp.ncbi.nlm.nih.gov/geo/series/GSE154nnn/GSE154763/suppl/ wget https://ftp.ncbi.nlm.nih.gov/geo/series/GSE154nnn/GSE154763/suppl/GSE154763_ESCA_normalized_expression.csv.gz -O ~/blog_data/GSE154763_ESCA_normalized_expression.csv.gz # decompress the file gunzip GSE154763_ESCA_normalized_expression.csv.gz # this GEO matrix is cell x gene # take a look by https://www.

How to run dockerized Rstudio server on google cloud

Create a google VM Follow the process using the console https://cloud.google.com/compute/docs/instances/create-start-instance#console_1 Install docker Follow https://docs.docker.com/engine/install/debian/ Note this example for the debian build. If you created your VM using ubuntu as the boot disk, you should follow the ubuntu section https://docs.docker.com/engine/install/ubuntu/. In the GCP VM: sudo apt-get update sudo apt-get install \ ca-certificates \ curl \ gnupg \ lsb-release sudo mkdir -p /etc/apt/keyrings curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg –dearmor -o /etc/apt/keyrings/docker.

Are PDL1 RNA and protein levels correlated in cancer cell lines?

Are protein and RNA levels correlated? This is a big question. see replies to this tweet at https://twitter.com/slavov_n/status/1561844133496512512. In general, RNA and protein abundances should be correlated but there are exceptions of course. Biology is complicated/weird! One of my favorite examples is Hypoxia-inducible factor 1-alpha, HIF-1α. The protein is efficiently degraded in most tissues most of the time unless stabilized by hypoxia.

use random forest and boost trees to find marker genes in scRNAseq data

This is a blog post for a series of posts on marker gene identification using machine learning methods. Read the previous posts: logistic regression and partial least square regression. This blog post will explore the tree based method: random forest and boost trees (gradient boost tree/XGboost). I highly recommend going through https://app.learney.me/maps/StatQuest for related sections by Josh Starmer. Note, all the tree based methods can be used to do both classification and regression.

Partial least square regression for marker gene identification in scRNAseq data

This is an extension of my last blog post marker gene selection using logistic regression and regularization for scRNAseq. Let’s use the same PBMC single-cell RNAseq data as an example. Load libraries library(Seurat) library(tidyverse) library(tidymodels) library(scCustomize) # for plotting library(patchwork) Preprocess the data

Load the PBMC dataset pbmc.data <- Read10X(data.dir = "~/blog_data/filtered_gene_bc_matrices/hg19/") # Initialize the Seurat object with the raw (non-normalized data). pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.

marker gene selection using logistic regression and regularization for scRNAseq

why this blog post? I saw a biorxiv paper titled A comparison of marker gene selection methods for single-cell RNA sequencing data Our results highlight the efficacy of simple methods, especially the Wilcoxon rank-sum test, Student’s t-test and logistic regression I am interested in using logistic regression to find marker genes and want to try fitting the model in the tidymodel ecosystem and using different regularization methods.

CANCER IMMUNOLOGIC DATA COMMONS(CIDC)

The Cancer Immunologic Data Commons (CIDC), hosted by Dana-Farber Cancer Institute, will serve the bioinformatics needs of the network, optimization of data collection methodologies suitable for immune-related biomarkers, data integration and building a biomarker database for the secondary use by the large immuno-oncology community.