Bioinformatics

Reuse the single cell data! How to create a seurat object from GEO datasets

Download the data https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE116256 cd data/GSE116256 wget https://ftp.ncbi.nlm.nih.gov/geo/series/GSE116nnn/GSE116256/suppl/GSE116256_RAW.tar tar xvf GSE116256_RAW.tar rm GSE116256_RAW.tar Depending on how the authors upload their data. Some authors may just upload the merged count matrix file. This is the easiest situation. In this dataset, each sample has a separate set of matrix (*dem.txt.gz), features and barcodes. Total, there are 43 samples. For each sample, it has an associated metadata file (*anno.txt.gz) too. You can inspect the files in command line:

10 single-cell data benchmarking papers

I tweeted it at https://twitter.com/tangming2005/status/1679120948140572672 I got asked to put all my posts in a central place and I think it is a good idea. And here it is! Benchmarking integration of single-cell differential expression Benchmarking atlas-level data integration in single-cell genomics A review of computational strategies for denoising and imputation of single-cell transcriptomic data Benchmarking spatial and single-cell transcriptomics integration methods for transcript distribution prediction and cell type deconvolution

Has AI changed the course of Drug Development?

What’s the drug development process? Has AI changed the course of Drug Development? To answer this question, we need first to understand the drug development process. The whole process includes the following: target identification target pharmacology and biomarker development lead identification, lead optimization Clinical research & development regulatory review of IND (investigational new drug) and later phase clinical trials post-marketing knowledge Biologics/antibodies drug development follows a similar path (you can find the map in the same link).

How to add boxplots or density plots side-by-side a scatterplot: a single cell case study

introduce ggside using single cell data The ggside R package provides a new way to visualize data by combining the flexibility of ggplot2 with the power of side-by-side plots. We will use a single cell dataset to demonstrate its usage. ggside allows users to create side-by-side plots of multiple variables, such as gene expression, cell type, and experimental conditions. This can be helpful for identifying patterns and trends in scRNA-seq data that would be difficult to see in individual plots.

Unlock the Power of Genomics Data Analysis: Watershed's Seamless Cloud Computing Solution

Disclaimer: This post is sponsored by Watershed Omics Bench platform. I have personally tested the platform. The opinions and views expressed in this post are solely those of the author and do not represent the views of my employer As an experienced bioinformatician who understands the needs of biotech startups, I know the challenges that arise when analyzing genomics data. The first solution that comes to mind is cloud computing. Unsurprisingly, AWS and Google Cloud Platform (GCP) are commonly used options.

How to do neighborhood/cellular niches analysis with spatial transcriptome data

Sign up for my newsletter to not miss a post like this https://divingintogeneticsandgenomics.ck.page/newsletter In a previous blog post, I showed you how to make a Seurat spatial object from Vizgen spatial transcriptome data. In this post, I am going to show you how to identify clusters of neighborhood or cellular niches where specific cell types tend to co-localize. read in the data and pre-process library(Seurat) library(here) library(ggplot2) library(dplyr) # the LoadVizgen function requires the raw segmentation files which is too big.

How to construct a spatial object in Seurat

Sign up for my newsletter to not miss a post like this https://divingintogeneticsandgenomics.ck.page/newsletter Single-cell spatial transcriptome data is a new and advanced technology that combines the study of individual cells’ genes and their location in a tissue to understand the complex cellular and molecular differences within it. This allows scientists to investigate how genes are expressed and how cells interact with each other with much greater detail than before.

Deep learning to predict cancer from healthy controls using TCRseq data

Sign up for my newsletter to not miss a post like this https://divingintogeneticsandgenomics.ck.page/newsletter The T-cell receptor (TCR) is a special molecule found on the surface of a type of immune cell called a T-cell. Think of T-cells like soldiers in your body’s defense system that help identify and attack foreign invaders like viruses and bacteria. The TCR is like a sensor or antenna that allows T-cells to recognize specific targets, kind of like how a key fits into a lock.

How to deal with overplotting without being fooled

Sign up for my newsletter to not miss a post like this https://divingintogeneticsandgenomics.ck.page/newsletter The problem Let me be clear, when you have gazillions of data points in a scatter plot, you want to deal with the overplotting to avoid drawing misleading conclusions. Let’s start with a single-cell example. Load the libraries: library(dplyr) library(Seurat) library(patchwork) library(ggplot2) library(ComplexHeatmap) library(SeuratData) set.seed(1234) prepare the data data("pbmc3k") pbmc3k #> An object of class Seurat #> 13714 features across 2700 samples within 1 assay #> Active assay: RNA (13714 features, 0 variable features) ## routine processing pbmc3k<- pbmc3k %>% NormalizeData(normalization.

How to do gene correlation for single-cell RNAseq data (part 1)

Load libraries library(dplyr) library(Seurat) library(patchwork) library(ggplot2) library(ComplexHeatmap) library(SeuratData) set.seed(1234) prepare the data data("pbmc3k") pbmc3k #> An object of class Seurat #> 13714 features across 2700 samples within 1 assay #> Active assay: RNA (13714 features, 0 variable features) ## routine processing pbmc3k<- pbmc3k %>% NormalizeData(normalization.method = "LogNormalize", scale.factor = 10000) %>% FindVariableFeatures(selection.method = "vst", nfeatures = 2000) %>% ScaleData() %>% RunPCA(verbose = FALSE) %>% FindNeighbors(dims = 1:10, verbose = FALSE) %>% FindClusters(resolution = 0.