Visualization

How to add boxplots or density plots side-by-side a scatterplot: a single cell case study

introduce ggside using single cell data The ggside R package provides a new way to visualize data by combining the flexibility of ggplot2 with the power of side-by-side plots. We will use a single cell dataset to demonstrate its usage. ggside allows users to create side-by-side plots of multiple variables, such as gene expression, cell type, and experimental conditions. This can be helpful for identifying patterns and trends in scRNA-seq data that would be difficult to see in individual plots.

How to deal with overplotting without being fooled

Sign up for my newsletter to not miss a post like this https://divingintogeneticsandgenomics.ck.page/newsletter The problem Let me be clear, when you have gazillions of data points in a scatter plot, you want to deal with the overplotting to avoid drawing misleading conclusions. Let’s start with a single-cell example. Load the libraries: library(dplyr) library(Seurat) library(patchwork) library(ggplot2) library(ComplexHeatmap) library(SeuratData) set.seed(1234) prepare the data data("pbmc3k") pbmc3k #> An object of class Seurat #> 13714 features across 2700 samples within 1 assay #> Active assay: RNA (13714 features, 0 variable features) ## routine processing pbmc3k<- pbmc3k %>% NormalizeData(normalization.

clustered dotplot for single-cell RNAseq

Dotplot is a nice way to visualize scRNAseq expression data across clusters. It gives information (by color) for the average expression level across cells within the cluster and the percentage (by size of the dot) of the cells express that gene within the cluster. Seurat has a nice function for that. However, it can not do the clustering for the rows and columns. David McGaughey has written a blog post using ggplot2 and ggtree from Guangchuang Yu.

stacked violin plot for visualizing single-cell data in Seurat

In scanpy, there is a function to create a stacked violin plot. There is no such function in Seurat, and many people were asking for this feature. e.g. https://github.com/satijalab/seurat/issues/300 https://github.com/satijalab/seurat/issues/463 The developers have not implemented this feature yet. In this post, I am trying to make a stacked violin plot in Seurat. The idea is to create a violin plot per gene using the VlnPlot in Seurat, then customize the axis text/tick and reduce the margin for each plot and finally concatenate by cowplot::plot_grid or patchwork::wrap_plots.

Align multiple ggplot2 plots by axis

I used to use cowplot to align multiple ggplot2 plots but when the x-axis are of different ranges, some extra work is needed to align the axis as well. The other day I was reading a blog post by GuangChuang Yu and he exactly tackled this problem. His packages such as ChIPseeker, ClusterProfiler, ggtree are quite popular among the users. Some dummy example from his post: library(dplyr) library(ggplot2) library(ggstance) library(cowplot) # devtools::install_github("YuLab-SMU/treeio") # devtools::install_github("YuLab-SMU/ggtree") library(tidytree) library(ggtree) no_legend=theme(legend.

My opinionated selection of books/urls for bioinformatics/data science curriculum

There was a paper on this topic: A New Online Computational Biology Curriculum. I am going to provide a biased list below (I have read most of the books if not all). I say it is biased because you will see many books of R are from Hadely Wickham. I now use tidyverse most of the time. Unix I suggest people who want to learn bioinformatics starting to learn unix commands first.

plot 10x scATAC coverage by cluster/group

This post was inspired by Andrew Hill’s recent blog post. Inspired by some nice posts by @timoast and @tangming2005 and work from @10xGenomics. Would still definitely have to split BAM files for other tasks, so easy to use tools for that are super useful too! — Andrew J Hill (@ahill_tweets) April 13, 2019 Andrew wrote that blog post in light of my other recent blog post and Tim’s (developer of the almighty Seurat package) blog post.

A tale of two heatmap functions

You probably do not understand heatmap! Please read You probably don’t understand heatmaps by Mick Watson In the blog post, Mick used heatmap function in the stats package, I will try to walk you through comparing heatmap, and heatmap.2 from gplots package. Before I start, I want to quote this: “The defaults of almost every heat map function in R does the hierarchical clustering first, then scales the rows then displays the image”