Calculate scATACseq TSS enrichment score

TSS enrichment score serves as an important quality control metric for ATACseq data. I want to write a script for single cell ATACseq data. From the Encode page: Transcription Start Site (TSS) Enrichment Score - The TSS enrichment calculation is a signal to noise calculation. The reads around a reference set of TSSs are collected to form an aggregate distribution of reads centered on the TSSs and extending to 1000 bp in either direction (for a total of 2000bp).

clustering scATACseq data: the TF-IDF way

scATACseq data are very sparse. It is sparser than scRNAseq. To do clustering of scATACseq data, there are some preprocessing steps need to be done. I want to reproduce what has been done after reading the method section of these two recent scATACseq paper: A Single-Cell Atlas of In Vivo Mammalian Chromatin Accessibility Darren Cell 2018 Latent Semantic Indexing Cluster Analysis In order to get an initial sense of the relationship between individual cells, we first broke the genome into 5kb windows and then scored each cell for any insertions in these windows, generating a large, sparse, binary matrix of 5kb windows by cells for each tissue.

plot 10x scATAC coverage by cluster/group

This post was inspired by Andrew Hill’s recent blog post. Inspired by some nice posts by @timoast and @tangming2005 and work from @10xGenomics. Would still definitely have to split BAM files for other tasks, so easy to use tools for that are super useful too! — Andrew J Hill (@ahill_tweets) April 13, 2019 Andrew wrote that blog post in light of my other recent blog post and Tim’s (developer of the almighty Seurat package) blog post.

Use docopt to write command line R utilities

I was writing an R script to plot the ATACseq fragment length distribution and wanted to turn the R script to a command line utility. I then (re)discovered this awesome docopt.R. One just needs to write the help message the you want to display and docopt() will parse the options, arguments and return a named list which can be accessed inside the R script. check for more information as well.

Split a 10xscATAC bam file by cluster

I want to split the PBMC scATAC bam from 10x by cluster id. So, I can then make a bigwig for each cluster to visualize in IGV. The first thing I did was googling to see if anyone has written such a tool (Do not reinvent the wheels!). People have done that because I saw figures from the scATAC papers. I just could not find it. Maybe I need to refine my googling skills.

understand 10x scRNAseq and scATAC fastqs

single cell RNAseq Please read the following posts by Dave Tang. When I google, I always find his posts on top of the pages. Thanks for sharing your knowledge. From the 10x manual: The final Single Cell 3’ Libraries contain the P5 and P7 primers used in Illumina bridge amplification PCR. The 10x Barcode and Read 1 (primer site for sequencing read 1) is added to the molecules during the GEMRT incubation.