Genomics

Three gotchas when using R for Genomic data analysis

During my daily work with R for genomic data analysis, I encountered several instances that R gives me some (bad) surprises. 1. The devil 1 and 0 coordinate system read detail here https://github.com/crazyhottommy/DNA-seq-analysis#tips-and-lessons-learned-during-my-dna-seq-data-analysis-journey some files such as bed file is 0 based. Two genomic regions: chr1 0 1000 chr1 1001 2000 when you import that bed file into R using rtracklayer::import(), it will become chr1 1 1000 chr1 1000 2000 The function convert it to 1 based internally (R is 1 based unlike python).

Snakemake Pipelines

Snakemake pipelines for processing high-throughput sequencing data