Bioinformatics

Do not repeat yourself: List column to do RNAseq differential expression analysis

To not miss a post like this, sign up for my newsletter to learn computational biology and bioinformatics. In this blog post, I am going to show you how to use list columns and purrr::map(), a powerful combination for avoiding repetition in your bioinformatics analysis. To demonstrate, we will use RNAseq differential expression analysis and pathway enrichment analysis as an example. Let’s use a real example https://www.

Reproducible Computing in Bioinformatics: Lessons from My Latest Talk

Hey everyone, it’s Tommy here. If you’ve been following my blog or my Twitter/X (@tangming2005), you know I love diving into the practical side of bioinformatics and genomics. Recently, I gave a talk titled “Good Enough Practices for Reproducible Computing” at Moderna, where I spent a good chunk of time chatting about reproducible computing. Why? Because in our field, where data is exploding and analyses get complex, making sure your work can be repeated—by you or anyone else—is a game-changer.

How to create a GenomicRanges object in Bioconductor using canonical transcripts

Introduction to Annotation Data Packages in Bioconductor: Accurate gene and transcript annotation is the foundation of many bioinformatics workflows, including RNA-seq analysis, functional genomics, and variant annotation. In the R/Bioconductor ecosystem, dedicated annotation data packages make it easy for researchers to access, query, and leverage gene models sourced from major biological databases.

Mastering Bioinformatics in the Age of AI: Foundational Skills for the Modern Scientist

AI is transforming every field, and bioinformatics is no exception. From designing drug molecules in minutes to writing entire pipelines, generative AI is making it faster than ever to process biological data. But here’s the truth: AI doesn’t understand biology; you do. That’s why, in this new era, your value isn’t replaced by AI. It’s multiplied by your ability to judge, validate, and improve what AI produces.

How Cancer Drugs Really Work

Suna was 28. She had melanoma. Chemo left her wrecked: her hair gone, her strength gone, and her hope fading. Doctors gave her weeks. Then they tried something different. It didn’t poison the tumor. It didn’t cut or burn. It woke up her immune system. Her own T-cells found the cancer. Attacked it. Killed it.

How to calculate partial correlation controlling cancer types

What is partial correlation? Partial correlation measures the relationship between two variables while controlling for the effect of one or more other variables. Suppose you want to know how X and Y are related, independent of how both are influenced by Z. Partial correlation helps answer: if we remove the influence of Z, is there still a connection between X and Y?
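As a rough illustration of the idea (a minimal sketch in Python, not necessarily how the full post computes it), the partial correlation of X and Y given a single variable Z can be obtained from the three pairwise Pearson correlations. The toy vectors and the function names pearson and partial_corr are made up for this example.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (hypothetical): x and y both track z, plus unrelated noise.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [1, -1, 1, -1, 1, -1, 1, -1]
noise_y = [1, 1, -1, -1, 1, 1, -1, -1]
x = [zi + e for zi, e in zip(z, noise_x)]
y = [zi + e for zi, e in zip(z, noise_y)]

print("raw correlation of x and y:  ", round(pearson(x, y), 3))
print("partial correlation given z: ", round(partial_corr(x, y, z), 3))
```

In this toy data the raw correlation between x and y is strong only because both follow z; once z is controlled for, the remaining association is close to zero.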

Multi-Omics Integration Strategy and Deep Diving into MOFA2

Today’s guest blog post on multi-omics integration is written by Aditi Qamra and edited by Tommy. If you want to write a guest post for my blog, which gets 30k views per month, feel free to contact me on LinkedIn. Aditi is a senior data scientist working on biomarker discovery and early product development at Roche, using multimodal clinical and genomic data. She has a PhD and postdoc in epigenomics of solid tumors and enjoys upskilling herself in stats topics.

How I Would Learn Bioinformatics From Scratch 12 Years Later: A Roadmap

You Can Change Your Appetites. Linear algebra, statistics, machine learning: these used to feel abstract to me. I had zero bioinformatics experience when I was doing my PhD in a wet lab. I memorized formulas without truly understanding them. But over time, I found the right resources that made these concepts click, especially in the context of bioinformatics. Six years ago, I wrote a blog post, "My opinionated selection of books/urls for bioinformatics/data science curriculum", and many of its links are now broken.

PCA analysis on TCGA bulk RNAseq data continued

In my last blog post, I showed you how to download TCGA RNAseq count data, run PCA, and make a heatmap. It is interesting to see some of the LUSC samples mix with the LUAD samples and vice versa. In this post, we will continue to use PCA to do more exploratory data analysis (EDA).

PCA analysis on scATACseq data

In my last post, I showed you how to use PCA for bulk RNAseq data. Today, let’s see how we can use it for scATACseq data. Download the example dataset from 10x Genomics: https://support.10xgenomics.com/single-cell-atac/datasets/1.1.0/atac_pbmc_5k_v1. The dataset is 5k peripheral blood mononuclear cells (PBMCs) from a healthy donor (v1.0). Download the atac_pbmc_5k_v1_filtered_peak_bc_matrix.tar.gz file and unzip it.