Bioinformatics

Part 2 CITE-seq downstream analysis: From Alevin output to Seurat visualization

In my last post, I showed you how to get the protein and RNA counts from a CITE-seq experiment using Simpleaf. Now that we have the raw count matrices, we are ready to explore them within R. To follow the tutorial, you can download the associated data from here.
Load the data
    suppressPackageStartupMessages({
      library(fishpond)
      library(ggplot2)
      library(dplyr)
      library(SingleCellExperiment)
      library(Seurat)
      library(DropletUtils)
    })
    # set the seed
    set.seed(123)
    #gex_q <- loadFry('~/blog_data/CITEseq/alevin_rna/af_quant')
    #fb_q <- loadFry( '~/blog_data/CITEseq/alevin_adt/af_quant')
    # I saved the above objs first to rds files, now just read them back
    fb_q<- readRDS("~/blog_data/CITEseq/fb_q.

Part 1 How to use Salmon/Alevin to preprocess CITE-seq data

Introduction
A state-of-the-art method called CITE-seq (Cellular Indexing of Transcriptomes and Epitopes by Sequencing) allows surface protein levels and RNA expression to be measured simultaneously in individual cells. CITE-seq labels cells with oligo-conjugated antibodies and then uses standard single-cell RNA sequencing to read out both transcriptomic and proteomic information from the same cell. This overcomes the drawbacks of techniques that measure only protein or only RNA. CITE-seq reveals coordinated control of gene and protein activity, offering a powerful multidimensional view of cell states.

My take on Data Challenges in Immuno-oncology, the Role of the Cloud, and Growing a Computational Biology Team

The original link: https://connect.corrdyn.com/blog/ming-tang-on-data-challenges-in-immuno-oncology-the-role-of-the-cloud-and-growing-a-computational-biology-team
Guest Profile
Tommy Tang’s career began when he pursued his Ph.D. in genetics and genomics at the University of Florida. Initially trained in molecular biology in the wet lab, he was driven to explore computational biology after encountering the limitations of traditional analysis methods. Through self-study, Tommy developed skills that enabled him to analyze complex genomic data sets. Following his Ph.D., Tommy joined MD Anderson Cancer Center and later moved to Harvard and the Dana Farber Cancer Institute, where he worked on single-cell RNA sequencing.

How to use random forest as a clustering method

If you ask me: what’s your favorite machine learning algorithm? I would answer logistic regression (with regularization: Lasso, Ridge and Elastic Net) followed by random forest. In fact, that’s the order in which we try those methods. Deep learning can perform well on tabular data with a complicated architecture, while random forest or boosted-tree-based methods usually work well out of the box. Regression and random forest are more interpretable too.

My 4 steps to learn deep learning for genomics

Step 1, get a high-level understanding
Watch StatQuest by Josh Starmer.
3Blue1Brown deep learning playlist
Step 2, code it out!
If you are into Python, watch “The spelled-out intro to neural networks and backpropagation: building micrograd”. I still code in R most of the time, so I walk through the R code in the Deep Learning with R book.

How to convert raw counts to TPM for TCGA data and make a heatmap across cancer types

Sign up for my newsletter to not miss a post like this https://divingintogeneticsandgenomics.ck.page/newsletter The Cancer Genome Atlas (TCGA) project is probably one of the most well-known large-scale cancer sequencing projects. It sequenced ~10,000 treatment-naive tumors across 33 cancer types. Different data types, including whole-exome, whole-genome, copy-number (SNP array), bulk RNAseq, protein expression (Reverse-Phase Protein Array), and DNA methylation, are available. TCGA is a very successful large sequencing project. I highly recommend learning from the organization of it.
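As a rough illustration of the counts-to-TPM conversion named in the title, here is a minimal base R sketch. It is not necessarily the approach taken in the full post. The function name `counts_to_tpm`, the toy matrix, and the gene lengths are all hypothetical; in practice the lengths would come from a gene annotation.

```r
# Minimal sketch: convert a genes x samples raw count matrix to TPM.
# Assumption: gene_length_bp holds one length (in base pairs) per row of counts.
counts_to_tpm <- function(counts, gene_length_bp) {
  stopifnot(is.matrix(counts), nrow(counts) == length(gene_length_bp))
  # Reads per kilobase: each row (gene) is divided by its length in kb
  rate <- counts / (gene_length_bp / 1000)
  # Scale every column (sample) so that it sums to one million
  sweep(rate, 2, colSums(rate), "/") * 1e6
}

# Toy data (made up for illustration only)
counts <- matrix(
  c(10, 20, 30,
     0,  5, 15),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)
gene_length_bp <- c(geneA = 1000, geneB = 2000, geneC = 3000)

tpm <- counts_to_tpm(counts, gene_length_bp)
print(round(tpm))
```

Each column of the result sums to one million, which is what makes TPM values comparable across samples. A sample whose counts are all zero would produce NaN values, so such columns should be filtered out beforehand.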

Predict TCR cancer specificity using 1d convolutional and LSTM neural networks

The T-cell receptor (TCR) is a special molecule found on the surface of a type of immune cell called a T-cell. Think of T-cells like soldiers in your body’s defense system that help identify and attack foreign invaders like viruses and bacteria. The TCR is like a sensor or antenna that allows T-cells to recognize specific targets, kind of like how a key fits into a lock. When the TCR encounters a target it recognizes, it sends signals to the T-cell telling it to attack and destroy the invader.

How to create pseudobulk from single-cell RNAseq data

What is pseudobulk? Many of you have heard about bulk RNA-seq data. What is pseudobulk? Single-cell RNAseq can profile gene expression at single-cell resolution. For differential expression, pseudobulk seems to perform really well (see the paper “muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data”). To create a pseudobulk, one can artificially add up the counts for cells from the same cell type of the same sample. In this blog post, I’ll guide you through the art of creating pseudobulk data from scRNA-seq experiments.
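The idea of adding up counts per cell type and sample can be sketched in base R as follows. This is only an illustration on a small dense matrix, not the code from the full post; the function name `make_pseudobulk` and the toy data are hypothetical, and real single-cell data usually lives in a sparse matrix.

```r
# Minimal sketch: sum counts over all cells that share a sample and a cell type.
# Assumption: counts is a dense genes x cells matrix; sample and cell_type have
# one entry per cell, in the same order as the columns of counts.
make_pseudobulk <- function(counts, sample, cell_type) {
  stopifnot(ncol(counts) == length(sample),
            length(sample) == length(cell_type))
  group <- paste(sample, cell_type, sep = "_")
  # rowsum() adds up rows by group, so transpose to cells x genes and back
  t(rowsum(t(counts), group))
}

# Toy data (made up for illustration only): 2 genes, 6 cells
counts <- matrix(
  1:12, nrow = 2,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:6))
)
sample    <- c("s1", "s1", "s1", "s2", "s2", "s2")
cell_type <- c("T",  "T",  "B",  "T",  "B",  "B")

pb <- make_pseudobulk(counts, sample, cell_type)
print(pb)
```

The result has one column per sample and cell type combination (for example `s1_T`), and the total count is unchanged because every cell contributes to exactly one column.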

Generative AI: Text generation using Long short-term memory (LSTM) model

In the world of deep learning, generating sequence data is a fundamental task. Typically, this involves training a network, often an RNN (Recurrent Neural Network) or a convnet (Convolutional Neural Network), to predict the next token or a sequence of tokens in a given sequence, using the preceding tokens as input. For example, when provided with the input “the cat is on the ma,” the network’s objective is to predict the next character, such as ‘t.
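To make the next-character setup concrete, the following base R sketch cuts a text into fixed-length input windows paired with the character that follows each window. It covers only the data preparation, not the network, and is not the code from the full post; the function name `make_char_pairs` and its parameters `maxlen` and `step` are hypothetical.

```r
# Minimal sketch: build (input window, next character) training pairs.
# Assumption: the text is longer than maxlen characters.
make_char_pairs <- function(text, maxlen, step = 1) {
  chars <- strsplit(text, "")[[1]]
  stopifnot(length(chars) > maxlen)
  # Start position of every window that still has a following character
  starts <- seq(1, length(chars) - maxlen, by = step)
  data.frame(
    input = vapply(
      starts,
      function(i) paste(chars[i:(i + maxlen - 1)], collapse = ""),
      character(1)
    ),
    target = chars[starts + maxlen],
    stringsAsFactors = FALSE
  )
}

text <- "the cat is on the mat"

# One window of 20 characters: the network should predict the final "t"
one_pair <- make_char_pairs(text, maxlen = 20)
print(one_pair)

# Shorter windows yield many overlapping training examples
many_pairs <- make_char_pairs(text, maxlen = 5)
print(head(many_pairs, 3))
```

In a real model each character would then be mapped to an integer index or a one-hot vector before being fed to the network.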

Omics Playground: Derive biological insights from your omics data at your fingertips

Disclaimer: This post is sponsored by the BigOmics platform. I have personally tested the platform. The opinions and views expressed in this post are solely those of the author and do not represent the views of my employer. A brief description of the platform. What challenges could the platform solve? The BigOmics platform, Omics Playground, provides a simplified approach for the effective processing of bulk RNA-seq data and proteomics data, resolving many issues experienced by scientists in the field.