I asked this question on Twitter:
What test should I use to check whether two distributions are different? I am aware of the KS test. When n is large (which is common in genomic studies), the p-value is almost always significant. Is it better to test against an effect size? How would I do that in this context?
In genomics studies, it is very common to have a large N (e.g., the number of introns or promoters in the genome, or the number of cells in single-cell studies).
I asked this question on Twitter.
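To make the large-N concern concrete, here is a minimal sketch in base R. The sample size, the 0.05 shift in means, and the use of Cohen's d as the effect size are all assumptions chosen for illustration, not something taken from the Twitter thread. The idea is to report an effect size next to the KS p-value, since the p-value alone mostly reflects how large n is.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

n <- 1e5                             # large n, as in genome-wide data (assumed)
a <- rnorm(n, mean = 0,    sd = 1)   # group A
b <- rnorm(n, mean = 0.05, sd = 1)   # group B: hypothetical, very small shift

# Two-sample Kolmogorov-Smirnov test
ks <- ks.test(a, b)

# Cohen's d: difference in means divided by the pooled standard deviation
pooled_sd <- sqrt((var(a) + var(b)) / 2)
cohens_d  <- (mean(b) - mean(a)) / pooled_sd

ks$p.value     # expected to be tiny despite the small shift
ks$statistic   # D: the largest gap between the two empirical CDFs
cohens_d       # expected to be near the simulated shift of 0.05
```

The KS statistic D is itself an effect size (the maximum distance between the two empirical CDFs), so one option is to judge D or Cohen's d against a threshold that is biologically meaningful rather than relying on the p-value.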
Load the package:
library(tidyverse)
Make some dummy data.
The dummy example: We have two groups of samples: diseased and healthy. We treat those cells in vitro with different dosages (0, 1, 5) of a chemical X and count the cells after 3 hours.
x <- tibble( '0' = c(8.66, 11.50, 7.01, 13.40, 11.30, 8.13, 5.92, 7.54), '1' = c(22.10, 23.00, 22.00, 35.70, 32.
I am taking the STATE-80 course from Harvard Extension School. This course teaches commonly used distributions and probability theory. The instructor, Hatch, is a really good teacher, and he uses simulations for all the demonstrations alongside the formulas.
In week 6, we revisited the Monty Hall problem which we played on the first day of class.
If you have not heard of it, here is a quote from Wikipedia:
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats.
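In the spirit of the simulation-based demonstrations from the course, the game can be sketched in a few lines of base R. This is an illustrative sketch rather than the code used in class. It assumes the standard rules: the host knows where the car is and always opens a goat door that the player did not pick. The function name `play_once` and the number of games are hypothetical choices.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

play_once <- function() {
  doors <- 1:3
  car   <- sample.int(3, 1)   # door hiding the car
  pick  <- sample.int(3, 1)   # player's first choice

  # The host opens a goat door that is neither the pick nor the car.
  # Index with sample.int() because sample() on a length-1 vector
  # would sample from 1:x instead of returning x.
  host_options <- setdiff(doors, c(car, pick))
  opened <- host_options[sample.int(length(host_options), 1)]

  # Switching means taking the one remaining closed door
  switched <- setdiff(doors, c(pick, opened))

  c(stay = pick == car, switch = switched == car)
}

results   <- replicate(10000, play_once())  # 2 x 10000 logical matrix
win_rates <- rowMeans(results)
win_rates  # staying should win about 1/3 of games, switching about 2/3
```

Switching wins exactly when the first pick was a goat, which happens with probability 2/3, so the simulated rates should land close to 1/3 and 2/3.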
I used to use cowplot to align multiple ggplot2 plots, but when the x-axes have different ranges, some extra work is needed to align the axes as well.
The other day I was reading a blog post by Guangchuang Yu, and he tackled exactly this problem. His packages, such as ChIPseeker, clusterProfiler, and ggtree, are quite popular among users.
A dummy example from his post:
library(dplyr)
library(ggplot2)
library(ggstance)
library(cowplot)
# devtools::install_github("YuLab-SMU/treeio")
# devtools::install_github("YuLab-SMU/ggtree")
library(tidytree)
library(ggtree)
no_legend=theme(legend.