View source: R/phylolmMethods.R
phylolm.createRmd | R Documentation |
.Rmd
file containing code to perform differential expression analysis with phylolm
.A function to generate code that can be run to perform differential expression analysis of RNAseq data (comparing two conditions) using the phylolm package. The code is written to a .Rmd
file. This function is generally not called by the user, the main interface for performing differential expression analysis is the runDiffExp
function.
phylolm.createRmd(
data.path,
result.path,
codefile,
norm.method,
model = "BM",
measurement_error = TRUE,
extra.design.covariates = NULL,
length.normalization = "RPKM",
data.transformation = "log2",
...
)
data.path |
The path to a .rds file containing the |
result.path |
The path to the file where the result object will be saved. |
codefile |
The path to the file where the code will be written. |
norm.method |
The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the |
model |
The model for trait evolution on the tree. Default to "BM". |
measurement_error |
A logical value indicating whether there is measurement error. Default to TRUE. |
extra.design.covariates |
A vector containing the names of extra control variables to be passed to the design matrix of |
length.normalization |
one of "none" (no correction), "TPM" or "RPKM" (default). See details. |
data.transformation |
one of "log2", "asin(sqrt)" or "sqrt". Data transformation to apply to the normalized data. |
... |
Further arguments to be passed to function |
For more information about the methods and the interpretation of the parameters, see the phylolm
package and the corresponding publications.
The length.matrix
field of the phyloCompData
object
is used to normalize the counts, using one of the following formulas:
* length.normalization="none"
: CPM_{gi} = \frac{N_{gi} + 0.5}{NF_i \times \sum_{g} N_{gi} + 1} \times 10^6
* length.normalization="TPM"
: TPM_{gi} = \frac{(N_{gi} + 0.5) / L_{gi}}{NF_i \times \sum_{g} N_{gi}/L_{gi} + 1} \times 10^6
* length.normalization="RPKM"
: RPKM_{gi} = \frac{(N_{gi} + 0.5) / L_{gi}}{NF_i \times \sum_{g} N_{gi} + 1} \times 10^9
where N_{gi}
is the count for gene g and sample i,
where L_{gi}
is the length of gene g in sample i,
and NF_i
is the normalization for sample i,
normalized using calcNormFactors
of the edgeR
package.
The function specified by the data.transformation
is then applied
to the normalized count matrix.
The "+0.5
" and "+1
" are taken from Law et al 2014,
and dropped from the normalization
when the transformation is something else than log2
.
The "\times 10^6
" and "\times 10^9
" factors are omitted when
the asin(sqrt)
transformation is taken, as asin
can only
be applied to real numbers smaller than 1.
The design
model used in the phylolm
uses the "condition" column of the sample.annotations
data frame from the phyloCompData
object
as well as all the covariates named in extra.design.covariates
.
For example, if extra.design.covariates = c("var1", "var2")
, then
sample.annotations
must have two columns named "var1" and "var2", and the design formula
in the phylolm
function will be:
~ condition + var1 + var2
.
The function generates a .Rmd
file containing the code for performing the differential expression analysis. This file can be executed using e.g. the knitr
package.
Charlotte Soneson, Paul Bastide, Mélina Gallopin
Ho, L. S. T. and Ane, C. 2014. "A linear-time algorithm for Gaussian and non-Gaussian trait evolution models". Systematic Biology 63(3):397-408.
Law, C.W., Chen, Y., Shi, W. et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29.
Musser, JM, Wagner, GP. (2015): Character trees from transcriptome data: Origin and individuation of morphological characters and the so‐called “species signal”. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 588– 604.
try(
if (require(ape) && require(phylolm)) {
tmpdir <- normalizePath(tempdir(), winslash = "/")
set.seed(20200317)
tree <- rphylo(10, 0.1, 0)
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000,
samples.per.cond = 5, n.diffexp = 100,
tree = tree,
id.species = 1:10,
lengths.relmeans = rpois(1000, 1000),
lengths.dispersions = rgamma(1000, 1, 1),
output.file = file.path(tmpdir, "mydata.rds"))
## Add covariates
## Model fitted is count.matrix ~ condition + test_factor + test_reg
sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2, each = 5))
sample.annotations(mydata.obj)$test_reg <- rnorm(10, 0, 1)
saveRDS(mydata.obj, file.path(tmpdir, "mydata.rds"))
## Diff Exp
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "DESeq2",
Rmdfunction = "phylolm.createRmd",
output.directory = tmpdir,
norm.method = "TMM",
extra.design.covariates = c("test_factor", "test_reg"),
length.normalization = "RPKM")
})
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.