salmon2deseq: Initialize a DESeq2 object from salmon output.

View source: R/transcriptome.R

Description

Initialize a DESeq2 object from salmon output.

Usage

salmon2deseq(
  salmon_file_list,
  sampleFile,
  design,
  covariate = NULL,
  tx2gene = NULL,
  filter = NULL,
  rundeseq = T
)

Arguments

salmon_file_list

A two-column file: the first column holds sample names and the second holds the path to each quant.sf file generated by salmon. A header line is required, but the column names do not matter.

Below is an example (pay attention to the path of quant.sf):

Samp    path_quant.sf
untrt_N61311    untrt_N61311/untrt_N61311.salmon.count/quant.sf
untrt_N052611    untrt_N052611/untrt_N052611.salmon.count/quant.sf
untrt_N080611    untrt_N080611/untrt_N080611.salmon.count/quant.sf
untrt_N061011    untrt_N061011/untrt_N061011.salmon.count/quant.sf
trt_N61311    trt_N61311/trt_N61311.salmon.count/quant.sf
trt_N052611    trt_N052611/trt_N052611.salmon.count/quant.sf
trt_N080611    trt_N080611/trt_N080611.salmon.count/quant.sf
trt_N061011    trt_N061011/trt_N061011.salmon.count/quant.sf
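
A file like the one above can be generated rather than typed by hand. The following is a minimal sketch, assuming the directory layout shown in the example and a tab-separated file; the sample names and the output file name are placeholders, not anything the package prescribes.

```r
# Hypothetical sample names; replace with your own.
samples <- c("untrt_N61311", "untrt_N052611", "trt_N61311", "trt_N052611")

# Assumed layout: <sample>/<sample>.salmon.count/quant.sf, as in the example.
quant_paths <- file.path(samples, paste0(samples, ".salmon.count"), "quant.sf")

salmon_list <- data.frame(Samp = samples, path_quant.sf = quant_paths,
                          stringsAsFactors = FALSE)

# Write with a header line, tab-separated (an assumption), no quotes, no row names.
salmon_file_list <- "salmon_file_list.txt"  # hypothetical file name
write.table(salmon_list, salmon_file_list, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Calling file.exists(salmon_list$path_quant.sf) before running the analysis is a quick way to catch a wrong path.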

sampleFile

A file containing at least two columns. The first column holds sample names, matching the first column of salmon_file_list. The remaining columns are sample attributes. Normally one of these attributes should give the group each sample belongs to.

One simple example (conditions represent group information)

Samp    conditions
untrt_N61311    untrt
untrt_N052611    untrt
untrt_N080611    untrt
untrt_N061011    untrt
trt_N61311    trt
trt_N052611    trt
trt_N080611    trt
trt_N061011    trt

Another example (3rd column meaning samples from two batches)

Samp    conditions  batch
untrt_N61311    untrt A
untrt_N052611    untrt A
untrt_N080611    untrt B
untrt_N061011    untrt B
trt_N61311    trt A
trt_N052611    trt A
trt_N080611    trt B
trt_N061011    trt B
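
Because both files are keyed by sample name, it can help to confirm that the names agree before running the analysis. This is a minimal sketch using small in-memory stand-ins for the two files; it assumes tab-separated input and is not part of the package itself.

```r
# In-memory stand-ins for the two files; real files would be read
# with read.delim("path/to/file").
salmon_list <- read.delim(text = "Samp\tpath_quant.sf
untrt_N61311\tuntrt_N61311/untrt_N61311.salmon.count/quant.sf
trt_N61311\ttrt_N61311/trt_N61311.salmon.count/quant.sf",
                          stringsAsFactors = FALSE)

sample_info <- read.delim(text = "Samp\tconditions\tbatch
untrt_N61311\tuntrt\tA
trt_N61311\ttrt\tA",
                          stringsAsFactors = FALSE)

# Compare the first column of each table, whatever its header is called.
missing_info  <- setdiff(salmon_list[[1]], sample_info[[1]])
missing_quant <- setdiff(sample_info[[1]], salmon_list[[1]])

if (length(missing_info) > 0 || length(missing_quant) > 0) {
  stop("Sample names differ between the two files: ",
       paste(c(missing_info, missing_quant), collapse = ", "))
}
```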

design

A column name from "sampleFile", such as "conditions" in the examples above. It is used as the group variable for DE tests. Currently only a simple design is allowed. To model multiple variables, constructing a single combined "super variable", as described in https://support.bioconductor.org/p/67600/#67612, may be useful.
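
One common way to build such a combined variable is to paste the levels of two columns together. The sketch below assumes that this is what the linked post describes; the column name "group" and the output file name are hypothetical.

```r
# A small sample table with two attributes per sample.
sample_info <- data.frame(
  Samp       = c("untrt_N61311", "untrt_N080611", "trt_N61311", "trt_N080611"),
  conditions = c("untrt", "untrt", "trt", "trt"),
  batch      = c("A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Combine the two variables into one grouping column (hypothetical name).
sample_info$group <- paste(sample_info$conditions, sample_info$batch, sep = "_")

# Save as a new sample file; "group" can then be given as the design column.
write.table(sample_info, "sampleFile_grouped.txt", sep = "\t",
            quote = FALSE, row.names = FALSE)
```

With this file, each combination of condition and batch becomes its own group, for example "trt_A" and "untrt_B".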

covariate

Names of columns containing information to use as covariates, such as batch effects or other sample attributes. Multiple covariates should be supplied as a vector.

tx2gene

Optional; only used if one wants gene-level expression instead of transcript-level expression. A two-column file with transcript names in the first column and gene names in the second. A header line is required, but the column names do not matter.

Below is an example of file contents.

txname    gene
ENST00000456328    ENSG00000223972
ENST00000450305    ENSG00000223972
ENST00000488147    ENSG00000227232
ENST00000619216    ENSG00000278267
ENST00000473358    ENSG00000243485

filter

Filter out genes with low read counts. By default, genes whose total read count is lower than half the number of samples are filtered out. Any number can be given here; the default is normally fine. DESeq2 also applies its own automatic filtering. See https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html.
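
The default rule described above can be illustrated on a toy count matrix. This is only a sketch of the stated rule, not the package's actual code; whether a total exactly equal to the threshold is kept is an assumption.

```r
# Toy count matrix: 3 genes (rows) by 4 samples (columns).
counts <- matrix(c( 0,  0,  0,  1,
                    1,  1,  1,  1,
                   10, 20, 30, 40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("S", 1:4)))

# Default threshold as described: half the number of samples.
threshold <- ncol(counts) / 2

# Drop genes whose total count is lower than the threshold.
keep <- rowSums(counts) >= threshold
filtered <- counts[keep, , drop = FALSE]
```

Here geneA has a total count of 1, which is below the threshold of 2, so only geneB and geneC remain.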

rundeseq

Default TRUE. The function performs the differential expression analysis using DESeq and returns the analyzed DESeqDataSet object. If FALSE, it just returns a DESeqDataSet object, on which one can run DESeq with more customized parameters.
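
As an illustration of running DESeq yourself with custom parameters, the sketch below uses a simulated dataset from DESeq2's makeExampleDESeqDataSet as a stand-in for the object returned when rundeseq is FALSE. The choice of fitType and alpha is arbitrary and only shows where such settings go.

```r
library(DESeq2)

# Stand-in for the unanalyzed object returned with rundeseq = FALSE:
# a simulated DESeqDataSet with 200 genes, 8 samples and design ~ condition.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 8)

# Run DESeq with a non-default setting, here a local dispersion fit.
dds <- DESeq(dds, fitType = "local", quiet = TRUE)

# Extract the results table; alpha is the adjusted p-value cutoff
# that DESeq2 uses when optimizing its independent filtering.
res <- results(dds, alpha = 0.05)
summary(res)
```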

Value

A DESeqDataSet object.

Examples

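# Transcript-level analysis, grouping samples by the "conditions" column
dds <- salmon2deseq(salmon_file_list, sampleFile, "conditions")
# Gene-level analysis: also supply the transcript-to-gene mapping file
dds <- salmon2deseq(salmon_file_list, sampleFile, "conditions",
                    tx2gene = tx2gene)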

Tong-Chen/YSX documentation built on Jan. 25, 2021, 2:49 a.m.