DESeq2_ysx: One-step DESeq2 DE gene analysis for salmon output.

View source: R/transcriptome.R

Description

One-step DESeq2 differential expression (DE) gene analysis for salmon output. A reads count matrix is also accepted with "type=readscount".

Usage

DESeq2_ysx(
  file,
  sampleFile,
  design,
  type,
  covariate = NULL,
  tx2gene = NULL,
  filter = NULL,
  output_prefix = "ehbio",
  rlog = T,
  vst = F,
  comparePairFile = NULL,
  padj = 0.05,
  log2FC = 1,
  dropCol = c("lfcSE", "stat")
)

Arguments

file

A file listing the salmon output files if "type=salmon", in the format described in salmon2deseq; or a reads count matrix file if "type=readscount", in the format described in readscount2deseq.

sampleFile

A file containing at least two columns. The first column holds the sample names, matching the first column of salmon_file_list. The other columns are sample attributes. Normally, one of the sample attributes should give the group each sample belongs to.

One simple example (conditions represents the group information):

Samp    conditions
untrt_N61311    untrt
untrt_N052611    untrt
untrt_N080611    untrt
untrt_N061011    untrt
trt_N61311    trt
trt_N052611    trt
trt_N080611    trt
trt_N061011    trt

Another example (the 3rd column indicates that the samples come from two batches):

Samp    conditions    batch
untrt_N61311    untrt    A
untrt_N052611    untrt    A
untrt_N080611    untrt    B
untrt_N061011    untrt    B
trt_N61311    trt    A
trt_N052611    trt    A
trt_N080611    trt    B
trt_N061011    trt    B
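
A minimal sketch of how such a sample file could be written from R. It assumes the file is tab-delimited with a header line, which the text above does not state explicitly; the object names sample_info and sample_file are hypothetical.

```r
# Hypothetical sample sheet reproducing the second example above.
sample_info <- data.frame(
  Samp = c("untrt_N61311", "untrt_N052611", "untrt_N080611", "untrt_N061011",
           "trt_N61311", "trt_N052611", "trt_N080611", "trt_N061011"),
  conditions = rep(c("untrt", "trt"), each = 4),
  batch = rep(c("A", "A", "B", "B"), times = 2),
  stringsAsFactors = FALSE
)

# Assumption: tab-delimited text with a header line and no row names.
sample_file <- file.path(tempdir(), "sampleFile.txt")
write.table(sample_info, sample_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm the layout: sample names first, attributes after.
read.delim(sample_file, stringsAsFactors = FALSE)
```

The resulting path would then be passed as sampleFile, with "conditions" as design and, if desired, "batch" as covariate.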

design

The name of a column in "sampleFile", such as "conditions" in the examples above. It is used as the group variable for DE tests. Currently only simple designs are supported. To model multiple variables, it may be useful to combine them into a single super variable, as described in https://support.bioconductor.org/p/67600/#67612.
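
One common way to build such a super variable is to paste the individual factors together into a single grouping column. The sketch below uses base R only; the column names genotype, treatment and group are hypothetical and not part of this package.

```r
# Hypothetical sample sheet with two variables of interest.
sample_info <- data.frame(
  Samp = c("s1", "s2", "s3", "s4"),
  genotype = c("WT", "WT", "KO", "KO"),
  treatment = c("ctrl", "drug", "ctrl", "drug"),
  stringsAsFactors = FALSE
)

# Combine both variables into one super variable, one level per combination.
sample_info$group <- factor(
  paste(sample_info$genotype, sample_info$treatment, sep = "_")
)

levels(sample_info$group)
```

After saving this table as the sample file, "group" could be given as design so that each genotype and treatment combination is treated as its own group.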

type

The input file type, either "salmon" or "readscount". "tx2gene" currently has no effect when "type=readscount".

covariate

Names of columns containing covariates, such as batch effects or other sample information. Multiple covariates should be supplied as a vector.

tx2gene

Optional; only used if one wants gene-level expression instead of transcript-level expression. A two-column file with transcript names in the first column and gene names in the second. A header line is required, but the column names do not matter.

Below is an example of the file contents.

txname    gene
ENST00000456328    ENSG00000223972
ENST00000450305    ENSG00000223972
ENST00000488147    ENSG00000227232
ENST00000619216    ENSG00000278267
ENST00000473358    ENSG00000243485
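
Before running the analysis, it can help to confirm that a tx2gene file has the expected shape. The following is only a sketch: it assumes a tab-delimited file, and the names tx2gene_file and tx2gene_tab are hypothetical.

```r
# Write the example contents above to a temporary file
# (assumption: tab-delimited).
tx2gene_file <- tempfile(fileext = ".txt")
writeLines(c(
  "txname\tgene",
  "ENST00000456328\tENSG00000223972",
  "ENST00000450305\tENSG00000223972",
  "ENST00000488147\tENSG00000227232",
  "ENST00000619216\tENSG00000278267",
  "ENST00000473358\tENSG00000243485"
), tx2gene_file)

# The header line is required; the column names themselves do not matter.
tx2gene_tab <- read.delim(tx2gene_file, header = TRUE,
                          stringsAsFactors = FALSE)

# Sanity checks: exactly two columns, and no transcript listed twice.
stopifnot(ncol(tx2gene_tab) == 2)
stopifnot(!any(duplicated(tx2gene_tab[[1]])))

# Number of transcripts mapped to each gene.
table(tx2gene_tab[[2]])
```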

filter

Filter out genes with low read counts. By default, genes whose total read count is lower than half the number of samples are filtered out. Any number can be given here; the default is normally fine. DESeq2 also filters automatically; see https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html.
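
The default rule described above can be illustrated on a toy count matrix. This is a base R sketch of the stated rule, not the function's actual code; the matrix and the names counts, min_total, keep and filtered are made up for illustration.

```r
# Toy count matrix: 4 genes (rows) by 4 samples (columns).
counts <- matrix(
  c(0,  0,  1, 0,
    10, 12, 9, 15,
    1,  0,  0, 0,
    3,  2,  4, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:4))
)

# Default threshold as described: half the number of samples.
min_total <- ncol(counts) / 2

# Genes with a total count lower than the threshold are dropped.
keep <- rowSums(counts) >= min_total
filtered <- counts[keep, , drop = FALSE]

rownames(filtered)
```

Here gene1 and gene3 each have a total of 1 read across 4 samples, which is below the threshold of 2, so only gene2 and gene4 remain.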

output_prefix

A string used as the prefix of output file names.

rlog

Get "rlog" transformed values for downstream analyses such as correlation.

vst

Get "vst" transformed values for downstream analyses such as correlation. Normally faster than "rlog".

comparePairFile

Optional. A file listing the sample groups to compare, one pair per line as in the example below. If not given, the function uses the colData information in dds and compares all possible group combinations.

groupA groupB
groupA groupC
groupC groupB
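
If only some comparisons are wanted, a pair file can be generated from the group names and then edited. The sketch below assumes a tab-delimited, two-column file without a header, as the example above suggests; the names groups, pair_tab and pair_file are hypothetical.

```r
# Group names as they would appear in the design column of the sample file.
groups <- c("groupA", "groupB", "groupC")

# All unordered pairs, one pair per row.
pair_tab <- t(combn(groups, 2))

# Assumption: tab-delimited, no header line, no row names.
pair_file <- file.path(tempdir(), "comparePairFile.txt")
write.table(pair_tab, pair_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

readLines(pair_file)
```

The order of the two groups within a line may determine the direction of the reported fold change; that is worth checking on a small data set before relying on it.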

padj

Threshold for the multiple-testing-corrected p-value. Default 0.05.

log2FC

Threshold for the log2-transformed fold change. Default 1.
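
To show how the two thresholds typically work together, here is a base R sketch on a made-up results table. The exact comparison operators used inside the function are not documented here, so the strict and non-strict inequalities below are assumptions; the table res and its values are invented for illustration.

```r
# Made-up results table using DESeq2-style column names.
res <- data.frame(
  ID = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.3, -1.5, 0.4, 1.2),
  padj = c(0.001, 0.03, 0.0001, 0.2),
  stringsAsFactors = FALSE
)

padj_cutoff <- 0.05
log2fc_cutoff <- 1

# Assumed rule: keep genes passing both thresholds, in either direction.
sig <- res[!is.na(res$padj) &
             res$padj < padj_cutoff &
             abs(res$log2FoldChange) >= log2fc_cutoff, ]

sig$ID
```

In this toy table g3 is significant but changes too little, and g4 changes enough but is not significant, so only g1 and g2 pass.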

dropCol

Columns to drop from the final output. Default c("lfcSE", "stat"). Available columns are "ID", "baseMean", "log2FoldChange", "lfcSE", "stat", "pvalue" and "padj". This has no effect other than making the table cleaner.

Examples

DESeq2_ysx(salmon_file_list, sampleFile, conditions, type="salmon")
DESeq2_ysx(count_matrix_file, sampleFile, conditions, type="readscount")

Tong-Chen/YSX documentation built on Jan. 25, 2021, 2:49 a.m.