arseq: Automated RNASeq Analysis Pipeline
In ajitjohnson/arseq: Automated RNA Sequencing Analysis Pipeline

An easy-to-use pipeline for analysing RNA-Seq data. The package currently supports the following analyses: differential gene expression analysis using DESeq2, identification of the most variable genes, PCA, GO enrichment of the differentially expressed genes, KEGG pathway enrichment of the differentially expressed genes, and GSEA.

arseq(
  data,
  meta,
  design,
  contrast,
  dds = NULL,
  qc = TRUE,
  dgea = TRUE,
  species = "hsapiens_gene_ensembl",
  variable.genes = 1000,
  folder.name = "ARSeq",
  custom.gsea = NULL,
  kmeans = 10,
  save.dir = getwd(),
  ensemblmirror = "useast"
)

`data`	Un-normalized counts matrix (do NOT pass in normalized data). The counts table should contain unique gene names as the first column. ENSEMBL IDs are also allowed, but no other type of ID is currently supported. For an example, see `example_data`: head(example_data).
`meta`	A CSV file with information about the samples. It is critical that the columns of the counts matrix and the rows of the metadata are in the same order. The function does not guess which column of the counts matrix belongs to which row of the metadata; these must be provided in a consistent order. For an example, see `example_meta`: head(example_meta).
`design`	The formula that expresses how the counts for each gene depend on your metadata (used to calculate the necessary data for differential gene expression analysis). Check DESeq2 documentation for designing the formula. In general, you pass in a column name (e.g. treatment) of your metadata file or a combination of column names (e.g. treatment + cell_type).
`contrast`	The groups between which you would like to perform differential gene expression analysis. The comparison can be between two groups or between multiple groups and must use the following format: contrast = list(A = c(" "), B = c(" ")). If you are comparing two groups (e.g. control vs treatment), the contrast argument should look like this: contrast = list(A = c("control"), B = c("treatment")). If you have multiple groups to compare (e.g. control vs treatment1 and treatment2), use: contrast = list(A = c("control"), B = c("treatment1", "treatment2")).
`qc`	Logical. When TRUE, the program runs the quality control modules on the entire dataset. If you plan to perform multiple comparisons using the contrast argument, run with qc = TRUE the first time and then set qc = FALSE for the subsequent comparisons to speed up the analysis.
`dgea`	Logical. Parameter to define if differential gene expression analysis is to be performed. Default: TRUE
`species`	Species you want to use. Only applies to converting ENSEMBL IDs to gene names (not to enrichment analysis). To see the available datasets, run library(biomaRt), then mart = useEnsembl('ENSEMBL_MART_ENSEMBL'), then listDatasets(mart). Default: 'hsapiens_gene_ensembl'.
`variable.genes`	numeric: The number of most variable genes to be identified. By default, the program identifies the top 1000 most variable genes.
`folder.name`	Custom folder name that you would like to save your results in.
`custom.gsea`	User-defined gene lists for GSEA. Must be supplied as a data frame with each row as a gene list.
`kmeans`	Integer. Number of clusters (gene modules). The most variable genes and the differentially expressed genes are divided into this number of clusters. Default: 10.
`save.dir`	Directory to save the results in. Default: Working Directory.
`ensemblmirror`	String. Ensembl mirror to use. Accepted values for the mirror argument are: useast, uswest, asia. Default: 'useast'.
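
The requirements stated above for `data`, `meta` and `contrast` can be checked before calling arseq. The following is a minimal sketch in base R only. The toy objects `counts` and `samples`, the column name `gene`, the sample names, and the assumption that sample identifiers are stored as row names of the metadata are all illustrative and not taken from the package.

```r
# Toy un-normalized counts: the first column holds unique gene names,
# the remaining columns are samples (hypothetical names s1..s4).
counts <- data.frame(
  gene = c("TP53", "MYC", "GAPDH"),
  s1 = c(10L, 0L, 250L),
  s2 = c(12L, 3L, 240L),
  s3 = c(55L, 1L, 260L),
  s4 = c(60L, 2L, 255L)
)

# Toy metadata: one row per sample, in the same order as the count columns.
# Assumption: sample identifiers are kept as row names.
samples <- data.frame(
  treatment = c("control", "control", "drug_A", "drug_A"),
  row.names = c("s1", "s2", "s3", "s4")
)

# Gene names in the first column must be unique.
stopifnot(!anyDuplicated(counts$gene))

# Counts must be raw: non-negative whole numbers.
count_values <- as.matrix(counts[, -1])
stopifnot(all(count_values >= 0), all(count_values == round(count_values)))

# Count columns and metadata rows must be in the same order.
stopifnot(identical(colnames(counts)[-1], rownames(samples)))

# Contrast in the documented format; every group must exist in the metadata.
contrast <- list(A = c("control"), B = c("drug_A"))
stopifnot(all(unlist(contrast) %in% samples$treatment))
```

If any check fails, stopifnot() raises an error naming the failed condition, which is easier to diagnose than a silent sample mix-up in the downstream results.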

Differentially expressed genes.

## Not run: 
contrast <- list(A = c("control"), B = c("drug_A"))
arseq(data = example_data, meta = example_meta, design = "treatment", contrast = contrast)

## End(Not run)
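
When several comparisons are run on the same dataset, the `qc` argument description suggests running quality control only once. The sketch below shows one possible way to organise this. The group name "drug_B", the helper `run_all`, and the use of a separate `folder.name` per comparison are assumptions made for illustration, not documented package behaviour.

```r
# Hypothetical set of comparisons; "drug_B" is an assumed group name.
contrasts <- list(
  drug_A = list(A = c("control"), B = c("drug_A")),
  drug_B = list(A = c("control"), B = c("drug_B"))
)

# Run the quality control modules for the first comparison only.
qc_flags <- seq_along(contrasts) == 1

# Hypothetical helper: calls arseq once per comparison.
# It is only defined here, not executed; arseq must be installed to call it.
run_all <- function(data, meta) {
  for (i in seq_along(contrasts)) {
    arseq::arseq(
      data = data,
      meta = meta,
      design = "treatment",
      contrast = contrasts[[i]],
      qc = qc_flags[i],
      folder.name = names(contrasts)[i]  # assumed: one results folder per comparison
    )
  }
}

## Not run:
## run_all(example_data, example_meta)
```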