buildSummarized: Generate summarized Read File for DE analyses


View source: R/build_summarized.R

Description

This function creates a summarized file describing reads from RNA-seq experiments that overlap a set of transcript features. Transcript features can be supplied either as an imported GTF-formatted table or as a txdb object. The function is designed to be straightforward, with a minimal set of parameters, for first-pass batch RNA-seq analyses.

Usage

buildSummarized(sample_table = NULL, bam_dir = NULL, gtf = NULL,
  tx_db = NULL, mapping_mode = "Union", read_format = NULL,
  ignore_strand = FALSE, fragments = TRUE, summarized = NULL,
  output_log = NULL, filter = FALSE,
  BamFileList_yiedsize = NA_integer_, n_cores = 1,
  force_build = FALSE, verbose = FALSE)

Arguments

sample_table

A data.frame describing samples. For paired mode it must contain 3 columns, with the names "file", "group" and "pairs". The filename is the name in the directory supplied with the "bam_dir" parameter below. This is not required if an existing summarized file is provided. Default=NULL

bam_dir

Full path to location of bam files listed in the "file" column in the sample_table provided above. This is not required if an existing summarized file is provided. Default=NULL

gtf

Full path to a gtf file describing the transcript coordinates to map the RNA-seq reads to. GTF file is not required if providing a pre-computed summarized experiment file previously generated using buildSummarized() OR a tx_db object (below). Default = NULL

tx_db

An R txdb object. E.g. TxDb.Dmelanogaster.UCSC.dm3.ensGene. Default = NULL

mapping_mode

Options are "Union", "IntersectionStrict" and "IntersectionNotEmpty". See "mode" in ?summarizeOverlaps for an explanation. Default = "Union"

read_format

Are the reads from single-end or paired-end data? Options are "paired" or "single". An option must be selected. Default = NULL

ignore_strand

Ignore strand when mapping reads? See "ignore.strand" in ?summarizeOverlaps for an explanation. Default = FALSE

fragments

When read_format = "paired", include reads from pairs that do not map with their corresponding pair? See "fragments" in ?summarizeOverlaps for an explanation. Default = TRUE

summarized

Full path to a summarized experiment file. If buildSummarized() has already been performed, the output summarized file, saved in "/output_log/se.R" can be used as the input (e.g. if filtering is to be done). Default = NULL

output_log

Full path to the directory where log files and the saved summarized experiment are written. Default = NULL

filter

Perform filtering of low count and missing data from the summarized experiment file? This uses default filtering via "filterByExpr". See ?filterByExpr for further information. Default=FALSE

BamFileList_yiedsize

If you run into memory problems, set this to an integer number of records to read at a time. See the "yieldSize" description in ?BamFileList for an explanation. Default = NA_integer_

n_cores

Number of cores to use for reading in BAM files. Use with caution, as this can cause memory issues if BamFileList_yiedsize is not set. Default = 1

force_build

If the sample_table contains fewer than two replicates per group, force a summarizedExperiment object to be built. Otherwise buildSummarized will halt. Default = FALSE

verbose

Verbosity ON/OFF. Default=FALSE
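
As an illustration of the sample_table and force_build arguments described above, the following is a minimal base R sketch of a three-column sample table and a replicate count per group. The file names, group labels and pair labels are hypothetical, and the check is only an assumption about how replicates might be counted; it is not taken from the buildSummarized() source.

```r
## Hypothetical file names, groups and pairs, for illustration only
sample_table <- data.frame(
  file  = c("treat1.bam", "treat2.bam", "untreat1.bam", "untreat2.bam"),
  group = c("treat", "treat", "untreat", "untreat"),
  pairs = c("1", "2", "1", "2"),
  stringsAsFactors = FALSE
)

## Count the number of samples (replicates) in each group
reps_per_group <- table(sample_table$group)

## Groups with fewer than two replicates; this is the situation in which
## force_build = TRUE is documented as being needed (assumed check)
low_rep_groups <- names(reps_per_group)[reps_per_group < 2]

if (length(low_rep_groups) > 0) {
  message("Groups with fewer than two replicates: ",
          paste(low_rep_groups, collapse = ", "))
} else {
  message("All groups have at least two replicates")
}
```

Running a check like this before calling buildSummarized() may help decide whether force_build = TRUE is needed, as in the example below, where each group has a single BAM file.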

Value

A summarized experiment

Examples

## Build a summarized experiment, following the example in the vignette
## The example below will return a summarized experiment
## tx_db is obtained from TxDb.Dmelanogaster.UCSC.dm3.ensGene library
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
## bam files are obtained from the GenomicAlignments package

## 1. Build a sample table that lists files and groupings
## - obtain list of files
file_list <- list.files(system.file("extdata", package="GenomicAlignments"),
                        recursive = TRUE,
                        pattern = "\\.bam$",  # regular expression: names ending in ".bam"
                        full.names = TRUE)    # return full paths, not just file names
bam_dir <- as.character(gsub(basename(file_list)[1], "", file_list[1]))

## - create a sample table to be used with buildSummarized() below
## it must contain a minimum of two columns, named "file" and "group"
sample_table <- data.frame("file" = basename(file_list),
                           "group" = c("treat", "untreat"))

summarized_dm3 <- buildSummarized(sample_table = sample_table,
                                  bam_dir = bam_dir,
                                  tx_db = TxDb.Dmelanogaster.UCSC.dm3.ensGene,
                                  read_format = "paired",
                                  force_build = TRUE)

consensusDE documentation built on Feb. 1, 2019, 6:01 p.m.