interest.sequential: Wrapup function: Sequential running

Description Usage Arguments Value Author(s) See Also Examples

View source: R/interest.sequential.R

Description

A read summarization function that countsns all the reads mapping to the introns/exons based on the users detailed parameter settings. The process runs on a single computing core.

Usage

1
2
3
4
5
6
7
8
interest.sequential( bamFileYieldSize=1000000, bamFile, isPaired,
	isPairedDuplicate=FALSE, isSingleReadDuplicate=NA, 
	reference, referenceGeneNames, 
	referenceIntronExon, repeatsTableToFilter=c(), 
	junctionReadsOnly=FALSE, outFile, logFile="", 
	returnObj= FALSE, method=c("IntRet", "ExEx", "IntSpan"), 
	appendLogFile=FALSE, sampleName="",
	scaleLength= c(TRUE,FALSE), scaleFragment= c(TRUE,TRUE), ...)

Arguments

bamFileYieldSize

Maximum number of paired Reads in the temprorary files created as the result of dividing the input .bam file.

bamFile

Path of the input bam file.

isPaired

Whether the bam file is the result of a paired end sequencing read mapping (TRUE) or not (FALSE).

isPairedDuplicate

Whether extract only (if set TRUE), filter (FALSE) or include (if set NA) PCR dupplicates for paired mapped reads. It uses the FLAG field in the bam file to filter the duplicate read. If the mapping software does not support detection and flaging the duplicate reads dedup tool of BamUtil or MarkDuplicates of Picard tools could be used.

isSingleReadDuplicate

Whether extract only (if set TRUE), filter (FALSE) or include (if set NA) PCR dupplicates for single mapped reads.

reference

Dataframe to be used as reference; It should at least contain three same-size vectors with the tag names chr, begin, and end which describe the genome coordinates of the introns and exons. It also accepts a GRanges object as input. To build a new reference check the referencePrepare function.

referenceGeneNames

A vector with the same size as the row-size of the reference which include the gene names.

referenceIntronExon

A vector with the same size as the row-size of the reference with values "intron" and "exon" describing which (intron or exon) each row of the reference represents.

repeatsTableToFilter

A data frame with similar structure as the reference, i.e. includes chr, begin, and end columns. If defined, all reads mapped to the described regions would be ingnored and the Intron/exon lengths would be corrected to exclude the regions with repetetive DNA sequences. See getRepeatTable.

junctionReadsOnly

The parameter is considered if the method is set as IntRet or ExEx (NOT IntSpan). It declares whether only consider the Intron-Exon or Exon-Exon junction reads and ignore the reads that fully map to exons or introns. By default this argument is set as FALSE.

outFile

The name or path of the result file.

logFile

The log file path; if defined log information are written to the log file.

returnObj

If set TRUE in addition to producing result text files, the results would also be returned as an object of class SummarizedExperiment.

method

A vector describing the summarization methods to use; i.e. whether count reads mapping to the introns (IntRet), reads mapping to the exons (ExEx), or reads spanning the introns (IntSpan). In IntSpan mode the introns in the reference are taken into account only; while in IntRet the introns and their spanning exons, and in ExEx mode only the exons in the reference are taken into account.

appendLogFile

Whether log information should be appended to the logFile. It is FALSE by default.

sampleName

The name of the sample being analyzed. It will be included in the returned object if returnObj is TRUE.

scaleLength

A vector constructed of TRUE/FALSE values, same size as the method argument. It indcates whether the retention levels of the intron/exons should be scaled to their lengths.

scaleFragment

A vector constructed of TRUE/FALSE values, same size as the method argument. It indcates whether the retention levels of the intron/exons should be scaled to the sum of retention levels (i.e. mapped fragments) over the genes.

...

Other parameter settings specific to BamFile-class function in the Rsamtools package. Parameters qnamePrefixEnd and qnameSuffixStart are in particular useful to modify qnames in the BAM files.

Value

If returnObj is set TRUE in addition to making result text files, dependant on whether a single or two method is defined, the results would be returned as a single object of class SummarizedExperiment or as a list of size 2 which includes 2 objects of class SummarizedExperiment one for IntRet and the other for ExEx.

Author(s)

Ali Oghabian

See Also

interest.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# Creating temp directory to store the results
outDir<- file.path(tempdir(),"interestFolder")
dir.create(outDir)
outDir<- normalizePath(outDir)

# Loading suitable bam file
bamF <- system.file("extdata", "small_test_SRR1691637_ZRSR2Mut_RHBDD3.bam", 
	package="IntEREst", mustWork=TRUE)


# Choosing reference for the gene RHBDD3
ref=u12[u12[,"gene_name"]=="RHBDD3",]

test= interest.sequential(
	bamFileYieldSize=10000,
	bamFile=bamF,
	isPaired=TRUE,
	isPairedDuplicate=FALSE,
	isSingleReadDuplicate=NA,
	reference=ref,
	referenceGeneNames=ref[,"ens_gene_id"],
	referenceIntronExon=ref[,"int_ex"],
	repeatsTableToFilter=c(),
	outFile=paste(outDir,
	    "interestRes.tsv", sep="/"),
	logFile=paste(outDir, 
	    "log.txt", sep="/"),
	method=c("IntRet","IntSpan"),
	returnObj=TRUE, 
	scaleLength= c(TRUE,FALSE), 
	scaleFragment= c(TRUE,TRUE)
)

test

IntEREst documentation built on Nov. 8, 2020, 8:05 p.m.