interest: Wrapper function: Parallel run
In gacatag/IntEREst: Intron-Exon Retention Estimator

interest

R Documentation

Wrapper function: Parallel run

Description

A read summarization function that countsns all the reads mapping to the introns/exons based on the users detailed parameter settings. The process can be run in parallel on multiple computing cores to improve it performance.

Usage

interest( bamFileYieldSize=1000000, bamFile, isPaired, 
	isPairedDuplicate=FALSE, isSingleReadDuplicate= NA, reference, 
	referenceGeneNames, referenceIntronExon, repeatsTableToFilter=c(),
	junctionReadsOnly=FALSE, outFile, logFile="", 
	returnObj= FALSE, method=c("ExEx", "IntRet", "IntSpan", "ExSkip"), 
	strandSpecific,
	bpparam, appendLogFile=FALSE, sampleName="",
	scaleLength= c(TRUE,FALSE), scaleFragment= c(TRUE,TRUE), 
	limitRanges=GRanges(),
	excludeFusionReads=FALSE,
	loadLimitRangesReads=FALSE, ...)

Arguments

`bamFileYieldSize`	Maximum number of pair reads in the temprorary files created as the result of dividing the input .bam file.
`bamFile`	Path of the input bam file.
`isPaired`	Whether the bam file is the result of a paired end sequencing read mapping (TRUE) or not (FALSE).
`isPairedDuplicate`	Whether extract only (if set TRUE), filter (FALSE) or include (if set NA) PCR dupplicates for paired mapped reads. It uses the FLAG field in the bam file to filter the duplicate read. If the mapping software does not support detection and flaging the duplicate reads `dedup` tool of `BamUtil` or `MarkDuplicates` of `Picard` tools could be used.
`isSingleReadDuplicate`	Whether extract only (if set TRUE), filter (FALSE) or include (if set NA) PCR dupplicates for single mapped reads.
`reference`	Dataframe to be used as reference; It should at least contain three same-size vectors with the tag names `chr`, `begin`, and `end` which describe the exons and introns genome coordinates. It also accepts a `GRanges` object. To build a new reference check the `referencePrepare` function.
`referenceGeneNames`	A vector with the same size as the row-size of the reference which includes the gene names of the reference.
`referenceIntronExon`	A vector with the same size as the row-size of the reference with values "intron" and "exon" describing which (intron or exon) each row of the reference represents.
`repeatsTableToFilter`	A data.frame table with similar stucture to the `reference`. It includes `chr`, `begin`, and `end` columns. If defined, all reads mapped to the described regions would be ingnored and the Intron/exon lengths would be corrected to exclude the to exclude the regions with repetitive DNA sequences. See `getRepeatTable`.
`junctionReadsOnly`	The parameter is considered if the `method` is set as `IntRet` or `ExEx` (NOT `IntSpan`). It declares whether only consider the Intron-Exon or Exon-Exon junction reads and ignore the reads that fully map to exons or introns. By default this argument is set as FALSE.
`outFile`	The name or path of the result file.
`logFile`	The log file path; if defined log information are written to the log file.
`returnObj`	If set `TRUE` in addition to making result text files, the results would also be returned as an object of class `SummarizedExperiment`.
`method`	A vector describing the summarization methods to use; i.e. whether count reads mapping to the introns (`IntRet`), reads mapping to the exons (`ExEx`), reads spanning the introns (`IntSpan`), or reads that skip the exons (`ExSkip`). In `IntSpan` mode the introns in the reference are taken into account only; whilst in `IntRet` the introns and their spanning exons, and in `ExEx` and `ExSkip` mode only the exons in the reference are taken into account.
`strandSpecific`	The description for strand specificity of the RNAseq data. The values are either "unstranded", "stranded", or "reverse".If the reads are not strand specific or directional use "unstranded". If the first read in paired-read sequencing or the reads single-read sequencing is in the same direction as the the transcript strand use "stranded". If the first read in paired-read sequencing or the reads in single-read sequencing is in the oposite direction to the transcript strand use "reverse".
`bpparam`	An optional `BiocParallelParam` instance defining the parallel back-end to be used.
`appendLogFile`	Whether log information should be appended to the `logFile`. It is set `FALSE` by default.
`sampleName`	The name of the sample being analyzed. It will be included in the returned object if `returnObj` is `TRUE`.
`scaleLength`	A vector constructed of TRUE/FALSE values, same size as the `method` argument. It indcates whether the retention levels of the intron/exons should be scaled to their lengths.
`scaleFragment`	A vector constructed of TRUE/FALSE values, same size as the `method` argument. It indcates whether the retention levels of the intron/exons should be scaled to the sum of retention levels (i.e. mapped fragments) over the genes.
`limitRanges`	A GRanges object. If defined it loads sequencing reads that fall in the defined coordinates. It is similar to `which` parameter in `ScanBamParam`.
`excludeFusionReads`	Only valid if limitRanges is defined. It filters the defined by `limitRanges`. It also filters the read pairs if each read pair maps reads pairs where one of the reads either do not fall into one of the regions to a different region defined in `limitRanges`. It is useful to ignore analyzing the chimeric reads and fusion reads, i.e. reads that map to fusion genes. To filter properly, `limitRanges` must include coordinates of all genes.
`loadLimitRangesReads`	Boolean (TRUE/FALSE) variable. If set as `TRUE` only the reads in the limitRanges are loaded from bam file (and `bamFileYieldSize` parameter will be ignored).
`...`	Other parameter settings specific to `BamFile-class` function in the `Rsamtools` package. Parameters `qnamePrefixEnd` and `qnameSuffixStart` are in particular useful to modify qnames in the BAM files.

Value

If returnObj is set TRUE in addition to making result text files, dependant on whether a single or two method is defined, the results would be returned as a single object of class SummarizedExperiment or as a list of size 2 which includes 2 objects of class SummarizedExperiment one for IntRet and the other for ExEx.

Author(s)

Ali Oghabian

Examples


# Creating temp directory to store the results
outDir<- file.path(tempdir(),"interestFolder")
dir.create(outDir)
outDir<- normalizePath(outDir)

# Loading suitable bam file
bamF <- system.file("extdata", "small_test_SRR1691637_ZRSR2Mut_RHBDD3.bam", 
	package="IntEREst", mustWork=TRUE)

# Choosing reference for the gene RHBDD3
ref= u12[u12[,"gene_name"]=="RHBDD3",]

test= interest(
	bamFileYieldSize=10000,
	bamFile=bamF,
	isPaired=TRUE,
	isPairedDuplicate=FALSE,
	isSingleReadDuplicate=NA,
	reference=ref,
	referenceGeneNames=ref[,"ens_gene_id"],
	referenceIntronExon=ref[,"int_ex"],
	repeatsTableToFilter=c(),
	outFile=paste(outDir,
	    "interestRes.tsv", sep="/"),
	logFile=paste(outDir, 
	    "log.txt", sep="/"),
	method=c("IntRet", "IntSpan"),
	strandSpecific="unstranded",
	junctionReadsOnly=FALSE,
	returnObj=TRUE, 
	scaleLength= c(TRUE,FALSE), 
	scaleFragment= c(TRUE,TRUE)
)

test

gacatag/IntEREst documentation built on June 12, 2025, 6:21 p.m.