runMBASED2s: Function that runs between-sample (differential) ASE calling...

Description Usage Arguments Value Examples

Description

Function that runs between-sample (differential) ASE calling using data from individual loci (SNVs) within units of ASE (genes). Vector arguments 'lociAllele1CountsSample1', 'lociAllele2CountsSample1', 'lociAllele1NoASEProbsSample1', 'lociRhosSample1', 'lociAllele1CountsSample2', 'lociAllele2CountsSample2', 'lociAllele1NoASEProbsSample2', 'lociRhosSample2', and 'aseIDs' should all be of the same length. Letting i1, i2, .., iN denote the indices corresponding to entries within aseIDs equal to a given aseID, the entries at those indices in the other vector arguments provide information for the loci within that aseID for the respective samples. This information is then used by runMBASED2s1aseID. It is assumed that for any i, the i-th entries of all vector arguments correspond to the same locus, and that the entries corresponding to allele1 in sample1 and sample2 provide information on the same allele. If argument 'isPhased' (see below) is true, then entries corresponding to allele1 at each locus must represent the same haplotype.

Usage

1
2
3
4
5
runMBASED2s(lociAllele1CountsSample1, lociAllele2CountsSample1,
  lociAllele1CountsSample2, lociAllele2CountsSample2,
  lociAllele1NoASEProbsSample1, lociAllele1NoASEProbsSample2, lociRhosSample1,
  lociRhosSample2, aseIDs, numSim = 0, BPPARAM = SerialParam(),
  isPhased = FALSE, tieBreakRandom = FALSE, checkArgs = FALSE)

Arguments

lociAllele1CountsSample1,lociAllele2CountsSample1,lociAllele1CountsSample2,lociAllele2CountsSample2

vectors of counts of allele1 (e.g. reference) and allele2 (e.g. alternative) at individiual loci in sample1 and sample2. Allele counts are not necessarily phased (see argument 'isPhased'), so allele1 counts may not represent the same haplotype. However, the two alleles (allele1 and allele2) must be defined identically for both samples at each locus. All 4 arguments must be vectors of non-negative integers.

lociAllele1NoASEProbsSample1,lociAllele1NoASEProbsSample2

probabilities of observing allele1-supporting reads at individual loci under conditions of no ASE (e.g., vector with all entries set to 0.5, if there is no pre-existing allelic bias at any locus) in sample1 and sample2, respectively. Note that these probabilities are allowed to be sample-specific. Each argument must be a vector with entries >0 and <1.

lociRhosSample1,lociRhosSample2

dispersion parameters of beta distribution at individual loci (set to 0 if the read count-generating distribution at the locus is binomial). Note that the dispersions are allowed to be sample-specific. Each argument must be a vector with entries >=0 and <1.

aseIDs

the IDs of ASE units corresponding to the individual loci (e.g. gene names).

numSim

number of simulations to perform. Must be a non-negative integer. If 0 (DEFAULT), no simulations are performed.

BPPARAM

argument to be passed to bplapply(), when parallel achitecture is used to speed up simulations (parallelization is done over aseIDs). DEFAULT: SerialParam() (no parallelization).

isPhased

single boolean specifying whether the phasing has already been performed, in which case the lociAllele1CountsSample1 (and, therefore, lociAllele1CountsSample2) represent the same haplotype. DEFAULT: FALSE.

tieBreakRandom

single boolean specifying how ties should be broken during pseudo-phasing in cases of unphased data (isPhased=FALSE). If TRUE, each of the two allele will be assigned to major haplotype with probability=0.5. If FALSE (DEFAULT), allele1 will be assigned to major haplotype and allele2 to minor haplotype.

checkArgs

single boolean specifying whether arguments should be checked for adherence to specifications. DEFAULT: FALSE

Value

list with 3 elements:

ASEResults

Data frame with each row reporting MBASED results for a given aseID (aseIDs are provided as row names of this data frame). The columns of the data frame are: majorAlleleFrequencyDifference, pValueASE, heterogeneityQ, and pValueHeterogeneity.

allele1IsMajor

Vector of TRUE/FALSE of length equal to the number of supplied SNVs, reporting for each SNV whether allele1 represents major (TRUE) or minor (FALSE) haplotype of the corresponding aseID.

lociMAFDifference

Vector of locus-specific estimates of the difference of major allele (haplotype) frequency between the two samples. Note that 'major' and 'minor' distinction is made at the level of gene haplotype in sample1.

Examples

1
2
3
4
5
SNVCoverageTumor=sample(10:100, 5)
SNVCoverageNormal=sample(10:100, 5)
SNVAllele1CountsTumor=rbinom(length(SNVCoverageTumor), SNVCoverageTumor, 0.5)
SNVAllele1CountsNormal=rbinom(length(SNVCoverageNormal), SNVCoverageNormal, 0.5)
MBASED:::runMBASED2s(lociAllele1CountsSample1=SNVAllele1CountsTumor, lociAllele2CountsSample1=SNVCoverageTumor-SNVAllele1CountsTumor, lociAllele1CountsSample2=SNVAllele1CountsNormal, lociAllele2CountsSample2=SNVCoverageNormal-SNVAllele1CountsNormal, lociAllele1NoASEProbsSample1=rep(0.5, length(SNVCoverageTumor)), lociAllele1NoASEProbsSample2=rep(0.5, length(SNVCoverageNormal)), lociRhosSample1=rep(0, length(SNVCoverageTumor)), lociRhosSample2=rep(0, length(SNVCoverageNormal)), aseIDs=c(rep('gene1',4), 'gene2'), numSim=10^6,  BPPARAM=SerialParam(), isPhased=FALSE)

MBASED documentation built on Nov. 8, 2020, 5:53 p.m.