runMBASED1s: Function that runs single-sample ASE calling using data from...

Description Usage Arguments Value Examples

Description

Function that runs single-sample ASE calling using data from individual loci (SNVs) within units of ASE (genes). Vector arguments 'lociAllele1Counts', 'lociAllele2Counts', 'lociAllele1NoASEProbs', 'lociRhos', and 'aseIDs' should all be of the same length. Letting i1, i2, .., iN denote the indices corresponding to entries within aseIDs equal to a given aseID, the entries at those indices in the other vector arguments provide information for the loci within that aseID. This information is then used by runMBASED1s1aseID. It is assumed that for any i, the i-th entries of all vector arguments correspond to the same locus. If argument 'isPhased' (see below) is true, then entries corresponding to allele1 at each locus must represent the same haplotype.

Usage

1
2
3
runMBASED1s(lociAllele1Counts, lociAllele2Counts, lociAllele1NoASEProbs,
  lociRhos, aseIDs, numSim = 0, BPPARAM = SerialParam(), isPhased = FALSE,
  tieBreakRandom = FALSE, checkArgs = FALSE)

Arguments

lociAllele1Counts,lociAllele2Counts

vectors of counts of allele1 (e.g. reference) and allele2 (e.g., alternative) at individual loci. Allele counts are not necessarily phased (see argument 'isPhased'), so allele1 counts may not represent the same haplotype. Both arguments must be vectors of non-negative integers.

lociAllele1NoASEProbs

probabilities of observing allele1-supporting reads at individual loci under conditions of no ASE (e.g., vector with all entries set to 0.5, if there is no pre-existing allelic bias at any locus). Must be a vector with entries >0 and <1.

lociRhos

dispersion parameters of beta distribution at individual loci (set to 0 if the read count-generating distribution at the locus is binomial). Must be a vector with entries >=0 and <1.

aseIDs

the IDs of ASE units corresponding to the individual loci (e.g. gene names).

numSim

number of simulations to perform. Must be a non-negative integer. If 0 (DEFAULT), no simulations are performed.

BPPARAM

argument to be passed to bplapply(), when parallel achitecture is used to speed up simulations (parallelization is done over aseIDs). DEFAULT: SerialParam() (no parallelization).

isPhased

single boolean specifying whether the phasing has already been performed, in which case the lociAllele1Counts represent the same haplotype. DEFAULT: FALSE.

tieBreakRandom

single boolean specifying how ties should be broken during pseudo-phasing in cases of unphased data (isPhased=FALSE). If TRUE, each of the two allele will be assigned to major haplotype with probability=0.5. If FALSE (DEFAULT), allele1 will be assigned to major haplotype and allele2 to minor haplotype.

checkArgs

single boolean specifying whether arguments should be checked for adherence to specifications. DEFAULT: FALSE

Value

list with 3 elements:

ASEResults

Data frame with each row reporting MBASED results for a given aseID (aseIDs are provided as row names of this data frame). The columns of the data frame are: majorAlleleFrequency, pValueASE, heterogeneityQ, and pValueHeterogeneity.

allele1IsMajor

Vector of TRUE/FALSE of length equal to the number of supplied SNVs, reporting for each SNV whether allele1 represents major (TRUE) or minor (FALSE) haplotype of the corresponding aseID.

lociMAF

Vector of locus-specific estimates of the frequency of major allele, where 'major' refers to the haplotype of the gene found to be major by the ASE analysis. Note that since the determination of the major/minor status is done at the level of the gene, there may be loci with locus-specific MAF < 0.5.

Examples

1
2
3
4
5
SNVCoverage1 <- sample(10:100,5) ## gene with 5 loci
SNVAllele1Counts1 <- rbinom(length(SNVCoverage1), SNVCoverage1, 0.5)
SNVCoverage2 <- sample(10:100,5) ## gene with 5 loci
SNVAllele1Counts2 <- rbinom(length(SNVCoverage2), SNVCoverage2, 0.5)
MBASED:::runMBASED1s(lociAllele1Counts=c(SNVAllele1Counts1, SNVAllele1Counts2), lociAllele2Counts=c(SNVCoverage1-SNVAllele1Counts1, SNVCoverage2-SNVAllele1Counts2), lociAllele1NoASEProbs=rep(0.5, 10), lociRhos=rep(0, 10), aseIDs=rep(c('gene1','gene2'), each=5), numSim=10^6, BPPARAM=SerialParam(), isPhased=FALSE, tieBreakRandom=FALSE)

MBASED documentation built on Nov. 8, 2020, 5:53 p.m.