runMBASED: Main function that implements MBASED.

Description Usage Arguments Value Examples

Description

Main function that implements MBASED.

Usage

1
2
runMBASED(ASESummarizedExperiment, isPhased = FALSE, numSim = 0,
  BPPARAM = SerialParam())

Arguments

ASESummarizedExperiment

RangedSummarizedExperiment object containing information on read counts to be used for ASE detection. Rows represent individual heterozygous loci (SNVs), while columns represent individual samples. There should be either one or two columns, depending on whether one- or two-sample analysis is to be performed. Joint analysis of multiple samples or replicates is currently not supported, and one-sample analysis of multiple samples must be done through independent series of calls to runMBASED(). Note that for two-sample analysis, only loci which are heterozygous in both samples must be supplied (this excludes, e.g., tumor-specific mutations in cases of tumor/normal comparisons). For two-sample analysis, it is assumed that the first column corresponds to 'sample1' and the second column to 'sample2' in the sample1-vs-sample2 comparison. This is important, since differential ASE assessment is not symmetric and sample1-vs-sample2 comparison may yield different results from sample2-vs-sample1 comparison (the relationship is set up by assuming that only instances of ASE greater in sample1 than in sample2 are of interest). assays(ASESummarizedExperiment) must contain matrices lociAllele1Counts and lociAllele2Counts of non-negative integers, containing counts of allele1 (e.g. reference) and allele2 (e.g. alternative) at individual loci. All supplied loci must have total read count (across both alleles) greater than 0 (in each of the two samples, in the case of two-sample analysis). Allele counts are not necessarily phased (see 'isPhased' argument below), so allele1 counts may not represent the same haplotype. assays(ASESummarizedExperiment) may also contain matrix lociAllele1CountsNoASEProbs with entries >0 and <1, containing probabilities of observing allele1-supporting reads at individual loci under conditions of no ASE (which may differ for individual samples in the two-sample analysis). If this matrix is not provided, it is constructed such that every entry in the matrix is set to 0.5 (no pre-existing allelic bias at any locus in any sample). assays(ASESummarizedExperiment) may also contain matrix lociCountsDispersions with entries >=0 and <1, containing dispersion parameters of beta-binomial read count distribution at individual loci (which may differ for individual samples in the two-sample analysis). If this matrix is not provided, it is constructed such that every entry in the matrix is set to 0 (read count-generating distribution at each locus in each sample is binomial). Any other matrices in assays(ASESummarizedExperiment) are ignored by MBASED. rowRanges(ASESummarizedExperiment) must be supplied by the user, containing additional information about SNVs, including a required column 'aseID', specifying for each locus the unique unit of expression that it belongs to (e.g., gene; must be non-NA). MBASED uses names(rowRanges(ASESummarizedExperiment)), when specified, to give a unique identifier to each SNV; if no names are provided, the SNVs are labeled 'locus1', 'locus2', ..., in the row order.

isPhased

specifies whether the true haplotypes are known, in which case the lociAllele1Counts are assumed to represent allelic counts along the same haplotype (and the same is true of lociAllele2Counts). Must be either TRUE or FALSE (DEFAULT).

numSim

number of simulations to perform to estimate statistical signficance of observed ASE. Must be a non-negative integer. If set to 0 (DEFAULT), no simulations are performed and nominal p-values are reported.

BPPARAM

argument to be passed to function bplapply(), when parallel achitecture is used to speed up simulations (parallelization is done over aseIDs). DEFAULT: SerialParam() (no parallelization).

Value

RangedSummarizedExperiment object with rows representing individual aseIDs (genes) and a single column. assays(returnObject) includes single-column matrices 'majorAlleleFrequency' (1-sample analysis only), 'majorAlleleFrequencyDifference' (2-sample analysis only), 'pValueASE' (unadjusted ASE p-value), 'pValueHeterogeneity' (unadjusted inter-loci variability p-value, set to NA for aseIDs with only 1 locus). Note that p-values are not adjusted for multiple hypothesis testing, and the users should carry out such an adjustment themselves, e.g. by employing the utilities in the multtest package. In addition, metadata(returnObject) is a list containing a RangedSummarizedExperiment object names 'locusSpecificResults', with rows corresponding to individual loci (SNVs) and a single column, that provides information on locus-level MBASED analysis results. assays(metadata(returnObject)$locusSpecificResults) contains single-column matrices 'MAF' (estimate of allele frequency for gene-wide major allele at the locus, 1-sample analysis only), 'MAFDifference' (estimate of allele frequency difference for gene-wide major allele at the locus, 2-sample analysis only), and 'allele1IsMajor' (whether allele1 is assigned to major haplotype by MBASED).

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
mySNVs <- GRanges(
    seqnames=c('chr1', 'chr2', 'chr2', 'chr2'),
     ranges=IRanges(start=c(1000, 20020, 20285, 21114), width=1),
     aseID=c('gene1', rep('gene2', 3)),
    allele1=c('G', 'A', 'C', 'A'),
    allele2=c('T', 'C', 'T', 'G')
)
names(mySNVs) <- paste0('SNV', 1:4)
## RangedSummarizedExperiment object with data to run tumor vs. normal comparison
mySE_TumorVsNormal <- SummarizedExperiment(
     assays=list(
        lociAllele1Counts=matrix(
             c(
                c(25,10,22,14),
                c(18,17,14,28)
            ),
            ncol=2,
            dimnames=list(
                names(mySNVs),
                c('tumor', 'normal')
            )
         ),
        lociAllele2Counts=matrix(
             c(
                c(20,16,15,16),
                 c(23,9,24,17)
            ),
            ncol=2,
            dimnames=list(
                names(mySNVs),
                 c('tumor', 'normal')
            )
         ),
         lociAllele1CountsNoASEProbs=matrix(
             c(
                 c(0.48, 0.51, 0.55, 0.45),
                 c(0.52, 0.43, 0.52, 0.43)
             ),
             ncol=2,
             dimnames=list(
                 names(mySNVs),
                 c('tumor', 'normal')
             )
         ),
        lociCountsDispersions=matrix(
            c(
                 c(0.005, 0.007, 0.003, 0.01),
                 c(0.001, 0.004, 0.02, 0.006)
             ),
            ncol=2,
            dimnames=list(
                 names(mySNVs),
                 c('tumor', 'normal')
            )
         )
    ),
     rowRanges=mySNVs
)
twoSampleAnalysisTumorVsNormal <- runMBASED(
    ASESummarizedExperiment=mySE_TumorVsNormal,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(twoSampleAnalysisTumorVsNormal)
assays(twoSampleAnalysisTumorVsNormal)$majorAlleleFrequencyDifference
assays(twoSampleAnalysisTumorVsNormal)$pValueASE
assays(twoSampleAnalysisTumorVsNormal)$pValueHeterogeneity
assays(metadata(twoSampleAnalysisTumorVsNormal)$locusSpecificResults)$MAFDifference
assays(metadata(twoSampleAnalysisTumorVsNormal)$locusSpecificResults)$allele1IsMajor

## exchanging the order of the columns will allow us to run normal vs. tumor comparison
## Note that while results are the same for single-locus gene1, they differ for multi-locus gene2
mySE_NormalVsTumor <- SummarizedExperiment(
    assays=lapply(names(assays(mySE_TumorVsNormal)), function(matName) {
        curMat <- assays(mySE_TumorVsNormal)[[matName]]
        modifiedMat <- curMat[,c('normal','tumor')]
        return(modifiedMat)
    }),
    colData=colData(mySE_TumorVsNormal)[2:1,],
    rowRanges=rowRanges(mySE_TumorVsNormal)
)
names(assays(mySE_NormalVsTumor )) <- names(assays(mySE_TumorVsNormal))
twoSampleAnalysisNormalVsTumor <- runMBASED(
    ASESummarizedExperiment=mySE_NormalVsTumor,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(twoSampleAnalysisNormalVsTumor)
assays(twoSampleAnalysisNormalVsTumor)$majorAlleleFrequencyDifference
assays(twoSampleAnalysisNormalVsTumor)$pValueASE
assays(twoSampleAnalysisNormalVsTumor)$pValueHeterogeneity
assays(metadata(twoSampleAnalysisNormalVsTumor)$locusSpecificResults)$MAFDifference
assays(metadata(twoSampleAnalysisNormalVsTumor)$locusSpecificResults)$allele1IsMajor

## we can also do separate one-sample analysis on tumor and normal samples
mySE_Tumor <- SummarizedExperiment(
    assays=lapply(names(assays(mySE_TumorVsNormal)), function(matName) {
        curMat <- assays(mySE_TumorVsNormal)[[matName]]
        modifiedMat <- curMat[,'tumor',drop=FALSE]
        return(modifiedMat)
    }),
    colData=colData(mySE_TumorVsNormal)[1,],
    rowRanges=rowRanges(mySE_TumorVsNormal)
)
names(assays(mySE_Tumor)) <- names(assays(mySE_TumorVsNormal))
oneSampleAnalysisTumor <- runMBASED(
    ASESummarizedExperiment=mySE_Tumor,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(oneSampleAnalysisTumor)
assays(oneSampleAnalysisTumor)$majorAlleleFrequency
assays(oneSampleAnalysisTumor)$pValueASE
assays(oneSampleAnalysisTumor)$pValueHeterogeneity
assays(metadata(oneSampleAnalysisTumor)$locusSpecificResults)$MAF
assays(metadata(oneSampleAnalysisTumor)$locusSpecificResults)$allele1IsMajor

mySE_Normal <- SummarizedExperiment(
    assays=lapply(names(assays(mySE_TumorVsNormal)), function(matName) {
        curMat <- assays(mySE_TumorVsNormal)[[matName]]
        modifiedMat <- curMat[,'normal',drop=FALSE]
        return(modifiedMat)
    }),
    colData=colData(mySE_TumorVsNormal)[1,],
    rowRanges=rowRanges(mySE_TumorVsNormal)
)
names(assays(mySE_Normal)) <- names(assays(mySE_TumorVsNormal))
oneSampleAnalysisNormal <- runMBASED(
    ASESummarizedExperiment=mySE_Normal,
     numSim=10^6,
     BPPARAM=SerialParam(),
     isPhased=FALSE
)
rowRanges(oneSampleAnalysisNormal)
assays(oneSampleAnalysisNormal)$majorAlleleFrequency
assays(oneSampleAnalysisNormal)$pValueASE
assays(oneSampleAnalysisNormal)$pValueHeterogeneity
assays(metadata(oneSampleAnalysisNormal)$locusSpecificResults)$MAF
assays(metadata(oneSampleAnalysisNormal)$locusSpecificResults)$allele1IsMajor

MBASED documentation built on Nov. 8, 2020, 5:53 p.m.