RVgene: Probability of sharing of rare variants in a family sample...


View source: R/RVgene.R

Description

Computing probability of sharing of rare variants in a family sample within a genomic region such as a gene.

Usage

RVgene(
  data,
  ped.listfams,
  sites,
  fams,
  pattern.prob.list,
  nequiv.list,
  N.list,
  type = "alleles",
  minor.allele.vec,
  precomputed.prob = list(0),
  maxdim = 1e+09,
  partial.sharing = TRUE,
  ...
)

Arguments

data

A list of SnpMatrix objects, one for each pedigree object in ped.listfams, or a data.frame or matrix encoding the pedigree information and genotype data in the standard LINKAGE ped format or the PLINK raw format with additive component only (see PLINK web site [1]). From the pedigree information, only the family ID in the first column, the subject ID in the second column and the affection status in the sixth column are used (columns 3 to 5 are ignored). Family members without genotype data do not need to appear in this object. The genotype of each variant can be coded in two ways, each corresponding to a different value of the type option: as a minor allele count in one column, with missing values coded NA (type="count"), or as the identity of the two alleles in two consecutive columns, with missing values coded 0 as in the standard LINKAGE ped format (type="alleles"). If you provide SnpMatrix objects, the genotype should be coded as the minor allele count + 1, i.e. 01 is the homozygous genotype for the common allele.

ped.listfams

a list of pedigree objects, one object for each pedigree for which genotype data are included in data.

sites

a vector of the column indices of the variant sites to test in data. If the argument fams is provided, each variant site is tested in the family at the corresponding position in the fams vector (a variant present in multiple families must then be repeated once for every family in which it appears).

fams

an optional character vector of the names of families in data and ped.listfams carrying the variants listed in the corresponding position in sites. If missing, the names of the families carrying the minor allele at each position in sites are extracted from data

pattern.prob.list

a list of precomputed rare variant sharing probabilities for all possible sharing patterns in the families in data and ped.listfams

nequiv.list

an optional vector of the number of configurations of rare variant sharing by the affected subjects corresponding to the same pattern and probability in pattern.prob.list. Default is a vector of 1s

N.list

a vector of the number of affected subjects sharing a rare variant in the corresponding pattern in pattern.prob.list

type

an optional character string taking value "alleles" or "count". Default is "alleles"

minor.allele.vec

an optional vector of the minor alleles at each site in the sites vector. It is not needed if type="count". If it is missing and type="alleles", the minor allele is assumed to take the value 2

precomputed.prob

an optional list of vectors of precomputed rare variant sharing probabilities for families in data and ped.listfams. If the vectors are named, the names must be strings formed by concatenating the sorted carrier names separated by semicolons. If the vectors are not named, they must contain the probabilities for all possible values of N.list for the corresponding family (one probability per value of N.list)

maxdim

upper bound on the dimension of the array containing the joint distribution of the sharing patterns for all families in fams (to avoid running out of memory)

partial.sharing

logical indicating whether the test allowing for sharing by a subset of affected subjects should be performed. If FALSE, only the test requiring sharing by all affected subjects is computed. Default is TRUE

...

other arguments to be passed to RVsharing
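
The two genotype codings accepted in data can be illustrated with a small base R sketch. Everything here is hypothetical toy data (the family IDs, the column names and the single variant are invented for illustration and do not come from the RVS package), and the affection status value 2 assumes the usual LINKAGE convention for affected subjects.

```r
# Hypothetical toy data: two families, one variant site (not from the RVS package).
# Columns 1-6 follow the LINKAGE ped layout; per the text above, only
# columns 1 (family ID), 2 (subject ID) and 6 (affection status) are used.
ped.info <- data.frame(
  famid    = c("A", "A", "B", "B"),
  id       = c(1, 2, 1, 2),
  father   = 0,
  mother   = 0,
  sex      = c(1, 2, 1, 2),
  affected = c(2, 2, 2, 2)   # assumed LINKAGE coding: 2 = affected
)

# type = "alleles": two consecutive columns per variant, missing coded 0
alleles <- cbind(ped.info, v1.a1 = c(1, 1, 1, 0), v1.a2 = c(2, 1, 2, 0))

# type = "count": one column per variant with the minor allele count,
# missing coded NA. The minor allele is taken to be 2 here, matching the
# documented default when minor.allele.vec is missing.
minor <- 2
a1 <- alleles$v1.a1
a2 <- alleles$v1.a2
count <- ifelse(a1 == 0 | a2 == 0, NA, (a1 == minor) + (a2 == minor))
counts <- cbind(ped.info, v1 = count)
```

The same genotypes thus take two columns per variant under type="alleles" and one column per variant under type="count".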

Details

The function extracts the carriers of the minor allele at each entry in sites in each family where it is present in data (or in the families specified in fams if that argument is specified). It then computes exact rare variant sharing probabilities in each family for each variant by calling RVsharing. If multiple rare variants are seen in the same family, the smallest sharing probability among all rare variants is retained. The joint rare variant sharing probability over all families is obtained as the product of the family-specific probabilities.

The p-value of the test allowing for sharing by a subset of affected subjects over the rare variants in the genomic region is then computed as the sum of the probabilities of the possible combinations of sharing patterns among all families that have a probability less than or equal to the observed joint probability and a total number of carriers greater than or equal to the observed total number of carriers in all families, using the values in pattern.prob.list, nequiv.list and N.list. The families where all affected subjects share a rare variant are identified by checking whether the length of the carrier vector equals the maximum value of N.list for that family. The p-value of the test requiring sharing by all affected subjects is computed by calling multipleFamilyPValue.
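
The summation rule for the partial-sharing p-value can be sketched in base R. This is only an illustration of the rule as described above, not the code used inside RVgene: the probabilities, family names F1 and F2, the observed patterns and the small numerical tolerance are all assumptions made for the example, and the step that retains the smallest probability per family is not shown.

```r
# Hypothetical inputs for two families. Each distribution sums to 1 once
# weighted by the number of equivalent configurations (nequiv).
pattern.prob <- list(F1 = c(0.1, 0.2, 0.1), F2 = c(0.25, 0.25))
nequiv       <- list(F1 = c(1, 3, 3),       F2 = c(1, 3))
N            <- list(F1 = 3:1,              F2 = 2:1)

# Hypothetical observed pattern index in each family
obs <- c(F1 = 2, F2 = 1)

# Enumerate every combination of one pattern per family
grid <- expand.grid(lapply(pattern.prob, seq_along))

# Look up, for each combination, the per-family value in a list of vectors
lookup <- function(lst) mapply(function(v, i) v[i], lst, grid)

joint  <- apply(lookup(pattern.prob), 1, prod)  # joint probability of one configuration
weight <- apply(lookup(nequiv), 1, prod)        # number of equivalent configurations
total  <- rowSums(lookup(N))                    # total number of carriers

# Observed joint probability and total number of carriers
obs.joint <- prod(mapply(function(p, i) p[i], pattern.prob, obs))
obs.total <- sum(mapply(function(n, i) n[i], N, obs))

# Keep combinations that are at most as probable as the observed one and
# have at least as many carriers (tolerance guards against rounding error)
keep <- joint <= obs.joint * (1 + 1e-12) & total >= obs.total
p.value <- sum(joint[keep] * weight[keep])
```

Because the number of rows in grid is the product of the numbers of patterns per family, the enumeration grows quickly with the number of families, which is the kind of growth the maxdim argument is meant to bound.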

Value

A list with items:

p

P-value of the exact rare variant sharing test allowing for sharing by a subset of affected subjects.

pall

P-value of the exact rare variant sharing test requiring sharing by all affected subjects.

potentialp

Minimum achievable p-value if all affected subjects were carriers of a rare variant.

References

Bureau, A., Begum, F., Taub, M.A., Hetmanski, J., Parker, M.M., Albacha-Hejazi, H., Scott, A.F., et al. (2019) Inferring Disease Risk Genes from Sequencing Data in Multiplex Pedigrees Through Sharing of Rare Variants. Genet Epidemiol. 43(1):37-49. doi: 10.1002/gepi.22155.

Examples

data(samplePedigrees)
data(ex.ped.mat)
fam15157 <- samplePedigrees$secondCousinTriple
fam15157.pattern.prob = c(RVsharing(fam15157,carriers=c(15,16,17)),
    RVsharing(fam15157,carriers=c(15,16)),
    RVsharing(fam15157,carriers=c(15)))
fam15157.nequiv = c(1,3,3)
# check that distribution sums to 1
sum(fam15157.pattern.prob*fam15157.nequiv)
fam15157.N = 3:1
fam28003 <- samplePedigrees$firstAndSecondCousinsTriple
fam28003.pattern.prob = c(RVsharing(fam28003,carriers=c(36,104,110)),
    RVsharing(fam28003,carriers=c(36,104)),
    RVsharing(fam28003,carriers=c(104,110)),
    RVsharing(fam28003,carriers=c(36)),
    RVsharing(fam28003,carriers=c(104)))
fam28003.N = c(3,2,2,1,1)
fam28003.nequiv = c(1,2,1,1,2)
# check that distribution sums to 1
sum(fam28003.pattern.prob*fam28003.nequiv)
# Creating lists
ex.pattern.prob.list = list("15157"=fam15157.pattern.prob,"28003"=fam28003.pattern.prob)
ex.nequiv.list = list("15157"=fam15157.nequiv,"28003"=fam28003.nequiv)
ex.N.list = list("15157"=fam15157.N,"28003"=fam28003.N)
ex.ped.obj = list(fam15157,fam28003)
names(ex.ped.obj) = c("15157","28003")
sites = c(92,119)
minor.allele.vec=c(1,4)
RVgene(ex.ped.mat,ex.ped.obj,sites,
    pattern.prob.list=ex.pattern.prob.list,
    nequiv.list=ex.nequiv.list,N.list=ex.N.list,
    minor.allele.vec=minor.allele.vec)
# calling with a SnpMatrix list
data(famVCF)
fam15157.snp = suppressWarnings(VariantAnnotation::genotypeToSnpMatrix(fam15157.vcf))
fam28003.snp = suppressWarnings(VariantAnnotation::genotypeToSnpMatrix(fam28003.vcf))
ex.SnpMatrix.list = list(fam15157=fam15157.snp$genotypes,fam28003=fam28003.snp$genotypes)
RVgene(ex.SnpMatrix.list,ex.ped.obj,sites,
    pattern.prob.list=ex.pattern.prob.list, nequiv.list=ex.nequiv.list,
    N.list=ex.N.list,minor.allele.vec=minor.allele.vec)

RVS documentation built on Nov. 8, 2020, 6:57 p.m.