RAVA.FIRST    R Documentation
Analyse rare variants with the RAVA-FIRST approach, which uses CADD scores to group and filter rare variants.
RAVA.FIRST(x, SNVs.scores = NULL, indels.scores = NULL, ref.level,
filter=c("whole", "controls", "any"),
maf.threshold=0.01, min.nb.snps = 2,
min.cumulative.maf = NULL, group = NULL,
cores = 10, burden = TRUE, H0.burden, burden.parameters,
SKAT = TRUE, H0.SKAT, SKAT.parameters, verbose = TRUE, path.data)
x
A bed.matrix
SNVs.scores
A data frame with the columns 'chr', 'pos', 'A1', 'A2' and 'adjCADD' containing the adjusted CADD scores of the SNVs (optional; useful to save computation time if the adjusted CADD scores of the variants in the study are already available)
indels.scores
A data frame with the columns 'chr', 'pos', 'A1', 'A2' and 'PHRED_1.4' containing the CADD PHREDv1.4 scores of the indels; compulsory if indels are present in x
ref.level
The level corresponding to the controls group, only needed if filter = "controls"
filter
On which group the MAF filter will be applied
maf.threshold
The MAF threshold used to define a rare variant, set at 0.01 by default
min.nb.snps
The minimum number of variants needed to keep a CADD region, set at 2 by default
min.cumulative.maf
The minimum cumulative MAF of variants needed to keep a CADD region
group
A factor indicating the group of each individual, only needed if filter = "controls" or filter = "any"
cores
How many cores to use, set at 10 by default
burden
Whether to compute the burden test
H0.burden
A list returned from NullObject.parameters with RVAT = "burden"
burden.parameters
A list containing the parameters to use by burden.subscores
SKAT
Whether to compute SKAT
H0.SKAT
A list returned from NullObject.parameters with RVAT = "SKAT"
SKAT.parameters
A list containing the parameters to use by SKAT
verbose
Whether to display information about the function actions
path.data
The directory where the data needed for RAVA-FIRST are stored or will be downloaded from https://lysine.univ-brest.fr/RAVA-FIRST/
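For reference, here is a minimal sketch of the layout expected for the two score tables; the column names come from the descriptions above, while the values are made-up placeholders:
# Placeholder rows only, to show the expected columns; these are not real CADD scores.
snvs.scores   <- data.frame(chr = 2, pos = 136608646, A1 = "G",  A2 = "A", adjCADD = 12.3)
indels.scores <- data.frame(chr = 2, pos = 136608700, A1 = "AT", A2 = "A", PHRED_1.4 = 8.7)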
Rare variants are analysed using the 'RAVA-FIRST' strategy, composed of three steps:
- Rare variants are grouped into 'CADD regions' defined from the CADD scores of variants observed in gnomAD.
- Rare variants are selected within each CADD region based on an adjusted CADD score, using a region-specific threshold corresponding to the median of the scores observed in gnomAD in that region.
- Burden analysis is performed by integrating sub-scores for the coding, regulatory and intergenic categories within each CADD region. For the SKAT analysis, a test is performed for each CADD region.
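As a rough sketch, the three steps map onto the component functions named below; the call signatures shown here are simplified assumptions and may differ from the actual interfaces (see the help page of each function):
# Sketch only; argument lists are simplified and may not match the actual signatures.
x.regions  <- set.CADDregions(x)                       # step 1: assign each variant to a CADD region
x.filtered <- filter.adjustedCADD(x.regions)           # step 2: keep variants above the region-specific adjusted-CADD median
res.burden <- burden.subscores(x.filtered, H0.burden)  # step 3a: burden test with coding/regulatory/intergenic sub-scores
res.skat   <- SKAT(x.filtered, H0.SKAT)                # step 3b: one SKAT test per CADD region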
RAVA.FIRST() is based on the functions set.CADDregions, filter.adjustedCADD, burden.subscores and SKAT. Please refer to these functions for more information. In particular, refer to burden.subscores and SKAT for details on what is needed in burden.parameters and SKAT.parameters.
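For illustration, burden.parameters and SKAT.parameters are named lists of arguments passed on to burden.subscores and SKAT; the parameter names used below (get.effect.size, get.moments) are assumptions and should be checked against the help pages of those two functions:
# Hypothetical parameter lists; check ?burden.subscores and ?SKAT for the real argument names.
burden.par <- list(get.effect.size = TRUE)      # assumed burden.subscores argument
skat.par   <- list(get.moments = "size.based")  # assumed SKAT argument
x.res <- RAVA.FIRST(x, H0.burden = H0.burden, H0.SKAT = H0.SKAT,
                    burden.parameters = burden.par, SKAT.parameters = skat.par,
                    path.data = "RAVA-FIRST/Package")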
bedtools needs to be installed on the system if indels are present or if scores are not provided for the SNVs.
It is recommended to use this function chromosome by chromosome for large datasets.
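A minimal sketch of such a per-chromosome run, assuming the gaston-style select.snps() filtering used in the example below and that the 'burden' element of the result is a data frame:
# Run RAVA.FIRST one chromosome at a time and collect the burden results.
res.by.chr <- lapply(unique(x@snps$chr), function(chrom) {
  x.chr <- select.snps(x, chr == chrom)
  RAVA.FIRST(x.chr, H0.burden = H0.burden, SKAT = FALSE,
             cores = 1, path.data = "RAVA-FIRST/Package")
})
burden.all <- do.call(rbind, lapply(res.by.chr, function(r) r$burden))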
A list containing the results for the burden analysis ('burden') and the results for the SKAT analysis ('SKAT'), along with information about CADD regions (positions, type of genomic categories overlapped by each region and median of adjusted CADD scores).
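Assuming those element names, the two sets of results can be extracted directly from the returned object (here called x.res for illustration):
x.res$burden  # burden results per CADD region, with functional sub-scores
x.res$SKAT    # SKAT results, one test per CADD region (computed only if SKAT = TRUE)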
https://lysine.univ-brest.fr/RAVA-FIRST/
set.CADDregions, filter.adjustedCADD, burden.subscores, SKAT
## This example takes a few minutes to run. To run it quickly, we advise the user to decrease the number of variants or to increase the number of cores used.
#Import 1000Genome data from region around LCT gene
#x <- as.bed.matrix(x=LCT.matrix.bed, fam=LCT.matrix.fam, bim=LCT.snps)
#Add population
#x@ped[,c("pop", "superpop")] <- LCT.matrix.pop1000G[,c("population", "super.population")]
#Select EUR superpopulation
#x <- select.inds(x, superpop=="EUR")
#x@ped$pop <- droplevels(x@ped$pop)
#Remove the multi-allelic variants
#x <- select.snps(x, !grepl(A2, pattern = ",") & !grepl(A1, pattern = ","))
#Keep only the rare variants
#x <- select.snps(x, maf <= 0.01 & maf > 0)
#Select the indels and make a dataframe to be used for annotation
#x.vcf <- subset(x@snps, nchar(A1)>1 | nchar(A2)>1)[,c("chr", "pos", "id", "A1", "A2")]
#Export the file for annotation
#write.table(x.vcf, file = "indels.toannotate.vcf", col.names = F, row.names = F, quote = F, sep = "\t")
###############
#Annotate the file on the CADD website (https://cadd.gs.washington.edu/score)
#Please be careful about which CADD version is used for annotation; here v1.4 is used
###############
#Download the results file, unzip it and load it in R
#x.indels <- read.table("indels.annotated.tsv")
#colnames(x.indels) <- c("chr", "pos", "A1", "A2", "CADD1.4", "PHRED_1.4")
#Set the NULL model for burden testing
#H0.burden <- NullObject.parameters(pheno = x@ped$pop, ref.level = "CEU", RVAT = "burden", pheno.type = "categorical")
#Run RAVA-FIRST
#x.RAVAFIRST <- RAVA.FIRST(x, indels.scores = x.indels, H0.burden = H0.burden, SKAT = F, cores = 1, path.data = "RAVA-FIRST/Package", maf.threshold = 1)