Wavelet_screening: Main function to perform wavelet screening

Description Usage Arguments Details Value Examples

View source: R/Wavelet_screening.R

Description

Perform a wavelet screening of a locus for a given phenotype and a specified level of resolution

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
Wavelet_screening(
  Y,
  loci,
  bp,
  confounder,
  lev_res,
  sigma_b = NA,
  coeftype,
  base_shrink,
  para = FALSE,
  BF = FALSE,
  verbose = TRUE
)

Arguments

Y

phenotype vector has to be numeric. For case-control data, code it as 0 and 1. Multiple label phenotypes, e.g., ABO blood groups, will be implemented in the next version.

loci

genotype matrix (either data.frame or numeric matrix). Lines=SNPs in increasing order in terms of base pair, columns=individuals. No missing values allowed.

bp

vector of the base pairs positions. It has to be in the same order and length than the locus line order/length.

confounder

the confounding matrix with the same sample order as Y. The intercept should not be included if missing will generate an intercept matrix.

lev_res

the maximum level of resolution needed.

sigma_b

the parameter of the NIG prior used for the Beta computation, set as NA by default. If not provided, performs a frequentist modeling. We advise setting this value between 0.1 and 0.2

coeftype

type of wavelet coefficient used for the screening (choice "c" or "d"). If missing set as "c"

base_shrink

numeric, value used in the thresholding of the proportion of assocation, if non specificed set up as 1/sqrt(2*log(sample_size))

para

logical parameter for parallelization, if not specified, set at FALSE by default.

BF

logical parameter for obtainning the Bayes Factor of the wavelet regression. If not specified, set at FALSEby default.

verbose

logical parameter, set as TRUE by default. ID

Details

The Wavelet_screening function computes the likelihood ratio used for testing the significance of a genetic region. In addition, it computes the proportion of wavelet coefficients associated by the level of resolution and the Beta used for this estimation. All the details of the computation can be found in our paper, preliminarily titled "Wavelet screening: a novel look to GWAS data.".

Value

A named vector. The first position contains the estimated value of the Lambda statistics. The next positions of the vector are the computed proportion of associations per level of resolution.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
## Not run: 
set.seed(1)
#########################################
#Generate a randomly sampled SNP from a locus of size=1Mb
#########################################

#5000 Randomly choosen basepairs
my_bp <- sort(sample(1:1000000, size=5000,replace = FALSE))
#############################
#Three different bump signals
#############################
my_functions <-data.frame(f0 = c(rep(0,400000),rep(0,200000),rep(0,400000)),
                         f1 = c(rep(0,400000),rep(1,200000),rep(0,400000)) ,
                         f2=c(rep(0,400000),rep(2,200000),rep(0,400000)))


library(gridExtra)
###########################
#Minor allele frequency 30%
###########################
MAF=0.3
sampl_schem <- c((1-MAF)^2,2*MAF*(1-MAF),MAF^2)
#######################################
#Sampling at Hardy Weinberg equilibrium
#######################################
#Assigning class

#sample size =4000
n_size=4000
type_fn <-sample(0:2,replace = TRUE,size=n_size,  prob=  sampl_schem  )


genotype <-  matrix(my_functions[my_bp,2 ], ncol=1 ) %*%t(matrix(type_fn,ncol=1))
#dim(genotype)= nSNP, nind

###############################################################
#Generate a phenotype with variance explained by genotype =0.5%
###############################################################
varexp=0.005
var_noise <- (1-varexp)*var(sample(0:2,replace = TRUE,size=10000,
                                  prob=sampl_schem ))/varexp
Y <-  rnorm(n=n_size,sd=sqrt(var_noise)) +type_fn
df <- data.frame(y=Y,genotype =factor(type_fn))
P1 <- ggplot(df,aes(y=y,x=genotype))+
 geom_boxplot()+
 xlab("Type of genotype")+
 theme(axis.text=element_text(size=12),
       axis.title=element_text(size=14,face="bold"))+
 ylab("Simulated Phenotype")+
 theme_bw()+
 ggtitle("Variation of the phenotype\n depending of the genotype, \n Variance explained =0.5%")

df <- data.frame(bp= rep(my_bp,3),y=c(my_functions[my_bp,1],my_functions[my_bp,2],my_functions[my_bp,3]),
                mycol = factor(c(rep("f0",length(my_bp)),rep("f1",length(my_bp)),rep("f2",length(my_bp))) ) )

P2 <- ggplot(df,aes(y=y,x=bp,color=mycol))+
 geom_point(size=1)+
 xlab("Base pair")+
 ylab("Number of variants")+
 theme_bw()+
 theme(legend.title=element_blank())+
 ggtitle("Three different kind of genotype signal")

grid.arrange(P1,P2,ncol=2)

##################
#Wavelet screening
##################
res <- Wavelet_screening( Y,loci=genotype,bp=my_bp,
                         lev_res=6)
res
#Value of the test statistics
res[c("L_h","min_ph_pv")]
#############
#Significance
#############

#Simulate the null distribution using proxy covariance matrix

Sim <- Simu_null_proxy(Y,lev_res = 6 ,size=10000)
head(Sim)
#Calibration of the hyperparameter
lambda <- Search_lambda(Sim,plot=TRUE)

Th <- Sim[,c("L_h")]+lambda*Sim[,c("min_ph_pv")]
muv <- median(Th,na.rm = TRUE)
sdv <- mad(Th,na.rm = TRUE)
####################################
#Test Value of the loci to be tested
####################################
th <-  res[c("L_h")]+lambda*res["min_ph_pv"]
#######
#P-value
#######
1-pnorm(th,mean=muv,sd=sdv)

df <- data.frame(Th = Th,type = factor( c(rep("Null",length(Th)))) )
ggplot(df,aes(Th,fill=type))+
 xlim(c(min(c(Th,th)),max(Th,th)))+
 geom_density()+
 guides(fill=FALSE)+
 geom_point(aes(x=th, y=0), colour="red")+theme(legend.position="none")+
 geom_text(label="Value of the test statistics", x=th, y=0.001)+
 geom_text(label="Null distribution", x=mean(Th), y=0.001)+
 theme_bw()
##############
#Visualisation
##############
bp <- c(min(my_bp),max(my_bp))
plot_WS(res=res,bp=bp,lev_res=6)



## End(Not run)

william-denault/WaveletScreaming documentation built on Jan. 23, 2021, 12:34 p.m.