mWaveQTL: Main function to perform Fast functional association analysis...
In william-denault/mWaveQTL: mWaveQTL

Description Usage Arguments Details Value References Examples

View source: R/mWaveQTL.R

Perform a screening for a trait on a given phenotype and a specified level of resolution

mWaveQTL(
  Y,
  genotype,
  pos,
  confounder,
  lev_res,
  sigma_b,
  para = FALSE,
  nperm = 10000,
  npermstop = 100
)

`Y`	phenotype vector, has to be numeric. For case control code it as 0 and 1.
`genotype`	genotype matrix (either data.frame or numeric matrix). Lines=SNPs in increasing order in term of base pair, columns=individuals. No missing values allowed.
`pos`	vector of spatial position of the variables. It has to be in the same order as long as the number of column of the genotype matrix line order/length.
`confounder`	the confounding matrix with the same sample order as Y. The intercept should not be included, if missing will generate a intercept matrix.
`lev_res`	the maximum level of resolution needed
`sigma_b`	the parameter of the NIG prior used for the Bayes Factor computation. We advised to set this value between 0.1 and 0.2
`para`	logical parameter for parallelisation, if not specified set at FALSE.
`nperm`	the number of permutation peformed to asses the significance of a regions. Default value 10,000
`npermstop`	Number of permutations allowed to be larger than the observed test statistics for the regions, before stopping permuation procedure. Default value =100

The mWaveQTL function computes the Likelihood ratio used for testing significance of a genetic region. In addition it computes the porportion of wavelets coefficients associated by level of resolution, and the Bayes factor used for this estimation.

A named vector. First position the estimated value of the Lambda statistics, then the proportion of association per level of resolution then the computed Bayes Factor per wavelet coefficient.

Shim and Stephens,Wavelet-based genetic association analysis of functional phenotypes arising from high-thoughput sequencing asssays,The Annals of Applied Statistics, 2015, Vol. 9, No. 2, 665–686

## Not run: 


set.seed(66)
#########################################
#Generate a randomly sample genotype size=1Mb
#########################################

#5000 Randomly choosen pos
my_pos <- sort(sample(1:1000000, size=5000,replace = FALSE))
#############################
#Three different bump genotype
#############################
my_functions <-data.frame(f0 = c(rep(0,400000),rep(0,200000),rep(0,400000)),
                         f1 = c(rep(0,400000),rep(1,200000),rep(0,400000)) ,
                         f2=c(rep(0,400000),rep(2,200000),rep(0,400000)))


library(gridExtra)
###########################
#Minor allele frequency 30%
###########################
MAF=0.3
sampl_schem <- c((1-MAF)^2,2*MAF*(1-MAF),MAF^2)
#######################################
#sampling at Hardy Weinberg equilibrium
#######################################
#Assigning class

#sample size =4000
n_size=4000
type_fn <-sample(0:2,replace = TRUE,size=n_size,  prob=  sampl_schem  )


genotype <-  matrix(my_functions[my_pos,2 ], ncol=1 ) %*%t(matrix(type_fn,ncol=1))
#dim(genotype)= nSNP, nind

###############################################################
#Generate a phenotype with variance explained by genotype  0.5%
###############################################################
varexp=0.005
var_noise <- (1-varexp)*var(sample(0:2,replace = TRUE,size=10000,
                                  prob=sampl_schem ))/varexp
Y <-  rnorm(n=n_size,sd=sqrt(var_noise)) +type_fn
df <- data.frame(y=Y,genotype =factor(type_fn))
P1 <- ggplot(df,aes(y=y,x=genotype))+
 geom_boxplot()+
 xlab("Type of genotype")+
 theme(axis.text=element_text(size=12),
       axis.title=element_text(size=14,face="bold"))+
 ylab("Simulated Phenotype")+
 theme_bw()+
 ggtitle("Variation of the phenotype\ndepending of thegenotype, \nVariance explained =0.5%")

df <- data.frame(pos= rep(my_pos,3),y=c(my_functions[my_pos,1],my_functions[my_pos,2],my_functions[my_pos,3]),
                mycol = factor(c(rep("f0",length(my_pos)),rep("f1",length(my_pos)),rep("f2",length(my_pos))) ) )

P2 <- ggplot(df,aes(y=y,x=pos,color=mycol))+
 geom_point(size=1)+
 xlab("Base pair")+
 ylab("Number of variants")+
 theme_bw()+
 theme(legend.title=element_blank())+
 ggtitle("Three different kind of genotype")

grid.arrange(P1,P2,ncol=2)

##################
#Screening
##################
res <- mWaveQTL( Y,
                genotype=genotype,
                pos=my_pos,
                         lev_res=6,
                         sigma_b = 0.2)
##############
#Visualisation
##############
pos <- c(min(my_pos),max(my_pos))
plot_mWaveQTL(res=res,pos=pos,lev_res=6)



## End(Not run)