ffw: Main function to perform Fast functional association analysis...

Description Usage Arguments Details Value References Examples

View source: R/ffw.R

Description

Perform a screening of a signal for a given phenotype and a specified level of resolution

Usage

1
ffw(Y, signal, pos, confounder, lev_res, sigma_b, para = FALSE, betas = FALSE)

Arguments

Y

phenotype vector, has to be numeric. For case control code it as 0 and 1.

signal

signals matrix (either data.frame or numeric matrix). Lines=SNPs in increasing order in term of base pair, columns=individuals. No missing values allowed.

pos

vector of spatial position of the variables. It has to be in the same order and length than the signal line order/length.

confounder

the confounding matrix with the same sample order as Y. The intercept should not be included, if missing will generate a intercept matrix.

lev_res

the maximum level of resolution needed

sigma_b

the parameter of the NIG prior used for the Bayes Factor computation. We advised to set this value between 0.1 and 0.2

para

logical parameter for parallelisation, if not specified set at FALSE.

Details

The ffw function computes the Likelihood ratio used for testing significance of a genetic region. In addition it computes the porportion of wavelets coefficients associated by level of resolution, and the Bayes factor used for this estimation.

Value

A named vector. First position the estimated value of the Lambda statistics, then the proportion of association per level of resolution then the computed Bayes Factor per wavelet coefficient.

References

Shim and Stephens,Wavelet-based genetic association analysis of functional phenotypes arising from high-thoughput sequencing asssays,The Annals of Applied Statistics, 2015, Vol. 9, No. 2, 665–686

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
## Not run: 


set.seed(66)
#########################################
#Generate a randomly sample signal size=1Mb
#########################################

#5000 Randomly choosen pos
my_pos <- sort(sample(1:1000000, size=5000,replace = FALSE))
#############################
#Three different bump signals
#############################
my_functions <-data.frame(f0 = c(rep(0,400000),rep(0,200000),rep(0,400000)),
                         f1 = c(rep(0,400000),rep(1,200000),rep(0,400000)) ,
                         f2=c(rep(0,400000),rep(2,200000),rep(0,400000)))


library(gridExtra)
###########################
#Minor allele frequency 30%
###########################
MAF=0.3
sampl_schem <- c((1-MAF)^2,2*MAF*(1-MAF),MAF^2)
#######################################
#sampling at Hardy Weinberg equilibrium
#######################################
#Assigning class

#sample size =4000
n_size=4000
type_fn <-sample(0:2,replace = TRUE,size=n_size,  prob=  sampl_schem  )


signals <-  matrix(my_functions[my_pos,2 ], ncol=1 ) %*%t(matrix(type_fn,ncol=1))
#dim(signals)= nSNP, nind

###############################################################
#Generate a phenotype with variance explained by signals  0.5%
###############################################################
varexp=0.005
var_noise <- (1-varexp)*var(sample(0:2,replace = TRUE,size=10000,
                                  prob=sampl_schem ))/varexp
Y <-  rnorm(n=n_size,sd=sqrt(var_noise)) +type_fn
df <- data.frame(y=Y,signals =factor(type_fn))
P1 <- ggplot(df,aes(y=y,x=signals))+
 geom_boxplot()+
 xlab("Type of signals")+
 theme(axis.text=element_text(size=12),
       axis.title=element_text(size=14,face="bold"))+
 ylab("Simulated Phenotype")+
 theme_bw()+
 ggtitle("Variation of the phenotype\ndepending of the signals, \nVariance explained =0.5%")

df <- data.frame(pos= rep(my_pos,3),y=c(my_functions[my_pos,1],my_functions[my_pos,2],my_functions[my_pos,3]),
                mycol = factor(c(rep("f0",length(my_pos)),rep("f1",length(my_pos)),rep("f2",length(my_pos))) ) )

P2 <- ggplot(df,aes(y=y,x=pos,color=mycol))+
 geom_point(size=1)+
 xlab("Base pair")+
 ylab("Number of variants")+
 theme_bw()+
 theme(legend.title=element_blank())+
 ggtitle("Three different kind of signals signal")

grid.arrange(P1,P2,ncol=2)

##################
#Screening
##################
res <- ffw( Y,signal=signals,pos=my_pos,
                         lev_res=6,sigma_b = 0.2)
# or:
signals_df <- as.data.frame(signals)
res <- ffw( Y,signal=signals_df,pos=my_pos,
                         lev_res=6,sigma_b = 0.2)
#value of the test statistic
res["Lambda"]
#############
#Significance
#############

#Estimation of the Bayes factor lambda_1 distribution parameter
#Take a bit of time
lambda <- get_lambda1(Y,sigma_b = 0.2)
#Simulation of the test statistics under the null distribution of
# the Bayes factor
Sim_gam <- Simu_Lambda_null(nsimu=10000, lambda=lambda,lev_res = 6)
val <- res["Lambda"]

#Via Simulation

pval <-c(length(Sim_gam[which(Sim_gam>val)])+1)/(length(Sim_gam)+1)
pval
#'pval


##############
#Visualisation
##############
pos <- c(min(my_pos),max(my_pos))
plot_ffw(res=res,pos=pos,lev_res=6)



## End(Not run)

william-denault/ffw documentation built on Jan. 20, 2021, 8:28 p.m.