This vignette illustrates the usage of the SNPknock
package for creating knockoffs of discrete Markov chains and hidden Markov models [@sesia2019]. For simplicity, we will use synthetic data.
The SNPknock
package also provides a simple interface to the genotype imputation software fastPhase
, which can be used to fit hidden Markov models for genotype data. Since fastPhase
is not available as an R package, this particular functionality of SNPknock
cannot be demonstrated here. A tutorial showing how to use a combination of SNPknock
and fastPhase
to create knockoffs of genotype data can be found here.
First, we verify that the SNPknock
can be loaded.
library(SNPknock)
We define a Markov chain model with 50 variables, each taking one of 5 possible values. We specify a uniform marginal distribution for the first variable in the chain and create 49 transition matrices with randomly sampled entries.
p=50; # Number of variables in the model K=5; # Number of possible states for each variable # Marginal distribution for the first variable pInit = rep(1/K,K) # Create p-1 transition matrices Q = array(stats::runif((p-1)*K*K),c(p-1,K,K)) for(j in 1:(p-1)) { Q[j,,] = Q[j,,] + diag(rep(1,K)) Q[j,,] = Q[j,,] / rowSums(Q[j,,]) }
We can sample 100 independent observations of this Markov chain using the SNPknock
package.
set.seed(1234) X = sampleDMC(pInit, Q, n=100) print(X[1:5,1:10])
Above, each row of X
contains an independent realization of the Markov chain.
A knockoff copy of X
can be sampled as follows.
Xk = knockoffDMC(X, pInit, Q) print(Xk[1:5,1:10])
We define a hidden Markov chain model with 50 variables, each taking one of 3 possible values, and 50 latent variables distributed as a Markov chain, each taking one of 5 values. We specify a uniform marginal distribution for the first variable in the chain and create 49 transition matrices with randomly sampled entries. We also define an emission distributions over the 3 possible emission states, for each of the 50 latent variables and the 5 latent states.
p=200; # Number of variables in the model K=5; # Number of possible states for each variable M=3; # Number of possible emission states for each variable # Marginal distribution for the first variable pInit = rep(1/K,K) # Create p-1 transition matrices Q = array(stats::runif((p-1)*K*K),c(p-1,K,K)) rho = stats::runif(p-1, min = 1, max = 50) for(j in 1:(p-1)) { Q[j,,] = Q[j,,] + rho[j] * diag(rep(1,K)) Q[j,,] = Q[j,,] / rowSums(Q[j,,]) } # Create p emission matrices pEmit = array(stats::runif(p*M*K),c(p,M,K)) for(j in 1:p) { pEmit[j,,] = sweep(pEmit[j,,],2,colSums(pEmit[j,,]),`/`) }
We can sample 100 independent observations of this hidden Markov model using the SNPknock
package.
set.seed(1234) X = sampleHMM(pInit, Q, pEmit, n=100) print(X[1:5,1:10])
Above, each row of X
contains an independent realization of the hidden Markov model.
A knockoff copy of X
can be sampled as follows.
Xk = knockoffHMM(X, pInit, Q, pEmit) print(Xk[1:5,1:10])
If you want to see how to use SNPknock
to create knockoffs of genotype data, see the genotypes vignette.
If you want to learn about SNPknock
for large genome-wide association studies [@sesia2019multi], see https://msesia.github.io/knockoffzoom/
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.