This vignette illustrates the usage of the `SNPknock`

package for creating knockoffs of discrete Markov chains and hidden Markov models [@sesia2019]. For simplicity, we will use synthetic data.

The `SNPknock`

package also provides a simple interface to the genotype imputation software `fastPhase`

, which can be used to fit hidden Markov models for genotype data. Since `fastPhase`

is not available as an R package, this particular functionality of `SNPknock`

cannot be demonstrated here. A tutorial showing how to use a combination of `SNPknock`

and `fastPhase`

to create knockoffs of genotype data can be found here.

First, we verify that the `SNPknock`

can be loaded.

```
library(SNPknock)
```

We define a Markov chain model with 50 variables, each taking one of 5 possible values. We specify a uniform marginal distribution for the first variable in the chain and create 49 transition matrices with randomly sampled entries.

p=50; # Number of variables in the model K=5; # Number of possible states for each variable # Marginal distribution for the first variable pInit = rep(1/K,K) # Create p-1 transition matrices Q = array(stats::runif((p-1)*K*K),c(p-1,K,K)) for(j in 1:(p-1)) { Q[j,,] = Q[j,,] + diag(rep(1,K)) Q[j,,] = Q[j,,] / rowSums(Q[j,,]) }

We can sample 100 independent observations of this Markov chain using the `SNPknock`

package.

set.seed(1234) X = sampleDMC(pInit, Q, n=100) print(X[1:5,1:10])

Above, each row of `X`

contains an independent realization of the Markov chain.

A knockoff copy of `X`

can be sampled as follows.

Xk = knockoffDMC(X, pInit, Q) print(Xk[1:5,1:10])

We define a hidden Markov chain model with 50 variables, each taking one of 3 possible values, and 50 latent variables distributed as a Markov chain, each taking one of 5 values. We specify a uniform marginal distribution for the first variable in the chain and create 49 transition matrices with randomly sampled entries. We also define an emission distributions over the 3 possible emission states, for each of the 50 latent variables and the 5 latent states.

p=200; # Number of variables in the model K=5; # Number of possible states for each variable M=3; # Number of possible emission states for each variable # Marginal distribution for the first variable pInit = rep(1/K,K) # Create p-1 transition matrices Q = array(stats::runif((p-1)*K*K),c(p-1,K,K)) rho = stats::runif(p-1, min = 1, max = 50) for(j in 1:(p-1)) { Q[j,,] = Q[j,,] + rho[j] * diag(rep(1,K)) Q[j,,] = Q[j,,] / rowSums(Q[j,,]) } # Create p emission matrices pEmit = array(stats::runif(p*M*K),c(p,M,K)) for(j in 1:p) { pEmit[j,,] = sweep(pEmit[j,,],2,colSums(pEmit[j,,]),`/`) }

We can sample 100 independent observations of this hidden Markov model using the `SNPknock`

package.

set.seed(1234) X = sampleHMM(pInit, Q, pEmit, n=100) print(X[1:5,1:10])

Above, each row of `X`

contains an independent realization of the hidden Markov model.

A knockoff copy of `X`

can be sampled as follows.

Xk = knockoffHMM(X, pInit, Q, pEmit) print(Xk[1:5,1:10])

If you want to see how to use `SNPknock`

to create knockoffs of genotype data, see the genotypes vignette.

If you want to learn about `SNPknock`

for large genome-wide association studies [@sesia2019multi], see https://msesia.github.io/knockoffzoom/

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.