This vignette illustrates the basic usage of the SNPknock
package in combination with the phasing software fastphase
[@scheet2006] to create knockoffs of unphased genotypes or phased haplotypes [@sesia2019, @sesia2019multi]. Since fastphase
is not available as an R package, this functionality of SNPknock
requires the user to first obtain a copy of fastphase
.
To learn more about the application of SNPknock
to large genome-wide association studies [@sesia2019multi], visit: https://msesia.github.io/knockoffzoom/.
fastphase
fastphase
is a phasing and imputation tool based on the hidden Markov model described in [@scheet2006].
Binary executables for Linux and Mac OS are available from http://scheet.org/software.html.
Before continuing with this tutorial, download the fastphase
tarball from the above link and extract the fastphase
executable file into a convenient directory (e.g. "~/bin/").
A small synthetic dataset of 1454 unphased genotype SNPs from 100 individuals can be found in the package installation directory. We can load it with:
library(SNPknock) X_file = system.file("extdata", "genotypes.RData", package = "SNPknock") load(X_file) table(X)
Below, we show how to fit a hidden Markov model to this data, with the help of fastphase
.
Since fastphase
takes as input genotype sequences in ".inp" format, we must first convert
the X matrix by calling writeXtoInp
.
By default, this function will write onto a temporary file in the R temporary directory.
# Convert X into the suitable fastphase input format, write it into a temporary file # and return the path to that file. Xinp_file = writeXtoInp(X)
Assuming that we have already downloaded fastphase
, we can call it to fit the hidden Markov model to X.
fp_path = "~/bin/fastphase" # Path to the fastphase executable # Call fastphase and return the path to the parameter estimate files fp_outPath = runFastPhase(fp_path, Xinp_file, K = 12, numit = 15)
Above, the SNPknock
package could not find fastphase
because we did not provide the correct path
(we cannot include third-party executable files within this package). However, if you install fastphase
separately and provide SNPknock
with the correct path, this will work.
If the previous step worked for you, you can find the parameter estimates produced by fastphase
in the following files:
r_file = paste(fp_outPath, "_rhat.txt", sep="") alpha_file = paste(fp_outPath, "_alphahat.txt", sep="") theta_file = paste(fp_outPath, "_thetahat.txt", sep="") char_file = paste(fp_outPath, "_origchars", sep="")
Otherwise, for the sake of this tutorial, you can use the example parameter files provided in the package installation directory:
r_file = system.file("extdata", "genotypes_rhat.txt", package = "SNPknock") alpha_file = system.file("extdata", "genotypes_alphahat.txt", package = "SNPknock") theta_file = system.file("extdata", "genotypes_thetahat.txt", package = "SNPknock") char_file = system.file("extdata", "genotypes_origchars", package = "SNPknock")
Then, we can construct the hidden Markov model with:
hmm = loadHMM(r_file, alpha_file, theta_file, char_file)
Finally, we can use the hidden Markov model created above to generate knockoffs.
Xk = knockoffGenotypes(X, hmm$r, hmm$alpha, hmm$theta) table(Xk)
A small synthetic dataset of 1454 phased haplotype SNPs from 100 individuals can be found in the package installation directory. We can load it with:
library(SNPknock) H_file = system.file("extdata", "haplotypes.RData", package = "SNPknock") load(H_file) table(H)
Below, we show how to fit a hidden Markov model to this data, with the help of fastphase
.
Since fastphase
takes as input haplotype sequences in ".inp" format, we must first convert
the H matrix by calling writeXtoInp
.
By default, this function will write onto a temporary file in the R temporary directory.
# Convert X into the suitable fastphase input format, write it into a temporary file # and return the path to that file. Hinp_file = writeXtoInp(H, phased = TRUE)
Assuming that we have already downloaded fastphase
, we can call it to fit the hidden Markov model to X.
fp_path = "~/bin/fastphase" # Path to the fastphase executable # Call fastphase and return the path to the parameter estimate files fp_outPath = runFastPhase(fp_path, Hinp_file, K = 12, numit = 15, phased = TRUE)
Above, the SNPknock
package could not find fastphase
because we did not provide the correct path
(we cannot include third-party executable files within this package). However, if you install fastphase
separately and provide SNPknock
with the correct path, this will work.
If the previous step worked for you, you can find the parameter estimates produced by fastphase
in the following files:
r_file = paste(fp_outPath, "_rhat.txt", sep="") alpha_file = paste(fp_outPath, "_alphahat.txt", sep="") theta_file = paste(fp_outPath, "_thetahat.txt", sep="") char_file = paste(fp_outPath, "_origchars", sep="")
Otherwise, for the sake of this tutorial, you can use the example parameter files provided in the package installation directory:
r_file = system.file("extdata", "haplotypes_rhat.txt", package = "SNPknock") alpha_file = system.file("extdata", "haplotypes_alphahat.txt", package = "SNPknock") theta_file = system.file("extdata", "haplotypes_thetahat.txt", package = "SNPknock") char_file = system.file("extdata", "haplotypes_origchars", package = "SNPknock")
Then, we can construct the hidden Markov model with:
hmm = loadHMM(r_file, alpha_file, theta_file, char_file)
Finally, we can use the hidden Markov model created above to generate knockoffs.
Hk = knockoffHaplotypes(H, hmm$r, hmm$alpha, hmm$theta) table(Hk)
If you want to see some basic usage of SNPknock
, see the introductory vignette.
If you want to learn about SNPknock
for large genome-wide association studies [@sesia2019multi], see https://msesia.github.io/knockoffzoom/.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.