The goal of SNPFastImputeMac is to impute missing values in SNP data files.
You can install the development version of SNPFastImputeMac from GitHub using devtools:
devtools::install_github("GaoGN517/689_SNPFastImpute_Mac")
This package is a copy of a package I previously developed on Windows. That package failed to install on macOS, so I copied all the necessary files and folders from it into this new package, which installs on macOS without that error.
This basic example shows how to load SNP data, introduce additional missing values, impute them, and check the classification error:
library(SNPFastImputeMac)
## Read a vcf file as a matrix
## filename <- "data/Test.vcf" ## your file path
## vcf_df <- read.table(filename) ## read in your vcf data
## Here we just load the dataset in vcf format.
data(vcf_df)
output_df <- vcf2df(vcf_df)
## This call may emit a warning about NAs. They come from adding the SNP positions
## that are missing from the data, which is intended, so these NAs do not indicate a problem.
## Introduce some missing values into the original matrix to test the performance
data("SNP_orig_sub")
## this is the SNP matrix with the original SNP types.
## original NA ratio is 0.018
## Raise the final missing ratio to 20%
SNP_NA_df02 <- NA_Generator(SNP_orig_sub, 0.2)
## The result is a list of four elements:
## SNP_NA_df,
## NA_percent_orig,
## NA_percent_generate,
## NP_generate_positions
## Create another object with a final missing ratio of 5%
SNP_NA_df005 <- NA_Generator(SNP_orig_sub, 0.05)
## Now we have the original matrix and the two objects with additional missing values.
ls()
# [1] "SNP_NA_df005" "SNP_NA_df02" "SNP_orig_sub"
## We can use the following function to create an object that can be used in the imputation
## function for each SNP in the matrix.
Create_Single_SNP_Object(SNP_NA_df005$SNP_NA_df, 2, size = 20)
## The main purpose of this package is to fill in missing values with the
## imputation function.
## We can impute the original matrix directly:
system.time(
predict_df <- Impute_GenoType_XGBoost(SNP_orig_sub, size = 10)
)
## user system elapsed
## 1.898 0.027 1.949
## We can also impute the matrices into which we introduced additional
## missing values, and then check how well the method predicts them.
system.time(
df_fill02 <- Impute_GenoType_XGBoost(SNP_NA_df02$SNP_NA_df)
)
## user system elapsed
## 10.202 0.090 10.455
system.time(
df_fill005 <- Impute_GenoType_XGBoost(SNP_NA_df005$SNP_NA_df)
)
## user system elapsed
## 8.900 0.156 9.445
NA_positions02 <- SNP_NA_df02$NP_generate_positions
NA_positions005 <- SNP_NA_df005$NP_generate_positions
classification_error(SNP_orig_sub, df_fill02, NA_positions02) ## 0.076
classification_error(SNP_orig_sub, df_fill005, NA_positions005) ## 0.049
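The evaluation workflow above (mask known entries, impute, compare at the masked positions) can be illustrated with a minimal base R sketch. The helpers `mask_to_ratio()` and `error_at_positions()` are hypothetical names written for this illustration; they are not the package's `NA_Generator()` or `classification_error()`, and the column-mode fill is only a stand-in for the XGBoost imputation. The sketch also assumes masked positions are stored as linear matrix indices, which may differ from the package's own format.

```r
# Hypothetical helpers for illustration only; these are NOT the package's
# NA_Generator() or classification_error() implementations.

# Mask additional entries so that the overall NA ratio reaches `target`.
mask_to_ratio <- function(mat, target) {
  n_total  <- length(mat)
  orig_na  <- which(is.na(mat))              # linear indices already missing
  n_needed <- max(round(target * n_total) - length(orig_na), 0)
  observed <- which(!is.na(mat))
  # Sample by index to avoid sample()'s special case for length-one input
  new_na   <- observed[sample.int(length(observed), n_needed)]
  masked   <- mat
  masked[new_na] <- NA
  list(masked = masked, positions = new_na)
}

# Share of masked positions where the filled value differs from the truth.
error_at_positions <- function(truth, filled, positions) {
  mean(truth[positions] != filled[positions])
}

set.seed(1)
# Toy genotype matrix coded 0/1/2 (rows = samples, columns = SNPs)
toy <- matrix(sample(0:2, 200, replace = TRUE), nrow = 20)
res <- mask_to_ratio(toy, 0.2)

# Stand-in imputation: fill each column with its most frequent observed value
filled <- res$masked
for (j in seq_len(ncol(filled))) {
  col <- filled[, j]
  if (anyNA(col) && !all(is.na(col))) {
    mode_val <- as.integer(names(which.max(table(col))))
    filled[is.na(col), j] <- mode_val
  }
}

mean(is.na(res$masked))                        # overall NA ratio after masking
error_at_positions(toy, filled, res$positions) # error on the masked entries only
```

Restricting the comparison to the artificially masked positions matters because those are the only entries whose true genotype is known; entries that were missing in the original data cannot be scored.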