README.md

SNPFastImputeMac

The goal of SNPFastImputeMac is to impute missing values in SNP data files.

Installation

You can install the development version of SNPFastImputeMac from GitHub using devtools:

devtools::install_github("GaoGN517/689_SNPFastImpute_Mac")

This package is a copy of a package I previously developed on Windows. Installing that package on macOS failed with an error, so I copied all the necessary files and folders from it into this new one. This version installs on macOS without that error.

Basic Example

This basic example shows how to convert VCF data, introduce additional missing values, impute them, and measure the imputation error:

library(SNPFastImputeMac)

## Read a vcf file as a matrix
## filename <- "data/Test.vcf" ## your file path
## vcf_df <- read.table(filename) ## read in your vcf data 
## Here we just load the dataset in vcf format.
data(vcf_df)
output_df <- vcf2df(vcf_df)
## This produces a warning about NAs. The NAs come from adding the missing SNP
## positions, which is intended, so they do not indicate a problem.

## Introduce some missing values into the original matrix to test the performance
data("SNP_orig_sub")
## this is the SNP matrix with the original SNP types.
## original NA ratio is 0.018

## Raise the final missing ratio to 20%
SNP_NA_df02 <- NA_Generator(SNP_orig_sub, 0.2)
## The result is a list of four elements:
## SNP_NA_df, 
## NA_percent_orig,
## NA_percent_generate,
## NP_generate_positions

## Create another object with a final missing ratio of 5%
SNP_NA_df005 <- NA_Generator(SNP_orig_sub, 0.05)

## Now we have the original matrix and the two objects with additional missing values.
ls()
# [1] "SNP_NA_df005" "SNP_NA_df02"  "SNP_orig_sub"

## The following function creates, for a single SNP in the matrix, the object used by
## the imputation function.
Create_Single_SNP_Object(SNP_NA_df005$SNP_NA_df, 2, size = 20)

## The main function of this package is the imputation function, which fills in the
## missing values.

## We can impute the original matrix directly
system.time(
  predict_df <- Impute_GenoType_XGBoost(SNP_orig_sub, size = 10)
)
##    user  system elapsed 
##   1.898   0.027   1.949 

## We can also impute the matrices into which we introduced additional missing
## values, and then check how accurate the predictions are.
system.time(
  df_fill02 <- Impute_GenoType_XGBoost(SNP_NA_df02$SNP_NA_df)
)
##    user  system elapsed 
##  10.202   0.090  10.455 
system.time(
  df_fill005 <- Impute_GenoType_XGBoost(SNP_NA_df005$SNP_NA_df)
)
##    user  system elapsed 
##   8.900   0.156   9.445 

NA_positions02 <- SNP_NA_df02$NP_generate_positions
NA_positions005 <- SNP_NA_df005$NP_generate_positions

classification_error(SNP_orig_sub, df_fill02, NA_positions02) ## 0.076
classification_error(SNP_orig_sub, df_fill005, NA_positions005) ## 0.049
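
The evaluation above follows a mask, impute, and score pattern: hide genotypes whose true values are known, fill them in, and compare only at the hidden positions. The base-R sketch below illustrates that pattern on a toy 0/1/2 genotype matrix. The helpers `mask_to_ratio`, `impute_mode`, and `masked_error` are hypothetical names introduced for illustration. They are not part of SNPFastImputeMac, and they are not claimed to match how `NA_Generator` or `classification_error` are implemented. A simple column-mode fill stands in for the XGBoost imputer.

```r
## Hypothetical helpers for illustration only; not part of SNPFastImputeMac.
set.seed(1)

## Toy genotype matrix: 50 samples x 8 SNPs, coded 0/1/2 (assumed coding).
geno <- matrix(sample(0:2, 50 * 8, replace = TRUE, prob = c(0.6, 0.3, 0.1)),
               nrow = 50, ncol = 8)

## Mask extra entries so that the overall NA ratio reaches `target`.
## Returns the masked matrix and the linear indices that were hidden.
mask_to_ratio <- function(mat, target) {
  observed <- which(!is.na(mat))
  n_extra <- max(round(target * length(mat)) - sum(is.na(mat)), 0)
  ## Sample indices into `observed` so a length-one `observed` is handled correctly.
  masked <- observed[sample.int(length(observed), n_extra)]
  mat[masked] <- NA
  list(mat = mat, masked = masked)
}

## Stand-in imputer: replace each column's NAs with that column's most common value.
## Assumes every column keeps at least one observed value.
impute_mode <- function(mat) {
  for (j in seq_len(ncol(mat))) {
    col <- mat[, j]
    mode_val <- as.integer(names(which.max(table(col))))
    col[is.na(col)] <- mode_val
    mat[, j] <- col
  }
  mat
}

## Share of masked entries whose imputed value differs from the true value.
masked_error <- function(truth, filled, masked) {
  mean(truth[masked] != filled[masked])
}

masked20 <- mask_to_ratio(geno, 0.2)
filled20 <- impute_mode(masked20$mat)
err20 <- masked_error(geno, filled20, masked20$masked)
print(err20)
```

Scoring only the masked positions matters because entries that were never hidden are copied through unchanged and would otherwise make the error look smaller than it is.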


GaoGN517/689_SNPFastImpute_Mac documentation built on Dec. 8, 2019, 12:33 a.m.