Impute_GenoType_XGBoost: Impute missing SNPs for the input dataset.


View source: R/Impute_Genotype_XGBoost_function.R

Description

Impute missing SNPs for the input dataset.

Usage

Impute_GenoType_XGBoost(df, size = 10, num_class = 3, nrounds = 100)

Arguments

df

The original dataset including the missing SNPs to be imputed.

size

The window size (number of surrounding SNPs) used to build the training dataset for each SNP, default: 10.

num_class

Number of classes of the response variable (types of SNPs), default: 3.

params

A list of parameters for building the xgboost model. Default: nrounds = 100, booster = "gbtree", objective = "multi:softprob", num_class = 3, eval_metric = "mlogloss".

nrounds

Number of fitting rounds, default: 100.

Details

The model predicts each missing value from the genotypes of the SNPs surrounding it. For each missing value, a window of `size` SNPs around the missing position serves as the set of predictors, and the samples with a non-missing genotype at that SNP position form the training dataset.
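
The following is a minimal sketch of this windowing idea for a single SNP, not the package's actual implementation. It assumes the genotypes are stored as a numeric matrix with samples in rows and SNPs in columns, coded 0/1/2 with NA for missing calls, and that the window is split evenly on both sides of the target SNP. The helper `window_cols` and the toy matrix `geno` are hypothetical names introduced only for illustration. The model-fitting step runs only if the xgboost package is installed.

```r
# Hypothetical helper: column indices of up to `size` SNPs flanking column j.
# Assumption: the window is split evenly on both sides and clipped at the edges.
window_cols <- function(j, n_snps, size = 10) {
  half <- size %/% 2
  setdiff(seq(max(1, j - half), min(n_snps, j + half)), j)
}

# Toy genotype matrix (assumed layout): 50 samples x 12 SNPs coded 0/1/2.
set.seed(1)
geno <- matrix(sample(0:2, 50 * 12, replace = TRUE), nrow = 50)
storage.mode(geno) <- "double"
geno[3, 6] <- NA                           # one missing call to impute

j <- 6                                     # SNP position being imputed
cols <- window_cols(j, ncol(geno), size = 10)
train_rows <- which(!is.na(geno[, j]))     # samples with an observed genotype at SNP j
miss_rows  <- which(is.na(geno[, j]))      # samples whose genotype at SNP j is missing

if (requireNamespace("xgboost", quietly = TRUE)) {
  # Train on the non-missing samples, using the neighboring SNPs as predictors.
  dtrain <- xgboost::xgb.DMatrix(
    data  = geno[train_rows, cols, drop = FALSE],
    label = geno[train_rows, j]
  )
  fit <- xgboost::xgb.train(
    params = list(booster = "gbtree", objective = "multi:softprob",
                  num_class = 3, eval_metric = "mlogloss"),
    data = dtrain, nrounds = 100
  )
  # Predict class probabilities for the missing samples.
  prob <- predict(fit, xgboost::xgb.DMatrix(geno[miss_rows, cols, drop = FALSE]))
  # Some xgboost versions return a flat vector; reshape to samples x classes.
  if (!is.matrix(prob)) prob <- matrix(prob, ncol = 3, byrow = TRUE)
  # Fill in the most probable class, shifted back to the 0/1/2 coding.
  geno[miss_rows, j] <- max.col(prob, ties.method = "first") - 1
}
```

Repeating these steps for every SNP column that contains missing values would fill in the whole matrix; how Impute_GenoType_XGBoost handles windows at the edges of the dataset or missing values among the predictors is not specified here.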

Value

The predicted missing genotypes.

References

Kabisch, Maria, Ute Hamann, and Justo Lorenzo Bermejo. "Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure." BMC Genomics 18.1 (2017): 798.

Chen, Tianqi, and Carlos Guestrin. "XGBoost: A Scalable Tree Boosting System." 22nd SIGKDD Conference on Knowledge Discovery and Data Mining (2016). https://arxiv.org/abs/1603.02754

Examples

data("Test_df")
predict_df <- Impute_GenoType_XGBoost(Test_df, size = 10)
## May take several seconds to finish.
## Should return a dataset in which the missing values are filled with predicted values.

GaoGN517/689_SNPFastImpute_Mac documentation built on Dec. 8, 2019, 12:33 a.m.