View source: R/Impute_Genotype_XGBoost_function.R
Impute missing SNPs for the input dataset.
Impute_GenoType_XGBoost(df, size = 10, num_class = 3, nrounds = 100)
df | The original dataset, including the missing SNPs to be imputed.
size | The window size (number of surrounding SNPs) used to build the training dataset for each SNP. Default: 10.
num_class | Number of classes of the response variable (types of SNPs). Default: 3.
params | A list of parameters for building the xgboost model. Default: nrounds = 100, booster = "gbtree", objective = "multi:softprob", num_class = 3, eval_metric = "mlogloss".
nrounds | Number of fitting rounds. Default: 100.
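The defaults listed for params describe an ordinary multiclass xgboost configuration. As a rough sketch only (not the package's own code), such a list could be assembled and passed to `xgboost::xgb.train` as shown here. The toy matrix `x`, the labels `y`, and the choice to pass `nrounds` separately from the list are assumptions made for illustration. The fit runs only if the xgboost package is installed.

```r
num_class <- 3
nrounds <- 100

# Parameter list mirroring the documented defaults
params <- list(
  booster = "gbtree",
  objective = "multi:softprob",
  num_class = num_class,
  eval_metric = "mlogloss"
)

if (requireNamespace("xgboost", quietly = TRUE)) {
  set.seed(1)
  # Toy predictors (assumed): 60 samples x 4 SNPs, genotypes coded 0/1/2
  x <- matrix(as.numeric(sample(0:2, 60 * 4, replace = TRUE)), nrow = 60)
  y <- x[, 1]  # toy labels; classes must be coded 0 .. num_class - 1

  dtrain <- xgboost::xgb.DMatrix(data = x, label = y)
  fit <- xgboost::xgb.train(params = params, data = dtrain,
                            nrounds = nrounds, verbose = 0)

  # "multi:softprob" yields one probability per class and sample; depending on
  # the xgboost version this arrives as a flat vector or as a matrix.
  prob <- predict(fit, xgboost::xgb.DMatrix(data = x))
  if (is.null(dim(prob))) prob <- matrix(prob, ncol = num_class, byrow = TRUE)

  # Most probable class per sample, mapped back to the 0/1/2 coding
  pred <- max.col(prob) - 1
}
```

With `objective = "multi:softprob"` the model returns class probabilities rather than labels, so a final step such as `max.col` is needed to turn them into genotype calls.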
The model uses the genotypes of the SNPs around each missing SNP to predict the missing value. For each missing value, the n SNPs around it (n = size) serve as predictors, and the samples with a non-missing genotype at that SNP position form the training dataset.
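The windowing idea can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation. The genotype matrix `geno` is made-up data, the helper `window_cols` is hypothetical, and splitting the window evenly on both sides of the target SNP is an assumption, since the text does not say how the window is placed.

```r
# Toy genotype matrix (assumed): rows are samples, columns are SNPs coded 0/1/2
set.seed(42)
geno <- matrix(sample(0:2, 20 * 30, replace = TRUE), nrow = 20,
               dimnames = list(NULL, paste0("snp", 1:30)))
geno[c(3, 7), 15] <- NA  # two samples are missing SNP 15

# Hypothetical helper: indices of up to `size` SNPs flanking `target`,
# assuming the window is split evenly on both sides and clipped at the edges
window_cols <- function(target, n_snps, size = 10) {
  half <- size %/% 2
  lo <- max(1, target - half)
  hi <- min(n_snps, target + half)
  setdiff(lo:hi, target)
}

target <- 15
cols <- window_cols(target, ncol(geno), size = 10)
known <- !is.na(geno[, target])

# Samples with an observed genotype at the target SNP form the training set
train_x <- geno[known, cols, drop = FALSE]
train_y <- geno[known, target]

# Samples with a missing genotype are the ones to predict
new_x <- geno[!known, cols, drop = FALSE]
```

Here `train_x` and `train_y` would be used to fit the classifier for SNP 15, and `new_x` holds the predictor rows for the two samples whose genotype is to be imputed.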
The predicted missing genotypes.
Kabisch, Maria, Ute Hamann, and Justo Lorenzo Bermejo. "Imputation of missing genotypes within LD-blocks relying on the basic coalescent and beyond: consideration of population growth and structure." BMC genomics 18.1 (2017): 798.
Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System", 22nd SIGKDD Conference on Knowledge Discovery and Data Mining, 2016, https://arxiv.org/abs/1603.02754
data("Test_df")
predict_df <- Impute_GenoType_XGBoost(Test_df, size = 10)
## May take several seconds to finish.
## Should return a dataset where the missing values are filled by predicted values.