View source: R/remove_missing_genotype_data.R
remove_missing_genotype_data | R Documentation |
Because there can be no missing data when calculating the kinship correction, we need a way to remove either individuals or markers with missing data. We also need a way to calculate which of these options will remove the least amount of data.
remove_missing_genotype_data(
  data_obj,
  geno_obj = NULL,
  ind_missing_thresh = 0,
  marker_missing_thresh = 0,
  prioritize = c("fewer", "ind", "marker")
)
data_obj |
a cape object |
geno_obj |
a genotype object |
ind_missing_thresh |
Allowable amount of missing information for an individual, as a percentage. If 0 (the default), all individuals with any missing data at all will be removed. |
marker_missing_thresh |
Allowable amount of missing information for a marker, as a percentage. If 0 (the default), all markers with any missing data at all will be removed. |
prioritize |
The basis for prioritization. One of: "fewer" = calculate whether removing individuals or markers will remove fewer data points, and start with that; "ind" = remove individuals with missing data before considering markers with missing data; "marker" = remove markers with missing data before considering individuals. |
For example, if there is one marker with no data at all, we would rather remove that one marker than all individuals with missing data. Alternatively, if there is one individual with very sparse genotyping, we would prefer to remove that single individual rather than all markers with missing data.
This function provides a way to calculate whether individuals or markers should be prioritized when removing data. It then removes those individuals or markers.
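The comparison described above can be illustrated on a toy matrix. This is a minimal base-R sketch, not the package's implementation: the matrix `geno`, its layout (individuals in rows, markers in columns), and the choice to count discarded non-missing calls as "data points" are all assumptions made for illustration.

```r
# Toy genotype matrix (hypothetical): 4 individuals (rows) x 5 markers (columns).
geno <- matrix(0.5, nrow = 4, ncol = 5,
               dimnames = list(paste0("ind", 1:4), paste0("m", 1:5)))
geno[, "m3"] <- NA        # one marker with no data at all
geno["ind2", "m1"] <- NA  # one additional missing call

# Percentage of missing calls per individual and per marker.
ind_pct_missing    <- rowMeans(is.na(geno)) * 100
marker_pct_missing <- colMeans(is.na(geno)) * 100

# Flag everything above the threshold (0, i.e. any missing data).
ind_to_drop    <- ind_pct_missing > 0
marker_to_drop <- marker_pct_missing > 0

# Observed (non-missing) data points that each option would discard.
lost_by_ind    <- sum(!is.na(geno[ind_to_drop, , drop = FALSE]))
lost_by_marker <- sum(!is.na(geno[, marker_to_drop, drop = FALSE]))

# Start with whichever option discards fewer observed data points.
first_pass <- if (lost_by_marker <= lost_by_ind) "marker" else "ind"
```

In this toy case every individual is missing the call at `m3`, so removing individuals would discard all observed data, while removing the two affected markers discards only a few calls.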
The cape object is returned with individuals and markers removed. After this step, the function get_geno should return an array with no missing data if ind_missing_thresh and marker_missing_thresh are both 0. If these numbers are higher, no individual or marker will be missing more than the set percentage of data.
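One way to confirm this guarantee is to compute the largest percentage of missing calls found in any individual and any marker after filtering. The sketch below works on a plain matrix rather than on a cape object; `max_pct_missing` is a hypothetical helper, and the rows-as-individuals, columns-as-markers layout is an assumption made for illustration.

```r
# Hypothetical helper: largest percentage of missing values found in any
# row (individual) and any column (marker) of a genotype matrix.
max_pct_missing <- function(geno) {
  c(ind    = max(rowMeans(is.na(geno))) * 100,
    marker = max(colMeans(is.na(geno))) * 100)
}

# Toy filtered matrix: 20 individuals x 20 markers with a single missing call.
filtered <- matrix(1, nrow = 20, ncol = 20)
filtered[1, 1] <- NA

worst <- max_pct_missing(filtered)

# With both thresholds set to 10, neither value should exceed 10.
within_thresh <- all(worst <= 10)

# With both thresholds at 0, the data should contain no NA at all.
fully_complete <- !anyNA(filtered)
```

Here `within_thresh` is TRUE but `fully_complete` is FALSE, which is the situation expected when thresholds above 0 are used.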
All missing genotype data must either be imputed or removed if using the kinship correction. Running impute_missing_geno prior to running remove_missing_genotype_data ensures that the least possible amount of data is removed before running cape. In some cases, there will be missing genotype data even after running impute_missing_geno, in which case remove_missing_genotype_data still needs to be run.
The function run_cape automatically runs these steps when use_kinship is set to TRUE.
get_geno, impute_missing_geno, run_cape
## Not run:
# remove entries with more than 10% missing data, prioritizing
# removal of markers
data_obj <- remove_missing_genotype_data(data_obj, geno_obj,
  marker_missing_thresh = 10, ind_missing_thresh = 10,
  prioritize = "marker")
# remove individuals with more than 10% missing data and markers with
# more than 50% missing data, prioritizing removal of individuals
data_obj <- remove_missing_genotype_data(data_obj, geno_obj,
  ind_missing_thresh = 10, marker_missing_thresh = 50,
  prioritize = "ind")
# remove entries with any missing data, prioritizing whichever
# method removes the least amount of data
data_obj <- remove_missing_genotype_data(data_obj, geno_obj)
## End(Not run)