remove_missing_genotype_data: Removes individuals and/or markers with missing data

View source: R/remove_missing_genotype_data.R

remove_missing_genotype_data    R Documentation

Removes individuals and/or markers with missing data

Description

Because there can be no missing data when calculating the kinship correction, we need a way to remove either individuals or markers with missing data. We also need a way to determine which of these options will remove the least data.

Usage

remove_missing_genotype_data(
  data_obj,
  geno_obj = NULL,
  ind_missing_thresh = 0,
  marker_missing_thresh = 0,
  prioritize = c("fewer", "ind", "marker")
)

Arguments

data_obj

a Cape object

geno_obj

a genotype object

ind_missing_thresh

Allowable percentage of missing information for an individual. If 0 (the default), all individuals with any missing data at all will be removed.

marker_missing_thresh

Allowable percentage of missing information for a marker. If 0 (the default), all markers with any missing data at all will be removed.

prioritize

The basis for prioritization, one of: "fewer" = calculate whether removing individuals or markers will remove fewer data points, and start with that. "ind" = remove individuals with missing data before considering markers with missing data. "marker" = remove markers with missing data before considering individuals with missing data.
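To illustrate how the two thresholds can be read, here is a minimal sketch on a toy matrix. It does not call cape: the matrix `geno` (individuals in rows, markers in columns) and the variable names are hypothetical, and the comparison "strictly more than the threshold" is an assumption based on the Value section below.

```r
# Toy genotype matrix (hypothetical data): rows are individuals, columns are markers.
geno <- matrix(
  c(0,  1, NA, 1,
    1, NA, NA, 0,
    0,  0, NA, 1,
    1,  1, NA, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("m", 1:4))
)

# Percentage of missing genotypes per individual and per marker.
ind_pct_missing <- rowMeans(is.na(geno)) * 100
marker_pct_missing <- colMeans(is.na(geno)) * 100

# Assumed reading of the thresholds: flag anything missing MORE than the threshold.
ind_missing_thresh <- 0
marker_missing_thresh <- 30

flagged_ind <- names(which(ind_pct_missing > ind_missing_thresh))
flagged_marker <- names(which(marker_pct_missing > marker_missing_thresh))
```

With a threshold of 0, every individual in this toy matrix is flagged, because each one is missing at least one genotype. With a marker threshold of 30, only the marker with no data at all is flagged.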

Details

For example, if there is one marker with no data at all, we would rather remove that one marker than all individuals with missing data. Alternatively, if there is one individual with very sparse genotyping, we would prefer to remove that single individual rather than all markers with missing data.

This function provides a way to calculate whether individuals or markers should be prioritized when removing data. It then removes those individuals or markers.
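The "fewer" comparison described above can be sketched roughly as follows. This is not the cape implementation: the toy matrix, the variable names, and the choice to count lost data points as observed (non-missing) genotypes are all assumptions made for illustration.

```r
# Toy genotype matrix (hypothetical data) in which one marker has no data at all.
geno <- matrix(
  c(0, 1, NA, 1,
    1, 0, NA, 0,
    0, 0, NA, 1,
    1, 1, NA, 0),
  nrow = 4, byrow = TRUE
)

# Individuals (rows) and markers (columns) with any missing data.
bad_ind <- which(rowSums(is.na(geno)) > 0)
bad_marker <- which(colSums(is.na(geno)) > 0)

# Observed genotypes that would be discarded under each option
# (assumption: this is what "data points" means here).
lost_if_ind <- sum(!is.na(geno[bad_ind, , drop = FALSE]))
lost_if_marker <- sum(!is.na(geno[, bad_marker, drop = FALSE]))

# Start with whichever option discards fewer observed genotypes.
first_step <- if (lost_if_marker <= lost_if_ind) "marker" else "ind"

# Apply the chosen step; setdiff() stays safe when nothing is flagged.
if (first_step == "marker") {
  keep <- setdiff(seq_len(ncol(geno)), bad_marker)
  cleaned <- geno[, keep, drop = FALSE]
} else {
  keep <- setdiff(seq_len(nrow(geno)), bad_ind)
  cleaned <- geno[keep, , drop = FALSE]
}
```

Here every individual is missing the same marker, so removing individuals would discard all observed genotypes, while removing the single empty marker discards none. The sketch therefore removes the marker, which matches the first scenario in the Details text.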

Value

The cape object is returned with individuals and markers removed. After this step, the function get_geno should return an array with no missing data if ind_missing_thresh and marker_missing_thresh are both 0. If these numbers are higher, no individual or marker will be missing more than the set percentage of data.

All missing genotype data must either be imputed or removed if using the kinship correction. Running impute_missing_geno prior to running remove_missing_genotype_data ensures that the least possible amount of data is removed before running cape. In some cases, there will be missing genotype data even after running impute_missing_geno, in which case remove_missing_genotype_data still needs to be run. The function run_cape automatically runs these steps when use_kinship is set to TRUE.

See Also

get_geno, impute_missing_geno, run_cape

Examples

## Not run: 
#remove entries with more than 10% missing data, prioritizing
#removal of markers
data_obj <- remove_missing_genotype_data(data_obj, geno_obj, 
marker_missing_thresh = 10, ind_missing_thresh = 10,
prioritize = "marker")

#remove individuals with more than 10% missing data and markers with
#more than 50% missing data, prioritizing removal of individuals.
data_obj <- remove_missing_genotype_data(data_obj, geno_obj, 
ind_missing_thresh = 10, marker_missing_thresh = 50,
prioritize = "ind")

#remove entries with any missing data, prioritizing whichever 
#method removes the least amount of data
data_obj <- remove_missing_genotype_data(data_obj, geno_obj)


## End(Not run)


cape documentation built on May 29, 2024, 5:11 a.m.