Treat missing data

Description

missingno gives the user four options to deal with missing data: remove loci, remove samples, replace with zeroes, or replace with average allele counts.

Usage

missingno(pop, type = "loci", cutoff = 0.05, quiet = FALSE,
  freq = FALSE)

Arguments

pop

a genclone or genind object.

type

a character string: can be "ignore", "zero", "mean", "loci", or "geno" (see Details for definitions).

cutoff

numeric. A number from 0 to 1 indicating the allowable rate of missing data in either genotypes or loci. This will be ignored for type values of "mean" or "zero".

quiet

logical. If FALSE (the default), the action performed is printed to the screen; if TRUE, these messages are suppressed.

freq

defaults to FALSE. This option is passed on to the tab function. If TRUE, the matrix in the genind object will be replaced by a numeric matrix (as opposed to integer). THIS IS NOT RECOMMENDED. USE THE FUNCTION tab instead.

Details

These methods provide a way to deal with systematic missing data and to give a wrapper for adegenet's tab function. ALL OF THESE ARE TO BE USED WITH CAUTION.

Using this function with polyploid data (where missing data is coded as "0") may give spurious results.

Treatment types

  • "ignore" - does not remove or replace missing data.

  • "loci" - removes every locus whose proportion of missing data across the entire data set exceeds cutoff.

  • "geno" (also written "genotype") - removes every genotype/isolate/individual whose proportion of missing data exceeds cutoff.

  • "mean" - replaces all NA's with the mean of the alleles for the entire data set.

  • "zero" or "0" - replaces all NA's with "0". Introduces more diversity.
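
The cutoff logic behind the "loci" and "geno" treatments can be sketched in base R. This is an illustration only, not poppr's implementation: it assumes a toy matrix with one column per locus (a genind object stores one column per allele), and the names calls, locus_missing and sample_missing are hypothetical.

```r
# Toy data: rows are samples, columns are loci; NA marks a missing call.
# (Hypothetical values, not taken from any real data set.)
calls <- matrix(
  c(1,  2, NA, 1,
    1, NA, NA, 2,
    2,  2,  1, 1,
    1,  1,  2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), paste0("loc", 1:4))
)
cutoff <- 0.25

# Proportion of missing calls per locus (column) and per sample (row).
locus_missing  <- colMeans(is.na(calls))
sample_missing <- rowMeans(is.na(calls))

# "loci"-style treatment: drop loci whose missing rate is greater than cutoff.
keep_loci <- calls[, locus_missing <= cutoff, drop = FALSE]

# "geno"-style treatment: drop samples whose missing rate is greater than cutoff.
keep_samples <- calls[sample_missing <= cutoff, , drop = FALSE]

colnames(keep_loci)     # "loc1" "loc2" "loc4"
rownames(keep_samples)  # "s1" "s3" "s4"
```

A locus or sample sitting exactly at the cutoff is kept in this sketch, matching the "greater than" wording of the messages that missingno prints.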

Value

a genclone or genind object.

Note

"wild missingno appeared!"

Author(s)

Zhian N. Kamvar

See Also

tab, poppr, poppr.amova, nei.dist, aboot

Examples

data(nancycats)

nancy.locina <- missingno(nancycats, type = "loci")

## Found 617 missing values.
## 2 loci contained missing values greater than 5%.
## Removing 2 loci : fca8 fca45 

nancy.genona <- missingno(nancycats, type = "geno")

## Found 617 missing values.
## 38 genotypes contained missing values greater than 5%.
## Removing 38 genotypes : N215 N216 N188 N189 N190 N191 N192 N302 N304 N310 
## N195 N197 N198 N199 N200 N201 N206 N182 N184 N186 N298 N299 N300 N301 N303 
## N282 N283 N288 N291 N292 N293 N294 N295 N296 N297 N281 N289 N290  

# Replacing all NA with "0" (see tab in the adegenet package).
nancy.0 <- missingno(nancycats, type = "0")

## Replaced 617 missing values 

# Replacing all NA with the mean of each column (see tab in the
# adegenet package).
nancy.mean <- missingno(nancycats, type = "mean")

## Replaced 617 missing values 
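
The cutoff argument can be raised to make the filter less strict. The following is a minimal sketch, assuming the poppr package is installed (loading it also makes adegenet's nancycats data and the nInd accessor available); the value 0.2 and the object name nancy.geno20 are arbitrary choices for illustration, and no output is shown because it depends on the installed version.

```r
library(poppr)  # assumed to be installed; also loads adegenet
data(nancycats)

# Remove only the genotypes with more than 20% missing data,
# instead of the default 5%.
nancy.geno20 <- missingno(nancycats, type = "geno", cutoff = 0.2)

# Compare the number of individuals before and after filtering.
nInd(nancycats)
nInd(nancy.geno20)
```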