filter_genos: Filter a hapmap or genotype matrix

View source: R/filter_genotypes.R

filter_genos R Documentation

Filter a hapmap or genotype matrix

Description

Filters a hapmap or genotype matrix based on user-defined limits of SNP missingness, SNP minor allele frequency, SNP heterozygosity, entry missingness, and entry heterozygosity.

Usage

filter_genos(
  genos,
  min.maf = 0,
  max.mar.missing = 1,
  max.entry.missing = 1,
  max.mar.het = 1,
  max.entry.het = 1,
  print.plot = FALSE,
  verbose = TRUE
)

Arguments

min.maf

The minimum minor allele frequency cutoff to keep a SNP.

genos

A hapmap or genotype matrix. Heuristics are used to determine the format. Both may be encoded using TASSEL or rrBLUP standards. See Details for information on file format.

max.mar.missing

The maximum proportion of missing data allowed for a SNP to be kept.

encoding

The desired output encoding. Either "rrBLUP" or "TASSEL". See Details for information on file format.
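
To make the filtering criteria concrete, the following base R sketch computes per-SNP and per-entry statistics on a small matrix in the 1, 0, -1 coding described under Details, then applies cutoffs. It is an illustration only and not the package's implementation: the toy matrix, the variable names, the threshold values, and the choice to compute entry statistics on the unfiltered matrix are all assumptions.

```r
# Toy marker-by-entry matrix in 1 / 0 / -1 coding (NA = missing).
# Markers in rows and entries in columns is an assumption of this sketch.
geno <- matrix(
  c( 1,  1,  1,  1,    # m1: monomorphic
     1,  0, -1, NA,    # m2: one missing call
     1, -1, -1, -1,    # m3
    NA, NA,  0,  1),   # m4: half of the calls missing
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("m", 1:4), paste0("entry", 1:4))
)

# Per-SNP statistics
mar_missing <- rowMeans(is.na(geno))                   # proportion missing
mar_het     <- rowMeans(geno == 0, na.rm = TRUE)       # proportion heterozygous
freq_first  <- (rowMeans(geno, na.rm = TRUE) + 1) / 2  # first-allele frequency
maf         <- pmin(freq_first, 1 - freq_first)        # minor allele frequency

# Per-entry statistics (computed here before any SNP is dropped)
entry_missing <- colMeans(is.na(geno))
entry_het     <- colMeans(geno == 0, na.rm = TRUE)

# Hypothetical cutoffs, chosen only for this example
keep_mar   <- maf >= 0.05 & mar_missing <= 0.25 & mar_het <= 0.5
keep_entry <- entry_missing <= 0.5 & entry_het <= 0.5

filtered <- geno[keep_mar, keep_entry, drop = FALSE]
```

With these cutoffs, m1 is dropped because it is monomorphic (MAF of 0) and m4 is dropped because half of its calls are missing; all four entries are retained.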

Details

The TASSEL format is as follows: The first row is column names. The first 4 columns are marker name, alleles, chromosome, and position, respectively. The next 7 columns are additional information for TASSEL. The remaining columns are samples. Genotypes are encoded in diploid format (i.e. AA, AC, CC) with "NN" denoting missing data.
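
As an illustration of this layout, the sketch below builds a small TASSEL-style hapmap as a data.frame. The column names follow the conventional HapMap header; the marker names, positions, and genotype calls are invented for the example, and whether filter_genos requires these exact column names is not stated here.

```r
# A toy hapmap: 4 descriptor columns, 7 additional TASSEL columns, then samples.
hmp <- data.frame(
  `rs#`       = c("m1", "m2", "m3"),      # marker name
  alleles     = c("A/C", "G/T", "A/G"),
  chrom       = c(1, 1, 2),
  pos         = c(1050, 20311, 877),
  strand      = "+",                      # the 7 additional columns start here
  `assembly#` = NA,
  center      = NA,
  protLSID    = NA,
  assayLSID   = NA,
  panelLSID   = NA,
  QCcode      = NA,
  entry1      = c("AA", "GG", "AG"),      # sample columns, diploid calls
  entry2      = c("AC", "NN", "GG"),      # "NN" marks missing data
  entry3      = c("CC", "TT", "NN"),
  check.names = FALSE,                    # keep "rs#" and "assembly#" as written
  stringsAsFactors = FALSE
)

# Genotype calls only: drop the first 11 descriptor columns
calls <- as.matrix(hmp[, -(1:11)])
rownames(calls) <- hmp[["rs#"]]

# Proportion of missing calls per marker
mar_missing <- rowMeans(calls == "NN")
```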

The rrBLUP format is as follows: The first row is column names. The first 4 columns are marker name, alleles, chromosome, and position, respectively. The next 7 columns are additional information for TASSEL. The remaining columns are samples. Genotypes are encoded in 1, 0, -1 format where 1 is homozygous for the first allele, 0 is heterozygous, and -1 is homozygous for the second allele. Missing data is denoted with NA.
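
The relationship between the two encodings can be sketched with a small helper that recodes diploid calls to 1, 0, -1. The function recode_calls is hypothetical and is not part of the package; it assumes that the "first allele" is the one listed first in the alleles column (for example, A in "A/C"), and it treats any call containing an unlisted character, such as "NN", as missing.

```r
# Hypothetical helper: recode one marker's diploid calls to 1 / 0 / -1.
#   calls   - character vector of two-letter calls, e.g. c("AA", "AC", "NN")
#   alleles - a single string such as "A/C"; the first listed allele is
#             assumed to be the "first allele" of the rrBLUP coding
recode_calls <- function(calls, alleles) {
  parts <- strsplit(alleles, "/", fixed = TRUE)[[1]]
  first <- parts[1]
  a1 <- substr(calls, 1, 1)
  a2 <- substr(calls, 2, 2)
  out <- rep(NA_real_, length(calls))
  known <- a1 %in% parts & a2 %in% parts   # FALSE for "NN" and other odd calls
  # Count copies of the first allele (0, 1, or 2) and shift to -1, 0, 1
  out[known] <- (a1[known] == first) + (a2[known] == first) - 1
  out
}

# Apply the helper marker by marker to a toy matrix of calls
calls <- rbind(
  m1 = c(entry1 = "AA", entry2 = "AC", entry3 = "CC"),
  m2 = c(entry1 = "GG", entry2 = "NN", entry3 = "TT")
)
alleles <- c(m1 = "A/C", m2 = "G/T")

coded <- matrix(NA_real_, nrow(calls), ncol(calls), dimnames = dimnames(calls))
for (i in seq_len(nrow(calls))) {
  coded[i, ] <- recode_calls(calls[i, ], alleles[i])
}
```

Counting copies of the first allele and subtracting 1 keeps the mapping symmetric, so heterozygotes written in either order (AC or CA) both become 0.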

Value

A data.frame of a hapmap encoded in the designated format.


neyhartj/gws documentation built on Feb. 5, 2024, 12:42 a.m.