filter_maf: Filter loci by minor allele frequency (MAF)

View source: R/filter_maf.R

filter_maf    R Documentation

Filter loci by minor allele frequency (MAF)

Description

Parses a data table of genotypes/allele frequencies and returns a list of loci that conform to a desired MAF threshold.

Usage

filter_maf(
  dat,
  maf = 0.05,
  type = "genos",
  method = "mean",
  sampCol = "SAMPLE",
  locusCol = "LOCUS",
  popCol = "POP",
  genoCol = "GT",
  freqCol = "FREQ"
)

Arguments

dat

A data table of genotypes or allele frequencies. Genotypes are recorded either as '/' separated alleles (0/0, 0/1, 1/1), or as counts of the Alt allele (0, 1, 2). Allele frequencies can be for either the Ref or Alt allele, so long as the choice is consistent across samples, populations, loci, etc. Expects the columns:

  1. The sample ID (see param sampCol), for genotype datasets only.

  2. The locus ID (see param locusCol).

  3. The population ID (see param popCol).

  4. The genotypes (see param genoCol), or the allele frequencies (see param freqCol).
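As a rough illustration of the input formats described above, the sketch below converts '/' separated genotypes into Alt allele counts and then into per-population allele frequencies. The helper gt_to_counts and the toy table are hypothetical and are not part of genomalicious; diploid samples are assumed.

```r
# Hypothetical helper: convert '/' separated genotypes to Alt allele counts.
gt_to_counts <- function(gt) {
  # Split "0/1" into c("0", "1") and sum the alleles (number of Alt copies)
  vapply(strsplit(gt, "/", fixed = TRUE),
         function(a) sum(as.integer(a)),
         integer(1))
}

# Toy long-format table using the default column names
toy <- data.frame(
  SAMPLE = c("s1", "s2", "s3", "s4"),
  LOCUS  = "loc1",
  POP    = c("A", "A", "B", "B"),
  GT     = c("0/0", "0/1", "1/1", "0/1"),
  stringsAsFactors = FALSE
)
toy$COUNT <- gt_to_counts(toy$GT)

# Alt allele frequency per locus and population:
# Alt copies / (2 * number of diploid samples)
freqs <- aggregate(COUNT ~ LOCUS + POP, data = toy,
                   FUN = function(x) sum(x) / (2 * length(x)))
names(freqs)[names(freqs) == "COUNT"] <- "FREQ"
```

The resulting freqs table has the LOCUS, POP and FREQ columns expected when type = 'freqs'.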

maf

Numeric: The minor allele frequency threshold. E.g. 0.05 filters for a MAF of 5%, removing a locus if its allele frequency is < 0.05 or > 0.95. Default is 0.05, and the value must be <= 0.5.
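The threshold rule can be sketched as follows. This is a minimal illustration, not the package's internal code, and passes_maf is a hypothetical name: a frequency is folded to its minor allele frequency, min(p, 1 - p), and compared against maf.

```r
# Hypothetical helper: TRUE when an allele frequency meets the MAF threshold.
passes_maf <- function(freq, maf = 0.05) {
  stopifnot(maf >= 0, maf <= 0.5)
  # The minor allele frequency is the smaller of p and 1 - p
  pmin(freq, 1 - freq) >= maf
}

passes_maf(c(0.02, 0.05, 0.50, 0.97), maf = 0.05)
# FALSE  TRUE  TRUE FALSE
```

Folding the frequency this way means the same test applies whether the recorded frequency refers to the Ref or the Alt allele.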

type

Character: Is dat a data table of genotypes ('genos') or a data table of allele frequencies ('freqs')? Default = 'genos'.

method

Character: The method by which MAF filtering is performed. One of 'mean' or 'any_pop'. Default = 'mean'. For 'mean', the mean MAF across populations is calculated and compared against the MAF threshold at each locus. For 'any_pop', a locus is removed if any population has a MAF less than the threshold at that locus.
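The difference between the two methods can be sketched on a toy table of allele frequencies. This is an illustration only, not the package's implementation. In particular, it assumes that 'mean' averages the allele frequency across populations before folding it to a minor allele frequency; the function may instead average per-population MAFs. All values are made up.

```r
# Toy allele frequencies: 3 loci x 2 populations (made-up values)
freqs <- data.frame(
  LOCUS = rep(c("loc1", "loc2", "loc3"), each = 2),
  POP   = rep(c("A", "B"), times = 3),
  FREQ  = c(0.30, 0.40,   # common in both populations
            0.02, 0.40,   # rare in A, common in B
            0.01, 0.03),  # rare in both populations
  stringsAsFactors = FALSE
)
maf <- 0.05

# Fold a frequency to its minor allele frequency
fold <- function(p) pmin(p, 1 - p)

# 'mean'-style rule (assumption: average frequency over populations, then test)
mean_freq <- tapply(freqs$FREQ, freqs$LOCUS, mean)
keep_mean <- names(mean_freq)[fold(mean_freq) >= maf]

# 'any_pop'-style rule: drop a locus if any population is below the threshold
min_maf <- tapply(fold(freqs$FREQ), freqs$LOCUS, min)
keep_any <- names(min_maf)[min_maf >= maf]
```

Here keep_mean retains loc1 and loc2, whereas keep_any retains only loc1, because loc2 is rare in population A. The 'any_pop' rule is therefore the stricter of the two.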

sampCol

Character: The column name with the sampled individual information. Default = 'SAMPLE'. Only needed when type=='genos'.

locusCol

Character: The column name with the locus information. Default = 'LOCUS'.

popCol

Character: The column name with population information. Default = 'POP'.

genoCol

Character: The column name with the genotype information. Default = 'GT'. Only needed when type=='genos'.

freqCol

Character: The column name with the allele frequency information. Default = 'FREQ'. Only needed when type=='freqs'.

Value

Returns a character vector of locus names in dat[[locusCol]] that conform to the MAF threshold (>= value of maf).

Examples

# LONG TABLE OF GENOTYPES
library(genomalicious)
data(data_Genos)

# Filter for MAF=0.20
loci.genos <- filter_maf(data_Genos, maf=0.20, type='genos')

# Keep only the loci that passed the filter
data_Genos[LOCUS %in% loci.genos]

# LONG TABLE OF ALLELE FREQUENCIES
# Alt allele frequency per locus and population (2 alleles per diploid sample)
freqs_4pops <- data_Genos[, .(FREQ=sum(GT)/(length(GT)*2)), by=c('LOCUS','POP')]

loci.freqs <- filter_maf(freqs_4pops, maf=0.20, type='freqs')


j-a-thia/genomalicious documentation built on Oct. 19, 2024, 7:51 p.m.