replace_miss_genos: Replace missing genotypes
In j-a-thia/genomalicious: A smorgasbord of R functions for population genomic analyses

replace_miss_genos

R Documentation

Replace missing genotypes

Description

For each locus, missing genotypes are replaced with the most common genotype. Replacement can be done across all sampled individuals or within each population. Loci must be biallelic.

Usage

replace_miss_genos(
  dat,
  sampCol = "SAMPLE",
  locusCol = "LOCUS",
  genoCol = "GT",
  popCol = NULL
)

Arguments

`dat`	Data table: A long-format data table, e.g. like that imported from `vcf2DT`. Genotypes can be coded as '/' separated characters (e.g. '0/0', '0/1', '1/1'), or as integer counts of the Alt allele (e.g. 0, 1, 2). Must contain the following columns: (1) the sampled individuals (see param `sampCol`); (2) the locus ID (see param `locusCol`); (3) the genotype (see param `genoCol`).
`sampCol`	Character: The column name with the sampled individual information. Default is `'SAMPLE'`.
`locusCol`	Character: The column name with the locus information. Default is `'LOCUS'`.
`genoCol`	Character: The column name with the genotype information. Default is `'GT'`.
`popCol`	Character: An optional argument. The column name with the population information. Default is `NULL`. If specified, genotype replacement at each locus is done per population, not across all sampled individuals.
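
For biallelic loci, the two genotype codings described for `dat` carry the same information. The base R sketch below shows one way to move from '/' separated genotypes to Alt allele counts. `geno_to_count` is a hypothetical helper written for illustration only, not a genomalicious function, and it assumes diploid genotypes with alleles coded '0' (Ref) and '1' (Alt).

```r
# Hypothetical helper (not part of genomalicious): convert '/' separated
# genotypes to integer counts of the Alt allele.
geno_to_count <- function(gt) {
  # Treat './.' as missing, alongside any existing NA values
  gt[gt %in% "./."] <- NA_character_
  # Split each genotype into its two alleles
  alleles <- strsplit(gt, "/", fixed = TRUE)
  # Count the '1' (Alt) alleles; missing genotypes stay NA
  vapply(
    alleles,
    function(a) if (anyNA(a)) NA_integer_ else sum(a == "1"),
    integer(1)
  )
}

gt_char <- c("0/0", "0/1", "1/1", "./.", NA)
gt_int <- geno_to_count(gt_char)
print(gt_int)
```

Either coding can be passed through `genoCol`, as long as missing values follow the conventions given in Details.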

Details

NOTE: It is recommended that missing genotypes are imputed using inferences of linkage and genotype likelihood. However, if you need a quick-and-dirty approach, this function might be useful for preliminary analyses, or when the proportion of missing data is very low.

If genotypes are coded as characters, NA or './.' should be used to code missing genotypes. If genotypes are coded as integers, NA should be used to code missing genotypes. Whether the most common genotype is estimated across all individuals or within each population depends on whether `popCol` is specified.
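
To make the difference between the two modes concrete, here is a minimal base R sketch of most-common-genotype replacement on toy data. It is not the package's implementation: `fill_mode`, the toy table and its values are all assumptions made for illustration, and the way ties are broken here (the first value encountered wins) may differ from what `replace_miss_genos` does.

```r
# Hypothetical toy data in long format, with genotypes as Alt allele counts
toy <- data.frame(
  SAMPLE = rep(paste0("ind", 1:6), times = 2),
  POP    = rep(rep(c("pop1", "pop2"), each = 3), times = 2),
  LOCUS  = rep(c("locA", "locB"), each = 6),
  GT     = c(0L, 0L, NA, 2L, 2L, 2L,
             1L, NA, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace NA with the most common observed value.
# Ties are broken in favour of the value seen first.
fill_mode <- function(x) {
  obs <- x[!is.na(x)]
  if (length(obs) == 0) return(x)  # nothing observed, leave as is
  vals <- unique(obs)
  top <- vals[which.max(tabulate(match(obs, vals)))]
  x[is.na(x)] <- top
  x
}

# Across all individuals: group by locus only
toy$GT.INDS <- ave(toy$GT, toy$LOCUS, FUN = fill_mode)

# Within populations: group by locus and population
toy$GT.POPS <- ave(toy$GT, toy$LOCUS, toy$POP, FUN = fill_mode)

print(toy)
```

In this toy table the two populations are fixed for different genotypes, so the missing genotype of `ind3` at `locA` is filled with 2 across individuals but with 0 within `pop1`. This is the situation in which specifying a population column matters most.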

Examples

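library(genomalicious)
# copy(), setnames() and := come from data.table; %>% and left_join() from dplyr.
# Loading them explicitly is harmless if they are already attached.
library(data.table)
library(dplyr)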

data(data_Genos)

D <- data_Genos %>% copy

# Simulate missing data: set a random 10% of genotypes to NA,
# then rename the genotype column to GT.MISS
D[sample(1:nrow(D), round(0.1*nrow(D)), FALSE), GT:=NA] %>%
 setnames(., 'GT', 'GT.MISS')

# Replace across individuals
D.rep.inds <- replace_miss_genos(
   dat=D, sampCol='SAMPLE', locusCol='LOCUS', genoCol='GT.MISS'
) %>%
   setnames(., 'GT', 'GT.INDS')

# Replace within populations
D.rep.pops <- replace_miss_genos(
   dat=D, sampCol='SAMPLE', locusCol='LOCUS', genoCol='GT.MISS', popCol='POP'
) %>%
   setnames(., 'GT', 'GT.POPS')

# Tabulate comparisons between methods
compReplace <- left_join(
   data_Genos[, c('LOCUS','SAMPLE','POP','GT')],
   D[, c('LOCUS','SAMPLE','POP','GT.MISS')]
) %>%
   # Keep only the genotypes that were set to missing, then drop GT.MISS
   .[is.na(GT.MISS), !'GT.MISS'] %>%
   left_join(., D.rep.inds[,c('LOCUS','SAMPLE','POP','GT.INDS')]) %>%
   left_join(., D.rep.pops[,c('LOCUS','SAMPLE','POP','GT.POPS')])

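# Count the replaced genotypes that match the original genotype.
# Number of correct matches is slightly higher when using the most
# common genotype within populations (exact counts vary with the random sample)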
compReplace[GT==GT.INDS] %>% nrow
compReplace[GT==GT.POPS] %>% nrow

j-a-thia/genomalicious documentation built on April 13, 2025, 9:41 a.m.
