View source: R/replace_miss_genos.R
replace_miss_genos (R Documentation)
Description
For each locus, missing genotypes are replaced with the most common genotype. Replacement can be done across all sampled individuals or within each population. Loci must be biallelic.
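Because the function requires biallelic loci, it can help to check this before running it. The base-R sketch below assumes character-coded genotypes whose alleles are separated by '/' or '|' (e.g. '0/1'). The helper `is_biallelic_sketch` is hypothetical and is not part of genomalicious.

```r
# Hypothetical helper, NOT part of genomalicious: flags, per locus, whether
# the observed (non-missing) genotypes carry at most two distinct alleles.
# Assumes character genotypes with alleles separated by '/' or '|'.
is_biallelic_sketch <- function(geno, locus) {
  # NA and './.' are both treated as missing
  miss <- is.na(geno) | geno == "./."
  # Collect the distinct alleles observed at each locus
  alleles <- tapply(geno[!miss], locus[!miss], function(g) {
    unique(unlist(strsplit(g, "[/|]")))
  }, simplify = FALSE)
  # TRUE if a locus has two (or fewer) distinct alleles
  vapply(alleles, function(a) length(a) <= 2L, logical(1))
}

# Usage: L1 carries alleles 0 and 1; L2 carries 0, 1 and 2
geno  <- c("0/0", "0/1", "./.", "0/2", "1/2", NA)
locus <- c("L1",  "L1",  "L1",  "L2",  "L2",  "L2")
is_biallelic_sketch(geno, locus)
```

Loci flagged `FALSE` would need to be filtered out or recoded before replacing missing genotypes.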
Usage
replace_miss_genos(
dat,
sampCol = "SAMPLE",
locusCol = "LOCUS",
genoCol = "GT",
popCol = NULL
)
Arguments
dat: Data table. A long-format data table, e.g. like that imported from …
sampCol: Character. The column name with the sampled individual information. Default is "SAMPLE".
locusCol: Character. The column name with the locus information. Default is "LOCUS".
genoCol: Character. The column name with the genotype information. Default is "GT".
popCol: Character. An optional argument: the column name with the population information. Default is NULL.
Details
NOTE: It is recommended that missing genotypes be imputed using inferences from linkage and genotype likelihoods. However, if you need a quick-and-dirty approach, this function may be useful for preliminary analyses, or when the proportion of missing data is very low.
If genotypes are coded as characters, missing genotypes should be coded as NA or './.'. If genotypes are coded as integers, missing genotypes should be coded as NA.
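Whether the most common genotype is estimated across all individuals or within each population depends on popCol: if popCol is NULL (the default), the most common genotype at each locus is taken across all individuals; if popCol is specified, it is taken separately within each population.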
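The replacement rule described above can be sketched in base R. This is an illustrative re-implementation under stated assumptions, not the package's source code: the helper `mode_impute_sketch` is hypothetical, it works on a plain data.frame, ties between equally common genotypes are broken by first appearance (the real function may resolve ties differently), and a locus (or locus-by-population group) with no observed genotypes is left missing.

```r
# Hypothetical helper, NOT the genomalicious implementation: replaces missing
# genotypes with the most common observed genotype at each locus, either
# across all individuals (popCol = NULL) or within each population.
mode_impute_sketch <- function(dat, locusCol = "LOCUS", genoCol = "GT",
                               popCol = NULL) {
  geno <- dat[[genoCol]]
  # './.' is treated as missing, like NA (no effect on integer genotypes)
  geno[geno %in% "./."] <- NA
  # Group by locus, or by locus x population when popCol is given
  grp <- if (is.null(popCol)) {
    dat[[locusCol]]
  } else {
    interaction(dat[[locusCol]], dat[[popCol]], drop = TRUE)
  }
  fill_mode <- function(g) {
    obs <- g[!is.na(g)]
    if (length(obs) == 0L) return(g)  # nothing observed: leave missing
    u <- unique(obs)
    # Most common value; ties go to the value seen first (an assumption)
    g[is.na(g)] <- u[which.max(tabulate(match(obs, u)))]
    g
  }
  dat[[genoCol]] <- ave(geno, grp, FUN = fill_mode)
  dat
}

# Usage: one population (p2) differs from the overall mode at locus "a"
d <- data.frame(
  LOCUS = c("a", "a", "a", "a", "b", "b"),
  POP   = c("p1", "p1", "p2", "p2", "p1", "p1"),
  GT    = c("0/0", "0/0", "0/1", NA, "1/1", "./."),
  stringsAsFactors = FALSE
)
mode_impute_sketch(d)$GT                  # "0/0" "0/0" "0/1" "0/0" "1/1" "1/1"
mode_impute_sketch(d, popCol = "POP")$GT  # "0/0" "0/0" "0/1" "0/1" "1/1" "1/1"
```

Grouping by locus-by-population is what lets the fourth genotype take the population-specific mode ("0/1") rather than the overall mode ("0/0").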
Examples
library(genomalicious)
library(data.table)  # copy(), setnames(), :=
library(dplyr)       # %>%, left_join()
data(data_Genos)
D <- data_Genos %>% copy
# Randomly set 10% of genotypes to missing (set.seed() makes this reproducible)
set.seed(1)
D[sample(1:nrow(D), round(0.1*nrow(D)), FALSE), GT:=NA] %>%
setnames(., 'GT', 'GT.MISS')
# Replace across individuals
D.rep.inds <- replace_miss_genos(
dat=D, sampCol='SAMPLE', locusCol='LOCUS', genoCol='GT.MISS'
) %>%
setnames(., 'GT', 'GT.INDS')
# Replace within populations
D.rep.pops <- replace_miss_genos(
dat=D, sampCol='SAMPLE', locusCol='LOCUS', genoCol='GT.MISS', popCol='POP'
) %>%
setnames(., 'GT', 'GT.POPS')
# Tabulate comparisons between methods
compReplace <- left_join(
data_Genos[, c('LOCUS','SAMPLE','POP','GT')],
D[, c('LOCUS','SAMPLE','POP','GT.MISS')]
) %>%
.[is.na(GT.MISS), !'GT.MISS'] %>%
left_join(., D.rep.inds[,c('LOCUS','SAMPLE','POP','GT.INDS')]) %>%
left_join(., D.rep.pops[,c('LOCUS','SAMPLE','POP','GT.POPS')])
# Count replaced genotypes that match the original (true) genotype. The number
# of correct matches is slightly higher when using the most common genotype
# within populations
compReplace[GT==GT.INDS] %>% nrow
compReplace[GT==GT.POPS] %>% nrow