dfilter: Filtering datasets for subpopulations with low sample sizes

dfilter    R Documentation

Description

Removes from a genotype dataset every subpopulation that contains fewer individuals than a specified minimum sample size.

Usage

dfilter(data, minsample)

Arguments

data

Matrix containing genotype data with individuals as rows and loci as columns. Genotypes should be coded as 0 (homozygous for one allele), 1 (heterozygous), or 2 (homozygous for the other allele). Row names must be subpopulation names, so all individuals from the same subpopulation share the same row name. Column names should be marker names.

minsample

An integer representing the smallest number of individuals a subpopulation must contain to be included in analysis.

Value

filtered_data The original dataset minus the subpopulations that fail to meet the sample size threshold.
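
The filtering described above can be sketched in a few lines of base R. This is an illustrative assumption about the behaviour, not the package's actual source: the function name dfilter_sketch and the toy matrix geno are hypothetical, and the threshold is treated as inclusive (a subpopulation with exactly minsample individuals is kept), following the description of minsample.

```r
# Hypothetical re-implementation for illustration only; not the package source.
dfilter_sketch <- function(data, minsample) {
  # Count individuals per subpopulation from the row names
  counts <- table(rownames(data))
  # Subpopulations that meet the (inclusive) sample size threshold
  keep <- names(counts)[counts >= minsample]
  # Retain only rows belonging to those subpopulations; keep matrix shape
  data[rownames(data) %in% keep, , drop = FALSE]
}

# Toy data: subpopulations 'A' (4 individuals), 'B' (2), 'C' (3), three loci
geno <- matrix(c(0, 1, 2), nrow = 9, ncol = 3,
               dimnames = list(rep(c("A", "B", "C"), times = c(4, 2, 3)),
                               c("locus1", "locus2", "locus3")))

filtered <- dfilter_sketch(geno, 3)
unique(rownames(filtered))  # 'B' has only 2 individuals and is dropped
nrow(filtered)              # 7 rows remain (4 from 'A', 3 from 'C')
```

Using drop = FALSE keeps the result a matrix even if only one row survives the filter.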

Examples

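# Simulate 100 individuals genotyped at 4 loci, with genotypes coded 0, 1, or 2
test <- matrix(sample(0:2, 400, replace = TRUE), nrow = 100)
# Subpopulation sizes: 'A', 'B', 'C' = 25 each; 'D', 'E' = 5 each; 'F' = 15
rownames(test) <- c(rep(c('A','B','C'), each = 25), rep(c('D','E'), each = 5), rep('F', 15))
dim(test)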

# The 'D' and 'E' subpopulations have only five members each, below the threshold of 12, and should be removed
filtered_test <- dfilter(test, 12)

dim(filtered_test)  # New dataset is reduced by 10 rows (five for 'D' and five for 'E')


pfpetrowski/OhtaDStats documentation built on Feb. 25, 2023, 2:39 a.m.