dfilter: Filtering datasets for subpopulations with low sample sizes
In pfpetrowski/OhtaDStats: Tomoka Ohta D Statistics

View source: R/dfilter.R

dfilter

R Documentation

Filtering datasets for subpopulations with low sample sizes

Description

Simplifies the process of eliminating subpopulations with low sample sizes.

Usage

dfilter(data, minsample)

Arguments

`data`	Matrix containing genotype data with individuals as rows and loci as columns. Genotypes should be coded as 0 (homozygous), 1 (heterozygous), or 2 (homozygous). Rownames must be subpopulation names and column names should be marker names.
`minsample`	An integer representing the smallest number of individuals a subpopulation must contain to be included in analysis.

Value

filtered_data The original dataset minus the subpopulations that fail to meet the sample size threshold.

Examples

test <- matrix(round(runif(400,1,2)), nrow = 100)
rownames(test) <- c(rep(c('A','B','C'),each=25), rep(c('D','E'), each=5), rep('F', 15))
dim(test)

#The 'D' and 'E' subpopulations have only five members each and should be removed
filtered_test <- dfilter(test,12)

dim(filtered_test)	# New dataset is reduced by 10 rows (five for 'D' and five for 'E')

pfpetrowski/OhtaDStats documentation built on Feb. 25, 2023, 2:39 a.m.