setSamFilter: Filter out samples

setSamFilter R Documentation

Filter out samples

Description

Search for samples that do not meet the criteria and label them as "invalid".

Usage

setSamFilter(
  object,
  id = NA_character_,
  missing = 1,
  het = c(0, 1),
  mac = 0,
  maf = 0,
  ad_ref = c(0, Inf),
  ad_alt = c(0, Inf),
  dp = c(0, Inf),
  mean_ref = c(0, Inf),
  mean_alt = c(0, Inf),
  sd_ref = Inf,
  sd_alt = Inf,
  ...
)

## S4 method for signature 'GbsrGenotypeData'
setSamFilter(
  object,
  id,
  missing,
  het,
  mac,
  maf,
  ad_ref,
  ad_alt,
  dp,
  mean_ref,
  mean_alt,
  sd_ref,
  sd_alt
)

Arguments

object

A GbsrGenotypeData object.

id

A vector of strings matching sample IDs, which can be retrieved with getSamID(). The samples with the specified IDs will be filtered out.

missing

A numeric value [0-1] to specify the maximum missing genotype call rate per sample.

het

A vector of two numeric values [0-1] to specify the minimum and maximum heterozygous genotype call rate per sample.

mac

An integer value to specify the minimum minor allele count per sample.

maf

A numeric value to specify the minimum minor allele frequency per sample.

ad_ref

A numeric vector of length two specifying the lower and upper limits of reference read counts per sample.

ad_alt

A numeric vector of length two specifying the lower and upper limits of alternative read counts per sample.

dp

A numeric vector of length two specifying the lower and upper limits of total read counts per sample.

mean_ref

A numeric vector of length two specifying the lower and upper limits of the mean of reference read counts per sample.

mean_alt

A numeric vector of length two specifying the lower and upper limits of the mean of alternative read counts per sample.

sd_ref

A numeric value specifying the upper limit of standard deviation of reference read counts per sample.

sd_alt

A numeric value specifying the upper limit of standard deviation of alternative read counts per sample.

...

Unused.

Details

For mean_ref, mean_alt, sd_ref, and sd_alt, this function calculates the mean and standard deviation of the read counts obtained at the SNP markers of each sample. If the mean read count of a sample is smaller than the specified lower limit or larger than the upper limit, the function labels the sample as "invalid". For sd_ref and sd_alt, the standard deviation of read counts is checked for each sample, and samples with a standard deviation larger than the specified upper limit are labeled as "invalid". To check which samples are valid and invalid, run validSam().
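
The mean and standard deviation checks described above can be illustrated with a small base R sketch. This is not the package's implementation: the read-count matrix, the threshold values, and the use of inclusive comparisons are all assumptions made for illustration, and only the reference-read case is shown.

```r
# Toy reference read counts: rows are samples, columns are SNP markers.
# All values are made up for illustration.
ref_reads <- matrix(
  c(10, 12, 11,  9,
     1,  0,  2,  1,
    30,  2, 45,  3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("s1", "s2", "s3"), paste0("snp", 1:4))
)

# Hypothetical thresholds in the same shape as the mean_ref and sd_ref arguments.
mean_ref <- c(5, Inf)  # lower and upper limits on the per-sample mean
sd_ref <- 10           # upper limit on the per-sample standard deviation

# Per-sample mean and standard deviation across markers.
sam_mean <- rowMeans(ref_reads)
sam_sd <- apply(ref_reads, 1, sd)

# A sample stays valid only if it passes both checks
# (inclusive bounds are an assumption of this sketch).
valid <- sam_mean >= mean_ref[1] & sam_mean <= mean_ref[2] & sam_sd <= sd_ref
print(valid)
```

In this toy data, s2 fails because its mean read count falls below the lower limit, and s3 fails because its read counts vary more than the allowed standard deviation, so only s1 remains valid.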

Value

A GbsrGenotypeData object with filters on samples.

Examples

# Load data in the GDS file and instantiate a GbsrGenotypeData object.
gds_fn <- system.file("extdata", "sample.gds", package = "GBScleanR")
gds <- loadGDS(gds_fn)

# Summarize the information needed for filtering.
gds <- countGenotype(gds)
gds <- countRead(gds)

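# Filter out the first ten samples by ID, samples with a missing rate
# above 0.2, and samples with fewer than 5 total reads.
gds <- setSamFilter(gds,
                    id = getSamID(gds)[1:10],
                    missing = 0.2,
                    dp = c(5, Inf))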

# Close the connection to the GDS file.
closeGDS(gds)


tomoyukif/GBScleanR documentation built on Oct. 31, 2024, 2:43 a.m.