SubsetByLocus: Create RADdata Objects with a Subset of Loci
In polyRAD: Genotype Calling with Uncertainty from Sequencing Data in Polyploids and Diploids

SubsetByLocus

R Documentation

Description

These functions take a RADdata object as input and generate smaller RADdata objects containing only the specified loci. SubsetByLocus allows the user to specify which loci are kept, whereas SplitByChromosome creates multiple RADdata objects representing chromosomes or sets of chromosomes. RemoveMonomorphicLoci eliminates any loci with fewer than two alleles. RemoveHighDepthLoci eliminates loci that have especially high read depth in order to eliminate false loci originating from repetitive sequence. RemoveUngenotypedLoci is intended for datasets that have been run through PipelineMapping2Parents and may have some genotypes that are missing or non-variable due to how priors were determined.

Usage

SubsetByLocus(object, ...)
## S3 method for class 'RADdata'
SubsetByLocus(object, loci, ...)

SplitByChromosome(object, ...)
## S3 method for class 'RADdata'
SplitByChromosome(object, chromlist = NULL, chromlist.use.regex = FALSE, 
                  fileprefix = "splitRADdata", ...)
                  
RemoveMonomorphicLoci(object, ...)
## S3 method for class 'RADdata'
RemoveMonomorphicLoci(object, verbose = TRUE, ...)

RemoveHighDepthLoci(object, ...)
## S3 method for class 'RADdata'
RemoveHighDepthLoci(object, max.SD.above.mean = 2, verbose = TRUE, ...)

RemoveUngenotypedLoci(object, ...)
## S3 method for class 'RADdata'
RemoveUngenotypedLoci(object, removeNonvariant = TRUE, ...)

Arguments

`object`	A `RADdata` object.
`loci`	A character or numeric vector indicating which loci to include in the output `RADdata` object. If numeric, it refers to row numbers in `object$locTable`. If character, it refers to row names in `object$locTable`.
`chromlist`	An optional list indicating how chromosomes should be split into separate `RADdata` objects. Each item in the list is a vector of the same class as `object$locTable$Chr` (character or numeric) containing the names of chromosomes that should go into one group. If not provided, each chromosome will be sent to a separate `RADdata` object.
`chromlist.use.regex`	If `TRUE`, the character strings in `chromlist` will be treated as regular expressions for searching chromosome names. For example, if one wanted all chromosomes beginning with the string "scaffold" to go into one `RADdata` object, one could include the string `"^scaffold"` as an item in `chromlist` and set `chromlist.use.regex = TRUE`. If `FALSE`, exact matches to chromosome names will be used.
`fileprefix`	A character string indicating the prefix of .RData files to export.
`max.SD.above.mean`	The maximum number of standard deviations above the mean read depth that a locus can be in order to be retained.
`verbose`	If `TRUE`, print out information about the original number of loci and the number of loci that were retained. For `RemoveHighDepthLoci`, a histogram is also plotted showing mean depth per locus, and the cutoff for removing loci.
`removeNonvariant`	If `TRUE`, in addition to removing loci where posterior probabilities are missing, loci will be removed where posterior probabilities are uniform across the population.
`...`	Additional arguments (none implemented).
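
The difference between exact and regular-expression matching for `chromlist` can be sketched in base R. This is only an illustration of the matching semantics described for `chromlist.use.regex`, not the package's internal code; the chromosome names and the variable names below are hypothetical.

```r
# Hypothetical chromosome names, as they might appear in object$locTable$Chr
chrom_names <- c("Chr01", "Chr02", "scaffold_7", "scaffold_12")

# One list item per desired group
chromlist <- list("Chr01", "Chr02", "^scaffold")

# Exact matching (the behaviour described for chromlist.use.regex = FALSE):
# "^scaffold" is not the literal name of any chromosome, so the group is empty
exact_groups <- lapply(chromlist, function(x) chrom_names[chrom_names %in% x])

# Regular-expression matching (chromlist.use.regex = TRUE):
# "^scaffold" matches every name beginning with "scaffold"
regex_groups <- lapply(chromlist, function(p) chrom_names[grepl(p, chrom_names)])

exact_groups[[3]]
regex_groups[[3]]
```

Note that with regular expressions an unanchored pattern such as `"Chr1"` would also match names like `"Chr10"`, so anchoring patterns with `^` and `$` is a reasonable precaution.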

Details

SubsetByLocus may be useful if the user has used their own filtering criteria to determine a set of loci to retain, and wants to create a new dataset with only those loci. It can be used at any point in the analysis process.
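
A minimal sketch of that workflow follows. The filtering criterion (total read depth at or above the median across loci) is hypothetical and chosen only for illustration; the sketch also assumes that `alleleDepth` is a taxa-by-alleles matrix and that `alleles2loc` maps each allele to a row of `locTable`.

```r
library(polyRAD)
data(exampleRAD)

# Hypothetical criterion: total read depth per locus, summed over taxa and alleles
depth_per_allele <- colSums(exampleRAD$alleleDepth)
depth_per_locus <- tapply(depth_per_allele, exampleRAD$alleles2loc, sum)

# Row numbers in locTable of the loci that pass the criterion
keep_idx <- as.integer(names(depth_per_locus))[depth_per_locus >= median(depth_per_locus)]

# Numeric values refer to row numbers in exampleRAD$locTable ...
subset_by_index <- SubsetByLocus(exampleRAD, keep_idx)

# ... and the corresponding row names should select the same loci
keep_names <- rownames(exampleRAD$locTable)[keep_idx]
subset_by_name <- SubsetByLocus(exampleRAD, keep_names)

identical(GetLoci(subset_by_index), GetLoci(subset_by_name))
```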

SplitByChromosome is intended to make large datasets more manageable by breaking them into smaller datasets that can be processed independently, either in parallel computing jobs on a cluster, or one after another on a computer with limited RAM. Generally it should be used immediately after data import. Rather than returning new RADdata objects, it saves them individually to separate workspace image files, which can then be loaded one at a time to run analysis pipelines such as IteratePopStruct. GetWeightedMeanGenotypes or one of the export functions can be run on each resulting RADdata object, and the resulting matrices concatenated with cbind.
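
The split, process, and recombine workflow might look like the sketch below. It uses `IterateHWE` as the pipeline only because it is the simplest choice for the small example dataset; `IteratePopStruct` or another pipeline could be substituted. The chromosome groups and variable names are illustrative.

```r
library(polyRAD)
data(exampleRAD)

# Write one .RData file per group of chromosomes to a temporary location
prefix <- tempfile()
splitfiles <- SplitByChromosome(exampleRAD, list(c(1, 4), c(6, 9)),
                                fileprefix = prefix)

# Process the files one at a time so that only one subset is in memory
geno_list <- lapply(splitfiles, function(f) {
  env <- new.env()
  load(f, envir = env)                # creates env$splitRADdata
  rad <- IterateHWE(env$splitRADdata) # genotype calling on this subset
  GetWeightedMeanGenotypes(rad)       # taxa-by-alleles numeric matrix
})

# Taxa are identical across subsets, so the matrices can be joined by column
all_geno <- do.call(cbind, geno_list)
dim(all_geno)
```

Loading each file into its own environment avoids overwriting a `splitRADdata` object that may already exist in the workspace.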

SplitByChromosome, RemoveMonomorphicLoci, and RemoveHighDepthLoci use SubsetByLocus internally.

Value

SubsetByLocus, RemoveMonomorphicLoci, RemoveHighDepthLoci, and RemoveUngenotypedLoci each return a RADdata object with all the slots and attributes of object, but containing only a subset of its loci. SubsetByLocus retains the loci listed in loci, RemoveMonomorphicLoci retains loci with two or more alleles, RemoveHighDepthLoci retains loci without abnormally high depth, and RemoveUngenotypedLoci retains loci where posterior probabilities are non-missing and (if removeNonvariant = TRUE) variable.
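
To give a sense of what "abnormally high depth" means in terms of `max.SD.above.mean`, the base R sketch below applies a cutoff of the mean plus a number of standard deviations to a set of made-up per-locus mean depths. This is an assumption about the form of the calculation based on the argument description, not a copy of the package's internal code, and the depth values are hypothetical.

```r
# Hypothetical mean read depth for twelve loci; the last one resembles
# a locus derived from repetitive sequence
mean_depth <- c(18, 20, 21, 22, 22, 23, 24, 25, 25, 27, 30, 400)

max_sd_above_mean <- 2  # same value as the default for max.SD.above.mean

# Assumed rule: retain loci at or below mean + k * SD of the per-locus mean depths
cutoff <- mean(mean_depth) + max_sd_above_mean * sd(mean_depth)
keep <- mean_depth <= cutoff

cutoff
which(!keep)  # index of loci that would be dropped under this rule
```

Because a single extreme locus inflates both the mean and the standard deviation, it can be worth inspecting the histogram that is plotted when `verbose = TRUE` before settling on a value.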

SplitByChromosome returns a character vector containing file names where .RData files have been saved. Each .RData file contains one RADdata object named splitRADdata.

Author(s)

Lindsay V. Clark

Examples

# load a dataset for this example
data(exampleRAD)
exampleRAD

# just keep the first and fourth locus
subsetRAD <- SubsetByLocus(exampleRAD, c(1, 4))
subsetRAD

# split by groups of chromosomes
exampleRAD$locTable
tf <- tempfile()
splitfiles <- SplitByChromosome(exampleRAD, list(c(1, 4), c(6, 9)),
                                fileprefix = tf)
load(splitfiles[1])
splitRADdata

# filter out monomorphic loci (none removed in this example)
filterRAD <- RemoveMonomorphicLoci(exampleRAD)

# filter out high depth loci (none removed in this example)
filterRAD2 <- RemoveHighDepthLoci(filterRAD)

# filter out loci with missing or non-variable genotypes 
# (none removed in this example)
filterRAD3 <- IterateHWE(filterRAD2)
filterRAD3 <- RemoveUngenotypedLoci(filterRAD3)

polyRAD documentation built on Nov. 10, 2022, 5:14 p.m.