makeUR    R Documentation
Create a UR object from an RA object, perform standard SNP filtering, and compute statistics specific to unrelated populations.
makeUR(
RAobj,
ploid = 2,
indsubset = NULL,
filter = list(MAF = 0.01, MISS = 0.5, BIN = 100, HW = c(-0.05, Inf), MAXDEPTH = 500),
mafEst = TRUE,
nThreads = 2
)
RAobj
Object of class RA created via the readRA function.
ploid
An integer specifying the ploidy level of the population. Currently, only a ploidy level of two (diploid) is implemented.
indsubset
Integer vector specifying which samples of the RA dataset to retain in the UR population.
filter
Named list of thresholds for the various criteria used to filter SNPs. See below for details.
mafEst
Logical value indicating whether the allele frequencies and sequencing error parameters are to be estimated for each SNP (see details).
nThreads
Integer specifying the number of threads used to parallelize the estimation of allele frequencies. Only used when mafEst=TRUE.
If mafEst=TRUE, then the major allele frequency and sequencing error rate for each SNP are estimated by maximizing the likelihood based on

P(Y=a) = \sum_{G} P(Y=a|G)P(G)

where P(G) are the genotype probabilities under Hardy-Weinberg Equilibrium (HWE) and P(Y=a|G) are the probabilities given in Equation (5) of Bilton et al. (2018). Otherwise, the allele frequencies are taken as the mean of the allele ratio (defined as the number of reference reads divided by the total number of reads), and the sequencing error rate is assumed to be zero.
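To make the two mafEst options concrete, the following base-R sketch computes a per-SNP likelihood of the form above and maximizes it, and also computes the mean-allele-ratio estimate used when mafEst=FALSE. It assumes a binomial read model with a symmetric sequencing error rate ep (P(Y=a|G) is binomial with success probability 1-ep, 0.5 or ep for G = 2, 1 or 0 reference alleles); this is an assumption and may differ in detail from Equation (5). The helpers snpLogLik, estSNP and momentEst are hypothetical and are not GUSbase functions; GUSbase itself performs this estimation in compiled C code.

```r
## Hypothetical helpers (not GUSbase functions): a sketch of per-SNP estimation,
## ASSUMING a binomial read model with symmetric sequencing error rate ep.
## ref, alt: reference/alternate read counts, one entry per individual.

## Log-likelihood: sum over individuals of log( sum_G P(Y=a|G) P(G) ),
## where G = number of reference alleles and p = reference allele frequency.
snpLogLik <- function(ref, alt, p, ep) {
  d <- ref + alt
  pYgG2 <- dbinom(ref, d, 1 - ep)  # homozygous for the reference allele
  pYgG1 <- dbinom(ref, d, 0.5)     # heterozygous
  pYgG0 <- dbinom(ref, d, ep)      # homozygous for the alternate allele
  ## Weight by HWE genotype probabilities P(G); individuals with d = 0 add log(1) = 0
  lik <- pYgG2 * p^2 + pYgG1 * 2 * p * (1 - p) + pYgG0 * (1 - p)^2
  sum(log(lik))
}

## Analogue of mafEst = TRUE: maximize the likelihood over (p, ep)
estSNP <- function(ref, alt) {
  negLL <- function(par) -snpLogLik(ref, alt, par[1], par[2])
  fit <- optim(c(0.5, 0.01), negLL, method = "L-BFGS-B",
               lower = c(1e-6, 1e-6), upper = c(1 - 1e-6, 0.5 - 1e-6))
  p <- fit$par[1]
  c(major_af = max(p, 1 - p), ep = fit$par[2])
}

## Analogue of mafEst = FALSE: mean allele ratio over individuals with reads, ep = 0
momentEst <- function(ref, alt) {
  d <- ref + alt
  p <- mean(ref[d > 0] / d[d > 0])
  c(major_af = max(p, 1 - p), ep = 0)
}
```

Note that the allele-ratio estimate is pulled toward 0.5 whenever the error rate is non-zero, which is one reason to prefer the likelihood-based estimate at low read depth.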
The filtering criteria currently implemented are:
Minor allele frequency (MAF): SNPs are discarded if their MAF is less than the threshold (default is 0.01).
Proportion of missing data (MISS): SNPs are discarded if the proportion of individuals with no reads (i.e., a missing genotype) is greater than the threshold (default is 0.5).
Bin size for SNP selection (BIN): SNPs are binned together if the distance (in base pairs) between them is less than the threshold (default is 100). One SNP is then randomly selected from each bin and retained for the final analysis. This filtering aims to ensure that there is only one SNP on each sequence read.
Hardy-Weinberg distance (HW): SNPs are discarded if their Hardy-Weinberg distance is less than the first threshold (default is -0.05) or greater than the second threshold (default is Inf). This filtering criterion is taken from the KGD software (https://github.com/AgResearch/KGD).
Maximum average SNP read depth (MAXDEPTH): SNPs are discarded if the average read depth for the SNP is larger than the threshold (default is 500).
If filter = NULL, then no filtering is performed.
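The MAF, MISS, MAXDEPTH and BIN filters listed above can be sketched in base R as follows. This is a hypothetical illustration rather than the GUSbase implementation: basicFilter and binSNPs are made-up names, read counts are assumed to be stored as individuals-by-SNPs matrices, the average depth is assumed to be taken over all individuals (including those with no reads), and the BIN filter assumes that consecutive SNPs on the same chromosome lying closer than BIN base pairs are chained into one bin. The HW filter is left out because its distance measure is defined in KGD.

```r
## Hypothetical helper (not a GUSbase function): MAF, MISS and MAXDEPTH filters.
## ref, alt: individuals x SNPs matrices of read counts; maf: per-SNP minor allele frequency.
## ASSUMPTION: the average depth is taken over all individuals, including those with no reads.
basicFilter <- function(ref, alt, maf, MAF = 0.01, MISS = 0.5, MAXDEPTH = 500) {
  depth <- ref + alt
  propMiss <- colMeans(depth == 0)   # proportion of individuals with no reads
  meanDepth <- colMeans(depth)       # average read depth per SNP
  keep <- maf >= MAF & propMiss <= MISS & meanDepth <= MAXDEPTH
  which(keep)                        # indices of retained SNPs
}

## Hypothetical helper (not a GUSbase function): BIN filter.
## ASSUMPTION: consecutive SNPs on the same chromosome closer than BIN bp share a bin.
binSNPs <- function(chrom, pos, BIN = 100) {
  ord <- order(chrom, pos)
  chrom <- chrom[ord]
  pos <- pos[ord]
  n <- length(pos)
  ## Start a new bin at the first SNP, at a chromosome change, or at a gap >= BIN
  newBin <- c(TRUE, chrom[-1] != chrom[-n] | diff(pos) >= BIN)
  binId <- cumsum(newBin)
  ## Randomly keep one SNP per bin (sample.int avoids sample()'s length-one pitfall)
  keep <- tapply(seq_len(n), binId, function(i) i[sample.int(length(i), 1)])
  sort(ord[keep])                    # indices into the original SNP order
}
```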
Estimation of the allele frequencies when mafEst=TRUE is parallelized using OpenMP in compiled C code, where the number of threads used in the parallelization is specified by the argument nThreads.
An R6 object of class UR.
Timothy P. Bilton and Ken G. Dodds
Reference: bilton2018genetics2 (key in the GUSbase bibliography)
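library(GUSbase)
## simulate a sequencing data set (file$vcf holds the path to the VCF file)
file <- simDS()
## convert the VCF file to RA format
RAfile <- VCFtoRA(file$vcf)
## read the RA file into an RA object
simdata <- readRA(RAfile)
## make unrelated population
urpop <- makeUR(simdata)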