This function applies a series of filters to SNP data in order to generate a
set of markers suitable for haplotype analysis. Although the function can
be called on its own for filtering purposes, it was designed specifically
for the needs of package HaplotypeMiner and might not suit more general
needs. See section Details for a discussion of the mandatory and optional
filters applied by this function.
snp_data |
A list of data pertaining to SNP markers and having at least
elements |
chrom |
A character of length one. The name of the chromosome for which markers should be kept. |
center_pos |
A numeric of length one. The central position (in base pairs) of the gene of interest. |
max_distance_to_gene |
A numeric of length one. The maximum distance
(in base pairs) between |
max_missing_threshold |
A numeric of length one between 0 and 1, or
|
max_het_threshold |
A numeric of length one between 0 and 1, or
|
min_alt_threshold |
A numeric of length one between 0 and 1, or
|
min_allele_count |
A positive numeric value, or |
verbose |
Logical. Should information regarding the filtering process be printed to screen? Defaults to TRUE. |
genotype_filter automatically applies three filters to the snp_data
object provided as an argument. These filters are applied both when the
function is used on its own and when it is called internally by
haplo_selection. The three filters are:
chrom: Only markers lying on the chromosome of interest are kept for further analysis.
distance: Only markers less than max_distance_to_gene base pairs
away from the central position of interest (usually the center of the gene
for which haplotypes are being generated) are kept for further analysis.
This distance is 1 Gb by default, which should mean that all
markers on a given chromosome are kept by default (a more sensible
default value could be used).
multiallelism: Markers that are not biallelic (i.e. either triallelic
or tetraallelic) are automatically removed from the dataset, as package
HaplotypeMiner does not yet know how to handle such markers.
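The three mandatory filters can be illustrated with a minimal base R sketch on a toy marker table. This is not the code used by HaplotypeMiner: the data frame and its column names (id, chrom, pos, n_alleles) are hypothetical and only serve to show the order and logic of the filters described above.

```r
# Toy marker table; all column names here are hypothetical.
markers <- data.frame(
  id        = c("m1", "m2", "m3", "m4", "m5"),
  chrom     = c("Gm01", "Gm01", "Gm01", "Gm02", "Gm01"),
  pos       = c(1000, 5000, 90000, 5200, 5600),
  n_alleles = c(2, 2, 2, 2, 3),
  stringsAsFactors = FALSE
)

chrom                <- "Gm01"
center_pos           <- 5000
max_distance_to_gene <- 10000

# 1. chrom: keep markers lying on the chromosome of interest.
kept <- markers[markers$chrom == chrom, ]

# 2. distance: keep markers less than max_distance_to_gene bp from center_pos.
kept <- kept[abs(kept$pos - center_pos) < max_distance_to_gene, ]

# 3. multiallelism: keep biallelic markers only.
kept <- kept[kept$n_alleles == 2, ]

print(kept$id)
```

In this toy example, m4 is dropped by the chromosome filter, m3 by the distance filter, and m5 because it is triallelic.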
The other filters are optional and are not applied by default, although it is
recommended that users apply them either prior to the analysis,
externally to package HaplotypeMiner, or as part of the analysis pipeline
implemented by function haplo_selection. These four filters are:
Missing data: Markers with a missing data rate higher than
max_missing_threshold can be removed during the analysis.
Heterozygosity: Markers with a heterozygosity rate higher than
max_het_threshold can be removed during the analysis. This may not be
relevant for species found in the wild, but is relevant e.g. for crop
species which are expected to be homozygous at all loci.
Minor allele frequency: Markers with a minor allele frequency (MAF)
lower than min_alt_threshold can be removed from the analysis.
Minor allele count: Markers with a minor allele count (MAC) lower than
min_allele_count can be removed from the analysis.
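The four optional filters can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: genotypes are assumed to be coded as the number of copies of the alternate allele (0, 1 or 2, with NA for missing calls), and MAC is assumed to be the number of copies of the minor allele among called alleles. The toy matrix and threshold values are made up for the example.

```r
# Toy genotype matrix: rows are individuals, columns are markers.
# Assumed coding: 0/1/2 copies of the alternate allele, NA = missing.
geno <- matrix(
  c(0, 0,  2,  2, 0,  2,   # mA
    0, NA, NA, 2, NA, 0,   # mB
    0, 1,  1,  1, 2,  0,   # mC
    0, 0,  0,  0, 0,  2),  # mD
  nrow = 6,
  dimnames = list(paste0("ind", 1:6), c("mA", "mB", "mC", "mD"))
)

max_missing_threshold <- 0.2
max_het_threshold     <- 0.1
min_alt_threshold     <- 0.2
min_allele_count      <- 3

# Per-marker summary statistics.
missing_rate <- colMeans(is.na(geno))
het_rate     <- colMeans(geno == 1, na.rm = TRUE)
alt_count    <- colSums(geno, na.rm = TRUE)
n_called     <- 2 * colSums(!is.na(geno))        # number of called alleles
mac          <- pmin(alt_count, n_called - alt_count)
maf          <- mac / n_called

# A marker is kept only if it passes all four filters.
keep <- missing_rate <= max_missing_threshold &
  het_rate <= max_het_threshold &
  maf >= min_alt_threshold &
  mac >= min_allele_count

print(colnames(geno)[keep])
```

Here mB fails the missing data filter, mC fails the heterozygosity filter, and mD fails both the MAF and MAC filters, so only mA remains.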
A list containing 3 or 4 elements, depending on the snp_data
object used as input:
Genotypes: An object of class snpMatrix containing the
genotypes corresponding to the various markers for every individual.
This is essentially a subset of snp_data$Genotypes that
contains only the markers that have been selected.
Markers: A data.frame containing metadata relative to the
genotyped markers. This is essentially a subset of
snp_data$Markers that contains only the markers that have been
selected.
Filters: A list of eight integer vectors indicating how many markers remained following the different filtering steps: (1) the total number of markers, (2) the number of markers located on the chromosome of interest, (3) the number of markers located close enough to the central gene position, (4) the number of biallelic markers, (5) the number of markers passing the missing data filter, (6) the number of markers passing the heterozygosity filter, (7) the number of markers passing the MAF filter, and (8) the number of markers passing the MAC filter. Each number is the count of markers remaining after that step and all preceding steps, not the absolute number of markers passing that filter on its own.
VCF: If a VCF element was present in the initial snp_data
object, this element is a subset of it containing only the markers
remaining after filtering.
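The difference between cumulative and absolute counts in the Filters element can be shown with a small base R sketch. The pass/fail vectors below are hypothetical and cover only three of the filtering steps; the point is simply that each reported count reflects markers surviving that step and all earlier ones.

```r
# Hypothetical pass/fail results for six markers under three filters.
passes <- list(
  chrom     = c(TRUE, TRUE,  TRUE,  TRUE, FALSE, FALSE),
  distance  = c(TRUE, TRUE,  FALSE, TRUE, TRUE,  TRUE),
  biallelic = c(TRUE, FALSE, TRUE,  TRUE, TRUE,  TRUE)
)

# Absolute counts: markers passing each filter considered on its own.
absolute <- vapply(passes, sum, integer(1))

# Cumulative counts: markers remaining after each filter and all preceding ones.
remaining  <- Reduce(`&`, passes, accumulate = TRUE)
cumulative <- vapply(remaining, sum, integer(1))
names(cumulative) <- names(passes)

print(absolute)
print(cumulative)
```

In this example the distance filter passes five markers when taken alone, but only three markers remain once the chromosome filter has been applied first; it is the latter kind of count that is described above.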