sexy_markers | R Documentation |
This function identifies sex-linked markers putatively located on
heterogametic or homogametic chromosomes and re-assigns the sex in a dataset
according to Y- or W-linked markers.
The function works best with, in decreasing order: DArT silico (counts) >
DArT counts or RADseq with allele read depth > DArT silico (genotypes) >
RADseq (genotypes) and DArT (1-row, 2-rows genotypes).
sexy_markers(
data,
silicodata = NULL,
strata = NULL,
boost.analysis = FALSE,
coverage.thresholds = 1,
filters = TRUE,
interactive.filter = TRUE,
folder.name = NULL,
parallel.core = parallel::detectCores() - 1,
...
)
data |
(object or file) DArT file |
silicodata |
(optional, file) A silico (dominant marker) DArT file |
strata |
(file) A tab delimited file with a minimum of
2 columns (
Default: |
boost.analysis |
(optional, logical) This method uses machine learning
approaches to find sex markers and re-assign samples to sex groups. |
coverage.thresholds |
(optional, integer) The minimum coverage required
to call a marker absent. For silico genotype data this must be < 1. |
filters |
(optional, logical) When |
interactive.filter |
(optional, logical) When |
folder.name |
(optional, character) Name of the folder to store the results.
Default: |
parallel.core |
(optional) The number of cores used for parallel
execution during import.
Default: |
... |
(optional) Advanced mode that allows passing further arguments for fine-tuning the function. Also used for legacy arguments (see details or special section) |
This function takes DArT and RAD-type data to find markers that have a specific
pattern that is linked to sex.
The function hypothesizes the presence of sex-chromosomes in your
species/population. The tests are designed to identify markers that are
located on putative heterogametic (Y or W) or homogametic (X or Z) chromosomes.
Note: Violating the Assumptions or Prerequisites (see below) can lead to
false positives or to sex-linked markers going undetected.
The created object contains:
A list with (1) the summarised SNP data per sex and (2) the summarised silico data per sex. This should allow you to re-create the various plots.
A vector with the names of the sex-linked markers. One vector for each sex method.
A dataframe with a summary of the sex-linked markers and their sequence (if available).
Genetic sex-determination system: the species is assumed to have genetic rather than, e.g., environmental sex determination.
Genome coverage: restriction sites randomly spread throughout the whole genome.
Mutations: Processes such as sex-specific mutations in the restriction sites could lead to false positive results.
Deletions/duplications: Processes such as sex-specific deletions or duplications could lead to false positive results.
Homology: The existence of homologous sequences between the homogametic and heterogametic chromosomes could lead to false negative results.
Absence of population signal
Sample size: Ideally, the data must have enough individuals (n > 100).
Batch effect: Sex should be randomized on lanes/chips during sequencing.
Sex ratio: Datasets with an equal sex ratio work best.
Genotyping rate: for DArT data, if the minimum call rate is
> 0.5 ask DArT to lower their filtering threshold.
For RADseq data, lower the marker missingness thresholds during filtering
(e.g. stacks r and p).
Identity-by-Missingness: Absence of artifactual pattern of missingness (see missing visualization).
Low genotyping error rate: see detect_het_outliers and whoa.
Low heterozygosity miscall rate: see detect_het_outliers and whoa.
Absence of pattern of heterozygosity driven by missingness: see detect_mixed_genomes.
Absence of paralogous sequences in the data.
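Two of the prerequisites above, sample size and sex ratio, can be checked before running the function. The sketch below uses a toy table; the object `strata.check` and its columns `sample_id` and `sex` are hypothetical names chosen for illustration and do not describe the strata file format expected by the function.

```r
# Toy table of samples with their visually assigned sex.
# Object and column names are hypothetical, for illustration only.
strata.check <- data.frame(
  sample_id = paste0("ind", 1:10),
  sex = c("F", "F", "F", "F", "F", "F", "M", "M", "M", "M")
)

# Total number of individuals and number per sex
n.total <- nrow(strata.check)
sex.counts <- table(strata.check$sex)

# Female-to-male ratio: values close to 1 are preferable
sex.ratio <- as.numeric(sex.counts["F"] / sex.counts["M"])

# The documentation suggests n > 100 as an ideal sample size
if (n.total <= 100) {
  message("Only ", n.total, " individuals: fewer than the suggested n > 100.")
}
```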
Heterogametic sex-markers:
Presence/Absence method: To identify markers on Y or W chromosomes, we look at the presence or absence of a marker in females and males. More specifically, if a marker is always present in males but never in females, it is putatively located on the Y chromosome; and vice versa for W-linked markers.
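The presence/absence logic can be illustrated on a toy matrix. This is a minimal sketch of the idea only, not the package's implementation: the matrix, the marker and sample names, and the strict 0/1 cut-offs are all assumptions made for the example.

```r
# Toy presence/absence matrix: rows = markers, columns = samples
# (1 = marker present, 0 = absent). All names and values are hypothetical.
presence <- matrix(
  c(1, 1, 1, 0, 0, 0,   # m1: present in all males, absent in all females
    1, 1, 1, 1, 1, 1,   # m2: present in every sample
    1, 0, 1, 0, 1, 0),  # m3: no sex-specific pattern
  nrow = 3, byrow = TRUE,
  dimnames = list(c("m1", "m2", "m3"), paste0("s", 1:6))
)
sex <- c("M", "M", "M", "F", "F", "F")

# Proportion of samples of each sex in which each marker is present
prop.m <- rowMeans(presence[, sex == "M", drop = FALSE])
prop.f <- rowMeans(presence[, sex == "F", drop = FALSE])

# Y-linked candidates: always present in males, never in females
y.candidates <- names(which(prop.m == 1 & prop.f == 0))
# W-linked candidates: always present in females, never in males
w.candidates <- names(which(prop.f == 1 & prop.m == 0))
```

With real data, missing genotypes and genotyping errors make such strict cut-offs too severe, which is presumably why the function exposes a threshold (threshold.y.markers) rather than requiring a perfect pattern.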
Homogametic sex-markers: We have two different methods to identify markers on X or Z chromosomes:
Heterozygosity method: By looking at the heterozygosity of a marker between sexes, we can identify markers that are always homozygous in one sex (e.g. males for an XY system), while exhibiting an intermediate range of heterozygosity in the other sex (0.1 - 0.5).
Coverage method: If the data includes count information, this function will look for markers that have double the number of counts for either of the sexes. For example if an XY/XX system is present, females are expected to have double the number of counts for markers on the X chromosome.
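Both homogametic methods can be sketched on toy data for an XY system. This is an illustration under stated assumptions, not the package's code: the function itself uses quantile regression (see the tau argument) for the heterozygosity method, whereas the sketch applies the simpler rule stated above directly. The matrices, marker names and the 0.25 tolerance are hypothetical.

```r
# Toy data for an XY system; all names, values and cut-offs are hypothetical.
sex <- c("F", "F", "F", "F", "M", "M", "M", "M")

# Heterozygosity method
# Genotypes coded 0 or 2 = homozygous, 1 = heterozygous (rows = markers)
geno <- rbind(
  x1 = c(1, 0, 1, 2, 0, 2, 0, 2),  # heterozygous in some females, never in males
  a1 = c(1, 0, 1, 2, 1, 1, 0, 2)   # heterozygous in both sexes
)
het.f <- rowMeans(geno[, sex == "F", drop = FALSE] == 1)
het.m <- rowMeans(geno[, sex == "M", drop = FALSE] == 1)
# X-linked pattern: males always homozygous, females at intermediate
# heterozygosity (0.1 - 0.5)
x.het <- names(which(het.m == 0 & het.f >= 0.1 & het.f <= 0.5))

# Coverage method
# Read counts per marker and sample
counts <- rbind(
  x1 = c(20, 22, 18, 20, 10, 11, 9, 10),  # females have about twice the counts
  a1 = c(15, 14, 16, 15, 15, 16, 14, 15)  # similar counts in both sexes
)
ratio.fm <- rowMeans(counts[, sex == "F", drop = FALSE]) /
  rowMeans(counts[, sex == "M", drop = FALSE])
# Flag markers whose F/M ratio of mean counts is close to 2
# (the 0.25 tolerance is arbitrary)
x.cov <- names(which(abs(ratio.fm - 2) < 0.25))
```

For a ZW system the roles of the sexes are swapped: females would be the always-homozygous sex and males would show roughly double the counts.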
dots-dots-dots ... allows passing several arguments for fine-tuning the function:
species: To give your figures some meaning. Default: species = NULL.
population: To give your figures some meaning. Default: population = NULL.
tau: The quantile used in regression to distinguish homogametic markers
with the heterozygosity method. See rq. Default: tau = 0.03.
mis.threshold.data: Threshold to filter the SNP data on missingness.
Only if interactive.filter = FALSE.
mis.threshold.silicodata: Threshold to filter the silico data on
missingness. No default. Only if interactive.filter = FALSE.
threshold.y.markers: Threshold to select heterogametic sex-linked
markers from the SNP data with the presence/absence method. No default.
Only if interactive.filter = FALSE.
threshold.x.markers.qr: Threshold to select homogametic sex-linked
markers from the SNP data with the heterozygosity method. No default.
Only if interactive.filter = FALSE.
zoom.data: Threshold to subset the F/M ratio of mean SNP coverage.
Used to improve the histogram resolution so that a better value of
threshold.x.markers.RD can be selected. No default. Only if interactive.filter = FALSE.
threshold.x.markers.RD: Threshold to select homogametic sex-linked
markers from the SNP data with the coverage method. No default.
Only if interactive.filter = FALSE.
zoom.silicodata: Threshold to subset the F/M ratio of mean silico coverage.
Used to improve the histogram resolution so that a better value of
threshold.x.markers.RD.silico can be selected. No default. Only if interactive.filter = FALSE.
threshold.x.markers.RD.silico: Threshold to select homogametic sex-linked
markers from the silico data with the coverage method. No default.
Only if interactive.filter = FALSE.
sex.id.input: (integer) One of 1, 2 or 3, to recalculate the sex based on
(1) 'visual', (2) 'genetic SNP' or (3) 'genetic SILICO' sexID.
No default. Only if interactive.filter = FALSE.
het.qr.input: (integer) Either 1 or 2, to plot the heterozygosity residual plot
for (1) X-linked markers (heterozygous for females),
or (2) Z-linked markers (heterozygous for males). No default.
Only if interactive.filter = FALSE.
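A non-interactive run passes these arguments through `...`. The sketch below only assembles such a call: the argument names come from the list above and the file names from the example in this page, but every threshold value is a placeholder chosen for illustration, not a recommendation, and which arguments are strictly required when interactive.filter = FALSE is an assumption. The call is guarded so that nothing runs unless radiator is installed and the input file exists.

```r
# Arguments for a non-interactive run.
# All numeric values are placeholders; inspect your own data to choose them.
sexy.args <- list(
  data = "shark.dart.data.csv",
  strata = "shark.strata.tsv",
  interactive.filter = FALSE,
  tau = 0.03,                     # documented default
  mis.threshold.data = 0.2,       # placeholder
  threshold.y.markers = 0.9,      # placeholder
  threshold.x.markers.qr = 0.1,   # placeholder
  sex.id.input = 1,               # 1 = 'visual' sexID
  het.qr.input = 1                # 1 = X-linked markers
)

# Run only if the package and the input file are available
can.run <- requireNamespace("radiator", quietly = TRUE) &&
  file.exists(sexy.args$data)
if (can.run) {
  sex.markers <- do.call(radiator::sexy_markers, sexy.args)
}
```

Building the argument list separately makes it easy to record the thresholds used for a given analysis alongside the results.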
Machine learning approaches (Random Forest and Extreme Gradient Boosting Trees) are currently being tested. They usually show a lower discovery rate but tend to perform better with new samples.
Floriaan Devloo-Delva Floriaan.Devloo-Delva@csiro.au, with help from Thierry Gosselin thierrygosselin@icloud.com
Eric Anderson's whoa package.
## Not run:
# The minimum
sex.markers <- radiator::sexy_markers(
data = "shark.dart.data.csv",
strata = "shark.strata.tsv")
# This uses the default interactive mode; the result is a list that contains the sex-linked markers
## End(Not run)