View source: R/filter_snp_number.R
filter_snp_number | R Documentation |
This filter removes outlier markers with too many SNPs per locus/read. The data requires SNP and locus information (e.g. from a VCF file). A higher than "normal" number of SNPs is usually the result of assembly artifacts or bad assembly parameters. This filter is population-agnostic, but still requires a strata file if a VCF file is used as input.
Filter targets: Markers
Statistics: The number of SNPs per locus.
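As an illustration of the statistic named above, the following base R sketch counts SNPs per locus on a toy data set. The data frame and its column names (`LOCUS`, `POS`) are assumptions made for this example only; they are not taken from radiator's internals, and the function itself may compute the statistic differently.

```r
# Toy marker metadata: one row per SNP.
# The column names LOCUS and POS are hypothetical, chosen for illustration.
markers <- data.frame(
  LOCUS = c("loc1", "loc1", "loc2", "loc3", "loc3", "loc3", "loc3"),
  POS   = c(12L, 57L, 33L, 5L, 18L, 40L, 71L),
  stringsAsFactors = FALSE
)

# Count the SNPs carried by each locus (named integer vector).
snp.per.locus <- vapply(split(markers$POS, markers$LOCUS), length, integer(1))
print(snp.per.locus)
```

Here `loc3` carries more SNPs than the other loci, which is the kind of marker this filter is meant to flag.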
filter_snp_number(
data,
strata = NULL,
interactive.filter = TRUE,
filter.snp.number = NULL,
filename = NULL,
parallel.core = parallel::detectCores() - 1,
verbose = TRUE,
...
)
data |
(4 options) A file or object generated by radiator:
How to get GDS and tidy data?
Look into |
strata |
(path or object) The strata file or object.
Additional documentation is available in |
interactive.filter |
(optional, logical) Should the filtering session be interactive?
Distribution figures are shown before the function asks for filtering
thresholds.
Default: |
filter.snp.number |
(integer) This is best decided after viewing the figures.
If the argument is set to 2, loci with 3 or more SNPs will be blacklisted.
Default: |
filename |
(optional) Name of the filtered tidy data frame file
written to the working directory (ending with |
parallel.core |
(optional) The number of cores used for parallel
execution during import.
Default: |
verbose |
(optional, logical) When |
... |
(optional) Advanced mode that allows passing further arguments to fine-tune the function. Also used for legacy arguments (see details or special section) |
Interactive version
There are 2 steps in the interactive version to visualize and filter the data based on the number of SNPs per read/locus:
Step 1. SNP number per read/locus visualization
Step 2. Choose the filtering thresholds
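The two steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the function's actual implementation: the SNP counts are made up, and the object names `whitelist` and `blacklist` only mirror the names of the returned list elements. The threshold rule follows the description of filter.snp.number (a value of 2 blacklists loci with 3 or more SNPs).

```r
# Hypothetical SNP counts per locus (made-up values for illustration).
snp.per.locus <- c(loc1 = 1L, loc2 = 2L, loc3 = 5L, loc4 = 1L, loc5 = 3L)

# Step 1: inspect the distribution of the number of SNPs per locus.
snp.distribution <- table(snp.per.locus)
print(snp.distribution)
# In an interactive session this could be plotted, e.g.:
# barplot(snp.distribution, xlab = "SNPs per locus", ylab = "Number of loci")

# Step 2: choose a threshold and split loci into kept and removed sets.
threshold <- 2L  # plays the role of filter.snp.number in this sketch
whitelist <- names(snp.per.locus)[snp.per.locus <= threshold]
blacklist <- names(snp.per.locus)[snp.per.locus > threshold]
```

With a threshold of 2, the loci carrying 3 and 5 SNPs end up in `blacklist`, while the others are kept.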
A list in the global environment with 6 objects:
$snp.number.markers
$number.snp.reads.plot
$whitelist.markers
$tidy.filtered.snp.number
$blacklist.markers
$filters.parameters
The objects can be isolated as separate objects outside the list by following the example below.
## Not run:
turtle.outlier.snp.number <- radiator::filter_snp_number(
data = "turtle.vcf",
strata = "turtle.strata.tsv",
filter.snp.number = 4,
filename = "tidy.data.turtle.tsv"
)
tidy.data <- turtle.outlier.snp.number$tidy.filtered.snp.number
# Inside the same list, isolate the blacklisted markers:
blacklist <- turtle.outlier.snp.number$blacklist.markers
## End(Not run)