read_dart | R Documentation |
Used internally in radiator and might be of interest for users. The function generates a GDS object/file and, optionally, a tidy dataset from DArT files.
read_dart(
  data,
  strata,
  filename = NULL,
  tidy.dart = FALSE,
  verbose = FALSE,
  parallel.core = parallel::detectCores() - 1,
  ...
)
data |
One of the DArT output files. Six formats used by DArT are recognized by radiator:
Depending on the number of markers, these formats will be recoded similarly to VCF files (dosage of the alternate allele, see details). The function can import
If you encounter a problem, send me your data so that I can update the function. |
strata |
A tab-delimited file or object with 3 columns.
The column headers are:
See example on how to extract the TARGET_ID of your DArT file. |
filename |
(optional) The function uses |
tidy.dart |
(logical, optional) Generate a tidy dataset.
Default: |
verbose |
(optional, logical) When |
parallel.core |
(optional) The number of cores used for parallel
execution during import.
Default: |
... |
(optional) To pass further arguments for fine-tuning the function. |
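As a rough illustration of the strata and parallel.core arguments described above, the sketch below builds a three-column, tab-delimited strata file and picks a core count using base R only. The column names TARGET_ID, INDIVIDUALS and STRATA, the sample values and the temporary file are assumptions made for illustration; check them against your own DArT file and the radiator documentation.

```r
# Hypothetical strata table: 3 columns, one row per sample.
# Column names and values are assumptions for illustration only.
strata <- data.frame(
  TARGET_ID = c("1001", "1002", "1003"),
  INDIVIDUALS = c("ind_01", "ind_02", "ind_03"),
  STRATA = c("pop_A", "pop_A", "pop_B"),
  stringsAsFactors = FALSE
)

# Write the table as a tab-delimited file (a temporary file is used here).
strata.file <- tempfile(fileext = ".tsv")
write.table(
  strata,
  file = strata.file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)

# Read the file back to confirm the layout: 3 columns, 1 header row.
strata.check <- read.delim(strata.file, colClasses = "character")

# detectCores() can return NA or 1, so guard the "all cores minus one"
# default to always keep at least one core.
n.cores <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)
```

The resulting file path (strata.file) and core count (n.cores) are the kind of values that would be supplied to the strata and parallel.core arguments.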
A radiator GDS file and tidy dataframe with several columns depending on DArT file:
silico.dart:
A tibble with 5 columns: CLONE_ID, SEQUENCE, VALUE, INDIVIDUALS, STRATA.
This object is also saved in the directory (file ending with .rad).
Common to 1row, 2rows and counts: A GDS file is automatically generated.
To have a tidy tibble, the argument tidy.dart = TRUE must be used.
VARIANT_ID: generated by radiator; corresponds to the markers, as integers.
MARKERS: generated by radiator; corresponds to CHROM + LOCUS + POS separated by 2 underscores.
CHROM: the chromosome info, for de novo: CHROM_1.
LOCUS: the locus info.
POS: the SNP id on the LOCUS.
COL: the position of the SNP on the short read.
REF: the reference allele.
ALT: the alternate allele.
INDIVIDUALS: the sample name.
STRATA/POP_ID: the population id of the sample.
GT_BIN: the genotype based on the number of alternate alleles in the genotype
(the count/dosage of the alternate allele): 0, 1, 2, NA.
REP_AVG: the reproducibility average, output specific of DArT.
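To make the marker identifiers above concrete, here is a minimal base R sketch that builds MARKERS from CHROM, LOCUS and POS. It assumes that "2 underscores" means the literal separator "__" and that VARIANT_ID is a simple running integer; the locus and position values are invented for illustration and this is not radiator's internal code.

```r
# Hypothetical marker metadata for two de novo SNPs.
markers <- data.frame(
  CHROM = c("CHROM_1", "CHROM_1"),
  LOCUS = c("1023", "2047"),
  POS = c("15", "42"),
  stringsAsFactors = FALSE
)

# MARKERS: CHROM + LOCUS + POS joined with two underscores (assumed "__").
markers$MARKERS <- paste(markers$CHROM, markers$LOCUS, markers$POS, sep = "__")

# VARIANT_ID: one integer per marker (assumed to be a running index).
markers$VARIANT_ID <- seq_len(nrow(markers))

print(markers)
```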
Other columns potentially in the tidy tibble:
GT: the genotype in 6 digit format à la genepop.
GT_VCF: the genotype in VCF format: 0/0, 0/1, 1/1, ./.
GT_VCF_NUC: the genotype in VCF format, but keeping the nucleotide information: A/A, A/T, T/T, ./.
AVG_COUNT_REF: the coverage for the reference allele, output specific of DArT.
AVG_COUNT_SNP: the coverage for the alternate allele, output specific of DArT.
READ_DEPTH: the number of reads used for the genotype (count data).
ALLELE_REF_DEPTH: the number of reads of the reference allele (count data).
ALLELE_ALT_DEPTH: the number of reads of the alternate allele (count data).
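The genotype columns listed above encode the same call in different ways. The base R sketch below shows how GT_VCF relates to GT_BIN (dosage of the alternate allele) and to GT_VCF_NUC for a single biallelic SNP. The helper vcf_to_dosage and the REF/ALT values are hypothetical and only illustrate the coding; they are not functions or data from radiator.

```r
# Hypothetical helper: count the alternate allele ("1") in a VCF-style genotype.
# Missing genotypes ("./." or NA) give NA.
vcf_to_dosage <- function(gt.vcf) {
  vapply(
    strsplit(gt.vcf, "/", fixed = TRUE),
    function(alleles) {
      if (anyNA(alleles) || any(alleles == ".")) return(NA_integer_)
      sum(alleles == "1")
    },
    integer(1)
  )
}

gt.vcf <- c("0/0", "0/1", "1/1", "./.")

# GT_BIN: 0, 1, 2 copies of the alternate allele, or NA when missing.
gt.bin <- vcf_to_dosage(gt.vcf)

# GT_VCF_NUC: replace the allele codes 0 and 1 with the nucleotides
# (assumed REF = "A" and ALT = "T" for this illustration).
ref <- "A"
alt <- "T"
gt.vcf.nuc <- chartr("01", paste0(ref, alt), gt.vcf)

print(data.frame(GT_VCF = gt.vcf, GT_BIN = gt.bin, GT_VCF_NUC = gt.vcf.nuc))
```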
Written in the working directory:
The radiator GDS file
The DArT metadata information
The tidy DArT data
The strata file associated with this tidy dataset
The allele dictionary is a tibble with columns: MARKERS, CHROM, LOCUS, POS, REF, ALT.
The dots-dots-dots (...) argument allows passing several arguments for fine-tuning the function:
whitelist.markers: detailed in filter_whitelist.
Default: whitelist.markers = NULL.
missing.memory: (optional, path) This argument allows erasing genotypes that have bad statistics.
It is the path to a .rad file that contains 3 columns: MARKERS, INDIVIDUALS, ERASE.
The file is produced by several radiator functions. For DArT data, filter_rad generates the file.
Default: missing.memory = NULL. Currently not used.
path.folder: (optional, path) To write output in a specific folder.
Default: path.folder = NULL. The working directory is used.
pop.levels: detailed in tidy_genomic_data.
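As an illustration of how the fine-tuning arguments above might be supplied through ..., the sketch below collects the arguments in a list and calls read_dart only when the radiator package and the input files are available. The file names are borrowed from the example section, and the folder name "dart_import" is hypothetical; whether the folder must exist beforehand is not stated here, so treat this as a sketch rather than a tested workflow.

```r
# Arguments for read_dart; path.folder is passed through `...`.
dart.args <- list(
  data = "clownfish.dart.csv",
  strata = "clownfish.strata.tsv",
  tidy.dart = TRUE,            # also return a tidy tibble
  verbose = TRUE,
  path.folder = "dart_import"  # hypothetical output folder
)

# Only run the import when the package and both input files are present.
can.run <- requireNamespace("radiator", quietly = TRUE) &&
  file.exists(dart.args$data) &&
  file.exists(dart.args$strata)

if (can.run) {
  clownfish.dart.tidy <- do.call(radiator::read_dart, dart.args)
} else {
  message("radiator or the input files are not available; import skipped.")
}
```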
Thierry Gosselin thierrygosselin@icloud.com
extract_dart_target_id
## Not run:
clownfish.dart.tidy <- radiator::read_dart(
  data = "clownfish.dart.csv",
  strata = "clownfish.strata.tsv"
)
## End(Not run)