split_vcf: Split a VCF file

split_vcf    R Documentation

Split a VCF file

Description

This function splits a VCF file into several VCF files, based on individuals or populations.

Usage

split_vcf(data, strata, parallel.core = parallel::detectCores() - 1, ...)

Arguments

data

14 options for input (diploid data only): VCFs (SNPs or Haplotypes, to make the vcf population ready), plink (tped, bed), stacks haplotype file, genind (library(adegenet)), genlight (library(adegenet)), gtypes (library(strataG)), genepop, DArT, and a data frame in long/tidy or wide format. To verify that radiator detects your file format, use detect_genomic_format (see example below). The formats are documented in the section Input genomic datasets of tidy_genomic_data.

DArT and VCF data: radiator was not meant to generate alleles and genotypes if you are using a VCF file with no genotypes (only genotype likelihoods: GL or PL). Neither is radiator able to magically generate a genind object from a SilicoDArT dataset. Please look at the first few lines of your dataset to understand its limits before asking radiator to convert or filter it.

strata

A file identical to the strata file usually used in radiator, with an additional column named SPLIT. This new column contains numerical values (e.g. 1, 1, 1, ..., 2, 2, 2, 2, ..., 3, 3, ...) that indicate, for each INDIVIDUALS/STRATA, how to split. The number of VCF files created is the maximum value found in the SPLIT column (the example values above would result in 3 VCF files).
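The SPLIT column described above can be sketched in base R. This is a minimal illustration, not radiator's own code: the sample IDs, population names and output file name are hypothetical, and it assumes the usual tab-separated strata layout with INDIVIDUALS and STRATA columns, split by population.

```r
# Hypothetical individuals and populations, for illustration only.
strata <- data.frame(
  INDIVIDUALS = c("ind_01", "ind_02", "ind_03", "ind_04", "ind_05", "ind_06"),
  STRATA = c("pop_A", "pop_A", "pop_B", "pop_B", "pop_C", "pop_C"),
  stringsAsFactors = FALSE
)

# Split by population: every individual of a population gets the same integer.
strata$SPLIT <- match(strata$STRATA, unique(strata$STRATA))

# The highest SPLIT value is the number of VCF files to expect.
n.vcf <- max(strata$SPLIT)

# Write a tab-separated strata file (the file name is an assumption).
strata.file <- file.path(tempdir(), "strata.split.tsv")
write.table(
  strata,
  file = strata.file,
  sep = "\t",
  quote = FALSE,
  row.names = FALSE
)
```

To split by individuals instead, the same idea applies with a distinct SPLIT value assigned to each group of individuals.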

parallel.core

(optional) The number of cores used for parallel execution during import. Default: parallel.core = parallel::detectCores() - 1.
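The default expression evaluates to 0 on a single-core machine, and parallel::detectCores() can return NA on some platforms. A defensive way to choose the value before passing it to parallel.core might look like this; the variable name n.cores is only illustrative, and whether radiator already guards against these cases internally is not stated here.

```r
# Number of logical cores reported by the system (may be NA).
n.cores <- parallel::detectCores()

# Fall back to a single core when detection fails.
if (is.na(n.cores)) n.cores <- 1L

# Leave one core free, but never go below 1.
parallel.core <- max(1L, n.cores - 1L)
```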

...

(optional) To pass further arguments for fine-tuning the function.

Value

The function returns, in the global environment, a list with the tidy datasets generated from the split VCF. The split VCF files are written to the working directory, with "_1", "_2", etc. in their names.

Author(s)

Thierry Gosselin thierrygosselin@icloud.com

Examples

## Not run:
split.data <- radiator::split_vcf(
  data = "batch_1.vcf",
  strata = "strata.split.tsv",
  blacklist.id = "blacklisted.id.txt",
  whitelist.markers = "whitelist.loci.txt"
)

## End(Not run)

thierrygosselin/radiator documentation built on May 5, 2024, 5:12 a.m.