vcf2phase: Convert VCF data into GHap phase

ghap.vcf2phaseR Documentation

Convert VCF data into GHap phase

Description

This function takes phased genotype data in the Variant Call Format (VCF) and converts them into the GHap phase format.

Usage

  ghap.vcf2phase(input.files = NULL, vcf.files = NULL,
                 sample.files = NULL, out.file,
                 ncores = 1, verbose = TRUE)

Arguments

If all input files share the same prefix, the user can use the following shortcut options:

input.files

Character vector with the list of prefixes for input files.

out.file

Character value for the output file name.

The user can also opt to point to input files separately:

vcf.files

Character vector containing the list of VCF files.

sample.files

Character vector containing the list of SAMPLE files.

To turn conversion progress-tracking on or off or set the number of cores please use:

ncores

A numeric value specfying the number of cores to use (default = 1).

verbose

A logical value specfying whether log messages should be printed (default = TRUE).

Details

The Variant Call Format (VCF) - as described in https://github.com/samtools/hts-specs - is here manipulated to obtain the GHap phase format. Important: the function does not apply filters to the data, except for skipping multi-allelic variants. Should variants be filtered, the user is advised to pre-process the VCF files with third-party software (such as BCFTools). The FORMAT field should also follow the "GT:..." specification, with genotypes placed first in each sample column. Finally, all genotypes should be phased and take one of the following values: "0|0", "0|1", "1|0" or "1|1". Warning: this function is not optimized for very large datasets.

Author(s)

Yuri Tani Utsunomiya <ytutsunomiya@gmail.com>

References

H. Li et al. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009. 25:2078-2079.

H. Li. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011. 27(21):2987-2993.

See Also

ghap.compress, ghap.loadphase, ghap.fast2phase, ghap.oxford2phase

Examples


# #### DO NOT RUN IF NOT NECESSARY ###
# 
# # Copy the example data in the current working directory
# exfiles <- ghap.makefile(dataset = "example",
#                          format = "vcf",
#                          verbose = TRUE)
# file.copy(from = exfiles, to = "./")
# 
# ### RUN ###
# 
# # Convert from a single genome-wide file
# ghap.vcf2phase(input.files = "example",
#                out.file = "example")
# 
# # Convert from a list of chromosome files
# ghap.vcf2phase(input.files = paste0("example_chr",1:10),
#                out.file = "example")
# 
# # Convert using separate lists for file extensions
# ghap.vcf2phase(vcf.files = paste0("example_chr",1:10,".vcf"),
#                sample.files = paste0("example_chr",1:10,".sample"),
#                out.file = "example")


GHap documentation built on July 2, 2022, 1:07 a.m.