extract_variants_from_vcf_file: Extracts variants from a vcf file

View source: R/load_data.R

extract_variants_from_vcf_file R Documentation

Extracts variants from a vcf file

Description

Reads a VCF file and returns the variants for a single sample as a data.table.

Usage

extract_variants_from_vcf_file(
  vcf_file,
  id = NULL,
  rename = NULL,
  sample_field = NULL,
  filename_as_id = FALSE,
  strip_extension = c(".vcf", ".vcf.gz", ".gz"),
  filter = TRUE,
  multiallele = c("expand", "exclude"),
  extra_fields = NULL,
  fix_vcf_errors = TRUE
)

Arguments

vcf_file

Path to the vcf file

id

ID of the sample to select from the VCF. If NULL, then the first sample will be selected. Default NULL.

rename

Rename the sample to this value when extracting variants. If NULL, then the sample will be named according to ID.

sample_field

Some algorithms will save the name of the sample in the ##SAMPLE portion of the header in the VCF (e.g. ##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>). If the ID is specified via the id parameter ("TUMOR" in this example), then sample_field can be used to specify the name of the tag ("SampleName" in this example). Default NULL.
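To illustrate how id and sample_field relate to a ##SAMPLE header line, here is a minimal base R sketch. The helper get_sample_tag is hypothetical and is not part of musicatk; the NORMAL line in the header is made up for illustration, and the parsing assumes tag values contain no commas. The package's own parsing may differ.

```r
# Hypothetical helper (not a musicatk function): look up one tag in the
# ##SAMPLE header line whose ID matches `id`.
get_sample_tag <- function(header_lines, id, tag) {
  # Find the ##SAMPLE line for the requested ID
  pattern <- paste0("^##SAMPLE=<ID=", id, "[,>]")
  line <- grep(pattern, header_lines, value = TRUE)
  if (length(line) == 0) return(NA_character_)
  # Drop the "##SAMPLE=<" prefix and the trailing ">", then split key=value pairs
  # (assumes values contain no commas)
  body <- sub(">$", "", sub("^##SAMPLE=<", "", line[1]))
  pairs <- strsplit(body, ",", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", pairs)
  values <- sub("^[^=]*=", "", pairs)
  if (!tag %in% keys) return(NA_character_)
  values[match(tag, keys)]
}

# Illustrative header; only the TUMOR line comes from the documentation above
header <- c(
  "##fileformat=VCFv4.2",
  "##SAMPLE=<ID=NORMAL,SampleName=TCGA-01-0001-N>",
  "##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>"
)

sample_name <- get_sample_tag(header, id = "TUMOR", tag = "SampleName")
print(sample_name)
```

In this sketch, id = "TUMOR" picks the header line and tag = "SampleName" plays the role of sample_field.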

filename_as_id

If set to TRUE, the file name will be used as the sample name. Default FALSE.

strip_extension

Only used if filename_as_id is set to TRUE. If set to TRUE, the file extension will be stripped from the filename before setting the sample name. If a character vector is given, then all the strings in the vector will be removed from the end of the filename before setting the sample name. Default c(".vcf", ".vcf.gz", ".gz").
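The suffix removal described above can be sketched in base R as follows. The helper strip_suffixes is hypothetical, the paths are made up, and dropping the directory with basename() is an assumption; this is one plausible reading of the behaviour, not necessarily how musicatk implements it.

```r
# Hypothetical helper (not a musicatk function): remove each listed suffix
# from the end of the file name, in the order given.
strip_suffixes <- function(filename, extensions = c(".vcf", ".vcf.gz", ".gz")) {
  # Assumption: the directory part is dropped before stripping
  name <- basename(filename)
  for (ext in extensions) {
    if (endsWith(name, ext)) {
      name <- substr(name, 1, nchar(name) - nchar(ext))
    }
  }
  name
}

# Made-up paths for illustration
plain_name <- strip_suffixes("/data/sample1.vcf")
gz_name <- strip_suffixes("/data/sample2.vcf.gz")
print(c(plain_name, gz_name))
```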

filter

Exclude variants that do not have a PASS in the FILTER column of the VCF. Default TRUE.

multiallele

A multiallelic site is one where multiple alternate alleles are listed in the same row of the VCF. One of "expand" or "exclude". If "expand" is selected, then each alternate allele will be given its own row. If "exclude" is selected, then these rows will be removed. Default "expand".
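The difference between the two multiallele options can be sketched on a small toy table. The data frame and its column names (chr, pos, ref, alt) are invented for illustration, and the code uses base R rather than the package internals, so treat it as a rough picture of the two behaviours, not the actual implementation.

```r
# Toy variant table (made up); the second row is multiallelic ("A,T")
variants <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  pos = c(100L, 200L, 300L),
  ref = c("A", "G", "T"),
  alt = c("C", "A,T", "G"),
  stringsAsFactors = FALSE
)

# Split the comma-separated alternate alleles of every row
alts <- strsplit(variants$alt, ",", fixed = TRUE)
is_multi <- lengths(alts) > 1

# "expand": repeat each row once per alternate allele, one allele per row
expanded <- variants[rep(seq_len(nrow(variants)), lengths(alts)), ]
expanded$alt <- unlist(alts)
rownames(expanded) <- NULL

# "exclude": drop rows that list more than one alternate allele
excluded <- variants[!is_multi, ]

print(expanded)
print(excluded)
```

Here "expand" turns the three input rows into four, while "exclude" keeps only the two biallelic rows.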

extra_fields

Optionally extract additional fields from the INFO section of the VCF. Default NULL.

fix_vcf_errors

Attempt to automatically fix VCF file formatting errors. Default TRUE.

Value

Returns a data.table of variants extracted from the VCF file.

Examples

vcf <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf",
  package = "musicatk")
variants <- extract_variants_from_vcf_file(vcf_file = vcf)

campbio/musicatk documentation built on July 14, 2024, 8:28 a.m.