readSVvcf: Read SVs from a VCF file

readSVvcf R Documentation

Read SVs from a VCF file

Description

Read a VCF file that contains SVs and create a GRanges with relevant information, e.g. SV size or genotype quality.

Usage

readSVvcf(
  vcf.file,
  keep.ins.seq = FALSE,
  keep.ref.seq = FALSE,
  sample.name = "",
  qual.field = c("GQ", "QUAL"),
  other.field = NULL,
  check.inv = FALSE,
  keep.ids = FALSE,
  nocalls = FALSE,
  out.fmt = c("gr", "df", "vcf"),
  min.sv.size = 10
)

Arguments

vcf.file

the path to the VCF file

keep.ins.seq

should it keep the inserted sequence? Default is FALSE.

keep.ref.seq

should it keep the reference allele sequence? Default is FALSE.

sample.name

the name of the sample to use. If "" (default) or if the sample name is not in the VCF, the first sample is selected. If NULL, no particular sample is selected.

qual.field

field to use as quality. It can be an INFO or FORMAT field (the default is GQ; DP is another example). If it is not found in INFO or FORMAT, the QUAL field is used.

other.field

names of other fields to extract from the INFO (e.g. AF). Default is NULL.

check.inv

should the sequences of MNVs be compared to identify inversions? Default is FALSE.

keep.ids

keep variant ids? Default is FALSE.

nocalls

if TRUE, return only no-calls (genotype ./.). Default is FALSE.

out.fmt

output format. Default is 'gr' for GRanges. Other options: 'df' for data.frame and 'vcf' for the VCF object from the VariantAnnotation package.

min.sv.size

the minimum size of the variants to extract from the VCF. Default is 10.

Details

By default, the quality information is taken from the GQ field. If GQ (or the requested field) is missing from both FORMAT and INFO, QUAL will be used.
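The fallback rule described above can be sketched in a few lines of base R. This is only an illustration of the documented behavior, not the package's actual implementation: 'pick_qual_field' and its arguments are hypothetical names, and the field lists are made up.

```r
# Hypothetical helper (not part of sveval): returns the first requested field
# that exists in FORMAT or INFO, and falls back to "QUAL" otherwise.
pick_qual_field <- function(requested, format.fields, info.fields) {
  available <- c(format.fields, info.fields)
  found <- requested[requested %in% available]
  if (length(found) > 0) found[1] else "QUAL"
}

# GQ is present in FORMAT, so it is used
qual.a <- pick_qual_field("GQ", format.fields = c("GT", "GQ"),
                          info.fields = c("SVTYPE", "END"))

# GQ is missing from both FORMAT and INFO, so QUAL is used instead
qual.b <- pick_qual_field("GQ", format.fields = "GT",
                          info.fields = c("SVTYPE", "END"))
```

In practice, the field is simply requested through the 'qual.field' argument, for example 'qual.field="DP"'.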

The 'sample.name' argument can be used to select the genotypes of a specific sample from the VCF. In addition, variants that are homozygous reference in this sample will be filtered out. If 'sample.name' is not in the VCF, or is left at its default "", the first sample will be selected. To force the entire VCF to be read regardless of the samples' genotypes, use 'sample.name=NULL'.
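As a rough illustration, the three ways of using 'sample.name' could look like the following. The file path 'calls.vcf' and the sample name 'sample1' are placeholders, and the calls are guarded so that the snippet does nothing when the sveval package or the file is unavailable.

```r
vcf.file <- "calls.vcf"  # placeholder path, replace with a real VCF

if (requireNamespace("sveval", quietly = TRUE) && file.exists(vcf.file)) {
  # Default: genotypes of the first sample, hom-ref variants filtered out
  first.gr <- sveval::readSVvcf(vcf.file)

  # Genotypes of a named sample ('sample1' is a made-up name)
  s1.gr <- sveval::readSVvcf(vcf.file, sample.name = "sample1")

  # No sample selection: every variant in the VCF is read
  all.gr <- sveval::readSVvcf(vcf.file, sample.name = NULL)
}
```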

Alleles are split and, for each, the column 'ac' reports the allele count. Notable cases include 'ac=-1' for no/missing calls (e.g. './.'), and 'ac=0' on the first allele to report hom-ref variants. These cases are often filtered later with 'ac>0' to keep only non-ref calls. If the VCF contains no samples or if no sample selection is forced (sample.name=NULL), 'ac' will contain '-1' for all variants in the VCF.
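The 'ac' filtering mentioned above can be tried on a toy table. The data.frame below is a stand-in for the output, not real data: only the 'ac' column comes from this documentation, and the 'id' column and its values are invented for the example.

```r
# Toy table mimicking a few records; 'id' is a made-up column
calls.df <- data.frame(
  id = c("sv1", "sv2", "sv3", "sv4"),
  ac = c(1, 2, 0, -1),  # het, hom alt, hom ref, no-call
  stringsAsFactors = FALSE
)

# Keep only non-ref calls
nonref.df <- subset(calls.df, ac > 0)

# Isolate no/missing calls
nocall.df <- subset(calls.df, ac == -1)
```

Here 'nonref.df' keeps sv1 and sv2, while 'nocall.df' keeps only sv4.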

Value

depending on 'out.fmt', a GRanges, data.frame, or VCF object with relevant information.
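A minimal sketch of requesting each output format, assuming a local file named 'calls.vcf' (a placeholder) and an arbitrary size threshold of 50. The calls are skipped when the sveval package or the file is missing.

```r
vcf.file <- "calls.vcf"  # placeholder path, replace with a real VCF
out.fmts <- c("gr", "df", "vcf")  # the formats listed in Usage

if (requireNamespace("sveval", quietly = TRUE) && file.exists(vcf.file)) {
  # One call per format; min.sv.size = 50 is an arbitrary example value
  res <- lapply(out.fmts, function(fmt) {
    sveval::readSVvcf(vcf.file, out.fmt = fmt, min.sv.size = 50)
  })
  names(res) <- out.fmts

  # Inspect the class of each returned object
  print(lapply(res, class))
}
```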

Author(s)

Jean Monlong

Examples

## Not run: 
calls.gr = readSVvcf('calls.vcf')

## End(Not run)

jmonlong/sveval documentation built on July 31, 2023, 7:50 p.m.