makeVariantExperimentFromVCF: The function to convert VCF files directly into...

Description

makeVariantExperimentFromVCF converts a VCF file into a VariantExperiment object. The genotype data are written in GDSArray format and stored in the assays slot. The annotation information for variants and samples is written as DelayedDataFrame objects and stored in the rowData and colData slots, respectively.

Usage

makeVariantExperimentFromVCF(
  vcf.fn,
  out.dir = tempfile(),
  replace = FALSE,
  header = NULL,
  info.import = NULL,
  fmt.import = NULL,
  sample.info = NULL,
  ignore.chr.prefix = "chr",
  reference = NULL,
  start = 1L,
  count = -1L,
  parallel = FALSE,
  verbose = FALSE
)

Arguments

vcf.fn

the file name(s) of the VCF file(s), optionally compressed; or a ‘connection’ object.

out.dir

The directory in which to save the GDS representation of the VCF data and the newly generated VariantExperiment object, with array data in GDSArray format and annotation data in DelayedDataFrame format. The default is a temporary folder.

replace

Whether to replace the directory if it already exists. The default is FALSE.

header

if NULL, ‘header’ is set to be ‘seqVCF_Header(vcf.fn)’, which is a list (with a class name "SeqVCFHeaderClass", S3 object).

info.import

a character vector of the variable name(s) in the INFO field to import; the default ‘NULL’ imports all variables.

fmt.import

a character vector of the variable name(s) in the FORMAT field to import; the default ‘NULL’ imports all variables.

sample.info

a character string giving the file path to the sample info data. The data must have column names (the phenotypes) and row names (the sample IDs). No blank lines are allowed. The default is ‘NULL’ for no sample info.

ignore.chr.prefix

a character vector giving the chromosome prefix(es) to ignore, such as "chr"; matching is not case-sensitive.

reference

the genome reference, such as "hg19" or "GRCh37"; if the genome reference is not available in the VCF files, users can specify it here.

start

the index of the first variant to import when importing only part of the VCF file(s).

count

the maximum number of variants to import when importing only part of the VCF file(s); -1 means import to the end.

parallel

‘FALSE’ (serial processing), ‘TRUE’ (parallel processing), a numeric value indicating the number of cores, or a cluster object for parallel processing; ‘parallel’ is passed to the argument ‘cl’ in ‘seqParallel’, see ‘?SeqArray::seqParallel’ for more details. The default is ‘FALSE’.

verbose

whether to print progress messages. The default is FALSE.
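
The file layout described for sample.info (column names for phenotypes, row names for sample IDs, no blank lines) can be sketched with base R as below. This is only an illustration under assumptions: the sample IDs, the phenotype columns, and the tab delimiter are hypothetical, and it is assumed (not confirmed by this page) that the file is read as a whitespace-delimited table with a header line. In practice the row names should match the sample IDs in the VCF file.

```r
## Hypothetical phenotype table: row names are sample IDs,
## column names are phenotype variables.
pheno <- data.frame(
    family = c("F1", "F1", "F2"),
    sex = c("female", "male", "female"),
    row.names = c("S1", "S2", "S3"),
    stringsAsFactors = FALSE
)

## Write a header line plus one line per sample, with no blank lines.
## The tab delimiter is an assumption, not a documented requirement.
sample_info_file <- tempfile(fileext = ".txt")
write.table(pheno, file = sample_info_file, quote = FALSE, sep = "\t")

## Read the file back to confirm that row and column names survive.
back <- read.table(sample_info_file, header = TRUE,
                   stringsAsFactors = FALSE)
rownames(back)
colnames(back)
```

The resulting path could then be supplied as the sample.info argument, for example makeVariantExperimentFromVCF(vcf, sample.info = sample_info_file).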

Value

A VariantExperiment object.

Examples

## the vcf file
vcf <- SeqArray::seqExampleFileName("vcf")
## conversion
## ve <- makeVariantExperimentFromVCF(vcf)
## ve
## the filepath to the gds file.
## gdsfile(ve)

## only read in specific info columns
## ve <- makeVariantExperimentFromVCF(vcf, out.dir = tempfile(),
##                                    info.import=c("OR", "GP"))
## ve
## convert without the INFO and FORMAT fields
## ve <- makeVariantExperimentFromVCF(vcf, out.dir = tempfile(),
##                                    info.import=character(0),
##                                    fmt.import=character(0))
## ve
## now the assay data does not include
## "annotation/format/DP/data", and rowData(ve) does not include
## any info columns.

VariantExperiment documentation built on April 10, 2021, 6 p.m.