makeVariantExperimentFromVCF | R Documentation |
makeVariantExperimentFromVCF converts a VCF file into a VariantExperiment object. The genotype data are written in GDSArray format and saved in the assays slot. The annotation info for variants or samples is written as a DelayedDataFrame object and saved in the rowData or colData slot.
makeVariantExperimentFromVCF(
vcf.fn,
out.dir = tempfile(),
replace = FALSE,
header = NULL,
info.import = NULL,
fmt.import = NULL,
sample.info = NULL,
ignore.chr.prefix = "chr",
reference = NULL,
start = 1L,
count = -1L,
parallel = FALSE,
verbose = FALSE
)
vcf.fn |
the file name(s) of the (compressed) VCF file(s), or a ‘connection’ object. |
out.dir |
The directory to save the gds format of the vcf
data, and the newly generated VariantExperiment object with
array data in |
replace |
Whether to replace the directory if it already exists. The default is FALSE. |
header |
if NULL, ‘header’ is set to be ‘seqVCF_Header(vcf.fn)’, which is a list (with a class name "SeqVCFHeaderClass", S3 object). |
info.import |
characters, the variable name(s) in the INFO field for import; default is ‘NULL’ for all variables. |
fmt.import |
characters, the variable name(s) in the FORMAT field for import; default is ‘NULL’ for all variables. |
sample.info |
a character string giving the file path of the sample info data. The data must have column names (for phenotypes) and row names (sample IDs). No blank lines are allowed. The default is ‘NULL’ for no sample info. |
ignore.chr.prefix |
a character vector giving the chromosome prefix(es) to be ignored, like "chr"; it is not case-sensitive. |
reference |
genome reference, like "hg19", "GRCh37"; if the genome reference is not available in VCF files, users could specify the reference here. |
start |
the index of the first variant to import when importing part of the VCF file(s). |
count |
the maximum number of variants to import when importing part of the VCF file(s); -1 means importing to the end. |
parallel |
‘FALSE’ (serial processing), ‘TRUE’ (parallel processing), a numeric value indicating the number of cores, or a cluster object for parallel processing; ‘parallel’ is passed to the argument ‘cl’ in ‘seqParallel’, see ‘?SeqArray::seqParallel’ for more details. The default is FALSE. |
verbose |
whether to print the process messages. The default is FALSE. |
A VariantExperiment object.
## the vcf file
vcf <- SeqArray::seqExampleFileName("vcf")
## conversion
ve <- makeVariantExperimentFromVCF(vcf)
ve
## the filepath to the gds file.
gdsfile(ve)
## only read in specific info columns
ve <- makeVariantExperimentFromVCF(vcf, out.dir = tempfile(),
                                   info.import = c("OR", "GP"))
ve
## convert without the INFO and FORMAT fields
ve <- makeVariantExperimentFromVCF(vcf, out.dir = tempfile(),
                                   info.import = character(0),
                                   fmt.import = character(0))
ve
## now the assay data does not include the
## "annotation/format/DP/data", and the rowData(ve) does not include
## any info columns.
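The start and count arguments described above can be combined to import only a window of variants. The following is a minimal sketch rather than part of the original examples: it assumes the VariantExperiment and SeqArray packages are installed, and the values 101L and 500L and the name ve_part are arbitrary choices made for illustration.

```r
## Hedged sketch: import only part of the VCF file using the documented
## `start` and `count` arguments.
library(VariantExperiment)

## the example vcf file shipped with SeqArray
vcf <- SeqArray::seqExampleFileName("vcf")

## begin at the 101st variant and read at most 500 variants
## (both values are illustrative assumptions, not recommended settings)
ve_part <- makeVariantExperimentFromVCF(vcf, out.dir = tempfile(),
                                        start = 101L, count = 500L)
ve_part

## number of variants actually imported
nrow(ve_part)
```

Because count is a maximum, the resulting object may hold fewer variants than requested if the file ends first.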