extract_variants | R Documentation |
Chooses the correct function to extract variants from input based on
the class of the object or the file extension. Different types of objects
can be mixed within the list. For example, the list can include VCF files
and maf objects. Certain parameters such as id and rename
only apply to VCF objects or files and need to be individually specified
for each VCF. Therefore, these parameters should be supplied as a vector
that is the same length as the number of inputs. If other types of
objects are in the input list, then the values of id and rename
will be ignored for these items.
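The alignment rule above can be sketched in base R. This is only an illustration: the file names are hypothetical, and picking out VCFs by file extension is an assumption made for the example, not necessarily how the function decides internally.

```r
# Hypothetical inputs: two VCF files followed by a MAF file
inputs <- c("tumor_a.vcf", "tumor_b.vcf.gz", "cohort.maf")

# Flag the entries that look like VCF files (assumption: judged by extension)
is_vcf <- grepl("\\.vcf(\\.gz)?$", inputs)

# One new name per VCF, NA for every other input so positions stay aligned
rename <- ifelse(is_vcf, paste0("Sample", cumsum(is_vcf)), NA_character_)

# The vector must be the same length as the inputs
stopifnot(length(rename) == length(inputs))
```

A vector built this way could then be passed as rename = rename, with the NA entry ignored for the MAF input.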
extract_variants(
  inputs,
  id = NULL,
  rename = NULL,
  sample_field = NULL,
  filename_as_id = FALSE,
  strip_extension = c(".vcf", ".vcf.gz", ".gz"),
  filter = TRUE,
  multiallele = c("expand", "exclude"),
  fix_vcf_errors = TRUE,
  extra_fields = NULL,
  chromosome_col = "chr",
  start_col = "start",
  end_col = "end",
  ref_col = "ref",
  alt_col = "alt",
  sample_col = "sample",
  verbose = TRUE
)
inputs |
A vector or list of objects or file names. Objects can be
CollapsedVCF, ExpandedVCF, MAF,
an object that inherits from |
id |
A character vector the same length as |
rename |
A character vector the same length as |
sample_field |
Some algorithms will save the name of the
sample in the ##SAMPLE portion of the header in the VCF.
See |
filename_as_id |
If set to |
strip_extension |
Only used if |
filter |
Exclude variants that do not have a |
multiallele |
Multialleles occur when multiple alternative variants
are listed in the same row of the VCF.
See |
fix_vcf_errors |
Attempt to automatically fix VCF file
formatting errors.
See |
extra_fields |
Optionally extract additional fields from all input
objects. Default |
chromosome_col |
The name of the column that contains the chromosome
reference for each variant. Only used if the input is a matrix or data.frame.
Default |
start_col |
The name of the column that contains the start
position for each variant. Only used if the input is a matrix or data.frame.
Default |
end_col |
The name of the column that contains the end
position for each variant. Only used if the input is a matrix or data.frame.
Default |
ref_col |
The name of the column that contains the reference
base(s) for each variant. Only used if the input is a matrix or data.frame.
Default |
alt_col |
The name of the column that contains the alternative
base(s) for each variant. Only used if the input is a matrix or data.frame.
Default |
sample_col |
The name of the column that contains the sample
id for each variant. Only used if the input is a matrix or data.frame.
Default |
verbose |
Show progress of variant extraction. Default |
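As a hedged sketch of how the column-name arguments might be used, the example below builds a small data.frame whose column names differ from the defaults and maps each *_col argument to the matching column. The table contents and column names are made up for illustration, and wrapping the data.frame in list() is an assumption about how a single table is best passed. The call itself only runs if musicatk is installed.

```r
# Hypothetical variant table with non-default column names
calls <- data.frame(
  Chromosome = c("chr1", "chr2"),
  Start = c(12345L, 67890L),
  End = c(12345L, 67890L),
  Ref = c("C", "G"),
  Alt = c("T", "A"),
  Tumor = c("SampleA", "SampleA"),
  stringsAsFactors = FALSE
)

# Map each *_col argument to the corresponding column in `calls`
col_args <- list(
  chromosome_col = "Chromosome",
  start_col = "Start",
  end_col = "End",
  ref_col = "Ref",
  alt_col = "Alt",
  sample_col = "Tumor"
)

# Check that every mapped column exists before extracting variants
stopifnot(all(unlist(col_args) %in% colnames(calls)))

# Only attempt the extraction when the package is available
if (requireNamespace("musicatk", quietly = TRUE)) {
  variants <- do.call(
    musicatk::extract_variants,
    c(list(inputs = list(calls)), col_args)
  )
}
```

Renaming the columns of the table to the defaults (chr, start, end, ref, alt, sample) would be an alternative that avoids setting these arguments at all.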
Returns a data.table of variants from a vcf
# Get locations of the VCF files and a MAF file
luad_vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf",
                             package = "musicatk")
lusc_maf_file <- system.file("extdata", "public_TCGA.LUSC.maf",
                             package = "musicatk")
melanoma_vcfs <- list.files(system.file("extdata", package = "musicatk"),
                            pattern = glob2rx("*SKCM*vcf"), full.names = TRUE)
# Read all files in at once
inputs <- c(luad_vcf_file, melanoma_vcfs, lusc_maf_file)
variants <- extract_variants(inputs = inputs)
table(variants$sample)
# Run again, renaming the samples in the first four VCFs
new_name <- c(paste0("Sample", 1:4), NA)
variants <- extract_variants(inputs = inputs, rename = new_name)
table(variants$sample)