library(echotabix)
#### Example with full data #### tmp <- echodata::example_fullSS() tabix_files <- echotabix::convert(target_path = tmp, start_col = "BP")
query
automatically chooses the correct methods to perform a query
by inferring:
query_genome
) matches
the full data's genome build (target_genome
),
and if not performs liftover before querying. query_granges
is used to specify which coordinates you want to query from
the target file. This argument is rather flexible and
can take any of the following:
GRanges
object with one or more rows, across one or more chromosomes. data.table
containing the columns "CHR","POS", and "SNP",
which is automatically converted to a GRanges
object. echotabix::construct_query()
function,
for additional flexibility and non-standard column names. query_dat <- echodata::BST1 query_res <- echotabix::query(target_path = tabix_files$path, query_granges = query_dat)
You can also gain more customised control over the query
with the helper functionconstruct_query()
(which is called
internally by default when query_granges
is not a GRanges
object.
query_granges1 <- echotabix::construct_query(query_chrom = "chr4", query_start_pos = 5000, query_end_pos = 1e8L) query_res2 <- echotabix::query(target_path = tabix_files$path, query_granges = query_granges1)
query_dat2 <- echodata::LRRK2[1:100,] ## Rename columns for demo purposes data.table::setnames(query_dat2, c("CHR","SNP"),c("seqnames","rsID")) query_granges2 <- echotabix::construct_query(query_dat = query_dat2, query_chrom_col = "seqnames", query_snp_col = "rsID") query_res2 <- echotabix::query(target_path = tabix_files$path, query_granges = query_granges2)
Instead of specifying each column name, you can also select
standardise_colnames=TRUE
to automatically rename all columns to a standard nomenclature using
MungeSumstats::format_header
.
query_granges3 <- echotabix::construct_query(query_dat = query_dat2, standardise_colnames = TRUE) query_res2 <- echotabix::query(target_path = tabix_files$path, query_granges = query_granges2)
Next, we'll query whole-genome sequencing data from the 1000 Genomes Project.
Specifically, we'll extract a given genomic range for a subset of samples (individuals).
target_path <- file.path( "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/", "ALL.chr4.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz" ) ## Fewer samples will speed up queries substantially (15-30s vs. >1.2min). samples <- c("HG00097","HG00099","HG00100","HG00101","HG00102") dat2 <- echodata::BST1[1:50,] #### Query #### vcf <- echotabix::query(target_path = target_path, query_granges = dat2, samples = samples, query_save = TRUE, as_datatable = FALSE, query_force_new = TRUE) print(vcf)
Merges the convert
and query
functions into a single pipeline.
If the target_path
file is not already a tabix file,
it will first be converted to a sorted, bgzip-compressed,
tabix-indexed file, and then proceed to the query
step.
If the target_path
file is already a tabix file, the function will
detect this automatically and skip right to the query
step.
query_dat <- echodata::BST1 target_path <- echodata::example_fullSS() query_res <- echotabix::convert_and_query( target_path = target_path, target_start_col = "BP", query_granges = query_dat, query_force_new = TRUE) knitr::kable(head(query_res))
utils::sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.