slicing: Query GDC for data slices

View source: R/slicing.R

slicingR Documentation

Query GDC for data slices

Description

This function returns a BAM file representing reads overlapping regions specified either as chromosomal regions or as gencode gene symbols.

Usage

slicing(
  uuid,
  regions,
  symbols,
  destination = file.path(tempdir(), paste0(uuid, ".bam")),
  overwrite = FALSE,
  progress = interactive(),
  token = gdc_token()
)

Arguments

uuid

character(1) identifying the BAM file resource

regions

character() vector describing chromosomal regions, e.g., c("chr1", "chr2:10000", "chr3:10000-20000") (all of chromosome 1, chromosome 2 from position 10000 to the end, chromosome 3 from 10000 to 20000).

symbols

character() vector of gencode gene symbols, e.g., c("BRCA1", "PTEN")

destination

character(1) default tempfile() file path for BAM file slice

overwrite

logical(1) default FALSE can destination be overwritten?

progress

logical(1) default interactive() should a progress bar be used?

token

character(1) security token allowing access to restricted data. Almost all BAM data is restricted, so a token is usually required. See https://docs.gdc.cancer.gov/Data/Data_Security/Data_Security/#authentication-tokens.

Details

This function uses the Genomic Data Commons "slicing" API to get portions of a BAM file specified either using "regions" or using HGNC gene symbols.

Value

character(1) destination to the downloaded BAM file

Examples

## Not run: 
 slicing("df80679e-c4d3-487b-934c-fcc782e5d46e",
        regions="chr17:75000000-76000000",
        token=gdc_token())

# Get 10 BAM files.
bamfiles = files() |> 
           filter(data_format=='BAM') |>
           results(size=10) |> ids()

# Current alignments at the GDC are to GRCh38
library('TxDb.Hsapiens.UCSC.hg38.knownGene')
all_genes = genes(TxDb.Hsapiens.UCSC.hg38.knownGene)

first3genes = all_genes[1:3]
# remove strand info
strand(first3genes) = '*'

# We can get our regions easily now
as.character(first3genes)

# Use parallel downloads to speed processing
library(BiocParallel)
register(MulticoreParam())

fnames = bplapply(bamfiles, slicing, overwrite = TRUE,
                regions=as.character(first3genes))

# 10 BAM files
fnames

library(GenomicAlignments)
lapply(unlist(fnames), readGAlignments)


## End(Not run)

Bioconductor/GenomicDataCommons documentation built on Oct. 31, 2024, 7 a.m.