suppressMessages({
suppressPackageStartupMessages({
library(bedbaseRClient)
library(httr)
library(hca)
library(TFutils)
library(dplyr)
})
})

Introduction

bedbase.org assembles genomic data on many thousands of region-based scoring files. See the about page for bedbase.org for details.

This package explores the resource using Bioconductor/R. We'll use the list-navigation facilities of the hca package (lol, lol_path and lol_pull) to explore the JSON outputs of the bedbase API.

A basic query

We'll find out about resources related to the GM12878 cell line.

Metadata acquisition

The get_bb_metadata function is simple: it takes a query type and a query value, and the parsed result is obtained with httr::content. We'll learn how to vary the types of data retrieved later on. Whether 'GM12878' is properly a 'cell type' is debatable; we will examine that question another time.

library(bedbaseRClient)
q1 = get_bb_metadata(query_type="cell_type", query_val="GM12878")
q1
cc = httr::content(q1)
names(cc[[1]][[2]])
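Because the query goes over the network, it can be worth checking the HTTP status before parsing the content. The following is a minimal sketch, assuming only standard httr functions; checked_content is a hypothetical helper and not part of bedbaseRClient.

```r
library(httr)

# Hypothetical helper (not part of bedbaseRClient): parse only successful responses
checked_content = function(resp) {
  stop_for_status(resp)   # signals an R error for 4xx/5xx statuses
  content(resp)
}

# http_error() also accepts bare integer status codes, which allows an offline check
ok_status = http_error(200L)
bad_status = http_error(404L)
c(ok = ok_status, bad = bad_status)
```

With a live response one could then write cc = checked_content(q1) in place of the bare httr::content call.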

Metadata exploration

Illustration with known fields and values

The content returned by the API is a deeply nested list. The hca package can help navigate it.

library(hca)
lcc = lol(cc) # list of lists
lol_path(lcc)

With lol_pull, we can extract the content present along a given structural path.

exps = lol_pull(lcc, "[*][*].exp_protocol")
table(exps)
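To see what pulling a field along a path such as "[*][*].exp_protocol" amounts to, here is a minimal base R sketch on a toy nested list. The record contents are invented for illustration, and pull_field is a hypothetical helper; it mimics the idea only and is not how hca implements lol_pull.

```r
# Toy stand-in for the parsed API content: an outer list whose elements are
# lists of records, each record being a named list. All values are made up.
toy = list(
  list(
    list(exp_protocol = "protoA", target = "T1"),
    list(exp_protocol = "protoB", target = "T2")
  ),
  list(
    list(exp_protocol = "protoA", target = "T3")
  )
)

# Hypothetical helper: visit every record two levels down and collect one field
pull_field = function(x, field) {
  records = unlist(x, recursive = FALSE)   # flatten the outer level only
  vapply(records, function(r) r[[field]], character(1))
}

toy_exps = pull_field(toy, "exp_protocol")
table(toy_exps)
```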

That's informative. Let's check how many targets are transcription factors. We'll use the Lambert table in the TFutils package.

library(TFutils)
lam = retrieve_lambert_main()
targs = lol_pull(lcc, "[*][*].target")
length(intersect(targs, lam$Name))
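Note that intersect() treats its arguments as sets, so the count above is the number of distinct target names found in the Lambert table, not the number of experiments. A toy sketch, with invented target and factor names, shows how this differs from a per-record count based on %in%.

```r
# Invented values: targets as they might repeat across experiments
toy_targs = c("TF_A", "TF_A", "H3K_mark", "TF_B", "TF_A")
# Invented stand-in for the Name column of a transcription factor table
toy_tf_names = c("TF_A", "TF_B", "TF_C")

n_distinct_tf = length(intersect(toy_targs, toy_tf_names))  # distinct TF targets
n_tf_records = sum(toy_targs %in% toy_tf_names)             # records with a TF target
c(distinct = n_distinct_tf, records = n_tf_records)
```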

We can pivot to a different cell type as follows:

q2 = get_bb_metadata(query_type="cell_type", query_val="K562")
kk = httr::content(q2)
ktargs = lol_pull(lol(kk), "[*][*].target")
length(intersect(ktargs, lam$Name))

Enumerating available experiments

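We'll use the API to acquire 50 metadata records, requesting the md5sum and other components of each. Cell type information is in the API component called other. The pulled elements alternate between md5sum and other, so the even positions hold the other component.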

# Replace a missing (NULL) field with NA so that vapply always receives a string
fixc = function(x) ifelse(is.null(x), NA_character_, x)
ctcheck = httr::GET("http://bedbase.org/api/bed/all/data?ids=md5sum&ids=other&limit=50")
ddl = lol(httr::content(ctcheck))
# Even positions of the pulled elements hold the 'other' component; pull once and reuse
others = lol_pull(ddl, "data[*]")[seq(2,100,2)]
types = vapply(others, function(x) x[["cell_type"]], character(1))
abs = vapply(others, function(x) fixc(x[["antibody"]]), character(1)) # note: masks base::abs in this session
targs = vapply(others, function(x) fixc(x[["target"]]), character(1))
exps = vapply(others, function(x) fixc(x[["exp_protocol"]]), character(1))

The md5sum component is used for direct file access.

mds = unlist(lol_pull(ddl, "data[*]")[seq(1,100,2)])
tab50 = data.frame(celltype=types, expt=exps, Ab=abs, targ=targs, md5=mds)
DT::datatable(tab50)
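The indexing with seq(2,100,2) and seq(1,100,2) relies on the pulled elements alternating between md5sum and other for each of the 50 records. The following sketch reproduces that de-interleaving on invented data, deriving the positions from the length of the list rather than hard-coding 100. It also shows why fixc is needed: vapply with character(1) rejects a NULL field.

```r
fixc = function(x) ifelse(is.null(x), NA_character_, x)

# Invented flat list alternating: md5sum, other, md5sum, other
flat = list(
  "md5_one", list(cell_type = "cellA", antibody = "abX"),
  "md5_two", list(cell_type = "cellB")            # no antibody recorded
)

odd = seq(1, length(flat), by = 2)    # positions of md5sum values
even = seq(2, length(flat), by = 2)   # positions of 'other' components

md5 = unlist(flat[odd])
celltype = vapply(flat[even], function(x) fixc(x[["cell_type"]]), character(1))
Ab = vapply(flat[even], function(x) fixc(x[["antibody"]]), character(1))

toy_tab = data.frame(celltype = celltype, Ab = Ab, md5 = md5,
                     stringsAsFactors = FALSE)
toy_tab
```

Computing the positions from length(flat) keeps the code working if the API returns fewer records than the requested limit.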

A targeted query

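# Select the md5 identifier of the GM12878 record with antibody IKZF1
sel = tab50 |> filter(expt=="ChiPseq",
    celltype=="GM12878", Ab=="IKZF1") |>
    select(md5)
# Query that file, passing the interval chr17:38000000-39000000 as 'which'
query_bb(sel[1,1], which=GenomicRanges::GRanges("chr17:38000000-39000000"))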
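If no row matches the filter, sel[1,1] evaluates to NA and the subsequent query fails with a less obvious message. Here is a minimal sketch of a guard, using base R subsetting on an invented table; the md5 strings and field values are placeholders, and pick_md5 is a hypothetical helper rather than part of bedbaseRClient.

```r
# Invented stand-in for tab50
toy_tab = data.frame(
  celltype = c("cellA", "cellA", "cellB"),
  expt = c("protoA", "protoB", "protoA"),
  Ab = c("abX", NA, "abX"),
  md5 = c("md5_one", "md5_two", "md5_three"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first matching md5, or stop with a clear message
pick_md5 = function(tab, celltype, expt, Ab) {
  keep = tab$celltype == celltype & tab$expt == expt & tab$Ab == Ab
  hit = tab$md5[which(keep)]   # which() drops rows where the comparison is NA
  if (length(hit) == 0) stop("no record matches the requested combination")
  hit[1]
}

picked = pick_md5(toy_tab, "cellA", "protoA", "abX")
picked
```

The value returned by such a helper could then be passed to query_bb in place of sel[1,1].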


vjcitn/bedbaseRClient documentation built on June 15, 2021, 9:55 a.m.