get_dataset: Extract data from RegulonDB
In regutools: regutools: an R package for data extraction from RegulonDB


This function retrieves data from RegulonDB. Attributes from datasets can be selected and filtered.

get_dataset(
  regulondb,
  dataset = NULL,
  attributes = NULL,
  filters = NULL,
  and = TRUE,
  interval = NULL,
  partialmatch = NULL,
  output_format = "regulondb_result"
)

`regulondb`	A `regulondb()` object.
`dataset`	Dataset of interest. Use `list_datasets()` for an overview of valid datasets.
`attributes`	Vector of attributes to be retrieved.
`filters`	List of filters to apply. Each name should be an attribute of the dataset, and each value the condition used for selection.
`and`	Logical. If TRUE (the default), a record must satisfy all filters ("AND" operator). If FALSE, filters are combined with the "OR" operator.
`interval`	Names of the filters whose values should be treated as an interval, given as a lower and an upper bound.
`partialmatch`	Names of the filters whose values are string patterns to be matched fully or partially in the query.
`output_format`	A string specifying the output format. Possible options are "regulondb_result", "GRanges", "DNAStringSet" or "BStringSet".
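
As a minimal sketch of how `and` changes the way filters are combined, the two calls below differ only in that argument. The filter values (`araC`, `forward`) are illustrative choices made for this sketch, and running it requires network access so that `connect_database()` can download the database.

```r
library(regutools)

## Connect and build the regulondb object, as in the examples on this page
regulondb_conn <- connect_database()
e_coli_regulondb <- regulondb(
    database_conn = regulondb_conn,
    organism = "E.coli",
    database_version = "1",
    genome_version = "1"
)

## Illustrative filter values (an assumption made for this sketch)
gene_filters <- list(name = "araC", strand = "forward")

## and = TRUE: a gene must satisfy every filter
both <- get_dataset(
    e_coli_regulondb,
    dataset = "GENE",
    attributes = c("name", "strand"),
    filters = gene_filters,
    and = TRUE
)

## and = FALSE: a gene must satisfy at least one filter
either <- get_dataset(
    e_coli_regulondb,
    dataset = "GENE",
    attributes = c("name", "strand"),
    filters = gene_filters,
    and = FALSE
)
```

Because "OR" is the less restrictive operator, `either` should contain at least as many rows as `both`.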

By default, a `regulondb_result` object. Depending on the value of `output_format`, it can instead return a GRanges object or a Biostrings object (DNAStringSet or BStringSet).
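
The sketch below illustrates requesting a GRanges object instead of the default result. It assumes that the conversion needs genomic coordinates, so `posleft`, `posright` and `strand` are included among the attributes; the gene names in the filter are illustrative. Network access is required to download the database.

```r
library(regutools)

## Connect and build the regulondb object, as in the examples on this page
regulondb_conn <- connect_database()
e_coli_regulondb <- regulondb(
    database_conn = regulondb_conn,
    organism = "E.coli",
    database_version = "1",
    genome_version = "1"
)

## Request coordinates and strand so that genomic ranges can be built
## (assumption: posleft and posright are needed for the conversion)
genes_gr <- get_dataset(
    e_coli_regulondb,
    dataset = "GENE",
    attributes = c("posleft", "posright", "strand", "name"),
    filters = list(name = c("araC", "crp", "lacI")),
    output_format = "GRanges"
)

## Inspect the class of the returned object
class(genes_gr)
```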

Carmina Barberena Jonas, Jesús Emiliano Sotelo Fonseca, José Alquicira Hernández, Joselyn Chávez

## Connect to the RegulonDB database if necessary
if (!exists("regulondb_conn")) regulondb_conn <- connect_database()

## Build the regulon db object
e_coli_regulondb <-
    regulondb(
        database_conn = regulondb_conn,
        organism = "E.coli",
        database_version = "1",
        genome_version = "1"
    )

## Obtain all the information from the "GENE" dataset
get_dataset(e_coli_regulondb, dataset = "GENE")

## Get the attributes posright and name from the "GENE" dataset
get_dataset(e_coli_regulondb,
    dataset = "GENE",
    attributes = c("posright", "name")
)

## From the "GENE" dataset, get the name, strand, posright, product name
## and id of all genes whose name matches "ara", that lie on the "forward"
## strand, and whose posright is between 2000 and 40000
get_dataset(
    e_coli_regulondb,
    dataset = "GENE",
    attributes = c("name", "strand", "posright", "product_name", "id"),
    filters = list(
        name = c("ara"),
        strand = c("forward"),
        posright = c("2000", "40000")
    ),
    and = TRUE,
    partialmatch = "name",
    interval = "posright"
)