hub_metadata: Create and validate metadata

View source: R/hub_metadata.R

hub_metadataR Documentation

Create and validate metadata

Description

This functions makes a list of values that can be used to add as a resource to a 'metadata.csv' file in a Hub package. The type of each argument indicates the expected value, e.g., Title = character(1) indicates that it is expected to be a character vector of length 1. See individual parameters for more information.

Usage

hub_metadata(
  Title = character(1),
  Description = character(1),
  BiocVersion = package_version("0.0"),
  Genome = character(1),
  SourceType = character(1),
  SourceUrl = character(1),
  SourceVersion = character(1),
  Species = character(1),
  TaxonomyId = integer(1),
  Coordinate_1_based = NA,
  DataProvider = character(1),
  Maintainer = character(1),
  RDataClass = character(1),
  DispatchClass = character(1),
  Location_Prefix = character(1),
  RDataPath = character(1),
  Tags = character()
)

Arguments

Title

character(1) Title for the resource with version or genome build as appropriate.

Description

character(1) Description of the resource. May include details such as data type, format, study origin, sequencing technology, treated vs control, number of samples etc.

BiocVersion

The two-digit version of Bioconductor the resource is being introduced into. Could be a character vector "4.1" or an object created from package_version(), e.g., package_version("4.1").

Genome

character(1) Name of genome build.

SourceType

character(1) Form of originial data, e.g., BED, FASTA, etc. getValidSourceTypes() list currently acceptable values. If nothing seems appropriate for your data reach out to maintainer@bioconductor.org.

SourceUrl

character(1) URL of originial resource(s).

SourceVersion

character(1). A description of the version of the resource in the original source. Since source version may not follow R / Bioconductor versioning practices, this field is not restricted to a package_version() format.

Species

character(1) Species name. For help on valid species see getSpeciesList, validSpecies, or suggestSpecies.

TaxonomyId

integer(1) NCBI code. There are checks for valid taxonomyID given the Species which produce warnings. See GenomeInfoDb::loadTaxonomyDb() for full validation table.

Coordinate_1_based

logical(1) are the genomic coordinates in the resource 0-based, or 1-based? Use NA if genomic coordinates are not present in the resource.

DataProvider

character(1) Provider of original data, e.g., NCBI, UniProt etc.

Maintainer

character(1) Maintainer name and email address, ⁠A Maintainer <URL: a. maintainer@email.com>⁠.

RDataClass

character(1) Class of derived R object, e.g., GRanges. Length must match the length of RDataPath.

DispatchClass

character(1) Determines how data are loaded into R. The value for this field should be Rda if the data were serialized with save() and Rds if serialized with saveRDS. The filename should have the appropriate rda or rds extension.

A number of dispatch classes are pre-defined in 
AnnotationHub/R/AnnotationHubResource-class.R with the suffix `Resource`.
For example, if you have sqlite files, the AnnotationHubResource-class.R
defines SQLiteFileResource so the DispatchClass would be SQLiteFile. 
Contact maintainer@bioconductor.org if you are not sure which class to 
use. The function `AnnotationHub::DispatchClassList()` will output a 
matrix of currently implemented DispatchClass and brief description of 
utility. If a predefine class does not seem appropriate contact 
maintainer@bioconductor.org.
Location_Prefix

character(1) URL location of AWS S3 bucket or web site where resource is located.

RDataPath

character(1) File path to where object is stored in AWS S3 bucket or on the web. This field should be the remainder of the path to the resource. The Location_Prefix will be prepended to RDataPath for the full path to the resource. If the resource is stored in Bioconductor's AWS S3 buckets, it should start with the name of the package associated with the metadata and should not start with a leading slash. It should include the resource file name. For strongly associated files, like a bam file and its index file, the two files should be seperates with a colon :. This will link a single hub id with multiple files.

Tags

character() Zero or more tags describing the data, colon : separated.

Value

None

Examples

hub_metadata()

tst <- hub_metadata(
    Title = "ENCODE",
    Description = "a test entry",
    BiocVersion = package_version("3.9"),
    Genome = NA_character_,
    SourceType = "JSON",
    SourceUrl = "https://www.encodeproject.org",
    SourceVersion = package_version("0.0"),
    Species = NA_character_,
    TaxonomyId = NA_integer_,
    Coordinate_1_based = NA,
    DataProvider = "ENCODE Project",
    Maintainer = "tst person <tst@email.com>",
    RDataClass = "data.table",
    DispatchClass = "Rda",
    Location_Prefix = NA_character_,
    RDataPath = "ENCODExplorerData/encode_df_lite.rda",
    Tags = c("ENCODE", "Homo sapiens")
)


Bioconductor/HubPub documentation built on Jan. 18, 2025, 12:17 a.m.