makeExperimentHubMetadata: Make ExperimentHubMetadata objects from csv file of metadata

View source: R/ExperimentHubMetadata-class.R

makeExperimentHubMetadataR Documentation

Make ExperimentHubMetadata objects from csv file of metadata

Description

Make ExperimentHubMetadata objects from metadata.csv file located in the "inst/extdata/" package directory of an ExperimentHub package.

Usage

  makeExperimentHubMetadata(pathToPackage, fileName=character())

Arguments

pathToPackage

Full path to data package including the package name; no trailing slash

fileName

Name of single metadata file located in "inst/extdata". If none is provided the function looks for a file named "metadata.csv".

Details

  • makeExperimentHubMetadata: Reads the resource metadata in the metadata.csv file into a ExperimentHubMetadata object. The ExperimentHubMetadata is inserted in the ExperimentHub database. Intended for internal use or package authors checking the validity of package metadata.

  • Formatting metadata files:

    makeExperimentHubMetadata reads .csv files of metadata located in "inst/extdata". Internal functions perform checks for required columns and data types and can be used by package authors to validate their metadata before submitting the package for review.

    The rows of the .csv file(s) represent individual Hub resources (i.e., data objects) and the columns are the metadata fields. All fields should be a single character string of length 1.

    Required Fields in metadata file:

    • Title: character(1). Name of the resource. This can be the exact file name (if self-describing) or a more complete description.

    • Description: character(1). Brief description of the resource, similar to the 'Description' field in a package DESCRIPTION file.

    • BiocVersion: character(1). The first Bioconductor version the resource was made available for. Unless removed from the hub, the resource will be available for all versions greater than or equal to this field. Generally the current devel version of Bioconductor.

    • Genome: character(1). Genome. Can be NA

    • SourceType: character(1). Format of original data, e.g., FASTA, BAM, BigWig, etc. getValidSourceTypes() list currently acceptable values. If nothing seems appropiate for your data reach out to maintainer@bioconductor.org.

    • SourceUrl: character(1). Optional location of original data files. Multiple urls should be provided as a comma separated string.

    • SourceVersion: character(1). Version of original data.

    • Species: character(1). Species.For help on valid species see getSpeciesList, validSpecies, or suggestSpecies. Can be NA.

    • TaxonomyId: character(1). Taxonomy ID. There are checks for valid taxonomyId given the Species which produce warnings. See GenomeInfoDb::loadTaxonomyDb() for full validation table. Can be NA.

    • Coordinate_1_based: logical. TRUE if data are 1-based. Can be NA

    • DataProvider: character(1). Name of company or institution that supplied the original (raw) data.

    • Maintainer: character(1). Maintainer name and email in the following format: Maintainer Name <username@address>.

    • RDataClass: character(1). R / Bioconductor class the data are stored in, e.g., GRanges, SummarizedExperiment, ExpressionSet etc. If the file is loaded or read into R what is the class of the object.

    • DispatchClass: character(1). Determines how data are loaded into R. The value for this field should be ‘Rda’ if the data were serialized with save() and ‘Rds’ if serialized with saveRDS. The filename should have the appropriate ‘rda’ or ‘rds’ extension. There are other available DispathClass types and the function AnnotationHub::DispatchClassList()

      A number of dispatch classes are pre-defined in AnnotationHub/R/AnnotationHubResource-class.R with the suffix ‘Resource’. For example, if you have sqlite files, the AnnotationHubResource-class.R defines SQLiteFileResource so the DispatchClass would be SQLiteFile. Contact maintainer@bioconductor.org if you are not sure which class to use. The function AnnotationHub::DispatchClassList() will output a matrix of currently implemented DispatchClass and brief description of utility. If a predefine class does not seem appropriate contact maintainer@bioconductor.org. An all purpose DispathClass is FilePath that instead of trying to load the file into R, will only return the path to the locally downloaded file.

    • Location_Prefix: character(1). Do not include this field if data are stored in the Bioconductor AWS S3; it will be generated automatically.

      If data will be accessed from a location other than AWS S3 this field should be the base url.

    • RDataPath: character().This field should be the remainder of the path to the resource. The Location_Prefix will be prepended to RDataPath for the full path to the resource. If the resource is stored in Bioconductor's AWS S3 buckets, it should start with the name of the package associated with the metadata and should not start with a leading slash. It should include the resource file name. For strongly associated files, like a bam file and its index file, the two files should be separates with a colon :. This will link a single hub id with the multiple files.

    • Tags: character() vector. ‘Tags’ are search terms used to define a subset of resources in a Hub object, e.g, in a call to query.

      ‘Tags’ are automatically generated from the ‘biocViews’ in the DESCRIPTION and applied to all resources of the metadata file. Optionally, maintainers can define ‘Tags’ column of the metadata to define tags for each resource individually. Multiple ‘Tags’ are specified as a colon separated string, e.g., tags for two resources would look like this:

      	     Tags=c("tag1:tag2:tag3", "tag1:tag3")
      	     

    NOTE: The metadata file can have additional columns beyond the 'Required Fields' listed above. These values are not added to the Hub database but they can be used in package functions to provide an additional level of metadata on the resources.

    More on Location_Prefix and RDataPath. These two fields make up the complete file path url for downloading the data file. If using the Bioconductor AWS S3 bucket the Location_Prefix should not be included in the metadata file[s] as this field will be populated automatically. The RDataPath will be the directory structure you uploaded to S3. If you uploaded a directory ‘MyAnnotation/’, and that directory had a subdirectory ‘v1/’ that contained two files ‘counts.rds’ and ‘coldata.rds’, your metadata file will contain two rows and the RDataPaths would be ‘MyAnnotation/v1/counts.rds’ and ‘MyAnnotation/v1/coldata.rds’. If you host your data on a publicly accessible site you must include a base url as the Location_Prefix. If your data file was at ‘ftp://myinstiututeserver/biostats/project2/counts.rds’, your metadata file will have one row and the Location_Prefix would be ‘ftp://myinstiututeserver/’ and the RDataPath would be ‘biostats/project2/counts.rds’.

Value

A list of ExperimentHubMetadata objects.

See Also

  • addResources

  • ExperimentHubMetadata class

  • makeAnnotationHubMetadata

Examples

## makeExperimentHubMetadata() reads data from inst/scripts/<files>.csv
## into ExperimentHubMetadata objects. These objects are used to insert
## metadata into the production database. This function is used internally
## by addResources() and is not intended to be called directly.

## For an example of how this works we can use the GSE62944 ExperimentHub
## package. Download the source tarball from:

# http://www.bioconductor.org/packages/devel/data/experiment/html/GSE62944.html

## and unpack it. Set 'pathToPackage' to point to the downloaded source.
## Then call the function:
## Not run: 
makeExperimentHubMetadata("path/to/mypackage")

## End(Not run)

Bioconductor/ExperimentHubData documentation built on Oct. 31, 2024, 7 a.m.