makeAnnotationHubMetadata: Make AnnotationHubMetadata objects from csv file of metadata

View source: R/AnnotationHubMetadata-class.R

makeAnnotationHubMetadata    R Documentation

Make AnnotationHubMetadata objects from csv file of metadata

Description

Make AnnotationHubMetadata objects from .csv files located in the "inst/extdata/" package directory of an AnnotationHub package.

Usage

  makeAnnotationHubMetadata(pathToPackage, fileName=character())

Arguments

pathToPackage

Full path to data package including the package name; no trailing slash

fileName

Name of metadata file(s) with csv extension. If none are provided, all files with .csv extension in "inst/extdata" will be processed.
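As a rough illustration of the default described for fileName, the base R sketch below lists the .csv files found under "inst/extdata" of a package directory. It only approximates the file discovery and is not the function's own code; the temporary package directory and the file names are made up for the example.

```r
## Hypothetical package skeleton in a temporary directory (assumption,
## used only so the sketch is self-contained)
pkg <- file.path(tempdir(), "MyAnnotation")
extdata <- file.path(pkg, "inst", "extdata")
dir.create(extdata, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(extdata,
                      c("metadata.csv", "metadata_v2.csv", "README.txt")))

## Files with a .csv extension in inst/extdata; README.txt is skipped
csvFiles <- list.files(extdata, pattern = "\\.csv$")
csvFiles
```

Passing one of these names as fileName would restrict processing to that file.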

Details

  • makeAnnotationHubMetadata: Reads the resource metadata from .csv files into an AnnotationHubMetadata object. The AnnotationHubMetadata is inserted in the AnnotationHub database. Intended for internal use or for package authors checking the validity of package metadata.

  • Formatting metadata files:

    makeAnnotationHubMetadata reads .csv files of metadata located in "inst/extdata". Internal functions perform checks for required columns and data types and can be used by package authors to validate their metadata before submitting the package for review.

    The rows of the .csv file(s) represent individual Hub resources (i.e., data objects) and the columns are the metadata fields. All fields should be a single character string of length 1.

    Required Fields in metadata file:

    • Title: character(1). Name of the resource. This can be the exact file name (if self-describing) or a more complete description.

    • Description: character(1). Brief description of the resource, similar to the 'Description' field in a package DESCRIPTION file.

    • BiocVersion: character(1). The first Bioconductor version the resource was made available for. Unless removed from the hub, the resource will be available for all versions greater than or equal to this field. This is generally the current devel version of Bioconductor.

    • Genome: character(1). Genome. Can be NA.

    • SourceType: character(1). Format of original data, e.g., FASTA, BAM, BigWig, etc. getValidSourceTypes() lists the currently acceptable values. If nothing seems appropriate for your data, reach out to maintainer@bioconductor.org.

    • SourceUrl: character(1). Optional location of original data files. Multiple urls should be provided as a comma separated string.

    • SourceVersion: character(1). Version of original data.

    • Species: character(1). Species. For help on valid species see getSpeciesList, validSpecies, or suggestSpecies. Can be NA.

    • TaxonomyId: character(1). Taxonomy ID. The TaxonomyId is checked against the Species, and an invalid combination produces a warning. See GenomeInfoDb::loadTaxonomyDb() for the full validation table. Can be NA.

    • Coordinate_1_based: logical. TRUE if data are 1-based. Can be NA.

    • DataProvider: character(1). Name of company or institution that supplied the original (raw) data.

    • Maintainer: character(1). Maintainer name and email in the following format: Maintainer Name <username@address>.

    • RDataClass: character(1). R / Bioconductor class the data are stored in, e.g., GRanges, SummarizedExperiment, ExpressionSet, etc. This is the class of the object once the file is loaded or read into R.

    • DispatchClass: character(1). Determines how data are loaded into R. The value for this field should be ‘Rda’ if the data were serialized with save() and ‘Rds’ if serialized with saveRDS(). The filename should have the appropriate ‘rda’ or ‘rds’ extension. Other DispatchClass types are available; AnnotationHub::DispatchClassList() lists them.

      A number of dispatch classes are pre-defined in AnnotationHub/R/AnnotationHubResource-class.R with the suffix ‘Resource’. For example, if you have sqlite files, AnnotationHubResource-class.R defines SQLiteFileResource, so the DispatchClass would be SQLiteFile. The function AnnotationHub::DispatchClassList() outputs a matrix of the currently implemented DispatchClass values with a brief description of each. An all-purpose DispatchClass is FilePath, which returns only the path to the locally downloaded file instead of trying to load the file into R. If no predefined class seems appropriate, or you are not sure which class to use, contact maintainer@bioconductor.org.

    • Location_Prefix: character(1). Do not include this field if data are stored in the Bioconductor AWS S3; it will be generated automatically.

      If data will be accessed from a location other than AWS S3 this field should be the base url.

    • RDataPath: character(). This field should be the remainder of the path to the resource. The Location_Prefix will be prepended to RDataPath for the full path to the resource. If the resource is stored in Bioconductor's AWS S3 buckets, it should start with the name of the package associated with the metadata and should not start with a leading slash. It should include the resource file name. For strongly associated files, like a bam file and its index file, the two files should be separated by a colon (:). This will link a single hub id with the multiple files.

    • Tags: character() vector. ‘Tags’ are search terms used to define a subset of resources in a Hub object, e.g., in a call to query.

      ‘Tags’ are automatically generated from the ‘biocViews’ in the DESCRIPTION and applied to all resources of the metadata file. Optionally, maintainers can add a ‘Tags’ column to the metadata to define tags for each resource individually. Multiple ‘Tags’ are specified as a colon-separated string, e.g., tags for two resources would look like this:

      	     Tags=c("tag1:tag2:tag3", "tag1:tag3")
      	     

    NOTE: The metadata file can have additional columns beyond the 'Required Fields' listed above. These values are not added to the Hub database but they can be used in package functions to provide an additional level of metadata on the resources.
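The field list above can be turned into a quick pre-check before running makeAnnotationHubMetadata. The following base R sketch is only an approximation under stated assumptions: missingFields() is a hypothetical helper that is not part of AnnotationHubData, the column set is copied from the list above (Location_Prefix and Tags are left out because they may be omitted), and the package's internal checks are likely stricter.

```r
## Column names taken from the 'Required Fields' list above.
## Location_Prefix and Tags are excluded because they may be omitted.
requiredFields <- c("Title", "Description", "BiocVersion", "Genome",
                    "SourceType", "SourceUrl", "SourceVersion", "Species",
                    "TaxonomyId", "Coordinate_1_based", "DataProvider",
                    "Maintainer", "RDataClass", "DispatchClass", "RDataPath")

## Hypothetical helper (not part of AnnotationHubData): names of the
## required columns that are absent from a metadata data.frame
missingFields <- function(meta) setdiff(requiredFields, colnames(meta))

## A deliberately incomplete metadata table describing two resources
meta <- data.frame(
    Title = c("counts", "coldata"),
    Description = c("Count matrix", "Sample annotation"),
    RDataPath = c("MyAnnotation/v1/counts.rds",
                  "MyAnnotation/v1/coldata.rds"),
    Tags = c("tag1:tag2:tag3", "tag1:tag3"),
    stringsAsFactors = FALSE
)

## Columns that still need to be added
missingFields(meta)

## Split the colon-separated 'Tags' into one character vector per resource
tagList <- strsplit(meta$Tags, ":", fixed = TRUE)
tagList
```

An empty result from missingFields() only means the columns exist; it says nothing about whether their values are valid.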

    More on Location_Prefix and RDataPath: these two fields make up the complete url for downloading the data file. If using the Bioconductor AWS S3 bucket, the Location_Prefix should not be included in the metadata file(s), as this field will be populated automatically. The RDataPath will be the directory structure you uploaded to S3. If you uploaded a directory ‘MyAnnotation/’, and that directory had a subdirectory ‘v1/’ that contained two files ‘counts.rds’ and ‘coldata.rds’, your metadata file will contain two rows and the RDataPaths would be ‘MyAnnotation/v1/counts.rds’ and ‘MyAnnotation/v1/coldata.rds’. If you host your data on a publicly accessible site, you must include a base url as the Location_Prefix. If your data file was at ‘ftp://myinstituteserver/biostats/project2/counts.rds’, your metadata file will have one row, the Location_Prefix would be ‘ftp://myinstituteserver/’ and the RDataPath would be ‘biostats/project2/counts.rds’.
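The way the two fields combine can be sketched in base R. resourceUrls() is a hypothetical helper, not a function from AnnotationHub or AnnotationHubData, and it assumes that each colon-separated RDataPath entry is resolved against the same Location_Prefix; the bam file names are made up.

```r
## Hypothetical helper (not part of AnnotationHubData): build the download
## url(s) by prepending Location_Prefix to each colon-separated RDataPath entry
resourceUrls <- function(locationPrefix, rDataPath) {
    files <- strsplit(rDataPath, ":", fixed = TRUE)[[1]]
    paste0(locationPrefix, files)
}

## The externally hosted file from the ftp example
resourceUrls("ftp://myinstituteserver/", "biostats/project2/counts.rds")

## Two strongly associated files that share a single hub id
bamUrls <- resourceUrls(
    "ftp://myinstituteserver/",
    "biostats/project2/sample.bam:biostats/project2/sample.bam.bai")
bamUrls
```

Because the prefix is pasted on as-is, it should end with a slash and the RDataPath should not start with one.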

Value

A named list with the same length as fileName. Each element is a list of AnnotationHubMetadata objects created from the corresponding .csv file.

See Also

  • updateResources

  • AnnotationHubMetadata class

Examples


## Each row of the metadata file represents a resource added to one of
## the 'Hubs'. This example creates a metadata.csv file for a single resource.
## In the case of multiple resources, the arguments below would be character
## vectors that produced multiple rows in the data.frame.

meta <- data.frame(
    Title = "RNA-Sequencing dataset from study XYZ",
    Description = paste0("RNA-seq data from study XYZ containing 10 normal ",
                         "and 10 tumor samples represented as a ",
                         "SummarizedExperiment"),
    BiocVersion = "3.4",
    Genome = "GRCh38",
    SourceType = "BAM",
    SourceUrl = "http://www.path/to/original/data/file",
    SourceVersion = "Jan 01 2016",
    Species = "Homo sapiens",
    TaxonomyId = 9606,
    Coordinate_1_based = TRUE,
    DataProvider = "GEO",
    Maintainer = "Your Name <youremail@provider.com>",
    RDataClass = "SummarizedExperiment",
    DispatchClass = "Rda",
    ResourceName = "FileName.rda"
)

## Not run: 
## Write the data out and put in the inst/extdata directory.
write.csv(meta, file="metadata.csv", row.names=FALSE)

## Test the validity of metadata.csv
makeAnnotationHubMetadata("path/to/mypackage")

## End(Not run)

Bioconductor/AnnotationHubData documentation built on Feb. 15, 2024, 10:10 a.m.