View source: R/AnnotationHubMetadata-class.R
makeAnnotationHubMetadata | R Documentation |
Make AnnotationHubMetadata objects from .csv files located in the "inst/extdata/" package directory of an AnnotationHub package.
makeAnnotationHubMetadata(pathToPackage, fileName=character())
pathToPackage |
Full path to data package including the package name; no trailing slash |
fileName |
Name of metadata file(s) with csv extension. If none are provided, all files with .csv extension in "inst/extdata" will be processed. |
makeAnnotationHubMetadata: Reads the resource metadata from .csv files into an AnnotationHubMetadata object. The AnnotationHubMetadata is inserted in the AnnotationHub database. Intended for internal use or package authors checking the validity of package metadata.
Formatting metadata files:
makeAnnotationHubMetadata reads .csv files of metadata located in
"inst/extdata". Internal functions perform checks for required
columns and data types and can be used by package authors to
validate their metadata before submitting the package for review.
The rows of the .csv file(s) represent individual Hub
resources (i.e., data objects) and the columns are the metadata
fields. All fields should be a single character string of length 1.
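As a rough illustration of this row/column layout, the sketch below writes a toy metadata table to a temporary file, reads it back with base R, and reports which expected columns are absent. It is not the package's internal validation: the helper check_columns, the toy values, and the temporary file are made up for illustration, and the column names are only a subset of the fields documented under 'Required Fields'.

```r
## Hypothetical helper (not part of AnnotationHubData): return the
## expected column names that are absent from a metadata data.frame.
check_columns <- function(meta, expected) {
    setdiff(expected, names(meta))
}

## A toy metadata table with one row per resource.
meta <- data.frame(
    Title = c("Resource A", "Resource B"),
    Description = c("First toy resource", "Second toy resource"),
    Genome = c("GRCh38", NA),
    stringsAsFactors = FALSE
)

## Round-trip through a temporary .csv file.
csv <- tempfile(fileext = ".csv")
write.csv(meta, file = csv, row.names = FALSE)
meta2 <- read.csv(csv, stringsAsFactors = FALSE)

## One row per resource.
nrow(meta2)

## Columns that still need to be added to the file.
missing_cols <- check_columns(
    meta2, c("Title", "Description", "Genome", "SourceType")
)
missing_cols
```

A check like this only catches missing columns; running makeAnnotationHubMetadata on the package remains the way to exercise the real checks.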
Required Fields in metadata file:
Title: character(1). Name of the resource. This can be the exact file name (if self-describing) or a more complete description.
Description: character(1). Brief description of the resource, similar to the 'Description' field in a package DESCRIPTION file.
BiocVersion: character(1). The first Bioconductor version the resource was made available for. Unless removed from the hub, the resource will be available for all versions greater than or equal to this field. Generally the current devel version of Bioconductor.
Genome: character(1). Genome. Can be NA.
SourceType: character(1). Format of original data, e.g., FASTA, BAM, BigWig, etc. getValidSourceTypes() lists currently acceptable values. If nothing seems appropriate for your data, reach out to maintainer@bioconductor.org.
SourceUrl: character(1). Optional location of original data files. Multiple urls should be provided as a comma separated string.
SourceVersion: character(1). Version of original data.
Species: character(1). Species. For help on valid species see getSpeciesList, validSpecies, or suggestSpecies. Can be NA.
TaxonomyId: character(1). Taxonomy ID. There are checks for valid taxonomyId given the Species which produce warnings. See GenomeInfoDb::loadTaxonomyDb() for full validation table. Can be NA.
Coordinate_1_based: logical. TRUE if data are 1-based. Can be NA.
DataProvider: character(1). Name of company or institution that supplied the original (raw) data.
Maintainer: character(1). Maintainer name and email in the following format: Maintainer Name <username@address>.
RDataClass: character(1). R / Bioconductor class the data are stored in, e.g., GRanges, SummarizedExperiment, ExpressionSet etc. That is, the class of the object once the file is loaded or read into R.
DispatchClass: character(1). Determines how data are loaded into R. The value for this field should be ‘Rda’ if the data were serialized with save() and ‘Rds’ if serialized with saveRDS(). The filename should have the appropriate ‘rda’ or ‘rds’ extension. Other DispatchClass types are available. A number of dispatch classes are pre-defined in AnnotationHub/R/AnnotationHubResource-class.R with the suffix ‘Resource’. For example, if you have sqlite files, AnnotationHubResource-class.R defines SQLiteFileResource, so the DispatchClass would be SQLiteFile. The function AnnotationHub::DispatchClassList() will output a matrix of the currently implemented DispatchClass values and a brief description of their utility. Contact maintainer@bioconductor.org if you are not sure which class to use or if no predefined class seems appropriate. An all-purpose DispatchClass is FilePath which, instead of trying to load the file into R, will only return the path to the locally downloaded file.
Location_Prefix: character(1). Do not include this field if data are stored in the Bioconductor AWS S3; it will be generated automatically. If data will be accessed from a location other than AWS S3, this field should be the base url.
RDataPath: character(). This field should be the remainder of the path to the resource. The Location_Prefix will be prepended to RDataPath for the full path to the resource. If the resource is stored in Bioconductor's AWS S3 buckets, it should start with the name of the package associated with the metadata and should not start with a leading slash. It should include the resource file name. For strongly associated files, like a bam file and its index file, the two files should be separated with a colon (:). This will link a single hub id with the multiple files.
Tags: character() vector. ‘Tags’ are search terms used to define a subset of resources in a Hub object, e.g., in a call to query. ‘Tags’ are automatically generated from the ‘biocViews’ in the DESCRIPTION and applied to all resources of the metadata file. Optionally, maintainers can define a ‘Tags’ column in the metadata file to set tags for each resource individually. Multiple ‘Tags’ are specified as a colon separated string, e.g., tags for two resources would look like this:
Tags=c("tag1:tag2:tag3", "tag1:tag3")
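To make the colon-separated convention concrete, here is a small base R sketch that splits such strings into individual tags and builds one from a character vector. It only illustrates the string format; how the hub itself parses the ‘Tags’ column is not shown here, and the variable name tag_list is chosen for illustration.

```r
## Tags column as it might appear in the metadata file: one
## colon-separated string per resource.
Tags <- c("tag1:tag2:tag3", "tag1:tag3")

## Split each string on ":" to recover the individual tags.
tag_list <- strsplit(Tags, ":", fixed = TRUE)
tag_list

## Number of tags per resource.
lengths(tag_list)

## Going the other way: collapse a character vector of tags into a
## single value suitable for one cell of the Tags column.
one_cell <- paste(c("tag1", "tag3"), collapse = ":")
one_cell
```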
NOTE: The metadata file can have additional columns beyond the 'Required Fields' listed above. These values are not added to the Hub database but they can be used in package functions to provide an additional level of metadata on the resources.
More on Location_Prefix and RDataPath. These two fields make up
the complete file path url for downloading the data file. If using
the Bioconductor AWS S3 bucket the Location_Prefix should not be
included in the metadata file[s] as this field will be populated
automatically. The RDataPath will be the directory structure you
uploaded to S3. If you uploaded a directory ‘MyAnnotation/’, and
that directory had a subdirectory ‘v1/’ that contained two files
‘counts.rds’ and ‘coldata.rds’, your metadata file will contain
two rows and the RDataPaths would be ‘MyAnnotation/v1/counts.rds’
and ‘MyAnnotation/v1/coldata.rds’. If you host your data on a
publicly accessible site you must include a base url as the
Location_Prefix. If your data file was at
‘ftp://myinstiututeserver/biostats/project2/counts.rds’, your
metadata file will have one row and the Location_Prefix would be
‘ftp://myinstiututeserver/’ and the RDataPath would be
‘biostats/project2/counts.rds’.
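The relationship between the two fields can be sketched in base R as follows. The helper full_paths is hypothetical and not part of AnnotationHub; it assumes the full location is obtained by simply pasting Location_Prefix in front of each colon-separated entry of RDataPath, which matches the description above but is not taken from the package source. The second call uses a placeholder prefix and file names.

```r
## Hypothetical helper (not part of AnnotationHub): build the full
## location(s) of a resource from Location_Prefix and RDataPath.
## RDataPath may hold several colon-separated files (e.g., a bam
## file and its index), so it is split before pasting.
full_paths <- function(location_prefix, rdatapath) {
    files <- strsplit(rdatapath, ":", fixed = TRUE)[[1]]
    paste0(location_prefix, files)
}

## The single-file case described in the text.
one <- full_paths("ftp://myinstiututeserver/",
                  "biostats/project2/counts.rds")
one

## Two strongly associated files linked to one hub id
## (placeholder prefix and file names, for illustration only).
two <- full_paths("https://example.org/",
                  "MyAnnotation/v1/reads.bam:MyAnnotation/v1/reads.bam.bai")
two
```

Note that the prefix ends with a slash and the RDataPath does not start with one, so plain concatenation yields a well-formed url.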
A named list the length of fileName. Each element is a list of
AnnotationHubMetadata objects created from the .csv file.
updateResources
AnnotationHubMetadata class
## Each row of the metadata file represents a resource added to one of
## the 'Hubs'. This example creates a metadata.csv file for a single resource.
## In the case of multiple resources, the arguments below would be character
## vectors that produced multiple rows in the data.frame.
meta <- data.frame(
    Title = "RNA-Sequencing dataset from study XYZ",
    ## Note the trailing spaces: paste0() does not insert separators.
    Description = paste0("RNA-seq data from study XYZ containing 10 normal ",
                         "and 10 tumor samples represented as a ",
                         "SummarizedExperiment"),
    BiocVersion = "3.4",
    Genome = "GRCh38",
    SourceType = "BAM",
    SourceUrl = "http://www.path/to/original/data/file",
    SourceVersion = "Jan 01 2016",
    Species = "Homo sapiens",
    TaxonomyId = 9606,
    Coordinate_1_based = TRUE,
    DataProvider = "GEO",
    Maintainer = "Your Name <youremail@provider.com>",
    RDataClass = "SummarizedExperiment",
    DispatchClass = "Rda",
    ## RDataPath is listed under 'Required Fields'; it starts with the
    ## package name ("mypackage" is a placeholder) and ends with the file name.
    RDataPath = "mypackage/FileName.rda",
    ResourceName = "FileName.rda"
)
## Not run:
## Write the data out and put in the inst/extdata directory.
write.csv(meta, file="metadata.csv", row.names=FALSE)
## Test the validity of metadata.csv
makeAnnotationHubMetadata("path/to/mypackage")
## End(Not run)
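For several resources, the comments at the top of the example say each argument becomes a character vector. The base R sketch below shows that shape for the two-file ‘MyAnnotation/v1/’ layout described earlier. It is deliberately incomplete: only a few columns are shown, and the titles and RDataClass values are placeholders, so it would not pass validation as is.

```r
## Two resources, one row each. Values shared by all rows (here
## BiocVersion and DispatchClass) can be given once and are recycled
## by data.frame().
meta_multi <- data.frame(
    Title = c("Counts for study XYZ", "Sample data for study XYZ"),
    BiocVersion = "3.4",
    RDataClass = c("matrix", "data.frame"),   # placeholder classes
    DispatchClass = "Rds",
    RDataPath = c("MyAnnotation/v1/counts.rds",
                  "MyAnnotation/v1/coldata.rds"),
    stringsAsFactors = FALSE
)

## One row per resource.
nrow(meta_multi)
meta_multi$RDataPath
```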