example.datasets: Small real example datasets
In gecko515/HEMDAG: Hierarchical Ensemble Methods for Directed Acyclic Graphs

Collection of real sub-datasets used in the examples of the HEMDAG package

data(graph)
data(labels)
data(scores)
data(wadj)
data(test.index)

The DAG g contained in the graph data is an object of class graphNEL. It has 23 nodes and 30 edges and represents the "ancestors view" of the HPO term Camptodactyly of finger ("HP:0100490").

The matrix L contained in the labels data is a 100 x 23 matrix, whose rows correspond to genes (Entrez GeneID) and whose columns correspond to HPO classes. L[i,j]=1 means that gene i belongs to class j; L[i,j]=0 means that gene i does not belong to class j. The classes of the matrix L correspond to the nodes of the graph g.
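As a minimal sketch of how a binary label matrix of this kind can be read, the snippet below uses a small toy matrix in base R. The gene and class names (gene1, classA, ...) and the name L.toy are hypothetical placeholders, not taken from the package data.

```r
# Toy stand-in for L: 4 hypothetical genes x 3 hypothetical classes.
L.toy <- matrix(c(1, 0, 0,
                  1, 1, 0,
                  0, 0, 0,
                  1, 0, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(paste0("gene", 1:4),
                                c("classA", "classB", "classC")))

# Number of genes annotated to each class (column sums of a 0/1 matrix).
pos.per.class <- colSums(L.toy)

# Names of the classes to which one given gene belongs.
classes.gene2 <- colnames(L.toy)[L.toy["gene2", ] == 1]
```

The same two expressions should apply unchanged to the real matrix L once it has been loaded with data(labels).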

The matrix S contained in the scores data is a named 100 x 23 flat scores matrix, representing the likelihood that a given gene belongs to a given class: the higher the value, the higher the likelihood. The classes of the matrix S correspond to the nodes of the graph g.

The matrix W contained in the wadj data is a named 100 x 100 symmetric weighted adjacency matrix, whose rows and columns correspond to genes. The gene names (Entrez GeneID) of the adjacency matrix W correspond to the gene names of the flat scores matrix S and to the gene names of the target multilabel matrix L.
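The correspondence between the gene names of W, S and L can be checked with a few lines of base R. The following is only a sketch on toy matrices: the names W.toy, S.toy, L.toy and all gene and class identifiers are hypothetical, and the rows of S.toy are deliberately stored in a different order to show one way of realigning them.

```r
genes   <- paste0("gene", 1:4)      # hypothetical gene identifiers
classes <- c("classA", "classB")    # hypothetical class identifiers

# Toy symmetric weighted adjacency matrix (stand-in for W).
W.toy <- matrix(0, nrow = 4, ncol = 4, dimnames = list(genes, genes))
W.toy["gene1", "gene2"] <- W.toy["gene2", "gene1"] <- 0.8
W.toy["gene3", "gene4"] <- W.toy["gene4", "gene3"] <- 0.5

# Toy flat scores (stand-in for S), with rows in reversed gene order.
S.toy <- matrix(c(0.9, 0.1, 0.4, 0.7,
                  0.2, 0.3, 0.6, 0.5),
                nrow = 4, dimnames = list(rev(genes), classes))

# Toy label matrix (stand-in for L).
L.toy <- matrix(c(1, 0, 0, 1,
                  0, 1, 1, 0),
                nrow = 4, dimnames = list(genes, classes))

# W should be symmetric, and all three matrices should share the same genes.
is.symmetric <- isSymmetric(W.toy)
same.genes   <- setequal(rownames(W.toy), rownames(S.toy)) &&
                setequal(rownames(W.toy), rownames(L.toy))

# Reorder the rows of S so that they follow the row order of W.
S.aligned <- S.toy[rownames(W.toy), , drop = FALSE]
```

Indexing by row name rather than by position avoids silently mixing up genes when the matrices are stored in different orders.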

The integer vector test.index contained in the test.index data holds the indices of the examples of the scores matrix S to be used in the test set. It is useful only in holdout experiments.
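One way to picture the role of such an index vector in a holdout experiment is sketched below in base R. It assumes that the indices refer to rows (examples) of the scores matrix. The names S.toy and test.index.toy and the toy values are hypothetical, and the snippet does not reproduce how the HEMDAG functions use test.index internally.

```r
# Toy scores matrix: 5 hypothetical genes x 2 hypothetical classes.
S.toy <- matrix(seq(0.1, 1.0, by = 0.1), nrow = 5,
                dimnames = list(paste0("gene", 1:5),
                                c("classA", "classB")))

# Hypothetical indices of the examples reserved for the test set.
test.index.toy <- c(2L, 5L)

# Rows listed in the index form the test set; the remaining rows form the training set.
S.test  <- S.toy[test.index.toy, , drop = FALSE]
S.train <- S.toy[-test.index.toy, , drop = FALSE]
```

Using drop = FALSE keeps both subsets as matrices even if only one example ends up in either set.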

Some examples of full datasets for the prediction of HPO terms are available at the following link. The full datasets should be processed in the same way as the small example datasets provided in this package. See the README at that link for more details about the available full datasets.

gecko515/HEMDAG documentation built on Oct. 18, 2019, 6:34 a.m.