dataset_graphsage: GraphSage Datasets

View source: R/datasets.R

Description

Loads one of the datasets (PPI or Reddit) used in Hamilton & Ying (2017).

The PPI dataset (originally from Stark et al. (2006)) is used for inductive node classification. Its node features are positional gene sets, motif gene sets and immunological signatures, and its labels are gene ontology sets.

The Reddit dataset is a graph of Reddit posts made in September 2014. The label of each node is the community that the post belongs to. The graph is built by sampling 50 large communities; two nodes are connected if the same user commented on both posts. Node features are obtained by concatenating the average GloVe CommonCrawl vectors of the title and comments, the post's score, and the number of comments.

The train, test, and validation splits are returned as binary masks.

Usage

dataset_graphsage(dataset_name, max_degree = -1L, normalize_features = TRUE)

Arguments

dataset_name

Name of the dataset to load ('ppi' or 'reddit').

max_degree

If positive, subsample edges so that each node has at most the specified degree.

normalize_features

If TRUE, the node features are normalized.

return_type

Format in which to return the data. Either "list" or "tidygraph".
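
The effect of max_degree can be illustrated with a small sketch in base R. It is not the package's implementation: the helper subsample_neighbours() and the neighbour-list representation are assumptions made for illustration, and the sketch caps each node's own neighbour list without keeping the graph symmetric.

```r
# Hypothetical helper (not part of rspektral): cap each node's neighbour
# list at max_degree by sampling without replacement.
subsample_neighbours <- function(neighbours, max_degree) {
  # A non-positive max_degree means "keep every edge".
  if (max_degree <= 0) return(neighbours)
  lapply(neighbours, function(nb) {
    if (length(nb) > max_degree) {
      # sample.int() on the length avoids the sample(x, k) pitfall
      # when nb happens to hold a single number.
      nb[sample.int(length(nb), max_degree)]
    } else {
      nb
    }
  })
}

set.seed(1)
# Toy graph: node 1 has four neighbours, every other node has one or two.
neighbours <- list(c(2L, 3L, 4L, 5L), c(1L), c(1L, 4L), c(1L, 3L), c(1L))

capped <- subsample_neighbours(neighbours, max_degree = 2L)
lengths(capped)  # no node keeps more than two neighbours
```

With max_degree = -1L (the default shown under Usage) the helper returns its input unchanged, which mirrors the "if positive" condition in the argument description.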

Value

Either a list with 6 elements (adjacency matrix, node features, labels, and 3 binary masks for the train, validation, and test splits), or a tbl_graph object with the node features, labels, and the 3 binary masks as node attributes.
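
As a rough illustration of how the binary masks might be used, the sketch below builds a small mock list in base R and selects the training rows. The element names (adjacency, features, labels, train_mask, val_mask, test_mask) and the toy values are assumptions; the names and order in the object actually returned may differ.

```r
# Mock data standing in for the returned list; element names are hypothetical.
n <- 6
mock <- list(
  adjacency  = matrix(0, n, n),
  features   = matrix(seq_len(n * 2), nrow = n, ncol = 2),
  labels     = c(1, 0, 1, 1, 0, 0),
  train_mask = c(1, 1, 1, 0, 0, 0),
  val_mask   = c(0, 0, 0, 1, 0, 0),
  test_mask  = c(0, 0, 0, 0, 1, 1)
)

# Convert the 0/1 mask to logical, then index rows with it.
mask    <- as.logical(mock$train_mask)
x_train <- mock$features[mask, , drop = FALSE]
y_train <- mock$labels[mask]

# Sanity check: each node belongs to at most one split.
stopifnot(all(mock$train_mask + mock$val_mask + mock$test_mask <= 1))

dim(x_train)  # rows = number of training nodes, columns = number of features
```

The same logical-indexing pattern applies to the validation and test masks.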


rdinnager/rspektral documentation built on June 12, 2021, 1:26 a.m.