dataset_citations: Citation datasets


View source: R/datasets.R

Description

Loads a citation dataset (Cora, Citeseer, or Pubmed) using the "Planetoid" splits introduced in Yang et al. (2016). The train, validation, and test splits are given as binary masks. Node attributes are bag-of-words vectors representing the most common words in the text document associated with each node. Two papers are connected if either one cites the other. Labels represent the class of the paper.
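Because two papers are connected when either one cites the other, the resulting graph is undirected even though citations are directed. The sketch below illustrates that rule on a tiny, made-up citation list using base R only. The `cites` data frame and the node count are hypothetical, and this is not how the package builds its adjacency matrix internally.

```r
# Hypothetical citation edges: paper `from` cites paper `to`
cites <- data.frame(from = c(1, 2, 4), to = c(2, 3, 1))
n <- 4

# Directed citation matrix: a[i, j] = 1 if paper i cites paper j
a <- matrix(0L, n, n)
a[cbind(cites$from, cites$to)] <- 1L

# Undirected adjacency: connected if either paper cites the other
a <- (a + t(a) > 0) * 1L

identical(a, t(a))  # TRUE: the adjacency matrix is symmetric
```

Each of the three citations contributes two entries, so `sum(a)` is 6.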

Usage

dataset_citations(
  dataset_name = "cora",
  normalize_features = TRUE,
  random_split = FALSE,
  return_type = c("list", "tidygraph")
)

Arguments

dataset_name

Name of the dataset to load: "cora", "citeseer", or "pubmed".

normalize_features

normalize_features normalize_features: if TRUE, the node features are normalized;

random_split

If TRUE, return a randomized split (20 nodes per class for training, 30 nodes per class for validation, and the remaining nodes for testing), following Shchur et al. (2018).
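The per-class split described above can be sketched in base R as follows. `random_class_split()` is a hypothetical helper, not the package's implementation; it only shows how 20 training and 30 validation nodes per class, with all remaining nodes used for testing, translate into three disjoint binary masks.

```r
# Hypothetical helper, NOT the package's implementation: sample a
# per-class split in the style of Shchur et al. (2018).
random_class_split <- function(labels, n_train = 20, n_val = 30, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(labels)
  train_mask <- logical(n)
  val_mask <- logical(n)
  for (cls in unique(labels)) {
    idx <- which(labels == cls)
    # Shuffle by position with sample.int(), avoiding sample()'s special
    # case for length-one vectors
    idx <- idx[sample.int(length(idx))]
    stopifnot(length(idx) >= n_train + n_val)
    train_mask[idx[seq_len(n_train)]] <- TRUE
    val_mask[idx[n_train + seq_len(n_val)]] <- TRUE
  }
  # Every node not used for training or validation goes to the test split
  test_mask <- !(train_mask | val_mask)
  list(train = train_mask, val = val_mask, test = test_mask)
}

labels <- rep(0:2, each = 100)   # 3 classes, 100 nodes each
s <- random_class_split(labels, seed = 1)
c(sum(s$train), sum(s$val), sum(s$test))  # 60 90 150
```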

return_type

Data format in which to return the data: either "list" or "tidygraph".

Value

Either a list with 6 elements (the adjacency matrix, the node features, the labels, and 3 binary masks for the train, validation, and test splits), or a tbl_graph object with the node features, labels, and 3 binary masks as node attributes.
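A binary mask is a logical vector with one entry per node, so it can index the feature matrix and labels directly. The sketch below uses small, made-up stand-ins for the returned features, labels, and training mask; the real objects are much larger, and the labels may be encoded differently (for example one-hot, in which case they would be indexed as `labels[train_mask, ]`).

```r
# Hypothetical toy stand-ins for the returned node features, labels,
# and training mask
x <- matrix(1:12, nrow = 4)            # 4 nodes x 3 features
labels <- c(0L, 1L, 1L, 0L)
train_mask <- c(TRUE, FALSE, TRUE, FALSE)

x_train <- x[train_mask, , drop = FALSE]  # feature rows of training nodes
y_train <- labels[train_mask]             # labels of training nodes
which(train_mask)                         # node indices: 1 3
```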


rdinnager/rspektral documentation built on June 12, 2021, 1:26 a.m.