CreateDataCV: Creating cross validation data


Description

A function to create cross-validation data.

Usage

CreateDataCV(net                   , p          = 0.75 , G           = 50  , 
             net_type = "directed" , deg_thresh = 0    , exclude_end = FALSE)

Arguments

net

A three-column matrix in which each row describes one edge in the form (from_node id, to_node id, time_stamp). from_node id is the id of the source node, to_node id is the id of the destination node, and time_stamp is the arrival time of the edge. from_node id and to_node id are assumed to be integers starting from 0. time_stamp can be either numeric or string. Its values can be arbitrary, but a smaller time_stamp (as ordered by the sort function in R) is assumed to represent an earlier arrival time.
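As an illustration of this layout, the following base R sketch builds a small edge matrix by hand and checks the two stated assumptions (integer ids starting from 0, time stamps ordered by sort). The object names toy_net, ids_ok and time_order, and the column names, are hypothetical and not part of PAFit.

```r
# Hypothetical toy edge list: each row is (from_node id, to_node id, time_stamp).
toy_net <- matrix(c(1, 0, 1,
                    2, 0, 2,
                    3, 1, 2,
                    4, 2, 3),
                  ncol = 3, byrow = TRUE,
                  dimnames = list(NULL, c("from", "to", "time")))

# Node ids are expected to be integers starting from 0.
ids    <- toy_net[, c("from", "to")]
ids_ok <- all(ids == round(ids)) && min(ids) == 0

# Distinct time stamps in the order given by sort(); earlier stamps come first.
time_order <- sort(unique(toy_net[, "time"]))
```

The column names are only for readability here; the description above specifies the column order, not the names.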

p

Numeric between 0 and 1. The ratio of the number of new edges in the learning data to the number of new edges in the full data. Default is p = 0.75.
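One plausible reading of this ratio can be sketched in base R. The sketch assumes that edges at the first time stamp form the initial network and that the split falls at the first time stamp where the cumulative share of new edges reaches p. Both are assumptions made for illustration; the rule actually used inside CreateDataCV may differ. All variable names are hypothetical.

```r
# Hypothetical time stamps of 8 edges; the first stamp is assumed to be the initial network.
time_stamp <- c(0, 1, 1, 2, 2, 3, 3, 4)
p <- 0.75

# Count new edges per time stamp, excluding the assumed initial stamp.
stamps     <- sort(unique(time_stamp))
new_counts <- sapply(stamps[-1], function(t) sum(time_stamp == t))

# Cumulative share of new edges up to each stamp.
cum_ratio <- cumsum(new_counts) / sum(new_counts)

# First stamp at which the share reaches p; edges up to this stamp
# would form the learning data under the assumptions above.
split_stamp <- stamps[-1][which(cum_ratio >= p)[1]]
```

Because edges arrive in batches per time stamp, the realized ratio (here 6/7) will generally not equal p exactly.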

G

Integer. Number of bins. Default value is 50.

net_type

String. The type of the network: "directed" or "undirected". Default is "directed".

deg_thresh

Integer. Only nodes whose number of acquired new edges is at least this threshold are considered. Default value is 0, i.e. all nodes.
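The effect of such a threshold can be illustrated with a small base R sketch. It assumes that "acquired new edges" means edges pointing to a node, as in the directed case; the variable names are hypothetical and this is not the package's internal code.

```r
# Hypothetical destination ids of six new edges.
to_node    <- c(0, 0, 1, 0, 2, 1)
deg_thresh <- 2

# Number of new edges acquired by each node.
acquired <- table(to_node)

# Keep only nodes that acquired at least deg_thresh new edges.
kept_nodes <- as.integer(names(acquired)[acquired >= deg_thresh])
```

With deg_thresh = 0, every node passes the filter, which matches the default behaviour described above.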

exclude_end

Logical. If TRUE, then for the testing data, at each time-step only the new edges that connect to nodes whose current degree is less than deg_max are considered, where deg_max is the maximum degree in the learning data. The motivation is that the learning phase can only estimate the PA function up to deg_max, so it makes sense to limit the degree in the testing phase to deg_max as well. In our experience, this option does not matter in practice. Default value is FALSE.
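The filtering rule described for exclude_end = TRUE can be sketched in base R as follows. The data frame, its column names and the value of deg_max are hypothetical; the sketch only mirrors the rule as stated and is not taken from the package source.

```r
# Hypothetical maximum degree observed in the learning data.
deg_max <- 5

# Hypothetical new edges at one testing time-step, with the current
# degree of each destination node.
test_edges <- data.frame(to             = c(0, 1, 2, 3),
                         current_degree = c(2, 5, 7, 4))

# Keep only edges whose destination node has current degree below deg_max.
kept <- test_edges[test_edges$current_degree < deg_max, ]
```

Edges to nodes whose degree has already reached or passed deg_max are dropped, since the learned PA function has no estimate at those degrees.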

Value

An object of class "CV_Data" containing the data needed for cross validation.

Author(s)

Thong Pham thongpham@thongpham.net

References

1. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Nonparametric Estimation of the Preferential Attachment Function in Complex Networks: Evidence of Deviations from Log Linearity, Proceedings of ECCS 2014, 141-153 (Springer International Publishing) (http://dx.doi.org/10.1007/978-3-319-29228-1_13).

2. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796 (http://dx.doi.org/10.1371/journal.pone.0137796).

3. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558 (www.nature.com/articles/srep32558).

Examples

library("PAFit")
net      <- GenerateNet(N = 100 , m = 1 , mode = 1 , alpha = 1 , shape = 5 , rate = 5)
data_cv  <- CreateDataCV(net$graph)
summary(data_cv)

