CreateDataCV: Creating cross validation data



A function to create cross-validation data.


CreateDataCV(net                   , p          = 0.75 , G           = 50  , 
             net_type = "directed" , deg_thresh = 0    , exclude_end = FALSE)



net: A three-column matrix in which each row describes one edge in the form (from_node id, to_node id, time_stamp). from_node id is the id of the source node, to_node id is the id of the destination node, and time_stamp is the arrival time of the edge. from_node id and to_node id are assumed to be integers starting from 0. time_stamp can be either numeric or string. The value of a time stamp can be arbitrary, but a smaller time_stamp (as ordered by the sort function in R) is assumed to represent an earlier arrival time.
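As a rough illustration of this input format, the base R sketch below builds a small edge matrix by hand. The object names toy_net and time_steps and the edge values are hypothetical and are not part of PAFit.

```r
# Hypothetical toy edge list: each row is (from_node id, to_node id, time_stamp).
# Node ids are integers starting from 0; a smaller time stamp arrives earlier.
toy_net <- matrix(c(1, 0, 1,
                    2, 0, 2,
                    2, 1, 2,
                    3, 0, 3),
                  ncol = 3, byrow = TRUE)

# Order the rows by arrival time (third column).
toy_net <- toy_net[order(toy_net[, 3]), , drop = FALSE]

# Distinct time-steps from earliest to latest, as ordered by sort().
time_steps <- sort(unique(toy_net[, 3]))
```

A matrix of this shape is what the net argument expects; the output of GenerateNet already stores such a matrix in its graph element.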


p: Numeric between 0 and 1. The ratio of the number of new edges in the learning data to the number of new edges in the full data. Default is p = 0.75.
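To make the meaning of this ratio concrete, here is a minimal base R sketch of one way such a split could be computed. It is not the PAFit implementation. The helper split_by_ratio is hypothetical, and it assumes numeric time stamps, at least two time-steps, and that the edges at the first time-step form the initial network and so do not count as new edges.

```r
# Sketch only (hypothetical helper, not the PAFit implementation): choose the
# earliest time-step at which the learning data holds at least a fraction p
# of all new edges, then split the edge matrix there.
split_by_ratio <- function(net, p = 0.75) {
  time_steps <- sort(unique(net[, 3]))
  # Assumption: edges at the first time-step are the initial network.
  new_steps  <- time_steps[-1]
  new_counts <- vapply(new_steps, function(s) sum(net[, 3] == s), integer(1))
  ratio      <- cumsum(new_counts) / sum(new_counts)
  cut_step   <- new_steps[which(ratio >= p)[1]]
  list(learning = net[net[, 3] <= cut_step, , drop = FALSE],
       testing  = net[net[, 3] >  cut_step, , drop = FALSE],
       cut_step = cut_step)
}

# Toy data: 1 initial edge at time 0, then 2, 1 and 1 new edges at times 1, 2, 3.
toy_net <- matrix(c(1, 0, 0,
                    2, 0, 1,
                    3, 1, 1,
                    4, 0, 2,
                    5, 1, 3),
                  ncol = 3, byrow = TRUE)
toy_split <- split_by_ratio(toy_net, p = 0.75)
```

With this toy data the cumulative share of new edges is 0.5, 0.75 and 1 at times 1, 2 and 3, so the split falls after time 2: four edges go to the learning data and one to the testing data.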


G: Integer. Number of bins. Default value is 50.


net_type: String. The type of the network: "directed" or "undirected". Default is "directed".


deg_thresh: Integer. Only nodes that acquire at least this number of new edges are considered. Default value is 0, i.e., all nodes are considered.


exclude_end: Logical. If TRUE, then for the testing data, at each time-step we only consider the new edges that connect to nodes whose current degrees are less than deg_max, the maximum degree in the learning data. The motivation is that the learning phase can only estimate the preferential attachment (PA) function up to deg_max, so it makes sense to limit the degree in the testing phase to deg_max as well. In our experience, this option does not matter. Default value is FALSE.
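The filtering rule described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the PAFit implementation. The helper filter_test_edges is hypothetical, the network is treated as directed with degree taken to mean in-degree, time stamps are numeric, and degrees are updated only after a whole time-step has been examined.

```r
# Sketch only (hypothetical helper, not the PAFit implementation): keep the
# testing edges whose destination node has current in-degree below deg_max.
filter_test_edges <- function(learning, testing) {
  n_nodes <- max(learning[, 1:2], testing[, 1:2]) + 1  # node ids start from 0
  degree  <- tabulate(learning[, 2] + 1, nbins = n_nodes)
  deg_max <- max(degree)                               # maximum degree in the learning data
  keep    <- logical(nrow(testing))
  for (step in sort(unique(testing[, 3]))) {
    rows       <- which(testing[, 3] == step)
    dest       <- testing[rows, 2] + 1                 # shift to 1-based indices
    keep[rows] <- degree[dest] < deg_max
    # Every edge of this time-step raises its destination's degree,
    # whether or not the edge was kept.
    degree <- degree + tabulate(dest, nbins = n_nodes)
  }
  testing[keep, , drop = FALSE]
}

# Toy data: after learning, node 0 has in-degree 2 (deg_max) and node 1 has 1.
toy_learning <- matrix(c(1, 0, 0,
                         2, 0, 1,
                         3, 1, 1), ncol = 3, byrow = TRUE)
toy_testing  <- matrix(c(4, 0, 2,
                         5, 1, 2,
                         6, 1, 3), ncol = 3, byrow = TRUE)
toy_kept <- filter_test_edges(toy_learning, toy_testing)
```

In this toy case only the edge from node 5 to node 1 at time 2 is kept: node 0 has already reached deg_max, and node 1 reaches it after time 2.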



An object of class "CV_Data" containing the data needed for cross validation.


Thong Pham


1. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Nonparametric Estimation of the Preferential Attachment Function in Complex Networks: Evidence of Deviations from Log Linearity, Proceedings of ECCS 2014, 141-153 (Springer International Publishing).

2. Pham, T., Sheridan, P. & Shimodaira, H. (2015). PAFit: A Statistical Method for Measuring Preferential Attachment in Temporal Complex Networks. PLoS ONE 10(9): e0137796. doi:10.1371/journal.pone.0137796

3. Pham, T., Sheridan, P. & Shimodaira, H. (2016). Joint Estimation of Preferential Attachment and Node Fitness in Growing Complex Networks. Scientific Reports 6, Article number: 32558. doi:10.1038/srep32558


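library("PAFit")
# Simulate a small network, then build the cross-validation data with default settings
net      <- GenerateNet(N = 100 , m = 1 , mode = 1 , alpha = 1 , shape = 5 , rate = 5)
data_cv  <- CreateDataCV(net$graph)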
