lsmi_cv: Cross-validation to Select an Optimal Combination of n.seed...
In snowboot: Bootstrap Methods for Network Inference


From the vector of specified n.seeds and the possible waves 1:n.wave around each seed, the function selects a single number n.seed and an n.wave (the optimal seed-wave combination) that produce a labeled snowball with multiple inclusions (LSMI) sample with the desired bootstrap confidence intervals for a parameter of interest. Here, by ‘desired’ we mean that the interval (and the corresponding seed-wave combination) is selected as having the best coverage (closest to the specified level prob), based on a cross-validation procedure with proxy estimates of the parameter. See Algorithm 2 by Gel et al. (2017) and Details below.
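The final selection step can be illustrated with a deliberately simplified sketch: assume each candidate seed-wave combination already has a bootstrap interval, and estimate its coverage as the share of proxy estimates that fall inside it. The combination whose coverage is closest to `prob` wins. The helper `select_best_combination`, the input layout, and the numbers are hypothetical; this is not the package's implementation of Algorithm 2, and how the proxy estimates are generated is not modeled here.

```r
# Hypothetical, simplified sketch of the selection rule (not the package code):
# coverage = share of proxy estimates falling inside each candidate interval.
select_best_combination <- function(intervals, proxy_estimates, prob = 0.95) {
  coverage <- mapply(function(lo, up) {
    mean(proxy_estimates >= lo & proxy_estimates <= up)
  }, intervals$lower, intervals$upper)
  best <- which.min(abs(coverage - prob))   # first combination wins on ties
  list(best_combination = c(intervals$n.seed[best], intervals$n.wave[best]),
       coverage = coverage)
}

# Made-up candidate intervals and proxy estimates, for illustration only
intervals <- data.frame(n.seed = c(5L, 10L, 20L), n.wave = c(1L, 1L, 2L),
                        lower = c(2.0, 1.8, 2.5), upper = c(2.2, 2.6, 3.0))
proxy <- c(1.85, 1.9, 2.0, 2.05, 2.1, 2.3, 2.4, 2.5, 2.6, 2.7)
res <- select_best_combination(intervals, proxy, prob = 0.95)
res$best_combination   # 10 1
```

Here the coverages are 0.3, 0.9, and 0.3, so the second combination (10 seeds, 1 wave) is closest to 0.95.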

lsmi_cv(
  net,
  n.seeds,
  n.wave,
  seeds = NULL,
  B = 100,
  prob = 0.95,
  cl = 1,
  param = c("mu"),
  method = c("percentile", "basic"),
  proxyRep = 19,
  proxySize = 30
)

`net`	a network object that is a list containing: `degree`, the degree sequence of the network, which is an `integer` vector of length n; `edges`, the edgelist, which is a two-column matrix where each row is an edge of the network; and `n`, the network order (i.e., the number of nodes in the network). The network object can be simulated by `random_network`, selected from the networks available in `artificial_networks`, converted from an `igraph` object with `igraph_to_network`, etc.
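To make the expected structure concrete, a minimal object with the three fields listed above can be built by hand from an edgelist. This is a sketch assuming nodes are labeled 1..n and each undirected edge appears once in `edges`; `make_network` is a hypothetical helper, not part of snowboot, and objects produced by the package's own functions (e.g., `random_network`, `igraph_to_network`) may carry additional fields.

```r
# Hypothetical helper (not in snowboot): build a list with `degree`, `edges`,
# and `n` from an undirected edgelist.
# Assumes nodes are labeled 1..n and each edge is listed once.
make_network <- function(edges, n) {
  edges <- matrix(as.integer(edges), ncol = 2)
  # Each endpoint of an edge contributes 1 to that node's degree
  degree <- tabulate(c(edges[, 1], edges[, 2]), nbins = n)
  list(degree = degree, edges = edges, n = as.integer(n))
}

# Example: a ring of 6 nodes, so every node has degree 2
ring <- cbind(1:6, c(2:6, 1))
net <- make_network(ring, n = 6)
net$degree   # 2 2 2 2 2 2
```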
`n.seeds`	an integer vector of numbers of seeds for snowball sampling (cf. a single integer `n.seed` in `lsmi`). Only `n.seeds <= n` are retained. If `seeds` is specified, only values `n.seeds < length(unique(seeds))` are retained and automatically supplemented by `length(unique(seeds))`.
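The filtering rule for `n.seeds` can be sketched as follows. `filter_n_seeds` is a hypothetical helper written only to mirror the description above; the final `sort(unique(...))` step is an assumption about ordering, not documented behavior.

```r
# Hypothetical sketch of the documented n.seeds filtering rule
filter_n_seeds <- function(n.seeds, n, seeds = NULL) {
  n.seeds <- n.seeds[n.seeds <= n]          # keep only feasible seed counts
  if (!is.null(seeds)) {
    k <- length(unique(seeds))
    n.seeds <- c(n.seeds[n.seeds < k], k)   # drop values >= k, then add k
  }
  sort(unique(n.seeds))                     # ordering is an assumption
}

filter_n_seeds(c(5, 10, 50, 200), n = 100)                                # 5 10 50
filter_n_seeds(c(5, 10, 50), n = 100, seeds = c(3, 7, 7, 9, 12, 15, 21, 40))  # 5 7
```

In the second call, the seeds contain 7 unique IDs, so only 5 survives the `< 7` filter and 7 is appended.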
`n.wave`	an integer defining the number of waves (order of the neighborhood) to be recorded around the seed in the LSMI. For example, `n.wave = 1` corresponds to an LSMI with the seed and its first neighbors. Note that the algorithm allows for multiple inclusions.
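The idea of waves can be sketched as a breadth-first expansion from each seed over an edgelist, stopping after `n.wave` steps. In this sketch, "multiple inclusions" is read as: each seed's snowball is recorded separately, so a node reached from two seeds appears twice in the combined sample. The helpers `waves_around_seed` and `lsmi_nodes` are hypothetical; they ignore the wave labels and any other bookkeeping the package's `lsmi` performs.

```r
# Hypothetical sketch: nodes reached within `n.wave` waves of one seed,
# using an undirected edgelist `edges` (two-column matrix, nodes 1..n).
waves_around_seed <- function(edges, seed, n.wave) {
  visited <- seed
  frontier <- seed
  for (w in seq_len(n.wave)) {
    # Neighbors of the current frontier, in either edge direction
    nbrs <- c(edges[edges[, 1] %in% frontier, 2],
              edges[edges[, 2] %in% frontier, 1])
    frontier <- setdiff(unique(nbrs), visited)
    visited <- c(visited, frontier)
  }
  visited
}

# Concatenate per-seed snowballs; overlaps are kept, not deduplicated
lsmi_nodes <- function(edges, seeds, n.wave) {
  unlist(lapply(seeds, function(s) waves_around_seed(edges, s, n.wave)))
}

ring <- cbind(1:6, c(2:6, 1))                  # 6-node ring: 1-2-3-4-5-6-1
waves_around_seed(ring, seed = 1, n.wave = 1)  # 1 2 6
lsmi_nodes(ring, seeds = c(1, 3), n.wave = 1)  # 1 2 6 3 4 2 (node 2 twice)
```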
`seeds`	a vector of numeric IDs of pre-specified seeds. If specified, LSMIs are constructed around each such seed.
`B`	a positive integer, the number of bootstrap replications to perform. Default is 100.
`prob`	confidence level for the intervals. Default is 0.95 (i.e., 95% confidence).
`cl`	parameter to specify a computer cluster for bootstrapping, passed to the package `parallel` (default is `1`, meaning no cluster is used). Possible values are: (1) a cluster object (list) produced by `makeCluster`; in this case, a new cluster is neither started nor stopped; (2) `NULL`; in this case, the function will attempt to detect available cores (see `detectCores`) and, if there are multiple cores (>1), a cluster will be started with `makeCluster`; if started, the cluster will be stopped after computations are finished; (3) a positive integer defining the number of cores to start a cluster; if `cl = 1`, no attempt to create a cluster will be made; if `cl > 1`, a cluster will be started (using `makeCluster`) and stopped afterwards (using `stopCluster`).
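The three documented cases for `cl` can be sketched with the base `parallel` package. `resolve_cluster` is a hypothetical helper that returns the cluster to use (or `NULL` for serial work) and whether the caller should stop it afterwards; it is not the function snowboot uses internally.

```r
library(parallel)

# Hypothetical sketch of resolving the documented `cl` values
resolve_cluster <- function(cl = 1) {
  if (inherits(cl, "cluster")) {
    return(list(cluster = cl, stop_after = FALSE))  # user-managed cluster
  }
  if (is.null(cl)) {
    cores <- detectCores()                          # may return NA
    if (!is.na(cores) && cores > 1) {
      return(list(cluster = makeCluster(cores), stop_after = TRUE))
    }
    return(list(cluster = NULL, stop_after = FALSE))
  }
  if (cl == 1) {
    return(list(cluster = NULL, stop_after = FALSE))  # serial, no cluster
  }
  list(cluster = makeCluster(cl), stop_after = TRUE)
}

res <- resolve_cluster(1)
# ... run replications with parLapply(res$cluster, ...) or lapply(...) ...
if (res$stop_after) stopCluster(res$cluster)
```

Passing an existing cluster object leaves its lifecycle to the caller, which is convenient when several bootstrap calls share one cluster.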
`param`	The parameter of interest for which to run a cross-validation and select optimal `n.seed` and `n.wave`. Currently, only one selection is possible: `"mu"` (the network mean degree).
`method`	method for calculating the bootstrap intervals. Default is `"percentile"` (see Details).
`proxyRep`	The number of times to repeat proxy sampling. Default is 19.
`proxySize`	The size of the proxy sample. Default is 30.

Currently, the bootstrap intervals can be calculated with two alternative methods: "percentile" or "basic". The "percentile" intervals correspond to Efron's $100\cdot\mathrm{prob}\%$ intervals (see Efron 1979; also Equation 5.18 by Davison and Hinkley 1997, Equation 3 by Gel et al. 2017, and Chen et al. 2018):

$$\left(\theta^*_{[B\alpha/2]},\ \theta^*_{[B(1-\alpha/2)]}\right),$$

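where $\theta^*_{[B\alpha/2]}$ and $\theta^*_{[B(1-\alpha/2)]}$ are empirical quantiles of the bootstrap distribution with $B$ bootstrap replications for the parameter $\theta$ (in general, $\theta$ can be the degree probability $f(k)$ or the mean degree $\mu$; `lsmi_cv` currently supports only $\mu$), and $\alpha = 1 - \mathrm{prob}$.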
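A percentile interval can be computed directly from a vector of bootstrap replicates. This sketch takes the empirical quantiles with `stats::quantile` (default type 7, which interpolates between order statistics), so results may differ slightly from the order statistics in the formula and from the package's own computation. The toy replicates below come from an ordinary i.i.d. bootstrap of made-up degrees, not from an LSMI bootstrap; `percentile_ci` is a hypothetical helper.

```r
# Sketch: Efron's percentile interval from bootstrap replicates `theta_star`
percentile_ci <- function(theta_star, prob = 0.95) {
  alpha <- 1 - prob
  q <- quantile(theta_star, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(lower = q[1], upper = q[2])
}

set.seed(1)
x <- rpois(50, lambda = 3)     # toy "sample" of node degrees (made up)
B <- 100
theta_star <- replicate(B, mean(sample(x, replace = TRUE)))
percentile_ci(theta_star, prob = 0.95)
```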

The "basic" method produces intervals (see Equation 5.6 by Davison and Hinkley 1997):

$$\left(2\hat{\theta} - \theta^*_{[B(1-\alpha/2)]},\ 2\hat{\theta} - \theta^*_{[B\alpha/2]}\right),$$

where $\hat{\theta}$ is the sample estimate of the parameter. Note that this method can lead to negative confidence bounds, especially when $\hat{\theta}$ is close to 0.
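The basic interval reflects the bootstrap quantiles around $2\hat{\theta}$. This sketch again uses `stats::quantile` for the empirical quantiles (an assumption about the quantile definition), and the replicates are made up to show how a right-skewed bootstrap distribution near 0 yields a negative lower bound. `basic_ci` is a hypothetical helper.

```r
# Sketch: basic bootstrap interval
# (2 * theta_hat - upper quantile, 2 * theta_hat - lower quantile)
basic_ci <- function(theta_hat, theta_star, prob = 0.95) {
  alpha <- 1 - prob
  q <- quantile(theta_star, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(lower = 2 * theta_hat - q[2], upper = 2 * theta_hat - q[1])
}

# Made-up, right-skewed replicates with a small estimate
theta_star <- c(0.05, 0.1, 0.2, 0.4, 0.8, 1.6)
basic_ci(theta_hat = 0.3, theta_star = theta_star, prob = 0.9)  # lower bound is negative
```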

A list consisting of:

`bci`	A numeric vector of length 2 with the bootstrap confidence interval (lower bound, upper bound) for the parameter of interest. This interval is obtained by bootstrapping node degrees in an LSMI with the optimal combination of `n.seed` and `n.wave` (the combination is reported in `best_combination`).
`estimate`	Point estimate of the parameter of interest (based on the LSMI with `n.seed` seeds and `n.wave` waves reported in the `best_combination`).
`best_combination`	An integer vector of length 2 containing the optimal `n.seed` and `n.wave` selected via cross-validation.
`seeds`	A vector of numeric IDs of the seeds that were used in the LSMI with the optimal combination of `n.seed` and `n.wave`.