# samplingDistCalculation in sgee: Stagewise Generalized Estimating Equations

## Description

Internal function that sets up the subsampling distribution used to execute the stochastic version of a stagewise approach. The subsampling is conducted at the cluster level, not the individual observation level. Sampling probabilities are first calculated or provided for each observation individually, and the sampling probability for each cluster is then taken to be the average probability across all observations in the cluster.
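The cluster-level averaging described above can be sketched in a few lines of base R. This is only an illustration on hypothetical toy data (`obsProb` and the cluster layout are made up), not the package's internal code; the final rescaling step is an assumption, since the description only states that the observation-level values are averaged.

```r
# Hypothetical toy data: 3 clusters with 2, 3, and 1 observations
clusterID <- c(1, 1, 2, 2, 2, 3)
obsProb   <- c(0.2, 0.4, 0.1, 0.1, 0.4, 0.6)  # observation-level values

# Average the observation-level values within each cluster
clusterProb <- tapply(obsProb, clusterID, mean)

# Assumption: rescale so the cluster-level values sum to one
clusterProb <- clusterProb / sum(clusterProb)
```

The result has one entry per cluster, regardless of how many observations each cluster contains.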

## Usage

```r
samplingDistCalculation(sampleProb, y, x, clusterID, waves, beta, beta0,
  phi, alpha, offset, meanLinkInv, varianceLink, corstr, mu.eta)
```

## Arguments

| Argument | Description |
| --- | --- |
| `sampleProb` | A user-provided value for the probability associated with each observation. `sampleProb` can be provided as 1) a vector of fixed values of length equal to the response vector `y`, 2) a function that takes in a list of values (the full list is given in the Note) and returns a vector of length equal to the response vector `y`, or 3) the default value of `NULL`, which results in a uniform distribution |
| `y` | The vector of response values provided to the original stagewise function |
| `x` | The covariate matrix provided to the original stagewise function |
| `clusterID` | The vector of cluster ID numbers provided to the original stagewise function |
| `waves` | The waves parameter, provided to the original stagewise function, identifying the order of observations within the clusters |
| `beta` | The vector of the current estimates of the coefficients |
| `beta0` | The current estimate of the intercept |
| `phi` | The current estimate of the scale parameter |
| `alpha` | The current estimate of the parameter affecting the within-cluster correlation |
| `offset` | The offset in the linear predictor provided to the original stagewise function |
| `meanLinkInv` | The link inverse function from the `family` object provided to the original stagewise function, indicating what family of mean and variance structure is assumed |
| `varianceLink` | The variance link function from the `family` object provided to the original stagewise function, indicating what family of mean and variance structure is assumed |
| `corstr` | The structure of the working correlation matrix that was provided to the original stagewise function |
| `mu.eta` | The derivative function of mu, the conditional mean of the response, with respect to eta, the linear predictor, from the `family` object provided to the original stagewise function, indicating what family of mean and variance structure is assumed |

## Value

The sampling distribution probabilities to be used for the subsampling. The distribution is provided as a vector with length equal to the number of clusters.
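As a rough illustration of how a cluster-level probability vector of this shape might be used, the sketch below draws whole clusters with base R's `sample()`. The data, the subsample size, and the choice to sample without replacement are all assumptions made for the example; they are not taken from the package.

```r
set.seed(1)

# Hypothetical inputs: 6 observations in 3 clusters,
# plus one sampling probability per cluster
clusterID   <- c(1, 1, 2, 2, 2, 3)
clusterProb <- c(0.3, 0.2, 0.5)

clusters <- unique(clusterID)

# Draw 2 of the 3 clusters according to clusterProb (assumed: no replacement)
sampled <- sample(clusters, size = 2, replace = FALSE, prob = clusterProb)

# Keep every observation belonging to a sampled cluster
keep <- clusterID %in% sampled
```

Because selection happens per cluster, an observation is never kept or dropped independently of the other observations in its cluster.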

## Note

Internal function.

The function provided to `sampleProb` (through the `sgee.control` function) must calculate a probability for each observation in the response vector `y`. How these calculations are done is up to the user. The following values are passed to the `sampleProb` function as a list called `values`: `y`, `x`, `clusterID`, `waves`, `beta`, `beta0`, `phi`, `alpha`, `offset`, `meanLinkInv`, `varianceLink`, `corstr`, `mu.eta`. Additionally, all of the values produced by `sampleProb` must be non-negative.
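A user-written `sampleProb` function might look like the sketch below, which weights each observation by the size of its current residual. The function name `residProb`, the weighting rule, and the mock `values` list are hypothetical; the example also assumes that `offset` is a numeric vector with one entry per observation and that `x %*% beta` conforms, which may not match every call inside the package.

```r
# Hypothetical sampleProb function: larger residuals get larger weights
residProb <- function(values) {
  # Linear predictor from the current intercept, coefficients, and offset
  eta <- values$beta0 + as.vector(values$x %*% values$beta) + values$offset
  mu  <- values$meanLinkInv(eta)
  # Absolute residuals are non-negative; the small constant avoids all-zero weights
  abs(values$y - mu) + 1e-8
}

# Mock `values` list, built here only to exercise the function
set.seed(42)
n   <- 6
fam <- gaussian()
values <- list(
  y = rnorm(n), x = matrix(rnorm(n * 2), n, 2),
  clusterID = c(1, 1, 2, 2, 2, 3), waves = c(1, 2, 1, 2, 3, 1),
  beta = c(0.5, -0.25), beta0 = 0.1, phi = 1, alpha = 0,
  offset = rep(0, n),
  meanLinkInv = fam$linkinv, varianceLink = fam$variance,
  corstr = "independence", mu.eta = fam$mu.eta
)

p <- residProb(values)  # one non-negative value per observation
```

The function returns one value per element of `y`, all non-negative, which are the two requirements stated above.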

## Author(s)

Gregory Vaughan

sgee documentation built on May 1, 2019, 7:10 p.m.