compBayes: Fitting a compositional Stan model
In ColtoCaro/compMS: Bayesian Compositional Proteomics Modeling via Stan


View source: R/modelFitting.R

This function runs the main code for fitting an appropriate compositional proteomics model based on the input data file. It returns a list of length 3. The first component contains summary data on relative protein abundance, the second summarizes PTM peptides, and the third contains the Stanfit object.

compBayes(dat, pp = 0.99, nCores = 1, iter = 2000,
  normalize = TRUE, pop_sd = 10, simpleMod = FALSE, bridge = TRUE,
  adapt_delta = 0.9, varPool = 1)
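
A call following this signature might look like the sketch below. It assumes that the package is installed, that `sampleDat` can be loaded with `data()`, and that `compBayes` is exported; the helper `name_components` and the component names it assigns are hypothetical and only label the three documented return components for readability.

```r
# Hypothetical helper: attach readable names to the three documented components
# (protein summary, PTM summary, Stanfit object). The names are illustrative only.
name_components <- function(fit) {
  stopifnot(is.list(fit), length(fit) == 3)
  names(fit) <- c("protein_summary", "ptm_summary", "stanfit")
  fit
}

# Only attempt the fit when the package is available; fitting can take a long time.
if (requireNamespace("compMS", quietly = TRUE)) {
  data("sampleDat", package = "compMS")  # assumption: sampleDat loads this way
  fit <- name_components(compMS::compBayes(sampleDat, nCores = 2))
  str(fit$protein_summary)
}
```

Naming the components up front avoids positional indexing such as `fit[[3]]` later in an analysis script.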

`dat`	The data to be analyzed. The data must be formatted as shown in the included sample data `sampleDat`. Both the reference channel and the reference conditions will be defined from the first column of data. If multiple runs are to be analyzed at once, then `dat` must be a list of dataframes. This is only recommended if biological or technical replicates exist across multiple runs. Otherwise, for computational simplicity, each run should be analyzed separately. For a single run, the data should be formatted as a dataframe. In each dataframe, the first two rows are used to determine which columns are technical replicates and which are biological replicates. For example, if two columns, possibly from different plexes, both have the number '3' in the first row, then they will be treated as technical replicates. Likewise, columns that share a number in the second row are treated as biological replicates. If there are no biological replicates in the experiment, then the second row should be all zeroes. If technical replicates are not also labeled as biological replicates, an error will be generated.
`pp`	The percentile covariate level to be used for predicting relative protein abundance. The usage signature sets the default to 0.99, so that sum signal to noise will be adjusted towards the 99th percentile of observed values.
`nCores`	The number of cores used for parallel processing. For large datasets it is highly recommended that the package be used on a computer capable of parallel processing.
`iter`	Number of iterations for each chain
`normalize`	A boolean variable that determines whether or not adjustments should be made under the assumption that the average protein abundance is equivalent in each channel. By default this value is TRUE, which results in each row of the matrix being perturbed by the inverse of the compositional mean. Consequently, the geometric means of the tags will be equivalent.
`pop_sd`	The prior standard deviation of the population level variance.
`simpleMod`	A boolean variable that determines whether or not a population level model will be fit. In a simple model, only relative abundance parameters that define average log-ratios within sample groups will be estimated. The population level model estimates parameters for each biological replicate along with population level effects. Average sample parameters are also estimated, as many experiments do not have sufficient sample size to make use of the full population level model.
`bridge`	If bridge == TRUE then the first column will be treated as the reference for the entire experiment. If the condition number repeats, these entries will be collapsed into a single reference channel. When bridge is false this collapse will prevent estimation of the individual collapsed channels which are necessary for viewing proportion plots. In this case a second model will be fit for the sole purpose of visualizing the behavior of individual replicates.
`adapt_delta`	The target proposal acceptance rate that determines step size in a Stan model.
`varPool`	A variable that determines the structure of the variance parameters. varPool = 0 denotes no pooling, i.e. a separate variance parameter for each protein. varPool = 1 (default) generates partially pooled variance parameters and varPool = 2 creates complete pooling, i.e. only one experimental error.
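
The effect described for `normalize` can be illustrated on toy data. The sketch below is not the package's internal code; it assumes the standard compositional definitions (closure of each row to sum 1, the compositional mean as the closed vector of per-tag geometric means, and perturbation as element-wise multiplication followed by closure). The matrix `x` and all variable names are made up for illustration.

```r
# Toy intensities: 5 rows (e.g. peptides) by 4 tags; the values are made up.
set.seed(1)
x <- matrix(rexp(20, rate = 0.01) + 1, nrow = 5, ncol = 4)

# Closure: rescale each row so that it sums to 1 (a composition).
closure <- function(m) m / rowSums(m)

comp <- closure(x)

# Compositional mean (center): closed vector of per-tag geometric means.
center <- exp(colMeans(log(comp)))
center <- center / sum(center)

# Perturb every row by the inverse of the center, then re-close each row.
normalized <- closure(sweep(comp, 2, center, "/"))

# Per-tag geometric means before and after the adjustment.
gm_before <- exp(colMeans(log(comp)))
gm_after  <- exp(colMeans(log(normalized)))

print(round(gm_before, 4))
print(round(gm_after, 4))
```

After the adjustment, the entries of `gm_after` agree up to floating-point error, which matches the statement that the geometric means of each tag become equivalent.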