calculate_phats: Calculate phat values for an input file of assay values
In dereksonderegger/BurkPx: Burkholderia Assay


View source: R/calculate_phats.R

Calculate phat values for an input file of assay values

calculate_phats(data_in, file_out = NULL, Species = "Human",
  Trained_on = "Full", Isotype = c("IgG", "IgM", "IgGM"),
  Method = "LASSO", average_replicates = FALSE,
  simplify_model_name = TRUE)

`data_in`	The input data set. This can be either a data frame or a character string giving the name of an input file. A file name must either refer to a file in the current working directory or be the full path to the file. The file may be a .csv, .xls, or .xlsx file. See below for the format of the input file or input data frame.
`file_out`	The name of the output .csv file that contains all of the phat values. If it isn't given, the default is to return the output as a data frame.
`Species`	The species we want to predict for. Accepted values are "Human" or "NHP". This defaults to "Human".
`Trained_on`	The data set on which the model was trained. Typically users will want the model trained on the full data set, but for cross-validation purposes, we might want to specify the model built using only the designated training data. Valid options are 'Full' and 'Training'. This defaults to 'Full'.
`Isotype`	Which isotypes we should report. Valid options are any combination of 'IgG', 'IgM' or 'IgGM'. By default, the procedure produces all of them. If the necessary data is not present, the results are NAs or NaNs.
`Method`	What model selection method was used. Valid options are 'LASSO', 'Ridge' or 'HCP1' or any combination of these.
`average_replicates`	If a serum sample has replicate observations, should we average the replicate values and produce a single phat, or should we treat them separately and produce a phat value for each replicate? The default (FALSE) is to not average and to produce a phat value for each replicate.
`simplify_model_name`	The column names for the resulting phat values are the names of the models that produced them. However, because the model names are often quite long (e.g. Human_Training_IgGM_LASSO), it is often desirable to reduce each model name to just the aspects that change, so that the resulting column names are simpler. This defaults to TRUE. If only one model is requested, the simplified column name is just "phat".
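
As a rough illustration of what averaging replicates means, the sketch below collapses replicate rows to one row per serum sample using base R. This is not the package's own code: the values are made up, and the assumption that averaging happens on the antibody values within each Serum/Isotype combination is ours.

```r
# Made-up replicate measurements for two serum samples (illustrative values only)
assay <- data.frame(
  Serum   = c("S1", "S1", "S2", "S2"),
  Isotype = c("IgG", "IgG", "IgG", "IgG"),
  LPSA    = c(1, 3, 2, 4),
  CPS     = c(10, 20, 30, 50)
)

# Assumption: replicates are rows that share the same Serum and Isotype.
# Average each antibody column within those groups.
averaged <- aggregate(cbind(LPSA, CPS) ~ Serum + Isotype,
                      data = assay, FUN = mean)
print(averaged)
```

With `average_replicates = FALSE` the four rows would instead each receive their own phat value.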

The input should be a spreadsheet file (.csv, .xls, or .xlsx) or a data frame with the same column requirements. The input should have a column 'Serum' which denotes the Serum ID. If the Serum ID values start with IgG or IgM, then we will use those as the Isotype. If the Serum ID values do not start with IgG or IgM, then there must be an 'Isotype' column that contains that information. The remaining columns are the antibody values, and their names should be some subset of the antibody names listed below.
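
The Serum ID rule can be pictured with a small base R sketch. It only mimics the behaviour described above and is not the package's implementation; the Serum IDs and the `isotype_from_serum` helper are hypothetical.

```r
# Hypothetical Serum IDs: two carry the isotype as a prefix, one does not
serum <- c("IgG_001", "IgM_001", "S002")

# Hypothetical helper: read the isotype off the start of the Serum ID,
# returning NA when the ID starts with neither IgG nor IgM
isotype_from_serum <- function(ids) {
  ifelse(grepl("^IgG", ids), "IgG",
         ifelse(grepl("^IgM", ids), "IgM", NA_character_))
}

isotype <- isotype_from_serum(serum)
print(isotype)

# An NA here means the input would need an explicit 'Isotype' column
needs_isotype_column <- anyNA(isotype)
```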

The following antibodies can be used: BPSL2096_AhpC, BPSL1404_ClpX, BPSS0476_GroS, BPSS0477_GroEL2, BPSS0135, BPSL1743_Arg, BPSL2827_DNAK, BPSL3222_rpIL, MSHR5855.WCL, BPSL1201_IMPS, BPSL3396_AtpD, BPSS0530, BPSL2522_OmpA, BPSS1850, LPSA, LPSB, CPS, BPSS1769_NADH, BPSS1652, BPSL2697_GroEL, BPSS1498_HCP1.B
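
Before calling the function it can help to check the column names of an input against this list. The sketch below does that with base R; the `input_cols` vector, including the deliberately misspelled `lpsa_typo`, is a made-up example.

```r
# Antibody names copied from the list above
known_antibodies <- c(
  "BPSL2096_AhpC", "BPSL1404_ClpX", "BPSS0476_GroS", "BPSS0477_GroEL2",
  "BPSS0135", "BPSL1743_Arg", "BPSL2827_DNAK", "BPSL3222_rpIL",
  "MSHR5855.WCL", "BPSL1201_IMPS", "BPSL3396_AtpD", "BPSS0530",
  "BPSL2522_OmpA", "BPSS1850", "LPSA", "LPSB", "CPS",
  "BPSS1769_NADH", "BPSS1652", "BPSL2697_GroEL", "BPSS1498_HCP1.B"
)

# Hypothetical column names of an input data frame
input_cols <- c("Serum", "Isotype", "LPSA", "LPSB", "CPS", "lpsa_typo")

# Columns that match a known antibody name
antibody_cols <- intersect(input_cols, known_antibodies)

# Columns that are neither an ID column nor a known antibody
unrecognised <- setdiff(input_cols, c("Serum", "Isotype", known_antibodies))

print(antibody_cols)
print(unrecognised)
```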

If a particular model does not use an antibody (e.g. the HCP1 models only use HCP1 values), then the input data could be missing all of the other columns and the function will still work.
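
A minimal sketch of such an HCP1-only input follows. The Serum IDs and antibody values are made up, and the call is guarded so that it only runs when BurkPx is installed; we assume `calculate_phats` is exported by the package and make no claim about the values it returns.

```r
# Made-up input: Serum IDs start with IgG, so no 'Isotype' column is needed,
# and only the HCP1 antibody column is supplied
hcp1_only <- data.frame(
  Serum           = c("IgG_S1", "IgG_S2"),
  BPSS1498_HCP1.B = c(0.5, 1.5)
)

# Run the prediction only if the package is available
if (requireNamespace("BurkPx", quietly = TRUE)) {
  phats <- BurkPx::calculate_phats(hcp1_only,
                                   Method  = "HCP1",
                                   Isotype = "IgG")
  print(phats)
}
```

Because `file_out` is left at its default of NULL, the result comes back as a data frame rather than being written to a .csv file.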

dereksonderegger/BurkPx documentation built on Aug. 14, 2019, 8:04 p.m.