R/grofit.R

#' grofit: Extract physiological parameters from kinetic OD600 data.
#' Copyright (c) 2019. Kaleido Biosciences. All Rights Reserved
#'
#' grofit takes kinetic OD600 data in the tidy format described under \code{gropro_output} and fits a smoothing spline to each sample (well) to extract physiological parameters.
#' Importantly, this function drops any NA OD600 values before fitting the spline.
#' NA values occasionally occur when measuring pH, either from an erratic read or when the pH is outside the standard curve; any NA OD600 readings are dropped in the same way here.
#' The column \code{percent_NA_od600} contains the percent of time points that were NA for a given well.
#' @param gropro_output A tidy data frame containing, at a minimum, columns named exactly Sample.ID, Time, and OD600. Any other columns are treated as per-sample metadata, should be constant within each Sample.ID, and are joined back onto the output.
#' @return A tidy data frame of features extracted from a smoothing spline fit to each sample, joined to the sample metadata. The data frame also contains metrics that can be used to assess model fit.
#' Physiological features:
#' \describe{
#'  \item{\code{starting_od600}}{The starting od600.}
#'  \item{\code{od600_lag_length}}{The length of the calculated lag phase.
#'  Calculated as the time at which the tangent line at the point of maximum growth rate intersects the starting od600.}
#'  \item{\code{od600_max_gr}}{The maximum growth rate observed, calculated as the maximum of the first derivative of the spline fit to OD600.}
#'  \item{\code{max_od600}}{The maximum od600 of the spline fit.}
#'  \item{\code{difference_between_max_and_end_od600}}{The difference between the maximum and end od600. Higher values may indicate a "death phase", or alternatively that the cells are getting smaller.}
#'  \item{\code{auc_od600}}{The area under the OD600 curve, calculated with the trapezoidal rule on the fitted values from \code{smooth.spline}.}
#' }
#' Model fit:
#' \describe{
#'  \item{\code{percent_NA_od600}}{The percent of time points that were NA for a given well when fitting the spline to the kinetic od600 data.}
#'  \item{\code{rmse_od600}}{The root-mean-square deviation (RMSE) of the spline fit for od600.}
#' }
#' @export
#' @importFrom magrittr %>%
#' @examples
#' ### grofit ###
#' \dontrun{grofit_output = grofit(gropro_output)}
grofit = function(gropro_output){

    # Kinetic measurements used for the spline fits
    data = dplyr::select(gropro_output, Sample.ID, Time, OD600)

    # Per-sample metadata: every column except the kinetic measurements.
    # Sample.ID is coerced to character so it matches the feature table below;
    # otherwise the final join can fail when the IDs are numeric.
    metadata = dplyr::select(gropro_output, -Time, -OD600) %>%
        dplyr::distinct() %>%
        dplyr::mutate(Sample.ID = as.character(Sample.ID))

    # One feature table per sample, bound together once after the loop
    results = list()

    for(i in unique(data$Sample.ID)){

        input = data %>%
            dplyr::filter(Sample.ID == i)

        # Percent of time points with a missing OD600 reading for this well
        percent_NA_od600 = mean(is.na(input$OD600)) * 100

        # Filtering out all of the OD600 values that returned NA
        input_od600 = input %>%
            dplyr::filter(!is.na(OD600))

        # Extract the spline-based features and tag them with the sample ID
        features = od600_features(input_od600) %>%
            dplyr::mutate(Sample.ID = as.character(i),
                          percent_NA_od600 = percent_NA_od600)

        results[[as.character(i)]] = features
    }

    output = dplyr::bind_rows(results) %>%
        dplyr::select(Sample.ID,
                      starting_od600,
                      od600_lag_length,
                      od600_max_gr,
                      max_od600,
                      difference_between_max_and_end_od600,
                      auc_od600,
                      percent_NA_od600,
                      rmse_od600)

    final = dplyr::inner_join(metadata, output, by = "Sample.ID")
    return(final)
}
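The features documented for grofit can be written compactly. The symbols here are introduced for illustration only: for one well, $y_i$ are the non-NA OD600 readings at times $t_i$ ($i = 1, \dots, n$), $\hat{y}(t)$ is the smoothing-spline fit, $y_0$ is the starting od600, and $N$ and $N_{\text{NA}}$ are the total and missing time-point counts. The lag expression assumes Time is measured from 0, and the RMSE line assumes the usual definition over the non-NA points; neither is confirmed by the code in this file.

$$
\mu = \max_t \hat{y}'(t), \qquad t^* = \arg\max_t \hat{y}'(t)
$$

$$
\hat{y}(t^*) + \mu\,(\lambda - t^*) = y_0
\quad\Longrightarrow\quad
\lambda = t^* - \frac{\hat{y}(t^*) - y_0}{\mu}
$$

$$
\text{AUC} \approx \sum_{i=1}^{n-1} (t_{i+1} - t_i)\,\frac{\hat{y}(t_i) + \hat{y}(t_{i+1})}{2},
\qquad
\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}(t_i)\bigr)^2},
\qquad
p_{\text{NA}} = 100\,\frac{N_{\text{NA}}}{N}
$$

Here $\mu$ corresponds to od600_max_gr, $\lambda$ to od600_lag_length, and $p_{\text{NA}}$ to percent_NA_od600 as computed in the loop above.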
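The per-well feature extraction is delegated to od600_features(), whose code is not part of this file. A minimal sketch of how the documented features could be computed for a single well is shown below, assuming stats::smooth.spline() with its default GCV smoothing and evaluating the derivative only at the observed time points. The helper name sketch_od600_features() is hypothetical, and choices such as using the fitted (rather than observed) first value as starting_od600 are assumptions, not the package's method.

```r
# Hypothetical helper for illustration only; NOT phgrofit's internal
# od600_features(). Assumes stats::smooth.spline() defaults (GCV smoothing).
sketch_od600_features = function(time, od600) {
    # Drop missing readings before fitting, mirroring grofit()
    keep = !is.na(od600)
    time = time[keep]
    od600 = od600[keep]

    # Sort by time so consecutive differences are meaningful
    ord = order(time)
    time = time[ord]
    od600 = od600[ord]

    # Fit the spline, then get fitted values and first derivatives
    fit = stats::smooth.spline(time, od600)
    fitted_od = stats::predict(fit, time)$y
    slope = stats::predict(fit, time, deriv = 1)$y

    # Maximum growth rate: largest first derivative at the observed times
    i_max = which.max(slope)
    max_gr = slope[i_max]

    # Assumption: starting od600 is the fitted value at the first time point
    starting_od = fitted_od[1]

    # Lag: where the tangent at the max growth rate meets the starting od600,
    # measured from the first time point (the crossing time if Time starts at 0)
    lag_length = time[i_max] - (fitted_od[i_max] - starting_od) / max_gr - time[1]

    # Trapezoidal rule on the fitted values
    n = length(time)
    auc = sum(diff(time) * (fitted_od[-1] + fitted_od[-n]) / 2)

    data.frame(
        starting_od600 = starting_od,
        od600_lag_length = lag_length,
        od600_max_gr = max_gr,
        max_od600 = max(fitted_od),
        difference_between_max_and_end_od600 = max(fitted_od) - fitted_od[n],
        auc_od600 = auc,
        rmse_od600 = sqrt(mean((od600 - fitted_od)^2))
    )
}

# Usage on a synthetic, noise-free logistic growth curve
t = seq(0, 24, by = 0.25)
od = 0.1 + 0.9 / (1 + exp(-0.8 * (t - 8)))
sketch_od600_features(t, od)
```

For this logistic curve the analytic maximum growth rate is 0.9 × 0.8 / 4 = 0.18 at t = 8, so od600_max_gr should land close to 0.18. Evaluating the derivative on the observed grid keeps the sketch simple; a finer prediction grid would locate the maximum more precisely on sparse data.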
Kaleido-Biosciences/phgrofit documentation built on Feb. 8, 2022, 5:16 a.m.