R/evaluatePopulationSignificance.R

Defines functions evaluatePopulationSignficance

Documented in evaluatePopulationSignficance

#' Determine the significance of a population of PrixFixeNetworks
#'
#' Given a population (i.e., a list) of PrixFixeNetworks, calculate the mean
#' network density across the population and compare it to a null distribution
#' of random network densities.
#'
#' @section Calculating the null model distribution:
#' To calculate the null model, we sample genes from the full co-function
#' network to generate random loci. To mitigate node-degree effects, we first
#' bin the co-function network into categories and then replace each original
#' locus gene with a random choice from the same bin.
#'
#' @section Calculating the p-value:
#' The p-value is defined as the fraction of null model densities that are
#' strictly greater than the mean density of the true population.
#'
#' @param pf_data a \code{PFData} object generated by \code{PFDataLoader}
#' @param population a list of \code{PrixFixeNetwork} objects
#' @param num_trials the number of random trials used to generate the null
#'   distribution (default 10)
#' @return the empirical p-value, a number between 0 and 1
#'
#' @examples
#' \dontrun{
#' # load example PFData (FA genes)
#' data(PF_FanconiAnemia)
#' # generate population of subnetworks
#' population <- initializePopulation(PF_FanconiAnemia, population_size=100, "true_members")
#' # evaluate population significance
#' # (the function name is spelled as it is defined in this file)
#' population_significance <- evaluatePopulationSignficance(PF_FanconiAnemia, population, 100)
#' }
#'
evaluatePopulationSignficance <- function(pf_data, population,
                                          num_trials = 10) {
  # Calculate the p-value for population significance.
  mean_density <- getNetworkDensity(population, return_mean = TRUE)
  dopar_progress_bar <- make_progress_bar(num_trials)

  # Create a null_population of size num_trials
  null_population <- initializePopulation(
    pf_data = pf_data,
    population_size = num_trials,
    members = "null_members")
  # Calculate all densities to create an empirical null distribution of network
  # densities
  null_densities <- getNetworkDensity(null_population, return_mean = FALSE)

  # ****************************************************************************
  # NOTE: Original version of algorithm. Incorrect implementation
  #
  # For each trial, initialize a random population of size(population) with null
  # members. Then return the average network density for that trial.
  #
  # null_densities <- foreach(i=1:num_trials,
  #                           .combine = dopar_progress_bar()) %dopar% {
  #                             null_population <- initializePopulation(
  #                               pf_data = pf_data,
  #                               population_size = length(population),
  #                               members = "null_members")
  #                             return(getNetworkDensity(null_population,
  #                                                      return_mean = TRUE))
  #                             }
  # ****************************************************************************

  # Calculate the fraction of null networks whose density is higher than the
  # mean density of the test population
  n_outperform <- sum(null_densities > mean_density)
  p_value <- n_outperform / length(null_densities)
  return(p_value)
}
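
The two steps described in the documentation above (degree-matched replacement of locus genes and the empirical p-value) can be sketched in isolation. This is a minimal illustration and not the package's implementation: `empirical_p_value`, `sample_degree_matched`, the named `degrees` vector, and the quantile-based binning are all hypothetical names and assumptions introduced here.

```r
# Hypothetical helper: empirical p-value as the fraction of null densities
# that are strictly greater than the observed mean density.
empirical_p_value <- function(null_densities, mean_density) {
  mean(null_densities > mean_density)
}

# Hypothetical helper: replace each gene with a random gene drawn from the
# same degree bin. `degrees` is a named numeric vector (gene name -> node
# degree). Quantile-based bins are an assumption of this sketch.
sample_degree_matched <- function(genes, degrees, n_bins = 4) {
  breaks <- unique(quantile(degrees, probs = seq(0, 1, length.out = n_bins + 1)))
  bins <- as.integer(cut(degrees, breaks = breaks, include.lowest = TRUE))
  names(bins) <- names(degrees)
  vapply(genes, function(g) {
    candidates <- names(degrees)[bins == bins[[g]]]
    # sample.int avoids the sample() pitfall when only one candidate exists
    candidates[sample.int(length(candidates), 1)]
  }, character(1), USE.NAMES = FALSE)
}

# Toy data: two of the four null densities exceed the observed mean of 0.25
empirical_p_value(c(0.1, 0.2, 0.3, 0.4), 0.25)  # 0.5

# Toy degrees: genes a-d have low degree, genes e-h have high degree
toy_degrees <- c(a = 1, b = 2, c = 3, d = 4, e = 10, f = 11, g = 12, h = 13)
set.seed(1)
sample_degree_matched(c("a", "e"), toy_degrees, n_bins = 2)
```

With two bins, a replacement for `"a"` is always drawn from genes `a` to `d` and a replacement for `"e"` from genes `e` to `h`, so the degree profile of the random locus stays comparable to the original. A gene may be replaced by itself, which matches the "random choice from the same bin" description.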
princeew/PFFindR documentation built on Dec. 31, 2020, 2:06 a.m.