R/compute_control.R

#' Generate a negative control.
#'
#' Computes correlations for all the randomized windows against a window of choice.
#'
#' \code{compute_control} only correlates the set of randomized windows to one of the windows
#' into which the data set was divided. Therefore, it needs to be called once for each window.
#'
#' The correlation vector for each randomized window is generated by iterating over its genes
#' and correlating each of them with every gene in the selected window. As a result, each vector
#' holds one correlation value per pair of genes (genes in the randomized window times genes in
#' the selected window), and the output has as many elements as randomized versions of the top
#' window have been computed. Consequently, both the top window size and the number of
#' randomizations affect the computation time.
#'
#' The correlation method argument is passed on to the \code{cor} function in the \code{stats}
#' package, so the same options that function provides are available (\code{"pearson"},
#' \code{"kendall"} and \code{"spearman"}). However, it is advisable to use Pearson correlation,
#' since it offers the best balance of result quality and computational efficiency.
#'
#' @param randomized_windows A list containing as many elements as randomized top windows
#' have been computed.
#'
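#' @param dataset A data frame containing the binned data. Besides the expression values, it is
#' expected to include the \code{mean}, \code{CV}, \code{stdev} and \code{bin} columns, which are
#' dropped before correlating.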
#'
#' @param window_number An integer indicating the bin for which the control is being computed.
#'
#' @param cor_method A string indicating the type of correlation to use. It is passed to the
#' \code{method} argument of \code{cor}.
#'
#' @return A list containing the negative control, with one element per randomized window. Each
#' element is a numeric vector of the correlations between that randomized window and the chosen
#' window in the data.

compute_control <- function(randomized_windows, dataset, window_number, cor_method){

    # select a window from the actual data and extract only expression values
    selected_window <- subset(dataset, dataset$bin == window_number)
    selected_window <- select(selected_window, -mean, -CV, -stdev, -bin)

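    # genes in columns, so that cor() correlates against every gene in the window
    window_matrix <- t(selected_window)

    # pre-allocate one slot per randomized window
    all_correlations <- vector("list", length(randomized_windows))

    for (i in seq_along(randomized_windows)){
        # extract each randomized window as a matrix (genes in rows)
        selected_random <- randomized_windows[[i]] %>% as.matrix()

        # pre-allocate one slot per gene in the randomized window
        window_correlations <- vector("list", nrow(selected_random))

        for (j in seq_len(nrow(selected_random))){

            # compute correlation for each gene in the selected random window
            # against all genes in the actual data window
            window_correlations[[j]] <- cor(selected_random[j,], window_matrix,
                                            method = cor_method) %>% as.vector()
        }
        # concatenate the per-gene vectors into one vector for this randomized window
        all_correlations[[i]] <- do.call(c, window_correlations)
    }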
    return(all_correlations)
}
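
The shape of the output can be checked on toy data. The following is a minimal base-R sketch rather than the package's own code: `control_sketch` is a hypothetical re-implementation that replaces the inner gene-by-gene loop with one `cor()` call per randomized window, and the toy `dataset` and the per-gene shuffling used to build `randomized_windows` are assumptions made only for illustration.

```r
set.seed(1)

# Toy data (hypothetical): 6 genes x 5 cells, split into 2 bins of 3 genes,
# plus the summary columns that are dropped before correlating.
expr <- matrix(rnorm(6 * 5), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("cell", 1:5)))
dataset <- data.frame(expr,
                      mean = rowMeans(expr),
                      stdev = apply(expr, 1, sd),
                      bin = rep(1:2, each = 3))
dataset$CV <- dataset$stdev / dataset$mean

# Assumption: a randomized window is the top window (bin 1) with each gene's
# values shuffled across cells. The package may randomize differently.
top_window <- as.matrix(dataset[dataset$bin == 1, colnames(expr)])
randomized_windows <- lapply(1:2, function(k) t(apply(top_window, 1, sample)))

# Hypothetical sketch of the same computation using only base R and stats.
control_sketch <- function(randomized_windows, dataset, window_number,
                           cor_method = "pearson") {
  drop_cols <- c("mean", "CV", "stdev", "bin")
  selected <- dataset[dataset$bin == window_number,
                      setdiff(names(dataset), drop_cols), drop = FALSE]
  selected <- t(as.matrix(selected))  # cells in rows, window genes in columns

  lapply(randomized_windows, function(rw) {
    # cor() correlates columns of x with columns of y:
    # rows of the result are random genes, columns are window genes
    cors <- cor(t(as.matrix(rw)), selected, method = cor_method)
    # flatten row by row to match the order of the gene-by-gene loop
    as.vector(t(cors))
  })
}

control <- control_sketch(randomized_windows, dataset, window_number = 2)
length(control)   # one element per randomized window
lengths(control)  # 3 random genes x 3 window genes in each element
```

Computing the whole correlation matrix at once avoids the inner loop, at the cost of holding one matrix per randomized window in memory while it is flattened.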
angelesarzalluz/scfilters documentation built on May 10, 2019, 11:46 a.m.