rsRankingIndex: Get indices for top scored region sets

rsRankingIndex R Documentation

Get indices for top scored region sets

Description

For each target variable, get the original region set indices, ordered by that variable's rsScores ranking. The original index is a region set's position in the 'GRList' parameter given to 'aggregateSignalGRList', which is also that region set's row index in the COCOA output. In each column of this function's output, the first entry is the original index of the region set ranked first for that target variable, the second entry is the original index of the region set ranked second, and so on. This makes it easier to select the top region sets for further analysis or to sort the results. Region set scores are sorted in decreasing or increasing order according to the 'decreasing' parameter.
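As a rough illustration of the ranking index described above, the following sketch uses only base R on a made-up scores table. The object names (toyScores, toyRankInd) and the values are hypothetical, and order() is used here as a stand-in to show the idea; it is not taken from the COCOA implementation.

```r
# Hypothetical scores for three region sets (values are made up, not COCOA output)
toyScores <- data.frame(PC1 = c(0.2, 0.9, 0.5),
                        PC2 = c(0.7, 0.1, 0.4))

# For each column, order() returns the original row indices,
# arranged from the highest score to the lowest
toyRankInd <- as.data.frame(lapply(toyScores, order, decreasing = TRUE))
toyRankInd

# Reorder the scores table by the PC1 ranking
toyScores[toyRankInd$PC1, ]
```

In this toy case the first entry of the PC1 column is 2, because the second region set has the highest PC1 score.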

Usage

rsRankingIndex(rsScores, signalCol, decreasing = TRUE, newColName = signalCol)

Arguments

rsScores

data.frame. Region set scores, as output by the 'aggregateSignalGRList' function. Each row is a region set, with one column for each sample variable of interest (e.g. PC or sample phenotype). It can also have columns with information on the overlap between the region set and the epigenetic data. Rows should be in the same order as the region sets in GRList (the list of region sets used to create rsScores).

signalCol

Character vector. The names of the sample variables of interest/target variables (e.g. PCs or sample phenotypes).

These are the columns in rsScores for which you want the indices of the original region sets.

decreasing

Logical. Whether to sort rsScores in decreasing or increasing order.

newColName

Character. The names of the columns of the output data.frame. The order should correspond to the order of the input columns given by signalCol.

Value

A data.frame with one column for each 'signalCol'. Column names are given by 'signalCol' or 'newColName' (if used). Each column contains the original indices of the region sets used to create rsScores, sorted by score for that target variable (order given by the 'decreasing' parameter). Region sets with a score of NA are counted as having the lowest scores, and their indices are placed at the bottom of the returned data.frame (na.last=TRUE in sorting).
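The NA handling mentioned above can be illustrated with base R's order(), which accepts the same na.last=TRUE setting. This is only a sketch with a hypothetical score vector; it shows how base R places NA values and makes no claim about the internals of rsRankingIndex.

```r
# Hypothetical scores for four region sets; the second one has no score
toyScore <- c(0.3, NA, 0.8, 0.1)

# Decreasing order: highest score first, NA placed last
decIndex <- order(toyScore, decreasing = TRUE, na.last = TRUE)
decIndex

# Increasing order: lowest score first, NA still placed last
incIndex <- order(toyScore, decreasing = FALSE, na.last = TRUE)
incIndex
```

With na.last=TRUE, base R puts the index of the NA entry at the end in both sort directions.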

Examples

data("rsScores")
rsRankInd = rsRankingIndex(rsScores=rsScores, 
                           signalCol=c("PC1", "PC2"))
# region sets sorted by score for PC1
rsScores[rsRankInd$PC1, ]
# region sets sorted by score for PC2
rsScores[rsRankInd$PC2, ]
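
A common follow-up is to keep only the top-ranked region sets for one target variable. The sketch below is self-contained and uses hypothetical stand-ins: toyScores plays the role of rsScores, pc1Index plays the role of one column of the ranking index, and topN is an arbitrary cutoff chosen for illustration.

```r
# Hypothetical scores table standing in for rsScores (values are made up)
toyScores <- data.frame(rsName = c("setA", "setB", "setC", "setD"),
                        PC1 = c(0.15, 0.80, 0.33, 0.42),
                        stringsAsFactors = FALSE)

# Stand-in for one column of the ranking index:
# original row indices, highest score first
pc1Index <- order(toyScores$PC1, decreasing = TRUE, na.last = TRUE)

# Keep the two top-ranked region sets for PC1
topN <- 2
topSets <- toyScores[head(pc1Index, topN), ]
topSets
```

Because the index holds original row positions, the same subset of positions could also be used to pull the matching region sets out of the list that produced the scores.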


databio/COCOA documentation built on Jan. 19, 2025, 8:28 a.m.