# familyRank: Feature Ranking with Family Rank

From the FamilyRank package: Algorithm for Ranking Predictors Using Graphical Domain Knowledge

## Description

Ranks features by incorporating graphical knowledge to weight empirical feature scores. This is the main function of the FamilyRank package.

## Usage

```r
familyRank(scores, graph, d = 0.5, n.rank = min(length(scores), 1000),
           n.families = min(n.rank, 1000), tol = 0.001)
```

## Arguments

| Argument | Description |
|---|---|
| `scores` | A numeric vector of empirical feature scores. Higher scores should indicate a more predictive feature. |
| `graph` | A matrix or data frame representation of a graph object. |
| `d` | Damping factor. |
| `n.rank` | Number of features to rank. |
| `n.families` | Number of families to grow. |
| `tol` | Tolerance. |

## Details

The `scores` vector should be generated using an existing statistical method. Higher scores should correspond to more predictive features. It is up to the user to adjust accordingly. For example, if the user wishes to use p-values as the empirical score, the user should first adjust the p-values, perhaps by subtracting all p-values from 1, so that a higher value corresponds to a more predictive feature.
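As a minimal sketch of the p-value adjustment described above, the snippet below turns a vector of p-values into scores by subtracting each from 1. The variable `p.values` and its contents are made up for illustration and are not part of the package.

```r
# Hypothetical p-values for five features (illustrative values only)
p.values <- c(0.001, 0.20, 0.04, 0.75, 0.01)

# Subtract from 1 so that smaller p-values become larger scores
scores <- 1 - p.values

# Feature indices from most to least predictive under the adjusted scores
order(scores, decreasing = TRUE)
```

After the adjustment, the feature with the smallest p-value has the highest score, which is the orientation `familyRank` expects.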

The `graph` must be supplied in matrix form, where the first two columns represent graph nodes and the third column represents the edge weight between those nodes. Each node must be identified by the index of the corresponding feature in the `scores` vector. For example, a node corresponding to the first value of the `scores` vector should be indicated by a 1 in the `graph` object, the second by a 2, etc. It is not necessary that every feature in the `scores` vector appear in the `graph`. Missing pairwise interactions will be considered to have interaction scores of 0.
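When edges are recorded by feature name rather than by index, they have to be translated into positions in the scores vector first. The following is one possible sketch using base R only; `feature.names` and `edges` are hypothetical objects invented for this example.

```r
# Hypothetical feature names, in the same order as the scores vector
feature.names <- c("geneA", "geneB", "geneC", "geneD")
scores <- c(0.6, 0.2, 0.9, 0.1)

# Hypothetical edge list keyed by feature name, with edge weights
edges <- data.frame(
  from   = c("geneA", "geneA"),
  to     = c("geneB", "geneC"),
  weight = c(0.4, 0.8),
  stringsAsFactors = FALSE
)

# Translate names to positions in the scores vector:
# columns 1 and 2 hold node indices, column 3 holds the edge weight
graph <- cbind(
  match(edges$from, feature.names),
  match(edges$to, feature.names),
  edges$weight
)

# Every node index must point at an existing score
stopifnot(!anyNA(graph[, 1:2]), all(graph[, 1:2] <= length(scores)))
graph
```

Here `geneD` has a score but no edges, which is allowed: its missing interactions are treated as having interaction scores of 0.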

The damping factor, `d`, represents the percentage of weight given to the interaction scores. The damping factor must be between 0 and 1. Higher values give more weight to the interaction score while lower values give more weight to the empirical score.
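For intuition about how `d` trades off the two sources of information, the sketch below blends two numbers with a simple convex combination. This is an assumption made purely for illustration: the function `blend` and both input values are hypothetical, and the package's actual weighting rule may differ.

```r
# Illustrative only: a convex combination of two scores.
# This is an assumption for intuition, not FamilyRank's actual update rule.
blend <- function(empirical.score, interaction.score, d) {
  stopifnot(d >= 0, d <= 1)
  (1 - d) * empirical.score + d * interaction.score
}

empirical.score   <- 0.9  # hypothetical empirical score
interaction.score <- 0.4  # hypothetical interaction score

blend(empirical.score, interaction.score, d = 0)    # all weight on the empirical score
blend(empirical.score, interaction.score, d = 0.5)  # equal weight on both
blend(empirical.score, interaction.score, d = 1)    # all weight on the interaction score
```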

The value for `n.rank` must be less than or equal to the number of scored features. The algorithm will include only the top `n.rank` features in the ranking process (i.e., the `n.rank` features with the highest values in the `scores` vector will be used to grow families). Higher values of `n.rank` require longer compute times.

The value for `n.families` must be less than or equal to the value of `n.rank`. This is the number of families the algorithm will grow. If `n.families` is less than `n.rank`, the algorithm will initiate families using the `n.families` highest scoring features. Higher values of `n.families` require longer compute times.
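The constraints on `d`, `n.rank`, and `n.families` can be checked before calling `familyRank`. The helper below, `checkRankArgs`, is a hypothetical function written for this example and is not part of the package. It validates the arguments and reports which features would be ranked and which would initiate families, following the description above.

```r
# Hypothetical helper (not part of FamilyRank) that checks the documented constraints
checkRankArgs <- function(scores, d = 0.5,
                          n.rank = min(length(scores), 1000),
                          n.families = min(n.rank, 1000)) {
  stopifnot(
    is.numeric(scores),
    d >= 0, d <= 1,
    n.rank <= length(scores),
    n.families <= n.rank
  )
  # Indices of the n.rank highest-scoring features
  top <- order(scores, decreasing = TRUE)[seq_len(n.rank)]
  # The n.families highest-scoring features among those
  list(ranked = top, seeds = top[seq_len(n.families)])
}

scores <- c(0.6, 0.2, 0.9, 0.1, 0.7)
checkRankArgs(scores, n.rank = 3, n.families = 2)
```

With these toy scores, features 3, 5, and 1 enter the ranking, and features 3 and 5 would be the ones used to initiate families.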

The tolerance variable, `tol`, tells the algorithm when to stop growing a family. Features are added to families until the weighted score is less than the tolerance level, or until all features have been added.

## Value

Returns a vector of the weighted feature scores.

## Author(s)

Michelle Saul

## Examples

```r
library(FamilyRank)

# Toy Example
scores <- c(.6, .2, .9)
graph <- cbind(c(1, 1), c(2, 3), c(.4, .8))
familyRank(scores = scores, graph = graph, d = .5)

# Simulate data set
# 100 samples
# 10000 features
# Features 1 through 15 perfectly define response
# All other features are random noise
simulatedData <- createData(n.case = 50, n.control = 50,
                            mean.upper = 13, mean.lower = 5,
                            sd.upper = 1, sd.lower = 1,
                            n.features = 10000,
                            subtype1.feats = 1:5,
                            subtype2.feats = 6:10,
                            subtype3.feats = 11:15)
x <- simulatedData$x
y <- simulatedData$y
graph <- simulatedData$graph

# Score simulated features using absolute difference in group means
scores <- apply(x, 2, function(col){
  splt <- split(col, y)
  group.means <- unlist(lapply(splt, mean))
  score <- abs(diff(group.means))
  names(score) <- NULL
  return(score)
})

# Display top 15 features using empirical score
order(scores, decreasing = TRUE)[1:15]

# Rank scores using familyRank
scores.fr <- familyRank(scores = scores, graph = graph, d = .5)

# Display top 15 features using empirical scores with Family Rank
order(scores.fr, decreasing = TRUE)[1:15]
```