swGsea | R Documentation |
Performs site-weighted gene set enrichment analysis, or standard GSEA when the likelihood/weight columns in input_df are 1 or 0, p=1, q=1, and thresh_type="val".
swGsea(
input_df,
thresh_type = "percentile",
thresh = 0.9,
thresh_action = "exclude",
min_set_size = 10,
max_set_size = 500,
max_score = "max",
min_score = "min",
psuedocount = 0.001,
perms = 1000,
p = 1,
q = 1,
nThreads = 1,
rng_seed = 1,
fork = FALSE
)
input_df |
A data frame in which the first column is the name of the item of interest (gene, protein, phosphosite, etc.), the second is the correlation of that item with the phenotype (typically the log ratio of expression for phenotype vs. normal), and the remaining columns are the scores for the likelihood that the item belongs in each set (one column per set). |
thresh_type |
The type of |
thresh |
Depends on |
thresh_action |
Either "include", "exclude" (default), or "adjust"; this specifies how to treat each set if it doesn't contain the minimum number of items or contains all of the items. This option cannot be used with predefined lists of items in sets (if the number of items in a given set doesn't meet the requirements, that set will be skipped). |
min_set_size, max_set_size |
The minimum/maximum number of items each set needs for the analysis to proceed. |
max_score, min_score |
An optional numeric vector of maximum/minimum boundaries used to clip the scores for each set. |
psuedocount |
Pseudocount (pc) is used for rescaling set scores:
|
perms |
The number of permutations. |
p |
The exponential scaling factor of the phenotype score (second column in
|
q |
The exponential scaling factor of the likelihood score (weights). |
nThreads |
The number of threads to use in calculating permutations. |
rng_seed |
Random seed. |
fork |
A boolean. Whether to pass "fork" to |
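As a rough illustration of the input_df layout described in the arguments above, the following R sketch builds a small toy data frame with binary set weights. The item names, the set names (`setA`, `setB`), the scores, and the wrapper `run_standard_gsea` are all hypothetical. The wrapper assumes that swGsea is exported by the WebGestaltR package; it is only defined here, not called, so the block runs without that package installed.

```r
# Toy input: 40 hypothetical items with made-up scores.
set.seed(1)
n_items <- 40
input_df <- data.frame(
  item  = paste0("gene", seq_len(n_items)),   # column 1: name of the item of interest
  score = rnorm(n_items),                      # column 2: correlation with phenotype (e.g. log ratio)
  setA  = rep(c(1, 0), each = n_items / 2),    # column 3 onward: one weight column per set
  setB  = rep(c(0, 1), times = n_items / 2),
  stringsAsFactors = FALSE
)

# Binary weights together with p = 1, q = 1 and thresh_type = "val" match the
# standard-GSEA configuration named in the description of this page.
# Hypothetical wrapper; assumes swGsea is available from WebGestaltR.
run_standard_gsea <- function(df, perms = 1000) {
  WebGestaltR::swGsea(df, thresh_type = "val", p = 1, q = 1, perms = perms)
}

str(input_df)
```

Each toy set holds 20 items, which sits between the default min_set_size of 10 and max_set_size of 500 shown in the usage.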
The formula for weighting is as follows:
\frac{s_{j}^{q}|r_{j}|^{p}}{\sum s^{q}|r|^{p}}
where r is the log ratio score, s is the likelihood score, and j is the index of the gene.
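The weighting formula can be sketched directly in R. The helper name `sw_weights` and the input values are made up for illustration, and the sum in the denominator is assumed to run over all items passed in; this is not the package's internal code.

```r
# Normalized weights: s_j^q * |r_j|^p / sum(s^q * |r|^p)
sw_weights <- function(r, s, p = 1, q = 1) {
  stopifnot(length(r) == length(s))
  raw <- s^q * abs(r)^p   # unnormalized weight per item
  raw / sum(raw)          # normalize so the weights sum to 1
}

r <- c(2.0, -1.0, 0.5, -0.5)   # made-up log ratio scores
s <- c(1.0, 0.5, 0.0, 1.0)     # made-up likelihood scores

w <- sw_weights(r, s)
print(w)
```

An item with a likelihood score of 0 gets a weight of 0, so it contributes nothing for that set regardless of its log ratio.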
A list of Enrichment_Results, Items_in_Set and Running_Sums.
Enrichment_Results: A data frame with gene sets as row names and columns "ES", "NES", "p_val", and "fdr".
Items_in_Set: A list of one-column data frames. Describes genes and their ranks in each set.
Running_Sums: Running sum scores along genes sorted by ranked scores, with gene sets as columns.
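To show how the returned list might be used, here is a minimal sketch that works on a placeholder object shaped like the documented return value. Every number in it is invented and is not real swGsea output. The layout of the Items_in_Set entries (item names as row names, a single `rank` column) and the 0.05 FDR cutoff are assumptions made for this example.

```r
# Placeholder mimicking the documented return structure (all values invented).
res <- list(
  Enrichment_Results = data.frame(
    ES = c(0.62, -0.35), NES = c(1.9, -1.1),
    p_val = c(0.002, 0.21), fdr = c(0.01, 0.30),
    row.names = c("setA", "setB")
  ),
  Items_in_Set = list(
    setA = data.frame(rank = c(1, 3), row.names = c("gene1", "gene3")),
    setB = data.frame(rank = c(2, 4), row.names = c("gene2", "gene4"))
  ),
  Running_Sums = data.frame(setA = c(0.5, 0.0, 0.62, 0.0),
                            setB = c(-0.35, 0.1, -0.2, 0.0))
)

# Keep the sets that pass an arbitrary FDR cutoff.
enr <- res$Enrichment_Results
sig <- enr[enr$fdr < 0.05, , drop = FALSE]
sig_sets <- rownames(sig)

# Look up the items belonging to each retained set.
sig_items <- lapply(res$Items_in_Set[sig_sets], rownames)
print(sig_sets)
```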
Eric Jaehnig