get_gene_weights: Get Gene Weights from Reference Data
In ratschlab/ssPATHS: ssPATHS: Single Sample PATHway Score


View source: R/ssPATHS.R

This method performs linear discriminant analysis on a reference dataset using a pre-defined set of genes related to a pathway of interest.
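As a rough illustration of the idea, the following sketch runs a generic two-class Fisher linear discriminant analysis on toy data using base R only. It is not the ssPATHS implementation: the gene names, the simulated data, and the unit-length scaling of the weights are all assumptions made for this example.

```r
# Generic two-class Fisher LDA on toy data; NOT the ssPATHS implementation.
set.seed(1)
n_per_class <- 20
genes <- c("GENE_A", "GENE_B", "GENE_C")  # hypothetical gene ids

# Toy expression matrix: rows are samples, columns are genes.
normal <- matrix(rnorm(n_per_class * 3), ncol = 3)
tumor  <- matrix(rnorm(n_per_class * 3), ncol = 3)
tumor[, 1] <- tumor[, 1] + 3  # GENE_A is shifted upward in class 1
x <- rbind(normal, tumor)
colnames(x) <- genes
y <- rep(c(0, 1), each = n_per_class)

# Class means and pooled within-class scatter matrix.
mu0 <- colMeans(x[y == 0, ])
mu1 <- colMeans(x[y == 1, ])
s_within <- crossprod(sweep(x[y == 0, ], 2, mu0)) +
  crossprod(sweep(x[y == 1, ], 2, mu1))

# Discriminant direction: one weight per gene, scaled to unit length.
w <- drop(solve(s_within, mu1 - mu0))
w <- w / sqrt(sum(w^2))

# Projecting each sample onto the direction gives a per-sample score.
scores <- drop(x %*% w)
```

In this sketch `w` plays the role of the gene weights and `scores` the role of the reference sample scores; the package's own normalization and weighting steps may differ.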

get_gene_weights(expression_se, gene_ids, unidirectional)

`expression_se`	A SummarizedExperiment object holding the reference samples. Rows are genes and columns are samples. The colData component must contain a `sample_id` column. The method includes a normalization step in which each sample is scaled across all genes in the SummarizedExperiment assay. For this scaling to be stable and consistent, we recommend that the assay contain, in addition to the genes in the pathway of interest, at least 500 genes that are consistently expressed across all samples.
`gene_ids`	This is a vector of strings, where each element is a `gene_id` in the pathway of interest. The `gene_id`s must be present in `rownames(expression_se)`.
`unidirectional`	A boolean, `default=TRUE`. Most gene sets are unidirectional, meaning that most of their genes increase or decrease together. If this is set to `TRUE`, the learned weights are clipped so that only the dominant direction is kept: the weights of genes pointing in the opposite direction are set to zero.
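The clipping described for `unidirectional` can be pictured with a small base R sketch. The helper `clip_to_dominant_direction` is hypothetical and not part of ssPATHS, and it assumes the dominant direction is the sign shared by the larger number of weights; the package may use a different criterion.

```r
# Hypothetical helper, not part of ssPATHS: keep only the weights whose sign
# matches the dominant direction and set all other weights to zero.
# Assumption: "dominant" means the sign carried by the most weights.
clip_to_dominant_direction <- function(weights) {
  n_pos <- sum(weights > 0)
  n_neg <- sum(weights < 0)
  if (n_pos >= n_neg) {
    weights[weights < 0] <- 0  # mostly positive: drop negative weights
  } else {
    weights[weights > 0] <- 0  # mostly negative: drop positive weights
  }
  weights
}

# Made-up weights for four hypothetical genes.
example_weights <- c(GENE_A = 0.8, GENE_B = 0.3, GENE_C = -0.2, GENE_D = 0.5)
clipped <- clip_to_dominant_direction(example_weights)
print(clipped)
```

Here three of the four weights are positive, so only the negative weight for `GENE_C` is zeroed and the others pass through unchanged.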

A list containing the gene weights and estimated scores of the reference samples.

`proj_vector_df`	A data frame containing the gene weights and gene ids.
`dca_proj`	A data frame containing the sample scores and sample ids.

Natalie R. Davidson

Steven C.H. Hoi, W. Liu, M.R. Lyu and W.Y. Ma (2006). Learning Distance Metrics with Contextual Constraints for Image Retrieval. Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR2006).

library(ssPATHS)
library(SummarizedExperiment)

data(tcga_expr_df)

# transform from data.frame to SummarizedExperiment
tcga_se <- SummarizedExperiment(t(tcga_expr_df[ , -(1:4)]),
                                colData=tcga_expr_df[ , 2:4])
colnames(tcga_se) <- tcga_expr_df$tcga_id
colData(tcga_se)$sample_id <- tcga_expr_df$tcga_id

# get the genes in the pathway of interest (here, hypoxia)
hypoxia_gene_ids <- get_hypoxia_genes()
hypoxia_gene_ids <- intersect(hypoxia_gene_ids, rownames(tcga_se))

# set up labels for classification: 0 = normal, 1 = not normal
colData(tcga_se)$Y <- ifelse(colData(tcga_se)$is_normal, 0, 1)

# now we can get the gene weights
res <- get_gene_weights(tcga_se, hypoxia_gene_ids, unidirectional=TRUE)
gene_weights_test <- res[[1]]  # gene weights
sample_scores <- res[[2]]      # scores of the reference samples