convert_gene_expression_to_pathway_features: Convert Gene Expression Data to Pathway-Level Features

View source: R/helpers.R

convert_gene_expression_to_pathway_features    R Documentation


Description

Transforms a gene expression matrix into per-sample pathway-level features suitable for machine learning. The function maps genes to their biological pathways, removes redundant pathways, and calculates a PathwayGeneScore for each pathway from the median expression of its genes and the pathway variance. Pathways mapped to only a single gene are kept or dropped according to keep_non_shared. Genes with no pathway mapping are retained as individual features in the final dataset.
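
The median step of this description can be illustrated in base R. This is a minimal sketch, not the package's implementation: it computes only the per-sample median of the genes in each pathway and does not reproduce the PathwayGeneScore, whose variance adjustment is not specified here. The object names expr, mapping, genes_by_pathway, and pathway_median are hypothetical.

```r
# Illustration only: per-sample median expression of the genes in each pathway.
# This is NOT the package's PathwayGeneScore.
expr <- data.frame(
  A2M    = c(3, 4, 3, 3),
  A4GALT = c(3, 4, 3, 4),
  A4GNT  = c(3, 4, 3, 3)
)
mapping <- data.frame(
  mappedSymbol = c("A4GNT", "A4GALT", "A2M"),
  mappedId     = c("GO:0000139", "GO:0000139", "GO:0001553"),
  stringsAsFactors = FALSE
)

# Group gene symbols by pathway identifier
genes_by_pathway <- split(mapping$mappedSymbol, mapping$mappedId)

# For each pathway, take the row-wise (per-sample) median across its genes
pathway_median <- sapply(genes_by_pathway, function(genes) {
  apply(expr[, genes, drop = FALSE], 1, median)
})

print(pathway_median)
```

The result has one row per sample and one column per pathway identifier, which is the same orientation as the function's documented return value.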

Usage

convert_gene_expression_to_pathway_features(
  input_data,
  data_transposed,
  keep_non_shared = TRUE
)

Arguments

input_data

A dataframe containing gene expression data, where rows represent samples and columns represent genes. Each cell contains the expression level of a gene in a specific sample.

data_transposed

A data frame of gene-to-pathway mappings with at least two columns: mappedSymbol (gene symbols, matching the column names of input_data) and mappedId (unique pathway identifiers).

keep_non_shared

A logical flag indicating whether to keep pathways mapped to a single gene. Defaults to TRUE. If FALSE, pathways mapped to fewer than two genes are excluded from the final dataset.
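
To see which pathways keep_non_shared = FALSE would affect, the number of distinct genes per pathway can be counted directly from the mapping. The sketch below assumes the filter behaves as the argument description states; it is not taken from the package source, and the names genes_per_pathway, shared_ids, and mapping_shared are hypothetical.

```r
# Gene-to-pathway mapping in the format expected for data_transposed
mapping <- data.frame(
  mappedSymbol = c("A4GNT", "A4GALT", "A2M", "A4GALT"),
  mappedId     = c("GO:0000139", "GO:0000139", "GO:0001553", "GO:0001576"),
  stringsAsFactors = FALSE
)

# Count distinct genes mapped to each pathway
genes_per_pathway <- tapply(
  mapping$mappedSymbol,
  mapping$mappedId,
  function(genes) length(unique(genes))
)

# Assumed behaviour of keep_non_shared = FALSE: keep pathways with two or more genes
shared_ids <- names(genes_per_pathway)[genes_per_pathway >= 2]
mapping_shared <- mapping[mapping$mappedId %in% shared_ids, , drop = FALSE]

print(genes_per_pathway)
print(mapping_shared)
```

Running a check like this before the conversion shows how many pathway features would be dropped by setting the flag to FALSE.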

Value

A data frame in which each row corresponds to a sample and each column is either a pathway-level feature (PathwayGeneScore) or the expression of an unmapped gene. Each pathway feature summarizes the median expression of the pathway's genes, adjusted by gene count and pathway variance. Unmapped genes are kept as individual features so that no expression information is lost.
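
The layout of such a result, with pathway columns sitting next to unmapped gene columns, can be sketched in base R. The example is hypothetical: the pathway column simply reuses one gene's values as a stand-in for a real score, and the column names the function actually produces are not specified here.

```r
# Expression data: A2M is mapped to a pathway, A1CF is not
expr <- data.frame(
  A1CF = c(2, 3, 3, 3),
  A2M  = c(3, 4, 3, 3)
)
mapped_symbols <- c("A2M")

# Genes without any pathway mapping are carried over unchanged
unmapped_genes <- setdiff(names(expr), mapped_symbols)

# Stand-in for a computed pathway feature (not a real PathwayGeneScore)
pathway_features <- data.frame(`GO:0001553` = expr$A2M, check.names = FALSE)

# One row per sample: pathway features followed by unmapped gene columns
combined <- cbind(pathway_features, expr[, unmapped_genes, drop = FALSE])
print(combined)
```

Using drop = FALSE keeps the unmapped genes as a data frame even when only one gene is left, so the column name survives the cbind.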

Examples


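library(genular)

# Sample gene expression data: rows are samples, columns are genes.
# A1CF has no entry in the mapping below, so it is an unmapped gene.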
input_data <- data.frame(
  A1CF = c(2, 3, 3, 3),
  A2M = c(3, 4, 3, 3),
  A4GALT = c(3, 4, 3, 4),
  A4GNT = c(3, 4, 3, 3),
  ABC1 = c(2, 2, 2, 2),
  ABC2 = c(4, 4, 4, 4)
)

# Sample gene-pathway mapping data
data_transposed <- data.frame(
  mappedSymbol = c("A4GNT", "A4GALT", "A2M", "A4GALT", "A2M", "A2M", "ABC1", "ABC2"),
  mappedId = c("GO:0000139", "GO:0000139", "GO:0001553", "GO:0001576",
               "GO:0001869", "GO:0002020", "GO:0000139", "GO:0000139")
)

# Convert gene expression data to pathway-level features, including non-shared pathways
final_data <- convert_gene_expression_to_pathway_features(
  input_data,
  data_transposed,
  keep_non_shared = TRUE
)
print(final_data)


genular documentation built on Oct. 19, 2024, 9:07 a.m.