summerize_by_category | R Documentation
This function summarizes input data by the categories defined in the mapping data. It currently supports two summary methods, median and mean, and provides options to retain missing categories and to append category IDs to category names.
summerize_by_category(
  input_data,
  mapping_data,
  identifier = "symbol",
  keep_missing = FALSE,
  keep_ids = FALSE,
  summary_method = "median"
)
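| Argument | Description |
|---|---|
| input_data | A data frame in which each column corresponds to an identifier (for example, a gene symbol) and each row to an observation or sample. |
| mapping_data | A data frame that maps identifiers to categories. It must include the column named by `identifier`; in the example below, the categories are supplied in the `category` and `category_id` columns. |
| identifier | The name of the column in `mapping_data` that holds the identifiers; its values are matched against the column names of `input_data`. |
| keep_missing | A logical value indicating whether to retain identifiers in `input_data` that have no matching category in `mapping_data`. |
| keep_ids | A logical value indicating whether to append category IDs to the category names in the summary output. |
| summary_method | The method used to summarize values within each category. Currently supports "median" and "mean". |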
A data frame with one column per category and one row per observation/sample of `input_data`; each cell holds the summarized value of that category for the corresponding observation/sample.
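The core idea (group the columns of `input_data` by category, then summarize each category row by row) can be sketched in base R. This is a minimal, hypothetical sketch, not the package's implementation: the name `summarize_by_category_sketch()` is made up, it assumes the category names live in a column called `category` (as in the example below), it ignores `keep_missing` and `keep_ids`, and removing `NA` values before summarizing is an assumption.

```r
# Hypothetical sketch only; NOT the package's summerize_by_category().
# Assumes mapping_data has the identifier column named by `identifier`
# and a `category` column holding category names.
summarize_by_category_sketch <- function(input_data, mapping_data,
                                         identifier = "symbol",
                                         summary_method = c("median", "mean")) {
  summary_method <- match.arg(summary_method)
  summary_fun <- switch(summary_method, median = stats::median, mean = base::mean)

  ids <- as.character(mapping_data[[identifier]])
  cats <- as.character(mapping_data[["category"]])

  # Drop mappings without a category or whose identifier is not a column of input_data
  keep <- !is.na(cats) & ids %in% colnames(input_data)
  ids <- ids[keep]
  cats <- cats[keep]
  category_names <- unique(cats)

  # For each category, summarize row-wise across the columns of its identifiers
  out <- lapply(category_names, function(category_name) {
    member_cols <- unique(ids[cats == category_name])
    apply(input_data[, member_cols, drop = FALSE], 1, summary_fun, na.rm = TRUE)
  })
  names(out) <- category_names
  as.data.frame(out, check.names = FALSE)
}

# Usage with the example expression data and a hypothetical mapping
input_data <- data.frame(
  A1CF = c(2, 3, 3, 3),
  A2M = c(3, 4, 3, 3),
  A4GALT = c(3, 4, 3, 4),
  A4GNT = c(3, 4, 3, 3)
)
mapping <- data.frame(
  symbol = c("A1CF", "A2M", "A4GALT"),
  category = c("pathway_1", "pathway_1", "pathway_2"),
  stringsAsFactors = FALSE
)
res <- summarize_by_category_sketch(input_data, mapping, identifier = "symbol")
# Expected values: pathway_1 = 2.5, 3.5, 3, 3; pathway_2 = 3, 4, 3, 4
print(res)
```

Because A4GNT has no entry in the hypothetical mapping, the sketch simply drops it; whether the real function behaves this way with `keep_missing = FALSE` is an assumption.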
# Create a sample input data frame with gene expression levels
input_data <- data.frame(
  A1CF = c(2, 3, 3, 3),
  A2M = c(3, 4, 3, 3),
  A4GALT = c(3, 4, 3, 4),
  A4GNT = c(3, 4, 3, 3)
)
# Fetch gene-related data based on specified fields and conditions
# The function `fetch_all_gene_search_results` is presumably defined elsewhere
# and retrieves information from a biological database
all_gene_results <- fetch_all_gene_search_results(
  queryFields = list(c("symbol")),     # Query by gene symbols
  queryValues = colnames(input_data),  # Gene symbols to query
  fieldsFilter = c(                    # Fields to extract from the results
    "geneID",
    "symbol",
    "crossReference.enseGeneID",
    "mRNAExpressions.proteinAtlas.c",
    "ontology.id",
    "ontology.term",
    "ontology.cat"
  ),
  searchType = "or",                   # Search type (OR condition for queries)
  orderBy = "geneID",                  # Ordering criteria
  sortDirection = "asc",               # Sort direction (ascending)
  responseType = "json",               # Format of the returned data
  matchType = "exact",                 # Type of match for the query
  organismType = list(c(9606)),        # Organism: NCBI Taxonomy ID 9606 (Homo sapiens)
  ontologyCategories = list(),         # Ontology categories to include
  limit = 100,                         # Limit on the number of results
  options = list(api_key = "your_api_key", timeout = 10000)  # Additional options
)
# Transform the fetched gene data based on specified mappings
data_transposed <- extract_data(
  all_gene_results,
  list(
    "geneID" = "mappedGeneID",
    "symbol" = "mappedSymbol",
    "crossReference$enseGeneID" = "mappedEnseGeneID",
    "mRNAExpressions$proteinAtlas" = list(c("c" = "mappedC")),
    "ontology" = list(c(
      "id" = "mappedId",
      "term" = "mappedTerm",
      "cat" = "mappedCat"
    ))
  )
)
# Manually create a similar structure to the expected output of `extract_data`
# This mimics the processed and transposed gene data
data_transposed <- data.frame(
  mappedGeneID = c(2, 2, 2, 2, 2, 2),
  mappedSymbol = rep("A2M", 6),
  mappedEnseGeneID = rep("ENSG00000175899", 6),
  mappedC = c("gdT-cell", NA, NA, NA, NA, NA),
  mappedId = c(
    NA,
    "R-HSA-109582",
    "R-HSA-1474244",
    "R-HSA-382551",
    "R-HSA-140877",
    "R-HSA-1474228"
  ),
  mappedTerm = c(
    NA,
    "Hemostasis",
    "Extracellular matrix organization",
    "Transport of small molecules",
    "Formation of Fibrin Clot (Clotting Cascade)",
    "Degradation of the extracellular matrix"
  ),
  mappedCat = c(NA, 10, 10, 10, 11, 11),
  stringsAsFactors = FALSE
)
library(dplyr)
# Process and group the data by symbol, then summarize and arrange by terms
data_transposed_pathways <- data_transposed %>%
  dplyr::group_by(mappedSymbol) %>%
  dplyr::arrange(mappedTerm, .by_group = TRUE) %>%
  dplyr::summarize(
    category = first(mappedTerm),
    category_id = first(mappedId)
  )
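For readers who prefer not to depend on dplyr, the grouping step above can be approximated in base R: sort by symbol and then by term (with `NA` terms last) and keep the first row per symbol. This is a sketch, not part of the package; it rebuilds only the columns the step uses, returns a plain data frame rather than a tibble, and relies on `order()`, which collates strings by the session locale and so may order some term names differently from dplyr (the terms here sort the same way).

```r
# Base-R sketch of group_by() + arrange(.by_group = TRUE) + summarize(first(...)),
# using only the columns of the hand-built example table that this step needs
data_transposed <- data.frame(
  mappedSymbol = rep("A2M", 6),
  mappedId = c(NA, "R-HSA-109582", "R-HSA-1474244", "R-HSA-382551",
               "R-HSA-140877", "R-HSA-1474228"),
  mappedTerm = c(NA, "Hemostasis", "Extracellular matrix organization",
                 "Transport of small molecules",
                 "Formation of Fibrin Clot (Clotting Cascade)",
                 "Degradation of the extracellular matrix"),
  stringsAsFactors = FALSE
)

# Sort by symbol, then term (NA terms last), and keep the first row per symbol
ord <- order(data_transposed$mappedSymbol, data_transposed$mappedTerm, na.last = TRUE)
sorted <- data_transposed[ord, ]
first_rows <- sorted[!duplicated(sorted$mappedSymbol), ]

data_transposed_pathways <- data.frame(
  mappedSymbol = first_rows$mappedSymbol,
  category = first_rows$mappedTerm,
  category_id = first_rows$mappedId,
  stringsAsFactors = FALSE
)
print(data_transposed_pathways)
```

With this data, A2M ends up with the alphabetically first term, "Degradation of the extracellular matrix" (ID "R-HSA-1474228").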
# Display the first few rows of the grouped data
# print(head(data_transposed_pathways))
# Summarize the original input data by the categories defined in the processed gene data
# This function call summarizes expression levels by the gene's associated pathway or term
result_data_pathways <- summerize_by_category(
  input_data,
  data_transposed_pathways,
  identifier = "mappedSymbol",
  keep_missing = FALSE,
  keep_ids = FALSE,
  summary_method = "median"
)