extract_data: Extract Data Based on Mappings

View source: R/helpers.R

extract_data    R Documentation

Extract Data Based on Mappings

Description

This function iterates over a list of gene results and extracts and renames fields according to a provided mapping schema. It handles both direct mappings and nested array mappings, and combines the extracted values into a single data frame.

Usage

extract_data(
  all_gene_results,
  mappings = list(geneID = "mappedGeneID", symbol = "mappedSymbol",
    `crossReference$enseGeneID` = "mappedEnseGeneID", `mRNAExpressions$proteinAtlas` =
    list(c(c = "mappedC")), ontology = list(c(id = "mappedId", term = "mappedTerm", cat =
    "mappedCat")))
)

Arguments

all_gene_results

A list of gene results, where each element is a list containing gene information that might include nested structures.

mappings

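A named list defining the mapping from the input data structure to output data frame columns. Each name identifies an input field, and a name such as crossReference$enseGeneID refers to a field inside a nested structure. Each value is either a character string giving the output column name (direct mapping) or a list containing a named character vector (nested array mapping). If omitted, the default mappings shown under Usage are applied.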
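
As a rough illustration of the two mapping shapes described above, the sketch below builds a small mapping list and inspects it with base R only. It does not call extract_data; the variable names is_nested and output_columns are made up for this example, and the way output columns are derived here is an assumption about how the schema reads, not the package's own code.

```r
# Hypothetical mapping list that mirrors the shape of the defaults under Usage.
mappings <- list(
  geneID = "mappedGeneID",
  `crossReference$enseGeneID` = "mappedEnseGeneID",
  ontology = list(c(id = "mappedId", term = "mappedTerm"))
)

# A direct mapping is a single character string; a nested mapping is a list
# wrapping a named character vector.
is_nested <- vapply(mappings, is.list, logical(1))

# Collect the output column names implied by the mappings (illustrative only).
output_columns <- unlist(lapply(mappings, function(m) {
  if (is.list(m)) unname(m[[1]]) else m
}), use.names = FALSE)

print(is_nested)
print(output_columns)
```

Only the ontology entry is flagged as nested here, because it is the only value wrapped in list().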

Value

A data frame in which each column corresponds to one of the specified mappings. An entry in the input list yields a single row when only direct mappings apply. For nested array mappings, one row is generated per array entry, and the values of the other columns are duplicated across those rows.
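
The row duplication described above can be pictured with a small base R sketch. The gene record below is invented for illustration, and the code is not the implementation used by extract_data; it only shows the general idea of repeating scalar fields once per nested array entry.

```r
# Hypothetical single gene record with a nested ontology array (made-up values).
gene <- list(
  geneID = 1,
  symbol = "GENE1",
  ontology = list(
    list(id = "X:1", term = "term one"),
    list(id = "X:2", term = "term two")
  )
)

# data.frame() recycles the length-one geneID and symbol values so that they
# are repeated on every row produced by the ontology entries.
expanded <- data.frame(
  mappedGeneID = gene$geneID,
  mappedSymbol = gene$symbol,
  mappedId     = vapply(gene$ontology, function(e) e$id, character(1)),
  mappedTerm   = vapply(gene$ontology, function(e) e$term, character(1)),
  stringsAsFactors = FALSE
)

print(expanded)
```

The resulting data frame has one row per ontology entry, with the gene-level columns repeated on each row.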

Examples


# Fetch gene results to use as input (requires network access and an API key)

all_gene_results <- fetch_all_gene_search_results(
  queryFields = list(c("symbol")),
  queryValues = c("A1CF", "A2M", "A4GALT", "A4GNT"),
  fieldsFilter = c("geneID", "symbol", "crossReference.enseGeneID", 
                   "mRNAExpressions.proteinAtlas.c", "ontology.id", 
                   "ontology.term", "ontology.cat"),
  searchType = "or",
  orderBy = "geneID",
  sortDirection = "asc",
  responseType = "json",
  matchType = "exact",
  organismType = list(c(9606)),
  ontologyCategories = list(),
  limit = 100,
  options = list(api_key = "3147dda5fa023c9763d38bddb472dd28", timeout = 10000)
)

# Map the fetched fields to output columns using direct and nested mappings
data_transposed <- extract_data(all_gene_results, list(
  "geneID" = "mappedGeneID",
  "symbol" = "mappedSymbol",
  "crossReference$enseGeneID" = "mappedEnseGeneID",
  "mRNAExpressions$proteinAtlas" = list(c("c" = "mappedC")),
  "ontology" = list(c("id" = "mappedId", "term" = "mappedTerm", "cat" = "mappedCat"))
))


genular documentation built on Oct. 19, 2024, 9:07 a.m.