View source: R/return_data_as_dataframe.R
return_data_as_dataframe | R Documentation |
The 'return_data_as_dataframe' function transforms the response data obtained from a query to the RCSB Protein Data Bank (PDB) into a structured dataframe. This function handles various scenarios, including responses with duplicate names, null or empty responses, and nested data structures. It ensures that the resulting dataframe is consistently formatted and ready for downstream analysis.
return_data_as_dataframe(response, data_type, ids)
response |
A list containing the response data from a PDB query. This list is expected to be structured according to the RCSB PDB GraphQL or REST API specifications. |
data_type |
A string indicating the type of data contained in the response (e.g., "ENTRY", "POLYMER_ENTITY"). This parameter is primarily used for contextual information and does not directly influence the function's operations. |
ids |
A vector of identifiers corresponding to the response data. These IDs are used to label the resulting dataframe, ensuring that each row corresponds to a specific query identifier. |
The 'return_data_as_dataframe' function is designed to provide a flexible and robust mechanism for converting PDB query responses into dataframes. It addresses several common challenges in handling API responses, such as:
If the response is null or contains no data, the function immediately returns 'NULL', avoiding unnecessary processing.
The function detects and manages scenarios where the response contains duplicated names. It simplifies such lists by keeping only the first occurrence of each duplicated element, ensuring that the final dataframe has unique column names.
The function flattens nested lists within the response, ensuring that all relevant data is captured in a single-level dataframe structure. This is particularly useful for complex responses that contain deeply nested data elements.
After processing the data, the function ensures that column names are consistent and do not retain unnecessary prefixes. This makes the resulting dataframe easier to interpret and work with in subsequent analyses.
A dataframe constructed from the response data, where each row corresponds to an identifier from the 'ids' vector and each column represents a data field from the response. If the response is null or empty, the function returns 'NULL'.
The function is equipped to handle responses with varying degrees of complexity. It is recommended to provide valid 'ids' corresponding to the query to ensure that the dataframe rows are correctly labeled.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.