analyse_metadata: Analyse Metadata of Tables Needing Secondary Tabular Data...

View source: R/analyse_metadata.R

analyse_metadata    R Documentation

Analyse Metadata of Tables Needing Secondary Tabular Data Protection

Description

This function analyses a metadata data frame to determine which tables must be treated together, that is, assigned to the same cluster. It also rearranges and groups the tables according to their hierarchical relationships, and returns the result in a structured form for further processing.

Usage

analyse_metadata(df_metadata, verbose = FALSE)

Arguments

df_metadata

A dataframe containing metadata in wide format.

verbose

Logical. If TRUE, returns a detailed list of intermediate results from each processing step. If FALSE, returns only the cluster assignments. Defaults to FALSE.

Details

The function performs the following steps:

  • Converts the metadata from wide format to long format using wide_to_long.

  • Identifies hierarchical relationships and renames variables with identify_hrc.

  • Splits hierarchical relationships into clusters using split_in_clusters.

  • Creates edges to describe the relationships via create_edges.

  • Generates translation tables for regrouping with grp_tab_names.

  • Regroups tables into independent clusters with grp_tab_in_cluster.

  • Identifies tables to be treated together using tab_to_treat.

  • Produces a final dataframe summarizing the cluster assignments using dataframe_result.
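
The first and last ideas in the steps above (reshaping wide metadata to long format, then grouping linked tables into clusters) can be illustrated with a minimal base R sketch. It does not call or reproduce the package's internal functions: the column names (table_name, spanning_1, spanning_2), the toy data and the linking rule ("two tables are linked if they share a variable name") are all assumptions made for illustration, and the hierarchical relationships handled by the real function are ignored.

```r
# Hypothetical wide-format metadata: one row per table, one column per
# spanning-variable slot. Column names and values are illustrative only.
meta_wide <- data.frame(
  table_name = c("T1", "T2", "T3", "T4"),
  spanning_1 = c("region", "region", "product", "product"),
  spanning_2 = c("sector", "size", "channel", NA),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (table, variable) pair, dropping empty slots.
span_cols <- grep("^spanning_", names(meta_wide), value = TRUE)
meta_long <- do.call(rbind, lapply(span_cols, function(col) {
  data.frame(
    table_name = meta_wide$table_name,
    variable = meta_wide[[col]],
    stringsAsFactors = FALSE
  )
}))
meta_long <- meta_long[!is.na(meta_long$variable), ]

# Toy clustering rule (an assumption, not the package's rule): tables that
# share a variable are linked. Connected components are found by repeatedly
# giving every group of linked tables its smallest label until nothing changes.
tables <- meta_wide$table_name
cluster_id <- seq_along(tables)
names(cluster_id) <- tables
repeat {
  previous <- cluster_id
  for (v in unique(meta_long$variable)) {
    linked <- meta_long$table_name[meta_long$variable == v]
    cluster_id[linked] <- min(cluster_id[linked])
  }
  if (identical(cluster_id, previous)) break
}

# Renumber the labels 1, 2, ... and summarise as a data frame.
toy_clusters <- data.frame(
  table_name = tables,
  cluster = match(cluster_id, unique(cluster_id)),
  stringsAsFactors = FALSE
)
print(toy_clusters)
```

In this toy data, T1 and T2 share a variable, as do T3 and T4, so the sketch yields two independent clusters. The actual function also accounts for hierarchies between variables, which can link tables that share no variable name.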

Value

A list or dataframe, depending on the value of the verbose parameter:

  • If verbose = TRUE, returns a list with detailed intermediate results:

    identify_hrc

    A data frame with renamed variables and grouped response variables.

    info_var

    A mapping of original variable names to their renamed counterparts.

    split_in_clusters

    A list of clusters obtained after splitting the data.

    create_edges

    A list of edges created for describing relationships.

    grp_tab_names

    Translation tables generated for renaming and regrouping.

    grp_tab_in_clusters

    Independent tables grouped by clusters.

    tab_to_treat

    Cluster assignments for tables to be treated.

    df_tab_to_treat

    A dataframe summarizing the tables and their clusters.

  • If verbose = FALSE, returns only the cluster assignments (tab_to_treat).

Examples

library(rtauargus)
data(metadata_pizza_lettuce)

# View the structure of the original data
str(metadata_pizza_lettuce)

# Run the analysis
detailed_analysis <- analyse_metadata(metadata_pizza_lettuce, verbose = TRUE)

# Simplified output (non-verbose)
cluster_id_dataframe <- analyse_metadata(metadata_pizza_lettuce, verbose = FALSE)


InseeFrLab/rtauargus documentation built on Feb. 25, 2025, 6:32 a.m.