get_cluster_summary: get_cluster_summary

View source: R/get_cluster_summary.R

get_cluster_summary    R Documentation

get_cluster_summary

Description

A function to get summary data by coordinated cluster.

Usage

get_cluster_summary(output, labels = FALSE)

Arguments

output

the output list resulting from the function get_coord_shares

labels

if TRUE, auto-generate a label for each cluster from the titles and descriptions of its accounts. Relies on the OpenAI APIs and expects the API bearer token in the OPENAI_API_KEY environment variable. Defaults to FALSE.
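As a minimal sketch of the environment requirement above, the check below uses only base R to confirm that OPENAI_API_KEY is visible to the session before labels are requested. The variable names has_openai_key and use_labels are illustrative and not part of CooRnet.

```r
# To set the key for the current session only (placeholder value shown):
# Sys.setenv(OPENAI_API_KEY = "<your key>")

# Sys.getenv() returns "" when the variable is unset, so nzchar() is FALSE.
has_openai_key <- nzchar(Sys.getenv("OPENAI_API_KEY"))

# Request labels only when a key is available.
use_labels <- has_openai_key

# cluster_summary <- get_cluster_summary(output, labels = use_labels)
```

Storing the key in the user's .Renviron file instead of calling Sys.setenv keeps it out of scripts.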

Details

The gini values are computed by applying the Gini coefficient to the proportions of unique domains each cluster shared. The Gini coefficient measures the degree of concentration (inequality) of a variable in a distribution. It ranges between 0 and 1: the more nearly equal the distribution, the lower its Gini index. When a cluster shared just one domain, the value is set to 1. The coefficient is calculated separately for full_domains (e.g. www.foxnews.com, video.foxnews.com) and parent domains (e.g. foxnews.com).
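The computation described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: gini_domains is a hypothetical helper, it uses the standard uncorrected Gini formula, and it takes "proportions" to mean the fraction of a cluster's shares that point to each domain.

```r
# Hypothetical helper (not part of CooRnet): Gini coefficient of the
# per-domain share proportions for one cluster.
gini_domains <- function(domains) {
  counts <- table(domains)

  # Rule stated in the documentation: a single domain yields 1.
  if (length(counts) == 1) return(1)

  # Proportion of shares per domain, sorted in ascending order.
  p <- sort(as.numeric(counts) / sum(counts))
  n <- length(p)

  # Standard (uncorrected) Gini coefficient on the sorted values.
  2 * sum(seq_len(n) * p) / (n * sum(p)) - (n + 1) / n
}

even   <- gini_domains(c("a.com", "b.com"))            # equal shares
skewed <- gini_domains(c(rep("a.com", 9), "b.com"))    # 90% / 10% split
single <- gini_domains(c("a.com", "a.com"))            # one domain only
```

With this uncorrected form the value for n domains cannot exceed (n - 1) / n, so only the single-domain rule produces exactly 1. Running the helper once on full domains and once on parent domains would mirror the two separate values the function reports.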

The cooRscore.avg is a measure of cluster coordination. Higher values imply higher coordination. It is calculated by dividing, for each entity in a coordinated network, its strength by its degree, and then averaging these values by cluster.
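The strength-over-degree calculation can be illustrated with a toy weighted edge list in base R. The data, the column names and the cluster assignment below are invented for the example, and the sketch assumes an undirected network in which each pair of entities appears once.

```r
# Toy coordinated network: each row is one undirected, weighted edge.
edges <- data.frame(
  from   = c("A", "A", "B", "D"),
  to     = c("B", "C", "C", "E"),
  weight = c(3, 1, 2, 4)
)

# Assumed cluster membership of each entity.
cluster <- c(A = 1, B = 1, C = 1, D = 2, E = 2)

# List every edge endpoint once, carrying the edge weight with it.
ends <- c(edges$from, edges$to)
w    <- rep(edges$weight, 2)

strength <- tapply(w, ends, sum)     # sum of edge weights per entity
degree   <- tapply(w, ends, length)  # number of edges per entity

# Per-entity score, then the average of those scores within each cluster.
score         <- strength / degree
cooRscore_avg <- tapply(score, cluster[names(score)], mean)
```

Here the first cluster averages 2 (scores 2, 2.5 and 1.5) and the second averages 4, so the second cluster's ties are, on average, heavier per connection.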

The cooRshare_ratio.avg is an additional measure of cluster coordination ranging from 0 (no shares coordinated) to 1 (all shares coordinated).
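A rough sketch of such a ratio is shown below. It assumes, as the ".avg" suffix suggests but the text does not state, that the ratio is first computed per account and then averaged within each cluster. The data frame and its column names are hypothetical.

```r
# Toy share table: one row per share, flagged as coordinated or not.
shares <- data.frame(
  account        = c("A", "A", "A", "A", "B", "B", "C", "C"),
  cluster        = c(1, 1, 1, 1, 1, 1, 2, 2),
  is_coordinated = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Per-account ratio: the mean of a logical vector is the proportion of TRUEs.
per_account <- aggregate(is_coordinated ~ account + cluster,
                         data = shares, FUN = mean)

# Average the per-account ratios within each cluster.
ratio_avg <- tapply(per_account$is_coordinated, per_account$cluster, mean)
```

In this toy data account A coordinates 3 of 4 shares and account B 1 of 2, so the first cluster averages 0.625, while the second cluster, with no coordinated shares, scores 0.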

Value

A data frame that summarizes data for each coordinated cluster. The data includes:
- The average number of subscribers of entities in a cluster.
- The proportion of coordinated shares over the total shares (coorshare_ratio).
- The average coordinated score (avg_cooRscore).
- The gini values, which measure the dispersion in the distribution of domains that are coordinatedly shared by the cluster (0-1). Higher values correspond to higher concentration (fewer different domains linked).
- The top coordinatedly shared domains (ranked by the number of shares) and the total number of coordinatedly shared domains.

If NewsGuard API credentials are provided, this function also returns an estimate of the trustworthiness of the domains used by the cluster. If the labels parameter is set to TRUE and an OpenAI token is provided, the function also returns an automatically generated label for each cluster.

Examples

  # summarize each coordinated cluster and auto-generate cluster labels
  # (output is the list returned by get_coord_shares; labels = TRUE needs OPENAI_API_KEY)
  cluster_summary <- get_cluster_summary(output, labels = TRUE)

  # hierarchically cluster the clusters using columns 2 to 4 of the summary
  clusters <- hclust(dist(cluster_summary[, 2:4]))
  plot(clusters)


fabiogiglietto/CooRnet documentation built on Aug. 15, 2024, 7:16 p.m.