twitter: twitter network
In SNAnalyst/SNA4DSData: Datasets for the SNA4DS course

Description

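Network of political communication among Twitter users during the six weeks before the 2010 U.S. congressional midterm elections.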

Usage

data(twitter, package = "SNA4DSData")

Format

An igraph object with 22405 vertices and 77920 edges. Directed, unweighted.

Details

The twitter network, stored in igraph format. The network is known as "icwsm_polarization".

The official documentation:

Overview

This dataset describes three networks of political communication between users of the Twitter social media platform in the six weeks prior to the 2010 Congressional midterm elections. This network is particularly interesting because one mode of communication, retweets, segregates users into two politically homogeneous communities of like-minded individuals, while mentions form a bridge between the two communities over which users are exposed to people and information they would not likely select ahead of time.

Data Source

The present analysis leverages data collected from the Twitter "gardenhose" streaming API (dev.twitter.com/pages/streaming_api) between September 14th and November 1st, 2010, the run-up to the November 2nd U.S. congressional midterm elections. During the six weeks of data collection we observed approximately 355 million tweets.

Identifying Political Content

Let us define a political communication as any tweet containing at least one politically relevant hashtag. To identify an appropriate set of political hashtags and to avoid introducing bias into the sample, we performed a simple tag co-occurrence discovery procedure. We began by seeding our sample with the two most popular political hashtags, #p2 (Progressives 2.0) and #tcot (Top Conservatives on Twitter). For each seed we identified the set of hashtags with which it co-occurred in at least one tweet, and ranked the results using the Jaccard coefficient. Thus, when the tweets in which both seed and hashtag occur make up a large portion of the tweets in which either occurs, the two are deemed to be related.
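The co-occurrence ranking described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the authors' code: the toy tweets, every hashtag other than #p2 and #tcot, and the helper names jaccard() and rank_cooccurring() are hypothetical. For a seed s and a hashtag h, the Jaccard coefficient is the number of tweets containing both divided by the number of tweets containing either.

```r
# Hypothetical toy data: each element holds the hashtags of one tweet
tweets <- list(
  c("#p2", "#dems"),
  c("#p2", "#dems", "#hcr"),
  c("#tcot", "#gop"),
  c("#tcot", "#gop", "#teaparty"),
  c("#p2", "#tcot"),
  c("#dems"),
  c("#sports")
)

# Hypothetical helper: Jaccard coefficient of hashtags a and b over tweets,
# i.e. (tweets containing both) / (tweets containing either)
jaccard <- function(tweets, a, b) {
  has_a <- vapply(tweets, function(t) a %in% t, logical(1))
  has_b <- vapply(tweets, function(t) b %in% t, logical(1))
  sum(has_a & has_b) / sum(has_a | has_b)
}

# Hypothetical helper: rank every hashtag that co-occurs with the seed
# in at least one tweet, highest Jaccard coefficient first
rank_cooccurring <- function(tweets, seed) {
  with_seed <- Filter(function(t) seed %in% t, tweets)
  candidates <- setdiff(unique(unlist(with_seed)), seed)
  scores <- vapply(candidates, function(h) jaccard(tweets, seed, h), numeric(1))
  sort(scores, decreasing = TRUE)
}

rank_cooccurring(tweets, "#p2")
# -> #dems 0.50, #hcr 0.33, #tcot 0.20
```

Running the same function with the other seed, "#tcot", gives a separate ranking. Any cutoff for deciding which ranked hashtags count as political would be an additional choice that is not specified here.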

Political Communication Networks

From the tweets containing any of the politically relevant hashtags we constructed networks representing political communication among Twitter users. Focusing on the two primary modes of public user-user interaction, mentions and retweets, we define communication links in the following ways. In the retweet network an edge runs from a node representing user A to a node representing user B if B retweets content originally broadcast by A, indicating that information has propagated from A to B. In the mention network an edge runs from A to B if A mentions B in a tweet, indicating that information may have propagated from A to B (a tweet mentioning B is visible in B's timeline). Both networks therefore represent potential pathways for information to flow between users.
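The two edge definitions can be made concrete with a small base-R sketch. It assumes, hypothetically, that each tweet has already been parsed into its author, the original author when it is a retweet, and the users it mentions; the record layout and the helper names retweet_edges() and mention_edges() are illustrative, not part of the dataset. Note the opposite orientation of the two rules: retweet edges point from the original broadcaster to the retweeter, mention edges from the author to the mentioned user.

```r
# Hypothetical toy tweets: the author, the original author when the tweet
# is a retweet (NA otherwise), and the users the tweet mentions
records <- list(
  list(author = "B", retweet_of = "A", mentions = character(0)),
  list(author = "C", retweet_of = "A", mentions = character(0)),
  list(author = "A", retweet_of = NA,  mentions = c("D", "E")),
  list(author = "D", retweet_of = NA,  mentions = "A")
)

# Retweet network: edge A -> B when B retweets content first broadcast by A
retweet_edges <- function(records) {
  rows <- lapply(records, function(t) {
    if (is.na(t$retweet_of)) return(NULL)
    data.frame(from = t$retweet_of, to = t$author, stringsAsFactors = FALSE)
  })
  # Drop non-retweets, stack the rows, keep one edge per pair (unweighted)
  unique(do.call(rbind, Filter(Negate(is.null), rows)))
}

# Mention network: edge A -> B when A mentions B in a tweet
mention_edges <- function(records) {
  rows <- lapply(records, function(t) {
    if (length(t$mentions) == 0) return(NULL)
    data.frame(from = t$author, to = t$mentions, stringsAsFactors = FALSE)
  })
  unique(do.call(rbind, Filter(Negate(is.null), rows)))
}

retweet_edges(records)  # A -> B, A -> C
mention_edges(records)  # A -> D, A -> E, D -> A
```

Edge lists in this from/to shape can be turned into a directed igraph object with igraph's graph_from_data_frame(edges, directed = TRUE).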

The retweet network consists of 23,766 non-isolated nodes among a total of 45,365. The largest connected component accounts for 18,470 nodes, with 102 nodes in the next-largest component. The mention network is smaller, consisting of 10,142 non-isolated nodes out of 17,752 total. It has 7,175 nodes in its largest connected component, and 119 in the next-largest. Because of their dominance we focus on the largest connected components for the rest of our analysis.
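Restricting a network to its largest connected component can be sketched with a small union-find in base R. Two assumptions are made here: "connected" is read as weakly connected (edge direction ignored), and the input is an edge list, so isolated nodes never appear in it, which matches counting only non-isolated nodes. The toy edges and the helper name largest_weak_component() are hypothetical.

```r
# Hypothetical toy directed edge list (isolated nodes never appear in it)
edges <- data.frame(
  from = c("A", "C", "B", "X"),
  to   = c("B", "B", "D", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: largest weakly connected component via union-find;
# edge direction is ignored when deciding which nodes are connected
largest_weak_component <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  parent <- setNames(nodes, nodes)  # every node starts as its own root

  # Follow parent links up to the root of x's component
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  for (i in seq_len(nrow(edges))) {
    ra <- find_root(edges$from[i])
    rb <- find_root(edges$to[i])
    if (ra != rb) parent[[ra]] <- rb  # merge the two components
  }

  # Count component sizes and return the nodes of the biggest one
  roots <- vapply(nodes, find_root, character(1))
  sizes <- table(roots)
  nodes[roots == names(sizes)[which.max(sizes)]]
}

lcc <- largest_weak_component(edges)
lcc  # "A" "C" "B" "D"
```

The loop only makes the idea explicit. For networks of the size described above, igraph's components(g, mode = "weak") computes the same partition far more efficiently, and its membership and csize fields identify the largest component.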