```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
The network of political blogs was first analyzed in "The political blogosphere and the 2004 US Election" by Lada A. Adamic and Natalie Glance, in Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005). This data set, collected before the 2004 American presidential election, records hyperlinks connecting political blogs to one another. These blogs have been labeled manually as either "liberal" or "conservative". We conduct our analysis on the largest connected component of the graph, and we ignore the direction of the links.
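Ignoring link direction and keeping the largest connected component are standard preprocessing steps. The sketch below shows both in base R on a small hypothetical directed graph. The names `A_dir`, `A_und`, `component_labels` and `A_lcc` are ours and are not part of gsbm. This is only an illustration and is not the code used to build the `blogosphere` data set.

```r
# Hypothetical toy directed graph on 6 nodes: nodes 1-4 form one
# weakly connected piece, nodes 5-6 form another.
A_dir <- matrix(0, 6, 6)
A_dir[1, 2] <- 1; A_dir[2, 3] <- 1; A_dir[4, 3] <- 1; A_dir[5, 6] <- 1

# Ignore link direction: i ~ j if there is a link i -> j or j -> i
A_und <- ((A_dir + t(A_dir)) > 0) * 1
diag(A_und) <- 0

# Label the connected components by breadth-first search
component_labels <- function(adj) {
  n <- nrow(adj)
  comp <- integer(n)  # 0 means "not visited yet"
  current <- 0L
  for (start in seq_len(n)) {
    if (comp[start] != 0L) next
    current <- current + 1L
    comp[start] <- current
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      # unvisited neighbors of v join the current component
      nbrs <- which(adj[v, ] != 0 & comp == 0L)
      comp[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  comp
}

comp <- component_labels(A_und)
largest <- which.max(tabulate(comp))  # label of the biggest component
keep <- which(comp == largest)
A_lcc <- A_und[keep, keep]            # undirected adjacency of the LCC
print(keep)
```

On a graph of realistic size, `igraph::components()` returns the same kind of membership vector (in its `membership` field) and is the more efficient choice.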
```r
# Load packages
library(Matrix)
library(igraph)
library(gsbm)

# Load data
data(blogosphere)
A <- blogosphere$A
names <- blogosphere$names
opinion <- blogosphere$opinion
```
We run our algorithm and use our estimator $\widehat{S}_{\epsilon}$ to detect outliers: a node is considered an outlier if the corresponding column of $\widehat{S}_{\epsilon}$ is nonzero.
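The outlier rule can be illustrated on a small hypothetical estimate `S_hat`, which is made up for the example and not produced by gsbm. Testing the column sums of absolute values, rather than the raw column sums, flags a column as soon as any of its entries is nonzero, whatever their signs.

```r
# Hypothetical 4 x 4 estimate of S: only columns 2 and 4 are nonzero
S_hat <- matrix(0, 4, 4)
S_hat[1, 2] <- 0.3
S_hat[3, 4] <- 0.1
S_hat[1, 4] <- -0.1  # entries of opposite signs: column 4 sums to 0

# A node is an outlier if its column of S_hat is not identically zero
outliers <- which(colSums(abs(S_hat)) > 0)
# The remaining nodes are the inliers
inliers <- setdiff(seq_len(ncol(S_hat)), outliers)
print(outliers)
print(inliers)
```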
```r
degrees <- colSums(A)
n <- nrow(A)
sqrt_deg <- sqrt(mean(degrees))

# Choice of parameters: both are proportional to the square root
# of the average degree
lambda_1 <- 10 * sqrt_deg
lambda_2 <- 5 * sqrt_deg
print(lambda_1)
print(lambda_2)

# Run the mcgd algorithm
res <- gsbm_mcgd(A, lambda_1, lambda_2)

# Detect the outliers: nodes whose column of S is nonzero
outliers_detected <- which(colSums(res$S) > 0)
s <- length(outliers_detected)
names[outliers_detected]
```
Our algorithm detects $s = 10$ outliers.
Then, we use our estimator $\widehat{L}_{\epsilon}$ to estimate the communities of the remaining nodes. More precisely, we estimate the community of a node by the sign of its coordinate along the second eigenvector of $\widehat{L}_{\epsilon}$ (obtained from its singular value decomposition), up to a permutation of the two communities. We compare our results with the manual labels.
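The spectral step can be checked on a hypothetical noiseless two-block matrix `P`, standing in for $\widehat{L}_{\epsilon}$. The sign of the second left singular vector separates the two blocks, and the error is counted up to swapping the two labels.

```r
# Hypothetical two-block matrix on 10 nodes: within-block value 0.6,
# between-block value 0.1
truth <- rep(c(1, 2), each = 5)
P <- matrix(0.1, 10, 10)
P[truth == 1, truth == 1] <- 0.6
P[truth == 2, truth == 2] <- 0.6

# Community of each node = sign of its coordinate on the second
# left singular vector
sv <- svd(P, nu = 2, nv = 2)
side <- sv$u[, 2] > 0
est <- ifelse(side, 1, 2)

# Communities are identified only up to swapping labels 1 and 2
errors <- min(sum(est != truth), sum((3 - est) != truth))
print(errors)
```

Here the singular values are $3.5$ and $2.5$, and the second singular vector is proportional to $(1,\dots,1,-1,\dots,-1)$, so the split is exact.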
```r
# Estimate the communities of the remaining (inlier) nodes
I <- which(colSums(res$S) == 0)
sv <- svd(res$L, nu = 2, nv = 2)
# Split the inliers by the sign of their coordinate on the second
# singular vector of L
side <- sv$u[I, 2] > 0
truth <- as.character(opinion[I])
labels <- sort(unique(truth))
# Labels are obtained up to a permutation: build both assignments
com_est <- rbind(ifelse(side, labels[2], labels[1]),
                 ifelse(side, labels[1], labels[2]))
best_est <- which.max(c(sum(com_est[1, ] == truth),
                        sum(com_est[2, ] == truth)))
# Misclassified nodes
misclassified_nodes <- com_est[best_est, ] != truth
error <- sum(misclassified_nodes)
print(error)
```
Among the $n - s = 1212$ remaining nodes, which are considered inliers, $84$ are misclassified. This number is comparable to the best results reported for this data set with other methods. We note that the nodes misclassified by our method either have a low degree or are well connected to nodes of the other community.
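The remark about misclassified nodes can be checked by tabulating, for each such node, its degree and the share of its neighbors that carry the other manual label. A minimal sketch on a hypothetical six-node graph, where `A_toy`, `labels` and the choice of node 3 as "misclassified" are all assumptions made for the illustration:

```r
# Hypothetical undirected toy graph with two labeled groups
A_toy <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4),
               c(3, 5), c(4, 5), c(5, 6), c(4, 6))
A_toy[edges] <- 1
A_toy <- A_toy + t(A_toy)  # symmetric 0/1 adjacency matrix
labels <- c(1, 1, 1, 2, 2, 2)

deg <- rowSums(A_toy)
# Number of neighbors of node i whose label differs from labels[i]
cross <- sapply(seq_len(nrow(A_toy)),
                function(i) sum(A_toy[i, labels != labels[i]]))
cross_share <- cross / deg

misclassified <- 3  # hypothetical misclassified node
data.frame(node = misclassified,
           degree = deg[misclassified],
           cross_share = cross_share[misclassified])
```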