Welcome to the Shortened Web Link Analysis Tool (SWAT)!

This tool lets analysts upload .csv files of shortened web link data and analyze the links they contain. The app is divided into a number of tabs at the top of the browser window, which are intended to be visited in order.

Load Data

This tab contains the file input box, where the user selects a file from their computer to upload. The maximum file size is currently 500 MB. After selecting a file, the user clicks "Upload and Clean Data". The app then imports the data and generates a data table (similar to an Excel spreadsheet) of the uploaded dataset. Here, the user can search for terms in the data or sort the columns to get a better feel for it.
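The search and sort behaviour of the data table can be sketched in a few lines. This is only an illustration in Python, not the app's own code: the sample data, the DOMAIN and USER column names, and the helper functions are all hypothetical (only TIME_CLICKED appears in this guide).

```python
import csv
import io

# Hypothetical sample standing in for an uploaded .csv file.
# DOMAIN and USER are assumed column names, not taken from the app.
SAMPLE_CSV = """DOMAIN,USER,TIME_CLICKED
example.com,u1,2019-05-01 10:00:00
news.example.org,u2,2019-05-01 09:30:00
example.com,u3,2019-05-02 08:15:00
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def search_rows(rows, term):
    """Keep rows where any cell contains the term (case-insensitive)."""
    needle = term.lower()
    return [r for r in rows if any(needle in str(v).lower() for v in r.values())]

def sort_rows(rows, column, descending=False):
    """Sort rows by a single column, as clicking a table header would."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)

rows = load_rows(SAMPLE_CSV)
matches = search_rows(rows, "example.com")   # rows mentioning example.com
by_time = sort_rows(rows, "TIME_CLICKED")    # earliest click first
```

Sorting on the raw text works here only because the assumed timestamps use a fixed year-month-day format; other formats would need to be parsed first.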

Exploratory Data Analysis

This tab generates bar charts and time density plots based on user input for building the chart. If the user selects:

The app will build the associated bar chart. The user can also break the bar chart down by a second variable by changing the fill option. If the user selects:

a time-density plot will be generated. This density plot shows the distribution of either link ages (DURATION_FROMCLICKTOCREATION) or the times at which users actually clicked on the link (TIME_CLICKED). Creation refers to the time at which the link shortening service first generated the shortened link. A final feature on this tab is the ability to filter out low-occurring values. This lets the user focus on the most frequent entities and also produces a more readable chart. The link age density chart can also filter out old link ages, with a cutoff between 1 day and 3 days.
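The two filters described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's implementation: the function names, the min_count and max_age parameters, and the sample values are hypothetical, and link age is taken to be click time minus creation time.

```python
from collections import Counter
from datetime import datetime, timedelta

def bar_counts(values, min_count=1):
    """Count occurrences and drop values seen fewer than min_count times."""
    counts = Counter(values)
    return {value: n for value, n in counts.most_common() if n >= min_count}

def link_ages(records, max_age=None):
    """Compute link age (click time minus creation time) for each record.

    records holds (created, clicked) datetime pairs; ages above max_age
    are dropped when a cutoff is given.
    """
    ages = [clicked - created for created, clicked in records]
    if max_age is not None:
        ages = [age for age in ages if age <= max_age]
    return ages

# Hypothetical sample data.
domains = ["a.com", "a.com", "a.com", "b.com", "b.com", "c.com"]
frequent = bar_counts(domains, min_count=2)

created = datetime(2019, 5, 1)
records = [(created, created + timedelta(hours=h)) for h in (2, 30, 100)]
recent = link_ages(records, max_age=timedelta(days=3))
```

The filtered counts would feed the bar chart and the filtered ages the density plot; both cutoffs correspond to the user-adjustable filters on this tab.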

Community Detection

This tab conducts the market basket analysis on the dataset. After the user clicks the button to conduct community detection, the app builds the co-user matrix (a matrix that connects web domains to other web domains by common users). It then builds the graph and conducts community detection after applying an iterative filter. At each filtering step, the graph's modularity is calculated. The filter level whose community structure yields the greatest modularity is the preferred one. The second plot shows the user how many domains were filtered out of the dataset to generate the detected communities.
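The pipeline above can be sketched roughly as follows. This Python illustration makes several assumptions that this guide does not confirm: the filter is taken to be a minimum number of shared users per domain pair, connected components stand in for whatever community detection algorithm the app actually uses, and all function names and sample data are hypothetical. Modularity is the standard weighted form, Q = sum over communities of (w_in / W) - (deg / 2W)^2.

```python
from collections import defaultdict
from itertools import combinations

def co_user_matrix(clicks):
    """Map each domain pair to the number of users who visited both."""
    domains_by_user = defaultdict(set)
    for user, domain in clicks:
        domains_by_user[user].add(domain)
    weights = defaultdict(int)
    for domains in domains_by_user.values():
        for a, b in combinations(sorted(domains), 2):
            weights[(a, b)] += 1
    return dict(weights)

def components(edges):
    """Label connected components (a stand-in for community detection)."""
    adjacent = defaultdict(set)
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    label = {}
    for start in sorted(adjacent):
        if start in label:
            continue
        label[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adjacent[node]:
                if neighbour not in label:
                    label[neighbour] = start
                    stack.append(neighbour)
    return label

def modularity(edges, label):
    """Weighted modularity of a partition of an undirected graph."""
    total = sum(edges.values())
    if total == 0:
        return 0.0
    inside = defaultdict(float)   # edge weight within each community
    degree = defaultdict(float)   # summed node degree per community
    for (a, b), w in edges.items():
        degree[label[a]] += w
        degree[label[b]] += w
        if label[a] == label[b]:
            inside[label[a]] += w
    return sum(inside[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)

def best_filter_level(weights):
    """Try each threshold and keep the one with the greatest modularity."""
    best = None
    for threshold in sorted(set(weights.values())):
        kept = {edge: w for edge, w in weights.items() if w >= threshold}
        label = components(kept)
        q = modularity(kept, label)
        if best is None or q > best[1]:
            best = (threshold, q, label)
    return best

# Hypothetical (user, domain) clicks: two tight pairs joined by one weak link.
clicks = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"),
          ("u3", "c"), ("u3", "d"), ("u4", "c"), ("u4", "d"),
          ("u5", "b"), ("u5", "c")]
weights = co_user_matrix(clicks)
threshold, q, label = best_filter_level(weights)
```

In this toy data, keeping every edge yields a single community, while dropping the single-user link between b and c separates the two pairs and raises modularity, which is the kind of trade-off the modularity plot is meant to expose.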

Community Domains

This tab shows the user how many web domains are in each identified cluster. These clusters indicate which web domains were most commonly accessed together. The user can also generate a table showing which web domains are in the selected cluster and how many total clicks were directed to each of them.
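The cluster summary and the per-cluster table can be sketched as below. This is a hedged Python illustration, not the app's code: the mapping from domain to cluster id, the function names, and the sample clicks are all hypothetical.

```python
from collections import Counter

def cluster_sizes(label):
    """Count how many domains fall in each cluster."""
    return Counter(label.values())

def cluster_table(label, clicked_domains, cluster):
    """List (domain, total clicks) for one cluster, most clicked first."""
    totals = Counter(d for d in clicked_domains if label.get(d) == cluster)
    return totals.most_common()

# Hypothetical cluster assignments and one domain entry per click.
label = {"a.com": 0, "b.com": 0, "c.com": 1}
clicked_domains = ["a.com", "b.com", "a.com", "c.com", "a.com"]

sizes = cluster_sizes(label)
table = cluster_table(label, clicked_domains, cluster=0)
```

A domain that belongs to the cluster but received no clicks would not appear in this table; a fuller version could add such domains with a count of zero.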



gallagherj2008/SWAT documentation built on May 28, 2019, 12:59 p.m.