To return to the overview of all steps and functions, see a.a.main.
This function simply creates several plots to better understand the dimensions of the data.
The first plot is a barplot of the class dimension, showing the number of non-spam and spam classifications. It is stored in: "out/1. Exploratory - Barplot of email numbers.pdf"
The second plot shows a histogram for each dimension to illustrate the distribution of its observations. It is stored in: "out/1. Exploratory - Histograms.pdf"
The third plot contains scatterplots of all pairs of dimensions with a correlation of at least 0.2. The pairs are sorted by correlation in descending order, so the most strongly correlated pairs appear at the top. It is stored in: "out/1. Exploratory - Scatterplots of highly correlated dimensions.pdf"
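The pair selection behind the third plot can be sketched as follows. This is only an illustrative Python sketch, not the function's actual implementation, which is not shown here. The names `pearson`, `correlated_pairs` and the `toy` data are hypothetical. It also assumes that "correlation" means the signed Pearson coefficient; if the absolute value is meant, compare `abs(r)` instead.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.2):
    """Return (name_a, name_b, r) for every pair with r >= threshold,
    sorted so that the highest correlation comes first."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if r >= threshold:  # assumption: signed correlation, as stated in the text
            pairs.append((a, b, r))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


# Hypothetical toy data standing in for the real dimensions.
toy = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],
    "z": [5.0, 3.0, 4.0, 1.0, 2.0],
    "w": [1.0, 3.0, 2.0, 5.0, 4.0],
}
ranked = correlated_pairs(toy)
```

Each entry of `ranked` would then correspond to one scatterplot, drawn in list order so that the strongest correlations come first.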
Vitali Friesen