inspect.na (R Documentation)
This function summarizes the NAs in a given matrix or data.frame, either feature-wise (by column) or sample-wise (by row). It can also draw a barplot and/or a histogram of these statistics.
inspect.na(d, hist=FALSE, summary=TRUE, byrow=FALSE, barplot=TRUE, na.value = NA)
| Argument | Description |
| --- | --- |
| d | A data.frame or matrix whose NAs should be summarized. (Mandatory) |
| hist | logical. Whether the function should plot a histogram. Default is FALSE. (Optional) |
| summary | logical. Whether the function should return the result data.frame. Default is TRUE. (Optional) |
| byrow | logical. Whether the function should operate row-wise instead of column-wise. Default is FALSE. (Optional) |
| barplot | logical. Whether the function should plot a barplot. Default is TRUE. (Optional) |
| na.value | A vector containing the value(s) that should be treated as missing. The default is NA, but you can add to it or replace it as needed. See the example. (Optional) |
This function provides a quick and easy way to see how many missing values (e.g. NA) exist in a data.frame or matrix. It is designed to make data exploration easier, since missing values are one of the most problematic parts of the later stages of analysis.
If the summary argument is TRUE, the function returns a data.frame containing the column or row index, name, number_of_NAs and ratio_of_NA. If the function does not find any NA, it returns NULL, which can be checked with is.null().
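The shape of that returned table can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: `na_summary` is a hypothetical name, the columns `number_of_NAs` and `ratio_of_NA` follow the text, while the names `index` and `name` are assumed. As described, only columns (or rows, with `byrow = TRUE`) that contain NAs are kept, and NULL is returned when there are none.

```r
# Hypothetical sketch of a column-wise (or row-wise) NA summary table.
na_summary <- function(d, byrow = FALSE) {
  miss   <- is.na(d)                       # logical matrix, TRUE where NA
  counts <- if (byrow) rowSums(miss) else colSums(miss)
  total  <- if (byrow) ncol(miss) else nrow(miss)
  labels <- if (byrow) rownames(d) else colnames(d)
  if (is.null(labels)) labels <- as.character(seq_along(counts))
  keep <- unname(which(counts > 0))        # only rows/columns with NAs
  if (length(keep) == 0) return(NULL)      # no NA found
  data.frame(index = keep,
             name = labels[keep],
             number_of_NAs = unname(counts[keep]),
             ratio_of_NA = unname(counts[keep]) / total,
             stringsAsFactors = FALSE)
}

df <- data.frame(x = c(1, NA, 3, NA), y = c("a", "b", NA, "d"), z = 1:4)
na_summary(df)                # columns x (2 NAs) and y (1 NA)
na_summary(df, byrow = TRUE)  # rows 2, 3 and 4, one NA each
is.null(na_summary(data.frame(a = 1:3)))
```

Returning NULL rather than an empty data.frame lets a caller branch on `is.null()`, matching the behaviour described above.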
The barplot generated by this function shows the names of the columns (or rows) that contain NAs, together with their NA ratio relative to the total number of items in that column (or row). The bars are colored by NA ratio:
* Gray for less than or equal to 10%
* Yellow for >10% and <30%
* Orange for >30% and <50%
* Red for >50%
The plot also has horizontal lines at 10%, 20%, 30% and 50% to make it easier to read.
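A minimal base R sketch of this color scheme and its guide lines is shown below. It is not the package's plotting code: `na_colour` is a hypothetical helper, and because the description leaves ratios of exactly 30% and 50% unassigned, the sketch assumes that exactly 30% is orange and exactly 50% is red.

```r
# Hypothetical mapping from NA ratio (0 to 1) to bar color.
# Assumption: exactly 0.30 -> "orange", exactly 0.50 -> "red".
na_colour <- function(ratio) {
  ifelse(ratio <= 0.10, "gray",
  ifelse(ratio <  0.30, "yellow",
  ifelse(ratio <  0.50, "orange", "red")))
}

ratios <- c(a = 0.05, b = 0.20, c = 0.40, d = 0.75)  # made-up NA ratios
barplot(ratios, col = na_colour(ratios), ylim = c(0, 1), ylab = "NA ratio")
abline(h = c(0.10, 0.20, 0.30, 0.50), lty = 2)       # guide lines
```

Nesting `ifelse()` keeps the mapping vectorized, so one call colors every bar.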
The histogram generated by this function gives an overview of how NAs are distributed in the input data. Unlike the barplot, it presents all columns or rows, whether or not they contain NA values, and it is most useful when the number of rows or columns is small.
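One way to get a comparable overview in base R is sketched below. It assumes the goal is to see how NA ratios are spread across all columns, including columns with no NAs; this is an interpretation for illustration, not the package's plotting code, and the matrix is random example data.

```r
# Sketch: distribution of per-column NA ratios over *all* columns.
set.seed(1)
m <- matrix(rnorm(200), nrow = 20)   # 20 rows, 10 columns of example data
m[sample(length(m), 30)] <- NA       # exactly 30 distinct random NA cells
col_ratio <- colMeans(is.na(m))      # NA ratio per column (0 if none)
hist(col_ratio, breaks = seq(0, 1, by = 0.05),
     xlab = "NA ratio per column", main = "Distribution of NAs")
```

Using `rowMeans()` instead of `colMeans()` gives the row-wise counterpart.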
Author: Mehrad Mahmoudian
See also: pin.na, is.na
# get some data
my_iris <- iris
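# insert NAs at random: each iteration blanks 2 random rows in one of
# columns 1-3 (column 3 is drawn most often)
set.seed(1)  # make the random NAs reproducible
for (i in 1:260) {
  my_iris[sample(nrow(my_iris), 2), sample(c(1, 2, 3, 1, 3, 3, 3), 1)] <- NA
}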
# now we can inspect the NAs
inspect.na(my_iris)
# plot only the histogram, without the barplot
inspect.na(my_iris, hist=TRUE, barplot=FALSE)