Inspect a matrix or data.frame for NAs

Description

This function provides a summary of NAs in a given matrix or data.frame, either feature-wise (by column) or sample-wise (by row). It can also draw a barplot and/or a histogram of these statistics.

Usage

    inspect.na(d, hist = FALSE, summary = TRUE, byrow = FALSE, barplot = TRUE)

Arguments

d

A data.frame or matrix whose NAs you want to summarize. (Mandatory)

hist

logical. Should the function plot a histogram? Default is FALSE. (Optional)

summary

logical. Should the function return the result data.frame? Default is TRUE. (Optional)

byrow

logical. Should the function operate row-wise instead of column-wise? Default is FALSE. (Optional)

barplot

logical. Should the function plot a barplot? Default is TRUE. (Optional)

Details

This function provides a quick and easy way to see how many missing values (NA) exist in a data.frame or matrix. It is designed to make data exploration easier, since missing values are among the most problematic parts of the later stages of analysis.

Value

The function returns a data.frame (when the summary argument is set to TRUE) containing the column or row index, name, number_of_NAs and ratio_of_NA. If the function does not find any NA, it returns NULL, which can be checked with is.null().
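To make the shape of that return value concrete, here is a minimal base-R sketch of a comparable summary. The helper name na_summary_sketch is hypothetical and is not part of the package. It assumes that only rows or columns containing at least one NA are listed and that ratio_of_NA is expressed as a percentage; neither detail is confirmed by this page.

```r
# Hypothetical helper (not the package's implementation): approximates the
# summary described above using base R only.
na_summary_sketch <- function(d, byrow = FALSE) {
    na_mask <- is.na(d)                      # logical matrix for both matrix and data.frame
    if (byrow) {
        counts <- rowSums(na_mask)
        total  <- ncol(d)
        nms    <- rownames(d)
    } else {
        counts <- colSums(na_mask)
        total  <- nrow(d)
        nms    <- colnames(d)
    }
    # matrices may have no dimnames; fall back to positions
    if (is.null(nms)) nms <- as.character(seq_along(counts))
    # mirror the documented behavior: NULL when nothing is missing
    if (all(counts == 0)) return(NULL)
    keep <- unname(which(counts > 0))        # assumption: list only entries with NAs
    data.frame(index         = keep,
               name          = nms[keep],
               number_of_NAs = unname(counts[keep]),
               ratio_of_NA   = unname(counts[keep]) / total * 100,  # assumption: percent
               stringsAsFactors = FALSE)
}

toy <- data.frame(a = c(1, NA, 3, NA),
                  b = c(1, 2, 3, 4),
                  c = c(NA, "u", "v", "w"))
res <- na_summary_sketch(toy)
if (is.null(res)) {
    message("no NAs found")
} else {
    print(res)
}
```

Guarding the result with is.null() before indexing into it avoids errors on complete data, which is the reason the NULL return is documented.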

The barplot generated by this function presents the column names or row names which contain NAs, together with their NA ratio relative to the total number of items in that row or column. The plot also colors the bars based on their NA ratio:

* Gray for less than or equal to 10
* Yellow for >10
* Orange for >30
* Red for >50

The plot also has horizontal lines at 10
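The color rule can be sketched with cut() from base R. The helper ratio_to_color is hypothetical, the ratios below are made-up illustration values, and the sketch assumes the thresholds refer to percentages. Only the reference line at 10 is drawn, since that is the only one this page states.

```r
# Hypothetical helper (not from the package): map NA ratios to bar colors.
# Intervals are right-closed: (-Inf,10], (10,30], (30,50], (50,Inf)
ratio_to_color <- function(ratio) {
    as.character(cut(ratio,
                     breaks = c(-Inf, 10, 30, 50, Inf),
                     labels = c("gray", "yellow", "orange", "red")))
}

# made-up NA ratios (assumed to be percentages) for four columns
ratios <- c(Sepal.Length = 4, Sepal.Width = 10,
            Petal.Length = 35, Petal.Width = 72)

barplot(ratios, col = ratio_to_color(ratios), ylab = "NA ratio")
abline(h = 10, lty = 2)   # reference line at 10
```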

The histogram generated by this function is meant to provide an overview of how NAs are distributed in the input data. It includes all columns or rows, whether or not they contain NA values. This plot is most useful when the number of rows or columns is small.
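A comparable overview can be approximated in base R, as a rough sketch rather than the function's own plotting code. It assumes the histogram is drawn over per-column NA counts (it may equally be drawn over ratios) and uses the built-in airquality data set purely for illustration.

```r
# Sketch only: distribution of NA counts over all columns,
# including the columns that have no NAs at all.
d <- airquality                       # built-in data set that contains NAs
na_counts <- colSums(is.na(d))        # one count per column, zeros included

hist(na_counts,
     main = "NAs per column",
     xlab = "Number of NAs",
     ylab = "Number of columns")
```

For a row-wise view, rowSums(is.na(d)) can be substituted for colSums(is.na(d)).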

Author(s)

Mehrad Mahmoudian

See Also

pin.na, is.na

Examples

    # get some data
    my_iris <- iris
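    # add NAs randomly: each iteration blanks 2 random rows in one of columns 1-3
    # (draws may overlap, so the final number of NAs varies between runs)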
    for(i in 1:260){
        my_iris[sample(1:nrow(my_iris), 2), sample(c(1,2,3,1,3,3,3), 1)] <- NA
    }

    # now we can inspect the NAs
    inspect.na(my_iris)
    # plot the histogram
    inspect.na(my_iris, hist=TRUE, barplot=FALSE)