inspect.na: inspect matrix or data.frame regarding NAs
In varhandle: Functions for Robust Variable Handling

inspect.na

R Documentation

inspect matrix or data.frame regarding NAs

Description

This function summarizes the NAs in a given matrix or data.frame, either feature-wise (by column) or sample-wise (by row). It can also draw a barplot and/or a histogram of these statistics.

Usage

    inspect.na(d, hist=FALSE, summary=TRUE, byrow=FALSE, barplot=TRUE, na.value = NA)

Arguments

`d`	A data.frame or matrix whose NAs you want summarized. (Mandatory)
`hist`	logical. Should the function plot a histogram? Default is FALSE. (Optional)
`summary`	logical. Should the function return the result data.frame? Default is TRUE. (Optional)
`byrow`	logical. Should the function work row-wise instead of column-wise? Default is FALSE. (Optional)
`barplot`	logical. Should the function plot a barplot? Default is TRUE. (Optional)
`na.value`	A vector containing the values that should be considered missing. The default is NA, but you can add to it or change it to your preference. See the example. (Optional)
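
The effect of `na.value` can be illustrated without the package. The following is a minimal base R sketch, not the varhandle implementation: the helper `is_missing()` and the `toy` data are hypothetical, and it assumes that values are matched against `na.value` by exact equality.

```r
# Hypothetical helper: flag cells whose value is listed in na.value.
# %in% matches NA against NA, so NA itself can stay in the vector.
is_missing <- function(d, na.value = NA) {
    d <- as.data.frame(d, stringsAsFactors = FALSE)
    # one logical column per input column
    vapply(d, function(col) col %in% na.value, logical(nrow(d)))
}

toy <- data.frame(a = c("x", "", NA, "unknown"),
                  b = c(1, NA, 3, 4),
                  stringsAsFactors = FALSE)

# default: only NA counts as missing
default_counts <- colSums(is_missing(toy))
# also treat empty strings and "unknown" as missing
extended_counts <- colSums(is_missing(toy, na.value = c(NA, "", "unknown")))

print(default_counts)
print(extended_counts)
```

With the extended vector, the empty string and "unknown" in column `a` are counted alongside the genuine NA, while column `b` is unaffected.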

Details

This function provides a quick and easy way to see how many missing values (e.g. NA) exist in a data.frame or matrix. It is designed to make data exploration easier, since missing values are among the most problematic issues in later stages of analysis.

Value

The function returns a data.frame (if the summary argument is set to TRUE) containing the column or row index, name, number_of_NAs and ratio_of_NA. If the function does not find any NA, it returns NULL, which can be checked with is.null().
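
To make the shape of the return value concrete, here is a rough base R sketch of a column-wise summary with the fields named above. It is not the package's code: `na_summary()` is a hypothetical helper, the exact column names in the real output may differ, and it assumes that only columns containing at least one NA are listed.

```r
# Hypothetical re-creation of the returned summary, column-wise only.
na_summary <- function(d) {
    n_na <- colSums(is.na(d))
    if (all(n_na == 0)) {
        return(NULL)  # mirrors the documented "no NA found" behaviour
    }
    keep <- which(n_na > 0)
    data.frame(index = unname(keep),
               name = colnames(d)[keep],
               number_of_NAs = unname(n_na[keep]),
               ratio_of_NA = unname(n_na[keep]) / nrow(d),
               stringsAsFactors = FALSE)
}

toy <- data.frame(a = c(1, NA, 3, NA), b = 1:4, c = c(NA, "y", "z", "w"))
res <- na_summary(toy)

# a NULL result means there is nothing to report
if (is.null(res)) {
    message("no missing values found")
} else {
    print(res)
}
```

Guarding with is.null() before using the result avoids errors in scripts that run on data which may or may not contain missing values.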

The barplot generated by this function shows the column names or row names that contain NAs, together with the ratio of NAs to the total number of items in that row or column. The bars are colored by their NA ratio:

* Gray for less than or equal to 10%
* Yellow for >10% and <30%
* Orange for >30% and <50%
* Red for >50%

The plot also has horizontal lines at 10%, 20%, 30% and 50% to make it easier to read.
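
The color scheme can be approximated in base R with `cut()`. This is only a sketch under stated assumptions, not the plotting code used by the package: the ratios are made up, the color names are plain R color strings, and because the text does not say where exactly 30% and 50% fall, this sketch assigns them to the lower band (yellow and orange respectively).

```r
# NA ratios for some hypothetical columns
ratio <- c(col_a = 0.05, col_b = 0.10, col_c = 0.25, col_d = 0.40, col_e = 0.80)

# Right-closed bins: (-Inf, 0.1], (0.1, 0.3], (0.3, 0.5], (0.5, Inf)
# Assumption: the boundary values 0.3 and 0.5 go to the lower band.
bar_col <- as.character(cut(ratio,
                            breaks = c(-Inf, 0.10, 0.30, 0.50, Inf),
                            labels = c("gray", "yellow", "orange", "red")))

barplot(ratio, col = bar_col, ylim = c(0, 1), ylab = "NA ratio")
# guide lines at the levels named in the text
abline(h = c(0.10, 0.20, 0.30, 0.50), lty = 2)
```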

The histogram generated by this function provides an overview of how NAs are distributed in the input data. It includes all columns or rows, whether or not they contain NA values. This plot is more useful with a small number of rows or columns.

Author(s)

Mehrad Mahmoudian

Examples

    # get some data
    my_iris <- iris
    # randomly set cells in columns 1 to 3 to NA (260 draws of 2 rows each)
    for(i in 1:260){
        my_iris[sample(1:nrow(my_iris), 2), sample(c(1,2,3,1,3,3,3), 1)] <- NA
    }

    # now we can inspect the NAs
    inspect.na(my_iris)
    # plot the histogram
    inspect.na(my_iris, hist=TRUE, barplot=FALSE)

varhandle documentation built on Oct. 1, 2023, 1:08 a.m.