plotHistNA: Plots a histogram of the proportion of NA values

Description

Plots a histogram of the proportion of missing values (NA) across all features in the SEER dataframe.

Usage

plotHistNA(dataframe, summary = FALSE, additional_na, binwidth)

Arguments

dataframe

The dataframe with SEER data

summary

If set to TRUE, the function prints a summary of the NA values in addition to plotting the histogram.

additional_na

A vector of additional symbol(s) that should also be treated as NA. This matters for some datasets exported from the SEER*Stat software, which contain NA values as well as the string 'Blank(s)', both representing missing values.

binwidth

The width of the histogram bins. By default it is set automatically by ggplot2; pass a specific number to override it.
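To make the quantity behind the histogram concrete, the following is a minimal sketch of how a per-feature proportion of missing values could be computed when extra symbols such as 'Blank(s)' count as NA. It is not the package's implementation: the helper `prop_na`, the toy dataframe, and the `binwidth` value of 0.25 are all assumptions made for illustration.

```r
# Toy dataframe standing in for SEER data (hypothetical columns and values)
toy <- data.frame(
  age   = c(54, NA, 61, 47),
  grade = c("II", "Blank(s)", NA, "Blank(s)"),
  site  = c("Breast", "Breast", "Breast", "Breast"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of vidente: proportion of missing values per
# column, counting any symbol listed in additional_na as missing too
prop_na <- function(df, additional_na = character(0)) {
  vapply(df,
         function(col) mean(is.na(col) | col %in% additional_na),
         numeric(1))
}

props <- prop_na(toy, additional_na = "Blank(s)")
print(props)

# Draw the histogram only if ggplot2 is installed; binwidth = 0.25 is an
# arbitrary value chosen for this toy example
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(data.frame(prop = props), ggplot2::aes(x = prop)) +
    ggplot2::geom_histogram(binwidth = 0.25)
  print(p)
}
```

Leaving `binwidth` out of `geom_histogram()` lets ggplot2 pick the binning itself, which corresponds to the default behaviour described for the `binwidth` argument.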

Examples

# First build parsing instructions
## Not run: 
instr <- buildSEERParser(file_path = 'read.seer.research.nov17.sas',
                         file_source = 'download')

# Now you can read it
paths <- c('/home/yourusername/SEER/yr1973_2015.seer9/BREAST.TXT',
          '/home/yourusername/SEER/yr2000_2015.ca_ky_lo_nj_ga/BREAST.TXT')

# I'm interested here in patients with breast cancer diagnosed between 2012
# and 2015
seer_data <- readSEER(path = paths,
                      instructions = instr,
                      year_dx = c(2012:2015),
                      primary_site = 'Breast')
# Plot the histogram
plotHistNA(seer_data, summary = TRUE)
## End(Not run)

mribeirodantas/vidente documentation built on May 15, 2019, 4:47 p.m.