View source: R/almostComplete.R
An alternative to stats::complete.cases() that lets you specify the percentage of completeness desired.
almostComplete(dataset, rowPct, colPct = rowPct, n = 1)
dataset |
The input |
rowPct |
The maximum percent of |
colPct |
The maximum percent of |
n |
When |
When n is specified and rowPct and colPct are NULL, the function calculates the number of NA values by row and column. By default, it then drops the rows and columns with the highest number of missing values. With the dataset in the Examples section, if you use n = 2, the function will remove rows 1, 3, and 6 and columns A, B, C, and F. Compare this behavior with the results of rowSums(is.na(mydf)) and colSums(is.na(mydf)).
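The selection described above can be reproduced by hand in base R. The following is only a sketch, not the package's own code: the helper name top_na is hypothetical, and it assumes that "the n highest" refers to the n largest distinct NA counts, which is consistent with the n = 2 result quoted above.

```r
# Same data as in the Examples section.
mydf <- read.csv(text = "SampleID,A,B,C,D,E,F
x1,NA,x,NA,x,NA,x
x2,x,x,NA,x,x,NA
x3,NA,NA,x,x,x,NA
x4,x,x,x,NA,x,x
x5,x,x,x,x,x,x
x6,NA,NA,NA,x,NA,NA
x7,x,x,x,NA,x,x
x8,NA,NA,x,x,x,x
x9,x,x,x,x,x,NA
x10,x,x,x,x,x,x
x11,NA,x,x,x,x,NA")

# Hypothetical helper (an assumption, not part of the package):
# report the rows and columns whose NA counts are among the
# n largest distinct counts.
top_na <- function(df, n = 1) {
  row_na <- rowSums(is.na(df))
  col_na <- colSums(is.na(df))
  # The n largest distinct values of a count vector.
  top <- function(x) head(sort(unique(x), decreasing = TRUE), n)
  list(
    rows = unname(which(row_na %in% top(row_na))),
    cols = names(col_na)[col_na %in% top(col_na)]
  )
}

res <- top_na(mydf, n = 2)
res$rows  # 1 3 6
res$cols  # "A" "B" "C" "F"
```

Row 6 has five NA values and rows 1 and 3 have three each, so those three rows carry the two highest counts; columns A and F (five each) and B and C (three each) are selected for the same reason.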
A data.frame
Ananda Mahto
http://stackoverflow.com/a/20475029/1270695
mydf <- read.csv(text="
SampleID,A,B,C,D,E,F
x1,NA,x,NA,x,NA,x
x2,x,x,NA,x,x,NA
x3,NA,NA,x,x,x,NA
x4,x,x,x,NA,x,x
x5,x,x,x,x,x,x
x6,NA,NA,NA,x,NA,NA
x7,x,x,x,NA,x,x
x8,NA,NA,x,x,x,x
x9,x,x,x,x,x,NA
x10,x,x,x,x,x,x
x11,NA,x,x,x,x,NA")
## What do the data look like?
## How many NAs are there per column and row?
mydf
colSums(is.na(mydf))
rowSums(is.na(mydf))
## What does complete.cases do?
mydf[complete.cases(mydf), ]
## Drop whichever row and column have
## the highest percentage of NA values
almostComplete(mydf, NULL, NULL)
## Drop the rows and columns whose percentage of NA values
## is among the two highest
almostComplete(mydf, NULL, NULL, n = 2)
## Set one threshold value for both rows and columns.
almostComplete(mydf, .7)
## Specify row and column threshold values separately.
almostComplete(mydf, rowPct = .2, colPct = .5)
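To make the threshold idea in the last two calls concrete, here is a minimal base R sketch of filtering by the fraction of NA values. It is not the package's implementation: the helper keep_by_na_fraction and the toy data frame are hypothetical, and it assumes that a row or column is kept when its NA fraction is at most the threshold and that both fractions are computed on the original data.

```r
# Hypothetical helper (an assumption, not the package's code):
# keep rows and columns whose fraction of NA values does not
# exceed the given maximum.
keep_by_na_fraction <- function(df, row_max, col_max = row_max) {
  keep_rows <- rowMeans(is.na(df)) <= row_max
  keep_cols <- colMeans(is.na(df)) <= col_max
  df[keep_rows, keep_cols, drop = FALSE]
}

# Small made-up data frame for illustration only.
toy <- data.frame(
  id = 1:4,
  a  = c(NA, 1, 2, 3),
  b  = c(NA, NA, 5, 6),
  c  = c(NA, NA, NA, 9)
)

# Row 1 is 75% NA and column c is 75% NA, so both are dropped
# with a 50% threshold; everything else is kept.
out <- keep_by_na_fraction(toy, row_max = 0.5, col_max = 0.5)
out
```

Whether almostComplete() treats the threshold as inclusive, or filters rows before recomputing column percentages, is not stated on this page, so compare against the function's actual output before relying on this sketch.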