vtype: Estimates the Variable Type in Error Afflicted Data.

Description

Estimates the type of variables in data that have not been quality controlled.

Usage

vtype(data, qvalue=0.75, miss_values=NULL)

Arguments

`data`	a data frame.
`qvalue`	quality value from 0.1 to 1 that specifies the proportion of the data assumed to be well formatted. The default of 0.75 works well in most cases. If the data quality is very poor, qvalue can be reduced; if the sample size is very small, it can be increased so that a larger portion of the data is used.
`miss_values`	a character vector of values to be treated as invalid (missing). This matters when missing values are coded as, for example, -9 or 9999, which would otherwise look like valid numeric values. Values such as NA, NaN, Inf, -Inf, NULL and spaces are automatically treated as invalid (missing).
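To make the treatment of invalid entries concrete, here is a minimal sketch in R of how values could be flagged as missing and counted per column. The helpers `flag_missing()` and `count_missing()` are hypothetical and not part of the vtype package; comparing the trimmed text form of each value against the built-in codes and `miss_values` is an assumption for illustration, not a description of how vtype works internally.

```r
# Hypothetical helpers for illustration only; NOT part of the vtype package.
# Assumption: a value counts as missing if it is NA/NaN, or if its trimmed text
# form is one of the automatically invalid codes or a user-supplied miss_values.
flag_missing <- function(x, miss_values = NULL) {
  txt <- trimws(as.character(x))                        # compare values as trimmed text
  builtin <- c("NA", "NaN", "Inf", "-Inf", "NULL", "")  # "" covers space-only entries
  is.na(x) | txt %in% c(builtin, miss_values)
}

# Count non-missing (n) and missing (missings) values for each column.
count_missing <- function(data, miss_values = NULL) {
  missings <- vapply(data, function(col) sum(flag_missing(col, miss_values)), integer(1))
  data.frame(variable = names(data),
             n = nrow(data) - missings,
             missings = missings,
             row.names = NULL)
}

# Example: "9999" is declared a missing-value code; " " and Inf are invalid by default.
df <- data.frame(age = c("34", " ", "9999", "51"),
                 bmi = c(22.1, NA, Inf, 27.4))
count_missing(df, miss_values = "9999")
```

In this example both columns end up with n = 2 and missings = 2: in `age` the blank entry and the code 9999 are flagged, in `bmi` the NA and the Inf. Without `miss_values = "9999"`, the value 9999 in `age` would be counted as a valid number.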

Details

The prediction is based on a pre-trained random forest model, trained on more than 5000 medical variables with an out-of-bag (OOB) accuracy of 99%. The accuracy depends heavily on the type and coding style of the data. For example, categorical variables are often coded as integers 1 to x; if the number of categories is very large, such a variable cannot be distinguished from a continuous integer variable. Some types are by definition very sensitive to errors in the data, such as ID, missing or constant: a single deviating non-missing value is enough to make a variable no longer constant or no longer missing. The data are assumed to be cross-sectional, i.e. each ID is unique (no multiple entries per ID).
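The sensitivity of the ID, missing and constant types to single errors follows from their strict definitions. The sketch below uses hypothetical helper names (not vtype functions) and simple strict definitions chosen for illustration: an all-missing variable has no valid value, a constant variable has exactly one distinct non-missing value, and an ID variable has no duplicated values (which relies on the cross-sectional assumption).

```r
# Hypothetical strict checks, for illustration only; NOT vtype functions.
is_all_missing <- function(x) all(is.na(x))
is_constant    <- function(x) length(unique(x[!is.na(x)])) == 1L
is_unique_id   <- function(x) !anyDuplicated(x[!is.na(x)])

miss     <- rep(NA, 100)            # a completely missing variable
miss_err <- replace(miss, 7, 3)     # one stray value

grp     <- c(rep("A", 99), NA)      # constant apart from one missing entry
grp_err <- replace(grp, 50, "B")    # one data-entry error

ids     <- 1:100                    # unique IDs in cross-sectional data
ids_err <- replace(ids, 100, 1L)    # one duplicated ID

c(is_all_missing(miss), is_all_missing(miss_err))  # TRUE FALSE
c(is_constant(grp), is_constant(grp_err))          # TRUE FALSE
c(is_unique_id(ids), is_unique_id(ids_err))        # TRUE FALSE
```

In each pair a single deviating entry out of 100 flips the result, which is why these types are hard to recognize reliably in error-afflicted data.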

Value

A data frame with the following columns:

variable: name of the variable
type: estimated variable type
probability: probability of the estimated type
format: format of the variable (depends on the type)
class: broader category of the type
alternative: possible alternative type with a lower probability
n: number of non-missing values
missings: number of missing values
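The returned columns can be combined for a quick review of the result. The helper `review_types()` below is hypothetical (not part of vtype), and its default threshold of 0.9 is an arbitrary example value; the sketch only assumes the column names documented here. Since `n` counts non-missing and `missings` counts missing values, their sum is the number of rows, so the share of missing values is `missings / (n + missings)`.

```r
# Hypothetical helper, NOT part of the vtype package. It only assumes the
# documented output columns: variable, type, probability, alternative, n, missings.
review_types <- function(res, min_prob = 0.9) {
  # share of missing values among all rows of each variable
  res$missing_share <- res$missings / (res$n + res$missings)
  # which() drops NA probabilities instead of producing NA rows
  uncertain <- which(res$probability < min_prob)
  res[uncertain, c("variable", "type", "probability", "alternative", "missing_share")]
}
```

With the package installed, this could be applied as `review_types(vtype(sim_nqc_data, miss_values = '9999'))` to list the variables whose estimated type is least certain, together with the suggested alternative type.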

Examples

# Application to a sample data set included in the package.

library(vtype)
vtype(sim_nqc_data, miss_values='9999')

vtype documentation built on May 14, 2021, 5:07 p.m.
