# BestVector: 'BestVector' Asks for a dataframe and some parameters and... In cleanerR: How to Handle your Missing Data

## Description

`BestVector` asks for a dataframe and some parameters and returns the best combination of columns to predict the missing value.

## Usage

```r
BestVector(df, goal, maxi, repetitions, trigger = 1, ratio = 0.99)
```

## Arguments

| Argument | Description |
|---|---|
| `df` | A dataframe with the missing values you wish to fill. |
| `goal` | The column with the missing values you wish to fill. |
| `maxi` | The maximum length of the column combinations to test. For example, if 2, combinations up to all possible pairs of columns are tested. |
| `repetitions` | Measure of error; the bigger it is, the less likely you are to get the right prediction. |
| `trigger` | When all possible combinations of tuples are paired, a percentage of them will appear only once; `trigger` rejects the set if this percentage is higher than this value. |
| `ratio` | Rejects columns whose ratio of unique values to total values is higher than this value; primary keys have a ratio equal to 1. |
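The `ratio` and `trigger` descriptions are easier to follow with numbers in hand. The sketch below uses base R only and is not taken from cleanerR's source: `unique_ratio` and `singleton_share` are hypothetical helper names, and reading `trigger` as "the share of distinct tuples that occur exactly once" is an assumption based on the wording above.

```r
# Hypothetical helpers written in base R; they are NOT part of cleanerR.

# Ratio of unique values to total values, computed for each column.
unique_ratio <- function(df) {
  vapply(df, function(col) length(unique(col)) / length(col), numeric(1))
}

# Share of distinct row tuples (over the chosen columns) that occur exactly once.
# Assumption: this is the percentage that `trigger` is compared against.
singleton_share <- function(df, cols) {
  keys <- do.call(paste, c(df[cols], sep = "\r"))
  counts <- table(keys)
  mean(counts == 1)
}

toy <- data.frame(
  id    = 1:6,                               # primary key: every value is unique
  group = c("a", "a", "b", "b", "a", "b"),
  flag  = c(1, 1, 2, 2, 1, 1)
)

ratios <- unique_ratio(toy)
print(ratios)

# Columns whose ratio does not exceed a threshold of 0.99.
kept <- names(ratios)[ratios <= 0.99]
print(kept)

# Share of (group, flag) tuples seen only once in the toy data.
print(singleton_share(toy, c("group", "flag")))
```

In this toy dataframe the `id` column behaves like a primary key, so its ratio is 1 and it would be filtered out at the default `ratio = 0.99`, while `group` and `flag` remain candidates.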

## Examples

```r
library(cleanerR)

# The BestVector function does the following:
# takes a dataframe and a goal column to predict,
# tests every combination of vectors limited by a parameter length,
# and returns the best set to predict the goal.
# To run some experiments, first let's build a dataframe.
e=sample(1:2,1e2,replace=TRUE)
e1=sample(1:2,1e2,replace=TRUE)
e2=sample(1:2,1e2,replace=TRUE)
e=data.frame(e,e1,e2,paste(LETTERS[e],LETTERS[e1]),paste(LETTERS[e],LETTERS[e1],LETTERS[e2]) )
# We can easily see that to predict the last column you need the first three.
# Let's check what the function will find.
z=BestVector(e,5,3,nrow(e),1)
print(z)
# Let's now check what the best set is if we use only 2 columns at most.
z1=BestVector(e,5,2,nrow(e),1)
print(z1)
# We could also find which column is best to predict the fourth one.
z2=BestVector(e,4,2,nrow(e),1)
print(z2)
# We could also take a look at the dataset iris.
# Since this dataset does not repeat lines we must use trigger=0.
# To predict Species:
z3=BestVector(iris,5,2,nrow(iris),0)
print(names(iris)[z3])
# We can check the accuracy of these predictions with the accuracy functions.
print(MeanAccuracy(iris,z3,5))
print(MeanAccuracy(e,z2,4))
print(MeanAccuracy(e,z1,5))
print(MeanAccuracy(iris,z,5))
```

cleanerR documentation built on May 2, 2019, 5:51 a.m.