View source: R/Encontrar_candidatos_dataset_v1.R
GenerateCandidates
Takes a data frame and some parameters and returns all combinations of columns that can predict the goal column within a given error tolerance.
The result is a list: the first element holds the column combinations and the second holds their measures of error. To get the best parameters, call BestVector.
GenerateCandidates(df, goal, maxi, repetitions, trigger = 1)
df | A data frame containing the missing values you wish to fill.
goal | The column with the missing values you wish to fill.
maxi | The maximum length of the column combinations to test. For example, with maxi = 2 every combination up to and including all possible pairs of columns is tested.
repetitions | Measure of error. The bigger the value, the less likely you are to get the right prediction.
trigger | When the values of a column set are combined into tuples, a percentage of those tuples will appear only once. The set is rejected if this percentage is higher than trigger.
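The roles of maxi and trigger can be illustrated with a short base-R sketch. This is not the package's implementation: the helpers enumerate_sets and singleton_share are hypothetical names, and the sketch assumes that goal is a column index, that the goal column is excluded from the candidate sets, and that the percentage checked by trigger is the share of distinct value tuples that occur exactly once.

```r
# Hypothetical helper (not part of the package): all column-index sets of
# size 1..maxi, leaving out the goal column.
enumerate_sets <- function(df, goal, maxi) {
  predictors <- setdiff(seq_along(df), goal)
  sizes <- seq_len(min(maxi, length(predictors)))
  sets <- lapply(sizes, function(k) {
    # Combine positions, then map them back to column indices; this avoids
    # combn() expanding a single number n into 1:n.
    combn(length(predictors), k, FUN = function(i) predictors[i], simplify = FALSE)
  })
  unlist(sets, recursive = FALSE)
}

# Hypothetical helper (assumed reading of trigger): share of distinct value
# tuples of the chosen columns that appear exactly once.
singleton_share <- function(df, cols) {
  keys <- do.call(paste, c(unname(as.list(df[cols])), sep = "\r"))
  counts <- table(keys)
  mean(counts == 1)
}

# Small demonstration data: column 3 plays the role of the goal column.
demo <- data.frame(a = c(1, 1, 2, 3), b = c("x", "y", "y", "z"), g = c(1, 2, 3, 4))
sets <- enumerate_sets(demo, goal = 3, maxi = 2)
shares <- vapply(sets, function(s) singleton_share(demo, s), numeric(1))

# Under the assumed reading, a set is kept only if its share does not exceed trigger.
trigger <- 0.9
kept <- sets[shares <= trigger]
print(kept)
```

Under these assumptions, a data frame with four predictor columns and maxi = 4 gives 4 + 6 + 4 + 1 = 15 candidate sets before any rejection takes place.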
# The GenerateCandidates function generates all column sets of length up to maxi.
# Each set has a measure of error.
# This measure of error is related to the repetitions parameter.
# repetitions should range from 0 (rejects anything less than 100 percent accuracy)
# to the number of rows of the data frame (accepts everything).
# Let's generate a dataset
e <- sample(1:5, 1e4, replace = TRUE)
e1 <- sample(1:5, 1e4, replace = TRUE)
e2 <- sample(1:5, 1e4, replace = TRUE)
# Columns 4 and 5 are built from the first two and the first three random columns
e <- data.frame(e, e1, e2, paste(LETTERS[e], LETTERS[e1]), paste(LETTERS[e], LETTERS[e1], LETTERS[e2]))
names(e) <- c("random1", "random2", "random3", "2randoms", "3randoms")
# We can then generate all candidates to predict the 5th column
# We set the rejection threshold (repetitions) to 80 percent of the number of rows
z <- GenerateCandidates(df = e, goal = 5, maxi = 4, repetitions = 0.8 * nrow(e), trigger = 1)
# We can see z is a list
# z[[1]] is another list that contains all sets that satisfy our request
# z[[2]] is a measure of error, the smaller the more accurate
# Let's then order z[[1]] by z[[2]]
m <- z[[1]][order(z[[2]])]
print(m)
# We can then see that m[[1]] holds the best set for prediction, while m[[length(m)]] holds the worst
# To prove it we can do the following
## cat("The best set to predict", names(e)[5], "is", names(e)[m[[1]]], "\n")
## cat("Its expected accuracy is", MeanAccuracy(e, m[[1]], 5), "\n")
## cat("The worst set to predict", names(e)[5], "is", names(e)[m[[length(m)]]], "\n")
## cat("Its expected accuracy is", MeanAccuracy(e, m[[length(m)]], 5), "\n")
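The comparison between the best and the worst set can also be checked without the package, using a rough stand-in for an accuracy measure. The sketch below is an assumption for illustration only: majority_accuracy is a hypothetical helper, not the package's MeanAccuracy, and it simply predicts the most frequent goal value for each tuple of predictor values and reports the in-sample share of correct predictions.

```r
# Hypothetical stand-in (NOT the package's MeanAccuracy): in-sample accuracy
# of predicting the goal column by the most frequent goal value per tuple.
majority_accuracy <- function(df, cols, goal) {
  keys <- do.call(paste, c(unname(as.list(df[cols])), sep = "\r"))
  # Cross-tabulate predictor tuples (rows) against goal values (columns).
  tab <- table(keys, df[[goal]])
  # The best guess for each tuple is its most common goal value.
  sum(apply(tab, 1, max)) / sum(tab)
}

# Demonstration data in the spirit of the example: column 3 is fully
# determined by columns 1 and 2 together, but not by column 1 alone.
set.seed(42)
a <- sample(1:5, 1000, replace = TRUE)
b <- sample(1:5, 1000, replace = TRUE)
demo2 <- data.frame(a, b, ab = paste(LETTERS[a], LETTERS[b]))

acc_pair <- majority_accuracy(demo2, c(1, 2), 3)
acc_single <- majority_accuracy(demo2, 1, 3)
cat("Accuracy using both columns:", acc_pair, "\n")
cat("Accuracy using one column:", acc_single, "\n")
```

A set that fully determines the goal column scores 1 under this measure, while a set that carries only part of the information scores lower, which is the ordering the example expects between m[[1]] and m[[length(m)]].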