Description Usage Arguments Examples
View source: R/Encontrar_candidatos_dataset_v1.R
CompleteDataset
Asks for a dataframe, a vector of collumn indices and the goal collumn and returns the data frame with the values filled
1 | CompleteDataset(df, rows, goal)
|
df |
A dataframe with the missing values you wish to fill |
rows |
The collumns you wish to use to predict the missing values |
goal |
The collum with the missing values you wish to fill |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | #The CompleteDataset Function shall do the following
#Take a dataframe and a goal collumn to predict
#Takes a set of vectors to use for prediction
#Use this set to predict with accuracy given by MeanAccuracy function
#Then to run some experiments first lets build a dataframe
e=sample(1:5,1e4,replace=TRUE)
e1=sample(1:5,1e4,replace=TRUE)
e2=sample(1:5,1e4,replace=TRUE)
e=data.frame(e,e1,e2,paste(LETTERS[e],LETTERS[e1]),paste(LETTERS[e],LETTERS[e1],LETTERS[e2]) )
#Now we got a dataframe lets create a copy of it
ce=e
ce[sample(1:nrow(e),0.3*nrow(e)),5]=NA
#So 30 percent of the data is now missing
#Lets try to recover it then with CompleteDataset
#First we must choose a set of vectors to use
#Lets first try with BestVector
vector_c=BestVector(ce,5,4,nrow(ce),1)
ce1=CompleteDataset(ce,rows=vector_c,goal=5)
#We can see how many values are still missing with NA_VALUES
print(NA_VALUES(ce1) )
#And check how many we got wrong by
print(sum(ce1[,5]!=e[,5]) )
#If the user wanted he of course could choose a set of his own, for example
user_set=c(1,3)
ce1=CompleteDataset(ce,rows=user_set,goal=5)
#We can see how many values are still missing with NA_VALUES
print(NA_VALUES(ce1) )
#And check how many we got wrong by
print(sum(ce1[,5]!=e[,5]) )
#But we can see that is not the best solution
#To see how to check the best sets take a look at generate_candidates
# The process could be done for the 4 collum as well
ce=e
ce[sample(1:nrow(e),0.5*nrow(e)),4]=NA
#So 50 percent of the data is now missing
#Lets try to recover it then with CompleteDataset
vector_c=BestVector(ce,4,4,nrow(ce),1)
ce1=CompleteDataset(ce,rows=vector_c,goal=4)
#We can see how many values are still missing with NA_VALUES
print(NA_VALUES(ce1) )
#And check how many we got wrong by
print(sum(ce1[,4]!=e[,4]) )
#Here we can easily see e holds the original data
#ce1 is the recovered data
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.