Imputation: Imputing missing values through linear regression

View source: R/Imputation.R

Imputation    R Documentation

Imputing missing values through linear regression

Description

Fits a simple linear regression model to impute missing values of the dependent variable.
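As a rough illustration of the idea, the sketch below fits a simple linear regression with base R and fills gaps in the dependent variable from the fitted line. It is not the package's implementation; the data frame toy and the column names x, y and y_imputed are made up for this example.

```r
# Hypothetical toy data: y has gaps, x is the concurrent predictor series.
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, NA, 6.0, 8.1, NA, 11.9)
)

# Fit y ~ x; lm() drops rows containing NA by default.
fit <- lm(y ~ x, data = toy)

# Impute only where y is missing and x is available.
gap <- is.na(toy$y) & !is.na(toy$x)
toy$y_imputed <- toy$y
toy$y_imputed[gap] <- predict(fit, newdata = toy[gap, , drop = FALSE])

# Coefficient of determination of the fitted model.
r_squared <- summary(fit)$r.squared
```

Observed values are left untouched; only the rows flagged in gap receive predictions.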

Usage

Imputation(Data, Variable, x_lab, y_lab)

Arguments

Data

Data frame containing two time series that are at least partially concurrent. The first column may be a "Date" object. Can be the output of Dataframe_Combine.

Variable

Character vector of length one specifying the (column) name of the variable to be imputed, i.e., the dependent variable in the fitted regression.

x_lab

Character vector of length one specifying the name of the independent variable to appear as the x-axis label on a plot showing the data, imputed values and the linear regression model.

y_lab

Character vector of length one specifying the name of the dependent variable to appear as the y-axis label on a plot showing the data, imputed values and the linear regression model.

Value

List comprising:

  • Data: data frame containing the original data plus an additional column named Value, in which the NA values of the Variable of interest have been imputed where possible.

  • Model: linear regression model parameters, including the coefficient of determination.

A scatter plot of the data (black points), the linear regression model (red line) and the fitted (imputed) values (blue points) is also produced.
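To make the shape of the return value concrete, here is a minimal sketch of a helper that returns a list with Data and Model elements like those described above. The function impute_sketch, its Predictor argument and the toy data are hypothetical and not part of MultiHazard; the real function's internals may differ. It also shows one plausible reading of "where possible": a gap can only be filled when the independent variable is itself observed.

```r
# Hypothetical helper, not part of MultiHazard: returns a list shaped like
# the Value described above (Data with an added Value column, plus Model).
impute_sketch <- function(Data, Variable, Predictor) {
  fit <- lm(reformulate(Predictor, response = Variable), data = Data)
  Data$Value <- Data[[Variable]]
  # Imputation is only possible where the predictor itself is observed.
  gap <- is.na(Data[[Variable]]) & !is.na(Data[[Predictor]])
  Data$Value[gap] <- predict(fit, newdata = Data[gap, , drop = FALSE])
  list(Data = Data,
       Model = list(coefficients = coef(fit),
                    r.squared = summary(fit)$r.squared))
}

# Toy data: A is the variable to impute, B is the predictor.
toy <- data.frame(A = c(1.0, NA, 3.1, 4.0, NA),
                  B = c(2.0, 3.0, 6.1, 8.2, NA))
res <- impute_sketch(toy, Variable = "A", Predictor = "B")
res$Data$Value  # row 2 is filled in; row 5 stays NA because B is also missing
```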

Examples

####Objective: Fill in missing values at groundwater well G_3356 using record at G_3355
##Viewing first few rows of G_3356
head(G_3356)
#Converting date column to a "Date" object
G_3356$Date<-seq(as.Date("1985-10-23"), as.Date("2019-05-29"), by="day")
#Converting readings to numeric object
G_3356$Value<-as.numeric(as.character(G_3356$Value))

##Viewing first few rows of G_3355
head(G_3355)
#Converting date column to a "Date" object
G_3355$Date<-seq(as.Date("1985-08-20"), as.Date("2019-06-02"), by="day")
#Converting readings to numeric object
G_3355$Value<-as.numeric(as.character(G_3355$Value))

##Merge the two dataframes by date
library('dplyr')
GW_S20<-merge(G_3356,G_3355,by="Date")
colnames(GW_S20)<-c("Date","G3356","G3355")
#Carrying out imputation
Imputation(Data=GW_S20,Variable="G3356",
           x_lab="Groundwater level (ft NGVD 29)",
           y_lab="Groundwater level (ft NGVD 29)")
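The merge step in the example relies on base R's merge(), which by default keeps only the dates present in both inputs. The sketch below uses two made-up series (a and b, with invented values) to show how partially concurrent records reduce to their common period before imputation.

```r
# Hypothetical toy series with different start and end dates.
a <- data.frame(
  Date  = seq(as.Date("2000-01-01"), as.Date("2000-01-05"), by = "day"),
  Value = c(1.0, NA, 1.2, 1.3, 1.4)
)
b <- data.frame(
  Date  = seq(as.Date("2000-01-03"), as.Date("2000-01-08"), by = "day"),
  Value = c(2.2, 2.3, NA, 2.5, 2.6, 2.7)
)

# merge() is base R; by default it keeps only dates found in both inputs
# and suffixes the clashing Value columns with .x and .y.
both <- merge(a, b, by = "Date")
colnames(both) <- c("Date", "A", "B")
```

Passing all = TRUE to merge() would instead keep every date from either series, padding the non-overlapping rows with NA.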

rjaneUCF/MultiHazard documentation built on April 20, 2024, 12:48 a.m.