#' Import and clean NOAA data for use in analytics and visualization
#'
#' This function imports and cleans NOAA data for further analysis. Although
#' it can be called directly by the user, it is more likely to be used inside
#' later functions that produce more valuable output than the cleaned data
#' itself. As is common, this function uses the \code{tidyverse} packages. The
#' author normally prefers the \code{data.table} package for speed; however, the
#' NOAA data set is very small and users are more likely to be familiar with
#' \code{tidyverse} syntax. See
#' \code{https://www.ngdc.noaa.gov/nndc/struts/results?&t=101650&s=225&d=225}
#' for data definitions. NOAA data goes back thousands of years, so the raw
#' data contains both negative and positive years, the sign indicating B.C.
#' and A.D. respectively.
#'
#' @param dataframe A \code{data.frame} of dirty NOAA data that has already been
#' read in. This argument is ignored if the \code{file} argument is provided.
#' Using \code{file} is the more compact way to clean data, as it performs the
#' import directly.
#'
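#' @param file A character string naming the file that contains the 'dirty'
#' data. The full path is required if the file is not in the working
#' directory. A vector of paths is passed unchanged to
#' \code{readr::read_delim}, so how multiple files are handled depends on the
#' installed \code{readr} version.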
#'
#' @param dayfill A number giving the day of the month to use where the day is
#' missing. Missing days are common for old earthquakes, for which precise
#' dates are not available. The default is 1, the first of the month.
#'
#' @param monthfill A number giving the month to use where the month is
#' missing. Missing months are common for old earthquakes, for which precise
#' dates are not available. The default is 7 (July), the middle of the year.
#'
#' @param delim A character string specifying how the dirty data file is
#' delimited. It is passed directly to the \code{delim} argument of
#' \code{readr::read_delim}. NOAA data is currently stored as tab-delimited
#' text, hence the default of \code{"\t"}. This parameter is mainly
#' future-proofing in case NOAA changes its format or the user has an internal
#' data collection step that copies the data into a differently delimited
#' format. Non-delimited file types are not supported.
#'
#' @param ... Additional arguments for the support functions imported from
#' other packages, primarily \code{readr::read_delim}.
#'
#' @importFrom readr read_delim
#' @importFrom lubridate make_date
#'
#' @export
#'
#' @examples
#' \dontrun{
#'
#' # For a file from NOAA, point to the file and the cleaned data.frame is returned
#' eq_clean_data(file="datafromnoaa.txt")
#'
#' # If you have already imported the data and just want it cleaned to this package's standard
#' data<-read.delim("datafromnoaa.txt")
#'
#' eq_clean_data(dataframe=data)
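#'
#' # A minimal sketch of the optional arguments, assuming the same placeholder
#' # file as above: fill missing days with the 15th and missing months with
#' # January instead of the defaults.
#' eq_clean_data(file="datafromnoaa.txt",dayfill=15,monthfill=1)
#'
#' # Assuming a hypothetical comma-delimited copy of the data
#' # ("datafromnoaa.csv" is an illustrative name, not a real NOAA file),
#' # override the delimiter passed to readr::read_delim.
#' eq_clean_data(file="datafromnoaa.csv",delim=",")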
#'
#' }
#'
eq_clean_data<-function(dataframe,file=NULL,
                        dayfill=1,monthfill=7,
                        delim="\t",...){
  #Read from file when one is given; otherwise use the supplied data.frame
  if(!is.null(file)){
    tmpdata<-readr::read_delim(file=file,delim=delim,...)
  }else{
    tmpdata<-dataframe
  }
  #Fill in missing data for day and month with the parameters passed
  tmpdata[is.na(tmpdata$DAY),c("DAY")]<-dayfill
  tmpdata[is.na(tmpdata$MONTH),c("MONTH")]<-monthfill
  #Create a full date column from components
  tmpdata$DATE<-lubridate::make_date(year=tmpdata$YEAR,
                                     month=tmpdata$MONTH,
                                     day=tmpdata$DAY)
  #Move DATE so that it sits immediately before YEAR
  #(also works when YEAR is the first column)
  othercols<-setdiff(names(tmpdata),"DATE")
  yearcol<-match("YEAR",othercols)
  tmpdata<-tmpdata[,append(othercols,"DATE",after=yearcol-1)]
  #Strip the country name out of the location name and camel case it
  #Depends on the assumption that ": " separates country and location
  #The assignment required this to be a separate function; see eq_location_clean for its documentation.
  tmpdata$LOCATION_NAME<-eq_location_clean(tmpdata)
  #Coerce the columns we need to numeric
  tmpdata$LATITUDE<-as.numeric(tmpdata$LATITUDE)
  tmpdata$LONGITUDE<-as.numeric(tmpdata$LONGITUDE)
  tmpdata$DEATHS<-as.numeric(tmpdata$DEATHS)
  tmpdata$EQ_PRIMARY<-as.numeric(tmpdata$EQ_PRIMARY)
  tmpdata$TOTAL_DEATHS<-as.numeric(tmpdata$TOTAL_DEATHS)
  return(tmpdata)
}