Using the effort calculations in fecR requires the input data to be in a particular format. This vignette introduces the check_format() function in fecR that can be used to check the structure of your input data before the effort is calculated.
The function checks:
It currently does not check:
Some basic automatic corrections are offered (see the details below). However, any changes made to the data should be confirmed by the user after the function has executed.
The function returns the data set. If automatic corrections have been asked for, the returned data set will have the corrections. Changes are noted to the screen by the function.
This table is adapted from the Nicosia report Annex. Each row in the table is a fishing operation. Each fishing operation is part of a fishing trip. Each fishing trip has the same vessel identifier, departure and return dates and times and trip ID.
| Column name | Description | Format | Notes | Example |
|-------------|---------------|--------------|-----------------------|-------------|
| eunr_id | Vessel identifier, anonymous | Character string | | "MyVessel1234" |
| loa | Vessel length in cm | Numeric | | 3654 |
| gt | Gross tonnage | Numeric | | 355 |
| kw | Engine power | Numeric | | 1251 |
| trip_id | Unique identifier for fishing trip | Character string | | "MyTrip1234" |
| depdate | Date of trip departure | Character string | 8 numeric characters: YYYYMMDD | "20140214" |
| deptime | Time of trip departure | Character string | 4 numeric characters: HHMM. HH and MM can be separated by a colon: HH:MM | "0745" or "07:45" |
| retdate | Date of trip return | Character string | 8 numeric characters: YYYYMMDD | "20140214" |
| rettime | Time of trip return | Character string | 4 numeric characters: HHMM. HH and MM can be separated by a colon: HH:MM | "0745" or "07:45" |
| fishdate | Date of fishing operation | Character string | 8 numeric characters: YYYYMMDD | "20140214" |
| gear | Gear used for specific fishing operation | Character string | Gear must be listed in the Master Data Register | "OTB" |
| gear_mesh_size | Mesh size in mm | Integer | Every mm will be considered as a different gear. For example, a gear with a mesh size of 81 will be considered as a different gear to one with a mesh size of 80. The data needs to be encoded so that size ranges have the same integer. For example, set all sizes in the range 80-89 as 80. A gear without a mesh, e.g. a long line, will have a mesh size of 0. Missing values are not allowed. | 80 |
| fishing_area | Area where the fishing operation took place. DCF level 3 (level 4 for Baltic) | Character string | Must be upper case. Missing values are not allowed. | "27.4.B" |
| economic_zone | Economic zone where the fishing operation took place | Character string | Must be one of "EU", "NOR" or "UNKNOWN" | "EU" |
| rectangle | Rectangle where fishing operation took place | Character string | For example, ICES rectangle or GSA + statistical area. No symbols are allowed, e.g. no ' to separate characters. Must be upper case. Note: GSAs not yet added to list | "39F8" |
The calc_fishing_effort() function in fecR will only work if the input data is correct. The data can be prepared using a spreadsheet and then saved as a CSV file. This can then be read into R to be used by fecR.
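When reading the CSV file into R, it is worth keeping in mind that base R guesses column types, so a time such as "0615" can silently become the number 615. The following is a minimal sketch using only base R; the file contents and the shortened column set are invented for illustration, and a real file would contain all the columns in the table above.

```r
# Write a tiny example file; in practice this would be your own CSV.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "eunr_id,loa,gt,kw,trip_id,depdate,deptime",
  "my_boat,2000,70,400,trip1,20140718,0615"
), csv_file)

# Default type guessing turns "0615" into the integer 615.
guessed <- read.csv(csv_file)

# Naming the character columns explicitly keeps the strings intact.
# Columns that are not named (loa, gt, kw) are still guessed as numeric.
char_cols <- c(eunr_id = "character", trip_id = "character",
               depdate = "character", deptime = "character")
dat <- read.csv(csv_file, colClasses = char_cols, stringsAsFactors = FALSE)

str(dat)
```

Setting the column classes at import means less reliance on automatic correction later.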
The simplest way of confirming that the input data is correct is to use the check_format() function in fecR. If the function executes with a positive message and no warnings, the data is OK and can be used with calc_fishing_effort(). If the data is not OK, warnings are produced and informative messages are written to the screen. The user should then make changes to the data (either in R or in the original CSV file) and try again.
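Because problems are reported as warnings, a script can detect them programmatically instead of relying on someone reading the screen. The helper below is a hypothetical base R sketch (collect_warnings() is not part of fecR); it runs any expression and records the warnings it raises.

```r
# Hypothetical helper (not part of fecR): evaluate an expression and
# collect the messages of any warnings it raises.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Stand-in expression that raises one warning and returns a value.
res <- collect_warnings({
  warning("something is wrong with a column")
  42
})
length(res$warnings)

# With fecR loaded, the same pattern could wrap the check, for example:
# res <- collect_warnings(check_format(mydata))
# data_ok <- length(res$warnings) == 0
```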
As mentioned above, it is possible to call check_format() and ask for some basic automatic corrections. If any corrections are made, messages describing them are written to the screen. It is not possible to automatically correct everything, so the returned data set should be passed into check_format() again to see if it passes the checks.
In this section we show some examples of running check_format() with data and how the automatic correction can be used.
First we load the library:
library(fecR)
In this test we invent some data that passes the checks without corrections. The data consists of two trips and four fishing activities.
okdata <- data.frame(
  eunr_id = "my_boat",
  loa = 2000,
  gt = 70,
  kw = 400,
  trip_id = c("trip1","trip1","trip2","trip2"),
  depdate = rep(c("20140718", "20141023"), each=2),
  deptime = rep(c("0615", "0730"), each=2),
  retdate = rep(c("20140719", "20141024"), each=2),
  rettime = rep(c("1830", "1615"), each=2),
  fishdate = c("20140718", "20140719", "20141023", "20141024"),
  gear = c("OTB","OTB","GN","GN"),
  gear_mesh_size = 80,
  fishing_area = "27.4.B",
  economic_zone = "EU",
  rectangle = "39F0",
  stringsAsFactors = FALSE
)
okdata
We can check the data by calling check_format() without correction (the default setting):
test <- check_format(okdata)
You can see that the function checks each of the columns before giving an output message saying that everything is OK. As there were no warnings we can now use this data in calc_fishing_effort() if we want to.
The input data has a strict number of columns and the names need to follow the example above. In this example, an extra column is added to the data. Without asking for corrections, check_format() will complain.
extra_col <- cbind(okdata, new_col = runif(nrow(okdata)))
test <- check_format(extra_col)
You can see that a warning is produced and the output message indicates that there is a problem with the data.
We can run check_format() with the automatic corrections turned on:
test <- check_format(extra_col, correct=TRUE)
You can see that there is a message about the extra column and that it will be removed. The returned data set has been corrected by removing the extra column. This means that if we call the check function on the returned, corrected data, it should pass the check without problem.
test2 <- check_format(test)
If one or more of your columns is named incorrectly, the data check complains. No automatic correction is possible for this problem. You will have to rename the columns yourself.
Note that the column names are case-sensitive.
wrong_col <- okdata
colnames(wrong_col)[3] <- "something"
test <- check_format(wrong_col)
The warning message tells you which columns are missing.
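Before running the check, the column names can also be compared by hand. The sketch below uses only base R; compare_cols() is a hypothetical helper and the expected names are copied from the table above, so it illustrates the idea and is not how fecR itself performs the check.

```r
# Column names taken from the table of required columns.
expected_cols <- c("eunr_id", "loa", "gt", "kw", "trip_id", "depdate",
                   "deptime", "retdate", "rettime", "fishdate", "gear",
                   "gear_mesh_size", "fishing_area", "economic_zone",
                   "rectangle")

# Hypothetical helper: list expected names that are absent and
# names that are present but not expected (comparison is case-sensitive).
compare_cols <- function(dat) {
  list(missing = setdiff(expected_cols, colnames(dat)),
       extra   = setdiff(colnames(dat), expected_cols))
}

# An empty data frame with one misnamed column ("GT" instead of "gt").
demo <- as.data.frame(matrix(nrow = 0, ncol = length(expected_cols)))
colnames(demo) <- expected_cols
colnames(demo)[3] <- "GT"

diff_before <- compare_cols(demo)
diff_before

# Rename the offending column by hand.
colnames(demo)[colnames(demo) == "GT"] <- "gt"
diff_after <- compare_cols(demo)
```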
The eunr_id column is the vessel identifier. It should be a character string. If the column is not a character string it is possible to use the automatic correction to force it to be a character.
wrong_eunr_id <- okdata
# Force them to be numeric instead of character
wrong_eunr_id[,"eunr_id"] <- 1234
test <- check_format(wrong_eunr_id)
# With the automatic check
test <- check_format(wrong_eunr_id, correct=TRUE)
If an entry is missing in the column (e.g. it is NA or empty) then check complains. It is not possible to automatically correct for missing data.
wrong_eunr_id <- okdata
# Set to be missing
wrong_eunr_id[1,"eunr_id"] <- as.character(NA)
test <- check_format(wrong_eunr_id)
The loa, gt and kw columns store the vessel length in cm, the gross tonnage and the engine power respectively. These columns must be numeric, i.e. no units or characters. If they are not numeric check complains.
Here we demonstrate with the loa column.
wrong_loa <- okdata
# Turn to a character string
wrong_loa[c(2,3),"loa"] <- "90m"
test <- check_format(wrong_loa)
If automatic correction is turned on, the columns are stripped of non-numeric characters and forced to be numeric. This may be enough to pass check. However, this correction is not a guarantee and all automatic corrections should be verified by the user.
test <- check_format(wrong_loa, correct=TRUE)
If there are no numerics in the columns, automatic correction is not possible and check complains.
wrong_loa <- okdata
# Change some entries to be alphabetical with no numerics
wrong_loa[c(2,3),"loa"] <- "notnumeric"
test <- check_format(wrong_loa, correct=TRUE)
This error will need to be fixed by hand.
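To see why stripped values still need verifying, here is a base R sketch of the kind of stripping described above. The helper strip_to_numeric() is hypothetical and is an assumption about the general approach, not fecR's actual code.

```r
# Hypothetical helper: keep only digits and the decimal point, then convert.
strip_to_numeric <- function(x) {
  cleaned <- gsub("[^0-9.]", "", as.character(x))
  # Entries with nothing left after stripping become NA.
  suppressWarnings(as.numeric(cleaned))
}

loa_raw <- c("2000", "90m", "notnumeric")
loa_num <- strip_to_numeric(loa_raw)
loa_num
```

In this sketch "90m" becomes 90, which is a valid number but almost certainly not the intended length: loa is expected in cm, so a 90 m vessel should be recorded as 9000. The entry with no digits cannot be recovered at all.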
The date columns depdate, retdate and fishdate must be characters and each entry must have 8 numeric characters of the format: YYYYMMDD, e.g. "20161023".
It is possible to automatically correct for the column not being a character string, e.g. if an 8 character numeric is entered. However, it is not possible to correct for the format, e.g. if there are too few characters.
Here the data is numeric when it should be a character string. Automatic correction is possible in this case.
wrong_date <- okdata
# Needs to be character string, not numeric even if format is OK
wrong_date[, "retdate"] <- as.numeric(wrong_date[, "retdate"])
test <- check_format(wrong_date)
# We can correct
test <- check_format(wrong_date, correct=TRUE)
If the format is wrong then we cannot automatically correct and check complains.
wrong_date <- okdata
# Wrong format - year is too short
wrong_date[c(3,4), "retdate"] <- "141024"
test <- check_format(wrong_date)
# Wrong format again - month must be a numeric character
wrong_date[c(3,4), "retdate"] <- "October14"
test <- check_format(wrong_date)
Missing data is not allowed and check complains. It is not possible to correct for missing data.
wrong_date <- okdata
# Missing data
wrong_date[c(3,4), "retdate"] <- as.character(NA)
test <- check_format(wrong_date)
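The YYYYMMDD rule can be screened in base R before calling the check. The helper is_valid_date() below is a hypothetical sketch; it additionally rejects impossible calendar dates, which is an assumption of this example and may be stricter than what check_format() tests.

```r
# Hypothetical helper: TRUE where an entry is 8 digits and a real calendar date.
is_valid_date <- function(x) {
  x <- as.character(x)
  has_shape <- !is.na(x) & grepl("^[0-9]{8}$", x)
  parsed <- as.Date(x, format = "%Y%m%d")
  has_shape & !is.na(parsed)
}

dates <- c("20141024", "141024", "October14", NA, "20141345")
date_ok <- is_valid_date(dates)
date_ok
```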
The time columns deptime and rettime must be character strings of 4 numeric characters with the format HHMM, e.g. "0615". An additional : is allowed to separate the HH and MM, e.g. "06:15".
Note that the times use the 24 hour clock.
Here the data is numeric when it should be a character string. It is possible to automatically correct for this.
wrong_time <- okdata
# Needs to be character string, not numeric even if format is OK
wrong_time[, "rettime"] <- as.numeric(wrong_time[, "rettime"])
test <- check_format(wrong_time)
# We can correct by forcing to character
test <- check_format(wrong_time, correct=TRUE)
If only 2 characters are provided and automatic correction is TRUE, the characters are assumed to be hours (HH) and minutes of "00" are appended to the string, e.g. "16" becomes "1600".
wrong_time <- okdata
wrong_time[c(3,4), "rettime"] <- "16"
test <- check_format(wrong_time)
test <- check_format(wrong_time, correct=TRUE)
This automatic correction only works if the 2 characters are numeric.
wrong_time <- okdata
wrong_time[c(3,4), "rettime"] <- "TT"
test <- check_format(wrong_time, correct=TRUE)
Two or four characters are needed. For example, "0130" is OK whereas "130" is not.
wrong_time <- okdata
wrong_time[c(3,4), "rettime"] <- "130"
test <- check_format(wrong_time)
A : separator is acceptable (HH:MM) but is removed from the returned data.
wrong_time <- okdata
wrong_time[c(3,4), "rettime"] <- "16:15"
test <- check_format(wrong_time)
test
NAs and missing values are not acceptable and cannot be automatically corrected.
wrong_time <- okdata
wrong_time[c(3,4), "rettime"] <- as.character(NA)
test <- check_format(wrong_time)
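The time rules described in this section (optional colon, two digits read as hours, 24 hour clock) can be mimicked in base R to tidy a column before checking it. The function normalise_time() is a hypothetical sketch of those rules, not the code used inside fecR; returning NA for entries that cannot be repaired is a choice made for this example.

```r
# Hypothetical helper: tidy time strings into HHMM, or NA if not repairable.
normalise_time <- function(x) {
  # Drop an optional colon separator ("16:15" -> "1615").
  x <- gsub(":", "", as.character(x), fixed = TRUE)
  # Two digits are read as hours, so append "00" minutes ("16" -> "1600").
  two_digit <- !is.na(x) & grepl("^[0-9]{2}$", x)
  x[two_digit] <- paste0(x[two_digit], "00")
  # Keep only valid 24 hour clock times: hours 00-23, minutes 00-59.
  valid <- !is.na(x) & grepl("^([01][0-9]|2[0-3])[0-5][0-9]$", x)
  x[!valid] <- NA_character_
  x
}

times <- c("0615", "16:15", "16", "130", "TT", NA)
times_clean <- normalise_time(times)
times_clean
```

Any NA in the result marks an entry that has to be fixed by hand in the source data.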
The gear code column must be a character string and the code must be found in the Master Data Register. The check is case sensitive, e.g. "otb" is not valid whereas "OTB" is.
Whitespace is not allowed. It can be automatically removed if wanted.
wrong_gear <- okdata
# Gear code is OK but whitespace
wrong_gear[1,"gear"] <- " OTB"
# Fails
test <- check_format(wrong_gear)
# Correct removes whitespace
test <- check_format(wrong_gear, correct=TRUE)
If the gear code is not found in the MDR list then no automatic correction is possible. Unknown gear codes are not allowed and not corrected for. Here the gear is unknown because it is lower case. All gear codes must be upper case.
wrong_gear[1,"gear"] <- "otb"
test <- check_format(wrong_gear)
The gear_mesh_size column must be an integer. It holds the mesh size in mm. Every mm is considered as a different gear, e.g. a gear with a mesh size of 80 is considered to be a different gear to that with a mesh size of 81. This means that gear meshes in the range 80-89 mm should all be given the same gear mesh size of 80.
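One way to encode size ranges is to round each mesh size down to the start of its range. The sketch below assumes 10 mm wide ranges, as in the 80-89 example; the appropriate ranges for your own data may differ.

```r
# Example mesh sizes in mm; 0 stands for a gear without a mesh.
mesh_mm <- c(80, 85, 89, 90, 0, 112)

# Integer division by 10 and multiplying back maps 80-89 to 80, 90-99 to 90, etc.
mesh_binned <- as.integer((mesh_mm %/% 10) * 10)
mesh_binned
```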
Entries that are not integers will make check complain. There is no option to autocorrect this.
wrong_ms <- okdata
# Text in the entry - must be integer
wrong_ms[4,"gear_mesh_size"] <- "80mm"
test <- check_format(wrong_ms)
# Start again from okdata so the column is numeric, not character
wrong_ms <- okdata
# Not an integer
wrong_ms[4,"gear_mesh_size"] <- 80.8
test <- check_format(wrong_ms)
If an entry is missing (e.g. it is NA) then check will complain. It is possible to automatically correct this in which case the missing entry has a mesh size of 0. The returned data will pass check but may not be what you want.
wrong_ms <- okdata
wrong_ms[4,"gear_mesh_size"] <- NA
test <- check_format(wrong_ms)
test <- check_format(wrong_ms, correct=TRUE)
test
The fishing_area column must be a character string that stores the DCF level 3 code (or DCF level 4 if in the Baltic). Whitespace is not allowed. However, it is possible to automatically correct for whitespace. Similarly, if points (.) are found at the beginning or end of an entry, it is possible to automatically correct for them. Note that the check is case sensitive and all the entries must be in upper case. It is possible to automatically correct for lower case.
wrong_fish_area <- okdata
# Point at end
wrong_fish_area[c(1,2),"fishing_area"] <- "27.4.A."
test <- check_format(wrong_fish_area)
test <- check_format(wrong_fish_area, correct=TRUE)
# Lowercase
wrong_fish_area[c(1,2),"fishing_area"] <- "27.4.a"
test <- check_format(wrong_fish_area)
test <- check_format(wrong_fish_area, correct=TRUE)
# White space
wrong_fish_area[c(1,2),"fishing_area"] <- "27.4.A "
test <- check_format(wrong_fish_area)
test <- check_format(wrong_fish_area, correct=TRUE)
Missing values are not allowed and cannot be automatically corrected.
wrong_fish_area <- okdata
wrong_fish_area[c(1,2),"fishing_area"] <- as.character(NA)
test <- check_format(wrong_fish_area)
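The three fishing_area corrections described in this section (whitespace, case, and leading or trailing points) are simple to apply yourself in base R if you prefer to clean the source data. The helper clean_area() is hypothetical and only approximates the behaviour described; it does not check that the result is a valid area code.

```r
# Hypothetical helper: tidy area codes without validating them.
clean_area <- function(x) {
  x <- gsub("[[:space:]]", "", x)   # remove all whitespace
  x <- toupper(x)                   # codes must be upper case
  gsub("^\\.+|\\.+$", "", x)        # drop points at the start or end
}

areas <- c("27.4.A.", "27.4.a", "27.4.A ", ".27.4.b")
areas_clean <- clean_area(areas)
areas_clean
```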
The economic_zone column must be a character string and must be one of "EU", "NOR" or "UNKNOWN". The check is case sensitive. If the entries do not match these strings then check complains.
It is not possible to automatically correct any errors.
wrong_econ <- okdata
wrong_econ[3,"economic_zone"] <- "USA"
test <- check_format(wrong_econ)
Missing values are not allowed and cannot be automatically corrected.
wrong_econ <- okdata
wrong_econ[3,"economic_zone"] <- ""
test <- check_format(wrong_econ)
The rectangle column must be a character string and each entry must be a valid ICES rectangle. If non alpha-numeric characters are found in the data it is possible to automatically correct for them by removing them. Similarly, the check is case sensitive but it is possible to automatically correct the case.
wrong_rect <- okdata
# with extra punctuation
wrong_rect[3,"rectangle"] <- "39F0'"
test <- check_format(wrong_rect)
test <- check_format(wrong_rect, correct=TRUE)
test
Missing values are not allowed and cannot be automatically corrected.
wrong_rect <- okdata
wrong_rect[3,"rectangle"] <- ""
test <- check_format(wrong_rect)
Each trip is defined by the vessel identifier and the departure and return dates and times, and has a unique trip identifier. Entries that share a trip ID cannot have different vessel IDs, dates or times.
For example, here we change the departure time of an entry for trip2 so that the trip has different departure times.
# one trip, two days, different departure time, same identifier
wrong_unique <- okdata
wrong_unique[4,"deptime"] <- "0731"
test <- check_format(wrong_unique)
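When this check fails on a large data set, it helps to find out which trips are inconsistent. The following base R sketch lists trip IDs that appear with more than one combination of vessel, dates and times. The small data frame is invented for illustration and the approach is an assumption, not fecR's internal method.

```r
# Invented example: the last row of trip2 has a different departure time.
trips <- data.frame(
  eunr_id = "my_boat",
  trip_id = c("trip1", "trip1", "trip2", "trip2"),
  depdate = c("20140718", "20140718", "20141023", "20141023"),
  deptime = c("0615", "0615", "0730", "0731"),
  retdate = c("20140719", "20140719", "20141024", "20141024"),
  rettime = c("1830", "1830", "1615", "1615"),
  stringsAsFactors = FALSE
)

# Columns that must be constant within a trip.
trip_cols <- c("eunr_id", "trip_id", "depdate", "deptime", "retdate", "rettime")

# Each consistent trip contributes exactly one unique combination.
combos <- unique(trips[, trip_cols])
n_per_trip <- table(combos$trip_id)
bad_trips <- names(n_per_trip)[as.vector(n_per_trip) > 1]
bad_trips
```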
There should be no duplicate entries in the data set. If duplicate entries are detected, it is possible to automatically correct by removing them.
# Duplicates
wrong_dup <- okdata
# Add a duplicate row
wrong_dup <- rbind(wrong_dup, wrong_dup[1,])
test <- check_format(wrong_dup)
test <- check_format(wrong_dup, correct=TRUE)
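If you would rather inspect duplicates before they are removed, base R's duplicated() shows which rows repeat an earlier one. This is a minimal sketch on an invented data frame with only a few of the required columns.

```r
# Invented example with a few columns only.
ops <- data.frame(
  trip_id  = c("trip1", "trip1", "trip2"),
  fishdate = c("20140718", "20140719", "20141023"),
  gear     = c("OTB", "OTB", "GN"),
  stringsAsFactors = FALSE
)
# Add a copy of the first row.
ops <- rbind(ops, ops[1, ])

# TRUE for rows that repeat an earlier row (row names are ignored).
dup_rows <- duplicated(ops)
which(dup_rows)

# Keep the first occurrence of each row.
deduped <- ops[!dup_rows, ]
nrow(deduped)
```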