knitr::opts_chunk$set(echo = TRUE)
Proper data management is critically important and taken very seriously within the HEEL. Most funding agencies require that data be curated in easily accessible formats with appropriate metadata, archived securely and in perpetuity, and widely available after the research has been completed. It is therefore absolutely critical that all data generated as part of HEEL activities be properly curated and archived continually throughout the research process. The protocol below describes general requirements for data management as well as specific actions for some types of data. All HEEL personnel are required to follow this protocol when working on HEEL projects.
All data must be 1) adequately described via metadata, 2) managed for data quality, 3) backed up daily in a secure manner, and 4) archived in an easily reproducible format.
All research data must be accompanied by a thorough description of those data from the beginning of the work. Metadata describe a dataset so that it can be understood, reused, and integrated with other datasets. Information described in a metadata record includes where the data were collected, who is responsible for the dataset, why the dataset was created, and how the data are organized. Proper metadata include the following four components.
Workflow Capture: A workflow is the formal description of how the data have been processed to get to the current state, which includes a description of the researcher's method for experimental data. It conceptualizes the data inputs, transformations and analytical steps to achieve the final data output.
Data Dictionary: A data dictionary is a repository of information that defines and describes the data resource, with the goal of making it usable by someone unfamiliar with its collection. At a minimum, a data dictionary defines the terms and variables in a data set, including column headings, codes, etc. Data dictionaries can also include additional information such as units, measurement precision and accuracy, detection limits, NA values, etc.
Data Citation: A suggested citation for the data set, often similar in form to the citation of a journal article. It also includes references to any other data sets incorporated into the current data set.
Access Controls: Defines who “owns” the data and allowable uses for the data including any limits to sharing or copyright concerns.
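As one illustration of the Data Dictionary component, the sketch below builds a small dictionary as a data frame and saves it alongside the data. The data set, column names, definitions, and output file name are all hypothetical; this is one possible layout, not a required HEEL format.

```r
# Hypothetical example data set; column names and values are illustrative only.
stream_temp <- data.frame(
  site_id = c("A1", "A2", "B1"),
  sample_date = as.Date(c("2020-06-01", "2020-06-01", "2020-06-02")),
  temp_c = c(12.4, NA, 13.1),
  stringsAsFactors = FALSE
)

# One row per column in the data set, with a definition, units, and NA code.
data_dictionary <- data.frame(
  variable = names(stream_temp),
  class = unname(vapply(stream_temp, function(col) class(col)[1], character(1))),
  definition = c(
    "Unique site identifier",
    "Date the sample was collected",
    "Water temperature"
  ),
  units = c(NA, "YYYY-MM-DD", "degrees Celsius"),
  na_value = "NA",
  stringsAsFactors = FALSE
)

# Confirm every column is described before saving the dictionary with the data.
stopifnot(setequal(data_dictionary$variable, names(stream_temp)))
dictionary_file <- file.path(tempdir(), "stream_temp_dictionary.csv")
write.csv(data_dictionary, dictionary_file, row.names = FALSE)
```

Keeping the dictionary as a plain CSV next to the data means it can be read without R and archived with the rest of the metadata.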
All HEEL researchers must take care that protocols and methods are employed to ensure that data are properly collected, handled, processed, used, and maintained, and that this process is documented in the metadata. HEEL methodological protocols are designed to prevent errors or defects (quality assurance, QA). However, researchers should strive to improve these methods (they are in no way perfect) and continually check that the data they are generating match expectations (e.g., by using quality control, QC, standards in analytical methods). The importance of QA/QC carries through to data entry and management; data should continually be checked for errors and discrepancies. In short, check your data and seek ways to make them better.
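Routine checks of this kind can be scripted. The following is a minimal sketch, assuming a data frame with a numeric measurement column; the function name `check_data`, the column names, and the plausible range of 0 to 35 are assumptions chosen only for illustration.

```r
# Hypothetical measurements; the column names and the plausible range
# (0 to 35 degrees C) are assumptions chosen only for illustration.
obs <- data.frame(
  site_id = c("A1", "A2", "A2", "B1"),
  temp_c = c(12.4, NA, 99.0, 13.1),
  stringsAsFactors = FALSE
)

check_data <- function(df, column, lower, upper) {
  values <- df[[column]]
  list(
    # Number of missing values in the checked column.
    n_missing = sum(is.na(values)),
    # Rows whose value falls outside the expected range (NAs excluded).
    out_of_range = which(!is.na(values) & (values < lower | values > upper)),
    # Rows that exactly duplicate an earlier row.
    duplicated_rows = which(duplicated(df))
  )
}

qc <- check_data(obs, "temp_c", lower = 0, upper = 35)
str(qc)
```

Running a check like this each time data are entered or updated makes discrepancies visible early, and the script itself can be kept with the metadata as a record of what was checked.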
Bottom line: It is unacceptable to lose project data due to computer hardware failure/loss. Thus, all HEEL researchers are required to have a daily (or better) backup system. This system must: a) be in two distinct locations, and b) include a findable “key” that describes how someone other than yourself can access the data. The Computing Resources section below describes available options, including the HEEL NAS system, but ultimately it is up to the researcher to decide what works best for them and to document where data are stored.
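The findable "key" can be as simple as a short text file. The sketch below writes one from R; the file name and every field in it are placeholders rather than a prescribed HEEL template.

```r
# All values below are placeholders; replace them with your own details.
backup_key <- c(
  "Researcher: <your name>",
  "Project: <project name>",
  "Primary copy: <computer name and folder>",
  "Backup copy: <service or drive, and folder>",
  "How to access: <who to contact, where credentials are kept>",
  paste("Last updated:", format(Sys.Date(), "%d %b %Y"))
)

# Written to a temporary folder here; in practice, save it somewhere a
# colleague can find it.
key_file <- file.path(tempdir(), "BACKUP_KEY.txt")
writeLines(backup_key, key_file)
readLines(key_file)
```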
In contrast to data backup, which is meant to prevent catastrophic loss of work, the goal of data archiving is to make your research easily understandable and reproducible in the future. It is therefore incumbent upon the researcher that, by the end of a project, care and effort are given to providing a highly organized and traceable accounting of the research that is archived in perpetuity. At a minimum, this archive should include: raw and fully processed data, complete metadata, all computer code, and any research products (manuscripts, published articles, figures, etc.).
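Before closing out a project, it can help to verify that the archive contains each of these minimum components. The sketch below assumes one folder per component; the folder names and the function `check_archive` are hypothetical, so adapt them to whatever layout your project documents.

```r
# Folder names are hypothetical; use whatever layout your project documents.
required <- c("data_raw", "data_processed", "metadata", "code", "products")

check_archive <- function(archive_dir, required_dirs) {
  paths <- file.path(archive_dir, required_dirs)
  present <- dir.exists(paths)
  # A component counts as non-empty only if its folder holds at least one file.
  non_empty <- vapply(
    paths,
    function(d) length(list.files(d, recursive = TRUE)) > 0,
    logical(1)
  )
  data.frame(component = required_dirs, present = present,
             non_empty = unname(non_empty), stringsAsFactors = FALSE)
}

# Demonstration on a throwaway archive that is missing two components.
demo <- file.path(tempdir(), "demo_archive")
for (d in required[1:3]) {
  dir.create(file.path(demo, d), recursive = TRUE, showWarnings = FALSE)
}
writeLines("site_id,temp_c", file.path(demo, "data_raw", "raw.csv"))
archive_report <- check_archive(demo, required)
print(archive_report)
```

The report flags both missing folders and folders that exist but are empty, which are the two most likely gaps at the end of a project.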
The HEEL has bought into a network attached storage (NAS) system operated by the Fisheries Acoustics Lab. We have a total of 8 TB of storage space and the ability to do automated backups of personal computers. This resource is available to anyone in the lab, but you are also free to use a different service for daily backups. Time Machine, Dropbox, CrashPlan, Carbon Copy Cloner, and Carbonite are all viable alternatives. However, if you use a different system, please leave a text file on the HEEL NAS system that describes how someone could retrieve your HEEL data if necessary.
Apple macOS
smb://acoustics.washington.edu/HEEL
into the Server Address box. Hit enter. A new Finder window should open showing the folders within the HEEL NAS. This is where data archives should go and where you can use general data storage (e.g., if you need to transfer lots of data). Daily backups of machines do not go here.
MS Windoze OS
\\acoustics.washington.edu\HEEL
into the Run box. Hit enter. A new Explorer window should open showing the folders within the HEEL NAS. This is where data archives should go and where you can use general data storage (e.g., if you need to transfer lots of data). Daily backups of machines do not go here.
Apple macOS
home\CloudStation\Backup
. Within this folder, create a new folder that identifies the computer. It should include at least the computer name, your name, and location. This is where the backup should go.
MS Windoze OS
USGS Data Management standards and guidelines.
# Table of rendering details: date, R version, and HEEL package version
x <- matrix(
  c(
    "File rendered on:", format(Sys.time(), "%d %b %Y"),
    "R version:", as.character(getRversion()),
    "HEEL package version:", as.character(packageVersion("HEEL"))
  ),
  ncol = 2, byrow = TRUE
)
print(knitr::kable(x, row.names = FALSE))