R/ebv_create_taxonomy.R

#'Create an EBV netCDF with taxonomy
#'
#'@description Create the core structure of the EBV netCDF based on the json
#'  from the \href{https://portal.geobon.org}{EBV Data Portal}. Additionally,
#'  you can add the hierarchy of the taxonomy, which is not covered by the
#'  [ebvcube::ebv_create()] function. Use [ebvcube::ebv_create()] if your
#'  dataset holds no taxonomic information. Data will be added afterwards
#'  using [ebvcube::ebv_add_data()].
#'
#'@param jsonpath Character. Path to the json file downloaded from the EBV Data
#'  Portal. Login to the page and click on 'Uploads' and 'New Upload' to start
#'  the process.
#'@param outputpath Character. Set path where the netCDF file should be created.
#'@param taxonomy Character. Path to the csv table holding the taxonomy. By
#'  default a comma-separated file is assumed; otherwise change the `sep`
#'  argument accordingly. The csv needs to have the following structure: the
#'  header
#'  displays the names of the different taxonomy levels ordered from the highest
#'  level to the lowest, e.g. "order", "family", "genus", "scientificName". We
#'  strongly encourage the usage of the \href{https://dwc.tdwg.org/terms/#taxon}{Darwin
#'  Core terminology} for the taxonomy levels. The last column (if `taxonomy_key`=FALSE)
#'  is equivalent to the `entity` argument in the [ebvcube::ebv_create()]
#'  function. Each row of the csv corresponds to a unique entity. In case the
#'  `taxonomy_key` argument (see below) is set to TRUE, this table gets an
#'  additional last column which holds the taxonomy key per entity - in this
#'  case the second last column contains the entity names, e.g. the following
#'  column order: "order", "family", "genus", "scientificName", "taxonomy_key".
#'  The last column (here named "taxonomy_key") should have the exact name of
#'  the taxonomy key from the authority of the taxonomic backbone. It will be
#'  added as an additional attribute to the netCDF. For example, if your
#'  taxonomy is based on the \href{https://www.gbif.org/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c}{GBIF
#'  Backbone Taxonomy} the column name could be "usageKey". For an
#'  example CSV including the `taxonomy_key` download the 'Entities as CSV'
#'  from the \href{https://portal.geobon.org/ebv-detail?id=82}{Occurrence
#'  Metrics for the Birds Directive Annex I Species in EU27 dataset} of the
#'  portal. For an example without `taxonomy_key`, download
#'  \href{https://portal.geobon.org/ebv-detail?id=84}{Species habitat
#'  suitability of European terrestrial vertebrates for contemporary climate
#'  and land use}.
#'  To create your netCDF follow the same structure. The column names may be
#'  different depending on the taxonomy used.
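#'  As a minimal sketch (comma-separated, without a `taxonomy_key` column,
#'  using two bird species purely as an illustration), such a csv could have
#'  the header line \code{order,family,genus,scientificName} followed by rows
#'  like \code{Passeriformes,Paridae,Parus,Parus major} and
#'  \code{Passeriformes,Fringillidae,Fringilla,Fringilla coelebs}.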
#'@param taxonomy_key Logical. Default: FALSE. Set to TRUE if the last column in
#'  your taxonomy csv file defines the taxonomy_key for each entity.
#'@param epsg Integer. Default: 4326 (WGS84). Defines the coordinate reference
#'  system via the corresponding epsg code.
#'@param extent Numeric. Default: c(-180,180,-90,90). Defines the extent of the
#'  data: c(xmin, xmax, ymin, ymax).
#'@param resolution Numeric. Vector of two numeric values defining the
#'  longitudinal and latitudinal resolution of the pixels: c(lon, lat).
#'@param timesteps Character. Vector of the timesteps in the dataset. Default:
#'  NULL - in this case the timesteps will be calculated from the start date,
#'  end date and temporal resolution given in the metadata file (json).
#'  Otherwise, the dates must be given in ISO format 'YYYY-MM-DD' or shortened
#'  to 'YYYY' in the case of yearly timesteps.
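#'  For example, \code{timesteps = c('2000', '2005', '2010')} defines three
#'  yearly timesteps, while \code{timesteps = c('2000-01-15', '2000-07-15')}
#'  defines two explicit dates.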
#'@param fillvalue Numeric. Value of the missing data (NoData value) in the
#'  array. Has to be a single numeric value or NA.
#'@param prec Character. Default: 'double'. Precision of the data set. Valid
#'  options: 'short', 'integer', 'float', 'double', 'char', 'byte'.
#'@param sep Character. Default: ','. If the delimiter of the csv holding the
#'  taxonomy differs from the default, indicate it here.
#'@param force_4D Logical. Default: TRUE. If TRUE, one 4D cube (lon, lat, time,
#'  entity) is created per metric. If FALSE, one 3D cube (lon, lat, time) is
#'  created per entity (per metric). The latter yields a larger number of cubes
#'  and does not bundle all information per metric. In the future the standard
#'  will be restricted to the 4D version. Recommendation: go with the 4D cubes!
#'@param overwrite Logical. Default: FALSE. Set to TRUE to overwrite the output
#'  file defined by 'outputpath'.
#'@param verbose Logical. Default: TRUE. Turn off additional prints by setting
#'  it to FALSE.
#'
#'@note To check out the results take a look at your netCDF file with
#'  \href{https://www.giss.nasa.gov/tools/panoply/}{Panoply}, provided by
#'  NASA.
#'
#'@note You can check the taxonomy info with [ebvcube::ebv_properties()] in the
#'  slot 'general' under the names 'taxonomy' and 'taxonomy_key'.
#'
#'@return Creates the netCDF file at the 'outputpath' location including the
#'  taxonomy information.
#'@export
#'
#'@importFrom utils head
#'
#' @examples
#' #set path to JSON file
#' json <- system.file(file.path("extdata/testdata","5.json"), package="ebvcube")
#' #set output path of the new EBV netCDF
#' out <-  tempfile(fileext='.nc')
#' #set path to the csv holding the taxonomy names
#' taxonomy <- file.path(system.file(package='ebvcube'),"extdata/testdata","id5_entities.csv")
#'
#' #create new EBV netCDF with taxonomy
#' \dontrun{
#' ebv_create_taxonomy(jsonpath = json, outputpath = out, taxonomy = taxonomy,
#'            fillvalue = -127, resolution = c(0.25, 0.25), verbose = FALSE)
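#'
#' #optionally inspect the taxonomy info afterwards (a sketch; see the note
#' #on ebv_properties() above)
#' prop <- ebv_properties(out)
#' slot(prop, 'general')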
#' #remove file
#' file.remove(out)
#' }
ebv_create_taxonomy <- function(jsonpath, outputpath, taxonomy, taxonomy_key=FALSE,
                                epsg = 4326, extent = c(-180, 180, -90, 90), resolution = c(1, 1),
                                timesteps = NULL, fillvalue, prec = 'double',
                                sep=',', force_4D = TRUE, overwrite = FALSE,
                                verbose = TRUE){

  # start initial tests ----
  # ensure file and all datahandles are closed on exit
  withr::defer(
    if(exists('hdf')){
      if(rhdf5::H5Iis_valid(hdf)==TRUE){rhdf5::H5Fclose(hdf)}
    }
  )
  gids <- c('mgid', 'sgid', 'mid')
  withr::defer(
    for (id in gids){
      if(exists(id)){
        id <- eval(parse(text = id))
        if(rhdf5::H5Iis_valid(id)==TRUE){rhdf5::H5Gclose(id)}
      }
    }
  )
  dids <- c('crs.id', 'lat.id', 'lon.id', 'time.id', 'did', 'entity.id',
            'ent_list_did', 'ent_level_did', 'taxonomy_key_did')
  withr::defer(
    for (id in dids){
      if(exists(id)){
        id <- eval(parse(text = id))
        if(rhdf5::H5Iis_valid(id)==TRUE){rhdf5::H5Dclose(id)}
      }
    }
  )
  #silently try to close any open ncdf4 handles
  withr::defer(
    if(exists('nc')){
      tryCatch(l <- utils::capture.output(ncdf4::nc_close(nc)),
               error = function(e) NULL)
    }
  )
  withr::defer(
    if(exists('nc_test')){
      tryCatch(l <- utils::capture.output(ncdf4::nc_close(nc_test)),
               error = function(e) NULL)
    }
  )

  #ensure that all tempfiles are deleted on exit
  withr::defer(
    if(exists('temp')){
      if(file.exists(temp)){
        file.remove(temp)
      }
    }
  )
  withr::defer(
    if(exists('csv_txt')){
      rm(csv_txt)
    }
  )

  #set UTF8 encoding
  withr::local_options(list(encoding = "UTF-8"))

  #are all arguments given?
  if(missing(jsonpath)){
    stop('Jsonpath argument is missing.')
  }
  if(missing(outputpath)){
    stop('Outputpath argument is missing.')
  }
  if(missing(taxonomy)){
    stop('Taxonomy argument is missing.')
  }
  if(missing(fillvalue)){
    stop('Fillvalue argument is missing.')
  }

  #check verbose
  if(checkmate::checkLogical(verbose, len=1, any.missing=FALSE) != TRUE){
    stop('Verbose must be of type logical.')
  }

  #check logical arguments
  if(checkmate::checkLogical(overwrite, len=1, any.missing=FALSE) != TRUE){
    stop('overwrite must be of type logical.')
  }
  if(checkmate::checkLogical(force_4D, len=1, any.missing=FALSE) != TRUE){
    stop('force_4D must be of type logical.')
  }

  #check if json exists
  if (checkmate::checkCharacter(jsonpath) != TRUE){
    stop('Filepath (JSON) must be of type character.')
  }
  if (checkmate::checkFileExists(jsonpath) != TRUE){
    stop(paste0('Json file does not exist.\n', jsonpath))
  }
  if (!(endsWith(jsonpath, '.json') || endsWith(jsonpath, '.js'))){
    stop(paste0('Json file ending is wrong. File cannot be processed.'))
  }

  #check outputpath
  if (checkmate::checkCharacter(outputpath) != TRUE){
    stop('Outputpath must be of type character.')
  }
  if(checkmate::checkDirectoryExists(dirname(outputpath)) != TRUE){
    stop(paste0('Output directory does not exist.\n', dirname(outputpath)))
  }
  if(!endsWith(outputpath, '.nc')){
    stop('Outputpath needs to end with *.nc ')
  }
  #check if output file exists if overwrite is disabled
  if(!overwrite){
    if(checkmate::checkPathForOutput(outputpath) != TRUE){
      stop('Output file already exists. Change name or enable overwrite.')
    }
  }

  #check if epsg is valid
  crs_wkt <- ebv_i_eval_epsg(epsg)

  #check extent
  if (checkmate::checkNumeric(extent, len = 4) != TRUE){
    stop('extent needs to be a numeric vector of 4 values.')
  }

  #check taxonomy
  if (checkmate::checkCharacter(taxonomy) != TRUE){
    stop('Taxonomy must be of type character.')
  }
  if(checkmate::checkCharacter(taxonomy, len=1) != TRUE){
    #length longer than 1 -> return error
    stop('Taxonomy must be of length 1.')
  }else{
    # check csv
    if (checkmate::checkFileExists(taxonomy) != TRUE){
      stop(paste0('Taxonomy csv file does not exist.\n', taxonomy))
    }
    if (!endsWith(taxonomy, '.csv')){
      stop(paste0('Taxonomy file ending is wrong. File cannot be processed. Must be *.csv.'))
    }
    #read csv ----
    # check if data inside
    tryCatch({
      csv_txt <- suppressWarnings(utils::read.csv(taxonomy, sep=sep, header=TRUE, fileEncoding="UTF-8", row.names = NULL))
    },
    error=function(e){
      if(stringr::str_detect(as.character(e), 'no lines available')){
        stop('Empty csv table given for taxonomy')
      } else {
        stop(paste0('Could not read csv (taxonomy):\n', as.character(e)))
      }
    })
  }

  #check prec
  if (! prec %in% c('short', 'integer', 'float', 'double', 'char', 'byte')){
    stop('prec value not valid!')
  }

  #check fillvalue
  if(checkmate::checkNumber(fillvalue) != TRUE && !(length(fillvalue)==1 && is.na(fillvalue))){
    stop('The fillvalue needs to be a single numeric value or NA.')
  }


  #check resolution
  if (checkmate::checkNumeric(resolution, len = 2) != TRUE){
    stop('resolution needs to be a numeric vector of 2 values.')
  }

  #get temp directory
  temp_path <- tempdir()

  #read json ----
  file <- jsonlite::fromJSON(txt=jsonpath)
  #json root
  json <- file$data

  #check timesteps----
  t_res <- json$time_coverage$time_coverage_resolution

  if(!is.null(timesteps) && t_res != 'Paleo'){
    if (checkmate::checkCharacter(timesteps) != TRUE){
      stop('timesteps needs to be a vector of character values.')
    }else {
      for(ts in timesteps){
        #check ISO format
        if(!(grepl('^\\d{4}-\\d{2}-\\d{2}$', ts) || grepl('^\\d{4}$', ts))){
          stop(paste0('Your timestep ', ts, ' is not following the indicated ISO format. Check help page for more information.'))
        }

      }
    }
  }

  # end initial tests ----

  #overwrite --> delete file ---
  if (file.exists(outputpath) && overwrite==TRUE){
    tryCatch(file.remove(outputpath),
             warning = function(w){
               temp <- stringr::str_remove(as.character(w), '\\\\')
               if(stringr::str_detect(temp, 'cannot remove file')){
                 stop('Outputpath file already exists and you enabled overwrite, but file cannot be overwritten. Most likely the file is opened in another application.')
               }
             })

  }

  # read taxonomy info ----
  dim_csv <- dim(csv_txt)

  #get entities and maybe taxonomy_key-list
  if(taxonomy_key){
    key_name <- names(csv_txt)[dim_csv[2]]
    entities <- csv_txt[, (dim_csv[2]-1)]
    taxonomy_key_list <- csv_txt[, dim_csv[2]]
    csv_txt <- csv_txt[, -dim_csv[2]]
    taxon_list <- names(csv_txt)
  }else{
    taxon_list <- names(csv_txt)
    entities <- csv_txt[, dim_csv[2]]
    taxonomy_key_list <- NA
  }

  #double-check entities----
  if(any(is.na(entities))){
    stop('At least one entity name is NA. Please check that the last (taxonomy_key=FALSE) or the second-to-last (taxonomy_key=TRUE) column of your CSV holds the entity names and contains no NA values.')
  }

  #check taxonomy_key list and entities
  if(taxonomy_key && (length(taxonomy_key_list)!=length(entities))){
    stop(paste0('The amount of your taxonomy keys differs from the number of entities:\n',
                'length taxonomy keys: ', length(taxonomy_key_list),
                '\nlength entities: ', length(entities)))
  }

  # get basic hierarchy info ----
  metrics_no <- length(json$ebv_metric)
  entities_no <- length(entities)
  scenarios_no <- length(json$ebv_scenario)-3
  taxon_no <- length(taxon_list)
  if (scenarios_no==1){
    if(ebv_i_empty(file$data$ebv_scenario[[1]]) || file$data$ebv_scenario[[1]]=='N/A')
      scenarios_no <- 0
  } else if(scenarios_no<0){
    scenarios_no <- 0
  }

  # get crs information ----
  # :GeoTransform
  res <- resolution
  geo_trans <- paste0(extent[1], " ", res[1], " 0.0 ", extent[4], " 0.0 -", res[2])
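  #GDAL-style GeoTransform string 'xmin xres 0.0 ymax 0.0 -yres',
  #e.g. "-180 1 0.0 90 0.0 -1" for the default global 1 degree grid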

  # :spatial_ref
  #remove additional whitespaces
  crs_temp <- stringr::str_replace_all(crs_wkt, '\n', ' ')
  crs_temp <- stringr::str_replace_all(crs_temp, '         ', ' ')
  crs_ref <- stringr::str_replace_all(crs_temp, '     ', ' ')

  # unit
  if(stringr::str_detect(crs_ref, 'PROJCRS')){
    crs_unit <- 'meter'
  } else{
    crs_unit <- 'degree'
  }

  # get dimensions ----
  # time ----
  t_start <- json$time_coverage$time_coverage_start
  t_end <- json$time_coverage$time_coverage_end

  #get ISO timesteps for irregular and paleo -> shiny app
  if(t_res=='Irregular'){
    if(is.null(timesteps)){
      timesteps <- json$timesteps[[1]]
    }
    if(!is.null(timesteps)){
      if(timesteps[1]=='N/A'){
        timesteps <- NULL
      }
    }
  }

  if(t_res=='Paleo'){
    if(is.null(timesteps)){
      timesteps <- json$timesteps[[1]]
    }
    if(!is.null(timesteps)){
      if(timesteps[1]=='N/A'){
        timesteps <- NULL
      }
    }
    if(!is.null(timesteps)){
      timesteps <- as.numeric(timesteps)
      timesteps <- sort(timesteps, decreasing = TRUE)
    }
  }

  #create integer timesteps
  add <- 40177
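  #40177 = number of days between 1860-01-01 (the 'days since 1860-01-01'
  #time origin used for the time dimension below) and 1970-01-01 (the origin
  #of R's numeric dates), i.e. as.numeric(as.Date('1860-01-01')) == -40177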

  #calculate timesteps
  if(is.null(timesteps) && t_res!='Paleo'){
    if(t_res=="P0000-00-00"){
      #one timestep only
      #check
      if(t_start!=t_end && verbose){
        warning('Your dataset has one timestep only based on the temporal resolution attribute but your given start and end date are different. Note: the end date will be applied to the dataset.')
      }
      date <- as.numeric(as.Date(t_end))
      timesteps <- date+add
    }else if(grepl('^P\\d{4}-?\\d{0,2}-?\\d{0,2}$', t_res)){
      #process ISO 8601 duration PYYYY-MM-DD or shortened PYYYY
      y <- stringr::str_split(stringr::str_remove(t_res, 'P')[[1]], '-')[[1]][1]
      m <- stringr::str_split(stringr::str_remove(t_res, 'P')[[1]], '-')[[1]][2]
      d <- stringr::str_split(stringr::str_remove(t_res, 'P')[[1]], '-')[[1]][3]
      if(as.numeric(y)!= 0){
        sequence <- seq.Date(from = as.Date(t_start),
                             to = as.Date(t_end),
                             by = paste0(y, ' year'))
      } else if(as.numeric(m)!= 0){
        sequence <- seq.Date(from = as.Date(t_start),
                             to = as.Date(t_end),
                             by = paste0(m, ' month'))
      }else if(as.numeric(d)!= 0){
        sequence <- seq.Date(from = as.Date(t_start),
                             to = as.Date(t_end),
                             by = paste0(d, ' day'))
      }

      #timestep values
      timesteps <- c()
      for (s in sequence){
        date <- as.numeric(s)
        timestep <- date+add
        timesteps <- c(timesteps, timestep)
      }

    }else{
      #process old standard
      if (mapply(grepl, 'year', t_res, ignore.case=TRUE)){
        start <- as.integer(stringr::str_split(t_start, '-')[[1]][1])
        end <- as.integer(stringr::str_split(t_end, '-')[[1]][1])
        intervall <- as.numeric(regmatches(t_res, gregexpr("[[:digit:]]+", t_res))[[1]][1])
        if(is.na(intervall)){ #yearly
          intervall <- 1
        }
        sequence <- seq(start, end, intervall)
        timesteps <- c()
        for (s in sequence){
          date <- as.numeric(as.Date(paste0(as.character(s), '-01-01'), format = '%Y-%m-%d'))
          timestep <- date+add
          timesteps <- c(timesteps, timestep)
        }
      } else if (mapply(grepl, 'month', t_res, ignore.case=TRUE)){
        start <- as.Date(t_start)
        end   <- as.Date(t_end)
        sequence <- seq(from=start, to=end, by='month')
        timesteps <- c()
        for (s in sequence){
          date <- as.numeric(as.Date(s, origin=as.Date("1970-01-01")))
          timestep <- s+add
          timesteps <- c(timesteps, timestep)
        }
      }else if (mapply(grepl, 'day', t_res, ignore.case=TRUE)){
        start <- as.Date(t_start)
        end   <- as.Date(t_end)
        sequence <- seq(from=start, to=end, by='days')
        timesteps <- c()
        for (s in sequence){
          date <- as.numeric(as.Date(s, origin=as.Date("1970-01-01")))
          timestep <- s+add
          timesteps <- c(timesteps, timestep)
        }
      } else if (mapply(grepl, 'decad', t_res, ignore.case=TRUE)){
        start <- as.integer(stringr::str_split(t_start, '-')[[1]][1])
        end <- as.integer(stringr::str_split(t_end, '-')[[1]][1])
        intervall <- 10
        sequence <- seq(start, end, intervall)
        timesteps <- c()
        for (s in sequence){
          date <- as.numeric(as.Date(paste0(as.character(s), '-01-01'), format = '%Y-%m-%d'))
          timestep <- date+add
          timesteps <- c(timesteps, timestep)
        }
      } else if (mapply(grepl, 'annually', t_res, ignore.case=TRUE)){
        start <- as.integer(stringr::str_split(t_start, '-')[[1]][1])
        end <- as.integer(stringr::str_split(t_end, '-')[[1]][1])
        intervall <- 1
        sequence <- seq(start, end, intervall)
        timesteps <- c()
        for (s in sequence){
          date <- as.numeric(as.Date(paste0(as.character(s), '-01-01'), format = '%Y-%m-%d'))
          timestep <- date+add
          timesteps <- c(timesteps, timestep)
        }
      } else {
        warning('Could not detect delta time. Empty time dataset created')
        timesteps <- c(0)
      }
    }
  }else if (t_res != 'Paleo'){
    #take given timesteps and transform them into integer values
    temp_temp <- c()
    for (ts in timesteps){
      if(!grepl('^\\d{4}-\\d{2}-\\d{2}$', ts)){
        ts <- paste0(ts, '-01-01')
        date <- as.numeric(as.Date(ts))
        temp_temp <- c(temp_temp, date+add)
      }else{
        date <- as.numeric(as.Date(ts))
        temp_temp <- c(temp_temp, date+add)
      }
    }
    timesteps <- temp_temp
  }
  #if no timesteps are presented anywhere, throw error
  if(is.null(timesteps)){
    stop('There are no timesteps given. Define the argument "timesteps" to create your EBV netCDF.')
  }


  # lat ----
  res <- as.numeric(res)
  lat.min <- extent[3]
  lat.max <- extent[4]
  lat_data <- seq((lat.min+(res[2]/2)), (lat.max-(res[2]/2)), res[2])
  lat_data <- rev(lat_data)

  # lon ----
  lon.min <- extent[1]
  lon.max <- extent[2]
  lon_data <- seq((lon.min+(res[1]/2)), (lon.max-(res[1]/2)), res[1])
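  #the coordinate values are pixel centres, e.g. for the default extent and
  #res = c(1, 1): lon_data = -179.5, -178.5, ..., 179.5 and
  #lat_data = 89.5, 88.5, ..., -89.5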

  # entities ----
  if(!entities_no==0){
    #create entity list
    entity.list <- c()
    for (e in 1:(entities_no)){
      ent <- paste0('entity_', as.character(e))
      entity.list <- c(entity.list, ent)
    }
  } else {
    entity.list <- c('data')
  }

  # create dimensions ----
  lat_dim <- ncdf4::ncdim_def('lat', crs_unit, vals = lat_data)
  lon_dim <- ncdf4::ncdim_def('lon', crs_unit, vals = lon_data)
  if(t_res=='Paleo'){
    time_dim <- ncdf4::ncdim_def('time', 'kyrs B.P.', timesteps, unlim = TRUE)#HERE
  }else{
    time_dim <- ncdf4::ncdim_def('time', 'days since 1860-01-01 00:00:00.0', timesteps, unlim = TRUE)#HERE
  }
  entity_dim <- ncdf4::ncdim_def('entity', '', vals = 1:entities_no, create_dimvar=FALSE)
  taxon_dim <- ncdf4::ncdim_def('taxonlevel', '', vals = 1:taxon_no, create_dimvar=FALSE)

  # create list of vars 3D----
  if(force_4D==FALSE){
    var_list <- c()
    # 1. metric, no scenario
    if(scenarios_no==0){
      for (j in 1:(metrics_no)){
        #all entities for that metric
        for (ent in entity.list){
          var_list <- c(var_list, paste0('metric_', as.character(j), '/', ent))
        }
      }
      #2. scenario and metric (entities are not relevant)
    } else {
      for (i in 1:(scenarios_no)){
        for (j in 1:(metrics_no)){
          #add entities
          for (ent in entity.list){
            var_list <- c(var_list, paste0('scenario_', as.character(i), '/metric_', as.character(j), '/', ent))
          }
        }
      }
    }
  } else{
    # create list of vars 4D----
    var_list <- c()
    # 1. metric, no scenario
    if(scenarios_no==0){
      for (j in 1:(metrics_no)){
        #add ebv_cube
        var_list <- c(var_list, paste0('metric_', as.character(j), '/ebv_cube'))
      }
      #2. scenario and metric (entities are not relevant)
    } else {
      for (i in 1:(scenarios_no)){
        #create scenario group
        ending.s <- as.character(i)
        for (j in 1:(metrics_no)){
          #create metric group
          ending.m <- as.character(j)
          #add ebv_cube
          var_list <- c(var_list, paste0('scenario_', ending.s, '/metric_', ending.m, '/ebv_cube'))
        }
      }
    }
  }

  #get units of metric ----
  units <- c()
  for (j in 1:(metrics_no)){
    #metric units list
    units <- c(units, eval(parse(text=paste0('json$ebv_metric$ebv_metric_', j, '$`:units`'))))
  }

  var_list_nc <- list()
  enum <- 1

  #check shuffle
  if(prec=='integer' || prec=='short'){
    shuffle <- TRUE
  } else{
    shuffle <- FALSE
  }

  #define chunksize----
  #create one 3D var to detect default chunksize
  #aim: do not chunk along entities but time!
  if(force_4D){
    #create temporary 3D file
    temp <- file.path(temp_path, 'ebv_chunksize_3d_test.nc')
    test_def <- ncdf4::ncvar_def(name = 'test_var', units = 'some units',
                                 dim= list(lon_dim, lat_dim, time_dim),
                                 compression=5, prec=prec,
                                 verbose=FALSE, shuffle=shuffle)
    nc_test <- ncdf4::nc_create(filename = temp,
                                vars = test_def,
                                force_v4 = TRUE,
                                verbose = FALSE)
    ncdf4::nc_close(nc_test)
    #read out chunksize definition
    nc_test <- ncdf4::nc_open(temp)
    chunksizes_old <- nc_test$var$test_var$chunksizes
    ncdf4::nc_close(nc_test)
    #define chunksize
    chunksizes_new <- c(chunksizes_old, 1)
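    #keep ncdf4's default lon/lat/time chunking and use a chunk length of 1
    #along the entity dimension, so a single entity can be written or read
    #without touching the chunks of the other entities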
    #remove temp file
    if(file.exists(temp)){
      file.remove(temp)
    }
  }

  # create all vars 3D ----
  if(force_4D==FALSE){
    if (!is.null(fillvalue)){
      for (var in var_list){
        metric.str <- stringr::str_split(var, '/')[[1]][stringr::str_detect(stringr::str_split(var, '/')[[1]], 'metric')]
        metric.digit <- as.numeric(regmatches(metric.str, gregexpr("[[:digit:].]+", metric.str))[[1]])
        name <- paste0('var', enum)
        assign(name, ncdf4::ncvar_def(name = var, units = units[metric.digit],
                                      dim= list(lon_dim, lat_dim, time_dim),
                                      missval=fillvalue, compression=5,
                                      prec=prec, verbose=verbose, shuffle=shuffle
        ))
        var_list_nc[[enum]] <- eval(parse(text=name))
        enum <- enum +1
      }
    } else {
      for (var in var_list){
        metric.str <- stringr::str_split(var, '/')[[1]][stringr::str_detect(stringr::str_split(var, '/')[[1]], 'metric')]
        metric.digit <- as.numeric(regmatches(metric.str, gregexpr("[[:digit:].]+", metric.str))[[1]])
        name <- paste0('var', enum)
        assign(name, ncdf4::ncvar_def(name = var, units = units[metric.digit],
                                      dim= list(lon_dim, lat_dim, time_dim),
                                      compression=5, prec=prec,
                                      verbose=verbose, shuffle=shuffle
        ))
        var_list_nc[[enum]] <- eval(parse(text=name))
        enum <- enum +1
      }
    }
  }else{
    # create all vars 4D ----
    if (!is.null(fillvalue)){
      for (var in var_list){
        metric.str <- stringr::str_split(var, '/')[[1]][stringr::str_detect(stringr::str_split(var, '/')[[1]], 'metric')]
        metric.digit <- as.numeric(regmatches(metric.str, gregexpr("[[:digit:].]+", metric.str))[[1]])
        name <- paste0('var', enum)
        assign(name, ncdf4::ncvar_def(name = var, units = as.character(units[metric.digit]),
                                      dim= list(lon_dim, lat_dim, time_dim, entity_dim),
                                      missval=fillvalue, compression=5, prec=prec,
                                      verbose=verbose, shuffle=shuffle,
                                      chunksizes=chunksizes_new))
        var_list_nc[[enum]] <- eval(parse(text=name))
        enum <- enum +1
      }
    } else {
      for (var in var_list){
        metric.str <- stringr::str_split(var, '/')[[1]][stringr::str_detect(stringr::str_split(var, '/')[[1]], 'metric')]
        metric.digit <- as.numeric(regmatches(metric.str, gregexpr("[[:digit:].]+", metric.str))[[1]])
        name <- paste0('var', enum)
        assign(name, ncdf4::ncvar_def(name = var, units = as.character(units[metric.digit]),
                                      dim= list(lon_dim, lat_dim, time_dim, entity_dim),
                                      compression=5, prec=prec,
                                      verbose=verbose, shuffle=shuffle,
                                      chunksizes=chunksizes_new))
        var_list_nc[[enum]] <- eval(parse(text=name))
        enum <- enum +1
      }
    }
  }

  #add crs variable ----
  var_list_nc[[enum]] <- ncdf4::ncvar_def(name = 'crs', units = '',
                                          dim= list(),
                                          prec='char', verbose=verbose)

  enum <- enum+1
  #check for special characters
  sz <- c()
  for (u in c('\ufc', '\uf6', '\ue4', '\udf', '\udc', '\uc4', '\ud6')){
    if(any(stringr::str_detect(entities, u))){
      sz <- c(sz, u)
    }
  }
  if (!ebv_i_empty(sz)){
    message(paste0('Your entity names (csv) encompasses the following special characters: ', paste(sz, collapse = ' '),
                   '. Please change these as they will not be stored correctly!'))
  }

  #add entities variable ----
  max_char_entity <- max(apply(csv_txt, 2, function(x) max(nchar(x))), na.rm=TRUE)
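  #note: the maximum string length is taken over all taxonomy columns (not
  #only the entity names) because the same fixed character length is reused
  #for the entity_taxonomy_table variable below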
  dimchar_entity <- ncdf4::ncdim_def("nchar", "", 1:max_char_entity, create_dimvar=FALSE)
  #entity
  var_list_nc[[enum]] <- ncdf4::ncvar_def(name = 'entity', units='1', #HERE adimensional
                                          dim=list(dimchar_entity, entity_dim),
                                          prec='char', verbose = verbose)
  enum <- enum+1
  #add entity_taxonomy_table variable ----
  var_list_nc[[enum]] <- ncdf4::ncvar_def(name = 'entity_taxonomy_table', units='1', #HERE adimensional
                                          dim=list(taxon_dim, entity_dim, dimchar_entity),
                                          prec='char', verbose = verbose)
  enum <- enum+1

  # add entity_taxonomy_levels variable ----
  max_char_taxonlevel <- max(nchar(taxon_list))
  dimchar_taxonlevel <- ncdf4::ncdim_def("nchar_taxonlist", "", 1:max_char_taxonlevel, create_dimvar=FALSE)
  var_list_nc[[enum]] <- ncdf4::ncvar_def(name = 'entity_taxonomy_levels', units='1', #HERE adimensional
                                          dim=list(taxon_dim, dimchar_taxonlevel),
                                          prec='char', verbose = verbose)
  enum <- enum+1

  # add entity_ids variable ----
  if(taxonomy_key){
    max_char_taxonomy_key <- max(nchar(taxonomy_key_list))
    dimchar_taxonomy_key <- ncdf4::ncdim_def("nchar_taxonid", "", 1:max_char_taxonomy_key, create_dimvar=FALSE)
    var_list_nc[[enum]] <- ncdf4::ncvar_def(name = 'entity_taxonomy_key', units='1', #HERE adimensional
                                            dim=list(entity_dim, dimchar_taxonomy_key),
                                            prec='char', verbose = verbose)
  }

  # add all vars ----
  # also creates groups
  nc <- ncdf4::nc_create(filename = outputpath,
                         vars = var_list_nc,
                         force_v4 = TRUE,
                         verbose = verbose)

  # close file
  ncdf4::nc_close(nc)

  # use hdf5 to add all attributes ----
  # open file
  hdf <- rhdf5::H5Fopen(outputpath)

  # global attributes ----
  #static attributes
  ebv_i_char_att(hdf, 'doi', 'pending')
  ebv_i_char_att(hdf, 'Conventions', 'CF-1.8, ACDD-1.3, EBV-1.0')
  ebv_i_char_att(hdf, 'naming_authority', 'The German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig')
  ebv_i_char_att(hdf, 'date_issued', 'pending')
  ebv_i_char_att(hdf, 'history', paste0('EBV netCDF created using ebvcube, ', Sys.Date()))
  ebv_i_char_att(hdf, 'ebv_vocabulary', 'https://portal.geobon.org/api/v1/ebv')
  if(force_4D){
    ebv_i_char_att(hdf, 'ebv_cube_dimensions', 'lon, lat, time, entity')
  } else{
    ebv_i_char_att(hdf, 'ebv_cube_dimensions', 'lon, lat, time')
  }

  #dynamic attributes
  {
    global.att <- list()
    global.att['title'] <- 'title'
    global.att['id'] <- 'preliminary_id'
    global.att['summary'] <- 'summary'
    global.att['references'] <- 'references'
    global.att['source'] <- 'source'
    global.att['project_name'] <- 'project'
    global.att['project_url'] <- 'project_url'
    global.att['date_created'] <- 'date_created'
    global.att['creator_name'] <- 'creator$creator_name'
    global.att['creator_institution'] <- 'creator$creator_institution'
    global.att['creator_email'] <- 'creator$creator_email'
    global.att['creator_url'] <- 'creator$creator_url'
    global.att['license'] <- 'license'
    global.att['contributor_name'] <- 'contributor_name'
    global.att['publisher_name'] <- 'publisher$publisher_name'
    global.att['publisher_institution'] <- 'publisher$publisher_institution'
    global.att['publisher_email'] <- 'publisher$publisher_email'
    global.att['publisher_url'] <- 'publisher$publisher_url'
    global.att['comment'] <- 'comment'
    global.att['ebv_class']<-'ebv$ebv_class'
    global.att['ebv_name']<-'ebv$ebv_name'
    global.att['ebv_geospatial_scope']<-'ebv_geospatial$ebv_geospatial_scope'
    global.att['ebv_geospatial_description']<-'ebv_geospatial$ebv_geospatial_description'
    global.att['ebv_domain']<-'ebv_domain'
  }

  #keywords
  keywords <- paste0('ebv_class: ', json$ebv$ebv_class, ', ebv_name: ', json$ebv$ebv_name,
                     ', ebv_domain: ', paste0(json$ebv_domain[[1]], collapse=', '), ', ebv_geospatial_scope: ',
                     json$ebv_geospatial$ebv_geospatial_scope, ', ebv_entity_type: ',
                     json$ebv_entity$ebv_entity_type)

  if(scenarios_no > 0){
    global.att['ebv_scenario_classification_name']<-'ebv_scenario$ebv_scenario_classification_name'
    global.att['ebv_scenario_classification_version']<-'ebv_scenario$ebv_scenario_classification_version'
    global.att['ebv_scenario_classification_url']<-'ebv_scenario$ebv_scenario_classification_url'
    keywords <- paste0(keywords, ', ebv_scenario_classification_name: ',
                       json$ebv_scenario$ebv_scenario_classification_name)
  }


  #terranova datasets
  if(!is.null(json$terranova_type)){
    keywords <- paste0(keywords, ', terranova_type: ', json$terranova_type)
  }

  ebv_i_char_att(hdf, 'keywords', keywords)

  #add global.att to netcdf
  for (i in seq_along(global.att)){
    att.txt <- eval(parse(text = paste0('json$', global.att[i][[1]])))
    att.txt <- paste0(trimws(att.txt), collapse = ', ')
    if(names(global.att[i])=='contributor_name' || names(global.att[i])=='ebv_domain'){
      att.txt <- paste0(trimws(trimws(stringr::str_split(att.txt, ',')[[1]])), collapse = ', ')
    }
    ebv_i_char_att(hdf, names(global.att[i]), att.txt)
  }

  #add date_modified and date_metadata_modified
  ebv_i_char_att(hdf, 'date_modified', json$date_created)
  ebv_i_char_att(hdf, 'date_metadata_modified', json$date_created)
  # #add product version
  # product_version <- stringr::str_split(stringr::str_remove(basename(jsonpath), '.json'), '_')[[1]][2]
  # if(is.na(product_version)){
  #   product_version <- 'v1'
  # }

  #double check id - final jsons don't have 'preliminary_id' att
  id <- json$preliminary_id
  if(is.null(id)){
    id <- json$id
    ebv_i_char_att(hdf, 'id', id)
  }

  #geospatial attributes
  #bounds
  xmin <- min(lon_data) - res[1]/2
  xmax <- max(lon_data) + res[1]/2
  ymin <- min(lat_data) - res[2]/2
  ymax <- max(lat_data) + res[2]/2
  bounds <- paste0('POLYGON((', xmin, ' ', ymin, ', ', xmin, ' ', ymax, ', ',
                   xmax, ' ', ymax, ', ', xmax, ' ', ymin, ', ',
                   xmin, ' ', ymin, '))')
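  #WKT polygon spanning the full (outer) pixel extent, e.g. for the default
  #global 1 degree grid:
  #'POLYGON((-180 -90, -180 90, 180 90, 180 -90, -180 -90))'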
  #lat and lon
  if(stringr::str_detect(epsg, 'ESRI')){
    ebv_i_char_att(hdf, 'geospatial_bounds_crs', epsg)
  }else{
    ebv_i_char_att(hdf, 'geospatial_bounds_crs', paste0('EPSG:', epsg))
  }

  ebv_i_char_att(hdf, 'geospatial_bounds', bounds)
  ebv_i_char_att(hdf, 'geospatial_lat_resolution', paste0(res[2], ' ', crs_unit))
  ebv_i_char_att(hdf, 'geospatial_lon_resolution', paste0(res[1], ' ', crs_unit))
  if(crs_unit == 'meter'){
    ebv_i_char_att(hdf, 'geospatial_lon_units', 'meter')
    ebv_i_char_att(hdf, 'geospatial_lat_units', 'meter')
  }else{
    ebv_i_char_att(hdf, 'geospatial_lon_units', 'degree_east')
    ebv_i_char_att(hdf, 'geospatial_lat_units', 'degree_north')
  }

  #temporal attributes
  # acdd terms
  ebv_i_char_att(hdf, 'time_coverage_start', t_start)
  ebv_i_char_att(hdf, 'time_coverage_end', t_end)
  ebv_i_char_att(hdf, 'time_coverage_resolution', t_res)

  # change crs variable ----
  crs.id <- rhdf5::H5Dopen(hdf, 'crs')

  # :wkt ref
  #ebv_i_char_att(crs.id, 'crs_ref', crs_ref)
  ebv_i_char_att(crs.id, 'spatial_ref', crs_ref)
  ebv_i_char_att(crs.id, 'GeoTransform', geo_trans)

  #get grid mapping attributes
  crs_grid <- ebv_i_eval_epsg(epsg, proj=TRUE)
  crs_wkt_list <- stringr::str_split(crs_wkt, '\n')[[1]]


  #check name: change standard_name of lat and lon accordingly
  if(stringr::str_detect(crs_wkt, 'PROJCRS')){
    crs_proj <- TRUE
  }else{
    crs_proj<- FALSE
  }

  if(stringr::str_detect(crs_grid, 'utm')){
    #add grid mapping for UTM (not supported by ncmeta)
    #check WKT version and process accordingly
    if(ebv_i_eval_wkt(crs_wkt)){
      #process WKT2 (2019)
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'Latitude of natural origin'))]
      lat_proj <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'Longitude of natural origin'))]
      lon_proj <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'Scale factor at natural origin'))]
      scale_fac <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'False easting'))]
      f_east <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'False northing'))]
      f_north <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
    }else{
      #process WKT
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'latitude_of_origin'))]
      lat_proj <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'central_meridian'))]
      lon_proj <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'scale_factor'))]
      scale_fac <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'false_easting'))]
      f_east <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]
      part <- crs_wkt_list[which(stringr::str_detect(crs_wkt_list, 'false_northing'))]
      f_north <- regmatches(part, gregexpr("[[:digit:].]+", part))[[1]]

    }

    #add attributes
    ebv_i_char_att(crs.id, 'grid_mapping_name', 'transverse_mercator')
    ebv_i_num_att(crs.id, 'latitude_of_projection_origin', lat_proj)
    ebv_i_num_att(crs.id, 'longitude_of_projection_origin', lon_proj)
    ebv_i_num_att(crs.id, 'scale_factor_at_projection_origin', scale_fac)
    ebv_i_num_att(crs.id, 'false_easting', f_east)
    ebv_i_num_att(crs.id, 'false_northing', f_north)

  } else{
    #get grid mapping attributes
    grid_mapping <- ncmeta::nc_prj_to_gridmapping(crs_grid)

    if(!nrow(grid_mapping)==0){

      #add grid mapping name and remove from tibble
      ebv_i_char_att(crs.id, 'grid_mapping_name', grid_mapping$value[grid_mapping$name=='grid_mapping_name'][[1]])
      grid_mapping <- grid_mapping[!grid_mapping$name=='grid_mapping_name', ]

      #additional attributes
      for (name in grid_mapping$name){
        ebv_i_num_att(crs.id, name, grid_mapping$value[grid_mapping$name==name][[1]])
      }
    }else{
      warning('Simple georeferencing done - without the CF-conform grid mapping.')
    }

  }

  # #add standard_name and long_name
  # ebv_i_char_att(crs.id, 'standard_name', 'CRS')
  ebv_i_char_att(crs.id, 'long_name', 'CRS definition')

  #close ds
  rhdf5::H5Dclose(crs.id)

  # change lat variable ----
  # open dataset
  lat.id <- rhdf5::H5Dopen(hdf, 'lat')

  #if CRS is projected, add different standard_name
  if(crs_proj){
    ebv_i_char_att(lat.id, 'standard_name', 'projection_y_coordinate')
  }else{
    ebv_i_char_att(lat.id, 'standard_name', 'latitude')
  }

  # :axis = "Y";
  ebv_i_char_att(lat.id, 'axis', 'Y')

  # :units = 'degrees_north';
  if(crs_proj){
    ebv_i_char_att(lat.id, 'units', 'meter')#paste0(crs_unit, '_north'))
  }else{
    ebv_i_char_att(lat.id, 'units', 'degree_north')#paste0(crs_unit, '_north'))
  }


  #close dataset
  rhdf5::H5Dclose(lat.id)

  # change lon variable ----
  #open dataset
  lon.id <- rhdf5::H5Dopen(hdf, 'lon')

  #if CRS is projected, add different standard_name
  if(crs_proj){
    ebv_i_char_att(lon.id, 'standard_name', 'projection_x_coordinate')
  }else{
    ebv_i_char_att(lon.id, 'standard_name', 'longitude')
  }

  # :axis = "X";
  ebv_i_char_att(lon.id, 'axis', 'X')

  # :units = 'degrees_east';
  if(crs_proj){
    ebv_i_char_att(lon.id, 'units', 'meter')#paste0(crs_unit, '_north'))
  }else{
    ebv_i_char_att(lon.id, 'units', 'degree_east')#paste0(crs_unit, '_east'))
  }

  #close dataset
  rhdf5::H5Dclose(lon.id)

  # change time variable ----
  # open dataset
  time.id <- rhdf5::H5Dopen(hdf, 'time')

  # :axis = "T";
  ebv_i_char_att(time.id, 'axis', 'T')

  # :calendar = "standard";
  if(t_res != 'Paleo'){
    ebv_i_char_att(time.id, 'calendar', 'standard')
  }

  #close
  rhdf5::H5Dclose(time.id)

  #add values to entity var----
  entity_names <- as.data.frame(stringr::str_split(stringr::str_pad(entities, max_char_entity, side = c("right")), ''))
  entity_n <- enc2utf8(unlist(entity_names))
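  #each entity name is right-padded with blanks to max_char_entity characters
  #and split into single characters; the flattened vector fills the char
  #variable 'entity' with dimensions (nchar x entity)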

  entity.id <- rhdf5::H5Dopen(hdf, 'entity')#HERE
  rhdf5::H5Dwrite(entity.id, entity_n)

  # acdd terms
  ebv_i_char_att(entity.id, 'ebv_entity_type', json$ebv_entity$ebv_entity_type)
  ebv_i_char_att(entity.id, 'ebv_entity_scope', json$ebv_entity$ebv_entity_scope)
  ebv_i_char_att(entity.id, 'ebv_entity_classification_name', json$ebv_entity$ebv_entity_classification_name)
  ebv_i_char_att(entity.id, 'ebv_entity_classification_url', json$ebv_entity$ebv_entity_classification_url)

  #add long_name and standard_name
  ebv_i_char_att(entity.id, 'long_name', 'entity')

  rhdf5::H5Dclose(entity.id)

  # add metric and scenario attributes ----
  # 1. metric, no scenario (entities are not relevant)
  if(scenarios_no==0){
    for (i in 1:(metrics_no)){
      mgid <- rhdf5::H5Gopen(hdf, paste0('metric_', i))
      #add metric attributes
      standard_name <- eval(parse(text=paste0('json$ebv_metric$ebv_metric_', i, '$`:standard_name`')))
      long_name <- eval(parse(text=paste0('json$ebv_metric$ebv_metric_', i, '$`:long_name`')))
      unit.m <- eval(parse(text=paste0('json$ebv_metric$ebv_metric_', i, '$`:units`')))
      ebv_i_char_att(mgid, 'standard_name', standard_name)
      ebv_i_char_att(mgid, 'long_name', long_name)
      ebv_i_char_att(mgid, 'units', unit.m)
      #close data handle
      rhdf5::H5Gclose(mgid)
    }
    #2. scenario and metric (entities are not relevant)
  }else{
    for (j in 1:(scenarios_no)){
      #scenario path
      sgid <- rhdf5::H5Gopen(hdf, paste0('scenario_', j))
      #add attributes
      standard_name <- eval(parse(text=paste0('json$ebv_scenario$ebv_scenario_', j, '$`:standard_name`')))
      long_name <- eval(parse(text=paste0('json$ebv_scenario$ebv_scenario_', j, '$`:long_name`')))
      ebv_i_char_att(sgid, 'standard_name', standard_name)
      ebv_i_char_att(sgid, 'long_name', long_name)
      rhdf5::H5Gclose(sgid)
      for (i in 1:(metrics_no)){
        #open metric group
        mgid <- rhdf5::H5Gopen(hdf, paste0('scenario_', j, '/metric_', i))
        #add metric attributes
        standard_name <- eval(parse(text=paste0('json$ebv_metric$ebv_metric_', i, '$`:standard_name`')))
        long_name <- eval(parse(text=paste0('json$ebv_metric$ebv_metric_', i, '$`:long_name`')))
        unit.m <- eval(parse(text=paste0('json$ebv_metric$ebv_metric_', i, '$`:units`')))
        ebv_i_char_att(mgid, 'standard_name', standard_name)
        ebv_i_char_att(mgid, 'long_name', long_name)
        ebv_i_char_att(mgid, 'units', unit.m)
        #close datahandle
        rhdf5::H5Gclose(mgid)
      }
    }
  }

  #add entity attributes 3D ----
  if(force_4D==FALSE){
    #enum <- 1
    for(var in var_list){
      part <- stringr::str_split(var, '/')[[1]][2]
      enum <- as.integer(paste0(stringr::str_extract_all(part, '\\d')[[1]], collapse=''))
      did <- rhdf5::H5Dopen(hdf, var)
      ebv_i_char_att(did, 'grid_mapping', '/crs')
      ebv_i_char_att(did, 'coordinates', '/entity')#HERE
      ebv_i_char_att(did, 'coverage_content_type', paste0(json$coverage_content_type, collapse=', '))
      ebv_i_char_att(did, 'standard_name', entities[enum])
      #close dh
      rhdf5::H5Dclose(did)
      #enum <- enum +1
    }
  }else{
    #add entity attributes 4D ----
    enum <-1
    for(var in var_list){
      parts <- stringr::str_split(var, '/')[[1]]
      m <- paste0(parts[1:(length(parts)-1)], collapse='/')
      mid <- rhdf5::H5Gopen(hdf, m)
      long_name <- ebv_i_read_att(mid, 'standard_name')
      rhdf5::H5Gclose(mid)
      did <- rhdf5::H5Dopen(hdf, var)
      ebv_i_char_att(did, 'long_name', long_name)
      ebv_i_char_att(did, 'grid_mapping', '/crs')
      ebv_i_char_att(did, 'coordinates', '/entity')#HERE
      ebv_i_char_att(did, 'coverage_content_type', paste0(json$coverage_content_type, collapse=', '))
      #close dh
      rhdf5::H5Dclose(did)
    }
  }

  # close file 1 ----
  rhdf5::H5Fclose(hdf)

  # add values to 'entity_taxonomy_table' var ----
  level_i <- length(taxon_list)
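  #taxon_list is ordered from the highest to the lowest taxonomy level;
  #level_i counts down so that the first (highest) csv column ends up at the
  #last index along the taxonlevel dimension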

  for(level in taxon_list){
    #transform values so they fit into the variable
    data_level_clean <- ebv_i_char_variable(csv_txt[, level], max_char_entity)

    if(verbose){
      print(paste0('add ', level, ' data to level: ', level_i))
    }

    rhdf5::h5write(data_level_clean, file=outputpath,
                   name="entity_taxonomy_table", index=list(level_i, NULL, NULL))

    level_i <- level_i-1
  }



  # add values to 'entity_taxonomy_levels' var ----
  level_d <- ebv_i_char_variable(taxon_list, max_char_taxonlevel, TRUE)
  rhdf5::h5write(level_d, file=outputpath,
                 name="entity_taxonomy_levels")


  # add values to 'entity_taxonomy_key' var ----
  if(taxonomy_key){
    ls_id_d <- ebv_i_char_variable(taxonomy_key_list, max_char_taxonomy_key)
    rhdf5::h5write(ls_id_d, file=outputpath,
                   name="entity_taxonomy_key")
  }

  #delete automatically created attribute: :rhdf5-NA.OK ----
  hdf <- rhdf5::H5Fopen(outputpath)
  #taxonomy_levels
  ent_level_did <- rhdf5::H5Dopen(hdf, 'entity_taxonomy_levels')
  if(rhdf5::H5Aexists(ent_level_did, 'rhdf5-NA.OK')){
    rhdf5::H5Adelete(ent_level_did, 'rhdf5-NA.OK')
  }
  rhdf5::H5Dclose(ent_level_did)
  #entity_taxonomy_key
  if(taxonomy_key){
    taxonomy_key_did <- rhdf5::H5Dopen(hdf, 'entity_taxonomy_key')
    if(rhdf5::H5Aexists(taxonomy_key_did, 'rhdf5-NA.OK')){
      rhdf5::H5Adelete(taxonomy_key_did, 'rhdf5-NA.OK')
    }
    #add taxonomy key name attribute
    ebv_i_char_att(taxonomy_key_did, 'long_name', key_name)
    #close Datahandle
    rhdf5::H5Dclose(taxonomy_key_did)
  }
  #entity_taxonomy_table
  ent_list_did <- rhdf5::H5Dopen(hdf, 'entity_taxonomy_table')
  if(rhdf5::H5Aexists(ent_list_did, 'rhdf5-NA.OK')){
    rhdf5::H5Adelete(ent_list_did, 'rhdf5-NA.OK')
  }
  rhdf5::H5Dclose(ent_list_did)

  # close file 2 ----
  rhdf5::H5Fclose(hdf)

  #set dim of all ebvcubes ----
  #get all cube paths
  paths <- ebv_datacubepaths(outputpath)$datacubepaths
  #open again
  hdf <- rhdf5::H5Fopen(outputpath)
  for(path in paths){
    did <- rhdf5::H5Dopen(hdf, path)
    #set new dimension of dataset
    rhdf5::H5Dset_extent(did, c(length(lon_data), length(lat_data), length(timesteps), entities_no))
    rhdf5::H5Dclose(did)
  }

  # close file 3 ----
  rhdf5::H5Fclose(hdf)



}
