train: Train a machine learning model to classify images
In mikeyEcology/MLWIC: Machine Learning for Wildlife Image Classification


View source: R/train.R

train allows users to train their own machine learning model using images that have been manually classified. We recommend having at least 2,000 images per species, but accuracy will be higher with more than 10,000 images. Training will take a very long time, so we recommend using a GPU if possible.

The data_info csv must have two columns and NO HEADERS. Column 1 must be the file name of the image. Column 2 must be a number corresponding to the species. Give each species (or group of species) a number identifying it. The first species must be 0, the next species 1, and so on.

If this is your first time using this function, see the additional documentation at https://github.com/mikeyEcology/MLWIC . This function uses absolute paths. If you are unfamiliar with absolute paths, you can put all of your images, the image label csv ("data_info"), and the L1 folder that you downloaded following the directions at https://github.com/mikeyEcology/MLWIC into one directory on your computer. Then set your working directory to this location and the function will find the absolute paths for you.
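The two-column, header-free label file described above can be written from R. The following is a minimal sketch, not part of MLWIC: the file names, species names, and the `labels`, `species_levels`, and `species_key` objects are hypothetical, and the file is written to a temporary directory only for illustration.

```r
# Hypothetical example data: file names and species names are made up.
labels <- data.frame(
  file = c("img_0001.jpg", "img_0002.jpg", "img_0003.jpg", "img_0004.jpg"),
  species = c("deer", "elk", "deer", "coyote"),
  stringsAsFactors = FALSE
)

# Assign each species a 0-based integer ID in order of first appearance.
species_levels <- unique(labels$species)
labels$class_id <- match(labels$species, species_levels) - 1L

# Write two columns (file name, class ID) with no header row and no row names.
out_file <- file.path(tempdir(), "image_labels.csv")
write.table(labels[, c("file", "class_id")], file = out_file,
            sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)

# Keep a lookup table so the numeric IDs can be mapped back to species later.
species_key <- data.frame(class_id = seq_along(species_levels) - 1L,
                          species = species_levels)
```

With labels built this way, `length(species_levels)` is the value to pass as `num_classes`.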

train(path_prefix = paste0(getwd(), "/images"),
  data_info = paste0(getwd(), "/image_labels.csv"),
  model_dir = getwd(), python_loc = "/anaconda2/bin/", os = "Mac",
  num_gpus = 2, num_classes = 28, delimiter = ",",
  architecture = "resnet", depth = "18", batch_size = "128",
  log_dir_train = "train_output", retrain = TRUE,
  retrain_from = "USDA182", num_epochs = 55, print_cmd = FALSE)

`path_prefix`	Absolute path to location of the images on your computer
`data_info`	Absolute path to the csv file that lists the file name of each image. This file must have no headers (column names). Column 1 must be the file name of each image, including the extension (e.g., .jpg). Column 2 must be a number corresponding to the species. Give each species (or group of species) a number identifying it. The first species must be 0, the next species 1, and so on.
`model_dir`	Absolute path to the location where you stored the L1 folder that you downloaded from github.
`python_loc`	The location of python on your machine.
`os`	The operating system you are using. If you are using Windows, set this to "Windows"; otherwise leave as default.
`num_gpus`	The number of GPUs available. If you are using a CPU, leave this as default.
`num_classes`	The number of classes (species or groups of species) in your model.
`delimiter`	The delimiter used in the data_info file. This will be a ',' for a csv.
`architecture`	The architecture of the deep neural network (DNN). Resnet-18 is the default. Other options are c("alexnet", "densenet", "googlenet", "nin", "vgg").
`depth`	The number of layers in the DNN. If you are using resnet, the options are c(18, 34, 50, 101, 152). If you are using densenet, the options are c(121, 161, 169, 201). If you are using an architecture other than resnet or densenet, the number of layers will be set automatically.
`batch_size`	The number of images simultaneously passed to the model for training. It must be a multiple of 64. Smaller batch sizes will train models that are more accurate, but training will take longer. The default is 128.
`log_dir_train`	Directory where the model information will be stored. This is the name you will specify in the `log_dir` option of the `classify` function. Use unique names if you are training multiple models on your computer; otherwise they will be overwritten.
`retrain`	If TRUE, the model you train will be a retraining of the model presented in the Tabak et al. MEE paper. If FALSE, you are starting training from scratch. Retraining will be faster but training from scratch will be more flexible.
`retrain_from`	Name of the directory from which you want to retrain the model.
`num_epochs`	The number of epochs you want to use for training. The default is 55, which is recommended for training a full model. If you need to start and stop training, you may want to use a smaller number at times.
`print_cmd`	Print the system command instead of running the function. This is for development.
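Because training takes a long time, it can help to check a few arguments before starting. The following is a minimal sketch of such a check. `check_train_args` is a hypothetical helper, not part of MLWIC, and it encodes only the constraints stated in the argument descriptions above (the depth options for resnet and densenet, and `batch_size` being a multiple of 64).

```r
# Hypothetical helper; not part of MLWIC. It only encodes the constraints
# stated in the argument descriptions above.
check_train_args <- function(architecture = "resnet", depth = "18",
                             batch_size = "128") {
  valid_depths <- list(
    resnet = c(18, 34, 50, 101, 152),
    densenet = c(121, 161, 169, 201)
  )
  other_architectures <- c("alexnet", "googlenet", "nin", "vgg")
  problems <- character(0)

  if (architecture %in% names(valid_depths)) {
    # depth only matters for resnet and densenet
    if (!(as.numeric(depth) %in% valid_depths[[architecture]])) {
      problems <- c(problems,
                    sprintf("depth %s is not listed for %s", depth, architecture))
    }
  } else if (!(architecture %in% other_architectures)) {
    problems <- c(problems, sprintf("unknown architecture '%s'", architecture))
  }

  bs <- as.numeric(batch_size)
  if (is.na(bs) || bs <= 0 || bs %% 64 != 0) {
    problems <- c(problems, "batch_size must be a positive multiple of 64")
  }

  # Return a character vector of problems; length 0 means no problems found.
  problems
}

check_train_args("resnet", "18", "128")    # default values from the usage
check_train_args("densenet", "50", "100")  # invalid depth and batch size
```

The helper returns a character vector of messages rather than stopping, so all problems can be reported at once before calling `train`.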