train: Train a machine learning model to classify images


View source: R/train.R

Description

train allows users to train their own machine learning model using images that have been manually classified. We recommend having at least 2,000 images per species, but accuracy will be higher with more than 10,000 images. Training will take a very long time, so we recommend using a GPU if possible.

The data_info csv must have two columns with NO HEADERS. Column 1 must be the file name of the image. Column 2 must be a number corresponding to the species. Give each species (or group of species) a number identifying it. The first species must be 0, the next species 1, and so on.

If this is your first time using this function, see the additional documentation at https://github.com/mikeyEcology/MLWIC . This function uses absolute paths, but if you are unfamiliar with them, you can put all of your images, the image label csv ("data_info"), and the L1 folder that you downloaded following the directions at https://github.com/mikeyEcology/MLWIC into one directory on your computer. Then set your working directory to this location and the function will find the absolute paths for you.
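The label file format described above can be produced in base R. The following is a minimal sketch, not part of the package: the file names, species names, and the `labels`, `lookup`, and `out_file` objects are hypothetical, and the csv is written to a temporary directory for illustration.

```r
# Hypothetical example data: image file names and their manually assigned species.
labels <- data.frame(
  file = c("img_0001.jpg", "img_0002.jpg", "img_0003.jpg", "img_0004.jpg"),
  species = c("deer", "elk", "deer", "moose"),
  stringsAsFactors = FALSE
)

# Map each species to an integer ID starting at 0, in order of first appearance.
species_levels <- unique(labels$species)
labels$class_id <- match(labels$species, species_levels) - 1L

# Write two columns (file name, class ID) with no header row and no quoting.
out_file <- file.path(tempdir(), "image_labels.csv")
write.table(labels[, c("file", "class_id")], out_file,
            sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE)

# Keep a lookup table so class IDs can be translated back to species later.
lookup <- data.frame(class_id = seq_along(species_levels) - 1L,
                     species = species_levels,
                     stringsAsFactors = FALSE)
```

In this sketch, `length(species_levels)` is the value you would pass as num_classes, and the path in `out_file` is what you would pass as data_info.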

Usage

train(path_prefix = paste0(getwd(), "/images"),
  data_info = paste0(getwd(), "/image_labels.csv"),
  model_dir = getwd(), python_loc = "/anaconda2/bin/", os = "Mac",
  num_gpus = 2, num_classes = 28, delimiter = ",",
  architecture = "resnet", depth = "18", batch_size = "128",
  log_dir_train = "train_output", retrain = TRUE, print_cmd = FALSE)
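Because training is slow, it may be worth confirming that the default locations resolve to real files before calling train. The sketch below only assumes the defaults shown in the usage above (an "images" folder, "image_labels.csv", and the L1 folder, all inside the working directory); the `inputs` and `found` names are hypothetical and not part of the package.

```r
# Resolve the default locations relative to the current working directory.
path_prefix <- file.path(getwd(), "images")
data_info <- file.path(getwd(), "image_labels.csv")
model_dir <- getwd()

# Collect the inputs that train is expected to read.
inputs <- c(
  images = path_prefix,
  labels = data_info,
  L1 = file.path(model_dir, "L1")
)

# TRUE for each location that exists, FALSE otherwise (names are preserved).
found <- vapply(inputs, file.exists, logical(1))

# List anything that is missing so it can be fixed before a long training run.
missing_inputs <- names(found)[!found]
if (length(missing_inputs) > 0) {
  message("Missing: ", paste(missing_inputs, collapse = ", "))
}
```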

Arguments

path_prefix

Absolute path to location of the images on your computer

data_info

Absolute path to the csv containing the file name of each photo. This file must have no headers (column names). Column 1 must be the file name of each image, including the extension (e.g., .jpg). Column 2 must be a number corresponding to the species. Give each species (or group of species) a number identifying it. The first species must be 0, the next species 1, and so on.

model_dir

Absolute path to the location where you stored the L1 folder that you downloaded from github.

python_loc

The location of python on your machine.

os

The operating system you are using. If you are using Windows, set this to "Windows"; otherwise leave as default.

num_gpus

The number of GPUs available. If you are using a CPU, leave this as default.

num_classes

The number of classes (species or groups of species) in your model.

delimiter

The field delimiter. This will be ',' for a csv.

architecture

the architecture of the deep neural network (DNN). Resnet-18 is the default. Other options are c("alexnet", "densenet", "googlenet", "nin", "vgg")

depth

the number of layers in the DNN. If you are using resnet, the options are c(18, 34, 50, 101, 152). If you are using densenet, the options are c(121, 161, 169, 201). If you are using an architecture other than resnet or densenet, the number of layers will be set automatically.

batch_size

the number of images simultaneously passed to the model for training. It must be a multiple of 64. Smaller numbers will train models that are more accurate, but it will take longer to train. The default is 128.

log_dir_train

directory where you will store the model information. This is the name you will specify in the log_dir option of the classify function. Use unique names if you are training multiple models on your computer; otherwise they will be overwritten.

retrain

If TRUE, the model you train will be a retraining of the model presented in the Tabak et al. MEE paper. If FALSE, you are starting training from scratch. Retraining will be faster but training from scratch will be more flexible.

print_cmd

print the system command instead of running the function. This is for development.
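The constraints on architecture, depth, and batch_size listed above can be checked before starting a long run. The helper below is a hypothetical sketch, not a function from this package; it encodes only the options stated in the argument descriptions and accepts depth and batch_size as either numbers or strings, as in the usage example.

```r
# Hypothetical pre-flight check; returns a character vector of problems (empty if none).
check_train_args <- function(architecture = "resnet", depth = "18", batch_size = "128") {
  # Depth options documented for the architectures that take a depth.
  valid_depths <- list(
    resnet = c(18, 34, 50, 101, 152),
    densenet = c(121, 161, 169, 201)
  )
  # Architectures whose depth is set automatically.
  other_architectures <- c("alexnet", "googlenet", "nin", "vgg")

  problems <- character(0)

  if (architecture %in% names(valid_depths)) {
    d <- suppressWarnings(as.numeric(depth))
    if (!(d %in% valid_depths[[architecture]])) {
      problems <- c(problems, sprintf(
        "depth for %s must be one of: %s",
        architecture, paste(valid_depths[[architecture]], collapse = ", ")
      ))
    }
  } else if (!(architecture %in% other_architectures)) {
    problems <- c(problems, sprintf("unknown architecture: %s", architecture))
  }

  # batch_size must be a positive multiple of 64.
  bs <- suppressWarnings(as.numeric(batch_size))
  if (is.na(bs) || bs <= 0 || bs %% 64 != 0) {
    problems <- c(problems, "batch_size must be a positive multiple of 64")
  }

  problems
}
```

For example, `check_train_args("resnet", "20", "128")` reports a depth problem, while the defaults return an empty vector.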


alicehua11/photo_classification documentation built on Nov. 2, 2019, 1:40 p.m.