MLWIC2 can be used to automatically classify camera trap images or to train new models for image classification. It contains two pre-trained models: the species_model identifies 58 species and empty images, and the empty_animal model distinguishes between images with animals and those that are empty. MLWIC2 also contains Shiny apps for running the functions; these can be accessed using `runShiny`. In the steps below, Shiny options are listed for some steps, indicating that you can run those steps with a Shiny app by running the function provided. Note that when you are using Shiny apps to select directories and files, you can only navigate using the top half of the screen, and you must scroll to the bottom of the window to find the Select button.
If you have issues, please submit them to the issues tab and do not email the authors of this package with questions. This way everyone can learn from the issue.
You need to have Anaconda Navigator installed, along with Python 3.7 (Python 3.6 or 3.5 will also work). If you are using a Windows computer, you will likely need to install Rtools if you don't already have it installed.
## Step 1: Install the `MLWIC2` package in R

```r
# install devtools if you don't have it
if (!require('devtools')) install.packages('devtools')
# check error messages and ensure that devtools installed properly
# install MLWIC2 from GitHub
devtools::install_github("mikeyEcology/MLWIC2")
# this line might prompt you to update some packages; it would be wise to make these updates
# load this package
library(MLWIC2)
```
When running `install_github`, some users will get an error about Rcpp or rlang. This is due to the update to R version 4. If you have issues, update R to the latest version and re-install these R packages.

You only need to run steps 2-3 the first time you use this package on a computer. If you have already run MLWIC on your computer, you can skip step 2.
## Step 2: Set up your environment for using MLWIC2 with the R function `setup`
Shiny option: `MLWIC2::runShiny('setup')`

Most users can run `MLWIC2::setup()` and not specify any options, but if you need to specify options, they are described below:

- `python_loc` is the location of Python on your computer. On Macs, it is often in the default location; you can determine the location by opening a terminal window and typing `which python`. In Windows, you can open your command prompt and type `where python`.
- You can specify `r_reticulate = TRUE`; if you don't know what this means, leave this argument as the default by not specifying it.
- You may see some warning messages when running `setup`; you can ignore these. If there are problems with the installation, they will become apparent when you run `classify`.
- If your computer is set up to use a GPU, you can specify `gpu=TRUE` in this function. Using a GPU is not necessary to run MLWIC2, and if you are using a trained model to classify images it will not have a major effect, but if you are training a model, a GPU will result in much faster training; see more details here.

## Step 3: Download the MLWIC2 helper files

Save the MLWIC2_helper_files folder in a convenient location on your computer. You will specify this location as `model_dir`
when you run the functions `classify`, `make_output`, and `train`. (Optional) If you want to check md5sums for this file, the value should be 14432A502FB78943890751608A8DAECC.

When providing the location of the helper files (`model_dir`), be sure that you are using the directory that actually contains the helper files. The correct directory will contain files called arch.py and run.py, among others. Windows in particular sometimes puts the files in .../MLWIC2_helper_files/MLWIC2_helper_files instead of simply MLWIC2_helper_files, as you would expect.

## Step 4: Create an input file using `make_input`
Shiny option: `MLWIC2::runShiny('make_input')`
If you already have classifications for your images (`images_classified=TRUE`), you need an `input_file` csv that has at least two columns: one must be "filename" and the other must be "class_ID". `class_ID` is a column containing the number of the label for each species. If you're using the "species_model", you can find the class_ID associated with each species in this table and put those numbers in this column. If, instead of a column called `class_ID` containing the number associated with each species, you have a column called `class` containing your classifications as words (e.g., "dog", "cattle", "empty"), the function will find the appropriate `class_ID` associated with these words (class_IDs can be found in this table).

If you do not have classifications, make an `input_file` csv with a column called "filename" and whatever other columns you would like.

You also need to provide `path_prefix`, which is the parent directory of your images. If you have images stored in sub-folders within this directory, specify `recursive=TRUE`; if not, specify `recursive=FALSE`. You also need to specify the `suffixes` (e.g., ".jpg") of your filenames so that MLWIC2 knows what types of files to look for. By default (if you don't specify anything), it will look for ".JPG" and ".jpg". See `?make_input` for more details.
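Putting these options together, a call to `make_input` might look like the sketch below. The paths are hypothetical, and the argument names follow the options described above; check `?make_input` for the exact signature before running.

```r
library(MLWIC2)

# a sketch, assuming you already have classifications stored in a "class" column
make_input(input_file = "/Users/me/Desktop/image_labels.csv", # hypothetical csv with "filename" and "class" columns
           path_prefix = "/Users/me/Desktop/images",          # parent directory of your images
           images_classified = TRUE,                          # set to FALSE if you have no classifications
           recursive = TRUE,                                  # images are stored in sub-folders
           suffixes = c(".JPG", ".jpg"))                      # file types to look for
```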
## Step 5: Classify your images using `classify`
Shiny option: `MLWIC2::runShiny('classify')`
- `path_prefix` is the absolute path where your images are stored. Your images can be stored in sub-directories within your `path_prefix`, but this must be reflected in your `data_info` file. For example, if you have a file located at .../images/subdirectory1/imagefile.jpg and your `path_prefix=.../images/`, your filename for this image in your `data_info` file would be subdirectory1/imagefile.jpg.
- `data_info` is the absolute path to where your input file is stored. Check your output from `make_input`.
- `model_dir` is the absolute path to where you stored the MLWIC2_helper_files folder in step 3.
- `log_dir` is the absolute path to the model you want to use. If you are using the built-in models, it is either "species_model", "empty_animal", or "CFTEP". If you trained a model with MLWIC2, this would be what you specified as your `log_dir_train`.
- `os` is your operating system type. If you are using MS Windows, set `os="Windows"`; otherwise, you can ignore this argument.
- `num_classes` is the number of species or groups of species in the model. If you are using the species_model, `num_classes=1000`; if you're using the empty_animal model, `num_classes=2`; if you are using the CFTEP model, `num_classes=10`. If you trained your own model, this is the number that you specified.
- `top_n` is the number of classes that the model will provide guesses for. E.g., if `top_n=5`, the output will include the top 5 classes that the model thinks are in the image (and the confidences associated with these guesses).
- `num_cores` is the number of cores on your computer that you want to use. Running `parallel::detectCores()` will tell you how many cores you have on your computer. Depending on how long you intend to run the model, you might not want to use all of your cores. For example, you could specify `num_cores = parallel::detectCores() - 2` so that you keep two cores available for other processes.

See `?classify` for more options.
###### If you are having trouble finding your absolute paths, you can use the Shiny option `MLWIC2::runShiny('classify')` and select your files/directories from a drop-down menu. Your paths will be printed on the screen so that next time you can run the function directly in the R console if you prefer (this is a good way to begin learning how to code).

```r
classify(path_prefix = "/Users/mikeytabak/Desktop/images", # path to where your images are stored
         data_info = "/Users/mikeytabak/Desktop/image_labels.csv", # path to csv containing file names and labels
         model_dir = "/Users/mikeytabak/Desktop/MLWIC2_helper_files", # path to the helper files downloaded in step 3, including the name of this directory (i.e., `MLWIC2_helper_files`). Check that this directory includes files like arch.py and run.py; if not, look for another folder inside it called `MLWIC2_helper_files`
         python_loc = "/anaconda2/bin/", # location of Python on your computer
         save_predictions = "model_predictions.txt", # how you want to name the raw output file
         make_output = TRUE, # if TRUE, this will produce a csv with a more friendly output
         output_name = "MLWIC2_output.csv", # if make_output==TRUE, this will be the name of your friendly output file
         num_cores = 4 # the number of cores you want to use. Try running parallel::detectCores() to see what you have available; you might want parallel::detectCores()-1 to leave a core free for other tasks
)
```
You can then use the `evaluate_performance` function to see how the model worked on your images. This function is straightforward to use; see `?evaluate_performance` for details.

## Use `write_metadata` to store classifications in image metadata (optional)
Shiny option: `MLWIC2::runShiny('write_metadata')`
`write_metadata` is a wrapper that will run the software to create metadata categories and fill them with the output of `classify`. If you want to use this function, you will need to first install Exiftool following the directions here.

- `output_file` is the path to and file name of your output file from `classify` (`[model_dir]/[output_name]`). Unless you deviated from the default settings, this file should be located in your `MLWIC2_helper_files` folder.
- `model_type` is either the "species_model" or the "empty_animal" model.
- Specify `exiftool_loc` if you are running on a Windows computer; this is the path to your Exiftool installation.

```r
write_metadata(output_file="/Users/mikeytabak/Desktop/MLWIC2_helper_files/MLWIC2_output.csv", # if you look at the classify command above, this is [model_dir]/[output_name]
               model_type="species_model", # the type of model used for classify
               exiftool_loc="/usr/local/bin", # location where exiftool is stored; you might not need to specify this
               show_sys_output = FALSE
)
```
## Train your own model using `train`
Shiny option: `MLWIC2::runShiny('train')`

If you aren't satisfied with the accuracy of the built-in models, you can train your own model using your images. The parameters are similar to those for `classify`, but you will want to specify some additional options based on how you want to train the model.
- `path_prefix` is the absolute path where your images are stored.
- `data_info` is the absolute path to where your input file is stored. Check your output from `make_input`.
- `model_dir` is the absolute path to where you stored the MLWIC2_helper_files folder in step 3.
- `num_classes` is the number of species (or groups of species) you want the model to recognize.
- `architecture` is the DNN architecture. The options are c("alexnet", "densenet", "googlenet", "nin", "resnet", "vgg"). I recommend starting with "resnet" and setting `depth=18`. If you get poor accuracy with this, "densenet" is another good option.
- `depth` is the number of layers in the DNN. If you are using resnet, the options are c(18, 34, 50, 101, 152). If you are using densenet, the options are c(121, 161, 169, 201); otherwise, the depth will be set automatically for you.
- `batch_size` is the number of images simultaneously passed to the model for training. It must be a multiple of 16. Smaller numbers will train models that are more accurate, but training will take longer. The default is 128.
- `log_dir_train` is the directory where you will store the model information. This is what you will specify as `log_dir` in the `classify` function. Use unique names if you are training multiple models on your computer; otherwise they will be over-written.
- `retrain`: if TRUE, the model you train will be a retraining of the model you specify in `retrain_from`; if FALSE, you are starting training from scratch. Retraining will be faster, but training from scratch will be more flexible.
- `retrain_from` is the name of the directory from which you want to retrain the model. If you are retraining from the species model, you would set `retrain_from="species_model"`. If you need to stop training (e.g., you have to turn off your computer), you can later set `retrain_from` to what you set as your `log_dir_train` and set `num_epochs` to the total number you want minus the number that have already completed.
- `num_epochs` is the number of epochs you want to use for training. The default is 55, and this is recommended for training a full model; but if you need to start and stop training, you can decrease this number.

You can read about more options by typing `?train` into the console.
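Putting the options above together, a training call might look like the sketch below. The paths, the number of classes, and the log directory name are hypothetical; check `?train` for the exact signature before running.

```r
library(MLWIC2)

# a sketch of training a resnet-18 from scratch on a hypothetical 5-class dataset
train(path_prefix = "/Users/me/Desktop/images",            # parent directory of your images
      data_info = "/Users/me/Desktop/image_labels.csv",    # csv produced by make_input
      model_dir = "/Users/me/Desktop/MLWIC2_helper_files", # helper files from step 3
      num_classes = 5,                  # hypothetical number of species in this dataset
      architecture = "resnet",          # recommended starting architecture
      depth = 18,                       # recommended starting depth for resnet
      batch_size = 128,                 # default; must be a multiple of 16
      log_dir_train = "my_first_model", # unique name so the model is not over-written
      retrain = FALSE,                  # train from scratch rather than retraining
      num_epochs = 55)                  # default; recommended for a full model
```

When training finishes, you would pass "my_first_model" as the `log_dir` argument of `classify` to use this model.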
If you use this package in a publication, please cite our manuscript:

Tabak, M. A., Norouzzadeh, M. S., Wolfson, D. W., Newton, E. J., Boughton, R. K., Ivan, J. S., … Miller, R. S. (2020). Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2. Ecology and Evolution, 10(9): 10374-10383. doi:10.1002/ece3.6692
```bibtex
@article{tabakImprovingAccessibilityTransferability2020,
  title = {Improving the Accessibility and Transferability of Machine Learning Algorithms for Identification of Animals in Camera Trap Images: {{MLWIC2}}},
  shorttitle = {Improving the Accessibility and Transferability of Machine Learning Algorithms for Identification of Animals in Camera Trap Images},
  author = {Tabak, Michael A. and Norouzzadeh, Mohammad S. and Wolfson, David W. and Newton, Erica J. and Boughton, Raoul K. and Ivan, Jacob S. and Odell, Eric A. and Newkirk, Eric S. and Conrey, Reesa Y. and Stenglein, Jennifer and Iannarilli, Fabiola and Erb, John and Brook, Ryan K. and Davis, Amy J. and Lewis, Jesse and Walsh, Daniel P. and Beasley, James C. and VerCauteren, Kurt C. and Clune, Jeff and Miller, Ryan S.},
  year = {2020},
  month = mar,
  pages = {ece3.6692},
  publisher = {{Wiley}},
  doi = {10.1002/ece3.6692},
  journal = {Ecology and Evolution},
  language = {en}
}
```
Disclaimer: MLWIC2 is free software that comes with no warranty. You are encouraged to test the software's ability to classify images in your dataset rather than assuming the reported accuracy will hold for your images. The authors of this paper are not responsible for any decisions or interpretations that are made as a result of using MLWIC2.