
Outsource: Create Distributed Coding Tasks More Easily


The Goal

The goal is to make it easier to produce distributed Human Intelligence Tasks (HITs, nomenclature courtesy of Amazon). Uses include producing training data, a general class of recognition problems (such as image recognition) that humans do with very little error but that machines are still somewhat bad at, surveys (where the source of the data is the human being surveyed), etc.

The general idea traces its ancestry to CAPTCHA, which was developed to solve two problems at once: give websites a way to distinguish between humans and bots, and help OCR written (or spoken) material. But the current tool differs from CAPTCHA in three ways. First, our goal is not to solve two problems at once. Instead of current CAPTCHA systems, which make it as hard as possible for humans to get the answer right, we want to invert that logic and make it as easy as possible for humans to get the answer right. Second, we want to build it for tasks other than recognition tasks. Third, we want to attach it to a payment architecture: a micro-task market like Amazon M-Turk, or a barter system where people do a small task in exchange for free access to content, e.g., Google Consumer Surveys.

The MVP

To start with, the tool will be geared towards making it easy to convert a large project into tasks that can be posted to 'micro-task' markets like Amazon M-Turk and CrowdFlower. Initially, we limit the tool to the coding of texts. The tool takes a directory of text files and splits them into tasks as specified by the user. The general architecture of coding text is as follows:

Specifically, the tool will take the following arguments:

input_files_dir: the directory containing the text files to be coded (one story per file)
path_to_form: the path to the HTML form that workers fill out for each story
output_files_dir: the directory to which the generated HITs are written
n_per_story: the number of workers who code each story
n_per_worker: the number of stories assigned to each worker

If there are a total of k stories and each story is coded n times (n_per_story), there will be k × n codings in all. Each worker is assigned m stories (n_per_worker), so the number of workers needed is (k × n)/m, and the total number of HITs created is the ceiling of that number. The tool produces a survey with one story and one HTML form per page, and the name of the form field is the name of the story file. Each worker's survey also carries a worker ID and a unique job completion ID that can be used to redeem credit.
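For example, a minimal sketch of this arithmetic in R, using hypothetical numbers (10 stories, each coded 3 times, 2 stories per worker; these are not package defaults):

k <- 10  # total number of stories
n <- 3   # codings per story (n_per_story)
m <- 2   # stories assigned to each worker (n_per_worker)
total_codings <- k * n                # 10 * 3 = 30 codings in all
n_hits <- ceiling(total_codings / m)  # ceiling(30 / 2) = 15 HITs

With these numbers, 15 HITs would be created, each bundling two stories.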

Installation

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("soodoku/outsource")
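
Once installed, the package can be loaded in the usual way before running the example below:

library(outsource)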

Usage

setwd(path.package("outsource"))
creator(input_files_dir = "inst/extdata/sample_in/",
        path_to_form = "inst/extdata/html_form.html",
        output_files_dir = "inst/extdata/sample_out/",
        n_per_worker = 2, n_per_story = 3)
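
Here the files in inst/extdata/sample_in/ are the stories, html_form.html is the coding form shown with each story, and the generated HITs are written to inst/extdata/sample_out/. With n_per_story = 3 and n_per_worker = 2, each sample story is coded by three workers and each HIT bundles two stories, so the number of HITs is the ceiling of (number of stories × 3)/2, as described above.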

License

Scripts are released under the GNU GPL v3.


