
Outsource: Create Distributed Coding Tasks More Easily


The Goal

The goal is to make it easier to produce distributed Human Intelligence Tasks (HITs, nomenclature courtesy of Amazon). Uses include producing training data, a general class of recognition problems (such as image recognition) that humans do with very little error but that machines are still somewhat bad at, surveys (where the source of the data is the human being surveyed), etc.

The general idea traces its ancestry to CAPTCHA, which was developed to solve two problems at once: give websites a way to distinguish between humans and bots, and help OCR written (or spoken) material. But the current tool differs from CAPTCHA in three ways. First, our goal is not to solve two problems at once. Instead of current CAPTCHA systems, which make it as hard as possible for humans to get the answer right, we want to invert that logic and make it as easy as possible for humans to get the answer right. Second, we want to build it for tasks other than recognition tasks. Third, we want to attach it to a payment architecture: a micro-task market like Amazon M-Turk, or a barter system where people do a small task in exchange for free access to content, e.g., Google Consumer Surveys.

The MVP

To start with, the tool will be geared towards making it easy to convert a large project into tasks that can be posted to 'micro-task' markets like Amazon M-Turk and CrowdFlower. Initially, we limit the tool to the coding of texts. The tool takes a directory of text files and splits them into tasks as specified by the user. The general architecture of coding text is as follows:

Specifically, the tool will take the following arguments:

input_files_dir: the directory containing the text files to be coded (one story per file)
path_to_form: the path to the HTML form that workers fill out for each story
output_files_dir: the directory to which the generated HITs are written
n_per_story: the number of workers who code each story
n_per_worker: the number of stories assigned to each worker

If there are a total of k stories and each story is coded n times (n_per_story), there will be k × n codings in all. Each worker is assigned m stories (n_per_worker), so the number of workers needed is (k × n)/m, and the total number of HITs created is the ceiling of that number. The tool produces a survey with one story and one HTML form per page, and the name of the form field is the name of the story file. Each worker's survey also carries a worker ID and a unique job completion ID that can be used to redeem credit.
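For example, a minimal sketch of this arithmetic in R, using hypothetical numbers (10 stories, each coded 3 times, 2 stories per worker; these are not package defaults):

k <- 10  # total number of stories
n <- 3   # codings per story (n_per_story)
m <- 2   # stories assigned to each worker (n_per_worker)
total_codings <- k * n                # 10 * 3 = 30 codings in all
n_hits <- ceiling(total_codings / m)  # ceiling(30 / 2) = 15 HITs

With these numbers, 15 HITs would be created, each bundling two stories.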

Installation

To get the current development version from GitHub:

# install.packages("devtools")
devtools::install_github("soodoku/outsource")
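
Once installed, the package can be loaded in the usual way before running the example below:

library(outsource)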

Usage

setwd(path.package("outsource"))
creator(input_files_dir = "inst/extdata/sample_in/",
        path_to_form = "inst/extdata/html_form.html",
        output_files_dir = "inst/extdata/sample_out/",
        n_per_worker = 2, n_per_story = 3)
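
Here the files in inst/extdata/sample_in/ are the stories, html_form.html is the coding form shown with each story, and the generated HITs are written to inst/extdata/sample_out/. With n_per_story = 3 and n_per_worker = 2, each sample story is coded by three workers and each HIT bundles two stories, so the number of HITs is the ceiling of (number of stories × 3)/2, as described above.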

License

Scripts are released under the GNU GPL v3.


