CrowdED: Guideline for Optimal Crowdsourcing Experimental Design

Amrapali Zaveri; Pedro Hernandez Serrano; Manisha Desai; Michel Dumontier

doi:10.1145/3184558.3191543

Corresponding author: Pedro Hernandez Serrano

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Crowdsourcing involves creating HITs (Human Intelligence Tasks), submitting them to a crowdsourcing platform, and providing a monetary reward for each HIT. One advantage of crowdsourcing is that the tasks can be highly parallelized; that is, the work is performed by a large number of workers in a decentralized setting. The design also offers a means to cross-check the accuracy of the answers by assigning each task to more than one person and relying on majority consensus, and to reward workers according to their performance and productivity. Since each worker is paid per task, costs can increase significantly, irrespective of the overall accuracy of the results. Thus, one important question that arises when designing such crowdsourcing tasks is how many workers to employ and how many tasks to assign to each worker when dealing with a large number of tasks. That is, the main research question we aim to answer is: 'Can we a priori estimate the optimal worker and task assignment to obtain maximum accuracy on all tasks?' To this end, we introduce CrowdED, a two-staged statistical guideline for optimal crowdsourcing experimental design that estimates this assignment a priori. We describe the algorithm and present preliminary results and discussion. We implement the algorithm in Python and make it openly available on GitHub, and we provide a Jupyter Notebook and an R Shiny app for users to reuse, interact with, and apply in their own crowdsourcing experiments.
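
The trade-off between accuracy and cost described in the abstract can be illustrated with a minimal Python sketch. This is not the CrowdED algorithm, which this record does not detail. It only assumes, for illustration, that every worker answers a binary task correctly with the same independent probability `p`, that each task is assigned to an odd number `k` of workers, and that each answer costs a fixed `reward`. The function names and the example values are hypothetical.

```python
from math import comb


def majority_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k workers answers correctly.

    Assumes (illustration only) independent workers who are each correct
    with probability p, and an odd k so that no ties can occur.
    """
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    needed = k // 2 + 1  # smallest number of correct answers that wins the vote
    return sum(
        comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(needed, k + 1)
    )


def total_cost(n_tasks: int, k: int, reward: float) -> float:
    """Total payout when each of n_tasks is answered by k paid workers."""
    return n_tasks * k * reward


if __name__ == "__main__":
    # Hypothetical setting: 1000 tasks, workers 70% accurate, 0.05 per answer.
    for k in (1, 3, 5, 7):
        acc = majority_accuracy(0.7, k)
        cost = total_cost(1000, k, 0.05)
        print(f"workers per task={k}  majority accuracy={acc:.3f}  cost={cost:.2f}")
```

Under these assumptions, cost grows linearly with the number of workers per task while the gain in majority-vote accuracy shrinks with each added worker, which is why the number of workers and tasks per worker is worth estimating before running the experiment.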

Original language	English
Title of host publication	Companion Proceedings of the The Web Conference 2018
Place of Publication	Republic and Canton of Geneva, Switzerland
Publisher	International World Wide Web Conferences Steering Committee
Pages	1109-1116
Number of pages	8
ISBN (Print)	978-1-4503-5640-4
DOIs	https://doi.org/10.1145/3184558.3191543
Publication status	Published - 2018

Publication series

Series	WWW '18

Keywords

biomedical
crowdsourcing
data quality
data science
fair
metadata
reproducibility

Access to Document

10.1145/3184558.3191543

Full text: final published version, 1.72 MB. Licence: Taverne

Cite this

@inproceedings{1a0efd70a9a94aec9428c7e37e314081,

title = "CrowdED: Guideline for Optimal Crowdsourcing Experimental Design",

keywords = "biomedical, crowdsourcing, data quality, data science, fair, metadata, reproducibility",

author = "Amrapali Zaveri and Serrano, {Pedro Hernandez} and Manisha Desai and Michel Dumontier",

year = "2018",

doi = "10.1145/3184558.3191543",

language = "English",

isbn = "978-1-4503-5640-4",

series = "WWW '18",

publisher = "International World Wide Web Conferences Steering Committee",

pages = "1109--1116",

booktitle = "Companion Proceedings of the The Web Conference 2018",

address = "Switzerland",

}

CrowdED: Guideline for Optimal Crowdsourcing Experimental Design. / Zaveri, Amrapali; Serrano, Pedro Hernandez; Desai, Manisha et al.
Companion Proceedings of the The Web Conference 2018. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee, 2018. p. 1109-1116 (WWW '18).

TY - GEN

T1 - CrowdED: Guideline for Optimal Crowdsourcing Experimental Design

AU - Zaveri, Amrapali

AU - Serrano, Pedro Hernandez

AU - Desai, Manisha

AU - Dumontier, Michel

PY - 2018

Y1 - 2018

KW - biomedical

KW - crowdsourcing

KW - data quality

KW - data science

KW - fair

KW - metadata

KW - reproducibility

U2 - 10.1145/3184558.3191543

DO - 10.1145/3184558.3191543

M3 - Conference article in proceeding

SN - 978-1-4503-5640-4

T3 - WWW '18

SP - 1109

EP - 1116

BT - Companion Proceedings of the The Web Conference 2018

PB - International World Wide Web Conferences Steering Committee

CY - Republic and Canton of Geneva, Switzerland

ER -