CrowdED: Guideline for Optimal Crowdsourcing Experimental Design

Amrapali Zaveri, Pedro Hernandez Serrano*, Manisha Desai, Michel Dumontier

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review


Crowdsourcing involves creating HITs (Human Intelligence Tasks), submitting them to a crowdsourcing platform, and providing a monetary reward for each HIT. One advantage of crowdsourcing is that the tasks can be highly parallelized, that is, the work is performed by a large number of workers in a decentralized setting. The design also offers a means to cross-check the accuracy of the answers by assigning each task to more than one person and relying on majority consensus, as well as to reward workers according to their performance and productivity. Since each worker is paid per task, costs can increase significantly, irrespective of the overall accuracy of the results. Thus, one important question that arises when designing such crowdsourcing tasks is how many workers to employ and how many tasks to assign to each worker when dealing with a large number of tasks. That is, the main research question we aim to answer is: 'Can we a-priori estimate optimal workers and tasks' assignment to obtain maximum accuracy on all tasks?'. To this end, we introduce a two-staged statistical guideline, CrowdED, for optimal crowdsourcing experimental design. We describe the algorithm and present preliminary results and discussions. We implement the algorithm in Python and make it openly available on GitHub, and provide a Jupyter Notebook and an R Shiny app for users to re-use, interact with, and apply in their own crowdsourcing experiments.
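The trade-off the abstract describes, between redundancy through majority consensus and a cost that grows with every paid assignment, can be illustrated with a minimal Python sketch. This is not the CrowdED algorithm. It assumes binary tasks, independent workers who are each correct with the same probability, and a flat reward per HIT; the function names `majority_vote`, `majority_accuracy`, and `total_cost`, as well as the numbers in the demo loop, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Return the most common answer among the workers' answers for one task."""
    return Counter(answers).most_common(1)[0][0]


def majority_accuracy(n_workers, p_correct):
    """Probability that a strict majority of n_workers answers a binary task correctly.

    Assumption (illustrative only): workers are independent and each one is
    correct with the same probability p_correct.
    """
    needed = n_workers // 2 + 1  # smallest number of correct votes forming a strict majority
    return sum(
        comb(n_workers, k) * p_correct**k * (1 - p_correct) ** (n_workers - k)
        for k in range(needed, n_workers + 1)
    )


def total_cost(n_tasks, n_workers, reward_per_hit):
    """Total payout when every task is assigned to n_workers at a flat reward."""
    return n_tasks * n_workers * reward_per_hit


if __name__ == "__main__":
    # Hypothetical setting: 1000 tasks, workers correct 70% of the time, 0.05 per HIT.
    for n in (1, 3, 5, 7):
        print(n, round(majority_accuracy(n, 0.7), 3), total_cost(1000, n, 0.05))
```

Under these assumptions, adding workers per task raises the expected majority accuracy with diminishing returns while the cost grows linearly, which is why estimating the assignment before running the experiment matters.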
Original language: English
Title of host publication: Companion Proceedings of the The Web Conference 2018
Place of Publication: Republic and Canton of Geneva, Switzerland
Publisher: International World Wide Web Conferences Steering Committee
Number of pages: 8
ISBN (Print): 978-1-4503-5640-4
Publication status: Published - 2018

Publication series

Series: WWW '18


Keywords

  • biomedical
  • crowdsourcing
  • data quality
  • data science
  • fair
  • metadata
  • reproducibility
