Data Scraping for the Training of Generative AI: Lessons from Chinese Case Law and Regulation

Qian Li; Konrad Kollnig

doi:10.9785/cri-2024-250201

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The collection of data from websites at great scale – so-called data scraping – is the foundation for ChatGPT and most other Generative AI (GenAI) tools. Much of the previous discussion on the regulation of GenAI has focused on the US and EU and not so much on more technical aspects like data scraping. In response, this article focuses on the regulation of data scraping to build and deploy GenAI in China, and reviews applicable regulation and case law. We find that the sectoral approach to AI regulation in China provides important insights into balancing technological progress and societal values, diverging from the laissez-faire attitude in the US and the horizontal approach with the AI Act in the EU.

Original language	English
Pages (from-to)	33-41
Journal	Computer Law Review International
Volume	25
Issue number	2
DOIs	https://doi.org/10.9785/cri-2024-250201
Publication status	Published - 15 Apr 2024

Access to Document

10.9785/cri-2024-250201

https://www.degruyter.com/document/doi/10.9785/cri-2024-250201/html

Cite this

@article{4c0f862fada742119d61a5692281f579,

title = "Data Scraping for the Training of Generative AI: Lessons from Chinese Case Law and Regulation",

abstract = "The collection of data from websites at great scale – so-called data scraping – is the foundation for ChatGPT and most other Generative AI (GenAI) tools. Much of the previous discussion on the regulation of GenAI has focused on the US and EU and not so much on more technical aspects like data scraping. In response, this article focuses on the regulation of data scraping to build and deploy GenAI in China, and reviews applicable regulation and case law. We find that the sectoral approach to AI regulation in China provides important insights into balancing technological progress and societal values, diverging from the laissez-faire attitude in the US and the horizontal approach with the AI Act in the EU.",

author = "Qian Li and Konrad Kollnig",

year = "2024",

month = apr,

day = "15",

doi = "10.9785/cri-2024-250201",

language = "English",

volume = "25",

pages = "33--41",

journal = "Computer Law Review International",

issn = "1610-7608",

number = "2",

}

TY - JOUR

T1 - Data Scraping for the Training of Generative AI

T2 - Lessons from Chinese Case Law and Regulation

AU - Li, Qian

AU - Kollnig, Konrad

PY - 2024/4/15

Y1 - 2024/4/15

N2 - The collection of data from websites at great scale – so-called data scraping – is the foundation for ChatGPT and most other Generative AI (GenAI) tools. Much of the previous discussion on the regulation of GenAI has focused on the US and EU and not so much on more technical aspects like data scraping. In response, this article focuses on the regulation of data scraping to build and deploy GenAI in China, and reviews applicable regulation and case law. We find that the sectoral approach to AI regulation in China provides important insights into balancing technological progress and societal values, diverging from the laissez-faire attitude in the US and the horizontal approach with the AI Act in the EU.

AB - The collection of data from websites at great scale – so-called data scraping – is the foundation for ChatGPT and most other Generative AI (GenAI) tools. Much of the previous discussion on the regulation of GenAI has focused on the US and EU and not so much on more technical aspects like data scraping. In response, this article focuses on the regulation of data scraping to build and deploy GenAI in China, and reviews applicable regulation and case law. We find that the sectoral approach to AI regulation in China provides important insights into balancing technological progress and societal values, diverging from the laissez-faire attitude in the US and the horizontal approach with the AI Act in the EU.

U2 - 10.9785/cri-2024-250201

DO - 10.9785/cri-2024-250201

M3 - Article

SN - 1610-7608

VL - 25

SP - 33

EP - 41

JO - Computer Law Review International

JF - Computer Law Review International

IS - 2

ER -