TY - JOUR
T1 - Data Scraping for the Training of Generative AI
T2 - Lessons from Chinese Case Law and Regulation
AU - Li, Qian
AU - Kollnig, Konrad
PY - 2024/4/15
Y1 - 2024/4/15
N2 - The collection of data from websites at great scale–so-called data scraping–is the foundation for ChatGPT and most other Generative AI (GenAI) tools. Much of the previous discussion on the regulation of GenAI has focused on the US and EU and not so much on more technical aspects like data scraping. In response, this article focuses on the regulation of data scraping to build and deploy GenAI in China, and reviews applicable regulation and case law. We find that the sectoral approach to AI regulation in China provides important insights into balancing technological progress and societal values, diverging from the laissez-faire attitude in the US and the horizontal approach with the AI Act in the EU.
AB - The collection of data from websites at great scale–so-called data scraping–is the foundation for ChatGPT and most other Generative AI (GenAI) tools. Much of the previous discussion on the regulation of GenAI has focused on the US and EU and not so much on more technical aspects like data scraping. In response, this article focuses on the regulation of data scraping to build and deploy GenAI in China, and reviews applicable regulation and case law. We find that the sectoral approach to AI regulation in China provides important insights into balancing technological progress and societal values, diverging from the laissez-faire attitude in the US and the horizontal approach with the AI Act in the EU.
U2 - 10.9785/cri-2024-250201
DO - 10.9785/cri-2024-250201
M3 - Article
SN - 1610-7608
VL - 25
SP - 33
EP - 41
JO - Computer Law Review International
JF - Computer Law Review International
IS - 2
ER -