Investigating antiquities trafficking with generative pre-trained transformer (GPT)-3 enabled knowledge graphs: A case study

Shawn Graham; Donna Yates; Ahmed El-Roby

doi:10.12688/openreseurope.16003.1

Corresponding author: Donna Yates

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Background: A wide variety of sources could offer insight into the antiquities trade, from newspaper articles and auction catalogues to court dockets and personal archives, if they could all be examined systematically. We explore the use of a large language model, GPT-3, to semi-automate the creation of a knowledge graph from a body of scholarship concerning the antiquities trade.

Methods: We give GPT-3 a prompt guiding it to identify knowledge statements about the trade. Drawing on GPT-3’s model of the statistical properties of language, the prompt teaches GPT-3 to append to each article we feed it a summary of the knowledge in that article. The summary takes the form of a list of subject, predicate, and object relationships, which together represent a knowledge graph. Previously we created such lists by manually annotating the source articles. We compare the result of this automatic process with a knowledge graph created by hand from the same sources. When such knowledge graphs are projected into a multi-dimensional embedding model using a neural network (via the open-source AmpliGraph Python library), the relative positioning of entities implies the probability of a connection, and the direction of the positioning implies the kind of connection. We can therefore interrogate the embedding model to discover new probable relationships. The results can generate new insight about the antiquities trade and suggest possible avenues of research.

Results: Our semi-automatic approach to generating the knowledge graph produces results comparable to our hand-made version, with an enormous saving of time and the possibility of expanding the amount of material we can consider.

Conclusions: These results have implications for working computationally with other kinds of archaeological knowledge in grey literature, reports, articles, and other venues.
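The workflow described under Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline and calls neither GPT-3 nor AmpliGraph. The pipe-delimited line format, the entity and relation names, and the two-dimensional vectors are all placeholders invented for illustration. The scoring function is a TransE-style distance, chosen only because it matches the abstract's description of position and direction in the embedding space; the model actually used in the paper may differ.

```python
import math


def parse_triples(text):
    """Parse lines shaped like 'subject | predicate | object' into a set of triples.

    The pipe-delimited format is an assumption made for this sketch.
    Lines that do not have exactly three non-empty parts are skipped.
    """
    triples = set()
    for line in text.splitlines():
        parts = [part.strip() for part in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.add(tuple(parts))
    return triples


def jaccard(a, b):
    """Share of triples common to both graphs (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def transe_score(entities, relations, subj, pred, obj):
    """TransE-style plausibility: negative distance between subj + pred and obj.

    Scores closer to zero mean the relation vector carries the subject
    nearer to the object, i.e. the triple is more plausible.
    """
    diff = [s + r - o for s, r, o in
            zip(entities[subj], relations[pred], entities[obj])]
    return -math.sqrt(sum(d * d for d in diff))


# Placeholder triples standing in for a hand-annotated graph and a generated one.
hand_made = parse_triples(
    "Dealer A | sold_to | Museum B\n"
    "Dealer A | acquired | Object C"
)
generated = parse_triples(
    "Dealer A | sold_to | Museum B\n"
    "Object C | held_by | Museum B\n"
    "a malformed line that is ignored"
)

overlap = jaccard(hand_made, generated)
generated_only = generated - hand_made  # statements to review by hand

# Hand-set toy vectors; a real model would learn these from the graph.
entities = {"Dealer A": [0.0, 0.0], "Museum B": [1.0, 0.0], "Object C": [0.0, 1.0]}
relations = {"sold_to": [1.0, 0.0], "acquired": [0.0, 1.0]}

likely = transe_score(entities, relations, "Dealer A", "sold_to", "Museum B")
unlikely = transe_score(entities, relations, "Dealer A", "sold_to", "Object C")

print(f"overlap between graphs: {overlap:.2f}")
print(f"likely triple scores higher than unlikely one: {likely > unlikely}")
```

Ranking every unseen (subject, predicate, object) combination by such a score is one way to surface candidate relationships for a researcher to investigate; a set-overlap measure such as Jaccard is only one of several possible ways to compare a generated graph with a hand-made one.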

Original language	English
Article number	100
Journal	Open Research Europe
Volume	3
DOIs	https://doi.org/10.12688/openreseurope.16003.1
Publication status	Published - 1 Jan 2023

Keywords

antiquities trade
antiquities trafficking
art market
gpt3
illicit antiquities
knowledge graph
knowledge graph embedding model
Large language models

Access to Document

10.12688/openreseurope.16003.1
Licence: CC BY

Cite this

@article{68d1e9b41e334825a76ca7cbd2703271,

title = "Investigating antiquities trafficking with generative pre-trained transformer (GPT)-3 enabled knowledge graphs: A case study",

abstract = "Background: A wide variety of sources could offer insight into the antiquities trade, from newspaper articles and auction catalogues to court dockets and personal archives, if they could all be examined systematically. We explore the use of a large language model, GPT-3, to semi-automate the creation of a knowledge graph from a body of scholarship concerning the antiquities trade. Methods: We give GPT-3 a prompt guiding it to identify knowledge statements about the trade. Drawing on GPT-3{\textquoteright}s model of the statistical properties of language, the prompt teaches GPT-3 to append to each article we feed it a summary of the knowledge in that article. The summary takes the form of a list of subject, predicate, and object relationships, which together represent a knowledge graph. Previously we created such lists by manually annotating the source articles. We compare the result of this automatic process with a knowledge graph created by hand from the same sources. When such knowledge graphs are projected into a multi-dimensional embedding model using a neural network (via the open-source AmpliGraph Python library), the relative positioning of entities implies the probability of a connection, and the direction of the positioning implies the kind of connection. We can therefore interrogate the embedding model to discover new probable relationships. The results can generate new insight about the antiquities trade and suggest possible avenues of research. Results: Our semi-automatic approach to generating the knowledge graph produces results comparable to our hand-made version, with an enormous saving of time and the possibility of expanding the amount of material we can consider. Conclusions: These results have implications for working computationally with other kinds of archaeological knowledge in grey literature, reports, articles, and other venues.",

keywords = "antiquities trade, antiquities trafficking, art market, gpt3, illicit antiquities, knowledge graph, knowledge graph embedding model, Large language models",

author = "Shawn Graham and Donna Yates and Ahmed El-Roby",

note = "Funding Information: This research was financially supported by the European Union{\textquoteright}s Horizon 2020 research and innovation programme under the grant agreement No 804851 (Trafficking transformations: objects as agents in transnational criminal networks [TRANSFORM]) to DY; and the Social Sciences and Humanities Research Council of Canada. Publisher Copyright: Copyright: {\textcopyright} 2023 Graham S et al.",

year = "2023",

month = jan,

day = "1",

doi = "10.12688/openreseurope.16003.1",

language = "English",

volume = "3",

journal = "Open Research Europe",

issn = "2732-5121",

publisher = "F1000 Research Ltd.",

}

TY - JOUR

T1 - Investigating antiquities trafficking with generative pre-trained transformer (GPT)-3 enabled knowledge graphs

T2 - A case study

AU - Graham, Shawn

AU - Yates, Donna

AU - El-Roby, Ahmed

N1 - Funding Information: This research was financially supported by the European Union’s Horizon 2020 research and innovation programme under the grant agreement No 804851 (Trafficking transformations: objects as agents in transnational criminal networks [TRANSFORM]) to DY; and the Social Sciences and Humanities Research Council of Canada. Publisher Copyright: Copyright: © 2023 Graham S et al.

PY - 2023/1/1

Y1 - 2023/1/1

AB - Background: A wide variety of sources could offer insight into the antiquities trade, from newspaper articles and auction catalogues to court dockets and personal archives, if they could all be examined systematically. We explore the use of a large language model, GPT-3, to semi-automate the creation of a knowledge graph from a body of scholarship concerning the antiquities trade. Methods: We give GPT-3 a prompt guiding it to identify knowledge statements about the trade. Drawing on GPT-3’s model of the statistical properties of language, the prompt teaches GPT-3 to append to each article we feed it a summary of the knowledge in that article. The summary takes the form of a list of subject, predicate, and object relationships, which together represent a knowledge graph. Previously we created such lists by manually annotating the source articles. We compare the result of this automatic process with a knowledge graph created by hand from the same sources. When such knowledge graphs are projected into a multi-dimensional embedding model using a neural network (via the open-source AmpliGraph Python library), the relative positioning of entities implies the probability of a connection, and the direction of the positioning implies the kind of connection. We can therefore interrogate the embedding model to discover new probable relationships. The results can generate new insight about the antiquities trade and suggest possible avenues of research. Results: Our semi-automatic approach to generating the knowledge graph produces results comparable to our hand-made version, with an enormous saving of time and the possibility of expanding the amount of material we can consider. Conclusions: These results have implications for working computationally with other kinds of archaeological knowledge in grey literature, reports, articles, and other venues.

KW - antiquities trade

KW - antiquities trafficking

KW - art market

KW - gpt3

KW - illicit antiquities

KW - knowledge graph

KW - knowledge graph embedding model

KW - Large language models

U2 - 10.12688/openreseurope.16003.1

DO - 10.12688/openreseurope.16003.1

M3 - Article

SN - 2732-5121

VL - 3

JO - Open Research Europe

JF - Open Research Europe

M1 - 100

ER -