Abstract
For many standard and emerging Web 2.0 applications in criminal law, such as the development of mashups and dataspace systems, privacy-preserving data integration is of crucial importance. In many organizations, different databases hold different kinds of data about the same entity, and there may be good reasons for this. However, an integrated and unified view of an entity requires data reconciliation. In this paper, we present a data reconciliation approach based on the available schemata of the data sources and on their content. The schemata are used to determine which parts of them pertain to the same entity type, and the content of the sources is used to determine the associations between attributes stored in different sources. In establishing these relationships between attributes, we also draw on the knowledge of domain experts. On the basis of the collected information, we identify a set of attributes common to the data sources. A similarity function is associated with each attribute; it takes a record from each data source as input and computes a similarity value expressing how "similar" the two records are. Depending on this value, we decide whether or not to reconcile the two entities. We illustrate the effectiveness of our approach by means of a real-life case in the field of police and justice. Our approach can support the development of a wide variety of criminal law applications, such as data warehouses, mashups, and dataspace systems.
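The reconciliation step described above (per-attribute similarity functions over a common attribute set, followed by a reconcile/do-not-reconcile decision) can be sketched roughly as follows. This is an illustrative sketch, not the authors' method. Several assumptions are made here: the attribute names (`surname`, `last_name`, and so on) and the example records are hypothetical; the attribute correspondences, which the approach derives from schemata, source content and domain experts, are simply hard-coded; `difflib.SequenceMatcher` is used as a stand-in string similarity; and the per-attribute scores are combined with a weighted mean and a fixed threshold. The abstract does not specify any of these choices.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# A similarity function compares one attribute's values from a record of each
# source and returns a score in [0, 1].
SimilarityFn = Callable[[Any, Any], float]


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity based on difflib's matching-blocks ratio."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def exact_similarity(a: Any, b: Any) -> float:
    """1.0 if the two values are equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


@dataclass
class CommonAttribute:
    name_in_a: str           # attribute name in source A's schema (hypothetical)
    name_in_b: str           # matching attribute name in source B's schema (hypothetical)
    similarity: SimilarityFn
    weight: float = 1.0      # assumed relative importance of this attribute


def record_similarity(rec_a: Record, rec_b: Record, attrs: List[CommonAttribute]) -> float:
    """Weighted mean of per-attribute similarities (an assumed aggregation)."""
    total_weight = sum(attr.weight for attr in attrs)
    weighted = sum(
        attr.weight * attr.similarity(rec_a[attr.name_in_a], rec_b[attr.name_in_b])
        for attr in attrs
    )
    return weighted / total_weight


def should_reconcile(rec_a: Record, rec_b: Record,
                     attrs: List[CommonAttribute], threshold: float = 0.85) -> bool:
    """Reconcile the two records if their overall similarity reaches the threshold."""
    return record_similarity(rec_a, rec_b, attrs) >= threshold


# Hypothetical common attributes linking two sources with different schemata.
attrs = [
    CommonAttribute("surname", "last_name", string_similarity, weight=2.0),
    CommonAttribute("date_of_birth", "birth_date", exact_similarity, weight=3.0),
]
police_rec = {"surname": "Jansen", "date_of_birth": date(1980, 5, 1)}
court_rec = {"last_name": "janssen", "birth_date": date(1980, 5, 1)}
other_rec = {"last_name": "janssen", "birth_date": date(1975, 2, 9)}

print(should_reconcile(police_rec, court_rec, attrs))  # True
print(should_reconcile(police_rec, other_rec, attrs))  # False
```

Keeping each similarity function attached to its attribute mirrors the abstract's description, since an attribute such as a date of birth may call for exact matching while a name tolerates spelling variants. In practice the weights and threshold would have to be tuned on real data.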
Original language | English |
---|---|
Pages (from-to) | 125-138 |
Journal | Information Polity |
Volume | 15 |
Issue number | 1-2 |
Publication status | Published - 1 Jan 2010 |