Compliance Rating Scheme: Data Provenance for Dataset Use in Generative AI Applications

Matyas Bohacek; Ignacio Vilanova Echavarri

doi:10.23106/dcmi.952486058

Poster

Abstract

Generative Artificial Intelligence (GAI) has experienced exponential growth in recent years, partly facilitated by the abundance of open-source large-scale datasets. These datasets are often built using unrestricted and opaque data collection practices. While most literature focuses on the development and applications of GAI models, the ethical and legal considerations surrounding the creation of these datasets are often neglected. Specifically, the information about their origin, legitimacy, and safety often gets lost. To address this, we conceptualize the Compliance Rating Scheme (CRS) as a tool to evaluate a given dataset’s compliance with a set of practical principles, enabling developers and regulators to gauge and verify the transparency, accountability, and security of these resources. We open-source a Python library built around these principles, allowing the integration of this tool into existing pipelines.

Author information

Matyas Bohacek

Stanford University, United States

Ignacio Vilanova Echavarri

Imperial College London, United Kingdom

Cite this article

Bohacek, M., & Vilanova Echavarri, I. (2024). Compliance Rating Scheme: Data Provenance for Dataset Use in Generative AI Applications. International Conference on Dublin Core and Metadata Applications, 2024. https://doi.org/10.23106/dcmi.952486058

Published: 2024-12-20

The metadata and citations of this article are published under the Creative Commons Zero Universal Public Domain Dedication (CC0), allowing unrestricted reuse. Anyone can freely use the metadata from DCPapers articles for any purpose without limitations.

The full text of this article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license allows use, sharing, adaptation, distribution, and reproduction in any medium or format, provided that appropriate credit is given to the original author(s) and the source is cited.