Poster
Compliance Rating Scheme: Data Provenance for Dataset Use in Generative AI Applications
Generative Artificial Intelligence (GAI) has grown rapidly in recent years, facilitated in part by the abundance of large-scale open-source datasets. These datasets are often built using unrestricted and opaque data collection practices. While most of the literature focuses on the development and applications of GAI models, the ethical and legal considerations surrounding the creation of these datasets are often neglected. In particular, information about their origin, legitimacy, and safety is frequently lost. To address this, we conceptualize the Compliance Rating Scheme (CRS), a tool for evaluating a given dataset's compliance with a set of practical principles, enabling developers and regulators to gauge and verify the transparency, accountability, and security of these resources. We open-source a Python library built around these principles so that the tool can be integrated into existing pipelines.
DOI: 10.23106/dcmi.952486058