The way artificial intelligence developers source training data is changing. Until recently, major generative AI tools were trained primarily on data scraped from the internet. As access to those sources becomes more restricted and pressure for licensing agreements grows, the industry is shifting toward more ethical data practices.
As the hunt for additional data sources intensifies, licensing startups have emerged to keep source material flowing for AI training. The Dataset Providers Alliance, a trade group of seven AI licensing companies including Rightsify, Pixta, and Calliope Networks, aims to standardize AI data licensing and make the industry fairer. The alliance advocates an opt-in system, under which creators and rights holders must give explicit consent before their data is used.
The Dataset Providers Alliance regards opt-in systems as the more ethical approach. According to Alex Bestall, CEO of Rightsify and the Global Copyright Exchange, selling publicly available datasets without consent could lead to legal issues and damage a company’s credibility. Ed Newton-Rex, of the ethical AI nonprofit Fairly Trained, also supports opt-ins, stating that opt-out systems can be fundamentally unfair to creators, some of whom may not even know the option exists.
The push for opt-in systems is commendable, but such a standard comes with challenges. Shayne Longpre, of the Data Provenance Initiative, acknowledges how difficult it is to source data ethically, especially given the vast amounts that modern AI models require. He is concerned that only large tech companies may be able to afford the licensing fees for such extensive datasets, which could lead to a data monopoly in the industry.
In its position paper, the Dataset Providers Alliance opposes government-mandated licensing and instead advocates a free-market approach in which data originators and AI companies negotiate directly. The alliance also proposes several compensation structures intended to pay creators and rights holders fairly for their data: subscription-based models, usage-based licensing, and outcome-based licensing tied to profit. These structures are designed to work across industries, from music to film to books.
Overall, the shift toward ethical data sourcing and the emphasis on consent through opt-in systems are important steps toward a more standardized and fair AI industry. Challenges remain, such as the cost of licensing extensive datasets, but organizations like the Dataset Providers Alliance are paving the way for a more transparent and ethical approach to AI data licensing.