Brookings - The Case for Consent in the AI Data Gold Rush
CJL director Courtney Radsch argues that explicit user consent is essential in AI data collection to protect individual privacy and autonomy.
An OpenAI whistleblower who quit in protest over the company’s use of copyright-protected data was found dead in his apartment at the end of last year, less than two months after posting an essay laying out the technical and legal reasons why he believed the company was breaking the law. After four years at the company, Suchir Balaji quit, convinced not only that what OpenAI was doing was illegal, but also that the harms—to society, the open web, and human creators—outweighed the benefits. He subsequently became a key witness in the New York Times lawsuit against OpenAI and Microsoft.
The certainty of this young researcher, who worked on the underlying model for ChatGPT, stands in stark contrast to the uncertainty of the U.S. and British governments, which continue to vacillate on how to treat copyright when it comes to text and data mining for AI training.
Last month, the U.K. launched a new consultation on AI and copyright, following failed efforts earlier in the year to develop a voluntary code. And the U.S. House of Representatives released a bipartisan report on AI that essentially punted on the question and deferred to the courts, a process that will take years to resolve.
As the arms race to develop bigger, better, and more capable AI models has accelerated since the release of ChatGPT in late 2022, so too has the tension between AI companies and the publishers, content creators, and website owners whose data they depend on to win the race. OpenAI and the New York Times went in front of a judge earlier this week to argue over whether the publisher’s copyright infringement case should be dismissed under the fair use doctrine.
Read full article here.