Several AI companies have already faced numerous lawsuits over their use of copyrighted materials to train their AI models. Under the newly proposed bill, AI companies would have to be more transparent about their sources.
AI Foundation Model Transparency Act
This could be the beginning of a new practice for AI companies, one that could even benefit them by reducing the number of lawsuits thrown their way. If the bill is passed, AI developers would have to reveal their sources so copyright owners would know whether their works were used.
The bill was filed by two lawmakers, D-CA Representative Anna Eshoo and D-VA Representative Don Beyer. It would direct the Federal Trade Commission (FTC) to work with the National Institute of Standards and Technology (NIST).
Beyond requiring full disclosure, the agencies would create rules to help creators guard against infringement by AI developers. Once the bill is passed, AI companies would have to go through several processes before their training data can be used.
Besides reporting where the training data was scraped from, AI companies would also have to disclose how the data is retained while the model is in use, as well as describe the limitations and risks of the model, as reported by The Verge.
The model would also have to abide by the rules of NIST's planned AI Risk Management Framework and any other federal rules that may follow. While this could slow down development, it could significantly help creators whose work was used without consent.
The bill even plans to address the kind of data generated by AI models. For instance, developers would have to report their efforts to "red team" the model, which aims to prevent it from generating inaccurate or harmful information on topics such as health-related questions.
The bill states that as public access to AI has grown, so have lawsuits and public concerns about copyright infringement. As its text puts it, "Public use of foundation models has led to countless instances of the public being presented with inaccurate, imprecise, or biased information."
The End of Unconsented Use of Scraped Data
No AI company has ever admitted to using publicly available copyrighted or private data without its owners' consent, but there have been instances where this appears to be the case. OpenAI's ChatGPT, for instance, could be made to reveal private information using certain prompts.
The company has since fixed the issue, but a group of researchers had managed to find the right prompts to get ChatGPT to generate private information such as names, email addresses, company phone numbers, fax numbers, and even Bitcoin addresses, as per Engadget.
Not only does this give threat actors a way to dig up private information, but it also indicates that the AI model was trained on private data, as a model can only provide what its developers feed it.
What's worse, OpenAI is not the only company accused of this. Other AI companies have allegedly been using copyrighted images and artwork for their AI image generators as well. Several copyright lawsuits have already been filed over this issue alone.