Reddit announced it will start blocking most automated bots from accessing the platform's public data, preventing them from using its posts for AI training.
The company will update its robots.txt file to signal that unauthorized companies are not permitted to scrape Reddit's data.
Reddit to Ban Automated Bots With Updated Policy
The online forum company stated that its Robots Exclusion Protocol will be updated to tighten protections against data scraping. The revised policy is expected to roll out in the coming weeks.
Reddit also said it will continue to rate-limit and block unknown bots and crawlers from accessing the website. Still, the company assured users that the changes would not affect the platform's performance for the vast majority of them.
"It's also a signal to bad actors that the word 'allow' in robots.txt doesn't mean, and has never meant, that they can use the data however they want," said Ben Lee, chief legal officer at Reddit.
How Reddit's Robots Exclusion Protocol Works
The robots.txt file tells search engines and other automated crawlers which parts of a website they may visit, listing "Allow" and "Disallow" rules for each user agent. Sites commonly use it to let search engines and third-party services index their pages, which helps direct people to content and makes data easy to gather.
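In practice, a well-behaved crawler checks those rules before requesting a page. The short Python sketch below uses the standard library's urllib.robotparser module to read Reddit's live robots.txt file and ask whether a given crawler may fetch a given URL; the crawler name "ExampleBot" and the subreddit address are illustrative placeholders, not crawlers or endpoints that Reddit has named.

    from urllib.robotparser import RobotFileParser

    # Hypothetical crawler name, used only for illustration.
    USER_AGENT = "ExampleBot"

    parser = RobotFileParser()
    parser.set_url("https://www.reddit.com/robots.txt")
    parser.read()  # download and parse the live robots.txt rules

    url = "https://www.reddit.com/r/technology/"
    if parser.can_fetch(USER_AGENT, url):
        print("robots.txt allows this crawler to fetch the page")
    else:
        print("robots.txt asks this crawler not to fetch the page")

Bots that skip this check and scrape anyway are the kind of actors the updated protocol, together with rate-limiting and blocking, is meant to deter.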
However, the rise of AI models has driven a surge in unauthorized data scraping. Several websites have complained about having their data scraped without credit or consent.
In a blog post, Reddit said it will remain open to "good faith actors," citing researchers and organizations like the Internet Archive. Those who use the data for non-commercial purposes will still be able to access the site.
Once the protocol is updated, companies will have to strike a licensing deal to use Reddit as a source of data for training AI models or for other commercial purposes. Several AI companies, including Google and OpenAI, are already making such deals with websites and publications.