Reddit Strengthens Policy Against AI Bots, Data Scraping

Reddit announced it will start blocking most automated bots from accessing the platform's public data, preventing others from using posts for AI training.

The company will update its robots.txt file to signal that unauthorized companies may not access Reddit's data.

Reddit to Ban Automated Bots With Updated Policy

The online forum company stated that it will update its robots.txt file, which implements the Robots Exclusion Protocol, to strengthen its defenses against data scraping. The updated policy is expected to roll out in the coming weeks.

Reddit also shared that it will continue to rate-limit and block unknown bots and crawlers from accessing the website. Even so, the company assured users that the changes would not affect the platform's performance for the majority of them.

"It's also a signal to bad actors that the word 'allow' in robots.txt doesn't mean, and has never meant, that they can use the data however they want," said Ben Lee, chief legal officer at Reddit.

How Reddit's Robots Exclusion Protocol Works

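The robots.txt file tells automated crawlers, such as those run by search engines and third-party sites, which parts of a site they may access. Search engines rely on it when indexing pages so that they can direct people to relevant content. Compliance is voluntary, so a site that wants to keep non-compliant bots out also needs enforcement measures such as rate limiting and blocking.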
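To illustrate how a well-behaved crawler consults these rules, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, bot names, and URL below are hypothetical and chosen only for illustration; they are not Reddit's actual robots.txt. The sketch assumes a policy that allows one named crawler and disallows everyone else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Reddit's actual robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = EXAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/r/some-forum/"  # hypothetical URL
    for bot in ("ExampleSearchBot", "UnknownScraperBot"):
        verdict = "allowed" if can_fetch(bot, url) else "disallowed"
        print(f"{bot}: {verdict}")
```

A check like this only reports what the file requests. Nothing in the protocol stops a crawler from ignoring the answer, which is why the file is usually paired with server-side blocking.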

However, the surge of AI models has increased the incidence of unauthorized data scraping. Several websites have complained that their data was scraped without their consent and without being credited as the source.

In a blog post, Reddit shared that it will remain open to "good faith actors," citing researchers and organizations like the Internet Archive as examples. Those who use the data for non-commercial purposes should have no problem accessing the site.

Once the robots.txt file has been updated, search engines will have to make a licensing deal to use Reddit as a source of data for training AI models and for other commercial purposes. Several AI companies, including Google and OpenAI, are currently striking deals with websites and publications.

© 2024 iTech Post All rights reserved. Do not reproduce without permission.
