Reddit is limiting its availability to the Internet Archive's Wayback Machine

The Internet Archive’s Wayback Machine is the latest casualty of Reddit’s crackdown on data access. The company has begun placing new restrictions on what the archive site can access, a move that could significantly limit the Wayback Machine’s ability to preserve information from Reddit.

With the change, the Wayback Machine, a project run by the nonprofit Internet Archive, will only be able to crawl Reddit’s homepage. It will no longer be able to access comments, subreddit pages, post details, profiles and other data.

The move is the latest step Reddit has taken in its quest to limit AI companies’ ability to use its data to train large language models without paying licensing fees. It is also a notably different stance from the one the company took last year, when it explicitly said that it would not restrict “good faith actors,” including the Internet Archive. It isn’t clear what exactly has changed since then. Reddit appears to believe that AI companies are circumventing its rules by scraping data via the Wayback Machine. We have reached out to the Internet Archive for comment.

Data licensing has become a big business for Reddit. The company has struck multimillion-dollar deals with OpenAI and Google that allow them to use Reddit posts to help train their AI models. At the same time, Reddit has taken an increasingly hardline stance against companies that attempt to use its data without such arrangements. Earlier this year, the company sued Anthropic, alleging it scraped Reddit for years without permission.
