Anthropic uses a web crawler named ClaudeBot to gather information for training its AI models. This crawler recently hit the website of iFixit, a company that provides repair guides and resources, nearly a million times in a single day.
iFixit Says It’s Not Okay
iFixit believes this scraping violates its terms of service, which state that no one can copy or use its content for training AI models without permission. The company’s CEO, Kyle Wiens, pointed out that scraping at this volume consumes iFixit’s server resources. He offered to discuss licensing the content if Anthropic wants to continue using it.
Anthropic Defends Its Actions
When questioned, Anthropic said website owners can block its crawler through their site’s robots.txt file. However, this method doesn’t let website owners choose what kind of scraping is allowed.
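To make the blocking mechanism concrete, here is a minimal sketch in Python, assuming the tool in question is a standard robots.txt file and that the crawler honors it. The user-agent token "ClaudeBot" comes from the crawler's name; the rules, the example.com URL, and the "SomeOtherBot" agent are hypothetical and are not iFixit's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block ClaudeBot everywhere, allow everyone else.
# These rules are illustrative only, not any real site's file.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler runs a check like this before requesting a page.
url = "https://example.com/Guide/1"  # hypothetical page
claudebot_allowed = parser.can_fetch("ClaudeBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(claudebot_allowed, other_allowed)  # prints: False True
```

The rules are keyed only on the crawler's name and the URL path. Nothing in them expresses the purpose of a visit, so an owner cannot, for example, permit search indexing while refusing AI training by the same crawler. Compliance is also voluntary: the file only works for crawlers that choose to check it.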
Is This a Common Practice?
Apparently, iFixit isn’t alone. Other websites have also reported aggressive scraping by ClaudeBot. This isn’t a new issue either, with reports of similar activity dating back months.
The Debate Continues
There’s no clear answer on how website owners can control who scrapes their data for AI training. Some companies, like OpenAI, take an approach similar to Anthropic’s, while others ignore website restrictions entirely.
This situation highlights the ongoing debate about how AI companies should gather training data and how website owners can protect their content.