Reddit Files Federal Lawsuit Alleging Perplexity AI Built $20B Business on Stolen Data

Legal Battle Over AI Training Data

Reddit has initiated federal legal proceedings against artificial intelligence company Perplexity and multiple data scraping firms, accusing them of systematically stealing proprietary content to build AI products, according to court documents filed in Manhattan. The lawsuit alleges these companies illegally circumvented digital security measures to access Reddit’s data for training AI models without compensation or permission.

Allegations of Systematic Data Theft

The legal complaint states that Perplexity’s AI tools utilized Reddit comments to generate user responses even after the company had agreed to respect the social media platform’s data protection protocols. Sources indicate that Reddit sent a formal cease-and-desist letter to Perplexity in May 2024, demanding the AI company stop scraping Reddit data unless it established a formal partnership similar to agreements Reddit has with Google and OpenAI.

According to the lawsuit, Perplexity initially claimed it “was not using Reddit content to train any AI models and that it would respect Reddit’s robots.txt” protocols. However, the legal filing alleges that Perplexity’s citations to Reddit content increased “forty-fold after Reddit told it to stop,” suggesting the company developed alternative methods to access the restricted data.

Circumvention Techniques Alleged

The legal documents describe what Reddit characterizes as “increasingly devious schemes to circumvent Reddit’s security systems and policies.” The complaint suggests Perplexity may have employed third-party data scrapers to access Reddit content indirectly through Google’s search results, effectively bypassing Reddit’s direct security measures.

“In other words, Perplexity’s business model is effectively to take Reddit’s content from Google search results, feed them into a third party’s LLM, and call it a new product,” the lawsuit states. The legal filing further notes that while this approach “has somehow translated into a $20 billion valuation, it has not resulted in a willingness to pay for what others (including Google) have.”

Additional Defendants Named

The lawsuit also targets data scraping firms Oxylabs UAB, AWMProxy, and SerpApi, which the complaint identifies as companies that systematically harvest internet data for resale to artificial intelligence companies. Reddit’s legal team alleges these firms functioned as data intermediaries, with Perplexity potentially utilizing at least one of their services.

Reddit’s chief legal officer Ben Lee characterized these companies as “textbook examples” of illegal scrapers in a statement to Business Insider. “Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material,” Lee stated. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

Corporate Responses and Defense

Perplexity spokesperson Jesse Dwyer defended the company’s practices, stating they “will always fight vigorously for users’ rights to freely and fairly access public knowledge.” In his response to Business Insider, Dwyer added that “our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.”

The lawsuit draws a dramatic comparison between the data scraping operations and criminal activity, alleging: “In a very real sense, these Defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead.”

Significant Investment in Protection

According to reports, Reddit has invested tens of millions of dollars in developing anti-scraping systems that the lawsuit claims these companies successfully circumvented. A Reddit spokesperson confirmed the substantial financial commitment to Business Insider, highlighting the platform’s ongoing battle against unauthorized data harvesting.

Representatives for SerpApi and Oxylabs did not immediately respond to requests for comment from Business Insider. AWMProxy, identified in the lawsuit as a former Russian botnet operation, could not be reached for comment at the time of reporting.

This legal action emerges amid growing tensions between content platforms and AI companies regarding data sourcing practices, with multiple high-profile cases testing the boundaries of fair use and intellectual property rights in the age of artificial intelligence development.
