To Torrent or Not to Torrent: That Is the Question

The controversy around A.I. continues with the publication of recently unsealed emails from Meta in a copyright case alleging that the company illegally trained its A.I. models on pirated books. The emails show that Meta torrented “at least 81.7 terabytes of data across different shadow libraries through the site Anna’s Archives.”
The complaint in that case, Kadrey v. Meta Platforms, Inc., is a reminder that companies need vast amounts of data to train their A.I. models, but gathering that data can carry massive copyright liability. “Vastly smaller acts of data piracy—Just .008 percent of the amount of copyrighted works Meta pirated–have resulted in Judges referring the conduct to the US Attorney’s office for criminal investigation.”
As a Meta employee stated, “[t]orrenting from a corporate laptop doesn’t feel right.” Yet the word “torrent” may seem alien to those unfamiliar with downloading terabytes of data or without a background in computer science.
As explained by TechRadar, “[T]orrenting uses a peer-to-peer model for file transfer. Instead of downloading a file from a single source in one continuous stream, torrenting breaks the file into smaller pieces that can be rapidly distributed between peers in the ‘swarm.’ Each peer in a torrent swarm is responsible for uploading and downloading parts of the file simultaneously. Instead of file transfer being limited to the upload speed of the server as in the client-server model, each peer in the swarm can use their full bandwidth to distribute the file.” Further, without the use of a virtual private network (VPN), the torrenter’s IP address is visible to others. Some of Meta’s emails suggest that its employees knew this and made an effort to hide the IP address of Meta’s network.
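For readers who want to see the idea in concrete terms, the piece-by-piece model described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the actual BitTorrent protocol and not a description of what Meta did: the piece size, the two peer names, and the helper functions are all hypothetical, and no network traffic is involved. It simply shows a file being broken into pieces, each piece being fingerprinted with a hash, and a downloader rebuilding the file from peers that each hold only part of it.

```python
import hashlib

# Deliberately tiny piece size for illustration; real torrents use far larger pieces.
PIECE_SIZE = 4  # bytes


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Break a file's bytes into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list) -> list:
    """Hash every piece so a downloader can verify what it receives."""
    return [hashlib.sha1(p).hexdigest() for p in pieces]


def download_from_swarm(hashes: list, peers: dict):
    """Assemble the file by taking each piece from any peer holding a valid copy."""
    assembled = []
    sources = []
    for index, expected in enumerate(hashes):
        for name, held in peers.items():
            piece = held.get(index)
            if piece is not None and hashlib.sha1(piece).hexdigest() == expected:
                assembled.append(piece)
                sources.append(name)
                break
        else:
            # No peer offered a piece matching the expected hash.
            raise LookupError(f"no peer has a valid copy of piece {index}")
    return b"".join(assembled), sources


original = b"a book split into many small pieces"
pieces = split_into_pieces(original)
hashes = piece_hashes(pieces)

# Hypothetical swarm: each peer holds only some of the pieces.
peers = {
    "peer_a": {i: p for i, p in enumerate(pieces) if i % 2 == 0},
    "peer_b": {i: p for i, p in enumerate(pieces) if i % 2 == 1},
}

rebuilt, sources = download_from_swarm(hashes, peers)
print("file rebuilt correctly:", rebuilt == original)
print("peers that supplied pieces:", sorted(set(sources)))
```

Neither peer in this sketch holds the whole file, yet the downloader still ends up with a complete copy. In a real swarm the downloader would also be handing pieces to other peers while it downloads, which is part of why participants’ IP addresses are exposed to one another.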
The unsealed emails show, the plaintiffs allege, that Meta employees were aware of the illegal, copyright-infringing behavior they were engaging in and continued anyway. Meta maintains that its use of the data constitutes “fair use” under copyright law.
What this complaint, and the litigation more generally, highlights is a “do now, ask for forgiveness later” mentality among companies training large language models. These models, put very simply, need access to vast amounts of data in order to write comprehensive responses to inquiries. The more data the models can pull from, the reasoning goes, the more sophisticated their answers can be.
The new frontier of A.I. and the possibility of substantial financial returns for the firms that develop the best models both encourage a “take now and pay settlements for copyright violations later” philosophy. This is to say nothing of Meta using its own customers’ data “to train models by default, without explicit consent.”
As the Meta case unfolds, there are questions as to the precedent a case like this could set for future A.I. data use and training. Some commentators have argued that using large data sets in this way may fall under the “fair use” doctrine of copyright law. Some courts have held that this could be true. A Delaware court, for example, held before altering its opinion that, “[An A.I.] developer used copyrighted works only ‘as a step in the process of trying to develop a “wholly new,” albeit competing, product … that’s … transformative intermediate copying, [or fair use].’”
Regardless of how much weight the fair use doctrine will carry in the training of A.I. models, the facts alleged against Meta likely do not point toward a win for the company. Indeed, the case may end in a settlement, as have other cases dealing with similar concerns. A.I. proponents, it’s fair to say, fear a copyright infringement case being appealed to higher courts: if they lose, the entire A.I. development complex would suddenly stare down the barrel of huge payments for its potentially illegal data piracy and torrenting.
Even if Meta or others like Google settle their cases out of court, the Meta controversy shows that, sooner or later, a copyright owner will likely challenge these companies to the bitter end. Commentators and counsel for firms like Meta can only speculate as to who will win and who will pay.
Brett R. Goble
Brett Rodney Goble is a 2L at the University of North Carolina School of Law. He graduated from Centre College with a Bachelor of Arts in Political Science in 2022. He enjoys chess, cooking, and distance running.