NVIDIA say they are not violating copyright in ‘scraping’ 80 years’ worth of videos daily to train AI models

NVIDIA say they are harvesting information from the internet, not violating copyright.

According to a report by Tom’s Guide, much of the content that NVIDIA scrapes from the Internet (and it is allegedly a huge amount) has been taken from YouTube and Netflix.

You may have heard of AI companies “scraping” the Internet, that is, harvesting information from websites with automated bots, in order to build the datasets their chatbots draw on when answering users’ queries.

The problem is that they are allegedly taking information from websites and, in doing so, allegedly breaching copyright on many occasions. We don’t know how often they breach copyright, but we do know for certain that they are mining the Internet for all the information it contains so that people can bypass search engines and the websites they list when looking for answers. Alternatively, the search engines themselves use AI to generate results, which once again bypasses the websites that are the source of the information. It is highly unfair and unethical, but the companies don’t mind because they can get away with it.

And the more we learn about how AI is built, the more reports emerge of companies using copyrighted content to train AI without permission.

And here’s the interesting bit. A representative of Tom’s Guide contacted NVIDIA about these alleged breaches of copyright and data theft, and the company’s response indicates an arrogance born of dominance. They said that they “respect the rights of all content creators” and that their research efforts are “in full compliance with the letter and the spirit of copyright law.”

And they added that “Copyright law protects particular expressions but not facts, ideas, data, or information. Anyone is free to learn facts, ideas, data, or information from another source and use it to make their own expressions.”

In other words, they say they are taking information, facts, ideas and data from the Internet, which are freely available. They’re doing nothing unusual: people do this all the time when they write books or articles. They read other books, articles or studies and then write up what they have learned in their own words. So NVIDIA claim that they are not in breach of copyright because information per se is not protected by copyright.

This raises the question of what copyright actually protects and how it overlaps with facts and information. Facts, ideas and information are generally not protected by copyright, but in a video or an article they are bound up with the creator’s copyrighted expression, so you arguably can’t extract the facts without copying the protected expression too. NVIDIA’s argument therefore doesn’t necessarily absolve them of copyright infringement. That is the counter-argument.

And scraping large volumes of copyrighted videos from the internet without permission could still be considered copyright infringement even if the AI model does not directly reproduce the creative expression. It is arguable that wholesale copying of copyrighted works to train AI violates copyright holders’ exclusive rights, in particular the right to reproduce their work.

Further, the “fair use” doctrine provides limited exceptions to copyright infringement for purposes like research, education and transformative use. The large-scale commercial use of copyrighted works to train AI models may not necessarily qualify as fair use.

That said, copyright law is still evolving in the digital age, and AI training through data extraction remains an area of active legal debate and uncertainty. Courts have not yet established clear precedent on whether these AI training practices are permissible.

And here, perhaps, is the ultimate reason why big companies like NVIDIA do what they do. It is also why Google and Pinterest (in respect of photographs and other images) get away with scraping the Internet.

It is very hard for copyright holders to challenge these companies because the rights holders are so much smaller in business terms and therefore don’t have the financial clout to take them to court. NVIDIA is an enormously wealthy company with an army of lawyers. They can do what they do and be untouchable; they can be above the law. Many disputes will also have an international dimension, which makes them even more complicated.

It is the complexity and uncertainty of outcome which presents an insurmountable barrier to a copyright holder challenging big businesses such as NVIDIA. It is about dominance and subservience. Dog-eat-dog. This is a complex issue, and there are valid concerns that some large tech companies may be pushing the boundaries or exploiting ambiguities in copyright law to their advantage.

Here are some more reasons why NVIDIA and other AI operators will get away with what they are doing:

  1. Legal Resources: Mega-corporations like NVIDIA can afford teams of highly skilled intellectual property lawyers who can mount aggressive defenses and exploit legal ambiguities. Individual creators or smaller organizations often lack the resources to match this legal firepower.
  2. Procedural Hurdles: Navigating the legal system to bring a successful copyright infringement case is extremely challenging. There are complex procedural requirements, evidentiary burdens, and lengthy court processes that favor well-resourced defendants.
  3. Unclear Precedents: The application of copyright law to emerging technologies like AI training is not well settled. A lack of clear judicial precedent makes it harder to win cases against companies pushing the boundaries.
  4. Deterrent Effect: The risk of facing a protracted legal battle with a deep-pocketed defendant can deter many copyright holders from even attempting to sue, given the uncertain outcomes and high costs involved.
  5. Public Relations Challenges: Large tech firms also have significant public relations and lobbying resources to shape narratives around innovation, the public good, and “fair use” – making it harder for individual creators to garner public sympathy for their cases.

It’s ironic that when an individual like me uses a photograph on the Internet which I have judged to be in the public domain, I am challenged by a scammy company like Copytrack (which I successfully defeated, by the way). Individuals like me may occasionally slip up, but the real offenders are massive businesses that ride roughshod over the law because they can in the wild west of the Internet.