OpenAI Says It’s “Over” If It Can’t Steal All Your Copyrighted Work

DeadNinja · 3 days ago

@NewOldGuard@lemmy.ml · edit-2 3 hours ago

Oh no not the plagiarism machine however would we recover???

Please fail and die openai thx

Also copyright is bullshit and IP shouldn’t exist especially for corporate entities. Free sharing of human knowledge and creativity should be a right. Machine plagiarism to create uninspired mimicries isn’t a necessary part of that process and should be regulated heavily

@ef9357@lemmy.world · 2 days ago

Good, go away.

@Rekorse@sh.itjust.works · 2 days ago

Getting really tired of these fucking CEOs calling their failing businesses “threats to national security” so big daddy government will come and float them again. Doubly ironic that it’s coming from a company that’s actually destroying the fucking planet while it achieves fuck-all.

@0x0@programming.dev · 2 days ago

Oh no…
Anyway…

just some guy · 2 days ago

@fartsparkles@lemmy.world · 3 days ago

If this passes, piracy websites can rebrand as AI training material websites and we can all run a crappy model locally to train on pirated material.

@underisk@lemmy.ml · 3 days ago

That would work if you were rich and friends with government officials. I don’t like your chances otherwise.

@moreeni@lemm.ee · 3 days ago

Another win for piracy community

@Knock_Knock_Lemmy_In@lemmy.world · 3 days ago

Fuck it. I’m training my home AI with the world’s TV, movies and books.

@mindaika@lemmy.dbzer0.com · 2 days ago

Seems like just yesterday Metallica was suing people for enjoying copyrighted materials

@Creosm@lemmy.world · 3 days ago

Oh it’s “over”? Fine for me

@VeryInterestingTable@lemm.ee · 3 days ago

Oh no, what will we do without degenerate generative AIs?!

@nothacking@discuss.tchncs.de · 3 days ago

But when China steals all their (arguably not copyrightable) work…

@turnip@sh.itjust.works · 3 days ago

Surprisingly, Sam Altman hasn’t complained; he just said there’s competition and it will be harder for OpenAI to compete with open source. I think their small lead is essentially gone, and their plan is now to suckle Microsoft’s teat.

@HiddenLayer555@lemmy.ml · 3 days ago

it will be harder for OpenAI to compete with open source

Can we revoke the word open from their name? Please?

Phoenixz · 3 days ago

This is a tough one

OpenAI is full of shit and should die, but then again, so should copyright law as it currently is.

@meathappening@lemmy.ml · edit-2 3 days ago

That’s fair, but OpenAI isn’t fighting to reform copyright law for everyone. OpenAI wants you to be subject to the same restrictions you currently face, and them to be exempt. This isn’t really an “enemy of my enemy” situation.

@Melvin_Ferd@lemmy.world · edit-2 2 days ago

Is anyone trying to make stronger copyright laws? It wouldn’t be the rich people who control media, would it?

PropaGandalf · 3 days ago

yes, screw them both. let altman scrape all the copyright material and choke on it

@masterspace@lemmy.ca · 3 days ago

Piracy is not theft.

@B1naryB0t@lemmy.dbzer0.com · 3 days ago

When a corporation does it to get a competitive edge, it is.

@masterspace@lemmy.ca · edit-2 3 days ago

No it’s not.

It can be problematic behaviour, you can make it illegal if you want, but at a fundamental level, making a copy of something is not the same thing as stealing something.

@pyre@lemmy.world · edit-2 3 days ago

it uses the result of your labor without compensation. it’s not theft of the copyrighted material. it’s theft of the payment.

it’s different from piracy in that piracy doesn’t equate to lost sales. someone who pirates a song or game probably does so because they wouldn’t buy it otherwise. either they can’t afford it or they don’t find it worth buying. so if they couldn’t pirate it, they still wouldn’t buy it.

but this is a company using labor without paying you, something that they otherwise definitely have to do. he literally says it would be over if they couldn’t get this data. they just don’t want to pay for it.

@masterspace@lemmy.ca · edit-2 3 days ago

That information is published freely online.

Do companies have to avoid hiring people who read and were influenced by copyrighted material?

I can regurgitate copyrighted works as well, and when someone hires me, places like Stack Overflow get fewer views on the pages that I’ve already read and trained on.

Are companies committing theft by letting me read the internet to develop my intelligence? Are they committing theft when they hire me so they don’t have to do as much research themselves? Are they committing theft when they hire thousands of engineers who have read and trained on copyrighted material to build up internal knowledge bases?

What’s actually happening is that the debates around AI are exposing a deeply and fundamentally flawed copyright system. It should not be based on scarcity and restriction but on rewarding use. Information has always been able to flow freely; the mistake was linking payment to restricting its movement.

@pyre@lemmy.world · 3 days ago

it’s ok if you don’t know how copyright works. also maybe look into plagiarism. there’s a difference between relaying information you’ve learned and stealing work.

@Grimy@lemmy.world · 3 days ago

Training on publicly available material is currently legal. It is how your search engine was built and it is considered fair use mostly due to its transformative nature. Google went to court about it and won.

@pyre@lemmy.world · 3 days ago

can you point to the trial they won? I only know about a case that was dismissed.

because what we’ve seen from ai so far is hardly transformative.

Pennomi · 3 days ago

It’s only theft if they support laws preventing their competitors from doing it too. Which is kind of what OpenAI did, and now they’re walking that idea back because they’re losing again.

@kibiz0r@midwest.social · 3 days ago

What OpenAI is doing is not piracy.

@Grimy@lemmy.world · 3 days ago

Whatever it is, it isn’t theft

@kibiz0r@midwest.social · 3 days ago

Also true. It’s scraping.

In the words of Cory Doctorow:

Web-scraping is good, actually.

Scraping against the wishes of the scraped is good, actually.

Scraping when the scrapee suffers as a result of your scraping is good, actually.

Scraping to train machine-learning models is good, actually.

Scraping to violate the public’s privacy is bad, actually.

Scraping to alienate creative workers’ labor is bad, actually.

We absolutely can have the benefits of scraping without letting AI companies destroy our jobs and our privacy. We just have to stop letting them define the debate.

Grumuk · 3 days ago

Molly White also wrote about this in the context of open access on the web and people being concerned about how their works are being used.

“Wait, not like that”: Free and open access in the age of generative AI

The same thing happened again with the explosion of generative AI companies training models on CC-licensed works, and some were disappointed to see the group take the stance that, not only do CC licenses not prohibit AI training wholesale, AI training should be considered non-infringing by default from a copyright perspective.

@Grimy@lemmy.world · 3 days ago

Creators who are justifiably furious over the way their bosses want to use AI are allowing themselves to be tricked by this argument. They’ve been duped into taking up arms against scraping and training, rather than unfair labor practices.

That’s a great article. Isn’t this kind of exactly what is going on here? Wouldn’t bolstering copyright laws make training unaffordable for everyone except a handful of companies? Then these companies, because of their monopoly, could easily make the highest level models only affordable by the owner class.

People are mad at AI because it will be used to exploit them, instead of being mad at the ones who exploit them every chance they get. Even worse, the legislation they shout for will make that exploitation even easier.

@FauxLiving@lemmy.world · 3 days ago

Our privacy was long gone well before AI companies were even founded, if people cared about their privacy then none of the largest tech companies would exist because they all spy on you wholesale.

The ship has sailed on generating digital assets. This isn’t a technology that can be uninvented. Digital artists will have to adapt.

Technology often disrupts jobs, you can’t fix that by fighting the technology. It’s already invented. You fight the disruption by ensuring that your country takes care of people who lose their jobs by providing them with support and resources to adapt to the new job landscape.

For example, we didn’t stop electronic computers to save the job of Computer (a large field of highly trained humans who did calculations) and CAD destroyed the drafting profession. Digital artists are not the first to experience this and they won’t be the last.

@masterspace@lemmy.ca · 3 days ago

Our privacy was long gone well before AI companies were even founded, if people cared about their privacy then none of the largest tech companies would exist because they all spy on you wholesale.

In the US. The EU has proven that you can have perfectly functional privacy laws.

If your reasoning is that the US doesn’t regulate its companies and so it’s impossible to regulate them, then your reasoning is bad.

@FauxLiving@lemmy.world · edit-2 3 days ago

My reasoning is based upon observing the current Internet from the perspective of working in cyber security and dealing with privacy issues for global clients.

The GDPR is a step in the right direction, but it doesn’t guarantee your digital privacy. It’s more of a framework to regulate the trading and collecting of your personal data, not to prevent it.

No matter who or where you are, your data is collected and collated into profiles which are traded between data brokers. Anonymized data is a myth, it’s easily deanonymized by data brokers and data retention limits do essentially nothing.

AI didn’t steal your privacy. Advertisers and other data consuming entities have structured the entire digital and consumer electronics ecosystem to spy on you decades before transformers or even deep networks were ever used.

@zarathustra0@lemmy.world · 3 days ago

Piracy is only theft if AI can’t be made profitable.

@_lilith@lemmy.world · 3 days ago

Yeah, but I don’t sell ripped DVDs and copies of other people’s art.

@Knock_Knock_Lemmy_In@lemmy.world · 3 days ago

What if I run a filter over it? Transformative works are fine.

@SplashJackson@lemmy.ca · 3 days ago

OpenAI can open their asses and go fuck themselves!

@schnurrito@discuss.tchncs.de · 3 days ago

If It Can’t Steal All Your Copyrighted Work

https://commons.wikimedia.org/wiki/File:Copying_Is_Not_Theft.webm

@Niquarl@lemmy.ml · 3 days ago

Of course it is, if you copy to monetise, which is what they do.

@droplet6585@lemmy.ml · 3 days ago

They monetize it, erase authorship and bastardize the work.

Like if copyright was to protect against anything, it would be this.

@Zink@programming.dev · 3 days ago

What I’m hearing between the lines here is the origin of a legal “argument.”

If a person’s mind is allowed to read copyrighted works, remember them, be inspired by them, and describe them to others, then surely a different type of “person’s” different type of “mind” must be allowed to do the same thing!

After all, corporations are people, right? Especially any worth trillions of dollars! They are more worthy as people than meatbags worth mere billions!

@chicken@lemmy.dbzer0.com · edit-2 3 days ago

I don’t think it’s actually such a bad argument, because to reject it you basically have to say that style should fall under copyright protections, at least conditionally, which is absurd and has obvious dystopian implications. This isn’t what copyright was meant for. People want AI banned or inhibited for separate reasons and hope the copyright argument is a path to that, but even if successful, it wouldn’t actually change much except to turn the other large corporations that own most copyrights into stakeholders in AI systems. That’s not really a better circumstance.

@tacobellhop@midwest.social · edit-2 3 days ago

Actually, I would just make the guard rails such that if the input can’t be copyrighted, then the AI output can’t be copyrighted either. Making anything it touches public domain would rein in the corporations’ enthusiasm for using it to replace humans.

@chicken@lemmy.dbzer0.com · edit-2 3 days ago

I think they would still try to go for it but yeah that option sounds good to me tbh

@ArtificialHoldings@lemmy.world · edit-2 3 days ago

This has been the legal basis of all AI training sets since they began collecting datasets. The US Copyright Office heard these arguments in 2023: https://www.copyright.gov/ai/listening-sessions.html

MR. LEVEY: Hi there. I’m Curt Levey, President of the Committee for Justice. We’re a nonprofit that focuses on a variety of legal and policy issues, including intellectual property, AI, tech policy. There certainly are a number of very interesting questions about AI and copyright. I’d like to focus on one of them, which is the intersection of AI and copyright infringement, which some of the other panelists have already alluded to.

That issue is at the forefront given recent high-profile lawsuits claiming that generative AI, such as DALL-E 2 or Stable Diffusion, are infringing by training their AI models on a set of copyrighted images, such as those owned by Getty Images, one of the plaintiffs in these suits. And I must admit there’s some tension in what I think about the issue at the heart of these lawsuits. I and the Committee for Justice favor strong protection for creatives because that’s the best way to encourage creativity and innovation.

But, at the same time, I was an AI scientist long ago in the 1990s before I was an attorney, and I have a lot of experience in how AI, that is, the neural networks at the heart of AI, learn from very large numbers of examples, and at a deep level, it’s analogous to how human creators learn from a lifetime of examples. And we don’t call that infringement when a human does it, so it’s hard for me to conclude that it’s infringement when done by AI.

Now some might say, why should we analogize to humans? And I would say, for one, we should be intellectually consistent about how we analyze copyright. And number two, I think it’s better to borrow from precedents we know that assumed human authorship than to invent the wheel over again for AI. And, look, neither human nor machine learning depends on retaining specific examples that they learn from.

So the lawsuits that I’m alluding to argue that infringement springs from temporary copies made during learning. And I think my number one takeaway would be, like it or not, a distinction between man and machine based on temporary storage will ultimately fail maybe not now but in the near future. Not only are there relatively weak legal arguments in terms of temporary copies, the precedent on that, more importantly, temporary storage of training examples is the easiest way to train an AI model, but it’s not fundamentally required and it’s not fundamentally different from what humans do, and I’ll get into that more later if time permits.

The “temporary storage” idea is pretty central for visual models like Midjourney or DALL-E, whose training sets are full of copyrighted works lol. There is a legal basis for temporary storage too:

The “Ephemeral Copy” Exception (17 U.S.C. § 112 & § 117)

U.S. copyright law recognizes temporary, incidental, and transitory copies as necessary for technological processes.
Section 117 allows temporary copies for software operation.
Section 112 permits temporary copies for broadcasting and streaming.

@ArtificialHoldings@lemmy.world · 3 days ago

BTW, if anyone was interested - many visual models use the same training set, collected by a German non-profit: https://laion.ai/

It’s “technically not copyright infringement” because the set is just a link to an image, paired with a text description of each image. Because they’re just pointing to the image, they don’t really have to respect any copyright.
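To make the "just a link plus a description" point concrete, here is a rough Python sketch of what one record in that kind of dataset could look like. The class name, field names, and example URLs are made up for illustration and are not LAION's actual schema; the only thing it is meant to show is that the records hold text, not image data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTextPair:
    """One hypothetical dataset record: a pointer to an image plus a caption."""
    url: str      # where the image lives; the image bytes are NOT stored here
    caption: str  # text description paired with that image


# Illustrative records with placeholder URLs (assumptions, not real entries).
pairs = [
    ImageTextPair("https://example.com/cat.jpg", "a cat sitting on a windowsill"),
    ImageTextPair("https://example.com/dog.png", "a dog running on a beach"),
]


def total_text_bytes(records):
    """Size of the dataset itself: only URL and caption text, no pixels."""
    return sum(len(r.url.encode()) + len(r.caption.encode()) for r in records)


print(total_text_bytes(pairs))
```

Anyone who actually trains on such a set still has to download the images from those URLs themselves, which is where the temporary copy question above comes back in.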

@tacobellhop@midwest.social · 3 days ago

Based on this, can I use ChatGPT to recreate the Coca-Cola recipe?

@ArtificialHoldings@lemmy.world · edit-2 3 days ago

Copyright law doesn’t cover recipes - the Coca-Cola formula is just a “trade secret”. But the approximate recipe for Coca-Cola is well known and can be googled.

@RandomVideos@programming.dev · 3 days ago

I feel like it would be ok if AI-generated images/text were clearly marked (but I don’t think it’s possible in the case of text).

Who would support something made by stealing the hard work of other people if they could tell instantly?

@Dengalicious@lemmygrad.ml · 3 days ago

Stealing means the initial item is no longer there

@RandomVideos@programming.dev · 3 days ago

If someone is profiting off someone else’s work, I would argue it’s stealing.