We are approaching the "UBI or Guillotine" fork simply because rules and regulations are applied selectively. Just like the "if we pay for copyright, our business becomes impossible" defense, this is yet another vast unfairness against those who had to spend their own resources to learn a skill. An awful lot of people had a hard life or got into debt for things that big tech is immune from.
Or maybe we will come to the conclusion that all this works only if there's no such thing as IP, reset the playing field for everyone, and anyone who wants to make money will have to actually work for it every single time. IIRC that's what's happening in China and it's how they surpassed the US in innovation.
Technically, that's a deregulation - just not the kind of deregulation the big tech is pushing for. Maybe the next time there's a graph showing how regulations made EU lag behind, add the graph of China too to spice things up.
With so many technical people out of work and promises to make the employed ones obsolete too, it can be a good idea to let people build things instead of unfairly concentrating even more power in kleptocratic entities.
> We are approaching the "UBI or Guillotine" fork
Even in the 18th century, the French aristocracy mostly cruised through the Revolution from afar, surviving with fortunes largely intact to this day [1]. If the fork is UBI or guillotine, the selfish move by the private-jetting billionaire class—personally and financially more mobile and global than the French aristocracy ever was—is the latter.
> if there's no such thing as IP, reset the playing field for everyone
Your thesis is letting Altman, Zuckerberg and Musk have free rein would decrease inequality?
> IIRC that's what's happening in China
Not really [2].
[1] https://www.bbc.com/news/magazine-37655777
[2] https://www.chinaiplawupdate.com/2023/08/china-prosecutes-11...
Extremely misleading citation.
> Criminal trademark infringement made up the majority of IP crimes with 10,384 people prosecuted accounting for 88.9% of the total.
Trademark infringement is of a completely different character from copyright.
Trademark infringement is pure fraud and lying.
Take out trademark infringement, and you have only 1 prosecution per year per 700,000 people.
> Take out trademark infringement, and you have only 1 prosecution per year per 700,000 people
What is it in America? Did we even have a single criminal non-trademark IP prosecution in 2024?
The other way to look at it though is that revolution won't solve your problems, and Americans are far too confident that it will.
> other way to look at it though is that revolution won't solve your problems, and Americans are far too confident that it will
Americans are largely not for a revolution because most of us aren’t idiots. There is idle chatter of a civil war, but that’s again (a) bluster (not that this can’t take on a life of its own) and (b) about consolidating control versus wholesale rebuilding the American class structure.
FWIW there is a difference between revolution and civil war. I see a decent number of people advocate for the first but basically no one advocate for the second. In either case the numbers aren’t a majority.
> see a decent number of people advocate for the first but basically no one advocate for the second
Anyone advocating for the first (as a popular revolt) thinking it wouldn’t result in the second isn’t thinking realistically.
> Anyone advocating for the first (as a popular revolt) thinking it wouldn’t result in the second isn’t thinking realistically.
There are plenty of examples in modern Europe where revolutions and regime changes didn't involve a civil war.
> plenty of examples in modern Europe where revolutions and regime changes didn't involve a civil war
Where internal power structures were preserved (or where the society was restructured under occupation), yes.
> Where internal power structures were preserved (or where the society was restructured under occupation), yes.
No. See for example Spain's or Portugal's transition from autocracy to democracy. The latter involved a military coup and the exile of its former dictator.
MAGA leadership advocates for the second.
I’m no advocate for revolution but the American problem is that our revolution actually worked. Americans freed themselves from a prior group of elites, unlike what the grandparent comment claims of the French elites.
> Americans freed themselves from a prior group of elites, unlike what the grandparent comment claims of the French elites
The American Revolution was one of American elites overthrowing their overseers. It worked and was not super disruptive because power (and class) structures were preserved. From the states through to the system of law and the people in power. (We also didn’t do any mass or political executions.)
Unlike then, today global mobility is within the means of most of the Western world. A French Revolution today could very well extend globally to identify and repatriate.
> French Revolution today could very well extend globally to identify and repatriate
We have zero historical or contemporary precedent for this, and strong incentives for everyone else in the world to not play along. (As they did in sheltering the French aristocracy.)
In a hypothetical American revolution, foreign powers would be looking for their slice of the pie. To think through this dispassionately, imagine civil war breaking out in Russia or China. A second American revolution à la the first would put today’s billionaires and political elite in a room to draft a new constitution to their liking.
Isn't UBI just going to raise inflation? People who don't need it will claim it and use the existing tax loopholes. Tax laws will need to be rewritten.
The "U" in UBI is for "Universal". There is no means-testing. Everyone gets it regardless of assets or income, which means there is no need to spend any effort on checking whether someone is "poor enough".
Though the state would have to make sure the person receiving the benefit actually exists, is still alive, etc.
I understand what UBI means, but its effect is what I think people do not understand. Based on the Cantillon effect, UBI will just accelerate the separation between the rich and the poor.
I'm not sure that Cantillon effect is majorly at play.
The very nature of Cantillon is unequally obtained new money, whereas UBI is universal. Any effect it has would be related to the poorest/neediest spenders now purchasing the sort of goods they do (and, realistically, no increase in spending by the richest). You might see increased consumption in neighborhoods/regions with high concentrations of poor, too.
The better fit for "UBI creates an economic problem" seems to be pricing stickiness. The above commenter focused on controlling general inflation through monetary/fiscal policy (keeping money supply stable, using tax mechanisms), but didn't actually address the concern about producers simply raising prices to absorb the UBI.
No, you can do a UBI that keeps the money supply the same, and use it as a way to stabilize the economy. With a $2000/mo UBI, 50% flat tax on other income, 25% VAT, phase it in by doing 10% of that the first year (and 90% of your current taxes, 90% of current support payments), second year 20% and 80%, so the impact isn't too disruptive. Adjust the flat tax rate as the Federal budget changed (a spending bill is automatically a tax bill as well). Adjust the VAT to control inflation.
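To make the phase-in above concrete, here is a minimal sketch. It assumes "10% of that the first year" means the UBI, the flat tax and the VAT all ramp up together in 10-point steps while the current system ramps down by the same amount; that reading, and every name in the code, is an assumption for illustration rather than something the comment spells out.

```python
# Hypothetical sketch of the phase-in schedule described in the comment above.
# Full-rollout figures come from the comment; the linear ramp is an assumption.
FULL_UBI_MONTHLY = 2000   # dollars per month at full rollout
FULL_FLAT_TAX_PCT = 50    # flat tax on other income, in percent
FULL_VAT_PCT = 25         # VAT, in percent
STEP_PCT = 10             # share of the new system added each year


def phase_in(year: int) -> dict:
    """Return the blend of new and old systems for a 1-based rollout year."""
    new_pct = max(0, min(year * STEP_PCT, 100))
    return {
        "new_system_pct": new_pct,
        "old_system_pct": 100 - new_pct,  # share of current taxes and support payments
        "ubi_monthly": FULL_UBI_MONTHLY * new_pct / 100,
        "flat_tax_pct": FULL_FLAT_TAX_PCT * new_pct / 100,
        "vat_pct": FULL_VAT_PCT * new_pct / 100,
    }


if __name__ == "__main__":
    for year in (1, 2, 5, 10):
        print(year, phase_in(year))
```

Under this reading, year 1 pays $200/mo with 90% of the current system still in place, and the switch completes in year 10. The later adjustments the comment mentions (tuning the flat tax to the budget, the VAT to inflation) are not modeled.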
You've got to be kidding. As a regular middle class citizen my taxes are high enough already. There's no way I'll vote for UBI so that some slackers can sit around getting high and playing Xbox.
Based on your comments you are in US. Your taxes are very low among Western countries.
That slacker is already getting high and playing on Xbox. With UBI they will have fewer worries about staying alive and the opportunity to try things to get more money. UBI is a great incentive for people to try new things without the financial risk of losing their income. Just check the trials and their results - people are more productive and happy in general.
And instead, you vote for billionaires chilling on their yachts, paid for by your labor?
> Isn't UBI just going to raise inflation?
Even assuming this scaremongering scenario, the world would be in a far better place if society assured everyone would be guaranteed a certain income.
Also, the scenario that supports the hypothesis of higher inflation is that more people in society are suddenly able to afford goods and services that were out of their reach without UBI. Can anyone actually put to words why that is undesirable?
I think one criticism is that prices would change to capture the UBI. I think I read the idea in "Progress and Poverty," although I've certainly seen it elsewhere since:
- If everyone suddenly has more money (say $2 more per day)
- And milk is a basic necessity
- The milk seller knows everyone needs milk and now has $2 more to spend
- They can gradually raise the price of milk by close to $2
- Consumers must still buy milk at the higher price
- The intended benefit of the extra $2 is effectively captured by the milk seller
The increases in general purchasing power can be absorbed by suppliers of essential goods. If you have just excess discretionary income in the general case, then non-essential goods can bump in price, too.
The milk seller doesn’t even need to consciously increase prices to match the raise in household income. It will happen organically.
For the sake of argument, imagine UBI provides everyone with a million dollars a year. That doesn’t make everyone a millionaire. It just makes everyone’s money less valuable.
It’s no different on a smaller scale.
It's gonna be complex and messy. On the one hand yes, many people receiving UBI = inflation. On the other hand many highly paid software devs (And soon after - accountants, lawyers, marketers, sales people etc etc) are losing their incomes = very deflationary.
It's gonna be interesting that's for sure.
UBI has less friction as far as implementation goes, since we don't need to qualify anyone. With AI, we can afford to have that extra step (nuance) and be able to make sure it's a needs-based approach. The future requires various combinations of changes. Fix the tax system and then UBI (in this specific order) OR !UBI (needs-based distribution).
Implement UBI as part of fixing taxes. A UBI combined with a flat tax plus a national sales tax, and including universal healthcare, can continue to be a progressive tax while eliminating a lot of the overhead of keeping track of it all. Look at the effective tax rates with a 50% flat tax, 25% sales tax, and $2000 per month UBI with UHC.
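For anyone who wants to actually look at those effective rates, a rough sketch follows. It only nets the 50% flat tax against the $2000/mo UBI; the sales tax and the value of UHC are left out because they depend on spending and health usage, which is an assumption made here to keep it simple. Function names are made up for the example.

```python
# Hypothetical sketch: net effective income-tax rate under a flat tax plus UBI.
# Figures are from the comment above; sales tax and UHC are NOT modeled.
UBI_ANNUAL = 2000 * 12   # $24,000 per year
FLAT_TAX = 0.50          # flat rate on all other income


def net_tax(income: float) -> float:
    """Flat tax owed minus UBI received; negative means a net transfer to the person."""
    return FLAT_TAX * income - UBI_ANNUAL


def effective_rate(income: float) -> float:
    """Net tax as a fraction of other income."""
    if income <= 0:
        raise ValueError("income must be positive to compute a rate")
    return net_tax(income) / income


if __name__ == "__main__":
    for income in (24_000, 48_000, 100_000, 1_000_000):
        print(f"{income:>9,}: {effective_rate(income):+.1%}")
```

With these numbers the break-even point is $48,000 of other income, the rate is negative below that, and it climbs toward 50% as income grows, which is the sense in which the scheme stays progressive despite the flat rate.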
If it's truly universal, no. Several experiments (controlled and natural) have shown this.
Have there been experiments/tests at city/state scale? UBI is country scale and it's way more complex than testing it on a small town of people who I assume are selected for their needs.
Indeed it would as the landlords would just raise rents accordingly.
We saw a bit of that with Covid cheques.
I think you'll find rising rents are more correlated with rising interest rates than Covid cheques, but given one of the key grievances perceived by UBI advocates is class inequality and lack of social mobility, if UBI became politically possible then so would rent controls and controls on prices of key essential commodities while waiting for it to "settle in".
Good point about the interest rates. However, in the UK landlords adjusted their rents accordingly when the Govt introduced Housing Benefits (years before interest rates began to rise). A lot of govt MPs are landlords.
I'm not against the idea of UBI, I just see the landlords eating it up like they do with people's wages.
There isn't going to be a revolution. Americans are all talk no action.
The legal problem is in outputting IP, I still have yet to see a convincing argument that training on copyrighted data is a breach of IP laws.
The trained models are trillionths the size of their training sets. There is no archive of copied data in them.
>argument that training on copyrighted data is a breach of IP laws.
You pay for access to materials, not using or remembering the material in its original format.
Nearly every website does not charge me anything to retrieve information that is their intellectual property.
Is the comment above about every website or libgen?
Training on copyrighted works licensed for such use is inarguably conforming.
Acquiring and using works without such license is just piracy. Whatever your stand on piracy is, most individuals and businesses are not free to incorporate it into their projects. Normal people have faced significant penalties for piracy, and conscientious business operators avoid it.
Sure would be disappointing to all those people if there were suddenly a ruling that said "well, but it's okay that these guys did it because they're filthy rich and went real hard with it"
Again, models are not archives of data.
Llama 3.1 70B is around 45GB in size, despite being trained on likely hundreds of petabytes of data. And before you say it, they are not fancy compression algos either; the loss is so high they would be useless.
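A back-of-the-envelope check of that size ratio, as a sketch only. Both inputs are the figures claimed in the comment above, not verified numbers, and "hundreds of petabytes" is taken at an assumed lower bound of 100 PB.

```python
# Hypothetical sketch: ratio of model size to training-set size.
# Both figures are the commenter's claims; 100 PB is an assumed lower bound
# for "hundreds of petabytes".
MODEL_BYTES = 45 * 10**9        # ~45 GB of weights
TRAINING_BYTES = 100 * 10**15   # 100 PB of training data (assumption)


def size_ratio(model_bytes: int, training_bytes: int) -> float:
    """Fraction of the training-set size that the model weights occupy."""
    return model_bytes / training_bytes


if __name__ == "__main__":
    ratio = size_ratio(MODEL_BYTES, TRAINING_BYTES)
    print(f"model / data = {ratio:.2e}")
    print(f"data is about {TRAINING_BYTES // MODEL_BYTES:,}x larger than the model")
```

On these assumed figures the weights come to a few ten-millionths of the data size. A smaller training set would make the ratio less extreme, so treat it as an order-of-magnitude illustration.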
Your argument is essentially: “I have downloaded and watched this movie, but because I cannot recreate the images, there was no copyright infringement involved”.
I would say it's more, I checked out a book from the library, read it, and learned some things about writing style and storytelling that I'm now going to apply to my own original works.
Libraries obey copyright, loaning out books for which they've acquired some right to lend to members. When I borrow a library book and read it that way, everything that happens is respecting the rights of the copyright's owner.
That has nothing to do with how LLMs were trained. They were trained on countless works for which Meta, etc. had acquired no legitimate right for use at all.
I don't know of a law that says you have to purchase a book to be legally allowed to read it
The legal owner of the book has to allow you to read it. And the legal owner can't make additional copies to allow you to read it.
If I find a book on a park bench and read it, am I breaking the law in terms of intellectual property?
If they're training LLMs on books found on park benches, we don't have a problem. That's obviously not what we're talking about though.
My point is "the legal owner of the book has to allow you to read it" is not true
I will accept the argument they got the source material in a way where someone broke American law. I really do not think they've broken any laws whatsoever in terms of using it for LLM training
> they got the source material in a way where someone broke American law
Isn't inducing or offering someone incentives to break laws illegal by itself? I'll admit that isn't specifically an IP law violation, but it can't possibly be kosher.
For example if a buyer of goods can reasonably be expected to know the goods were stolen, they can also be charged. Isn't this the same thing?
I would go a step further, even, and say it's akin to borrowing a book and formally registering every little detail about it but the actual text itself, with extreme breadth and precision: grammar, style, lexicon (potential morpheme combinations, basically), wider discourse structure, use of special characters and formatting, etc., and then discarding the book.
Yes but your library still legally obtained those copies in the first place.
Most, if not all, pirated books are copies of books that had been legally obtained, so this is not how they are distinguished from books borrowed from a library. The only thing that makes them pirated is that the price paid for the original book is considered to not have covered the right of also distributing copies of the book.
Nowadays the surviving public libraries might pay special prices for the right of lending books, but that was not true in the past, when they just bought the books from the market like anyone else, at the same price.
I am pretty sure that the public libraries that I frequented as a child, many decades ago, did not pay anything for a book above the price that I would have paid myself, but nonetheless at that time nobody would have thought that they do not have the right to lend the books to whomever they pleased.
What our society has to decide is whether these use cases are beneficial or detrimental to society at large, and adjust IP laws accordingly.
Whether LLMs are archives of data, a compression method, or whatever else is just an unimportant technical implementation detail.
Was this replied to the wrong comment? I'm not sure what it has to do with what I wrote.
But here's another way to think about what I'm saying, in case you missed it:
Personally, I'd love to download a complete archive of JSTOR. I'd train myself, and maybe I could even use it as input into some product I mean to launch soon. JSTOR doesn't offer a license for that, at least not to me, but I'm sure I can scrape their site or find an archive elsewhere and make it happen anyway.
Do you think I should do that? What do you think might happen if I tried?
How can it possibly be the case that it's ok for meta to download and ingest the entire contents of libgen but it is not ok for an individual human to selectively download a single work and read it?
Whatever legal contortions used to justify this are, quite frankly, bullshit. This isn't how anything should work even if these companies can buy themselves a regulatory regime where it does.
The idea that abolishing IP protections and letting AI companies run rampant is an offramp for wealth inequality is such a wild take to me?
Realistically billionaires are using racist and homophobic populism as a way to direct working class energy away from wealth inequality. Making people think "woke" is the reason why the earth is on fire and they can't have health insurance.
Ah yes, because the working class is primarily concerned with protecting their intellectual property…
the working class is paywalled out of education because of IP laws that can seemingly be ignored by the AI companies
I think OP is coming from the "temporarily embarrassed billionaire" perspective where if only we had a libertarian hellscape without pesky laws they would be a funeral baron who runs Bartertown.
How can you get the definition of fairness so backwards? Giant corporations provide literally everything you take for granted and they should be punished because you are envious? I don't get it.
There is a reason everyone with over 130 IQ wants to work for them rather than starting their own companies.
They shouldn’t be punished because people are envious; they should be punished because they’re using other people's intellectual property without an agreement in place.
We can’t protect IPs only when that benefits big corps. We should protect them always or accept that the world is better if we go in another direction, changing the rules for everybody.
Training on copyrighted data should be legally allowed
- of course exact reproduction of protected content is a no-no
- but learning is ok, as long as it is transformative. User prompts and responses are pushing the model outside its training distribution anyway - users add their own intent, making usage transformative
- when LLMs synthesize from multiple sources, the result is transformative
- if you try to protect expression it is meaningless now, but if you protect abstract ideas it kneecaps creativity
- the problems of copyright started with the advent of the internet, not with AI
- revenues from royalties are almost zero today, as each new piece of content competes against an unbounded list of other works that have been accumulating for decades online
- because royalties are shit, creatives now focus on ads, and this leads to enshittification, attention-grabbing junk everywhere; attention is scarce, content is post-scarcity
- we actually like interactive participation more than passive consumption; we now edit Wikipedia, contribute to open source, have papers published for free on arXiv, use social networks where our comments are shared with the world, play games instead of reading books - it is another age, the interactive age
- AI is actually more than an infringement tool, it is useful for many legit purposes
- and AI is the worst possible infringement tool, it can hallucinate details and get things wrong; by comparison copying is free and easy and precise to the letter
So the idea that training is infringement is pretty abusive; it tries to make copyright be about abstractions, which is wrong. We can't return to the 1990s, so we have to live with its demise. It's been dying for 3 decades already.
LLMs are allowed to "learn" from all this content because humans are allowed to. Most humans have to access the content legally to learn. But when it comes to training LLMs it's basically "Copyright lol, yolo".
Is there a reason a human can't torrent movies and say "But I'm just learning from them"?
How do writers eat when the market value of their writing is zero?
It's been reduced to zero for 3 decades. When you publish your work, there are a million other works competing for attention. That is the real issue. When you search for an image, you get thousands of images instantly, faster than diffusion models. Content doesn't matter anymore, attention matters, curation matters too.
Even if you forbid AI from training on copyrighted works, people are going to comment about them online, and the model will pick up the ideas. There is no way to protect ideas from spreading and reaching AIs.
Also AI models trained by Chinese are not going to stop using copyrighted material.
How do chimney sweeps eat when everyone has a gas or electric furnace?
People who are smart typically have better things to do than talk about their IQs. Or sell ads, for that matter.
How can you get the definition of fairness so backwards? The King provides literally everything you take for granted and he should be punished because you are envious? I don't get it.
There's a reason why every vassal with a sizeable estate wants to be in the King's court rather than starting their own country.
Alluded to multiple times in the comments already but worth being explicit: Aaron Swartz killed himself 12 years ago yesterday while facing "a cumulative maximum penalty of $1 million in fines, 35 years in prison" [0] after downloading academic journal articles, which would be only a small percentage of what's available on LibGen.
Free for me, not for thee.
> Free for me, not for thee
Swartz was charged with 35 to 50 years, realistically faced up to 10, and was offered 6 months if he pleaded guilty [1]. That offer moreover wasn’t the final offer.
Put another way, it’s not clear that the law is being applied to Zuckerberg differently than it was to Swartz given the law wasn’t actually ever applied to Swartz. (Or that they wouldn’t gladly trade this lawbreaking for $1mm in fines and a negotiation over penalties where the prosecution opens with 6 months jail.)
The prosecutor acted inappropriately in that case; MIT, more wildly so. That doesn’t, however, carry over to a transgression of the law given we never got to that stage.
[1] https://www.forbes.com/sites/forbesdev/2023/02/28/increase-w...?
> it’s not clear that the law is being applied to Zuckerberg differently than it was to Swartz given the law wasn’t actually ever applied to Swartz
Has Zuckerberg actually been charged with something with equivalent potential consequences?
If not, then your statement is false on its face.
> Has Zuckerberg actually been charged with something with equivalent potential consequences?
I didn’t say Zuckerberg has been subjected to what Swartz was. Swartz never wielded the nation-state level power of a billionaire—it’s difficult to imagine how he could be subjected to similar psychological stress.
I said the law isn’t being applied to Zuckerberg (or anyone who has downloaded LibGen, for that matter) differently because the law was never applied to Swartz. Given the unpopular Swartz prosecution ended Ortiz’s career, and the lack of recent criminal copyright cases, it’s unlikely anyone would attempt to apply it as they did then. To anyone, including Zuckerberg.
TL; DR If you dislike what Zuckerberg is doing, you’re probably advocating for a clarification of the law. If you like it, erm, nothing much to do here.
> the law was never applied to Swartz
Merely being charged with or investigated for a crime is absolutely an application of the law.
LibGen is the most generic name ever, had to look it up. Turns out that LibGen is a collection of pirated books.
Shadow libraries are a heavily discussed, recurring topic on HN:
https://hn.algolia.com/?query=libgen&type=all ("LibGen")
https://hn.algolia.com/?query=anna's%20archive&type=all ("Anna's Archive")
https://hn.algolia.com/?query=z%20library&type=all ("Z-Library")
It's not just a collection, it's the collection. It contains almost every scientific book ever printed, for one thing.
Frankly, it's a massive boon to researchers. It's like a top-tier research university library at your fingertips, and usually more convenient than the real thing.
Also free. That helps.
But the sad state of the affairs is that if Aaron Swartz does it, he ends up dead; if Meta does it, everything is fine.
A lot of people would gladly pay. I'm a paying subscriber to Anna's Archive, which vastly improves the experience of that site. (It's borderline unusable without a subscription.)
Thing is, the Elsevier/Springer model makes it incredibly difficult to pay them. With single papers or book chapters in the $30-40 range, an afternoon's research can easily cost $600. (Note that the authors and reviewers don't get royalties on this, and the Editor-in-Chief of any given journal usually only makes a small stipend!)
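To put the $600 figure in perspective, a quick sketch of how few items that buys at the quoted per-item prices. The prices and budget are the ones in the comment above; the function name is made up for illustration.

```python
# Hypothetical sketch: how many single papers/chapters a budget covers.
# Prices ($30-40 per item) and the $600 afternoon are taken from the comment above.
def items_for_budget(budget: int, price_per_item: int) -> int:
    """Whole number of items purchasable at a fixed per-item price."""
    if price_per_item <= 0:
        raise ValueError("price_per_item must be positive")
    return budget // price_per_item


if __name__ == "__main__":
    budget = 600
    for price in (30, 40):
        print(f"${price}/item -> {items_for_budget(budget, price)} items for ${budget}")
```

So the $600 afternoon corresponds to only 15 to 20 papers or chapters, which is not much once you start chasing citations.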
There are services like DeepDyve, but they're intentionally gimped and difficult to use, because their user interface is 100% built around preventing you from downloading or screenshotting the papers you "rent"!
If the publishers set up a $100/month all-open-access program, and if the experience were at least halfway decent, I'd bet that a lot of people sign up. And that's not cheap!
Funny that the world where almost all human knowledge and art is free and accessible for everyone exists in parallel to one where articles about which McDonald's meal you are are paywalled, and funny which world civilized nations have chosen in order to protect The Suite Life of Zack & Cody and all the artists whose livelihoods depend on reruns of iCarly.
A lot less generic than X
I would argue that it's the right call: 1) it's in the world's best interest. I am running llama locally on a laptop, and the ability to have the world's distilled knowledge at your fingertips will generate much, much more value than what it takes. 2) it does not 'take' any value from the book creators. No one's going to 'not buy a book' because an LLM has been trained with its content (in contrast you might argue that you are likely to not buy a book because you downloaded it from libgen).
Copyright laws are not millennia-old ethical laws that everyone agrees on (like don't steal), they are a modern human construct that were created for the greater good (incentivize creation), and we should revisit them with new tech.
"1) it's in the world's best interest."
How is pleasing Meta's shareholders in the world's best interest?
How is using llama for free locally pleasing shareholders?
> incentivize creation
Humans do that naturally (see: children)
The copyright laws are to protect profit.
wat? facebook is going to 'not buy a book' for each book it's gone through. world's best interest that one of the wealthiest companies in the world don't pay their dues? world's best interest? when we know nothing about the societal and political effects llms will have in the hands of such people?
what are you rationalising about?
There are three positions around the usage of shadow libraries.
1- Should we, as a society, turn this argument into a wider discussion about the publishing of knowledge, and about the publishing industry's greed and rent-seeking business model?
2- Big corporations shouldn't just ignore copyright law while maintaining the strongest copyright protections for themselves and going after small folks.
3- The usual argument about how LLM training is different from people actually using pirated textbooks because the originals are expensive (college and learning are hard and expensive, especially in places like Africa).
These are different angles and I think we can try to address all of them, as they are not mutually exclusive. There are good arguments on both sides of point 3. I don't think there is a good argument for keeping the status quo on the first point though. As for the second, it is more complicated to even discuss, especially on HN.
We can rewrite copyright laws.
It is very difficult for me to believe that Meta's recent political relations moves are not related to the open cases where Meta is the defendant.
I don't understand your comment. This is about a lawsuit which shows that Zuckerberg OK'd the downloading and use of LibGen data. The case has existed since at least mid-2023 and was in the discovery phase until 13 Dec 2024. Shortly before the deadline Meta provided this new information, because they had to.
I guess the parent is saying that the new administration could be more business-friendly in prosecuting these types of cases. It might even drop this case altogether. But only if Meta is "friendly" to the administration too.
They're saying that Meta has been kowtowing to the incoming administration in hopes of getting in their good graces.
Rather famously, some elements of that administration are above the criminal code, so that's not implausible.
PP is referring to Facebook/Meta's new policy changes like banning intelligence/sanity-based insults on the Platform, but carving out an exception specifically and explicitly for transgender people as targets, and removing tampons from men's bathrooms.
He's saying that he wants to pay Trump to win these lawsuits, which is a smart move as we know justice is for sale.
The pivot would potentially help his cause though, would it not?
The antitrust one is most relevant as the new party in power would be gleeful to see it broken up but otherwise disagrees with the concept of antitrust
"For my friends, everything; for my enemies, the law."
Nah it’s just that Zuck watched the Barbie movie and realized Space Karen was getting entirely too much limelight and declared a Year of Masculinity
Calling someone Karen is a misogynist slur, and calling a man by a woman's name without consent is doubly a misogynist slur.
Ok Karen
PDF: https://ia902305.us.archive.org/34/items/gov.uscourts.cand.4...
Text: https://www.courtlistener.com/docket/67569326/373/kadrey-v-m...
"Meta's request is preposterous. With one possible exception, there is not a single thing in those briefs that should be sealed."
"It is clear that Meta's sealing request is not designed to protect against the disclosure of sensitive business information that competitors could use to their advantage. Rather, it is designed to avoid negative publicity."
"If Meta again submits an unreasonably broad sealing request, all materials will simply be unsealed."
"One final comment. Between this sealing request and assertions in Meta's opposition brief such as "[t]hat document expressly discusses torrents and seeding", Opp. at 7, the Court is becoming concerned that Meta and its counsel are starting to travel down a familiar road. See In re Facebook, Inc. Consumer Privacy User Profile Litigation, 655 F. Supp. 3d 899 (N.D. Cal. 2023)."
I guess the zuck would download a car...
Will be interesting to see where this lands, because all outcomes seem to have significant secondary effects.
I would download a car
https://cults3d.com/en/collections/best-stl-files-cars-3d-pr...
You're welcome! :P
In hindsight (considering how LLMs are trained etc.) it makes total sense, but "Big Tech vs. Big Copyright" is something I didn't have on my 2020s bingo card.
I wonder who will come out on top, and whether there will be any incidental improvements for consumers, but unfortunately I can imagine an "AI training exemption" all too well.
That's not surprising to me at all. Even in the 2000s there was a famous lawsuit about Google Books scanning books without approval, and the proposed settlement essentially allowed Google to sell scanned ebooks while giving copyright holders a cut[0]. At that time Google truly felt like a don't-be-evil corporation, and lawyers for the copyright holders were willing to give Google all this data as long as Google paid the copyright holders. In the 2020s, however, I cannot imagine any Big Tech company having that don't-be-evil spirit, and I also cannot imagine them voluntarily paying anything to copyright holders.
[0]: https://www.newyorker.com/business/currency/what-ever-happen...
> "Big Tech vs. Big Copyright"
Indeed. But when do those intersect or diverge?
I don't blame him. What would you do? If I had a near perfect data training set of all the most useful books and a hungry AI to train, it would be the logical step.
The reason this is news is because of the stinking hypocrisy of it all. It's really the same topic as the Swartz-Altman discussion here [0], in that these giant companies want to have it both ways.
Where is Zuckerberg's shout-out for Alexandra Elbakyan? [1] Or for Brewster Kahle? Or any of the vast army of people who preserve and curate the vital culture of humanity by protecting it from intellectual property dungeons?
The colossal hypocrisy is that a company like Meta wishes to live under the protective umbrella of "Intellectual Property". It wants to stop me from just stealing its stuff and setting up a better Facebook.
Were it exposed to the same rules it wishes to live by, it would be torn apart by vibrant and deserving competition within days.
All that Zuckerberg, Meta or OpenAI are doing is laying the groundwork for the abolition of intellectual property. They are literally the proverbial people who will buy the rope with which to hang themselves.
(Edit. that doesn't make sense insert <proverb about buying ropes that actually makes sense>)
I don't view Big Tech as being against copyright. They simply hold a position that they will not pay for something unless forced to ("make me" - a very common position for the powerful to hold).
In fact, I'd argue that Big Tech is pro copyright, because once they force the copyright holder to negotiate, the cost is irrelevant to them and they build a moat around that access.
For example, Google stole Reddit content for Gemini until Reddit was forced to the table, and now Google has a seemingly exclusive agreement around Reddit data for AI purposes.
> I don't view Big Tech as being against copyright. They simply hold a position that they will not pay for something unless forced to
Yep, the contradiction between them feeling entitled to use anything they want for training, while simultaneously having license terms which forbid using the output of their models to train other models is pretty glaring. Information wants to flow freely but only in one direction apparently.
> having licenses which forbid using the output of their models to train other models
I haven't been following it closely, but aren't there already court rulings saying that generative AI output by itself is not copyrightable?
There's not much caselaw as to whether those terms are actually enforceable yet, but it at least indicates what they want to happen.
Yup. For Big Tech, the ideal outcome of these cases isn't that copyright is widely or deeply undermined as they rely heavily on it themselves (let alone how their customers and investors benefit from it).
Their ideal outcome is that there's some narrow carveout that gives them permission to ignore copyright where they want to, while extending similar permission to as few/irrelevant others as possible.
> I'd argue that Big Tech is pro copyright
I agree but for a different reason -- cost is actually relevant, in the sense that only the biggest players can afford to pay for the copyrights. If you are a small player, however good your tech stack or your model is, if you can't afford it, you can't compete with Google.
In the past we called that tyranny, when a power thought it could act entirely without restraint.
Now I guess it’s defended as good business and good science by so many flunkies.
Knife’s edge stuff. Tech people should all be reading the books, not Mark’s steamroller.
There goes the gravy train
Odds are that licensing gets streamlined into something like compulsory mechanical licensing and rates get negotiated into something that Big Tech and Big Media can both live with.
The whole conflict boils down to one party having piles of money and another party having something they want. That's not an intractable problem.
Big tech will win, because what they're doing is already basically legal, and they're worming their way up the new administration's ass.
Maybe training on copyrighted data should be allowed if the size of the training set is huge, as each individual example is just a drop in the ocean compared to the full training set.
If you train a 20B-parameter model on 20T tokens, even with 1000 tokens per example, the model extracts about 1 byte of information per example. What is the value of 1 byte of copyright infringement?
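For anyone who wants to check the back-of-envelope figure, here is a rough sketch of the arithmetic. It assumes each parameter holds about one byte of information, which is an assumption made for illustration and not a measured capacity; the function name and the `bytes_per_param` argument are likewise hypothetical.

```python
def bytes_per_example(params, tokens, tokens_per_example, bytes_per_param=1.0):
    """Rough upper bound on model capacity available per training example.

    bytes_per_param is an assumed figure (about one byte per parameter),
    not a measured property of any real model.
    """
    examples = tokens / tokens_per_example        # number of training examples
    capacity_bytes = params * bytes_per_param     # total assumed model capacity
    return capacity_bytes / examples


# Figures from the comment above: 20B parameters, 20T tokens, 1000 tokens per example.
estimate = bytes_per_example(
    params=20 * 10**9,
    tokens=20 * 10**12,
    tokens_per_example=1000,
)
print(f"about {estimate:.1f} byte(s) per example")
```

This is only an average over the whole training set; it says nothing about how much of any single example the model actually retains, and the result scales directly with whatever value is assumed for `bytes_per_param`.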
By the same logic, pirating movies should be allowed as long as the person doing it watches enough of them for each individual one to be almost meaningless…?
If by "pirating" you mean distributing copies, probably not. But if you mean downloading copies, probably yes. Consider the case of the film student studying the entire oeuvres of multiple directors.
Yes, if they watch a billion movies, it should be free to watch any copyrighted one.
The hilarious thing is that the same people that freely pirate music, videos, books and articles are on the side of huge copyright hoarders like Disney
I just wish the big corps would change the law to allow everyone to pirate freely, but instead they’re arguing for a carve out specially for training language models.
Yes. Every "AI" company is training their software on everything, regardless of what they claim, and making millions, billions of dollars on it.
YouTube was mostly a library of pirated content when Google bought it for $1.6B.
Spotify began by uploading an employee's pirated MP3s, and is now valued at $92B.
There are plenty of other examples. One of the ways to success is to ignore silly legal matters, build a product people want, and worry about the legality later. It's not just AI companies, the pattern is well established.
>was
>began
You're very obviously missing a key point here. It's rather simple: pirating is integral to "AI", as it is of the utmost importance with regards to its optimization and even to building its basic functionalities. It will never cease to happen nor is it part of some "preliminary" process in which executives "ignore silly legal matters" in order to kick-start their projects only to discard those practices once they eventually take off. Comparisons to YouTube, Spotify, etc., are invalid for this very reason.
I should have been very clear that "silly legal matters" was meant tongue in cheek. I do not think that this is cool at all.
You raise a good point. However, both Spotify and YouTube benefited from network effects and from being the biggest gorilla in the room. Can you separate the initial illegality from their later success, since the latter depended on the former?
What seems inevitable is that some deal is made with major rights holders, the little guy gets screwed, as has happened before.