The rise and fall of robots.txt: As unscrupulous AI companies seek out more and more data, the basic social contract of the web is falling apart.

alyaza [they/she]@beehaw.org · 9 months ago

MachineFab812@discuss.tchncs.de · edit-2 9 months ago

The basic social contract of the web was to keep things accessible, including to bots. No one has the storage capacity to rip the entire web like all these jokers are pretending - even Google merely indexes it, except for the most popular pages.

The thing ruining the social contract of the web is the profit motive of all these companies trying to convince people that they should be able to sell data that is otherwise publicly accessible, for the purposes of allowing bots to look at it - they can’t memorize it.

Of course ChatGPT and the other AI companies ARE partially to blame: it seems they’ve poisoned the pot by giving their AIs continual access to the training sets and/or even the broader internet on the backend without making this clear to users, allowing journalists to claim that these AIs have somehow memorized petabytes of data into a few gigabytes. That is an ABSURD, basically impossible compression ratio, as anyone with even the slightest comprehension of the topic knows.
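To put a rough number on that ratio, here is a minimal sketch. The sizes are assumptions chosen only for illustration (1 petabyte of source data, a 4-gigabyte model), not figures from any real system, and `compression_ratio` is a hypothetical helper.

```python
# Back-of-the-envelope check of the "petabytes into gigabytes" claim.
# Both sizes below are illustrative assumptions, not measured values.
PETABYTE = 10**15  # bytes (decimal units)
GIGABYTE = 10**9   # bytes (decimal units)


def compression_ratio(source_bytes: int, model_bytes: int) -> float:
    """Return how many bytes of source each byte of model would have to hold."""
    return source_bytes / model_bytes


ratio = compression_ratio(1 * PETABYTE, 4 * GIGABYTE)
print(f"{ratio:,.0f} : 1")  # 250,000 : 1
```

Under those assumed sizes, verbatim storage would need roughly a 250,000-to-1 lossless ratio.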

No, your random article you tricked ChatGPT into spitting out is not worth memorizing, not even to the lie- and hallucination-prone AI chatbots we have available to prod for free or otherwise. Oh, you paid for it, and your complaint is that it’s spitting out accurate information? YOU’RE PAYING FOR THEM TO HOST THE CHATBOT FOR YOU AND PROVIDE IT ACCESS TO INFORMATION IT WOULD OTHERWISE NOT HAVE ACCESS TO ON THE BACK-END.

By all means, sue the companies into paying for their data, and force them to divulge the data-sets they keep on-hand so that they can be charged for information in them, but stop pretending the AIs themselves contain copies of it, or that it’s impossible to make them pay ex post facto (as opposed to the ENTIRETY of the rest of our legal system and enforcement) …

AND PEOPLE, stop letting all these companies trick you into thinking that this is a valid excuse to further lock down the web, or that you must poison your fanart with methods that WILL be bypassed. It’s just another potential expense and technical burden these companies want you to believe you must bear rather than sticking to the things you enjoy and/or that put food on your table.

darkphotonstudio@beehaw.org · 9 months ago

The thing ruining the social contract of the web is the profit motive

Dingdingdingdingding!

gayhitler420@lemm.ee · 9 months ago

robots.txt isn’t a basic social contract, it’s a file intended to save web crawlers precious resources.
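For anyone who hasn't looked at how a crawler actually consults the file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules, the bot names, and the example.com URLs are made up for illustration; nothing here is taken from a real site, and nothing in the protocol forces a crawler to run this check at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a parser a polite crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching; compliance is entirely voluntary.
print(parser.can_fetch("GPTBot", "https://example.com/post/1"))           # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/post/1"))     # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/private/x"))  # False
```

The check runs on the crawler's side, which is why the file works only as a request and not as an access control.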

bedrooms@kbin.social · 9 months ago

As I always write, trying to restrict AI training on the grounds of copyright will only backfire. The sad truth is that malicious parties (dictatorships) will get more training materials because they won’t abide by the rules. The end result is that dictators would outperform democracies in terms of future-generation AIs, unless we treat AI training like human reading.

zaphod@lemmy.ca · edit-2 9 months ago

You know what?

I’m fine with that hypothetical risk.

“The bad guys will do it anyway so we need to do it, too” is the worst kind of fatalism. That kind of logic can be used to justify any number of heinous acts, and I refuse to live in a world where the worst of us are allowed to drag down the rest of us.

Blisterexe@lemmy.zip · 9 months ago

But if we make training AI on copyrighted data without a license illegal, it will hamper open-source models while not affecting closed-source ones, because they could just buy the data from big social media conglomerates.

bedrooms@kbin.social · 9 months ago

The consequence of falling behind is gravely different from most heinous acts. It can impact the military, elections, espionage, or whatever.

zaphod@lemmy.ca · edit-2 9 months ago

Really? I’m supposed to believe AI is somehow more existentially risky than, say, chemical or biological weapons, or human cloning and genetic engineering (all of which are banned or heavily regulated in developed nations)? Please.

I understand the AI hype artists have done a masterful job convincing everyone that their tech is so insanely powerful (and thus incredibly valuable to prospective investors) that it’ll wipe out humanity, but let’s try to be realistic.

But you know, let’s take your premise as a given. Even despite that risk, I refuse to let an unknowable hypothetical be used to hold our better natures hostage. There are countless examples of governments and corporations using vague threats as a way to get us to accept bad deals at the barrel of a virtual gun. Sorry, I will not play along.

davehtaylor@beehaw.org · 9 months ago

If you don’t see how even the most basic of AI images, videos, deepfakes, etc. can manipulate the public, the electorate, popular opinion, or even sow just enough doubt to cause a problem, then I don’t know what to tell you.

People are already dying because of deepfakes and fake AI porn. We know that most people who see some headline on Facebook will never click further to read it, and will just accept the headline and/or the synopsis as fact. They will accept something a 1000x re-shared image says, without sources or verification. The fact that a picture or vid might have a person with 8 fingers on one hand in the background isn’t going to prevent them from taking in the message. And we’ve all literally seen people around the web say, explicitly, something to the effect of “I don’t care if the story is true or not, it’s a real issue we need to consider” when we know for a fact that it is not.

Yes, mis- and dis-information are far more of an existential threat than chem or bio weapons, and we know this because we are already seeing the consequences of it. If you refuse to see that, then you are lost.

davehtaylor@beehaw.org · 9 months ago

“Bad guys are going to do bad things, so we shouldn’t even bother trying to do anything to make things better, and just let the dystopia happen” is not the answer.