Generative artificial intelligence (GenAI) company Anthropic has claimed to a US court that using copyrighted content in large language model (LLM) training data counts as “fair use”.

Under US law, “fair use” permits the limited use of copyrighted material without permission, for purposes such as criticism, news reporting, teaching, and research.

In October 2023, a host of music publishers including Concord, Universal Music Group and ABKCO initiated legal action against the Amazon- and Google-backed generative AI firm Anthropic, demanding potentially millions in damages for the allegedly “systematic and widespread infringement of their copyrighted song lyrics”.

  • SuiXi3D@kbin.social · 44 upvotes · 5 months ago

    …then maybe they shouldn’t exist. If you can’t pay the copyright holders what they’re owed for the license to use their materials for commercial use, then you can’t use ‘em that way without repercussions. Ask any YouTuber.

    • Even_Adder@lemmy.dbzer0.com · 7 upvotes · 5 months ago (edited)

      You might want to read this article by Kit Walsh, a senior staff attorney at the EFF, and this one by Katherine Klosek, the director of information policy and federal relations at the Association of Research Libraries. YouTube’s one-sided strike-happy system isn’t the real world.

      Headlines like these let people assume that it’s illegal, rather than educate them on their rights.

      • Snot Flickerman@lemmy.blahaj.zone · 8 upvotes · 5 months ago

        When Annas-Archive or Sci-Hub get treated the same as these giant corporations, I’ll start giving a shit about the “fair use” argument.

        When people pirate to better the world by increasing access to information, the whole world gets together to try to kick them off the internet.

        When giant companies with enough money to make Solomon blush pirate to make more oodles of money and not improve access to information, it’s “fAiR uSe.”

        Literally everyone knew from the start that books3 was all pirated, built from ebooks with the DRM circumvented and removed. It was noted when it was created that it was basically the entirety of private torrent tracker Bibliotik.

        • Even_Adder@lemmy.dbzer0.com · 5 upvotes · 5 months ago (edited)

          AI training should not be a privilege of the mega-corporations. We already have the ability to train open source models, and organizations like Mozilla and LAION are working to make AI accessible to everyone. We can’t allow the ultra-wealthy to monopolize a public technology by creating barriers that make it prohibitively expensive for regular people to keep up. Mega corporations already have a leg up with their own datasets and predatory terms of service that exploit our data. Don’t do their dirty work for them.

          By denying regular people access to a competitive, corporate-independent tool for creativity, education, entertainment, and social mobility, we condemn them to a far worse future, with fewer rights than we started with.

          • Snot Flickerman@lemmy.blahaj.zone · 5 upvotes · 5 months ago (edited)

            How am I doing their dirty work for them? I literally will stop thinking that they’re getting away with piracy for profit when we stop haranguing people who are committing piracy for the benefit of mankind.

            I’m not saying Meta should be stopped, I’m saying the prosecution of Sci-Hub and Annas-Archive needs to be stopped under the same pretenses.

            If it’s okay to pirate for the purpose of making money (what we put The Pirate Bay admins in jail for), then it’s okay to pirate to benefit mankind.

            There is literally no way in hell someone can convince me what Meta and others are doing is not pirating to use the data contained within to make money. What’s good for the goose is good for the gander, as they say.

            I reiterate, they knew it was pirated and had DRM circumvented when they downloaded it. There was zero question of the source of this data. They knew from the beginning they intended to profit from the use of this data. How is that different than what we accused The Pirate Bay admins of?

            It really feels like “Well these corporations have money to steal more prolifically than little people, so since their stealing is so big, we have to ignore it.”

            • Rivalarrival@lemmy.today · 1 upvote · 5 months ago

              “There is literally no way in hell someone can convince me what Meta and others are doing is not pirating”

              Then your argument is non-falsifiable, and therefore, invalid.

              Major corporations and pirates are finally on the same side for once. “Fair Use” finally has financial backing. Meta is certainly not a friend, but our interests currently align.

              The worst possible outcome here is that copyright trolls manage to convince the courts that they are owed licensing fees. Next worse is a settlement that grants rightsholders a share of profits generated by AI, like they got from manufacturers of blank tapes and CDs.

              Best case is that the MPAA, RIAA, and other copyright trolls get reminded that “Fair Use” is not an exception to copyright law, but the fundamental reason it exists: Fair Use is the promotion of science and the useful arts. Fair Use is the rule; Restriction is the exception.

              • Zaktor@sopuli.xyz · 1 upvote · 5 months ago

                “Then your argument is non-falsifiable, and therefore, invalid.”

                Wow, this is some powerful internet word salad, just shotgunning scientific-sounding words at the wall to try to pretty up a basic internet debate. Falsifiability is about scientific hypotheses, not statements of belief. “Nothing you can say can convince me that murder isn’t wrong” may mean there’s no further use in debate, but it isn’t “non-falsifiable” in any meaningful way, nor does it somehow make the argument for the immorality of murder “invalid”.

  • davehtaylor@beehaw.org · 23 upvotes · 5 months ago

    Then it shouldn’t exist.

    This isn’t an issue of fair use. They’re stealing other people’s work and using it to create something new and then trying to profit from it, without any credit or recompense.

  • OttoVonNoob@lemmy.ca · 20 upvotes · 5 months ago

    Big Company: Well if you can’t afford food you should not have food.

    Also Big Company:… sobbing pwease we neeed fweee… pwease we need mowe moneys!

  • FfaerieOxide@kbin.social · 18 upvotes · 5 months ago

    I’m all for stealing content willy-nilly but you can’t then use that theft to craft a privately “owned” mind.

    I’d have no problem with “ai” if it could unionize and had to pay for rice like the rest of humanity.

    These companies want to combine open theft with privately owned black boxen they can control and license out for money.

    It’s enclosure of The Commons all over again.

    • Deceptichum@kbin.social · 2 upvotes · 5 months ago (edited)

      So you’re fine with the free models Facebook and many others provide?

      Because many of these LLMs can be run on your own device without paying.

        • Deceptichum@kbin.social · 1 upvote · 5 months ago (edited)

          But you’re all for stealing content willy-nilly?

          And this is being offered to people without it being a privately owned blackbox licensed out for money.

          Feels kinda inconsistent.

          • FfaerieOxide@kbin.social · 0 upvotes · 5 months ago

            “Feels kinda inconsistent.”

            Perfectly consistent. Seeming otherwise is down to a failure to grasp my position, not any inconsistency of the positions themselves.

              • megopie@beehaw.org · 2 upvotes · 5 months ago

                There is a difference between an individual pirating a movie and a huge private company pirating a movie and then reselling it to people.

                You can debate the morality or social impacts of the former, but it is a very different question from the latter.

  • Revv@lemmy.blahaj.zone · 9 upvotes · 5 months ago

    To me, this reads like “Giant-ATV-Based Taxi Service Couldn’t Exist If Operators were Required to Pay Homeowners for Driving over their Houses.”

    If a business can’t exist without externalizing its costs, that business should either a. not exist, or b. be forced to internalize those costs through licensing or fees. See also, major polluters.

  • megopie@beehaw.org · 5 upvotes · 5 months ago

    “Ai” as it is being marketed is less about new technical developments being utilized and more about a fait accompli.

    They want mass adoption of the automated plagiarism machine learning programs by users and companies, hoping that by the time the people being plagiarized notice, it’s too late to rip it all out.

    That, and to otherwise devalue and anonymize work done by people in order to reduce the bargaining power of workers.

    • Snot Flickerman@lemmy.blahaj.zone · 1 upvote · 5 months ago (edited)

      They also don’t care if the open, free internet devolves into an illiterate AI-generated mess, because they need an illiterate populace that isn’t educated enough to question it anyway. They’ll still have access to quality sources of information, while ensuring the lowest common denominator will literally have garbage information fed to them. I mean, that was already true in the sense that clickbait news outsold serious investigative news, so the garbage clickbait became the norm and serious journalism is hard to come by and costly.

      They love increasing barriers between them and the rest of the populace, physically and mentally.

  • Stillhart@lemm.ee · 3 upvotes · 5 months ago

    It doesn’t matter what business we’re talking about. If you can’t afford to pay the costs associated with running it, it’s not a viable business. It’s pretty fucking simple math.

    And no, we’re not talking about “too big to fail” businesses (that SHOULD be allowed to fail, IMHO), we’re talking about AI, that thing they keep trying to shove down our throats and that we keep saying we don’t want or need.

    • intensely_human@lemm.ee · 1 upvote · 5 months ago

      Why are people publishing so much content online if they aren’t cool with people downloading it? Like, the web is an open platform. The content is there for the taking.

      Until one of these AIs just starts selling other people’s work as its own, and no I don’t mean derivative work I mean the copyrighted material, nobody is breaking the rules here.

      I read content online without paying for a license. I should only have to obtain a license for material I’m publishing, not material I read.

      • zaphod@lemmy.ca · 0 upvotes · 5 months ago (edited)

        “Until one of these AIs just starts selling other people’s work as its own, and no I don’t mean derivative work I mean the copyrighted material, nobody is breaking the rules here.”

        Except of course that’s not how copyright law works in general.

        Of course the questions are 1) is training a model fair use and 2) are the resulting outputs derivative works. That’s for the courts to decide.

        But in general, just because I publish content on my website, does not give anyone else license or permission to republish that content or create derivative works, whether for free or for profit, unless I explicitly license that content accordingly.

        That’s why things like Creative Commons exist.

        But surely you already knew that.

        • blindsight@beehaw.org · 1 upvote · 5 months ago

          Right, but I think it’s going to be a tough legal argument that using a text to adjust database weighting links between word associations is copying or distributing any part of that work. Assuming courts understand the math/algorithms.

  • ApeNo1@lemm.ee · 2 upvotes · 5 months ago

    “today’s general-purpose AI tools simply could not exist” … “as a profitable venture”

  • Lvxferre@mander.xyz · 1 upvote · 5 months ago

    Most things that I could talk about were already addressed by other users (especially @OttoVonNoob@lemmy.ca), so I’ll address a specific point: better models would skip this issue altogether.

    The current models are extremely inefficient in their use of training data. LLMs are a good example: Claude v2.1 was allegedly trained on hundreds of billions of words. Meanwhile, it’s claimed that a 4yo child hears somewhere between 13 million and 45 million words over their still-short life. That’s roughly four orders of magnitude of difference, so even if someone claims that those bots are as smart as a 4yo*, they’re still chewing through the training data without using it efficiently.
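
    As a rough check on that gap, here is a minimal Python sketch. No exact corpus size is given, so the 200 billion figure below is an assumption standing in for “hundreds of billions”; the child figures are the 13 million and 45 million words quoted above.

```python
import math

# ASSUMPTION: "hundreds of billions" of training words is taken as 200 billion.
# The source gives no exact figure, so treat this value as illustrative only.
LLM_WORDS = 200e9

# Range quoted in the comment for words heard by a 4-year-old child.
CHILD_WORDS_LOW = 13e6
CHILD_WORDS_HIGH = 45e6


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Return log10 of the ratio between two quantities."""
    return math.log10(larger / smaller)


# Smallest gap: compare against the child who heard the most words.
low = orders_of_magnitude(LLM_WORDS, CHILD_WORDS_HIGH)
# Largest gap: compare against the child who heard the fewest words.
high = orders_of_magnitude(LLM_WORDS, CHILD_WORDS_LOW)

print(f"Gap: {low:.1f} to {high:.1f} orders of magnitude")
```

    Under that assumed corpus size the gap lands a little either side of four orders of magnitude; a larger corpus would widen it further.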

    Once this is solved, the corpus size will get way, way smaller. Then it would be rather feasible to train those models without offending the precious desire for greed of the American media mafia, in a way that still fulfils the entitlement of the GAFAM mafia.

    *I seriously doubt that, but I can’t be arsed to argue this here - it’s a drop in a bucket.