• Serinus@lemmy.world
    10 months ago

    > the training data is just a statistical record of human bias.

    It’s not. It’s a record of online conversations, which tend to be more polarized and extreme than people are in real life.

    • coffeeauntie@feddit.de
      10 months ago

      That’s why I said:

      > So as long as the training data is well selected for your problem…

      It’s clear that 4chan, Reddit, etc. are over-represented in the training data for LLMs, which explains why ChatGPT might be more awful than an average person. Having an LLM decide on, e.g., college admissions would be like having a Twitter poll decide who should be its next CEO. Like that’s obviously stupid, nobody would ever do that, right?

      The problem is that in the college admission example, the models were trained on previous admission decisions made by college employees, and these models are still biased.