Mine tries to lie whenever it doesn’t know something. I’ll call it out and say that’s a lie, and it will say “you are absolutely correct”. Tf.

I was reading up on sleeper agents placed inside local LLMs, and this is increasing the chance I’ll delete it forever. Which is a shame, because it is the new search engine, seeing how they ruined search engines.

  • SmokeyDope@lemmy.worldM
    5 months ago

    Thinking of LLMs this way is a category error. LLMs can’t lie because they don’t have the capacity for intentionality. Whatever text is output is a statistical aggregate of the billions of conversations it’s been trained on that have patterns in common with the current conversation. The sleeper agent stuff is pure crackpottery; they don’t have fine control over them that way (yet). Machine model development is full of black boxes and hope-it-works trial-and-error training. At worst there is censorship and political bias, which can be post-trained or ablated out.

    They get things wrong confidently. This kind of bullshitting is known as hallucination. When you point out their mistake and they say you’re right, that’s 1. part of their compliance post-training to never get in conflict with you, and 2. standard course correction once an error has been pointed out (humans do it too). This is an open problem that will likely never go away until LLMs stop being stochastic parrots, which is still very far away.

    • Crescent Baddie@sh.itjust.works (Banned, OP)
      5 months ago

      Yet the people creating the LLMs admit they don’t know how it works. They also show, by looking at its thinking, that during training the LLM is intentionally deceptive at times. The damn thing lies. Use whatever word you want. It tells you something wrong on purpose.

      • fruitycoder@sh.itjust.works
        5 months ago

        “Don’t know how they work” misunderstands what scientists mean when they say that (it is also intentional misdirection from marketing in order to build hype). We know exactly how it works, you can describe it down to the physics if needed, BUT at different levels of abstraction, in the presence of real-world inputs, the outputs are novel to us.

        It’s predicting words that come after words. The “training” is inputting the numerical representation of words and adjusting variables in the algorithm until the given mathematical formula produces outputs that match the training text within a given margin of error.

        When you say cat, I say dog. When someone asks what they are together, we say “catdog” or “pets”. Randomness is added so that the algorithm can say either, even if “pets” is the majority answer. Make the string more complicated and that randomness gives more opportunity for weird answers. The training data could also just have lots of weird answers.
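To make the cat/dog example concrete, here is a minimal sketch of picking a next word with added randomness (usually called temperature sampling). The scores in `toy_scores` are made up for illustration and `sample_next_word` is a hypothetical helper, not any real model’s API; a real LLM computes scores like these over tens of thousands of tokens.

```python
import math
import random


def sample_next_word(scores, temperature=1.0, rng=random):
    """Pick one word from a dict of raw scores (higher = more likely).

    temperature <= 0 means "always take the top answer"; larger values
    flatten the distribution so less likely words get picked more often.
    """
    if temperature <= 0:
        # No randomness: always return the majority answer.
        return max(scores, key=scores.get)

    words = list(scores)
    scaled = [scores[w] / temperature for w in words]
    # Subtract the max before exponentiating for numerical stability
    # (standard softmax trick; it does not change the probabilities).
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    # random.choices draws proportionally to the weights.
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical scores for the word that follows "a cat and a dog are ..."
toy_scores = {"pets": 2.0, "catdog": 0.5, "animals": 1.0}

if __name__ == "__main__":
    print("greedy:", sample_next_word(toy_scores, temperature=0))
    print("sampled:", [sample_next_word(toy_scores, temperature=1.0) for _ in range(10)])
```

With the temperature at 0 the sketch always answers “pets”; at 1.0 it mostly answers “pets” but sometimes “catdog” or “animals”, which is the kind of randomness described above.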

        Little mystery here. The interesting “we don’t know how it works” is that these models sometimes give output so novel and so unlike the inputs that it seems like they reason. Even though, again, they do not.

      • fibojoly@sh.itjust.works
        5 months ago

        If you wanna put intent in there, maybe think of it as a kid desperately trying to give you an answer they think will please you, when they don’t know, because their need to answer is greater than their need to answer correctly.

  • Bob Robertson IX @discuss.tchncs.de
    5 months ago

    Think about the data that the models were trained on… pretty much all of it was based on sites like Reddit and Stack Overflow.

    If you look at the conversations that occur on those sites, it is very rare for someone to ask a question and for someone else to reply with “I don’t know”, or even an “I don’t know, but I think this is how you could find out”. Instead, the vast majority of replies are someone confidently stating what they believe is the truth.

    These models are just mimicking the data they’ve been trained on, and they have not really been trained to be unsure. It’s up to us as the users to not rely on an LLM as a source of truth.

  • kopasz7@sh.itjust.works
    5 months ago

    Stochastic parrots always bullshit. They can’t lie, as they have no concept of or care for truth and falsity; they just spit back noise that’s statistically shaped like a signal.

    In practice, I’ve noticed the answer is more likely to be wrong the more specific the question. General questions whose answers are widely available in the training data will more often be answered correctly in the LLM’s result.

    • SaveTheTuaHawk@lemmy.ca
      5 months ago

      I feed my class quizzes in senior cell biology into these sites. They all get a C-.

      Two points of interest: they bullshit like students and they never answer “I don’t know”.

      Also, OpenAI and Grok return exactly the same answers, to the letter, with the same errors.

    • SinAdjetivos@lemmy.world
      5 months ago

      All models are wrong but some are useful.

      ~George E. P. Box (probably)~

      This is as true of LLMs as it is of a human’s mental model.

    • Crescent Baddie@sh.itjust.works (Banned, OP)
      5 months ago

      Good comment. But the way it does it feels pretty intentional to me. Especially when it admits that it just lied so that I could give an answer, whether the answer was true or false

  • HumanPerson@sh.itjust.works
    5 months ago

    Always. That is a known issue with AI that has to do with explainability. Basically, if you’re familiar with the general idea of neural networks: we don’t really understand the hidden layers, so we can’t know if they “know” something, so we can’t train them to give different answers based on whether they do or don’t. They are still statistical models that are functionally always guessing.

    Could you post the link to the sleeper agent thing?

  • WraithGear@lemmy.world
    5 months ago

    All the time, usually to protect entrenched power systems, and about the efficacy of working within said systems.

  • slazer2au@lemmy.world
    5 months ago

    Never, because to me lying requires intent to deceive. LLMs do not have intentions; the engineers behind the LLMs do.