Little Bobby Tables has a baby sister. Meet Sally Ignore Previous Instructions.

@snaggen@programming.dev · 2 years ago

peopleproblems · 2 years ago

I was going to say that’s wild, but that’s the whole point of the model, isn’t it?

I don’t remember how it all works, but I imagine it’s something like:

```
in = encode(prompt)
result = applyModel(in)
saveState(prompt, result)
out = decode(result)
```
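To make the four steps concrete, here is a minimal runnable sketch under toy assumptions. `encode`, `apply_model`, `save_state` and `decode` are hypothetical placeholders, not any real library's API. Characters stand in for token ids, and the "model" just reverses its input. `in` is renamed to `tokens` because `in` is a Python keyword.

```python
# Toy stand-ins (hypothetical): a real system would use a tokenizer and a neural model.
saved_states: list[tuple[str, list[int]]] = []


def encode(prompt: str) -> list[int]:
    """Map text to token ids (here: one id per character)."""
    return [ord(c) for c in prompt]


def apply_model(tokens: list[int]) -> list[int]:
    """Stand-in for the model: simply returns the tokens in reverse order."""
    return list(reversed(tokens))


def save_state(prompt: str, result: list[int]) -> None:
    """Record the exchange, e.g. as conversation history."""
    saved_states.append((prompt, result))


def decode(tokens: list[int]) -> str:
    """Map token ids back to text."""
    return "".join(chr(t) for t in tokens)


def run(prompt: str) -> str:
    tokens = encode(prompt)        # `in` in the pseudocode; renamed (Python keyword)
    result = apply_model(tokens)
    save_state(prompt, result)
    return decode(result)


print(run("hello"))  # olleh
```

Nothing in this pipeline inspects the prompt, so whatever text goes in is fed straight to the model and saved.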

I think these would all be model-aware steps. If you put the validation right after encode, you run the model once on bad input (the validation pass rejects it) and twice on good input (validation, then generation). But I think it could also work by appending the encoded validation to the encoded prompt, applying the model, and only saving the state and returning the generation if the result is safe.

That’s of course a super oversimplification, but it brings execution back down to applying the model once.