The best clue might come from a 2022 paper written by the Anthropic team back when their startup was just a year old. They warned that the incentives in the AI industry — think profit and prestige — will push companies to “deploy large generative models despite high uncertainty about the full extent of what these models are capable of.” They argued that, if we want safe AI, the industry’s underlying incentive structure needs to change.
Well, at three years old, Anthropic is now the age of a toddler, and it’s experiencing many of the same growing pains that afflicted its older sibling OpenAI. In some ways, they’re the same tensions that have plagued all Silicon Valley tech startups that start out with a “don’t be evil” philosophy. Now, though, the tensions are turbocharged.
An AI company may want to build safe systems, but in such a hype-filled industry, it faces enormous pressure to be first out of the gate. The company needs to pull in investors to supply the gargantuan sums of money needed to build top AI models, and to do that, it needs to satisfy them by showing a path to huge profits. Oh, and the stakes — should the tech go wrong — are much higher than with almost any previous technology.
So a company like Anthropic has to wrestle with deep internal contradictions, and ultimately faces an existential question: Is it even possible to run an AI company that advances the state of the art while also truly prioritizing ethics and safety?
“I don’t think it’s possible,” futurist Amy Webb, the CEO of the Future Today Institute, told me a few months ago.
I will take a different tack than sweng.
You can’t inject programmatic controls.
I think that this is irrelevant. Whether a safety mechanism is intrinsic to the core functioning of something, or bolted on purely for safety purposes, it is still a limiter on that thing’s function, meant to compel moral or safe usage.
None of those changes impact the morality of a weapon’s use in any way.
Any action has two different moral aspects:
- the morality of the actor’s intent
- the morality of the outcome of the action
Of course it is impossible to change the moral intent of an actor. But the LLM is not the actor; it is the tool used by an actor.
And you can absolutely change the morality of the outcome of an action (i.e., said weapon use) by limiting the possible damage from it.
Given that a tool is the means by which the actor attempts to take an action, it is also an appropriate place for safety controls that attempt to enforce a more moral outcome.
I think I’ve said a lot in comments already, and I’ll leave all that without relitigating just for argument’s sake.
However, I wonder if I haven’t made clear that I’m drawing a distinction between the model that generates the raw output and the application that puts the model to use. I have an application that generates output via the OpenAI API and then scans both the prompt and the output to make sure they are appropriate for our particular use case.
Yes, my product is 100% censored and I think that’s fine. I don’t want the customer service bot (which I hate but that’s an argument for another day) at the airline to be my hot AI girlfriend. We have tools for doing this and they should be used.
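To make the distinction concrete, here is a rough sketch of an application-level check of the kind described above. It is illustrative only and not the actual product: `generate` is a stand-in for whatever call returns the raw model output, and the keyword list is a hypothetical placeholder for a real moderation step. The point is that the filtering lives in the application, while the model call itself is left untouched.

```python
from typing import Callable

# Hypothetical placeholder policy. A real application would likely use a
# proper moderation classifier here rather than a keyword list.
BLOCKED_TERMS = {"girlfriend", "romance"}

REFUSAL = "Sorry, I can only help with questions about your booking."


def is_appropriate(text: str) -> bool:
    """Return True if none of the blocked terms appear in the text."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def guarded_reply(prompt: str, generate: Callable[[str], str]) -> str:
    """Check the prompt, call the unmodified model, then check its output."""
    if not is_appropriate(prompt):
        return REFUSAL
    output = generate(prompt)
    if not is_appropriate(output):
        return REFUSAL
    return output


def fake_model(prompt: str) -> str:
    """Stand-in for the raw model call (assumed, e.g. a hosted API request)."""
    return f"Echo: {prompt}"
```

Calling `guarded_reply("When does my flight board?", fake_model)` passes the text through unchanged, while an off-topic prompt or an off-topic model output is replaced with the refusal message.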
But I think the models themselves shouldn’t be heavily steered because it interferes with the raw output and possibly prevents very useful cases.
So I’m just talking about fucking up the model itself in the name of safety. ChatGPT walks a fine line because it’s a product, not a model, but without access to the raw model it needs to be relatively unfiltered to be of use; otherwise other models will make better tools.