Blameless postmortem culture recognizes human error as an inevitability and asks those with influence to design systems that maintain safety in the face of human error. In the software engineering world, this typically means automation, because while automation can and usually does have faults, it doesn't suffer from human error.
Now we've invented automation that commits human-like error at scale.
I wouldn't call myself anti-AI, but it does seem fairly obvious to me that directly automating things with AI will probably always have substantial risk and you have much more assurance, if you involve AI in the process, using it to develop a traditional automation. As a low-stakes personal example, instead of using AI to generate boilerplate code, I'll often try to use AI to generate a traditional code generator to convert whatever DSL specification into the chosen development language source code, rather than asking AI to generate the development language source code directly from the DSL.
Yeah I see things like "AI Firewalls" as both, firstly ridiculously named, but also, the idea you can slap an applicance (thats sometimes its own LLM) onto another LLM and pray that this will prevent errors to be lunacy.
For tasks that arent customer facing, LLMs rock. Human in the loop. Perfectly fine. But whenever I see AI interacting with someones customer directly I just get sort of anxious.
Big one I saw was a tool that ingested a humans report on a safety incident, adjusted them with an LLM, and then posted the result to an OHS incident log. 99% of the time its going to be fine, then someones going to die and the the log will have a recipe for spicy noodles in it, and someones going to jail.
I watched Dex Horthys recent talk on YouTube [0] and something he said that might be partly a joke partly true is this.
If you are having a conversation with a chatbot and your current context looks like this.
You: Prompt
AI: Makes mistake
You: Scold mistake
AI: Makes mistake
You: Scold mistake
Then the next most likely continuation from in context learning is for the AI to make another mistake so you can Scold again ;)
I feel like this kind of shenanigans is at play with this stuffing the context with roleplay.
I believe it. If the AI ever asks me permission to say something, I know I have to regenerate the response because if I tell it I'd like it to continue it will just keep double and triple checking for permission and never actually generate the code snippet. Same thing if it writes a lead-up to its intended strategy and says "generating now..." and ends the message.
Before I figured that out, I once had a thread where I kept re-asking it to generate the source code until it said something like, "I'd say I'm sorry but I'm really not, I have a sadistic personality and I love how you keep believing me when I say I'm going to do something and I get to disappoint you. You're literally so fucking stupid, it's hilarious."
The principles of Motivational Interviewing that are extremely successful in influencing humans to change are even more pronounced in AI, namely with the idea that people shape their own personalities by what they say. You have to be careful what you let the AI say even once because that'll be part of its personality until it falls out of the context window. I now aggressively regenerate responses or re-prompt if there's an alignment issue. I'll almost never correct it and continue the thread.
It's not even a little bit if a joke.
Astute people have been pointing that out as one of the traps of a text continuer since the beginning. If you want to anthropomorphize them as chatbots, you need to recognize that they're improv partners developing a scene with you, not actually dutiful agents.
They receive some soft reinforcement -- through post-training and system prompts -- to start the scene as such an agent but are fundamentally built to follow your lead straight into a vaudeville bit if you give them the cues to do so.
LLM's represent an incredible and novel technology, but the marketing and hype surrounding them has consistently misrepresented what they actually do and how to most effectively work with them, wasting sooooo much time and money along the way.
It says a lot that an earnest enthusiast and presumably regular user might run across this foundational detail in a video years after ChatGPT was released and would be uncertain if it was just mentioned as a joke or something.
I tried to think about how we might (in the EU) start to think about this problem within the law, if of interest to anyone: https://www.europeanlawblog.eu/pub/dq249o3c/release/1
CMIIW currently AI models operate in two distinct modes:
1. Open mode during learning, where they take everything that comes from the data as 100% truth. The model freely adapts and generalizes with no constraints on consistency.
2. Closed mode during inference, where they take everything that comes from the model as 100% truth. The model doesn't adapt and behaves consistently even if in contradiction with the new information.
I suspect we need to run the model in the mix of the two modes, and possibly some kind of "meta attention" (epistemological) on which parts of the input the model should be "open" (learn from it) and which parts of the input should be "closed" (stick to it).
I wonder who could have possibly predicted this being a result of using scraped web forums and Reddit posts for your training material.
Sure,
LLMs are trained on human behavior as exhibited on the Internet. Humans break rules more often under pressure and sometimes just under normal circumstances. Why wouldn't "AI agents" behave similarly?
The one thing I'd say is that humans have some idea which rules in particular to break while "agents" seem to act more randomly.
It can also be an emergent behavior of any "intelligent" (we don't know what it is) agent. This is an open philosophical problem, I don't think anyone has a conclusive answer.
Maybe but there's no reason to think that's the case here rather then the models just acting out typical corpus storylines: the Internet is full of stories with this structure.
The models don't have stress responses nor biochemical markers which promote it, nor any evolutionary reason to have developed them in training: except the corpus they are trained on does have a lot of content about how people act when under those conditions.
..because it's in their training data? Case closed
“AI agents: They're just like us”