This AI security flaw might be impossible to fix • Graham Cluley

Then it dropped to number 3, and recently I think it dropped to number 6, but it’s still on there because we’re still screwing it up all the time.

And the way that you solve injection is twofold. So the first thing is that you validate all input. So, I don’t mean you sanitize.

I mean that, you know, if you’re supposed to get a date, you check that it is a date, you check that it’s in the range, you check the type, you check the format, you check every single thing about it.

And if anything’s off, you’re “No thanks, try again.” So, once it’s all the things that’s supposed to be, then you either escape or sanitize out special, potentially dangerous characters.

So, if for instance, you need to accept the name O’Malley, which has a single quote in it, then you accept it and then you either sanitize out, or in my case, I always escape.

So you put a backslash in front. Then the second part would be if you’re doing any sort of query language, then you run it through a stored procedure or a prepared statement.

And what that means is you actually choose it as a parameter, which identifies it as data. It says specifically, this is data. It can only be treated as data.

And then you bring it over to the SQL Server, you know, NoSQL, Mongo, whatever you’re using.

And it gets it and it’s “I understand this is only data.” And it does a bunch of magic there, which is escaping. Which is more escaping. And then it runs the thing.

And so with prompt injection, this is a thing that the industry’s really all over, they’re working really hard on it.

And so I was looking up some of the defenses because it’s literally changed over and over and over, each month there’s new things. And so they’re doing some of that.

So they’ll do things they’ll delimit the data, so there’s clear markings when the AI gets it and it’s “these are instructions from the user.” These can only be context and these must follow the rules and you can’t escape out of it.

And there’s multiple different ways that they show that.

They’re also, it sounds weird, but some of them will actually put a weird character in between every single word within what the user uses.

And then it’s if that character’s missing, then you know that this is not legit. It’s been injected. But there’s also sandboxing.

So you take it and you put it in a special place where you’re “we can be dangerous here and we know it’s going to be here.” Then there’s also— so they call it capability reduction.

But what I would say is applying least privilege. And so, you know, do you give every single person where you work in a big secure building a key that goes to every single room?

You probably don’t, right? They probably don’t have the key to the CEO’s private office. So, just only give it access to the things it actually needs.

Does it have to have read/write access to every single database? It probably doesn’t. Another thing they talk about is human in the loop.

So, getting a human being to review and then approve things. But guess how well that works, Graham? Yeah.

Could you review 5 billion requests per day manually where 99.999% of them are fine and they all look the same? Yeah. Do you want that person’s job? I don’t.

Source link