3aIT Blog

 ChatGPT on a mobile phoneWe've now been living with ChatGPT et al for a year or two, and statistics would suggest that most people reading this will have used these systems to some degree in that time. It's pretty clear they are amazing at a lot of things things. However, one of the bigger problems at the moment is that these chat systems will never say "I don't know".

Obviously, when you ask one of these chatbots to do something or the answer to a question, you don't want it to say it doesn't have the answer. Knowing that, this is how they have been trained to behave. It will always confidently give you what you asked for, which is great. Unless it's wrong...

Primary school maths

There's many examples online of these AI chatbots getting very simple things very wrong. From not knowing how many "t"s there are in a word that you've just given it, to it not knowing how many years ago 1995 was. Now that Google often returns an AI answer before any other search results, you may well have seen responses that are clearly not right here stated as facts.

Numbers 1-9 on a red backgroundWe laugh at this sort of thing because we can't believe that something so "clever" is getting something so basic wrong. And then we go and ask it to either do or help us understand something more complicated, and assume it's right.

Our Experience

As a company that specialises in development, we're somewhat at the coalface of this. Writing code is something that these systems are genuinely good at. However, even here, we've seen many examples where it would be much better for it to say "I don't know how to do that" or "What you've asked for makes no sense in this context".

There's times when we've been working on two things at once and accidentally typed a request into the wrong window. Rather than saying "You've asked for changes to Wordpress website code and this project isn't a Wordpress website", we've watched it then merrily go off and change hundreds of lines of code that completely break the system because they make no sense there. Equally, it will happily just guess at processes that don't actually exist because it seems to answer the question rather than saying "I don't have enough training data on that thing to be able to help".

Computer code on a monitorMost of the time, in a coding context, this is more of a nuisance than a big problem. If it's very obviously wrong on sight, you can just undo it all. If it's invented something that won't work, that becomes very clear when you try and run the code and it errors. However, there is a danger when it produces something that works on the surface. It requires a lot of experience to know when you're right and the machine is wrong, and is why we're not even close to the average non-technical person being able to safely produce anything more complex than a basic web page. Would you know where to start looking for security holes? We've had cases where its solution to a login issue was to just quietly remove the need for a password. Sure, that solves the problem, but the danger is clear. Unless you have a skilled person reviewing everything it's doing and stepping in where necessary, you're going to have problems sooner or later.

AI Everywhere

This becomes an interesting challenge as we try and find the right balance as clients approach us to integrate these tools into their systems. How do you safely deploy this when it will happily give you an answer, even when it has no real understanding of what you've just asked, or isn't including the data you think it is when providing that answer?

An incorrect sum on a blackboard (1+1 = 3)The biggest risk here seems to be in getting it to generate straight "factual" statistics based on chat style questions from the user. If you ask how many hours a staff member spent in the office based on login times, or how much money the business has generated this month, it will give you a number even if it has no data to base that on or hasn't really understood the question in the context of your system. These are fundamental business-changing numbers that you could act on based on a confident "hallucination". 

So while that sort of analysis seems just within reach, we've been tending to advise against it for now. Instead, the safer application within a system perspective for now is more language based. Get it to crawl through historical notes or email logs to pull out summaries or intelligent searches of the data. It's a lot clearer whether the answers to these sorts of queries are correct and safe to base decisions on.

This also goes for more general use of ChatGPT, Claude, Copilot et al. Always bear in mind that this seemingly endless source of information has huge blindspots in certain circumstances that it will invent an answer to rather than admitting. Always consider the possble implications if it's done that in response to your query before blindly trusting it's right.