Hello, is this IT Support? Help! My monitor ate my mouse!

Aug 18, 2025

We were creating demo images for a dummy retail website to populate a Solus demo instance. While we were generating images of electronic equipment, this popped up:

[Image: the AI-generated product photo in question]

So, obviously, we leaned into it and decided to add a monitor to the product assortment. We are a VERY SERIOUS organization, after all.

[Image: the monitor, added to the demo product assortment]

But this little bit of hilarity dovetails nicely with the topic of today’s blog post: when you’re using generative AI tools at scale, how do you deal with the possibility that they might be wrong?

At the end of the day, large AI models are essentially what some researchers refer to as “stochastic parrots”. The term captures two essential features of these models: one, they find patterns in large swaths of data and use those patterns to predict the next most likely word to print on the screen; and two, they produce fluent, coherent, human-like responses because they are imitating what they have seen. Had someone trained a model only on the collected works of Shakespeare, the answer to “Is a neural collaborative filtering model better than simply listing the trending products when it comes to recommendations?” might well have been “That is the question.”

This is not to diss large AI models (I come to praise them, not bury them). More often than not, in the scenarios where we have tested them, they produce reliable, very sensible results. But what of the instances when they produce plausible-sounding but wrong answers?

Here are some points to ponder:

  • Keep a human in the loop: If you have an application/agent that interprets your requirements and prescribes a decision based on the data, add a step where the decision and the rationale are reviewed by the end-user. A simple example is an agent to generate ideas for marketing campaigns. Let it do its job, but have a CRM leader sign off on actually sending the campaign out.
  • Show and tell: This is a corollary to the one above. If an AI system is used to generate conclusions based on data, try to add some support to the conclusion. Charts are a good example – people grasp a picture faster than a table, and can more easily confirm whether the conclusions drawn are sensible. Providing a chain of thought that explains the reasoning is also useful.
  • Speak and listen: I’m sure you’ve encountered instances where you’ve asked an LLM a question and it’s given you a wrong answer. Now, you can either just dismiss the answer and go on your merry way, or you can give it feedback on where it’s gone wrong. A lot of the time, it comes back with the right answer.
  • Add algorithmic sanity checks: Sometimes, when you’re using an AI model to draw conclusions from numeric as well as text data, you can sniff-test those conclusions with some basic quantitative analysis of the underlying numbers. This lets you check the premises on which the model’s conclusions rest (a minimal sketch follows this list).
  • Maker-checker systems: If you have one LLM producing an output, have a second LLM cross-check it. This is especially useful when you wish to generate code using AI tools – basically, deploy an AI-based QA agent as well (sketched after this list).
  • Learn to say no: The fundamental premise behind automating nearly any cognitive process in a workplace is this: the majority of the problems encountered fall into a small set of standard problem types. Which means you need to either scope out the minority that remains, or know when to pass the problem on to a human being. For instance, if you use an agent to understand a natural language query and automatically construct and run SQL queries on your relational database, you’re assuming that the vast majority of user queries fall into standard patterns. But if someone asks a seriously complex question, you need a monitoring mechanism to recognize when the query is too complex to be reliably constructed, or sometimes just too complex to run without further optimization. So add a layer to identify when you’re better off just saying, “Hey, we’ll get back to you.” (A sketch of such a guard follows this list.)
  • Don’t toss out the old stuff: There are problems (time series forecasting, predictive modeling, etc.) that are still best solved using classical ML techniques. These techniques have been designed to tease out complex nonlinear patterns in the data and generate predictions accordingly. Now, it is theoretically possible to give an LLM a time-stamped series of observations and ask it to predict what comes next, but why would you want to do that? (A quick classical-forecasting sketch follows this list.)
  • Iterate, iterate, iterate: It’s like with most products. Your first version ain’t gonna do everything, or do everything right. So assume that you will have to iterate: build better and more detailed prompts, add cut-outs to handle edge cases, and so on.
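
For the algorithmic sanity checks point, here is a minimal sketch in Python. The claim being checked, the column names, and the tolerance are all illustrative assumptions; the idea is simply to recompute the headline number from the raw data before trusting the model’s narrative about it.

```python
import pandas as pd

def sanity_check_growth_claim(df: pd.DataFrame, claimed_growth_pct: float,
                              tolerance_pct: float = 2.0) -> bool:
    """Recompute month-over-month revenue growth from the raw data and
    compare it with the figure the model quoted in its conclusion."""
    monthly = df.groupby("month", sort=True)["revenue"].sum()
    actual_growth_pct = (monthly.iloc[-1] / monthly.iloc[-2] - 1) * 100
    return abs(actual_growth_pct - claimed_growth_pct) <= tolerance_pct

# Suppose the model concluded "revenue grew about 12% last month".
sales = pd.DataFrame({
    "month":   ["2025-06", "2025-06", "2025-07", "2025-07"],
    "revenue": [100.0, 140.0, 130.0, 150.0],
})
if not sanity_check_growth_claim(sales, claimed_growth_pct=12.0):
    print("Flag for review: the claimed growth doesn't match the raw numbers.")
```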
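
For the maker-checker idea, a bare-bones loop might look like the sketch below. Here call_llm() is a stand-in for whichever model API you actually use (it is not a real library function), and the APPROVED/feedback convention is an assumption for illustration.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: wire this to your LLM provider of choice."""
    raise NotImplementedError

def maker_checker(task: str, max_rounds: int = 3) -> str:
    """One model drafts the output, a second model reviews it; loop until
    the checker approves or we give up and escalate to a human."""
    feedback = ""
    for _ in range(max_rounds):
        draft = call_llm(
            f"Task: {task}\nReviewer feedback so far: {feedback}\n"
            "Produce the output for this task."
        )
        review = call_llm(
            "You are a QA reviewer. Reply APPROVED if the output below solves "
            "the task correctly; otherwise list the problems.\n"
            f"Task: {task}\nOutput:\n{draft}"
        )
        if review.strip().upper().startswith("APPROVED"):
            return draft
        feedback = review
    raise RuntimeError("Checker never approved; route the task to a human.")
```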
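
And for learning to say no, the guard in the NL-to-SQL example could start out as simply as this. The thresholds and regex heuristics are illustrative assumptions (in practice you might lean on the database’s EXPLAIN output instead), and run_query() is a hypothetical execution helper.

```python
import re

MAX_JOINS = 3        # assumed thresholds, to be tuned per workload
MAX_SUBQUERIES = 1

def run_query(sql: str) -> str:
    """Placeholder: execute the SQL against your database and format the result."""
    raise NotImplementedError

def too_complex(sql: str) -> bool:
    """Cheap structural heuristics on the SQL the agent generated."""
    joins = len(re.findall(r"\bjoin\b", sql, flags=re.IGNORECASE))
    subqueries = sql.lower().count("select") - 1
    return joins > MAX_JOINS or subqueries > MAX_SUBQUERIES

def answer_or_defer(question: str, generated_sql: str) -> str:
    if too_complex(generated_sql):
        # Log it for an analyst to pick up, and be honest with the user.
        return "Hey, we'll get back to you."
    return run_query(generated_sql)
```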
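
Finally, on not tossing out the old stuff: the sketch below fits a classical Holt-Winters model from statsmodels to a made-up monthly series. The data and parameters are purely illustrative; the point is that this approach is cheap, deterministic, and auditable in a way that asking an LLM to guess what comes next is not.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Three years of synthetic monthly observations with a trend and yearly seasonality.
rng = np.random.default_rng(0)
months = pd.date_range("2022-01-01", periods=36, freq="MS")
values = (100 + 2 * np.arange(36)
          + 10 * np.sin(2 * np.pi * np.arange(36) / 12)
          + rng.normal(0, 3, 36))
series = pd.Series(values, index=months)

# Additive trend and seasonality; forecast the next six months.
model = ExponentialSmoothing(series, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
print(model.forecast(6))
```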

Everyone who hasn’t been living under a rock this past decade is curious about what generative AI can do for them. This is good news, because you’ll get more traction when you give them something to play with. But with great curiosity come great expectations, so tell them that it won’t be perfect, but will keep getting better. Over time, users will also calibrate both their expectations and their own guardrails when it comes to using AI-driven systems.
