Exploring the Odds: What Will ChatGPT Get Right And Wrong?

by Bill Franks

I recently wrote about how you can think of everything that ChatGPT, or any other generative AI tool, creates as a hallucination. However, even given that generative AI is making up its answers from scratch based on probabilities, it still manages to get a lot of things right.

This led me to ponder … is it possible to know in advance how likely you are to get an accurate answer from ChatGPT? Are there patterns to what it gets right and wrong? Yes! I’ll outline some guidelines here.

How Recent Is Your Topic?

Like all models, ChatGPT was trained using a base of historical data. In ChatGPT’s case, the most recent versions were trained on information that was cut off in late 2021. As a result, ChatGPT will know very little about anything that occurred since the cutoff date of its training data except to the extent that people have provided information through their prompts.

If you want to know facts about World War II, you’re probably going to get decent answers. There are decades of books, articles, and other documents that provide robust and consistent accounts of many key facts about World War II. If you want to know things about the current war in Ukraine, you should be cautious using ChatGPT because the war had not started at the time of the model’s training. In addition, there are many conflicting stories and “facts” being shared about Ukraine. Because the information ChatGPT has received about the war through prompts contains substantive inconsistencies, its answers will too.

How Obscure Is Your Topic?

ChatGPT will do best when providing information on topics with a lot of documentation and where that documentation has a high level of consistency. Going back to our World War II example, you’re likely to get fairly accurate answers to basic questions about Pearl Harbor because that event is well documented. Similarly, if you ask basic questions about Winston Churchill, you’re likely to get pretty good results. However, as you ask about more obscure battles or less famous military figures, the documentation on them will be far less robust and ChatGPT will have much less to go on. As a result, you can expect the answers to be of much lower quality. In general, ChatGPT will do great with popular, widely discussed topics and not so well with less popular, more obscure topics.

Do You Want Facts Or Subjectivity?

Whatever answer ChatGPT gives will necessarily be based upon the contents of the documents it has been trained on. To the extent that you want it to provide a fact, or even an assessment of how many people are on each side of an argument about a fact, it will probably do well. The stories of ChatGPT successfully passing various tests like the Bar Exam or SAT aren’t too surprising since many of the questions in such exams are focused on asking you to remember some key facts.

But what if you want a projection of what might happen in the future? Likely you won’t get very good answers unless there are many examples of people providing similar projections in the training data. For example, asking what the world population will be in 2100 will lead to a good answer because many such projections have been documented. Ask something more obscure such as what the population of a little-known tree frog in Brazil will be in 2100 and ChatGPT will probably provide something nonsensical. The more subjectivity and interpretation your question requires, the worse ChatGPT will do.

Beware Math And Computational Questions

You should also be very cautious when trying to use ChatGPT for computational purposes. Remember that it doesn’t know or understand math. It understands patterns in the text that it has seen. So, if there are many instances of the text “1 + 1 = 2” then ChatGPT will likely tell you “2” when you ask “What is 1 + 1?”. However, it didn’t actually do the math; it simply knew probabilistically that 2 was the most common response. As the questions get more complex, the chances of ChatGPT getting to the right answer go down very quickly.
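To make the difference between recalling a pattern and doing the math concrete, here is a toy sketch in Python. It is not how ChatGPT is built. The tiny "training text" and the function names are invented for illustration. One function answers by picking the answer it has seen most often for a question, and the other actually computes the sum.

```python
from collections import Counter

# Hypothetical toy "training text" (an assumption for illustration only).
TRAINING_LINES = [
    "1 + 1 = 2",
    "1 + 1 = 2",
    "1 + 1 = 2",
    "1 + 1 = 3",  # a noisy, incorrect example in the text
    "2 + 2 = 4",
    "2 + 2 = 4",
]


def build_pattern_table(lines):
    """Count how often each answer appears after each question text."""
    table = {}
    for line in lines:
        question, answer = line.split(" = ")
        table.setdefault(question, Counter())[answer] += 1
    return table


def pattern_answer(table, question):
    """Return the most frequently seen answer, or None if the question was never seen."""
    counts = table.get(question)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def computed_answer(question):
    """Actually do the arithmetic for a question of the form 'a + b'."""
    left, right = question.split(" + ")
    return str(int(left) + int(right))


table = build_pattern_table(TRAINING_LINES)
print(pattern_answer(table, "1 + 1"))      # most common answer seen in the text
print(pattern_answer(table, "137 + 458"))  # never seen, so nothing to recall
print(computed_answer("137 + 458"))        # real arithmetic works regardless
```

The pattern lookup does fine on “1 + 1” because the right answer dominates the text, but it has nothing to draw on for a sum it has never seen. A real language model would not return nothing in that case. It would generate something plausible-looking, which is exactly why caution is warranted.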

When I asked ChatGPT some math questions, it appeared to pass some of them to a calculator app of some sort as opposed to trying to generate the answer, which surprised me. I then found that ChatGPT’s developers have integrated computational engines under the hood. The caution here ties to the prior point about facts versus subjectivity. While you can think of math problems as being factual, it does take a lot of thought to solve more complicated problems even though there is a single, factual answer. Even using a computational engine, ChatGPT still has to pass it the right question. Unless questions structured like yours have been asked and answered many times in the training data, ChatGPT’s answers will have limited and sporadic accuracy.

Making The Best Of ChatGPT

The moral of this story is that knowing the strengths and weaknesses of generative AI tools like ChatGPT is critical to being successful with them. While ChatGPT won’t handle all types of questions with equal success, there are patterns as to when it will perform better and worse. Of course, you can expect these patterns to change as time passes and more improvements to the model are made.

As I’ve suggested in the past, the best way to use ChatGPT is to consider it simply another input into your search for an answer – and an input that you weight no more strongly than any other. If a friend told you their best guess at an answer, you wouldn’t take it as fact. Rather, you’d take it as a starting point to validate. Same with ChatGPT.

Originally posted in the Analytics Matters newsletter on LinkedIn