Meditations on Confirmation Bias

Cole Wyeth

July 2022

The Background

Confirmation bias is the tendency to find additional supporting evidence for prior beliefs while ignoring conflicting facts. I suspect that the example which comes to the mind of my contemporary readers is a person who only follows news sources which confirm his political beliefs. This is usually followed by complaints about e.g. Facebook algorithms causing political division, or sometimes a diatribe on the topic of everyone else’s (or sometimes a convenient outgroup’s) stupidity for falling prey to this obvious fallacy. Note that our hypothetical partisan is looking for information from a source which he is quite certain will tell him what he thinks he already knows. He is not trying to be surprised; he is quite sure that his pet ideology (say, Reasonabilism) will be confirmed on Strawman News.[1] This is true, but not necessarily because the tenets of Reasonabilism are as ironclad as he imagines. Strawman News may simply employ a few very deranged pro-Reasonabilism anchors. So to really test Reasonabilism, our partisan could directly go out and check the tenets himself, or at least read some other news sources. That would allow him to be a little more confident that Reasonabilism was actually true, because he could distinguish the two potential causes (actually true, versus some anchors on Strawman News think it’s true).

The term confirmation bias was originally used by Peter Wason to describe effects observed in a series of 1960s experiments. At least one of them (Wason 1960) is quite interesting in its own right. Subjects were presented with a sequence of numbers, 2, 4, 6, and told that it satisfied a particular simple rule the experimenter had in mind. The subjects were supposed to figure out the rule by asking whether other sequences of three numbers satisfied it; they had to come up with these "test cases" themselves. Many of them latched onto the rule "add two" or sometimes "a sequence of even numbers." So they tested sequences like 10, 12, 14, and 5, 7, 9. The trick is that the rule was simply "a sequence of numbers increasing in magnitude." Subjects repeatedly confirmed the rule they had in mind, but they didn’t try to look for cases where they expected it NOT to be satisfied.

Imagine you were trying to determine whether A and B always occur together (that is, whether A ⇔ B is true). In this case, A might be "the numbers increase by increments of two" and B might be "the rule is satisfied." By testing a bunch of sequences increasing by two, you could check that A always implies B (A → B), but this would never be enough to convince you that B implies A (B → A). That is, a sequence of numbers increasing by increments of two always satisfies the rule, but the rule is also satisfied by some sequences which do not increase by increments of two. To properly test the hypothesis that A is the rule, subjects should have generated some cases which did NOT increase by increments of two and checked whether the rule was still satisfied (the contrapositive of B → A is ¬A → ¬B, so a single sequence for which A fails but B holds refutes the hypothesis). For a more in-depth explanation, take a look at Wason’s article; it’s a fun and enlightening read. Anyway, the fallacy Wason uncovered here seems to describe many of the things we would call confirmation bias. I don’t think it can be stretched to include everything; for instance, the term confirmation bias is sometimes used to describe the tendency to interpret evidence as favoring your existing theory, which is actually... maybe even dumber than the mistake Wason is pointing out. But it is certainly an instance of confirmation bias.
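Here is a minimal Python sketch of the same logic. The test triples and the function names `true_rule`, `hypothesis`, and `falsifying` are hypothetical, chosen for illustration rather than taken from Wason's paper: `true_rule` stands in for B (the experimenter's rule), `hypothesis` for A (the subject's "add two" guess), and a test can only refute the guess when the two disagree.

```python
def true_rule(seq):
    """B: the experimenter's rule, 'strictly increasing'."""
    return all(a < b for a, b in zip(seq, seq[1:]))


def hypothesis(seq):
    """A: a subject's guess, 'each number is two more than the last'."""
    return all(b - a == 2 for a, b in zip(seq, seq[1:]))


def falsifying(tests):
    """Return the tests whose answer would refute the guess (A and B disagree)."""
    return [t for t in tests if true_rule(t) != hypothesis(t)]


# Positive tests: chosen so that A holds. They only probe A -> B.
positive_tests = [(10, 12, 14), (5, 7, 9), (100, 102, 104)]
# Negative tests: chosen so that A fails. They probe not-A -> not-B.
negative_tests = [(1, 2, 3), (3, 2, 1), (2, 4, 8)]

print(falsifying(positive_tests))  # []
print(falsifying(negative_tests))  # [(1, 2, 3), (2, 4, 8)]
```

Every positive test gets a "yes", exactly as the guess predicts, so no number of them can expose it. The negative test (3, 2, 1) gets the expected "no" and is uninformative too; it is (1, 2, 3) and (2, 4, 8), sequences that break the add-two pattern but still get a "yes", that show the guess is too narrow.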

Okay, so with our definitions out of the way, let me get to the point. You might be thinking that now that I’ve told you about this little trick, you’d never fall for it yourself. I’m sure you’ll be able to find lots of evidence to confirm that theory! But in the meantime, allow me to ruin your optimism by letting you know right now that the rest of this post is an absolute downer, and everything is way worse than you think.

BAOB Confirmation

I recently read something that forced me to give up a lot of faith in my idol, an algorithm[2] called AIXI that I previously thought was sort of like God, except in a little box outside the universe, looking in and messing around with stuff using a remote controlled robot.

Okay, let’s back up.

So imagine you were watching our universe out of a little porthole. You’re not part of it, you can’t interact with it, you just get to watch. After a while you might start to figure out how things work in our universe. You’d maybe figure out object permanence; if you thought about it hard enough, you might figure out some of the laws of classical physics, etc. Formally, this can be phrased in terms of a sequence prediction task. It’s the same sort of activity as guessing the next number in a list, for instance:

3, 6, ?, ...

3, 6, 12, ?, ...

3, 6, 12, 24, ?, ...


2, 3, 5, 7, ?, ...

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, ?, ...

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 60, ?, ...

(aggravating example thanks to Marcus Hutter)

You might ask how looking through a porthole has anything to do with numbers (probably not if you are a computer scientist). Visual input can be represented as a sequence of frames, or rectangles of pixel values. The pixel values themselves could be the sequence (so there’d be many little prediction tasks for each frame) or the whole frame could be encoded as a number in some way (say, as a binary file like a JPEG).

Now one might wonder if there is an optimal way to do sequence prediction. Sequence prediction tasks are often presented as if they had one right answer. However, you may have observed from the examples above that there is often more than one likely-seeming continuation. For instance, the first question mark in the first example could be either a 9 or a 12. In general, it’s even possible for a sequence to be totally random with no simple rule. However, if the sequence is assumed to be computer generated, there actually is an optimal solution: Solomonoff Induction. There are many great descriptions of Solomonoff Induction, including a very in-depth one, "An Intuitive Explanation of Solomonoff Induction," on LessWrong. It is not my current purpose to explain the "algorithm," but in brief, the idea is to keep in mind every explanation for the sequence, weighting the simplest explanations most heavily, and to eliminate any explanation that doesn’t match what you have seen so far. Unfortunately, Solomonoff Induction is not itself computable, so you can’t actually write this up in Python to make your billions on the stock market (but if you already thought of that, kudos I guess).

Here’s an obligatory copy of the formula for Solomonoff’s Universal Prior, included not so much for any educational purpose as for the pseudo-religious devotion it inspires in me:

$$M(x) := \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}$$
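Solomonoff Induction itself cannot be run, but the shape of the formula can be imitated over a tiny, hand-picked class of "programs." The sketch below is a toy, not Solomonoff Induction: the hypothesis class (arithmetic and geometric progressions plus the primes) and its code lengths (a hypothetical 2-bit type tag followed by Elias gamma codes for the parameters, which keeps the code prefix-free) are choices made for illustration. Each program whose output starts with x contributes 2^(-length) to a toy M(x), and the next term y is predicted with probability M(xy)/M(x).

```python
from fractions import Fraction


def gamma_len(n):
    """Bits in the Elias gamma code of a positive integer n (a prefix-free code)."""
    return 2 * n.bit_length() - 1


def first_primes(n):
    """The first n primes, by trial division against smaller primes."""
    primes, k = [], 2
    while len(primes) < n:
        if all(k % p for p in primes if p * p <= k):
            primes.append(k)
        k += 1
    return primes


def hypotheses(max_param=63):
    """Toy program class as (code length in bits, name, first-n-terms function).

    Hypothetical encoding: a 2-bit type tag followed by gamma-coded parameters,
    so the weights 2**-length sum to at most 1 (Kraft inequality).
    """
    hyps = [(2, "primes", first_primes)]
    for a in range(1, max_param + 1):
        for d in range(1, max_param + 1):
            hyps.append((2 + gamma_len(a) + gamma_len(d), f"arithmetic({a}, +{d})",
                         lambda n, a=a, d=d: [a + i * d for i in range(n)]))
        for r in range(2, max_param + 1):
            hyps.append((2 + gamma_len(a) + gamma_len(r), f"geometric({a}, x{r})",
                         lambda n, a=a, r=r: [a * r ** i for i in range(n)]))
    return hyps


def predict_next(x, hyps):
    """Toy analogue of M(xy) / M(x) over the finite class `hyps`."""
    x = list(x)
    weight_by_next = {}
    for length, _name, terms in hyps:
        out = terms(len(x) + 1)
        if out[:len(x)] == x:  # eliminate explanations inconsistent with x
            y = out[len(x)]
            weight_by_next[y] = weight_by_next.get(y, 0) + Fraction(1, 2 ** length)
    m_x = sum(weight_by_next.values())  # toy M(x): total weight of surviving programs
    if m_x == 0:
        return {}  # nothing in the class explains x
    return {y: w / m_x for y, w in weight_by_next.items()}


hyps = hypotheses()
print(predict_next([3, 6], hyps))      # {9: Fraction(1, 2), 12: Fraction(1, 2)}
print(predict_next([3, 6, 12], hyps))  # {24: Fraction(1, 1)}
```

With this class, 3, 6 splits evenly between 9 (add three) and 12 (double), because both explanations happen to get the same code length; 3, 6, 12 leaves only doubling; and the aggravating prime sequence ending in 60 is explained by nothing in the class, so the toy returns an empty dict. A real Solomonoff inductor never runs out of explanations this way, since every finite sequence is printed by some program.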

If none of that made sense, it’s okay. For my current purposes all you need to understand about Solomonoff Induction is that if you are looking through a little porthole trying to figure out what will happen next in some universe, and that universe is a computer simulation, there is an optimal way to make your predictions (up to constant terms blah blah choice of UTM blah blah). That is a deeply surprising and comforting fact. It still doesn’t tell us everything about optimal learning from observation, because as I was careful to stress, Solomonoff Induction is optimal for creatures existing outside of the universe, and we exist inside the universe; but in my opinion it comes damn close to telling us everything.

Now, humans don’t just learn passively from the universe, but also have to drive our meatsacks around in it doing stupid stuff and stubbing our toes. So it’s natural to wonder if we can adapt Solomonoff Induction so that it can tell us how to drive those meatsacks around better. Now instead of looking through a little porthole passively, we also get a remote controller that drives a robot (living in the universe!) around. Perhaps the porthole shows a video feed from a camera on the robot (a very astute reader may at this point realize they are confused about what a porthole is allowed to be). We want to use our remote-controlled robot to achieve useful things, such as making pancakes or not stubbing our toes. This is no longer purely a learning task; we also have goals and immediate feedback (A.I. researchers would call this a reinforcement learning task). Now there’s a sort of canonical (okay, Bayes optimal) way to extend Solomonoff Induction to work with the remote controller. That extension is called AIXI and it was invented by Marcus Hutter. When I first learned about this, I got the impression that AIXI was basically God, and immediately felt much less confused about what intelligence is. The problem of optimal behavior (including learning through interaction) was apparently reduced to finding a good approximation of what AIXI would do. Unfortunately, this dream is too good to be true.

My faith in AIXI was broken by a 2015 paper (Hutter and Leike 2015) from Jan Leike and Marcus Hutter himself. This paper demonstrates that AIXI isn’t truly, objectively optimal. In fact, there are a lot of slightly different versions of AIXI, which come with different built in prior beliefs. Anyone (even a God?) needs to start out believing SOMETHING. This even applies to the simpler case of Solomonoff Induction; the surprising thing about Solomonoff Induction is that it doesn’t matter. A perfect Solomonoff Inductor will always change its mind to match the truth eventually, no matter what it expects initially (and the difference between different Solomonoff Inductors is a mere constant). The crushing blow in (Hutter and Leike 2015) is that this is NOT true for AIXI.

The reason AIXI is not objectively, universally optimal is basically a simple feedback loop: beliefs affect actions, actions affect observations, and observations affect beliefs. I’ll explain this step by step. A given AIXI model starts with some beliefs (technically, priors) about the universe it is interacting with. For instance, it might believe that if it leaves its house it is certain to get stuck somewhere and repeatedly stub its toes on rocks forever and ever. AIXI does not like to stub its toes, so this belief will affect its actions. If its prior beliefs are set up right (such an example is constructed carefully in the paper), AIXI will never leave its house. Now if AIXI never leaves its house, and this house has no windows, it may never be able to learn whether the ground outside is actually covered in sharp rocks. That is, AIXI’s actions have limited its observations to the inside of the house. Unfortunately, this means that AIXI never gets a chance to change its (potentially false) belief that the ground outside is covered in sharp rocks.
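To see the feedback loop in miniature, here is a toy Python sketch. It is not AIXI (which mixes over all computable environments and cannot be run); it is a finite-horizon Bayesian planner over just two hypothetical worlds, one where the ground outside is safe and one where stepping out traps the agent forever. The reward values, the horizon, and the names `plan` and `run` are assumptions made for illustration. Staying inside produces the same observation in both worlds, so it never changes the agent's belief.

```python
HOME_REWARD = 0.5  # hypothetical per-step reward for staying inside (same in both worlds)
OUT_REWARD = 1.0   # hypothetical per-step reward outside if the ground is safe (> HOME_REWARD)


def plan(p_trap, horizon):
    """Finite-horizon expectimax for an agent that believes 'stepping outside
    traps me forever' with probability p_trap. Returns (value, best action)."""
    if horizon == 0:
        return 0.0, None
    # Staying in gives the same observation in both worlds: the belief is unchanged.
    stay = HOME_REWARD + plan(p_trap, horizon - 1)[0]
    # Going out reveals the world: the trap pays 0 for the rest of the run,
    # safe ground pays OUT_REWARD on every remaining step.
    go = (1 - p_trap) * OUT_REWARD * horizon
    return (stay, "stay") if stay >= go else (go, "go")


def run(p_trap, steps=20, world_is_trap=False):
    """Simulate the agent; returns (total reward, final belief, ever left home)."""
    belief, total = p_trap, 0.0
    for t in range(steps):
        _, action = plan(belief, steps - t)
        if action == "stay":
            total += HOME_REWARD  # no new evidence, so no belief update
            continue
        belief = 1.0 if world_is_trap else 0.0  # stepping outside settles the question
        total += 0.0 if world_is_trap else OUT_REWARD * (steps - t)
        return total, belief, True
    return total, belief, False


print(run(0.9))  # (10.0, 0.9, False)
print(run(0.1))  # (20.0, 0.0, True)
```

With these made-up numbers the agent goes out only if its prior probability of the trap is below one half. Above that, it stays in for the whole run and its belief ends exactly where it started, even though the world it lives in is safe; nothing it observes could ever contradict its prior.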

The unfortunate fact is that AIXI’s false belief can be self-reinforcing. In a way this is reminiscent of confirmation bias, but the situation is much more insidious. AIXI is actually acting, in some sense, rationally in accordance with its beliefs; the fact that it gets stuck with those beliefs may well be an inherent difficulty of the task we’ve posed it. In other words, it’s perfectly possible that the ground outside really IS covered in sharp rocks, and there doesn’t seem to be any really objective way of guessing how likely this is without checking. Learning from interaction is really harder than learning passively, in the sense that Solomonoff Induction solves the latter but doesn’t extend easily to the former.

This tendency to get stuck with false beliefs when learning from interaction might be called Belief Action Observation Belief (BAOB) Confirmation. I imagine pronouncing this roughly like "Bob Confirmation." I’m not completely satisfied with this name, because what I am describing is much more concerning than the thing (e.g.) Wason described. But in accordance with Patrick Winston’s Rumpelstiltskin Principle, it is probably best to have some kind of name for this issue.


Both Confirmation Bias and BAOB Confirmation point to the stickiness of false beliefs. That is, if you initially believe something which is false, it is very easy to become further entrenched in that belief. I’ve been thinking about this lately for various personal reasons. I will give one example without meaning to imply that it is necessarily an instance of Confirmation Bias/BAOB Confirmation/Both. I tend to speak very bluntly, and that has sometimes pushed people with certain personality characteristics away from me or even caused them to actively dislike me. This means I’m surrounded mostly by people who don’t mind my speaking bluntly, and it also reinforces my identity as someone who does so. It’s also possible that I expect people with those characteristics to eventually dislike me, and therefore don’t make much effort to prevent that from happening.

Personally, I find the possibility of getting stuck with false beliefs unsettling. At this point I don’t have a good solution. People can probably overcome many kinds of confirmation bias, such as the one described by Wason, through deliberate practice. Unfortunately, I can’t say the same for BAOB Confirmation. It is reminiscent of the Exploration vs. Exploitation dilemma in Reinforcement Learning, which is definitely not a solved problem. Is there some best strategy to decide exactly how much risk to take attempting to falsify your beliefs? The problem is that this strategy has to be judged based on its performance interacting with all computable universes. Remember that we don’t know anything about the universe we are looking at through our porthole to begin with; that’s why learning is necessary. This is a very hard task because no strategy works best in every universe; in fact, all strategies are Pareto optimal (see Hutter and Leike 2015 for details). As far as I know, it’s an open problem to even come up with a good standard for judging whether or not a strategy is the best one. It’s likely that entirely new ideas (frames) will need to be invented to make this situation less confusing. That is both frightening and exciting!
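The trade-off already shows up in the two-world house example. The sketch below uses hypothetical rewards and a hypothetical family of policies, "stay home for k steps, then go out," and scores each one in a world where the ground is safe and in a world where stepping out is a trap. It is only a toy echo of the Pareto-optimality result in (Hutter and Leike 2015), which concerns all computable environments, but the same pattern appears: no policy in the family does at least as well as another in both worlds while doing strictly better in one.

```python
HOME_REWARD = 0.5  # hypothetical per-step reward for staying inside
OUT_REWARD = 1.0   # hypothetical per-step reward outside, if outside is safe
STEPS = 20


def payoff(leave_at, world_is_trap):
    """Total reward of 'stay home for leave_at steps, then go out'.
    leave_at == STEPS means never leaving."""
    home = HOME_REWARD * leave_at
    if world_is_trap:
        return home  # trapped outside, earning nothing afterwards
    return home + OUT_REWARD * (STEPS - leave_at)


def dominates(a, b):
    """Policy a is at least as good as b in both worlds and strictly better in one."""
    pa = (payoff(a, False), payoff(a, True))
    pb = (payoff(b, False), payoff(b, True))
    return all(x >= y for x, y in zip(pa, pb)) and pa != pb


policies = range(STEPS + 1)
undominated = [a for a in policies if not any(dominates(b, a) for b in policies)]
print(len(undominated))                           # 21
print(payoff(0, False), payoff(0, True))          # 20.0 0.0   (bold explorer)
print(payoff(STEPS, False), payoff(STEPS, True))  # 10.0 10.0  (never leaves)
```

Which policy counts as "best" depends entirely on how much weight one puts on each world, and that weighting is exactly the prior the agent had to pick in the first place.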

Hutter, Marcus, and Jan Leike. 2015. “Bad Universal Priors and Notions of Optimality.” COLT 40: 1244–59.
Wason, Peter Cathcart. 1960. “On the Failure to Eliminate Hypotheses in a Conceptual Task.” Quarterly Journal of Experimental Psychology 12 (3): 129–40.

  1. I assure you that I have not chosen the token Strawman News to avoid offending anyone. On the contrary, I couldn’t come up with any source that would offend virtually everybody, so I decided to settle for making no one happy.

  2. Since AIXI is not computable it isn’t really an algorithm.