The Collatz Conjecture: A simple Yet Devious Problem
A problem by many names
Imagine a hailstone in a storm. As it's thrown around higher in the clouds collecting moisture it grows larger. Eventually it collides with other hailstones, breaking in two, and falls closer to the ground. You may ask - will it ever collide with the Earth?
There is a similar problem in mathematics known by many names - the Collatz Conjecture, the problem, the Hailstone Problem etc. - which is defined as follows. For all positive integers where the iterative function is defined as
and , is there some iteration where ? For example consider . Applying the rules we have:
After iterations we arrive at which then enters a loop.
Does this happen for all numbers?
It's unknown if for all numbers applying these rules will end up at one - making it one of mathematics' most famous open problems! Due to the simple problem statement it's easy to implement in a script and verify for the first few thousand numbers that it holds. In fact it has been verified true for all numbers up to - a number over twenty digits long. Despite both this and statistical evidence suggesting the conjecture to be true, there is yet a definitive proof that it holds for all numbers.
You may ask - how may the conjecture be false? Well, one way would be finding an integer that enters a cycle that does not contain the number one. Another way is there is a number that when applied to the rules will never each one - a number perhaps so large the rate at which it's halved is less than the rate it increases by.
You may also ask is there something special about ? Would this occur for other rules? It happens there is something special: consider instead applying when is odd for and see what behavior occurs.
Some interesting observations if the conjecture is true
My favorite observation is to think about the conjecture in reverse. Imagine you were to write all positive integers on a piece of paper. For each you draw a line to the next number in the sequence after applying the rules. Because the conjecture is true each number will have a path to the number one. Moreover this path will be unique - there will be no two paths from the same number that lead to (minus the loop.)
Now suppose instead of walking this path from the number to one, you start at one. For any positive integer there is exactly one path (or application of the inverse rules - either double the number or subtract one and divide by three if the result is an integer) that will generate it. We can write this sequence using two symbols - for doubling and for subtracting by one and dividing by three - from right-to-left. For example to generate the number three we can represent the sequence as:
In the same way that there's a unique representation for each number or a unique prime factorization of it, there is a unique way to generate numbers using these rules. And while you may observe you can also generate any number by starting at one and either adding one and repeating or stop, the mystery of the conjecture is why exactly these rules would result in this behavior.
Another observation is the branches that form by applying the reverse rules. For any number we can always double it so there will be an infinite number of nodes representing numbers of the form . However it's not always the case that is divisible by three and could branch off into a second branch.
The conjecture states that all numbers lie somewhere on an infinite artery of the form where is odd. Following this artery to the rule takes us to some new artery of the form . This repeats until we reach the central artery in which we end up at the number one.
Why is it so difficult to prove?
Given the conjecture's algorithmic formulation one may ask if results from algorithmic complexity theory may provide insight into the problem. In 1972 Conway's Unpredictable Iterations showed a generalization of the Collatz Conjecture could be represented as the halting problem. That is, there is no algorithm that can decide if the number will either reach one or infinitely, other than by applying the conjecture's rules and seeing the result.
Another relationship within complexity theory is with the Busy Beaver problem. The Busy Beaver problem () asks, given a Turing machine with possible states and an infinite tape filled with zeros, what is a transition table over the states that will:
- Terminate in a finite number of steps, and
- Perform write operations on the tape to write the maximum number of ones (which is defined as )
Essentially asks for what is the longest running - but will terminate - algorithm using only states. Only up to is known as adding more states allows for more complex algorithms (and thus more iterations) are possible. In 1979 Erdős' Unconventional Problems in Number Theory conjectured a relationship between and the Collatz Conjecture. In 2024 Sterin and Woods showed that finding is at least as hard as proving this relationship (preprint).
What does a solution provide?
In closing it is surprising that a simply stated problem seems to be so difficult. Other than being an interesting problem (made even more so by its popularity) there is much consequence of proving the conjecture true or false. Unlike problems such as the Riemann hypothesis which would provide us more insight into the locations of prime numbers, solving the Collatz conjecture doesn't seem to be consequential. However while the implications of a solution itself may not be of much value the new tools, techniques, and insights gained may provide new understandings in surprising areas of mathematics.