On my first reading of Craig Feinstein's proof of the unprovability of the Collatz conjecture it seemed to depend heavily on the assumption that the minimum representation of a Collatz trajectory is the parity bit string which represents the even and odd positive integers encountered. That misreading of his proof led me to the following thoughts which may be of some interest, but in any case will help illuminate details of my approach which might otherwise escape notice.

What if it is not true that the parity bit string is the shortest representations of the paths? There should be shorter ones. After all, there is a structure (at least a determinism) to the Collatz itineraries. The Collatz itineraries may appear random, but are actually precisely determined. How could any such deterministic patterns be used to create a more compact bit string?

My contribution to the Collatz problem is a description of the structure of the Collatz itineraries. The replicating structure in the Collatz predecessor graph (which turns out to be a tree, thus, by itself, proving the conjecture) is an infinite set of l.d.a.s (left descent assemblies). L.d.a.s are terminated at leaf nodes (0[3]), are traversed upwards in the predecessor graph by a series of Collatz steps in which the number of divisions by 2 is either 1 or 2, and headed by an extension (5[8]). The l.d.a.s are joined together by extensions, each causing exactly 2 additional divisions by 2, which connect the header of any l.d.a. to some element in a higher l.d.a. in the predecessor tree, whose header is in turn connected by an extension to a higher l.d.a. and so on until 5 is reached. 5 is the first extension of 1, hence serves as proxy for 1, the conjectured root of the Collatz predecessor tree. 5 is thus the l.d.a. header from which the whole predecessor tree develops and the first element of {5[8]} which is at the root of the abstract predecessor graph.

Recall that I focus on the odd numbers in my description of the structure of the Collatz graph. Every odd number in the predecessor tree bears all its power of 2 multiples, infinite in number. These would be represented in the parity bit vector as an infinite string of '0's appended on the right end of the parity string, but such zeros are just as meaningless and unecessary as leading zeros would be on the conventional representation of integers. The powers of 2 themselves (i.e. 1, 2, 4, 8,...) would all be represented by '1' as their parity vector because all bear a string of trailing zeroes, which we conventionally drop from this viewpoint.

Two kinds of steps occur within an l.d.a.**:** an
*s*-step (when a node is 2[3]) (in which a single
division by 2 occurs in the upwards direction) and a
*b*-step (when a node is 1[3]) (in which 2 divisions by
2 occur in the upwards direction). The extensions may be denoted
as an *e*-step.

Extensions can exist in indefinitely long series. Each in a series of extensions is related to the
previous one in the ratio 4*n*+1**:***n*.
5**:**1**::**4*n*+1**:**n, whence 5 is indeed the
first extension of 1, and effectively the root of the tree and the
end of any Collatz path.

It turns out I have two ways to produce a shorter bit representation
of a Collatz itinerary. The first is relatively simple,
requiring very little other than an appreciation of the fact that any
path is composed of *s*-, *b*-, and *e*-steps. If
the usual parity bit string starts with 1 (to represent the
root) and reads left to right to represent the path through
predecessors, then an *s*-step appears as '01',
a *b*-step appears as '001', and an *e*-step as
'00'. These bit strings structurally represent the three basic
steps which compose the itinieraries as components of the parity bit
string, but nothing prevents us from using different encodings if
they provide some other advantage.

Unfortunately there is a glitch**:** not every itinerary is
self-evidently composed of *e*, *s*, and *b*
components. In my work I have adopted the use of a binary tree in which left descents are
called l.d.a.s and right descents are called extensions. In this
representation, an extra (phantom) value appears in the path
in the predecessor tree whenever the path leaves the header of an
l.d.a. to transit to an element in some higher l.d.a. Such a
transition really involves a combination of an *e* and an *s*
or an *e* and a *b* step even though the series of divisions
by 2 somehow elides the phantom node. This makes these
combinations somewhat obscure in the Collatz itinerary and in the parity
bit string which represents it.

I'll illustrate with two examples, one involving an *es*
combination (13->5) and the other involving an *eb* combination (37->7).
Small fragments of the binary predecessor tree show the situation.

5 7 / / s/ b/ / / 3 9 \ \ e\ e\ \ \ 13 37

The right descent, labelled *e* in each case, shows
the *n*:4*n*+1 relation between the parent and its
extension. The two examples of left descents, labelled
*s* or *b* as appropriate on the left descent, show the
two different types of transition between l.d.a.s. 3 reaches
5 (via 10/2) with a single division by 2, hence an
*s*-step. 9 reaches 7 (via 28/2^2) with two
divisions by 2, hence a *b*-step. But 3 and 9 are
phantoms; they do not appear in the Collatz trajectory between 13
and 5 (via 40/2^3) or between 37 and 7 (via
112/2^4). 3 and 9 really are predecessors of 5 and 7. They
need to be in the predecessor tree to represent them in the Collatz
predecessor graph, but they do not appear in the paths of any of
their extensions. They certainly are not artifacts of my view of
the structure of the Collatz process. You might regard all nodes
elided from the paths from their extensions as similar to unwanted
mother-in-laws or disgraced elder brothers quietly suppressed in
pictures of the family tree.

Higher extensions of 3 and 9 may be added to show how this works
across longer right descents. The first big row below shows the
effect of a single phantom node, and the second big row shows the
effect of adding 3 or 2 phantom nodes. The examples are written
from left to right in predecessor order. The effect is to insert a
1 in the parity bit string to represent the phantom, which enables
the bit string to express the parity string using the
{*s,b,e*} character set. A further confusing feature
appears since what I have regarded structurally as extensions appear as
*b*-steps (rather than *e*-steps) in the encodement.

1 1 3 3 5---3 5-3--3 7----7 7--9--7 10001 101001 100001 1001001 s e b e structurally s b b b as encoded 2 2 1 1 1 1 5 1 4 3 4 and 5-------3 5-3--3--3--3 7------9 7--9--7--9 100000001 101001001001 10000001 1001001001 s b e e b e e structurally s b b b b b b as encoded

Once an *esb* expression of a path is achieved, we need
only finish by employing a more compact representation of
{*e,s,b*}, namely {'0','10','11'}. Such a
representation is the result of the
Huffman text compression algorithm
if *e* is the most frequently occurring character of the three.

As an example of the whole procedure in a realistic context,
let's look at the
path to 705 which I have represented as "43bss63bb"
where each successive character represents the number of consecutive
divisions by 2 between successive odd integers in the path, with
*b* and *s* representing 2 and 1 divisions,
respectively. Recognizing that every 1 must be preceded by a
0, we see that {00,01,001} (nominally for
{*e,s,b*} but somewhat garbled when viewed structurally)
provide an adequate symbol set to represent any parity bit string which
has been augmented by inserting 1-bits to represent the phantoms.
Use of this allows an *esb* encoding in the next to last line of
the following. If we now use {0,10,11} to represent
{e,s,b} we get the final line (written with spaces to show
the sources of the component bits), which is markedly shorter than
the parity bit string. The structural fidelity, as seen by
comparing the second from bottom lines in the two segments, is
good, with all the b's and s's of the upper segment carried into
the lower segment. But otherwise the structure just gets
obscured.

1 3 5 7 } 1 1 1 4 9 2 0 } number reached 1 5 3 7 1 7 9 7 9 5 } 1000010001001010100000010001001001 parity bit string (length 34) 4 3 b s s 6 3 b b # of 0's between 1'sNow expand that with '1's where the phantoms go. (Look ahead a half a screen.)

1 3 5 7 1 1 1 3 4 9 9 2 0 1 5 3 3 7 1 7 9 7 9 9 7 9 5 10000101001001010100100100101001001001 parity bit string (length 38) * * * * stars mark phantoms e b s b b s s b b b s b b b esb encodement 10 11 1011 11 101011 11 11 1011 11 11 coded using {0,10,11} for {e,s,b} result is length 28

It's unnecessarily tedious and confusing to labor at the bit level. We can do it all at a structural level as is done in the following, wherein bit codes are applied directly to a representation of a path with phantoms inserted. We'll continue with the same example, the path from 705.

ee b s s b b 1--5 13--17--11--7 149 397--529--705 s\ e/ b\ e e/ s\ e/ 3 9--37 99 10010 0 11 10 1011 0 0 10 0 11 11 result is length 23

The upper line of the path shows the odd numbers actually
encountered, and the lower line shows the positions and values of
the phantoms. Finally encode {*e,s,b*} as
{0,10,11} just as before, but this time we have
structural conformance at lower cost. The encoded string
(again written with blanks to show where the component bits have
come from) is completely different, but is shorter than the
one above even though a longer sequence of odd integers are present in
the encoded path because it was augmented with the phantoms. The more
frequent use of *e*s rather than *s* or *b*s is
responsible.

Both methods are very tedious and error prone and unlikely to be very
practical. However, a few features exist in the numeric
structural picture like that immediately above which will help.
The phantom nodes always precede (in paths written left to right
from root to leaf, as above) 5[8] nodes, and the
value of the phantom nodes are determined from the n**:**4*n*+1
ratio of the values of the phantom to its extension. I have
not, and probably will not, make any attempt to determine
which of these approaches usually or always results in the greater
compression of the bit string length.

"Well, what exactly", you ask, "is the structural
feature of the Collatz itineraries which enables this compression of the
augmented parity bit string?" It is the fact that every odd number
is made into an even number by the 3*n*+1 step, and the
resulting even number is to be divided by 2^{i}. This
results
in the bits '01' and '001' as the basic (*s* and *b*
step) units, which need only be elaborated to encompass the
string of '0's which result when the even number is divisible by
some higher power of 2.

I had expected to take advantage of the appearance of l.d.a.s as the recurring structural units from which the Collatz predecessor tree is constructed. I won't review here the argument behind construction of the abstract predecessor tree, but only describe its key properties.

Suffice it to say that the l.d.a.s can be generated starting from the
set of header nodes (5[8]). Every node in an l.d.a.
represents an infinite set of positive integers which share a common
c[d]. Every internal (i.e. non-leaf) node in an
l.d.a. gives rise to a subset from an *s*-step and another
subset from a *b*-step. Thus a binary tree is formed.
If we call the first row of the tree (consisting of the null
l.d.a.s) level 1, the second row will contain two
nodes, the result of an *est* path and the result of an
*ebt* path. The third row will contain four
nodes, representing *esst*, *esbt*,
*ebst*, and *ebbt*, respectively. Each
row, *i*, in turn, will contain
2^(*i*-1) nodes, and the number of nodes in the tree
after the *i*th row is complete will be (2^*i*)-1. We
can number the nodes in order across the rows, with the null
l.d.a. numbered 1, the *est* and *ebt* nodes of
row 2 numbered 2 and 3, the 4 nodes of row 3 numbered 4,
5, 6, and 7, and so on. Every l.d.a. is
thus given a unique serial number, represented by a unique integer
of only *i* bits.

We have shown how to identify any positive odd integer as a member of its l.d.a. The connection from one l.d.a. to the next via one or more extensions will be to a number in another l.d.a. which in turn can be identified within its l.d.a. and so on. Thus a path in the Collatz predecessor tree can be characterized simply by listing a series of l.d.a. serial numbers and flagging the intervening extensions. It seems obvious that a considerable compression would be reaped by use of a bit string constructed using these notions..

Unfortunately, the serial numbers of the l.d.a.s will include
every possible bit pattern, and have every possible length.
This means that there can be no distinguished bit pattern which can be
adopted to flag the end of an l.d.a. and the beginning of a
series of extensions and * vice versa*. We can go to use of
two bits to represent 0 and 1, leaving different 2-bit patterns, one of
which is to serve as the flag. Thus a ternary notation using
{0,1,flag} represented as {'00'|'0','01','11'} (where
'00' is used within the serial numbers and '0' within the extension
string) and always process the bit string pairwise within the
l.d.a. serial numbers, we will have achieved our goal. The
approach causes loss of about half of our envisioned compression and the
frequent insertion of the flag consumes still more bits, so the
overall result is problematic in terms of a length reduction.

Here is an example which we've worked over before. This time we will identify the l.d.a.s encountered by their serial numbers. As before we show the details of the encodement step by step. The bit strings start out in extension mode.

ee b s s b b 1--5 13--17--11--7 149 397--529--705 s\ e/ b\ e e/ s\ e/ 3 9--37 99 00 0 0 0 0 0's for extensions 1 25 1 7 serial #s for ldas eef 1f ef 25 f ee f 1f ef 7 f for flag 100110111011010101000000 1100 110111 011010101 for 42 bits (vs 28)

Here's another example, the path from
3009 (Ass4bsbs3bb). (Note in passing that the string of
internal zeroes in the parity string of this example is fully conserved
in the {b,s,p} notation of the Collatz trajectory.)

ee s s b s b s b b 1--5 341--227--151 805--1073--715--953--635 1693--2257--3009 \e e /e b\ e/ s\ e/ 21--85 201 423 00 0 0 0 0 0 ssb=9 bsbss=52 bb=7 e e e e ef 9 f ef 52 f ef 7 10 0 0 0 01101000001 11 011010100010000 11 011010101 for 44 bits (vs. 39)

So even a path with longish chains of extensions and longish l.d.a.s doesn't quite result in a shortened bit representation compared to the parity bit string using this approach. If a way could be found to remove those flags this would become competitive.

Despite the fact that only one of two approaches succeeded, it certainly makes sense that capitalizing on known structures within them would permit more efficient encodements of Collatz itineraries. No approach which does not in some way encapsulate the structure must, indeed, have recourse to the full length bit string representation.

My Collatz Home Page Index to Terms Used