Although a great deal had been learned (October 2007) about the structure of the Collatz iteration, some features seemed to require additional work.

One such was my continued worry that some readers would not be satisfied that the residue sets completely fill the predecessor tree. In addition to the infinite nature of the tree, the fact that only the first instantiation of each residue set explicitly occurs in it makes it difficult to locate any sought residue set. Recall the development of the abstract tree in a top down manner followed by an upward portion. These examples identified few deeper residue sets due to their limited depth. The need for a complete demonstration of residue set deployment in the binary tree helped motivate this additional work.

Another compelling notion is that, if the residue sets were shown to contain all the odd positive integers, completely filling the infinite space of the abstract tree, it would follow that the inductive constructive proof of the Collatz conjecture would be founded on residue sets rather than individual integers.

A page which gives a fuller background for the following discussion is available.

To provide the data for such detailed analysis a program was written which performs the downward and upward development of the abstract tree while outputting a data set of all the residue sets visited.

The density of the odd positive integers in the residue sets is employed to measure the completeness with which their contents fill the abstract predecessor tree. The density of odd integers in the residue sets {c[d]} is 2/d, but this relationship is only applied in the very last step of the following development. Thus, the correctly totalled density will be 1/2.

Of course, the infinite size of the Collatz predecessor graph
(coupled with the finite nature of the computational environment) might place
a serious limitation on the precision available from the accumulated
densities. One way to overcome this might take snapshots of the
results of the *rsetidea* program through a series of increasing
depths (which results in improving precisions as a larger and larger
fraction of the decreasingly dense l.d.a.s in the predecessor tree are
included), and then extrapolate the answers from the series of
increasingly larger runs. This approach need not be fully explored
because the desired proof that the tree's contents contain all the odd
positive integers proved possible by closed infinite summation. But it
did provide the unexpected result that there is replication of some
integers among all the residue sets, resulting in double counting and
the resulting over-count.

The structure of the abstract predecessor tree in its simplest form hides its intrinsic complexity. A fairly schematic picture, without identifying the edges of the tree, shows that successive levels of elaboration of the predecessor tree result in production of smaller and smaller (i.e. sparser and sparser) sets as the tree is developed.

A picture of a small fragment of the
abstract tree illustrates the difficulty of assigning
*{c[d]}* values to the nodes of the abstract tree because any
**one** represents a stage at its level in the development of an
**infinity** of ever-longer l.d.a.s which are dependent from it.
As a result, numerous residue sets may be cited for every node in
an l.d.a. to specify individual residue sets which lead to each
deeper one in growing l.d.a.s and leaf nodes. This
picture, though complex, may be helpful in understanding the
concepts introduced above and illustrated in tables and descriptions of
the output of the *rsetidea* program to follow.

Note that leaf nodes are not given individual recognition in the binary tree representation of the predecessor graph. Each node is taken to represent a leaf node at its level as well as the internal nodes in all dependent l.d.a.s. Leaf nodes are represented in the total density calculation through their presence at their parent's level which, as we shall see, is true of the density of all dependent nodes within each l.d.a.'s subtree.

The output from *rsetidea* is both lengthy and complex, requiring two
separate stages of aggregation to determine the total odd integer density.
First, the contribution of the densities of every l.d.a. is
obtained and then a final summation across the infinite set of l.d.a.s
gives the sought result.

The raw *rsetidea* output, when sorted appropriately, gives a
series of blocks of output lines. (See below.) The
entire output from *rsetidea* consists of residue sets
*{c[d]}*, i.e. integers congruent to *c* modulo
*d*. Each block bears at the top the residue set of an
l.d.a. header. Under that are the layers showing the dependent
residue sets of the succeeding generations.

That common header is given in the first line and repeated in the
first three columns of each entry in each block. The descendent
l.d.a.s are given in column four with each one's *{c[d]}* values
in columns five and six. Thus the values ascribed to each parent
of the traversal appear side-by-side with those of the successive
elements of its dependents on each output line. The header
*d*-values reflect only the number of powers-of-two
reached. The dependent subsets' *d*-values also include one
power of 3 for each step traversed in the developing l.d.a. The
fact that all those sparser residue sets are subsets of the parental
residue set is evident from the fact that *c*(up) mod *d
*(down) equals *c *(down) in every branch of the tree
under a header node. This fully explains the observation that
the sum of the densities produced by the program indicated replication
of integers among the residue sets.

Each of the four blocks presented show four levels of subtrees headed
by the four densest l.d.a.s of the predecessor tree. The
*d*-values of the residue set elements (column 6) within each left
descent assembly in every case appear in expanding multiple instances
shared among the subsets in numbers
which go as successive powers of 2, i.e. 1, 2, 4, 8, ...
All possible patterns of the "b" and "s" steps are exhibited (column 4)
in each group. These patterns continue exactly until the program
parameter prevents higher residue sets, thus confirming the
accuracy of the program's behavior. (Further examples are far too
large to exhibit here.)

Annotations at the left of each block point out essential features
which explain the behavior within the l.d.a.s, and the
algebraic expression at the lower right gives the *d*-values for
the *i*th generation within the l.d.a.s.

1 2 3 4 5 6 parental lda c(down) d{down) lda at leaf c(up) d(up) --------l.d.a. headers--------- -----l.d.a. leaf nodes----- In the "e" subtree e 5 8 e 5 8 (those with header e 5 8 es 5 24 in {5[8]}, the e 5 8 eb 13 24 header d value is e 5 8 ebs 13 72 simply 2^3 so the e 5 8 esb 29 72 density of odd e 5 8 ebb 37 72 integers in the set e 5 8 ess 53 72 is 1/4. Note that e 5 8 ebss 13 216 every set in the last e 5 8 esbb 29 216 3 columns after the e 5 8 ebbs 37 216 first row is a subset e 5 8 esss 53 216 of the header, e 5 8 esbs 101 216 {5[8]}. e 5 8 ebbb 109 216 e 5 8 essb 125 216 2^3*3^i e 5 8 ebsb 157 216 i=0...infinity In the "es" subtree es 3 16 es 3 16 (those with header es 3 16 esb 19 48 in {3[16]}), the es 3 16 ess 35 48 header d value is es 3 16 esbb 19 144 2^4, resulting from es 3 16 esss 35 144 the contribution of es 3 16 esbs 67 144 the "s" step. As es 3 16 essb 83 144 always the second es 3 16 esbss 67 432 and subsequent es 3 16 essbb 83 432 upward-bound lines es 3 16 esbbb 163 432 are subsets of the es 3 16 esssb 179 432 parental residue es 3 16 esbsb 211 432 set. es 3 16 esbbs 307 432 es 3 16 essss 323 432 2^4*3^i es 3 16 essbs 371 432 i=0...infinity In the "eb" subtree eb 17 32 eb 17 32 (those with header eb 17 32 ebs 17 96 in {17[32]}), the eb 17 32 ebb 49 96 header d value is eb 17 32 ebss 17 288 2^5, resulting from eb 17 32 ebbs 49 288 the contribution of eb 17 32 ebbb 145 288 the "b" step. eb 17 32 ebsb 209 288 eb 17 32 ebssb 17 864 . eb 17 32 ebbsb 49 864 eb 17 32 ebbbs 145 864 eb 17 32 ebsbs 209 864 eb 17 32 ebbss 337 864 eb 17 32 ebbbb 433 864 eb 17 32 ebsss 593 864 2^5*3^i eb 17 32 ebsbb 785 864 i=0...infinity In the "ess" ess 23 32 ess 23 32 subtree (those with ess 23 32 esss 23 96 header in {23[32]}, ess 23 32 essb 55 96 the d value is also ess 23 32 essbb 55 288 2^5, resulting from ess 23 32 esssb 119 288 a pair of "s" ess 23 32 essss 215 288 steps. ess 23 32 essbs 247 288 ess 23 32 essbbb 55 864 ess 23 32 esssss 215 864 ess 23 32 esssbb 407 864 ess 23 32 essssb 503 864 ess 23 32 essbsb 535 864 ess 23 32 essbbs 631 864 ess 23 32 esssbs 695 864 2^5*3^i ess 23 32 essbss 823 864 i=0...infinity

These blocks of results teach several important lessons. Foremost is that all the residue sets beneath each header consist of elements which also appear in the header residue set itself. When the densities of each whole subtree, i.e. the header node itself as well as all the deeper residue sets who reach that same header in the Collatz itinerary are summed with and without the density of the header itself, the densities are in the ratio of 3:2. This explains the earlier (unexpected) result that the summation over the whole predecessor tree indicates too high a density when the tree is considered as a whole. But since all the subtree contents under an l.d.a. header node contain duplicates, the proper set of residues sets to be summed to obtain the unique integer density is the set of l.d.a. header nodes.

Also, these blocks representing each particular l.d.a. shows an increasing power of 2 multiplicity of occurences of residue sets deep in the tree. It is this constant doubling which accounts for the high fraction of duplication of member integers in the total subtree dependent from each header node. Of course, if the density had been being calculated, the multiplicity would have had to be taken into account.

All these conclusions from the behavior of the blocks of output from
rsetidea runs are uncomplicated by any conversion of *d*-values to integer
densities in the residue sets. The *d*-values served to make the
qualitative judgments required to focus on the features required in
further analysis.

The first four lines of the following table are taken from the four
blocks above, with the rest of the lines derived from subsequent
blocks of the complete sorted output of *rsetprog*. When
the residue sets of l.d.a. parents are put in a table (where each subtree
is represented solely by its header node), and again arranged so
that the residue sets which share a common *d*-value are grouped
together, it is seen that the multiplicity of the residue sets
sharing a given *d*-value occur as the integers in the Fibonacci
series.

This time we aggregate over actual densities. All the duplicated
integers within l.d.a.s have been winnowed out. So we use
*2/d* (as shown in column 5), the density in each subtree from
the d-values of the previous table, for each l.d.a. The integer
density of the residue sets multiplied by the multiplicity from the
Fibonacci series member
which obtains at each level calculates the total density
provided at each level. The infinite summation over these
expressions will give the the total density of odd integers in the
infinite predecessor tree.

1 2 3 4 5 l.d.a. c d expression density e 5 8 2^(i-3)/3^i 1/4 F(1) es 3 16 2^(i-4)/3^i 1/8 F(2) eb 17 32 2^(i-5)/3^i 1/16 F(3) ess 23 32 2^(i-5)/3^i 1/16 ebs 11 64 2^(i-6)/3^i 1/32 F(4) esss 15 64 2^(i-6)/3^i 1/32 esb 25 64 2^(i-6)/3^i 1/32 ebb 65 128 2^(i-7)/3^i 1/64 F(5) ebss 7 128 2^(i-7)/3^i 1/64 esbs 59 128 2^(i-7)/3^i 1/64 essb 73 128 2^(i-7)/3^i 1/64 essss 95 128 2^(i-7)/3^i 1/64

The known sum(F(i)/2^(i+1),i=1..infinity) is 1, where F(i) are the Fibonacci numbers. The densities exhibited by the l.d.a.s are half those of the above expressions, so the total density over all of the l.d.a.s is 1/2, showing that the binary predecessor tree does contain all the odd integers.

To summarize, the program *rsetprog* has produced quite
a lot of new understanding about the Collatz predecessor tree.
Perhaps foremost is the successful construction by the program of the
predecessor tree viewed as consisting of residue sets. Beyond the
production of huge amounts of detailed information about the placement
and organization of the integers in the higher portions of the tree by
tracing the paths of left descent assemblies upwards, the
program's output proved to contain systematic features from which it was
shown using infinite summation first over individual l.d.a.s to
eliminate duplicated integers and secondly over the entire set of
l.d.a.s that all the odd positive integers are included in the
predecessor tree. Thus, the program is tantamount to proof
by induction and construction that the predecessor tree is infinite and
contains all the odd positive integers. The program thus completes
a proof of the Collatz conjecture.