The Program Rsetidea -- Motivation and Results

Although a great deal had been learned  (October 2007)  about the structure of the Collatz iteration,  some features seemed to require additional work.

One such was my continued worry that some readers would not be satisfied that the residue sets completely fill the predecessor tree. In addition to the infinite nature of the tree, the fact that only the first instantiation of each residue set explicitly occurs in it makes it difficult to locate any sought residue set.  Recall the development of the abstract tree in a top down manner followed by an upward portion.  These examples identified few deeper residue sets due to their limited depth.  The need for a complete demonstration of residue set deployment in the binary tree helped motivate this additional work.

Another compelling notion is that,  if the residue sets were shown to contain all the odd positive integers, completely filling the infinite space of the abstract tree,  it would follow that the inductive constructive proof of the Collatz conjecture would be founded on residue sets rather than individual integers.

A page which gives a fuller background for the following discussion is available.

To provide the data for such detailed analysis a program was written which performs the downward and upward development of the abstract tree while outputting a data set of all the residue sets visited.

The density of the odd positive integers in the residue sets is employed to measure the completeness with which their contents fill the abstract predecessor tree. The density of odd integers in the residue sets {c[d]} is 2/d, but this relationship is only applied in the very last step of the following development.   Thus, the correctly totalled density will be 1/2.

Of course,  the infinite size of the Collatz predecessor graph (coupled with the finite nature of the computational environment) might place a serious limitation on the precision available from the accumulated densities.   One way to overcome this might take snapshots of the results of the rsetidea program through a series of increasing depths (which results in improving precisions as a larger and larger fraction of the decreasingly dense l.d.a.s in the predecessor tree are included),  and then extrapolate the answers from the series of increasingly larger runs.  This approach need not be fully explored because the desired proof that the tree's contents contain all the odd positive integers proved possible by closed infinite summation.  But it did provide the unexpected result that there is replication of some integers among all the residue sets, resulting in double counting and the resulting over-count.

The structure of the abstract predecessor tree in its simplest form hides its intrinsic complexity.  A fairly schematic picture, without identifying the edges of the tree, shows that successive levels of elaboration of the predecessor tree result in production of smaller and smaller (i.e. sparser and sparser) sets as the tree is developed.

A picture of a small fragment of the abstract tree illustrates the difficulty of assigning {c[d]} values to the nodes of the abstract tree because any one represents a stage at its level in the development of an infinity of ever-longer l.d.a.s which are dependent from it.   As a result, numerous residue sets may be cited for every node in an l.d.a.  to specify individual residue sets which lead to each deeper one in growing l.d.a.s and leaf nodes.   This picture,  though complex,  may be helpful in understanding the concepts introduced above and illustrated in tables and descriptions of the output of the rsetidea program to follow.

Note that leaf nodes are not given individual recognition in the binary tree representation of the predecessor graph.   Each node is taken to represent a leaf node at its level as well as the internal nodes in all dependent l.d.a.s.   Leaf nodes are represented in the total density calculation through their presence at their parent's level which, as we shall see, is true of the density of all dependent nodes within each l.d.a.'s subtree.

The output from rsetidea is both lengthy and complex, requiring two separate stages of aggregation to determine the total odd integer density. First, the contribution of the densities of every l.d.a. is obtained and then a final summation across the infinite set of l.d.a.s gives the sought result.

The raw rsetidea output, when sorted appropriately, gives a series of blocks of output lines.   (See below.)   The entire output from rsetidea consists of residue sets {c[d]}, i.e. integers congruent to c modulo d.  Each block bears at the top the residue set of an l.d.a. header.  Under that are the layers showing the dependent residue sets of the succeeding generations.

That common header is given in the first line and repeated in the first three columns of each entry in each block.  The descendent l.d.a.s are given in column four with each one's {c[d]} values in columns five and six.  Thus the values ascribed to each parent of the traversal appear side-by-side with those of the successive elements of its dependents on each output line.  The header d-values reflect only the number of powers-of-two reached.  The dependent subsets' d-values also include one power of 3 for each step traversed in the developing l.d.a.  The fact that all those sparser residue sets are subsets of the parental residue set is evident from the fact that c(up) mod d (down) equals c (down) in every branch of the tree under a header node.   This fully explains the observation that the sum of the densities produced by the program indicated replication of integers among the residue sets.

Each of the four blocks presented show four levels of subtrees headed by the four densest l.d.a.s of the predecessor tree.  The d-values of the residue set elements (column 6) within each left descent assembly in every case appear in expanding multiple instances shared among the subsets in numbers which go as successive powers of  2, i.e. 1, 2, 4, 8, ...   All possible patterns of the "b" and "s" steps are exhibited (column 4) in each group.  These patterns continue exactly until the program parameter prevents higher residue sets,  thus confirming the accuracy of the program's behavior. (Further examples are far too large to exhibit here.)

Annotations at the left of each block point out essential features which explain the behavior within the l.d.a.s, and the algebraic expression at the lower right gives the d-values for the ith generation within the l.d.a.s.

                          1       2        3            4        5        6
                  parental lda  c(down)  d{down)  lda at leaf   c(up)    d(up)
                  --------l.d.a. headers---------  -----l.d.a. leaf nodes-----
In the "e" subtree        e       5        8             e        5        8
(those with header        e       5        8            es        5       24
in {5[8]}, the            e       5        8            eb       13       24
header d value is         e       5        8           ebs       13       72
simply 2^3 so the         e       5        8           esb       29       72
density of odd            e       5        8           ebb       37       72
integers in the set       e       5        8           ess       53       72
is 1/4. Note that         e       5        8          ebss       13      216
every set in the last     e       5        8          esbb       29      216
3 columns after the       e       5        8          ebbs       37      216
first row is a subset     e       5        8          esss       53      216
of the header,            e       5        8          esbs      101      216
{5[8]}.                   e       5        8          ebbb      109      216
                          e       5        8          essb      125      216   2^3*3^i
                          e       5        8          ebsb      157      216   i=0...infinity

In the "es" subtree      es       3       16            es        3       16
(those with header       es       3       16           esb       19       48
in {3[16]}), the         es       3       16           ess       35       48
header d value is        es       3       16          esbb       19      144
2^4, resulting from      es       3       16          esss       35      144
the contribution of      es       3       16          esbs       67      144
the "s" step.  As        es       3       16          essb       83      144
always the second        es       3       16         esbss       67      432
and subsequent           es       3       16         essbb       83      432
upward-bound lines       es       3       16         esbbb      163      432
are subsets of the       es       3       16         esssb      179      432
parental residue         es       3       16         esbsb      211      432
set.                     es       3       16         esbbs      307      432
                         es       3       16         essss      323      432   2^4*3^i
                         es       3       16         essbs      371      432   i=0...infinity

In the "eb" subtree      eb      17       32            eb       17       32
(those with header       eb      17       32           ebs       17       96
in {17[32]}), the        eb      17       32           ebb       49       96
header d value is        eb      17       32          ebss       17      288
2^5, resulting from      eb      17       32          ebbs       49      288
the contribution of      eb      17       32          ebbb      145      288
the "b" step.            eb      17       32          ebsb      209      288
                         eb      17       32         ebssb       17      864
      .                  eb      17       32         ebbsb       49      864
                         eb      17       32         ebbbs      145      864
                         eb      17       32         ebsbs      209      864
                         eb      17       32         ebbss      337      864
                         eb      17       32         ebbbb      433      864
                         eb      17       32         ebsss      593      864    2^5*3^i
                         eb      17       32         ebsbb      785      864    i=0...infinity

In the "ess"            ess      23       32           ess       23       32
subtree (those with     ess      23       32          esss       23       96
header in {23[32]},     ess      23       32          essb       55       96
the d value is also     ess      23       32         essbb       55      288
2^5, resulting from     ess      23       32         esssb      119      288
a pair of "s"           ess      23       32         essss      215      288
steps.                  ess      23       32         essbs      247      288
                        ess      23       32        essbbb       55      864
                        ess      23       32        esssss      215      864
                        ess      23       32        esssbb      407      864
                        ess      23       32        essssb      503      864
                        ess      23       32        essbsb      535      864
                        ess      23       32        essbbs      631      864
                        ess      23       32        esssbs      695      864    2^5*3^i
                        ess      23       32        essbss      823      864    i=0...infinity

These blocks of results teach several important lessons.  Foremost is that all the residue sets beneath each header consist of elements which also appear in the header residue set itself.  When the densities of each whole subtree, i.e. the header node itself as well as all the deeper residue sets who reach that same header in the Collatz itinerary are summed with and without the density of the header itself, the densities are in the ratio of 3:2.  This explains the earlier  (unexpected)  result that the summation over the whole predecessor tree indicates too high a density when the tree is considered as a whole.  But since all the subtree contents under an l.d.a. header node contain duplicates,  the proper set of residues sets to be summed to obtain the unique integer density is the set of  l.d.a.  header nodes.

Also, these blocks representing each particular l.d.a. shows an increasing power of 2 multiplicity of occurences of residue sets deep in the tree. It is this constant doubling which accounts for the high fraction of duplication of member integers in the total subtree dependent from each header node.  Of course,  if the density had been being calculated,  the multiplicity would have had to be taken into account.

All these conclusions from the behavior of the blocks of output from rsetidea runs are uncomplicated by any conversion of d-values to integer densities in the residue sets.  The d-values served to make the qualitative judgments required to focus on the features required in further analysis.

The first four lines of the following table are taken from the four blocks above,  with the rest of the lines derived from subsequent blocks of  the complete sorted output of rsetprog.  When the residue sets of  l.d.a. parents are put in a table (where each subtree is represented solely by its header node),  and again arranged so that the residue sets which share a common d-value are grouped together,  it is seen that the multiplicity of the residue sets sharing a given d-value occur as the integers in the Fibonacci series.

This time we aggregate over actual densities.  All the duplicated integers within l.d.a.s have been winnowed out.  So we use 2/d (as shown in column 5), the density in each subtree from the d-values of the previous table, for each l.d.a.  The integer density of the residue sets multiplied by the multiplicity from the Fibonacci series member which obtains at each level calculates the total density provided at each level.  The infinite summation over these expressions will give the the total density of odd integers in the infinite predecessor tree.

 1     2    3       4         5
l.d.a. c    d    expression density
e      5    8   2^(i-3)/3^i  1/4   F(1)
es     3   16   2^(i-4)/3^i  1/8   F(2)
eb    17   32   2^(i-5)/3^i  1/16  F(3)
ess   23   32   2^(i-5)/3^i  1/16
ebs   11   64   2^(i-6)/3^i  1/32  F(4)
esss  15   64   2^(i-6)/3^i  1/32
esb   25   64   2^(i-6)/3^i  1/32
ebb   65  128   2^(i-7)/3^i  1/64  F(5)
ebss   7  128   2^(i-7)/3^i  1/64
esbs  59  128   2^(i-7)/3^i  1/64
essb  73  128   2^(i-7)/3^i  1/64
essss 95  128   2^(i-7)/3^i  1/64

The known sum(F(i)/2^(i+1),i=1..infinity) is 1,  where F(i)  are the Fibonacci numbers. The densities exhibited by the l.d.a.s are half those of the above expressions, so the total density over all of the l.d.a.s is 1/2, showing that the binary predecessor tree does contain all the odd integers.

To summarize,  the program rsetprog has produced quite a lot of new understanding about the Collatz predecessor tree.  Perhaps foremost is the successful construction by the program of the predecessor tree viewed as consisting of residue sets.  Beyond the production of huge amounts of detailed information about the placement and organization of the integers in the higher portions of the tree by tracing the paths of left descent assemblies upwards,  the program's output proved to contain systematic features from which it was shown using infinite summation first over individual  l.d.a.s to eliminate duplicated integers and secondly over the entire set of l.d.a.s that all the odd positive integers are included in the predecessor tree.  Thus,  the program is tantamount to proof by induction and construction that the predecessor tree is infinite and contains all the odd positive integers.  The program thus completes a proof of the Collatz conjecture.

My Collatz Home Page        Index to Terms Used