Although a great deal had been learned (October 2007) about the structure of the Collatz iteration, some features seemed to require additional work.
One such was my continued worry that some readers would not be satisfied that the residue sets completely fill the predecessor tree. In addition to the infinite nature of the tree, the fact that only the first instantiation of each residue set explicitly occurs in it makes it difficult to locate any sought residue set. Recall the development of the abstract tree in a top down manner followed by an upward portion. These examples identified few deeper residue sets due to their limited depth. The need for a complete demonstration of residue set deployment in the binary tree helped motivate this additional work.
Another compelling notion is that, if the residue sets were shown to contain all the odd positive integers, completely filling the infinite space of the abstract tree, it would follow that the inductive constructive proof of the Collatz conjecture would be founded on residue sets rather than individual integers.
A page which gives a fuller background for the following discussion is available.
To provide the data for such detailed analysis a program was written which performs the downward and upward development of the abstract tree while outputting a data set of all the residue sets visited.
The density of the odd positive integers in the residue sets is employed to measure the completeness with which their contents fill the abstract predecessor tree. The density of odd integers in the residue sets {c[d]} is 2/d, but this relationship is only applied in the very last step of the following development. Thus, the correctly totalled density will be 1/2.
Of course, the infinite size of the Collatz predecessor graph (coupled with the finite nature of the computational environment) might place a serious limitation on the precision available from the accumulated densities. One way to overcome this might take snapshots of the results of the rsetidea program through a series of increasing depths (which results in improving precisions as a larger and larger fraction of the decreasingly dense l.d.a.s in the predecessor tree are included), and then extrapolate the answers from the series of increasingly larger runs. This approach need not be fully explored because the desired proof that the tree's contents contain all the odd positive integers proved possible by closed infinite summation. But it did provide the unexpected result that there is replication of some integers among all the residue sets, resulting in double counting and the resulting over-count.
The structure of the abstract predecessor tree in its simplest form hides its intrinsic complexity. A fairly schematic picture, without identifying the edges of the tree, shows that successive levels of elaboration of the predecessor tree result in production of smaller and smaller (i.e. sparser and sparser) sets as the tree is developed.
A picture of a small fragment of the abstract tree illustrates the difficulty of assigning {c[d]} values to the nodes of the abstract tree because any one represents a stage at its level in the development of an infinity of ever-longer l.d.a.s which are dependent from it. As a result, numerous residue sets may be cited for every node in an l.d.a. to specify individual residue sets which lead to each deeper one in growing l.d.a.s and leaf nodes. This picture, though complex, may be helpful in understanding the concepts introduced above and illustrated in tables and descriptions of the output of the rsetidea program to follow.
Note that leaf nodes are not given individual recognition in the binary tree representation of the predecessor graph. Each node is taken to represent a leaf node at its level as well as the internal nodes in all dependent l.d.a.s. Leaf nodes are represented in the total density calculation through their presence at their parent's level which, as we shall see, is true of the density of all dependent nodes within each l.d.a.'s subtree.
The output from rsetidea is both lengthy and complex, requiring two separate stages of aggregation to determine the total odd integer density. First, the contribution of the densities of every l.d.a. is obtained and then a final summation across the infinite set of l.d.a.s gives the sought result.
The raw rsetidea output, when sorted appropriately, gives a series of blocks of output lines. (See below.) The entire output from rsetidea consists of residue sets {c[d]}, i.e. integers congruent to c modulo d. Each block bears at the top the residue set of an l.d.a. header. Under that are the layers showing the dependent residue sets of the succeeding generations.
That common header is given in the first line and repeated in the first three columns of each entry in each block. The descendent l.d.a.s are given in column four with each one's {c[d]} values in columns five and six. Thus the values ascribed to each parent of the traversal appear side-by-side with those of the successive elements of its dependents on each output line. The header d-values reflect only the number of powers-of-two reached. The dependent subsets' d-values also include one power of 3 for each step traversed in the developing l.d.a. The fact that all those sparser residue sets are subsets of the parental residue set is evident from the fact that c(up) mod d (down) equals c (down) in every branch of the tree under a header node. This fully explains the observation that the sum of the densities produced by the program indicated replication of integers among the residue sets.
Each of the four blocks presented show four levels of subtrees headed by the four densest l.d.a.s of the predecessor tree. The d-values of the residue set elements (column 6) within each left descent assembly in every case appear in expanding multiple instances shared among the subsets in numbers which go as successive powers of 2, i.e. 1, 2, 4, 8, ... All possible patterns of the "b" and "s" steps are exhibited (column 4) in each group. These patterns continue exactly until the program parameter prevents higher residue sets, thus confirming the accuracy of the program's behavior. (Further examples are far too large to exhibit here.)