Perhaps for no better reason than that for climbing mountains, I determined to write a program which would start from the root of the predecessor tree and construct, given time and space, any arbitrarily large portion of it. Initial attempts to construct the tree in binary form foundered around the problem of the "phantom" nodes introduced by the conversion from the general tree form to the binary form. So I was forced, sensibly enough in retrospect, to develop a program to generate the general tree form of the predecessor tree.

This program differs in a very major way from **treegrow** used to
generate all the l.d.a.s. This program follows paths in their
entirety, ignoring distinctions between headers of
l.d.a.s, internal elements in l.d.a.s and extensions,
while, of course, recognizing that leaf nodes terminate
paths. Its product, therefore, is some selected
portion of the general form of the Collatz predecessor tree.
Readers who feel somehow cheated that the main thrust of this work is
based on l.d.a.s may take some comfort in tracing out, among the
concrete examples which **nbigpath** provides, the relationship
between the general and the abstract forms of the predecessor tree.

Since extensions are not treated in any special way by the recursive
main program, it was necessary to include the extensions of the
root, 1, by calling the recursive routine with 1, then
its successive extensions, as arguments to the recursive
routine. All other extensions are produced as younger children of
their common parent within the recursive routine. Readers should
keep in mind that the program searches out all the **Collatz
predecessors**. The predecessors are depicted as children in the
predecessor tree, so the Collatz itinerary for any given integer
in the tree follows an upward path through its Collatz successors to the
root, 1, of the tree.

After a brief overview, the program itself will be given in
full. The heart of the program is a recursive routine controlled
by global variables to limit the size of integers to be discovered
(and, separately, those whose paths are reported), but
with steps otherwise capable of any depth of growth. Limiting both
printing and searching (by *plimit* and *slimit*,
respectively) are required because the much higher magnitudes
reached in certain paths in Collatz trajectories (as compared to their
starting point) means that those must be explored to much greater depths
than is sought for output to ensure that all desired output integers
will be included.

The program tracks the progress at each level of the developing paths
in terms of the formula 2^*a**3^*b***n + c*
(sometimes expressed as *d*n+c* or even {c[d]}).
Selection of appropriate succssive steps is determined in the long
series of *elif* clauses in the center of the recursive
routine. The local variable *next* keeps track of the
progress of the search for suitable steps to children at each level of
recursion. The superscript *b* increments by 1 for each down
step and the superscript *a* increments by *j* at each down
step in the recursive routine, where *j* is that of the
formula

n_{s} = (3*n_{p}+1)/2^{j}.

When tracing the successive children of a node in the general
tree (i.e. those called extensions in the binary tree
representation) , the variable *i* substitutes for *j*
in a loop across the children up to *slimit* . As
always, Collatz predecessors may arise using *n*,
*n+d*, or *n+2*d* as the point of departure for the next
path element.

Probably, the meanings of most variables will be clear enough
from the above paragraphs, but note that the suffixed *h* is
appended to denote the local (here) value of the several key variables.

The paths traced are expressed in "extended hexadecimal" notation as
indicated in the variable *alphabet*. Thus, the
strings employed to report each path represent the successive values
of *j*.

The series of statements at *#starters* (not a
routine) serve to initialize the various parameters and global
variables which control the behavior of the program.

>nbigstrt:=proc() >local fn,fa,fd; global plimit,slimit; >fopen(`\\maplev4\\mywork\\pathrept.lst`,APPEND,TEXT); >fn:=5; fa:=4; fd:=8; >while fn<=slimit do; >nbigcurs(fn,fa,0,fd,substring(alphabet,fa..fa)); >fn:=4*fn+1; fa:=fa+2; fd:=4*fd; >od; >fclose(`\\maplev4\\mywork\\pathrept.lst`); >end; >nbigcurs:=proc(n_init,a,b,d,path) >global alphabet,plimit,slimit; >local nh,ah,bh,dh,i,j,step,start,next,pathh,pathhe,newn,tn,k; >nh:=n_init; ah:=a; bh:=b; pathh:=path; dh:=d; >next:=1; >if nh mod 3=0 then next:=9; fi; >if nhfprintf(`\\maplev4\\mywork\\pathrept.lst`,`2^%3d*3^%3d c=%8d %st\n`, >a,b,nh,pathh); >fi; >while next<8 do; >if next<=2 and nh mod 3=1 then step:=2; j:=2; start:=nh; next:=3; >elif next<=3 and nh mod 3=2 then step:=1; j:=1; start:=nh; next:=4; >elif next<=4 and nh+dh mod 3=1 then step:=2; j:=2; start:=nh+dh; next:=5; >elif next<=5 and nh+dh mod 3=2 then step:=1; j:=1; start:=nh+dh; next:=6; >elif next<=6 and nh+dh+dh mod 3=1 then step:=2; j:=2; start:=nh+dh+dh; next:=7; >elif next<=7 and nh+dh+dh mod 3=2 then step:=1; j:=1; start:=nh+dh+dh; next:=8; >else next:=9;fi; >if next<=8 then >for i from j by 2 do; >newn:=(-1+2^i*start)/3; >if newn>slimit then RETURN(); fi; >nbigcurs(newn,ah+i,bh+1,dh*2^i,cat(pathh,substring(alphabet,i..i))); >od; >fi; >od; >end; #starters >alphabet:=`123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`; >fclose(`\\maplev4\\mywork\\pathrept.lst`); >stopat(nbigcurs); >plimit=1000002; slimit:=1000000002; >nbigstrt();

Here are the first 27 lines of output produced by the above program with the controlling parameters indicated just above. While the depth reached is pretty large (note the large powers of 2 and 3 quickly reached), the amount of tree traversal intervening between successive output lines is reasonably small. If it were not that a much larger depth for the search is required than that desired for output, the routine would be reasonably efficient.

2^ 4*3^ 1 c= 3 41t 2^ 12*3^ 5 c= 9 432112t 2^ 21*3^ 9 c= 57 4321142212t 2^ 27*3^ 12 c= 135 4321142214211t 2^ 35*3^ 15 c= 1281 4321142214213222t 2^ 44*3^ 19 c= 8097 43211422142132242122t 2^ 48*3^ 20 c= 43185 432114221421322421242t 2^ 55*3^ 23 c= 204729 432114221421322421244212t 2^ 57*3^ 23 c= 818917 432114221421322421244214t 2^ 53*3^ 22 c= 153547 43211422142132242124421t 2^ 59*3^ 25 c= 363963 43211422142132242124423121t 2^ 66*3^ 29 c= 575151 432114221421322421244231232111t 2^ 65*3^ 28 c= 862727 43211422142132242124423123211t 2^ 81*3^ 39 c= 319167 4321142214213224212442312511112122211111t 2^ 80*3^ 38 c= 478751 432114221421322421244231251111212221111t 2^ 79*3^ 37 c= 718127 43211422142132242124423125111121222111t 2^ 84*3^ 40 c= 851113 43211422142132242124423125111121241111112t 2^ 82*3^ 39 c= 638335 4321142214213224212442312511112124111111t 2^ 81*3^ 38 c= 957503 432114221421322421244231251111212411111t 2^ 58*3^ 24 c= 545945 4321142214213224212442312t 2^ 56*3^ 23 c= 409459 432114221421322421244231t 2^ 62*3^ 26 c= 970569 432114221421322421244233112t 2^ 60*3^ 25 c= 727927 43211422142132242124423311t 2^ 55*3^ 22 c= 614189 43211422142132242124423t 2^ 52*3^ 21 c= 230321 4321142214213224212442t 2^ 54*3^ 21 c= 921285 4321142214213224212444t 2^ 50*3^ 20 c= 172741 432114221421322421244t

Even though the search depth was 1000 times greater than the print
depth in the above run, only 499891 lines of output
resulted. The missing integers less than one million were
identified and their paths traced. In every case there was at
least one integer in the path greater in magnitude than 1000000002,
indicating that, had the *slimit* been larger,
all 500000 expected lines of output would have been produced.
The integer 704511 was the one which would require the largest
*slimit* value, 18997161173, nearly a full 19000 times
larger than the *slimit* value employed for the run. This
experience is reflective of the extremes of magnitude which can be
visited in some few of the paths, and illustrates the
impracticality of building a predecessor tree of any significant size
using this routine. Nevertheless, it appears certain that
the program works correctly to generate all possible paths never
exceeding the range granted it.

Similar situations where paths wander to much larger integers than their departure integer exist at much smaller magnitudes, e.g., the path from 27 traverses 3055 and those from 73 and 97 both traverse 3077.

Another program, **makepath**, which traces out
specified paths in the predecessor tree rather than attempting the whole
infinite tree, has been developed and
applied to a few interesting cases. Use of this program to
produce numeric values in certain extreme examples of trajectories led
to a requirement for a graphic visualization of
Collatz trajectories by **npthtrk** to gain a better feeling for
them. It proved possible to create plots very quickly of any Collatz
trajectory, some of which are striking in their appearance,
using graphing facilities of Maple 10.

First we want to record that the size of the **nbigpath** program
(including the few phrases which initiate it but without leading
blanks and blank lines) is 813 bytes or 6504 bits. It could
be represented in fewer bits by use of some representation using fewer
bits per character or by use of some compression algorithm. This
program contains all the information required to generate the entire
general predecessor tree (though it admittedly does not possess a
large enough alphabet to represent paths containing more than 61
successive divisions by 2). The above sample of output makes it
quite clear that it will take a very short incursion into the tree to
produce enough paths to give a total size very much larger than that
1000 bytes or so, even recognizing that the conversion of those
path representations into the more usual parity vector representation of
the paths will probably reduce the space required to represent them.

This fact implies that the Collatz iteration is computationally
reducible, i.e. that its information content can be
represented by a program in much less space than that required to
represent the totality of the Collatz iterations.

Further, a detailed
discussion of definitions and the P vs. NP problem is
available. Given time, any path in the Collatz tree will be
produced by the **nbigpath** program, yet, once
identified by its terminal integer, it is simple to verify that
the path is indeed in the tree because that path leads to 1. That
puts generation of the Collatz graph in the class of NP problems.
What ever grist there is to grind from that is left to others.