Program Nbigpath -- Generating the General Predecessor Tree

Perhaps for no better reason than the classic one for climbing mountains, I determined to write a program which would start from the root of the predecessor tree and construct, given time and space, any arbitrarily large portion of it. Initial attempts to construct the tree in binary form foundered on the problem of the "phantom" nodes introduced by the conversion from the general tree form to the binary form. So I was forced, sensibly enough in retrospect, to develop a program to generate the general tree form of the predecessor tree.

This program differs in a major way from treegrow, the program used to generate all the l.d.a.s. This program follows paths in their entirety, ignoring distinctions between headers of l.d.a.s, internal elements in l.d.a.s, and extensions, while, of course, recognizing that leaf nodes terminate paths. Its product, therefore, is some selected portion of the general form of the Collatz predecessor tree. Readers who feel somehow cheated that the main thrust of this work is based on l.d.a.s may take some comfort in tracing out, among the concrete examples which nbigpath provides, the relationship between the general and the abstract forms of the predecessor tree.

Since extensions are not treated in any special way by the recursive main program, it was necessary to include the extensions of the root, 1, by calling the recursive routine with each of its successive extensions in turn as the argument. All other extensions are produced as younger children of their common parent within the recursive routine. Readers should keep in mind that the program searches out all the Collatz predecessors. The predecessors are depicted as children in the predecessor tree, so the Collatz itinerary for any given integer in the tree follows an upward path through its Collatz successors to the root, 1, of the tree.
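The seed values fed to the recursive routine can be computed directly: the odd predecessors of 1 itself are the integers (2^j - 1)/3 for even j, namely 5, 21, 85, 341, ..., each generated from the last by the recurrence fn := 4*fn + 1 used in nbigstrt. A small Python sketch (the helper name is mine, not the author's; the original program is Maple):

```python
def root_extensions(slimit):
    """Odd predecessors of 1: integers m with (3*m + 1)/2**j = 1 for even j."""
    m = 5                      # (2**4 - 1)/3, the first extension beyond 1 itself
    while m <= slimit:
        yield m
        m = 4 * m + 1          # same recurrence as fn := 4*fn + 1 in nbigstrt

exts = list(root_extensions(10000))   # 5, 21, 85, 341, 1365, 5461
```

Each of these maps back to 1 in a single Collatz "odd step": 3*m + 1 is a power of 2 (16, 64, 256, ...).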

After a brief overview, the program itself will be given in full. The heart of the program is a recursive routine controlled by global variables which limit the size of integers to be discovered (and, separately, those whose paths are reported), but with steps otherwise capable of any depth of growth. Separate limits on printing and searching (plimit and slimit, respectively) are required because certain paths in Collatz trajectories reach much higher magnitudes than their starting points; the tree must therefore be explored to much greater depths than are wanted for output in order to ensure that all desired output integers are included.

The program tracks the progress at each level of the developing paths in terms of the formula 2^a*3^b*n + c (sometimes expressed as d*n+c or even {c[d]}). Selection of appropriate successive steps is determined in the long series of elif clauses in the center of the recursive routine. The local variable next keeps track of the progress of the search for suitable steps to children at each level of recursion. The exponent b increments by 1 for each down step and the exponent a increments by j at each down step in the recursive routine, where j is that of the formula
ns = (3*np + 1)/2^j
When tracing the successive children of a node in the general tree (i.e. those called extensions in the binary tree representation), the variable i substitutes for j in a loop across the children up to slimit. As always, Collatz predecessors may arise using n, n+d, or n+2*d as the point of departure for the next path element.
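One level of this search can be expressed compactly: the odd Collatz predecessors of nh are the integers (2^i*nh - 1)/3, where the starting value j of i is 2 when nh is congruent to 1 (mod 3) and 1 when it is congruent to 2 (mod 3), and i then runs j, j+2, j+4, ... until slimit is exceeded. A Python sketch of that single step (names are mine, not the author's):

```python
def odd_predecessors(n, slimit):
    """Odd m with (3*m + 1)/2**i == n for some i, i.e. m = (2**i * n - 1)/3."""
    if n % 3 == 0:
        return []               # multiples of 3 have no odd predecessors: leaf
    j = 2 if n % 3 == 1 else 1  # parity of i making 2**i * n - 1 divisible by 3
    preds, i = [], j
    while (m := (2**i * n - 1) // 3) <= slimit:
        preds.append((m, i))
        i += 2
    return preds
```

For example, the predecessors of 5 below 1000 are 3, 13, 53, 213, and 853, reached with i = 1, 3, 5, 7, 9 respectively.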

Probably, the meanings of most variables will be clear enough from the above paragraphs, but note that the suffix h is appended to denote the local (here) value of the several key variables.

The paths traced are expressed in "extended hexadecimal" notation as indicated in the variable alphabet.   Thus,  the strings employed to report each path represent the successive values of  j.
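This notation can be checked mechanically: each character of a path records the j used at that step, so replaying the characters in reverse order from the reported integer c, applying ns = (3*np + 1)/2^j at each stage, should climb back to the root, 1. A Python sketch of such a verifier (names are mine):

```python
ALPHABET = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def replay(c, path):
    """Follow Collatz successors from c using the path's j values, last character first."""
    n = c
    for ch in reversed(path):
        j = ALPHABET.index(ch) + 1           # character -> number of divisions by 2
        v = 3 * n + 1
        assert v % 2**j == 0, "path character inconsistent with the trajectory"
        n = v // 2**j
    return n
```

Replaying the second output line below, replay(9, "432112") climbs 9, 7, 11, 17, 13, 5 and returns 1.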

The series of statements at #starters (not a routine) serves to initialize the various parameters and global variables which control the behavior of the program.


 >nbigstrt:=proc()
   >local fn,fa,fd; global plimit,slimit;
   >fopen(`\\maplev4\\mywork\\pathrept.lst`,APPEND,TEXT);
   >fn:=5; fa:=4; fd:=8;
   >while fn<=slimit do;
     >nbigcurs(fn,fa,0,fd,substring(alphabet,fa..fa));
     >fn:=4*fn+1; fa:=fa+2; fd:=4*fd;
   >od;
   >fclose(`\\maplev4\\mywork\\pathrept.lst`);
 >end;

 >nbigcurs:=proc(n_init,a,b,d,path)
   >global alphabet,plimit,slimit;
     >local nh,ah,bh,dh,i,j,step,start,next,pathh,pathhe,newn,tn,k;
       >nh:=n_init; ah:=a; bh:=b; pathh:=path; dh:=d;
       >next:=1;
         >if nh mod 3=0 then next:=9; fi;
        >if nh<plimit then
          >fprintf(`\\maplev4\\mywork\\pathrept.lst`,`2^%3d*3^%3d c=%8d %st\n`,
            >a,b,nh,pathh);
        >fi;
       >while next<8 do;
          >if next<=2 and nh mod 3=1 then step:=2; j:=2; start:=nh; next:=3;
          >elif next<=3 and nh mod 3=2 then step:=1; j:=1; start:=nh; next:=4;
          >elif next<=4 and (nh+dh) mod 3=1 then step:=2; j:=2; start:=nh+dh; next:=5;
          >elif next<=5 and (nh+dh) mod 3=2 then step:=1; j:=1; start:=nh+dh; next:=6;
          >elif next<=6 and (nh+dh+dh) mod 3=1 then step:=2; j:=2; start:=nh+dh+dh; next:=7;
          >elif next<=7 and (nh+dh+dh) mod 3=2 then step:=1; j:=1; start:=nh+dh+dh; next:=8;
          >else next:=9; fi;
         >if next<=8 then
           >for i from j by 2 do;
             >newn:=(-1+2^i*start)/3;
             >if newn>slimit then RETURN(); fi;
             >nbigcurs(newn,ah+i,bh+1,dh*2^i,cat(pathh,substring(alphabet,i..i)));
           >od;
         >fi;
       >od;
     >end;

 #starters
   >alphabet:=`123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz`;
   >fclose(`\\maplev4\\mywork\\pathrept.lst`);
     >stopat(nbigcurs);
        >plimit:=1000002; slimit:=1000000002;
       >nbigstrt();
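For readers without Maple, here is a condensed Python sketch of the same search (an assumption-laden paraphrase, not the author's code: file output is replaced by a list, the step/start bookkeeping is trimmed, and the exponent tracking is omitted; all names are mine):

```python
ALPHABET = "123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def search(plimit, slimit):
    """Collect (c, path) for every tree node with c < plimit, exploring up to slimit."""
    out = []

    def recurse(n, path):
        if n < plimit:
            out.append((n, path))
        if n % 3 == 0:
            return                      # multiples of 3 are leaves: no odd predecessors
        i = 2 if n % 3 == 1 else 1      # parity of i making 2**i * n - 1 divisible by 3
        while (newn := (2**i * n - 1) // 3) <= slimit:
            recurse(newn, path + ALPHABET[i - 1])
            i += 2

    # seed with the successive extensions of the root, 1 (cf. nbigstrt)
    m, a = 5, 4
    while m <= slimit:
        recurse(m, ALPHABET[a - 1])
        m, a = 4 * m + 1, a + 2
    return out
```

With small limits, e.g. search(100, 10**5), the nodes 3, 13, 53, 17, 11, 7, ... appear, and every reported path replays back to 1 via the successor formula.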

Here are the first 27 lines of output produced by the above program with the controlling parameters indicated just above. While the depth reached is pretty large (note the large powers of 2 and 3 quickly reached), the amount of tree traversal intervening between successive output lines is reasonably small. Were it not that the search requires a much greater depth than that desired for output, the routine would be reasonably efficient.

2^  4*3^  1 c=       3 41t
2^ 12*3^  5 c=       9 432112t
2^ 21*3^  9 c=      57 4321142212t
2^ 27*3^ 12 c=     135 4321142214211t
2^ 35*3^ 15 c=    1281 4321142214213222t
2^ 44*3^ 19 c=    8097 43211422142132242122t
2^ 48*3^ 20 c=   43185 432114221421322421242t
2^ 55*3^ 23 c=  204729 432114221421322421244212t
2^ 57*3^ 23 c=  818917 432114221421322421244214t
2^ 53*3^ 22 c=  153547 43211422142132242124421t
2^ 59*3^ 25 c=  363963 43211422142132242124423121t
2^ 66*3^ 29 c=  575151 432114221421322421244231232111t
2^ 65*3^ 28 c=  862727 43211422142132242124423123211t
2^ 81*3^ 39 c=  319167 4321142214213224212442312511112122211111t
2^ 80*3^ 38 c=  478751 432114221421322421244231251111212221111t
2^ 79*3^ 37 c=  718127 43211422142132242124423125111121222111t
2^ 84*3^ 40 c=  851113 43211422142132242124423125111121241111112t
2^ 82*3^ 39 c=  638335 4321142214213224212442312511112124111111t
2^ 81*3^ 38 c=  957503 432114221421322421244231251111212411111t
2^ 58*3^ 24 c=  545945 4321142214213224212442312t
2^ 56*3^ 23 c=  409459 432114221421322421244231t
2^ 62*3^ 26 c=  970569 432114221421322421244233112t
2^ 60*3^ 25 c=  727927 43211422142132242124423311t
2^ 55*3^ 22 c=  614189 43211422142132242124423t
2^ 52*3^ 21 c=  230321 4321142214213224212442t
2^ 54*3^ 21 c=  921285 4321142214213224212444t
2^ 50*3^ 20 c=  172741 432114221421322421244t

Even though the search depth was 1000 times greater than the print depth in the above run, only 499891 lines of output resulted. The missing integers less than one million were identified and their paths traced. In every case there was at least one integer in the path greater in magnitude than 1000000002, indicating that, had slimit been larger, all 500000 expected lines of output would have been produced. The integer 704511 was the one which would require the largest slimit value, 18997161173, nearly 19 times larger than the slimit value employed for the run. This experience reflects the extremes of magnitude which can be visited in some few of the paths, and illustrates the impracticality of building a predecessor tree of any significant size using this routine. Nevertheless, it appears certain that the program works correctly to generate all possible paths never exceeding the range granted it.

Similar situations where paths wander to much larger integers than their departure integer exist at much smaller magnitudes; e.g., the path from 27 traverses 3077, as do those from 73 and 97.
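The excursion from 27 is the classic example and is easy to check; a small Python sketch (the function name is mine):

```python
def collatz_trajectory(n):
    """Full Collatz trajectory of n down to 1 (assumes the iteration reaches 1)."""
    traj = [n]
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        traj.append(n)
    return traj

peak = max(collatz_trajectory(27))   # 27's trajectory climbs far above its start
```

The trajectory from 27 peaks at 9232, more than 340 times its starting value, which is why a search limited to small integers cannot reach 27 from the root.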

Another program, makepath, which traces out specified paths in the predecessor tree rather than attempting the whole infinite tree, has been developed and applied to a few interesting cases. Use of this program to produce numeric values in certain extreme examples of trajectories led to a desire for graphic visualization of Collatz trajectories, provided by npthtrk, to gain a better feeling for them. Using the graphing facilities of Maple 10, it proved possible to create plots of any Collatz trajectory very quickly, some of which are striking in their appearance.

What are the big picture implications?

First we want to record that the size of the nbigpath program (including the few phrases which initiate it but without leading blanks and blank lines) is 813 bytes or 6504 bits. It could be represented in fewer bits by a representation using fewer bits per character or by a compression algorithm. This program contains all the information required to generate the entire general predecessor tree (though it admittedly does not possess a large enough alphabet to represent paths containing more than 61 successive divisions by 2). The above sample of output makes it quite clear that only a very short incursion into the tree is needed to produce enough paths to give a total size very much larger than those 1000 bytes or so, even recognizing that converting those path representations into the more usual parity vector representation will probably reduce the space required to represent them.

This fact implies that the Collatz iteration is computationally reducible, i.e. that its information content can be represented by a program in much less space than that required to represent the totality of the Collatz iterations. Gregory Chaitin's information-theoretic worldview on computational complexity, which is the basis of the notion of computational reducibility, and Wolfram's definition of the term, illustrated with cellular automata, are both available on the 'net for study. A further article by Craig Alan Feinstein for the general audience about computational complexity, in which he includes a version of his proof of unprovability of the Collatz conjecture, brings the issue home to the Collatz conjecture. There is little point in reproducing these ideas here. Perhaps nothing more is being said by its computational reducibility than that the Collatz tree is completely deterministic, no matter how random any individual itinerary may appear to the unaided eye.

Further, a detailed discussion of definitions and the P vs. NP problem is available. Given time, any path in the Collatz tree will be produced by the nbigpath program; yet, once identified by its terminal integer, it is simple to verify that the path is indeed in the tree because that path leads to 1. That puts generation of the Collatz graph in the class of NP problems. Whatever grist there is to grind from that is left to others.

