Two Paradoxes Encountered in this Investigation of the Collatz Conjecture

Paradox Involving the Number of Even Integers in the Collatz Predecessor Tree

Since one of the characteristic features encountered when dealing with infinite sets is the appearance of paradoxes, it would be surprising if this work did not encounter some paradox.   We have such a paradox arising from a consideration of the relative size of the infinities of the odd and the even numbers which appear in the Collatz predecessor tree. 

On the one hand,  there must be an equal number of odd and even integers in the Collatz predecessor tree for the conjecture to be true because there are clearly an equal number of odds and evens among the integers.  (There's a 1:1 mapping of each odd number with its immediately larger even number.) 

On the other hand,  consider the predecessor tree,  written to include the even integers as well as the odd.   Every odd integer in the tree has an infinity of powers of 2 multiples of it stretching linearly off to infinity.   The itineraries from odd integers to the root contain more even numbers than odd. (Extensions involve at least 3 evens before reaching an odd, and within l.d.a.s there are always one or two even numbers between two odd ones.) Even if there were paths from an odd integer to the root which contained an excess of odd over even numbers,  any path to the root through it from infinity contains more even than odd integers for the simple reason that no matter how large may be the excess of odd over even integers in that portion of the path,  that excess is easily overwhelmed by the infinity of products by powers of 2.  Every odd integer bears its own infinite path segment consisting of its own multiples by successive powers of two,  so every branch and twig of the all-odd tree must exhibit this excess of even over odd numbers in the even-and-odd tree.   Thus, it seems the aggregate of all these paths, the entire tree, must contain more even than odd integers.

That's a solid paradox.  Both results cannot be true.  It smells of the well known paradox(es) about the frog jumping halfway out of the well at each leap or of the runner running second who keeps making up half of the distance by which he is behind.  But it is inside out -- instead of involving smaller and smaller increments of distance at each stage of the proceedings, it uses larger and larger leaps through the linear space of large numbers as it proceeds.

Can this paradox, like many of those classical ones, be resolved by resort to a different metric of the contents of the infinite sets involved?  Again, I come to the metric, density, the fraction of the integers in some infinite set under inspection as compared to the totality of integers.  These fractions get smaller and smaller as the gaps between powers of 2 get larger and larger, which might at first seem to be a dangerous parallel to the frog and the runner paradoxes.   The densities will be very very small in some sets containing the integers of any given path in the Collatz predecessor tree.  But that mustn't bother us because we're just going to sum them all up without getting dirty among the numbers.

First, remember that any odd integer of the odd-only version of the predecessor tree may be characterized uniquely by some expression dn+c, where d is of the form 2a*3b and may be dismayingly large if the integer is at the end of a lengthy path.  The density of the integers in that set is 1/d and its elements occur at a c offset from each multiple d*n.  Every instantiation of the elements of that set has its own set of multiples by successive multiplication by 2 and the corresponding elements occur with density 1/2d, 1/4d, 1/8d, ... 1/2n*d in that set of sets consisting of the individual successive instantiations by successive powers of 2.  We can calculate the density of integers of those sets because the sum of 1/2i, i from 1 to infinity, is 1.  So the sum of 1/2i*d for i from 1 to infinity is 1/d, exactly equaling the density contributed by the odd number itself. 

You might at first think we are counting even numbers twice, since they appear between elements of the  (all odd)  l.d.a.s and also as elements of the powers of 2 multiples of the odd numbers.   Not so.   The reckoning of the density of the odd numbers in the odd-only version of the predecessor tree is based entirely on the formulas describing all the odd numbers -- the even numbers are completely ignored there.  In the current reckoning of the density of the even numbers, they are counted only in their role as some power of two multiple of an odd number.   Their role as intermediate values in the l.d.a.s is again completely ignored.  Since we've counted the even numbers in only one of their roles,  and the odd numbers only as elements in the  (all odd)  abstract predecessor tree,  there has been no multiple counting here. (And,  yes,  it is true that in the above statement of the paradox the even numbers in the trajectories are counted twice.)

Since the sum of the densities among all the positive integers of the odd numbers in the Collatz predecessor tree is 1/2,  when we add in the equal-sized contributions of the series of powers of 2 multiples arrayed from each odd integer, we get exactly 1.  That is,  there is really no paradox, and all the even as well as all the odd integers are included in the Collatz tree which includes both even and odd integers.

Paradox Involving Set Densities of L.D.A. Header and Leaf Values

There is an interesting paradox to be observed here.   Two different ways of looking at the path segments which the l.d.a.s represent give two different ideas of the relationship between the magnitudes of their header and leaf nodes.

We have argued that the l.d.a.s in the abstract predecessor tree all terminate in a leaf node on the basis of an infinite sum of the fractions of  l.d.a.s ending in leaf node sets as the tree is traversed level by level.   However,  the density of the header extensions  (congruent to 5 mod 8)  among the odd integers is 1 in 4 and the density of the leaf nodes  (congruent to 0 mod 3)  among the odd integers is 1 in 3.   This should cause higher numbers to be reached faster among the headers than among the leaf nodes.  It would appear that,  on the average,   the l.d.a.s would have to trend from higher-valued headers to lower-valued leaf nodes,  i.e.  the l.d.a.s have to reach leaf nodes at a magnitude averaging 4/3 the value of the heading extensions.   The following discussion shows that this is not true.

By employing the formulas derived for the elements of the individual l.d.a.s,   selecting those for the extension header and the corresponding leaf node,   we can get a direct comparison of the magnitudes of the header and leaf nodes for any l.d.a. we might choose.   Of course,  for l.d.a.s in which s steps predominate the magnitudes of the leaves will be smaller than the header,  and,  for l.d.a.s in which b steps predominate,  larger.   We'll need to average the l/h ratio (i.e. leaf value divided by the header value) over the l.d.a.s of a given length to allow for this effect.

For large values of n in the formulas for the header and leaf nodes of a l.d.a.,  the dn term of the dn+c formula overwhelms the contribution of the c term,  especially since d is always larger than c.   Hence we can approximate the ratio of the l element to the h element by using the ratio of the d values in the formulas which describe the leaf and header values.   The result below will strictly apply only in the asymptote as larger and larger values of n in dn+c are applied.

For the empty left descent,   the header is the leaf node,   so the l/h ratio is inevitably 1.0000.

For the left descents with a single step,

      s:  24*3/(23*32) = 2/3
      b:  25*3/(23*32) = 4/3 
whose average is 1.0000.

For the left descents with two steps,

     ss:  25*3/(23*33) = 4/9        weight 1
    (sb): 26*3/(23*33) = 8/9        weight 2
     bb:  27*3/(23*33) = 16/9       weight 1 
whose (weighted) average is ((4+2*8+16)/9)/4 = (36/9)/4 = 1.0000.

For the left descents with three steps,

     sss:  26*3/(23*34) = 8/27      weight 1
    (ssb): 27*3/(23*34) = 16/27     weight 3
    (sbb): 28*3/(23*34) = 32/27     weight 3
     bbb:  29*3/(23*34) = 64/27     weight 1  
whose weighted average is ((8+3*16+3*32+64)/27)/8 = (216/27)/8 = 1.0000.

Each additional level in the abstract predecessor tree multiplies the numerator in the sum of the fractions by 6,   the denominator in the numerator by 3 and the final denominator by 2.   Thus,   the (weighted) average l/h ratio for every set of l.d.a.s of a given length remains at 1.0000 for all l.d.a. lengths.

The l/h ratio,  as approximated by ignoring the +1 of the 3n+1 process   (which is the source of all the c values),  is determined entirely by the s and b steps in the l.d.a. so there is no trend away from 1.000 as larger and larger values of n appear in this description of the Collatz trajectory.

The deviations in l/h  for n=0 and small values of d due to the effect of the +1 added in in the Collatz sequence are small.  The largest deviations from this effect occur in the est l.d.a. with 3/5 (vs. 2/3),  in the essst with 15/53 (.2830 vs. .2963) and in the ebssbt l.d.a. with 9/13 (.6923 vs.   .7901).  The other differences at n=0 are <0.02. It would appear that this effect of the +1 in the iteration cannot explain the whole difference between 1 and 4/3 as the two estimates for the l/h ratio.

The contradiction between the notion that the value of  l.d.a. headers should be 4/3 the value of their leaf nodes  (derived from the densities of the infinite sets which represent them)  and the observation that the ratio of header to leaf values is very close to 1.00  (derived directly from the l.d.a. element formulas)  seems like a paradox on first encounter.

However,  it is fairly typical of the paradoxes which often arise when infinite sets are involved.  From that point of view,  it was almost to be expected.  Indeed,  this paradox is closely akin to two (out of many) cases encountered by Galileo.  All three involve a conflict between infinite set sizes and some intuitive notion of the space the items in the set should collectively occupy.

The worrisome feature in this particular paradox was that it arose in spite of use of densities to calculate the fraction of coverage of the odd positive integers in the predecessor trees.  Our position has been that the use of densities usefully avoids paradoxes which arise from confusing two different metrics of infinity, so this should not have arisen!

It turns out that this is not a paradox.   Just a mistake!  The ratio of header to leaf values when calculated across the set of l.d.a.s at a given level in the abstract predecessor tree fails to take account of the lopsided character of that tree with respect to the powers of 2 reached in the d value of the dn+c formulas developed from the abstract tree.  It underweights those l.d.a.s with a large share of b steps whose header/leaf value ratios are heavily tipped toward values less than 1.  What if the header/leaf ratios are calculated on the basis of powers of 2 rather than depth in the abstract tree?

I wrote a little Maplev4 program to answer that question,  following the accumulating header/leaf ratio through l.d.a.s through 225.  Recall that to keep a balance (since 3 in 24 odd integers are headers at a given level, but 4 in 24 odd integers are leaves),  the header/leaf value ratio must be 3/4.  A selection of the results from this program is presented just below.  The point being made here is that the column headed by "thds/tlds"  (i.e. total header density sum/total leaf density sum)  and with an asterisk at the foot seems well on the way to approaching 0.75,  having started at 1.00 with the single-element l.d.a.s in which the single element is both the header and the leaf node.

Further details about the table are given below it.

 ns  nb  la  lb         ld    ha  hb              hd         m    tden        hden
 p2    hdensum   ldensum     hds/lds  thds/tlds   gtotsum
  1,  0,  4,  1,          48,  3,  2,              72,       1,.020833333,.013888888

  2,  0,  5,  1,          96,  3,  3,             216,       1,.010416666,.004629629
  0,  1,  5,  1,          96,  3,  2,              72,       1,.010416666,.013888888

  3,  0,  6,  1,         192,  3,  4,             648,       1,.005208333,.001543209
  1,  1,  6,  1,         192,  3,  3,             216,       2,.010416666,.009259259

  4,  0,  7,  1,         384,  3,  5,            1944,       1,.002604166,.000514403
  2,  1,  7,  1,         384,  3,  4,             648,       3,.007812500,.004629629
  0,  2,  7,  1,         384,  3,  3,             216,       1,.002604166,.004629629

  5,  0,  8,  1,         768,  3,  6,            5832,       1,.001302083,.000171467
  3,  1,  8,  1,         768,  3,  5,            1944,       4,.005208333,.002057613
  1,  2,  8,  1,         768,  3,  4,             648,       3,.003906250,.004629629


The table contains two kinds of lines, detail lines and summary lines, for each power of 2 reached by the program. Detail lines are given only for powers of 2 from 1 to 5, but were elided for higher powers.  The column headers tell the nature of the contents of the various columns,  with the upper line describing the detail lines,  and the lower one the summary lines,  as follows.
Detail lines
-- ns and nb indicate the number of s and b steps in the l.d.a.  The sum,  ns + 2nb,  must equal p2,  and the detail lines include all possible ways of achieving l.d.a.s of that p2 size.
-- la and ha give the value of a of the 2a*3b description of d of the dn+c formulation for the l.d.a.s for leaf and header,  respectively.
-- lb and hb give the value of b......for leaf and header,  respectively.
-- ld and hd give the value of d......for leaf and header,  respectively.
-- m gives the multiplicity  (obtained from the binomial coefficient)  of that detail line's contribution to the accumulating running sums.
-- lden and hden give the density of positive integers for leaf and header,  respectively,  calculated from m/d.   These were initialized to 1/24 to account for the l.d.a.s consisting of a single element  (containing both header and leaf nodes).
Summary lines
-- p2 gives the power of 2 in this (group of) line(s).
-- hdensum and ldensum give the contribution at this level of p2 of all the l.d.a.s in this p2 group for the headers and leaves,  respectively.
-- hds/lds gives the head/leaf value ratio for this power of 2.
-- thds/tlds give the head/leaf value ratio for the total over the powers of 2 so far calculated.   This value is the point of the exercise.
-- gtotsum is the grand total of all the densities across the single-element and all subsequent l.d.a.s.  This was initialized to 1/12 to account for the l.d.a.s consisting of a single  (both header and leaf)  nodes.

Any number of features may be pointed out.
(1) The m values are the number of ways that ns and nb steps in an l.d.a. can be arranged to make distinct l.d.a.s
(2) The density of integers contributed to the total density decreases as p2 increases  (see hdensum and ldensum).
(3) The header/leaf ratio at each value of p2 decreases as p2 increases  (see hds/lds).
(4) The header/leaf ratio from the sums accumulated over all the p2 values to date decreases slowly from its initial value of 1 for the single-element l.d.a.s,  appearing to approach the expected value of 3/4  (see thds/tlds).
(5) The total number of header and leaf set densities is not going to approach  0.5 (the density of the odd integers)  because all the nodes in the l.d.a.s other than the header and leaf nodes are not accumulated herein.

This recalculation of the leaf/header value ratios for the l.d.a.s at successive powers of 2 in the predecessor tree gives a convincing numerical demonstration that no paradox appears here.  Using a density metric to measure the contribution of all the infinite sets of header and leaf nodes suffices to expose the apparent paradox as a mere mistake.

My Collatz Home Page       Index to Terms Used