Curve fitting--part 6.

S. Arlinghaus

1.  VARIANTS ON EARLIER CURVE FITTING 

     A.  The exponential curve

Previous curve fitting efforts used least squares analysis to fit a
straight line, an exponential, or a logarithmic function to a
distribution of data.

In the case of the exponential, the previous work assumed that the curve
eventually settled down to the x-axis, the line y=0.  The x-axis need not
be the horizontal asymptote of this curve, however.

Consider the following example, in which two exponentials are fit to the
same data: one with horizontal asymptote y=0 and one with a different
horizontal asymptote.

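The general form of the curve is:

y=Ce^(ax) + b

where a < 0, C is a positive constant, and y=b is the lower bound (the
horizontal asymptote) of the exponential.

The added term b shows how much the curve is lifted above or below the
x-axis.  Subtracting b and taking logarithms gives ln(y-b) = ax + ln C,
a straight line in x, so a and ln C can be found by a least squares fit
to the values ln(y-b).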

Exponential curve fit to projected crude birth data, 2005 to 2025
Source:  WRD data for Bangladesh.

                        exp. proj.  exp. proj.  WRD proj.   WRD proj.
Year        WRD proj.   2005        2005        y=0         y=4
                        y=0         y=4         LN(y-0)     LN(y-4)
                                                            y-4=Ce^(ax)

2005        5.97184     5.85017     5.9407      1.78706      0.67897
2010        5.53296     5.5026      5.48587     1.71072      0.4272
2015        5.07485     5.17568     5.13763     1.6243       0.07218
2020        4.84262     4.86818     4.87101     1.57746     -0.1712
2025        4.69974     4.57895     4.66688     1.54751     -0.357

slope                                           -0.0122     -0.0534
intercept                                       26.3277     107.75

Horizontal asymptote y=0:
     ln y = -0.01225x+26.32772
     y = exp(-0.01225x+26.32772)

Horizontal asymptote y=4:
     ln(y-4) = -0.05341x+107.7501
     y-4 = exp(-0.05341x+107.7501)
     y = exp(-0.05341x+107.7501)+4

The value of y=4 as a different lower bound was suggested by the WRD data.                                                              

Graph:

It appears that the WRD projection, while exponential in general shape,
may have been made using lines of different slopes joined at 2015 (the
third data point above).
The exponential that has y=4 as a horizontal asymptote appears to be
closer to the criteria used to make the forecasts than does the
exponential with y=0 as a horizontal asymptote.

     B.  The logistic curve (variant).

     One variant of the logistic curve, in which the S-shape appears
flatter, is the Gompertz curve; it is used to model growth of various
kinds, from financial to population.  The reason the curve is flatter
becomes evident when the logistic equation is written as a differential
equation,
     dP/dt = P(a-b*P),
and the Gompertz is written in an equivalent manner, as
     dP/dt = P(a-b*ln P).
The logarithmic factor tends to flatten out the curve and make the
S-shape less curved than would a logistic fit.
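
For reference, the Gompertz differential equation can be solved in closed
form with the substitution u = ln P.  The derivation below is a standard
one and is not taken from the source; P_0 denotes the value at t=0, and
the symbols q and c are introduced to match those used in the general
form of the Gompertz curve.

```latex
\begin{align*}
\frac{dP}{dt} &= P\,(a - b\ln P), \qquad u = \ln P
  \;\Longrightarrow\; \frac{du}{dt} = a - b\,u, \\
u(t) &= \frac{a}{b} - \Bigl(\frac{a}{b} - \ln P_0\Bigr) e^{-bt}, \\
P(t) &= q\,\exp\!\bigl(-c\,e^{-bt}\bigr),
  \qquad q = e^{a/b}, \quad c = \ln\frac{q}{P_0}.
\end{align*}
```

For b > 0 the curve approaches q = e^(a/b) as t grows, so q plays the
role of the upper bound.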

General form for the Gompertz curve:

y=q*e^(-c*e^(-bx))

where q is the upper bound, selected by the user before any analysis on
the basis of carrying capacity or other considerations, and b and c are
constants to be determined from q and from the beginning and ending
times chosen.
There are numerous equivalent forms.

Example.  Gompertz curve fit to WRD data from 1955-2025, Bangladesh total
population.                                                                     

Here t counts 5-year intervals from 1955 and plays the role of x.

                Pop. mil.   q=300       q=300
t       Year    WRD         logistic    Gompertz

0       1955    45.486      45.486      45.486
1       1960    51.419      54.4071     58.7728
2       1965    58.312      64.6335     73.3423
3       1970    66.671      76.1858     88.8109
4       1975    76.582      89.022      104.782
5       1980    88.219      103.025     120.879
6       1985    101.147     117.999     136.767
7       1990    115.593     133.673     152.17
8       1995    132.219     149.716     166.87
9       2000    150.589     165.765     180.711
10      2005    170.138     181.457     193.594
11      2010    188.196     196.461     205.464
12      2015    204.631     210.503     216.306
13      2020    220.119     223.382     226.135
14      2025    234.987     234.981     234.987

Logistic

y=q/(1+ae^(bx)), b<0
y=300/(1+ae^(bx))

Find a:
  In 1955, t=0, y=45.486.
  Thus, 45.486 = 300/(1+a)
  Solving, a=5.595435
  y=300/(1+5.595435e^(bt))

Find b.  Use info. from t=14 in 2025:
  In 2025, t=14, y=234.987.
  Thus, 234.987=300/(1+5.595435e^(14b))
  Solving, b=-0.21477.

Logistic equation:
  y=300/(1+5.595435e^(-0.21477t))

Gompertz

y=q*e^(-c*e^(-bx))
y=300*e^(-c*e^(-bx))

Find c:
  In 1955, t=0, y=45.486.
  Thus, 45.486 = 300*e^(-c*e^(-b*0)) = 300*e^(-c)
  Solving, c=ln(300/45.486)=1.8863779
  y=300*e^(-1.8863779e^(-bt))

Find b.  Use info. from t=14 in 2025:
  In 2025, t=14, y=234.987.
  Thus, 234.987=300*e^(-1.8863779e^(-14b))
  so e^(-14b) = ln(300/234.987)/1.8863779 = 0.2442523/1.8863779
  Solving, b = -1/14 * ln(0.2442523/1.8863779)
             = 0.1460152

Gompertz equation:
  y=300*e^(-1.8863779e^(-0.1460152t))
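
The two-point procedure above can be sketched in Python as follows.
The function names are hypothetical, and the Gompertz curve is taken in
the form y = q*exp(-c*exp(-b*t)), which is the reading consistent with
the values c=ln(300/45.486) and b=0.1460152 worked out above.

```python
import math


def fit_logistic(q, y0, t_end, y_end):
    """Return (a, b) for y = q/(1 + a*exp(b*t)) through (0, y0), (t_end, y_end)."""
    a = q / y0 - 1.0
    b = math.log((q / y_end - 1.0) / a) / t_end
    return a, b


def fit_gompertz(q, y0, t_end, y_end):
    """Return (c, b) for y = q*exp(-c*exp(-b*t)) through the same two points."""
    c = math.log(q / y0)
    b = -math.log(math.log(q / y_end) / c) / t_end
    return c, b


def logistic(t, q, a, b):
    return q / (1.0 + a * math.exp(b * t))


def gompertz(t, q, c, b):
    return q * math.exp(-c * math.exp(-b * t))


q = 300.0                      # upper bound chosen by the user
a, b_log = fit_logistic(q, 45.486, 14, 234.987)
c, b_gom = fit_gompertz(q, 45.486, 14, 234.987)
print(f"logistic: a={a:.6f}, b={b_log:.5f}")
print(f"Gompertz: c={c:.7f}, b={b_gom:.7f}")

# t counts 5-year intervals from 1955.
for t in range(15):
    print(t, 1955 + 5 * t,
          round(logistic(t, q, a, b_log), 3),
          round(gompertz(t, q, c, b_gom), 3))
```

Passing a different endpoint, for example t_end=7 with the 1990 value,
refits both curves without any other change.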


Graph:

The two fits are quite far (relatively speaking) from the actual data in
1990 (t=7):  133.673 for the logistic and 152.17 for the Gompertz,
against a WRD value of 115.593.
One might consider, therefore, using 1990 as the endpoint and
extrapolating beyond that.
The procedure is the same as above; just use 1990 instead of 2025 as the
endpoint.

Logistic

The value for a is the same as above.

  y=300/(1+5.595435e^(bt))

Find b.  Use info. from t=7 in 1990:
  In 1990, t=7, y=115.593.
  Thus, 115.593=300/(1+5.595435e^(7b))
  Solving, b=-0.17927

Logistic equation:
  y=300/(1+5.595435e^(-0.17927t))

Gompertz

The value for c is the same as above.

  y=300*e^(-1.8863779e^(-bt))

Find b.  Use info. from t=7 in 1990:
  In 1990, t=7, y=115.593.
  Thus, 115.593=300*e^(-1.8863779e^(-7b))
  Solving, -b = 1/7 * ln(0.95371/1.8863779)
              = -0.0974, so b=0.09744

Gompertz equation:
  y=300*e^(-1.8863779e^(-0.09744t))


                Pop. mil.   q=300       q=300
t       Year    WRD         logistic    Gompertz

0       1955    45.486      45.486      45.486
1       1960    51.419      52.8438     54.1925
2       1965    58.312      61.1059     63.5241
3       1970    66.671      70.2925     73.3722
4       1975    76.582      80.3954     83.6207
5       1980    88.219      91.3728     94.1508
6       1985    101.147     103.145     104.846
7       1990    115.593     115.594     115.596
8       1995    132.219     128.563     126.298
9       2000    150.589     141.869     136.861
10      2005    170.138     155.304     147.205
11      2010    188.196     168.654     157.264
12      2015    204.631     181.711     166.983
13      2020    220.119     194.284     176.318
14      2025    234.987     206.209     185.236

Clearly the WRD forecast, if made using this sort of curve, required an
upper bound higher than q=300.
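
One way to explore this remark is to refit both curves through the 1955
and 1990 values for several upper bounds and compare the resulting 2025
values with the WRD figure of 234.987.  The Python sketch below does
that.  The function names and the candidate values of q other than 300
are illustrative assumptions, and the sketch does not attempt to recover
the upper bound actually used by WRD.

```python
import math

Y_1955 = 45.486     # WRD value at t=0
Y_1990 = 115.593    # WRD value at t=7
WRD_2025 = 234.987  # WRD value at t=14


def logistic_2025(q, y0=Y_1955, y7=Y_1990):
    """Logistic through (0, y0) and (7, y7) with upper bound q, at t=14."""
    a = q / y0 - 1.0
    b = math.log((q / y7 - 1.0) / a) / 7.0
    return q / (1.0 + a * math.exp(14.0 * b))


def gompertz_2025(q, y0=Y_1955, y7=Y_1990):
    """Gompertz y = q*exp(-c*exp(-b*t)) through the same points, at t=14."""
    c = math.log(q / y0)
    b = -math.log(math.log(q / y7) / c) / 7.0
    return q * math.exp(-c * math.exp(-14.0 * b))


# Candidate upper bounds; only q=300 comes from the text above.
for q in (300.0, 400.0, 500.0, 750.0, 1000.0):
    print(f"q={q:6.0f}  logistic={logistic_2025(q):8.3f}  "
          f"Gompertz={gompertz_2025(q):8.3f}  WRD={WRD_2025}")
```

With q=300 the two functions return the t=14 entries of the table above
(up to rounding of b); larger values of q raise the 2025 value of both
curves.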


SUMMARY OF CRITERIA FOR CURVE FITTING

Linear
     y=mx+b; m is the slope and b is the y-intercept.
     Useful for linear increase.

Exponential
     y=e^(mx+b)
     Useful to suggest decline toward the horizontal asymptote (m<0).
     Useful to suggest unbounded increase (m>0)--"worst case" picture.
Logarithmic
     y=ln(mx+b)
     Damped increase in growth
     Unbounded
Cubic spline
     Exact fit using pieces of cubic curve between the points of a given
     finite set of evenly-spaced data points.
     Bounded fit--not good for forecasting
     Interpolating curve.
Logistic
     Assumption of exponential growth that tapers off toward some upper
     bound q.
     Produces S-shaped curve (in its full extent); two endpoints are used
     to find the values of the constants a and b.
     y=q/(1+ae^(bx)), b<0
Gompertz
     Like a logistic curve, but produces a flatter S-shape than does a
     logistic for the same values.
     y=q*e^(-c*e^(-bx))

Other analytical tools

Feigenbaum's graphical analysis
     Useful for examining geometric dynamics--point of irreversibility of
     dynamic process may suggest point at which to intervene.  The
     geometric dynamics are based on the idea of geometric feedback.
     Likely useful when real-world feedback can be aligned with geometric
     feedback.
Lattices
     Useful, potentially, in examining hierarchical structures in which
     domination is involved.
     Very few applications of this sort of material exist in the
     literature.

Fractals
     Useful when ideas of self-similarity and scale change are involved.
     Difficulty comes in identifying geometric self-similarity.

Graph theory
     Useful when the real-world situation can be partitioned into a set of
     nodes (point-locations) and edges (channels linking these locations).
     This alignment can be quite far-fetched, including linking of
     ideas--not just the more obvious linking found in transport networks,
     for example.  Allows for abstract manipulation based on linkage
     pattern.
     War of the Roses example.