Curve fitting--part 6.
S. Arlinghaus

1. VARIANTS ON EARLIER CURVE FITTING

A. The exponential curve

Previous curve fitting efforts dealt with using least squares analysis to fit a straight line, an exponential, or a logarithmic function to a distribution of data. In the case of the exponential, the previous work assumed that the curve eventually settled down to the x-axis, the line y=0. The x-axis need not be the horizontal asymptote of this curve, however. Consider the following example and fit two exponentials--one with horizontal asymptote y=0 and one with a different horizontal asymptote.

The general form of the curve is:

y=Ce^(ax) + b

where a < 0 and y=b is the lower bound of the exponential; C is a constant. The added term shows how much the curve is lifted above or below the x-axis.

Exponential curve fit to projected crude birth data, 2005 to 2025
Source: WRD data for Bangladesh.

Year        WRD proj   exp. proj.   exp. proj.   WRD proj   WRD proj
                       y=0          y=4          LN(y-0)    LN(y-4)
2005        5.97184    5.85017      5.9407       1.78706     0.67897
2010        5.53296    5.5026       5.48587      1.71072     0.4272
2015        5.07485    5.17568      5.13763      1.6243      0.07218
2020        4.84262    4.86818      4.87101      1.57746    -0.1712
2025        4.69974    4.57895      4.66688      1.54751    -0.357
slope                                            -0.0122    -0.0534
intercept                                        26.3277    107.75

Horizontal asymptote y=0:
ln y = -0.01225x+26.32772
y=exp(-0.01225x+26.32772)

Horizontal asymptote y=4, from y-4=Ce^(ax):
ln(y-4) = -0.05341x+107.7501
y-4 = exp(-0.05341x+107.7501)
y=exp(-0.05341x+107.7501)+4

The value of y=4 as a different lower bound was suggested by the WRD data.

[Graph: WRD projection and the two fitted exponentials.]

It appears that the WRD projection, while exponential in general shape, may have been made using lines of different slopes joined at 2015 (3 above). The exponential that has y=4 as a horizontal asymptote appears to be closer to the criteria used to make forecasts than does the exponential with y=0 as a horizontal asymptote.

B. The logistic curve (variant).

One variant of the logistic curve, in which the S-shape appears flatter, is the Gompertz curve; it is used to model growth of various kinds, from financial to population.
The reason the curve is flatter becomes evident when both curves are written as differential equations. The logistic equation is

dP/dt = P(a-b*P)

and the Gompertz, written in an equivalent manner, is

dP/dt = P(a-b*ln P)

The logarithmic factor tends to flatten out the curve and make the S-shape less curved than would a logistic fit.

General form for the Gompertz curve:

y=q*e^(-c*e^(-bx))

where q is selected prior to making any analysis and is the value of the upper bound, chosen by the user on the basis of carrying capacity or other considerations, and b and c are constants to be determined; their values depend on the value selected for q and on the beginning and ending times chosen. There are numerous equivalent forms.

Example. Gompertz curve fit to WRD data from 1955-2025, Bangladesh total population.

Both curves use q=300 and are made to pass through the two endpoints, 1955 (t=0) and 2025 (t=14), where t counts 5 yr intervals and plays the role of x in the general forms.

Logistic: y=q/(1+ae^(bx)), b<0, so y=300/(1+ae^(bt))

Find a:
In 1955, t=0, y=45.486.
Thus, 45.486 = 300/(1+a)
Solving, a=5.595435
y=300/(1+5.595435e^(bt))

Find b; use information from t=14 in 2025:
In 2025, t=14, y=234.987.
Thus, 234.987=300/(1+5.595435e^(14b))
Solving, b=-0.21477.

Logistic equation:
y=300/(1+5.595435e^(-0.21477t))

Gompertz: y=q*e^(-c*e^(-bx)), so y=300*e^(-c*e^(-bt))

Find c:
In 1955, when t=0, y=45.486.
Thus, 45.486 = 300*e^(-c*e^(-b*0)) = 300*e^(-c)
Solving, c=ln(300/45.486)=1.8863779
y=300*e^(-1.8863779e^(-bt))

Find b; use information from t=14 in 2025:
In 2025, t=14, y=234.987.
Thus, 234.987=300*e^(-1.8863779e^(-14b))
Taking logarithms, 1.8863779e^(-14b) = ln(300/234.987) = 0.2442523
Solving, -b=1/14 * ln(0.2442523/1.8863779), so b=0.1460152

Gompertz equation:
y=300*e^(-1.8863779e^(-0.1460152t))

t (5 yr interval,   Pop. mil.   logistic   Gompertz
1955-2025)          WRD         q=300      q=300
 0                   45.486      45.486     45.486
 1                   51.419      54.4071    58.7728
 2                   58.312      64.6335    73.3423
 3                   66.671      76.1858    88.8109
 4                   76.582      89.022    104.782
 5                   88.219     103.025    120.879
 6                  101.147     117.999    136.767
 7                  115.593     133.673    152.17
 8                  132.219     149.716    166.87
 9                  150.589     165.765    180.711
10                  170.138     181.457    193.594
11                  188.196     196.461    205.464
12                  204.631     210.503    216.306
13                  220.119     223.382    226.135
14                  234.987     234.981    234.987

[Graph: WRD data with the logistic and Gompertz fits.]

The two fits are quite far (relatively speaking) from the actual data in 1990 (t=7). One might consider, therefore, using 1990 as the endpoint and extrapolating beyond that. The procedure is the same as above--just use 1990 instead of 2025 as the endpoint.

Logistic
The value for a is the same as above.
y=300/(1+5.595435e^(bt))

Find b; use information from t=7 in 1990:
In 1990, t=7, y=115.593.
Thus, 115.593=300/(1+5.595435e^(7b))
Solving, b=-0.17927

Logistic equation:
y=300/(1+5.595435e^(-0.17927t))

Gompertz
The value for c is the same as above.
y=300*e^(-1.8863779e^(-bt))

Find b; use information from t=7 in 1990:
In 1990, t=7, y=115.593.
Thus, 115.593=300*e^(-1.8863779e^(-7b))
Taking logarithms, 1.8863779e^(-7b) = ln(300/115.593) = 0.95371
Solving, -b=1/7 * ln(0.95371/1.8863779) = -0.09744, so b=0.09744

Gompertz equation:
y=300*e^(-1.8863779e^(-0.09744t))

t (5 yr interval,   Pop. mil.   logistic   Gompertz
1955-2025)          WRD         q=300      q=300
 0                   45.486      45.486     45.486
 1                   51.419      52.8438    54.1925
 2                   58.312      61.1059    63.5241
 3                   66.671      70.2925    73.3722
 4                   76.582      80.3954    83.6207
 5                   88.219      91.3728    94.1508
 6                  101.147     103.145    104.846
 7                  115.593     115.594    115.596
 8                  132.219     128.563    126.298
 9                  150.589     141.869    136.861
10                  170.138     155.304    147.205
11                  188.196     168.654    157.264
12                  204.631     181.711    166.983
13                  220.119     194.284    176.318
14                  234.987     206.209    185.236

Clearly the WRD forecast, if made using this sort of curve, required an upper bound higher than q=300.

SUMMARY OF CRITERIA FOR CURVE FITTING

Linear
  y=mx+b, m the slope, b the second coordinate of the y-intercept.
  Useful for linear increase.

Exponential
  y=e^(mx+b)
  Useful to suggest decline toward the horizontal asymptote (m<0).
  Useful to suggest unbounded increase (m>0)--"worst case" picture.

Logarithmic
  y=ln(mx+b)
  Dampened increase in growth.
  Unbounded.

Cubic spline
  Exact fit using pieces of cubic curve between given finite set of evenly-spaced data points.
  Bounded fit--not good for forecasting.
  Interpolating curve.

Logistic
  Assumption of exponential growth that tapers off toward some upper bound.
  Produces S-shaped curve (in its full extent) based on two endpoints to find values of constants.
  y=q/(1+ae^(bx)), b<0

Gompertz
  Like a logistic curve--produces a flatter S-shape than does a logistic for the same values.
  y=q*e^(-c*e^(-bx))

Other analytical tools

Feigenbaum's graphical analysis
  Useful for examining geometric dynamics--point of irreversibility of dynamic process may suggest point at which to intervene. The geometric dynamics are based on the idea of geometric feedback. Likely useful when real-world feedback can be aligned with geometric feedback.

Lattices
  Useful, potentially, in examining hierarchical structures in which domination is involved. Very few applications of this sort of material exist in the literature.

Fractals
  Useful when ideas of self-similarity and scale change are involved. Difficulty comes in identifying geometric self-similarity.

Graph theory
  Useful when the real-world situation can be partitioned into a set of nodes (point-locations) and edges (channels linking these locations). This alignment can be quite far-fetched, including linking of ideas--not just the more obvious with transport networks, for example. Allows for abstract manipulation based on linkage pattern. War of the Roses example.
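
Worked sketch of the fits in sections A and B. The fitting steps used earlier (a least squares line through ln(y-b) for a chosen lower bound b, and the two-endpoint solution for the logistic and Gompertz constants) can be written out in a few lines of code. What follows is only a minimal sketch in Python, which is an assumption of this sketch and not necessarily the tool used to build the tables; the function names are hypothetical and chosen for illustration. The Gompertz curve is taken in the form y=q*e^(-c*e^(-bt)), the form that reproduces the constants found in section B.

```python
import math


def fit_shifted_exponential(xs, ys, b):
    """Least squares line through the points (x, ln(y - b)).

    Returns (a, k) such that y = exp(a*x + k) + b, i.e. C = exp(k).
    Every y must lie above the chosen lower bound b.
    """
    zs = [math.log(y - b) for y in ys]
    n = len(xs)
    x_mean = sum(xs) / n
    z_mean = sum(zs) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
    a = sxz / sxx              # slope of the fitted line
    k = z_mean - a * x_mean    # intercept of the fitted line
    return a, k


def fit_logistic_two_point(q, y0, t_end, y_end):
    """Constants (a, b) of y = q/(1 + a*exp(b*t)) through (0, y0), (t_end, y_end)."""
    a = q / y0 - 1.0
    b = math.log((q / y_end - 1.0) / a) / t_end
    return a, b


def fit_gompertz_two_point(q, y0, t_end, y_end):
    """Constants (c, b) of y = q*exp(-c*exp(-b*t)) through (0, y0), (t_end, y_end)."""
    c = math.log(q / y0)
    b = -math.log(math.log(q / y_end) / c) / t_end
    return c, b


def logistic(t, q, a, b):
    return q / (1.0 + a * math.exp(b * t))


def gompertz(t, q, c, b):
    return q * math.exp(-c * math.exp(-b * t))


if __name__ == "__main__":
    # Section A: WRD projected values, 2005 to 2025 (from the table above).
    years = [2005, 2010, 2015, 2020, 2025]
    wrd = [5.97184, 5.53296, 5.07485, 4.84262, 4.69974]
    for bound in (0.0, 4.0):
        a, k = fit_shifted_exponential(years, wrd, bound)
        print(f"lower bound y={bound:g}: slope {a:.5f}, intercept {k:.4f}")

    # Section B: q=300, endpoints 1955 (t=0) and 2025 (t=14).
    q = 300.0
    la, lb = fit_logistic_two_point(q, 45.486, 14, 234.987)
    gc, gb = fit_gompertz_two_point(q, 45.486, 14, 234.987)
    print(f"logistic: a={la:.6f}, b={lb:.5f}")
    print(f"Gompertz: c={gc:.7f}, b={gb:.7f}")
    print(f"t=7: logistic {logistic(7, q, la, lb):.3f}, "
          f"Gompertz {gompertz(7, q, gc, gb):.3f}")
```

Replacing the second endpoint by (7, 115.593) gives the 1990 variant of section B. The results may differ from the tabled values in the last digit or two, since the tables were computed from rounded constants.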