Curve Fitting Handout #1
S. Arlinghaus

Polynomials--general form

y=a_nx^n + b_(n-1)x^(n-1) +...+ a_3x^3 + a_2x^2 + a_1x^1 + a_0x^0

The simplest form of a polynomial is a straight line, generated either as
 
y=a_0x^0=a constant
or
y=a_1x^1 + a_0x^0=mx+b where m is the slope of the line and (0,b) is the
intercept of the line with the y-axis.

1.  Simple fitting of unbounded curves using Excel 5.0

Given a finite set of data points in two dimensions.  Find a straight line
that "best" fits the scatter of points.                                         
One might minimize the absolute value of the displacements of the points
from the line;                                          
one might minimize the squares of the absolute values of displacements;                                         
or, one might minimize a variety of other measures.                                             

The "regression" analysis featured in most spreadsheets is a "least
squares" analysis that                                          
minimizes the squares of the absolute values of displacements of the
actual data points from the line to be fit.                                             
Consider the following simple example:                                          

Data points--filled in red:                                             
        (1,1)                                   
        (2,3)                                   
        (3,3)                                   
                                                
displacements from line                                         
(points filled in blue)                                         (3,3m+b)
                                        (2,3)           
Displacements of the red points
(3,3)
from the blue points along the line                                                     
y=mx+b.                                          (2,2m+b)       

Displacement is a vertical measure.                             (1,m+b)                 
  --differences in y-coordinates.                                                       
d_1:  |m+b-1|                                                   
d_2:  |2m+b-3|                          (1,1)                   
d_3:  |3m+b-3|                                                  


MINIMIZE:                                                       
the sum of the squares of the displacements so that the absolute value is
no problem to deal with.                                                        

The line that "best" fits the data will be the one that is least far from
the data points--the one for which the                                                  
 sums of the squares of the displacements is a minimum.

MINIMIZE:

f(m,b) = (m+b-1)^2 + (2m+b-3)^2 + (3m+b-3)^2

SIMPLIFY:

f(m,b) = 14m^2 + 12mb - 32m
             +3b^2  -  14b   + 19

To find the minimum, find suitable partial derivatives, set them equal to
zero, and solve to find critical points.

partial of f with respect to m = 28m + 12b - 32
partial of f with respect to b  = 12m +   6b - 14

Set the two partial derivatives equal to zero and solve the set of two
equations in two variables:                     

Solve:                  

     14m + 6b =16                       
     12m + 6b =14                       

Therefore, m=1; b = 1/3                 a minimum (in theory could be a
maximum or saddle point).

So, y=mx+b becomes, for these data points:                      

y=x+1/3, the line of best fit obtained by minimizing squares of
displacements.                  

This very simple example illustrates WHY the strategy works; it also
illustrates that the role of                    
outliers is perhaps overly emphasized--a data point that is 5 units from
the line of best fit contributes a value                        
of 25.  Thus, in some cases, one might want to consider lines minimizing
strict linear displacement, even                        
though it is more trouble to deal with and is not customarily treated in
"black-box" mode, as is least squares   
analysis.       

BLACK-BOX treatment of least squares    

Most spreadsheets offer a very easy strategy for dealing with least
squares "regression" analysis.  
Of course, the manner of handling the analysis often changes, from one
update to the next, so having some      
understanding of what is happening is important.        

Use Excel 5.0 to find the line of best fit to the data points above.    
Data points:    
x       y
1       1
2       3
3       3
Enter these by hand, or import them and set them up in a manner such as
that above.     

Then, use the commands SLOPE and INTERCEPT to find the values to put in
y=mx+b                                                                          

Type =SLOPE({1,3,3},{1,2,3}) to return the value of the slope of the line
of least squares--                                                                              
     note that the numbers in the first set are the y's and in the second
set are the x's.                                                                                
Type =INTERCEPT({1,3,3},{1,2,3}) to return the value of the y-intercept of
the line of least squares--                                                                             
     note that the numbers in the first set are the y's and in the second
set are the x's.                                                                                
                                                                                
SLOPE CALCULATION:                                                                              
                                                                                

INTERCEPT CALCULATION:                                                                          


Then, it is an easy matter to read off the equation of the line of least
squares as y=1*x+0.3333333                                                                              
The equation can be used to forecast values.                                                                            

x       y       y=mx+b  
1       1       1.333333        Enter the extended set of
x-values--4,5,6,7,8,9
2       3       2.333333        
3       3       3.333333        Then, in the y-column, next to the 4, type
=a118+0.333333 (that is, mx+b).
4       4.33333 4.333333        Then, copy the cell b118 and paste it into
the highlighted range from b119 to b123.
5       5.33333 5.333333        
6       6.33333 6.333333        You can use chartWizard to graph the
results.  In many installations of 
7       7.33333 7.333333        Excel, you can do all this at once using
tools, data analysis.
8       8.33333 8.333333        
9       9.33333 9.333333        














2.  Simple non-linear curve fitting.

Exponential functions in general:

y=a^x, a is a fixed number

Suppose a>1  
         An increase in x causes an eventually sharper increase in y.                   
        Consider y=2^x for example                      

x       y                       
-1      0.5                     
0       1                       
1       2                       
2       4                       (0,1)



Suppose 00 and that a is not 1. 
Then, one of the two familiar exponential curves emerges.       

3.  Least squares and exponential curves.       

Basic definition:       
        Log_a y = x if and only if a^x = y.

        That is, exponential and logarithmic function are inverses--one
can be used to undo the other.

Consider the same data set as before:  (1,1), (2,3), (3,3).     

Suppose that we can fit a line of the form ln y = mx + b to the data.   

Then, from this line, we get (using the basic definition)       

y=e^(mx+b)      

as an exponential fit to the data.              

To do this, take the ln of the y values:                

(1, ln 1), (2, ln 3), (3, ln 3).                

Now, fit a line to these points, using least squares.           

x       ln y    y
1       0       1
2       1.098612        3
3       1.098612        3

Slope:  0.549306        
Intercept:      -0.366204       

Thus, ln y = 0.549306x-0.3662                   

So, y = e^(0.549306x-0.3662)                    

So, the exponential fit to the given values of x=1,2,3 is given below, as
is the projected fit for x=4,...,9                      
                actual y        y=x+0.333
x       y=e^(0.549306*x-0.3662)         
1       1.200942        1       1.3333
2       2.080092        3       2.3333
3       3.602824        3       3.3333
4       6.240273                4.3333
5       10.80847                5.3333
6       18.72081                6.3333
7       32.4254         7.3333
8       56.16243                8.3333
9       97.27617                9.3333







4.  Least squares and logarithmic curves.       

Basic definition:       
        Log_a y = x if and only if a^x = y.

        That is, exponential and logarithmic function are inverses--one
can be used to undo the other.

Consider the same data set as before:  (1,1), (2,3), (3,3).     

Suppose that we can fit a line of the form e^ y = mx + b to the data.           

Then, from this line, we get (using the basic definition)               

y=ln(mx+b)              

as a logarithmic curve fit to the data.         

To do this, take the exp of the y values:               

(1, e^ 1), (2, e^ 3), (3, e^ 3).                

Now, fit a line to these points, using least squares.           

x       e ^y    y
1       2.718282        1
2       20.08554        3               
3       20.08554        3               

Slope:  8.683629                        
Intercept:      -3.070804                       

Thus, e^ y = 8.683629x-3.0708                           

So, y = ln(8.683629x-3.0708)                            

So, the logarithmic curve fit to the given values of x=1,2,3 is given
below, as is the projected fit for x=4,...,9                            
                actual y        y=x+0.333       
x       y=e^(0.549306*x-0.3662)                 y=ln(8.683629*x-3.0708)
        exponential     linear (actual) linear (proj.)  logarithmic
1       1.200942        1       1.3333  1.725055
2       2.080092        3       2.3333  2.660012
3       3.602824        3       3.3333  3.134628
4       6.240273                4.3333  3.455171
5       10.80847                5.3333  3.697526
6       18.72081                6.3333  3.892452
7       32.4254         7.3333  4.05551
8       56.16243                8.3333  4.19567
9       97.27617                9.3333  4.318579






All of linear, exponential, and logarithmic curves offer a reasonable fit
to the                          
actual data; however, the various futures forecast by each are vastly
different!!                             

Note diagrams of this sort in Meadows--this is the theory behind them.          



                


If the tools, data analysis feature is installed, least squares is easy to
do;             
additional information is produced, and graphs can be drawn directly, too.              

x       e ^y    y
1       2.718282        1
2       20.08554        3
3       20.08554        3