CS130/230 Lecture 4
Regression Analysis
Tuesday, January 13, 2004
Save or print a pdf version of this document
Another
useful financial function is NPER, which calculates the number of periods for
an investment based on periodic, constant payment and a constant interest rate.
Based on
your budget and salary, you have worked out that you will have $1,000,000 (yes
one million dollars) when you retire. You estimate that you will draw out
$50,000 a year to live on and the remainder will go into a bank account with a
4% interest rate compounded yearly (yearly interest rate).
Part I: Use
the NPER function to work out how many years it will be before you run out of
money.
Part II:
Create a payment or amortization schedule for this period and verify that the
result of the NPER function is correct. Your table should contain columns for
beginning balance, amount withdrawn, interest added and final balance for every
payment period.
Question:
Is the payment period in months or years?
Part III:
Add a chart to your worksheet showing the decline in your balance till it
reaches zero.
Regression
analysis is a form of statistical analysis used for forecasting. Regression
analysis estimates the relationship between variables, so that a particular
variable can be predicted from one or more other variables. During regression
analysis, we need to fit functions to data.
Trendlines
are used to graphically display trends in data and to analyze problems of
prediction. In other words we try to draw a line that best fits the data. By
using regression analysis, you can extend a trendline in a chart beyond the
actual data to predict future values.
This
subject usually falls under statistics and mathematical modeling and can be
applied to many different scientific and business applications. Understanding
the various formulas for regression is beyond the scope of this class. However,
you should understand that the line should be placed such that the distance or
variation from each data point to the line is minimized.
In linear regression we try to find a straight line that best fits our data. We first need to plot our data using ExcelÕs XY or scatter chart. We then add the trendline to the chart and use the function to predict future values for our data.
The detailed
steps are:
á
Enter
the data in an Excel worksheet and select the data you want to plot.
á
Click
on the chart wizard.
á
Choose
XY (scatter) plot.
á
Check
that the data range is correct.
á
Enter
the titles and labels.
á
Click
on the chart then select CHART from menu bar and ADD TRENDLINE from this menu.
á
From
the menu that appears, select the type of function that you would like to use
for your model. In this example we will use the default, which is LINEAR
REGRESSION.
á
In
order to have Excel display the equation of our regression line and the
correlation coefficient, you need to click on the OPTIONS tab within the Add
Trendline screen above, click on these two options, and then press OK.
You should
be rewarded with a graph, equation and regression coefficient.
In the
CS130 Pub folder is a file called Candy Bars.xls. Copy this file to your
folder, open it and do the following.
Part I: Create a ScatterPlot of the data
Carbohydrates and Sugars.
Part II:
Add a trendline to
your chart and display the function or equation.
Part
III: What is the
amount of sugars (in grams) that we can expect from a candy bar with 60 grams
of carbohydrates?
Part IV:
Add an empty column
after name. In that column, place an asterisk for foods that have a
carbohydrate count of 40grams or higher and a sugar count of 35 grams or
higher.
Part V: Turn on the AutoFilter and find out
the number of M&M/Mars candy that fits these criteria.
The
regression coefficient, also known as the R-squared value, is an indicator that
ranges in value from 0 to 1 and reveals how closely the estimated values for
the trendline correspond to your actual data. A trendline is most reliable when
its R-squared value is at or near 1.
Often,
relationships are nonlinear and we need a different type of graph to fit the
data. Excel provides us with
different types of nonlinear functions that we can use to fit data. These functions include polynomial,
exponential, logarithmic and power.
Let us consider
the following data which represents the number of deaths, N, from AIDS in the
United States from 1981 to 1996, where t denotes the number of years after
1980.
t |
N
|
1 |
159 |
2 |
622 |
3 |
2130 |
4 |
5635 |
5 |
12607 |
6 |
24717 |
7 |
41129 |
8 |
62248 |
9 |
90039 |
10 |
121577 |
11 |
158193 |
12 |
199287 |
13 |
243923 |
14 |
292586 |
15 |
340957 |
16 |
375904 |
After
typing the data into an Excel spreadsheet, try to fit different types of
nonlinear functions to the data.
Which works the best? How
do we know?
Can we
predict what the number of deaths from AIDS would be in 1997?
Recall that
to solve an equation of the form y = ae bx for x (where a and b are just constants), you
first divide by a to obtain y/a = e bx . Now, you must take the
natural logarithm of each side to obtain ln(y/a)=bx. Dividing by b yields x = (1/b)ln(y/a).
Recall that
to solve an equation of the form y = a ln(bx) for x (where a and b are just
constants), you again divide by a to obtain y/a = ln(bx).
Now, you must exponentiate each side to obtain e y/a = bx. Dividing by b yields x = (1/b)e y/a .
The
following data is from an actual study that considered how memory decreases
with time. The subjects each read a list of 20 words slowly aloud, and later,
at different time intervals, were shown a list of 40 words containing the 20
words that he or she had read. The
percentage, P, of words recognized was recorded as a function of the time t
elapsed in minutes. The table below shows the averages for 5 different subjects. Your task is to, using the regression
capabilities of EXCEL, determine the best-fit model for this data AND to use
this model to predict the time at which the subjects will recognize 15% of the
words.
T,min |
5 |
15 |
30 |
60 |
120 |
240 |
480 |
720 |
2880 |
5760 |
P% |
73.0 |
61.7 |
58.3 |
55.7 |
50.3 |
46.7 |
38.3 |
29.0 |
24.0 |
18.7 |
Tomorrow we
will have our first midterm, which will include all the material that we have
covered in class up to this point. The first part of the class, I will go over
the next assignment and answer any questions. I will also cover some new
material. We will then take a 20-minute break and the last 90 minutes will be
for the exam. Part of the exam is written and part will be hands on. The first
part of the exam is the written part. Once you have handed in the written
portion, I will give you the hands-on portion that will be solved using Excel
and placed in the CS130 Drop Box. Remember, I will not answer any computer
questions, so make sure you can do everything we have discussed in class. I
will only clarify questions on the exam if you do not understand something.
The written
portion of the exam can consist of multiple choice, true/false, and short
answer questions. The hands on portion will be two or three problems of the
type we have done in class.
Save or print a pdf version of this document