CS130/230 Lecture 4

Regression Analysis

 

Tuesday, January 13, 2004

 

Save or print a pdf version of this document

NPER Financial Function

Another useful financial function is NPER, which calculates the number of periods for an investment based on periodic, constant payment and a constant interest rate.

Problem 1

Based on your budget and salary, you have worked out that you will have $1,000,000 (yes one million dollars) when you retire. You estimate that you will draw out $50,000 a year to live on and the remainder will go into a bank account with a 4% interest rate compounded yearly (yearly interest rate).

 

Part I: Use the NPER function to work out how many years it will be before you run out of money.

 

Part II: Create a payment or amortization schedule for this period and verify that the result of the NPER function is correct. Your table should contain columns for beginning balance, amount withdrawn, interest added and final balance for every payment period.

 

Question: Is the payment period in months or years?

 

Part III: Add a chart to your worksheet showing the decline in your balance till it reaches zero.

Regression Analysis

Regression analysis is a form of statistical analysis used for forecasting. Regression analysis estimates the relationship between variables, so that a particular variable can be predicted from one or more other variables. During regression analysis, we need to fit functions to data.

 

Trendlines are used to graphically display trends in data and to analyze problems of prediction. In other words we try to draw a line that best fits the data. By using regression analysis, you can extend a trendline in a chart beyond the actual data to predict future values.

 

This subject usually falls under statistics and mathematical modeling and can be applied to many different scientific and business applications. Understanding the various formulas for regression is beyond the scope of this class. However, you should understand that the line should be placed such that the distance or variation from each data point to the line is minimized.

Linear Regression

In linear regression we try to find a straight line that best fits our data. We first need to plot our data using ExcelÕs XY or scatter chart. We then add the trendline to the chart and use the function to predict future values for our data.

 

The detailed steps are:

á      Enter the data in an Excel worksheet and select the data you want to plot.

á      Click on the chart wizard.

á      Choose XY (scatter) plot.

á      Check that the data range is correct.

á      Enter the titles and labels.

á      Click on the chart then select CHART from menu bar and ADD TRENDLINE from this menu.

á      From the menu that appears, select the type of function that you would like to use for your model. In this example we will use the default, which is LINEAR REGRESSION.

á      In order to have Excel display the equation of our regression line and the correlation coefficient, you need to click on the OPTIONS tab within the Add Trendline screen above, click on these two options, and then press OK.

 

You should be rewarded with a graph, equation and regression coefficient.

Problem 2

In the CS130 Pub folder is a file called Candy Bars.xls. Copy this file to your folder, open it and do the following.

 

Part I: Create a ScatterPlot of the data Carbohydrates and Sugars.

 

Part II: Add a trendline to your chart and display the function or equation.

 

Part III: What is the amount of sugars (in grams) that we can expect from a candy bar with 60 grams of carbohydrates?

 

Part IV: Add an empty column after name. In that column, place an asterisk for foods that have a carbohydrate count of 40grams or higher and a sugar count of 35 grams or higher.

 

Part V: Turn on the AutoFilter and find out the number of M&M/Mars candy that fits these criteria.

Regression Coefficient

The regression coefficient, also known as the R-squared value, is an indicator that ranges in value from 0 to 1 and reveals how closely the estimated values for the trendline correspond to your actual data. A trendline is most reliable when its R-squared value is at or near 1.

Nonlinear Regression

Often, relationships are nonlinear and we need a different type of graph to fit the data.  Excel provides us with different types of nonlinear functions that we can use to fit data.  These functions include polynomial, exponential, logarithmic and power.

Problem 3

Let us consider the following data which represents the number of deaths, N, from AIDS in the United States from 1981 to 1996, where t denotes the number of years after 1980.

 

t

N

1

159

2

622

3

2130

4

5635

5

12607

6

24717

7

41129

8

62248

9

90039

10

121577

11

158193

12

199287

13

243923

14

292586

15

340957

16

375904

 

After typing the data into an Excel spreadsheet, try to fit different types of nonlinear functions to the data.  Which works the best?  How do we know?

 

Can we predict what the number of deaths from AIDS would be in 1997?

Solving Exponential and Logarithmic Equations

Recall that to solve an equation of the form y = ae bx for x (where a and b are just constants), you first divide by a to obtain y/a = e bx .  Now, you must take the natural logarithm of each side to obtain ln(y/a)=bx.  Dividing by b yields x = (1/b)ln(y/a).

 

Recall that to solve an equation of the form y = a ln(bx) for x (where a and b are just constants), you again divide by a to obtain  y/a = ln(bx).  Now, you must exponentiate each side to obtain e y/a = bx.  Dividing by b yields x = (1/b)e y/a .

Problem 4

The following data is from an actual study that considered how memory decreases with time. The subjects each read a list of 20 words slowly aloud, and later, at different time intervals, were shown a list of 40 words containing the 20 words that he or she had read.  The percentage, P, of words recognized was recorded as a function of the time t elapsed in minutes. The table below shows the averages for 5 different subjects.  Your task is to, using the regression capabilities of EXCEL, determine the best-fit model for this data AND to use this model to predict the time at which the subjects will recognize 15% of the words.

 

T,min

5

15

30

60

120

240

480

720

2880

5760

P%

73.0

61.7

58.3

55.7

50.3

46.7

38.3

29.0

24.0

18.7

 

Midterm

Tomorrow we will have our first midterm, which will include all the material that we have covered in class up to this point. The first part of the class, I will go over the next assignment and answer any questions. I will also cover some new material. We will then take a 20-minute break and the last 90 minutes will be for the exam. Part of the exam is written and part will be hands on. The first part of the exam is the written part. Once you have handed in the written portion, I will give you the hands-on portion that will be solved using Excel and placed in the CS130 Drop Box. Remember, I will not answer any computer questions, so make sure you can do everything we have discussed in class. I will only clarify questions on the exam if you do not understand something.

 

The written portion of the exam can consist of multiple choice, true/false, and short answer questions. The hands on portion will be two or three problems of the type we have done in class.

 


 

Save or print a pdf version of this document