
3.2.2 Ordinary Least Squares method (simple regression only)

ORDINARY LEAST SQUARES REGRESSION METHOD

 

Regression analysis is a technique that uses a statistical model to measure the amount of change in one variable (the dependent variable) that is associated with changes in one or more other variables (the independent variables).

This method is used to determine the equation of the line of best fit by minimizing the sum of the squares of the vertical distances between the observed data points and the line.

When it has been established that a causal relationship exists in the data and that a linear function is appropriate the statistical technique known as least squares is frequently used to establish values for the coefficients a and b (representing fixed and variable cost respectively) in the linear cost function.

 

y = a + bx

 

where y is total cost – the dependent variable

and x is the agreed measure of activity – the independent variable

 

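The values of a and b are determined by substituting the data in either of the following:

  1. The normal equations below.

$$\sum y = na + b\sum x \qquad \text{(i)}$$

$$\sum xy = a\sum x + b\sum x^2 \qquad \text{(ii)}$$

  2. The formulae below.

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y - b\sum x}{n} \quad \text{or} \quad a = \bar{y} - b\bar{x}$$

where n is the number of pairs of observations.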
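The coefficient formulae can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `ols_fit` and the use of plain lists are assumptions made for the example, not part of the original notes.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    # The four totals used by the coefficient formulae
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical, so the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator  # variable cost per unit of x
    a = (sum_y - b * sum_x) / n                     # fixed cost
    return a, b
```

Calling `ols_fit(xs, ys)` with the activity levels in `xs` and the matching total costs in `ys` returns the fixed-cost estimate a first and the variable-cost estimate b second.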


 

Characteristics of linear regression

  1. It is objectively determined.
  2. It makes use of all the data or observations
  3. It minimizes the sum of squares of the error terms
  4. If there is a linear relationship between the dependent and independent variable, this method gives the best predictions within the relevant range.
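Characteristic 3 can be checked numerically. The sketch below fits a line to a small made-up data set (the numbers are hypothetical and chosen only for illustration) and confirms that nudging either coefficient away from the fitted value does not reduce the sum of squared errors. The helper names `ols_fit` and `sse` are assumptions for this example.

```python
def ols_fit(xs, ys):
    """Least-squares coefficients (a, b) for y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the notes
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_fit(xs, ys)
best = sse(a, b, xs, ys)

# Try nearby lines: none of them should have a smaller error total
nearby = [
    sse(a + da, b + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
]
print(all(best <= other + 1e-12 for other in nearby))
```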

 

Illustration

The following table shows the number of units of a good produced and the total costs incurred.

Units produced    Total costs (sh.)
100               40,000
200               45,000
300               50,000
400               65,000
500               70,000
600               70,000
700               80,000

 

Calculate the regression line for y on x.

 

Solution

Notes on the calculation

The calculation can be reduced to a series of steps as follows:

Step 1:

Tabulate the data and determine which is the dependent variable, y, and which is the independent variable, x.

Step 2:

Calculate ∑x, ∑y, ∑xy and ∑x² (leave room for a column for ∑y², which may well be needed subsequently).

Step 3:

Substitute in the formulae in order to find b and a, in that order.

Step 4:

Substitute a and b in the regression equation.

 

The calculation is set out as follows, where x is the activity level in hundreds of units and y is the cost in units of sh.1,000.

 

x        y        xy        x²
1        40       40        1
2        45       90        4
3        50       150       9
4        65       260       16
5        70       350       25
6        70       420       36
7        80       560       49
∑x = 28  ∑y = 420 ∑xy = 1,870  ∑x² = 140

 

 

 

 

 

 

 

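n = 7

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(7 \times 1{,}870) - (28 \times 420)}{(7 \times 140) - 28^2} = \frac{13{,}090 - 11{,}760}{980 - 784} = \frac{1{,}330}{196} = 6.79$$

Try to avoid rounding at this stage since, although n∑xy and ∑x∑y are large, their difference is much smaller.

$$a = \bar{y} - b\bar{x} = \frac{420}{7} - 6.79 \times \frac{28}{7} = 60 - 27.16 = 32.84$$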

 

Therefore the regression line for y on x is:

y = 32.84 + 6.79x  (x in hundreds of units produced, y in sh.1,000).

 

(Always specify what x and y are very carefully)
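The figures in this illustration can be reproduced with a short script. It is a sketch only, and the variable names are chosen for the example. It also shows the effect of rounding: using b rounded to 6.79 gives a = 32.84 as in the text, while carrying b unrounded gives a value of about 32.86.

```python
# Data from the illustration: x in hundreds of units, y in sh.1,000
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [40, 45, 50, 65, 70, 70, 80]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Slope from the coefficient formula
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept, computed two ways to show the effect of rounding b first
a_unrounded = sum_y / n - b * (sum_x / n)
a_with_rounded_b = sum_y / n - round(b, 2) * (sum_x / n)

print("totals:", sum_x, sum_y, sum_xy, sum_x2)
print("b =", round(b, 4))
print("a (b unrounded) =", round(a_unrounded, 2))
print("a (b rounded to 2 d.p.) =", round(a_with_rounded_b, 2))
```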

 

This line would be used to estimate the total costs for a given level of output. If, say, 250 units were made, we can predict the expected total cost by using the regression line with x = 2.5.

 

y = 32.84 + (6.79 × 2.5) = 32.84 + 16.975 = 49.815

i.e. we predict total costs of sh.49,815 for production of 250 units.

 

Using the regression line for forecasting

In the previous example, having found the equation of the line of best fit, we used this to forecast the total cost for a given level of activity.

The validity of such forecasts will be dependent upon two main factors.

  • Whether there is sufficient correlation between the variables to support a linear relationship within the range of the data used.
  • Whether the forecast represents an interpolation or an extrapolation
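One common way to judge the first factor is the correlation coefficient. The notes do not define it at this point, so the use of Pearson's r below is an assumption, offered as a sketch; it uses the ∑y² column mentioned in Step 2 and the data from the previous illustration. Values of r close to +1 or -1 suggest a strong linear relationship within the observed range.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired observations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Previous illustration: x in hundreds of units, y in sh.1,000
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [40, 45, 50, 65, 70, 70, 80]

r = pearson_r(xs, ys)
print(round(r, 3))
```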

Illustration

The following data have been collected on costs and output:

 

Output (000s)    Costs (sh.000s)
1                14
2                17
3                15
4                23
5                18
6                22
7                31

Required:

Calculate the coefficients in the linear cost function

y = a + bx

using (i) the normal equations and (ii) the coefficient formulae.

 

Solution

Output (x)    Costs (y)    xy          x²
1             14           14          1
2             17           34          4
3             15           45          9
4             23           92          16
5             18           90          25
6             22           132         36
7             31           217         49
Σx = 28       Σy = 140     Σxy = 624   Σx² = 140

 

Where n = 7 (i.e. number of pairs of readings)

 

  (i) Using the normal equations

140      =          7a        +          28b      ……….. I

624      =          28a      +            140b ……….. II

 

Eliminating a by multiplying equation I by 4 and subtracting the result from equation II:

624      =          28a      +           140b  ……….. II

560      =          28a      +            112b ……….. I × 4

64        =                                  28b

 

∴ b = 2.286 and, substituting this value in one of the equations, the value of a is found to be 10.86.

 

∴ Regression line is y = 10.86 + 2.286x
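The same pair of normal equations can be solved mechanically. The sketch below uses Cramer's rule for a 2 × 2 system; the function name `solve_normal_equations` is an assumption for this example, and any other elimination method would give the same answer.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  sum_y  = n*a     + sum_x*b
              sum_xy = sum_x*a + sum_x2*b   for (a, b) by Cramer's rule."""
    determinant = n * sum_x2 - sum_x * sum_x
    if determinant == 0:
        raise ValueError("the equations have no unique solution")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / determinant
    b = (n * sum_xy - sum_x * sum_y) / determinant
    return a, b


# Totals from the solution table: n = 7, Σx = 28, Σy = 140, Σxy = 624, Σx² = 140
a, b = solve_normal_equations(7, 28, 140, 624, 140)
print(round(a, 2), round(b, 3))
```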

 

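  (ii) Using the coefficient formulae

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(7 \times 624) - (28 \times 140)}{(7 \times 140) - 28^2} = \frac{448}{196} = 2.286$$

$$a = \frac{\sum y - b\sum x}{n} = \frac{140 - (2.286 \times 28)}{7} = 10.86$$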

 

When the coefficients have been calculated the cost function can be used for forecasting simply by inserting the appropriate level of activity i.e. a value for x, and calculating the resulting total cost.

 

For example, what are the predicted costs at output levels of:

 

  1. 4,500 units (i.e. 4.5 in ‘000s), and
  2. 8,000 units (i.e. 8 in ‘000s)

 

At 4,500 units: y = 10.86 + 2.286 (4.5) = 21.147, i.e. sh.21,147

 

Note: A prediction within the range of the original observations (1 to 7 in this illustration) is known as an interpolation.

 

At 8,000 units: y = 10.86 + 2.286 (8) = 29.148, i.e. sh.29,148

 

Note: A prediction outside the range of original observations is known as an extrapolation.
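The two forecasts, and the distinction between interpolation and extrapolation, can be sketched as follows. The function name `forecast` and its parameters are assumptions made for the example; the coefficients are the rounded values from the illustration.

```python
def forecast(a, b, x, x_min, x_max):
    """Return the predicted cost and whether x lies inside the observed range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a + b * x, kind


A, B = 10.86, 2.286        # coefficients from the illustration (sh.000s)
OBSERVED_RANGE = (1, 7)    # output levels in the original data, in 000s

for output in (4.5, 8):
    cost, kind = forecast(A, B, output, *OBSERVED_RANGE)
    print(f"output {output} (000s): cost {cost:.3f} (sh.000s), {kind}")
```

An extrapolated figure deserves more caution, because the data give no evidence that the same linear relationship holds outside the observed range.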

 

 

 

 

 

 

 

 

 

 

 

REVISION QUESTIONS

QUESTION ONE

The management of Limuru Processing Company Limited wishes to obtain better cost estimates to evaluate the company’s operations more effectively.

 

The following information is provided to you for analysis:

 

Year 2004     Equivalent production    Overheads
Month         Units ('000)             Sh.'000
January       1,425                    12,185
February      950                      9,875
March         1,130                    10,450
April         1,690                    15,280
May           1,006                    9,915
June          834                      9,150
July          982                      10,133
August        1,259                    11,981
September     1,385                    12,045
October       1,420                    13,180
November      1,125                    13,180
December      980                      10,430

 

Additional information:

  1. In November, the opening work in progress inventory contained 1,000,000 units that were 30% complete with respect to conversion costs.
  2. During the same month of November, the manufacturing department transferred 1,500,000 units.
  3. The closing inventory for the month of November was 1,200,000 units and the units were 30% incomplete with respect to conversion costs.
  4. Using the above information, you have obtained the following variables by applying simple regression analysis.

Sh. ‘000’

Constant                3,709

Slope                     6,487

 

Required:

  (a) (i) Use the high-low method to estimate the overhead cost function.
      (ii) Use the regression method to determine the overhead cost function.
  (b) Compute the equivalent units of production with respect to conversion costs for the month of November using the FIFO method.
  (c) Use the regression function formulated in (a)(ii) above to estimate the overhead cost for the month of November.

 

Solution:

  (a) (i) High-low method

 

Highest cost (overheads): 15,280 at activity level 1,690

Lowest cost (overheads): 9,150 at activity level 834

 

$$b = \frac{15{,}280 - 9{,}150}{1{,}690 - 834} = \frac{6{,}130}{856} = 7.16$$

 

y = a + bx, where b = 7.16 and, at the highest point, y = 15,280 and x = 1,690

Therefore 15,280 = a + (7.16 × 1,690)

a = 15,280 – (7.16 × 1,690)

a = 3,180 (to the nearest whole number)

Therefore y = 3,180,000 + 7,160x, where y is the overhead cost in shillings and x is equivalent production in thousands of units.
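A minimal sketch of the high-low calculation on the monthly data is given below. It selects the high and low points by activity level, which is an assumption about the intended convention (in this data set the same two months also have the highest and lowest overheads). Because the script does not round b to 7.16 before computing a, its intercept is slightly below the 3,180 shown above.

```python
# (month, equivalent production in '000 units, overheads in Sh.'000)
data = [
    ("January", 1425, 12185), ("February", 950, 9875), ("March", 1130, 10450),
    ("April", 1690, 15280), ("May", 1006, 9915), ("June", 834, 9150),
    ("July", 982, 10133), ("August", 1259, 11981), ("September", 1385, 12045),
    ("October", 1420, 13180), ("November", 1125, 13180), ("December", 980, 10430),
]

# Pick the months with the highest and lowest activity
high = max(data, key=lambda row: row[1])
low = min(data, key=lambda row: row[1])

# Variable cost per '000 units, then fixed cost by substitution at the high point
b = (high[2] - low[2]) / (high[1] - low[1])
a = high[2] - b * high[1]

print(high[0], low[0])
print("b =", round(b, 2), "a =", round(a, 1))
```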

 

  (a) (ii) Regression method

y = a + bx, where a = 3,709,000 and b = 6,487

Therefore y = 3,709,000 + 6,487x

 

  (b) Equivalent units of production

Looking at the output side, using the FIFO method:

                                          Units        Completion %    Conversion (equivalent units)
Opening stock (WIP)                       1,000,000    70              700,000
Completely processed during production    500,000      100             500,000
Closing stock (WIP)                       1,200,000    70              840,000
Equivalent units with respect to conversion costs                      2,040,000

 

  (c) Estimate of the overhead cost for the month of November

y = 3,709,000 + 6,487x, where x = 1,125

 

Therefore y = 3,709,000 + (6,487 × 1,125)

= Sh.11,006,875
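As a quick check of the arithmetic, the sketch below evaluates both cost functions from this question at the November activity level. The function name `estimate` is an assumption for the example; the coefficients are those stated in the solution, with y in shillings and x in thousands of units.

```python
def estimate(fixed_cost, variable_rate, activity):
    """Total cost predicted by y = a + b*x."""
    return fixed_cost + variable_rate * activity


november_activity = 1125  # equivalent production in '000 units

regression_estimate = estimate(3_709_000, 6_487, november_activity)
high_low_estimate = estimate(3_180_000, 7_160, november_activity)

print(f"regression: Sh.{regression_estimate:,}")
print(f"high-low:   Sh.{high_low_estimate:,}")
```

Both estimates can then be compared with the November overheads recorded in the data table.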

 

 

QUESTION TWO

(a) Explain the advantages and disadvantages of the high-low method of cost estimation.

(b) Central Machinery Ltd. is preparing its budget for the year ending 30 June 2004. For the fuel oil expense, it has decided to estimate an equation of the form y = a + bx, where y is the total expense at an activity level x, a is the fixed expense and b is the rate of variable cost.

 

The following information relates to the year ended 30 June 2003:

 

Month (2003)    Machine hours ('000)    Fuel oil expense (Sh.'000)
July            34                      640
August          30                      620
September       34                      620
October         39                      590
November        42                      500
December        32                      530

Month (2004)    Machine hours ('000)    Fuel oil expense (Sh.'000)
January         26                      500
February        26                      500
March           31                      530
April           35                      550
May             43                      580
June            48                      680

 

The annual total and monthly average figures for the year ended 30 June 2003 were as follows:

                   Machine hours ('000)    Fuel oil expense (Sh.'000)
Annual total       420                     6,840
Monthly average    35                      570

 

 

Required:

 

  (i) Using the high-low method, estimate and interpret the fixed and variable cost elements of the fuel oil expense.
  (ii) Using the results in (i) above, predict the fuel oil expense for November 2004 if experience indicates that 41,000 machine hours will be used.
  (iii) Briefly explain any two limitations of the high-low method of cost estimation that may be overcome by using simple linear regression analysis.

 

 

 

 

Solution:

  (a) Advantages of the high-low method
    • The method is easy to use
    • Not much data is needed
    • Visually, it gives the general direction of the trend

  Disadvantages of the high-low method
    • Choice of the high and low points is subjective
    • The method does not use all available data
    • It cannot be used for more than one independent variable
    • It is not possible to defend the results statistically
    • If the two points are outliers, the predictive equation will be wrong
    • The method may not be reliable

 

  (b) (i) High-low method

                              Machine hours ('000)    Fuel oil expense (Sh.'000)
High point (June 2004)        48                      680
Low point (January 2004)      26                      500
Difference                    22                      180

Variable cost per machine hour = 180,000 / 22,000 = Sh.8.182 per hour

Substituting for January 2004:

                                                    Sh.
Variable costs (26,000 hours × Sh.8.182, rounded)   212,730
Fixed cost (difference)                             287,270
Total cost                                          500,000

        Interpretation:

Within the relevant range, Sh.287,270 will be incurred irrespective of machine hour usage, i.e. Sh.287,270 is fixed.

 

The total fuel oil expense will then increase at the rate of Sh.8.182 for each machine hour used.

  (ii) Fuel oil expense in November 2004

=            287,264 + (8.182 × 41,000)

=           Sh.622,726

  (iii) Limitations of the high-low method
    • It relies on only two data points, the highest and the lowest, which may be outliers and therefore not representative of the entire data set.
    • It does not use robust statistical techniques to measure the predictive quality of the resulting function.
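To illustrate the first limitation, the sketch below fits both a high-low line and a least-squares line to the twelve monthly observations in this question. The regression result is not given in the original solution; it is simply what the coefficient formulae produce on the table data, and the function names are assumptions for the example. Both variables are in thousands, so a is in Sh.'000 and b is in shillings per machine hour.

```python
# July to June: machine hours ('000) and fuel oil expense (Sh.'000)
hours = [34, 30, 34, 39, 42, 32, 26, 26, 31, 35, 43, 48]
fuel = [640, 620, 620, 590, 500, 530, 500, 500, 530, 550, 580, 680]


def high_low_fit(xs, ys):
    """Line through the observations with the highest and lowest activity."""
    hi = max(range(len(xs)), key=lambda i: xs[i])
    lo = min(range(len(xs)), key=lambda i: xs[i])
    b = (ys[hi] - ys[lo]) / (xs[hi] - xs[lo])
    a = ys[hi] - b * xs[hi]
    return a, b


def ols_fit(xs, ys):
    """Least-squares line using all observations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a_hl, b_hl = high_low_fit(hours, fuel)
a_ols, b_ols = ols_fit(hours, fuel)

print("high-low:      a =", round(a_hl, 2), " b =", round(b_hl, 3))
print("least squares: a =", round(a_ols, 2), " b =", round(b_ols, 3))
```

On this data the two methods give noticeably different fixed and variable elements, because the high-low line depends entirely on the June and January observations while the least-squares line uses all twelve months.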