3.2.2 Ordinary Least Squares method (simple regression only)
ORDINARY LEAST SQUARES REGRESSION METHOD
Regression analysis is a technique that uses a statistical model to measure the amount of change in one variable (the dependent variable) that is associated with changes in one or more other variables (the independent variables).
The ordinary least squares method determines the equation of the line of best fit by minimizing the sum of the squares of the vertical deviations of the observed points from the line.
When it has been established that a causal relationship exists in the data and that a linear function is appropriate, the statistical technique known as least squares is frequently used to establish values for the coefficients a and b (representing fixed cost and variable cost per unit of activity respectively) in the linear cost function
y = a + bx
where y is total cost – the dependent variable
and x is the agreed measure of activity – the independent variable
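The values of a and b are determined by substituting the data, for n pairs of observations, into either of the following:
- The normal equations:
Σy = na + bΣx ……………………………………..(i)
Σxy = aΣx + bΣx² ……………………………..(ii)
- The coefficient formulae:
b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
a = (Σy − bΣx) / n, or equivalently a = ȳ − bx̄, where ȳ and x̄ are the means of y and x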
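The coefficient formulae can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name `least_squares` and the four-point data set are hypothetical and chosen only so that the answer is easy to check by hand.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y = a + b*x fitted by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    # b first, then a, exactly as in the coefficient formulae
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on the line y = 5 + 2x
a, b = least_squares([1, 2, 3, 4], [7, 9, 11, 13])
print(a, b)  # 5.0 2.0
```

Because the sample points lie exactly on a straight line, the fitted coefficients reproduce that line; with real cost data the fitted line passes through the scatter rather than through every point.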
Characteristics of linear regression
- It is objectively determined.
- It makes use of all the data or observations
- It minimizes the sum of squares of the error terms
- If there is a linear relationship between the dependent and independent variable, this method gives the best predictions within the relevant range.
Illustration
The following table shows the number of units of a good produced and the total costs incurred.
| Units produced | Total costs (sh.) |
| 100 | 40,000 |
| 200 | 45,000 |
| 300 | 50,000 |
| 400 | 65,000 |
| 500 | 70,000 |
| 600 | 70,000 |
| 700 | 80,000 |
Calculate the regression line for y on x.
Solution
Notes on the calculation
The calculation can be reduced to a series of steps as follows:
Step 1:
Tabulate the data and determine which is the dependent variable, y, and which the independent, x.
Step 2:
Calculate Σx, Σy, Σxy and Σx² (leave room for a column for Σy², which may well be needed subsequently).
Step 3:
Substitute in the formulae in order to find b and a, in that order.
Step 4:
Substitute a and b in the regression equation.
The calculation is set out as follows, where x is the activity level in units of hundreds and y is the cost in units of sh.1,000.
| x | y | xy | x² |
| 1 | 40 | 40 | 1 |
| 2 | 45 | 90 | 4 |
| 3 | 50 | 150 | 9 |
| 4 | 65 | 260 | 16 |
| 5 | 70 | 350 | 25 |
| 6 | 70 | 420 | 36 |
| 7 | 80 | 560 | 49 |
| Σx = 28 | Σy = 420 | Σxy = 1,870 | Σx² = 140 |
n = 7
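b = (7 × 1,870 − 28 × 420) / (7 × 140 − 28²) = (13,090 − 11,760) / (980 − 784) = 1,330 / 196 = 6.79
Try to avoid rounding at this stage since, although nΣxy and ΣxΣy are large, their difference is much smaller.
a = 420/7 − 6.79 × 28/7 = 60 − 27.16 = 32.84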
Therefore the regression line for y on x is:
y = 32.84 + 6.79x (x in hundreds of units produced, y in sh.1,000).
(Always specify what x and y are very carefully)
This line would be used to estimate the total costs for a given level of output. If, say, 250 units were made, we can predict the expected total cost by using the regression line with x = 2.5.
y = 32.84 + 6.79 × 2.5 = 32.84 + 16.975 = 49.815
i.e. we predict total costs of sh.49,815 for production of 250 units.
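The four steps of this illustration can be checked with a short Python sketch. The variable names are illustrative only. The script keeps b unrounded, so its forecast differs slightly from the hand calculation, which rounds b to 6.79 before finding a.

```python
# Step 1: tabulate the data (x in hundreds of units, y in sh.1,000)
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [40, 45, 50, 65, 70, 70, 80]

# Step 2: calculate the sums
n = len(xs)
sum_x = sum(xs)                                 # 28
sum_y = sum(ys)                                 # 420
sum_xy = sum(x * y for x, y in zip(xs, ys))     # 1,870
sum_x2 = sum(x * x for x in xs)                 # 140

# Step 3: find b, then a
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # 1,330 / 196
a = (sum_y - b * sum_x) / n

# Step 4: use the regression equation to forecast cost at 250 units (x = 2.5)
forecast = a + b * 2.5
print(f"b = {b:.4f}, a = {a:.4f}, forecast = {forecast:.3f}")
```

Unrounded, the coefficients are about b = 6.7857 and a = 32.8571, giving a forecast of about 49.821 (sh.49,821) rather than 49.815. The gap is the rounding effect that the note on avoiding early rounding warns about.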
Using the regression line for forecasting
In the previous example, having found the equation of the line of best fit, we used this to forecast the total cost for a given level of activity.
The validity of such forecasts will be dependent upon two main factors.
- Whether there is sufficient correlation between the variables to support a linear relationship within the range of the data used.
- Whether the forecast represents an interpolation or an extrapolation
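One common way to judge whether the correlation is sufficient is the product-moment correlation coefficient r, which uses the Σy² column mentioned in the calculation steps. The sketch below assumes that measure (this section does not define it) and applies it to the units and total cost data from the first illustration; the function name `correlation` is hypothetical.

```python
import math


def correlation(xs, ys):
    """Product-moment correlation coefficient r for paired observations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # Assumes neither variable is constant, otherwise the denominator is zero
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data from the first illustration (x in hundreds of units, y in sh.1,000)
r = correlation([1, 2, 3, 4, 5, 6, 7], [40, 45, 50, 65, 70, 70, 80])
print(round(r, 3))
```

A value of r close to +1 or −1 indicates a strong linear association within the observed range. It says nothing about whether the relationship continues to hold outside that range.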
Illustration
The following data have been collected on costs and output:
| Output (000s) | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| Costs (sh.000s) | 14 | 17 | 15 | 23 | 18 | 22 | 31 |
Required:
Calculate the coefficients in the linear cost function y = a + bx using (i) the normal equations and (ii) the coefficient formulae.
Solution
| Output (x) | Costs (y) | xy | x² |
| 1 | 14 | 14 | 1 |
| 2 | 17 | 34 | 4 |
| 3 | 15 | 45 | 9 |
| 4 | 23 | 92 | 16 |
| 5 | 18 | 90 | 25 |
| 6 | 22 | 132 | 36 |
| 7 | 31 | 217 | 49 |
| Σx = 28 | Σy = 140 | Σxy = 624 | Σx² = 140 |
Where n = 7 (i.e. number of pairs of readings)
- (i) Using the normal equations
140 = 7a + 28b ……….. I
624 = 28a + 140b ……….. II
Eliminating a by multiplying equation I by 4 and subtracting the result from equation II:
624 = 28a + 140b ……….. II
560 = 28a + 112b ……….. I × 4
64 = 28b
∴ b = 2.286 and, substituting this value in equation I, a = (140 − 28 × 2.286) / 7 = 10.86
∴ Regression line is y = 10.86 + 2.286x
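The elimination used on the normal equations can be written as a small Python sketch. The function name `solve_normal_equations` is hypothetical; the inputs are the column totals from the solution table.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  sum_y = n*a + b*sum_x  (I)  and  sum_xy = a*sum_x + b*sum_x2  (II)."""
    # Scale equation I so that its coefficient of a matches equation II
    factor = sum_x / n
    # Subtract the scaled equation I from equation II to eliminate a
    b = (sum_xy - factor * sum_y) / (sum_x2 - factor * sum_x)
    # Substitute b back into equation I to find a
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = solve_normal_equations(n=7, sum_x=28, sum_y=140, sum_xy=624, sum_x2=140)
print(round(a, 2), round(b, 3))  # 10.86 2.286
```

Here the scaling factor is 28 / 7 = 4, which matches the "I × 4" step in the hand calculation.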
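- (ii) Using the coefficient formulae
b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) = (7 × 624 − 28 × 140) / (7 × 140 − 28²) = 448 / 196 = 2.286
a = (Σy − bΣx) / n = (140 − 2.286 × 28) / 7 = 10.86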
When the coefficients have been calculated the cost function can be used for forecasting simply by inserting the appropriate level of activity i.e. a value for x, and calculating the resulting total cost.
For example, what are the predicted costs at output levels of:
- (i) 4,500 units (i.e. 4.5 in ‘000s), and
- (ii) 8,000 units (i.e. 8 in ‘000s)
(i) y = 10.86 + 2.286 (4.5) = 21.147, i.e. sh.21,147
Note: A prediction within the range of the original observations (1 to 7 in this illustration) is known as an interpolation.
(ii) y = 10.86 + 2.286 (8) = 29.148, i.e. sh.29,148
Note: A prediction outside the range of original observations is known as an extrapolation.
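The distinction between the two kinds of forecast can be made explicit in code. The following Python sketch uses a hypothetical helper, `forecast_cost`, that returns the predicted cost together with a label showing whether the activity level lies inside the observed range.

```python
def forecast_cost(a, b, x, observed_xs):
    """Return the predicted cost and whether it is an interpolation or extrapolation."""
    y = a + b * x
    inside = min(observed_xs) <= x <= max(observed_xs)
    kind = "interpolation" if inside else "extrapolation"
    return y, kind


observed = [1, 2, 3, 4, 5, 6, 7]      # output levels in 000s used to fit the line
for x in (4.5, 8):
    y, kind = forecast_cost(10.86, 2.286, x, observed)
    print(f"x = {x}: y = {y:.3f} (sh.000s), {kind}")
```

Labelling each forecast in this way is a reminder that an extrapolation relies on the cost behaviour continuing unchanged beyond the observed range, which the data cannot confirm.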
REVISION QUESTIONS
QUESTION ONE
The management of Limuru Processing Company Limited wishes to obtain better cost estimates to evaluate the company’s operations more effectively.
The following information is provided to you for analysis:
| Year 2004 | Equivalent production | Overheads |
| Month | Units (‘000’) | Sh.’000’ |
| January | 1,425 | 12,185 |
| February | 950 | 9,875 |
| March | 1,130 | 10,450 |
| April | 1,690 | 15,280 |
| May | 1,006 | 9,915 |
| June | 834 | 9,150 |
| July | 982 | 10,133 |
| August | 1,259 | 11,981 |
| September | 1,385 | 12,045 |
| October | 1,420 | 13,180 |
| November | 1,125 | 13,180 |
| December | 980 | 10,430 |
Additional information:
- In November, the opening work in progress inventory contained 1,000,000 units that were 30% complete with respect to conversion costs.
- During the same month of November, the manufacturing department transferred 1,500,000 units.
- The closing inventory for the month of November was 1,200,000 units and the units were 305 incomplete with respect to conversion costs
- Using the above information, you have obtained the following variables by applying simple regression analysis.
Sh. ‘000’
Constant 3,709
Slope 6,487
Required:
- i) Use the high-low method to estimate the overhead cost function.
- ii) Use the regression method to determine the overhead cost function.
- Compute the equivalent units of production with respect to conversion costs for the month of November using the FIFO method.
- Use the regression function formulated in (ii) above to estimate the overhead cost for the month of November.
Solution:
- Use the high-low method to estimate the overhead cost function
Highest cost (OHs) – 15,280 at level of activity 1,690
Lowest cost (OHs) – 9,150 at level of activity 834
b = range of cost / range of activity = (15,280 – 9,150) / (1,690 – 834) = 6,130 / 856 = 7.16
Y = a + bx where b = 7.16
At the highest point, Y = 15,280 and x = 1,690
Therefore 15,280 = a + 7.16 × 1,690
a = 15,280 – (7.16 × 1,690)
a = 3,180
Therefore y = 3,180,000 + 7,160x (y in Sh., x in ‘000 units)
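The high-low calculation above can be sketched in Python using the twelve monthly observations. The function name `high_low` is hypothetical, and the sketch assumes the high and low points are chosen by activity level; in this data set they coincide with the highest and lowest cost.

```python
def high_low(activity, cost):
    """Estimate y = a + b*x from the highest- and lowest-activity observations only."""
    hi = max(range(len(activity)), key=lambda i: activity[i])
    lo = min(range(len(activity)), key=lambda i: activity[i])
    b = (cost[hi] - cost[lo]) / (activity[hi] - activity[lo])
    a = cost[hi] - b * activity[hi]
    return a, b


# January to December 2004: units in '000, overheads in Sh.'000
units = [1425, 950, 1130, 1690, 1006, 834, 982, 1259, 1385, 1420, 1125, 980]
overheads = [12185, 9875, 10450, 15280, 9915, 9150, 10133, 11981, 12045, 13180, 13180, 10430]

a, b = high_low(units, overheads)
print(f"b = {b:.2f}, a = {a:.0f}")
```

Keeping b unrounded gives an intercept of roughly 3,178 rather than 3,180. The hand calculation rounds b to 7.16 before solving for a, which accounts for the small difference.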
- Use the regression method to determine the overhead cost function
y = a + bx where a = 3,709,000 and b = 6,487
Therefore y = 3,709,000 + 6,487x
- Equivalent units of production
Looking at the output side using FIFO method
| | Units | Completion % | Conversion (equivalent units) |
| Opening stock (WIP) | 1,000,000 | 70 | 700,000 |
| Completely processed during production | 500,000 | 100 | 500,000 |
| Closing stock (WIP) | 1,200,000 | | 1,199,695 |
| Equivalent units with respect to conversion costs | | | 2,399,695 |
- Estimate of overhead cost for the month of November
Y = 3,709,000 + 6,487x where x = 1,125
Therefore y = 3,709,000 + 6,487 × 1,125
= Sh.11,006,875
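As a cross-check, the two cost functions derived in this solution can be evaluated side by side for November. This is only an illustrative sketch; the helper name `total_cost` is hypothetical, with x in ‘000 units and y in Sh.

```python
def total_cost(fixed, variable_rate, activity):
    """Evaluate the linear cost function y = a + b*x."""
    return fixed + variable_rate * activity


november_units = 1125   # equivalent production in '000 units

regression_estimate = total_cost(3_709_000, 6_487, november_units)
high_low_estimate = total_cost(3_180_000, 7_160, november_units)

print(regression_estimate)   # 11006875
print(high_low_estimate)     # 11235000
```

The two methods give different estimates because the high-low function is fixed by two observations only, whereas the regression function uses all twelve.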
QUESTION TWO
(a) Explain the advantages and disadvantages of the high-low method of cost estimation.
(b) Central Machinery Ltd. is preparing its budget for the year ending 30 June 2004. For the fuel oil expense, it has been decided to estimate an equation of the form y = a + bx, where y is the total expense at an activity level x, a is the fixed expense and b is the rate of variable cost.
The following information relates to the year ended 30 June 2003:
| Month 2003 | Machine hours (‘000’) | Fuel oil expense (Sh. ‘000’) | Month 2004 | Machine hours (‘000’) | Fuel oil expense (Sh. ‘000’) |
| July | 34 | 640 | January | 26 | 500 |
| August | 30 | 620 | February | 26 | 500 |
| September | 34 | 620 | March | 31 | 530 |
| October | 39 | 590 | April | 35 | 550 |
| November | 42 | 500 | May | 43 | 580 |
| December | 32 | 530 | June | 48 | 680 |
The annual total and monthly average figures for the year ended 30 June 2003 were as follows:
| | Machine hours (‘000’) | Fuel oil expense (Sh. ‘000’) |
| Annual total | 420 | 6,840 |
| Monthly average | 35 | 570 |
Required:
- Using the high-low method, estimate and interpret the fixed and variable cost elements of the fuel oil expense.
- Using the results in (i) above, predict the fuel oil expense for November 2004 if experience indicates that 41,000 machine hours will be used.
- Briefly explain any two limitations of High-low method of cost estimation that may be overcome by using simple linear regression analysis.
Solution:
- Advantages of high-low method
- Method is easy to use
- Not many data are needed
- Visually it gives the general direction of the trend
Disadvantages of high-low method
- Choice of the high and low points is subjective
- Method does not use all available data
- Cannot be used for more than one independent variable
- Not possible to defend the results statistically
- If the two points are outliers, the predictive equation will be wrong.
- Method may not be reliable
- (i) High-low method
| | Machine hours (‘000’) | Fuel oil expense (Sh. ‘000’) |
| High point (June 2004) | 48 | 680 |
| Low point (January 2004) | 26 | 500 |
| Difference | 22 | 180 |
Variable cost per machine hour = 180,000 / 22,000 = Sh.8.182 per hour
Substituting for January 2004:
| | Sh. |
| Total cost | 500,000 |
| Variable costs (26,000 × 8.182) | 212,730 |
| Fixed cost (difference) | 287,270 |
Interpretation:
Within the relevant range, Sh.287,270 will be incurred irrespective of the machine hour usage, i.e. Sh.287,270 is fixed.
The total fuel oil expense will thereafter vary at the rate of Sh.8.182 for each machine hour used.
- Fuel oil expense in November 2004
= 287,270 + 8.182 × 41,000
= Sh.622,732
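The high-low estimate and the November forecast can be reproduced with the Python sketch below. Variable names are illustrative. The rate is kept unrounded, so the results differ by a few shillings from a hand calculation that rounds the rate to Sh.8.182.

```python
# July 2003 to June 2004: machine hours in '000, fuel oil expense in Sh.'000
hours = [34, 30, 34, 39, 42, 32, 26, 26, 31, 35, 43, 48]
expense = [640, 620, 620, 590, 500, 530, 500, 500, 530, 550, 580, 680]

# Pick the observations with the highest and lowest activity
hi = hours.index(max(hours))    # June: 48,000 hours
lo = hours.index(min(hours))    # January: 26,000 hours (first of two equal lows)

# Convert from thousands to single hours and shillings
high_hours, high_cost = hours[hi] * 1000, expense[hi] * 1000
low_hours, low_cost = hours[lo] * 1000, expense[lo] * 1000

variable_rate = (high_cost - low_cost) / (high_hours - low_hours)   # Sh. per hour
fixed_cost = low_cost - variable_rate * low_hours                   # Sh.

forecast = fixed_cost + variable_rate * 41_000                      # 41,000 hours
print(f"rate = {variable_rate:.3f}, fixed = {fixed_cost:,.0f}, forecast = {forecast:,.0f}")
```

Unrounded, the rate is about Sh.8.182 per hour, the fixed element about Sh.287,273 and the forecast about Sh.622,727.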
- Limitations of high-low method
- Relies on only two data points, the highest and the lowest, which may be outliers and therefore not representative of the entire data set.
- The method does not use statistical techniques to measure the predictive quality of the resultant function.
