Practical Guide To Business Forecasting 2015

  1. Business Forecasting Methods

Human Obsession with Future & ARIMA – by Roopam

Humans are obsessed with their future – so much so that they worry more about their future than they enjoy the present. This is precisely why horoscopists, soothsayers, and fortune tellers are always in high demand. Michel de Nostredame (a.k.a. Nostradamus) was a French soothsayer who lived in the 16th century.

In his book Les Propheties (The Prophecies) he made predictions about important events to follow until the end of time. Nostradamus' followers believe that his predictions are irrevocably accurate about major events, including the World Wars and the end of the world. For instance, in one of the prophecies in his book, which later became one of his most debated and popular prophecies, he wrote the following:

"Beasts ferocious with hunger will cross the rivers,
The greater part of the battlefield will be against Hister.
Into a cage of iron will the great one be drawn,
When the child of Germany observes nothing."

His followers claim that Hister is an allusion to Adolf Hitler, with Nostradamus misspelling Hitler's name. One of the conspicuous things about Nostradamus' prophecies is that he never tagged these events to any date or time period. Detractors of Nostradamus believe that his book is full of cryptic prose (like the one above) and that his followers force-fit events to his writing.



To dissuade detractors, one of his avid followers predicted (based on his writing) the month and the year for the end of the world: July 1999 – quite dramatic, isn't it? Of course, nothing earth-shattering happened in that month of 1999; otherwise you would not be reading this article. However, Nostradamus will continue to be a topic of discussion because of the eternal human obsession with predicting the future. Time series modelling and ARIMA forecasting are scientific ways to predict the future.


However, you must keep in mind that these scientific techniques are not immune to force fitting and human biases either. On this note, let us return to our manufacturing case study example.

ARIMA Model – Manufacturing Case Study Example

Back to our manufacturing case study example, where you are helping PowerHorse Tractors with sales forecasting so that they can manage their inventories and suppliers. The following sections in this article represent your analysis in the form of a graphic guide.

You can find the data shared by PowerHorse's MIS team at the following link. You may want to analyze this data to revalidate the analysis you will carry out in the following sections. Now you are ready to start your analysis to forecast tractor sales for the next 3 years.

Step 1: Plot tractor sales data as a time series

To begin with, you have prepared a time series plot for the data.

The following is the R code used to read the data into R and plot a time series chart:

data = read.csv('...')   # data file shared by the MIS team
data = ts(data[,2], start = c(2003,1), frequency = 12)
plot(data, xlab='Years', ylab='Tractor Sales')

Clearly, the above chart shows an upward trend for tractor sales, and there is also a seasonal component that we have already analyzed in an earlier article.

Step 2: Difference data to make data stationary on mean (remove trend)

The next thing to do is to make the series stationary, as learned earlier. This is to remove the upward trend through 1st order differencing of the series using the following formula:

1st Differencing (d=1):  Y'_t = Y_t − Y_(t−1)

The R code for plotting the differenced series is displayed below:

plot(diff(data), ylab='Differenced Tractor Sales')

The above series is still not stationary on variance, i.e. the variation in the plot increases as we move towards the right of the chart. We need to make the series stationary on variance to produce reliable forecasts through ARIMA models.
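The 1st order differencing formula above can be verified by hand. A minimal sketch on a toy series (not the tractor data):

```r
# diff() implements exactly Y'_t = Y_t - Y_(t-1)
x <- c(100, 110, 125, 123, 140)      # toy series, not the tractor data
manual <- x[-1] - x[-length(x)]      # subtract each value from the next one
stopifnot(all(manual == diff(x)))    # identical to R's built-in diff()
manual                                # 10 15 -2 17
```

Note that the differenced series is one observation shorter than the original, which is why forecasts must later be "un-differenced" back to the original scale.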

Step 3: Log transform data to make data stationary on variance

One of the best ways to make a series stationary on variance is to transform the original series with a log transform. We will go back to our original tractor sales series and log transform it to make it stationary on variance.
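Before applying it to the tractor series, the effect of the log transform can be seen on a toy series (purely illustrative) whose seasonal swing grows in proportion to its level; after log10 the swing becomes constant:

```r
# Toy series with multiplicative seasonality: the swing grows with the level
t <- 0:47
level  <- 100 * 1.05^t                     # 5% monthly growth
season <- 1 + 0.2 * sin(2 * pi * t / 12)   # +/-20% seasonal swing
y <- level * season

# Year-over-year spread explodes on the raw scale...
sd(y[37:48]) / sd(y[1:12])                 # roughly 1.05^36, i.e. about 5.8
# ...but is essentially constant after the log transform
sd(log10(y)[37:48]) / sd(log10(y)[1:12])   # very close to 1
```

This is exactly the situation in the tractor data: sales fluctuate more as sales grow, so the log scale is the natural one to model on.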

The following equation represents the log transformation mathematically:

Y'_t = log10(Y_t)

The following is the R code for the same, with the output plot. Notice that this series is not stationary on mean, since we are using the original data without differencing.

plot(log10(data), ylab='Log (Tractor Sales)')

Now the series looks stationary on variance.

Step 4: Difference log transform data to make data stationary on both mean and variance

Let us look at the differenced plot for the log transformed series to reconfirm whether the series is actually stationary on both mean and variance.

1st Differencing (d=1) of log of sales:  Y'_t = log10(Y_t) − log10(Y_(t−1))

The following is the R code to plot the above equation:

plot(diff(log10(data)), ylab='Differenced Log (Tractor Sales)')

Yes, now this series looks stationary on both mean and variance. This also gives us the clue that the I, or integrated, part of our ARIMA model will be equal to 1, since the 1st difference makes the series stationary.

Step 5: Plot ACF and PACF to identify potential AR and MA models

Now, let us create autocorrelation function (ACF) and partial autocorrelation function (PACF) plots to identify patterns in the above data, which is stationary on both mean and variance. The idea is to identify the presence of AR and MA components in the residuals. The following is the R code to produce the ACF and PACF plots.

par(mfrow = c(1,2))
acf(ts(diff(log10(data))), main='ACF Tractor Sales')
pacf(ts(diff(log10(data))), main='PACF Tractor Sales')

Since there are enough spikes in the plots outside the insignificance zone (dotted horizontal lines), we can conclude that the residuals are not random. This implies that there is juice, or information, available in the residuals to be extracted by AR and MA models. There is also a seasonal component in the residuals, represented by the spikes at lag 12.
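For intuition on what the ACF bars measure: the lag-1 value that acf() reports is just the series correlated with a one-step shifted copy of itself. A sketch on simulated AR(1) data (not the tractor series):

```r
# The lag-1 autocorrelation by hand vs. acf()
set.seed(42)
z <- as.numeric(arima.sim(list(ar = 0.7), n = 500))   # toy AR(1) series

n  <- length(z)
zc <- z - mean(z)
r1_manual <- sum(zc[-n] * zc[-1]) / sum(zc^2)          # shifted product / variance
r1_acf    <- acf(z, lag.max = 1, plot = FALSE)$acf[2]  # acf()'s lag-1 value

stopifnot(abs(r1_manual - r1_acf) < 1e-10)
round(r1_manual, 2)   # close to the true AR coefficient of 0.7
```

The PACF is the same idea after removing the influence of the intermediate lags, which is why an AR(p) process shows a sharp PACF cut-off at lag p.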

This makes sense, since we are analyzing monthly data that tends to have a 12-month seasonality because of patterns in tractor sales.

Step 6: Identification of best fit ARIMA model

The auto.arima function in the forecast package in R helps us identify the best fit ARIMA model on the fly. The following is the code for the same.

Please install the required 'forecast' package in R before executing this code.

require(forecast)
ARIMAfit = auto.arima(log10(data), approximation=FALSE, trace=FALSE)
summary(ARIMAfit)

Series: log10(Tractor Sales)
Best fit model: ARIMA(0,1,1)(0,1,1)[12]

Coefficients:
          ma1     sma1
      -0.4047  -0.5529
s.e.   0.0885   0.0734

log likelihood = 354.4
AIC = -702.79   AICc = -702.6   BIC = -694.17

The best fit model is selected based on the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values. The idea is to choose a model with minimum AIC and BIC values. We will explore more about AIC and BIC in the next article. The values of AIC and BIC for our best fit model are displayed at the bottom of the results above. As expected, our model has an I (or integrated) component equal to 1.
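What auto.arima does can be sketched in miniature with base R's arima(): fit a handful of candidate orders and keep the one with the lowest AIC. The real function also searches seasonal orders and drift; the toy search below uses a simulated non-seasonal series:

```r
# Minimal model search by AIC on a simulated ARIMA(0,1,1) series
set.seed(1)
y <- as.numeric(arima.sim(list(order = c(0, 1, 1), ma = -0.4), n = 200))

candidates <- list(c(0,1,0), c(1,1,0), c(0,1,1), c(1,1,1))
aics <- sapply(candidates, function(o) AIC(arima(y, order = o)))

best <- candidates[[which.min(aics)]]
best   # tends to land on or near the true order (0,1,1)
```

AIC penalizes extra parameters, so the search prefers the most parsimonious model that fits the data well, rather than the model with the smallest residuals.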

The I component represents differencing of order 1. There is an additional differencing of lag 12 in the above best fit model. Moreover, the best fit model has an MA term of order 1, along with a seasonal MA term of order 1 at lag 12.

Step 7: Forecast sales using the best fit ARIMA model

The next step is to predict tractor sales for the next 3 years, i.e. for 2015, 2016, and 2017, through the above model.

The following R code does this job for us:

par(mfrow = c(1,1))
pred = predict(ARIMAfit, n.ahead = 36)
pred
plot(data, type='l', xlim=c(2004,2018), ylim=c(1,1600), xlab='Year', ylab='Tractor Sales')
lines(10^(pred$pred), col='blue')
lines(10^(pred$pred + 2*pred$se), col='orange')
lines(10^(pred$pred - 2*pred$se), col='orange')

The following is the output, with the forecasted values of tractor sales in blue. The range of expected error (i.e. 2 times the standard error) is displayed with orange lines on either side of the blue prediction line. Now, a forecast over a long period of 3 years is an ambitious task.
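Since the model was fit on log10(sales), the 10^(...) calls in the code above undo the transform. With toy numbers (not actual model output), a log-scale forecast p with standard error s maps to an approximate 95% band like this:

```r
# Back-transforming a log10-scale forecast (illustrative numbers only)
p <- 2.70    # forecast of log10(sales)
s <- 0.03    # its standard error

point <- 10^p              # point forecast on the original scale
lower <- 10^(p - 2 * s)    # subtract 2 s.e. BEFORE back-transforming
upper <- 10^(p + 2 * s)

stopifnot(lower < point, point < upper)
round(c(lower = lower, point = point, upper = upper), 1)
```

Note that the band is no longer symmetric around the point forecast on the original scale (upper − point exceeds point − lower): a side effect of the log transform.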

The major assumption here is that the underlying patterns in the time series will continue to behave as captured by the model. A short-term forecasting model, say for a couple of business quarters or a year, can usually forecast with reasonable accuracy. A long-term model like the one above needs to be re-evaluated at regular intervals (say every 6 months). The idea is to incorporate into the model the new information that becomes available with the passage of time.

Step 8: Plot ACF and PACF of residuals of ARIMA model to ensure no more information is left for extraction

Finally, let's create ACF and PACF plots of the residuals of our best fit ARIMA model, i.e. ARIMA(0,1,1)(0,1,1)[12]. The following is the R code for the same:

par(mfrow = c(1,2))
acf(ts(ARIMAfit$residuals), main='ACF Residual')
pacf(ts(ARIMAfit$residuals), main='PACF Residual')

Since there are no spikes outside the insignificance zone for both the ACF and PACF plots, we can conclude that the residuals are random, with no information or juice left in them. Hence our ARIMA model is working fine.

However, I must warn you before concluding this article that randomness is a funny thing and can be extremely confusing. We will discover this aspect of randomness and patterns in the epilogue of this forecasting case study example.

Sign-off Note

I must say Nostradamus was extremely clever, since he never tagged his prophecies to any time period. He left the world a book containing cryptic sets of words to be analysed by the human imagination. This is where randomness becomes interesting. A prophecy written in cryptic words, without a defined time period, is almost 100% likely to come true, since humans are the perfect machines for making patterns out of randomness.

Let me put out my own prophecy for a major event in the future. If someone tracks this for the next 1000 years, I am sure it will put me in the books next to Nostradamus.

Hi Roopam, Great article, very good explanation. Here is a suggestion to help the readers. As the data is not in a time-series format, it will help if we convert it to a time series when plotting it.

Change the code in the following steps:

Step 1: plot(ts(data[,2]), xlab='Years', ylab='Tractor Sales')
Step 2: plot(diff(ts(data[,2])), xlab='Years', ylab='Diff Tractor Sales')
Step 3: plot(log(ts(data[,2])), xlab='Years', ylab='Log Tractor Sales')

Keep up the great work.

This data looks a lot like the world-famous Box-Jenkins International Airline Passenger series example. It has the same length of data too. When I compare the two time series, divide one column by the other for each data point, and plot the ratio, it screams out that I am right. Even the LOG transform is right from the Box-Jenkins textbook (which is unneeded and actually harmful, as the use of LOGS was incorrect: the outliers in the data create a false positive on the F test suggesting that LOGS are useful). Roopam, Box and Jenkins didn't have the tools and knowledge that we do today. They had slow computers and no ability to search for outliers like we do now.

If you ignore outliers, you will falsely conclude that you need logs. However, if you look for outliers and build dummy variables for them (i.e. step-up regression with deterministic 0/1 dummy variables), then you will not use logs and will NOT have a forecast that goes as high (incorrectly) as theirs did. We have a presentation on our website, which I presented at the IBF in 2009, that discusses this.

Hello Roopam, First of all, congratulations – I have read your case study example and it is very clear. But I have a question. In the first part you said: "Eventually, you will develop an ARIMA model to forecast sale / demand for next year. Additionally, you will also investigate the impact of marketing program on sales by using an exogenous variable ARIMA model." I don't know if part 4 is the final part, or if I have to wait for a future delivery to read about how we can use an exogenous variable like the marketing program. I hope you continue with your example soon, because frankly I'm desperate to know how this story ends.

Thanks, Gabriel Cornejo, CHILE.

Our final model was built with log10(Tractor Sales) data, i.e. we had log-transformed our original tractor sales data. The remaining operations, i.e. differencing and moving average, are built into our ARIMA model, i.e. best fit model ARIMA(0,1,1)(0,1,1)[12] (see the model identification step). Trend and other variations are part of this ARIMA model; only the log transformation is not.

Hence, for making the final forecast, we just need to back-transform the data by raising 10 to the power of the forecast values. That is exactly what I did to arrive at the final values in the forecasting step. I did the same for the margins of error.

Sanya: in its simplest form, think of an ARIMA model as the following equation (a simple regression model):

Sales(Future) = Sales(Past) + 300 + Random Variable

(Ignore the random variable for now.) Now, if Sales(Past) = 1000 units, you could easily calculate Sales(Future) = 1300 units. This is a simple ARIMA model with just an integrated term.
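The toy equation in the reply above can be iterated forward; with the random term ignored, each step simply adds the drift:

```r
# Random walk with a drift of 300 units (noise ignored, as in the reply above)
drift <- 300
sales_past <- 1000

sales_next <- sales_past + drift
stopifnot(sales_next == 1300)    # matches the worked example

# h steps ahead just adds the drift h times
h <- 3
sales_h <- sales_past + h * drift
sales_h   # 1900
```

This is why forecasts from a pure integrated model grow linearly: every extra step ahead adds one more drift term, while the uncertainty band widens with the accumulated random terms.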

The above model could be extended to include more terms, like auto-regressive and moving-average parts. In the end, you will get an equation similar to our simple model above. Once you substitute the right values from the past, you will arrive at the value for the future.

Smita, that's a great question. I know you addressed Roopam, but if I may, I would like to attempt to answer your concerns. A major caveat to ARIMA modeling is non-constant variance (non-stationarity), but, when identified correctly, it can be a great resource to the time series modeler. There are three ways (and reasons) to check your time series for changes in error variance.

1) The variance of the errors is changing by a discernible magnitude. For example, when looking at your series for changes in variance, it could be useful to test the relationship of the variance or standard deviation to the expected value; if they are reasonably proportional, then a transformation could be in order.

Potentially, you could have any number of possible Box-Cox transformations that could help stabilize the variance. Take a close look at the picture in the post I have linked below.

Source:

2) The variance of the errors is changing by a deterministic structure at a particular point in time. This is not to be confused with the power transforms mentioned above. The nature of these deterministic structures speaks more to a paradigm shift in variance than to a continuous proportion. For example, if one were to split a data set in half, where the first half of the series has a variance of S1 and the second half has a variance of S2, we would need to be sure that the variances are not significantly different from one another before changing the overall model. ARIMA models use an equal weighting scheme (OLS) unless specified otherwise – this is one of those times where that premise can hurt you. Switching over to a Generalized Linear Model, or Generalized Least Squares, within the framework of ARIMA estimation allows us to account for non-constant error variance. By doing this, we can put "weights" on the data to dampen the impact of the increased variance.

From there, we can normalize the data to better detect other useful changes in the data.

Source:

3) The variance of the errors is itself a random variable subject to some ARIMA structure. If we have a structure in the variance of the errors that represents some repetitive pattern, then perhaps a GARCH or ARCH model could help us. Notice how this is similar to the deterministic "paradigm shift" mentioned above. Imagine that the error variance is subject to the same type of analysis as your typical ARIMA model: there can be lags, leads, and changes in variance structure that can be identified as a function of time. Of course, none of the above says much about the presence of outliers in your data, which, untreated, have the ability to suggest any of the above 3 cases when they would otherwise be unnecessary.

In fact, testing for outliers should precede your analysis of changes in variance. To tie off this response, for a good visualization of the importance of outlier detection (and how the lack thereof can lead you astray), read this document starting on page 14.

Hi Roopam, nice explanation of ARIMA. I learned the ARIMA model from your website and am now trying to apply it to forecast sales for stores. I have time series data with daily observations. So my first question is: what should the value of frequency be for daily data in Step 1? Also, when I followed the above steps for my data series, I am not getting a reasonable forecast. Below is a link to my data and code. Please help me ASAP.

Link to dataset: This is my code: data. I am not sure how to use the xreg parameter. Can you please tell me what its value should be? Is it the name of the column which has the irregular 0's? I did as follows: require(forecast) ARIMAfit. Here is one way to proceed: myts.

Thanks Ram for the help! The forecasts using xreg are better, but not close to the actual values. Can you please suggest something I can do to forecast values as close as possible? I tried the model with different frequencies like 7, 12, and 365. Also, I would like to tell you that I want to forecast only for two months, August & September, so I used only the past data for August & September, and it gives some reasonable plots but not accurate ones. So what should I do to get a more accurate forecast? Thanks in advance 🙂 This is a plot of the current forecast values.

Hi Patrick, I agree, the forecast gives negative sales for holidays.

I am not sure how we can get a better forecast. I am sure Roopam will be able to provide help. Here is one option you can try: Create a new series excluding the 0 Sales.

Then fit a model and do a forecast on this new series without xreg.

data.m = subset(data, Sales > 0)   ## Select non-zero sales
myts = ts(data.m[,2])              ## Do not specify the frequency for the series
ARIMAfit = auto.arima(myts, approximation=FALSE, trace=FALSE)  ## Do not use xreg

# Forecast for 2 months, assuming 52 business days.

Types of business forecasting

(No forecast for holidays.)

fcast = forecast(ARIMAfit, h=52)
print(fcast)
plot(fcast)
## End

## Note: I have not done any transformation (diff or log) to the data, because the series is already stationary (it has no trend and has uniform variance).

@Patrick: I would recommend you ask yourself these questions: 1) Do you need a daily forecast? A forecast of shorter duration tends to have larger noise.

In this case, from a logical point of view, there is no obligation on shoppers to shop on Mondays instead of Tuesdays, and this variation will influence the overall accuracy of your model. You may want to create a weekly forecast instead; this will take care of weekly holidays. Also, add a dummy trigger if there are additional holidays in that week. You could then create a daily forecast by apportioning this weekly forecast.

2) Secondly, is it the sales forecast that your company cares about, or the influence of other factors on purchases, for instance promotions and discounts? This will change your strategy from a pure-play forecasting model to hypothesis-driven questions. 3) Most important, how are you planning to use this model?

Hi Roopam, I would like to answer your questions: 1. Yes, I need a daily forecast. 2. There are other factors which influence the forecast, like promotions and school holidays, but right now I am just trying to forecast using a simple time series, and it is giving me a reasonable forecast using ARIMA. 3. Right now my approach is to forecast sales for August and September of 2015 using historical data for August and September of 2013 and 2014. I have historical data for these 100 stores to forecast. I have a csv file in a format like:

index sales
1 4000
2 4965

and so on. The algorithm I'm going to use is as follows. For store k: 1. Read data from index i=1 to j=60. 2. Forecast for the next 60 days. 3. Write to a csv file (for the next store, continue writing from the next available index). 4. Set i=j+1, j=j+60. Repeat for the next store. Can you please tell me how to execute steps 1 and 3 in R?

That is, what should I use for the start and end parameters in the ts function so that from the CSV file I can get only the first 60 values of the sales column, forecast the next 60 days, write to the result csv from the next index, and repeat this for all stores? Sorry for the long explanation, but I have just started learning prediction and forecasting, so I am weak in R programming.

Hi Patrick, To write your forecast to a file: I do not see any function to write the 'forecast' object to a file directly. Here is one way: convert the forecast object into a data frame and write it to a csv file. Detailed steps: 1. Before starting the while loop, create an empty data frame df1: df1 = data.frame() 2. Inside the loop, after you plot the fcast object, convert it to a data frame and append it to df1.

df1 = rbind(df1, data.frame(fcast)) 3. After you exit the loop, write the data frame df1 to a file in the working directory.

write.csv(df1, file='fcast.csv') 4. Open the fcast.csv file in Excel and verify the contents. Hopefully this will resolve your problem.
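The four steps above can be sketched end-to-end with base R's arima()/predict(), so the pattern is runnable without extra packages (toy per-store series; the two-store loop and the file name fcast.csv are just placeholders):

```r
# Accumulate per-store forecasts in one data frame, then write the csv once
df1 <- data.frame()                           # step 1: empty accumulator
for (k in 1:2) {                              # stand-in for the per-store loop
  set.seed(k)
  y <- ts(cumsum(rnorm(60, mean = 5)))        # toy 60-day sales history
  fit  <- arima(y, order = c(0, 1, 0))        # simplest integrated model
  pred <- predict(fit, n.ahead = 12)
  df1 <- rbind(df1, data.frame(store    = k,  # step 2: append each forecast
                               forecast = as.numeric(pred$pred),
                               se       = as.numeric(pred$se)))
}
write.csv(df1, file = "fcast.csv", row.names = FALSE)   # step 3: write once
nrow(df1)   # 24 = 12 forecast rows per store
```

Writing once after the loop, rather than inside it, avoids the overwrite problem described in the question.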

Thank you so much, Ram, for the help! It helped me a lot!

I just learned that we can get a more accurate forecast by making use of the xreg operator if we have other covariates; in my case I have holidays and promotions. Below is my data with the other covariates, which cause a little difference in the forecast – when there is a promotion and a school holiday, the forecasts are a little higher. Forecasts without these covariates are good, but I have to learn how to make them more accurate, and I found on many blogs that xreg can be used to capture the effect of covariates on the forecast. I found a good example here, and I tried to do as shown there: xreg.
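The xreg argument the commenters above are discussing can be sketched with simulated data. Here the covariate is a trade-discount-style driver with a known uplift of 150 units per point; all names and numbers are purely illustrative:

```r
# Simulated sales driven by a discount covariate + AR(1) noise
set.seed(7)
discount <- round(runif(48, 0, 12))                           # % discount per period
noise <- as.numeric(arima.sim(list(ar = 0.5), n = 48, sd = 50))
sales <- 2000 + 150 * discount + noise

# ARIMA with the covariate as an exogenous regressor
fit <- arima(sales, order = c(1, 0, 0), xreg = discount)
uplift <- tail(coef(fit), 1)    # xreg coefficients come last in coef()
uplift   # estimated sales units per discount point, close to the true 150
```

To forecast, future covariate values (planned discounts, holiday flags) are supplied through the newxreg argument of predict(), which is why this approach needs those drivers to be known or assumed in advance.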

Hi Roopam, It is indeed a very useful article to follow for ARIMA; I understand it much better now, thanks. A few questions: a) In Part 3 it says: "for a significant correlation the horizontal bars should fall outside the horizontal dotted lines" – should it be the vertical bars instead? b) In Part 4, I see that you use auto.arima on log10(data); should we use log10(data), diff(log10(data)), or the data itself? c) When I use fit.

Hello Sir, I couldn't make the above procedure work when implementing it for larger data sets. The warnings start from the log10 command. I could not implement ACF and PACF, as you can see in the code.

Hello Tejeshwar, I have got your data. Please remove it from the post (request Roopam to delete it). Your data contains several 0 values. The last line shows incomplete data – this might be due to a copy/paste error.

Here is one option. Create a subset of the data with values > 0 and try your acf again. Example:

data = read.csv('temperature.csv', header=F, sep='\t')
data = subset(data, data[,2] > 0)
acf(ts(diff(log10(data[,2]))), main='Temp')
pacf(ts(diff(log10(data[,2]))), main='Temp')

(There might be other, better options.) Hope this helps.

Hi Roopam, It's a very nice explanation of ARIMA.

I got some basic idea about time series and ARIMA. However, I am a novice at data analysis and unable to understand the following things: 1. Why differencing? I hope it is for making non-stationary data stationary; if so, the original data will still be non-stationary – only the differenced data may be stationary. 2. Why ACF and PACF?

Can you provide me a bit more explanation on the same, please? Also, when you are doing differencing, why is it called Integrated?

Also, can you suggest any reference books for further study? Thanks in advance. :) Thx & Regards, B N REDDAIAH.

Hello Roopam sir, I am a novice in this field, and thus request your help! Based on past data I have deciphered the trend and seasonality of my system, and hence realised a forecast. Simultaneously, I checked my sales value for each month against the same month's trade discount values, to possibly arrive at a correlation.

My problem now lies in the fact that I have made this forecast based only on trend and seasonality. However, I also have to consider the fact that each month had different trade discounts.

Say Jan 2013 had a trade discount of 10%, and its observed sales value was 2000 units. But for Feb 2013, the trade discount was 8%, and observed sales were 3000 units. Hence sales depend not only on trend and seasonality, but also on trade discounts. Also, trade discounts may or may not be given depending on seasonality. E.g., the company may think that since the seasonality of May is high (that is, the product sells more in May), they will expect sales of 2000 units in this month anyway; hence they may decide to give no/low trade discounts to maximise profits, or to give 12% if competition in the market is high!

Hence, I need to design a model by which, for any month, I can predict, based on the seasonality of that month and the trend, that if I give x% trade discount, I should expect y% sales. Any suggestions?

That's the best tutorial I have seen on ARIMA! I just have a few questions regarding forecasting with ARIMA: 1. Say I need to re-forecast every 15 data points – do I need to go through the entire process and update the ARIMA orders and corresponding parameters each time I forecast?

2. Is there a function, or a way, to auto-check whether my original historical data or transformed data (e.g. after differencing or removing seasonal components) is, or at least looks, stationary? Say I have minute-by-minute data and I have to forecast every 15 minutes – is there a way to check whether my data or transformed data is stationary? Best regards, Jian.

Hey Roopam, I am new to this topic; however, I currently have a problem with which I believe you can be of some help. I have Q3 2015 revenue data (fields in Excel: Account Name, Revenue, FY, and so on). Now, if I want to predict my revenue for Q3 2016, how would I go about doing it?


I know this is a very basic question, but I would start with some analysis, and if I get a % change or predicted amount, it would be of great help for my project submission, due COB Friday. Any help would be highly appreciated.

Hi Roopam, I have some test data for 3 years (monthly) and want to predict the next 12 months. I have done all the steps, but while predicting the next 12 months I am getting the below error.

Hello Roopam, First of all, thank you so much for such a wonderful series of articles on time series. Very helpful for a starter like me. I have a question.

To make the series stationary on both mean and variance, we had to take the difference as well as the log. But while building the model, you only provided the log of the data, not the difference. What is the reason behind that? I understand that auto.arima will figure out the differencing itself, but then why take the log?

ARIMAfit doesn't take care of stationarity on variance?

Rainfall is a complicated phenomenon. It depends on many other atmospheric factors, such as pressure, temperature, cyclonic winds, etc. To begin with, I suggest you read a few research papers on rainfall prediction and associated methods. I am not an expert on weather forecasting, but chaos is another factor that dominates weather patterns more than it does sales and demand. Moreover, you are looking at data for tropical rains that are spread over just 4-5 months in a year. I am not sure a time series model without exogenous variables will be of much help for forecasting rainfall.

With regards to the above comment, I found my mistake and resolved it. Learning ARIMA via your blog was the best thing I did today; I had been drained for days trying to understand time series via other sources. I was about to lose hope and give up on ACF and PACF, as I just couldn't get it. But then I accidentally found this gold mine. 🙂 I especially loved the comparison with the sugarcane juicer – it made it really easy to understand and comprehend the underlying concepts. Thank you very much, sir. Looking forward to going through the rest of the topics as well, and to keep learning. 🙂

Hi Roopam, I honestly don't have enough kind words to describe what you've built here.

It has helped me so much, and I love the artwork that you put on the pages! At the risk of looking a gift horse in the mouth, I'd like to ask you for additional help. My first question is: at what value should I be worried about AIC and BIC? I saw in your example that you were in the negative triple digits, but when I ran auto.arima on my data I ended up with an AIC and BIC of 2.07 and 2.55 respectively.

Should I change my code around to get those values lower? My second question is: I'm trying to get a single value from the equation to plug into a sales forecast, but when I try pred$pred on my data it gives me the RMSE instead of an actual value that would make sense.

I'll include the code and data below. These are the values: 16.0 47.0 121.0 78.0 86.0 68.0 121.0 61.0 80.0 121.0 72.0 59.0. I'm using a 13-month rolling period to forecast the next month's sales.

Let me try to answer your first and third questions; I will leave the coding part for others to deliberate.

For AIC/BIC, don't worry about the specific values, since these metrics are not comparable across different datasets/models. They are indicative metrics which you want to minimise for a specific dataset/model.

This minimised value could be anything. With 800 rows and roughly 13 months of data, you are looking at over 60 products. Running a loop is an easy solution, but your bigger challenge will be evaluating those 60-odd models individually. See if you can group the products to reduce the number while still satisfying your business objectives. If not, you will have to evaluate each model individually. All the best.

This entry was posted on 10.10.2019.