River Logic Blog | Prescriptive Analytics Blog

Data complexities prove Excel spreadsheets difficult to maintain

Written by Renat Stewart | May 27, 2013

Excel Spreadsheets are Error-Prone

A case in point of this phenomenon is the recently discovered omission in Reinhart – Rogoff’s “Growth in a time of debt” model which examined the correlation of the countries’ debt burden vs. their economic growth. The 2010 paper’s main result was that “…median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise”.

This was widely cited as a landmark study by politicians and economists in the Austerity vs. Stimulus debate the world has been enjoying for the last few years. Indications suggest, however, that the Harvard scientists had a coding error in their Excel spreadsheet, accidentally skipping 5 cells (countries) in their averages calculations.

Another recent example is the “London Whale” $6 billion trading debacle at JP Morgan and the role Value–at-Risk model played there. VaR is a popular risk exposure metric in financial services industry. For a given portfolio over a specific time horizon, given a specific probability level, VaR is the threshold value such that the probability that the portfolio’s mark-to-market loss will exceed this threshold value is the given probability – e.g., we could say that for a portfolio X with a one year 5% VaR of $200M, there is a 5% chance of the portfolio losing more than $200M of its value over a one year horizon.

It appears that the Value-at-Risk model JP Morgan used involved a process of updating the spreadsheets manually, by copying and pasting data from one spreadsheet to another. Needless to say, the manual process could create a rather fertile ground for data inconsistencies. The model also had an error which may have had the effect of “muting volatility by a factor of two and of lowering the VaR”.

The Problem with Excel Spreadsheets

The issue here is that Excel does not have the inherent capacity to check for human- or system-introduced errors in an automatic and holistic manner. It will do all kinds of calculations with any kind of data you throw at it. However, as your spreadsheets continually become more and more complicated, it becomes nearly impossible to track changes and ensure that your model is still “sane”. As you start sharing the spreadsheets with multiple people who may work on different sets of the formulas or data, it becomes even more problematic.

How to Minimize Negative Effects of Using Excel Spreadsheets

To minimize negative effects of potential data/model issues, companies can try to limit the spreadsheet usage to very specific, high-value, limited in scope areas. However, the business need of creating dramatically complex, company-wide, integrated business planning and budgeting systems will still need to be addressed.

From the organizational perspective, companies can try to be very focused and ensure that the right processes are in place to catch any potential data or modeling errors.

From the technology point of view, businesses can seek out next generation alternatives which should have the same or surpassing levels of agility and rapidness of Excel modeling but also provide out-of-the-box, automatic truth maintenance features such as:

  • automatic equations generation, so the users do not have to go over every single line of code to ensure they are still not broken
  • mass / energy balancing, which would account for every single resource, material, and process entering and exiting the system for the given time period(s), so that no unaccounted items, quantities, or capacities can be left out by accident
  • financial auditing, which would automatically build financial statements, linking all the transactions occurring within the model and ensuring model sanity from the accounting perspective

This kind of technology would significantly minimize the time required to verify data and logic sanity and reduce the amount of re-work that would go into fixing any issues, especially with millions of rows of data involved.