You probably shouldn’t use a spreadsheet for important work

Following the Reinhart-Rogoff case, where famous scientists go formulas wrong in the Excel spreadsheet that supported their research, a lot of people commented on the adequacy of a spreadsheet tool for important work.

Excel does have one tremendous benefit: it is accessible. Most people using spreadsheets don’t even realize that they are programming. In the Reinhart-Rogoff case, this accessibility was a great virtue: it allowed a regular PhD student to verify the computations.

However, there are several critical problems with a tool like Excel that need to be widely known:

  • Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically.
  • Spreadsheets make code reviews impractical. To inspect the code, you need to look at every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake.
  • Spreadsheet encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?

Unfortunately, spreadsheet programming is far more common in research than we would like to admit. I keep reviewing research manuscripts where the figures were obviously made with Excel. It is also very widespread in business: decisions worth millions (if not billions) of dollars are taken on the basis of a spreadsheet all the time.

Professionals should avoid spreadsheets for activities where mistakes matter. Reinhart and Rogoff should have used a bona fide programming language with proper testing, code review and documentation.

Further reading: Lotus Improv was an early attempt to build a spreadsheet tool that did not have some of these problems. It was a market failure. (Credit: Preston L. Bannister)

7 thoughts on “You probably shouldn’t use a spreadsheet for important work”

  1. Here, here!

    As a programmer, I was shocked when I read the Reinhart-Rogoff calculations were done on Excel. As a researcher, however, I was not.

    At least tools like Matlab make the algorithms easily verifiable, the idea of writing tests in research is still an anomaly. The research groups that open source their projects do a much better job at this.

  2. IMHO, the biggest problem with Excel is that it is difficult to place logic in separate plain-text source files. By wrapping the logic and the data up into opaque binary .xls files, standard management tools like “diff” and “blame” cannot be used, which makes it extraordinarily and unnecessarily hard to effectively manage and review business critical (not to mention global-economy-critical) logic stored in Excel.

  3. Two nice blog posts on spreadsheet.

    But if we want to manipulate cross-referenced data on an intranet, do you have an idea of the best opensource tools or libraries to substitute to Excel usage ?

  4. Thanks for posting this.

    IMO, everyone who may be in a job where automation via spreadsheet is likely, needs training in SDE fundamentals: unit testing, the important of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.

  5. I haven’t used Excel in a few years, but it certainly used to have a control (I think it was Cntl-`) to display the formula in each cell instead of the values.

    Now, I agree it’s still a nightmare to read and validate pages of cells with crazy formulae in them, but it’s slightly better than clicking into each cell and should help spot discrepancies like “this cell is doing sum(A10:A50) and the one next to it is doing sum(B10:B35). But just a FYI.

    Ironically enough I learnt this when supporting some people doing self-directed learning on Excel at a very low level. It really started from “this is how you turn the computer on, this is how you start Excel” in lesson 1 and this was in about lesson 3. I’d been using Excel for keeping accounts and the like for several years and training people to use it for a few by then. And, as you can, memory of the keystroke clearly lingered because I still never used it!

  6. I dare to disagree.

    People have good reasons to use spreadsheets, rather than telling them not to use spreadsheets we, as academics, should show how to fix spreadsheets. Or programming languages, for that matter.

    The one huge killer feature of spreadsheets is live programming! Spreadsheet are always running, output is immediate and input always has concrete values rather than being abstract variable names.

    Telling people to use “bona fide” programming languages instead of spreadsheets is like telling people to read musical scores rather than listening to recorded concert sessions. Nobody but a few highly skilled experts get the same value out of it.

    Here’s what we can and should do about it,

    — Fix spreadsheets to support best practices like testing, code reviews and version control.

    — Fix programming languages to be live, ie to be always running, having immediate output and concrete input available at all times of the programming activity.

    Academics telling people to change their behavior has never worked and will never work. People are not stupid, they are fully aware of the shortcomings of spreadsheets but use them because their added value (ie live programming) is just so much bigger. Where we, as academics, can add value to this is by fixing spreadsheets and programming language.

  7. Ok, yes, fair point.

    I agree that both spreadsheets and traditional programming tools are badly, badly broken, although for different reasons.

    I totally agree with your “liveness” argument — it is a massive feature, and it is something that mainstream software development environments badly need.

    I am super excited by the attention that projects like light table are getting, and hope that they spur others onto the same bandwagon.

    I also occasionally implement a poor-man’s version of “live coding” when writing python scripts by having my unit tests run every time I save one of my source files.

    My main problem with Excel is not the way that people interact with it; I am fine with spreadsheet programming, and would not expect people to have to learn a programming language just to edit a spreadsheet! The main problem that I have is that it is not (out-of-the box) possible to extract the formulae in a spreadsheet into a plain text file, merge it with somebody else’s changes, and import the merged document back into the main spreadsheet.

    Now, I admit, the .xlsx format does take some steps to address these concerns, but the format is not particularly transparent, well documented, simple or readable.

    As for your proposed fixes, I can do nothing else but support them wholeheartedly. I wonder if we should turn to KickStarter to try to fund an Excel alternative that sits on top of Python? (A highly scriptable spreadsheet implemented in the spirit of Sublime Text)?

Leave a Reply

Your email address will not be published. Required fields are marked *