We're sorry but this page doesn't work properly without JavaScript enabled. Please enable it to continue.
Feedback

Functional programming in Excel

00:00

Formal Metadata

Title
Functional programming in Excel
Title of Series
Number of Parts
133
Author
License
CC Attribution - NonCommercial - ShareAlike 3.0 Unported:
You are free to use, adapt and copy, distribute and transmit the work or content in adapted or unchanged form for any legal and non-commercial purpose as long as the work is attributed to the author in the manner specified by the author or licensor and the work or content is shared also in adapted form only under the conditions of this
Identifiers
Publisher
Release Date
Language

Content Metadata

Subject Area
Genre
Normed vector spaceComputer programmingFunctional programmingUniverse (mathematics)Functional programmingTwitterComputer programmingMereologyCoefficient of determinationPersonal digital assistantBasis <Mathematik>SpreadsheetJSONXMLUMLComputer animation
Software developerDirected graphMaxima and minimaStandard deviationComputer configurationoutputBit rateMusical ensembleSpreadsheetInternet forumSpreadsheetComputer programming2 (number)Message passingCodeTuring-MaschineCalculationRow (database)SoftwareView (database)Well-formed formulaProgrammer (hardware)Tape driveLengthProof theoryProgramming languageType theoryPoint (geometry)Visualization (computer graphics)Mobile appDisk read-and-write headReal numberPattern languageJava appletDomain nameSoftware developerIterationCondition numberSelf-organizationBitPrototypeVirtual machineMereologyZoom lensPerspective (visual)1 (number)ResultantWebsiteFinite-state machineForm (programming)Computer animation
Electronic program guideVirtual machineSoftware developerSummierbarkeitVideo gameTuring testInternet forumVideoconferencingSpreadsheetBitSpreadsheetComputer programmingSoftware engineeringAverageSoftwareType theorySource codeField (computer science)Pivot elementCalculationTable (information)Software maintenanceWell-formed formulaComplex (psychology)Visualization (computer graphics)Multiplication signCheat <Computerspiel>Computer animation
Kolmogorov complexitySoftware developerNormed vector spaceSource codeComplex (psychology)SpreadsheetProgramming languageVisualization (computer graphics)QuicksortResultantInheritance (object-oriented programming)Computer programmingCodeNetwork topologyDistanceCycle (graph theory)Well-formed formulaVideo gameCheat <Computerspiel>File formatCompilation albumComputer animation
Functional programmingProgramming languageFunction (mathematics)Sound effectSocial classPolymorphism (materials science)Software developerFunctional programmingCellular automatonPresentation of a groupWell-formed formulaSpreadsheetSound effectState of matterService (economics)Real numberSoftware developerProgramming languageMassComputer programmingSampling (statistics)XMLComputer animationUML
Software developerSquare numberoutputSummierbarkeitSpreadsheetComputer programmingFunctional programmingCanonical ensembleStatement (computer science)Sign (mathematics)Square numberIntegerSource codeRange (statistics)Electronic mailing listRow (database)Well-formed formulaCellular automatonMultiplication signSubject indexingDrag (physics)outputSummierbarkeitForm (programming)LaceMathematicsLevel (video gaming)NumberBitComputer animationMeeting/Interview
Software developerSquare numberoutputWeb pageFreewareWell-formed formulaCellular automatonElement (mathematics)BitQuicksortSquare numberoutputFunctional programmingResultantSpacetimeCellular automatonLevel (video gaming)Multiplication signReduction of orderSummierbarkeitWell-formed formulaType theorySpreadsheetElectronic mailing listRow (database)MathematicsRange (statistics)Real numberEntire functionBoolean algebraElement (mathematics)Binary multiplierShift operatorSpecial functionsForm (programming)AverageMultiplicationMassFile formatGame controllerLetterpress printingSheaf (mathematics)CalculationBit rateDifferent (Kate Ryan album)Computer animation
Software developerError messageoutputRandom numberError messageBitInterior (topology)Range (statistics)CASE <Informatik>outputMonad (category theory)MathematicsCalculationIntegerData typeReading (process)Multiplication signOrder (biology)MassArmLevel (video gaming)Moment (mathematics)Computer animation
Computer programmingSoftware developerWeb pageParameter (computer programming)RecursionMaizeMaxima and minimaSpreadsheetReal numberComputer programmingMaizeCellular automatonMaxima and minimaGame controllerAlgorithmSquare numberGraph (mathematics)System callInheritance (object-oriented programming)BitSource codeComputer animation
Software developerMaizeRecursionMaxima and minimaCellular automatonFunctional programmingMaxima and minimaSystem callField (computer science)Greedy algorithmMultiplication signMeeting/InterviewComputer animation
MaizeAerodynamicsMaxima and minimaSummierbarkeitCellular automatonMaxima and minimaMaizeTerm (mathematics)State of matterSemiconductor memoryComputer programmingMultiplication signType theorySystem callEntire functionDynamical systemFluid staticsDifferent (Kate Ryan album)Intermediate value theoremLevel (video gaming)RecursionMereologyCheat <Computerspiel>QuicksortInsertion lossOrder (biology)Disk read-and-write headComputer animation
TrigonometryParsingLine (geometry)EmpennageWell-formed formulaTerm (mathematics)BitMaizeExecution unitForm (programming)Computer animation
TrigonometryFreewareExpressionLetterpress printingWell-formed formulaGraphical user interfaceMaxima and minimaComputer programmingMaizeSymbol tableVideo gameType theoryMereologySoftware testingPRINCE2Video game consoleControl flowComputer animation
MaizeSemiconductor memoryLine (geometry)Maxima and minimaDirection (geometry)Right angleComputer animation
Cellular automatonBitMaizeWell-formed formulaLetterpress printingMedical imagingInstance (computer science)Order (biology)Computer animation
SummierbarkeitWell-formed formulaBitPRINCE2Right anglePointer (computer programming)String (computer science)Process (computing)Dynamical systemAlgorithmSoftware bugLine (geometry)Computer programmingMoment (mathematics)Multiplication signComputer-assisted translationCycle (graph theory)Insertion lossComputer animation
Software developerProgrammable read-only memorySoftware engineeringString (computer science)SpreadsheetCodeProduct (business)Proper mapFluid staticsComputer programmingSpreadsheetCodeOnline helpPresentation of a groupPhysical lawData managementSoftware engineeringSoftwareState observerComputer animation
Software developerWindowArchitectureComputer programForm (programming)Online helpView (database)WeightProgrammable read-only memoryPhysical systemGeneric programmingNormed vector spaceMenu (computing)Complex (psychology)WritingGoodness of fitCodeLine (geometry)Software testingVisualization (computer graphics)Integrated development environmentGraph coloringSoftware maintenanceCore dumpCheat <Computerspiel>SpreadsheetSource codeComputer animation
Software developerSimilarity (geometry)DivisorParameter (computer programming)Convex hullSummierbarkeitCondition numberFunction (mathematics)Maxima and minimaBit rateMatroidSpreadsheetCore dumpCodeWell-formed formulaContext awarenessForm (programming)MereologyFile formatMultiplication signTerm (mathematics)Closed setInternetworkingComplex (psychology)Condition numberCalculationFormal grammarGoodness of fitPay televisionProgramming languageCellular automatonStudent's t-testMathematicsDivisorCloningParameter (computer programming)Different (Kate Ryan album)Metropolitan area networkGroup actionMessage passing1 (number)NumberResultantExponentiationEndliche ModelltheorieSummierbarkeitComputer animation
Hill differential equationInternet forumSoftware developerMereologyCellular automatonWell-formed formulaParameter (computer programming)Electronic mailing listDifferent (Kate Ryan album)SpreadsheetLogical constantStructural loadGroup actionComa BerenicesForm (programming)Code refactoring
Software developerTotal S.A.MathematicsStatisticsEXCELInclusion mapAverageWave packetForm (programming)Group actionSpreadsheetWell-formed formulaSummierbarkeitCountingRewritingPattern languageCode refactoringTransformation (genetics)Programming languageSpeech synthesisDivisorProblemorientierte ProgrammiersprachePhysical systemComputer animation
Transformation (genetics)Programming languageRoundingMaxima and minimaCountingSummierbarkeitPRINCE2Computer programmingSpreadsheetSummierbarkeitLink (knot theory)Type theoryComplex (psychology)Code refactoringQuicksortRange (statistics)NumberPlug-in (computing)Pattern languageAverageCountingLibrary (computing)Program slicingForm (programming)DivisorWebsiteGroup actionComputer animation
Software developerSoftware testingAddress spaceError messageSummierbarkeitLocal ringProgrammable read-only memoryComputer programWeight functionWeb pageWorkstation <Musikinstrument>Wechselseitige InformationComputer fileEmpennageLie groupMenu (computing)MassCellular automatonInterior (topology)Maxima and minimaSheaf (mathematics)Asynchronous Transfer ModeSpreadsheetSoftware developerSoftware testingData managementCodeTowerWell-formed formulaCondition numberFunction (mathematics)Message passingAreaMetric systemType theorySummierbarkeitSuite (music)BitBound stateTraffic reportingStatement (computer science)Software bugBranch (computer science)Functional programmingCellular automatonComplex (psychology)CalculationFeedbackPower (physics)Natural languageData typeMereologyFile formatWebsiteForm (programming)PrototypeResultantProjective planeSweep line algorithmCovering spacePurchasingMatching (graph theory)Set (mathematics)Lecture/ConferenceComputer animation
Software developerWebsitePrototypeSoftware testingSpreadsheetSlide ruleReal numberSuite (music)Human migrationPhysical systemProjective planeVideo game
Software developerCodeCodeMessage passingProper mapSlide ruleBitEntire function2 (number)SpreadsheetMathematical optimizationProgramming languagePatch (Unix)Cellular automatonPurchasingGroup actionMaxima and minimaSimilarity (geometry)Functional programmingSoftware engineeringAlgorithmPattern languageLecture/Conference
Software developerRevision controlMoment (mathematics)Different (Kate Ryan album)CalculationCellular automatonSoftware testingProcess (computing)Envelope (mathematics)Rule of inferenceQuicksortComputer filePhysical systemFile formatForm (programming)SpreadsheetWell-formed formulaProper mapType theoryValidity (statistics)BitGame controllerCondition numberMereologyFitness functionPerfect groupFlow separationVideo gameWebsiteProgrammable read-only memoryDampingExecution unitScaling (geometry)Student's t-testSoftware frameworkArithmetic meanSoftware developerEmailUser-defined functionLatent heatInstance (computer science)Perspective (visual)Table (information)Projective planeRow (database)Scripting languageLevel (video gaming)Source codePoint (geometry)AreaText editorProgramming languageMechanism designWhiteboardVisualization (computer graphics)1 (number)Group actionDataflowDivisorShared memoryWritingContext awarenessoutputAuthorizationResampling (statistics)Code refactoringUnit testingGoodness of fitPlug-in (computing)Software engineeringReal numberDifferenz <Mathematik>Error messageEndliche ModelltheoriePivot elementDigitale VideotechnikCurvature
Transcript: English(auto-generated)
Good morning everyone. My name is Feline Hermans. I'm an assistant professor at Delft University of Technology and I'm going to talk about functional programming in Excel. This is me on Twitter, by the way. So if you're also on Twitter and you like my talk or you think it's crazy or awful, send me tweets
because part of this material I've never done it before. So I'm really looking for what the parts are that are cool and what are the things maybe I should change. So first of all, you might think, you probably think, you know, why programming in a spreadsheet? But spreadsheets, I think, are mislabeled. Many people think of spreadsheets as being data,
where spreadsheets are actually code. This is already 30 seconds in the main message of my talk. Spreadsheets are code. That's absolutely what you should remember. And you might not believe me, but I have three reasons why you actually should consider spreadsheets as pieces of programming.
Firstly, spreadsheets are often used for very similar problems. So what you see here is an investment calculation. You put some data in. Something is calculated and you get a result. You can program this in any language. You could make an app with Objective-C or you could do it in C Sharp or Java. This problem domain is very generic
and you can use spreadsheets as well. Of course, you could wonder why in a spreadsheet, why do people in banks and insurance companies, why do they use spreadsheets for these type of calculations? I have asked this to people that I've worked with in these type of organizations, and it turns out us software developers aren't actually very good at making software in the view of users
because these people often, they have asked their favorite IT supplier either within the company or outside of the company, hey, I have this calculation. I've already prototyped it in a spreadsheet. Can you make this into software? And then the answer will be, sure, that will be six months and one million euros.
And then if that were even true, maybe it wasn't that bad, but often then it is three years and ten million euros, and then it only has half the features they want. And it is already there. How hard can it be? So in the views of these users, they think they are better programmers. They're more efficient programmers than programmers are
because it's just very hard to get software done. But from their perspective, where is the money going if I already build it? So I go to great lengths to make my point. To such great lengths, in fact, that I have implemented a Turing machine using only spreadsheet formulas
to show that spreadsheet formulas are Turing-complete. So it's a real programming language. You can't say, no, it's not, because, look, a Turing machine. And I think, apart from this being proof that spreadsheets are Turing-complete, I think it's also a pretty nice visualization of a Turing machine. So one row is one iteration of the tape,
and with conditional formatting, I move the head of the Turing machine over the tape so you can really nicely see what is going on. And when we make bigger tape machines, you zoom out and you really see a pattern of the Turing machine moving over the tape. So it's a pretty nifty thing, if I may say so. They are Turing-complete.
And many people really seem to like this thing. It brought my blog down for a little bit because it was on Boing Boing, and it drew lots of traffic. So it's a fun thing. So the third reason that spreadsheets should be considered pieces of programming is they suffer from typical software engineering problems.
A spreadsheet, on average, only one in three spreadsheets have a manual, have documentation. Spreadsheets are used by an average of 12 different people over a lifespan that averages five years. So this really sounds like legacy software. This induces the type of problems that we had in the 70s and the 80s
when you realize that a program that you wrote 10 years ago was still there, and this happens to spreadsheets as well. We've seen spreadsheets in the field from 1997 that were still used for important calculations within companies. And of course, during that time, people had added formulas and more data sources and more pivot tables and visualization,
so it grows, and then you get typical maintenance problems. So in summary, both the things you do with spreadsheets, their complexity and the problems that they suffer from, are really similar to source code. Spreadsheets are code. But I'm going to go a bit further than that.
It's just not any programming language. I'm arguing that spreadsheets are the next programming language that you should learn. Resistance is futile. Really, spreadsheets are going to conquer the world. Who knows who made this? Does anyone know from which talk this visualization is?
No, no, it doesn't have something to do with F-sharp. It's made by a guy called Brett Victor in a talk inventing on principle. It's a really, really cool talk. You should totally see it. But the idea that he proposes here, and actually it's not his idea. It's an old idea, but he's sort of brought it back in fashion, is called live programming.
So on the one hand side, you have the code. And on the other hand, you have the instantiation of the code, the running code. And if you change something here, then immediately the tree will update. So you don't have to go through the edit and compile cycle. You just change something, and immediately it changes. So, I mean, that's super cool, everyone talking about live programming.
But we have that since VisiCalc. Spreadsheets are live programming. There's no code behind. I mean, there's very small distance between the thing you're editing and what you see immediately if you update a spreadsheet formula. And the results are right there. So they're live. Who likes functional programming?
Everyone likes functional programming. There's also an idea that's pretty old and suddenly somehow back in fashion. You know what's also functional programming? Excel. A formula can only take in values from other cells and present a value in that cell. It's impossible for a spreadsheet formula to make side effects
in another cell to change states. The only thing is I take values and I return something. So if you don't take VBA into account, formulas only are purely functional. And the third reason that it's really, really important, I think, for developers to know about spreadsheets is
everyone knows about spreadsheets but us. Really, everyone. I think Microsoft says that there are 750 million users of Excel and there was a census done in the Netherlands where they interviewed a random sample of the population of the Netherlands and asked people if they can do formulas in a spreadsheet.
And 52% of inhabitants of the Netherlands can do formulas in a spreadsheet. But I don't think there are lots of developers among them because often we pick real programming language instead of looking at what's already there. I think it's really nice if you have a customer that comes to you with a spreadsheet,
you have a sense of what's there, you speak their language a little bit. That's another reason why it's really cool to do some stuff with spreadsheets. I'm assuming you are convinced that spreadsheets are programming and you should learn about them. So let me show you a few pretty nifty programming problems expressed in spreadsheets. So if you Google for what is functional programming,
you find this statement, my Uncle Bob, saying that functional programming is programming without assignment statements. All right. And he has an example. He says the canonical example of functional programming is taking the square of integers. Well, let's start with that. That's simple. So we have a few input values here,
and what we want to do is calculate the square of those numbers. Well, that's easy enough. You can just multiply that value by itself and you have it. And what you can do then is drag the formula down. And you could say that that is something like a map. This is the input that you have.
This is the function definition. And then dragging the formula down is mapping the same function to all of the values in a cell. So that's really functional. But of course, you're going to say, yeah, that's ugly because this is A1 times A1. But we can make it a little bit nicer.
If we call these values input, we can give those cells a name. It's called a named range. So we call them input. And what we can do then is not only take the sum of those values, but also we can directly refer to that value. And what that means is here we get the intersection of this range
and the cell, the row that we're in. So it looks at the named range and picks the value that we're in. And now we can do this. We can make it into input times input and drag that down. And now we have the same formula everywhere. That's a lot nicer. That is more readable and more like programming
because it doesn't depend anymore on the cell value. You've given that range a name and you can really use it as a function. It's almost like you have a list and you get a value from a list and you don't even need an index because that's implicit. Pretty nifty. Let's make it a bit more exciting because, well, this is still pretty simple.
Suppose we want to have the sum of all squares and not just all squares. We only want to sum all the values that are above 30. So what you could do is you could add another column. And this is, I think, something that most people know. Let's call this square like we did with input.
And then we can here make a formula. Is square bigger than 30? And then you can use a sum if. That's a special function in Excel with a special syntax that says I'm going to look at this range. If it has this value, then I'm going to take the corresponding value
from another range and sum that. So it's nice, but it's very specific. You can only use it for sum if and suppose you want to do something else, then you can do that. And also, this uses lots of real estate in our grid because we need those two auxiliary columns
and if we're doing lots of calculations, then our grid will be filled with all those intermediary values and maybe we don't want them. So there's another trick we can use in Excel called array formulas. And this is where you calculate with a list at once. So I would like to be able to multiply this whole range with itself.
So if I try that, if I just multiply input with input, I get what we already had here. And what happens then, as I said, is this input is immediately converted into a cell because it's looking at what is on this row in the input value.
But we can force Excel to change that. This first takes the intersection and then multiplies. We can use Ctrl-Shift-Enter and that turns this formula into an array formula. So if I do that, then the value changes. See here, it was 25, but if it's evaluated as an array, then suddenly the value is 1.
Why is this 1? These are the types of things you're like, what is exactly happening? So what's happening is instead of taking the cell here, it's multiplying this entire array with itself pairwise. So the result of this formula is again an array of all the values,
of all the squares are now in it. However, there is no room in the cell to show the entire array. So what Excel does is just gives you the first value of that list. But if you give it space, if you say in all these cells,
I want to see the result of that multiplication, let's do that, give it a bit of space, and we type the same array formula, you get the first few elements of the list. So if you give it space, you will get the entire list. So it's manipulating now an entire list times an entire list, resulting in an entire list of which we get a few.
So that's nice, but we can also reduce those lists. We can, instead of taking space in the spreadsheet and printing the entire array, we can also squash it immediately. So we can take the sum of input times input, and then don't forget to do it control shift enter
so you get an array evaluation, and then this immediately multiplies these values with itself and sums it. And we can go even further. We can add a filter, either with an if, so we can say if it's above zero, and this is a trick that many Excel users use.
What you can also do is multiply with a boolean value. So here you have input times input is bigger than 30, and again, this is element-wise, so this results in a list of booleans. And then if you calculate with those booleans, they are coerced into an integer, so then false is zero and the rest is one.
So you can just multiply this, multiply it, whether or not it's above 30. And then in one formula, we've done the same as here with the sum if, and we don't need all of this. And also, as I said, there are only a few of those. You have sum if and count if and average if,
but using those array formulas, you can use whatever aggregation you want on the entire list. So you might think, wow, that's pretty complicated, but spreadsheet users, especially in finance, this is really common, and people know how to do this. So it's basically like a map reduce. You have values, you map to something,
and then you have a certain reduce function. This is no problem at all for many spreadsheet users, which I think is pretty interesting, because probably if you try to explain them what map reduce is, they wouldn't know. But if you show them this, no problem. Just, you know, really, go to the finance department next week. Ask people, what does this mean?
They'll be like, yes. They know array formulas. So another very, very interesting thing happens in Excel if you talk about error values. So let's dive a little bit into how error values in Excel are handled. So again, suppose I have an input array, and I'm going to do something.
I'm going to do something very dangerous. I have random data copied as values now, so it doesn't change all the time. I'm going to do something dangerous. I'm dividing by that random data, and it can be zero. Oh, so dangerous. So this, these are values, but if I'm dividing by zero, then I get an error.
So what is the data type of this? I know Excel doesn't really work with data types, but suppose you would explain to someone what is in that range. What would it be?
It's a maybe. It's a maybe-it. Either it's a value, or it's divided by zero. Isn't that nice? And Excel, and also Excel users, they work with this problematically, without any problems. So suppose we're doing some more calculation on our maybe.
We add 12 to it. Here you just get a value, and now you get another error. So that plus is just lifted, and it's not working. Yes, it's not working on a maybe. So that plus can work on two integers,
but it can also work on something that might be an integer. So these are monads. In case you haven't noticed, it's a monad. Error values in Excel, they are monads. So everyone in the world, again, 750 million users of Excel, they use monads. They don't talk about it all the time
and try to explain people what they are. They just do it, and it works. It's totally natural for people to use error values in this way. Well, it works, or it doesn't work. So if you're ever again confused about monads or you want to explain it to people, I really think this is a pretty intuitive and nice example
of what a maybe is and how a calculation flawlessly just works on either an int or something that might return in a value, and the error values are propagated. So, ta-da, monads. These were all very simple, simple programming examples,
just multiplication and dividing. That's a bit toy-exampling. So let's solve a real programming problem using a spreadsheet, and I will show you again that it's really intuitive and it can be fun. So suppose I have a maze. I live in a world filled with money.
In this maze, all of the squares in my maze have a little sack of money in them. Super nice world. And I want to walk through the maze and collect as much money as possible, but I can't go through all the cells. I can only go right or down. So I have to make a path that's something like this through the maze,
and I want to collect as much money as possible. And this still sounds like a little bit of an artificial problem. However, there are many real-world problems, like shortest path algorithms that are something like that, where you want to calculate what is the best way to traverse a graph or a square. So we can do that.
The recursive solution for this is actually pretty easy. So if we start here, let's say we want to have the value for O15. That's this cell. It's not very hard to calculate it because either we go that way or we go that way. So we can do a recursive call where we just say,
well, that way or that way. We want to have the maximum, and then we recursively call the same value on this way or that way, and we add the value that's in this cell. This is a solution. This is a perfectly fine solution. However, there's a problem with this. It's not really a very good solution.
Does anyone know why this is not a good solution, what could be improved? Yes, sure, sure. So we're going to define a recursive function for each of the cells, and we define that recursive function as the value of this guy is the maximum of a recursive call on this field
or the recursive call on that field, and then we propagate that. I mean, I have 15 here, but obviously that generalizes to n. So for all of the cells, every time I take the maximum of the two fields that I could have come from, and I propagate that.
Yes, so the solution is sort of greedy as it takes everything, and there could be a better solution. And another problem with this is that it's very expensive.
It's very inefficient because suppose I have a random path. Every time I do a recursive call, so I go here, I'm calculating this path, all the others as well, but this. And then I go here, and then again I'm calculating the value of that path. So what this is called is overlapping subproblems.
So there are lots of different ways in which you can break down the problem, but many of them are similar. So you take that path all the time. So it is not very efficient. It's a working solution, but it's pretty expensive. So there's a technique called dynamic programming
that we can use to attack this problem. And this name is immensely confusing because you have dynamic and static typing, so people immediately start to think about that. It has nothing to do with that. I don't really know why it's called dynamic programming, but this is how it's called. So forget that this is about dynamic types.
Don't let yourself be confused by this name. So the idea of dynamic programming on such problems is we're going to save intermediate values in the cell. We're sort of taking the problem and flipping it on its head, and instead of starting here and recursing back, we're going to start there.
So in another worksheet that I made called max, we're going to fill all the cells here, and at every cell, we're saving the maximum value we could have gotten at that stage in the maze. So here it's pretty easy for this first column because the maximum value we have,
there's only one way we could have gotten here, and that's by going down, because you can only go down or left, so there's no way you can do this to make more money. So that's easy. Similarly, this is very easy because exactly the same,
the only way we're going to ever get there is by taking this path. Now in the middle is where it gets a little bit interesting. So suppose we want to have the value for this 2 here. What we do is we take the maximum of the two cells that we could have come from, and to it we add the value that is here.
So we're still exploring the entire state space, we're just saving parts of it in corresponding cells. So here I came from that way, or I came from that way. I take the maximum of those two, and I add the value, I go back to the maze, and add the value that's in the maze. And in this way, we're trading expensiveness
in terms of lots of recursive calls for expensiveness in terms of using lots of memory. So I'm not really cheating and solving it in a better way, it's just this way is quicker but uses lots of memory, whereas the other solution is slower but doesn't use any memory because you don't have to save all these things.
So with that, the problem is solved. With just one nice little formula in Excel, and a little bit of auxiliary work in terms of making a worksheet, we have the solution. We can go through this maze and have 2,062 money units. So that's nice. If you compare that solution to here's a similar problem solved in C,
the Excel solution doesn't look that bad, right? And if you do it in Haskell, it's probably very hard to write that. I mean, it's hard to come up with that, whereas here, it's a pretty simple one formula.
And of course, that is because in the C solution, you need to have formulas or expressions for saving things and printing things, and all the GUI stuff in Excel, you get it for free. And what's also really nice, I think, about this type of solution, here the live programming comes in again, is you see what is going on.
You can look at everything that is there. So suppose here I would have a zero or strange error. I would see it immediately. I don't have to debug my program with a break statement or with a print console. I can see what is happening.
And of course, you say, yeah, but I have tests, so I will know what is happening, which of course is true, but tests is like looking at your program through a little hole, and you see part of the problem. But here you can see everything at once, which I think is really nice. I actually participated in a programming competition a while back, and initially I wanted to do it in Haskell
because that's just what you do. It's cool. But after a few problems, we thought, ah, well, this would be nice. This would be easy to do in Excel. Oh, this would be easy as well. Once you get the hang of how these formulas work and you've seen the beauty of being able to look at your whole problem at once,
you really get addicted to that liveness of it, being able to look at everything. Oh, there's something wrong in that formula. You click it, you look at it, and you see what's there. I really like it. Of course, in this solution, there's one problem still left because if we want to know not only the value
but also the path we have to take through the maze to obtain that maximum value, we have to do it manually. We have to say, okay, from here, the maximum is there, so that's where we came from. From here, the maximum is there, so that's where we came from. So we still have to walk through the maze to figure out what was the maximum path that we took.
We can make this. We can solve this as well with a bunch of formulas. So we make another worksheet. Yeah, we use some more memory. And what I'm going to do is with symbols, I'm going to represent the path through the maze. So here, I'm going up. I'm starting here at the end, and I'm going to draw a line through the entire path.
So if I'm here, the only way is up. If I'm here, the only way I can get back to the origin is up. And in a similar way here, the only thing we can do is go right. And here in the middle, of course, the direction we're going to walk through the maze
depends on what the maximum value is. So here, this would be the maximum value, so that's where we're going. So we have a formula like this. If this on the maximum, if that one is the higher, we go up. And if the other one is higher, we go right.
That's already nice, because now we can walk. We can go... It's useful. But of course, this is still too much. We want to have one nice path, so we just need to change our formula a tiny little bit to print a nice little walk through this maze.
Because what we can do is here, for instance, I only want to look at this value if the value below me is pointing at me. Otherwise, I can just make it empty. So I'm going to say, if the value below me is pointing at me,
it's pointing up, then I will print this formula that we already have, and otherwise, I'm printing empty. So this is just one dimension, just the up. And if we drag that, you see here the liveness comes in again. You see our formula works. You can look at it immediately and you see all of the values. If I'm only going up, I see it,
and otherwise, I don't see anything. But this is just the up dimension. We also need to make the right dimension, so we change our formula a little bit more. Still pretty short and simple. If either I'm pointed at from below or I'm pointed at from the left, then I print my pointer. Otherwise, I just print an empty string.
Ta-da! I mean, it's really pretty simple. Of course, you use real estate because you have different worksheets, but you can see it. And if you have made a mistake in your algorithm where you create a cycle or something, you would see that immediately.
I challenge you. Write a dynamic programming algorithm, insert a bug, let your cat walk on the keyboard, but let it compile, of course. You try it a few times. And then try to debug that in Haskell, Java, and then make it in a spreadsheet
and see how easy it is to find your bug. It's really that being able to look at your data, I mean, it's just awesome. So that's everything I wanted to share about how cool spreadsheet programming is. And I guess now you should really be convinced spreadsheets are code. You take programming problems and you can just use a spreadsheet to easily solve them.
And then the question, of course, is now what? Okay, spreadsheets are code. How does this help us in any way? So the observation I've made in my research is if spreadsheets are code, could we use techniques from software engineering to make spreadsheets better? These problems that I spoke of earlier in my presentation,
the long lifespan and lack of documentation, I'm not saying they're solved in software engineering, but we have worked on making those problems more manageable. In software, we have made tools. We have made tools that help people battle complex code. If you look at Visual Studio, there are tools for debugging.
There are tools for testing. There are tools for analyzing your code. These are things not necessarily in an IDE to help you write code, lines of code. They're there to help you write good code, to help you write in a responsible way. If you look at Excel, there's support for making things another font,
for making things bold, and for making things another color. This is all nice, and that definitely could aid maintainability if you have a good layout. But there's nothing really in Excel that helps you test or maintain or analyze your code. So that's the core idea of the research that I've been doing in Delft,
has been, if spreadsheets are so similar, why can't we use IDE-like techniques to help people manage their spreadsheets? And one of the things we've done is we've defined code smells, the idea of code smells by Fowler, for spreadsheet formulas. Is everyone aware with the code smells metaphor?
So if you say a long method is bad, you could say a long formula is bad. So this is long in terms of it has a sum and a divide and a plus and a multiply. A lot of different things are happening in this formula. Maybe you want to split it up in different steps so that it's easier, just like you would split up a long method
in a number of sub-methods. Another thing is, another code smell is a method that has lots of parameters. That is bad. That method is doing crazy stuff. So if you have something like this, a formula that uses this and that and that and that block, they're all over the place. Something is wrong with the design of your spreadsheet.
Something else, conditional complexity. You don't even need the quotes here. I needed the quotes for too many parameters. But conditional complexity, you have ifs, so you can nest them forever. And it will be very, very hard for people to comprehend what's going on, just as very deeply nested conditionals are in whatever language.
You have code clones. For instance, this formula has two sub-formulas that are exactly the same. That's a clone. That's some duplication that you could refactor out into a sub-formula. And that would not only be nice for readability purposes, but also for the performance of your spreadsheet.
Because the way the Excel caching model works, it only caches cell results. So if you put this sub-formula in a cell, it will remember it and only calculate it once. However, if it's not that smart, it doesn't explore all sub-formulas. So now this value, which is an exponent and pretty expensive, is calculated twice. And then you might think,
yeah, but who cares how quick my spreadsheet is? I have one PhD student whose only topic is performance of spreadsheets. And we work with insurance companies and banks that have spreadsheets. Really, we work with an insurance company that had a spreadsheet that took live data off the internet and calculated insurance premiums every month. So they run the spreadsheet.
Their entire department, which is six, seven people, go out for coffee to wait until the thing is finished. And this is not the worst part. They come back, four out of five times it has crashed, and it has to dry again. So speeding up spreadsheet models is definitely a thing that matters. So some of these improvements to spreadsheets could help for performance as well and for readability.
And this part, the part where we have the smell detection, is actually being commercialized into a company right now. The tool is called Perfect Excel, and you can upload your spreadsheet, and you get a list of all the smells we have detected. So you have here, this is feature envy.
So you have a spreadsheet that's not really in the right place. Here you have a formula that references lots of different cell groups. That's too many parameters. You have magic constants that we find. So we really help people to improve their spreadsheet, to look at smells and see how they can make it better. So of course, if you say smells, you say refactoring.
So we had the smell detection tool, and of course people say, yeah, but how do I make it better? And we can give them manual suggestions, but wouldn't it be nicer if we could help them with a tool? So we built a tool called Bumblebee, which is a refactoring tool for spreadsheet formulas.
So here's a very simple example of how you could apply it. You have here a sum divided by a count. That's not an optimal formula, because you could replace that with someone. An average, yes, you could replace that with an average. So maybe this user has never seen or used an average.
That's a possibility. But what we also see in this spreadsheet, if they have a long lifespan, is they just start simple. Maybe he started out with a sum, and then after a while he thought, oh, I also need to divide it, and then it turns into something that could have been more efficient. So you can ask our system, do you have rewrites that you can apply?
And then we say, yes, we have the sum and count to average refactoring. And you get a preview even of how your formula would look like, just as if you have a refactoring tool in an IDE, you get a preview of this is the method that you're going to extract. And if you like it, you can apply it, and then we automatically apply it, and we can also apply it in an entire worksheet or in an entire workbook.
So if you want to get rid of all these patterns of formulas, you can do it immediately. And these languages, these transformations, are programmable in a little language. So if you download it, then it comes with a number of refactorings, but also users can define their own refactoring.
So here you have the sum and average refactoring, the sum of a range divided by the count of the same range is an average. And we think that users that are able to do very complex formulas, or users that use VBA as well, should also be able to define these type of refactoring formulas. It's one of the things that we're exploring.
And another thing is to have them do some sort of programming by demonstration. So do a few refactorings for us, improve a few formulas, and then potentially we can learn patterns from them and add them to the library as well. These are things we're trying to see if that's something that people want. If you want to play with it,
or you know spreadsheet users and you want to encourage them to play with it, you can download the refactoring plugin from my website. All the links will be at the end of the slides as well, so you don't need to take notes. Then of course, if you say refactoring, you say testing, of course. So this was interesting because initially we thought,
how are we going to get spreadsheet users to test? It's already pretty hard to get normal developers to test and also to convince management of developers that testing is an important thing. If these spreadsheets aren't seen as code, how are we going to convince people that they need testing?
So this was typical ivory tower thinking because once we started looking at spreadsheets, we noticed that many people do things like this. So this is a formula already in the spreadsheet without me telling people, hey, test this stuff. People just write this.
If the sum of these guys isn't 100, I'm going to output error, and otherwise I output 100%. These formulas we found are pretty common. About 1 in 10 spreadsheets we found in a huge dataset we use for research have these type of formulas. So they check two values against each other and they output something like okay or no or please check this spreadsheet again.
And it's not that strange because many people, of course, that use spreadsheets are in finance and they are trained to test. They often have these type of calculations where they calculate things this way and that way and they see if the values match up. Is my income and my outcome, is it exactly the same?
Otherwise there is a mistake. There is a bug in my financial reporting. So we found them and then we thought, okay, if people already test, that's something we can use. We can exploit that. So we wrote a little test plug-in called Expector and what it can do is it can detect these type of test formulas. So you can say find existing test in this spreadsheet
and it's going to go through all the formulas, look at what looks like a test and present it to you. So we say, hey, this cell here, it expresses that the sum of this should be 100%. So we parse the formula a little bit and we take out the part, that's the assertion and the part that is the result.
So we can represent it as something like natural language and then if you say yes, that's true, that is a test, it's added to a test suite that you can then run. So you can say run my test and it says one of one have passed, indeed the sum of this is 100%. And then of course if you have a test suite, you can do coverage.
So we can say, hey, this is all tested. This formula is a test formula. We have agreed on that. So then all these cells which are referenced to by that formula, we say they're also tested because if something goes wrong here,
it will show up there and then the test will fail. Of course we're not talking about statement coverage and branch coverage of conditional formulas here. We're just saying, well, this is the area of your spreadsheet that is tested and in the same way you can say what is not tested. So we're just trying to help people get a sense of what the thing is,
how robust the spreadsheet is that they're working with. We added a few functions as well to help people make more tests and we did that by just giving them one formula to test. So you can say I want to test a complex formula and using the smells and the complexity metrics we have, we detect what's the most complex formula.
So here's something, it has two conditions nested. That's probably something that you want to test because that might fail. So we suggest a formula and then you can do something like, okay, it has an upper value, a lower threshold, an upper threshold, or it should be a number, it should be a value, it should be non-empty.
We add a few simple conditions for people to work with and one of the feedbacks we've already gotten is that people would also like to have a free formula where they can say, okay, it has to be exactly this. So they want a bit more power than we provide them just with upper and lower bounds and data types basically.
So if you add a test, we say, wow, nice, you've added, you've increased your coverage. You want to make it a little bit like something that is desirable to have lots of coverage. And then you have this in your test suite. You can run it again and you can show that all the tests have passed. And this too is available for my website. So if you're working with complex spreadsheets, then you could test it.
And we also think, we haven't really tried it yet, but we think that if you were going to do a project where you transform an existing spreadsheet prototype into something like real software, it could help if you, together with spreadsheet users, add some tests first, extract their intuition
from what the spreadsheet should do into a test suite because that will greatly support, we think, such a migration because you can use those tests then as also tests for the migrated systems. You can see if all the things that users of the spreadsheet think should hold still hold.
Could you wait for like five slides because this is the end and then we can do all the questions afterwards. Because, ta-da, this is like my final slide. Spreadsheets are code. That's the main message of the entire talk. And for everyone that maybe was still a bit hungover from the party, I will now summarize my entire talk in 30 seconds.
And if you did get all of it, of course, that is optimal preparation for question asking. Spreadsheets are code because they're used for similar problem domains. They are just as complex as real code and they suffer from typical software engineering problems. Not only that, spreadsheets are live, are functional,
yet everyone knows how to use them. So it's really a cool language. And I showed you that by implementing various little algorithms in Excel, including a nice pathfinder for a maximum pattern. If spreadsheets are code, can we learn anything from the software engineering domain?
That's the whole idea of my research. Yes, we can, is my conclusion. So we build a tool for smell detection, a tool for refactoring spreadsheets, and a tool to help users test their spreadsheets. If you want to know more, this is the website of the smell detection tool that's now in a commercial spinoff company.
This is my own website where you can download the refactoring plugin and the testing plugin. And this is the website of my research group where I force grad students to also like this. And if you want to know more, you can send me a tweet or send me an email, or you can ask a question now.
Yes, that's kind of a hit in my Excel. So the question is, there are other things how you could test as well, so you could use conditional formatting, for example, or data validation,
but it's a bit harder to extract them from the file format than it is to extract those formulas. And also, not with data validation, but the problem with conditional formatting is you never really know if red really means an error or it just means you should pay some attention to this, but the data validation rules would definitely be something we should add.
Yes? Yes, this is a very good question. So what about version control? So there is a company, a British company actually, I think they're in London, called Spreadgit,
which is exactly what you think it is. So there are people working on that. And also, the newer versions of SharePoint have pretty good version control and even authorization mechanisms. So you can say you can edit this part of the spreadsheet but not this. So from a research perspective, it seems to be solved-like.
So there's not too much there, I think, for us to do, but it's definitely an interesting question, because already source code is one-dimensional. So if you add it here and I add it there, then the merge is doable. But suppose I've added a column and then you have added a row, then my column needs to include that cell that you...
Yeah. So there will be, I think, merge conflicts will be harder and you're not only talking, of course, about flat data in a spreadsheet, what if you change things to a pivot table or to a chart? So I think the diffing and the merging can be exciting and hard problem. There is a tool called sheet diff that was a research prototype,
but it doesn't really scale to anything but toy examples. So I would say it's more the merging and diffing than the real version control that will be interesting open problems there for anyone who wants to do a PhD on this. Yeah.
So the question is what is the reaction to the testing framework? It really differs. So it seems, and maybe this holds for normal development as well,
that some people somehow enjoy making everything fit and perfect and having tests and making sure that everything is nice, and other people just want to get stuff done. So it really depends, and this is not just, I think, a personality thing, but I don't think we have enough data for that,
but it also probably depends on the type of company you work in, the type of role and the type of department, because, as I said, if you're a financial controller, your job is to check a spreadsheet and to make sure that all the money in the company is accounted for. You'll probably be more inclined to add tests than if you're just doing some sort of back-of-the-envelope calculation
to see how much you're going to sell somewhere. The type of spreadsheets is also different. Some are more throw-away, use-ones, and some are really long-lived financial models. So I don't know yet is the answer. Yes, like software, yeah, definitely, yeah.
So the question is how do we treat external DLLs and can we handle that?
So I didn't show you, but we also have a tool that visualizes spreadsheets, so it shows all the worksheets, how they relate, and also external sources. So we can see, for instance, if you use a database or if you use another spreadsheet, we can visualize that. But if you use lots of DLLs, then it will probably be possible
because everything will be there in the file because the file has to know where it has to reference. But we haven't looked at that, so it will probably be possible, but it's a pretty specific use case, I think. It sounds like you have a very elaborate system of using Excel as a front-end and doing calculations. I mean, there must be software engineers involved in that project anyway, I would guess.
Yeah, yeah, yeah, yeah.
Yeah, that's a good question. So the question is in source code you can do unit testing and you test a little part of the spreadsheet or the source code.
How would that work for spreadsheets? So it's an interesting question. So on the one hand, you want to test one formula and maybe have 20 inputs put in and go back just to test that formula. That's definitely a possibility. But then you lose sort of the liveness
because if you test with the data that's in there and how it looks normally, it might make more sense to a user if he's testing in the context of the data. If I see a zero there, oh no, that's bad. So I don't know if it would be possible for users to still understand.
Maybe not all users would be able to understand testing a specific part of a spreadsheet separate from the rest. I think it might be a bigger leap than looking at your thing and seeing if everything makes sense. I don't know. It's a good question and it would be an interesting experiment.
Again, you could correlate that with the type of job you're in and the type of company you work for. Maybe some people would be more prone to get it than others.
Yes, we have been thinking about that and probably it wouldn't be very hard to implement because you could detect what the input cells is and just put random between in it all the time and see what tests break. So it's definitely an interesting idea. We haven't built it yet, but it's an obvious next step to do
once you have tests that you can have some random tests.
Yes, yes, yes. You might not know this, but Simon Peas and Jones actually worked on this 20 years ago. So he proposed, and you should read these papers, they're really fun.
He proposed the idea where users could write user-defined functions using formulas. That would be a very elegant solution, obviously, to the problem that the difference between playing with some formulas and doing VB is huge. And also, I don't know if anyone has looked at the VB editor recently,
but they haven't really heard of developments in IDE land. So if you compare that to Visual Studio or anything, then they have hardly any support. So you're throwing users into a very difficult programming language with the Excel spreadsheet basically being one big global variable
you can write from everywhere without proper tool support. So yes, it would definitely be interesting to get that closer. Simon tried it, and I mean, it has been a while, and I know that he really believes in that idea.
I don't know. I mean, I know that people would, I don't know if everyone would like that, but I know it has a thought that I've heard this thought before,
but I don't work for Microsoft too. I don't know. Why from VBA to JavaScript? Why not to F sharp or something sensible? Why? Well, yeah, I mean, given the popularity of JavaScript, of course, it makes sense.