# Formal definitions are less useful than you think

There is a widely held belief that shared formal definitions improve collaboration. Certainly, most scientists share several unambiguous definitions. For example, there cannot be a disagreement as to what 2+2 is.

In crafting a research paper, it is important to keep ambiguities to a minimum. You do not want the reader to keep on wondering what you mean. Yet many highly useful research papers contain few, if any, formal definitions. In fact, entire fields exist without shared formal definitions. One such field is OLAP: the craft of multidimensional databases. The term was coined by Codd in 1993, yet, as of 2007, I have no yet seen a formal definition, shared or not, of what OLAP is! But it gets more interesting: even the common terms in the field, such as *dimension*, are fuzzy. What is a dimension in OLAP depends very much on who is holding the pen. Yet there is no crisis lurking and people do get along.

In short, you do not need shared formal definitions to be productive as a group. A good research paper does not need to introduce formal definitions. Your research papers will be slightly ambiguous.

There are fundamental reasons why that is. For one thing, formal definitions are often intractable!

Let us consider an example. You might think that the equality between two numbers is one such “fact” that is well defined and poses no problem.

Really? How do you check if two numbers are equal? Computers tend to use floating point numbers. My computer is telling me, right now, that 8/9 = 0.88888888888888884. Does this equality hold? Formally, it does not. Yet, a computer cannot represent 8/9 exactly using floating point numbers, so most software will say that two numbers are equal if they differ by a small amount (the tolerance factor) to account for the numerical error. Two scientist might have a different tolerance factors, and so, the **equality of two floating point numbers is not subject to a shared formal definition**. Ah! But you may object that computers can represent numbers symbolically. However, checking the equality of two symbolic expressions is intractable in general.

So, we cannot even share a common, tractable definition of what it means for two numbers to be equal. This should entice us to be quite modest in our attempts to “formalize our knowledge.”

**Update**: Our modern civilization relies on relational database. They are everywhere, from bank account to this web site. Yet relational databases are logically inconsistent. Does it matter? Apparently, it matters very little.

(Remember, that in the AI wars, I am a ‘scruffy’)

The fact that ‘your computer’ is storing 8/9 as a floating point is an artifact of the programming language/math system you are using. Languages/systems that support a ‘full numeric tower’ can store rationals, and it’s very nice when that numeric equality is formally defined, as in the Scheme spec, or the Common Lisp spec.

It really does matter, in practical terms; real engineering disasters occur when the numeric representations and translations and limits are not understood or not formally defined.

for example:

http://www.ima.umn.edu/~arnold/disasters/ariane.html

Comment by Will — 5/12/2007 @ 10:37

I agree. The equality between two numbers is not a trivial issue.

Trying to figure out what x=y means for an engineer and whether it means the same thing for another engineer, is hard.

Hence, we have to expect semantic technologies (such as the ones you are no doubt working on) to have hard limits.

Comment by Daniel Lemire — 5/12/2007 @ 11:22

One point of this is to develop technologies with ‘soft’ limits. For example, checking whether 2 numbers are within some delta of one another, rather than equal. For the semantic technologies, we are using lots of statistical techniques in combination with symbolic techniques.

Another point is be careful of assumptions, and, to as great extent as practical, don’t build those into the engineered system. For example, it probably actually does not make sense to have a ‘=’ function for floating point numbers.

A third point is that ‘equality’ is tricky, and almost always means ‘equality for some purpose.’

Comment by Will — 5/12/2007 @ 11:30

Some years ago (at BNR) I was mired in the problem of floating-point representations for real numbers and, in an effort to obey the formal definitions for arithmetic operations – such as Commutativity and Distribution – we developed a constraint programming system on bounded intervals. It was an interesting system.

There’s a significant literature on the subject

http://www.google.ca/search?q=constraints+on+interval+logic+programming

You can also compute real numbers using continued fractions:

http://en.wikipedia.org/wiki/Continued_fraction

Comment by Andre Vellino — 5/12/2007 @ 16:06

The problem, from a philosophical point of view, is with the concept of definition itself. Descartes, Frege, Brouwer, and so many others have tackled with it. But no definitive concept of definition was vastly accepted. Why? Because, simply, as turns out, not that any concept will do, it is just that a definition generally relies on an overlapping conceptual schema, build on a proposed convention by which it therefore becomes possible to infer or deduce what is so being demonstrated. We then can see that the concept of definition is a concept of a larger theory of demonstration or of proof theory, which has to contain some extraneous … defintions of its own in order to fulfill its function: semantics. And we then go back to Göddell and understand a little bit more of this intricate relation in between syntaxis and semantics.

Comment by Jean Robillard — 6/12/2007 @ 10:10