# Formal definitions are less useful than you think

There is a widely held belief that shared formal definitions improve collaboration. Certainly, most scientists share several unambiguous definitions. For example, there cannot be a disagreement as to what 2+2 is.

In crafting a research paper, it is important to keep ambiguities to a minimum. You do not want the reader to keep on wondering what you mean. Yet many highly useful research papers contain few, if any, formal definitions. In fact, entire fields exist without shared formal definitions. One such field is OLAP: the craft of multidimensional databases. The term was coined by Codd in 1993, yet, as of 2007, I have no yet seen a formal definition, shared or not, of what OLAP is! But it gets more interesting: even the common terms in the field, such as dimension, are fuzzy. What is a dimension in OLAP depends very much on who is holding the pen. Yet there is no crisis lurking and people do get along.

In short, you do not need shared formal definitions to be productive as a group. A good research paper does not need to introduce formal definitions. Your research papers will be slightly ambiguous.

There are fundamental reasons why that is. For one thing, formal definitions are often intractable!

Let us consider an example. You might think that the equality between two numbers is one such “fact” that is well defined and poses no problem.

Really? How do you check if two numbers are equal? Computers tend to use floating point numbers. My computer is telling me, right now, that 8/9 = 0.88888888888888884. Does this equality hold? Formally, it does not. Yet, a computer cannot represent 8/9 exactly using floating point numbers, so most software will say that two numbers are equal if they differ by a small amount (the tolerance factor) to account for the numerical error. Two scientists might have a different tolerance factors, and so, the equality of two floating point numbers is not subject to a shared formal definition. Ah! But you may object that computers can represent numbers symbolically. However, checking the equality of two symbolic expressions is intractable in general.

So, we cannot even share a common, tractable definition of what it means for two numbers to be equal. This should entice us to be quite modest in our attempts to “formalize our knowledge.”

Update: Our modern civilization relies on relational database. They are everywhere, from bank account to this web site. Yet relational databases are logically inconsistent. Does it matter? Apparently, it matters very little.

Further Reading: “I don’t believe in word senses”

### Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

## 5 thoughts on “Formal definitions are less useful than you think”

1. (Remember, that in the AI wars, I am a ‘scruffy’)

The fact that ‘your computer’ is storing 8/9 as a floating point is an artifact of the programming language/math system you are using. Languages/systems that support a ‘full numeric tower’ can store rationals, and it’s very nice when that numeric equality is formally defined, as in the Scheme spec, or the Common Lisp spec.

It really does matter, in practical terms; real engineering disasters occur when the numeric representations and translations and limits are not understood or not formally defined.

for example:

http://www.ima.umn.edu/~arnold/disasters/ariane.html

2. I agree. The equality between two numbers is not a trivial issue.

Trying to figure out what x=y means for an engineer and whether it means the same thing for another engineer, is hard.
Hence, we have to expect semantic technologies (such as the ones you are no doubt working on) to have hard limits.

3. One point of this is to develop technologies with ‘soft’ limits. For example, checking whether 2 numbers are within some delta of one another, rather than equal. For the semantic technologies, we are using lots of statistical techniques in combination with symbolic techniques.

Another point is be careful of assumptions, and, to as great extent as practical, don’t build those into the engineered system. For example, it probably actually does not make sense to have a ‘=’ function for floating point numbers.

A third point is that ‘equality’ is tricky, and almost always means ‘equality for some purpose.’

4. Some years ago (at BNR) I was mired in the problem of floating-point representations for real numbers and, in an effort to obey the formal definitions for arithmetic operations – such as Commutativity and Distribution – we developed a constraint programming system on bounded intervals. It was an interesting system.

There’s a significant literature on the subject