Too Much Semantics is Harmful in Information Technology

It has become evident that, in the realm of Web Services, the REST paradigm is taking over while the Simple Object Access Protocol (SOAP) is progressively being forgotten, except in some academic circles and by some companies interested in selling tools to alleviate the pain [1].

Here is what Clay Shirky was saying in 2001:

This attempt to define the problem at successively higher layers is doomed to fail because it’s turtles all the way up: there will always be another layer above whatever can be described, a layer which contains the ambiguity of two-party communication that can never be entirely defined away.

No matter how carefully a language is described, the range of askable questions and offerable answers make it impossible to create an ontology that’s at once rich enough to express even a large subset of possible interests while also being restricted enough to ensure interoperability between any two arbitrary parties.

The sad fact is that communicating anything more complicated than inches-to-millimeters, in a data space less fixed than stock quotes, will require AI of the sort that’s been 10 years away for the past 50 years.

The main reason being put forward is that SOAP is simply too complex. But what does complexity mean here? The Web is incredibly complex if you consider how many parts it has, yet we consider it to be simple.

How do we recognize a simple technology? The first criterion any engineer would use is the number of points of failure. SOAP architectures can break in many more ways than REST architectures, and so they are more complex. Meanwhile, theoretical computer science teaches us that something is more complex if it requires more CPU cycles to run. SOAP architectures are more complex in this light as well: there is simply a lot more XML going around, and the requests are far more verbose.
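
To make the verbosity point concrete, here is a minimal sketch that compares the size of the same lookup expressed as a SOAP envelope and as a REST-style request. The service, the `GetPrice` operation, the `example.org` host and the `/products/42/price` path are all assumptions made up for illustration; only the SOAP 1.2 envelope namespace is real. The HTTP headers that would wrap the SOAP message are left out, so if anything the comparison flatters SOAP.

```python
# Hypothetical price lookup expressed two ways.
# Operation names, host and paths are illustrative assumptions only.
SOAP_REQUEST = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <m:GetPrice xmlns:m="http://example.org/prices">
      <m:ProductId>42</m:ProductId>
    </m:GetPrice>
  </soap:Body>
</soap:Envelope>"""

# The REST-style equivalent: the whole request is a verb plus a URI.
REST_REQUEST = "GET /products/42/price HTTP/1.1\r\nHost: example.org\r\n\r\n"


def payload_size(message: str) -> int:
    """Return the number of bytes the message occupies on the wire (UTF-8)."""
    return len(message.encode("utf-8"))


def verbosity_ratio() -> float:
    """Return how many times larger the SOAP message is than the REST one."""
    return payload_size(SOAP_REQUEST) / payload_size(REST_REQUEST)


if __name__ == "__main__":
    print("SOAP bytes:", payload_size(SOAP_REQUEST))
    print("REST bytes:", payload_size(REST_REQUEST))
    print("ratio:", round(verbosity_ratio(), 1))
```

The exact ratio depends entirely on the made-up messages, so it should not be read as a measurement, only as an illustration of where the extra bytes come from.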

I’d like to propose that there is another criterion for complexity: semantics. One should always aim for the simplest possible solution… and providing lots of semantics is not a simple feat. SOAP architectures necessarily include semantics to define the meaning of the terms used in the description and interfaces of the service. This layer is absent from REST architectures. It is not that there is no semantics in the REST paradigm, but it is kept extremely simple: you only need to know the semantics of the main HTTP operations (POST, GET, PUT and DELETE). In fact, the Wikipedia REST entry includes the following citation attributed to Roy Fielding:

REST’s client-server separation of concerns simplifies component implementation, reduces the complexity of connector semantics (…)

I think this is fundamental. What makes REST simple is that it reduces the amount of semantics the software has to worry about.
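
As a rough illustration of how small that vocabulary is, the sketch below models a service whose entire interface is the four HTTP operations applied to paths. The `ResourceStore` class, its `handle` method and the `/products` paths are hypothetical names invented for this example, and the status codes follow common HTTP practice rather than anything stated in this post.

```python
from typing import Dict, Optional, Tuple


class ResourceStore:
    """In-memory stand-in for a REST service (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._resources: Dict[str, str] = {}
        self._next_id = 1

    def handle(
        self, method: str, path: str, body: Optional[str] = None
    ) -> Tuple[int, Optional[str]]:
        """Dispatch on the HTTP method alone; return (status code, payload)."""
        if method == "GET":
            # Read a resource without changing it.
            if path in self._resources:
                return 200, self._resources[path]
            return 404, None
        if method == "PUT":
            # Create or replace the resource at exactly this path.
            created = path not in self._resources
            self._resources[path] = body or ""
            return (201 if created else 204), None
        if method == "DELETE":
            # Remove the resource if it exists.
            if self._resources.pop(path, None) is None:
                return 404, None
            return 204, None
        if method == "POST":
            # Create a new resource under a collection; the server picks the path.
            new_path = f"{path.rstrip('/')}/{self._next_id}"
            self._next_id += 1
            self._resources[new_path] = body or ""
            return 201, new_path
        # Anything else is outside the vocabulary this sketch understands.
        return 405, None
```

The store never needs to know what a product is: the client and the server only agree on what the four verbs do to a path, and everything else lives in the payload.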

Why would semantics be a bad idea? Well, simply because semantics implies coupling, and too much coupling makes a system too complex. Without any coupling, we cannot do anything, but when we throw in too much, we harm the system. What type of coupling are we talking about? Well, if I pass the variable x to the function f, there is relatively little coupling. All I do is establish a relationship between the function f and the variable x. But what if x is meant to be the cost of a product? Then x must be tied explicitly to the product ID, to some price identifier, and so on. This makes the system harder to maintain, harder to debug, and more failure-prone.
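
The difference between the two situations can be sketched in a few lines. Everything here is an assumption made for illustration: `apply_discount`, `ProductCost`, its fields, and the list of accepted currencies are invented names, not part of any real system.

```python
from dataclasses import dataclass
from typing import Set


def apply_discount(x: float, rate: float = 0.1) -> float:
    """Loosely coupled: f only knows that x is a number."""
    return x * (1 - rate)


@dataclass(frozen=True)
class ProductCost:
    """x with its meaning attached: every field is something f must now understand."""
    product_id: str
    price_list_id: str
    currency: str
    amount: float


def apply_discount_to_cost(
    cost: ProductCost, known_products: Set[str], rate: float = 0.1
) -> ProductCost:
    """Tightly coupled: each added piece of meaning is a new way to fail."""
    if cost.product_id not in known_products:
        raise KeyError(f"unknown product: {cost.product_id}")
    if cost.currency not in {"CAD", "USD"}:  # hypothetical list of currencies
        raise ValueError(f"unsupported currency: {cost.currency}")
    return ProductCost(
        cost.product_id,
        cost.price_list_id,
        cost.currency,
        cost.amount * (1 - rate),
    )
```

The first function can only fail if x is not a number. The second can also fail on an unknown product or an unexpected currency, and it has to change whenever the product catalogue or the price model changes.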

Fundamentally, software design is about communication. But not communication between machines… rather communication between developers. And communication between distributed people works much better when the messages they need to send each other are kept very simple. That is why the SOAP philosophy is fundamentally flawed.

So, when you design software, you should include as little semantics as possible as this will make your system simpler, and thus, easier to manage.

This is, of course, contrary to what AI enthusiasts do.

1. See recent posts by Larry O’Brien and Nick Gall.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

7 thoughts on “Too Much Semantics is Harmful in Information Technology”

  1. I do not think I’m confused. I mean exactly what you seem to imply I mean. SOA web services expect you to describe the semantics of any interaction. REST, on the other hand, does not. Even if you only had two possible interactions with a SOA web service, they would still be more complex than the REST equivalent.

    I would argue that what you call “application semantics” is independent of the technology being used and, therefore, a constant we can dismiss. (If it can’t be factored away, as you say, then it is always there… so it is a constant.)

    Let us take your example of a gene or protein. Should the web service need to know the taxonomy or ontology of genes or proteins? I think not. Does it need to know that these two unique identifiers refer to the same thing? I think not. I think information should be provided strictly on a “need-to-know” basis. The minute you start tying things together, you introduce coupling, and therefore complexity. Your software starts making assumptions, you increase (without cause) the level of abstraction, and things break.

    One should always aim for the simplest possible solution… and providing lots of semantics is not a simple feat.

  2. Aren’t you confusing service semantics and application semantics? REST reduces the possible ways to interact with a remote application/service to 4 (really 2, most of the time). This is clearly less complex. However, the application semantics don’t seem any simpler to me. The developer has to understand what the result is of GETting a resource at a particular URI. They need to understand the interface or API being used. This seems potentially very complex — am I getting a gene or protein, is the name based on terminology A or B, what is the precision of the expression level, etc. This kind of complexity can’t be factored away.

  3. I think we’re in agreement. I just wanted to point out that of the hard things that need to be done in interacting with a service, the nuts-and-bolts of the interaction are the simple parts. It’s the application logic that is complex. But a lot of the debate seems to ignore that aspect.

  4. I feel that one can trace the origin of the semantic complexity of SOAs to their explicitness and, as you discussed, their formalization of interactions. However, (informal) semantics are present in REST architectures as well. Here a programmer has to figure out more of the meaning before implementing, otherwise the code will not work. As the semantics are no longer explicitly handled once the programs run, there are fewer points of failure, as you state.

  5. I think SOA is a much broader paradigm than you seem to imply. In my understanding REST is just another way to interact with web services. REST is an alternative to SOAP, and not SOA. SOA would include in addition to the interaction protocols, service description, service publishing, service discovery, service matching, service selection, and service engagement.

  6. Nirmit: Yes, there is a whole lot of stuff (for lack of a better word) in the SOA stack. But I do precisely mean that people are sidestepping all of it in favor of something several orders of magnitude Web-friendlier.

    I predicted in 2002 that SOA would go nowhere. We are in 2007 and it has gone nowhere so far. And people keep claiming that it will make it, finally, maybe next year, but it won’t.

    Oh! Microsoft has web services. Java has web services. But these things do not interoperate. They are nothing but proprietary remote function calls, as we have had them for years and years.

    We have not come forward one iota.

  7. Hi, all excess is harmful, even in technology. The best motto is to keep it short and simple. As you pointed out, the simpler, the better.
