Evil abbreviations in programming languages

Programming language designers often abbreviate common function names. The benefits are sometimes dubious in an era when most programmers use meaningful variable names like MyConnection or current_set as opposed to single letters like m or t.

Here are a few evil examples:

  • The C language provides the memcpy function to copy data. The clearer alternative memcopy requires one extra key stroke.
  • To query for the length of an object like an array in Python and Go, we use the len function as in len(array). The expression length(array) would be clearer to most and require only three additional characters.
  • Though most languages use the instruction if or the equivalent to indicate a conditional clause, when it comes to “else if”, designers get creative. Ruby uses elsif, helpfully saving us from typing the character e. Python uses elif, saving us two key strokes. PHP alone seems to get it right by using elseif.
  • In Python, we abbreviate string to str. Most languages seem to abbreviate Boolean as bool.
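
To make the comparison concrete, here is a minimal Python sketch that writes the same function twice: once with the built-in abbreviations and once with spelled-out names. The aliases length, string and boolean are hypothetical; Python does not define them, and they are introduced here only for illustration.

```python
# Hypothetical long-form aliases; Python itself does not define these names.
length = len
string = str
boolean = bool


def describe(values):
    """Classify a list by size using Python's abbreviated built-ins."""
    if len(values) == 0:
        return "empty"
    elif len(values) == 1:
        return "single: " + str(values[0])
    else:
        return "many: " + str(bool(values))


def describe_spelled_out(values):
    """Same logic, written with the spelled-out aliases defined above."""
    if length(values) == 0:
        return "empty"
    elif length(values) == 1:
        return "single: " + string(values[0])
    else:
        return "many: " + string(boolean(values))
```

Both functions behave identically; only the names differ. The elif keyword cannot be aliased this way, since it is part of the grammar rather than a function.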

I am not opposed to a judicious use of abbreviations. However, by going too far, we create a jargon. We make the code harder to read for those unfamiliar with the language without providing any benefit to anyone. Let us not forget that source code is often the ultimate documentation of our ideas.

Credit: John Cook’s comment on G+ inspired this blog post.

Update: Commenters point out that memcpy had to be shortened due to technical limitations restricting function names to 6 characters in the old days. Fair enough. However, common C functions that use more than 6 characters also look like alphabet soup: fprintf, strcspn, strncpy, etc.

18 thoughts on “Evil abbreviations in programming languages”

  1. One of the nice things about Mathematica is that naming is very consistent and seldom uses abbreviations. I’ve been able to come back to Mathematica after not using it for years and quickly remember or guess what things are named.

  2. The “memcpy” style naming has a reason. In the old days of C (1989 or thereabouts), the linkers were only guaranteed to recognize the first six characters of a variable or function name.

  3. Without providing any benefit to anyone? Perhaps you mean without any benefit to you. In many cases they provide benefit to me as they result in cleaner, easier-to-read code.

    It’s a bit like the difference between an introductory/tutorial guide and a reference manual. When starting out everything needs to be spelt out slowly and longhand but once you are familiar you start to appreciate conciseness and brevity.

    Perhaps it’s similar to using letters in algebra.

    If we don’t see the benefit of the short forms, perhaps we should be spelling out . as “fullstop”.

  4. Regarding C memcpy, that’s likely a result of DEC PDP-11 file name limitations. You could only use 6 characters for the main part of the name, plus 3 characters for the (one, single) extension. So the file implementing memcpy could be saved as the eponymous memcpy.c or memcpy.asm. No space for an extra “o”!

    (Not sure if there was ever a good implementation reason for using exactly the same name for function & file name, or if it was just more pleasing to the designers…)


  5. I have taught coding to non-native English speakers, and I can testify that many people have trouble remembering if it’s `length` or `lenght` (and similarly for `height`). Abbreviating it to `len` fixes this problem.

    (Of course, I realize that the ideal situation is an editor where you can write leng and it gets autocompleted. In this way one does not care anymore if a function is called `len` or `lengthInCharactersOfThisMostPreciousStringComposedOfUtf8Codepoints`.)

  6. Ruby and Python use elsif and elif not to prevent you from typing else if, but to prevent too many indentation levels.

  7. When memcpy() was coined, linkers on some systems limited the length of external symbols; eight and six characters were common limits. A quick look at 7th Ed. Unix suggests external symbols longer than six characters came along with stdio.h and ctype.h, later additions to Unix, perhaps when the earlier six-character (DEC) limit was removed.

    I disagree with your other points. len() is so common that the shorter, less noisy, quicker to pronounce len is to be preferred over length. Ditto str and bool. Why do you not complain of int, should it not be integer? s/float/floatingpoint/ s/func/function/ s/def/define/ s/var/variable/

    Ruby takes its elsif from Perl, Python its elif from Bourne shell. These have special else-if keywords to make if-else chains be at the same parse level. Unlike C, what follows else must be a block and not a statement, e.g. another if. Without elsif it would be if () {} else { if () {} else { if () {} } } with lots of closing braces at the end, a la Lisp. A keyword elseif looks harder to pronounce and probably makes novices to the language wonder why it exists and why they can’t add a space.

    You say source code is the ultimate documentation of ideas. But notation is needed to express those ideas succinctly. Even if not programming, one would find a limited vocabulary useful compared to free-form English. Isn’t that why mathematics has much notation, to succinctly represent understood concepts from a limited set?

    Using full English for programming leads to very wordy code that takes time for a human to parse, has little content within a given space, e.g. screen, and looks like COBOL. http://www.csis.ul.ie/cobol/examples/SeqIns/SEQINSERT.htm

  8. Oh, I agree so much. This also bugs me with golang and rust. Why make the same mistakes over and over again?

  9. memcpy is six letters long because at the time it was invented, it was still common for linkers to truncate shared identifiers to six characters. Support for these old linkers was still in the C standard until at least 1999.

    Also, saving keystrokes is not as ridiculous as you seem to think. Early terminal keyboards were very difficult to type on, and data transmission was limited to ten *characters* per second.

    The elif / elsif abbreviation eliminates the bug, very common in C programs, where an “else” clause is silently associated with the wrong condition. The designers of the shell, perl, python, ruby, etc., were well aware of this.

    I suggest that when you see decisions in the past that you don’t understand, it would be more productive to try to understand them _before_ you label them “evil”.

  10. I like mangled names because they’re specific. There’s only one memcpy, but there might be dozens of MemoryCopy’s.

    Likewise a single strange algorithm may not be better described by a long name. Is “QuickSort” better than “qsort”? Neither means anything by itself.

  11. Python’s elif is needed to avoid nesting, and so to avoid piling up indentation levels. But I can’t say anything about def instead of definition.

  12. Haha, I don’t mind short names. What is clearly irritating is that different languages have different conventions. Boolean in Java vs bool in C++ may be irritating, but the really infuriating thing is the lack of indexing conventions. Sometimes indexes start from zero, sometimes they start from one. It is even worse with ending indexes, because they often point to the element after the last one. This is apparently a convention for most Java libraries and standard functions. However, it is very poorly described. I would say these things are barely mentioned in the docs.

  13. I’ve loosened up quite a bit on names recently. I’m no longer convinced that longer names provide much benefit. And I have a hard time calling any of these abbreviations evil.

    But I do have to add the story that “Ken Thompson was once asked what he would do differently if he were redesigning the UNIX system. His reply: ‘I’d spell creat with an e.'” ( http://en.wikiquote.org/wiki/Ken_Thompson#Quotes )

  14. I think those kinds of names are “ok”. We just have to memorize them. Worse are names that look similar but do completely different things, for example in Picat: “chr” and “char”. “char” checks whether a variable is a character. “chr” converts a number (e.g. 97) to its UTF-8 representation (“a” in this case).
    On the other hand, there are names that are different but do similar things. For example “/”, which divides 2 numbers and returns an integer, and “div”, which divides 2 numbers and returns a float. I like how Factor deals with this – “/” returns integer, “/f” returns float.

  15. I think you picked the tamest example for C, which is the worst at this, especially if you get to some of the POSIX stuff like unistd.h. Python does abbreviate some stuff, but at least it has a culture of spelling things out with long variable names rather than using short, confusing names.
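
Several comments above argue that elif and elsif exist to keep if-else chains at a single nesting level, not to save keystrokes. A minimal Python sketch of that argument follows; the function names and grade thresholds are illustrative assumptions, not taken from the post or the comments.

```python
def grade_flat(score):
    """An if-else chain kept at one indentation level with elif."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "F"


def grade_nested(score):
    """The same chain without elif: each extra case adds a nesting level."""
    if score >= 90:
        return "A"
    else:
        if score >= 80:
            return "B"
        else:
            if score >= 70:
                return "C"
            else:
                return "F"
```

The two functions return the same result for every input. In a language where else must be followed by a block, the nested form is the only alternative to a dedicated keyword, which is the point the commenters make.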
