Write a Twitter application in 5 minutes

I spend much time alone, writing and thinking. Twitter helps me stay connected. I love the platform.

On Friday, I wanted to find the intersection between the users followed by any two individuals. Indeed, suppose that you like both Joe and Jill, and they have similar interests. Maybe whoever they both read is also interesting? I could not find a tool to do it, so I built it with python-twitter.

Anybody with a working knowledge of Python can do it in less than 5 minutes. I used only twenty lines of code (in total!!!). The code proved immediately useful.

If you do not know Python or Ruby. Learn one or the other. Tonight. It is powerful stuff.

Update: A few days after I wrote this post, Twitter made my script obsolete by changing their authentification requirements.

On the design of design

Following a blog post by John D. Cook, I started reading Fred Brooks‘ latest book. Brooks is famous, among other things, for his earlier book, the Mythical Man-Month. The book is really a collection of essays, organized like blog posts. It is really engaging.

I had never read about design per se, except in Paul Graham‘s essays. For me, the core message of the book is that writing software, planning houses, writing books or poems, are very similar tasks. The metaheuristics are the same. The lessons you must learn are similar. You are trying to solve very large NP-hard problems where you can’t reliably divide the problem space. Systematic greedy algorithms may work, but they may also mislead you. You need some formalism, some rigor, but you also need experience, or instinct.

External-Memory Sorting in Java

Update: this code is now obsolete. Please see the corresponding Github project.

Sometimes, you want to sort large file without first loading them into memory. The solution is to use External Sorting. Typically, you divide the files into small blocks, sort each block in RAM, and then merge the result.

Many database engines and the Unix sort command support external sorting. But what if you want to avoid a database? Or what if you want to sort in a non-lexicographic order? Or maybe you just want a simple external sorting example?

When I could not find such a simple program, I wrote one.

Please help me improve it. It needs:

  • To be easy to modify: the code must remain simple.
  • It must scale to very large files.
  • While scalability and simplicity are most important, it must also be sensibly efficient.

Once we have a good solution, I plan to post it on Google code and add a link in the External Sorting wikipedia entry. If you contribute to the code, please add your name so you can get credit.

Reference: ExternalSort.java, http://pastebin.com/H57TZF7e

Language, Mathematics and Programming

Even if you have extensive training in Mathematics, the average Mathematics paper is undistinguishable from the ramblings of a madman. Many of these papers seek to solve narrow problems. And yet, we respect Mathematicians.

Software programming is a form of communication, usually between human beings and machines. While different in style, programming is a subset of the language of Mathematics. If you dig into the average source code, it is undistinguishable from ramblings, even if you are an expert developer.

Yet, we denigrate programming. Many will even deny that it is a Mathematical language. But Mathematics and Programming are not so different:

Mathematics Programming
Building on the previous research papers requires you to dig through endless piles of boring, badly written research papers. Maintaining millions of lines of codes written by various people over the years is difficult, boring, error-prone.
Inventing new theorems or new mathematical theories requires much creativity. Coming up with the next best iPhone application requires much creativity.
For most people, mastering even part of Mathematics requires a decade or more. Please read Teach yourself programming in ten years by Peter Norvig.
The language of Mathematics has directly contributed to technological progress. Electricity, engines, nuclear power, space travel all required extensive use of Mathematics. Google changed the world through the brilliance of its software engineers. The open source revolution has changed how people think about collaboration.
Some Mathematicians are widely recognized as being extremely smart. Some famous people have done a fair share of difficult and technical programming : Donald Knuth and TeX, Tim Berners-Lee and the Web, Linus Tovarlds and Linux.

Why is programming getting so little respect?

  • The intense commercialization of programming has commoditized it. As Paul Graham might say : painters where initially “portrait takers”. It is only when painting lost its commercial function that it became recognized as a noble art. However, just like painters always used their free time to create great art, the best programmers are open sourcing beautiful code all the time.
  • The study  of programming itself remains rather informal. You can get degrees in Computer Science, Computing Engineering or Software Engineering, but there is no degree in Programming. Programming is taught in universities, but generally only in the first few courses of a degree. Yet, there are degrees in Communication, Fine Art, Architecture, Music or Dance. While a degree in Computer Science or Software Engineer can make you a better programmer, the fact remains that your professors are not expert practitioners.

How can we fix this? I have this secret dream of setting up the equivalent of “Creative Writing” program, but for programmers. Call it “Creative Programming”. Basically, students would come together to write great code. Yes, such code might be useful commercially, but that would be a secondary consideration. The pursuit of greatness would be the only goal that matters. It would treat programming as a bona fide language. It would attract the best programmers as guest lecturers. Would this ever work out? I do not know.

I am sure that many will point out that my secret dream is impractical. Beauty should not come first : we want cheap, reliable, maintainable code. We also want programmers to be replaceable, inexpensive and practical. However, human beings can both pursue greatness while being practical. Compromise is possible.

Let me conclude by quoting Donald Knuth:

(…) computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty. A programmer who subconsciously views himself as an artist will enjoy what he does and will do it better.

Further reading: The best software developers are great at Mathematics? and Is programming “technical”?

A review of “Hello World: Computer Programming for Kids and Other Beginners”

I learned programming on my own when I was twelve years old with a TRS-80 and Microsoft Basic. The documentation that came with the TRS-80 was fantastic. Alas, today, no vendor would ever think of including an introduction to programming with a computer. If your are a dad (or a mom) and you regret this, then Hello World: Computer Programming for Kids and Other Beginners by Warren and Carter Sande is for you. The book is cheap: while the list price is $34, Amazon sells it for a little over $23.

As a kid, my initial goal was to create my own video games. And I did! I learned (entirely on my own) that:

  • Even though computers are fast, it is hard to create fast software. Being clever is hard work!
  • While it is easy to program a small game, programming a slightly more complex game can be orders of magnitude more difficult.

This book should allow your kids to learn as much and more.

Instead of using Microsoft Basic, the authors use Python. An excellent choice. They cover all the same material as in my venerable TRS-80 documentation: random numbers, variables, loops, graphics, functions and sound. However, there are a few more advanced concepts: dynamic arrays, arrays of arrays, objects, modules, file input and output, and event-based programming.

Note that many of the examples of the book would not run on the latest Python release (3.1). That is a minor concern since I would recommend you stay away from Python 3 in any case.

Disclosure: I was given a free e-book in exchange for my review.

Why are dynamic languages easier than static languages?

Dynamic langages like Python or Ruby are considerably easier than static languages like Java and C++. John is asking us why:

How do you account for the huge increases in productivity that dynamic language advocates say they experience.

My answer is duck typing. It is powerful because formal and tight definitions are hard and less useful than we might expect.

The difference is fundamental:

Static view Dynamic view
Tight definitions Duck typing
Ontologies Folksonomies
Specific solutions Generic solutions

Further reading:

Fast argmax in Python

In my post Computing argmax fast in Python, I reported that Python has no builtin function to compute argmax, the position of a maximal value. I provided one such function and asked people to improve my solution. Here are the results:

argmax function running time
array.index(max(array)) 0.1 s
max(izip(array, xrange(len(array))))[1] 0.2 s

Conclusion: array.index(max(array)) is simpler and faster.

Update: Please see The language interpreters are the new machines.

Cool software design insight #6

Here is a simple recipe I have learned for efficient software design:

Less planning, more prototyping!

Planning is typically a long and expensive process. Some “experts” justify it by claiming that one week of planning saves ten weeks of programming. In practice, this payoff rarely comes. Why? Because as you make progress, the project itself changes and the planning becomes obsolete.

I am not saying you should not manage and plan your projects. However, all planning should be short-term. The only question you must answer is:

What is the most efficient use of my time today?

In short, I advocate you follow a greedy algorithm to manage your time and your projects. Given that your problems are ill-defined and constantly changing, maximizing your short-time efficiency is the only sane thing to do.

Stop investing your time today in the hope that you will be efficiency tomorrow. Be efficient now!

Of course, if you live in some huge slow-moving corporation and work on problems that will remain the same for decades, then please do plan ahead!

Update: What if you are asked to provide engineering-like schedules and deadline? Make them up.

Cool software design insight #5

OO helps us hide away the routine problems and the makes the code easier to use. Among the first things you learn with class-based object-oriented (OO) languages like Java and C++ is how to use inheritance. Inheritance is a form of taxonomy for programmers. It makes great-looking diagrams. Pedagogically, it makes type polymorphism easy to understand.

However, I will tell you a little secret: cool programmers do not use inheritance, except maybe to derive new classes from the standard classes provided by the language. Inheritance tends to make code more difficult to maintain. In some cases, inheritance makes software slower.

The better alternatives are:

C++ :
Use templates, they are both faster and easier to maintain than class inheritance.
Java :
Use interfaces. Java interfaces are a bit annoying to maintain, but they do not contribute any bugs.
Python, Ruby, Objective-C, Perl, JavaScript/ECMAScript :
Use duck typing.

Cool software design insight #4

Mathematicians and philosophers often make terrible programmers. They also tend to write gibberish even in English. (Ok, I do not know if it is a fact, but stay with me.)

A terrible way of programming is to try to hold the entire problem in your head and to put it into code in one shot. Why? Because you are almost certainly overestimating your brain. Your mind can only cope with few parameters at a time.

Here is how you have to program:

  • If you have the luxury to start fresh, start small. Otherwise, make sure you understand and respect the code you start with.
  • Identify specific changes you want to implement. Small changes! Then implement them. Then test them!

You should never redesign working software from the ground up without incremental testing. Never. Even if you work alone.

Interestingly, I also write my papers incrementally, fixing small things one after the other. No other way works for me.

Cool software design insight #3

In a comment to my unit testing post, David suggested using property testing. Languages like Java, C and C++ have formalized this very idea as assert instructions. Other languages have the equivalent under different names. You can also manually implement asserts by throwing exceptions or logging errors and warnings.

My experience has been that you should use asserts relatively generously in your code for the following reasons:

  • While some fancy tools allow you to run through a program in debug mode and check the values of the variables, asserts help you fix bugs happening remotely.
  • Asserts are a great way to document your code. They tell the reader about your expectations.

However, asserts are not as useful as unit testing. Whereas you can write thousands of tests to test your software, adding thousands of asserts to your software may be a bad idea. It makes the code less readable and slower.

Cool software design insight #2

The number 1 difference between an experienced hacker and the random guy out of school is unit testing. Unit testing is simple. Anyone can do it. You do not need a sophisticated library. All you need is to run a program that does sanity checks over the different components of your software. The rule is simple:

You should always do unit testing for any kind of code that is supposed to have lasting value.

It is worth repeating: the single most important non-trivial strategy in software design is unit testing. All more sophisticated strategies are usually not worth the cost, and all simpler strategies are somewhat trivial. While you can discover most good coding techniques on your own, unit testing is not something very natural to most hackers.

If you sell software for a living, unit testing is extremely important to keep your sanity. While you cannot provide bug-free code, you can at least provide software that passes some unit tests.

Cool software design insight #1

I plan to progressively discuss a few things I have learned about software design during the rest of the year. Trivial things that make a big difference in your productivity. I do not claim that any of these insights will be novel in any way.

As a college professor, I do not code full time. Usually, I build dirty software that will last just long enough to make a point. I do not need to build industrial-strength software. I have no business needs to satisfy. I can afford to throw away code and never look at it again once a research project is completed. None of my code needs to run for more than a few days at a time.

With this disclosure in place, here is insight #1:

Remove features as often as you can.

Repeatedly, I have observed that my software is too complex for its own good as months go by. Often, I thought that my code would need to do X when, in reality, the need never arises. For example, maybe you wrote code that could sort strings or integers, and you realize that you never sort integers.

It is tempting to leave these extra functions in place. After all, what is the harm? And maybe I will need the extra power some day.

However, I have learned that I systematically underestimate the cognitive overhead of these useless features. I always think that this little extra template parameter is harmless. It is only after removing it, and working with my code some more that I realize how much easier my work has become.

So, drop useless flags and parameters. Do your brain a favor!