Most amazing Cringely article ever…

Cringely published an amazing article on crime in the USA. It turns out that in 1982, the American Department of Justice paid for a study. Three people were involved: Michael Block, Fred Nold, and Sandy Lerner. Cringely believes their study showed that the current sentencing guidelines would lead to a poorer, more crime-ridden USA (and they did). The study was “hidden away”. It turns out that Fred Nold killed himself in 1983. Block became a law professor and won’t comment to Cringely about the study. Sandy Lerner went on to found Cisco.

A few things are amazing. First, there is the suicide of a researcher who possibly felt like a loser. It reminds me of Wallace Carothers, who invented nylon. It is unclear to me how you can feel like a loser after inventing nylon, but apparently he did. Second, the USA knows, and knew, that it was headed for a crime-ridden society, and it went ahead anyhow. Why? I can’t figure it out. Lastly, there is the little detail that the statistician on the study, Sandy Lerner, founded Cisco. This is an interesting contrast with the other fellow, who killed himself.

A Theory of Strongly Semantic Information

Thanks to my colleague Jean Robillard, I found out that philosophers do Knowledge Management too! Following a request I made, Jean suggested I read Outline of a Theory of Strongly Semantic Information by L. Floridi.

Of course, I’m a naïve reader, but still. I think I grasped some very important things.

He starts out by asking how much information there is in a statement. Well, in a finite discrete world (the realm where Floridi appears to live), you can reasonably define “information content” in terms of how many possibilities the statement rules out. For example, suppose my world is made of two balls, each of which can be either red or blue, so that my world has 4 possible states. If I say that “ball 1 is blue”, there are only 2 possibilities left (ball 2 is either red or blue), so I have ruled out 2 possibilities and my information content is 2. If I say “both balls are blue”, only 1 possibility is left, so my information content is 3. You can see right away that a self-contradictory statement (“ball 1 is blue, both balls are red”) rules out all 4 possibilities, so it has maximal information content. A tautology (“ball 1 is either blue or red”) rules out nothing and has 0 information content. Floridi is annoyed by the fact that a self-contradictory statement has maximal information content.
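The counting argument can be checked mechanically. Here is a minimal Python sketch that enumerates the four possible worlds and counts how many each statement rules out. Modelling statements as Python predicates over (ball 1, ball 2) color pairs is my own encoding for illustration, not Floridi's formalism.

```python
from itertools import product

# Toy world from the post: two balls, each either red or blue.
COLORS = ("red", "blue")
WORLDS = list(product(COLORS, repeat=2))  # 4 states, as (ball 1, ball 2) pairs


def information_content(statement):
    """Count the possible worlds that the statement rules out."""
    return sum(1 for world in WORLDS if not statement(world))


# Each statement is a predicate: True if the world is compatible with it.
statements = {
    "ball 1 is blue": lambda w: w[0] == "blue",
    "both balls are blue": lambda w: w == ("blue", "blue"),
    "ball 1 is blue, both balls are red": lambda w: w[0] == "blue" and w == ("red", "red"),
    "ball 1 is either blue or red": lambda w: w[0] in COLORS,
}

for text, statement in statements.items():
    print(f"{text}: {information_content(statement)}")
```

Running it prints 2, 3, 4 and 0 for the four statements, in the order listed: the contradiction scores highest, which is exactly what bothers Floridi.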

In section 5, he points out that statements are not simply true or false: they also have a degree of discrepancy. For example, I can say that I have some balls. This is a true statement, but with high discrepancy. On the other hand, I can say that I have 3 balls when in fact I have 2; while false, this statement has lower discrepancy, and it may be more useful. Apparently, he borrows this idea from Popper, but no doubt this is not a new idea.

He comes up with conditions on a possible measure of discrepancy taking values between -1 and 1. A value of -1 means that the statement is totally false and matches no possible situation (“I have 2 and 3 balls”), 0 means that you have a very precise and true statement (“I have 2 balls”), and 1 means that you have a true but maximally vague statement (“I have some number of balls”). What he is getting at is that both extremes (-1 and 1) are equally useless, whereas statements near zero are useful, whether they are false or true. Let’s call this value upsilon.

Then, he defines the degree of informativeness as 1 - upsilon^2, which equals 1 when upsilon = 0 and drops to 0 at both extremes (upsilon = -1 or upsilon = 1).

This solves the problem we had before. The statement “ball 1 is blue, both balls are red” matches no possible situation, so it has upsilon = -1 and a degree of informativeness of 0, rather than the maximal information content it had before. The statement “ball 2 is either red or blue” has upsilon = 1, and so it also has a degree of informativeness of 0. Finally, “ball 1 is blue” has a positive upsilon that is less than 1, and possibly close to 0, so it has a good degree of informativeness.
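To see how the formula behaves, here is a tiny Python sketch of 1 - upsilon^2. Floridi's conditions constrain upsilon, but the post gives no recipe for computing it, so apart from the endpoints fixed by the definition (-1 for a statement matching no situation, 0 for a precise truth, 1 for a tautology), the upsilon value used for “ball 1 is blue” below is a made-up placeholder.

```python
def informativeness(upsilon):
    """Degree of informativeness as described in the post: 1 - upsilon**2."""
    if not -1.0 <= upsilon <= 1.0:
        raise ValueError("upsilon must lie in [-1, 1]")
    return 1.0 - upsilon ** 2


# Endpoint values follow the definition; 0.3 is a hypothetical placeholder.
examples = {
    "contradiction (matches no situation)": -1.0,
    "precise and true": 0.0,
    "ball 1 is blue (placeholder upsilon)": 0.3,
    "tautology (maximally vague)": 1.0,
}

for label, upsilon in examples.items():
    print(f"{label}: upsilon={upsilon:+.1f}, informativeness={informativeness(upsilon):.2f}")
```

Squaring upsilon is what makes the measure symmetric: a statement that is too vague and one that is outright contradictory are penalized in the same way.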

That’s what I got out of it for now.

Journal of Algorithms is no longer accepting submissions

We just submitted an article to the Journal of Algorithms, and we were told that the editors stopped accepting papers in 2003. One alternative appears to be ACM Transactions on Algorithms.

It seems like the entire board of the Journal of Algorithms resigned some time ago. I had no idea that Elsevier and other big publishers were in such trouble. I had heard about the Journal of Machine Learning

It feels like all the big journals will soon have moved to an open or semi-open setup. Very scary for big publishers. Very scary. Yes, they’ve been making ever larger profits, but it may all come to a stop really soon. Tipping point coming!

Anonymous Academic Bloggers

Ernie’s 3D Pancakes has a post on anonymous academic bloggers. To me, this is an interesting question. I use my own name everywhere on this blog. You can easily figure out where I work, what I teach and to whom, where I publish, and so on. You can even find out who my son is. I think that Jeff correctly points out that feeling you need to be anonymous is probably misguided. The likelihood that a colleague is going to come to my blog, read it, be insulted, and try to hurt me on the job is very, very slim. One reason for that is that I would never bad-mouth a colleague on my blog: it just wouldn’t be fun and interesting for my target audience. The likelihood that a reviewer of a paper I submitted would come to my blog, be insulted, and reject my paper is also very slim. However, reviewers have many more reasons to wrongly reject a paper, and if you start worrying about this sort of thing, you are not out of the woods!

So, I use my own name. There.

23% Fewer Computer Science Majors This Year!

Slashdot reports on a USA Today article saying that there are fewer Computer Science majors. They cite a 23% decline in enrollment in North America. Here’s one comment about the article:

Most engineering schools are reporting declines in enrollment. This is hardly surprising since most engineering curriculums, including CS, are difficult compared to other fields of study. Without the prospect of a good job waiting for them, many college students are veering away from these majors.

Update: Yuhong correctly points out that this is mostly at the undergraduate level. Graduate schools are finding enough students, at least according to Yuhong. I think this is expected: if job prospects are bad, people won’t enter the system, but once they’ve entered it, they will stay in it longer if jobs are scarce.

Cool RDF tools

RDF is everywhere, it seems: from Dublin Core to RSS, all the way to FOAF… However, it can be quite painful to parse. Cool tools are starting to emerge, but Google is not yet very good at finding them.

Suppose you have an RDF/XML representation and you want the triples… go to the W3C RDF Validation Service and it will extract them nicely for you.
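For the curious, here is a rough idea of what such an extraction involves, as a minimal Python sketch using only the standard library. It assumes the input sticks to one simple RDF/XML pattern (rdf:Description elements with rdf:about, whose children are plain literals or rdf:resource links); real RDF/XML has many more abbreviations, which this does not handle. The example.org URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = f"{{{RDF}}}Description"  # ElementTree's {namespace}local form
ABOUT = f"{{{RDF}}}about"
RESOURCE = f"{{{RDF}}}resource"


def expand(tag):
    """Turn ElementTree's '{namespace}local' tag into a full URI."""
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


def rdfxml_to_triples(text):
    """Return N-Triples-style (subject, predicate, object) strings.

    Only the simple pattern described above is supported; literals are not escaped.
    """
    root = ET.fromstring(text)
    triples = []
    for description in root.findall(DESCRIPTION):
        subject = f"<{description.get(ABOUT)}>"
        for prop in description:  # each child element is one property
            predicate = f"<{expand(prop.tag)}>"
            resource = prop.get(RESOURCE)
            if resource is not None:
                obj = f"<{resource}>"  # object is another resource
            else:
                literal = prop.text or ""
                obj = f'"{literal}"'  # object is a plain literal
            triples.append((subject, predicate, obj))
    return triples


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/post">
    <dc:title>Cool RDF tools</dc:title>
    <dc:creator rdf:resource="http://example.org/me"/>
  </rdf:Description>
</rdf:RDF>"""

for s, p, o in rdfxml_to_triples(SAMPLE):
    print(s, p, o, ".")
```

For anything beyond toy input, a real RDF library or the validation service is the safer route.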

Conversely, the form on this page allows you to go from N3 (the user-friendly RDF syntax) to RDF/XML.
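Going the other way can be sketched too. This minimal Python version, using only the standard library, assumes the input is restricted to N-Triples, the simplest subset of N3: one statement per line, full URIs in angle brackets, and plain literals with no escapes, language tags, or datatypes. N3 prefixes, blank nodes, and lists are not handled, and the example.org URIs are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF)

# One statement: <subject> <predicate> <object> .   or   <subject> <predicate> "literal" .
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+(?:<([^>]+)>|"([^"]*)")\s*\.$')


def split_uri(uri):
    """Split a predicate URI into (namespace, local name) after the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    if cut == 0 or cut == len(uri):
        raise ValueError(f"cannot derive an XML element name from {uri}")
    return uri[:cut], uri[cut:]


def ntriples_to_rdfxml(text):
    """Convert simple N-Triples lines into an RDF/XML string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    descriptions = {}  # subject URI -> its rdf:Description element
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported statement: {line}")
        subject, predicate, obj_uri, obj_literal = match.groups()
        if subject not in descriptions:
            descriptions[subject] = ET.SubElement(
                root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": subject}
            )
        # The predicate becomes an element; its local name must be a valid XML name.
        namespace, local = split_uri(predicate)
        prop = ET.SubElement(descriptions[subject], f"{{{namespace}}}{local}")
        if obj_uri is not None:
            prop.set(f"{{{RDF}}}resource", obj_uri)
        else:
            prop.text = obj_literal
    return ET.tostring(root, encoding="unicode")


SAMPLE = """
<http://example.org/post> <http://purl.org/dc/elements/1.1/title> "Cool RDF tools" .
<http://example.org/post> <http://purl.org/dc/elements/1.1/creator> <http://example.org/me> .
"""

print(ntriples_to_rdfxml(SAMPLE))
```

ElementTree invents prefixes such as ns0 for namespaces that were not registered, so the output is valid but not as pretty as a hand-written file.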

How to be creative

Through Downes, I found this great post about how to be creative. HOWTOs are always interesting and sell magazines, but they are somewhat more interesting in the blogosphere because someone you can get to know put his heart into it.

  • Ignore everybody
  • Creativity is its own reward
  • Put the hours in
  • If your biz plan depends on you suddenly being “discovered” by some big shot, your plan will probably fail
  • You are responsible for your own experience
  • Everyone is born creative; everyone is given a box of crayons in kindergarten
  • Keep your day job
  • Companies that squelch creativity can no longer compete with companies that champion creativity
  • Everybody has their own private Mount Everest they were put on this earth to climb
  • The more talented somebody is, the less they need the props
  • Don’t try to stand out from the crowd; avoid crowds altogether
  • If you accept the pain, it cannot hurt you
  • Never compare your inside with somebody else’s outside

One of the most interesting ones is number 5: Nobody can tell you if what you’re doing is good, meaningful or worthwhile. The more compelling the path, the more lonely it is.

Of course, I don’t buy all of it. Being extremely lonely is no way to be creative, I think. Nobody gets awfully creative at the bottom of a cave. I do think you have to look for others. The strength of your network is key because it multiplies your own brain power. I guess we go back to Emerson’s independence of solitude. Be in a network, be in a crowd, but do not be a mere node in the crowd: be your own node. It does require courage, though, and you have to expect to fail, even to fail badly.

Great Hackers

Paul Graham wrote an essay called ‘Great Hackers’. I’m pretentious enough to call myself a hacker (though I do not claim to be great), so I had to jump on it!

Here are some juicy quotes…

Good hackers find it unbearable to use bad tools. They’ll simply refuse to work on projects with the wrong infrastructure.

Great hackers also generally insist on using open source software. Not just because it’s better, but because it gives them more control.

They [great hackers] work in cosy, neighborhoody places with people around and somewhere to walk when they need to mull something over, instead of in glass boxes set in acres of parking lots.

There’s no way around it: you can’t manage a process intended to produce beautiful things without knowing what beautiful is.

And this is the reason that high-tech areas only happen around universities. The active ingredient here is not so much the professors as the students. Startups grow up around universities because universities bring together promising young people and make them work on the same projects. The smart ones learn who the other smart ones are, and together they cook up new projects of their own.

If you’re worried that your current job is rotting your brain, it probably is.

A megabyte is a mebibyte, and a kilobyte is a kibibyte

If you’ve been annoyed about the fact that a kilobyte has 1024 bytes and not 1000 bytes, well, you were right all along! What people call a kilobyte is really a kibibyte. (Thanks to Owen for pointing it out to me!)

Examples and comparisons with SI prefixes:
one kibibit   1 Kibit = 2¹⁰ bit = 1024 bit
one kilobit   1 kbit = 10³ bit = 1000 bit
one mebibyte   1 MiB = 2²⁰ B = 1 048 576 B
one megabyte   1 MB = 10⁶ B = 1 000 000 B
one gibibyte   1 GiB = 2³⁰ B = 1 073 741 824 B
one gigabyte   1 GB = 10⁹ B = 1 000 000 000 B
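A short Python sketch makes the gap between the two families of prefixes concrete. The multipliers mirror the SI and binary values listed above; the “500 GB” drive is a hypothetical example, not something from the source.

```python
# Multipliers for the prefixes: SI (powers of 10) and binary (powers of 2).
UNITS = {
    "kB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "KiB": 2**10,
    "MiB": 2**20,
    "GiB": 2**30,
}


def convert(num_bytes, unit):
    """Express a number of bytes in the given unit."""
    return num_bytes / UNITS[unit]


# Hypothetical drive advertised as "500 GB", i.e. 500 * 10**9 bytes.
drive = 500 * UNITS["GB"]
print(f"{convert(drive, 'GB'):.2f} GB")    # 500.00 GB
print(f"{convert(drive, 'GiB'):.2f} GiB")  # 465.66 GiB
```

At the giga scale, the two units already differ by about 7%, so the same number of bytes can look noticeably smaller when reported in binary units.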

Source: Definitions of the SI units: The binary prefixes

Michael Nielsen: Principles of Effective Research

Michael just finished his essay: Principles of Effective Research. I think it is a must-read for all Ph.D. students, young researchers, and even idiots like me who always get it wrong. Michael takes a very refreshing view of what research is all about. He is not cynical, yet he is true to what research really is. You may never win the Nobel Prize if you follow his guidelines, and you may never be a guru researcher, but I think you’ll be a good or even excellent researcher. As he explains, being an influential researcher is not a subset of being a good researcher, and that’s a very important statement. In any case, Michael did all of us a favor, and I hope that his essay is read by a lot of people. (Power of the network?) I implore you all: link to his essay!!!