I found this excellent survey of Collaborative Filtering which includes a wide range of techniques, problems found and so on. A must if you want to better understand how sites such as Amazon help you find the books you love!
Go! Download now! It is free! It is legal! It is fun! And be sure to rate these artists so that you can help others find the good ones!
This brings him to raise an interesting question. Why doesn’t Gmail send XML back and forth? Indeed, isn’t XML the data format of the Web? Here’s what he has to say:
I think he is very much on target in the sense that people who see everything as Java or C# are likely to perceive XML very much differently from people using higher level languages.
The lesson here, people, is that you should master a range of languages and not one or two. And no, taking a class in Haskell once in your life doesn’t qualify.
Tim Bray, who co-invented XML among other things, takes a stand against Web Services. Here’s what he says:
No matter how hard I try, I still think the WS-* stack is bloated, opaque, and insanely complex. I think it’s going to be hard to understand, hard to implement, hard to interoperate, and hard to secure.
I look at Google and Amazon and EBay and Salesforce and see them doing tens of millions of transactions a day involving pumping XML back and forth over HTTP, and I can’t help noticing that they don’t seem to need much WS-apparatus.
I’m deeply suspicious of “standards” built by committees in advance of industry experience, and I’m deeply suspicious of Microsoft and IBM, and I’m deeply suspicious of multiple layers of abstraction that try to get between me and the messages full of angle-bracketed text that I push around to get work done.
It should be noted that Tim has recently taken a job with Sun Microsystems. His current employer is very actively involved in Web Services, so I believe he takes this stand despite the current interest of his employer.
Important post today by Yuhong, on her experience with e-Learning. She recalls a few facts:
- a decent videoconference setup for a classroom is less than $5000;
- MIT is setting itself up to become the major competitor in the future education market through e-Learning: WebLab and OpenCourseWare;
- we know of some tremendously successful endeavours like MusicGrid led by Martin Brooks.
I think Downes once wrote that while physical classrooms won’t go away, they will increasingly become a lifestyle choice. In the near future, when my son attends college (if he does so), he will find a very different landscape. There will be many high-quality learning opportunities outside classrooms, to the extent that he may avoid classrooms entirely and actually get an even better education. On the other hand, the remaining classrooms will be high-tech classrooms with remote instructors, remote laboratories and so on.
I stumbled on an interesting SQL article on the Misuse of the FROM Clause. The author argues that FROM clauses should refer to only two types of tables:
- those from which you want values returned
- those needed to join two or more tables in the above category
In other words, if your select is on tables A and B, then you can select from tables A and B, and any table that can be joined with A and B, but no others.
The argument he offers is based on performance concerns. It does seem to me that any query not fulfilling this requirement would have to be relatively complex.
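To make the rule concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (authors, books, publishers) and their contents are hypothetical and are not taken from the article. The second query lists a table that supplies no returned values and joins nothing, so every result row is silently repeated once per row of that table.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE publishers (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO books VALUES (1, 1, 'First Book'), (2, 2, 'Second Book');
    INSERT INTO publishers VALUES (1, 'P1'), (2, 'P2'), (3, 'P3');
""")

# FROM names only the tables whose values we return, joined on a key.
good = conn.execute("""
    SELECT a.name, b.title
    FROM authors a, books b
    WHERE b.author_id = a.id
""").fetchall()

# publishers returns no values and joins nothing, so the database
# builds a cross product and repeats each row once per publisher.
bad = conn.execute("""
    SELECT a.name, b.title
    FROM authors a, books b, publishers p
    WHERE b.author_id = a.id
""").fetchall()

print(len(good), "rows without the extra table")
print(len(bad), "rows with the extra table")
```

The distinct rows are the same in both cases; the extra table only adds work and duplicates, which fits the performance argument.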
Some days ago, I stated on this blog that I had a Ph.D. in mathematics (true fact) and that I didn’t know my own phone number nor did I know multiplication tables (also true). My wife knows it is true. She still claims she has superior brain power because not only does she know our phone number, but she even knows our postal code, and she knows many other things. There is no question that my wife is one of the smartest ladies in Montreal. Hey! There is a reason why I fell in love with her!
Still, I claim not to be a brain-damaged moron despite these apparent shortcomings. You see, I do not memorize on purpose because I think that my time is better used by solving problems and learning new tricks.
From Downes, I got the following bit of wisdom telling me I’m not alone in thinking that memorizing facts is not key to learning…
My own research – research that can be extended through the many resources on this site – has already convinced me that neural structures are, as they say, plastic. For me what this means is that learning based on the fostering of habits is more important than learning based on transmission of facts, that, indeed, the facts aren’t that important at all, not nearly as important as modelling effective practice, paying attention to environment, immersive, experiential based education.
So, please, do me a favor: if you teach, do not ask your students to memorize. Ask them to change their neural pathways, their thinking patterns… let their PDAs and the Web be a fact storage unit, don’t waste their brains.
Update: A colleague who has training in history and who holds a Ph.D. says he could never remember dates, and only memorized one: December 25th 800. So, I can say that I’m not alone in thinking that memorization is only a minor part of learning.
- SOAP is difficult to debug. The SOAP message format is verbose even by XML standards, and decoding it by hand is a great way to waste an afternoon. As a result, development took almost twice as long as anticipated.
- The fact that all requests happened live over the network further hampered debugging. Unless the user was careful to log debugging output to a file it was difficult to determine what went wrong.
- SOAP doesn’t handle large amounts of data well. This became immediately apparent as we tried to load a large data import in a single request. Since SOAP requires the entire request to travel in one XML document, SOAP implementations usually load the entire request into memory. This required us to split large jobs into multiple requests, reducing performance and making it impossible to run a complete import inside a transaction.
- Network problems affected operations that needed to access multiple machines, such as the program responsible for moving templates and elements. Requests would frequently timeout in the middle, sometimes leaving the target system in an inconsistent state.
Here are some comments by Joe Walnes on his experience with SOAP. The scary thing is that he comes to exactly the same conclusions as I did on my own… Do any SOAP supporters out there want to answer these points:
On the last system I worked on, we were struggling with SOAP and switched to a simpler REST approach. It had a number of benefits.
Firstly, it simplified things greatly. With REST there was no need for complicated SOAP libraries on either the client or server, just use a plain HTTP call. This reduced coupling and brittleness. We had previously lost hours (possibly days) tracing problems through libraries that were outside of our control.
Secondly, it improved scalability. Though this was not the reason we moved, it was a nice side-effect. The web-server, client HTTP library and any HTTP proxy in-between understood things like the difference between GET and POST and when a resource has not been modified so they can offer effective caching – greatly reducing the amount of traffic. This is why REST is a more scalable solution than XML-RPC or SOAP over HTTP.
Thirdly, it reduced the payload over the wire. No need for SOAP envelope wrappers and it gave us the flexibility to use formats other than XML for the actual resource data. For instance a resource containing the body of an unformatted news headline is simpler to express as plain text and a table of numbers is more concise (and readable) as CSV.
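The payload point is easy to check for yourself. The sketch below wraps a small table of numbers in a SOAP 1.1 envelope and compares it with the same table as CSV. The element names (table, row, value) and the sample data are my own assumptions for illustration; a real SOAP toolkit would typically add more (type attributes, headers), so treat this as a rough lower bound on the overhead rather than a measurement.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def as_csv(rows):
    """Serialize rows as plain CSV, one line per row."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def as_soap(rows):
    """Wrap the same rows in a bare SOAP envelope.

    The table/row/value element names are hypothetical.
    """
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    table = ET.SubElement(body, "table")
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for v in row:
            ET.SubElement(row_el, "value").text = str(v)
    return ET.tostring(envelope, encoding="unicode")


rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # made-up sample data
csv_payload = as_csv(rows)
soap_payload = as_soap(rows)

print("CSV bytes: ", len(csv_payload.encode("utf-8")))
print("SOAP bytes:", len(soap_payload.encode("utf-8")))
```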
Through Didier, I got to Victor Shoup’s Home Page. He has an on-line textbook called A Computational Introduction to Number Theory and Algebra. It is unclear whether he intends the textbook to remain free, but it is pretty cool of him to post the book on his home page. Shoup is an expert in cryptography.
Here’s a page of links reporting many SOAP Problems. Glad to see I’m not the only one who doesn’t get SOAP.
From Peter Turney, here are two books to convince you that analogies are an important concept: Metaphors We Live By and Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being.
From the wish-I-was-there Department, here’s a review of Stephen Downes’ keynote at ITI.
There are few people that can be called “visionary”. I’ve met very few. Very few can pass my tests over and over because often, you discover they had one idea and the rest is just fluff or posturing. Stephen is the real thing. That doesn’t mean he makes a lot of friends along the way. However, maybe the people who should listen to him just don’t understand him and that’s why he doesn’t get shot.
Here’s a quote from a post by James Robertson:
If you make sure that you do exactly what the other guys do,
you have made a risk averse decision – you won’t fail any worse than they do,
but you also won’t succeed any better.
Cringely points out that Apple is slowly making the normal retail and marketing process obsolete.
This is the end of the RIAA and the big recording industry. Apple in the last year has signed deals with more than 300 independent record labels, most of them not big enough to do much promotion. But now they don’t have to because that promotion will be handled by mtv.com and every music web logger, now that they have a material incentive to make recommendations and print lists. If I recommend a song — IF I JUST TYPE A FEW WORDS — and a thousand people decide to download based on my recommendation, heck, I just made $50 bucks. This is like sending tens of thousands of record sales people out on the road except that they can sell anything THEY like — any of the one million iTunes songs — making them salespeople with real conviction and maybe even with good taste. Maybe.
To me, this is extremely interesting. When the dot-com era started, people began talking about the new economy. It was a catchy phrase, but it turned out to be wrong. There wasn’t a new economy, yet, but mostly an extension of the old one using new tools. However, the new economy is slowly emerging out of the burning ashes of the old one. Here is what is being transformed forever and dramatically: marketing and distribution channels. I think we are moving to a more distributed world. And I have the nagging feeling that Internet publishing will be the core element. We will buy and sell according to what we read and experience on the Web. Who controls that? Right now, the rising force is blogs. Blogs are essentially distributed publishing units. This is where the future lies, maybe.
Mathworld is a mathematical encyclopedia on the Web. Up until now, I thought it was the only one. I was a bit annoyed at having to use Mathworld because it is owned by the Mathematica people and so, you never know when they will pull a Microsoft on you.
Didier (whom I wrongly assumed to be from France initially) pointed out PlanetMath. The cool thing about PlanetMath is that the content is great and released under GPL. This means that they won’t pull a Microsoft on you! You can copy the content and redistribute it if you so wish. They can close their servers, but the data itself is free, free to go with someone else, free to be reproduced, free.
Last night, I got my wife to watch Star Trek IV again with me. I got all excited when I found out that 3M researchers had invented what seems to me to be transparent aluminium. Of course, as everyone knows, transparent aluminium was passed on to us by Scotty, the famous spaceship engineer, when the crew of the Enterprise travelled back in time to save whales.
Before you start wondering: no I did not fail at anything today. In fact, my life is rather smooth going and while you routinely get bad and good and not so good and not so bad reviews from time to time, all my projects are proceeding forward better than I had a right to expect.
But like so many people, I’m haunted by the constant fear that I may fail. I was reminded of how hard it is by the pressure some Canadian athletes have reported feeling at the Olympics these days. Constant fear of failure is hard because even if your life is beautiful and you succeed in everything, you are still focused on possible failures. Ok. I’ll admit. I’m a pessimist. Or rather, a realist living in a bleak world.
Why do I fear failure so much? Failure is a neutral or even positive force. In fact, many times when I failed, I’ve actually been glad of the failure and found positive things in it… I don’t know… You might not get in the school you want, but you end up getting in an even better school. You do not get to see the movie you wanted to see, but you get to see an even better movie.
I suspect that there is a little cave man in me who fears he’ll get eaten by a dinosaur (yes, I know, I’ve watched too many Flintstones). Failure might be really bad… like having your feet in a dinosaur’s mouth and expecting the dinosaur to start eating you up.
What I know for certain is that fear of failure is a negative force in most of my life. It distracts me. Pulls me away from my family. Makes me dumber. Takes my eyes away from the road and onto the ravine where my car will end up.
Two profs allegedly got fired because they refused to grade students based on “effort” instead of results. Not that I think that recognizing effort in the grading is such an evil thing… and maybe the policy was even acceptable… Saying that students attending all lectures will pass the course might have its advantages… but the fact that the fellows were fired tells us something about the state of education in North America right now… I think there is clearly a downward spiral as far as the academic level goes. Not that I think it is necessarily bad.
It is a bit troubling in the following way however. If the Internet is making information more widely available than before, and the university is no longer the holder (and certainly not creator) of knowledge… I was thinking that universities could still authenticate knowledge: provide proof to someone that you do, in fact, know about archeology. But I forgot that academic levels have been going down in the last 20 years or so. So what will remain?
Someone commented in one of my earlier posts that universities are good at organizing knowledge. Knowledge might be readily available through Google, but it isn’t validated or organized very well. I guess this is true: university professors are pretty good at determining what is sensible knowledge, with the unavoidable mistakes and bias. We are also pretty good at organizing it in a sensible fashion. However, time and time again, studies show that students overwhelmingly enrol in courses and degrees, not to learn, but for the recognition they get. They don’t care so much about the work professors do to organize and validate knowledge. If we lower the academic levels further, could it be that students will just leave universities? I think that if we ever reach the tipping point where corporations lose confidence in the training students receive, and this day is around the corner, we’ll be in trouble.
Thanks to my colleague Jean Robillard, I found out that philosophers do Knowledge Management too! Following a request I made, Jean suggested I read an Outline of a Theory of Strongly Semantic Information by L. Floridi.
He starts out by asking how much information there is in a statement. Well, in a finite discrete world (the realm where Floridi appears to live), you can reasonably define “information content” in terms of how many possibilities the statement rules out. For example, suppose my world is made of two balls, each of which can be either red or blue, so that my world has 4 possible states. If I say that “ball 1 is blue”, there are only 2 possibilities left (ball 2 is either red or blue), so I have ruled out 2 possibilities and my information content is 2. If I say “both balls are blue”, only 1 possibility is left, so my information content is 3. You can see right away that a self-contradictory statement (“ball 1 is blue, both balls are red”) rules out all 4 possibilities, so it has maximal information content. A tautology (“ball 1 is either blue or red”) rules out nothing and has 0 information content. Floridi is annoyed by the fact that a self-contradictory statement has maximal information content.
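The counting can be checked mechanically. Here is a small sketch of my own (not code from Floridi's paper) that enumerates every state of the two-ball world and counts how many of them each statement rules out. Representing statements as Python predicates is just an assumption made for convenience.

```python
from itertools import product

COLOURS = ("red", "blue")
# A state is a pair: (colour of ball 1, colour of ball 2).
STATES = list(product(COLOURS, repeat=2))


def information_content(statement):
    """Count the states in which the statement is false (ruled out)."""
    return sum(1 for state in STATES if not statement(state))


def ball1_blue(state):
    return state[0] == "blue"


def both_blue(state):
    return state == ("blue", "blue")


def contradiction(state):
    # "ball 1 is blue, both balls are red": can never hold.
    return state[0] == "blue" and state == ("red", "red")


def tautology(state):
    # "ball 1 is either blue or red": always holds.
    return state[0] in COLOURS


for stmt in (ball1_blue, both_blue, contradiction, tautology):
    print(stmt.__name__, information_content(stmt))
```

Enumerating states like this only works because the world is finite and tiny, which is exactly the setting described above.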
In section 5, he points out that statements are not only either true or false, but they have a degree of discrepancy. So, for example, I can say that I have some balls. This is a true statement, but with high discrepancy. However, I can say that I have 3 balls when in fact I have 2 balls and while false, this is a statement with lower discrepancy, and maybe a more useful statement. Apparently, he borrows this idea from Popper, but no doubt this is not a new idea.
He comes up with conditions on a possible measure of discrepancy between -1 and 1. -1 means that the statement is totally false and matches no possible situation (“I have 2 and 3 balls”), 0 means that you have a very precise and true statement (“I have 2 balls”), and 1 means that I have a true, but maximally vague statement (“I have some number of balls”). What he is getting at is that both extremes (-1 and 1) are equally useless, but that things near zero are equally useful (either false or true). Let’s call this value upsilon.
Then, he defines the degree of informativeness as 1-upsilon^2.
This solves the problem we had before. The statement “ball 1 is blue, both balls are red” will now have an upsilon value somewhere between -1 and 0, so it will have some degree of informativeness, but nothing close to the maximal. The statement “ball 2 is either red or blue” will have upsilon = 1 and so will have a degree of informativeness of 0. Finally, “ball 1 is blue” will have upsilon positive but less than 1, and possibly close to 0, so that it will have a good degree of informativeness.
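As a quick sanity check of the formula, here is a sketch that computes 1-upsilon^2 for the three statements. Only the tautology's value (upsilon = 1) comes from the discussion above; the other two upsilon values are numbers I made up to fall in the stated ranges, since I have not worked out how Floridi would actually assign them.

```python
def informativeness(upsilon):
    """Degree of informativeness, 1 - upsilon^2, for upsilon in [-1, 1]."""
    if not -1.0 <= upsilon <= 1.0:
        raise ValueError("upsilon must lie in [-1, 1]")
    return 1.0 - upsilon ** 2


# Upsilon values: 1.0 for the tautology is from the text; the other
# two are illustrative guesses within the ranges described.
examples = {
    "ball 1 is blue, both balls are red": -0.8,  # false, between -1 and 0
    "ball 2 is either red or blue": 1.0,         # true but maximally vague
    "ball 1 is blue": 0.3,                       # true, fairly precise
}

for statement, upsilon in examples.items():
    print(f"{statement!r}: upsilon={upsilon}, "
          f"informativeness={informativeness(upsilon):.2f}")
```

Because upsilon is squared, a statement at -1 and one at +1 both score 0, which is how the formula treats the contradiction and the tautology as equally uninformative.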