Oracle has recently bought Innobase, the company that makes InnoDB, one of the libraries MySQL relies upon for storing its tables. One user on Slashdot made the following insightful comment:

Among the technologies that MySQL licenses from third parties under commercial redistribution licenses:

Berkeley DB (Sleepycat Software)
InnoDB (Oracle, formerly Innobase)
MaxDB (SAP AG)

See the problem? MySQL itself is largely a language parser and a simple and technically inadequate storage engine (for anything where data integrity matters). In other words they don’t own any of the foundations of their technologies.

This is interesting. We always encourage developers to use and reuse existing libraries. Should MySQL be blamed for doing so?

The comparison with PostgreSQL is interesting. PostgreSQL is developed in a decentralized way, whereas MySQL is developed by a single company that relies on third-party libraries.

I think that MySQL could definitely be a fragile product whose development could be impaired by various business decisions. However, I think this has nothing to do with the fact that MySQL relies on libraries it hasn’t written, but rather with the fact that there is no community of MySQL developers.

Free Software is not a cure for world hunger. However, building software with a highly distributed community might very well be the best possible way to develop generic software.

I’m working rather intensively on a new course (Information Retrieval and Filtering) which should be offered in 2006 or 2007. This course is really a pleasure. Normally, teaching is something you do conscientiously while putting as much of your time as you can into consulting or research. You won’t see many university professors spending 60 hours a week preparing a single course. However, teaching can sometimes become something you are really passionate about. While I have published work in Information Retrieval, I never paid much attention to the field itself: I was too busy with my research to stop and fiddle with more elementary concepts such as Zipf’s law, where it comes from, and what you can do with it. Thanks to Will Fitzgerald, I now know how to use n-grams and Shannon’s measure of information to determine the language a text is written in. As a researcher, I find this highly enjoyable, and it is likely to help my research.
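To make the n-gram idea concrete, here is a minimal sketch of one common way to do it; it is not necessarily the exact method Will showed me. Each language gets a character-trigram model, and a text is assigned to the language whose model encodes it in the fewest bits per trigram (Shannon’s cross-entropy). The training samples, the add-one smoothing, and the assumed trigram vocabulary size of 27³ are illustrative assumptions only.

```python
import math
from collections import Counter

# Assumed size of the trigram space (26 letters plus space, cubed); used only
# for add-one smoothing, not a property of any particular language.
VOCAB_SIZE = 27 ** 3


def char_ngrams(text, n=3):
    """Character n-grams of `text`, lowercased and padded with one space per side."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_model(sample, n=3):
    """Count the n-grams of a training sample; returns (counts, total)."""
    counts = Counter(char_ngrams(sample, n))
    return counts, sum(counts.values())


def cross_entropy(text, model, n=3):
    """Average bits per n-gram needed to encode `text` under `model`.

    Add-one smoothing gives unseen n-grams a small nonzero probability.
    """
    counts, total = model
    grams = char_ngrams(text, n)
    if not grams:
        return float("inf")
    bits = sum(-math.log2((counts[g] + 1) / (total + VOCAB_SIZE)) for g in grams)
    return bits / len(grams)


def guess_language(text, models, n=3):
    """Pick the language whose model encodes `text` in the fewest bits."""
    return min(models, key=lambda lang: cross_entropy(text, models[lang], n))


# Tiny, hypothetical training samples; real models need far more text.
MODELS = {
    "en": train_model("the quick brown fox jumps over the lazy dog and the cat"),
    "fr": train_model("le renard brun saute par dessus le chien paresseux et le chat"),
}

print(guess_language("the dog and the fox", MODELS))
print(guess_language("le chat et le chien", MODELS))
```

Lower cross-entropy means the model “expects” the text better, which is why the minimum is taken. With realistic training corpora (a few thousand words per language), this simple scheme is usually good enough to tell common languages apart.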

Steven will be presenting our paper Analyzing Large Collections of Electronic Text Using OLAP at APICS 2005. This work is based on an idea by Owen Kaser: what happens if we apply multidimensional databases (OLAP) to literary research?

Data Mining and Information Retrieval techniques are routinely used in literary research and in text processing in general, but the decision support techniques commonly used in the business world (sometimes called “Business Intelligence”) have seen little use in text processing so far. The main difference between decision support systems and data mining is that in decision support the user remains in control, so simple but extremely efficient algorithms are favoured over sophisticated but possibly expensive ones. Ideally, all decision support algorithms should run in O(1) time, not counting precomputation. With nearly unlimited storage now within reach, decision support research is due for a technological and scientific boom.
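A classic illustration of “O(1) after precomputation” is the prefix-sum (summed-area table) technique for range queries over a data cube. The sketch below is my own illustration, not code from the paper: the count matrix is hypothetical, with rows standing for decades and columns for authors. After one linear-time pass, any rectangular range total costs four lookups.

```python
def build_prefix_sums(counts):
    """Precompute P[i][j] = sum of counts[r][c] over r < i and c < j.

    This costs O(rows * cols) time and storage, paid once up front.
    """
    rows = len(counts)
    cols = len(counts[0]) if rows else 0
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = counts[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, r0, r1, c0, c1):
    """Total of counts[r][c] for r0 <= r < r1 and c0 <= c < c1, in O(1) time."""
    return P[r1][c1] - P[r0][c1] - P[r1][c0] + P[r0][c0]


# Hypothetical counts of one word: rows are decades, columns are authors.
counts = [
    [3, 0, 5],
    [1, 4, 2],
    [0, 6, 1],
]
P = build_prefix_sums(counts)
print(range_sum(P, 0, 2, 1, 3))  # first two decades, last two authors → 11
```

The trade-off is typical of decision support: queries become constant-time, but updating a single cell can require rewriting much of the table, which is acceptable when the underlying text collection changes rarely.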

Computer-assisted reading and analysis of text has various applications in the humanities and social sciences. The increasing size of many electronic text archives has the advantage of allowing a more complete analysis, but the disadvantage of taking longer to obtain results. On-Line Analytical Processing (OLAP) is a method used to store and quickly analyze multidimensional data. By storing text analysis information in an OLAP system, a user can obtain answers to inquiries in a matter of seconds as opposed to minutes, hours, or even days. This analysis is user-driven, allowing different users the freedom to pursue their own research directions.

I got my new Logitech USB Desktop Microphone working under Linux. It should have been very easy, but I hit a small snag.

Plug the device in and type “lsusb”; you should see:

Bus 001 Device 004: ID 0556:0001 Asahi Kasei Microsystems Co., Ltd AK5370 I/F A/D Converter

Ah! The device is called AK5370.

Run “dmesg”; you should see two lines like these:

usb 1-3: new full speed USB device using ohci_hcd and address 4

usbcore: registered new driver snd-usb-audio

If you don’t see the second line, you have a problem. In my case, I didn’t have the usbaudio driver, so I only got the first line and had to compile usbaudio. To do so, I ran “uname -a”, which gave me “Linux romeo 2.6.10-gentoo-r6”. I went to /usr/src/linux-2.6.10-gentoo-r6 and typed

genkernel --no-clean --menuconfig all

Next, once the menu opened, I went under driver/audio and chose the USB audio drivers (built as loadable modules). Exiting genkernel launched the compilation of the module, and all I had to do was unplug and replug my microphone. Then check that /dev/dsp1 appears.
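If you do this often, the three checks above (lsusb, dmesg, /dev/dsp1) can be scripted. This is a minimal sketch that assumes the same device ID and log line I got; other microphones will report a different ID. Note that on some systems dmesg needs root, and /dev/dsp1 only exists when the kernel’s OSS emulation is enabled.

```python
import os
import subprocess

AK5370_USB_ID = "0556:0001"  # vendor:product ID printed by lsusb above
DRIVER_LINE = "registered new driver snd-usb-audio"  # line expected in dmesg


def run(cmd):
    """Return a command's standard output, or "" if it cannot be run."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except OSError:
        return ""


def usb_device_present(lsusb_output):
    return AK5370_USB_ID in lsusb_output


def driver_registered(dmesg_output):
    return DRIVER_LINE in dmesg_output


def check_microphone():
    """Run the checks from this post and report which ones pass."""
    return {
        "device seen by lsusb": usb_device_present(run(["lsusb"])),
        "snd-usb-audio registered": driver_registered(run(["dmesg"])),
        "/dev/dsp1 exists": os.path.exists("/dev/dsp1"),
    }


if __name__ == "__main__":
    for check, ok in check_microphone().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

If only the first check passes, the driver is missing, which was my situation before recompiling.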

All I had to do after this was launch mhwaveedit and choose “hw:1,0” as my recording device, so that I would record from my microphone rather than from my sound card. Setting the sampling rate to 44100 Hz seemed to be necessary.
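In ALSA, “hw:1,0” means card 1, device 0; card 0 is usually the built-in sound card, which is why picking it matters. For a quick scripted test without mhwaveedit, you could call arecord from alsa-utils instead. That is a swap-in, not what I used. This sketch only builds the command line; the mono channel count and the output file name are assumptions, and the hw device will reject a format it doesn’t support.

```python
import subprocess


def parse_alsa_hw(name):
    """Split an ALSA name like "hw:1,0" into (card, device) integers."""
    prefix, _, rest = name.partition(":")
    if prefix != "hw":
        raise ValueError(f"not an hw device: {name!r}")
    card, device = rest.split(",")
    return int(card), int(device)


def arecord_command(device="hw:1,0", rate=44100, channels=1, seconds=5,
                    path="mic-test.wav"):
    """Build an arecord command line for a short test recording.

    channels=1 and the file name are assumptions; adjust for your hardware.
    """
    parse_alsa_hw(device)  # fail early on a malformed device name
    return ["arecord", "-D", device, "-f", "S16_LE", "-r", str(rate),
            "-c", str(channels), "-d", str(seconds), "-t", "wav", path]


def record(**kwargs):
    """Run arecord; raises if the recording fails."""
    subprocess.run(arecord_command(**kwargs), check=True)
```

Calling `record()` records five seconds from the USB microphone at 44100 Hz into mic-test.wav.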

To enable the microphone under KDE, launch kmix and choose the appropriate device; if you don’t see the device, quit kmix (through the File menu) and restart it. That said, I don’t see why you would need the microphone under KDE. Either way, make sure you turn the gain all the way up for optimal sound quality.

Voilà! Isn’t Linux friendly?

For recording tips, see this page by Bob Cunningham.

Update: sometimes you might have to force the driver to load by running “modprobe snd-usb-audio”. In theory, modprobe shouldn’t be necessary since devices should be recognized automatically, but sometimes I need to help my kernel a bit. (Bugs?)

Through Will, I found the Martin Shuffle, a cool randomized algorithm for quickly finding songs on an MP3 player (without browsing them one by one). They implement a nice Markov Decision Process using my favorite language: Python.

