I wanted to retrieve a local copy of my online XML course. I had instructed the technical staff to serve the XHTML files as application/xml. I believe this was to work around the limitations of Internet Explorer. In any case, I stumbled upon a wget bug! Wget won’t process a file served with the MIME type application/xml as XHTML, and hence, it won’t follow the links inside it.

A deeper limitation is that wget doesn’t know XML. This means that it will not follow references to stylesheets. Wget also doesn’t know about JavaScript.

This meant I had to write my own scripts to recover the course. First, a bash script:

wget -m -r -l inf -v -p http://www.teluq.uquebec.ca/inf6450/index-fr.htm
find -path "*.htm" | xargs ./extracturls.py | xargs wget -m -r -l inf -v -p
find -path "*.html" | xargs ./extracturls.py | xargs wget -m -r -l inf -v -p
find -path "*.xhtml" | xargs ./extracturls.py | xargs wget -m -r -l inf -v -p
find -path "*.xml" | xargs ./extracturls.py | xargs wget -m -r -l inf -v -p
find -path "*.xml" | xargs ./extracturls.py | xargs wget -m -r -l inf -v -p

Notice that the last line appears twice. Don’t do this type of scripting at home. Bad design!

Next, I need a Python script to extract the URLs I need (Perl or Ruby would also do):

#!/usr/bin/env python
import re, sys

for filename in sys.argv[1:]:
    file = open(filename)
    #print "from ", file
    for line in file:
        # better hope that we don't have repeated spaces!
        # [the first patterns of this expression did not survive: the markup
        # inside their lookbehinds was stripped from the page]
        for m in re.findall("(?<=openwindow\(').*?(?=')", line) + \
                 re.findall("(?<=stylesheet href=[\"']).*?(?=[\"'])", line):
            # rebuild an absolute URL from the mirrored file's path
            print "http://" + re.search("www.*/", filename).group() + m

This is a pretty awful hack, but it works!

Here is a project for the tech savvy among you: extend wget so that it can parse XML!
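
To give an idea of what XML-aware link extraction could look like, here is a minimal sketch in current Python (not the Python 2 of the script above). It assumes the file is well-formed XML and that links live in href or src attributes. The function name extract_links, the sample document and the example.org base URL are all made up for illustration, and it ignores xml-stylesheet processing instructions and JavaScript calls entirely.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def extract_links(xml_text, base_url):
    """Return absolute URLs found in href/src attributes of an XML document."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        for name, value in element.attrib.items():
            # Attribute names may carry a namespace, e.g. "{ns}href".
            local_name = name.rsplit("}", 1)[-1]
            if local_name in ("href", "src"):
                links.append(urljoin(base_url, value))
    return links


# Hypothetical XHTML page, as it might be served with application/xml.
sample = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><link rel="stylesheet" href="style.css"/></head>
  <body><a href="module1/index.xhtml">Module 1</a><img src="logo.png"/></body>
</html>"""

for url in extract_links(sample, "http://www.example.org/course/index.xhtml"):
    print(url)
```

Unlike the regular expressions, this does not care about spacing or quoting style inside the tags, but it will raise an error on any file that is not well-formed.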

I’m a few years behind, but while I knew you could modify classes in Python, it turns out you can do it conveniently as of Python 2.2.

Here’s how you can dynamically create a class:
>>> from new import classobj
>>> Foo2 = classobj('Foo2',(),{"mymethod":lambda self,x: x})
>>> Foo2().mymethod(2)
2

Once the class has been created, you can easily modify it as well:

>>> setattr(Foo2,'anothermethod',lambda self,x: 2*x)
>>> Foo2().anothermethod(2)
4
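
A note for readers on Python 3: the new module no longer exists there, but the built-in type, called with three arguments, does the same job. A minimal sketch of the two steps above, reusing the same names:

```python
# type(name, bases, namespace) builds a class at run time.
Foo2 = type("Foo2", (), {"mymethod": lambda self, x: x})
print(Foo2().mymethod(2))  # 2

# Adding a method after the fact works exactly as before.
setattr(Foo2, "anothermethod", lambda self, x: 2 * x)
print(Foo2().anothermethod(2))  # 4
```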

I recall that I found ways to achieve the same result by directly manipulating class objects, but this is much neater.

You can even dynamically change the class of an object:
>>> t = Foo2()
>>> t.__class__=classobj('newFoo2', (Foo2,), {"supermethod":lambda self,x: 5*x})
>>> t.supermethod(2)
10
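
The same trick can be sketched in Python 3 with the built-in type instead of classobj. The class below is a self-contained stand-in for the Foo2 of the earlier examples, and the name NewFoo2 is only illustrative. The point to notice is that the existing object keeps its old methods while gaining the new one.

```python
class Foo2:
    def mymethod(self, x):
        return x


t = Foo2()

# Build a subclass at run time and move the existing instance over to it.
NewFoo2 = type("NewFoo2", (Foo2,), {"supermethod": lambda self, x: 5 * x})
t.__class__ = NewFoo2

print(t.supermethod(2))     # 10
print(t.mymethod(2))        # 2, inherited from Foo2
print(isinstance(t, Foo2))  # True
```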

That’s pretty satisfying.

Just bought a cheap Creative Labs Instant webcam. These things are supported by the spca5xx driver. To install the driver under Gentoo, do:

emerge spca5xx

To see if your device is detected after connecting it up, do:

lsusb

You should see a line similar to this one:

Bus 002 Device 006: ID 041e:4034 Creative Technology, Ltd
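
If you would rather check for the device from a script, a line like this one is easy to pick apart. Here is a small Python sketch. The function name parse_lsusb_line is made up, and the regular expression assumes the line has exactly the layout shown above.

```python
import re

# Matches lines such as "Bus 002 Device 006: ID 041e:4034 Creative Technology, Ltd"
LSUSB_LINE = re.compile(
    r"Bus (?P<bus>\d{3}) Device (?P<device>\d{3}): "
    r"ID (?P<vendor>[0-9a-f]{4}):(?P<product>[0-9a-f]{4})\s*(?P<name>.*)"
)


def parse_lsusb_line(line):
    """Return the fields of one lsusb line as a dict, or None if it does not match."""
    match = LSUSB_LINE.match(line.strip())
    return match.groupdict() if match else None


info = parse_lsusb_line("Bus 002 Device 006: ID 041e:4034 Creative Technology, Ltd")
print(info["vendor"], info["product"], info["name"])
```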

A useful tool if you use the spca5xx driver is spcaview. In any case, initially, after plugging the webcam in, all I got was a dark screen. I believe that’s because the device needs to be initialized. I cannot be sure exactly how I activated it, except that I plugged and unplugged it while running various software. Yes, it is a bit fuzzy. Sorry.

Now, it works beautifully.

Update: You might need to plug in the device after the driver has been loaded. Try typing “rmmod spca5xx” and if that works, remove your device, load the driver (modprobe spca5xx) and plug the device back in. If you want to check if it works without any fancy software, do “more /dev/video0”: there should be no error message and you ought to read some characters (maybe blanks).
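
The same quick check can be sketched in Python: try to open the device and read a few bytes. The function name can_read and the 16-byte default are arbitrary choices of mine, and I am assuming that the driver allows plain reads on the device node, which is what the “more” test relies on as well.

```python
def can_read(path, nbytes=16):
    """Return True if `path` can be opened and read without an OS error."""
    try:
        with open(path, "rb") as device:
            device.read(nbytes)
        return True
    except OSError:
        # Missing device node, permission problem, or a driver-level read error.
        return False


print(can_read("/dev/video0"))
```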

Annoying bug: It isn’t clear why, but gnomemeeting has the voice button disabled despite the fact that my voice hardware is correctly configured. It could be that the spca5xx driver is messing up gnomemeeting or that gnomemeeting is buggy and sometimes erroneously disables the audio.

Update on annoying bug: Gnomemeeting works just fine, but for audio input to work, you have to open a connection to another user. Great software.

I have an old and dusty 3Com HomeConnect webcam. Until a few minutes ago, I thought it was dead. Not so! I plugged it into the USB port, and did

lsusb

and voilà:

Bus 001 Device 019: ID 04c1:009d U.S. Robotics (3Com) HomeConnect WebCam [vicam]

Ah! So Linux recognizes it and suggests the vicam driver. OK, there you go:

modprobe vicam

As a basis for comparison, I tried to connect the same thing to Windows XP and had no luck. The only drivers available are 16-bit drivers and they won’t install.

In gnomemeeting, I now see a fuzzy image. After taking apart the webcam’s lens and cleaning it up, the image is considerably better, but still very ugly.

I guess I have to buy a new webcam! This user-commented list seems like a good start to choose a new device for a Linux user, but the list of webcams supported by the SPCA5xx driver is impressive. It is still satisfying to know I’m buying new hardware because it is worn out, not because my OS won’t support it. BTW gnomemeeting is really great software.

The Semantic Web Services Challenge 2006 is organized by Stanford University. Phase I will be held March 8-10, 2006 whereas phase II will be held June 15-16, 2006.

The goal of the SWS Challenge is to develop a common understanding of various technologies intended to facilitate the automation of mediation, choreography and discovery for Web Services using semantic annotations. The intent of this challenge is to explore the trade-offs among existing approaches. Additionally we would like to figure out which parts of problem space may not yet be covered. The workshop aims to provide a forum for discussion based on a common application. This Challenge workshop seeks participation from industry and academic researchers developing software components and/or intelligent agents that have the ability to automate mediation, choreography and discovery processes between Web services.

And yes, I am critical of the practical side of this research. But people can do research on whatever they want, as long as their results are neat.

On the positive side of things, I believe these challenges are a great contribution to the research community. We need to have more of those.
