Computer-free math is obsolete

Just read Doron Zeilberger’s 72nd Opinion. The man is more than a little pretentious (“mathematician is really another species, higher than homo sapiens”), yet he loses none of my respect.

Back in 2002, in a café in the old town of Saint Malo, I had a discussion in which I tried to convey essentially the following message, though not nearly as well put:

(…) most of human mathematics is completely useless. It was developed by humans for human consumption. In order for humans to understand it, it had to proceed in tiny steps, each comprehensible to a human. But if we take the “mesh size” of each step, dA, to be larger, one can do potentially much bigger and better things, and the computer’s dA is much larger, so we can (potentially) reach a mountain-top much faster, and conquer new mountain-tops where no humans will ever tread with their naked brains.

So this activity of computer-generated mathematics is the future. Unfortunately, many human mathematicians still don’t realize the importance of this activity, and dismiss it as “just a computer program” and “no new mathematics”.

At the time, the mathematician I was talking to, a man I greatly respect, objected that nobody can predict what will turn out to be useful, so claiming that non-computable mathematics is worth less than computable mathematics is simply foolish. That silenced me. Indeed, while I believe that algorithms are a higher form of mathematics, I cannot prove it, and neither can Zeilberger, but he makes a great case for it:

[computable math] is a methodology that will make all computer-free math obsolete very soon.

PyX – Python graphics package

PyX is a Python package for the creation of PostScript and PDF files. It combines an abstraction of the PostScript drawing model with a TeX/LaTeX interface. Complex tasks like 2d and 3d plots in publication-ready quality are built out of these primitives.
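
If you have never seen PyX, here is a minimal sketch of the drawing model (the usual “hello world” style example; the file name is arbitrary):

from pyx import canvas, path

# A canvas collects drawing and typesetting operations and writes them out.
c = canvas.canvas()
c.text(0, 0, "Hello, world!")                 # text is typeset by TeX
c.stroke(path.line(0, 0.5, 2, 0.5))           # a straight line segment
c.stroke(path.curve(0, 1, 1, 2, 2, 2, 3, 1))  # a Bezier curve
c.writeEPSfile("hello")                       # produces hello.eps
c.writePDFfile("hello")                       # and hello.pdf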

Here is the type of thing you can do: the example below draws a tilted washboard curve, complete with LaTeX axis labels and annotations. Naturally, the code is a bit hairy:


from math import pi, cos
from pyx import *
from pyx.deco import barrow, earrow
from pyx.style import linewidth, linestyle
from pyx.graph import graphxy
from pyx.graph.axis import linear
from pyx.graph.axis.painter import regular
from pyx.graph.style import line
from pyx.graph.data import function

mypainter = regular(basepathattrs=[earrow.normal], titlepos=1)

def mycos(x): return -cos(x)+.10*x

g = graphxy(height=5, x2=None, y2=None,
            x=linear(min=-2.5*pi, max=3.3*pi, parter=None,
                     painter=mypainter, title=r"$\delta\phi$"),
            y=linear(min=-2.3, max=2, painter=None))
g.plot(function("y(x)=mycos(x)", context=locals()),
       [line(lineattrs=[linewidth.Thick])])
g.finish()

x1, y1 = g.pos(-pi+.1, mycos(-pi+.1))
x2, y2 = g.pos(-.1, mycos(-.1))
x3, y3 = g.pos(pi+.1, mycos(pi+.1))
g.stroke(path.line(x1-.5, y1, x1+.5, y1), [linestyle.dashed])
g.stroke(path.line(x1-.5, y3, x3+.5, y3), [linestyle.dashed])
g.stroke(path.line(x2-.5, y2, x3+.5, y2), [linestyle.dashed])
g.stroke(path.line(x1, y1, x1, y3), [barrow.normal, earrow.normal])
g.stroke(path.line(x3, y2, x3, y3), [barrow.normal, earrow.normal])
g.text(x1+.2, 0.5*(y1+y3), r"$2\pi\gamma k\Omega$", [text.vshift.middlezero])
g.text(x1-.6, y1-.1, r"$E_{\rm b}$", [text.halign.right])
g.text(x3+.15, y2+.20, r"$2J_k(\varepsilon/\Omega)+\pi\gamma k\Omega$")
g.writeEPSfile("washboard")
g.writePDFfile("washboard")

(Source: 0xDE)

Canadian Semantic Web Working Symposium

The Canadian Semantic Web Working Symposium was held yesterday in Quebec City, and it was a great success. Of particular interest was the keynote speaker, Michael Huhns, who argued that web services ought to be thought of as agents. The best paper award went to Jie Zhang and Robin Cohen for their paper “A Trust Model for Sharing Ratings of Information Providers on the Semantic Web,” which I thought was really well thought out. You can buy the proceedings on Amazon.

How to recover “lost” changes in CVS

Suppose someone destroyed one of your file revisions by checking in a file that undoes the changes you made. Your version is still in the CVS tree (you are using version control, aren’t you?), but you don’t know how to merge your changes back into the current version of the file.

Well, here’s a hack that will do it:


cvs update myfile                                # make sure the working copy is current
cvs update -r 1.39 -p myfile > mychanges         # dump the revision that has your changes (1.39 here)
cvs update -r 1.38 -p myfile > beforemychanges   # dump the revision just before it
merge myfile beforemychanges mychanges           # three-way merge your changes back into myfile

I’m sure there is a cleaner way to do it.
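
If I recall correctly, that cleaner way is the -j (join) option of cvs update, which merges the changes made between two revisions straight into your working copy (revision numbers as in the example above):

cvs update -j 1.38 -j 1.39 myfile    # fold the 1.38 -> 1.39 changes into the working file
cvs commit -m "restore lost changes" myfile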

ACL 2006 accepted papers

The list of ACL 2006 accepted papers is up. Here are a few that caught my eye:

You Can’t Beat Frequency (Unless You Use Linguistic Knowledge) — A Qualitative Evaluation of Association Measures for Collocation and Term Extraction
Joachim Wermter and Udo Hahn

Are These Documents Written from Different Perspectives?
A Test of Different Perspectives Based On Statistical Distribution Divergence
Wei-Hao Lin and Alexander Hauptmann

Scaling Distributional Similarity to Large Corpora
James Gorman and James Curran

Expressing Implicit Semantic Relations without Supervision
Peter Turney

Autocompletion in the Python console

Hans tells us how to have autocompletion in the Python console:


import readline, rlcompleter               # rlcompleter registers a completer for Python names
readline.parse_and_bind("tab: complete")   # bind the Tab key to that completer

This is brilliant, but I wonder how I would ever have guessed this exact sequence of code. Isn’t it a bit obscure?

Anyhow, Hans also tells us how to have this run automatically every time the console is launched, which is very nice.
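
In case it helps, the standard mechanism for the automatic part is the PYTHONSTARTUP environment variable: point it at a small script and the interpreter runs it before every interactive session. A minimal sketch (the file name ~/.pythonrc.py is just a convention, not something Python requires):

# ~/.pythonrc.py -- run at the start of every interactive session once you set,
# for example in your shell profile:
#   export PYTHONSTARTUP=~/.pythonrc.py
import readline, rlcompleter
readline.parse_and_bind("tab: complete")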

STXXL: C++ Standard Template Library for Extra Large Data Sets

I haven’t tried it, but the description is cool:

The core of STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

They also have external memory suffix arrays.

Source: geomblog.

Update: Hazel pointed me to TPIE:

TPIE is a software environment (written in C++) that facilitates the implementation of external memory algorithms. The goal of theoretical work in the area of external memory algorithms (also called I/O algorithms or out-of-core algorithms) has been to develop algorithms that minimize the Input/Output communication (or just I/O) performed when solving problems on very large data sets. The area was effectively started in the late eighties by Aggarwal and Vitter and subsequently I/O algorithms have been developed for several problem domains.

Efficient Storage Methods for a Literary Data Warehouse

FACULTY OF COMPUTER SCIENCE
NOTICE OF ORAL DEFENCE

MCS Degree

Efficient Storage Methods for a Literary Data Warehouse
By
Steven W. Keith
Examining Committee:
Supervisors: Dr. Owen Kaser (UNBSJ), 
Dr. Daniel Lemire (Adjunct Prof., Univ. of Quebec)
Chairperson: Dr. Larry Garey (UNBSJ)
Internal Reader: Dr. Weichang Du
External Reader: Dr. George Stoica (UNBSJ)
 
Monday, May 29, 2006
10:30 a.m.
VIDEO CONFERENCE
UNBSJ LOCATION: MacMurray Room, Oland Hall, Rm 203
UNBF LOCATION: Multimedia Center (1st floor, room 126), Marshall D'Avray Hall

ABSTRACT

Computer-assisted reading and analysis of text has applications in the
humanities and social sciences. Ever-larger electronic text archives have
the advantage of allowing a more complete analysis but the disadvantage
of forcing longer waits for results. This thesis addresses the issue of
efficiently storing data in a literary data warehouse. The method in which
the data is stored directly influences the ability to extract useful, 
analytical results from the data warehouse in a timely fashion. 
A variety of storage methods including mapped files, trees, hashing,
and databases are evaluated to determine the most efficient method
of storing cubes in the data warehouse. Each storage method's ability
to insert and retrieve data points as well as slice, dice, and roll-up a
cube is evaluated. The amount of disk space required to store
the cubes is also considered. Five test cubes of various sizes are used to 
determine which method being evaluated is most efficient. The results lead
to various storage methods being efficient, depending on properties of the 
cube and the requirements of the user.



ALL GRADUATE STUDENTS ARE ENCOURAGED TO ATTEND



Curse of Dimensionality and intuition

Yaroslav has this thought provoking article on the Curse of Dimensionality:

(…) consider a cube of width 1. As dimension increases, the volume stays the same. But (…) eventually almost all the mass is concentrated in the corners (meaning outside of the inscribed sphere).

The plot is especially shocking: at d=8, the sphere of diameter 1 inscribed in the unit cube has a negligible volume!
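
To see the effect numerically, here is a quick sketch using the standard formula for the volume of a d-dimensional ball of radius r, pi^(d/2) r^d / Gamma(d/2 + 1), applied to the ball of diameter 1 inscribed in the unit cube:

from math import pi, gamma

def inscribed_ball_volume(d):
    """Volume of the ball of diameter 1 (radius 1/2) inscribed in the unit d-cube."""
    return pi ** (d / 2.0) / gamma(d / 2.0 + 1) * 0.5 ** d

for d in (2, 4, 8, 16):
    print(d, inscribed_ball_volume(d))
# 2   0.785...  (the familiar pi/4)
# 4   0.308...
# 8   0.0158... (under 2% of the cube's volume)
# 16  0.0000036...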

Beyond the algorithmization of the sciences

Thomas Easton promotes algorithms as a higher form of science in his paper Beyond the algorithmization of the sciences:

Algorithms have thus made biology as useful a science as physics, chemistry, and computer science. But are algorithms enough to move biology closer to the throne? Are they math?
(…)
Five decades ago, most mathematicians would have said no. Then in the 1970s, they discovered the value of computers for “proving” theorems (…)

Myself, I have not spent much time thinking about it, but it is quite clear that I value algorithms at least as highly as I value theorems.

There’s No Such Thing as a Learning Object

Michael Feldstein, Assistant Director, SUNY Learning Network, says that there is no such thing as a learning object.

We learn by doing. We consider. We compare. We measure, discuss, debate, critique, test, and explore. We try, fail, and try again. Learning is an activity. It’s a process. Given this undeniable fact, the term “learning object” can only be an oxymoron. An object is a thing. We don’t learn from things. We learn from doing things.
(…)
I believe the term “learning object” has become harmful. It hides the same old, bad lecture model behind a sexy buzz phrase.

Harold calls for the death of learning objects

David Wiley said that learning objects are a bad idea. Albert Ip points out that standards in eLearning technology bring no visible benefit. Now, Harold Jarche adds his two cents to this “debate” (if debate it is):

In a recent project where I reviewed the business case for SCORM implementation, I found no evidence of a market for digital learning objects. There were several vendors offering SCORM conversion or SCORM implementation assistance, but no one was actually buying and selling objects. The bet seems to be that standards will create the market, as shipping containers enabled the free flow of goods over various forms of transportation. Here I disagree, because learning cannot be “containerised”.
In theory, reusable digital learning objects make sense, but in practice they don’t work. The problem is that learning objects cannot be separated from their context.

In the drive to make money in the learning business, too many people are trying to find a way to codify pieces of the messy, personal process known as learning. The learning content market is based on the premise that these pieces can be quantified and therefore owned by someone.

A Star Is Made – New York Times

A pretty exciting article in the New York Times about what “talent” is. The short story is that you should practice hard while making sure you get immediate feedback and have clearly defined goals.

when it comes to choosing a life path, you should do what you love — because if you don’t love it, you are unlikely to work hard enough to get very good. (…) what they really lack is the desire to be good and to undertake the deliberate practice that would make them better.

(…) there is surprisingly little hard evidence that anyone could attain any kind of exceptional performance without spending a lot of time perfecting it.

(…) Students should be taught to follow their interests earlier in their schooling, the better to build up their skills and acquire meaningful feedback. (…)

(…) two key elements of deliberate practice: immediate feedback and specific goal-setting.

(Got it from Unreasonable.)

Multidimensional OLAP Server for Linux as Open Source Software

Jedox will release a free open source Linux MOLAP server by the end of the year. A pre-release of the software is expected by mid-2005.

All data is stored entirely in memory. Data can not only be read from but also written back to the cubes. Like in a spreadsheet, all calculations and consolidations are carried out within milliseconds in the server memory while they are written back to the cube.

I sure hope that by “memory” they include “external” memory because otherwise, their cubes are going to have to be quite small. Normally, you would at least memory-map large files, as Lemur OLAP does.
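
This is not how Jedox or Lemur OLAP actually do it (I have not looked at their code), but to illustrate the memory-mapping idea, here is a hypothetical sketch of a dense cube backed by a file through numpy.memmap; the shape, file name, and dimension names are made up:

import numpy as np

# A hypothetical dense 3-d cube (say product x store x day) backed by a file;
# the operating system pages data in and out on demand, so the cube can be
# much larger than available RAM.
cube = np.memmap("cube.dat", dtype=np.float32, mode="w+", shape=(1000, 500, 365))

cube[42, 7, 100] = 19.99        # writes go to the mapped file
total = cube[42, 7, :].sum()    # a "slice" along the time dimension
cube.flush()                    # push dirty pages back to disk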

How do people balance out precision and recall?

In Information Retrieval, you cannot have both great recall and great precision, so you have to balance the two. What are the possible criteria for picking the best recall/precision trade-off?

What I have found so far, on Wikipedia of all places, is the so-called F-measure or balanced F-score, which is merely the harmonic mean of recall and precision. It seems to have almost no theoretical foundation.
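
For reference, here is the measure in question, in its usual F_beta generalization (beta tilts the balance: beta > 1 weighs recall more heavily, beta < 1 favours precision, and beta = 1 gives back the plain harmonic mean):

def f_measure(precision, recall, beta=1.0):
    """F_beta score: the weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

print(f_measure(0.8, 0.4))          # 0.533... the balanced F-score
print(f_measure(0.8, 0.4, beta=2))  # 0.444... recall weighed more heavily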

Has anyone out there proposed an exciting way to pick your recall and precision?

I don’t want to be too critical of the field of Information Retrieval, but sometimes, when I read papers from the 1950s, it feels like they already knew everything there was to know. I sure hope that people have come up with more exciting ideas than just “use the harmonic mean” to pick the best trade-off.

Update: I found this related paper:
Cyril Goutte and Eric Gaussier, A Probabilistic Interpretation of Precision, Recall and F-score, with Implication for Evaluation, but it is only a partial answer to my question.