Project Gutenberg is a fantastic project that has scanned a large collection of books and made them available for free. The catch is that the books are distributed as plain text, which can make automated processing awkward: even extracting the title of a book takes some work (though it is an easy problem). However, the nice people at the HTML Writers Guild have marked up a large collection of Gutenberg books in XML, with a publicly available DTD.
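
To show how easy the title problem can be, here is a minimal Python sketch. It assumes the plain-text file has a header line of the form "Title: ..." (common in Gutenberg files, but not guaranteed for every book). The function name `extract_title` and the sample text are made up for illustration.

```python
import re
from typing import Optional

# Assumption: the header carries a line such as "Title: Moby Dick".
# Trailing spaces, tabs and a carriage return (CRLF files) are trimmed.
TITLE_RE = re.compile(r"^Title:[ \t]*(.+?)[ \t\r]*$", re.MULTILINE)


def extract_title(text: str) -> Optional[str]:
    """Return the title from the first 'Title:' line, or None if absent."""
    match = TITLE_RE.search(text)
    return match.group(1) if match else None


# Hypothetical sample header, written only for this example.
sample = (
    "The Project Gutenberg EBook of Moby Dick, by Herman Melville\n"
    "\n"
    "Title: Moby Dick\n"
    "Author: Herman Melville\n"
)

if __name__ == "__main__":
    print(extract_title(sample))
```

Books whose header does not follow this pattern return None here, which is exactly the kind of case where the XML versions should be more dependable.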

Possible application: automatically integrating a given book into a content management system (or a learning management system).

You might also want to consider GutenMark as a tool to process Gutenberg books (output to LaTeX and HTML).

1 Comment »



  1. Weekend output on del.ico.us
    Last week I was wondering whether some of the people who read my blog were subscribed to the web feed of my shared bookmarks on del.ico.us… I told myself that I should occasionally post here as well….

    Comment by A Frog in the Valley — 8/2/2005 @ 10:41
