Dear reader,
I’m looking to record everything I ever browse on the Web using Firefox. That’s right: I want a copy of every single document, Web page, and query I ever encounter, along with the content of every form I submit. The result should be protected well enough that the data cannot be accessed without access to my machine. It would be akin to the Wayback Machine, except that it would cover only the stuff I have seen. I have not yet decided whether it should record all the YouTube videos I watch, but I suppose it should.
Ideally, it should be able to write the data to a portable disk.
Why do I want to do this? Why not? Actually, I would then want to use this data to build a fancy database that would support things like drill-down or roll-up queries (à la OLAP).
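For what it’s worth, once the browsing log lives in a database, roll-up and drill-down are just aggregations at different granularities. Here is a minimal sketch in Python with sqlite3; the `visits(domain, day, page)` table and its rows are invented purely for illustration, not part of any actual plug-in:

```python
import sqlite3

# Hypothetical browsing log; the schema and rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (domain TEXT, day TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("example.com", "2007-01-01", "/a"),
        ("example.com", "2007-01-01", "/b"),
        ("example.com", "2007-01-02", "/a"),
        ("youtube.com", "2007-01-02", "/watch"),
    ],
)

# Roll-up: aggregate page views from (domain, day) up to domain alone.
rollup = conn.execute(
    "SELECT domain, COUNT(*) FROM visits GROUP BY domain ORDER BY domain"
).fetchall()
print(rollup)  # [('example.com', 3), ('youtube.com', 1)]

# Drill-down: break one domain's total back out by day.
drilldown = conn.execute(
    "SELECT day, COUNT(*) FROM visits WHERE domain = ? GROUP BY day ORDER BY day",
    ("example.com",),
).fetchall()
print(drilldown)  # [('2007-01-01', 2), ('2007-01-02', 1)]
```

A real OLAP setup would precompute these aggregates in a cube, but over a personal browsing log a plain GROUP BY is likely fast enough.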
If you know how to build this, want to help, or know of such a plug-in, please drop me a line.
I’m also running a competition to decide what to call such a thing. I initially thought about calling it a webex, but it turns out that’s a trademark. Then I decided that Hammerspace might be a better name, or maybe Magic Satchel? What do you think?
Hammerspace sounds cool, and the idea behind it sounds great too.
I’d be willing to help with developing it. I’ve been meaning to try writing a Firefox plugin for some time now. I don’t know the first thing about it yet, but I’m a quick learner.
Does it have to be a Firefox plugin? I’m guessing you could set up a proxy server that records everything it proxies. You’ll pay a bit of a performance penalty, but it would be a nice and simple solution.
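To make the proxy suggestion concrete, here is a minimal sketch in Python. It assumes plain GET-only HTTP traffic, and the `record` helper and `browse-log` directory are my own inventions; a real solution would also have to handle HTTPS CONNECT tunnels, POST bodies, and response headers:

```python
import hashlib
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

LOG_DIR = "browse-log"  # assumed location; in practice, point this at the portable disk


def record(url, content, log_dir=LOG_DIR):
    """Save one response body under its content hash and index it by URL."""
    os.makedirs(log_dir, exist_ok=True)
    name = hashlib.sha256(content).hexdigest()
    with open(os.path.join(log_dir, name), "wb") as f:
        f.write(content)
    with open(os.path.join(log_dir, "index.txt"), "a") as f:
        f.write(f"{url}\t{name}\n")
    return name


class RecordingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use this proxy sends the full URL
        # in the request line, so self.path is the absolute URL.
        with urllib.request.urlopen(self.path) as resp:
            body = resp.read()
        record(self.path, body)
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To run: HTTPServer(("127.0.0.1", 8080), RecordingProxy).serve_forever()
# then set the browser's HTTP proxy to 127.0.0.1:8080.
```

Because it sits below the browser, this works with any browser, not just Firefox, which is the main appeal of the proxy route.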
Maybe the ScrapBook extension could help you? http://amb.vis.ne.jp/mozilla/scrapbook/
Try Charles:
http://xk72.com/charles/index.php
It is a proxy solution and you will need to somehow make it persist the data (it records everything into memory), but it does the other 90% of what you need.
Very cool idea. Sounds like something I might want to use myself. Personally, I’d probably favour the proxy solution because I use a variety of different browsers.
I wrote something similar while at NRC. I needed a Web crawler that could evaluate JavaScript in an authentic browser environment, complete with a DOM and all the usual embedding connections. So I wrote the crawler as a Mozilla extension. It didn’t actually write things to file (it wrote summaries to a DB), but that’s pretty easy. What it *did* do was figure out all the containment relationships so that it could fix up links for local use, the way wget does. It was actually kind of tricky figuring out how to associate independent requests with each other — e.g., was that page loaded into a top level window, or into an iframe on some page? But Mozilla (and hence Firefox) has enough hooks to do it, if only barely.
I’d love to help. I wish I had the time. If a few months go by and you’re still looking for something like this, the timing might be better for me.
Um. No one mentioned Slogger? That’s exactly what you want.
http://www.kenschutte.com/slogger/
Please add Firefox cookie/bad-web-site immunization in the next version! Firefox 2 cannot reject third-party cookies!