Through Downes, I got to this paper about Your Brain and Learning.

The new scientific understanding of our most vital organ can help us improve everything about our learning — from choosing our best times and places to learn, to setting grander goals for how much we can grow.

Some key advice: you have multiple intelligences and your brain is an organ. Having multiple intelligences means that having difficulties with one type of task in a given field does not prevent you from being a star performer. Everyone has various strengths and weaknesses and you have to learn yours. The fact that your brain is an organ means you should not expect to do your best work when you are tired or drunk. Though I have had the feeling of doing great work while being very tired.

As a researcher, I’ve grown to find out about my “style”. That’s right, all researchers have a style. I’ve got a way to do research that differs from most other researchers. I’m bad at some things, but I find that I can be quite good at others.

Mostly though, I tend to do better at research when I enjoy myself.

This is fun. This paper on Employers’ preferences for academic letter recommendations shows that the opinion of Computer Information Systems professors is more valued than that of Humanities professors:

A survey was done with 72 corporations to find out the value of professor’s reference letters. The null hypothesis was corporations value reference letters from Computer Information Systems (CIS) professors, Business professors, and Humanities & Arts professors equally. Job skills (CIS) and people skills (Humanities & Arts) were considered equally important. Results from a Friedman Test reject the null hypothesis. A Sign Test on multiple comparisons indicated that employers valued professors’ reference letters in the following order: CIS, Business, then Humanities & Arts. Future research needs to be done to see if employers value CIS reference letters stressing people skills greater then letters stressing job skills and knowledge.

What does this mean exactly? I don’t know.

This has been long coming.

In recent years, I extended my research horizon in many new directions. I think it made me into a better researcher. When you start out in research, you have a very tight focus. You may change your focus as time passes, but you tend to work on only one or, at most, two problems at a time. This is entirely justified by the fact that you are only starting now and need to achieve some results, any result, before spreading out your wings.

Yuhong was reporting on her perception that European researchers tend to remain very focused all their lives, which brings them a nice, constant flow of results… whereas North American researchers will constantly adapt and change their research focus, in part because they are always seeking new funding. I don’t know how accurate Yuhong’s view is. It does match my intuition, but only partly so. I think that many researchers, even in North America, remain very focused all their lives.

What is true, I think, is that you need to broaden your horizon at some point in time. Otherwise, while you may keep publishing at a constant rate, you are unlikely to be able to reflect in a critical fashion on your current research projects. How can you tell if your research is currently relevant if you have always worked, and always will work, on the same things, no matter what? By uncoupling yourself from your immediate research agenda, I think you become a better researcher who can not only do good research, but also learn to choose good research topics.

Ah! But here comes the downside. You can’t possibly do everything. Well, maybe you can if you have people working for you and take credit for their work.

How many research projects can you be involved in at a given time? My magic number would be 3. You could go up to 4 if you are merely finishing off one of them, but 3 seems like a reasonable number. What happens beyond this number is that I get stressed out, overworked, and I lack focus. How do I define a project? Typically as something that will produce one or two papers. As a side note, this suggests that maybe publishing 3 papers a year is a good target. In any case, you have to choose 3 projects and not get involved more than you should.

So, I wrote a list this morning. I went for a long walk, and I decided to settle on the 3 most promising research projects. This worked well, until I had to add one. So I have 4 on my list. They appear right there on my palm every time I look at my schedule.

This is quite a small number since I have a far greater number of individual projects going on right now.

This is an experiment. I need to keep this list a short list. It needs to remain under control. If I add something, I need to take something out. Will it work? I’ll report about this little experiment here.

Yuhong reflects on the different ways people approach research, and she has this conclusion:

Anyway, for most of the researchers, research is a career that needs to manage and exploit. Many are hardworking craftsmen, instead of being a master. I would like to suggest that instead of working hard as a bee to accumulate the publication list and funding, it is better to enjoy your life if you do not have splendid ideas to work on.

It has been over a week since I came back from SIAM Data Mining 2005. It is about time I put down on my blog some of the notes I took during the meeting. Note that these notes are not meant to be accurate; always refer back to the original author…

Strategies for visual data mining by Ed Wegman

I’m not very excited by visual data mining, but this was a good talk. He started out by describing the field in terms of 4 levels (in increasing order of sophistication):

  1. static graphics
  2. interactive static graphics: the user can interact with the otherwise static graphics; I presume you can zoom in or out and so on.
  3. dynamic graphics: what gets plotted can be changed; presumably the user can ask for only some types of data to be plotted as opposed to everything
  4. evolutionary graphics (streaming data)

Not all data sources are adequate for visual data mining. The scales are as follows… (in bytes)

  1. 10^2 (tiny)
  2. 10^10 (huge)
  3. 10^15 (super massive)

The most you can display is about 10^6 bytes (2333 by 1866 pixels).

He described several techniques for visual data mining:

  1. Parallel axes: instead of using one x-axis, you have several, and a point becomes a set of line segments going through all the axes
  2. Grand tour: rotate plots over all possible angles
  3. Saturation brush: different clusters with different desaturated colors, frequent white, rare black

He has other innovative ideas. For example, present multidimensional data as an image and rotate the data. He calls this the pixel tour.

A tool can be downloaded from his web site.

Segmentation algorithms for time series and sequence data by Aristide Gionis and Heikki Mannila

They work on segmenting sequences, particularly genomic sequences. For them a sequence can be a string, a time series, a sequence of events or any variant on this. A genomic sequence is a sort of string. The goal of the segmentation problem is to have homogeneous segments.

The generic solution to the segmentation problem is dynamic programming. Optimal local segmentations can be glued together. You need to construct an n by k matrix. The running time is quadratic in n or worse, which can be very bad.
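
A minimal sketch of the dynamic programming solution, under assumptions that are not in my notes: the sequence is a list of numbers, each segment is fitted by its mean, and the error is the sum of squared deviations. The name segment_dp is made up for illustration.

```python
def segment_dp(xs, k):
    """Optimal split of xs into k contiguous segments (least squared error).

    Returns (total_error, cuts) where cuts are the start indices of
    segments 2..k. Uses a (k+1) by (n+1) table and O(n^2 k) time.
    """
    n = len(xs)
    # Prefix sums give the squared error of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(a, b):
        # Squared error of fitting xs[a:b] by its mean.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # best[j][i]: lowest error when xs[:i] is split into j segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for a in range(j - 1, i):
                # Glue an optimal (j-1)-segmentation of xs[:a] to xs[a:i].
                c = best[j - 1][a] + sse(a, i)
                if c < best[j][i]:
                    best[j][i] = c
                    back[j][i] = a
    # Walk the back-pointers to recover the segment boundaries.
    cuts, i = [], n
    for j in range(k, 1, -1):
        i = back[j][i]
        cuts.append(i)
    return best[k][n], cuts[::-1]


error, cuts = segment_dp([1, 1, 1, 5, 5, 5, 9, 9], 3)
```

The three nested loops are where the quadratic cost in n comes from.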

He describes a top-down approach which adds one segmentation point at a time; it uses a heap and runs in n k log n time.

He also describes a randomized algorithm (Himberg 2001) which starts with a random segmentation, then picks a segmentation point at random and checks whether it can be moved to a better location. Open problem: does it always converge?
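
Here is a rough sketch in the spirit of that randomized approach. It is not the algorithm from the Himberg paper: the squared-error cost, the fixed number of rounds, and the names random_local_search and total_error are all assumptions made for illustration.

```python
import random


def sse(xs, a, b):
    """Squared error of fitting xs[a:b] by its mean."""
    seg = xs[a:b]
    mean = sum(seg) / len(seg)
    return sum((x - mean) ** 2 for x in seg)


def total_error(xs, cuts):
    """Total error of the segmentation whose boundaries are cuts."""
    edges = [0] + list(cuts) + [len(xs)]
    return sum(sse(xs, a, b) for a, b in zip(edges, edges[1:]))


def random_local_search(xs, k, rounds=200, seed=0):
    rng = random.Random(seed)
    # Start from k - 1 distinct random cut points.
    cuts = sorted(rng.sample(range(1, len(xs)), k - 1))
    if not cuts:
        return cuts  # a single segment has nothing to move
    for _ in range(rounds):
        i = rng.randrange(len(cuts))  # pick a segmentation point at random
        lo = cuts[i - 1] + 1 if i > 0 else 1
        hi = cuts[i + 1] if i + 1 < len(cuts) else len(xs)
        # Move it to the best position strictly between its neighbours.
        cuts[i] = min(
            range(lo, hi),
            key=lambda c: total_error(xs, cuts[:i] + [c] + cuts[i + 1:]),
        )
    return cuts
```

Since the current position is always among the candidates, the error never goes up from one round to the next, but nothing here says the result is the global optimum.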

He describes the sliding window algorithm which is provably optimal for the L_inf norm (for constant fitting?)

There are other approximation algorithms like approximate dynamic programming (Guha) and divide and segment (Terzi).

There are other interesting problems, such as finding dense intervals by dynamic programming (the Bursty algorithm) and segmenting data streams.

How do you choose the number of segments K?

He talks about BIC (Bayesian approach): the score is error plus penalty, that is, error plus K log(n), and it is related to MDL. Hypothesis testing: stop splitting segments when the new segments have the same probability distribution.
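
As a small illustration of the error plus K log(n) score, here is a sketch that picks K from a table of errors. The function name choose_k and the sample errors are made up, and the exact form of the BIC penalty (constants, log-likelihood versus raw error) varies between authors, so treat the score below as the one written in these notes and nothing more.

```python
import math


def choose_k(errors, n):
    """Return the K minimizing errors[K] + K * log(n).

    errors maps a number of segments K to the fitting error obtained with
    K segments (hypothetical input); n is the length of the sequence.
    """
    return min(errors, key=lambda k: errors[k] + k * math.log(n))


# Made-up errors: big gains up to K = 3, then almost nothing.
best_k = choose_k({1: 40.0, 2: 12.0, 3: 2.0, 4: 1.5}, n=100)
```

The penalty stops you from adding segments once the drop in error is smaller than log(n).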

Embedding by Mark Hansen

Key idea he presented was the idea of using the web to create “human sensors”. For example, using bird watchers to collect information about birds. Or plane watchers to collect information about planes. He suggests this could be used by environmentalists to monitor pollutants and stuff.

Simple Models for Customer-based Analysis by Peter Fader

This talk made quite an impression on me. He says everything must be implemented in Excel and must be understandable by smart MBA students. The guy worked on the “CDNow 1997 data set”. He assumes that only 3 variables matter: monetary value of purchases, frequency of purchases, and recency. He assumes monetary value is independent of the frequency-recency pair. He models the transaction process as follows: an active customer buys according to a Poisson distribution and dies after a time t that is exponentially distributed with rate mu. Dropout rates across users are gamma distributed. He simplifies his distributions by assuming discrete data. He points out that Poisson-Gamma distributions are often better than Zipf distributions. He says the best way to test a model is on conditional expectations. He says we find out that lesser customers (long tail?) are very important because we have many of them. [Note: he obviously focuses on retail. In a service industry, having many lesser customers is not a good thing.]

He gets to an apparent contradiction with his model: customers who haven’t bought in a long time are more valuable if they are less frequent buyers. This can be explained by his model where users die: a frequent buyer who has not bought in a long time is probably dead.
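
The “probably dead” argument can be checked with a little arithmetic. This is only a sketch under a simplifying assumption: the purchase rate lam and the dropout rate mu of a customer are taken as known, whereas the talk mixes them over gamma distributions. The function name p_alive is made up for illustration.

```python
import math


def p_alive(lam, mu, gap):
    """Probability that a customer is still alive after a silence of `gap`.

    Assumes purchases arrive as a Poisson process with rate lam while the
    customer is alive, and the lifetime is exponential with rate mu.
    Comparing "alive but silent since the last purchase" with "died at
    some point since the last purchase" gives:
        1 / (1 + mu / (lam + mu) * (exp((lam + mu) * gap) - 1))
    """
    s = lam + mu
    return 1.0 / (1.0 + (mu / s) * (math.exp(s * gap) - 1.0))


# Same silence, same dropout rate, different purchase frequencies.
frequent = p_alive(lam=2.0, mu=0.1, gap=5.0)
infrequent = p_alive(lam=0.2, mu=0.1, gap=5.0)
```

With the same silence, the frequent buyer comes out with a much lower probability of being alive than the infrequent one, which is the effect described above.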

He explains that using a simple Excel-implemented model is a great way to lock in his customers: they will come back to him for more.

The practice of cluster analysis by Jon R. Kettenring

There are two types of cluster analysis: hierarchical (linkage and Ward) and partitioning (k-means, mixture). He points out that hierarchical methods are by far more popular. He points out that cluster analysis is usually considered a form of unsupervised learning, but that even k-means is somewhat supervised: we assume clusters are about the same size and spherical.

There has been an exponential growth in the number of cluster papers between 1995 and 2003. Seems like it is slightly dropping in 2004.

Most people cluster on small data sets. It is a myth that clustering is done on large data sets.

One common problem is that the different axes use different units (cm, kg…) and so you have to “rescale”. People do something called “autoscaling”. It is not certain that this is all done with rigor.
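
As far as I can tell, “autoscaling” usually means centering each variable on its mean and dividing by its standard deviation. Here is a minimal sketch under that assumption; the function name autoscale and the sample data are made up.

```python
import statistics


def autoscale(rows):
    """Rescale every column to zero mean and unit standard deviation.

    Assumption: "autoscaling" means per-column standardization. Constant
    columns carry no information and are mapped to 0.0.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) for c in cols]
    return [
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]


# Heights in cm and weights in kg end up on the same scale.
scaled = autoscale([[150.0, 50.0], [170.0, 70.0], [190.0, 90.0]])
```

Whether giving every variable the same weight like this is appropriate for a given data set is exactly the kind of question that tends to be skipped.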

Clustering takes you away from the data and leads you to unexplainable conclusions. Clustering is often a black box. People use PCA left and right without understanding it, and it can be evil if the clusters are too large.

Truly automatic clustering is far from easy.

Online Learning by projecting by Yoram Singer (from Google)

I really liked this talk. He wants to build a linear classifier, that is, a learned vector w such that sign(w . x) classifies x (as spam or not, say). The margin of an example is the amplitude of w.x. He makes a separability assumption: the space can be separated in two with a non-zero margin.

The idea of the algorithm is to have labeled examples x come in one by one. Each time, you rotate w so that x is correctly classified, essentially projecting w onto the space of all w such that w.x has the proper sign. In a variant, you don’t project all the way and only move w by epsilon.
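
A minimal sketch of such a projection update follows. It is not the exact algorithm from the talk: the required margin of 1, the step parameter for the partial-move variant, and the names project_update and train are assumptions made for illustration. Labels y are +1 or -1 and x must be nonzero.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_update(w, x, y, margin=1.0, step=1.0):
    """Move w toward the half-space {w : y * (w . x) >= margin}.

    step=1.0 is the full Euclidean projection onto that half-space;
    0 < step < 1 only moves part of the way (the variant in the notes).
    """
    loss = margin - y * dot(w, x)
    if loss <= 0:
        return w  # already on the right side with the required margin
    tau = step * loss / dot(x, x)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def train(examples, dim, passes=5):
    w = [0.0] * dim
    for _ in range(passes):
        for x, y in examples:  # examples come in one by one
            w = project_update(w, x, y)
    return w


examples = [([2.0, 1.0], 1), ([1.0, 2.0], -1), ([3.0, 1.0], 1), ([1.0, 3.0], -1)]
w = train(examples, dim=2)
```

A strictly positive margin is needed here because, starting from w = 0, projecting onto “w.x has the proper sign” alone would never move w.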

He gives some references:
Kivinen et al., Online Learning with Kernels
Herbster, Learning additive models online with fast evaluating kernels
Bauschke, On projection algorithms for solving convex feasibility problems
