A simple enough similarity measure is the cosine similarity measure. It is used often in Information Retrieval and it works well. It is also quite simple: cos(v,w)=<v/|v|,w/|w|>.

Clearly, it is reflexive (cos(v,v)=1) and symmetric (cos(v,w)=cos(w,v)). But it is also transitive: if cos(v,w) is near 1, and cos(w,z) is near 1, then cos(v,z) is near 1.

Can you prove transitivity?

I do have a hastily-derived inequality, but I want to know if anyone can better me. (Not hard.)

(Yes, I am looking for a two-liner.)

3 Comments »

  1. What exactly do you mean by transitivity for a function that returns a numeric value ? are you looking for a triangle inequality by any chance ?

    The geometric interpretation of the cosine similarity should get you what you want: it corresponds to the chordal distance between the points u, and v, when projected onto the unit sphere.

    Comment by Suresh — 20/8/2007 @ 17:34

  2. What exactly do you mean by transitivity for a function that returns a numeric value ? are you looking for a triangle inequality by any chance ?

    A triangle inequality is a form of transitivity. Specifically, it implies sum-transitivity (as opposed to, say, product-transitivity).

    There is no true formal and universal definition of transitivity here, but similarity measures are used by Machine Learning people to define classes (think clustering algorithms), but this only makes sense if you have some form of transitivity.

    The cosine similarity measure is neither sum nor product transitive. Yet, it is clearly (as you point out next) “transitive” in a “geometrical way”.

    The geometric interpretation of the cosine similarity should get you what you want: it corresponds to the chordal distance between the points u, and v, when projected onto the unit sphere.

    Sure. Anyone can draw a picture and “know” that it must be transitive. But then… I am not entirely satisfied by this explanation.

    What I did was to derive a simple bound on cos(v,z) given cos(v,w) and cos(w,z). It is trivial algebra, but I did a rough job. I do not recall how it goes exactly, but I think I have something like cos(v,z) >= cos(w,z)+sqrt(1-cos(v,w)^2). (It it not very nice, you see.) Being lazy, I do not want to go through five lines of algebra to derive the nicest lower bound possible, and I hope that someone has worked out the math. before. Or maybe someone has a surprisingly nice two-liner argument for transitivity that goes beyond “draw a picture and you’ll see”.

    There is probably a very nice and trivial lower bound for cos(v,z) given cos(w,z) and cos(v,w). That’s probably something given out to smart students in high school.

    For extra points, generalize your work to the Tanimoto similarity measure.

    Comment by Daniel Lemire — 20/8/2007 @ 21:33

  3. Well, the geometric intuition is what you need. Intuitively, the worst scenario for cos(v,z) comes about when the projections of v, w, z, are in a geodesic on the unit circle. In this case, if the angle between v and w = a, and the angle between w and z = b, then we can see that

    cos(v,z) = cos(a + b), and some taylor expansion messiness will give you something like

    cos(v,z) >= [cos(v,w) + cos(w,z)]^2/2 – 1

    i guess the question is: how messy do you want the inequality to be ?

    Comment by Anonymous — 20/8/2007 @ 21:59

Leave a comment

Warning: When entering a long comment, please ensure that you make copy of your text prior to submitting it. If the server should fail or if you hit a bug, you might lose your work. I am not responsible for your lost effort.

To spammers: I carefully review every single post and make sure that spam gets deleted. You are wasting your time if you are manually entering spam using this form. Read my terms of use to see what I consider to be abusive.

Example: duo plus septem is '9'. The numbers are expressed in latin numerals but you should give your answers using ordinary digits.

 

« Blog's main page

Powered by WordPress