A simple enough similarity measure is the cosine similarity measure. It is used often in Information Retrieval and it works well. It is also quite simple: cos(v,w)=<v/|v|,w/|w|>.

Clearly, it is reflexive (cos(v,v)=1) and symmetric (cos(v,w)=cos(w,v)). But it is also transitive: if cos(v,w) is near 1, and cos(w,z) is near 1, then cos(v,z) is near 1.

Can you prove transitivity?

I do have a hastily-derived inequality, but I want to know if anyone can best me. (Not hard.)

(Yes, I am looking for a two-liner.)

What exactly do you mean by transitivity for a function that returns a numeric value ? are you looking for a triangle inequality by any chance ?A triangle inequality is a form of transitivity. Specifically, it implies sum-transitivity (as opposed to, say, product-transitivity).

There is no true formal and universal definition of transitivity here, but similarity measures are used by Machine Learning people to define classes (think clustering algorithms), but this only makes sense if you have some form of transitivity.

The cosine similarity measure is neither sum nor product transitive. Yet, it is clearly (as you point out next) “transitive” in a “geometrical way”.

The geometric interpretation of the cosine similarity should get you what you want: it corresponds to the chordal distance between the points u, and v, when projected onto the unit sphere.Sure. Anyone can draw a picture and “know” that it must be transitive. But then… I am not entirely satisfied by this explanation.

What I did was to derive a simple bound on cos(v,z) given cos(v,w) and cos(w,z). It is trivial algebra, but I did a rough job. I do not recall how it goes exactly, but I think I have something like cos(v,z) >= cos(w,z)+sqrt(1-cos(v,w)^2). (It it not very nice, you see.) Being lazy, I do not want to go through five lines of algebra to derive the nicest lower bound possible, and I hope that someone has worked out the math. before. Or maybe someone has a surprisingly nice two-liner argument for transitivity that goes beyond “draw a picture and you’ll see”.

There is probably a very nice and trivial lower bound for cos(v,z) given cos(w,z) and cos(v,w). That’s probably something given out to smart students in high school.

For extra points, generalize your work to the Tanimoto similarity measure.

What exactly do you mean by transitivity for a function that returns a numeric value ? are you looking for a triangle inequality by any chance ?

The geometric interpretation of the cosine similarity should get you what you want: it corresponds to the chordal distance between the points u, and v, when projected onto the unit sphere.

Well, the geometric intuition is what you need. Intuitively, the worst scenario for cos(v,z) comes about when the projections of v, w, z, are in a geodesic on the unit circle. In this case, if the angle between v and w = a, and the angle between w and z = b, then we can see that

cos(v,z) = cos(a + b), and some taylor expansion messiness will give you something like

cos(v,z) >= [cos(v,w) + cos(w,z)]^2/2 – 1

i guess the question is: how messy do you want the inequality to be ?