Benchmarking theorem provers for programming tasks: yices vs. z3

One neat family of tools that most programmers should know about are “theorem provers”. If you went to college in computer science, you may have been exposed to them… but you may not think of using them when programming.

Though I am sure that they can be used to prove theorems, I have never used them for such a purpose. They are useful for quickly checking some assumptions and finding useful constants. Let me give a simple example.

We have that unsigned odd integers in software have multiplicative inverses. That is,  if you are given the number 3, you can find another number such that when you multiply it with 3, you get 1. There are efficient algorithms to find such multiplicative inverses, but a theorem prover can do it without any fuss or domain knowledge. You can write the following Python program:

s = Solver()
a = BitVec('a', 64)
s.add(a*3  == 1)
s.check()
print(s.model()[a])

It will return 12297829382473034411. As 64-bit unsigned integers, if you multiply 12297829382473034411 with 3, you get back 1. If there was no possible solution, the theorem prover would tell as well. So it can find useful constants, or prove that no constant can be found.

For some related tasks, I have been using the popular z3 theorem prover and it has served me well. But it can be slow at times. So I asked Geoff Langdale for advice and he recommended yices, another theorem prover that might be faster for the kind of work that programmers do, e.g., using fixed-bit integer values.

Though I trust Geoff, I wanted to derive some measures. So I built the following benchmark. For all integers between 0 and 1000, I try to find a multiplicative inverse. It will not always work (even numbers do not have inverse), but the theorem prover is left to figure that out.

What are the results?

z3 15 s
yices 1 s

So, at least in this one test, yices is 15 times faster than z3. My Python scripts are available. You can install z3 and yices by using the standard pip tool. Be mindful that yices should be present on your system, but the authors provide easy instructions.

I found the Python interface of yices to be quite painful compared to z3. So if performance is not a concern, z3 might serve you well.

But why refer to performance? Go back the numbers above. To solve 1000 inverse problems in 15 s is really quite slow on a per number basis. It is on the order of 60 million CPU cycles per number. And it is an easy problem. As you start asking more complicated questions, a theorem prover can quickly slow down to the point of becoming unusable. Being able to go just 10x faster can make a large difference in practice.

Caveat: It is just one test and it does not, in any way, establish the superiority (in general) of yices over z3.

Published by

Daniel Lemire

A computer science professor at the University of Quebec (TELUQ).

9 thoughts on “Benchmarking theorem provers for programming tasks: yices vs. z3”

  1. For bitvector theory (aka machine integers), another solver that gives great results is boolector. It might be better than yices. Z3 is especially good as the swiss-army knife of SMT solvers (aka, I want to be able to do everything)/

      1. I haven’t come around to benchmarking boolector on your example. If I have the time by the end of the week, I’ll report in this thread.

        However, in the annual competition of the SMT community (see this year’s results at https://smt-comp.github.io/2020/results.html, where I learned about Bitwuzla), boolector has regularly been at the top in the “QF-ABV (this is the category of formulas you are interested in) in the last few years.

        Yices is usually not very far behind and, in practice, our research group has used both since they have slightly different heuristics.

        One good way to have SMT solvers go at your problem is to make them available to SMT-LIB, the library of problems used to evaluate the solvers, in particular during SMT-COMP.

      2. So, I’ve taken some (procrastination) time to run the benchmark with boolector and cvc4 as well.

        The key take away is that yices is indeed clearly better for this problem.
        The surprise to me is that z3 beats boolector in single query mode and that boolector is super slow in incremental mode (which is the mode you seem to be using through the API AFAIK).

        The repo with the details is here
        https://github.com/rbonichon/smt-inverse-benchmark

  2. The difference is about 2x, and not 15x if you set the proper mode for z3 (although I agree that this should be documented better and/or it should be smarter about figuring out the required mode).

    Use the following diff:
    s = SolverFor(“QF_BV”)

  3. The difference is about 2x, and not 15x if you set the proper mode for z3 (although I agree that this should be documented better and/or it should be smarter about figuring out the required mode). Use the following:

    s = SolverFor(“QF_BV”)

    instead of

    s = Solver()

  4. I think it is not fair to call SMT solvers “theorem provers”. Yes, they are very useful, they can automate certain (sub)tasks, especially in real theorem provers, but they have some fundamental limitations. No SMT can deal on it’s own with problems like verification of floating point algorithms or floating point units. Much like bounded model checkers, great for bug hunting in control path, but totally inadequate for datapaths.

Leave a Reply

Your email address will not be published.

To create code blocks or other preformatted text, indent by four spaces:

    This will be displayed in a monospaced font. The first four 
    spaces will be stripped off, but all other whitespace
    will be preserved.
    
    Markdown is turned off in code blocks:
     [This is not a link](http://example.com)

To create not a block, but an inline code span, use backticks:

Here is some inline `code`.

For more help see http://daringfireball.net/projects/markdown/syntax

You may subscribe to this blog by email.