The Java (not really) Faster than C Benchmark

ShangShujie 2006-09-16

展开全文

The Java (not really) Faster than C++ Benchmark

Last update: Tue Jun 27 00:28:45 CEST 2006

Introduction

When I first read "Java faster than C++ benchmark", I was sure that there was something wrong with it. After all, Java couldn‘t be faster that C++, right? What would be next? C++ faster than C? C faster than Assembler? After quite a while I‘ve found some time to update the results using brand new JVMs and GCC versions.

Benchmark environment

The tests were performed on Linux 2.6, AMD Athlon(tm) XP 2500+ (Barton). No other load was placed on the machine during tests, run level 1 (single user mode) had been used. GNU C Library had optimizations for i686 (standard Debian package, this should be of benefit for both Java and C++). I compiled GCC myself from the standard release. Please note that the result of this benchmark (I am talking about "C++ faster than Java" result) is valid for GCC 3.4.4 and 4.0.3 as well, but GCC series 4.1.x is even faster than earlier GCCs.

Testing conditions

I‘ve kept the original number of repetitions in all tests.

Time was measured in the same way as in the original results, with one exception: the time used as benchmark result was elapsed real (wall clock) time used by the process. If we stick to the original method of measurement, the results will be incorrect and biased towards Java - because of threads used by the JVM. Wall clock time is better and more accurate if the machine operates under no load - as was the case here.

Java

Java was compiled with standard settings, using Java compiler from Sun in the same version as the JVM that was used later to do the actual benchmark.

To run hash, heapsort and strcat tests I had to increase the heap size. It‘s pretty much standard practice for enterprise software, so I do not see it as a drawback of Java.

C++

When looking at C++ code, I‘ve noticed many performance problems, which probably may go unnoticed for people with strong Java background. In other words, to make this benchmark fair, I had to make some modifications to the original code. In doing so, I tried to make the code as close to the original as possible, even if my personal coding style is completely different.

For C++ compilation, I‘ve used the following options:

-O2 -fomit-frame-pointer -finline-functions -march=athlon-xp

I consider the first two options standard for compilation (all major Linux distributions, Linux kernel and many others use these two for compilation).

-finline-functions is used to tell the compiler to guess which functions should be inlined (Java Server VM also does that, that‘s one of the reasons for which it performs better than Client VM).

Athlon-XP - well, that‘s my machine. I could use pentium instead (which could be considered more standard), but it would not change the overall result - that, is C++ being the clear winner. So, let‘s get down to business, shall we?

Changes to the original benchmark

Here‘s the list of changes that I‘ve made to the C++ programs. You can download the modified code here.

ackermann.cpp
No changes to the code here. The clue to improving performance of C++ lies in Java results. Client VM does not finish the test - stack overflow ocurs. Server VM passes the test and the result is much better than C++. The easiest answer is that Server VM does not recurse as much as Client VM, because of inlining. In GCC you can enable automated inlining by adding -finline-functions option. Since the recursion is very deep here, this program will benefit a lot. This option was used in all tests, not only here.

Note that it is possible to make this program even faster (up to twice as fast) if you enable more aggressive inlining in GCC. I decided against doing it, but in real life, if you know that your program does a lot of recursion it may be worth a try.
fibo.cpp
No changes here.
hash.cpp
hash2.cpp
Both of these have the same problem. If you use map in C++, you use a TreeMap equivalent, not HashMap equivalent. TreeMaps are much worse (of course) when it comes to performing (huge amounts) of insertions on them. In C++, if you want hash maps, you have to use hash_map class (obsolete) or unordered_map (soon to be part of C++ standard, already available in GCC 4.x). I‘ve rewritten both of these to use unordered_map just to stay on the bleeding edge ;-) . Performance results are similar with both hash_map and unordered_map. If you look closely at C++ code you will notice that a benchmark called "Java prettier that C++" would make an interesting read.
heapsort.cpp
No changes here.
matrix.cpp
No changes here.
methcall.cpp
OK. So what this test actually does is it calls two methods 1 000 000 000 times each. Java uses references to access objects, so I‘ve changed C++ code to use references too.

I also inverted the loop direction, since comparison against 0 is faster - normally I would leave it as it was, but since this was the only benchmark in which Java Server came close to C++, I had to do something ;-) . Oh, and inverting the loop direction has no effect whatsoever on Java, so I left the code as it was.

If you look at the results, you will notice that only the poor Java Client VM actually tried to invoke the methods - both Java Server and GCC inlined the methods, which resulted in a much better performance.
nestedloop.cpp
No changes here.
objinst.cpp
Before I explain the changes, a word of explanation about differences in memory management between Java and C++. (What follows is all very simplified). Java allocates new memory blocks on it‘s internal heap (which is allocated in huge chunks from the OS). In this way, in most of the cases it bypasses memory allocation mechanisms of the underlying OS and is very fast. If you allocate memory on heap in C++, each allocation request will be sent to the operating system, which is slow. That‘s why Java won in the original benchmark.

The thing is, in C++ you don‘t have to use the default allocation algorithm - you have other options. You can either change the allocation behaviour in C++ to allocate memory exactly in the way Java does (which will be very fast), or allocate memory on the stack (which is more or less similar to the way memory management in Java works, if you grossly oversimplify the problem).

If you look at the code which allocates code on stack, you will notice that it is much simpler than the original version. And if you look at the results you will notice that it‘s much faster, too. Actually, so fast, that it didn‘t even register on the graph ;-)
random.cpp
No changes here.
sieve.cpp
Rewritten. I think that original code was very unfair for C++:
1. C++ used vector container and Java used an array of primitive types.
2. C++ version counted the items one extra time at the end ( count() call )
3. C++ used iterators and Java used indexing (actually it does not matter all that much for performance, but makes the code look ugly)
I‘ve replaced vector with an array (like in Java), removed the iterators (obviously) and removed the final count() call. The code is now more or less equivalent to Java version.
strcat.cpp
I simplified the code, because it basically did what string implementation already does internally. I also removed the reserve() call from code - although keeping it makes C++ faster, I think it‘s cheating. If you actually reserve the string size (the way original code does it), you will get 20% performance increase. You cannot do this in Java. See, I am fair.
sumcol.cpp
I added ios_base::sync_with_stdio() call to make sure that C++ does not keep the stream in sync with C functions all the time - no reason to do it. The code can be made faster by 50% if C routines (fgets) are used to access stdin instead of C++ streams. I decided not to do it to keep the code as close as possible to the original version.
wc.cpp
Only sync_with_stdio was added. It does improve the results a bit, but C++ was already fast anyway.

Results / And the winner is...

Every test ended with C++ performing better than an equivalent Java program.

Individual results for Java 1.5 vs GCC 4.1.0
Individual results for Java 1.6 Beta vs GCC 4.1.1 Warning! This is Beta release of JVM, performance of the final release will probably differ!
Individual results for Java 1.6 Beta 2 vs GCC 4.1.1 Warning! This is Beta release of JVM, performance of the final release will probably differ!

Conclusion

The most important conclusion is obvious. (For this set of benchmarks,) C++ is clearly the winner.

Second conclusion - don‘t use Client VM in Java.

Unfortunately, there‘s also a third conclusion. It seems that it‘s much, much easier to create a well performing program in Java. So, please consider it for a moment before you start recoding your Java program in C++ just to make it faster...

Java 1.6.0 Beta specific remarks

Client vs Server VM

It seems that client VM in 1.6.0 is tuned a little bit better than in 1.5.0. Still, in the final release server VM will probably perform significantly better than client VM.

Performance issues in Beta

Actually there are not that many. If we don‘t take into account Ackermann test, 1.6.0 Server VM performs comparably to 1.5.0 Server VM. The most likely reason for problems in Ackermann is lack of inlining (or limited inlining) in Beta release. Hopefully in the final version inlining will be on par with 1.5.0, which would mean that 1.6.0 will actually get significantly better than 1.5.0 in overall results (~15 seconds).

If you compare 1.6.0 Beta 1 and 1.6.0 Beta 2, performance improvement is easily noticeable. Beta 2 completed the tests nearly 30 seconds faster than Beta 1. It‘s still slower than 1.5.0, but it‘s getting there.

Please note that I had to increase the stack size in Ackermann test for Java 1.6.0 - otherwise neither of the JVM versions would finish the test. This further supports the theory about limited inlining.

Back to my homepage

Przemyslaw Bruski, SCJP - mail me at [pbruskispam "at" op "dot" pl]