Saturday, March 7, 2015

Profiling Java

Java is slow!?! Is it?

One common prejudice about Java applications is a lack of speed compared to languages like C or C++. Benchmarking Java requires some knowledge about the way Java executes code, to ensure that the benchmark measures the code itself and not the coding style of the developer.
While this post concentrates on HOW to measure, the following posts will concentrate on common (anti-)patterns and their impact on performance, readability, and development speed.

Measuring the execution time of a program or method usually looks like this:

public class PerformanceAnalysis {
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        // something to be measured
        int result = 0;
        for (int i = 0; i < 10000000; i++) {
            result += i;
        }
        System.out.println(
            System.currentTimeMillis() - startTime);
    }
}

On my system, this simply writes a "30" to the console. On your system this value might differ. Please note that executing this multiple times will lead to a spread, caused by system load and other processes. To get a "real" result, execute it a few times - you will see a tendency. Note that in this simple test, the changes in execution time are not significant and it should stay almost constant.

While this approach seems reasonable, it has a few flaws which affect the results:
  1. A granularity of milliseconds might not lead to meaningful results.
  2. This test is not object oriented - Java programs are.
  3. The result is not used after the loop.
  4. The output of the test might affect the measurement more than necessary.
  5. The test is interpreted without the JIT-compiler.
To see the effect of each point, edit the source code as specified, recompile, run, and compare the results, as shown below.
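
For reference, a typical edit-compile-run cycle with the standard JDK tools looks like this (assuming the class lives in the default package and is saved as PerformanceAnalysis.java):

    javac PerformanceAnalysis.java
    java PerformanceAnalysis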

    Problem 1:

    Java offers a granularity of nanoseconds for differential measurements. While this cannot be used for absolute (wall-clock) times, it is a perfect means to measure execution times.

    Solution:

    Replace calls to System.currentTimeMillis() by System.nanoTime().
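
    The measured part then looks like this (a minimal sketch of the change; the rest of the program stays as before):

        long startTime = System.nanoTime();
        // something to be measured
        int result = 0;
        for (int i = 0; i < 10000000; i++) {
            result += i;
        }
        System.out.println(
            System.nanoTime() - startTime);
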
    >> execution time: 28921134

    Problem 2:

    The test code is completely static and does not use any class or object references. This is not typical for Java code, so the JIT behavior is different.

    Solution:

    Pack the code into an object and execute it there.

    public class PerformanceAnalysis {
        public static void main(String[] args) {
            PerformanceAnalysis analysis
                = new PerformanceAnalysis();
            analysis.perform();
        }

        public void perform() {
            // ... test code as before ...
        }
    }

    >> execution time: 28440079

    Problem 3:

    Optimization might remove the loop as it has no side effects, so you might not get the expected results.

    Solution:

    Use the result after it has been computed. This can be achieved by simply writing it to the console after the loop.
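
    With the variable names from the code above, the printout after the loop becomes (this is exactly the line that Problem 4 will revisit):

        System.out.println(
            System.nanoTime() - startTime + "\t" + result);
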
    >> execution time: 29159851    49999995000000 

    Problem 4:

    The class containing the System.out.println(...) method might not have been loaded yet. The measurement would then include the time needed for class loading and thereby lead to wrong results.

    Solution:

    Replace the printout of the result by:


        long endTime = System.nanoTime();
        System.out.println(endTime - startTime + "\t" + result);

    This will decouple class loading from the measurement.
    >> execution time: 26237273    49999995000000

    Problem 5:

    Java code is not compiled to native executables as, for instance, C/C++ code is. Instead, a special format (Java byte code) is created, which is interpreted on the target system at runtime. While this leads to high portability (in principle - another topic), the drawback is a significant loss in performance. To solve this problem, Sun developed the JIT compiler (JIT: just in time), which analyses the byte code and creates native code on the target system when it detects that the code is needed often. To be more precise, a method is compiled once it has been executed a certain number of times; the exact threshold depends on the JVM. Our code is executed exactly once, so the JIT will not be used at all. Performance-critical code is usually executed more than once, so in real life you do not have to care about this. In synthetic benchmarks (like this one), it has to be considered.
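
    If you want to verify when the JIT actually compiles a method, HotSpot-based JVMs (such as Sun's) can log compilation events - note that this flag is specific to HotSpot:

        java -XX:+PrintCompilation PerformanceAnalysis
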

    Solution:

    Put the whole code in a loop and execute it multiple times. The number of iterations needed to make sure all paths cross the compilation threshold is difficult to guess; I usually use 10.

    In this simple example, the result after Solution 5 will not differ very much from the previous ones, as the time is consumed in a loop which itself is already optimized during the first run. With large, costly operations the difference will be noticeable.

    public class PerformanceAnalysis {
        public static void main(String[] args) {
            PerformanceAnalysis analysis 
                = new PerformanceAnalysis();
            for (int j = 0; j < 10; j++) {
                analysis.perform();
            }
        }

        public void perform() {
            long startTime = System.nanoTime();
            // something to be measured
            int result = 0;
            for (int i = 0; i < 10000000; i++) {
                result += i;
            }
            long endTime = System.nanoTime();
            System.out.println(
                (endTime - startTime) + "\t" + result);
        }
    }
     

    Conclusion

    - Try to create object-based test code - the JIT is optimized for typical, object-oriented Java code.
    - Ensure the relevance of the code - the optimizer is more clever than you might expect!
    - Reduce side effects - profile the function, not the measurement.
    - Profiling takes time - think about the required warm-up cycles.


    All further "mini-benchmarks" will follow these guidelines.

        ... to be continued
