Fastest way to sort an Array in Java

Sorting arrays efficiently is crucial for optimizing performance in Java applications. With numerous sorting algorithms and built-in methods available, it’s essential to identify the most efficient approach for different scenarios. Let’s delve into a comparative analysis of various sorting methods to determine the fastest way to sort a Java array.

Introduction

Sorting large arrays efficiently can significantly impact application performance. Java provides several built-in methods and algorithms to sort arrays, each with its advantages and performance characteristics.

Benchmarking Setup

To determine the fastest sorting method, we set up a benchmarking test with the following approaches:

  • Arrays.sort(): Java’s built-in method for sorting arrays.
  • Arrays.parallelSort(): Parallelized version of Arrays.sort() for improved performance on multi-core processors.
  • Comparator Interface: Sorting using a custom comparator for comparison purposes.
  • Collections.sort(): Sorting using the Collections framework.

Test Environment

For the benchmark, we generated a sizable array of integers (1,000,000 elements) filled with random values. This size ensures a robust comparison between methods.

Here is the sample Class we will use:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class SortingPerformanceTest {

    private static final int DATA_SIZE = 1000000; // Adjust this value for different data sizes

    public static void main(String[] args) {
        int[] data = new int[DATA_SIZE];

        Random random = new Random();
        for (int i = 0; i < DATA_SIZE; i++) {
            data[i] = random.nextInt();
        }

        System.out.println("Sorting algorithm\tExecution time (ms)");

        // Arrays.sort()
        int[] sortedData = Arrays.copyOf(data, DATA_SIZE);
        long startTime = System.currentTimeMillis();
        Arrays.sort(sortedData);
        long endTime = System.currentTimeMillis();
        System.out.println("Arrays.sort()\t\t\t" + (endTime - startTime));

        // Arrays.parallelSort()
        sortedData = Arrays.copyOf(data, DATA_SIZE);
        startTime = System.currentTimeMillis();
        Arrays.parallelSort(sortedData);
        endTime = System.currentTimeMillis();
        System.out.println("Arrays.parallelSort()\t" + (endTime - startTime));

        // Comparator interface
        List<Integer> list = new ArrayList<>(DATA_SIZE);
        for (int i : data) {
            list.add(i);
        }
        Comparator<Integer> comparator = (a, b) -> Integer.compare(a, b);
        startTime = System.currentTimeMillis();
        Collections.sort(list, comparator);
        endTime = System.currentTimeMillis();
        System.out.println("Comparator interface\t\t" + (endTime - startTime));

        // Collections.sort()
        list = new ArrayList<>(DATA_SIZE);
        for (int i : data) {
            list.add(i);
        }
        startTime = System.currentTimeMillis();
        Collections.sort(list);
        endTime = System.currentTimeMillis();
        System.out.println("Collections.sort()\t\t" + (endTime - startTime));
    }
}

The provided Java code benchmarks the performance of various sorting algorithms for a dataset of a specified size. It initializes an array with random integers and then evaluates the execution times of different sorting methods using this dataset.

To learn more about how to initialize an Array in Java we recommend the following article: How to init an Array in Java made simple

Benchmark Results

Here is the outcome of our micro benchmark:

Sorting algorithm	Execution time (ms)
Arrays.sort()			203
Arrays.parallelSort()	117
Comparator interface		305
Collections.sort()		314

Let’s examine the performance results of each sorting method:

  • Arrays.sort() is the second fastest sorting method with an execution time of 203 ms.
  • Arrays.parallelSort() is the fastest sorting method with an execution time of 117 ms.
  • Comparator interface is the third fastest sorting method with an execution time of 305 ms.
  • Collections.sort() is the slowest sorting method with an execution time of 314 ms.

Detailed Analysis:

  • Arrays.sort() is the default sorting method in Java and is known for its efficient implementation. It utilizes a hybrid sorting algorithm that combines merge sort and insertion sort, making it suitable for sorting arrays of various sizes and data distributions.
  • Arrays.parallelSort() is a parallel implementation of Arrays.sort() that leverages the ForkJoin Framework to divide the sorting task into smaller subtasks and execute them concurrently on multiple threads. This parallelization can significantly improve performance for large arrays on multicore systems.
  • Comparator interface allows for customized comparison logic for sorting arrays of objects. While it provides flexibility, it incurs some overhead due to the additional comparison function call for each element.
  • Collections.sort() operates on List collections and utilizes the same sorting mechanism as Arrays.sort(). However, the additional overhead of managing the List data structure contributes to its slightly slower performance compared to Arrays.sort().

Recommendations:

  • For general-purpose sorting of primitive data types, Arrays.sort() is the recommended choice due to its efficiency and ease of use.
  • For large arrays on multicore systems, Arrays.parallelSort() can provide significant performance improvements.
  • When sorting objects with custom comparison logic, the Comparator interface is necessary. However, consider optimizing the comparison function to minimize overhead.
  • For sorting List collections, Collections.sort() is a convenient option but may be slightly slower than Arrays.sort() for primitive data types.

Why processing a sorted array is faster than a non-sorted one?

After discussing the performance of sorting an Array in Java, let’s consider a side aspect. Why a sorted array takes less time to be processed?

Consider the following code:

import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // Generate data
        int arraySize = 32768;
        int data[] = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt() % 256;

        // This will improve performance in the next step
        Arrays.sort(data);

        // Test
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 100000; ++i)
        {
            for (int c = 0; c < arraySize; ++c)
            {   // Primary loop.
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

In the above code example, we are looping over an Array of random integers, after we apply the Arrays.sort() . As you can see, by sorting the Array makes a difference in terms of performance comparing to an Array which is not sorted. Why?

Sorting the array before the primary loop indeed improves its performance due to the elimination of branch prediction penalties. Branch prediction is a hardware mechanism that attempts to anticipate the direction a conditional branch will take (e.g., if-else statements). This anticipation allows the processor to fetch instructions from the predicted path before the branch outcome is known, potentially reducing the number of pipeline stalls caused by branch mispredictions

In the provided code, the primary loop iterates over the entire array and checks whether each element is greater than or equal to 128. This conditional check introduces a branch prediction penalty, as the processor needs to speculate whether the condition will be true or false for each element.

java array sort performance

Sorting the array beforehand eliminates this branch prediction overhead. Since the array is already sorted in ascending order, all elements accessed by the primary loop will be greater than or equal to 128, making the condition always true. Consequently, the branch prediction penalty is removed, resulting in smoother instruction pipeline execution and improved performance.

This optimization demonstrates the importance of data arrangement in optimizing code performance. By pre-sorting the data, you can often eliminate unnecessary conditional checks and reduce branch prediction penalties, leading to significant performance gains.

Reference: https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array/11227902#11227902

Final Thoughts

Efficient sorting is critical for optimal performance in Java applications dealing with large datasets. By selecting the appropriate sorting method based on performance benchmarks and application needs, developers can enhance their applications’ efficiency and responsiveness.

Through this comparative analysis, we’ve highlighted the fastest ways to sort a Java array, enabling developers to make informed decisions for their sorting needs.