Saturday, 7 June 2014

Binary Search vs Contains Performance in Java List

There are two ways to search an element in a List class, by using contains() method or by using Collections.binarySearch() method. There are two versions of binarySearch() method, one which takes a List and Comparator and other which takes a List and Comparable. This method searches the specified list for the specified object using the binary search algorithm. The list must be sorted into ascending order according to the natural ordering of its elements (as by the sort(List) method) prior to making this call. If List is not sorted, then results are undefined. If the List contains multiple elements equal to the specified object, there is no guarantee which one will be returned. This method runs in log(n) time for a "random access" list (which provides near-constant-time positional access). If the specified list does not implement the RandomAccess interface and is large, this method will do an iterator-based binary search that performs O(n) link traversals and O(log n) element comparisons. In the end this method returns the index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key. This means that return value will be >= 0 if and only if the key is found. Since common implementation of List interface e.g. ArrayList, Vector, CopyOnWriteArrayList and Stack implements RandomAccess interface, they can be used for performing binary search, but there are other implementations like LinkedList, which doesn't implement java.util.RandomAccess, hence not suitable for binary search operation. Since binary search can only be performed in sorted list, you also need to sort your collection before doing search, which may potentially impact performance, especially if your List is big and not in sorted order already.


Java Code for contains() Vs binarySearch()

Binary search vs contains in Java ListHere is our program to find object using binary search in Java List. We have a list of Integer with 1M records and uses both contains() and binarySearch() method to search an element.


import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** 
  * Java program to perform binary search in Java collection e.g List, Set
  * @author Javin Paul
  */
public class BinarySearchTest {
    public static final Logger logger = LoggerFactory.getLogger(BinarySearchTest.class);
   
    public static void main(String args[]) {       
  
        //creating List
        List<Integer> numbers = new ArrayList<Integer>(1000000); //List of 1M records
       
        //initializing List
        for(int i =0; i<numbers.size(); i++){
            numbers.add(new Integer(i));
        }
      
        //performing contains search
        long startTime = System.nanoTime();
        boolean isExist = numbers.contains(new Integer(1000000));
        long totalTime = System.nanoTime() - startTime;
      
       
        logger.info("Time to search 1Mth Record using contains() is {} nano seconds", totalTime);
       
        //performing binary search
        startTime = System.nanoTime();
        Collections.sort(numbers);  // List needs to be sorted for Binary Search
        Integer number = Collections.binarySearch(numbers, new Integer(1000000));
        totalTime = System.nanoTime() - startTime;
       
        logger.info("Time to search 1Mth Record using binary search is {} nano seconds", totalTime);
    }
  
}

Ouput:
2013-06-04 23:23:17,834 0    [main] INFO  test.BinarySearchTest  - Time to search 1Mth Record using contains() is 51404 nano seconds
2013-06-04 23:23:17,849 15   [main] INFO  test.BinarySearchTest  - Time to search 1Mth Record using binary search is 554261 nano seconds

From the output, you can see that contains() method is almost 10 times faster than binary search, which means it make sense to use contains() for searching objects in List, especially for those which implements RandomAccess interface e.g. ArrayList.


No comments:

Post a Comment