Beautiful Code – binary search

Somewhere in chapter seven of the book Beautiful Code the following challenge is given:

If you have never implemented binary search, or haven’t done so in a few years, I suggest you try that yourself before going forward; it will give you greater appreciation for what follows.

Well, binary search? What was that again? Well, it is explained before the challenge is given, let’s go back and read up:

As a quick refresher, a binary search is a simple and effective algorithm (but, as we’ll see, tricky to implement correctly) to determine whether a presorted array of numbers x[0..n-1] contains a target element t. If the array contains t, the program returns its position in the array; otherwise, it returns -1.

That looks easy. It means that if we have an array of numbers like this:

Array-index 0 1 2 3 4 5 n
Value 10 12 13 13 16 18 42

…let’s suppose our target is 12, the algorithm should return ‘1’ as the index-value that contains our targetvalue. If our target was 11, the algorithm to be implemented should return -1, meaning no index could be found because the value could not be found in the sorted array. The problem is easy to describe and follow, indeed.

On with the book, it even describes the algorithm before challenging us:

Binary search solves the problem by keeping track of the range within the array that holds t (if t is anywhere in the array). Initially, the range is the entire array. The range is shrunk by comparing its middle element to t and discarding half the range. The process continues until t is discovered in the array or until the range in which it must lie is known to be empty.

Most programmers think that with the above description in hand, writing the code is easy. They are wrong. The only way to believe this is by putting down this column right now and writing the code yourself. Try it.

If you have never implemented binary search, or haven’t done so in a few years, I suggest you try that yourself before going forward; it will give you greater appreciation for what follows.

What? Not easy to implement? Of course, as an experienced programmer, I took on this challenge. But I’m warned, so I will not write the algorithm from the top-of-my head. I know that off-by-one errors can catch me, so let’s do this by using TDD. That means the first thing we do, is write a simple test that calls the (not yet written) algorithm, and checks if the algorithm works. Here we go with our ‘coding kata

Start coding all right!

With TDD, we start easy, so let’s simply start with a list of one item that holds our target.

  [TestFixture]
  public class BinarySearchTest
  {

    [Test]
    public void OneItem_TargetExists_Success()
    {
      // Arrange an array with one element
      int[] sortedArray = new int[1];
      int targetValue = 10; // this is our target
      // set the value of first item to our target
      sortedArray[0] = targetValue; 
      int targetIndex = MyBinarySearch.Execute(sortedArray, targetValue);

      Assert.AreEqual(0, targetIndex, "0 should have been the result!");
    }
}

The first implementation of MyBinarySearch is easy. With TDD, we keep our code as simple as possible to pass the test:

 

public static class MyBinarySearch
{
  public static int Execute(List arrayList, int targetValue)
  {
    return 0; // just return the first index as result
  }
}

Well, in this case the code is too simple, that did not really do much as we did not even used the inserted array or even the targetValue at all! But that’s how we do it with TDD: start easy. The test is green, so let’s also add another test and refactor the code if necessary:

[Test]
public void OneItem_TargetDoesNotExists_Success()
{
  // Arrange
  int[] sortedArray = new int[1];
  int targetValue = 10; // this is our target
  sortedArray[0] = 20; // value will not be found
  int targetIndex = MyBinarySearch.Execute(sortedArray, targetValue);
  Assert.AreEqual(-1, targetIndex, "-1 should have been the result!");
}

// running the above test results in a failure, so we
// refactor.
// We need to change the binarySearch class,
// to support the case for not-found:
public static class MyBinarySearch
{
  public static int Execute(int[] sortedArray, int targetValue)
  {
    if (sortedArray[0] == targetValue)
    {
      return 0;
    }
    return -1; // target not found
  }
}

This still works, the value is either found or not-found, so it’s time to add some more tests!

Let’s start by adding two tests that have lists of more then just one item. We keep it simple and add two tests with just two items:

[Test]
public void TwoItems_TargetExists_Success()
{
  int[] sortedArray = new [] { 10, 20 };
  int targetValue = 20; // this is our target
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(1, targetIndex, "index 1 should have been the result!");
}

[Test]
public void TwoItems_TargetDoesNotExists_Success()
{
  int[] sortedArray = new[] { 10, 20 };
  int targetValue = 15; // this is our target
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(-1, targetIndex, "-1 should have been the result!");
}

Ok, this is great. Both of these new tests will fail for sure! As you see, I have already prepared to go to a different Entry method, called.. Entry, instead of Execute. This is because our current BinarySearch class only works for lists of one item. Eventually, we have to do something smart anyway, and we already know that we have to dive deep because we need some kind of recursion, so that is why I want to enter the BinarySearch class with a different Entry method. I usually want to keep the entry-method apart from the actual recursive method, although that might not always be necessary, it’s just the way I like to keep the methods apart.

Let’s also add a third new test that actually uses an array of 100 items, and we want to find one target, something like this:

[Test]
public void HundredItems_TargetExists_Success()
{
  // Arrange  
  int[] sortedArray = new int[100];
  for(int i=0;i<=99;i++)
  {
    sortedArray[i] = i;
  }
  int targetValue = 39;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(39, targetIndex, "Expected to find the target at index 39");
}

This last added test will also fail with the current code, and just adding alot of if-statements in the MyBinarySearch class won’t work either. We will really need a recursive method before continuing.

Writing a recursive method to zoom-in

So how to write a recursive method? With recursion, we usually have an entry-method and the recursive method itself that is called from the Entry-method. As the description of binarySearch explains, the recursive method should ‘zoom-in’ into the sub-range of the array where the targetValue should be. To keep track of the subRange, we must have markers that tell us what is the lower-end of the subRange, as well as the higher-end of the subrange.







The lines above show the general idea: we search the array by peaking at the value of the index half-way and chopping the array in two, and search onwards in either the bottom or the above half of the array to find the value we search for.

Also, a recursive method always needs a stop-condition, or else we might end up with an indefinite loop…let’s rewrite the MyBinarySearch class, and after that re-run all of the tests!

People who have never seen a recursive method, one that calls into itself, might feel a little awkard reading the next code-piece, but the ones that have already written some recursive methods will not be too surprised with the following first try….

We are going to keep the lower and upper bounds of the array we zoom into, and adjust the lower and upper bounds along the way.

public static class MyBinarySearch
{
  public static int Entry(int[] sortedArray, int targetValue)
  {
    // determine low and high
    int low = 0;
    int high = sortedArray.Length - 1;
    return SearchTarget(sortedArray, targetValue, low, high);
  }


  // recursive method:
  // the return is the index of the targetvalue we search for,
  // or -1 if the targetvalue is not found
  public static int SearchTarget(int[] sortedArray, int targetValue, int low, int high)
  {
  // stop condition.. we re-use the earlier code of the Execute method:
  if (high <= low) 
  { // range has shortened to (almost?) nothing:
    if (sortedArray[low] == targetValue) 
    { 
       return low; // index found! 
    } 
    return -1; // target not found 
  } // high > low means we can cut the range in half
  // but what half? the peakpoint should be the middle of low and high:
  int currentPeakPoint = (high - low)/2 + low;
  if (sortedArray[currentPeakPoint] < targetValue)
  {
     // we must search upper half:
     low = currentPeakPoint;
     return SearchTarget(sortedArray, targetValue, low, high);
  }
  else
  {
     // we must search lower half:
     high = currentPeakPoint;
     return SearchTarget(sortedArray, targetValue, low, high);
  }
  // I do not expect we will ever get here:
  throw new Exception("something went wrong!");
}

Ok, above is a first try. Of course, all tests need to be changed, they should now call the Entry() method instead of the Execute() method, that in turn will call the recursive method SearchTarget()…

So, I run the tests… but when I run my tests, I got a stack-overflow exception for the TwoItems_TargetExists_Success() test! How on earth did that happen?

Let’s debug… The code was continuously trying for the subRange 0..1, and again, 0..1, and again 0..1, the low-high-range of the array to search further was not altered at all. low did not get higher then 0, and high did not get lower then 1. in other words, ‘low’ was never increased. So the easiest thing to fix this is what I did below: I changed that one line of code from:

low = currentPeakPoint; // to
low = currentPeakPoint+1;

And also added some debugging statements to see what was actually happening in the recursive loop, and found that I could refactor the internal call to SearchTarget at the end of the method a bit. It is not needed to call it in both the internal if…else.. branches, one time at the end is good enough. I ended up with the following recursive method:

public static int SearchTarget(int[] sortedArray, int targetValue, int low, int high)
{
  Console.WriteLine(string.Format("target {0}, low {1}, high {2}",targetValue,low,high));
  // stop condition
  if (high <= low) 
  { 
    if (sortedArray[low] == targetValue) 
    { 
      return low; // index found! 
    } 
    Console.WriteLine(string.Format("target {0} does not match value {1} at index {2}", targetValue, sortedArray[low], low)); 
    return -1; // target not found } // high > low means we can cut the range in half
  // but what half? the peakpoint should be the middle of low and high:
  int currentPeakPoint = (high - low)/2 + low;
  Console.WriteLine(string.Format("Peaking at:{0}",currentPeakPoint));
  if (sortedArray[currentPeakPoint] < targetValue)
  {
    // we must search upper half:
    low = currentPeakPoint +1; // this +1`I did not have at first.
  }
  else
  {
    // we must search lower half:
    high = currentPeakPoint;
  }
  return SearchTarget(sortedArray, targetValue, low, high);
}

And, all tests green!

Even more tests!

Of course, I added more tests, this time real tests with larger lists, edge-cases, values that are outside the bounds of the sorted-array… just to be sure the method I wrote worked. But the additional tests never broke the code. I present all the other tests that I wrote here just for completeness-sake…

[Test]
public void HundredItems_TargetExists_Success()
{
  int[] sortedArray = new int[100];
  for(int i=0;i<=99;i++)
  {
    sortedArray[i] = i;
  }
  int targetValue = 39;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(39, targetIndex, "Expected to find the target at index 39");
}

[Test]
public void HundredItems_TargetDoesNotExists_Success()
{
  int[] sortedArray = new int[100];
  for (int i = 0; i <= 99; i++)
  {
    sortedArray[i] = i;
  }
  int targetValue = 139;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(-1, targetIndex, "Expected to not find the target");
}

[Test]
public void UnEvenItems_TargetExists_Success()
{
  int[] sortedArray = new int[99];
  for (int i = 0; i <= 98; i++)
  {
    sortedArray[i] = i;
  }
  int targetValue = 50;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(sortedArray[targetIndex], targetValue, "Expected to find the target");
}

[Test]
public void HundredItems_TargetDoesNotExists_ButWithinRange_Success()
{
  int[] sortedArray = new int[100];
  for (int i = 0; i <= 99; i++)
  {
    sortedArray[i] = i*2;
  }
  int targetValue = 139;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(-1, targetIndex, "Expected to not find the target");
}

[Test]
public void TenThousandItems_TargetExists_Success()
{
  int[] sortedArray = new int[10000];
  for (int i = 0; i <= 9999; i++)
  {
    sortedArray[i] = i * 3;
  }
  int targetValue = 1234*3;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(sortedArray[targetIndex],targetValue, "Expected to find the target");
}

[Test]
public void EdgeTargetLow_Success()
{
  int[] sortedArray = new int[100];
  for (int i = 0; i <= 99; i++)
  {
    sortedArray[i] = i * 2;
  }
  int targetValue = 0;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(sortedArray[0], targetValue, "Expected to find the target");
}

[Test]
public void EdgeTargetHigh_Success()
{
  int[] sortedArray = new int[100];
  for (int i = 0; i <= 99; i++)
  {
    sortedArray[i] = i * 2;
  }
  int targetValue = 99 * 2;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(sortedArray[99], targetValue, "Expected to find the target");
}

[Test]
public void TargetBelowLowest_DoesNotExist()
{
  int[] sortedArray = new int[100];
  for (int i = 0; i <= 99; i++)
  {
    sortedArray[i] = i * 2;
  }
  int targetValue = -1;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
  Assert.AreEqual(targetIndex,-1, "Expected to not find the target");
}

[Test]
public void TargetAboveHighest_DoesNotExist()
{
  int[] sortedArray = new int[100];
  for (int i = 0; i <= 99; i++)
  {
    sortedArray[i] = i * 2;
  }
  int targetValue = 99*2+1;
  int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
Assert.AreEqual(targetIndex, -1, "Expected to not find the target");
}

And as an example, the satisfying output from the 10.000 search:

target 3702, low 0, high 9999
Peaking at:4999
target 3702, low 0, high 4999
Peaking at:2499
target 3702, low 0, high 2499
Peaking at:1249
target 3702, low 0, high 1249
Peaking at:624
target 3702, low 625, high 1249
Peaking at:937
target 3702, low 938, high 1249
Peaking at:1093
target 3702, low 1094, high 1249
Peaking at:1171
target 3702, low 1172, high 1249
Peaking at:1210
target 3702, low 1211, high 1249
Peaking at:1230
target 3702, low 1231, high 1249
Peaking at:1240
target 3702, low 1231, high 1240
Peaking at:1235
target 3702, low 1231, high 1235
Peaking at:1233
target 3702, low 1234, high 1235
Peaking at:1234
target 3702, low 1234, high 1234

With these tests, I’m pretty confident I was succesfull to the challenge I got from the book, and I’m happy to read on in Beautiful code…

After reading the chapter

I must say when I finished the chapter I was rather dissapointed. It turned out that the presented solution was not using a recursive loop at all, but a for loop instead. While recursion is so elegant as you don’t need to take care of a stack yourself… oh well..

Must say that except for a few chapters, the rest of the 30 chapters of the book were not that gleaming either. Most of the chapters are clearly from developers who are very proud of their code; that’s ok, but did they always keep their reader in mind?

And with this blog-article, you also have my review about the contents of the book… decide for yourselves!

Beautiful Code: Leading Programmers Explain How They Think (Theory in Practice (O’Reilly))Computer Programmer Books)

Posted in Code, developerslifestyle, Searchtree