Monthly Archives: May 2011

Beautiful Code – binary search

Lees voor met webReader

Somewhere in chapter seven of the book Beautiful Code the following challenge is given:

If you have never implemented binary search, or haven’t done so in a few years, I suggest you try that yourself before going forward; it will give you greater appreciation for what follows.

Well, binary search? What was that again? Well, it is explained before the challenge is given, let’s go back and read up:

As a quick refresher, a binary search is a simple and effective algorithm (but, as we’ll see, tricky to implement correctly) to determine whether a presorted array of numbers x[0..n-1] contains a target element t. If the array contains t, the program returns its position in the array; otherwise, it returns -1.

That looks easy. It means that if we have an array of numbers like this:

Array-index 0 1 2 3 4 5 n
Value 10 12 13 13 16 18 42

…let’s suppose our target is 12, the algorithm should return ’1′ as the index-value that contains our targetvalue. If our target was 11, the algorithm to be implemented should return -1, meaning no index could be found because the value could not be found in the sorted array. The problem is easy to describe and follow, indeed.

On with the book, it even describes the algorithm before challenging us:

Binary search solves the problem by keeping track of the range within the array that holds t (if t is anywhere in the array). Initially, the range is the entire array. The range is shrunk by comparing its middle element to t and discarding half the range. The process continues until t is discovered in the array or until the range in which it must lie is known to be empty.

Most programmers think that with the above description in hand, writing the code is easy. They are wrong. The only way to believe this is by putting down this column right now and writing the code yourself. Try it.

If you have never implemented binary search, or haven’t done so in a few years, I suggest you try that yourself before going forward; it will give you greater appreciation for what follows.

What? Not easy to implement? Of course, as an experienced programmer, I took on this challenge. But I’m warned, so I will not write the algorithm from the top-of-my head. I know that off-by-one errors can catch me, so let’s do this by using TDD. That means the first thing we do, is write a simple test that calls the (not yet written) algorithm, and checks if the algorithm works. Here we go with our ‘coding kata

Start coding all right!

With TDD, we start easy, so let’s simply start with a list of one item that holds our target.

 C# |  copy code |? 
01
 [TestFixture]
02
 public class BinarySearchTest
03
 {
04
   [Test]
05
   public void OneItem_TargetExists_Success()
06
   {
07
    int[] sortedArray = new int[1];
08
    int targetValue = 10; // this is our target
09
    sortedArray[0] = targetValue; // set value of first item to our target
10
 
11
    int targetIndex = MyBinarySearch.Execute(sortedArray, targetValue);
12
 
13
    Assert.AreEqual(0, targetIndex, "0 should have been the result!");
14
   }
15
}

The first implementation of MyBinarySearch is easy. With TDD, we keep our code as simple as possible to pass the test:

 C# |  copy code |? 
1
public static class MyBinarySearch
2
{
3
    public static int Execute(List<int> arrayList, int targetValue)
4
    {
5
        return 0; // just return the first index as result
6
    }
7
}
8

Well, in this case the code is too simple, that did not really do much as we did not even used the inserted array or even the targetValue at all! But that’s how we do it with TDD: start easy. The test is green, so let’s also add another test and refactor the code if necessary:

 C# |  copy code |? 
01
[Test]
02
public void OneItem_TargetDoesNotExists_Success()
03
{
04
   int[] sortedArray  = new int[1];
05
   int targetValue = 10; // this is our target
06
   sortedArray[0] = 20; // value will not be found 
07
   int targetIndex = MyBinarySearch.Execute(sortedArray, targetValue);
08
   Assert.AreEqual(-1, targetIndex, "-1 should have been the result!");
09
}
10
 
11
// running the above test results in a failure, so we
12
// refactor.
13
// We need to change the binarySearch class,
14
// to support the case for not-found:
15
public static class MyBinarySearch
16
{
17
    public static int Execute(int[] sortedArray, int targetValue)
18
    {
19
        if (sortedArray[0] == targetValue)
20
        {
21
            return 0;    
22
        }
23
        return -1; // target not found
24
 
25
    }
26
}

This still works, the value is either found or not-found, so it’s time to add some more tests! Let’s start by adding two tests that have lists of more then just one item. We keep it simple and add two tests with just two items:

 C# |  copy code |? 
01
02
[Test]
03
public void TwoItems_TargetExists_Success()
04
{
05
   int[] sortedArray = new [] { 10, 20 };
06
   int targetValue = 20; // this is our target
07
   int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
08
   Assert.AreEqual(1, targetIndex, "index 1 should have been the result!");
09
}
10
 
11
[Test]
12
public void TwoItems_TargetDoesNotExists_Success()
13
{
14
   int[] sortedArray = new[] { 10, 20 };
15
   int targetValue = 15; // this is our target
16
   int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
17
   Assert.AreEqual(-1, targetIndex, "-1 should have been the result!");
18
}

Ok, this is great. Both of these new tests will fail for sure! As you see, I have already prepared to go to a different Entry method, called.. Entry, instead of Execute. This is because our current BinarySearch class only works for lists of one item. Eventually, we have to do something smart anyway, and we already know that we have to dive deep because we need some kind of recursion, so that is why I want to enter the BinarySearch class with a different Entry method.

Let’s also add a third new test that actually uses an array of 100 items, and we want to find one target, something like this:

 C# |  copy code |? 
01
02
   [Test]
03
    public void HundredItems_TargetExists_Success()
04
    {
05
        int[] sortedArray = new int[100];
06
        for(int i=0;i<=99;i++)
07
        {
08
            sortedArray[i] = i;
09
        }
10
        int targetValue = 39;
11
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
12
        Assert.AreEqual(39, targetIndex, "Expected to find the target at index 39");
13
    }
14

This last added test will also fail with the current code, and just adding alot of if-statements in the MyBinarySearch class won’t work either. We will really need a recursive method before continuing.

Writing a recursive method to zoom-in

So how to write a recursive method? With recursion, we usually have an entry-method and the recursive method itself that is called from the Entry-method. As the description of binarySearch explains, the recursive method should ‘zoom-in’ into the sub-range of the array where the targetValue should be. To keep track of the subRange, we must have markers that tell us what is the lower-end of the subRange, as well as the higher-end of the subrange.

Also, a recursive method always needs a stop-condition, or else we might end up with an indefinite loop…let’s rewrite the MyBinarySearch class, and after that re-run all of the tests! People who have never seen a recursive method, one that calls into itself, might feel a little awkard reading the next code-bit, but the ones that have already written some recursive methods will not be too surprised with the following first try….

 C# |  copy code |? 
01
02
public static class MyBinarySearch
03
{
04
    public static int Entry(int[] sortedArray, int targetValue)
05
    {
06
        // determine low and high
07
        int low = 0;
08
        int high = sortedArray.Length - 1;
09
        return SearchTarget(sortedArray, targetValue, low, high);
10
    }
11
 
12
    // recursive method:
13
    // the return is the index of the targetvalue we search for,
14
    // or -1 if the targetvalue is not found
15
    public static int SearchTarget(int[] sortedArray, int targetValue, int low, int high)
16
    {
17
        // stop condition.. we re-use the earlier code of the Execute method:
18
        if (high <= low)
19
        {
20
            // range has shortened to (almost?) nothing:
21
            if (sortedArray[low] == targetValue)
22
            {
23
                return low; // index found!
24
            }        
25
            return -1; // target not found
26
        }
27
        // high > low means we can cut the range in half
28
        // but what half? the peakpoint should be the middle of low and high:
29
        int currentPeakPoint = (high - low)/2 + low;
30
        if (sortedArray[currentPeakPoint] < targetValue)
31
        {
32
            // we must search upper half:
33
            low = currentPeakPoint; 
34
            return SearchTarget(sortedArray, targetValue, low, high);
35
        }
36
        else
37
        {
38
            // we must search lower half:
39
            high = currentPeakPoint;
40
            return SearchTarget(sortedArray, targetValue, low, high);
41
        }
42
        // I do not expect we will ever get here:
43
        throw new Exception("something went wrong!");
44
    }
45

Ok, above is a first try. Of course, all tests need to be changed, they should now call the Entry() method instead of the Execute() method, that in turn will call the recursive method SearchTarget()…

So, I run the tests… but when I run my tests, I got a stack-overflow exception for the TwoItems_TargetExists_Success() test! How on earth did that happen?

Let’s debug… The code was continuously trying for the subRange 0..1, and again, 0..1, and again 0..1, the low-high-range was not altered at all. low did not get higher then 0, and high did not get lower then 1. in other words, ‘low’ was never increased. So the easiest thing to fix this is what I did below: I changed that one line of code from:

 C# |  copy code |? 
1
2
            low = currentPeakPoint; // <-- goes wrong
3
// to
4
            low = currentPeakPoint+1; 
5

And also added some debugging statements to see what was actually happening in the recursive loop, and found that I could refactor the internal call to SearchTarget at the end of the method a bit. It is not needed to call it in both the internal if…else.. branches, one time at the end is good enough. I ended up with the following recursive method:

 C# |  copy code |? 
01
02
 public static int SearchTarget(int[] sortedArray, int targetValue, int low, int high)
03
    {
04
        Console.WriteLine(string.Format("target {0}, low {1}, high {2}",targetValue,low,high));
05
        // stop condition
06
        if (high <= low)
07
        {
08
            if (sortedArray[low] == targetValue)
09
            {
10
                return low; // index found!
11
            }
12
            Console.WriteLine(string.Format("target {0} does not match value {1} at index {2}", targetValue, sortedArray[low], low));
13
            return -1; // target not found
14
        }
15
        // high > low means we can cut the range in half
16
        // but what half? the peakpoint should be the middle of low and high:
17
        int currentPeakPoint = (high - low)/2 + low;
18
        Console.WriteLine(string.Format("Peaking at:{0}",currentPeakPoint));
19
        if (sortedArray[currentPeakPoint] < targetValue)
20
        {
21
            // we must search upper half:
22
            low = currentPeakPoint +1; // this +1`I did not have at first.
23
        }
24
        else
25
        {
26
            // we must search lower half:
27
            high = currentPeakPoint;
28
        }
29
        return SearchTarget(sortedArray, targetValue, low, high);
30
    }
31

And, all tests green!

Even more tests!

Of course, I added more tests, this time real tests with larger lists, edge-cases, values that are outside the bounds of the sorted-array… just to be sure the method I wrote worked. But the additional tests never broke the code. I present all the other tests that I wrote here just for completeness-sake…

 C# |  copy code |? 
001
002
   [Test]
003
    public void HundredItems_TargetExists_Success()
004
    {
005
        int[] sortedArray = new int[100];
006
        for(int i=0;i<=99;i++)
007
        {
008
            sortedArray[i] = i;
009
        }
010
        int targetValue = 39;
011
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
012
        Assert.AreEqual(39, targetIndex, "Expected to find the target at index 39");
013
    }
014
 
015
    [Test]
016
    public void HundredItems_TargetDoesNotExists_Success()
017
    {
018
        int[] sortedArray = new int[100];
019
        for (int i = 0; i <= 99; i++)
020
        {
021
            sortedArray[i] = i;
022
        }
023
        int targetValue = 139;
024
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
025
        Assert.AreEqual(-1, targetIndex, "Expected to not find the target");
026
    }
027
 
028
    [Test]
029
    public void UnEvenItems_TargetExists_Success()
030
    {
031
        int[] sortedArray = new int[99];
032
        for (int i = 0; i <= 98; i++)
033
        {
034
            sortedArray[i] = i;
035
        }
036
        int targetValue = 50;
037
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
038
        Assert.AreEqual(sortedArray[targetIndex], targetValue, "Expected to find the target");
039
    }
040
 
041
    [Test]
042
    public void HundredItems_TargetDoesNotExists_ButWithinRange_Success()
043
    {
044
        int[] sortedArray = new int[100];
045
        for (int i = 0; i <= 99; i++)
046
        {
047
            sortedArray[i] = i*2;
048
        }
049
        int targetValue = 139;
050
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
051
        Assert.AreEqual(-1, targetIndex, "Expected to not find the target");
052
    }
053
 
054
 
055
 
056
    [Test]
057
    public void TenThousandItems_TargetExists_Success()
058
    {
059
        int[] sortedArray = new int[10000];
060
        for (int i = 0; i <= 9999; i++)
061
        {
062
            sortedArray[i] = i * 3;
063
        }
064
        int targetValue = 1234*3;
065
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
066
        Assert.AreEqual(sortedArray[targetIndex],targetValue, "Expected to find the target");
067
    }
068
 
069
 
070
    [Test]
071
    public void EdgeTargetLow_Success()
072
    {
073
        int[] sortedArray = new int[100];
074
        for (int i = 0; i <= 99; i++)
075
        {
076
            sortedArray[i] = i * 2;
077
        }
078
        int targetValue = 0;
079
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
080
        Assert.AreEqual(sortedArray[0], targetValue, "Expected to find the target");
081
    }
082
 
083
    [Test]
084
    public void EdgeTargetHigh_Success()
085
    {
086
        int[] sortedArray = new int[100];
087
        for (int i = 0; i <= 99; i++)
088
        {
089
            sortedArray[i] = i * 2;
090
        }
091
        int targetValue = 99 * 2;
092
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
093
        Assert.AreEqual(sortedArray[99], targetValue, "Expected to find the target");
094
    }
095
 
096
 
097
    [Test]
098
    public void TargetBelowLowest_DoesNotExist()
099
    {
100
        int[] sortedArray = new int[100];
101
        for (int i = 0; i <= 99; i++)
102
        {
103
            sortedArray[i] = i * 2;
104
        }
105
        int targetValue = -1;
106
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
107
        Assert.AreEqual(targetIndex,-1, "Expected to not find the target");
108
    }
109
 
110
    [Test]
111
    public void TargetAboveHighest_DoesNotExist()
112
    {
113
        int[] sortedArray = new int[100];
114
        for (int i = 0; i <= 99; i++)
115
        {
116
            sortedArray[i] = i * 2;
117
        }
118
        int targetValue = 99*2+1;
119
        int targetIndex = MyBinarySearch.Entry(sortedArray, targetValue);
120
        Assert.AreEqual(targetIndex, -1, "Expected to not find the target");
121
    }
122

And as an example, the satisfying output from the 10.000 search:

01
target 3702, low 0, high 9999
02
Peaking at:4999
03
target 3702, low 0, high 4999
04
Peaking at:2499
05
target 3702, low 0, high 2499
06
Peaking at:1249
07
target 3702, low 0, high 1249
08
Peaking at:624
09
target 3702, low 625, high 1249
10
Peaking at:937
11
target 3702, low 938, high 1249
12
Peaking at:1093
13
target 3702, low 1094, high 1249
14
Peaking at:1171
15
target 3702, low 1172, high 1249
16
Peaking at:1210
17
target 3702, low 1211, high 1249
18
Peaking at:1230
19
target 3702, low 1231, high 1249
20
Peaking at:1240
21
target 3702, low 1231, high 1240
22
Peaking at:1235
23
target 3702, low 1231, high 1235
24
Peaking at:1233
25
target 3702, low 1234, high 1235
26
Peaking at:1234
27
target 3702, low 1234, high 1234
28

With these tests, I’m pretty confident I was succesfull to the challenge I got from the book, and I’m happy to read on in Beautiful code…

After reading the chapter

I must say I was rather dissapointed. The presented solution was not using a recursive loop at all, but a for loop instead. While recursion is so elegant as you don’t need to take care of a stack yourself… oh well..
must say that except for a few chapters, the rest of the 30 chapters of the book were not that gleaming either. Most of the chapters are clearly from developers who are very proud of their code; that’s ok, but did they always keep their reader in mind?

And with this blog-article, you also have my review about the contents of the book… decide for yourselves! Beautiful Code: Leading Programmers Explain How They Think (Theory in Practice (O’Reilly))Computer Programmer Books)