Wednesday, May 11, 2011

Introduction to Algorithms - The Reading Notes: Part 2, Insertion Sort

Insertion Sort is probably the simplest algorithm so it is used in many books as the first example. ItA discusses this minion algorithm right in the Part I Foundations, Chapter 2 Getting Started.

The aim of the algorithm is to sort a set of keys in a certain order: You have some keys as input then the algorithm will give you a ordered list of your keys. Here is how the insertion sort works: suppose all the keys are in an input list, then the algorithm starts working on the the keys one by one in a certain order, for example from left to right. Each time it works on a key, it save it's value, then it looks all keys already processed (they are to the left of the current key), from right to left, by comparing the current key with them and moving the processed keys one place to the right if it should be placed after the current key, according to the ordering rule. This process ends until when an already processed key that should be placed to the left of the current key or there is no more already processed keys. Then we put the current key in the place of the last already processed key we found. Now that we find a good place for the current key, we can start processing the next unprocessed key.

The algorithm is simple, code the pseudo code in C++ is straightforward. What I'd to say here is how I set up the whole project with CppUnit and SCons.

CppUnit is very handy for unit test, especially when I'm writing libraries. Just as in this ITA project, I'd like to put all the algorithms in some libraries, I don't need to write a separate main cpp file just to test these algorithms, instead, I setup a CppUnit testing framework for this. It's easier to maintain. For how to install/use CppUnit, here is the CppUnit Documentation.

First, I implement the Insertion Sort algorithm in InsertSort.cpp and .h files and I put them in a folder named Sorting. With TDD in mind, I write only the class declaration but leaving the method empty.

Then I setup the CppUnit framework: I need three files: SortingTest.h and .cpp, where I'll implement test code for all the sorting algorithms I'll write; TestMain.cpp. where is the "main" function of the test code. The beauty of using CppUnit is this "main" function has only infrastructure interest, it's not related to the particular test cases. So once setup, I don't need to change it. I'll only work on the SortingTest class.

Now I will write a test case:

   1:  // Test normal case.
   2:  void SortingTest::testInsertSort_0()
   3:  {
   4:      int iArray[] = {10, 5, 7, -2, 19, 50, 300, 320, 160, 1000};
   5:      std::vector<int> v(iArray, iArray + sizeof(iArray) / sizeof(int));
   6:      Common::Algorithm::Sort::InsertSort(v);
   7:      CPPUNIT_ASSERT_EQUAL(v[0], -2);
   8:      CPPUNIT_ASSERT_EQUAL(v[1], 5);
   9:      CPPUNIT_ASSERT_EQUAL(v[2], 7);
  10:      CPPUNIT_ASSERT_EQUAL(v[3], 10);
  11:      CPPUNIT_ASSERT_EQUAL(v[4], 19);
  12:      CPPUNIT_ASSERT_EQUAL(v[5], 50);
  13:      CPPUNIT_ASSERT_EQUAL(v[6], 160);
  14:      CPPUNIT_ASSERT_EQUAL(v[7], 300);
  15:      CPPUNIT_ASSERT_EQUAL(v[8], 320);
  16:      CPPUNIT_ASSERT_EQUAL(v[9], 1000);
  17:  }


The aim of the test case is to test the basic function of the algorithm, I initialize a vector with 10 integers, then I call the InsertSort(), finally, I test that the vector is sorted by examining that each element has the expected value. CppUnit provides a macro CPPUNIT_ASSERT_EQUAL for this kind of test.

Now that I have the algorithm code (although not yet complete), I have the testing code, I can compile both and run my test. I use SCons for this. SCons is very convenient yet powerful.  Basically, you write a configuration file usually named as SConstruct, inside you tell scons how to build your project.

   1:  Library(target='Sorting/Sorting', \ 
   2:          source=Glob('Sorting/*.cpp')) 
   3:  Program(target='CppUnitTest/TestMain', \ 
   4:          source=Glob('CppUnitTest/*.cpp'), \ 
   5:          LIBS=['cppunit','Sorting'], \ 
   6:          LIBPATH='Sorting', \ 
   7:          CPPPATH=['Sorting', 'CppUnitTest']) 

The Library keyword at line 1 specifies a library to build, the source code are all the .cpp files in the Sorting folder.

The test code should be built with the Program keyword, as in line 3. It needs all source code in the CppUnitTest folder and two libraries: the CppUnit library and the Sorting library that I'm testing. The Sorting library is in my local folder so I need to tell SCons where to find it with LIBPATH keyword as in line 6 above. Finally, the test code need the Sorting library header to compile, I'll tell SCons to find the Sorting headers in the Sorting folder with CPPPATH keyword as in line 7.

OK, now I save the Sconstruct file then I type scons. It compiles my Sorting library and the testing code.

I can run the test immediately and it fails evidently. Now you can start to write the InsertSort() method, but personally, I will write some more test cases before writing the InsertSort() method. For example, I will write one case when the input is empty, another when all inputs have the same value, etc.

Finally, I write my InsertSort() method and correct all failing cases, once all test cases pass, I'm quite confident that my InsertSort() works correctly.

Here is the how the InsertSort() is implemented:

   1:  void Sort::InsertSort(std::vector<int> &v)  
   2:  { 
   3:      int key = 0; 
   4:    
   5:      // Trivial case, nothing to do.  
   6:      if (v.size() <= 1) 
   7:      {    
   8:          return; 
   9:      }    
  10:    
  11:      for (int j = 1; j < v.size(); j++) 
  12:      {    
  13:          key = v[j]; 
  14:          int i = j-1; 
  15:          while (i >= 0 && v[i] > key)                                                                                             
  16:          {    
  17:              v[i+1] = v[i]; 
  18:              i--; 
  19:          }    
  20:          v[i+1] = key; 
  21:      }    
  22:  } 
  23:    

Modification: When I started to write the merge sort, I found it could be better to rename the class and the methods, and finally, I ended up with a solution that gives each sorting algorithm a class and they all have the same Sort() method declaration.

   1:  void InsertSort::Sort(std::vector<int> &v, bool bNonDecreasing) 




Monday, May 9, 2011

Introduction to Algorithms - The Reading Notes: Part 1, Introduction

I started reading this book recently and I will share my understandings and thoughts with you.

I will discuss the problems in the book and I'd like to inject in particular more on the software engineering aspects of those algorithms: how to implement them so they can be used in a real-life software project to solve practical problems.

Finally, I will code those algorithms in C++ and build the code under Fedora Linux with Scons. I will also adopt the TDD methodology (TDD stands for Test Driven Development) and use CppUnit for unit testing.

Here are some useful links for this project:
The code formatter, it formats the source code into HTML so they look better when put in the blog. Currently it formats C#, VB and some other languages but not C/C++. I'll continue looking for one but before I find something, I'll use this one for the moment.

The SCons user's guide.

The CppUnit Documentation.



Saturday, May 7, 2011

A Real-life Story about Return by Reference - Part II

Now we know that we should follow some "best practices" when dealing with return by reference:


1. Use toto aToto = getToto() or toto aToto(getToto()), don't write toto aToto; aToto = getToto();


2. Don't return a reference to a local variable, except when you know what you are doing 


3. Let member function return a reference to a member variable when the performance is really an issue, and call it with toto &aTotoRef = aToto.getTotoRef() to really take advantage of return by reference. Otherwise, return by value. Always take care of the variable holding the reference result, use it only during the life time of the object itself. 


There is one last thing you may want to know, that is the RVO does not work all the time, as a result it cost you one more construction/destruction of your object when you return it by value. I won't say it's a penalty you have to pay, it's just the fair cost you have to pay as the compiler can not help you more because your code asks for more. So when RVO does not work? Well, when the compiler can not decide which is the object it needs to return at compile time. Here is an example:

toto getTotoWithParam(bool first)
{
    if (first)
    {
        toto aToto;
        return aToto;
    }
    else
    {
        toto bToto;
        return bToto;
    }
}

Then, when calling this function, it will show you that the RVO is not apllied.

toto a = getTotoWithParam(false);
toto constructor.
toto copy constructor.
toto destructor.
toto destructor.



The object is constructed inside the function then used to copy construct the object holding the result.  Compared with the getToto() function, one more object (temporary object) is created.

toto aToto = getToto():
toto constructor.
toto destructor.

So that all about the return by value/reference that I want to tell you, I hope you now know what to do given a certain situation.

Good luck and I'd like to know what you are thinking about this.