Friday, November 25, 2011

Introduction to Algorithms - The Reading Notes: Part 7, The Quick Sort

Now comes our famous QuickSort. It's famous because it's quick, so how quick it is? The answer from the theoretical analysis is Θ(n ln n) in average. OK, you may argue that the MergeSort and HeapSort we talked before also run in O(n ln n) , why this one is quicker? The answer is hidden in the big O notation, we usually don't write the constant factor in the function. In fact, QuickSort has a smaller hidden factor than others which distinguishes in practice QuickSort from the others. However, QuickSort can run, in the worst case, in O(n2) time. Now let's see how to write some code to do QuickSort.

As a starting point, we write our unit tests. As usual, we only need to add test cases in SortingTest.h and implement them in SortingTest.cpp.

    CPPUNIT_TEST( testQuickSort_0 );
    CPPUNIT_TEST( testQuickSort_1 );
    CPPUNIT_TEST( testQuickSort_Dec_0 );
    CPPUNIT_TEST( testQuickSort_Dec_1 );
...
    void testQuickSort_0();
    void testQuickSort_1();
    void testQuickSort_Dec_0();
    void testQuickSort_Dec_1();

The implementation of the tests which is quite straightforward, actually, same
as tests for other sorting algorithms except the method used to sort, here we 
use QuickSort::Sort(v): 



    // Setup Array.
...
    QuickSort::Sort(v);
...
    // Test Assertions.

Now if we try to compile, we will get some compiling errors, of course. We do
not have our QuickSort class created. So now it's the time to create it. 



    class QuickSort 
    {
        public:
            static void Sort(std::vector &v, bool bNonDecreasing = true);
        protected:
            static int  Partition(std::vector &v, int p, int r, bool bNonDecreasing);
            static void DoSort(std::vector &v, int p, int r, bool bNonDecreasing);
    };

The QuickSort provides exactly the same interface as other sorting algorithms which is the Sort method. As usual, we allow to sort in both non-decreasing and decreasing orders.

The DoSort method will recursively call itself to sort two sub-arrays partitioned by the method Partition.

Write the QuickSort class, compile it with scons, then run the unit tests. Failed!! Double check the code, then fix all the bugs until all unit tests pass. Below is the resulting code.

void QuickSort::Sort(std::vector &v, bool bNonDecreasing) 
{
    if (v.size() <= 1 )
    {
        return;
    }

    DoSort(v, 0, v.size()-1, bNonDecreasing);
}

int QuickSort::Partition(std::vector &v, int p, int r, bool bNonDecreasing)
{
    int pivot = v[r];
    int i = p-1; // i marks the end of the left partition.

    // j markes the end of the right partition
    for (int j = p; j <= r-1;  j++)
    {
        if ((v[j] < pivot &&  bNonDecreasing) ||
            (v[j] > pivot && !bNonDecreasing))
        {
            i++;
            int t = v[i];
            v[i]  = v[j];
            v[j]  = t;
        }
    }
    v[r] = v[i+1];
    v[i+1] = pivot;

    return i+1;
}

void QuickSort::DoSort(std::vector &v, int p, int r, bool bNonDecreasing)
{
    std::cout << "Sorting sub array: " << p << ":" << r << std::endl;
    if (p < r)
    {
        int q = Partition(v, p, r, bNonDecreasing);
        DoSort(v, p, q-1, bNonDecreasing);
        DoSort(v, q+1, r, bNonDecreasing);
    }
}


About this document ...

ITA Reading Notes This document was generated using the LaTeX2HTML translator Version 2008 (1.71)
Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -html_version 4.0,math -no_navigation -scalable_fonts -split 0 -no_auto_link ITAReadingNotes-Main
The translation was initiated by Bing Han on 2011-11-25

Sunday, November 20, 2011

Introduction to Algorithms - The Reading Notes: Part 6, The Heap Sort

Heap Sort

The basic idea of Heap Sort is to sort with help of a special data structure called heap. A heap is a binary tree where each note in the tree contains a value. If the value of each node is no less than it's children, we call it a Max-Heap. If the value of each node is no more than it's children, we call it a Min-Heap.

Don't be confused by name heap, which is also used commonly as a notion of a region of memory space where program can dynamically allocate.

Heap is very interesting, on one hand, it's a binary tree, so it benefits binary tree properties such as it's height grows logarithmically with the number of nodes it has; on the other hand, it has one important order relationship between parent and child nodes.

As usual, we won't discuss in detail the theoretical aspect of the algorithm, instead, we focus on the implementation of the algorithm.

Before implementing the algorithm, let's first setup the unit test. As we already has our test infrastructure for previous sorting algorithms we discussed, we simply need to add some test cases inside.

For Heap Sort, I'm going to create a class with name HeapSort which provides a Sort method as other sort algorithms. So in the CppUnitTest/SortingTest.h, I add some test cases and their declarations:

CPPUNIT_TEST( testHeapSort_0 );
    CPPUNIT_TEST( testHeapSort_1 );
...
    void testHeapSort_0();
    void testHeapSort_1();

Then I define those tests (methods) in the CppUnitTest/SortingTest.cpp:

void SortingTest::testHeapSort_0()
{
    int iArray[] = {10, 5, 7, -2, 19, 50, 300, 320, 160, 1000};
    std::vector v(iArray, iArray + sizeof(iArray) / sizeof(int));
    HeapSort::Sort(v);
    CPPUNIT_ASSERT_EQUAL(v[0], -2);
    CPPUNIT_ASSERT_EQUAL(v[1], 5);
    CPPUNIT_ASSERT_EQUAL(v[2], 7);
    CPPUNIT_ASSERT_EQUAL(v[3], 10);
    CPPUNIT_ASSERT_EQUAL(v[4], 19);
    CPPUNIT_ASSERT_EQUAL(v[5], 50);
    CPPUNIT_ASSERT_EQUAL(v[6], 160);
    CPPUNIT_ASSERT_EQUAL(v[7], 300);
    CPPUNIT_ASSERT_EQUAL(v[8], 320);
    CPPUNIT_ASSERT_EQUAL(v[9], 1000);
}


void SortingTest::testHeapSort_1()
{
...
}

Now I can start coding the HeapSort class. First, I declare the main interface method, which is Sort for the HeapSort class, then I declare some helper methods Sort will use.

Same as described in the ITA, we will construct a binary tree directly on the input array, by using the index of an item in the array. So item 0 is always the root of the heap tree. Then, item 1 and 2 are two children of item 0, item 3 and 4 are two children of item 1, item 5 and 6 are children of item 2, and so on. In short, given a node i , we know that it's left child is 2*i + 1 and it's right child is (i + 1)*2 , if they exist, where i starts from 0. As for it's parent, it depends. If i = 0 , then it's the root node, so it does not really have a parent, otherwise, the parent node's index can be computed as (i - 1)/2 .

We declare Parent, Left and Right methods to do the computation described above.
Also, the Heap Sort is based on first building a MaxHeap ( in case we need to sort non-decreasingly, otherwise, we build a MinHeap), then each time takes the root (which is the largest item in the heap), then reconstruct the heap. So we will also declare two auxiliary methods BuildMaxHeap and MaxHeapify to do the job. Methods for building and reconstructing MinHeap are also provided. The whole HeapSort class declaration looks like follows:


class HeapSort 
{
    public:
    static void Sort(std::vector &v, bool bNonDecreasing = true);

    protected:
    static int  Parent(const int i) { return ((i > 0) ? (i - 1) >> 1: 0); }
    static int  Left(const int i)   { return (i << 1) + 1; }
    static int  Right(const int i)  { return (i + 1) << 1; }

    static void BuildMaxHeap(std::vector &v);
    static void BuildMinHeap(std::vector &v);
    static void MaxHeapify(std::vector &v, const int i, const int iHeapSize);
    static void MinHeapify(std::vector &v, const int i, const int iHeapSize);
};
Now comes the time to implement those methods. We will implement them in  a bottom-up way. So, first look at the MaxHeapify method, it takes  three parameters, a vector representing the heap (or a binary tree to be  built to a heap) and a integer i  which is an index in the vector, representing a sub-tree rooted at i . Also the heap size is provided as the third parameter. MaxHeapify assumes that two sub-trees rooted at both of i 's children are already MaxHeaps, only i  is possibly violating the MaxHeap property. So the goal is just to make a MaxHeap rooted at i . To this end, it simply recursively floats down the content of node i  until a proper place is found. "Float down" is just an exchange of i  with the maximum of i , Left(i ) and Right(i ). Of cause, in case that the maximum is i  itself, nothing needs to be done. The code reads as follows:


void HeapSort::MaxHeapify(std::vector &v, const int i, const int iHeapSize)
{
    int l = Left(i);
    int r = Right(i);

    int largest = i;

    if ( l < iHeapSize && v[l] > v[i])
    {
        largest = l;
    }

    if ( r < iHeapSize && v[r] > v[largest])
    {
        largest = r;
    }

    if (largest != i)
    {
        int temp = v[largest];
        v[largest] = v[i];
        v[i] = temp;

        MaxHeapify(v, largest, iHeapSize);
    }
}
Next, we will implement BuildMaxHeap. With the help of MaxHeapify, this  is fairly straightforward.  We simply need to first MaxHeapify the sub-trees at the lowest level  (actually the next to lowest, as the lowest are leaf nodes that does not  have any child so trivially they are MaxHeap). As a result, we can  start MaxHeapify the sub-trees from the middle of the vector.  The trick is, given a vector of size n , if we count from the end of the vector, the first item that may have a child always appears no early than the n/2 th item (think why?).  As a result, we have the following simply implementation of BuildMaxHeap:   

void HeapSort::BuildMaxHeap(std::vector &v)
{
    for (int i = (v.size() >> 1); i >= 0; i--)
    {
        MaxHeapify(v, i, v.size());
    }
}
We can implement similar methods for MinHeap too.

Now that we have everything in hand, let's implement the final method Sort. We start by building a MaxHeap (or MinHeap), then each time, we exchange the root of the heap (item 0) with the last item in the heap. This exchange does two things: put the current largest at the end of the vector and put a smaller one at the heap root, breaking heap property. Then we use MaxHeapify to reconstruct the heap.
Once the heap is reconstructed, we have again the largest item at root, then we simply repeat the exchange-reconstruct. The code looks like the following, sorting by non-decreasing or non-increasing order are supported with parameter bNonDecreasing, as other sort algorithms:


void HeapSort::Sort(std::vector &v, bool bNonDecreasing) 
{
    if (v.size() <= 1 )
    {
        return;
    }
    
    if (bNonDecreasing)
    {
        BuildMaxHeap(v);
    }
    else
    {
        BuildMinHeap(v);
    }

    int iHeapSize = v.size();
    
    for (int i = v.size() - 1; i >=1; i--)
    {
        int temp = v[0];
        v[0] = v[i];
        v[i] = temp;

        iHeapSize--;
        if (bNonDecreasing)
        {
            MaxHeapify(v, 0, iHeapSize);
        }
        else
        {
            MinHeapify(v, 0, iHeapSize);
        }
    }
}


About this document ...

ITA Reading Notes This document was generated using the LaTeX2HTML translator Version 2008 (1.71)
Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -html_version 4.0,math -no_navigation -scalable_fonts -split 0 -no_auto_link ITAReadingNotes-Main
The translation was initiated by Bing Han on 2011-11-20

Saturday, November 12, 2011

Fedora 16 Release

Fedora 16 released on November 8, 2011!

I tried to download the 3.5GB DVD image  with firefox under windows. I started the download quite late last night, as it takes about 2 hours to finish, I paused it and shut down my PC then went to bed. I resumed it this morning and it finished later, but surprisingly, the resulted .iso file has 6GB!!

I don't know how the firefox downloader worked on it but I do know, I don't have a DVD on which I can write a 6GB image. However, instead delete the bad file directly, I'm curious to see how bad it is. Interestingly, I can mount it with a WinISO virtual driver and then I can open it. Selecting all items in the image, the total size seems correct, i.e. about 3.5GB.

Of cause, the SHA256 checksum failed. (See here on how to verify the image)

Finally, I switched to Fedora and let wget get the things done and it did it correctly.

So, now I have the DVD image to write and install ! :D