For example, given the series of values -3, 2, 5, 6, -1, 2, -5, 5, the sub-series 2, 5, 6 sums to 13, which is quite large, but 2, 5, 6, -1, 2 sums to even more: 14. Looking further, the sub-series 2, 5, 6, -1, 2, -5, 5 gives the same maximum of 14. So there may be more than one maximum sub-series, and our goal is to find any one of them.
ITA mentions the brute force algorithm that takes Ω(n^2) running time: calculate the sums of all possible sub-series (there are O(n^2) of them), and since the best we can hope for is to evaluate each sub-series in constant time, this approach takes Ω(n^2) time; the book does not say how to achieve that bound. ITA then introduces a much faster O(n lg n) algorithm based on DaC (Divide and Conquer).
I understand that this ITA chapter is about DaC, so the book does not go into detail on the brute force algorithm; it is left as an exercise. So let's finish the Ω(n^2) brute force algorithm first, then we will see how to implement the DaC one.
You already know the number of sub-series of an n-length series is O(n^2), but why? The answer: it equals the number of ways to select any two positions in the series (those two positions define the start and end of a sub-series), which is n(n-1)/2.
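To make that concrete: for the 8-element series above, there are 8*7/2 = 28 sub-series of length at least 2; counting the 8 single-element sub-series as well gives 36 = 8*9/2 in total, which is still O(n^2).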
Now comes the question of how we calculate the sum of each sub-series. If we simply compute the sums of all sub-series from scratch, we have to sum n sub-series of length 1, n-1 sub-series of length 2, n-2 sub-series of length 3, ..., and 1 sub-series of length n. In total, we perform about n + 2(n-1) + 3(n-2) + ... + n*1 additions, that is, sum(for i = 1 to n) (i*(n-i+1)), which makes our algorithm O(n^3) in running time.
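To make the O(n^3) cost visible, here is a minimal sketch of that naive version (it is not the function implemented below, and the name CalcNaive is mine): the innermost loop re-adds every sub-series from scratch.

#include <vector>

// Naive O(n^3) brute force: for every pair (i, j) the inner loop
// re-sums the whole sub-series from scratch.
void CalcNaive(const std::vector<int> &v, int &oStart, int &oEnd, int &oSum)
{
    if (v.empty()) { oStart = oEnd = oSum = 0; return; }
    oStart = oEnd = 0;
    oSum = v[0];
    for (size_t i = 0; i < v.size(); i++)          // start of sub-series
    {
        for (size_t j = i; j < v.size(); j++)      // end of sub-series
        {
            int sum = 0;
            for (size_t k = i; k <= j; k++)        // O(n) re-summation per pair
            {
                sum += v[k];
            }
            if (sum > oSum)
            {
                oSum = sum;
                oStart = static_cast<int>(i);
                oEnd = static_cast<int>(j);
            }
        }
    }
}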
Now we need to do more to get a faster algorithm. One more glance at the O(n^3) algorithm shows that the sub-series share many common parts, and we add those parts again and again, once for each sub-series. That is the key to reducing the running time.
I will now use an n*n matrix to save the sum of each sub-series: [0,1], [0,2], ..., [1,2], [1,3], ... Each single element also represents a sub-series of length 1; those are stored on the diagonal as [0,0], [1,1], ... So only the upper triangle of the matrix is used. The advantage of saving the sums in the matrix is that it is very easy to find a previous sum that can be used to build the next one, because we know that:
sum[i, j] = sum[i, j-1] + v[j] = sum[i, j-1] + sum[j, j]
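For instance, with the series from the beginning, sum[1, 3] = 2 + 5 + 6 = 13, so sum[1, 4] = sum[1, 3] + v[4] = 13 + (-1) = 12, and sum[1, 5] = 12 + 2 = 14: each new sum costs only one addition instead of a full re-summation.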
OK, that's enough theory; let's see the code. But wait a moment: before implementing the algorithm, let's put the unit test in place!
I'll use a separate class for the MaxSubArray unit test, apart from the one for the sort algorithms. With CppUnit, I simply create a .h and a .cpp for the new tests; no other configuration is needed.
MiscTest.h
#ifndef MISCTEST_H
#define MISCTEST_H

#include <cppunit/extensions/HelperMacros.h>

class MiscTest : public CppUnit::TestFixture {

    CPPUNIT_TEST_SUITE( MiscTest );
    CPPUNIT_TEST( testMaxSubArray_0 );
    CPPUNIT_TEST_SUITE_END();

public:
    void setUp();
    void tearDown();

    // Test cases.
    void testMaxSubArray_0();
};

#endif
MiscTest.cpp
1: #include "MiscTest.h"
2: #include "MaxSubArray.h"
3:
4: using namespace Common::Algorithm::MaxSubArray;
5:
6: // Registers the fixture into the 'registry'
7: CPPUNIT_TEST_SUITE_REGISTRATION( MiscTest );
8:
9: void MiscTest::setUp()
10: {
11: }
12: void MiscTest::tearDown()
13: {
14: }
15:
16: // Test normal case.
17: void MiscTest::testMaxSubArray_0()
18: {
19: int iArray[] = {-10, 5, 7, -2, 19, 50, -30, 32, -24, 18};
20: std::vector<int> v(iArray, iArray + sizeof(iArray) / sizeof(int));
21:
22: int rStart = 0;
23: int rEnd = 0;
24: int rSum = 0;
25: MaxSubArray::CalcBF(v, rStart, rEnd, rSum);
26:
27: CPPUNIT_ASSERT_EQUAL( 1, rStart);
28: CPPUNIT_ASSERT_EQUAL( 7, rEnd);
29: CPPUNIT_ASSERT_EQUAL(81, rSum);
30:
31: }
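The post does not show MaxSubArray.h, so for completeness here is a plausible sketch of it, assuming CalcBF is a static member of a MaxSubArray class inside the Common::Algorithm::MaxSubArray namespace (as the using directive and the call above suggest):

// MaxSubArray.h -- a sketch; the exact header is not shown in this post.
#ifndef MAXSUBARRAY_H
#define MAXSUBARRAY_H

#include <vector>

namespace Common { namespace Algorithm { namespace MaxSubArray {

class MaxSubArray
{
public:
    // Brute force maximum sub-array: outputs the start index, the end
    // index and the sum of a maximum sub-series of v.
    static void CalcBF(std::vector<int> &v, int &oStart, int &oEnd, int &oSum);
};

}}} // namespace Common::Algorithm::MaxSubArray

#endif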
Finally, I can start implementing the algorithm. I run the unit test each time I think the algorithm works, fix the bugs, and retry. Here is the final version that passes the test above:
void MaxSubArray::CalcBF(std::vector<int> &v, int &oStart, int &oEnd, int &oSum)
{
    // Trivial cases, size = 0 or 1.
    if (v.size() == 0)
    {
        oStart = 0;
        oEnd = 0;
        oSum = 0;
        return;
    }

    if (v.size() == 1)
    {
        oStart = 0;
        oEnd = 0;
        oSum = v[0];
        return;
    }

    // Start from the first element instead of 0, so an all-negative
    // series still yields a valid single-element answer.
    oSum = v[0];
    oStart = 0;
    oEnd = 0;

    // Initialize vResult: the diagonal holds the length-1 sums,
    // every other cell starts at 0.
    std::vector<int> vResult;
    for (size_t i = 0; i < v.size(); i++)
    {
        for (size_t j = 0; j < v.size(); j++)
        {
            vResult.push_back(i == j ? v[i] : 0);
        }
    }

    // Calculate the results.
    // We use a matrix to store the sums of all intervals:
    // 00 01 02 03 04
    //    11 12 13 14
    //       22 23 24
    //          33 34
    //             44
    // M[i,i] = v[i]
    // M[i,j] = sum(i,j) for 0 <= i < j < v.size();
    // only the upper triangle is used. It takes O(n^2) time.
    for (size_t i = 0; i < v.size(); i++)
    {
        for (size_t j = i; j < v.size(); j++)
        {
            if (i < j)
            {
                // Extend the previous sum by one element:
                // M[i,j] = M[i,j-1] + M[j,j].
                vResult[i * v.size() + j] = vResult[i * v.size() + j - 1] +
                                            vResult[j * v.size() + j];
            }
            if (oSum < vResult[i * v.size() + j])
            {
                oSum = vResult[i * v.size() + j];
                oStart = static_cast<int>(i);
                oEnd = static_cast<int>(j);
            }
        }
    }
}
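One corner case worth covering is an all-negative series, where the answer must be the single largest element. A second test along these lines could be added (the name testMaxSubArray_1 is mine; it would also need a declaration in MiscTest.h and a CPPUNIT_TEST line in the suite):

// Hypothetical extra test, not part of the suite above: every element
// is negative, so the best sub-series is the single largest element, -2.
void MiscTest::testMaxSubArray_1()
{
    int iArray[] = {-10, -5, -7, -2};
    std::vector<int> v(iArray, iArray + sizeof(iArray) / sizeof(int));

    int rStart = 0;
    int rEnd = 0;
    int rSum = 0;
    MaxSubArray::CalcBF(v, rStart, rEnd, rSum);

    CPPUNIT_ASSERT_EQUAL( 3, rStart);
    CPPUNIT_ASSERT_EQUAL( 3, rEnd);
    CPPUNIT_ASSERT_EQUAL(-2, rSum);
}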
Let's discuss the faster version, which takes only O(n lg n) time, in the next post. Thank you, and I'd like to hear your feedback.