Monday, August 4, 2014

Catalan number applications - balanced paranthesis, number of binary search trees etc..


In combinatorial mathematics, the Catalan numbers form a sequence of natural numbers that occur in various counting problems, often involving recursively-defined objects.

The first Catalan numbers for n = 0, 1, 2, 3, … are
1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, 742900, 2674440, 9694845, 35357670, 129644790, 477638700, 1767263190, 6564120420, 24466267020, 91482563640, 343059613650, 1289904147324, 4861946401452, ...
Some of the applications are:
  • Cn is the number of Dyck words of length 2n. A Dyck word is a string consisting of n X's and n Y's such that no initial segment of the string has more Y's than X's (see also Dyck language). For example, the following are the Dyck words of length 6:
         XXXYYY     XYXXYY     XYXYXY     XXYYXY     XXYXYY.
  • Re-interpreting the symbol X as an open parenthesis and Y as a close parenthesis, Cn counts the number of expressions containing n pairs of parentheses which are correctly matched:
        ((()))     ()(())     ()()()     (())()     (()())
  • Cn is the number of different ways n + 1 factors can be completely parenthesized (or the number of ways of associating n applications of a binary operator). For n = 3, for example, we have the following five different parenthesizations of four factors:
        ((ab)c)d     (a(bc))d     (ab)(cd)     a((bc)d)     a(b(cd))
  • Successive applications of a binary operator can be represented in terms of a full binary tree.[3] (A rooted binary tree is full if every vertex has either two children or no children.) It follows that Cn is the number of full binary trees with n + 1 leaves:
Catalan number binary tree example.png


If we want to print all the possible balanced parenthesis - we can solve by below recursive function which prints either "(" or ")" recursively - but while printing make ")" it makes sure that the number of left "(" printed are always >= to number of right parenthesis.

Below is the corresponding code

Code Snippet
  1. string[] GetAllPossibleParanthesis_2(int N)
  2.         {
  3.             var r = this.GetAllPossibleParanthesis_2(N, N, "");
  4.             return r;
  5.         }
  6.         string[] GetAllPossibleParanthesis_2(int left, int right, string s)
  7.         {
  8.             if((left == 0) && (right==0))
  9.             {
  10.                 return new string[] { s };
  11.             }
  12.             Assert.IsTrue(left <= right);
  13.             List<string> results = new List<string>();
  14.             if (left > 0)
  15.             {
  16.                 var r = this.GetAllPossibleParanthesis_2(left - 1,
  17.                     right, s + "[");
  18.                 results.AddRange(r);
  19.             }
  20.             if(left <= (right-1))
  21.             {
  22.                 var r = this.GetAllPossibleParanthesis_2(left,
  23.                     right-1, s + "]");
  24.                 results.AddRange(r);
  25.             }
  26.             return results.ToArray();
  27.         }

Example of how the recursive method generates balanced parenthesized string for 3 pairs


Solving the problem identifying the Catalan recursive relationship

The Catalan numbers satisfy the recurrence relation
C_0 = 1 \quad \mbox{and} \quad C_{n+1}=\sum_{i=0}^{n}C_i\,C_{n-i}\quad\text{for }n\ge 0;
for ex, Cn is the number of different ways n + 1 factors can be completely parenthesized
i.e; C0 = 1 (1 factor -> a)
      C1 = 1 (2 factors -> (ab))
      C2 = 2 (3 factors -> (a(bc)), ((ab)c)
Or, C2 = C0*C1 + C1*C0 = { a(bc), (ab)c }
      C3 = 5 (4 factors -> (a(b(cd)), (a((bc)d)), ((ab)(cd)), (((ab)c)d), ((a(bc))d)
Or, C3 = c0*c2 + c1*c1+c2*c0 { a(b(cd)), a((bc)d), (ab)(cd), (a(bc))d, ((ab)c)d

Note: its probably more intuitive from programming perspective if we start the sequence with notation c1 than c0 where c1 represents different ways '1' factor can be completely parenthesized and so on...

Below is the corresponding code where factors are sent as string to the method:
Code Snippet
  1. string[] CatalanNumber_GeneratingParanthesizedFactors(string s)
  2.         {
  3.             s.ThrowIfNullOrWhiteSpace("s");
  4.             if(s.Length == 1)
  5.             {
  6.                 return new string[] {s};
  7.             }
  8.             var r = this.CatalanNumber_GeneratingParanthesizedFactorsRecursive(
  9.                 s);
  10.             return r;
  11.         }
  12.         string[] CatalanNumber_GeneratingParanthesizedFactorsRecursive(string s)
  13.         {
  14.             if(s.Length == 1)
  15.             {
  16.                 return new string[] {s};
  17.             }
  18.             if(s.Length == 2)
  19.             {
  20.                 string r = "(" + s + ")";
  21.                 return new string[] { r };
  22.             }
  23.             List<string> results = new List<string>();
  24.             for (int i = 1; i < s.Length; i++)
  25.             {
  26.                 var r1 = this.CatalanNumber_GeneratingParanthesizedFactorsRecursive(
  27.                     s.Substring(0, i));
  28.                 var r2 = this.CatalanNumber_GeneratingParanthesizedFactorsRecursive(
  29.                     s.Substring(i));
  30.                 foreach(var s1 in r1)
  31.                 {
  32.                     foreach(var s2 in r2)
  33.                     {
  34.                         string r = "(" + s1 + s2 + ")";
  35.                         results.Add(r);
  36.                     }
  37.                 }
  38.             }
  39.             return results.ToArray();
  40.         }
 

 All possible balanced parenthesis using Catalan recursive relationship

Cn counts the number of expressions containing n pairs of parentheses which are correctly matched
1, 1, 2, 5, 14, 42, 132, etc...
i.e, c0 = empty (considered one for generalization)
      c1 = []
      c2 = [[]], [][]
      c3 = [[[]]], [[][]], [[]][], [][[]], [][][]
Essentially one need to notice that any correct (balanced) balanced parenthesized strings can be represented with [c']c'' where c' and c'' are valid strings.
i.e, it satisfies Catalan recursive relationship :)
Cn = c0*Cn-1 + C1*Cn-2+....+Cn-2*C1+Cn-1*C0
or in other words,
Cn string = [C0]*Cn-1+[C1]*Cn-2+......+[Cn-2]*C1+[Cn-1]*C0
 
Below is the corresponding code:
 
Code Snippet
  1. string[] CatalanNumber_AllPossibleParanthesis(int N)
  2.         {
  3.             Assert.IsTrue(N >= 0);
  4.             if(N==0)
  5.             {
  6.                 return new string[] {" "};
  7.             }
  8.             if(N == 1)
  9.             {
  10.                 return new string[] { "[]" };
  11.             }
  12.             
  13.             List<string> results = new List<string>();
  14.             for (int i = 0; i <= (N - 1); i++)
  15.             {
  16.                 var r1 = this.CatalanNumber_AllPossibleParanthesis(i);
  17.                 var r2 = this.CatalanNumber_AllPossibleParanthesis(N - 1 - i);
  18.                 
  19.                 foreach (var s1 in r1)
  20.                 {
  21.                     foreach (var s2 in r2)
  22.                     {
  23.                         //[c']c" - will be of the form
  24.                         string r = "[" + s1.Trim() + "]" + s2.Trim();
  25.                         results.Add(r);
  26.                     }
  27.                 }
  28.             }
  29.             return results.ToArray();
  30.         }

Number of full binary trees, binary search trees, binary trees etc.

Cn is the number of full binary trees with n + 1 leaves:
Catalan number binary tree example.png

Note: basically only the structure is taken into account, not the data.

Then, one can say C0 = 1 (number of binary trees with '1' leaf)
                              C1 = 1 (number of binary trees with '2' leaves)
                              C3 = C0*C1 (number of binary trees on left with '1' leaf and on right with '2' leaves) + C1*C0 (vice versa)
So, one can define Cn = C0*Cn-1 ('1' leaf on left and 'n' leaves on right) + c1*Cn-2+....+Cn-2*C1 (n-1 leaves on left and 2 on right)+Cn-1*C0 (n leaves on left and 1 leaf on right)

Cn is the total number of binary search trees that can be constructed from given 'N' distinct elements

i.e, C0 = 1 (empty tree)
      C1 = 1 (with one element)
      C2 = 2 (with two elements)
      C3 = C0*C2 (none on left and 2 on right) + C1*C1 (using one element on both sides)+C2*C0

i.e, it satisfies Catalan recursive function, and Cn is nothing but Catalan number.

No. of structurally equivalent binary trees with given 'N' nodes is Cn as well. And by taking the data into consideration, then total number of binary trees are N!*nthCatalanNumber.

Below is the code to generate total number of binary search trees (Catalan Number):
 

Code Snippet
  1. public int DP_Catalan_TotalBinarySearchTrees(int N)
  2.         {
  3.             if (N == 0)
  4.             {
  5.                 return 1;
  6.             }
  7.             if (N == 1)
  8.             {
  9.                 return 1;
  10.             }
  11.             int[] DP_Catalan_Cache = new int[N + 1];
  12.             DP_Catalan_Cache[0] = 1;
  13.             DP_Catalan_Cache[1] = 1;
  14.             for(int i = 2;i<=N; i++)
  15.             {
  16.                 int sum = 0;
  17.                 for(int j=0;j<i;j++)
  18.                 {
  19.                     sum += (DP_Catalan_Cache[j]*DP_Catalan_Cache[i-1-j]);
  20.                 }
  21.                 DP_Catalan_Cache[i] = sum;
  22.             }
  23.             return DP_Catalan_Cache[N];
  24.         }
  25.         public int GetCatalan_TotalBinarySearchTrees(int N)
  26.         {
  27.             if(N==0)
  28.             {
  29.                 return 1;
  30.             }
  31.             if(N==1)
  32.             {
  33.                 return 1;
  34.             }
  35.             int sum = 0;
  36.             for(int i = 1;i<=N;i++)
  37.             {
  38.                 int totalLeftTrees = this.GetCatalan_TotalBinarySearchTrees(i - 1);
  39.                 int totalRightTrees = this.GetCatalan_TotalBinarySearchTrees(N - i);
  40.                 sum += (totalLeftTrees * totalRightTrees);
  41.             }
  42.             return sum;
  43.         }

Below are the corresponding Unit Tests

 
Code Snippet
  1. [TestCategory(Constants.Recursion)]
  2.         [TestMethod]
  3.         public void AllPossibleParanthesisTests()
  4.         {
  5.             int[] CatalanNumbers = new int[] { 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796 };
  6.             for(int i=1;i<=10;i++)
  7.             {
  8.                 var results2 = this.GetAllPossibleParanthesis_2(i);
  9.                 Assert.AreEqual(results2.Length, CatalanNumbers[i - 1]);
  10.                 var results = this.CatalanNumber_AllPossibleParanthesis(i);
  11.                 Assert.AreEqual(results.Length, CatalanNumbers[i - 1]);
  12.                 Debug.WriteLine("-----------------------------------------------");
  13.                 foreach (string s in results2)
  14.                 {
  15.                     Assert.IsTrue(results.Any(rs => rs.Equals(s)));
  16.                 }
  17.             }
  18.         }
Code Snippet
  1. [TestMethod]
  2.         public void CatalanNumber_GeneratingParanthesizedFactorsTests()
  3.         {
  4.             var CatalanNumbers = new int[] { 1, 1, 2, 5, 14, 42, 132, 429,
  5.                 1430, 4862, 16796 };
  6.             string input = "";
  7.             for (int i = 1; i <= 10; i++)
  8.             {
  9.                 input += i;
  10.                 var results2 = this.CatalanNumber_GeneratingParanthesizedFactors(input);
  11.                 Assert.AreEqual(results2.Length, CatalanNumbers[input.Length-1]);
  12.                 Debug.WriteLine("-----------------------------------------------");
  13.                 foreach (string ss in results2)
  14.                 {
  15.                     Debug.WriteLine(ss);
  16.                 }
  17.             }
  18.             string s = "a";
  19.             var r = this.CatalanNumber_GeneratingParanthesizedFactors(s);
  20.             Assert.AreEqual(r.Length, 1);
  21.             Assert.AreEqual(s, r[0]);
  22.             s = "ab";
  23.             r = this.CatalanNumber_GeneratingParanthesizedFactors(s);
  24.             Assert.AreEqual("(ab)", r[0]);
  25.             s = "abc";
  26.             r = this.CatalanNumber_GeneratingParanthesizedFactors(s);
  27.             string[] output = new string[] { "(a(bc))", "((ab)c)" };
  28.             Assert.AreEqual(2, r.Length);
  29.             foreach(var o in output)
  30.             {
  31.                 Assert.AreEqual(1, r.Where(rs => (rs == o)).Count());
  32.             }
  33.             s = "abcd";
  34.             r = this.CatalanNumber_GeneratingParanthesizedFactors(s);
  35.             
  36.             output = new string[] { "(a(b(cd)))", "(a((bc)d))", "((ab)(cd))", "(((ab)c)d)", "((a(bc))d)"};
  37.             Assert.AreEqual(5, r.Length);
  38.             foreach (var o in output)
  39.             {
  40.                 Assert.AreEqual(1, r.Where(rs => (rs == o)).Count());
  41.             }
  42.         }
Code Snippet
  1. [TestMethod]
  2.         public void Catalan_TotalBinarySearchTreesTests()
  3.         {
  4.             var CatalanNumbers = new int[] { 1, 1, 2, 5, 14, 42, 132, 429,
  5.                 1430, 4862, 16796 };
  6.             for(int i =0;i<CatalanNumbers.Length;i++)
  7.             {
  8.                 int n = this.GetCatalan_TotalBinarySearchTrees(i);
  9.                 Assert.AreEqual(CatalanNumbers[i], n);
  10.                 int n1 = this.DP_Catalan_TotalBinarySearchTrees(i);
  11.                 Assert.AreEqual(n, n1);
  12.             }
  13.         }

No comments:

Post a Comment