Computer Science 456/656: Automata and Formal Languages

Spring 2000

The Pumping Lemma for Context-Free Languages

March 8, 2000

Pumping Lemma. Let L be any context-free language. Then there exists an integer N such that, for any string w in L of length at least N, there exist strings u, v, x, y, z such that w = uvxyz, v and y are not both empty, and uv^i x y^i z is in L for every non-negative integer i.
Proof of the Pumping Lemma.
Suppose that L is context-free, and assume first that the empty string is not a member of L. Then there must be a Chomsky Normal Form grammar G that generates L. Let us suppose that G has n variables. Let N = 2^n.
Let w be any string in L of length at least N. Let T be a parse tree of w under the grammar G. The variables of T form a binary tree with as many leaves as the length of w, by the property of a CNF grammar that the right-hand side of a production consists of two variables or one terminal. Let h be the height of this binary tree, measured in edges. Since it has at least N = 2^n leaves, but not more than 2^h leaves, we know that h is greater than or equal to n.
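The counting step above can be written as one chain of inequalities. This is only a restatement, assuming that h counts the edges on a longest root-to-leaf path through the variable nodes, so that a binary tree of height h has at most 2^h leaves.

```latex
\[
  2^{n} \;=\; N \;\le\; |w| \;=\; \#\mathrm{leaves}(T) \;\le\; 2^{h}
  \quad\Longrightarrow\quad h \ge n .
\]
```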
Thus, there must be a path of nodes in T, starting at the root (which is S) and going down to a leaf, which contains h+1 variables, that is, at least n+1 variables. Since G has only n variables, by the pigeon-hole principle, there must be at least two copies of the same variable on that path. Call that variable A. We refer to the two nodes which both are A as the "upper A" and the "lower A." Let x be the substring of w which is the string of terminals at the leaves of the subtree of T rooted at the lower A. Let t be the substring of w which is the string of terminals at the leaves of the subtree of T rooted at the upper A. Note that x is a proper substring of t: the upper A has two children, each of which derives a nonempty string, and the lower A lies below only one of them. Thus t = vxy for strings v and y which are not both empty. Since t is a substring of w, we can write w = utz for some strings u and z. Thus, w = uvxyz.
By the definition of a parse tree, A derives x. But also note, by inspecting the parse tree, that A derives vAy, and that S derives uAz.
We now prove that S derives uv^i A y^i z, by induction on i. For i = 0, we have that S derives uAz. Suppose that S derives uv^i A y^i z. Then replace A by vAy, and we see that S derives uv^(i+1) A y^(i+1) z, completing the inductive step.
Replacing A by x, we see that S derives uv^i x y^i z for all i, and we are done.
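The induction and the final substitution can be pictured as a single derivation chain. It uses only the three facts read off the parse tree (S derives uAz, A derives vAy, A derives x); the arrow with a star denotes derivation in zero or more steps.

```latex
\[
  S \;\Rightarrow^{*}\; uAz
    \;\Rightarrow^{*}\; uvAyz
    \;\Rightarrow^{*}\; uv^{2}Ay^{2}z
    \;\Rightarrow^{*}\; \cdots
    \;\Rightarrow^{*}\; uv^{i}Ay^{i}z
    \;\Rightarrow^{*}\; uv^{i}xy^{i}z .
\]
```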
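On the other hand, suppose that L contains the empty string. Let L' be the same language with just the empty string removed, which is then also a context-free language. By the argument above, L' has the pumping property for some N, and that N is at least 1. Every string in L of length at least N is nonempty and hence belongs to L', and every string in L' belongs to L, so L must have the pumping property also.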
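The construction in the proof can be tried out on a small example. The following is a minimal sketch, not part of the original notes: it assumes an illustrative CNF grammar for the language of strings a^k b^k with k at least 1 (S -> AT | AB, T -> SB, A -> a, B -> b), builds a parse tree by hand, follows a longest root-to-leaf path, and reads off u, v, x, y, z from a repeated variable. All function names (parse_tree_anbn, decompose, and so on) are hypothetical, and the sketch picks the lowest repeated pair on the path, whereas the proof allows any repeated pair.

```python
# Parse-tree nodes are tuples:
#   (variable, terminal)      for a production  X -> a
#   (variable, left, right)   for a production  X -> Y Z
# Assumed example grammar (CNF):  S -> A T | A B,  T -> S B,  A -> a,  B -> b


def parse_tree_anbn(k):
    """Build a parse tree for a^k b^k (k >= 1) under the assumed grammar."""
    if k == 1:
        return ("S", ("A", "a"), ("B", "b"))
    return ("S", ("A", "a"), ("T", parse_tree_anbn(k - 1), ("B", "b")))


def yield_of(t):
    """String of terminals at the leaves of the subtree t."""
    return t[1] if len(t) == 2 else yield_of(t[1]) + yield_of(t[2])


def height(t):
    """Height of the variable tree, counted in edges."""
    return 0 if len(t) == 2 else 1 + max(height(t[1]), height(t[2]))


def longest_path(t, start=0):
    """List of (variable, start, end) along a longest root-to-leaf path,
    where w[start:end] is the yield of the subtree rooted at that node."""
    end = start + len(yield_of(t))
    if len(t) == 2:
        return [(t[0], start, end)]
    left, right = t[1], t[2]
    if height(left) >= height(right):
        rest = longest_path(left, start)
    else:
        rest = longest_path(right, start + len(yield_of(left)))
    return [(t[0], start, end)] + rest


def decompose(t):
    """Return (u, v, x, y, z) from a repeated variable on a longest path,
    or None if no variable repeats on that path."""
    w = yield_of(t)
    lowest = {}  # variable -> (start, end) of its lowest occurrence so far
    for var, s, e in reversed(longest_path(t)):  # walk bottom-up
        if var in lowest:
            s2, e2 = lowest[var]  # the "lower A"; (s, e) is the "upper A"
            return w[:s], w[s:s2], w[s2:e2], w[e2:e], w[e:]
        lowest[var] = (s, e)
    return None


def in_anbn(s):
    """True iff s = a^k b^k for some k >= 1."""
    k = len(s) // 2
    return k >= 1 and s == "a" * k + "b" * k


u, v, x, y, z = decompose(parse_tree_anbn(3))
print((u, v, x, y, z))
print(all(in_anbn(u + v * i + x + y * i + z) for i in range(5)))
```

For the tree of aaabbb, the repeated variable found is S, and the decomposition is u = a, v = a, x = ab, y = b, z = b, so every pumped string has the form a^(i+2) b^(i+2).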
Using the Pumping Lemma. Let L be the language consisting of all strings of the form a^n b^n c^n, where n is a non-negative integer.
L is not a context-free language.
Proof.
Suppose that L is context-free. Let N be the number given by the Pumping Lemma. Let w = a^N b^N c^N. Then there must exist strings u, v, x, y, z such that w = uvxyz, v and y are not both empty, and uv^i x y^i z is in L for any non-negative integer i. We will obtain a contradiction.
Consider the total number of each symbol in the strings v and y. If those numbers are not all equal, then uxz (the case i = 0) does not have equal numbers of each symbol, contradiction. If those numbers are equal, they are nonzero because v and y are not both empty, so there must be at least one of each of the three symbols in v and y combined. Since v precedes y in w, either v contains the substring ab or y contains the substring bc. Let s = uv^2 x y^2 z. If v contains the substring ab, then s contains the substring ba, contradiction. If y contains the substring bc, then s contains the substring cb, contradiction.
We conclude that L cannot be context-free.
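As a sanity check on the argument (not a substitute for it), one can enumerate every way of writing a^n b^n c^n as uvxyz for a few small n and confirm that no split survives pumping. The sketch below is an illustration only; the function names in_language and pumpable_splits are hypothetical, and it tests only the exponents i = 0 through 3, which is enough because the proof uses just i = 0 and i = 2.

```python
def in_language(s):
    """True iff s has the form a^n b^n c^n for some n >= 0."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def pumpable_splits(w, max_i=3):
    """Yield every split w = u v x y z, with v and y not both empty,
    such that u v^i x y^i z is in the language for all i in 0..max_i."""
    m = len(w)
    for a in range(m + 1):
        for b in range(a, m + 1):
            for c in range(b, m + 1):
                for d in range(c, m + 1):
                    u, v, x, y, z = w[:a], w[a:b], w[b:c], w[c:d], w[d:]
                    if not v and not y:
                        continue  # the lemma requires v, y not both empty
                    if all(in_language(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        yield (u, v, x, y, z)


for n in range(1, 5):
    w = "a" * n + "b" * n + "c" * n
    # An empty list means no split of w can be pumped within the language.
    print(n, list(pumpable_splits(w)))
```

Each list printed is empty, in agreement with the proof: whichever split is chosen, either removing v and y unbalances the symbol counts or doubling them puts symbols out of order.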
