## Computer Science 456/656: Automata and Formal Languages

## Spring 2000

### The Pumping Lemma for Context-Free Languages

### March 8, 2000

**Pumping Lemma.**
Let *L* be any context-free language.
Then, there exists an integer *N* such that, for any string
*w* in *L* of length at least *N*, there exist strings *u*, *v*,
*x*, *y*, *z*,
such that *w = uvxyz*, *v* and *y* are not both empty, and
*uv*^{i}*xy*^{i}*z* is in *L* for any non-negative integer *i*.
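
To make the statement concrete, here is a minimal Python sketch of pumping. The function names `pump` and `in_anbn`, the example language of strings *a*^{n}*b*^{n}, and the particular decomposition are illustrative assumptions, not part of the lemma.

```python
def pump(u: str, v: str, x: str, y: str, z: str, i: int) -> str:
    """Return the pumped string u v^i x y^i z."""
    return u + v * i + x + y * i + z


def in_anbn(s: str) -> bool:
    """Membership test for the context-free language { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


# One decomposition of w = aaabbb that pumps: v = "a" and y = "b".
u, v, x, y, z = "aa", "a", "", "b", "bb"
assert pump(u, v, x, y, z, 1) == "aaabbb"      # i = 1 gives back w itself
pumped = [pump(u, v, x, y, z, i) for i in range(4)]
assert all(in_anbn(s) for s in pumped)         # every pumped string stays in the language
```

Checking a few values of *i* only illustrates the property; the lemma asserts it for every non-negative *i*.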

**Proof of the Pumping Lemma.**

Suppose that *L* is context-free, and suppose first that the empty string
is not a member of *L*. Then there must be a Chomsky Normal Form
grammar *G*
that generates *L*. Let us suppose that *G* has *n*
variables.
Let *N = 2*^{n}.

Let *w* be any string in *L* of length at least *N*.
Let *T* be the
parse tree of *w* under the grammar *G*.
The variables of *T* form
a binary tree with as many leaves as the length of *w*,
by the property of a CNF grammar that the right-hand-side
of a production consists of two variables or one terminal.
Let *h* be the height of *T*. Since *T* has at
least *N* leaves, but
not more than *2*^{h} leaves, we know that *h*
is greater than
or equal to *n*.
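
Written as a single chain, with |*w*| denoting the length of *w* (a notation not used elsewhere in this handout), the counting step reads:

```latex
\[
  2^{n} \;=\; N \;\le\; |w| \;=\; \#\mathrm{leaves}(T) \;\le\; 2^{h}
  \quad\Longrightarrow\quad h \ge n .
\]
```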

Thus, there must be a path of nodes in *T*, starting at the
root (which is
*S*) and going down to a leaf, which contains *h+1*
variables. Since
*G* has only *n* variables and *h+1* is greater than *n*, by the pigeon-hole principle,
there must be at least two copies of the same variable on that path.
Let us say that that variable is *A*.
We refer to the two nodes on the path that
are both labeled *A* as the "upper *A*" and the "lower *A*."
Let *x* be
the substring of *w* which is the string of terminals at the leaves
of the subtree of *T* rooted at the lower *A*.
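Let *t* be the
substring of *w* which is the string of terminals at the leaves
of the subtree of *T* rooted at the upper *A*.
Note that *x* is a proper
substring of *t*: the upper *A* has two children, the lower *A* lies
in the subtree of only one of them, and the other child derives a
non-empty string, since a CNF grammar has no empty productions.
Thus *t = vxy* for strings *v* and *y* which are not
both empty.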
Since *t* is a substring of *w*, we can write *w = utz*
for some strings *u* and *z*. Thus, *w = uvxyz*.

By the definition of a parse tree, *A* derives *x*.
But also note,
by inspecting the parse tree, that
*A* derives *vAy*, and that *S* derives *uAz*.
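
The construction can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the tuple encoding of parse trees, the function names, and the example CNF grammar *S → AT | AB*, *T → SB*, *A → a*, *B → b*.

```python
# A parse tree node is either (variable, terminal)      for a production A -> a
#                      or     (variable, left, right)   for a production A -> BC.

def yield_of(node) -> str:
    """Concatenate the terminals at the leaves below this node."""
    if len(node) == 2:
        return node[1]
    return yield_of(node[1]) + yield_of(node[2])


def height(node) -> int:
    """Number of edges on a longest downward path of variables."""
    if len(node) == 2:
        return 0
    return 1 + max(height(node[1]), height(node[2]))


def decompose(tree):
    """Split the yield w of `tree` as (u, v, x, y, z) using a variable that
    repeats on a longest root-to-leaf path; return None if none repeats."""
    w = yield_of(tree)
    path = []                       # pairs (node, offset of its yield inside w)
    node, offset = tree, 0
    while True:
        path.append((node, offset))
        if len(node) == 2:          # reached a variable that produces a terminal
            break
        left, right = node[1], node[2]
        if height(left) >= height(right):
            node = left
        else:
            offset += len(yield_of(left))
            node = right
    seen = {}                       # variable -> index of its first occurrence
    for index, (node, offset) in enumerate(path):
        var = node[0]
        if var in seen:
            upper, upper_off = path[seen[var]]
            t_end = upper_off + len(yield_of(upper))   # end of t = vxy
            x_end = offset + len(yield_of(node))       # end of x
            return (w[:upper_off], w[upper_off:offset],
                    w[offset:x_end], w[x_end:t_end], w[t_end:])
        seen[var] = index
    return None


# Parse tree of aabb under the assumed grammar.
tree = ("S", ("A", "a"), ("T", ("S", ("A", "a"), ("B", "b")), ("B", "b")))
assert yield_of(tree) == "aabb"
assert decompose(tree) == ("", "a", "ab", "b", "")
```

In this example the grammar has 4 variables, so the proof's bound would be *N = 16*, and *aabb* is shorter than that. A repeated variable happens to occur anyway; the proof guarantees one only for strings of length at least *N*.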

We now prove that
*S* derives *uv*^{i}*Ay*^{i}*z*,
by induction on *i*.
For *i = 0*, we have that *S* derives *uAz*.
Suppose that
*S* derives *uv*^{i}*Ay*^{i}*z*,
then replace *A* by *vAy*,
and we see that
*S* derives *uv*^{i+1}*Ay*^{i+1}*z*,
completing the
inductive step.

Replacing *A* by *x*, we see that
*S* derives *uv*^{i}*xy*^{i}*z*
for all *i*, and we are done.
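
The induction can be mirrored step by step in a short Python sketch. The function name `derive` and the use of `None` as a placeholder for the variable *A* are assumptions made for illustration.

```python
HOLE = None  # stands for the single occurrence of the variable A


def derive(u: str, v: str, x: str, y: str, z: str, i: int) -> str:
    """Start from uAz, replace A by vAy exactly i times, then replace A by x."""
    form = [u, HOLE, z]                  # base case: S derives uAz
    for _ in range(i):
        k = form.index(HOLE)
        form[k:k + 1] = [v, HOLE, y]     # inductive step: A derives vAy
    k = form.index(HOLE)
    form[k:k + 1] = [x]                  # final step: A derives x
    return "".join(form)


# The result agrees with the closed form u v^i x y^i z.
for i in range(5):
    assert derive("", "a", "ab", "b", "", i) == "a" * i + "ab" + "b" * i
```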

On the other hand, suppose that *L* contains the empty string.
Let *L'* be the same language with just the empty string removed,
which is then also a context-free language. Since *L'* has the
pumping property, and every string of *L* of length at least *N* is
non-empty and so belongs to *L'*, *L* must also.

**Using the Pumping Lemma.**
Let *L* be the language consisting of all strings of the form
*a*^{n}*b*^{n}*c*^{n} for
non-negative integers *n*.

**L is not a context-free language.**

**Proof.**

Suppose that *L* is context-free. Let *N* be the
number given
by the Pumping Lemma.
Let *w = a*^{N}*b*^{N}*c*^{N}.
Then there must exist
strings *u*, *v*, *x*, *y*, *z* such that
*w = uvxyz*, *v* and *y* are not both empty, and
*uv*^{i}*xy*^{i}*z* is in *L* for any
non-negative integer *i*.
We will obtain a contradiction.

Consider the total number of each symbol in the strings *v*
and *y*.
If those numbers are not all equal, then *uxz* (the case *i = 0*) does not have equal
numbers of each symbol, contradiction. If those numbers are equal,
then there must be at least one of each of the three symbols in *v*
and *y* combined. Since *v* lies to the left of *y* in *w*,
either *v* contains the
substring *ab* or
*y* contains the substring *bc*.
Let *s = uv*^{2}*xy*^{2}*z*.
If *v* contains the substring *ab*, then *s*
contains *vv*, in which a *b* occurs before an *a*, so *s* is not in *L*,
contradiction.
If *y* contains the substring *bc*, then *s*
contains *yy*, in which a *c* occurs before a *b*, so *s* is not in *L*,
contradiction.

We conclude that *L* cannot be context-free.
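
As a sanity check on the case analysis, the following Python sketch tries every decomposition of *a*^{n}*b*^{n}*c*^{n} for a few small values of *n* and confirms that none survives pumping with *i = 0* and *i = 2*. The function names and the range of *n* are assumptions made for illustration. The number *N* from the lemma is not known, so this check illustrates the argument and does not replace the proof.

```python
from itertools import combinations_with_replacement


def in_L(s: str) -> bool:
    """Membership test for L = { a^n b^n c^n : n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def survives(u, v, x, y, z, exponents=(0, 2)) -> bool:
    """True if u v^i x y^i z is in L for every tested exponent i."""
    return all(in_L(u + v * i + x + y * i + z) for i in exponents)


def surviving_decompositions(w: str):
    """All splits w = uvxyz, with v and y not both empty, that survive pumping."""
    found = []
    cuts = range(len(w) + 1)
    # Non-decreasing cut positions p <= q <= r <= s enumerate every split.
    for p, q, r, s in combinations_with_replacement(cuts, 4):
        u, v, x, y, z = w[:p], w[p:q], w[q:r], w[r:s], w[s:]
        if v + y and survives(u, v, x, y, z):
            found.append((u, v, x, y, z))
    return found


for n in range(1, 5):
    w = "a" * n + "b" * n + "c" * n
    assert surviving_decompositions(w) == []   # no split pumps within L
```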