# Pumping lemma for context-free languages

In computer science, in particular in formal language theory, the **pumping lemma for context-free languages**, also known as the **Bar-Hillel** **lemma**,^{[1]} is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages.

The pumping lemma can be used to construct a proof by contradiction that a specific language is *not* context-free. Conversely, the pumping lemma does not suffice to guarantee that a language *is* context-free; there are other necessary conditions, such as Ogden's lemma, or the Interchange lemma.

## Formal statement

If a language $L$ is context-free, then there exists some integer $p\geq 1$ (called a "pumping length"^{[2]}) such that every string $s$ in $L$ that has a length of $p$ or more symbols (i.e. with $|s|\geq p$) can be written as

- $s=uvwxy$

with substrings $u,v,w,x$ and $y$, such that

- 1. $|vx|\geq 1$,
- 2. $|vwx|\leq p$, and
- 3. $uv^{n}wx^{n}y\in L$ for all $n\geq 0$.

Below is a formal expression of the Pumping Lemma.

${\begin{array}{l}(\forall L\subseteq \Sigma ^{*})\\\quad ({\mbox{context free}}(L)\Rightarrow \\\quad ((\exists p\geq 1)((\forall s\in L)((|s|\geq p)\Rightarrow \\\quad ((\exists u,v,w,x,y\in \Sigma ^{*})(s=uvwxy\land |vx|\geq 1\land |vwx|\leq p\land (\forall n\geq 0)(uv^{n}wx^{n}y\in L)))))))\end{array}}$

## Informal statement and explanation

The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

The property is a property of all strings in the language that are of length at least $p$, where $p$ is a constant—called the *pumping length*—that varies between context-free languages.

Say $s$ is a string of length at least $p$ that is in the language.

The pumping lemma states that $s$ can be split into five substrings, $s=uvwxy$, where $vx$ is non-empty and the length of $vwx$ is at most $p$, such that repeating $v$ and $x$ the same number of times ($n$) in $s$ produces a string that is still in the language. It is often useful to repeat zero times, which removes $v$ and $x$ from the string. This process of "pumping up" additional copies of $v$ and $x$ is what gives the pumping lemma its name.

Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having $p$ equal to the maximum string length in $L$ plus one. As there are no strings of this length the pumping lemma is not violated.

## Usage of the lemma

The pumping lemma is often used to prove that a given language L is non-context-free, by showing that arbitrarily long strings s are in L that cannot be "pumped" without producing strings outside L.

For example, the language $L=\{a^{n}b^{n}c^{n}|n>0\}$ can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that L is context free. By the pumping lemma, there exists an integer p which is the pumping length of language L. Consider the string $s=a^{p}b^{p}c^{p}$ in L. The pumping lemma tells us that s can be written in the form $s=uvwxy$, where u, v, w, x, and y are substrings, such that $|vx|\geq 1$, $|vwx|\leq p$, and $uv^{i}wx^{i}y\in L$ for every integer $i\geq 0$. By the choice of s and the fact that $|vwx|\leq p$, it is easily seen that the substring vwx can contain no more than two distinct symbols. That is, we have one of five possibilities for vwx:

- $vwx=a^{j}$ for some $j\leq p$.
- $vwx=a^{j}b^{k}$ for some j and k with $j+k\leq p$
- $vwx=b^{j}$ for some $j\leq p$.
- $vwx=b^{j}c^{k}$ for some j and k with $j+k\leq p$.
- $vwx=c^{j}$ for some $j\leq p$.

For each case, it is easily verified that $uv^{i}wx^{i}y$ does not contain equal numbers of each letter for any $i\neq 1$. Thus, $uv^{2}wx^{2}y$ does not have the form $a^{i}b^{i}c^{i}$. This contradicts the definition of L. Therefore, our initial assumption that L is context free must be false.

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free.

On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example

- $L=\{b^{j}c^{k}d^{l}|j,k,l\in \mathbb {N} \}\cup \{a^{i}b^{j}c^{j}d^{j}|i,j\in \mathbb {N} ,i\geq 1\}$

for *s*=*b*^{j}*c*^{k}*d*^{l} with e.g. *j*≥1 choose vwx to consist only of *b*’s, for *s*=*a*^{i}*b*^{j}*c*^{j}*d*^{j} choose vwx to consist only of *a*’s; in both cases all pumped strings are still in *L*.^{[3]}

A precursor of the pumping lemma was used in 1960 by Scheinberg to prove that $L=\{a^{n}b^{n}a^{n}|n>0\}$ is not context-free.^{[4]}