ABNF Parser and Verifier

Motivation

The Augmented BNF is used to specify the syntax of a number of very common and emerging text-based Internet protocols:

A typical example for a BNF (here, for email) is

message-id      =       "Message-ID:" identifier CRLF   ; a comment
in-reply-to     =       "In-Reply-To:" identifier CRLF
references      =       "References:" identifier *(*CFWS identifier) CRLF
identifier      =       "<" addr-spec ">"

ABNF extends context-free grammars (BNFs) by allowing repetitions of an element or a group of elements within a rule, possibly restricted in number. It also expresses optional elements more concisely by allowing bracketed elements, e.g., [foo] is equivalent to *1(foo). Besides, the syntax differs somewhat from, e.g., yacc.

Unfortunately, there are currently no tools that parse a BNF to check whether a definition is syntactically accurate or complete. Also, each message parser is implemented by hand.

Specification

The project consists of three (separable) parts:
  1. Parse ABNF descriptions with grammars as in ABNF so that specifications can be checked for syntactic correctness and completeness. Allow BNF elements to be out of order (use before definition). Provide for file inclusion to simplify use of standard definitions. Possibly accomodate slight variations found (e.g.) in RFC 822 (email) and RFC 2068 (HTTP). Allow both / and | for alternatives. Support the notation "#1 identifier" to indicate a comma-separated list of identifier. Input is a the ABNF description found in the protocol specification, output are error indications.
  2. Check text strings (e.g., SMTP headers) to see if they conform to a given grammar and possibly show their production. You only have to process the protocol headers, not the whole message. Your program should read the headers from stdin, but there should be a function internally that processes the headers as a single string.
  3. Build a parser generator that can create run-time parser that parse messages based on a given ABNF specification.

Note: this project can be split into several (e.g., parsing/checking and translation).

Investigate whether lex and/or YACC can be made to fit some of these tasks, e.g., by translating an ABNF specification into something understandable by these tools.

Environment

I prefer that the project be done in Java (command-line mode), but this is not a requirement.

Prerequisites

Knowledge of compilers and parsing. Knowledge of protocols not required.
Last modified: 1997-09-21 by Henning Schulzrinne