From documents sent to us by the EIA, dated 1999, we have 6 printed pages of definitions with inconsistent formatting. To extract parts of speech and semantic meaning, we need to parse the raw text using only line breaks to gauge definition boundaries (all other formatting is unreliable).
8/20/99 MOTOR GASOLINE AND RELATED TERMS
Aviation gasoline (Finished): A complex mixture of relatively volatile hydrocarbons with or without small quantities of additives, blended to form a fuel suitable for use in aviation reciprocating engines. Fuel specifications are provided in ASTM Specification D 910 and Military Specification MIL-G-5572. Note: Data on blending components are not counted in data on finished aviation gasoline.
Conventional Gasoline: Finished motor gasoline not included in the oxygenated or reformulated gasoline categories. Note: This category excludes reformulated gasoline blendstock for oxygenate blending (RBOB) as well as other blendstock.
Gasohol: A blend of finished motor gasoline containing alcohol (generally ethanol but sometimes methanol) at a concentration of 10 percent or less by volume. Data on gasohol that has at least 2.7 percent oxygen, by weight, and is intended for sale inside carbon monoxide nonattainment areas are included in data on oxygenated gasoline. See Oxygenates.
Gasoline: See Motor Gasoline (Finished).
Definition Text:
Aviation gasoline (Finished): A complex mixture of relatively volatile hydrocarbons with or without small quantities of additives, blended to form a fuel suitable for use in aviation reciprocating engines. Fuel specifications are provided in ASTM Specification D 910 and Military Specification MIL-G-5572. Note: Data on blending components are not counted in data on finished aviation gasoline.
Sample tagging hierarchy:
type word: (letter*)
type phrase: (word{1,5})
    alternately:
    (subject, verb, object)
type text: (phrase*)
    text division:
        Note:
        See:
        See Also:
        Vertical Whitespace
type definition:
    head term (phrase)
    parenthetical modifier (phrase)
    definition (text)
    head noun phrase (phrase)
    properties (text)
        for use in (phrase)
        used in (phrase)
        used by (phrase)
        used for (phrase)
        characterized as (phrase)
        intended for (phrase)
    includes/excludes (text)
        includes (phrase)
        contains (phrase)
        excludes (phrase)
        other than (phrase)
    note (text)
    x-reference (phrase)
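As a rough illustration, the definition type above could be held in a record like the following. This is only a sketch: the class names (Properties, Inclusions, Definition) and the snake_case field names are assumptions made for the example, and the grouping of the "properties" and "includes/excludes" sub-fields is one reading of the hierarchy, not a fixed design.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Properties:
    # Sub-fields of "properties (text)"; each holds a phrase or stays empty.
    for_use_in: Optional[str] = None
    used_in: Optional[str] = None
    used_by: Optional[str] = None
    used_for: Optional[str] = None
    characterized_as: Optional[str] = None
    intended_for: Optional[str] = None


@dataclass
class Inclusions:
    # Sub-fields of "includes/excludes (text)".
    includes: Optional[str] = None
    contains: Optional[str] = None
    excludes: Optional[str] = None
    other_than: Optional[str] = None


@dataclass
class Definition:
    head_term: str
    parenthetical_modifier: Optional[str] = None
    definition: str = ""
    head_noun_phrase: Optional[str] = None
    properties: Properties = field(default_factory=Properties)
    inclusions: Inclusions = field(default_factory=Inclusions)
    note: Optional[str] = None
    x_reference: Optional[str] = None


# Fields that do not apply to a given entry simply stay empty.
entry = Definition(head_term="aviation gasoline",
                   parenthetical_modifier="finished")
entry.properties.for_use_in = "aviation reciprocating engines"
```

Keeping every slot present but optional mirrors the way the tagged sample lists all fields even when most of them are blank.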
Acronyms are expanded in place via a different structure on the first pass:
    uppercase_lookup (word)
    expanded meaning (phrase)
    words-involved (word-list)
For example, upon encountering MTBE, the analyzer looks it up in this dataset
under uppercase_lookup. The expanded meaning field will hold Methyl Tertiary Butyl
Ether under one of two conditions: either the expansion was already found in the
document on the first pass in the following format:
Methyl Tertiary Butyl Ether (MTBE)
or the word and its meaning were already entered into the database from an acronym
glossary prepared for this task. In this manner, we can track terms such as CO
(Carbon Monoxide) whose letters do not map directly onto the initials of the
expansion.
The words-involved field lists words that appear inside the expanded phrase but
contribute no letter to the acronym, so the lexer can throw them away while
searching for acronyms. For example, RBOB is referenced as reformulated gasoline
blendstock for oxygenate blending, and so "gasoline" and "for" would be placed in
the words-involved list.
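The two lookup conditions and the words-involved list can be sketched as follows. The backward letter-matching strategy, the max_skips limit, and the names AcronymEntry, expand_from_text and lookup are all assumptions made for illustration; the notes above do not specify how the first pass actually matches an expansion to its acronym.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AcronymEntry:
    uppercase_lookup: str
    expanded_meaning: Optional[str] = None
    words_involved: List[str] = field(default_factory=list)


def expand_from_text(text: str, acronym: str,
                     max_skips: int = 2) -> Optional[AcronymEntry]:
    """Look for 'Expanded Phrase (ACRONYM)' and walk backwards over the
    preceding words, matching initials to acronym letters right to left."""
    m = re.search(r"\(" + re.escape(acronym) + r"\)", text)
    if m is None:
        return None
    words = re.findall(r"[A-Za-z][A-Za-z-]*", text[:m.start()])
    letters = list(acronym.lower())
    used: List[str] = []      # words of the expansion, in reverse order
    skipped: List[str] = []   # words that contribute no letter
    skips = 0
    i = len(words) - 1
    while letters and i >= 0:
        word = words[i]
        if word[0].lower() == letters[-1]:
            letters.pop()
            skips = 0
        else:
            skips += 1
            if skips > max_skips:   # too many filler words in a row
                return None
            skipped.append(word)
        used.append(word)
        i -= 1
    if letters:                     # ran out of words before letters
        return None
    return AcronymEntry(acronym, " ".join(reversed(used)),
                        list(reversed(skipped)))


def lookup(acronym: str, text: str, glossary: Dict[str, str]) -> AcronymEntry:
    """Condition 1: expansion found in the document; condition 2: glossary."""
    entry = expand_from_text(text, acronym)
    if entry is not None:
        return entry
    return AcronymEntry(acronym, glossary.get(acronym))


note = ("This category excludes reformulated gasoline blendstock "
        "for oxygenate blending (RBOB) as well as other blendstock.")
rbob = lookup("RBOB", note, {})
co = lookup("CO", note, {"CO": "Carbon Monoxide"})
```

Here rbob.words_involved comes out as the two filler words named above, while CO falls through to the glossary because its letters do not line up with word initials.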
Tagged:
    head term (phrase)                aviation gasoline
    parenthetical modifier (phrase)   finished
    definition (text)                 A complex mixture... Fuel specifications are provided in
                                      Whatever ASTM Stands For (ASTM) Specification D 910...
                                      (the entire text up to Note:)
    head noun phrase (phrase)         complex mixture of relatively volatile hydrocarbons
    properties (text)
        for use in (phrase)           aviation reciprocating engines
        used in (phrase)
        used by (phrase)
        used for (phrase)
        characterized as (phrase)
        intended for (phrase)
    includes/excludes (text)
        includes (phrase)
        contains (phrase)
        excludes (phrase)
        other than (phrase)
    note (text)                       Data on blending components are not counted in data on finished
                                      aviation gasoline.
    x-reference (phrase)
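A first cut at producing a few of the tagged fields from one definition line might look like the sketch below. It relies only on the "term (modifier): text" shape and the Note: and See markers; the regular expressions and the function name tag_definition are assumptions for illustration, and acronym expansion and head noun phrase extraction are left out.

```python
import re

# "Head term (Modifier): body" where the parenthetical is optional.
ENTRY = re.compile(
    r"^(?P<head>[^:(]+?)\s*(?:\((?P<mod>[^)]*)\))?\s*:\s*(?P<body>.*)$",
    re.S,
)


def tag_definition(line):
    """Return a dict of tagged fields, or None if the line is not an entry."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    body = m.group("body")
    # Everything up to "Note:" is the definition; the remainder is the note.
    definition, _, note = body.partition("Note:")
    use = re.search(r"\bfor use in ([^.,;]+)", definition)
    xref = re.search(r"\bSee (?:Also )?([^.]+)\.", body)
    mod = m.group("mod")
    return {
        "head term": m.group("head").strip().lower(),
        "parenthetical modifier": mod.lower() if mod else None,
        "definition": definition.strip(),
        "for use in": use.group(1).strip() if use else None,
        "note": note.strip() or None,
        "x-reference": xref.group(1).strip() if xref else None,
    }


SAMPLE = (
    "Aviation gasoline (Finished): A complex mixture of relatively volatile "
    "hydrocarbons with or without small quantities of additives, blended to "
    "form a fuel suitable for use in aviation reciprocating engines. Fuel "
    "specifications are provided in ASTM Specification D 910 and Military "
    "Specification MIL-G-5572. Note: Data on blending components are not "
    "counted in data on finished aviation gasoline."
)

tagged = tag_definition(SAMPLE)
see_only = tag_definition("Gasoline: See Motor Gasoline (Finished).")
```

Lines without a colon, such as the date and section heading at the top of the raw text, return None and can be handled separately.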