Abstract

This article presents a new automatic classification system, the Parallel Automated Coding Expert (PACE), that takes advantage of a massively parallel supercomputer. The system is based on an empirical learning model called Memory-based Reasoning (MBR) [28]. Following the MBR model, the PACE system uses a training database of 132,000 previously classified returns to classify new census returns not contained in the database. This contrasts with the Automated Industry and Occupation Coding System (AIOCS), an automated system developed by the Census Bureau for the 1990 Census, which is essentially an expert system driven by knowledge extracted from human experts and tested on the same preclassified database. The thesis of this article is that the MBR paradigm provides a more accurate, more robust, and simpler solution, and that this simplicity is directly reflected in a much reduced software development effort. As a case in point, building PACE required four person-months, whereas the expert system required 192 person-months. Moreover, PACE exhibits higher performance: it can process approximately 60% of the returns accurately, while AIOCS can process approximately 47%. MBR is well matched to data-parallel computer hardware and programming models, such as those of the Connection Machine, so the final system operates rapidly, although its computational requirements are substantially higher than those of the expert system.