2. Related Work

2.1. Introduction

In this chapter we describe work in knowledge-based computer graphics and work in computer graphics that can be applied to knowledge-based graphics. Each of the systems described automatically computes some or all of the values that specify a computer-generated graphic. These systems differ greatly not only in the types of graphics produced, but also in the types of knowledge used. After first describing each system individually, we proceed to compare the different types of decision making, from the decision of what objects to depict to the criteria used for different stylistic decisions.

Summarized, the areas addressed in these systems are:

• Knowledge base queries [Zdydel, Greenfeld and Yonke 81]

• Spatial data queries: generating displays of database objects [Friedell 83]

• User-interfaces: the coloring of interface objects [Meier 88]

• Pictorial representations of sentences [Simmons 75]

• Pictorial representations of data and procedures [Strothotte 89]

• 2D animated stories [Kahn 79]

• 2D animated help facility [Neiman 82]

• 2D charts and graphs [Ghahamgari 81] [Mackinlay 86]

• 2D diagrams of attributed graphs and networks [Marks 90][Kamada and Kawai 91]

• 3D scene synthesis: generating different types of generic environments [Friedell 83]

• 3D pictorial explanations of actions and procedures [Feiner 85][Rist and André 92]

• 3D illustrations of geometric models [Kamada and Kawai 87][Dooley and Cohen 90a, Dooley and Cohen 90b][Saito and Takahashi 90]

• 3D rendering [Emhardt and Strothotte 92]

• 3D animations [Karp and Feiner 93]

• 3D intent-based illustrations [Seligmann and Feiner 89, Seligmann and Feiner 91, Seligmann and Feiner 93]

2.2. Description of the Systems

2.2.1. CLOWNS

Simmons’ CLOWNS Microworld [Simmons 75] is an early attempt in language processing. Given a sentence as input, the CLOWNS system generates simple 2D line drawings that show how the system interprets that sentence. One of the motivating factors in developing CLOWNS was to provide a system which pictorially demonstrated natural language computation. The CLOWNS domain is a microworld consisting of a clown, pedestal, pole, boat, dock, lighthouse, water and land. CLOWNS determines the position, orientation, and scale of the graphical objects that figure in the generated pictures.

Typical of early research in artificial intelligence, CLOWNS utilizes a procedural approach to knowledge representation: different types of words (verbs vs. nouns) are programs that return different types of graphic primitives or relations. Certain nouns, such as “clown” or “house,” directly correspond to graphics objects and are represented by the graphics programs that provide the instructions used to render that object (in this case, a list of line primitives). Each object, in order to be rendered, requires a 2D location and rotation value. Nouns such as “top” or “bottom” are programs that return the 2D contact points for particular objects. Prepositions and verbs such as “to” and “balances” return the spatial relations between the objects. These spatial relations may occur within a certain timeframe, so one sentence may be interpreted with a series of pictures. For example, an object that moves can be depicted at different locations. A separate picture is generated for every instance in the timeframe. The values returned by these programs are stored with each object in a property list and are used to compute the location and rotation of each object in the scene.
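The procedural representation described above can be sketched as follows. This is a minimal Python illustration, not CLOWNS' actual code: the function names, the line coordinates, the relation on_top_of, and the layout of the property list are all assumptions made for the example.

```python
# Hypothetical sketch of "words as programs" in the spirit of CLOWNS.
# Every name and coordinate below is an illustrative assumption.

def pedestal():
    # A noun that corresponds directly to a graphics object: it returns a
    # list of 2D line primitives ((x1, y1), (x2, y2)) in local coordinates.
    return [((0, 0), (4, 0)), ((4, 0), (4, 2)), ((4, 2), (0, 2)), ((0, 2), (0, 0))]

def clown():
    return [((0, 0), (1, 3)), ((1, 3), (2, 0)), ((0, 0), (2, 0))]

def extent(lines):
    xs = [x for segment in lines for x, _ in segment]
    ys = [y for segment in lines for _, y in segment]
    return min(xs), max(xs), min(ys), max(ys)

def top(lines):
    # A noun such as "top" returns a contact point on an object.
    x_min, x_max, _, y_max = extent(lines)
    return ((x_min + x_max) / 2, y_max)

def bottom(lines):
    x_min, x_max, y_min, _ = extent(lines)
    return ((x_min + x_max) / 2, y_min)

def on_top_of(figure, ground, ground_location):
    # A spatial relation: the location at which `figure` must be drawn so
    # that its bottom contact point touches the top contact point of `ground`.
    gx, gy = ground_location
    tx, ty = top(ground)
    bx, by = bottom(figure)
    return (gx + tx - bx, gy + ty - by)

# Property lists: the values returned by the word programs are stored with
# each object and used to place it in the scene.
scene = {"pedestal": {"lines": pedestal(), "location": (10, 0), "rotation": 0}}
scene["clown"] = {
    "lines": clown(),
    "location": on_top_of(clown(), pedestal(), scene["pedestal"]["location"]),
    "rotation": 0,
}
```

A sentence that describes motion would, under the same assumptions, produce one such scene per instant in the timeframe.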

2.2.2. Ani

Kahn’s Ani system [Kahn 79] was designed to generate animations based on high-level film descriptions. Ani takes as input a representation of the film which includes descriptions of each character and of each scene. A character description refers to aspects of the character’s personality using high-level concepts such as “powerful,” “ugly,” or “good.” A scene description includes a desired time length and a general description of the characters’ actions and relations, for example, “A meets B” or “A hates B.” Ani generates a detailed description of the sequence of scenes which comprise the final animation. Each generated scene describes the movement of simple 2D geometric shapes depicting the characters. Ani uses knowledge that formalizes how concepts and relations are communicated using character motion. For example, one method for showing “A meets B” consists of moving A and B towards each other. To generate each scene, Ani makes three main decisions: what geometric shapes (or characters) to include and, for each, the path on which it moves and how it moves along this path.

Starting with a description of each scene, Ani decides where each character should be located and how each should move. Each scene is input with a time length that further constrains the solution. Ani is an actor-based system with control components which postpone and reawaken the otherwise autonomous actors. Each aspect of the animation (the character, each descriptor of each character, each plan, method, relationship and so on) is an actor which makes suggestions as to how it should be depicted. Each suggestion consists of a value (such as a location) and the strength of the suggestion (how important this suggestion is). These suggestions concern all the decisions needed to complete the animation, such as the speed of a character to reflect motivation, or the method for depicting an action, such as how the characters should move to express anger. Conflicts are resolved by applying heuristics; if no solution is found, the strengths of the suggestions are considered. Ani’s control component utilizes focus to determine which problems to solve next. Actors which cannot make suggestions at a certain point during the search for a solution are postponed. Actors that have been postponed are awakened when the focus changes appropriately.
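The suggestion mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Suggestion class, the resolve function, and the example actors are hypothetical, and the sketch omits Ani's focus mechanism and the postponing of actors.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    actor: str        # the actor that made the suggestion (hypothetical names)
    attribute: str    # the decision it concerns, e.g. "speed"
    value: object     # the suggested value
    strength: float   # how important the suggestion is

def resolve(suggestions, heuristics=()):
    """Choose one value per attribute: heuristics first, then strength."""
    by_attribute = {}
    for s in suggestions:
        by_attribute.setdefault(s.attribute, []).append(s)
    decisions = {}
    for attribute, candidates in by_attribute.items():
        chosen = None
        for heuristic in heuristics:
            # A heuristic returns a Suggestion, or None if it cannot decide.
            chosen = heuristic(candidates)
            if chosen is not None:
                break
        if chosen is None:
            # No heuristic settled the conflict: fall back on strength.
            chosen = max(candidates, key=lambda s: s.strength)
        decisions[attribute] = chosen.value
    return decisions

suggestions = [
    Suggestion("personality:powerful", "speed", "fast", 0.8),
    Suggestion("relation:hates", "speed", "slow", 0.3),
    Suggestion("plan:meets", "path", "toward-other", 0.6),
]
decisions = resolve(suggestions)
```

Here the two conflicting speed suggestions are settled by strength because no heuristic is supplied.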

Ani was used to generate an animated version of the Cinderella story. The characters in the film are depicted by simple 2D geometric shapes which are translated and rotated. For example, Cinderella is depicted by a star.

2.2.3. AIPS

AIPS [Zdydel, Greenfeld and Yonke 81] was designed as a general purpose system to display information in a knowledge base. AIPS generates displays of different types of information in a KL-ONE knowledge base. Templates are used to describe different types of displays. Templates include slots and positioning information for all parts of the display. A user defines the templates, indicating for what types of information they are appropriate. These templates are represented by both the graphical information necessary for rendering (or the routines needed to instantiate them) and the semantics and characteristics of the information for which they are appropriate. For example, a label template contains information for positioning and sizing the text and its border, and is appropriate for a title.

A template is activated if the type of information it is listed as being able to display matches the type of information that is to be shown. For example, a map template includes a rectangle for its border, a text area for its label, a table area for its legend, and map symbols for items appearing on the map. The system automatically selects an appropriate set of templates for displaying certain types of information. As the number of types of information grows, however, so will the number of formalized templates.
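Template activation by type matching can be sketched as below. The Template class, the information-type names, and the activate function are assumptions made for illustration; AIPS itself represents templates in a KL-ONE knowledge base, which this sketch does not model.

```python
from dataclasses import dataclass, field

@dataclass
class Template:
    name: str
    displays: set                                # information types it is appropriate for
    parts: list = field(default_factory=list)    # parts of the display it lays out

# Hypothetical templates modeled on the label and map examples in the text.
TEMPLATES = [
    Template("label", {"title"}, ["text", "border"]),
    Template("map", {"geographic-region"},
             ["border rectangle", "label text area", "legend table", "map symbols"]),
]

def activate(info_type, templates=TEMPLATES):
    """Return every template listed as able to display `info_type`."""
    return [t for t in templates if info_type in t.displays]

active = [t.name for t in activate("geographic-region")]
```

The sketch also makes the scaling concern concrete: each new information type needs at least one template that lists it.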

2.2.4. BHARAT

BHARAT [Ghahamgari 81] is an early attempt to generate displays of numeric data. BHARAT automatically generates 2D graphical presentations of numeric tabular data. The system is limited to expressing aggregate information or unary functions. For example, BHARAT can generate a graph showing the ethnic breakdown of a country, or the price of different fruit, but cannot generate a graph that expresses the relation between the price, size and weight of different cars. BHARAT can display information in one of three formats: a pie chart, bar chart or line chart. The choice of format is a function of the data itself, so BHARAT selects and instantiates the format automatically.

BHARAT characterizes the input data in terms of six values: continuity, totality, cardinality, multiplicity, units and range. The user is required to provide the values for continuity and totality. Using a small set of criteria, BHARAT selects what type of chart to instantiate based on these six values. The user plays an important part in the decision process by specifying what characteristic of the data should be shown.

2.2.5. GAK

Neiman’s GAK system [Neiman 82] was designed to generate animated help automatically for CADHELP, a CAD system help facility. The CAD system employs a vector display, stylus and tablet. The knowledge base consists of a representation of the CAD system and user interaction with the CAD system. Two classes of information must be represented in order to generate appropriate explanations: actions describe the functionality of the CAD system, or more generally how operations are performed by the user, while states describe the status of the system (such as the location of the cursor on the display). Feature scripts tie actions and states together using conceptual dependencies. These scripts were devised by expert users of the CAD system and describe domain-specific information. More generic user-interface actions (such as dragging, drawing, picking) are also represented. Causal links represent the chain between actions and states. To arrive at a certain state, a causal chain is traversed.

The modularity of the representation enables easy extensibility: new functionality and new specific interface actions can be added and integrated. The graphics component relies on representations of each of the objects (stylus, mouse, hand), which are stored as lists of vectors that may be positioned and oriented differently. GAK stores different instances of the same object appropriate for different actions. The animations are generated using inferencing to determine what objects to depict and what graphical transformations to perform on each object. Additionally, textual captions are displayed for each selected action.

GAK's representation is therefore general and easily extensible, but its graphical representations are less so. The objects are prestored 2D vector graphics. The animated sequence is generated from the description of the action, so the success of each explanation depends on the designer’s ability to foresee all variations of explanations and to keep the causal links coherent. All objects appear in one style.

An example of input to the animation generator entails moving the cursor to a general area in the lower right-hand corner of the drawing area. The current location of the stylus is known. In order to generate an animation, a specific 2D location needs to be generated as the destination point for the stylus. Other inferences are made to determine that the part of the stylus which must be moved is its tip. Additionally, attachments are represented. If one object is being moved, then so are the other objects which are currently attached to it. In this case the hand is attached to the stylus. Another frame contains the list of vectors needed to draw the hand. Different frames are used for the hand in varying positions. The output of the system is an animation in which the stylus and hand move from the current position to a point in the lower right-hand corner of the drawing area. At the same time, a textual explanation appears, instructing the user to move the stylus to the lower right-hand corner of the drawing area.
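The stylus example can be sketched as follows. This is a hedged illustration only: the function move_with_attachments, the linear interpolation, and the coordinates are assumptions, and the sketch ignores GAK's inference of the destination point and its choice among prestored hand frames.

```python
def move_with_attachments(positions, attachments, obj, destination, steps):
    """Return a list of frames; each frame maps an object name to (x, y).

    `obj` is moved in a straight line to `destination`, and every object
    attached to it is translated by the same offset.
    """
    x0, y0 = positions[obj]
    dx, dy = destination[0] - x0, destination[1] - y0
    moving = [obj] + attachments.get(obj, [])
    frames = []
    for i in range(1, steps + 1):
        t = i / steps                      # fraction of the motion completed
        frame = dict(positions)            # objects that do not move stay put
        for name in moving:
            px, py = positions[name]
            frame[name] = (px + t * dx, py + t * dy)
        frames.append(frame)
    return frames

# Hypothetical scene: the hand is attached to the stylus; the tablet is not.
positions = {"stylus": (0.0, 0.0), "hand": (1.0, 1.0), "tablet": (5.0, 5.0)}
attachments = {"stylus": ["hand"]}
frames = move_with_attachments(positions, attachments, "stylus", (8.0, 2.0), steps=4)
```

In the final frame the stylus is at the destination and the hand has kept its offset from the stylus.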

2.2.6. Beach and Stone

Beach and Stone’s illustration system [Beach and Stone 83] was designed to enable users to format diagrams in different illustration styles. A user inputs a diagram and a style specification. The diagram is a list of named components comprised of graphic and textual primitives. A style sheet specifies a set of procedures and values that describe how different types of diagram components should be formatted. The diagram is thus formatted much as a bibliography is formatted in Scribe in a specified journal style. Style rules specify line styles, pen styles, area color and outline colors, font types and styles, indentations, text color, shadow styles, and so on. The user can identify and refer to special components of an illustration and develop rules for these components. Different style sheets can be defined for different output media.
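The separation of diagram content from style can be sketched as below. The style-sheet names, component kinds, and attribute names are hypothetical, and the sketch covers only values, not the formatting procedures that a style sheet may also specify.

```python
# Hypothetical style sheets for two output media; all names are assumptions.
STYLE_SHEETS = {
    "journal": {
        "caption": {"font": "Times", "size": 9},
        "arrow": {"line_width": 0.5},
    },
    "slides": {
        "caption": {"font": "Helvetica", "size": 18},
        "arrow": {"line_width": 2.0},
    },
}

def format_diagram(diagram, style_name):
    """Attach the style values for each component's kind to that component."""
    sheet = STYLE_SHEETS[style_name]
    return [{**component, **sheet.get(component["kind"], {})} for component in diagram]

# The diagram is a list of named components; it carries no style information.
diagram = [
    {"name": "fig-title", "kind": "caption", "text": "Pipeline"},
    {"name": "flow", "kind": "arrow", "points": [(0, 0), (10, 0)]},
]
for_journal = format_diagram(diagram, "journal")
for_slides = format_diagram(diagram, "slides")
```

The same diagram yields two differently styled results without being edited.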

2.2.7. The View System

Friedell’s View System [Friedell 83, Friedell et al. 83, Friedell 84] was designed to generate 2D iconic displays of spatial database queries as well as realistic 3D scenes. It has been used to generate 2D iconic displays of a database of ships and to generate 3D landscapes. A description of the situation space determines the content of the generated display. The situation space is comprised of four values: object type, list of object attributes, the viewer’s identity and the viewer’s task. The View System generates each object it depicts and positions it using object operators. The system determines what information to communicate (and hence depict) with each icon based on the situation space and on sizing and positioning constraints. The generation of a graphical object is incremental. One constraint which is considered is the limit on icon size. The View System adds detail based on the attributes of the object represented. For example, labels cannot exceed a certain size, so textual descriptions are selected to meet this constraint. The View System relies on prestored iconic objects and routines for their generation. Special operators are used to transform or combine objects. For example, a flag may be placed on a battleship to convey nationality, while the shape of a battleship may be transformed to convey ship type.

There are two main stages in display generation: object synthesis and environment layout. Object synthesis is the generation of the graphical objects that populate the display; environment layout determines where the graphical objects are positioned. The View System operates in two modes: one in which object synthesis takes precedence over environment layout, and a second in which environment layout takes precedence over object synthesis. One main constraint which determines what information is depicted (or left out) is that of feasibility, that is, constraints having to do with object size. In the first mode, there is less opportunity for the amount of information conveyed to be compromised. The second mode is appropriate when the spatial relationship between the objects is a priority.

Synthesis operators comprise the graphics knowledge base (dictionary of object elements, structural relations between objects, guides for graphics design, image computation). Each is represented as a predicate and action (simile inferences, structural couplers and object grids). Simile inferences are semantically based (prototypes and modifiers). Modifiers are transformative or structural. Structural modifiers are intrinsic or extrinsic. Intrinsic structural modifiers affect the object's geometry (converting a generic ship into a particular kind), while extrinsic structural modifiers concern ancillary alterations such as adding a flag.

The structural coupler specifies how structural modifier similes are applied to prototype similes using specific information. When there is no specific information available, object grids serve the same purpose. Object grids are a set of guidelines that determine how structural modifiers are combined with the prototype to generate a coherent composite object.

2.2.8. APEX

Feiner’s APEX system [Feiner 85, Feiner 87] was designed to generate explanation graphics that show how particular objects are manipulated. APEX generates pictures that depict actions performed on objects in a 3D world. APEX has been demonstrated in a sonar cabinet world as well as a bookshelf domain.

An APEX picture frame consists of a list of objects, a viewing specification, and lighting. APEX generates one picture for each action it depicts. APEX stores the objects in the worlds it depicts in assembly hierarchies in which only leaf nodes store actual geometries. Each object has a slot which is used to represent whether or not the user is as yet (during the course of a session) familiar with that object.

Given an action to depict, APEX populates the list of objects in the picture frame by considering each object’s role vis à vis the action. As each object is added to the picture frame, the reason for which it has been included is recorded and the view specification may be altered to ensure that it is visible. Objects which are essential to the action are classified as frame objects because of their importance in the action’s frame representation. All other classifications are made with regard to the selected frame objects. Objects fall into one of six classifications. Given a frame object, an object is classified as a context object if it is the first familiar object encountered when traversing up the object hierarchy. Landmark objects are objects with unique characteristics and with the greatest proximity. They serve as reference points for frame objects. Similar objects are simply those objects which could be confused with the frame object. Similar objects are included to disambiguate the identification of the frame object. Supplementary objects are objects which, if omitted, would cause confusion or make the picture seem incomplete. Objects which support other objects (like the table on which a book lies or the floor on which a cabinet rests) are supplementary objects and are included if they are within the visible viewing area. Regular objects are those objects which have not yet been assigned a classification, yet are visible given the view specification.
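The selection of a context object amounts to a walk up the assembly hierarchy. The sketch below assumes a hypothetical Node class with a parent link and a familiarity flag; it illustrates only this one classification and is not APEX's implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    familiar: bool = False      # is the user already familiar with this object?

def context_object(frame_object):
    """Return the first familiar ancestor of `frame_object`, or None."""
    node = frame_object.parent
    while node is not None:
        if node.familiar:
            return node
        node = node.parent
    return None

# Hypothetical assembly hierarchy: cabinet > drawer > panel > knob.
cabinet = Node("cabinet", familiar=True)
drawer = Node("drawer", parent=cabinet)
panel = Node("panel", parent=drawer)
knob = Node("knob", parent=panel)

context = context_object(knob)
```

With only the cabinet familiar, the knob's context object is the cabinet; once the user becomes familiar with the drawer, the nearer drawer is chosen instead.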

The types of actions APEX depicts entail the rotation and translation of objects. APEX also generates a seventh type of object, the meta-object. Meta-objects are not part of the world APEX depicts, but serve to illustrate the actions being explained. For example, APEX generates arrows to show the path along which objects move.

Objects appear in one of two styles depending upon their role in the picture frame: subdued or primary style. Frame and context objects are rendered in primary style (that is, normally). All other objects are rendered in subdued style. The color values of these objects as well as their actual geometry are blended to de-emphasize them. APEX thereby decides the level of detail at which each object is depicted. Each internal node is enriched with a simplified version of its descendants that APEX itself generates in a preprocessing stage, based on the 3D geometry of the world. APEX determines at which level to utilize the simplified version, which has the effect of merging the boundaries of an object’s subparts. Thus, objects rendered using the simplified versions appear with less detail. An interesting side-effect of this operation is that certain small objects will sometimes disappear.

2.2.9. APT

Mackinlay's APT system [Mackinlay 86] generates 2D presentation graphics of quantitative data. Using a formalized vocabulary and grammar, APT designs charts and graphs which effectively present quantitative relations and their structural properties. APT defines a graphical language as the methods for constructing different charts and graphs and a graphical sentence as a relation. Sentences are expressed in a chosen language and can be combined so that one presentation may show several relations, thereby combining simple presentation styles to create more complex ones. APT designs presentations to convey only the relations specified in its input.

APT incorporates knowledge about presentation design described by Bertin [Bertin 83]. APT uses a vocabulary of marks. A mark is either a point, line or area. APT uses different graphical languages which fall into six different categories, three of which are described here. Single-position languages position marks on either the horizontal or vertical axis. Apposed-position languages position marks between two axes, as in line, bar or plot charts. Retinal languages determine the color, shape, size, saturation, texture and orientation of marks.

APT receives as input a set of data and a set of relations to show and, optionally, those to ignore. APT considers the relations specified for the presentation and searches for a design solution using two criteria. Expressiveness determines what graphical languages are appropriate for conveying different types of relations. A graphical language is appropriate if and only if it can present the specified relations and presents only the specified relations. Effectiveness determines how well graphical languages, alone and combined, express a particular relation. Effectiveness is evaluated through the use of a ranking of the perceptibility of various properties of 2D graphical languages. APT utilizes a generate-and-test approach: composition operators are used to select combinations of graphical languages and come up with alternative plans when partial solutions fail.
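The interplay of expressiveness, effectiveness, and generate-and-test can be sketched as a small backtracking search. The tables below are illustrative assumptions, not APT's actual rankings or languages, and the sketch treats each encoding as usable only once, which is a simplification.

```python
# Expressiveness (assumed): which kinds of data each encoding can present.
EXPRESSES = {
    "position": {"quantitative", "nominal"},
    "length": {"quantitative"},
    "color hue": {"nominal"},
    "shape": {"nominal"},
}

# Effectiveness (assumed): encodings ordered from most to least perceptible
# for each kind of data.
RANK = {
    "quantitative": ["position", "length", "color hue", "shape"],
    "nominal": ["position", "color hue", "shape", "length"],
}

def design(fields, used=()):
    """Assign one encoding to each (name, data_type) field, or return None.

    Candidates are generated in order of effectiveness, tested for
    expressiveness, and abandoned (backtracking) when the remaining
    fields cannot be completed.
    """
    if not fields:
        return {}
    (name, data_type), rest = fields[0], fields[1:]
    for encoding in RANK[data_type]:
        if encoding in used or data_type not in EXPRESSES[encoding]:
            continue                      # fails the expressiveness test
        plan = design(rest, used + (encoding,))
        if plan is not None:              # otherwise try the next candidate
            return {name: encoding, **plan}
    return None

plan = design([("price", "quantitative"), ("mileage", "quantitative"), ("nation", "nominal")])
```

Under these assumed tables the most effective encoding goes to the first field, and later fields fall back on the next expressive alternative.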

2.2.10. GRIP

Kamada and Kawai’s GRIP system [Kamada and Kawai 87] generates line drawings of 3D geometric models. The system handles hidden lines based on a picturing function scheme. The system receives as input a geometric model, a view specification and a set of picturing functions. The picturing functions specify how different parts of a geometric model are drawn. For example, a picturing function can refer to a particular surface or to an edge occluded by a surface (which the system computes based on the view specification). Thus, in order to render a simple wireframe illustration, the picturing function specifies that all lines be drawn. To draw a hidden-line drawing, the picturing function specifies that occluded line segments are not drawn. GRIP can, however, generate more complicated line drawings.

The picturing function can refer to entire surfaces which can be removed. GRIP computes the coverage of surfaces and lines, segmenting these to reflect the different degrees of occlusion. The picturing function can assign different line styles to line segments depending upon the degree of occlusion. Thus edges that are occluded by just one surface are rendered in one style, while edges that are occluded by two surfaces are rendered in a different style, and so on. Furthermore, the system can represent any kind of geometric model. Thus the system can be used to illustrate how a ray travels through a periscope. The user, however, designs the illustration by specifying the conditions and objects that should be treated differently and by specifying the line styles for these conditions.
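A picturing function can be thought of as a mapping from the degree of occlusion of a line segment to a line style. The sketch below assumes that the occlusion counts have already been computed from the view specification; the function names and style names are hypothetical and do not reflect GRIP's actual interface.

```python
def wireframe(occluding_surfaces):
    # Draw every line, regardless of occlusion.
    return "solid"

def hidden_line(occluding_surfaces):
    # Omit (None) any segment that is occluded by at least one surface.
    return "solid" if occluding_surfaces == 0 else None

def layered(occluding_surfaces):
    # One style per degree of occlusion; deeper segments are omitted.
    return {0: "solid", 1: "dashed", 2: "dotted"}.get(occluding_surfaces)

def draw(segments, picturing_function):
    """Return (segment, style) pairs for the segments that are drawn."""
    drawn = []
    for segment, occluding_surfaces in segments:
        style = picturing_function(occluding_surfaces)
        if style is not None:
            drawn.append((segment, style))
    return drawn

# Hypothetical segments, each paired with the number of surfaces in front of it.
segments = [("front edge", 0), ("side edge", 1), ("back edge", 2), ("inner edge", 3)]
layered_drawing = draw(segments, layered)
```

Swapping the picturing function changes the drawing without touching the model or the view.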

2.2.11. ACE

Meier’s ACE system [Meier 88] assigns color values to user-interface objects. The user describes a set of interface objects, some of which may be standard items such as menu bars or button labels, as well as their positional relationships (above, below, etc.). ACE uses three classes of rules: rules about relationships of interface objects, rules about color relations, and control rules which specify how the first two types of rules command certain color selections. The color set is limited to 450 possible colors, each of which is represented by hue (10 perceptually different hues), brightness (15 gray levels), and saturation (3 for each hue/brightness combination). For every pair of hues, four relations are stored. These include harmony/contrast values (from 1 to 5) for both adjacency contrast (where clashing combinations are represented) and screen contrast. There are also two attractiveness values for each pair: the first applies if the first color is darker than the second, the other if the contrary is true. Brightness and saturation are represented in the same manner, with harmony/contrast values. ACE begins by determining all the constraints imposed by the relational conditions input by the user for each interface object. ACE then attempts to assign a color value to each interface object. Once an interface object has been assigned a color, it proposes colors to those interface objects that it constrains. Constraints are relaxed when no solution is found. ACE thus selects colors based on both the physical positioning of each interface object and its semantics (if it is one of the standard interface objects). The latter consideration allows ACE's rules to use color in a semantic fashion.
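The size of the color set and the idea of relaxing constraints can be illustrated with a small sketch. It makes strong simplifying assumptions that are not taken from ACE: only brightness is searched, the only constraint is a minimum brightness difference between adjacent objects, and relaxation simply lowers that threshold by one step at a time.

```python
from itertools import product

# 10 hues x 15 brightness levels x 3 saturations, as (hue, brightness, saturation).
COLORS = list(product(range(10), range(15), range(3)))
BRIGHTNESS_LEVELS = range(15)

def solve(objects, adjacent, min_contrast, chosen=None):
    """Backtracking search for one brightness level per object, or None."""
    chosen = {} if chosen is None else chosen
    if len(chosen) == len(objects):
        return dict(chosen)
    obj = objects[len(chosen)]
    for brightness in BRIGHTNESS_LEVELS:
        # Test the candidate against every already-colored adjacent object.
        if all(abs(brightness - chosen[other]) >= min_contrast
               for other in chosen if frozenset((other, obj)) in adjacent):
            chosen[obj] = brightness
            result = solve(objects, adjacent, min_contrast, chosen)
            if result is not None:
                return result
            del chosen[obj]
    return None

def assign(objects, adjacent, min_contrast):
    """Relax the contrast threshold until an assignment is found."""
    while min_contrast >= 0:
        result = solve(objects, adjacent, min_contrast)
        if result is not None:
            return result, min_contrast
        min_contrast -= 1
    return None, None

objects = ["window", "menu bar", "label"]
adjacent = {frozenset(("window", "menu bar")), frozenset(("menu bar", "label")),
            frozenset(("window", "label"))}
assignment, contrast_used = assign(objects, adjacent, min_contrast=8)
```

Three mutually adjacent objects cannot all differ by 8 levels within 15 levels, so the threshold is relaxed once before a solution is found.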

2.2.12. Saito and Takahashi

[Saito and Takahashi 90] have developed efficient algorithms for showing different characteristics of 3D surfaces. Their system does not automatically decide what should be shown, but rather automatically computes and generates 3D representations that combine different styles. The resulting images show various features of 3D objects. These include edges, contour lines, and various types of hatching that show the surface. Each of the procedures returns the information necessary to render and combine different styled images of the same object. An edge drawing is difficult to generate using conventional graphic primitives. The contour drawing shows the form of the object. Most interestingly, there are algorithms for generating tapered cross-hatching patterns which follow the form of the object, emulating a cross-hatched drawing. All of these styles can be combined with fully-shaded images, or with 2D image data that has been retrieved from the same object. This enables real images to be easily combined with the generated images as long as range data is available at an appropriate resolution. The user is required to request the combinations of the various styles; however, the system’s preprocessing prepares all the information needed to generate any combination of styles.

2.2.13. Dooley and Cohen

Dooley and Cohen [Dooley and Cohen 90a, Dooley and Cohen 90b] designed systems to illustrate the form of 3D objects. Their first system [Dooley and Cohen 90a] is an interactive system which generates line illustrations of 3D models. The system defines a vocabulary of lines with attributes of width, transparency, and style (e.g., solid, dotted, dashed, and invisible). The user inputs a set of 3D models, defines a view specification, and assigns values, such as importance and material properties, to different types of prescribed line segments (e.g., edge, contour, and degree of occlusion). The system then uses this specification to render a line illustration. The line illustration utilizes different line styles and tapering to show intersections, various degrees of occlusion, and depth. Tolerances and thresholds are used to monitor the compatibility of assigned values, to effect constraints, and to modify existing strategies. For example, a thin line which tapers to indicate depth may be thickened so that the tapering effect is apparent.

Their second system [Dooley and Cohen 90b] illustrates 3D models with fully shaded color graphics with transparencies. The generated illustrations combine different rendering styles. Fully shaded graphics are combined with line drawings and line hatching in order to communicate more effectively the structure and shape of various 3D models. Illustration rules are specified by the user and determine how certain relationships should be depicted. These specify values for color, width, transparency, and style, as well as values for shadows, lighting model, reflection, and coverage (the number of layers a surface covers or obscures). The system analyzes the model and classifies line and surface segments, each of which will be dealt with differently according to the values set by the user. Based on the inherent properties of the model, the system automatically decides whether to taper lines and chooses line endpoint positions and shapes and segment connections. The system also generates screen-based hatching to further communicate form and position, and defines different light sources automatically.

2.2.14. ANDD

Marks’ ANDD system [Marks 90, Marks and Reiter 90] was designed to produce network diagrams. ANDD receives as input an attributed graph along with a set of relations. The list of relations specifies what attributes and properties of the network are to be communicated to the viewer. ANDD determines all the visual aspects of the generated diagram. All stylistic choices are made using specialized rule bases. Using its rule base, ANDD sets all values for each graphical object: the line widths and styles, the color of nodes, the text, and the positioning of each object (the network layout). ANDD incorporates knowledge to adhere to Grice’s four maxims of conversation [Grice 75]: Quality, Quantity, Relation, and Manner. Briefly, the maxim of Quality requires that the communication be truthful; the maxim of Relation requires that the audience not interpret any false implications; the maxim of Quantity requires that the communication not contain additional, irrelevant implications; and the maxim of Manner requires that the communication be clear.

ANDD classifies two types of design tasks to generate a diagram. Each relies on a different formalism. The first represents the semantics of the network diagram and isolates those properties and attributes that are to be conveyed to the viewer. This formalism is used to generate the expressive mapping of the network diagram. The expressive mapping is a mapping of all the semantic attributes, properties and relations of the network that are to be communicated to various types of graphical objects. The second formalism represents the syntax of the network diagram and specifies the information needed to instantiate and finally render the diagram. While the semantic formalism expresses the high-level graphical goals of the diagram, the syntactic formalism expresses the low-level graphical goals of the diagram.

The generation of a network diagram proceeds in three stages. ANDD begins by considering the attributed graph and list of relations. It generates a semantic representation of the diagram, that is, an expressive mapping of properties of the attributed graph to a semantic representation of the network diagram. The mapping from the attributed graph to the semantic representation relies on semantics-specific information, such as what sort of symbols should be used to represent certain kinds of nodes. This information is stored in a separate rule base. From this semantic representation, a syntactic representation is generated. For example, during this stage vertex names are mapped to text labels, edge types are mapped to pen types, and flags for emphasis are set. Using this syntactic representation, ANDD then begins to instantiate the network diagram using a separate rule base. First, all the parameters needed to render each graphical object are set. Then, the layout or positioning of each graphical object is determined. The positioning of nodes is determined by the high-level constraints that specify that certain objects should be aligned in a particular fashion (such as horizontally or vertically) and/or should be positioned in relation to other objects (such as above or below). ANDD’s rules are sensitive to the capabilities of the display and rely on a graphical palette that limits stylistic choices to those that can be accommodated on a particular type of display.

It is during the expressive mapping stage that false or unwanted implicatures are avoided. Rules for avoiding unwanted implicatures specify, for example, that a non-quantitative attribute should not be mapped to a graphical property that will be perceived as ordered. Additionally, extraneous relations or attributes are not introduced into the diagram. For example, a rule states that only one graphical technique be used to express a property or attribute.
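The two example rules can be expressed as checks on a candidate expressive mapping. The sketch below is illustrative only: the set of properties assumed to be perceived as ordered, the attribute names, and the check_mapping function are hypothetical and are not drawn from ANDD's rule base.

```python
# Graphical properties assumed, for illustration, to be perceived as ordered.
ORDERED_PROPERTIES = {"size", "line width", "brightness"}

def check_mapping(mapping, attribute_kinds):
    """Return a list of rule violations for a list of (attribute, property) pairs."""
    problems = []
    first_property = {}
    for attribute, prop in mapping:
        # Rule 1: a non-quantitative attribute must not use an ordered property.
        if attribute_kinds[attribute] != "quantitative" and prop in ORDERED_PROPERTIES:
            problems.append(f"{attribute}: non-quantitative attribute mapped to ordered property '{prop}'")
        # Rule 2: only one graphical technique per attribute.
        if attribute in first_property and first_property[attribute] != prop:
            problems.append(f"{attribute}: expressed by both '{first_property[attribute]}' and '{prop}'")
        first_property.setdefault(attribute, prop)
    return problems

attribute_kinds = {"vendor": "nominal", "capacity": "quantitative"}
bad_mapping = [("vendor", "size"), ("capacity", "line width"), ("capacity", "brightness")]
good_mapping = [("vendor", "shape"), ("capacity", "line width")]
violations = check_mapping(bad_mapping, attribute_kinds)
```

The first mapping violates both rules, once each; the second violates neither.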

2.2.15. Strothotte’s Tutoring System

Strothotte’s tutoring system [Strothotte 89] was designed to formalize a frame-based representation of pictures to be integrated in advice-giving explanation systems. The tutoring system generates pictorial explanations in the domain of high-school chemistry problems.

A picture-frame includes a picture (pixel-matrix) of a typical scene in the domain, and several graphical slots. There are 3 types of graphical slots. Picture-manipulation slots are used to make minor changes in the picture, such as the rotation of an object. Graphic-symbol slots are used to annotate the picture with labels, arrows, or icons. Hierarchy slots refer to other (previously designed) picture frames with greater detail of specific parts of the picture. A picture presentation is generated by activating certain slots. The presentations are predetermined, relying exclusively on a human designer’s ability to anticipate all the system’s needs and all the possible permutations. However, the same picture frame may be used to depict similar information such as similar procedures in a chemistry experiment. The various slots may be assigned different values for each step. For example, the graphic-symbol slots may be assigned different values so that the labels are changed while the display is left unchanged. The system relies on a vocabulary of picture-frames and the correct definition of graphical slots.

Picture-frames were hand-designed for the basic operations shared by a set of chemistry problems. Graphical slots were attached to each of these picture-frames to account for the different types of chemicals used in the experiments. The system includes a rule base representing chemical reactions and the experimental techniques needed to carry out the various reactions. To generate a pictorial explanation the user inputs a query in PROLOG, such as “how is H2 produced?” The system searches for H2 and finds that it is produced by dissolving Zn in HCl. The picture-frame for dissolution is selected and the graphical slots are filled with the appropriate information for the chemicals. Procedures that require more than one step are also handled. The picture-frame is augmented with each step in the process.
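
The picture-frame lookup described above can be sketched as a small data structure. This is a Python stand-in with hypothetical names (`PictureFrame`, `REACTIONS`, `FRAMES`, `explain`, and the file name `beaker.pix`); it does not reproduce the system's PROLOG rule base, and only the graphic-symbol slots are exercised.

```python
from dataclasses import dataclass, field


@dataclass
class PictureFrame:
    name: str
    picture: str  # stands in for the stored pixel matrix
    manipulation_slots: dict = field(default_factory=dict)
    symbol_slots: dict = field(default_factory=dict)
    hierarchy_slots: dict = field(default_factory=dict)

    def present(self, symbol_values):
        """Fill graphic-symbol slots; the underlying picture is unchanged."""
        unknown = set(symbol_values) - set(self.symbol_slots)
        if unknown:
            raise KeyError(f"undefined graphic-symbol slots: {sorted(unknown)}")
        labels = dict(self.symbol_slots)
        labels.update(symbol_values)
        return {"picture": self.picture, "labels": labels}


# Hypothetical rule base: product -> (basic operation, chemicals involved).
REACTIONS = {"H2": ("dissolution", {"solid": "Zn", "liquid": "HCl"})}

# Hypothetical hand-designed frame for the dissolution operation.
FRAMES = {
    "dissolution": PictureFrame(
        name="dissolution",
        picture="beaker.pix",
        symbol_slots={"solid": None, "liquid": None},
    )
}


def explain(product):
    """Select the frame for the producing operation and fill its slots."""
    operation, chemicals = REACTIONS[product]
    return FRAMES[operation].present(chemicals)
```

Because the same frame serves every dissolution, a different reaction changes only the labels, which mirrors the reuse of picture-frames described above.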

2.2.16. TRIP

Kamada and Kawai’s TRIP system [Kamada and Kawai 91] is designed to provide a visualization framework for translating abstract objects and relations into pictorial representations. TRIP automatically generates different types of 2D diagrams of semantic nets. TRIP takes as input a relational structure or semantic net. TRIP’s diagrams consist of simple geometric shapes (boxes, circles, etc.), connecting objects (such as lines and arrows), and text.

TRIP receives as input a semantic net and a specified visual mapping. All constraints in the visual mapping are specified as either rigid or pliable. Rigid constraints cannot be relaxed, while pliable constraints can be. TRIP begins by translating the semantic net into a relational structure. Then the abstract objects are translated to graphical objects, and the abstract relations to graphical relations. For each type of concept a particular graphical node is specified; for different types of relations, different graphical objects are used to connect the nodes. Graphical relations are of three kinds: geometric, connection, and attribute. It is during this step that constraints for alignment are set.
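
The distinction between rigid and pliable constraints can be illustrated with a brute-force sketch. It is not TRIP's solver: the function `solve`, the representation of constraints as predicates, and the search over a fixed list of candidate layouts are all assumptions chosen to keep the example short.

```python
from itertools import combinations


def solve(candidates, rigid, pliable):
    """Pick a layout that meets every rigid constraint.

    Pliable constraints are relaxed only when necessary, and as few of
    them as possible. Returns (layout, indices_of_relaxed_constraints),
    or (None, None) when no candidate satisfies the rigid constraints.
    """
    for n_relaxed in range(len(pliable) + 1):
        for relaxed in combinations(range(len(pliable)), n_relaxed):
            kept = [c for i, c in enumerate(pliable) if i not in relaxed]
            for layout in candidates:
                if all(check(layout) for check in list(rigid) + kept):
                    return layout, list(relaxed)
    return None, None


# Hypothetical layouts: node name -> (x, y).
candidates = [
    {"a": (0, 0), "b": (1, 1)},
    {"a": (0, 1), "b": (2, 1)},
]
rigid = [lambda layout: layout["a"][0] < layout["b"][0]]   # a left of b
pliable = [
    lambda layout: layout["a"][1] == layout["b"][1],       # a, b aligned
    lambda layout: layout["a"][1] == 0,                    # a on the top row
]
layout, relaxed = solve(candidates, rigid, pliable)
```

Here no candidate satisfies both pliable constraints, so exactly one of them is relaxed, while the rigid constraint is never given up.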

2.2.17. Hyper-Renderer

Emhardt and Strothotte’s Hyper-Renderer [Emhardt and Strothotte 92] computes a number of properties of a 3D scene during the rendering process. For example, the system can automatically compute the set of objects blocking each object. The user can then ask questions about the resulting picture in restricted natural language (e.g., to inquire which objects are invisible). Using this information, the user can then specify that the scene be rendered differently so that, for instance, a blocking object is rendered in a transparent style to reveal the objects behind it.
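
How a set of blocking objects might be collected during rendering can be sketched as follows. The description above does not say how the Hyper-Renderer computes it; this sketch assumes that the renderer keeps, for every pixel, a list of (depth, object) fragments, and the function names are hypothetical.

```python
from collections import defaultdict


def blocking_sets(fragments):
    """Map each object to the set of objects that lie in front of it.

    `fragments` maps a pixel to a list of (depth, object_name) pairs;
    smaller depth means closer to the eye.
    """
    blockers = defaultdict(set)
    for pixel_fragments in fragments.values():
        ordered = sorted(pixel_fragments)
        for index, (depth, obj) in enumerate(ordered):
            blockers[obj]  # make sure unblocked objects appear too
            for nearer_depth, nearer in ordered[:index]:
                if nearer != obj and nearer_depth < depth:
                    blockers[obj].add(nearer)
    return dict(blockers)


def invisible_objects(fragments):
    """Objects that are not the nearest fragment at any pixel."""
    nearest = {min(frags)[1] for frags in fragments.values() if frags}
    everything = {obj for frags in fragments.values() for _, obj in frags}
    return everything - nearest
```

A query such as "which objects are invisible" then reduces to a lookup in these precomputed sets, and the blockers of an object are the candidates for transparent rendering.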

2.2.18. WIP’s Graphics Generator

More recent work in knowledge-based graphics adopts methods and primitives modeled in many cases after our own. The WIP knowledge-based presentation system [Wahlster et al. 91] automatically creates presentations composed of text and graphics. WIP’s graphics generator [Rist and André 92] automatically designs and renders 3D line drawings or shaded illustrations. WIP’s graphics generator designs pictures for domains that include an espresso maker, lawnmower, and modem.

The system incrementally generates an illustration by populating the illustration with objects (corresponding to the objects in the domain) and annotative objects (arrows and textual labels), and selecting a view specification. The graphics generator is made up of two main components: a graphics design component and a graphics realization component. The graphics design component is assigned presentation tasks and selects graphical constraints that match the elements of the presentation task. Graphical constraints are either achievement operators, which specify how to achieve the constraint, or evaluator operators, which specify how to determine whether a constraint has been achieved, similar to the methods and evaluators used in IBIS. These are implemented in a manner fashioned after APT. They have adopted many of our primitives, such as IBIS’s include and visible. Design strategies describe the set of graphical constraints that must be satisfied to achieve a presentation task. A design strategy consists of a set of constraints that must be achieved to accomplish a specific presentation task and a set of conditions that specify when it can be used. This is similar to IBIS’s design methods.
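
The interplay of design strategies, achievement operators, and evaluator operators can be sketched as a generate-and-test loop. The classes, the toy representation of an illustration as two sets, and the example domain objects below are hypothetical; neither WIP's nor IBIS's actual operators are reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraint:
    name: str
    achieve: Callable[[dict], None]    # achievement operator
    evaluate: Callable[[dict], bool]   # evaluator operator


@dataclass
class DesignStrategy:
    name: str
    task: str
    applicable: Callable[[dict], bool]  # conditions for using the strategy
    constraints: List[Constraint]


def design(task, illustration, strategies):
    """Try each applicable strategy on a copy until all evaluators pass."""
    for strategy in strategies:
        if strategy.task != task or not strategy.applicable(illustration):
            continue
        trial = {key: set(values) for key, values in illustration.items()}
        for constraint in strategy.constraints:
            constraint.achieve(trial)
        if all(c.evaluate(trial) for c in strategy.constraints):
            return strategy.name, trial
    return None, illustration


def include(obj):
    return Constraint(f"include {obj}",
                      lambda ill: ill["objects"].add(obj),
                      lambda ill: obj in ill["objects"])


def visible(obj, cutaway):
    def achieve(ill):
        if cutaway:  # a cutaway removes whatever hides the object
            ill["occluded"].discard(obj)
    return Constraint(f"visible {obj}", achieve,
                      lambda ill: obj not in ill["occluded"])


STRATEGIES = [
    DesignStrategy("plain view", "show filter", lambda ill: True,
                   [include("filter"), visible("filter", cutaway=False)]),
    DesignStrategy("cutaway view", "show filter", lambda ill: True,
                   [include("filter"), visible("filter", cutaway=True)]),
]
```

Starting from an illustration in which the hypothetical `filter` is occluded, the plain strategy fails its visibility evaluator, so `design("show filter", ...)` falls through to the cutaway strategy.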

The graphics realization component comprises four modules. An image/picture handler composites the annotation objects and 3D modeled objects. Object geometries are represented by 3D wireframe models. Using these models one module generates the 3D graphical objects (line drawings or shaded) to depict the objects that it determines should be included. Using an algorithm for generating exploded views, the system can reposition 3D objects to depict assemblies. At this point, the system can generate a cutaway view to satisfy simple visibility constraints. Another module creates a 2D projection of the 3D object, which is sent to the image/picture handler. A third module generates and positions annotation objects, such as textual labels and 2D arrows, which are also sent to the image/picture handler.


2.2.19. ESPLANADE

ESPLANADE [Karp and Feiner 93, Karp and Feiner 90] is a knowledge-based animation presentation planner. ESPLANADE uses rules of cinematography to design all aspects of an animation, including camera placement and movement, multiple viewports, the different shots and transitions, object properties, and special effects. ESPLANADE currently creates animations of the operations of a crane in a warehouse.

ESPLANADE receives as input a script generated by a separate action planner and a set of communicative goals. ESPLANADE determines how to show the actions in the script and achieve the specified communicative goals in one animation. Using heuristic reasoning, the system decomposes the animation planning task to build a frame-based semantic representation of the animation. This structure is a hierarchy of sequences, scenes, and shots. The system selects the viewing specifications, selects and orders the actions that are to be shown, selects the transitions to use between shots, and selects and creates multiple views.

2.3. Evaluations and Comparisons

2.3.1. Introduction

In this section we will compare how the systems generate some or all of the values that specify computer-generated graphics. The first section compares how the systems determine what objects to depict that correspond to real objects or concepts. We begin by comparing how the systems decide to include an object and then describe how they choose to represent that object. The next section describes how the different systems position objects. We then describe how the systems generate annotative objects, how they select line styles and color, and whether they can combine styles. We end the comparison with a description of the criteria these systems use to make stylistic choices.

2.3.2. Object Synthesis

The Inclusion of Objects in a Scene

In CLOWNS only special nouns are programs that are used to render objects. Those nouns, when they appear in the sentence CLOWNS is interpreting, are included in the scene. The knowledge of what should be depicted is thereby pre-stored and not context sensitive. Additionally, words that may be used differently in different sentences are treated the same way each time. GAK uses inferencing to determine what objects to include in the animation. For example, if an action entails moving the stylus, the user is the agent performing this action. The information that the user moves the stylus is not explicit in the feature script; rather, GAK infers that the user’s hand is attached to the stylus and decides to include it in the animation. GAK thereby relies on the ability of the system designer to account for all situations and does not use visual-specific knowledge to determine what objects should be shown. Depending upon which picture-frames and graphical slots are activated, Strothotte’s Tutoring System will generate pictures with different sets of objects. Picture-frames are arranged hierarchically, so the picture may not include objects at locations where they would be generated in other cases. The decision to include objects is based on the original frame design and the activation of slots. WIP’s graphics generator applies design strategies to achieve the input presentation tasks. These strategies constrain the illustration to include certain objects.

APEX and ANDD decide what objects to include based on what role they play in the semantics of the final presentation. We have seen that APEX classifies objects to determine what objects must be in view (remember that APEX depicts objects with predetermined 3D orientations and locations). Additionally, APEX includes certain objects in order to communicate the location of the frame objects. APEX stores with each object a flag denoting whether or not the user has already been presented with a picture showing the object’s location. If the flag is false, APEX will include objects whose role is to locate that object. ANDD can visually represent attributes in several ways. ANDD applies its own rule base to determine whether attributes are shown using separate graphical objects or other graphical devices such as spatial relations.

Ani and the View System make decisions based on the collection of constraints that concern the semantics of the presentation, but in the end feasibility constraints have the final say when the solution is generated. Ani begins with the description of the scene in which all the characters required are listed. Since Ani sometimes partitions a scene into a sequence of subscenes, the inclusion of characters is a side-effect of this partitioning. The final decision for including characters in a scene is thereby a function of feasibility. The View System decides what objects to synthesize for a display based on the situation space and feasibility constraints. Subobjects of lesser importance may be omitted in order to satisfy sizing constraints. The View System decision making process is in large part based on the communicative intent, but in the end the feasibility determined by the prestored sizing constraints prevails.

The Selection of Object Properties and Attributes to Depict

The View System applies specialized operators to generate objects. The situation space is used to determine what attributes should be shown, while other operators represent how objects are transformed to include more or less detail. The grid-operators, which specify the size constraints of graphical objects, determine how much detail may be included in the icon. The View System incorporates a set of object operators which specify how certain objects are transformed into new objects, how objects are combined, and how detail is added. These are prestored routines.

APEX selects the level of detail of each object based on APEX’s classification of the object’s role in the scene. APEX de-emphasizes objects in part by rendering them in lesser detail. Objects are stored in a parts hierarchy. APEX combines the parts which appear below a certain level by geometrically processing them so that their boundaries are blended. Smoothing adjacent surfaces has the effect of removing the distinguishing characteristics of those objects which have been combined.

GRIP, Dooley and Cohen’s systems, and Saito and Takahashi’s system rely on user control to determine what properties to show. The user provides GRIP with a specification of how different line segments and surfaces should be treated based on coverage. Similarly, Dooley and Cohen’s systems determine surface, edge, and coverage information. The user specifies how and what information should be shown, but the system applies rules to ensure that the use of these specifications is effective. Dooley and Cohen’s system is interactive, so the user actively participates in these stylistic decisions by changing parameters when the resulting illustration is unsatisfactory. Saito and Takahashi’s system analyzes a 3D surface and collects the information necessary for the surface to be rendered in several different ways. Additionally, their system allows 2D images to be combined with the rendered surface. The user selects the images to combine.

2.3.3. Object Positioning

CLOWNS and Ani compute object positions based on information that describes where the corresponding objects would be found in the worlds depicted. CLOWNS determines an object’s position by computing contact points returned by the preposition, verb, and special noun programs. These contact points relate where the object must be based on the interpretation of the sentence. GAK has routines that select exact 2D locations for “fuzzy locations.” A fuzzy location is an unspecified general location such as “the lower right hand corner” of the display. GAK then calculates the path on which an object moves based on starting and finishing positions. WIP’s graphics generator can position objects in an assembly to depict an exploded view.
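
Resolving a fuzzy location and computing a path between two positions can be sketched as follows. The region table, the choice of the region's center as the exact location, and the straight-line interpolation are assumptions of this sketch; GAK's actual routines are not described here.

```python
# Hypothetical fuzzy regions as fractions of the display,
# (x0, y0, x1, y1), with the origin at the top-left corner.
FUZZY_REGIONS = {
    "lower right hand corner": (0.75, 0.75, 1.0, 1.0),
    "center": (0.4, 0.4, 0.6, 0.6),
}


def resolve(fuzzy_name, width, height):
    """Choose an exact pixel location: here, the center of the region."""
    x0, y0, x1, y1 = FUZZY_REGIONS[fuzzy_name]
    return (round((x0 + x1) / 2 * width), round((y0 + y1) / 2 * height))


def straight_path(start, finish, steps):
    """Return steps + 1 evenly spaced points from start to finish."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, finish
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(steps + 1)
    ]
```

On a hypothetical 800 by 600 display, `resolve("lower right hand corner", 800, 600)` yields the pixel `(700, 525)`, which can then serve as the finishing position passed to `straight_path`.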

The motion and initial and final positions of the objects in Ani’s films are determined by the constraints associated with each operator selected to convey the mood, actions and relations of the characters in a scene. Ani can choose from several different operators and bases its decisions on feasibility (time).

The View System and ANDD position objects based on their semantics. The View System can cluster objects with shared attributes. ANDD’s expressive mapping phase determines when alignment and positioning should be used to convey attributes and relations. ANDD has the facility for determining whether or not objects should appear above, below, or next to each other, or whether they should be aligned. The layout phase attempts to satisfy these constraints. APT’s set of graphical languages specifies how marks are arranged; the actual value of each point, of course, determines the position of each mark. Once a set of graphical languages has been chosen to convey a set of relations, the layout is determined. Again, feasibility is the key factor in determining the final solution. TRIP relaxes the alignment constraints that the user specified as pliable, if necessary, to find a solution.

Object positioning is used to convey many different things. CLOWNS, GAK, the View System, and APEX use object positioning in order to convey the corresponding positioning of objects in the real world. The View System and APEX use the actual locations of objects in the real world, while CLOWNS and GAK compute fuzzy locations. BHARAT and APT use position to convey values in a way dictated by graphing languages. Other systems use position to convey other things: Ani moves objects in order to convey feelings and actions; the View System and ANDD use spatial positioning in order to convey the relationships between different objects. WIP’s graphics generator generates exploded views of assemblies. The positioning is used to show how objects fit together, or how they are assembled.

2.3.4. Annotative Objects

ANDD and APT have limited capabilities for generating labels. ANDD generates a legend indicating the semantics of different shapes, colors, and line styles. The View System is able to apply constraints to generate different labels based on the space allotted for the label. APEX generates arrows to show how objects are manipulated. ANDD includes and positions labels for the various objects in its network diagrams. GAK’s animations are annotated with captions that are generated from the same feature script used to generate the animation.

APEX generates 3D arrows (a type of meta-object), which are not found in the world knowledge base, to illustrate actions. APEX will generate an arrow whenever an action requires object manipulation. WIP’s graphics generator annotates objects with call-out textual labels. The system positions the label along the boundary of the illustration and positions an arrow from the label pointing to the object. The system also generates arrows to depict 3D operations. The system can handle multiple labels in the same illustration.

2.3.5. Stylistic Choice

Line Styles and Weights

The three systems which depict 3D objects and their form rely on different line weights and styles in order to differentiate edges, profiles, contours, and distance to the eye. GRIP can show different levels of coverage (the number of surfaces an object obscures) with user-specified line styles. Saito and Takahashi's algorithms provide tapering of profile lines and allow different coloring schemes for different types of lines and contours. Dooley and Cohen require that the user choose what line styles and weights to use to show different categories of lines (profiles, etc.). Their system detects conflicts and alters the set values in order to provide continuity, show different levels of coverage, distance to the eye, and tapering.

ANDD uses different line styles and weights to emphasize certain objects as well as indicate their similarity or difference. For example, if two sets of nodes are connected by the same sort of cabling, then ANDD would not select two different line styles to depict these two connections; doing so would imply that the connections were different. However, if ANDD were depicting a particular portion of a network, then a different line style could be used to emphasize that portion of the network. Since all the objects would then belong to the same group, the same visual cue (in this case line style) would be used uniformly over the group of objects.

Color

The ACE system is designed with one goal: assign appropriate colors to user-interface components. Colors are chosen based on their harmony/contrast and attractiveness as a function of the semantics and relationship to the objects they are used to color. One obvious shortcoming of the ACE system is the inability to handle situations in which objects are shaded or colored in a graduated manner. ACE does not incorporate enough knowledge to be applied to 3D shaded problems.

APEX uses color to emphasize and de-emphasize objects in the 3D scene. For example, the muting of objects is accomplished in part by rendering them in a color which is either darker or lighter than the main objects in the picture. APEX also uses colors consistently, coloring meta-objects with the same semantics in the same color. APEX operates under the assumption that all objects of the same category should be treated equally. Both APT and ANDD are able to assign colors to convey a set of attributes. WIP’s graphics generator colors objects differently to show different properties.

Combining Styles

APT’s representation of graphical languages explicitly defines different styles for the use of visual cues. In this sense, APT is the only system that is truly able to choose different stylistic combinations in the same presentation. APT allows the user to restrict the system to certain graphical languages and thereby supports graphical preferences. Additionally, the expressiveness criteria can be altered to order the search for a solution. Other systems allow the same concepts to be mapped to different graphical devices. ANDD chooses what visual effects to use to convey the relation list. The other systems rely on user-specified selections.

2.3.6. Criteria for Stylistic Choice

Only a few of the systems incorporate knowledge about how visual material is interpreted or perceived. APT has encoded rules to evaluate the effectiveness of a stylistic choice. For example, certain graphical devices are better at conveying different types of relations than others. If the days of the week should be distinguished, marks of seven different sizes could be used. However, the effectiveness of this method is not rated highly. First of all, it may be difficult to distinguish the sizes of the marks, and secondly, nominal values are better conveyed using other methods. ANDD uses rules that specify how the spatial arrangement of graphical objects will be interpreted to convey certain relations. ANDD uses these rules both to find an appropriate mapping for relationships and to detect when the spatial arrangement of objects implies relations that are either false or not specified in the input relation list. WIP’s graphics generator represents knowledge in two ways. First, the design strategies specify the conditions when different graphical constraints should be applied to achieve presentation tasks. Second, the system uses achievement operators to represent how graphical constraints are achieved and evaluation operators to represent how to determine when graphical constraints are satisfied.

The ACE system uses special rules about color to make color assignments. These are based on well-founded theories of color interaction and adjacency. Additionally, ACE uses semantic information, when it is provided by the user, to further constrain the color selection made for each object. ACE favors objects with the most specifications, and relaxes the constraints when assigning colors to other objects. ACE invokes special rules if the object it is selecting a color for is a standard interface object with a particular functionality. If the semantics of the object are unspecified then general rules are used instead.

Dooley and Cohen’s illustration system allows the user to specify what techniques can be mapped to certain properties. Their system integrates knowledge that specifies the conditions under which these devices are effective. For example, if the user has specified that tapering lines should show distance from the eye, but has also assigned a small line width, their system would detect when tapering would not be obvious and adjust the line weight to accommodate the tapering effect.
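
A check of this kind can be sketched numerically. The visibility criterion (a minimum difference between the near and far widths of a line), the function name, and all parameters are assumptions made for illustration; they are not taken from Dooley and Cohen's system.

```python
def adjust_for_tapering(near_width, taper_ratio, min_visible_step=1.0):
    """Return a near-end line width wide enough for tapering to show.

    The far end of a line is drawn at near_width * taper_ratio. Tapering
    is assumed to be noticeable only if the two ends differ by at least
    min_visible_step (for example, one pixel).
    """
    if not 0.0 <= taper_ratio < 1.0:
        raise ValueError("taper_ratio must be in [0, 1)")
    required = min_visible_step / (1.0 - taper_ratio)
    return max(near_width, required)
```

With a taper ratio of 0.5, a requested width of 1.0 is raised to 2.0 so that the far end (1.0) differs visibly from the near end, whereas a requested width of 4.0 is left unchanged.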

Most of the systems reviewed here use semantics-specific information when possible, but apply general operators when no specific information is available. Often these systems rely on the user to identify different types of objects and specify how each should be treated. For example, the Beach and Stone illustration format system requires that the user differentiate every part of a diagram while TRIP requires that the user specify the visual mapping of every type of attribute of the semantic net. ACE allows the user to input semantics-specific information about different components of the user-interface. Both GRIP and Dooley and Cohen’s systems have a fixed set of properties that can be treated specially but require that the user specify how each should be treated.

AIPS, the View System, APT, and ANDD use a notion of graphical language to map attributes to graphical objects. AIPS activates different templates based solely on the semantics of the information to be displayed. BHARAT, APT, and ANDD consider the type of relation (e.g. ordinal and non-ordinal, linear or non-linear) in their decision-making process. APT judges the application of combinations of graphical languages to show different types of relations while ANDD applies a rule base determining which methods, together, satisfy the four maxims of conversation discussed above.

Only some of the systems have mechanisms for constraint resolution. In the main, these constraints concern feasibility. TRIP resolves positioning constraints and can relax constraints that the user flagged as pliable. In Ani, one of the major constraints is the length of a scene. This determines what solutions are feasible given the stretchability of certain methods. As we described earlier, solutions were rejected because they could not be transformed to conform to the time constraint. It is difficult to determine whether or not Ani could handle a very complex scene that included a large number of characters. The space constraints would require that the scene be broken into several small scenes, which could further lengthen the time required. The View System activates the various synthesis operators based on the current constraints set out by the situation space. First the user’s identity and task description are used to determine on a high level what should be included in the display. Then, depending upon the mode (object synthesis or environment layout first), the View System incrementally generates the display. The final criterion used to determine how objects are placed and the area they occupy is the available screen space.

Some of the systems have mechanisms for resolving conflicts and backtracking to find alternative solutions. Ani relies on the “strength” of a suggestion to determine if it can be overridden. The View System considers the entire situation space to resolve conflicts; APT judges partial solutions and backtracks when conflicts occur based on the effectiveness and expressiveness criteria. TRIP can lay out diagrams differently when one layout violates constraints. WIP’s graphics generator can select alternative design strategies when constraints are violated.
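
Backtracking over partial solutions, in the spirit of what is attributed to APT above, can be sketched as a small recursive search. The ranking table is an illustrative assumption rather than APT's actual effectiveness ordering, and the rule that each graphical device is used at most once is a simplification made for this sketch.

```python
# Assumed ranking of graphical devices per type of relation, most
# effective first; only devices listed are treated as expressive enough.
RANKING = {
    "nominal": ["position", "hue", "shape"],
    "quantitative": ["position", "size"],
}


def assign_devices(relations, ranking=RANKING, used=()):
    """Assign one device to each (name, kind) relation, or return None.

    The most effective device is tried first; when a later relation
    cannot be expressed, the search backtracks to the next-best choice.
    """
    if not relations:
        return {}
    (name, kind), rest = relations[0], relations[1:]
    for device in ranking[kind]:
        if device in used:
            continue  # conflict: this device already conveys something else
        tail = assign_devices(rest, ranking, used + (device,))
        if tail is not None:
            return {name: device, **tail}
    return None
```

With three relations, one nominal and two quantitative, the search first gives `position` to the nominal relation, discovers that the two quantitative relations cannot then both be expressed, and backtracks so that the nominal relation is conveyed by `hue` instead.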

2.4. Conclusions

Our approach is different in many ways. First, the input to the system is communicative intent: it specifies what the user is meant to understand. As we will show, the user’s actions are considered to determine whether or not goals are achieved, and the conditions for the success of a visual effect are represented to model how the illustration is interpreted.

Second, our system uses a visual language that corresponds to the decomposition of the illustration task. This language comprises primitives (communicative goals, style strategies and illustration procedures) and rules (design and style). The collection of primitives and rules applied to create an illustration remains part of the representation of the illustration. Thus, the illustration is never divorced from the representation, or from the components that generate it. This language is designed to model the full cycle of communication.

Third, our architecture is based on the decomposition of the illustration task. Our system is based on the notion of trial-and-error, using methods to generate and evaluators to test. This decomposition is reflected by the division of labor among the three main components that comprise the system. We have determined that for every decision made, it is necessary to check whether it is successful by examining all the values that define the illustration. This often involves the painstaking examination of the values stored in the framebuffer and z-buffer as well as the values in the higher level representations of the illustration. Using a system of methods and evaluators, our system is continuously reevaluating and modifying the illustration to satisfy communicative intent. Our system thereby determines when to use visual effects that human illustrators use, such as ghosting, cutaway views, and annotation. It also determines when these effects are effective and can select alternative plans when goals are violated. Thus, our system could be modified to generate the user-supplied input to the systems described.

Fourth, even systems with facilities for handling goal conflicts cannot handle overly-constrained problems. Our system can opt to generate composite illustrations when a single illustration cannot achieve communicative intent. A hierarchy of illustrators is used to manage each component of the composite illustration.

Fifth, our system is bi-directional and the illustrations are dynamic. This enables our system to provide different types of interaction with the user and other modules. Mechanisms are used to free the user from the role of passive viewer, to unfreeze the world depicted so that the objects in the world are changing, and finally to dynamically accommodate (and determine) changing goals. For example, we will show how the system interprets the user’s actions to determine when communicative intent is achieved. External components can communicate with the three main components of the system to both request evaluations and access the multi-level representation structure of the illustration. This allows our system to be used in multimedia, multimodal systems with new modes for creating coordinated and cohesive multimedia, multimodal dynamic presentations.

Finally, our system is designed to generate illustrations in real time. We have not yet attained this goal, achieving instead near-real time.
