Project Description

We live in a vast sea of ever-changing text with few tools available to help us visualize its meaning. The goal of our research is to bridge the gap between graphics and language by developing new theoretical models and supporting technology to create a system that automatically converts descriptive text into rendered 3D scenes representing the meaning of that text. We are building upon the original WordsEye text-to-scene system and adding new information to support the depiction of locations and actions. This project is funded by NSF IIS-0904361.
Try it out online at http://bit.ly/wordseye
Grounded Lexical Semantics for Text-to-Scene Generation

VigNet is a lexical semantic resource that grounds the meaning of lexical items in spatial and graphical primitives (for details, see this workshop paper). VigNet is based on FrameNet, and we call the theory underlying it Vignette Semantics. The three core additions to FrameNet are:
- a set of new lexical and sublexical, spatially inspired frames called Vignettes, together with their lexical units (for lexical Vignettes). Like frames, Vignettes correspond to conceptual structures of situations, events, and complex objects, but they add spatial knowledge about how such schemas are typically depicted. For instance, we know that certain goods are bought in a supermarket; a Vignette for this situation would include a BUYER standing at the checkout and paying MONEY for the purchased GOODS to a SELLER who operates a cash register.
- a mechanism to decompose frames into atomic primitives, implemented as a frame-to-frame relation. The same mechanism lets us specify complex selectional restrictions, which we use for simple inference over Vignette structures and to aid semantic parsing.
- semantic nodes that build full meaning representations for sentences, serving both as discourse referents and as generics that allow us to assert world knowledge.
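To make the first addition concrete, the supermarket example above can be sketched as a small data structure: a frame-like record that pairs semantic roles with spatial relations describing how the scene is typically depicted. This is a minimal illustrative sketch, not VigNet's actual schema; all class, role, and relation names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Vignette:
    """A FrameNet-style frame extended with depiction knowledge (sketch)."""
    name: str
    roles: list                # frame elements, e.g. BUYER, SELLER, GOODS, MONEY
    spatial_relations: list    # (figure, relation, ground) triples for depiction

    def bindings_complete(self, bindings: dict) -> bool:
        """Check that every role is bound to an entity before depicting the scene."""
        return all(role in bindings for role in self.roles)

# A lexical Vignette for buying goods in a supermarket: the BUYER stands at
# the checkout and hands MONEY to the SELLER, who operates a cash register.
buy_in_supermarket = Vignette(
    name="BUY_IN_SUPERMARKET",
    roles=["BUYER", "SELLER", "GOODS", "MONEY"],
    spatial_relations=[
        ("BUYER", "stands_at", "checkout_counter"),
        ("SELLER", "behind", "cash_register"),
        ("GOODS", "on", "checkout_counter"),
        ("MONEY", "held_by", "BUYER"),
    ],
)

# Bind roles to scene entities, then confirm the Vignette is ready to depict.
bindings = {"BUYER": "woman", "SELLER": "clerk", "GOODS": "apples", "MONEY": "bill"}
print(buy_in_supermarket.bindings_complete(bindings))
```

In a fuller system, the spatial triples would be handed to a scene-layout component that positions 3D models accordingly; here they simply show how depiction knowledge can live alongside the frame's semantic roles.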