Code Specialization Literature Survey

Michael E. Locasto

Introduction

This area sketch/literature survey summarizes what I've learned recently about code specialization and optimization, and provides a reading list for further research into problem specifics. I've downloaded a few of the papers {pebble.ps,coprocessor.ps,valuespec.ps,paige.ps} and plan to read them more fully.

Overview

In general, there are "three approaches to providing performance guarantees in algorithm design: we can randomize, amortize, or optimize."

"A randomized algorithm introduces random decision making into the algorithm itself, to reduce dramatically the chance of a worst-case scenario."

"An amortization approach is to do extra work at one time to avoid more work later, to be able to provide guaranteed upper bounds on the average per-operation cost."

"An optimization approach is to take the trouble to provide performance guarantees for every operation. Various methods...require that we maintain some strutural information."

[Algorithms in C++, Robert Sedgewick]
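As a concrete illustration of the amortization idea, here is a textbook-style Java sketch of my own (not an example from Sedgewick): a growable array does extra copying work on the rare append that triggers a resize so that the average cost per append stays constant.

class GrowableIntArray {
    // Amortization sketch (my own example, not from Sedgewick): capacity is
    // doubled whenever the array fills up. A single append that triggers a
    // resize costs O(n), but doubling keeps the total copying work over n
    // appends at O(n), i.e., O(1) amortized per append.
    private int[] data = new int[4];
    private int size = 0;

    void append(int value) {
        if (size == data.length) {
            int[] bigger = new int[data.length * 2];   // occasional expensive step
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;                          // usual cheap step
    }

    int get(int index) { return data[index]; }
    int size() { return size; }
}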

Several strategies and methods for manually optimizing source code appear in the Related Terms list below.

[Data Structures in Java, Thomas A. Standish]

Related Terms

invariants
code specialization
compiler optimization
code obfuscation
memoization
promotion & accumulation
maintaining and strengthening loop invariants
principled strength reduction
partial evaluation
finite differencing
data structure selection
JIT compilation
lazy evaluation
transformational programming

Invariant: A rule, such as the ordering of an ordered list or heap, that applies throughout the life of a data structure or procedure. Each change to the data structure must maintain the correctness of the invariant. [Dictionary.com]

Invariant: An invariable quantity; specifically, a function of the coefficients of one or more forms, which remains unaltered, when these undergo suitable linear transformations. --J. J. Sylvester. [Dictionary.com]

Transformational programming is the process of refactoring code to improve the performance of algorithms by applying knowledge of general programming principles, the underlying architecture, or improvements in the algorithm. [Data Structures in Java, Thomas A. Standish]
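As a tiny worked example of a transformational step, here is a hand-applied strength reduction in Java (my own sketch, combining the loop-invariant and strength-reduction terms above; it is not taken from Standish):

class StrengthReduction {
    // Before: each iteration recomputes i * width with a multiplication.
    static int[] rowOffsetsBefore(int height, int width) {
        int[] offsets = new int[height];
        for (int i = 0; i < height; i++) {
            offsets[i] = i * width;        // multiply on every iteration
        }
        return offsets;
    }

    // After: the invariant "offset == i * width" is maintained incrementally
    // with an addition, a cheaper operation; each update preserves the
    // invariant, so the transformed loop computes the same result.
    static int[] rowOffsetsAfter(int height, int width) {
        int[] offsets = new int[height];
        int offset = 0;
        for (int i = 0; i < height; i++) {
            offsets[i] = offset;
            offset += width;               // strength-reduced update
        }
        return offsets;
    }
}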

Literature Links to Current Work

Code Specialization

Source Link
Summary
Citeseer http://citeseer.nj.nec.com/ferreira98multiple.html
" Program specialization is normally supported by global analysis of the program. Compilers use the information deduced to generate more efficient, specialized implementations of the program. This specialization can be single or multiple, depending if each procedure of the program is specialized into one or more versions. We present a Prolog compiler that does multiple specialization, using an algorithm that works over the WAM code, deducing the reachable procedure activations based on local... "
Citeseer http://citeseer.nj.nec.com/72770.html
"Specializing programs with respect to run-time values is an optimization strategy that has been shown to drastically improve code performance on realistic programs ranging from operating systems to graphics. Recently, various approaches to specializing code at run-time have been proposed. However, these approaches still suffer from shortcomings that limit their applicability: they are either manual, require programs to be written in a dedicated language, or are too expensive to be widely... "
François Noël, Luke Hornof, et al.,
Automatic, Template-Based Run-Time Specialization: Implementation and Experimental Study
Citeseer http://www.cs.arizona.edu/people/saw/papers/valuespec.ps
Code Specialization Based on Value Profiles (valuespec.ps)
Citeseer http://citeseer.nj.nec.com/41254.html
"Conventional operating system code is written to deal with all possible system states, and performs considerable interpretation to determine the current system state before taking action. A consequence of this approach is that kernel calls which perform little actual work take a long time to execute. To address this problem, we use specialized operating system code that reduces interpretation for common cases, but still behaves correctly in the fully general case. We describe how specialized... " misc{ cowan-calton, author = "Crispin Cowan", title = "Calton Pu, Tito Autrey, Andrew Black, Charles Consel", url = "" }
ACM DL Resource-bounded partial evaluation

Saumya Debray. ACM SIGPLAN Notices, Volume 32, Issue 12: Proceedings of the 1997 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, December 1997.

Citeseer http://citeseer.nj.nec.com/503504.html
This paper presents a novel technique, called input space adaptive software synthesis, for the energy and performance optimization of embedded software. The proposed technique is based on the fact that the computational complexities of programs or subprograms are often highly dependent on the values assumed by input and intermediate program variables during execution. This observation is exploited in the proposed software synthesis technique by augmenting the program with optimized versions of...
Citeseer http://citeseer.nj.nec.com/gupta02profile.html
"Over the past several decades numerous compile-time optimizations have been developed to speed up the execution of programs. Application of a typical optimization can be viewed as consisting of two primary tasks: uncovering optimization opportunities through static analysis of the program; and transforming the program to exploit the uncovered opportunities. Most of the early work on classical code optimizations is based upon a very simple performance model. Optimizations are de..."

Compiler Optimization

Source Link
Summary
Web search

Link time code optimization: http://www.cs.arizona.edu/alto/

 
Web search

http://www.windriver.com/products/html/optimization_wp_diab07312002.html

WindRiver: Advanced Compiler Optimization Techniques white paper
Citeseer, ACM DL http://citeseer.nj.nec.com/calder97value.html
"Identifying variables as invariant or constant at compile-time allows the compiler to perform optimizations including constant folding, code specialization, and partial evaluation. Some variables, which cannot be labeled as constants, may exhibit semi-invariant behavior. A semiinvariant variable is one that cannot be identified as a constant at compile-time, but has a high degree of invariant behavior at run-time. If run-time information was available to identify these variables as..." B. Calder, P. Feller, and A. Eustace, "Value Profiling", Proc. MICRO-30, Dec. 1997.
Citeseer http://citeseer.nj.nec.com/budimlic98static.html
"Interprocedural optimizations are important in Java because the object-oriented programming style encourages the use of many small methods. Unfortunately, such optimizations are difficult because of the nature of language structure and its security restrictions. A particular problem is the difficulty of knowing the entire program at any time prior to execution. This paper presents new approaches to cloning and inlining that can be profitably used even in a single class. These optimizations ..." Z. Budimlic and K. Kennedy. Static Interprocedural Optimizations in Java. Rice University technical report CRPC-TR98746, 1998.
Citeseer http://citeseer.nj.nec.com/mezini96dynamic.html
"The definition of a class library is an iterative process involving both the designer who provides basic functionality, and the users who subsequently specialize it. In order to ensure the coherence of possible modifications, suitable means for coordinating the activity of both are required. In this paper, we propose a metaclass based approach for bridging the gap between library designers and specializers, by enabling the designer to express specialization constraints which are automatically..." Mezini M. Dynamic metaclass construction for an explicit specialization interface. In Proceedings of the Reflection '96 Conference, pp. 203--219, 1996.
ACM DL Compiler-directed dynamic computation reuse: rationale and initial results
Daniel A. Connors, Wen-mei W. Hwu. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture, November 1999. "Recent studies on value locality reveal that many instructions are frequently executed with a small variety of inputs. This paper proposes an approach that integrates architecture and compiler techniques to exploit value locality for large regions of code. The approach strives to eliminate redundant processor execution created by both instruction-level input repetition and recurrence of input data within high-level computations. In this approach, the compiler performs analysis to id ..."
ACM DL Partial evaluation applied to numerical computation
Andrew Berlin. Proceedings of the 1990 ACM Conference on LISP and Functional Programming, May 1990. "There have been many demonstrations that the expressive power of Lisp can greatly simplify the process of writing numerical programs, but at the cost of reduced performance.[10][16] I show that by coupling Lisp's abstract, expressive style of programming with a compiler that uses partial evaluation, data abstractions can be eliminated at compile time, producing extremely high-performance code. For an important class of numerical programs, partial evaluation achieves order-of-magnitude speed ..."
ACM DL Profile-directed optimization of event-based programs
Mohan Rajagopalan, Saumya K. Debray, Matti A. Hiltunen, Richard D. Schlichting. ACM SIGPLAN Notices, Volume 37, Issue 5: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, June 2002. "Events are used as a fundamental abstraction in programs ranging from graphical user interfaces (GUIs) to systems for building customized network protocols. While providing a flexible structuring and execution paradigm, events have the potentially serious drawback of extra execution overhead due to the indirection between modules that raise events and those that handle them. This paper describes an approach to addressing this issue using static optimization techniques. This approach, which explo ..."
ACM DL Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching
Youfeng Wu. ACM SIGPLAN Notices, Volume 37, Issue 5: Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, June 2002. "Irregular data references are difficult to prefetch, as the future memory address of a load instruction is hard to anticipate by a compiler. However, recent studies as well as our experience indicate that some important load instructions in irregular programs contain stride access patterns. Although the load instructions with stride patterns are difficult to identify with static compiler techniques, we developed an efficient profiling method to discover these load instructions. The new profiling ..."
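To make the notion of a semi-invariant variable in the Calder, Feller, and Eustace entry above concrete, here is a rough Java sketch of the value-profiling idea (my own illustration, not the instrumentation the paper actually uses): record the values reaching a program point and flag the variable when a single value dominates the samples.

import java.util.HashMap;
import java.util.Map;

class ValueProfile {
    // Value-profiling sketch (my own illustration): count how often each value
    // reaches an instrumented program point. A variable whose most frequent
    // value accounts for most samples is semi-invariant and is a candidate
    // for code specialization on that value.
    private final Map<Integer, Long> counts = new HashMap<>();
    private long total = 0;

    void record(int value) {
        counts.merge(value, 1L, Long::sum);
        total++;
    }

    // True if one value accounts for at least `threshold` (e.g., 0.9) of all samples.
    boolean isSemiInvariant(double threshold) {
        long max = counts.values().stream().mapToLong(Long::longValue).max().orElse(0L);
        return total > 0 && (double) max / total >= threshold;
    }
}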

 

Transformational Programming

Source Link
Summary
Citeseer citation R. S. Bird. Addendum: The promotion and accumulation strategies in transformational programming. ACM Trans. Program. Lang. Syst., 7(3):490-492, July 1985.
Apparently a well-referenced paper
Citeseer http://citeseer.nj.nec.com/liu97principled.html
Y. A. Liu. Principled strength reduction. In R. Bird and L. Meertens, editors, Algorithmic Languages and Calculi, pages 357--381. Chapman & Hall, London, U.K., 1997.
Citeseer

www.cs.purdue.edu/homes/palsberg/sdpl96/statements/paige.ps

Robert Paige's paper is a short three-page review of three techniques in program transformation: partial evaluation (replacing invariants), finite differencing, and data structure selection. (Sketches of the first two appear at the end of this section.)

Citeseer http://www.cs.cornell.edu/Info/People/tt/research/incremental-computation/derivation.html

" In general, incremental computation aims to solve a problem on a sequence of inputs that differ only slightly from one another, making use of the previously computed output in computing a new output, instead of computing the new output from scratch. Incremental computation is a fundamental issue relevant throughout computer software, e.g., optimizing compilers, transformational program development, and interactive systems."

Citeseer http://citeseer.nj.nec.com/liu99dynamic.html

" Dynamic programming is an important algorithm design technique. It is used for problems whose solutions involve recursively solving subproblems that share subsubproblems. While a straightforward recursive program solves common subsubproblems repeatedly, a dynamic programming algorithm solves every subsubproblem just once, saves the result, and reuses it when the subsubproblem is encountered again. This can reduce the time complexity from exponential to polynomial."
Y. A. Liu and S. D. Stoller. Dynamic programming via static incrementalization. In Proceedings of the 8th European Symposium on Programming, Lecture Notes in Computer Science. Springer-Verlag, Berlin, Mar. 1999.
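Two tiny illustrations of the first two techniques Paige reviews, written as my own Java sketches rather than examples taken from any of the papers above. First, partial evaluation: specializing a general power function with respect to a known exponent executes away the control flow that depended only on that input.

class PartialEvaluation {
    // General function: the loop depends on the run-time value of n.
    static double powGeneral(double x, int n) {
        double r = 1.0;
        for (int i = 0; i < n; i++) r *= x;
        return r;
    }

    // Residual program obtained by partially evaluating powGeneral with the
    // static input n = 3: the loop and the exponent are gone.
    static double pow3(double x) {
        return x * x * x;
    }
}

Second, finite differencing, which is also the core idea behind the incremental-computation work cited above: instead of recomputing an expensive aggregate from scratch after each small change to the input, maintain it as derived state and update it with the delta.

class SummedArray {
    // Finite-differencing sketch (my own example): the sum is kept as derived
    // state and updated incrementally, preserving the invariant
    // sum == a[0] + a[1] + ... + a[a.length - 1].
    private final long[] a;
    private long sum;

    SummedArray(long[] a) {
        this.a = a;
        for (long x : a) sum += x;   // computed from scratch exactly once
    }

    void set(int i, long newValue) {
        sum += newValue - a[i];      // O(1) update instead of an O(n) rescan
        a[i] = newValue;
    }

    long sum() { return sum; }
}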

Network, Distributed Systems & Crypto Applications

Source Link
Summary
Web search http://www.darpa.mil/ipto/psum1999/J097-0.html
Providing "exokernals" to improve server performance, and "remove barriers to specialization." They also claim that dynamic generation of code is essential to their system.
Web search http://research.microsoft.com/sn/
http://research.microsoft.com/~jvh/MMLite-sigops.pdf http://research.microsoft.com/sn/Herald/
Microsoft Research: (new framework design, distributed peer networks)
Citeseer http://citeseer.nj.nec.com/cowan98adaptation.html
"Some failures cannot be masked by redundancies, because an unanticipated situation occurred, because fault-tolerance measures were not adequate, or because there was a security breach (which is not amenable to replication) . Applications that wish to continue to offer some service despite nonmaskable failure must adapt to the loss of resources. When numerous combinations of non-maskable failure modes are considered, the set of possible adaptations becomes complex. This paper presents..."
Cowan, C., L. Delcambre, A. Le Meur, L. Liu, D. Maier, D. McNamee, M. Miller, C. Pu, P. Wagle, and J. Walpole. "Adaptation Space: Surviving Non-Maskable Failures," Technical Report 98-013, Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology, May 1998.
ACM DL, Citeseer, and Web search http://www.prolangs.rutgers.edu/f01papers/chung-islped01.pdf
Automatic Source Code Specialization for Energy Reduction. I thought it would be useful for saving power in embedded devices.
Citeseer, Web search http://www.bell-labs.com/~avi/publication-dir/usenix99-workshop.ps
Pebble: Component based OS for embedded systems
Citeseer http://citeseer.nj.nec.com/bettini02coordinating.html
"Standard class-based inheritance mechanisms, which are often used to implement distributed systems, do not seem to scale well to a distributed context with mobility. In this paper, a mixin-based approach is proposed for structuring mobile object-oriented code and it is shown to t in the dynamic and open nature of a mobile code scenario.." bettini02coordinating, author = "Lorenzo Bettini and Viviana Bono and Betti Venneri", title = "Coordinating Mobile Object-Oriented Code", booktitle = "Coordination Models and Languages", pages = "56-71", year = "2002"
ACM DL Generating efficient protocol code from an abstract specification
Claude Castelluccia, Walid Dabbous, Sean O'Malley. IEEE/ACM Transactions on Networking (TON), Volume 5, Issue 4, August 1997.
ACM DL Processors and accelerators for embedded applications: Unlocking the design secrets of a 2.29 Gb/s Rijndael processor
Patrick R. Schaumont, Henry Kuo, Ingrid M. Verbauwhede. Proceedings of the 39th Conference on Design Automation, June 2002. "This contribution describes the design and performance testing of an Advanced Encryption Standard (AES) compliant encryption chip that delivers 2.29 Gb/s of encryption throughput at 56 mW of power consumption. We discuss how the high level reference specification in C is translated into a parallel architecture. Design decisions are motivated from a system level viewpoint. The prototyping setup is discussed."

JIT

Source Link
Summary
Web search http://www.csrd.uiuc.edu/ics99/tut1p.html
 
Web search http://www.graco.c.u-tokyo.ac.jp/~masuhara/BCS/
Bytecode specialization
Sun-Java website http://java.sun.com/docs/jit_interface.html
http://wwws.sun.com/software/solaris/jit/
The Just In Time bytecode to machine language compiler
ACM DL A dynamic optimization framework for a Java just-in-time compiler
Toshio Suganuma, Toshiaki Yasue, Motohiro Kawahito, Hideaki Komatsu, Toshio Nakatani. ACM SIGPLAN Notices, Volume 36, Issue 11: Proceedings of the OOPSLA '01 Conference on Object-Oriented Programming, Systems, Languages, and Applications, October 2001. "The high performance implementation of Java Virtual Machines (JVM) and just-in-time (JIT) compilers is directed toward adaptive compilation optimizations on the basis of online runtime profile information. This paper describes the design and implementation of a dynamic optimization framework in a production-level Java JIT compiler. Our approach is to employ a mixed mode interpreter and a three level optimizing compiler, supporting quick, full, and special optimization, each of which has a differ ..."

 

Miscellaneous

Source Link
Summary
Web search http://perl.plover.com/MiniMemoize/memoize.html
Short summary of the memoization technique, as applied to Perl. (A Java sketch of the same idea appears at the end of this section.)
ACM DL Programming Languages for Distributed Computing Systems

ACM Computing Surveys (CSUR), Volume 21, Issue 3 (September 1989)

Henri E. Bal Vrije Univ., Amsterdam, The Netherlands
Jennifer G. Steiner Open Software Foundation, Cambridge, MA
Andrew S. Tanenbaum Vrije Univ., Amsterdam, The Netherlands

Publisher: ACM Press, New York, NY, USA. Pages: 261-322. Year of Publication: 1989. ISSN: 0360-0300.

This paper appears to be a widely read and referenced one, and it includes a citation and reading list of about 100 other publications.

Citeseer http://www.cs.wisc.edu/~zilles/papers/profiler.hpca.ps
A programmable co-processor for profiling
ACM Crossroads http://www.acm.org/crossroads/xrds2-3/dependency.html
Short Summary Article on Design and Dependency Diagrams
ACM Crossroads http://www.acm.org/crossroads/xrds4-3/codeob.html
Discusses various techniques in code obfuscation that are related to code specialization, but applied to creating, rather than reducing, code.
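The memoization technique from the Perl article above, sketched in Java (my own example; the linked page uses Perl's Memoize module): a generic wrapper caches results by argument, so repeated calls with the same input pay the underlying cost only once.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class Memo {
    // Generic memoizer (illustrative sketch): results of f are cached by argument.
    static <A, R> Function<A, R> memoize(Function<A, R> f) {
        Map<A, R> cache = new HashMap<>();
        return arg -> cache.computeIfAbsent(arg, f);
    }

    public static void main(String[] args) {
        Function<Integer, Integer> slowSquare = n -> {
            try { Thread.sleep(100); } catch (InterruptedException e) { }  // pretend this is expensive
            return n * n;
        };
        Function<Integer, Integer> fastSquare = memoize(slowSquare);
        System.out.println(fastSquare.apply(12));   // computed, then cached
        System.out.println(fastSquare.apply(12));   // served from the cache
    }
}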

Other Links

http://www.cs.ucsd.edu/users/calder/abstracts/UCSD-CS1998-581.html

"Code specialization is a type of compiler optimization, which selectively executes a different version of the code, conditioned on the value of a variable. Given an invariant variable and its value, the original code is duplicated. There will be one general version of the code, and a special version of the code. The specialized version of the code will be conditioned on the invariant variable. A selection mechanism based on the invariant variable will choose which code to execute."

http://www.prolangs.rutgers.edu/f01papers/chung-islped01.pdf

 

Conclusions and Further Work

My search found other interesting papers that I don't include here because they are a bit off-topic. They might eventually be useful to read, and I'll compile a separate list of them.

I will read the papers mentioned in the introduction, as well as weed through the papers in this list for those relevant to the specific application of code specialization and generation for a network of embedded systems, in hopes of finding related ideas in the "TC: Open and Survivable Embedded Systems" paper. In addition, I have comments and questions regarding the aforementioned paper; these will be detailed in a separate document.