This area sketch/literature survey represents what I've learned recently about code specialization and optimization as well as providing a reading list for further research into problem specifics. I've downloaded a few of the papers {pebble.ps,coprocessor.ps,valuespec.ps,paige.ps} and plan to read them more fully.
In general, there are "three approaches to providing performance guarantees in algorithm design: we can randomize, amortize, or optimize."
"A randomized algorithm introduces random decision making into the algorithm itself, to reduce dramatically the chance of a worst-case scenario."
"An amortization approach is to do extra work at one time to avoid more work later, to be able to provide guaranteed upper bounds on the average per-operation cost."
"An optimization approach is to take the trouble to provide performance guarantees for every operation. Various methods...require that we maintain some strutural information."
[Algorithms in C++, Robert Sedgewick]
Several strategies and methods for manually optimizing source code are:
invariants |
code specialization |
compiler optimization |
code obsfuscation |
memoization |
promotion & accumulation |
maintaining and strengthening loop invariants |
principled strength reduction |
partial evaluation |
finite differencing |
data structure selection |
JIT compilation |
lazy evaluation |
transformational programming |
Invariant: A rule, such as the ordering of an ordered list or heap, that applies throughout the life of a data structure or procedure. Each change to the data structure must maintain the correctness of the invariant. [Dictionary.com]
Invariant: An invariable quantity; specifically, a function of the coefficients of one or more forms, which remains unaltered, when these undergo suitable linear transformations. --J. J. Sylvester. [Dictionary.com]
Transformational programming is the process of refactoring code to improve the performance of algorithms by applying a knowledge of general programming principles, the underlying architecture, or improvements in the algorithm. [Data Structures in Java, Thomas A. Standish]
Source | Link |
Summary | |
Citeseer | http://citeseer.nj.nec.com/ferreira98multiple.html |
" Program specialization is normally supported by global analysis of the program. Compilers use the information deduced to generate more efficient, specialized implementations of the program. This specialization can be single or multiple, depending if each procedure of the program is specialized into one or more versions. We present a Prolog compiler that does multiple specialization, using an algorithm that works over the WAM code, deducing the reachable procedure activations based on local... " | |
Citeseer | http://citeseer.nj.nec.com/72770.html |
"Specializing programs with respect to run-time values
is an optimization strategy that has been shown to drastically improve code
performance on realistic programs ranging from operating systems to graphics.
Recently, various approaches to specializing code at run-time have been
proposed. However, these approaches still suffer from shortcomings that
limit their applicability: they are either manual, require programs to be
written in a dedicated language, or are too expensive to be widely... "
François Noël and Luke Hornof and et al., Automatic, Template-Based Run-Time Specialization: Implementation and Experimental Study |
|
Citeseer | http://www.cs.arizona.edu/people/saw/papers/valuespec.ps |
Code Specialization Based on Value Profiles (valuespec.ps) | |
Citeseer | http://citeseer.nj.nec.com/41254.html |
"Conventional operating system code is written to deal with all possible system states, and performs considerable interpretation to determine the current system state before taking action. A consequence of this approach is that kernel calls which perform little actual work take a long time to execute. To address this problem, we use specialized operating system code that reduces interpretation for common cases, but still behaves correctly in the fully general case. We describe how specialized... " misc{ cowan-calton, author = "Crispin Cowan", title = "Calton Pu, Tito Autrey, Andrew Black, Charles Consel", url = "" } | |
ACM DL | Resource-bounded partial evaluation |
Saumya Debray ACM SIGPLAN Notices , Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation December 1997 Volume 32 Issue 12 |
|
Citeseer | http://citeseer.nj.nec.com/503504.html |
This paper presents a novel technique, called input space adaptive software synthesis, for the energy and performance optimization of embedded software. The proposed technique is based on the fact that the computational complexities of programs or subprograms are often highly dependent on the values assumed by input and intermediate program variables during execution. This observation is exploited in the proposed software synthesis technique by augmenting the program with optimized versions of... | |
Citeseer | http://citeseer.nj.nec.com/gupta02profile.html |
INTRODUCTION Over the past several decades numerous compile-time optimizations have been developed to speed up the execution of programs. Application of a typical optimization can be viewed as consisting of two primary tasks: uncovering optimization opportunities through static analysis of the program; and transforming the program to exploit the uncovered opportunities. Most of the early work on classical code optimizations is based upon a very simple performance model. Optimizations are de... | |
Source | Link |
Summary | |
Web search |
Link time code optimization: http://www.cs.arizona.edu/alto/ |
Web search |
http://www.windriver.com/products/html/optimization_wp_diab07312002.html |
WindRiver: Advanced Compiler Optimization Techniques white paper | |
Citeseer, ACM DL | http://citeseer.nj.nec.com/calder97value.html |
"Identifying variables as invariant or constant at compile-time allows the compiler to perform optimizations including constant folding, code specialization, and partial evaluation. Some variables, which cannot be labeled as constants, may exhibit semi-invariant behavior. A semiinvariant variable is one that cannot be identified as a constant at compile-time, but has a high degree of invariant behavior at run-time. If run-time information was available to identify these variables as..." B. Calder, P. Feller, and A. Eustace, "Value Profiling", Proc. MICRO-30, Dec. 1997. | |
Citeseer | http://citeseer.nj.nec.com/budimlic98static.html |
"Interprocedural optimizations are important in Java because the object-oriented programming style encourages the use of many small methods. Unfortunately, such optimizations are difficult because of the nature of language structure and its security restrictions. A particular problem is the difficulty of knowing the entire program at any time prior to execution. This paper presents new approaches to cloning and inlining that can be profitably used even in a single class. These optimizations ..." Z. Budimlic and K. Kennedy. Static Interprocedural Optimizations in Java. Rice University technical report CRPC-TR98746, 1998. | |
Citeseer | http://citeseer.nj.nec.com/mezini96dynamic.html |
"The definition of a class library is an iterative process involving both the designer who provides basic functionality, and the users who subsequently specialize it. In order to ensure the coherence of possible modifications, suitable means for coordinating the activity of both are required. In this paper, we propose a metaclass based approach for bridging the gap between library designers and specializers, by enabling the designer to express specialization constraints which are automatically..." Mezini M. Dynamic metaclass construction for an explicit specialization interface. In Proceedings of the Reflection '96 Conference, pp. 203--219, 1996. | |
ACM DL | Compiler-directed dynamic computation reuse: rationale and initial results |
Daniel A. Connors , Wen-mei W. Hwu Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture November 1999 Recent studies on value locality reveal that many instructions are frequently executed with a small variety of inputs. This paper proposes an approach that integrates architecture and compiler techniques to exploit value locality for large regions of code. The approach strives to eliminate redundant processor execution created by both instruction-level input repetition and recurrence of input data within high-level computations. In this approach, the compiler performs analysis to id ... | |
ACM DL | Partial evaluation applied to numerical computation |
Andrew Berlin Proceedings of the 1990 ACM conference on LISP and functional programming May 1990 There have been many demonstrations that the expressive power of Lisp can greatly simplify the process of writing numerical programs, but at the cost of reduced performance.[10][16] I show that by coupling Lisp's abstract, expressive style of programming with a compiler that uses partial evaluation, data abstractions can be eliminated at compile time, producing extremely high-performance code. For an important class of numerical programs, partial evaluation achieves order-of-magnitude speed ... | |
ACM DL | Profile-directed optimization of event-based programs |
Mohan Rajagopalan , Saumya K. Debray , Matti A. Hiltunen , Richard D. Schlichting ACM SIGPLAN Notices , Proceeding of the ACM SIGPLAN 2002 Conference on Programming language design and implementation June 2002 Volume 37 Issue 5 Events are used as a fundamental abstraction in programs ranging from graphical user interfaces (GUIs) to systems for building customized network protocols. While providing a flexible structuring and execution paradigm, events have the potentially serious drawback of extra execution overhead due to the indirection between modules that raise events and those that handle them. This paper describes an approach to addressing this issue using static optimization techniques. This approach, which explo ... | |
ACM DL | Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching |
Youfeng Wu ACM SIGPLAN Notices , Proceeding of the ACM SIGPLAN 2002 Conference on Programming language design and implementation June 2002 Volume 37 Issue 5 Irregular data references are difficult to prefetch, as the future memory address of a load instruction is hard to anticipate by a compiler. However, recent studies as well as our experience indicate that some important load instructions in irregular programs contain stride access patterns. Although the load instructions with stride patterns are difficult to identify with static compiler techniques, we developed an efficient profiling method to discover these load instructions. The new profiling ... |
Source | Link |
Summary | |
Citeseer citation | R. S. Bird. Addendum: The promotion and accumulation strategies in transformational programming. ACM Trans. Program. Lang. Syst., 7(3):490{ 492, July 1985. |
Apparently a well-referenced paper | |
Citeseer | http://citeseer.nj.nec.com/liu97principled.html |
Y. A. Liu. Principled strength reduction. In R. Bird and L. Meertens, editors, Algorithmic Languages and Calculi, pages 357--381. Chapman & Hall, London, U.K., 1997. | |
Citeseer | |
Robert Paige's paper is a short three page review of three techniques in program transformations: partial evaluation (replacing invariants), finite differencing, and data structure selection. |
|
Citeseer | http://www.cs.cornell.edu/Info/People/tt/research/incremental-computation/derivation.html |
" In general, incremental computation aims to solve a problem on a sequence of inputs that differ only slightly from one another, making use of the previously computed output in computing a new output, instead of computing the new output from scratch. Incremental computation is a fundamental issue relevant throughout computer software, e.g., optimizing compilers, transformational program development, and interactive systems." |
|
Citeseer | http://citeseer.nj.nec.com/liu99dynamic.html |
" Dynamic programming is an important algorithm design technique.
It is used for problems whose solutions involve recursively solving subproblems
that share subsubproblems. While a straightforward recursive program solves
common subsubproblems repeatedly, a dynamic programming algorithm solves
every subsubproblem just once, saves the result, and reuses it when the
subsubproblem is encountered again. This can reduce the time complexity
from exponential to polynomial." |
|
Source | Link |
Summary | |
Web search | http://www.darpa.mil/ipto/psum1999/J097-0.html |
Providing "exokernals" to improve server performance, and "remove barriers to specialization." They also claim that dynamic generation of code is essential to their system. | |
Web search | http://research.microsoft.com/sn/ http://research.microsoft.com/~jvh/MMLite-sigops.pdf http://research.microsoft.com/sn/Herald/ |
Microsoft Research: (new framework design, distributed peer networks) | |
Citeseer | http://citeseer.nj.nec.com/cowan98adaptation.html |
"Some failures cannot be masked by redundancies, because
an unanticipated situation occurred, because fault-tolerance measures were
not adequate, or because there was a security breach (which is not amenable
to replication) . Applications that wish to continue to offer some service
despite nonmaskable failure must adapt to the loss of resources. When numerous
combinations of non-maskable failure modes are considered, the set of possible
adaptations becomes complex. This paper presents..." Cowan, C., L. Delcambre, A. Le Meur, L. Liu, D. Maier, D. McNamee, M. Miller, C. Pu, P. Wagle, and J. Walpole. "Adaptation Space: Surviving Non-Maskable Failures," Technical Report 98-013, Department of Computer Science and Engineering, Oregon Graduate Institute of Science and Technology, May 1998. |
|
ACM DL, Citeseer, and Web search | http://www.prolangs.rutgers.edu/f01papers/chung-islped01.pdf |
Automatic Source Code Specialization for Energy Reduction. I thought it would be useful in terms of saving power in embedded devices. | |
Citeseer, Web search | http://www.bell-labs.com/~avi/publication-dir/usenix99-workshop.ps |
Pebble: Component based OS for embedded systems | |
Citeseer | http://citeseer.nj.nec.com/bettini02coordinating.html |
"Standard class-based inheritance mechanisms, which are often used to implement distributed systems, do not seem to scale well to a distributed context with mobility. In this paper, a mixin-based approach is proposed for structuring mobile object-oriented code and it is shown to t in the dynamic and open nature of a mobile code scenario.." bettini02coordinating, author = "Lorenzo Bettini and Viviana Bono and Betti Venneri", title = "Coordinating Mobile Object-Oriented Code", booktitle = "Coordination Models and Languages", pages = "56-71", year = "2002" | |
ACM DL | Generating efficient protocol code from an abstract specification IEEE/ACM Transactions on Networking (TON) Volume 5 , Issue 4 (August 1997) |
Claude Castelluccia , Walid Dabbous , Sean O'Malley IEEE/ACM Transactions on Networking (TON) August 1997 Volume 5 Issue 4 | |
ACM DL | Processors and accelerators for embedded applications: Unlocking the design secrets of a 2.29 Gb/s Rijndael processor |
Patrick R. Schaumont , Henry Kuo , Ingrid M. Verbauwhede Proceedings of the 39th conference on Design automation June 2002 This contribution describes the design and performance testing of an Advanced Encryption Standard (AES) compliant encryption chip that delivers 2.29 GB/s of encryption throughput at 56 mw of power consumption. We discuss how the high level reference specification in C is translated into a parallel architecture. Design decisions are motivated from a system level viewpoint. The prototyping setup is discussed. | |
Source | Link |
Summary | |
Web search | http://www.csrd.uiuc.edu/ics99/tut1p.html |
Web search | http://www.graco.c.u-tokyo.ac.jp/~masuhara/BCS/ |
Bytecode specialization | |
Sun-Java website |
http://java.sun.com/docs/jit_interface.html http://wwws.sun.com/software/solaris/jit/ |
The Just In Time bytecode to machine language compiler | |
ACM DL | A dynamic optimization framework for a Java just-in-time compiler |
Toshio Suganuma , Toshiaki Yasue , Motohiro Kawahito , Hideaki Komatsu , Toshio Nakatani ACM SIGPLAN Notices , Proceedings of the OOPSLA '01 conference on Object Oriented Programming Systems Languages and Applications October 2001 Volume 36 Issue 11 The high performance implementation of Java Virtual Machines (JVM) and just-in-time (JIT) compilers is directed toward adaptive compilation optimizations on the basis of online runtime profile information. This paper describes the design and implementation of a dynamic optimization framework in a production-level Java JIT compiler. Our approach is to employ a mixed mode interpreter and a three level optimizing compiler, supporting quick, full, and special optimization, each of which has a differ ... |
Source | Link |
Summary | |
Web search | http://perl.plover.com/MiniMemoize/memoize.html |
Short summary of the memoization technique, as applied to Perl. | |
ACM DL | Programming Languages for Distributed Computing Systems |
ACM Computing Surveys (CSUR) Volume 21 , Issue 3 (September 1989) Henri E. Bal Vrije Univ., Amsterdam, The Netherlands |
|
Citeseer | http://www.cs.wisc.edu/~zilles/papers/profiler.hpca.ps |
A programmable co-processor for profiling | |
ACM Crossroads | http://www.acm.org/crossroads/xrds2-3/dependency.html |
Short Summary Article on Design and Dependency Diagrams | |
ACM Crossroads | http://www.acm.org/crossroads/xrds4-3/codeob.html |
Discusses various techniques in code obsfuscation that are related to code specialization, but applied to creating, rather than reducing code. |
http://www.cs.ucsd.edu/users/calder/abstracts/UCSD-CS1998-581.html
"Code specialization is a type of compiler optimization, which selectively executes a different version of the code, conditioned on the value of a variable. Given an invariant variable and its value, the original code is duplicated. There will be one general version of the code, and a special version of the code. The specialized version of the code will be conditioned on the invariant variable. A selection mechanism based on the invariant variable will choose which code to execute."
http://www.prolangs.rutgers.edu/f01papers/chung-islped01.pdf
My search found other interesting papers that I don't include here because they are a bit off-topic. They might eventually be useful to read, and I'll compile a list of my own on them.
I will read the papers mentioned in the introduction, as well as weeding through the papers in this list for those relevant to the specifc application of code specialization and generatation for a network of embedded systems, in hopes of finding related ideas in the "TC: Open and Survivable Embedded Systems" paper. In addition, I have comments and questions regarding the aforementioned paper. These will be detailed in a separate document.