In the world of quantitative finance, software errors can trigger the loss
of millions of dollars, and even brief system failures can be costly. At
the same time, a firm's competitive edge is highly dependent on the speed
and adaptability of its software. In this informal talk, we explore the
delicate balance between reliability, flexibility, and performance in
real-time trading systems. In particular, we discuss some of the key
challenges of designing fault-tolerant architectures and employing error
prevention and detection tactics while achieving the highest possible
performance. We also cover some of the techniques used to leverage the
contributions of hundreds of developers, researchers, and systems
engineers while limiting the potential for problems in the production
environment. The talk aims to provide some real-world examples of software
reliability challenges and to show how they are solved in practice.