HOW TO Develop a Tool for Valgrind
by Michael E. Locasto
-----------------------------------------------------------------------------

VERSION

The latest version of this document can be found at:

   http://www1.cs.columbia.edu/~locasto/software/valgrind-tool-develop-HOWTO.txt

SUMMARY

This document describes how to develop a Valgrind tool. Valgrind includes
documentation on how the core and VEX work and on how to interpret the
output of the various tools, but no good documentation exists that
explains how to create a new Valgrind tool. At the time of this writing,
there are three Valgrind tools that are meant to serve as tutorials or
examples. This document explains how to take one of those tools (Lackey)
as a starting point and adapt it for your own use.

It is a log of my actions, so it should not be considered a complete
tutorial; I do not have an intimate knowledge of Valgrind's inner
workings. In addition, the examples are driven by my project, so make
sure you adapt them (variable names and such) to yours.

OTHER DOCUMENTATION

After I wrote this document, I discovered that the Valgrind developers
had written a similar document. It can be found at:

   http://www.valgrind.org/docs/manual/writing-tools.html

STEPS

0. Set up your environment.

   Make sure you have all the compilers, build tools, etc. needed to
   build Valgrind. See the Valgrind website (valgrind.org) for more
   detail.

   To prepare for the SVN checkout, I executed the following commands
   to create a directory where I can keep SVN sources:

      $ cd
      $ cd data
      $ mkdir svn
      $ cd svn

   This location can be wherever you normally check things out to in
   your file system.

1. Check out the latest Valgrind from Subversion.

   Valgrind uses Subversion for its code repository. Instructions for
   checking out are located here:

      http://valgrind.org/downloads/repository.html

   but it basically amounts to executing this command:

      $ svn co svn://svn.valgrind.org/valgrind/trunk valgrind

   if you have a fairly normal setup.
   See the above URL for more information. Once the checkout completes,
   enter the main Valgrind source code directory:

      $ cd valgrind

2. Create a new subdirectory to hold your tool.

   This step is accomplished by copying an existing tool's directory
   structure to a new directory named for your project. My new tool is
   named 'Lugrind', so I executed these commands:

      $ cp -R lackey/ lugrind
      $ cd lugrind

   Note that the use of the '-grind' suffix is deprecated for new
   tools. I didn't follow this rule, mostly because I didn't want to
   spend time thinking of a new name.

3. Make initial changes.

   First, rename lk_main.c to an appropriate name for your tool. I chose
   'lu_main.c' ('lu' for Lugrind):

      $ mv lk_main.c lu_main.c

   Then edit 'Makefile.am' to change the references to Lackey to the
   name of your project (in this case, Lugrind). Change the document
   name and Makefile in docs/ to match your new project name, and edit
   docs/lu-manual.xml to include information about your tool.

   Add whatever header or license style you are comfortable with to
   lu_main.c (obviously the GPL still applies here), edit the comments,
   etc. Change references to 'lk_' to be appropriate for your tool (in
   this case, 'lu_'). Change the 'lu_pre_clo_init()' function to output
   information specific to your project.

4. Set up the build environment.

   From the top-level Valgrind directory, run autogen.sh, then run the
   ./configure script. I ran:

      $ ./autogen.sh
      $ ./configure --prefix=/home/michael/apps/valgrind3

5. Adjust the build environment to include your new tool.

   Edit the configure.in file to put the name of your tool in the
   AC_OUTPUT list:

      AC_OUTPUT(
         Makefile
         valgrind.spec
         ...
         lackey/Makefile
         lackey/tests/Makefile
         lackey/docs/Makefile
         lugrind/Makefile
         lugrind/tests/Makefile
         lugrind/docs/Makefile
         ...
         none/docs/Makefile
      )

   Edit the top-level Valgrind Makefile.am to include the name of your
   tool's directory in the 'TOOLS' variable:

      TOOLS = memcheck \
              cachegrind \
              massif \
              lackey \
              lugrind \
              none

   I also edited the file Makefile to do the same, although Makefile is
   normally generated from Makefile.am (via Makefile.in). Rather than
   hand-editing the generated files, it is usually simpler to rerun
   ./autogen.sh and ./configure after changing configure.in and the
   Makefile.am files.

   Edit your tool's Makefile.am to set the value of
   'LUGRIND_SOURCES_COMMON' (you should have done this in Step 3):

      LUGRIND_SOURCES_COMMON = \
         lu_main.c \
         lu_stack.c

   Ensure that your tool's Makefile.in and Makefile reflect this value.

6. Compile Valgrind with your new tool included.

      $ make
      $ make install

7. Run your tool on a program.

   To test the newly compiled Valgrind with our new tool, we invoke it,
   passing the name of the tool, on a simple shell sort program:

      [michael@xoren code]$ ~/apps/valgrind3/bin/valgrind --tool=lugrind --fnname=shell_sort ./shellsort
      ==6209== Lugrind, a VG tool for function behavior profiling.
      ==6209== Copyright (C) 2005-2006, and GNU GPL'd, by Michael E. Locasto.
      ==6209== Using LibVEX rev 1471, a library for dynamic binary translation.
      ==6209== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
      ==6209== Using valgrind-3.1.0RC1, a dynamic binary instrumentation framework.
      ==6209== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
      ==6209== For more details, rerun with: -v
      ==6209==
      data[0] = 71
      data[1] = 20
      data[2] = 50
      data[3] = 92
      data[4] = 85
      data[5] = 78
      data[6] = 98
      data[7] = 16
      data[8] = 24
      data[9] = 17
      ---
      data[0] = 16
      data[1] = 17
      data[2] = 20
      data[3] = 24
      data[4] = 50
      data[5] = 71
      data[6] = 78
      data[7] = 85
      data[8] = 92
      data[9] = 98
      ==6209==
      ==6209== Counted 1 calls to shell_sort()
      ==6209==
      ==6209== Jccs:
      ==6209==   total: 24,568
      ==6209==   taken: 9,877 ( 40%)
      ==6209==
      ==6209== Executed:
      ==6209==   BBs entered: 26,402
      ==6209==   BBs completed: 16,525
      ==6209==   guest instrs: 169,087
      ==6209==   IRStmts: 910,676
      ==6209==
      ==6209== Ratios:
      ==6209==   guest instrs : BB entered = 64 : 10
      ==6209==   IRStmts : BB entered = 344 : 10
      ==6209==   IRStmts : guest instr = 53 : 10
      ==6209==
      ==6209== Exit code: 0
      [michael@xoren code]$

   The 'data[i]' lines are the output of the shellsort program, which
   calls the shell_sort() routine once, as Lugrind (really Lackey's
   machinery) correctly reports.

   You can do this with any available program. Try a simple one like
   'true', 'ls', or 'cat somefile'.

8. Hack on your tool.

   In my case, I wanted a tool that would report a list of all
   functions called during execution, with each function followed by a
   list of its return values, one for each invocation.

   This is where the waters get murky. I get the feeling that you need
   to know what sort of events the Valgrind core exports so that your
   tool can register to listen for them, and this sort of information is
   not well documented, at least as far as I could see initially. The
   source has the best information. The two best places to look are the
   example tools and the "public" header files, especially:

      * 'pub_tool_basics.h'
      * 'pub_tool_tooliface.h'

   ---

   The first thing I did was to create a dynamic list of functions and
   monitor each function call. I looked up the hash table available in
   include/pub_tool_hashtable.h and added one to lu_main.c. I created a
   type to store a function name and how many times it appears.
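   The record type described above, extended with the per-call return
   values the tool is meant to report, can be sketched in plain C as
   follows. This is a minimal stand-alone sketch, not the code in
   lu_main.c: it uses the standard C library so that it compiles and can
   be tested outside Valgrind, whereas a real tool would use the
   pub_tool_XXX replacements (for example VG_(malloc) and the table in
   pub_tool_hashtable.h), because Valgrind tools are not linked against
   libc. The names FnRecord, fn_lookup_or_add, fn_record_call and
   fn_record_return are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-function record (not Valgrind's API): the function
   name, how many calls were seen, and the value returned by each call. */
typedef struct FnRecord {
    char *name;
    unsigned long calls;
    long *retvals;           /* growable array, one entry per return */
    size_t n_retvals;
    size_t cap_retvals;
    struct FnRecord *next;   /* next record in the same hash bucket */
} FnRecord;

#define FN_TABLE_SIZE 101    /* arbitrary bucket count for the sketch */

static FnRecord *fn_table[FN_TABLE_SIZE];

/* djb2-style string hash, reduced to a bucket index. */
static size_t fn_hash(const char *s) {
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return (size_t)(h % FN_TABLE_SIZE);
}

/* Return the record for NAME, creating an empty one on first sight.
   The name is copied because lookups such as VG_(get_fnname_if_entry)
   write into a caller-supplied buffer that is usually reused. */
static FnRecord *fn_lookup_or_add(const char *name) {
    assert(name != NULL);
    size_t b = fn_hash(name);
    for (FnRecord *r = fn_table[b]; r != NULL; r = r->next)
        if (strcmp(r->name, name) == 0)
            return r;

    FnRecord *r = calloc(1, sizeof *r);
    if (r == NULL)
        abort();
    r->name = malloc(strlen(name) + 1);
    if (r->name == NULL)
        abort();
    strcpy(r->name, name);
    r->next = fn_table[b];   /* push onto the bucket's chain */
    fn_table[b] = r;
    return r;
}

/* Record one call to NAME (e.g. when its entry address is reached). */
static void fn_record_call(const char *name) {
    fn_lookup_or_add(name)->calls++;
}

/* Record that a call to NAME returned the value RV. */
static void fn_record_return(const char *name, long rv) {
    FnRecord *r = fn_lookup_or_add(name);
    if (r->n_retvals == r->cap_retvals) {
        size_t cap = r->cap_retvals ? 2 * r->cap_retvals : 4;
        long *p = realloc(r->retvals, cap * sizeof *p);
        if (p == NULL)
            abort();
        r->retvals = p;
        r->cap_retvals = cap;
    }
    r->retvals[r->n_retvals++] = rv;
}
```

   Chaining on the function name keeps lookups cheap even though the
   instrumentation has to find a record on every function entry and
   exit.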
   Put initialization code into the lu_pre_clo_init() or
   lu_post_clo_init() functions.

   Valgrind's interface is quite easy to use once you get familiar with
   the pub_tool_XXX interfaces, which basically replace the C library.
   The other thing to note is the VG_() prefixing macro, which makes the
   code hard to read at first but much nicer after a while.

   The key function to override, or at least pay attention to, is the
   'IRBB* lu_instrument()' function. This is where a basic block (BB) is
   instrumented. Things of interest:

      * the IRBB.jumpkind member
      * st->tag, where st is a pointer to an IRStmt retrieved from
        bb_in->stmts[i]

   To recognize that an instruction is the first within a particular
   function, we can use the following VG function from
   pub_tool_debuginfo.h:

      extern Bool VG_(get_fnname_if_entry) ( Addr a, Char* fnname, Int n_fnname );

   The documentation for that function should be improved slightly to
   explain the arguments: fnname is just a caller-supplied buffer that
   receives the function name, and n_fnname is the length of that
   buffer.

   I still need to look into the mkIRExprVec_XXX series of functions,
   defined in VEX/pub/libvex_ir.h. The hard part here is getting the
   call to the helper function to pass the arguments we want to send
   to it.

8.1 Registering Instrumentation Helper Functions

   I still need to look into the unsafeIRDirty_0_N() series of
   functions.

   At first it looked to me as if the 'helper' function is only needed
   when you are interested in the instructions themselves and want to
   pass them as IRExprs, and that you can call other functions directly
   from the instrumentation function. Lackey's code is misleading on
   this point!

   However, there is a discrepancy: the instrumentation in Lackey counts
   calls to work() correctly, but our directly called function only
   counts 6. So it looks like Valgrind first passes over the input,
   translating the instruction stream and inserting the hooks, and only
   then executes the translated code along with the instrumentation.
   This is why a true 'helper' function is called every time the
   instrumented code runs, while our 'direct' function is only called
   during the translation stage.
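   The following toy model, written in plain C without any Valgrind
   API, illustrates that explanation. It assumes, as a simplification,
   that each block is instrumented exactly once when it is first
   translated and that the cached translation is reused afterwards; all
   names (instrument_block, count_helper, run_trace) are hypothetical.
   In a real tool, the per-execution behavior is what you get by adding
   a call to your helper into the instrumented block (for example a
   dirty call built with unsafeIRDirty_0_N()), rather than calling the
   function from lu_instrument() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not Valgrind code: guest code is a trace of basic-block IDs.
   A block is translated (and instrumented) the first time it is reached;
   the cached translation is reused on every later visit. */

#define MAX_BLOCKS 16

static bool translated[MAX_BLOCKS];  /* stands in for the translation cache */
static unsigned long direct_calls;   /* bumped while instrumenting */
static unsigned long helper_calls;   /* bumped while running instrumented code */

/* Stands in for a helper that the instrumented code calls at run time. */
static void count_helper(void) {
    helper_calls++;
}

/* Stands in for lu_instrument(): anything called directly from here
   runs once per translation, not once per execution. */
static void instrument_block(int id) {
    (void)id;
    direct_calls++;
}

/* "Execute" a trace of block IDs. */
static void run_trace(const int *trace, int n) {
    for (int i = 0; i < n; i++) {
        int id = trace[i];
        assert(id >= 0 && id < MAX_BLOCKS);
        if (!translated[id]) {   /* first visit: translate and instrument */
            instrument_block(id);
            translated[id] = true;
        }
        count_helper();          /* the instrumented block runs on every visit */
    }
}
```

   For a trace in which a loop body block runs ten times between an
   entry block and an exit block, the 'direct' counter ends at 3 (one
   per distinct block) while the helper counter reaches 12, which is the
   same kind of undercount described above.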
   coregrind/m_debuginfo/symtab.c: check this out for possible
   additions, such as a get_fnname_if_exit() function.
   pub_tool_debuginfo.h is the header for this.

   Relevant headers:

      pub_core_threadstate.h
      pub_tool_threadstate.h
      pub_tool_machine.h

   The chain is: ThreadId, then the ThreadState struct type, then
   ->arch.vex, which is of type VexGuestArchState (conditionally
   compiled to VexGuestX86State on x86).

8.2 Example Invocation on bigo.c

      $ ~/apps/valgrind3/bin/valgrind --tool=lugrind --fnname=work ./bigo 10 2> tmp.out
      $ cat tmp.out | grep LUG_OUT | grep called | gawk '{print $6 " " $4}' | sort -nr

   The first command saves Valgrind's output (written to stderr) in
   tmp.out; the second keeps only the lines containing both LUG_OUT and
   'called', prints their 6th and 4th fields, and sorts the result
   numerically in descending order.

9. Package and ship your tool for use with other installations of
   Valgrind.

   TBD

10. Announce the tool on valgrind-developers and valgrind-users.
    Solicit feedback.

   TBD

FAQ

0. Why don't you have any FAQs?

   Because my time is taken up by development. I'll write an FAQ list
   when people ask me questions and I have time to answer them.

1. What does...

CONTRIBUTORS

Michael Locasto, original author