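There is a note (p. 3-273) indicating that, with the Pentium Pro and later processors, a locked operation does not always have to assert the LOCK# signal on the bus. Instead, if the memory location is cached as write-back memory and lies entirely within one cache line, the processor can lock that line in its own cache and rely on the cache coherency mechanism to ensure that the operation is atomic.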
"There are paired instructions, Load Linked and Store Conditional, that can be used to perform atomic read-modify-write of word and doubleword cached memory locations. These instructions are used in carefully coded sequences to provide one of several synchronization primitives, including test-and-set, bit-level locks, semaphores, and sequencers/event counts."
E.g., on MIPS: LL, load linked word; SC, store conditional word; LLD, load linked doubleword; SCD, store conditional doubleword.