Synchronization issue on multi-processors

    Technology  2022-05-19

     

    To start, here is a repost of a piece I recently wrote about synchronization problems on multiple CPUs.

    -    Applications designed for a single processor may crash on multi-processors

    Applications originally designed for a single processor may crash when they run on multi-processors. This is typically because threads then truly execute in parallel, while each processor may also reorder its work through Out-of-Order execution.

     

    You can find a brief introduction on Wikipedia: “In computer engineering, out-of-order execution (OoOE or OOE) is a paradigm used in most high-performance microprocessors to make use of instruction cycles that would otherwise be wasted by a certain type of costly delay. In this paradigm, a processor executes instructions in an order governed by the availability of input data, rather than by their original order in a program. In doing so, the processor can avoid being idle while data is retrieved for the next instruction in a program, processing instead the next instructions which is able to run immediately.” (1) For more information about Out-of-Order execution, please see the reference link.

    First of all, let's look at In-Order execution:

    1. Fetch the instruction

    2. Wait for the operands

    3. Dispatch to a functional unit (FU)

    4. The FU does its work

    5. The FU writes the value back to the register file

    In In-Order execution, the processor executes instructions in the original order of the program. There is no optimization, and the processor stalls at step 2 whenever the operands are not yet available.

    Out-of-Order execution avoids processor stalls caused by missing operands. It executes instructions in the order in which their data becomes available instead of the original program order:

    1. Instruction fetch

    2. Instruction dispatch to an instruction queue

    3. The instruction waits in the queue until its input operands are available

    4. If the operands of an instruction in the queue are available, the instruction may be issued to the functional unit ahead of older instructions that are still waiting

    5. The results of the instruction are queued in a data queue

    6. A result is written back from the data queue to the register file only after all older instructions have written back their results. This is called the graduation or retire stage.

     

    Out-of-Order execution is an optimization in the processor architecture. It uses time the processor would otherwise spend idle to execute later instructions. But results are still written back to the register file in program order, so it appears "as if" the program still ran in its original order.

     

    Consider the following code:

    mov edx, dword ptr[p]

    add eax, ecx

    Because there is no data dependency between the two instructions, the code may be re-ordered as follows:

    add eax, ecx

    mov edx, dword ptr[p]

     

    Note: the result of the add is still written back to the register file after the load, so the register state is updated in program order.
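
    The mov/add example can be replayed with a toy model of the issue and retire steps listed earlier. This is only a sketch under simplifying assumptions: one functional unit, single-cycle execution, and a hypothetical ready_cycle field that says when an instruction's operands arrive. The names Instr, Trace, simulate and mov_add_example are made up for illustration and do not describe any real processor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: one functional unit, single-cycle execution.
// `ready_cycle` is a hypothetical field: the cycle at which operands arrive.
struct Instr {
    int ready_cycle;
};

struct Trace {
    std::vector<int> issue_order;   // order in which instructions execute
    std::vector<int> retire_order;  // order in which results reach the register file
};

Trace simulate(const std::vector<Instr>& program) {
    const std::size_t n = program.size();
    std::vector<bool> issued(n, false);
    Trace trace;
    std::size_t next_retire = 0;  // oldest instruction not yet retired

    for (int cycle = 0; next_retire < n; ++cycle) {
        // Issue the oldest waiting instruction whose operands are available.
        for (std::size_t i = 0; i < n; ++i) {
            if (!issued[i] && program[i].ready_cycle <= cycle) {
                issued[i] = true;  // its result now waits in the data queue
                trace.issue_order.push_back(static_cast<int>(i));
                break;
            }
        }
        // Retire strictly in program order.
        while (next_retire < n && issued[next_retire]) {
            trace.retire_order.push_back(static_cast<int>(next_retire));
            ++next_retire;
        }
    }
    assert(trace.retire_order.size() == n);  // every instruction retired once
    return trace;
}

// Instruction 0 models "mov edx, dword ptr[p]" (operand arrives at cycle 3),
// instruction 1 models "add eax, ecx" (operands ready at cycle 0).
Trace mov_add_example() {
    return simulate({Instr{3}, Instr{0}});
}
```

    In this model the add is issued first, but the load retires first, which is the "as if" behaviour: execution order changes, register write-back order does not.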

    Problem on multi-processor platforms

    An application may produce unexpected results for the following reasons:

    1. Two threads can run on two processors simultaneously, independent of their priorities

    2. Memory access instructions are re-ordered to increase performance

    3. An operation that is effectively atomic on a single processor, such as an unlocked increment of a memory location, is no longer atomic when another processor can access the same location at the same time

     

    If two threads access the same memory location simultaneously, the application may run into problems. For instance, if thread#1 on processor#1 increments the value pointed to by p while thread#2 on processor#2 decrements it by one at exactly the same time, the result written back to memory is unpredictable.
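
    To see why the result is unpredictable, it helps to split each update into its load, modify and store steps. The sketch below replays one bad interleaving deterministically, without real threads; replay_lost_update and its parameters are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Deterministic replay of one bad interleaving; no real threads are involved.
// Each "thread" does load, modify, store on its own private register copy.
long replay_lost_update(long initial, bool decrement_stores_last) {
    long memory = initial;  // the value pointed to by p

    long reg1 = memory;     // thread#1 loads
    long reg2 = memory;     // thread#2 loads the same old value
    reg1 += 1;              // thread#1 increments its copy
    reg2 -= 1;              // thread#2 decrements its copy

    if (decrement_stores_last) {
        memory = reg1;      // thread#1 stores initial + 1
        memory = reg2;      // thread#2 overwrites it with initial - 1
    } else {
        memory = reg2;      // thread#2 stores initial - 1
        memory = reg1;      // thread#1 overwrites it with initial + 1
    }
    return memory;          // one of the two updates has been lost
}

void demo_lost_update() {
    // One increment plus one decrement should leave 10 unchanged,
    // but whichever store lands last wins.
    assert(replay_lost_update(10, true) == 9);
    assert(replay_lost_update(10, false) == 11);
}
```

    Neither outcome equals the expected value of 10, and which one you get depends only on timing.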

     

    Solution

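    Using a memory barrier (2) or interlocked variable access (3) can solve the problem. A memory barrier constrains the order in which memory operations become visible to other processors, and the interlocked functions perform the read-modify-write as one atomic operation.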
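
    Both techniques can be sketched in portable C++ with <atomic>. Note the substitution: this is not the Win32 API from references (2) and (3). std::atomic::fetch_add/fetch_sub stand in for InterlockedIncrement/InterlockedDecrement, and std::atomic_thread_fence stands in for a memory barrier (the release/acquire fences used here are weaker than a full barrier). The function names run_counter and publish_and_consume are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Interlocked-style access: every update is one atomic read-modify-write,
// so no increment or decrement can be lost.
long run_counter(int iterations) {
    std::atomic<long> value{0};
    std::thread inc([&] {
        for (int i = 0; i < iterations; ++i) value.fetch_add(1);
    });
    std::thread dec([&] {
        for (int i = 0; i < iterations; ++i) value.fetch_sub(1);
    });
    inc.join();
    dec.join();
    return value.load();
}

// Barrier-style access: the fences keep the payload write ordered
// before the flag write, and the flag read before the payload read.
int publish_and_consume() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that announces the data

    std::thread producer([&] {
        payload = 42;
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    while (!ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();   // wait until the flag is set
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    int seen = payload;              // safe: ordered after the flag read

    producer.join();
    assert(seen == 42);
    return seen;
}
```

    With the atomic counter, equal numbers of increments and decrements always cancel out, regardless of how the two threads interleave.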

     

    A typical case in ATL/COM is a class derived from CComObjectRootEx<CComSingleThreadModel>. CComSingleThreadModel increments and decrements the COM object's reference counter with a plain "++/--". If objects of your class are used from more than one thread, concurrent updates to the reference counter can be lost on a multi-processor machine, and your application may crash. See the following case:

    The fix for this crash is to switch your class to the multi-threaded model, i.e., to use the template argument CComMultiThreadModel instead of CComSingleThreadModel. CComMultiThreadModel uses the InterlockedIncrement/InterlockedDecrement functions on the reference counter.
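
    The idea of choosing the counting strategy through a template argument can be sketched without ATL. The code below is not ATL source: SingleThreadPolicy, MultiThreadPolicy and RefCounted are hypothetical names, and std::atomic is used as a portable stand-in for the Interlocked functions.

```cpp
#include <atomic>
#include <cassert>

// Plain ++/--: cheap, but only correct when a single thread touches the object.
struct SingleThreadPolicy {
    using Counter = long;
    static long increment(Counter& c) { return ++c; }
    static long decrement(Counter& c) { return --c; }
};

// Atomic read-modify-write: safe when several threads share the object.
struct MultiThreadPolicy {
    using Counter = std::atomic<long>;
    static long increment(Counter& c) { return c.fetch_add(1) + 1; }
    static long decrement(Counter& c) { return c.fetch_sub(1) - 1; }
};

// The threading policy is a template argument, so switching models
// is a one-word change at the point where the class is declared.
template <class ThreadPolicy>
class RefCounted {
public:
    long add_ref() { return ThreadPolicy::increment(count_); }
    long release() { return ThreadPolicy::decrement(count_); }  // 0 means: destroy

private:
    typename ThreadPolicy::Counter count_{0};
};

void demo_policies() {
    RefCounted<MultiThreadPolicy> shared;  // intended for use across threads
    assert(shared.add_ref() == 1);
    assert(shared.add_ref() == 2);
    assert(shared.release() == 1);
    assert(shared.release() == 0);
}
```

    Both policies give the same results on one thread; they differ only in whether concurrent calls can lose an update.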

     

    Starting with ATL 9.0 for Windows CE, debug builds validate the thread ID for COM objects that use the single-threaded model. If you get a debug assertion in CComObjectRootEx<CComSingleThreadModel>::InternalAddRef or CComObjectRootEx<CComSingleThreadModel>::InternalRelease, you might want to check your class's threading model. Unfortunately, Microsoft performs this validation only on the CE platform, but it can be helpful to copy the check into your platform's ATL.
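
    A thread ID check of this kind might look roughly like the sketch below. It is an assumption about the general technique, not the actual ATL for Windows CE code: the object remembers the thread that created it, and every reference-count call asserts that it runs on that same thread. SingleThreadGuard and GuardedRefCount are hypothetical names.

```cpp
#include <cassert>
#include <thread>

// Remembers an owning thread and reports whether the caller is that thread.
class SingleThreadGuard {
public:
    SingleThreadGuard() : owner_(std::this_thread::get_id()) {}
    explicit SingleThreadGuard(std::thread::id owner) : owner_(owner) {}

    bool called_on_owner() const {
        return std::this_thread::get_id() == owner_;
    }

private:
    std::thread::id owner_;
};

// Single-threaded reference counter that validates the caller.
// The asserts fire in debug builds and compile away when NDEBUG is defined.
class GuardedRefCount {
public:
    long add_ref() {
        assert(guard_.called_on_owner());
        return ++count_;
    }
    long release() {
        assert(guard_.called_on_owner());
        return --count_;
    }

private:
    SingleThreadGuard guard_;  // captures the constructing thread
    long count_ = 0;
};
```

    A check like this does not make the class thread safe; it only turns a silent cross-thread misuse into an immediate debug assertion.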

    (1). http://en.wikipedia.org/wiki/Out-of-order_execution

    (2). http://msdn.microsoft.com/en-us/library/ms684208(v=vs.85).aspx

    (3). http://msdn.microsoft.com/en-us/library/ms684122(v=vs.85).aspx

     

