The @ref srs_qk "preemptive, non-blocking QK kernel" is specifically designed to execute non-blocking active objects. QK runs active objects in the same way as a prioritized interrupt controller (such as the NVIC in ARM Cortex-M) runs interrupts, using a **single stack** (the MSP on Cortex-M). This section explains how the @ref srs_qk "preemptive, non-blocking QK kernel" works on ARM Cortex-M.

@remarks
In a QK port, the only components requiring platform-specific porting are QF and QK itself. The other two components, QEP and QS, require merely recompilation and will not be discussed here. With the QK port you're not using the QV or QXK kernels. The QK port to ARM Cortex-M is located in the folder /ports/arm-cm/qk/.


@section arm-cm_qk-synopsis Synopsis of the QK Port on ARM Cortex-M
The ARM Cortex-M architecture is designed primarily for the traditional real-time kernels that use multiple per-thread stacks. Therefore, implementing a non-blocking, single-stack kernel like QK is a bit more involved on Cortex-M than on other CPUs and works as follows:

- The ARM Cortex-M processor executes the QK application code (active objects) in the Privileged Thread mode, which is exactly the mode entered out of reset. The exceptions (including all interrupts) are always processed in the Privileged Handler mode.

- QK uses only the Main Stack Pointer (QK is a single-stack kernel). The Process Stack Pointer is not used and is not initialized.

- ARM Cortex-M enters interrupt context without disabling interrupts (without setting the PRIMASK bit or the BASEPRI register). Generally, you should not disable interrupts inside your ISRs. In particular, the QP services QF_PUBLISH(), QF_TICK_X(), and QACTIVE_POST() should be called with interrupts enabled, to avoid nesting of critical sections. (NOTE: If you don't wish an interrupt to be preempted by another interrupt, you can always prioritize that interrupt in the NVIC to a higher level by using a lower numerical value of priority.)

- The QK port uses the PendSV exception (number 14) to perform asynchronous preemption (see Chapter 10 in @webref{psicc2, PSiCC2}). The startup code must initialize the Interrupt Vector Table with the address of the @c PendSV_Handler() exception handler.

@note
QK uses only the CMSIS-compliant exception and interrupt names, such as `PendSV_Handler`

- The QK port uses the NMI exception (number 2) or any unused IRQ interrupt to *return* to the preempted thread (see Chapter 10 in @webref{psicc2, PSiCC2}). The startup code must initialize the Interrupt Vector Table with the addresses of `NMI_Handler()` and

@verbatim
Hi memory
           (optionally S0-S15, FPSCR), if EXC_RETURN[4]==0
           xPSR
           pc (interrupt return address)
           lr
           r12
           r3
           r2
           r1
           r0
           EXC_RETURN (pushed in step [7] if FPU is present)
old SP --> "aligner"  (pushed in step [7] if FPU is present)
           xPSR == 0x01000000
           PC   == QK_activate_
           lr   == Thread_ret
           r12  don't care
           r3   don't care
           r2   don't care
           r1   don't care
    SP --> r0   don't care
Low memory
@endverbatim

`[21]` The special exception-return value 0xFFFFFFF9 is synthesized in r0 (two instructions are used to make the code compatible with Cortex-M0, which has no barrel shifter).

@note
The r0 register is used instead of lr because the Cortex-M0 instruction set cannot manipulate the high registers (r8-r15).

@note
The exception-return value is consistent with the synthesized stack frame: the lr[4] bit is set to 1, which means that the FPU registers are not included in this stack frame.

`[23]` The PendSV exception returns using the special value 0xFFFFFFF9 held in the r0 register (return to Privileged Thread mode using the Main Stack Pointer). The synthesized stack frame actually causes a call to the `QK_activate_()` function in C.

@note
The return from the PendSV exception just executed switches the ARM Cortex-M core to the Privileged Thread mode. The `QK_activate_()` function internally re-enables interrupts before launching any thread, so the threads always run in the Thread mode with interrupts enabled and can be preempted by interrupts of any priority.

@note
In the presence of the FPU, the exception-return to the QK activator does not change any of the FPU status bits, such as CONTROL.FPCA or LSPACT.

`[24]` The @c Thread_ret function is the place where the QK activator `QK_activate_()` returns to, because this return address is pushed to the stack in step [16]. Please note that the address of the @c Thread_ret label must be a THUMB address.

`[25]` If the FPU is present, the read-modify-write code clears bit 2 of the CONTROL register (CONTROL[2]). This bit, called CONTROL.FPCA (Floating Point Active), would cause generating the FPU-type stack frame, which you want to avoid in this case (because the NMI exception will certainly not use the FPU).

@note
Clearing the CONTROL.FPCA bit occurs with interrupts disabled, so it is protected from a context switch.

`[28-32]` The asynchronous NMI exception is triggered by setting ICSR[31]. The job of this exception is to put the CPU into the exception mode and correctly return to the thread level.

`[33]` This endless loop should not be reached, because the NMI exception should preempt the code immediately after step `[31]`.


@subsection arm-cm_qk_port-asm_nmi NMI_Handler() Implementation

@caption{Listing: NMI_Handler() function in qk_port.c file}
@code{.c}
    __attribute__ ((naked))
[1] void NMI_Handler(void) {
    __asm volatile (
[2] "  ADD     sp,sp,#(8*4)     \n" /* remove one 8-register exception frame */

    #if (__ARM_ARCH == 6)           /* Cortex-M0/M0+/M1 (v6-M, v6S-M)? */
[3] "  CPSIE   i                \n" /* enable interrupts (clear PRIMASK) */
[4] "  BX      lr               \n" /* return to the preempted task */
    #else                           /* M3/M4/M7 */
[5] "  MOV     r0,#0            \n"
[6] "  MSR     BASEPRI,r0       \n" /* enable interrupts (clear BASEPRI) */
    #if (__ARM_FP != 0)             /* if VFP available... */
[7] "  POP     {r0,pc}          \n" /* pop stack aligner and EXC_RETURN to PC */
    #else                           /* no VFP */
[8] "  BX      lr               \n" /* return to the preempted task */
    #endif                          /* no VFP */
    #endif                          /* M3/M4/M7 */
    );
    }
@endcode

`[1]` The @c NMI_Handler is the CMSIS-compliant name of the NMI exception handler. This exception is triggered after returning from the QK activator in step [31] of the previous listing.
The job of the NMI handler is to discard its own stack frame and cause the exception return to the original preempted thread context. The stack contents just after entering NMI are shown below:

@verbatim
Hi memory
           (optionally S0-S15, FPSCR), if EXC_RETURN[4]==0
           xPSR
           pc (interrupt return address)
           lr
           r12
           r3
           r2
           r1
           r0
           EXC_RETURN (pushed in PendSV [7] if FPU is present)
old SP --> "aligner"  (pushed in PendSV [7] if FPU is present)
           xPSR don't care
           PC   don't care
           lr   don't care
           r12  don't care
           r3   don't care
           r2   don't care
           r1   don't care
    SP --> r0   don't care
Low memory
@endverbatim

`[2]` The stack pointer is adjusted to un-stack the 8 registers of the interrupt stack frame corresponding to the NMI exception itself. This moves the stack pointer from "SP" back to the "old SP" in the picture above, which "uncovers" the original exception stack frame left by the PendSV exception.

`[3]` For ARMv6-M, interrupts are enabled by clearing the PRIMASK.

`[4]` For ARMv6-M, the NMI exception returns to the preempted thread using the standard EXC_RETURN, which is in lr.

`[5-6]` For ARMv7-M, interrupts are enabled by writing 0 into the BASEPRI register.

`[7]` If the FPU is used, the "stack aligner" and the EXC_RETURN saved in PendSV step [7] are popped from the stack into r0 and pc, respectively. Loading the EXC_RETURN value into the pc causes the exception return to the preempted thread.

`[8]` Otherwise, NMI returns to the preempted thread using the standard EXC_RETURN, which is in lr.


@anchor qk_stack-detail
@image html qk_stack-detail.png "Detailed stack allocation in QK for ARM Cortex-M"
@image latex qk_stack-detail.png "Detailed stack allocation in QK for ARM Cortex-M"


@section arm-cm_qk-isr Writing ISRs for QK
The ARM Cortex-M CPU is designed to use regular C functions as exception and interrupt service routines (ISRs).

@note
The ARM EABI (Embedded Application Binary Interface) requires the stack to be 8-byte aligned, whereas some compilers guarantee only 4-byte alignment. For that reason, some compilers (e.g., GNU-ARM) provide a way to designate ISR functions as interrupts. For example, the GNU-ARM compiler provides the `__attribute__((__interrupt__))` designation that will guarantee the 8-byte stack alignment.

Typically, ISRs are application-specific (their main purpose is to produce events for active objects). Therefore, ISRs are not part of the generic QP port, but rather part of the BSP (Board Support Package).

The following listing shows an example of the `SysTick_Handler()` ISR (from the DPP example application). This ISR calls the `QF_TICK_X()` macro to perform QF time-event management.

@anchor arm-cm_qk-isr-code
@caption{Listing: An ISR header for QK}
@code{.c}
    void SysTick_Handler(void) __attribute__((__interrupt__));
    void SysTick_Handler(void) {
        ~ ~ ~
[1]     QK_ISR_ENTRY(); /* inform QK about entering an ISR */
        ~ ~ ~
        QF_TICK_X(0U, &l_SysTick_Handler); /* process all armed time events */
        ~ ~ ~
[2]     QK_ISR_EXIT();  /* inform QK about exiting an ISR */
    }
@endcode

`[1]` Every ISR for QK must call `QK_ISR_ENTRY()` before calling any QP API.

`[2]` Every ISR for QK must call `QK_ISR_EXIT()` right before exiting to let the QK kernel schedule an asynchronous preemption, if necessary.

@note
The QK port to ARM Cortex-M complies with the requirement of the ARM-EABI to preserve stack pointer alignment at an **8-byte boundary**. Also, all QP examples for ARM Cortex-M comply with the CMSIS naming convention for all exception handlers and IRQ handlers.


@section arm-cm_qk-fpu Using the FPU in the QK Port (ARMv7M or higher architectures)
If you have the Cortex-M4F CPU and your application uses the hardware FPU, the FPU must be enabled explicitly, because it is turned off out of reset. The CMSIS-compliant way of turning the FPU on looks as follows:

@verbatim
SCB->CPACR |= (0xFU << 20);
@endverbatim

@note
The FPU must be enabled before executing any floating point instruction. An attempt to execute a floating point instruction will fault if the FPU is not enabled.

Depending on whether or not you use the FPU in your ISRs, the QK port allows you to configure the FPU in various ways, as described in the following sub-sections.
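
As a rough illustration of where the `0xFU << 20` mask in the CPACR setting above comes from, the following host-side sketch builds it from the two 2-bit access fields of coprocessors CP10 and CP11, which together make up the FPU. The helper names `cpacr_full_access()` and `cpacr_fpu_mask()` are hypothetical and are not part of QP or CMSIS; the sketch only computes the mask and does not touch any hardware register.

```c
#include <stdint.h>

// Hypothetical helper (not part of QP or CMSIS): returns the CPACR bits
// that grant full access (0x3 in the 2-bit field) to coprocessor 'cp'.
static uint32_t cpacr_full_access(unsigned cp) {
    return (uint32_t)0x3U << (2U * cp);
}

// Hypothetical helper: combined mask for CP10 and CP11, which is the
// same value as the (0xFU << 20) used in the CPACR setting.
static uint32_t cpacr_fpu_mask(void) {
    return cpacr_full_access(10U) | cpacr_full_access(11U);
}
```

On the target, this mask would be OR-ed into `SCB->CPACR` exactly as in the one-line setting shown earlier.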


@subsection arm-cm_qk-fpu_1thread FPU used in ONE thread only and not in any ISR
If you use the FPU in only a single thread (active object) and none of your ISRs use the FPU, you can set up the FPU not to use the automatic state preservation and not to use the lazy stacking feature as follows:

@verbatim
FPU->FPCCR &= ~((1U << FPU_FPCCR_ASPEN_Pos) | (1U << FPU_FPCCR_LSPEN_Pos));
@endverbatim

With this setting, the Cortex-M4F processor handles the ISRs in exactly the same way as Cortex-M0-M3, that is, only the standard interrupt frame with R0-R3,R12,LR,PC,xPSR is used. This scheme is the fastest and incurs no additional CPU cycles to save and restore the FPU registers.

@note
This FPU setting will lead to FPU errors if more than one thread or any of the ISRs start to use the FPU.


@subsection arm-cm_qk-fpu_nthreadd FPU used in more than one thread or in any ISR
If you use the FPU in more than one of the threads (active objects) or in any of your ISRs, you should set up the FPU to use the automatic state preservation and the lazy stacking feature as follows:

@verbatim
FPU->FPCCR |= (1U << FPU_FPCCR_ASPEN_Pos) | (1U << FPU_FPCCR_LSPEN_Pos);
@endverbatim

This is actually the default setting of the hardware FPU and is recommended for the QK port, because it is safer in view of code evolution. Future changes to the application can easily introduce FPU use in multiple active objects, which would be unsafe if the FPU context was not preserved automatically.

@note
As described in the ARM Application Note "Cortex-M4(F) Lazy Stacking and Context Switching" [@ref ARM-AN298], the FPU automatic state saving requires more stack plus additional CPU time to save the FPU registers, but only when the FPU is actually used.


@section arm-cm_qk-idle QK Idle Processing Customization in QK_onIdle()
QK can very easily detect the situation when no events are available, in which case QK calls the `QK_onIdle()` callback.
You can use `QK_onIdle()` to suspend the CPU to save power, if your CPU supports such a power-saving mode. Please note that `QK_onIdle()` is called repeatedly from an endless loop, which is the QK idle-thread. The `QK_onIdle()` callback is called with interrupts **enabled** (which is in contrast to the @ref arm-cm_qv-idle "QV_onIdle() callback" used in the non-preemptive configuration).

The THUMB-2 instruction set used exclusively in ARM Cortex-M provides a special instruction WFI (Wait-for-Interrupt) for stopping the CPU clock, as described in the "ARMv7-M Reference Manual" [ARM 06a]. The following listing shows the `QK_onIdle()` callback that puts ARM Cortex-M into a low-power mode.

@anchor arm-cm_qk_onidle-code
@caption{Listing: QK_onIdle() for ARM Cortex-M}
@code{.c}
[1] void QK_onIdle(void) {
        ~ ~ ~
[2] #if defined NDEBUG
        /* Put the CPU and peripherals to the low-power mode.
        * you might need to customize the clock management for your application,
        * see the datasheet for your particular Cortex-M3 MCU.
        */
[3]     __WFI(); /* Wait-For-Interrupt */
    #endif
    }
@endcode

`[1]` The preemptive QK kernel calls the `QK_onIdle()` callback with interrupts enabled.

`[2]` The sleep mode is used only in the non-debug configuration, because sleep mode stops the CPU clock, which can interfere with debugging.

`[3]` The `WFI` instruction is generated by the CMSIS `__WFI()` intrinsic.


@section arm-cm_qk-testing Testing QK Preemption Scenarios
The `bsp.c` file included in the examples/arm-cm/dpp_ek-tm4c123gxl/qk directory contains special instrumentation (an ISR designed for testing) for convenient testing of @ref arm-cm_qk-arm-cm-fig "various preemption scenarios in QK".

The technique described in this section will allow you to trigger an interrupt at any machine instruction and observe the preemption it causes. The interrupt used for the testing purposes is the GPIOA interrupt (INTID == 0).
The ISR for this interrupt is shown below:

@code{.c}
void GPIOPortA_IRQHandler(void) {
    QK_ISR_ENTRY(); /* inform QK about entering an ISR */
    QACTIVE_POST(AO_Table, Q_NEW(QEvt, MAX_PUB_SIG), /* for testing... */
                 &l_GPIOPortA_IRQHandler);
    QK_ISR_EXIT();  /* inform QK about exiting an ISR */
}
@endcode

`GPIOPortA_IRQHandler()`, like all ISRs in the system, invokes the macros QK_ISR_ENTRY() and QK_ISR_EXIT(), and also posts an event to the Table active object, which has higher priority than any of the Philo active objects.

The figure below shows how to trigger the GPIOA interrupt from the CCS debugger. From the debugger you need to first open the register window and select NVIC registers from the drop-down list (see the bottom-right corner of the figure). You scroll to the NVIC_SW_TRIG register, which denotes the Software Trigger Interrupt Register in the NVIC. This write-only register is useful for software-triggering various interrupts by writing the interrupt number to it. To trigger the GPIOA interrupt you need to write 0x00 to the NVIC_SW_TRIG by clicking on this field, entering the value, and pressing the Enter key.

@image html arm-cm_qk_test-ccs.png "Triggering the GPIOA interrupt from Eclipse debugger"

The general testing strategy is to break into the application at an interesting place for preemption, set breakpoints to verify which path through the code is taken, and trigger the GPIO interrupt. Next, you need to free-run the code (don't use single stepping) so that the NVIC can perform prioritization. You observe the order in which the breakpoints are hit. This procedure will become clearer after a few examples.


@subsection arm-cm_qk-test-isr Interrupt Nesting Test
The first interesting test is verifying the correct tail-chaining to the PendSV exception after the interrupt nesting occurs, as shown in @ref arm-cm_qk-synch-fig "Synchronous Preemption in QK".
To test this scenario, you place a breakpoint inside the `GPIOPortA_IRQHandler()` and also inside the `SysTick_Handler()` ISR. You run the application until the breakpoint in `SysTick_Handler()` is hit. When the breakpoint is hit, you remove the original breakpoint and place another breakpoint at the very next machine instruction (use the Disassembly window) and also another breakpoint on the first instruction of the `PendSV_Handler()` handler. Next you trigger the GPIOA interrupt per the instructions given in the previous section. You hit the Run button.

The pass criteria of this test are as follows:

1. The first breakpoint hit is the one inside the `GPIOPortA_IRQHandler()` function, which means that the GPIO ISR preempted the SysTick ISR.

2. The second breakpoint hit is the one in the `SysTick_Handler()`, which means that the SysTick ISR continues after the GPIOA ISR completes.

3. The last breakpoint hit is the one in the `PendSV_Handler()` exception handler, which means that the PendSV exception is tail-chained only after all interrupts are processed.

You need to remove all breakpoints before proceeding to the next test.


@subsection arm-cm_qk-test-thread Thread Preemption Test
The next interesting test is verifying that threads can preempt each other. You set a breakpoint anywhere in the Philosopher state machine code. You run the application until the breakpoint is hit. After this happens, you remove the original breakpoint and place another breakpoint at the very next machine instruction (use the Disassembly window). You also place a breakpoint inside the `GPIOPortA_IRQHandler()` interrupt handler and on the first instruction of the `PendSV_Handler()` handler. Next you trigger the GPIOA interrupt per the instructions given in the previous section. You hit the Run button.

The pass criteria of this test are as follows:

1. The first breakpoint hit is the one inside the `GPIOPortA_IRQHandler()` function, which means that the GPIO ISR preempted the Philo thread.

2. The second breakpoint hit is the one in the `PendSV_Handler()` exception handler, which means that the PendSV exception is activated before the control returns to the preempted Philosopher thread.

3. After hitting the breakpoint in `PendSV_Handler()`, you single step into `QK_activate_()`. You verify that the activator invokes a state handler from the Table state machine. This proves that the Table thread preempts the Philo thread.

4. After this you free-run the application and verify that the next breakpoint hit is the one inside the Philosopher state machine. This validates that the preempted thread continues executing only after the preempting thread (the Table state machine) completes.


@subsection arm-cm_qk-test-fpu Testing the FPU
In order to test the FPU (ARMv7M or higher architectures), the Board Support Package (BSP) for the Cortex-M4F EK-TM4C123GXL board uses the FPU in the following contexts:

- In the idle loop via the `QK_onIdle()` callback (QP priority 0)

- In the thread level via the `BSP_random()` function called from all five Philo active objects (QP priorities 1-5).

- In the thread level via the `BSP_displayPhiloStat()` function called from the Table active object (QP priority 6)

- In the ISR level via the `SysTick_Handler()` ISR (priority above all threads)

To test the FPU, you could step through the code in the debugger and verify that the expected FPU-type exception stack frame is used and that the FPU registers are saved and restored by the "lazy stacking feature" when the FPU is actually used. Next, you can selectively comment out the FPU code at various levels of priority and verify that the QK context switching works as expected with both types of exception stack frames (with and without the FPU).
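
When inspecting exception stack frames in the debugger, it can help to decode the EXC_RETURN value by hand. The following is a minimal host-side sketch, assuming only what this section states: bit 4 of EXC_RETURN equal to 0 selects the FPU-type frame, and that frame adds 18 words (S0-S15, the FPU status, and the "aligner") to the basic 8-word frame. The names `EXC_RETURN_FTYPE`, `frame_has_fpu()`, and `frame_size_bytes()` are hypothetical and are not part of QP or CMSIS, and the sketch ignores any extra padding word the hardware may add to keep the stack 8-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>

// Bit 4 of EXC_RETURN: 1 = basic 8-word frame, 0 = FPU-type frame.
// (Hypothetical macro name, used only for this illustration.)
#define EXC_RETURN_FTYPE (1UL << 4)

// Hypothetical helper: true if the frame includes the FPU registers.
static bool frame_has_fpu(uint32_t exc_return) {
    return (exc_return & EXC_RETURN_FTYPE) == 0U;
}

// Hypothetical helper: size of the exception stack frame in bytes,
// not counting any alignment padding added by the hardware.
static uint32_t frame_size_bytes(uint32_t exc_return) {
    uint32_t words = 8U;          // r0-r3, r12, lr, pc, xPSR
    if (frame_has_fpu(exc_return)) {
        words += 18U;             // S0-S15, FPU status, "aligner"
    }
    return words * 4U;
}
```

For example, the value 0xFFFFFFF9 used by the QK port has bit 4 set, so it describes the basic 32-byte frame without FPU registers.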


@subsection arm-cm_qk-test-other Other Tests
Other interesting tests that you can perform include changing the priority of the GPIOA interrupt to be lower than the priority of SysTick to verify that the PendSV is still activated only after all interrupts complete.

In yet another test you could post an event to a Philosopher active object rather than the Table active object from the `GPIOPortA_IRQHandler()` function to verify that the QK activator will not preempt the Philosopher thread by itself. Rather, the next event will be queued and the Philosopher thread will process the queued event only after completing the current event processing.

@next{arm-cm_qxk}
*/
/*##########################################################################*/
/*! @page arm-cm_qxk Preemptive "Dual-Mode" QXK Kernel

This section describes how to use QP on ARM Cortex-M with the @ref srs_qxk "preemptive, dual-mode QXK real-time kernel", which combines the lightweight non-blocking @ref srs_qxk_basic "basic threads" of QK with traditional blocking @ref srs_qxk_extended "extended threads" found in conventional RTOS kernels. QXK provides all typical services of a conventional blocking RTOS, such as blocking time-delays, semaphores, mutexes, and message queues.

QXK has been designed specifically for mixing event-driven active objects with traditional blocking code, such as commercial middleware (TCP/IP stacks, UDP stacks, embedded file systems, etc.) or legacy software.

@note
If you are currently using QP on top of a conventional 3rd-party RTOS, consider moving your application to the QXK kernel. QXK is more efficient than running QP on top of a @ref ports_rtos "traditional 3rd-party RTOS" (because non-blocking @ref srs_qxk_basic "basic threads" take far less stack space and CPU cycles for context switch than the much heavier @ref srs_qxk_extended "extended threads").
The biggest advantage of QXK, however, is that it __protects__ the application-level code from inadvertent mixing of blocking calls inside the event-driven active objects. Specifically, QXK "knows" the type of the thread context (extended/basic) and asserts internally if a blocking call (e.g., semaphore-wait or a time-delay) is attempted in a basic thread (active object). This is something that a QP port to a @ref ports_rtos "conventional 3rd-party RTOS" cannot do, because such an RTOS runs all code (including active objects) in the context of heavyweight extended threads.


@section arm-cm_qxk-synopsis Synopsis of the QXK Port on ARM Cortex-M
The preemptive, blocking QXK kernel works on ARM Cortex-M as follows:

- The ARM Cortex-M processor executes application code in the Privileged Thread mode, which is exactly the mode entered out of reset. The exceptions (including all interrupts) are always processed in the Privileged Handler mode.

- QXK uses the Main Stack Pointer (MSP) for @ref srs_qxk_basic "basic threads", interrupts and exceptions (such as the PendSV exception). The MSP is also used for the QXK idle thread (which is a non-blocking basic thread).

- QXK uses the Process Stack Pointer (PSP) for handling @ref srs_qxk_extended "extended threads". Each extended thread must provide a private stack space to be associated with the PSP.

- The QXK port uses the @c PendSV exception (number 14) and the NMI exception (number 2) or an unused IRQ to perform context switches. The application code (your code) must initialize the Interrupt Vector Table with the addresses of the `PendSV_Handler` and `NMI_Handler` exception handlers.

@note
QXK uses only the CMSIS-compliant exception and interrupt names, such as `PendSV_Handler`, `NMI_Handler`, etc.@n

@note
The QXK port specifically does **not** use the SVC exception (Supervisor Call). This makes the QXK ports compatible with various "hypervisors" (such as mbed uVisor or Nordic SoftDevice), which use the SVC exception.

- You need to explicitly **assign priorities of all the interrupts** used in your application, as described in @ref arm-cm_int.

@note
For ARMv7M or higher architectures (M3/M4/M7/M33...), the QXK initialization code (executed from the QF initialization) initializes all interrupt priorities to the safe value maskable with the BASEPRI register. However, this is just a safety precaution not to leave the interrupts kernel-unaware, which they are out of reset. It is highly recommended to set the priorities of all interrupts explicitly in the application-level code.

- It is strongly recommended that you do not assign the lowest NVIC priority (0xFF) to any interrupt in your application, because it is used by the PendSV handler. For example, with 3 bits of priority implemented in the NVIC, this leaves the following 7 priority levels for you (listed from the lowest to the highest urgency): 0xC0, 0xA0, 0x80, 0x60, 0x40, 0x20, and 0x00 (the highest priority).

@note
The prioritization of interrupts, including the PendSV exception, is performed entirely by the NVIC. Because the PendSV has the lowest priority in the system, the NVIC tail-chains to the PendSV exception only after exiting the last nested interrupt.

- ISRs are written as regular C functions, but they need to call QXK_ISR_ENTRY() before using any QF services, and they must call QXK_ISR_EXIT() after using any of the QF services.

- ARM Cortex-M enters interrupt context without disabling interrupts. Generally, you should not disable interrupts inside your ISRs. In particular, the QF services (such as QF_PUBLISH(), QF_TICK_X(), and QACTIVE_POST()) should be called with interrupts enabled, to avoid nesting of critical sections.

@note
If you don't wish an interrupt to be preempted by another interrupt, you can always prioritize that interrupt in the NVIC to a level higher than or equal to that of the other interrupts (use a lower numerical value of priority).

- In compliance with the ARM Architecture Procedure Call Standard (AAPCS), the QXK kernel always preserves the 8-byte alignment of the stack (both MSP and PSP).


@subsection arm-com_qxk_vfp Using the VFP
If you have an ARMv7M or higher architecture CPU and your application is compiled with the VFP present, the QXK kernel will enable the VFP along with the VFP automatic state preservation and lazy stacking features. This will cause the NVIC to automatically use the VFP-exception stack frame (with 18 additional words: the VFP registers S0-S15 plus the VFP status and the stack "aligner"). The QXK context switch will add to this the rest of the VFP registers (S16-S31) on context switches to and from extended threads.

@note
With VFP enabled, any QXK thread (both a basic and an extended thread) will use 136 more bytes of its stack space, regardless of whether the VFP is actually used by this thread. However, due to the "lazy-stacking" hardware feature, only a thread that actually uses the VFP will save and restore the VFP registers on the stack (which will cost some additional CPU cycles to perform a context switch).
*/