qpc/doxygen/modules.dox

/*! @defgroup qep QEP
@brief
Hierarchical Event Processor
QEP is a universal, UML-compliant event processor that enables developers to code UML state machines in highly readable ANSI-C, in which every state machine element is mapped to code precisely, unambiguously, and exactly once (traceability). QEP fully supports hierarchical state nesting, which is the fundamental mechanism for reusing behavior across many states instead of repeating the same actions and transitions over and over again.
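As a minimal illustration (the `Blinky` type and the `TIMEOUT_SIG` signal are placeholders, not part of QEP itself), a superstate and a nested substate might be coded as two state-handler functions:
@code{c}
#include "qpc.h"  /* QP/C framework, including the QEP event processor */

enum { TIMEOUT_SIG = Q_USER_SIG };     /* illustrative user signal */

typedef struct { QHsm super; } Blinky; /* illustrative state machine class */

static QState Blinky_on(Blinky * const me, QEvt const * const e);
static QState Blinky_blinking(Blinky * const me, QEvt const * const e);

/* superstate "on": its entry action is coded exactly once */
static QState Blinky_on(Blinky * const me, QEvt const * const e) {
    (void)me;
    switch (e->sig) {
        case Q_ENTRY_SIG: {
            /* e.g., turn the LED on */
            return Q_HANDLED();
        }
    }
    return Q_SUPER(&QHsm_top);         /* "on" nests in the top state */
}

/* substate "blinking": reuses (inherits) all behavior of "on" that it
 * does not explicitly override */
static QState Blinky_blinking(Blinky * const me, QEvt const * const e) {
    (void)me;
    switch (e->sig) {
        case TIMEOUT_SIG: {
            return Q_TRAN(&Blinky_blinking); /* self-transition on timeout */
        }
    }
    return Q_SUPER(&Blinky_on);        /* unhandled events bubble up to "on" */
}
@endcode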
*/
/*##########################################################################*/
/*! @defgroup qf QF
@brief
Active Object (Actor) Framework
QF is a portable, event-driven, real-time framework for execution of active objects (concurrent state machines) specifically designed for real-time embedded (RTE) systems.
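For example, an interrupt might post an immutable event directly into an active object's event queue; the `AO_Blinky` pointer, the `BUTTON_PRESSED_SIG` signal, and the `Button_ISR()` function are illustrative placeholders:
@code{c}
#include "qpc.h"  /* QP/C framework */

enum { BUTTON_PRESSED_SIG = Q_USER_SIG };  /* illustrative signal */

/* opaque pointer to an active object defined elsewhere in the application */
extern QActive * const AO_Blinky;

/* immutable (statically allocated) event -- never recycled by the framework */
static QEvt const buttonPressedEvt = { BUTTON_PRESSED_SIG, 0U, 0U };

void Button_ISR(void) {  /* illustrative interrupt handler */
    /* post the event asynchronously to the AO's queue; the AO processes it
     * later, in its own thread of execution, in run-to-completion fashion */
    QACTIVE_POST(AO_Blinky, &buttonPressedEvt, (void *)0);
}
@endcode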
*/
/*##########################################################################*/
/*! @defgroup qs QS
@brief
Software Tracing Instrumentation
QS is a software tracing system that enables developers to monitor live QP applications with minimal target system resources and without stopping or significantly slowing down the code. QS is an ideal tool for testing, troubleshooting, and optimizing QP applications. QS can even be used to support acceptance testing in product manufacturing. Please see the <a href="https://www.state-machine.com/qtools/qs.html"><b>QS Manual</b></a> inside the <a href="https://www.state-machine.com/qtools" target="_blank" class="extern">QTools collection</a> for more information.
*/
/*##########################################################################*/
/*! @defgroup qv QV
@brief
Cooperative Kernel
@description
QV is a simple **cooperative** kernel (previously called the "Vanilla" kernel). This kernel executes active objects one at a time, with priority-based scheduling performed before the processing of each event. Due to the naturally short duration of event processing in state machines, the simple QV kernel is often adequate for many real-time systems.
@note
The QV scheduler is described in Section 6.3.7 of the book ["Practical UML Statecharts in C/C++, 2nd Ed" (PSiCC2)](https://www.state-machine.com/psicc2/).
@section qv_overview QV Overview
The QV scheduler is engaged after every RTC step of any active object to choose the next active object to execute. The QV scheduler always chooses the highest-priority active object that has any events in its event queue. The QV scheduler then extracts the next event from this queue and dispatches it to the state machine associated with the active object. The state machine runs to completion, after which the QV scheduler runs and the cycle repeats.
![QV scheduler operation](qv.gif)
Please note that because the state machines always return to the QV scheduler after each RTC step, a single stack can be used to process all state machines (memory-friendly architecture).
The QV scheduler can also very easily detect when all event queues are empty, at which point it can call the idle callback to let the application put the CPU and peripherals to a **low-power sleep mode** (power-friendly architecture).
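For example, in an ARM Cortex-M port the application-provided idle callback might look roughly like this (a sketch; QV_CPU_SLEEP() is the macro provided by the Cortex-M QV port to enter sleep mode atomically):
@code{c}
#include "qpc.h"  /* QP/C framework, including the QV kernel API */

void QV_onIdle(void) { /* called by QV with interrupts DISABLED */
#ifdef NDEBUG
    /* put the CPU into a low-power sleep mode; the macro atomically
     * re-enables interrupts and stops the CPU until the next interrupt */
    QV_CPU_SLEEP();
#else
    QF_INT_ENABLE(); /* during debugging, just re-enable interrupts */
#endif
}
@endcode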
Given its simplicity, portability, and low resource consumption, the QV scheduler is very attractive. It allows you to partition the problem into active objects and to execute these active objects in an orderly manner. The thread-level response of this scheduler is the longest RTC step in the whole system, but because event-driven active objects don't block, the RTC steps tend to be very short (typically just a few microseconds). Also, you can often break up longer RTC steps into shorter pieces by posting an event to self and returning (“Reminder” state pattern). The self-posted event then triggers the continuation of the longer processing.
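For example, a long computation might be broken up roughly as follows; the `Worker` active object, its `remaining` counter, and the `PROCESS_CHUNK_SIG` signal are illustrative placeholders:
@code{c}
#include "qpc.h"  /* QP/C framework */

enum { PROCESS_CHUNK_SIG = Q_USER_SIG };  /* illustrative signal */

/* immutable "reminder" event posted by the active object to itself */
static QEvt const processChunkEvt = { PROCESS_CHUNK_SIG, 0U, 0U };

typedef struct {           /* illustrative active object */
    QActive super;
    uint32_t remaining;    /* number of work chunks still to be processed */
} Worker;

static QState Worker_idle(Worker * const me, QEvt const * const e);
static QState Worker_busy(Worker * const me, QEvt const * const e);

static QState Worker_busy(Worker * const me, QEvt const * const e) {
    switch (e->sig) {
        case PROCESS_CHUNK_SIG: {
            /* process only one small chunk of the long computation here... */
            if (me->remaining > 0U) {
                --me->remaining;
                /* ...then remind yourself to continue in the next RTC step */
                QACTIVE_POST(&me->super, &processChunkEvt, me);
                return Q_HANDLED();
            }
            return Q_TRAN(&Worker_idle);  /* all chunks processed */
        }
    }
    return Q_SUPER(&QHsm_top);
}

static QState Worker_idle(Worker * const me, QEvt const * const e) {
    (void)me;
    (void)e;
    return Q_SUPER(&QHsm_top);
}
@endcode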
@remarks
Sometimes it is not practical to break up long RTC steps, and consequently the thread-level response of the simple @ref qv "QV kernel" might be too slow. In such cases you need to use a *preemptive* kernel. The big advantage of a preemptive kernel is that it effectively decouples high-priority threads from low-priority threads in the time domain. The timeliness of execution of high-priority threads is almost independent of the low-priority threads. But of course there is no such thing as a free lunch. Preemptive kernels open a whole new class of problems related to race conditions, so you need to be very careful about sharing any resources.
@next{qk}
*/
/*##########################################################################*/
/*! @defgroup qk QK
@brief
Preemptive Run-To-Completion (Non-Blocking) Kernel
<p>The preemptive, non-blocking QK kernel is specifically designed to execute non-blocking active objects (see also [[PSiCC2](https://www.state-machine.com/psicc2/), Chapter 10]). QK runs active objects in the same way as a prioritized interrupt controller (such as the NVIC in ARM Cortex-M) runs interrupts, using a **single stack**. Active objects process their events in run-to-completion (RTC) fashion and remove themselves from the call stack, the same way as nested interrupts remove themselves from the stack upon completion. At the same time, high-priority active objects can **preempt** lower-priority active objects, just like interrupts can preempt each other under a prioritized interrupt controller. <span class="highlight">QK meets all the requirements of Rate Monotonic Scheduling (a.k.a. Rate Monotonic Analysis, RMA) and can be used in hard real-time systems</span>.
</p>
@note
The non-blocking, run-to-completion, preemptive threads are known in the literature as "basic threads" (OSEK/AUTOSAR terminology), sometimes also called "fibers" (e.g., Q-Kernel) or "software interrupts" (e.g., TI-RTOS).
@section qk_overview QK Overview
The preemptive, run-to-completion (RTC) QK kernel breaks entirely with the endless-loop structure of the thread routines and instead uses threads structured as one-shot, discrete, run-to-completion functions, very much like ISRs [[PSiCC2](https://www.state-machine.com/psicc2/), Chapter 10]. In fact, the QK kernel views interrupts very much like threads of a “super-high” priority, except that interrupts are prioritized in hardware by the interrupt controller, whereas threads are prioritized in software by the RTC kernel.
As a fully preemptive multithreading kernel, QK must ensure that at all times the CPU executes the highest-priority thread (active object) that is ready to run. Fortunately, only two scenarios can lead to readying a higher-priority thread:
<ul class="tag">
<li><span class="tag">1</span>
When a lower-priority thread posts an event to a higher-priority thread, the kernel must immediately suspend the execution of the lower-priority thread and start the higher-priority thread. This type of preemption is called <b>synchronous preemption</b> because it happens synchronously with posting an event to the thread's event queue.
> NOTE: The stack usage shown in the bottom panel depicts the stack growing down (towards lower addresses), as is the case in ARM Cortex-M.
</li>
</ul>
@anchor qk-synch-fig
@image html qk_synch.gif "Synchronous Preemption in QK"
<ul class="tag">
<li><span class="tag">2</span>
When an interrupt posts an event to a higher-priority thread than the interrupted thread, upon completion of the ISR the kernel must start execution of the higher-priority thread instead of resuming the lower-priority thread. This type of preemption is called <b>asynchronous preemption</b> because it can happen asynchronously, any time interrupts are not explicitly disabled.
> NOTE: The stack usage during asynchronous preemption on ARM Cortex-M is slightly simplified in the diagram below. A more detailed stack usage diagram is discussed later in the section explaining the @ref arm-cm_qk_port-asm_pendsv "Detailed stack allocation in QK for ARM Cortex-M".
</li>
</ul>
@anchor qk-asynch-fig
@image html qk_asynch.gif "Asynchronous Preemption in QK"
@note
A traditional RTOS kernel does not distinguish between synchronous and asynchronous preemptions and makes all preemptions look like the more stack-intensive asynchronous preemptions. In contrast, an RTC kernel can implement synchronous preemption as a simple function call (to QK_activate_()), which is much more efficient than a full context switch.
@next{qxk}
*/
/*##########################################################################*/
/*! @defgroup qxk QXK
@brief
Preemptive Dual-Mode (Run-to-Completion/Blocking) RTOS Kernel
QXK is a small, preemptive, priority-based, dual-mode **blocking** kernel that executes active objects like the @ref qk "QK kernel" (@ref qxk_basic "basic threads"), but can also execute traditional __blocking__ threads (@ref qxk_extended "extended threads"). In this respect, QXK behaves exactly like a __conventional RTOS__ (Real-Time Operating System).
QXK has been designed specifically for mixing event-driven active objects with traditional blocking code, such as commercial middleware (TCP/IP stacks, UDP stacks, embedded file systems, etc.) or legacy software. To this end, QXK is more efficient than running QP on top of a @ref ports_rtos "traditional 3rd-party RTOS", because non-blocking @ref qxk_basic "basic threads" take far less stack space and fewer CPU cycles for a context switch than the much heavier @ref qxk_extended "extended threads". But the biggest advantage of QXK is that it __protects__ the application-level code from inadvertent mixing of blocking calls inside the event-driven active objects. Specifically, QXK "knows" the type of the thread context (extended/basic) and asserts internally if a blocking call (e.g., semaphore-wait or a time-delay) is attempted in a basic thread (active object). This is something that a QP port to a @ref ports_rtos "conventional 3rd-party RTOS" cannot do, because such an RTOS runs all code (including active objects) in the context of heavyweight extended threads.
Currently, the QXK kernel has been ported to the following CPUs:
- @ref arm-cm_qxk "Cortex-M (M0/M0+/M1/M3/M4/M4F/M7)"@n
Supported toolchains include: ARM-KEIL MDK, IAR-ARM, GNU-ARM, TI-ARM.
Currently, the QXK kernel is illustrated by the following examples:
- @ref arm-cm_dpp_efm32-slstk3401a (ARM Cortex-M4F)@n
Example illustrates: 6 active objects plus two extended threads, QXK blocking delay, QXK semaphore, QXK mutex, QXK blocking message queue.
- @ref arm-cm_dpp_ek-tm4c123gxl (ARM Cortex-M4F)@n
Example illustrates: 6 active objects plus two extended threads, QXK blocking delay, QXK semaphore, QXK mutex, QXK blocking message queue.
- @ref arm-cm_dpp_nucleo-l053r8 (ARM Cortex-M0+)@n
Example illustrates: 6 active objects plus two extended threads, QXK blocking delay, QXK semaphore, QXK mutex, QXK blocking message queue.
@section qxk_basic Basic Threads
QXK supports **basic**-threads (non-blocking, run-to-completion activations). The basic-threads all nest on the same stack (the Main Stack Pointer in ARM Cortex-M), so the stack usage is reduced. The additional advantage of basic-threads is that switching from one basic-thread to another requires only an @ref QXK_activate_() "activation" of the basic-thread, which is much simpler and faster than the full context switch required for the @ref qxk_extended "extended"-threads that QXK also supports (see below).
@remarks
QXK adopts the "basic/extended thread" terms from the <a class="extern" target="_blank" href="https://www.autosar.org">OSEK/AUTOSAR Operating System</a> specification. Other real-time kernels might use different terminology for similar concepts. For example, the <a class="extern" target="_blank" href="http://www.quasarsoft.com/">Q-Kernel</a> uses the term "fibers", while <a class="extern" target="_blank" href="http://www.ti.com/tool/TI-RTOS">TI-RTOS</a> uses the term "software interrupts" for concepts closely related to "basic threads".
@section qxk_extended Extended Threads
QXK supports **extended**-threads (blocking, typically structured as endless loops). The extended-threads use private per-thread stacks, as in conventional RTOS kernels. Any switch from a basic-thread to an extended-thread, or from one extended-thread to another, requires a full context switch.
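For illustration, a minimal sketch of how an extended thread might be defined and started is shown below; the `Worker_run()` handler, the stack size, and the priority are illustrative assumptions, not values prescribed by QXK:
@code{c}
#include "qpc.h"  /* QP/C framework, including the QXK kernel API */

static QXThread l_worker;           /* extended-thread object (illustrative) */
static uint64_t l_workerStk[64];    /* private stack for the extended thread */

/* thread-handler function: an endless loop that is allowed to block */
static void Worker_run(QXThread * const me) {
    (void)me;
    for (;;) {
        /* ... call blocking middleware (file system, TCP/IP, etc.) here ... */
        QXThread_delay(10U);        /* block the calling thread for 10 ticks */
    }
}

void Worker_start(void) {
    QXThread_ctor(&l_worker, &Worker_run, 0U);  /* 0U == default tick rate */
    QXTHREAD_START(&l_worker,
                   3U,                          /* unique thread priority */
                   (QEvt const **)0, 0U,        /* no event queue needed here */
                   l_workerStk, sizeof(l_workerStk), /* private stack */
                   (void *)0);                  /* no initialization parameter */
}
@endcode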
@remarks
QXK is unique among dual-mode kernels on the market in that it supports interleaving the priorities of basic threads and extended threads. Other dual-mode kernels typically restrict the priorities of basic threads to be always higher (more urgent) than the priorities of any of the extended threads.
@sa ::QXThread
@section qxk_classes Classes in QXK
The figure below shows the main classes introduced in the QXK kernel and their relation to the classes of the QP framework.
@image html qxk_classes.gif "Classes of the QXK dual-mode kernel"
<ul class="tag">
<li><span class="tag">0</span> The abstract ::QActive class represents active objects in QP. This class contains the @c thread object of the underlying kernel (QXK thread-control-block in this case) as well as the event queue and the unique priority of the active object.
</li>
<li><span class="tag">1</span> The ::QXThread class represents the extended (blocking) threads of the QXK kernel. It inherits ::QActive, so that extended-threads can be treated as active objects internally in the framework. However, the extended-threads __do not implement state machines__. Instead, the data fields used for storing the current state in active objects are re-used to store the private stack of the extended-thread. The ::QXThread class also contains the @c timeEvt object (see ::QTimeEvt) for generating timeouts when the extended-thread is blocked.
</li>
<li><span class="tag">2</span> The ::QXMutex class represents the **priority-ceiling mutex** of the QXK kernel. The mutex can block and can be used only in the extended-threads (in case they share resources that need to be protected). The mutex is recursive, meaning that it can be locked multiple times from the same extended thread (but it needs to be unlocked equal number of times).
</li>
<li><span class="tag">3</span> The ::QXSemaphore class represents the **counting semaphore** of the QXK kernel. The semaphore can be waited on only in the extended-threads and the QXK kernel would assert if an active object thread would attempt to wait on a semaphore. On the other hand, a semaphore can be signaled from anywhere in the code, including active objects and ISRs.
</li>
</ul>
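For illustration, the mutex and semaphore classes might be used roughly as follows; the object names, the ceiling priority, and the timeouts are illustrative assumptions:
@code{c}
#include "qpc.h"  /* QP/C framework, including the QXK kernel API */

static QXMutex     l_mutex;  /* illustrative priority-ceiling mutex */
static QXSemaphore l_sema;   /* illustrative counting semaphore */

void Sync_init(void) {
    QXMutex_init(&l_mutex, 6U); /* ceiling priority 6 (unused by any thread) */
    QXSemaphore_init(&l_sema,
                     0U,        /* initial count */
                     1U);       /* maximum count (binary semaphore) */
}

/* may be called only from an extended thread -- locking can block */
void Worker_useSharedResource(void) {
    if (QXMutex_lock(&l_mutex, 100U)) {  /* wait for up to 100 clock ticks */
        /* ... access the shared resource ... */
        QXMutex_unlock(&l_mutex);
    }
}

/* may be called from anywhere: ISRs, active objects, or extended threads */
void Producer_signal(void) {
    QXSemaphore_signal(&l_sema);
}

/* may be called only from an extended thread -- waiting can block */
void Consumer_wait(void) {
    if (QXSemaphore_wait(&l_sema, 100U)) {  /* wait for up to 100 clock ticks */
        /* ... the semaphore was signaled ... */
    }
}
@endcode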
@note
The main takeaway from the QXK class diagram is QXK's **optimal, tight integration** with the QP framework. The QXK kernel reuses all mechanisms already provided in QP, thus avoiding any code duplication, inefficient layers of indirection, and additional licensing costs, which are inevitable when using @ref ports_rtos "3rd-party RTOS kernels" to run QP applications.
@section qxk_features QXK Feature Summary
As you can see in the list below, <span class="highlight">QXK provides most features you might expect of a traditional blocking **RTOS** kernel and is <strong>recommended</strong> as the preferred RTOS kernel for QP applications</span> that need to mix active objects with traditional blocking code.
<ul class="tag">
<li><span class="bullet">&gt;</span>Preemptive, priority-based scheduling of up to 64 threads. Each thread must be assigned its own unique priority (1 .. #QF_MAX_ACTIVE);
</li>
> NOTE: QXK always executes the highest-priority thread that is ready to run (is not blocked). The scheduling algorithm used in QXK meets all the requirements of Rate Monotonic Scheduling (a.k.a. Rate Monotonic Analysis — RMA) and can be used in hard real-time systems.
<li><span class="bullet">&gt;</span>QXK distinguishes between two types of threads:
</li>
- **basic threads** of active objects that are made ready-to-run by events posted to the active objects. Such basic threads are non-blocking, run-to-completion activations that cannot block in the middle of the RTC step. QXK asserts when a basic thread attempts to use a blocking mechanism, such as a time-delay or a semaphore-wait. All basic threads share the common stack.
- **extended threads** that can block anywhere in their thread-handler function, for which QXK provides a typical assortment of blocking primitives, such as a time-delay, a blocking message queue, a counting semaphore, or a mutex. Each extended thread must be provided with its own private stack.
<li><span class="bullet">&gt;</span>Tightly integrated mechanisms for communication between event-driven active objects and extended blocking threads:
</li>
- Basic threads (Active Objects) can signal semaphores and send messages to extended threads.
- Extended threads can post or publish events to active objects or other extended threads;
- Extended threads can subscribe to events and thus can receive events published in the system.
<li><span class="bullet">&gt;</span>Priority-Ceiling, recursive Mutexes with optional timeout;
</li>
> **NOTE:** The priority-ceiling protocol implemented in QXK is immune to priority inversion, but it requires a unique QP thread priority level (the ceiling priority) to be assigned to the mutex. This ceiling priority is then unavailable to QP threads.@n
> **NOTE:** A QXK mutex can be configured not to use the priority-ceiling protocol (when initialized with a zero priority-ceiling). In that case, the mutex does not require a separate priority level.
<li><span class="bullet">&gt;</span>Counting Semaphores with optional timeout that can block multiple extended-threads;
</li>
<li><span class="bullet">&gt;</span>Blocking "zero-copy" message queue with optional timeout bound to each extended-thread;
</li>
<li><span class="bullet">&gt;</span>Deterministic fixed-size memory pools for dynamic memory management available both to extended-threads and active objects;
</li>
<li><span class="bullet">&gt;</span>Interrupt management, including "zero-latency", kernel-unaware interrupts that are never disabled;
</li>
> NOTE: This feature is only supported on CPUs that allow selective interrupt disabling, such as ARM Cortex-M3/M4 (but not ARM Cortex-M0/M0+);
<li><span class="bullet">&gt;</span> @ref qxk_tls "Thread-Local Storage" for all threads (basic threads and extended threads).
</li>
</ul>
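For illustration, an extended thread might block on its built-in message queue roughly as follows; the `Logger_run()` handler, the signal, and the timeout are illustrative assumptions (the QF_gc() call is shown under the assumption that the thread itself recycles dynamically allocated events it receives):
@code{c}
#include "qpc.h"  /* QP/C framework, including the QXK kernel API */

enum { DATA_READY_SIG = Q_USER_SIG };  /* illustrative signal */

/* extended-thread handler blocking on the thread's built-in event queue */
static void Logger_run(QXThread * const me) {
    (void)me;
    for (;;) {
        /* block for up to 100 clock ticks until an event arrives */
        QEvt const *e = QXThread_queueGet(100U);
        if (e != (QEvt const *)0) {
            if (e->sig == DATA_READY_SIG) {
                /* ... process the received event ... */
            }
            QF_gc(e);  /* recycle the event (no-op for static events) */
        }
        else {
            /* the timeout expired without receiving an event */
        }
    }
}
@endcode
Events can be delivered to such a thread with QACTIVE_POST(), because ::QXThread inherits ::QActive.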
@subsection qxk_tls Thread Local Storage
<b>Thread-local storage (TLS)</b> is a programming method that uses static or global memory local to a thread. TLS is specifically useful for writing library-type code, which is used in a multithreaded environment and needs to access per-thread data in an independent way.
TLS is used in some places where ordinary, single-threaded programs would use static or global variables, but where this would be inappropriate in multithreaded cases. An example of such a situation is where library-type functions use a global variable to set an error condition (for example, the global variable errno used by many functions of the C library). If errno were simply a global variable, a call of a system function on one thread might overwrite the value previously set by a call of a system function on a different thread, possibly before subsequent code on that other thread could check for the error condition. The solution is to have errno be a variable that looks like it is global, but in fact exists once per thread—i.e., it lives in *thread-local storage*.

A second use case is multiple threads accumulating information into a global variable. To avoid a race condition, every access to this global variable would have to be protected by a mutual-exclusion mechanism. Alternatively, each thread might accumulate into a thread-local variable (which, by definition, cannot be read from or written to from other threads, implying that there can be no race conditions). Threads then only have to synchronize a final accumulation from their own thread-local variable into a single, truly global variable.
TLS implementations vary, but many systems, including QXK, implement TLS by providing each thread with a pointer-sized thread-local variable. This pointer can be set to an arbitrarily sized memory block in a thread-local manner, by allocating such a memory block (statically or dynamically) and storing the memory address of that block in the thread-local variable.
Typical usage of TLS in QXK is illustrated in the example <span class="img folder">qpc/examples/arm-cm/dpp_efm32-slstk3401a/qxk/</span>, <span class="img file_c">test.c</span>, and consists of the following steps:
- define the TLS structure
@code{c}
typedef struct {
    uint32_t foo;
    uint8_t bar[10];
} TLS_test;
@endcode
- allocate the TLS storage for all participating threads (extended or basic threads)
@code{c}
static TLS_test l_tls1;
static TLS_test l_tls2;
@endcode
- initialize the TLS per-thread pointer in each thread routine (for extended threads) or the top-most initial transition (for basic threads of active objects):
@code{c}
static void Thread1_run(QXThread * const me) {
    me->super.thread = &l_tls1; /* initialize the TLS for Thread1 */
    . . .
}
. . .
static void Thread2_run(QXThread * const me) {
    me->super.thread = &l_tls2; /* initialize the TLS for Thread2 */
    . . .
}
@endcode
- access the TLS from your code:
@code{c}
void lib_fun(uint32_t x) {
    QXK_TLS(TLS_test *)->foo = x;
}
@endcode
@sa
- QXK_current()
- QXK_TLS()
*/