Context Saving

Smarter systems and the PIC 18F2420

Tim Wilmshurst, in Designing Embedded Systems with PIC Microcontrollers (Second Edition), 2010

13.7.7 Context saving with interrupts

With the Fast Register Stack, described in Section 13.6.3, context saving can in some circumstances be delightfully easy. The programmer must first decide whether the three registers saved on this stack, WREG, Status and BSR, are adequate for the purpose. If not, or if the fast return from interrupt is not used, then the programmer will need to write code to save all necessary registers at the start of the ISR and retrieve them at the end (as demonstrated in Program Example 6.4). It is also important to remember that a high-priority interrupt can interrupt one of lower priority. In so doing, the high-priority interrupt would overwrite the contents of the Fast Register Stack, and the low-priority interrupt would lose its context! In such cases it is not safe to use the Fast Register Stack for low-priority interrupts; context for these should be saved in software. These issues are explored in Programming Exercises 13.4 and 13.5.
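
Where software saving is needed, a minimal sketch of the save/restore sequence follows, expressed in C for readability only. The SFR addresses are the usual PIC18 ones but should be checked against the device header; in practice this sequence is a few assembly instructions (as in Program Example 6.4), or is generated automatically by a C compiler's low-priority interrupt support, since a C function cannot truly preserve WREG before using it.

```c
#include <stdint.h>

/* Illustrative PIC18 SFR addresses - verify against the device header. */
#define WREG_REG   (*(volatile uint8_t *)0xFE8)
#define STATUS_REG (*(volatile uint8_t *)0xFD8)
#define BSR_REG    (*(volatile uint8_t *)0xFE0)

static volatile uint8_t w_temp, status_temp, bsr_temp;

void low_priority_isr(void)
{
    /* Save in software: the Fast Register Stack cannot be trusted here,
     * because a high-priority interrupt could overwrite it. */
    w_temp      = WREG_REG;
    status_temp = STATUS_REG;
    bsr_temp    = BSR_REG;

    /* ... ISR body ... */

    /* Restore in reverse order just before returning. */
    BSR_REG    = bsr_temp;
    STATUS_REG = status_temp;
    WREG_REG   = w_temp;
}
```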


URL: https://www.sciencedirect.com/science/article/pii/B9781856177504100174

Design and Development

Colin Walls, in Embedded Software (Second Edition), 2012

Impact of Real-Time Systems

The majority of embedded microprocessors are employed in real-time systems, and this puts further demands on the build tools. A real-time system tends to include interrupt service routines (ISRs); it should be possible to code these in C or C++ by adding an additional keyword, interrupt, declaring that a specific function is an ISR. This performs the necessary context saving and restoring and the return from interrupt sequence. The interrupt vector may usually be defined in C using an array of pointers to functions.
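
As a minimal sketch, assuming a toolchain that accepts interrupt as a function qualifier (the exact spelling and the mechanism for locating the vector table vary by vendor; all names here are illustrative):

```c
volatile unsigned int tick_count = 0;

/* The 'interrupt' qualifier marks the function as an ISR: the compiler
 * generates the context save/restore and the return-from-interrupt
 * sequence instead of a normal function return. */
interrupt void timer_isr(void)
{
    tick_count++;
}

interrupt void uart_isr(void)
{
    /* read the received character, etc. */
}

/* The interrupt vector as an array of pointers to functions. Placing
 * it at the address the hardware expects needs a linker directive or
 * pragma, which is toolchain-specific and omitted here. */
void (* const vector_table[])(void) = {
    timer_isr,
    uart_isr,
};
```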

Furthermore, in a real-time system, it is common for code to be shared between the mainline program and ISRs, or between tasks in a multithreaded system. For code to be shared, it must be reentrant. C and C++ intrinsically permit reentrant code to be written, because data may be localized to the instance of the program (since it is stored on the stack or in a register). Although the programmer may compromise this capability (by using static storage, for example), the build tools need to support reentrancy. As mentioned previously, some ANSI-standard library functions are intrinsically nonreentrant, and some care is required in their use.
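
A short illustration of the distinction, with invented function names:

```c
#include <stdio.h>
#include <stddef.h>

/* Non-reentrant: the static buffer is shared by every caller, so an
 * ISR or second task calling this mid-operation corrupts the result
 * the first caller is still using. */
char *format_value_bad(int value)
{
    static char buf[16];
    sprintf(buf, "%d", value);
    return buf;
}

/* Reentrant: all working data lives on the caller's stack, so any
 * number of tasks and ISRs may execute it concurrently. */
void format_value_good(int value, char *buf, size_t len)
{
    snprintf(buf, len, "%d", value);
}
```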

Since memory may be in short supply in an embedded system, cross-compilers generally offer a high level of control over its usage. A good example is a language extension supporting the packed keyword, which permits arrays and structures to be stored more efficiently. Of course, more memory-efficient storage may result in an access time increase, but such trade-offs are typical challenges of real-time system design.
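
For example, a sketch of the space/time trade-off; the GCC-style attribute spelling is used here, since the packed extension itself is vendor-specific:

```c
#include <stdint.h>

/* Unpacked: the compiler may pad after 'flag' so that 'value' is
 * naturally aligned; sizeof(struct record) is typically 8. */
struct record {
    uint8_t  flag;
    uint32_t value;
};

/* Packed: padding removed, sizeof is 5, but reading 'value' may take
 * several bus accesses - the access-time cost noted above. Other
 * compilers spell this as a 'packed' keyword or #pragma pack. */
struct packed_record {
    uint8_t  flag;
    uint32_t value;
} __attribute__((packed));
```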

One of the enhancements added to the C language during ANSI standardization was the volatile keyword. Applying this qualifier to a variable declaration advises the compiler that the value of the variable may change in an unexpected way. The compiler is thus prevented from optimizing access to that variable. Such a situation may arise if two tasks share a variable or if the variable represents a control or data register of an I/O device. The volatile qualifier was carried over into standard C++ as well; the usefulness of any cross-compiler that failed to honour it would be dubious.
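
A brief sketch of both situations (the register address is illustrative only):

```c
#include <stdint.h>

/* Shared between an ISR and the mainline: without 'volatile' the
 * compiler could cache the flag in a register, and the wait loop
 * below would never observe the ISR's update. */
volatile int data_ready = 0;

/* Memory-mapped status register of an I/O device. */
#define UART_STATUS (*(volatile uint8_t *)0x4000A000u)

void wait_for_input(void)
{
    while (!data_ready) {
        /* each iteration re-reads data_ready from memory */
    }
    while ((UART_STATUS & 0x01u) == 0) {
        /* each test performs a real read of the hardware register */
    }
}
```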


URL: https://www.sciencedirect.com/science/article/pii/B9780124158221000027

Moving beyond 8-bit

Tim Wilmshurst, in Designing Embedded Systems with PIC Microcontrollers (Second Edition), 2010

21.5.2 The Central Processing Unit

At the heart of the microcontroller lies the MIPS Technology 32-bit core. It is interesting to get a picture of the many other applications this core is used for by checking the MIPS Technology website, Ref. 21.9. The architecture was developed by John Hennessy of Stanford University. Interestingly, Hennessy is co-author of Ref. 21.1, which uses the MIPS core as an example. That book is therefore especially appropriate as background reading for this device.

The MIPS CPU is a complex thing. It contains an 'execution unit' for mainstream CPU operations, a 'multiply/divide unit', doing just what its name suggests, and a 'system control coprocessor', which handles some of the operational features like interrupts, address translation and debug. The execution unit has 32-bit registers, holding data and address information. There is also a 'shadow set' of register files, to ease context saving during interrupts.

The multiply/divide unit can execute 16-bit × 16-bit or 16-bit × 32-bit multiplications in one clock cycle, or 32-bit × 32-bit in two. Divide operations replicate the looping algorithm mentioned in Section 21.3.1. In addition to regular multiply instructions, the instruction set contains two instructions, madd (multiply and add) and msub (multiply and subtract), intended for DSP applications.
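
As an illustrative sketch (not from the book), this is the multiply-accumulate pattern those instructions target; a MIPS compiler can map the loop body onto madd, keeping the 64-bit running sum in the HI/LO register pair:

```c
#include <stdint.h>

/* FIR/dot-product style multiply-accumulate loop. */
int64_t dot_product(const int32_t *x, const int32_t *h, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)x[i] * h[i];   /* candidate for madd */
    return acc;
}
```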

The CPU has a five-stage pipeline, illustrated in Figure 21.13. Most instructions execute in these five stages, each stage taking one instruction cycle. Notice the difference between this figure and the simple two-stage pipeline of Figure 2.8. There, fetch and execute were the only two pipeline stages. Here, fetch and execute remain broadly speaking the first two stages, but other useful housekeeping and data transfers (including load and store transfers) take place in the later stages, in parallel with other activities from other instructions.

Figure 21.13. The five-stage PIC32 pipeline

The CPU has three modes of operation: 'kernel', 'user' and 'debug'. On reset, the CPU is in kernel mode, which is the most general-purpose and powerful. This gives access to the whole memory space and all peripherals. User mode restricts access to a range of resources. It does not have to be used at all, but it can be viewed as a safer operational mode for some activities, with transfer to kernel mode where needed. Debug mode is of course used by debuggers; it allows access to all kernel mode resources, including those specifically for debug.

An important feature of the microcontroller is its JTAG capability. JTAG, the Joint Test Action Group, was formed in the mid 1980s. At this time digital systems were becoming increasingly complex and it was no longer possible to access test points within a system. Therefore it became necessary to design test points and test facilities into the hardware itself. JTAG wished to develop an approach which was compatible across all manufacturers. Their proposal was adopted as IEEE Standard 1149.1, although the terminology JTAG is still commonly used. Integral to the approach is a 'boundary scan' mechanism, whereby signals at component boundaries are monitored or controlled. At chip level, the JTAG interface is implemented with a 4- or 5-wire interface. An enhanced JTAG standard is applied here.


URL: https://www.sciencedirect.com/science/article/pii/B9781856177504100265

NFV Infrastructure—Hardware Evolution and Testing

Ken Gray, Thomas D. Nadeau, in Network Function Virtualization, 2016

Graphics processing unit

GPUs are potential accelerators and have been used in research like PacketShader.45 Like FPGAs, they are used in High Performance Computing and have consumer applications (eg, gaming).

While the GPU was originally developed specifically for graphics processing (rasterization, shading, 3D, etc.), it has evolved to a more general-purpose programming model (GPGPU) that allows the GPU to be treated like a multicore processor (with an equally big, independent memory space). Many universities have curricula that include parallel programming techniques and include the use of CUDA,46 a domain-specific language for NVIDIA GPUs.

As generically illustrated in Fig. 8.9, the virtualized GPU processors could be directly mapped to VMs/containers, perhaps reducing their host resource requirements, or used as standalone machines. Like the connectivity in the discrete FPGA, a number of copy/context-saving mechanisms could be leveraged to provide access between the GPU and other cores. Most notably, this can result in the use of incoherent shared memory.47

Figure 8.9. Graphics processing unit.

Makers like NVIDIA48 and AMD49 offer products designed for more general compute tasks (parallel compute focus) with multiple streaming multiprocessors (SMs) on a die—each with 32 or more ALUs (currently driving total core-equivalent density well into the hundreds)—illustrated in Fig. 8.10. Overall performance of these products is measured in many hundreds of GFLOPS (10⁹ Floating Point Operations per Second). Be forewarned, not all GPU architectures are the same, even within the same manufacturer (they have application-focused designs).50

Figure 8.10. A high-level view of GPU architecture, with its many multicore streaming processors (16 SM × 32 ALU depicted), thread scheduling and data caching.

GPUs likewise use caching as arbitrage against memory access, though they are currently being designed with significantly greater memory bandwidth, which may ultimately decrease this dependency.

As with the FPGA, SoC architectures with integrated GPP cores are emerging. (Intel's Haswell has an integrated GPU with up to 40 "execution units." Sandy Bridge and Ivy Bridge had models with up to 12 or 16 "execution units," respectively.) This is still a much smaller size than is seen in the designs mentioned earlier.

It remains to be seen whether a large enough, on-socket GPU will evolve to be a serious network I/O off-load processor. Power consumption and resulting heat may make this a difficult design point. Until then, the use of the GPU as a PCI peripheral for this purpose will probably not be the best solution for this application, because of limitations imposed both by PCI bandwidth and by round-trip latency from NIC to GPU.

In addition, GPUs have also proven to be more difficult to virtualize/share than the cohort of competing technologies covered in this chapter.

The GPU cores are optimized for threads with long-running series of computations, and devote more area to this than is seen in the design of a generic processor architecture (IA).

Basic GPU programming technique creates a "kernel"—the computational component of an algorithm. Note that this is not the same concept as an OS kernel. Applications or libraries can have one or more "kernels." After compilation, the "kernel" can be many threads, which execute the same routine. The threads can then be assembled into "blocks" of greater than a thousand threads, limited by scheduler capability to execute on an SM. This creates opportunities to share memory. In the case of NVIDIA, execution is in groups of 32 threads within the SM (ie, a "warp")—even though in some cases, the number of ALUs in an SM is only 16 due to a higher clock rate for the ALUs. The GPU can also switch in and out of applications in tens of microseconds. All of which can be used to create large-scale parallel processing such as multithreaded SIMD.
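
To make the decomposition concrete, here is a plain-C sketch (deliberately not CUDA itself) of how a kernel's work is divided into blocks of threads; in a real CUDA kernel the two loops are implicit, each iteration becoming a hardware thread identified by its block and thread indices:

```c
#define THREADS_PER_BLOCK 1024   /* bounded by scheduler capability */

/* The "kernel": the per-element routine that every thread executes. */
static void saxpy_element(int i, float a, const float *x, float *y, int n)
{
    if (i < n)                   /* guard: the grid may overrun n */
        y[i] = a * x[i] + y[i];
}

/* "Launch": a grid of blocks, each block a group of threads. On a
 * GPU these iterations run in parallel, scheduled onto the SMs in
 * warp-sized groups rather than sequentially as here. */
void launch_saxpy(float a, const float *x, float *y, int n)
{
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    for (int b = 0; b < blocks; b++)                 /* block index  */
        for (int t = 0; t < THREADS_PER_BLOCK; t++)  /* thread index */
            saxpy_element(b * THREADS_PER_BLOCK + t, a, x, y, n);
}
```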

GPUs support their own ISAs and have their own unique development environments. These products support higher-level languages like C, C++, Java, and Python.


URL: https://www.sciencedirect.com/science/article/pii/B9780128021194000080

Multi-tasking and the real-time operating system

Tim Wilmshurst, in Designing Embedded Systems with PIC Microcontrollers (Second Edition), 2010

18.4 Scheduling and the scheduler

A key part of the RTOS is the 'scheduler'. This determines which task is allowed to run at any particular moment. Among other things, the scheduler must be aware of which tasks are ready to run and their priorities (if any). There are a number of fundamentally different scheduling strategies, which we consider now.

18.4.1 Cyclic scheduling

Cyclic scheduling is simple. Each task is allowed to run to completion before it hands over to the next. A task cannot be discontinued as it runs. This is almost like the super loop operation we saw earlier in this chapter.

A diagrammatic example of cyclic scheduling is shown in Figure 18.5. Here the horizontal band represents CPU activity and the numbered blocks the tasks as they execute. Tasks are seen executing in turn, with Task 3 initially the longest and Task 2 the shortest. In the third iteration, however, Task 1 takes longer and the overall loop time is longer. Cyclic scheduling carries the disadvantages of sequential programming in a loop, as outlined above. At least it is simple.

Figure 18.5. Cyclic scheduling – Tasks 1, 2 and 3 execute in turn
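
A minimal sketch of the idea in C, with illustrative task names:

```c
/* Cyclic scheduling as a super loop: each task runs to completion
 * before handing over to the next; nothing can discontinue it. */
static void task1(void) { /* eg, read sensors    */ }
static void task2(void) { /* eg, run control law */ }
static void task3(void) { /* eg, update display  */ }

int main(void)
{
    for (;;) {       /* one pass round the loop = one cycle       */
        task1();
        task2();     /* loop time stretches whenever a task takes */
        task3();     /* longer, as in the third iteration above   */
    }
}
```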

18.4.2 Round-robin scheduling and context switching

In round-robin scheduling the operating system is driven by a regular interrupt (the 'clock tick'). Tasks are selected in a fixed sequence for execution. On each clock tick, the current task is discontinued and the next is allowed to start execution. All tasks are treated as being of equal importance and wait in turn for their slot of CPU time. Tasks are not allowed to run to completion, but are 'pre-empted', i.e. their execution is discontinued mid-flight. This is an example of a 'pre-emptive' scheduler.

The implications of this pre-emptive task switching, and its overheads, are not insignificant and must be taken into account. When the task is allowed to run again, it must be able to pick up operation seamlessly, with no side-effect from the pre-emption. Therefore, complete context saving (all flags, registers and other memory locations) must be undertaken as the task switches. Time-critical program elements should not be interrupted, however, and this requirement will need to be written into the program.

A diagrammatic example of round-robin scheduling is shown in Figure 18.6. The numbered blocks again represent the tasks as they execute, but there is a major difference from Figure 18.5. Now each task gets a slot of CPU time, which has a fixed length. The clock tick, which causes this task switch, is represented in the diagram by an arrow. When that time is up, the next task takes over, whether the current one has completed or not. At one stage Task 2 completes and does not need CPU time for several time slices. It then becomes ready for action again and takes its turn in the cycle.

Figure 18.6. Round-robin scheduling

As the task and context are switched, there is an inevitable time overhead, which is represented by the black bars. This is the time taken serving the requirements of the RTOS, which is lost to the application program.
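
A skeleton of the tick-driven switch, with illustrative names; the save and restore steps cannot be written in portable C and are really a few lines of target-specific assembly, shown here as externals:

```c
#define NUM_TASKS 3

typedef struct {
    unsigned long *sp;                  /* saved stack pointer */
} tcb_t;                                /* task control block  */

tcb_t tasks[NUM_TASKS];
int   current_task = 0;

extern void save_context(tcb_t *t);     /* push registers, record SP */
extern void restore_context(tcb_t *t);  /* reload SP, pop registers  */

void tick_isr(void)                     /* entered on every clock tick */
{
    save_context(&tasks[current_task]);       /* complete context save */
    current_task = (current_task + 1) % NUM_TASKS; /* fixed sequence   */
    restore_context(&tasks[current_task]);  /* task resumes seamlessly */
}
```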

18.4.3 Task states

It is worth pausing at this moment to consider what is happening to the tasks now they are being controlled by a scheduler. Clearly, only one task is running at any one time. Others may need to run, but at any one instant do not have the chance. Others may only need to respond to a particular set of circumstances and hence only be active at certain times during program execution.

It is important, therefore, to recognise that tasks can move between different states. A possible state diagram for this is shown in Figure 18.7. The states are described below, and a short sketch of how an RTOS might record them follows the list. Note, however, that the terminology used and the way the state is affected vary to some extent from one RTOS to another. Therefore, in some cases several terms are used to describe a certain state.

Figure 18.7. Task states

Ready (or eligible). The task is ready to run and will do so as soon as it is allocated CPU time. The task leaves this state and enters the active state when it is started by the scheduler.

Running. The task has been allocated CPU time and is executing. A number of things can cause the task to leave this state. Perhaps it simply completes and no longer needs CPU time. Alternatively, the scheduler may pre-empt it, so that another task can run. Finally, it may enter a blocked or waiting state for one of the reasons described below.

Blocked/waiting/delayed. This state represents a task which is ready to run but for one reason or another is not allowed to. There are a number of distinct reasons why this may be the case, and indeed this single state on the diagram could be replaced by several, if greater detail was wanted. The task could be waiting for some data to arrive or for a resource that it needs that is currently being used by another task, or it could be waiting for a period to be up. The state is left when the task is released from the condition which is holding it there.

Stopped/suspended/dormant. The task does not now need CPU time. A task leaves this state and enters the ready state when it is activated again, for whatever reason.

Uninitialised/destroyed. In this state the task no longer exists as far as the RTOS is concerned. An implication of this is that a task does not need to have continuous existence throughout the course of program execution. Generally, tasks have to be created or initialised in the program before they can run. If necessary they can later be destroyed and possibly another created instead. Removing unneeded tasks from the task list simplifies scheduler operation and reduces demands on memory.
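
As promised above, a small sketch of how an RTOS might record these states; the names are illustrative and, as noted, vary between products:

```c
typedef enum {
    TASK_UNINITIALISED,  /* unknown to the RTOS until created     */
    TASK_READY,          /* eligible: runs when given CPU time    */
    TASK_RUNNING,        /* currently holds the CPU               */
    TASK_BLOCKED,        /* waiting for data, a resource or delay */
    TASK_SUSPENDED       /* dormant: needs no CPU time            */
} task_state_t;

typedef struct {
    task_state_t state;
    int          priority;
} task_t;

/* Example transition: a blocked task returns to ready when the
 * condition holding it there is released (eg, its data arrives). */
void task_release(task_t *t)
{
    if (t->state == TASK_BLOCKED)
        t->state = TASK_READY;
}
```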

18.4.4 Prioritised pre-emptive scheduling

We return now to our survey of scheduling strategies, armed with a greater understanding of the lifestyle of tasks. In round-robin scheduling tasks are subservient to a higher power – the operating system – as we have seen. Yet all tasks are of equal priority, so an unimportant task gets just as much access to the CPU as one of tip-top priority. We can change this by prioritising tasks.

In the 'prioritised pre-emptive scheduler', tasks are given priorities. High-priority tasks are now allowed to complete before any time whatever is given to tasks of lower priority. The scheduler is still run by a clock tick. On every tick it checks which ready task has the highest priority. Whichever that is gets access to the CPU. An executing task which still needs CPU time and is of highest priority keeps the CPU. A low-priority task which is executing is replaced by one of higher priority, if it has become ready. The high-priority task becomes the 'bully in the playground'. In almost every case it gets its way.

The way this scheduling strategy works is illustrated in the example of Figure 18.8. This contains a number of the key concepts of the RTOS and is worth understanding well. The diagram shows three tasks, each of different priority and different execution duration. At the beginning, all are ready to run. Because Task 1 has the highest priority, the scheduler selects it to run. At the next clock tick, the scheduler recognises that Task 1 still needs to run, so it is allowed to continue. The same happens at the next clock tick, and the task completes during the following time slice. Task 1 does not now need CPU time and becomes suspended. At the next clock tick the scheduler therefore selects the ready task which has the highest priority, which is now Task 3. This also runs to completion.

Figure 18.8. Prioritised pre-emptive scheduling

At last Task 2 gets a chance to run! Unfortunately for it, however, during its first time slice Task 1 becomes ready again. At the next clock tick the scheduler therefore selects Task 1 to run again. Once more, this is allowed to run to completion. When it has, and only because no other task is ready, Task 2 can re-enter the arena and finally complete. Following this, for one time slice, there is no active task and hence no CPU activity. Task 1 then becomes ready one more time and starts to run again to completion.
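
A sketch of the decision the scheduler takes on each clock tick, self-contained with its own illustrative structures (here a lower number means a higher priority):

```c
#define NUM_TASKS 3

typedef enum { READY, RUNNING, BLOCKED, SUSPENDED } state_t;
typedef struct { state_t state; int priority; } task_t;

task_t task_table[NUM_TASKS];

/* On every tick the ready (or still-running) task with the highest
 * priority gets the CPU; everything of lower priority waits. */
int pick_highest_priority(void)
{
    int best = -1;
    for (int i = 0; i < NUM_TASKS; i++) {
        if (task_table[i].state != READY &&
            task_table[i].state != RUNNING)
            continue;                 /* blocked or suspended: skip */
        if (best < 0 ||
            task_table[i].priority < task_table[best].priority)
            best = i;                 /* the playground bully wins  */
    }
    return best;    /* -1 means an idle slice: no task is ready */
}
```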

18.4.5 Cooperative scheduling

The scheduling strategy just discussed, prioritised pre-emptive scheduling, represents classic RTOS activity. It is not without disadvantage, however. The scheduler must hold all context information for all tasks that it pre-empts. This is generally done with one stack per task and is memory-intensive. The context switching can also be time-consuming. Moreover, tasks must be written in such a way that they can be switched at any time during their operation.

An alternative to pre-emptive scheduling is 'cooperative' scheduling. Now each task must relinquish, of its own accord, its CPU access at some time in its operation. This sounds like we're blocking out the operating system, but if each task is written correctly this need not be the case. The advantage is that the task relinquishes control at a moment of its choosing, so it can control its context saving, and the context-switching overhead described above is not required.

Cooperative scheduling is unlikely to be quite as responsive to tight deadlines as pre-emptive scheduling. It does, however, need less memory and can switch tasks more quickly. This is very important in the small system, such as one based on a PIC microcontroller.
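
A minimal sketch of the cooperative style, with illustrative names: each task is written as a short function that does one slice of work and then returns, thereby yielding the CPU of its own accord:

```c
typedef void (*task_fn)(void);

static void sensor_task(void)  { /* take one sample, return  */ }
static void control_task(void) { /* one control step, return */ }
static void ui_task(void)      { /* poll one keypad, return  */ }

static const task_fn run_list[] = { sensor_task, control_task, ui_task };
#define NUM_RUN (sizeof run_list / sizeof run_list[0])

int main(void)
{
    for (;;) {
        for (unsigned i = 0; i < NUM_RUN; i++)
            run_list[i]();   /* a task yields simply by returning:
                                no saved context, no stack per task */
    }
}
```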

18.4.6 The role of interrupts in scheduling

So far, we have not mentioned interrupts in connection with the RTOS. Should ISRs themselves form tasks, as was done in structures like that of Figure 18.4? The answer is no. The first use of interrupts is almost always to provide the clock tick, through a timer interrupt on overflow. Beyond this, ISRs are usually used to supply urgent information to the tasks or scheduler. The interrupt could, for example, be set to signal that a certain event has occurred, thereby releasing a task from a blocked state (Figure 18.7). The ISRs themselves are not normally used as tasks.
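
A sketch of these two ISR roles, again assuming a vendor interrupt qualifier and using illustrative names:

```c
volatile unsigned long os_ticks = 0;   /* drives the scheduler  */
volatile int rx_event = 0;             /* event flag for a task */

interrupt void timer_overflow_isr(void)
{
    os_ticks++;        /* the clock tick: the scheduler's tick
                          processing would be invoked from here */
}

interrupt void uart_rx_isr(void)
{
    rx_event = 1;      /* signal the event; the scheduler then
                          moves the waiting task from blocked to
                          ready. The ISR itself is not a task.  */
}
```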


URL: https://www.sciencedirect.com/science/article/pii/B9781856177504100228