Monday, September 22, 2008

VLSI Technology

http://www.vlsitechnology.org/index.html

Saturday, September 20, 2008

Online Virus scanners


http://www.anti-trojan.net/at.asp?l=en&t=onlinecheck

http://www.pcpitstop.com/antivirus/default.asp

Digital Logic Gates

http://www.asic-world.com/digital/gates4.html

Temperature compensated integrated circuits

http://docs.google.com/fileview?id=F.dfe86f95-ca93-46e0-9aae-00d6cd41d42d

Performance of submicron CMOS devices and gates with substrate biasing

Xiaomei Liu and S. Mourad, "Performance of submicron CMOS devices and gates with substrate biasing," Proceedings of the 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), Geneva, vol. 4, 2000, pp. 9-12. Digital Object Identifier: 10.1109/ISCAS.2000.858675. Summary:

This paper reports the results of an extensive simulation study of the effect of body bias engineering on the performance of deep submicron technology circuits. Reverse body bias (RBB) is very useful in reducing a device's off-state leakage current and hence standby power, and this reduction becomes more effective as the temperature increases. Forward body bias (FBB) suppresses short channel effects and hence improves Vt roll-off and reduces gate delays; this improvement grows as the power supply voltage decreases. However, power dissipation and the power-delay product increase under this biasing condition. A good strategy is therefore to apply forward body bias on the critical path only, improving speed without a significant increase in power dissipation.

gate and wire delay simulation

http://tams-www.informatik.uni-hamburg.de/applets/hades/webdemos/12-gatedelay/10-delaydemo/gate-vs-wire-delay.html

Tuesday, September 16, 2008

Clock cross domains

http://www.wipo.int/pctdb/en/wo.jsp?wo=2003039061

edatechforum - Journals

http://www.edatechforum.com/journal/archives.cfm

Introducing new verification methods into a design flow: an industrial user's view


Verification has become one of the main bottlenecks in hardware and system design. Several verification languages, methods and tools addressing different issues in the process have been developed by EDA vendors in recent years. This paper takes the industrial user's point of view to explore the difficulties posed when introducing new verification methods into 'naturally grown' and well established design flows – taking into account application domain-specific requirements, constraints present in the existing design environment and economics. The approach extends the capabilities of an existing verification strategy with powerful new features while keeping in mind integration, reuse and applicability. Based on an industrial design example, the effectiveness and potential of the developed approach are shown.

Robert Lissel is a senior expert focusing on digital design in the Design Methodology Group of the Bosch Automotive Electronics Semiconductor and ICs Division. He holds a Master's degree in Electrical Engineering from Dresden University of Technology, Germany.

Joachim Gerlach is a member of the Design Methodology Group in the Bosch Automotive Electronics Semiconductor and ICs Division, responsible for research projects in system specification, verification and design. He holds a Ph.D. degree in Computer Science from the University of Tübingen, Germany.

Today, it is estimated that verification accounts for about 70% of the overall hardware and system design effort. Therefore, increasing verification efficiency can contribute significantly to reducing time-to-market. Against that background, a broad range of languages, methods and tools addressing several aspects of verification using different techniques has been developed by EDA vendors. It includes hardware verification languages such as SystemC [1, 2], SystemVerilog [3] and e [4] that address verification challenges more effectively than description languages such as VHDL and Verilog.

Strategies that use object-oriented mechanisms as well as assertion-based techniques built on top of simulation-based and formal verification enable the implementation of a more compact and reusable verification environment. However, introducing advanced verification methods into existing and well established industrial development processes presents several challenges. Those that require particularly careful attention from an industrial user's point of view include:

  • The specific requirements of individual target applications;
  • The reusability of available verification components;
  • Cost factors such as tool licenses and appropriate designer training.

This paper discusses how to address the challenges outlined above. With regard to the specific requirements of automotive electronics design, it identifies verification tasks that have high priority. For the example of the verification strategy built up at Bosch, the paper works through company-specific requirements and environmental constraints that required greatest consideration. Finally, the integration of appropriate new elements into our industrial design flow, with particular focus on their practical application, is described.

Figure 1. Verification landscape

Verification challenges

Recently, many tools and methods have been developed that address several aspects of verification using different techniques. In the area of digital hardware verification, metrics for the assessment of the status of verification as well as simulation-based and formal verification approaches have received most attention. Figure 1 is an overview of various approaches and their derived methods. Different design and verification languages and EDA solutions from different vendors occupy this verification landscape to differing degrees and in different parts.

Introducing new verification languages and methods into a well established design and verification flow requires more than pure technical discussion. Their acceptance by developers and the risks that arise from changing already efficient design processes must be considered – a smooth transition and an ability to reuse legacy verification code are essential.

Existing test cases contain much information on former design issues. Since most automotive designs are classified as safety critical, even a marginal probability of missing a bug because of the introduction of a new verification method is unacceptable. On the other hand, the reuse of legacy code should not result in one project requiring multiple testbench approaches. Legacy testcases should ideally be part of the new approach, and it should be possible to reuse and enhance these instead of requiring the writing of new ones.

A second important challenge lies in convincing designers to adopt new methods and languages. Designers are experienced and work efficiently with their established strategies. Losing this efficiency is a serious risk. Also, there is often no strict separation between design and verification engineers, so many developers can be affected when the verification method changes. Furthermore, new methods require training activities and this can represent a considerable overhead. Meanwhile, most projects must meet tight deadlines that will not allow for the trial and possible rejection of a new method.

To overcome those difficulties, it is important to carefully assess all requirements and to evaluate new approaches outside higher priority projects. One possibility is to introduce new methods as add-ons to an existing approach so that a new method or tool may improve the quality but never make it worse. In this light, the evolution of verification methodologies might be preferable to the introduction of completely new solutions.

Considering verification's technical aspects, automotive designs pose some interesting challenges. The variety of digital designs ranges from a few thousand gates to multimillion-gate systems-on-chip. Typical automotive ICs implement analog, digital and power on the same chip. The main focus for such mixed-signal designs is the overall verification of analog and digital behavior rather than a completely separate digital verification. However, the methodology's suitability for purely digital ICs (e.g., for car multimedia) must also be covered.

In practice, the functional characteristics of the design determine the most appropriate verification method. If the calculation of the expected behavior is 'expensive', directed tests may be the best solution. If there is an executable reference model available or if the expected test responses are easy to calculate, a random simulation may be preferable. Instead of defining hundreds of directed testcases, a better approach can be to randomize the input parameters with a set of constraints allowing only legal behavior to be generated. In addition, special directed testcases can be implemented by appropriately constraining the randomization. The design behavior is observed by a set of checkers. Functional coverage points are necessary to achieve visibility into what functionality has been checked. Observing functional coverage and manually adapting constraints to meet all coverage goals leads to coverage-driven verification (CDV) techniques. Automated approaches built on top of different verification languages [1-5] result in testbench automation (TBA) strategies.
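
As a simple illustration of this constraint random style, the SystemC/SCV [1, 2] sketch below randomizes two stimulus fields within legal ranges. The field names, widths and ranges are invented for illustration and are not taken from the Bosch environment.

  #include "scv.h"

  int sc_main(int argc, char* argv[]) {
    // Hypothetical stimulus fields - names, widths and legal ranges are
    // invented for illustration only.
    scv_smart_ptr<sc_uint<2> >  decimation_mode;
    scv_smart_ptr<sc_uint<16> > audio_sample;

    decimation_mode->keep_only(0, 2);      // constraint: generate legal modes only
    audio_sample->keep_only(0, 0xFFFF);    // full data range allowed

    for (int i = 0; i < 10; i++) {
      decimation_mode->next();             // draw the next random values
      audio_sample->next();
      // here the testbench modules would be driven with *decimation_mode and
      // *audio_sample, while checkers and coverage points observe the DUT
    }
    return 0;
  }

Tightening the keep_only() ranges, or assigning constants instead, turns the same testcase back into a directed one, which matches the strategy described above.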

A directed testbench approach may be most suitable for low complexity digital designs, particularly in cases where reference data is not available for the randomization of all parameters, or the given schedule does not allow for the implementation of a complex constraint random testbench. Furthermore, mixed-signal designs may require directed stimulation. Often a function is distributed over both analog and digital parts (e.g., an analog feedback loop to the digital part). Verifying the digital part separately makes no sense in this case. In fact, the interaction between analog and digital parts is error-prone. Thus, the integration of analog behavioral models is necessary in order to verify the whole function.

One technique that deals with this requirement maps an analog function to a VHDL behavioral description and simulates the whole design in a directed fashion. In other cases, the customer delivers reference data originating from a system simulation (e.g., one in Matlab [6]). Integrating that reference data within a directed testcase is mandatory. Since each directed testcase may be assigned to a set of features within the verification plan, the verification progress is visible even without functional coverage points. Hence, the implementation effort is less than for a constraint random and CDV approach up to a certain design complexity. Even so, for some parameters not affecting the expected behavior (e.g., protocol latencies), it makes sense to introduce randomization.

Formal verification techniques like property checking allow engineers to prove the validity of a design characteristic in a mathematically correct manner. In contrast to simulation-based techniques – which consider only specific paths of execution – formal techniques perform exhaustive exploration of the state space. On the other hand, formal techniques are usually very limited in circuit size and temporal depth. Therefore, formal and simulation-based techniques need to be combined carefully to optimize the overall verification result while keeping the verification effort to a minimum.

The solution is to apply different verification techniques where they best fit. Powerful metrics are needed to ensure sufficient visibility into the verification's progress and the contribution of each technique. The question is how to find the best practical solution within available time, money and manpower budgets rather than that which is simply the best in theory. The demands placed on verification methods range from mixed-signal simulation and simple directed testing to complex constraint random and formal verification as well as hardware/software integration tests. Nevertheless, a uniform verification method is desirable, to provide sufficient flexibility and satisfy all the needs of the automotive environment.

Verification strategies

To illustrate one response to the challenges defined above, this section shows how SystemC has been applied to enhance a company-internal VHDL-based directed testbench strategy. This approach allowed for the introduction of constraint random verification techniques as well as the reuse of existing testbench modules and testcases, providing the kind of smooth transition cited earlier.

Figure 2. VHDL testbench approach

VHDL-based testbench approach

As Figure 2 shows, the main element in our testbench strategy is to associate one testbench module (TM) or bus functional model with each design-under-test (DUT) interface. All those TMs are controlled by a single command file. Each TM provides commands specific to its individual DUT interface. Furthermore, there is a command loop process requesting the next command from the command file using a global testbench package. Thus, a ‘virtual interconnect layer’ is established. Structural interconnect is required only between the TMs and the DUT.

The command file is an ASCII file containing command lines for each TM as well as control flow and synchronization statements. With its unified structure, this testbench approach enables the easy reuse of existing TMs.

Figure 3 is an example of the command file syntax. Each line starts with a TM identifier (e.g., CLK, CFG), the ALL identifier for addressing global commands (e.g., SYNC), or a control flow statement. Command lines addressing TMs are followed by module-specific commands and optional parameters. Thus, line 1 addresses the clock generation module CLK. The command PERIOD is implemented within this clock generation module for setting the clock period and requires two parameters: value and time unit. Line 3 contains a synchronization command to the testbench package. The parameter list for this command specifies the modules to be synchronized (ALL for line 3; A2M and CFG for line 7). Since, in general, all TMs operate in parallel – and thus request and execute commands independently – it is important to synchronize them at dedicated points within the command file. When receiving a synchronization command, the specified TMs will stop requesting new commands until all of them have reached the synchronization point.

Figure 3. Command file example

Introducing a SystemC-based approach

The motivation for applying SystemC is to enhance the existing VHDL-based testbench approach. The original VHDL approach defined a sequence of commands, executed by several TMs, that described a testcase within a simple text file. This worked well enough, but usage showed that more flexibility within the command file was desirable. Besides, VHDL itself lacks the advanced verification features found in hardware verification languages (HVLs) such as e and SystemVerilog, or in SystemC together with the SystemC Verification Library (SCV).

Since the original concept had proved efficient, it was decided to extend the existing approach. In making this choice, it was concluded that a hardware description language like VHDL is not really suitable for the implementation of a testbench controller which has to parse and execute an external command file. So, SystemC was used instead because it provides the maximum flexibility, thanks to its C++ nature and the large variety of available libraries, especially the SCV. Using SystemC does require a mixed-language simulation – the DUT may still be implemented in VHDL, while the testbench moves towards SystemC – but commercial simulators are available to support this.

The implemented SystemC testbench controller covers the full functionality of the VHDL testbench package and additionally supports several extensions of the command file syntax, so existing command files remain fully compatible with the new approach. The new SystemC controller allows variables, arithmetic expressions, nested loops, control statements and random expressions to be defined within the command file. The intended effect of these features is that testcases run more efficiently and flexibly.

In general, the major part of testbench behavior should be implemented in VHDL or SystemC within the TMs. Thus, the strategy implements more complex module commands rather than very complicated command files. However, the SystemC approach does not only extend command syntax, it also provides static script checks, more meaningful error messages and debugging features.

Implementing the testbench controller in C++, following an object-oriented structure, makes the concept easier to use. A SystemC TM is inherited from a TM base class, so only module-specific features have to be implemented. For example, the VHDL-based approach requires the implementation of a command loop process for each TM in order to fetch the next command. This is not the case with SystemC, because the command thread is inherited from the base class – only the command functions have to be implemented. The implementation of features such as expression evaluation particularly shows the advantage of using C++ with its many libraries (e.g., the Spirit Library [7] is used to resolve arithmetic expressions within the command file).
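
The resulting class structure might look like the following sketch. The class names, command signature and dispatch details are hypothetical; the point is only that the command thread lives in the base class while a derived TM implements nothing but its module-specific commands.

  #include <systemc.h>
  #include <string>
  #include <vector>

  // Hypothetical base class: owns the command-fetch thread shared by all TMs.
  class tm_base : public sc_module {
  public:
    SC_HAS_PROCESS(tm_base);
    tm_base(sc_module_name name) : sc_module(name) {
      SC_THREAD(command_loop);            // inherited by every testbench module
    }
  protected:
    // A derived TM implements only its module-specific commands.
    virtual void execute(const std::string& cmd,
                         const std::vector<std::string>& params) = 0;
  private:
    void command_loop() {
      for (;;) {
        // fetch the next command line for this TM from the central command
        // file handling (parsing, control flow and synchronization), then
        // dispatch it via execute(cmd, params);
        wait(10, SC_NS);                  // placeholder pacing only
      }
    }
  };

  // Hypothetical clock-generation TM implementing, e.g., a PERIOD command.
  class clk_tm : public tm_base {
  public:
    clk_tm(sc_module_name name) : tm_base(name) {}
  protected:
    virtual void execute(const std::string& cmd,
                         const std::vector<std::string>& params) {
      if (cmd == "PERIOD") { /* set the clock period from params[0], params[1] */ }
    }
  };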

Another important and practical requirement is that existing VHDL-based TMs can be used unchanged. SystemC co-simulation wrappers need to be implemented and they are generated by using the fully automated transformation approach described in Oetjens, Gerlach & Rosenstiel [8]. All VHDL TMs are wrapped in SystemC and a new SystemC testbench top-level is built automatically. This allows the user to take advantage of the new command file syntax without re-implementing any TM, and the introduction of randomization within the command file means existing testcases can be enhanced with minimal effort.

Figure 4. SystemC testbench approach

Figure 5. Decimation filter

Figure 4 shows a testbench environment that includes both VHDL and SystemC TMs. As a first step, legacy TMs are retained, as is shown for TM1, TM2 and TM4. Some TMs, like TM3, may be replaced by more powerful SystemC modules later. SystemC modules allow the easy integration of C/C++ functions. Moreover, the TMs provide the interface handling and correct timing for connecting a piece of software.

Design example

Some extended and new verification features were applied to our SystemC-based testbench approach for a specific industrial design, a configurable decimation filter from a Bosch car infotainment application. The filter is used to reduce the sampling frequency of audio data streams, and consists of two independent filter cores. The first can reduce the input sample frequency of one stereo channel by a factor of three, while the second can either work on two stereo channels with a decimation factor of two or one stereo channel with a decimation factor of four. The filter module possesses two interfaces with handshake protocols: one for audio data transmission and the other for accessing the configuration registers.

The original verification environment was implemented in VHDL, based on the legacy testbench concept described in “Verification strategies.” Besides a clock generation module, two testbench modules for accessing both the data transmission and the configuration interface were required. To fulfill the verification plan, a set of directed testcases (command files) was created.

Figure 5 shows the top-level architecture embedded within a SystemC-based testbench. The example demonstrates the smooth transition towards our SystemC-based testbench approach as well as the application of constraint random and coverage-driven verification techniques. This approach also proved flexible enough to offer efficient hardware-software co-verification.

Constraint random verification

The randomization mechanisms of the SystemC-based testbench were extensively used, and the associated regression tests uncovered some interesting corner cases. As a first step, the existing VHDL TMs were implemented in SystemC. No significant difficulties were encountered nor was any extra implementation time required. To check compliance with the legacy VHDL approach, all existing testcases were re-simulated. Since reference audio data was available for all the filter configurations, a random simulation could be implemented quickly with randomization techniques applied to both the TMs and the command file. The command file was split into a main file containing the general function and an include file holding randomized variable assignments. The main command file consisted of a loop which applied all randomized variables from the include file to reconfigure and run the filter for a dedicated time.

Figure 6. Constraint include file

Figure 6 illustrates an excerpt from the include file. Line 24 describes the load scenario at the audio data interface. The variable #rand_load was applied as a parameter to a command of module A2M later within the main command file. A directed test was enforced by assigning constant values instead of randomized items. Hence, the required tests in the verification plan could be implemented more efficiently as constraint include files. After the verification plan had been fulfilled all parameters were randomized for the running of overnight regressions and identification of corner cases.

Coverage-driven verification

Coverage metrics are required to monitor the verification’s progress, especially for random verification. Analyzing the code coverage is necessary but not in itself sufficient.

For this example, a set of functional coverage points was implemented using PSL [5]. Since PSL does not support cover groups and cross coverage, a Perl [9] script was written to generate those cross coverage points. Implementing coverage points required considerable effort, but as a result of that work some verification 'holes' in our VHDL-directed testbench were identified. Considering the fully randomized testcase, all coverage points will eventually be covered. In order to meet the coverage goals faster and thus reduce simulation time, a more efficient approach defines multiple randomized testcases using stronger constraints.

Extending our approach towards TBA techniques meant replacing the manual adaptation of constraints with an automatic adaptation driven by the measured coverage results. This required the dependencies between constraints and coverage items to be defined manually. Such a testbench hits all desired coverage points automatically; the disadvantage is the considerable effort needed to define the constraints, coverage items and their dependencies.

Nevertheless, a methodology based on our SystemC testbench and PSL was created. First, access to our coverage points was required. Therefore, coverage points were assigned to VHDL signals that could be observed from SystemC. Then, dependencies were identified between the coverage results and constraints within either the command file or a SystemC testbench module. To automate this step, improvements were made to the Perl script. Thus, a CDV testbench module was generated that either passed coverage information to the command file or could be extended for the adoption of constraints in SystemC.

HW/SW co-simulation

In the target application, the decimation filter is embedded within an SoC and controlled by a processor. To set up a system-level simulation, a vendor-specific processor model was given in C and Verilog. Hence, the compiled and assembled target application software, implemented in C, could be executed as binary code on the given processor model. However, for this co-simulation, simulation performance decreased notably, although the actual behavior of the processor model was not relevant in this case.

The application C code consisted of a main function and several interrupt service routines. Control of the audio processing module (the decimation filter) was achieved by accessing memory-mapped registers. Thus, the processor performed read and write accesses via its hardware interface. To overcome the performance limitations, the processor model was omitted and the C code connected directly to a testbench module, as illustrated in Figure 5.

Due to its C++ nature, the SystemC-based testbench approach offered a smart solution. The intention was to map the TMs' read and write functions to register accesses within the application C code. Therefore, the existing register definitions were re-implemented using an object-oriented technique. This allowed overloading of the assignment and implicit cast operators for those registers. Hence, reading a register, and thus applying the implicit cast, resulted in a read command being executed by the TM. Similarly, assigning a value to a register resulted in a write command being executed by the testbench module.
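
A minimal sketch of this operator-overloading technique is shown below. The tm_read()/tm_write() hooks, register names and addresses are placeholders for the real testbench module interface and register map.

  #include <cstdint>
  #include <cstdio>

  // Placeholder hooks: in the real environment these issue read and write
  // commands to the configuration testbench module.
  static uint32_t tm_read(uint32_t addr) {
    std::printf("TM read  0x%04x\n", (unsigned)addr);
    return 0x1;                                  // dummy value for this sketch
  }
  static void tm_write(uint32_t addr, uint32_t value) {
    std::printf("TM write 0x%04x = 0x%x\n", (unsigned)addr, (unsigned)value);
  }

  // A memory-mapped register whose accesses are redirected to the TM.
  class mapped_reg {
    uint32_t addr;
  public:
    explicit mapped_reg(uint32_t a) : addr(a) {}
    operator uint32_t() const { return tm_read(addr); }                      // implicit cast = read command
    mapped_reg& operator=(uint32_t v) { tm_write(addr, v); return *this; }   // assignment = write command
  };

  // Hypothetical register map; unchanged application C code then uses these
  // objects exactly as it would use the real memory-mapped registers.
  static mapped_reg FILTER_CTRL(0x0000);
  static mapped_reg FILTER_STATUS(0x0004);

  void configure_filter(void) {
    FILTER_CTRL = 0x3;                           // becomes a TM write command
    while ((FILTER_STATUS & 0x1) == 0) { }       // each read becomes a TM read command
  }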

Finally, a mechanism was required to initiate the execution of the main and interrupt functions from the application C-code. Therefore, module commands to initiate those C-functions were implemented.

Hence, control and synchronization over the execution of those functions was available within our command file. This was essential to control the audio TM, which is required to transmit and receive audio data with respect to the current configuration. To execute the interrupt functions, the interrupt mechanism in our testbench concept was used.

Conclusions

Taking a company-internal VHDL-based testbench approach as an example, a smooth transition path towards advanced verification techniques based on SystemC can be demonstrated. The approach allows the reuse of existing verification components and testcases. Therefore, there is some guarantee that ongoing projects will benefit from new techniques without risking the loss of design efficiency or quality. This maximizes acceptance of these new techniques among developers, which is essential for their successful introduction.

Acknowledgements

This work was partially funded by the German BMBF (Bundesministerium für Bildung und Forschung) under grant 01M3078. This paper is abridged from the version originally presented at the 2007 Design Automation and Test in Europe conference in Nice.

References

  1. Open SystemC Initiative (OSCI), SystemC 2.1 Library, www.systemc.org
  2. Open SystemC Initiative (OSCI), SystemC Verification Library 1.0, www.systemc.org
  3. IEEE Std 1800-2005, IEEE Standard for SystemVerilog – Unified Hardware Design, Specification, and Verification Language
  4. IEEE Std 1647-2006, IEEE Standard for the Functional Verification Language ‘e’
  5. IEEE Std 1850-2005, IEEE Standard for Property Specification Language (PSL)
  6. The MathWorks homepage, www.mathworks.com
  7. Spirit Library, spirit.sourceforge.net (NB: no ‘www’)
  8. J.H. Oetjens, J. Gerlach, W. Rosenstiel, "An XML Based Approach for Flexible Representation and Transformation of System Descriptions", Forum on Specification & Design Languages (FDL) 2004, Lille, France.
  9. Wall, Larry, et al., Programming Perl (Second Edition), O'Reilly & Associates, Sebastopol, CA, 1996.
  10. IEEE Std 1076.3-1997, IEEE standard for VHDL synthesis packages

Asynchronous clocks prove tough for verification

For simulation to correctly predict silicon behavior, the logic implementing a design should adhere to the setup and hold constraints specified for clocked elements. However, with multiple asynchronous clocks on a single chip driving logic, designers cannot help but violate setup and hold constraints. This causes metastability, which in its turn leads to non-deterministic delays through synchronizers. For these types of designs it is critical that a designer has the tools to accurately simulate these non-deterministic effects while performing their functional verification.

Therefore, for designs that have asynchronous clocks, the traditional verification flow should be augmented with a comprehensive clock-domain crossing (CDC) verification solution that addresses the following:

  1. The complete identification of all clocks and CDC signals, and verification that the correct synchronizers are in place.
  2. The designer's ability to verify whether the design correctly implements the CDC protocols that ensure uncorrupted data transfer between clock domains.
  3. The means to augment the simulation with behavioral metastability models (BMMs) to account for the non-determinism introduced by metastability and thereby accurately model silicon behavior.

Rindert Schutten is the product marketing manager for Mentor Graphics' 0-In verification products and has more than 20 years of software development and EDA industry experience.

The need to minimize power usage and maximize performance is one of the main forces driving the use of multiple clock domains in SoCs. Not only do the functional blocks in today's wireless, portable and multi-functional electronic devices operate on different frequencies, but in many cases they will also be dynamically switched on and off by software to reduce overall power consumption. Many of these chips have 5, 10, sometimes even 20 or more, independent asynchronous clock domains.

Digital simulation is one of the main methods designers use to acquire the confidence that chips have been designed correctly. It relies on abstract behavioral models of sequential and combinational circuits to describe the functionality of the design. When combined with static timing analysis that ensures a required functionality is achieved within the available timing budget, simulation has proven to be an excellent predictor of silicon behavior – that is, as long as the design is driven by a single master clock.

When this is not the case, simulation falls short because one of its fundamental rules is violated. The rule states that a design should adhere to the setup and hold constraints specified for clocked elements. For synchronous designs, this is exactly what static timing analysis verifies – that, given a particular clock frequency, the setup and hold constraints are met. Only when a design passes timing can one rely on the simulation results. However, when multiple asynchronous clocks drive chip logic, designers cannot help but violate this basic design rule.

Any time data is transferred between asynchronous clock domains, the clock-domain crossing (CDC) signals carrying this data will at some point violate the setup and hold constraints specified for the receiving registers. When this happens, the flip-flops in these registers may become metastable – they will not settle to either a logical '1' or '0' within the specified delay for normal operation.

To prevent metastable signals from propagating through the design, designers have devised specific circuits, called synchronizers, to connect asynchronous clock domains. However, synchronizers do not eliminate the occurrence of metastability; rather, they reduce to almost zero the probability that metastable values will contaminate the rest of the design.

Figure 1. When asynchronous clocks A and B are closely aligned, the first flip-flop in the 2DFF synchronizer can go metastable. The synchronizer confines the metastable signal but has a non-deterministic delay.

Figure 2. By adding a BMM to a synchronizer, the simulation will correctly reflect the non-deterministic delays of synchronizers.

With synchronizers connecting clock domains, the occurrence of metastability is observed as a non-deterministic delay. For example, in silicon, the delay through a common dual flip-flop (2DFF) synchronizer can be one, two, or three cycles (Figure 1). Simulation, a deterministic process, always produces a two-cycle delay.

Simulating synchronizers correctly

So, we have seen that simulation falls short because it does not model the behavior of synchronizers correctly. To correctly simulate the behavior of a synchronizer (with respect to silicon), simulation must show the non-deterministic behavior of the synchronizer. When the setup constraint for a synchronizer is violated, simulation should randomly add a cycle delay, and when the hold constraint is violated, it should randomly subtract a cycle delay. Only in this way will it correctly reflect silicon behavior.

To accomplish this, designers could create separate primitive behavioral models for each type of synchronizer. This, however, is not a practical solution because designers use many different kinds of synchronizers. A more practical and fundamental solution is to add behavioral metastability models (BMM) to the synchronizers (Figure 2).

A BMM consists of two parts: a checker that recognizes when setup or hold constraints are violated, and a driver that randomly overrides the output of the original synchronizer (modifying the delay through the synchronizer). A BMM can be implemented automatically using the standard sequential and combinational primitives that the simulator already supports. Only when BMMs are added to a synchronizer does one have an accurate functional model of the silicon.
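
As an illustration of the principle only – not of Mentor's actual BMM implementation – the SystemC sketch below wraps a 2DFF synchronizer and randomly lets the first stage miss an input change, so that the observable delay through the synchronizer varies as described above.

  #include <systemc.h>
  #include <cstdlib>

  // Illustration only: a 2DFF synchronizer whose delay is randomized whenever
  // the asynchronous input changes right at the receiving clock edge.
  SC_MODULE(sync_2dff_bmm) {
    sc_in<bool>  clk_rx;    // receiving-domain clock
    sc_in<bool>  d;         // CDC signal from the transmitting domain
    sc_out<bool> q;

    bool ff1, ff2;

    void clocked() {
      bool sample = d.read();
      // BMM driver: if the input has just changed, a real first flip-flop may
      // or may not capture it (metastability) - decide randomly. The dual
      // case, subtracting a cycle on a hold violation, is handled analogously.
      if (sample != ff1 && (std::rand() & 1)) {
        sample = ff1;       // pretend the first stage missed the change
      }
      ff2 = ff1;            // second stage confines the metastability
      ff1 = sample;
      q.write(ff2);
    }

    SC_CTOR(sync_2dff_bmm) : ff1(false), ff2(false) {
      SC_METHOD(clocked);
      sensitive << clk_rx.pos();
      dont_initialize();
    }
  };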

The complete solution

Manually adding BMMs to a design is labor-intensive and error-prone. Moreover, changing a design solely for the sake of making it simulate correctly is generally not allowed. Thus, we require an automated solution that correctly deals with asynchronous clock domains.

Since all CDC signals need a synchronizer, the first part of a complete solution consists of extensive structural or static analysis of the RTL code to identify all CDC signals and ensure that the proper synchronizers are in place. Any complete solution should also automatically identify the various clock domains, map the clock distribution strategy, and recognize the wide range of synchronizer structures designers use to connect clock domains.

Once it is determined that the right synchronizers are in place, designers must verify that data is transferred correctly across them. For most types of synchronizers, the design has to adhere to a particular protocol, generally referred to as a CDC protocol. For example, if a value change has to be transferred from a faster to a slower domain, the signal must be kept stable long enough (in terms of the slower clock) for it to propagate through the synchronizer even when metastability conditions are present. The best approach is to automatically generate these protocols as assertions when running static analysis. Since CDC assertions typically specify properties for the logic in the originating clock domain, traditional simulation and formal analysis are very effective at verifying that the design obeys CDC protocols.
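
In practice these protocol checks are generated as assertions (e.g., in PSL or SVA); purely to illustrate the stability rule itself, the SystemC monitor below flags a CDC signal that changes again before a hypothetical minimum number of receiving-clock cycles has elapsed.

  #include <systemc.h>

  // Illustrative stability monitor: once the CDC signal changes, it must stay
  // stable for at least MIN_STABLE receiving-clock cycles (hypothetical value).
  SC_MODULE(cdc_stability_check) {
    sc_in<bool> clk_rx;     // receiving (slower) clock
    sc_in<bool> tx_data;    // CDC signal driven from the faster domain

    static const int MIN_STABLE = 2;
    bool last;
    int  stable_cycles;

    void check() {
      bool now = tx_data.read();
      if (now != last) {
        if (stable_cycles < MIN_STABLE) {
          SC_REPORT_ERROR("CDC", "signal changed before it could be synchronized");
        }
        stable_cycles = 0;
      } else if (stable_cycles < MIN_STABLE) {
        stable_cycles++;
      }
      last = now;
    }

    SC_CTOR(cdc_stability_check) : last(false), stable_cycles(MIN_STABLE) {
      SC_METHOD(check);
      sensitive << clk_rx.pos();
      dont_initialize();
    }
  };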

The last step is to verify that the non-deterministic delays of synchronizers are handled correctly by the design. To obtain a simulation that truly reflects silicon behavior, the designer needs to instrument the design with the BMMs for selected synchronizers. Mentor's 0-In CDC verification solution automates this process.

First, the tool analyzes the RTL code and determines the set of synchronizers for which the non-deterministic delays could cause a problem. This is referred to as ‘structural reconvergence’ analysis.

Figure 3. The 0-In CDC verification solution automatically determines where non-determinism in synchronizers could cause a problem in the design.

Figure 4. Verification flow for designs with asynchronous clocks: while developing the RTL code, engineers run the CDC analysis and generate the BMMs that are subsequently included in simulation.

In Figure 3, for example, two synchronizers between domain A and B are clocked by the same clock (B), and both signals Tx1 and Tx2 are clocked by a different clock (A). Since regular simulation is deterministic, both Tx1 and Tx2 will simultaneously either violate the setup constraint or the hold constraint. In other words, during regular simulation both synchronizers will always simulate with the same delay. In silicon, however, the delays through the two synchronizers are independent and, therefore, can be different. The only way to verify that the logic in domain B deals with these non-deterministic delays correctly is to augment the synchronizers between domain A and B with BMMs.

The 0-In CDC automatically recognizes these scenarios. For the scenario shown in Figure 3, it will generate the BMMs for the synchronizers between the A and B clock domains, and will also determine that regular simulation will suffice for the lone connections between the A and C clock domains and the B and C clock domains.

The general rule to determine whether BMMs are needed is:

When two or more pieces of data have a well-defined timing relationship in clock domain A, and are moved through separate synchronizers to clock domain B, that timing relationship can no longer be relied upon in domain B.

Using this rule, 0-In CDC locates all parts of the design where reconvergence issues may be present.

The second phase of reconvergence verification is then to add the BMMs to the simulation. This is automated in 0-In CDC as well.

Furthermore, the tool automatically collects coverage data on the BMMs. This is very helpful because it outlines exactly where and when metastability conditions occurred during a simulation and whether a BMM modified the original simulated delay of the synchronizer.

Verification flow for designs using asynchronous clocks

With many of today's wireless, multimedia, computing, and communications designs using asynchronous clocks to optimize power and performance, leading companies have started to integrate CDC verification as an integral component of their verification flows. The complete solution described above fits easily into any existing verification flow. The same testbenches can be reused, only now they use an accurate model of the design-under-test. The coverage information collected on the CDC protocol assertions and BMMs further guides the verification flow as it pinpoints where the effects of metastability have been simulated and directs test development to ensure full functional coverage of the design. Applying a comprehensive and effective verification methodology (Figure 4) for designs that have asynchronous clocks is critical to minimize the verification overhead while maximizing the identification of CDC problems. Key components should include:

  • Block Design. During block design, CDC static analysis should be run whenever major code changes are made. This identifies structural problems and potential block-level reconvergence issues prior to the availability of a testbench.
  • Block Verification. The generated CDC protocol assertions and BMMs should be included during block-level verification to ensure that CDC protocols are followed and to ensure that the silicon behavior of the block is modeled correctly.
  • Chip Integration. Static analysis should be re-run during chip integration to check CDC signals created when multiple blocks are integrated. This may lead to new CDC protocol assertions and chip-level reconvergence issues.
  • Chip Verification. Engineers usually build their regression suites during chip-level verification. Both the CDC protocol assertions and BMMs should be included in those suites. Coverage should be checked and, if necessary, additional tests created to fill any coverage holes.

Conclusion

With respins costing millions of dollars in engineering cost alone, not to mention the cost of delayed product introduction, you simply cannot afford to miss bugs. Therefore, correctly and completely modeling metastability is a must for any design that uses asynchronous clocks. When equipped with tools that can automatically complement existing simulations with behavioral metastability models, engineers are able to accurately simulate silicon behavior and achieve the highest level of confidence that the chip will work before it is sent to the foundry.

Mentor Graphics
Corporate Office
8005 SW Boeckman Rd
Wilsonville
OR 97070
USA
T: 1 800 547 3000
W: www.mentor.com

Using a 'divide and conquer' approach to system verification

Today’s increasingly complex designs typically need to undergo verification at three different levels: block, interconnect and system. There are now well-established strategies for addressing the first two, but the system level, while in many ways the ultimate test, remains the weakest link in the verification process.

System verification normally begins only after a prototype board or SoC hardware is available, at a time when any bugs that are uncovered are also extremely difficult to fix. This flow also delays a major part of the software debug process until late in a project's life: the performance limitations of RTL simulations make it impossible to run any meaningful amount of software on them.

This article describes an alternative ‘divide and conquer’ strategy, also illustrated in a case study. This breaks down the system to enable the use of multiple hardware modeling approaches and takes advantage of readily available software layers. Using portions of the actual software to drive the hardware deepens the verification process for both elements of a design. The segmentation is also achieved with regard to time and resource budgets.

John Willoughby is director of marketing at Carbon Design Systems. Prior to joining Carbon, he held other marketing roles at Cadence Design Systems, Synopsys, and Viewlogic. He has a BSEE from Worcester Polytechnic Institute.

With the increased prevalence of system-on-chip (SoC) designs, the functionality of the system depends on both the hardware component and the increasingly significant software component. Verifying the hardware alone will not tell a project team if the design will work; only by running actual software on the design will the team know that it is, in fact, meeting the specification. Finding bugs or missing functionality after a chip has been fabricated can often be fixed with software, but at the expense of reduced system performance or dropped features.

Levels of verification

Figure 1. Design verification stages

Verification typically takes place at three different levels: block, interconnect and system (Figure 1).

At the block level, specific functions are being created and verified, typically with synthesizable register transfer level (RTL) code written in Verilog or VHDL. Assertions are increasingly being used to describe expected behavior. Both formal verification (primarily of the assertions) and dynamic simulation are commonly used to verify block-level functionality. Testing at this level is primarily white-box: both the function of the block at its primary interface ports and the behavior of its internal logic are tested explicitly. Performance of the verification tools is not typically an issue because the scope of the verification effort is relatively small.

After the blocks have been assembled to create a model of the complete hardware design, the next verification task is to determine whether the correct blocks were used and if they were connected correctly. Bus- and interface-level testbenches are normally utilized to verify the design at this level. Bus and interface monitors – usually created as a set of assertions – are used to validate the correct I/O behavior of each block. Formal analysis, particularly of the bus implementation, is still used but this technique becomes less effective as complexity increases. The workhorse for verification at this level is still simulation, and directed random testing techniques are employed to create an exhaustive series of I/O, bus or memory transaction sequences. Performance becomes a challenge here, but most test suites consist of many smaller tests that can be run in parallel on simulation farms.

The third level of testing is to add the software to the hardware model and create a model of the entire system. This is the ultimate test of the project and, no matter how much verification is performed specifically on the hardware, the project team has no way of knowing that the system works until the actual software is proven to run on it. System-level testing has presented a variety of challenges for project teams and remains the weakest link in the verification process for a number of reasons.

System validation challenges

Block- and interconnect-level testing is well understood and established methodologies exist. System-level testing strategies also exist, but these are typically placed in the flow after an expensive prototype board has been developed or when an actual SoC is available. Obviously, hardware bugs discovered at this stage can be difficult to debug and painful to fix. This also means that software debug does not start until the physical prototype is available.

Running software on the simulated RTL code is not usually possible because the simulation performance of an entire SoC modeled in RTL precludes the possibility of running any meaningful amount of software on the model.

One workaround is to develop high-level system models that can be used to debug the software. This approach can provide the performance required to run some level of software, but results in duplicate model development and adds risk to the project because the abstract models may not match the actual design.

A third path that designers are starting to follow, and one that will be highlighted in the following case study, is based on a ‘divide and conquer’ approach to system validation. It segments the system to utilize multiple hardware modeling approaches and takes advantage of readily available software layers. Using portions of the actual software to drive the hardware increases the verification of both components since neither can be fully tested in the absence of the other.

Case study

Figure 2. System model layers

Figure 3. Hardware/software and system testbench configuration

Figure 4. Simplified block diagram

The project team’s goal in this example was to perform system testing to validate some combination of hardware and software prior to the development of the first physical prototype. Instead of creating a physical prototype, the project team created a virtual system prototype (VSP) to perform hardware/software co-verification. Put simply, the team ran the software on the model of the hardware. Because the VSP had lower performance than the actual physical prototype, the team did not plan to run the entire software suite on the VSP, but rather to run lower-level software layers starting initially with the hardware abstraction layer (HAL) (Figure 2).

The HAL is the very bottom layer of software and provides the basic routines that interface with and control the actual hardware. Its purpose is to separate the application software from machine-specific implementations to make the software more portable across multiple platforms. The HAL exports an application programming interface (API) to the higher level system functions and serves to handle hardware-dependent tasks, including:

  • processor and device initialization;
  • cache and memory control;
  • device support (addressing, interrupt control, DMA operations, etc.);
  • timing and interrupt control;
  • hardware error handling.

By utilizing HAL routines to drive the hardware model, the team in this project was able to validate the most important part of the hardware/software interaction: the part where the two domains interact directly.

In this model, they replaced the higher software layers with a system-test environment serving as the testbench (Figure 3). Of course, the testbench also needed to interact directly with the hardware, particularly when providing ‘stub’ interfaces to parts of the system that were not included in the model. In this case, the project team used SystemC to create the system-level testbench, although any high-level programming language could have been used.

The use of HAL routines provided the additional benefit of allowing the verification engineer to create system tests at a higher level of abstraction and raised productivity during test generation. Directed random techniques can be used at this level, and the combination provides a powerful environment to create system tests.

Transaction-level interfaces allow project teams to write transaction calls instead of manipulating individual signals, and each transaction call often results in many individual signal transitions over a period of many clock cycles. In the same way, moving up to the HAL API level allows a greater level of abstraction where a single HAL API call may result in a series of transactions.
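
The fragment below sketches that layering; the HAL routine, the transactor interface and the register offsets are all hypothetical, and simply show a single API call fanning out into several bus transactions.

  #include <cstdint>

  // Hypothetical transaction-level interface exported by the hardware model;
  // stubbed with a small register array so the sketch is self-contained.
  struct bus_transactor {
    uint32_t regs[256] = {0};
    void write(uint32_t addr, uint32_t data) { regs[(addr >> 2) & 0xFF] = data; }
    uint32_t read(uint32_t addr) { return regs[(addr >> 2) & 0xFF]; }
  };

  // Hypothetical HAL routine: one API call expands into a series of bus
  // transactions against an invented register map.
  void hal_decoder_start(bus_transactor& bus, uint32_t buffer, uint32_t length) {
    bus.write(0x1000, buffer);      // program the DMA source address
    bus.write(0x1004, length);      // program the transfer length
    bus.write(0x1008, 0x1);         // start the block
  }

  uint32_t hal_decoder_busy(bus_transactor& bus) {
    return bus.read(0x100C) & 0x1;  // a single call, a single read transaction
  }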

Evening the performance

On this project, the hardware model represented something of a performance bottleneck, although a number of techniques could be deployed that helped overcome the problem.

The application included an embedded network interface, an MPEG decoder block and audio processor, various control functions such as bus arbiters and interrupt controllers, the CPU, memory controller, a USB interface, and the memory itself (Figure 4). The CPU was a pre-verified ARM core, which meant that the team did not need to re-verify that it worked. The processor model was replaced by an instruction set simulator (ISS) that modeled the CPU behavior in an efficient way at cycle level.

The other major blocks of the system were compiled using Carbon Design Systems' Model Studio to produce cycle-accurate compiled models from the 'golden RTL'. These models included a number of optimizations that provided performance enhancements over the original RTL versions.

The platform development environment was able to take advantage of Carbon’s OnDemand feature to improve performance.

OnDemand monitors the activity in the compiled model itself by tracking inputs and storage elements in the model and identifying inactive or repeating behavior. When it detects that the model is inactive, it 'shuts off' the model.

Figure 5. Block diagram showing inactive blocks.

Figure 6. Block diagram with Replay

By eliminating the activity in the model until it becomes necessary, the platform execution speed improves. During this particular project, each of the major functional blocks was compiled independently and was automatically disabled when not being used. For example, when the CPU was receiving and processing data from the network interface, the MPEG, USB and audio blocks were often all inactive and could be turned off (Figure 5).

The models in the platform also made use of Carbon's Replay technology to accelerate the software debug cycle. The first time software was run, the model captured the behavior at the primary I/O points and stored it. When the same software was re-run, the captured outputs were used instead of executing the model itself. This continued until the first point where input signals were different from what had been recorded, indicating changes in the software or design. At this point, the execution of the model began and took over from the replayed responses. Figure 6 illustrates how the bus interface for the USB block was held inactive and the stimulus/response information replayed from the previously captured data.

Results

By adopting an intelligent approach to segmenting system verification objectives, the project team was able to reduce project risk and accelerate the development schedule. They used a VSP to debug HAL software routines and hardware before committing to an actual hardware prototype. This approach allowed for more tests within the time budget, and the additional tests were of higher quality because they used the actual system software. While the higher-level software was not simulated, the software that did integrate directly with the hardware was verified, and that provided the most important view into the combined hardware/software model.

Not every project has a specific low-level software layer such as HAL. But with software development planning and careful segmentation of the software into hardware-specific and hardware-independent layers, many project teams can create a design that supports hardware/software co-verification and also increases software portability.

By taking advantage of performance techniques for the hardware model, including the use of cycle-level compiled models derived from the golden RTL, the project team in the case study was able to improve the performance of the entire hardware model. The end result was shorter project development time because of the early start for the software debug. The process also reduced risk by increasing both the amount and quality of verification performed on the hardware before any commitment had to be made to a physical prototype.

Carbon Design Systems
375 Totten Pond Road
Suite 100/200
Waltham
MA 02451-2025
USA
T: 1 781 890 1500
W: www.carbondesignsystems.com

Using Open Virtual Platforms to build, simulate and debug multiprocessor SoCs

The Open Virtual Platforms (OVP) initiative aims to help resolve the difficulties that arise today when modeling multicore systems-on-chip (SoC) so that designers can perform early and timely test of the embedded software that will run on the end devices.

As architects continue to add more cores to meet hardware design goals, the complexity of embedded software continues to increase exponentially because of factors such as amplified software concurrency and shared on-chip resource bottlenecks.

The OVP-based platform enables early software test through powerful simulations that execute at hundreds of MIPS, that are aimed specifically at the challenge posed by multicore, and that incorporate appropriate application programming interfaces (APIs) for the modeling of processors, components and platforms.

This paper reviews the construction of a multicore SoC platform, describes how to simulate the platform and how to connect it to a debugger.

Duncan Graham is senior corporate applications engineer at Imperas. Graham has a Bachelor's degree in Electrical Engineering and Electronics from Brunel University.

The Open Virtual Platforms (OVP) initiative addresses problems embedded software developers face when modeling the system-on-chip (SoC) that will host their program. These range from modeling architectural complexity to a lack of open resources for building platforms, to insufficient simulation speeds for software verification.

Embedded software programming issues introduced by the move to multicore processing are now the most significant problems facing SoC delivery. As architects add more cores, embedded software complexity increases exponentially because of amplified software concurrency and shared on-chip resource bottlenecks.

One answer is to comprehensively test software early in the design flow on a simulation that can handle SoC complexity and deliver the performance to verify billions of operational ‘cycles’. The solution must permit model interoperability and the use of legacy models to reduce integration risks and costs. End-users, tool and intellectual property (IP) developers, and service providers must be able to contribute to the platform development infrastructure.

The OVP-based platform satisfies these criteria by enabling software simulations that execute at hundreds of MIPS. It handles multicore architectures, and has a robust set of application programming interfaces (APIs) for the easy modeling of processors, components and platforms. An open source modeling approach enables the community to drive further technology development and leverage existing work.

This paper details how to build a multicore SoC platform, and describes how to simulate the platform and connect it to a debugger. The development of specific processor, behavioral and peripheral models with the OVP APIs is left to another paper.

Specifically, this paper describes using the Innovative CPUManager (ICM) API to implement simulation models of platforms that contain any number of processor models communicating with shared memory. Platforms created using the ICM interface can be simulated using either the free OVPsim simulation environment or a commercial product available from Imperas. OVPsim (available to download at www.OVPworld.org) is a dynamically linked library implementing Imperas simulation technology. It contains implementations of all the ICM interface functions described in this article. These functions enable the instantiation, interconnection and simulation of complex multiprocessor platforms containing arbitrary local and shared memory topologies.

For Windows environments, MinGW (www.mingw.org) and MSYS should be used. At Imperas, we currently use gcc version 3.4.5 with MinGW runtime version 3.14 for Windows. OVPsim is currently available only on Windows XP.

The examples in this paper use the OR1K processor model and tool chains, also available at www.OVPworld.org.

Single processor platform

A simple program that runs a single-processor platform can be written using just five routines from the ICM API (a minimal sketch combining them follows the list):

  • icmInit This routine initializes the simulation environment prior to a simulation run. It should always be the first ICM routine called in any application. It specifies attributes to control some aspects of the simulation to be performed, and also specifies how a debugger should be connected to the application, if required.
  • icmNewProcessor This routine is used to create a new processor instance.
  • icmLoadProcessorMemory Once a processor has been instantiated by icmNewProcessor, this routine is used to load an Executable and Linking format (ELF) file into the processor memory.
  • icmSimulatePlatform This routine is used to run a simulation of the processor and program for a specified duration.
  • icmTerminate At the end of simulation, this routine should be called to perform cleanup and delete all allocated simulation data structures.
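Taken together, these five routines are enough for a complete platform. The following is a minimal sketch rather than the article’s actual source: it assumes an OVPsim/ICM header named icm/icmCpuManager.h, and it takes TYPE_NAME, MORPHER_FILE, MORPHER_SYMBOL, SEMIHOST_FILE and SEMIHOST_SYMBOL to be macros supplied by the platform makefile, as described later in this article.

/* Minimal single-processor ICM platform (hedged sketch) */
#include "icm/icmCpuManager.h"     /* assumed OVPsim/ICM header name */

int main(int argc, char **argv) {
    if (argc < 2) {
        return 1;                  /* expects the application ELF as argv[1] */
    }

    /* 1. initialize the simulation environment; no debugger attached */
    icmInit(0, 0, 0);

    /* 2. create one processor instance with a 32-bit address space */
    icmProcessorP processor = icmNewProcessor(
        "cpu1",            // CPU name
        TYPE_NAME,         // CPU type, e.g. "or1k", from the makefile
        0,                 // CPU cpuId
        0,                 // CPU model flags
        32,                // address bits
        MORPHER_FILE,      // model file
        MORPHER_SYMBOL,    // morpher attributes
        0,                 // CPU attributes
        0,                 // user-defined attributes
        SEMIHOST_FILE,     // semi-hosting file
        SEMIHOST_SYMBOL    // semi-hosting attributes
    );

    /* 3. load the application ELF file named on the command line */
    icmLoadProcessorMemory(processor, argv[1], False, False);

    /* 4. run the platform to completion */
    icmSimulatePlatform();

    /* 5. clean up all simulation data structures */
    icmTerminate();
    return 0;
}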

The example uses the OR1K processor. This can be found online at http://www.opencores.org/projects.cgi/web/or1k/architecture. The test platform source constructs a simple single-processor platform in the main function, as shown in Figure 1.

Figure 1. Test platform construct of a simple single processor. Source: Imperas

The following paragraphs describe the main operations being performed.

The ICM kernel is initialized by calling icmInit:

icmInit(0, 0, 0);

This function takes three arguments. The first, simAttrs, is a bitmask controlling some aspects of simulation behavior. The two remaining arguments are used when processor debug is required (this is discussed in a later section).

A single instance of a processor is defined by calling icmNewProcessor. Parameters for this routine are:

  • name: the instance name; it must be unique within the design.
  • type: a type name for the instance; in this case specified as “or1k” in the makefile.
  • cpuId: every processor has an id number, specified by this argument.
  • cpuFlags: a bitmask accessed from within the processor model to change its behavior (for example, to turn on debug modes). In normal usage, pass 0.
  • addressBits: specifies the default data and instruction bus widths for the model (typically 32, though ICM supports addresses of up to 64 bits).
  • modelFile / modelSymbol: modelFile is the path to the dynamic load library implementing the processor model.
  • procAttrs: a bitmask controlling some aspects of processor behavior.
  • userAttrs: this argument specifies a list of application-specific attributes for the processor. In this example, the instance has no attributes.
  • semiHostFile / semiHostSymbol: these two parameters specify the semihosting library for the processor instance; this is described in the next subsection.

Defining semihosting

Semihosting allows the default behavior of specified functions to be intercepted and overridden by a semihosting shared object library loaded by the simulator. In this example, it is used to define the behavior of a program exit and so terminate the simulation; other semihosting features can support the low-level functions underlying routines such as printf using native host functionality. Here, a global label, exit, is defined on the last instruction of the assembler test. This will be intercepted by the simulator as defined in the semihosting library, as shown in Figure 2.

Figure 2. Simulator behavior defined in the semihosting library. Source: Imperas

The label can be used in conjunction with a standard semihosting shared object library, which terminates simulation immediately after any instruction labeled exit. To use the semihosting library, platform.c includes the semihosting object file name and the name of the semihostAttrs object within that file, as specified by SEMIHOST_FILE and SEMIHOST_SYMBOL, which are defined in the platform makefile. SEMIHOST_FILE is the name of the .dll file implementing the semihosting. SEMIHOST_SYMBOL is the name of the specific symbol within that library that defines the semihosting behavior (in this case, impExitAttrs).

This simple example makes no specific mention of any processor memory configuration, other than that the processor address bus is 32 bits wide. In the absence of any other information about the memory configuration, OVPsim creates a single, fully populated RAM attached to both the processor data and instruction busses. Alternatively, processor address spaces can be explicitly specified to contain separate RAMs and ROMs, some of which may be shared between processors in a multiprocessor system. It is also possible to specify that certain address ranges be modeled by callback functions in the ICM platform itself, which is useful for modeling memory-mapped devices such as UARTs; a peripheral device can also be instantiated as a peripheral instance, as described later.
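As a hedged illustration of an explicit memory configuration, the fragment below reuses the bus and memory calls introduced later in this article (icmNewBus, icmConnectProcessorBusses, icmNewMemory, icmConnectMemoryToBus) to give the single processor two separate RAM regions instead of the default fully populated memory; the region names and address ranges are illustrative only.

// hedged sketch: explicit memory map for the single-processor platform
icmBusP bus = icmNewBus("bus1", 32);                    // 32-bit address bus
icmConnectProcessorBusses(processor, bus, bus);         // same bus for instructions and data

icmMemoryP prog  = icmNewMemory("prog",  ICM_PRIV_RWX, 0x0fffffff);  // program/data RAM
icmMemoryP stack = icmNewMemory("stack", ICM_PRIV_RWX, 0x0fffffff);  // stack RAM

icmConnectMemoryToBus(bus, "mp1", prog,  0x00000000);   // map program RAM at address 0
icmConnectMemoryToBus(bus, "mp2", stack, 0xf0000000);   // map stack RAM at the top of memory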

Once a processor instance has been created, an ELF format file can be loaded into the processor memory using:

icmLoadProcessorMemory(processor, argv[1], False, False);

The first parameter is the processor for which to load memory. The second parameter is the application ELF file name. In this example, the application file name is passed as the first argument when the platform is run. The third parameter controls whether the ELF file is loaded using physical addresses (if True) or virtual addresses (if False). This affects only processors implementing virtual memory. The fourth parameter enables debug output showing the location of sections in the loaded ELF file.

There are also memory accessor functions that allow a file loader for any file format to be written in C as part of the platform and used to load program memory. For example, this method could be used to support the loading of hex file formats or S-records. Once the processor has been instantiated and an application program loaded, the program can be simulated to completion using:

icmSimulatePlatform();

This routine simulates the entire platform using the OVPsim default scheduler that, for multiprocessor platforms, runs each processor for a number of instructions in a time slice before advancing time to run the next time slice.

The routine icmSimulate is available to simulate a specific processor for a precise number of instructions. This second function is useful when OVPsim is being used as a subsystem of a larger simulation implemented in another environment, such as SystemC. Finally, icmTerminate is called to clean up simulation data structures and delete all the simulation objects created since the previous icmInit call.
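As a rough sketch of that usage, the fragment below steps a single processor in fixed instruction quanta under the control of an enclosing environment. It assumes icmSimulate takes the processor and an instruction count in this two-argument form; the exact prototype is not given in this article, so treat it as an assumption. QUANTUM and stepPlatform are illustrative names, not part of the ICM API.

// hedged sketch: stepping one processor under external control
#define QUANTUM 100000                     // instructions per step (arbitrary)

static void stepPlatform(icmProcessorP processor, int steps) {
    int i;
    for (i = 0; i < steps; i++) {
        // advance only this processor by one quantum; the enclosing
        // environment (e.g. a SystemC wrapper) can synchronize here
        icmSimulate(processor, QUANTUM);   // assumed two-argument form
    }
}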

Attaching a debugger

It is possible to attach a debugger to a processor in an OVPsim simulation using the gdb RSP protocol. To use debugging, two steps are needed in the platform. First, icmInit must be passed a debug host name and port number as arguments:

icmInit(True, "localhost", portNum);

Second, the specific processor instance targeted for debug must be given the ICM_ATTR_DEBUG instance attribute:

icmProcessorP processor = icmNewProcessor(
    "cpu1",            // CPU name
    TYPE_NAME,         // CPU type
    0,                 // CPU cpuId
    0,                 // CPU model flags
    32,                // address bits
    MORPHER_FILE,      // model file
    MORPHER_SYMBOL,    // morpher attributes
    ICM_ATTR_DEBUG,    // CPU attributes
    0,                 // user-defined attributes
    SEMIHOST_FILE,     // semi-hosting file
    SEMIHOST_SYMBOL    // semi-hosting attributes
);

When the ICM executable is started, it will wait for a debugger to connect to it on the specified port. If gdb is being used as the debugger, a version of gdb specific to the processor type to be debugged is required.

Multiprocessor support

Any number of processors can be instantiated within an ICM platform, using shared memory resources and callbacks on mapped memory regions to allow communication between them. The following shows the instantiation of two processors and a memory shared between them. Each processor also has a small amount of local memory for stack.

Figure 3. Typical test platform and application output. Source: Imperas

Two processors are instantiated with individual names and unique ID numbers:

// create a processor
icmProcessorP processor0 = icmNewProcessor(
    "cpu1",            // CPU name
    TYPE_NAME,         // CPU type
    0,                 // CPU cpuId
    0,                 // CPU model flags
    32,                // address bits
    MORPHER_FILE,      // model file
    MORPHER_SYMBOL,    // morpher attributes
    SIM_ATTRS,         // simulation attributes
    0,                 // user-defined attributes
    SEMIHOST_FILE,     // semi-hosting file
    SEMIHOST_SYMBOL    // semi-hosting attributes
);

icmProcessorP processor1 = icmNewProcessor(
    "cpu2",            // CPU name
    TYPE_NAME,         // CPU type
    1,                 // CPU cpuId
    0,                 // CPU model flags
    32,                // address bits
    MORPHER_FILE,      // model file
    MORPHER_SYMBOL,    // morpher attributes
    SIM_ATTRS,         // simulation attributes
    0,                 // user-defined attributes
    SEMIHOST_FILE,     // semi-hosting file
    SEMIHOST_SYMBOL    // semi-hosting attributes
);

Two busses are created, one for each processor, and connected to the processors:

// create the processor busses
icmBusP bus1 = icmNewBus("bus1", 32);
icmBusP bus2 = icmNewBus("bus2", 32);

// connect the processor busses
icmConnectProcessorBusses(processor0, bus1, bus1);
icmConnectProcessorBusses(processor1, bus2, bus2);

This example needs three memories: a local stack memory for each processor and a shared memory. These are created and connected to the processor busses:

// create memories
icmMemoryP local1 = icmNewMemory("local1", ICM_PRIV_RWX, 0x0fffffff);
icmMemoryP local2 = icmNewMemory("local2", ICM_PRIV_RWX, 0x0fffffff);
icmMemoryP shared = icmNewMemory("shared", ICM_PRIV_RWX, 0xefffffff);

// connect memories
icmConnectMemoryToBus(bus1, "mp1", shared, 0x00000000);
icmConnectMemoryToBus(bus2, "mp2", shared, 0x00000000);
icmConnectMemoryToBus(bus1, "mp1", local1, 0xf0000000);
icmConnectMemoryToBus(bus2, "mp1", local2, 0xf0000000);

Memory maps for multiprocessor systems can be complex, so it is often useful to be able to show the bus connections using icmPrintBusConnections:

// show the bus connections
icmPrintf("\nbus1 CONNECTIONS\n");
icmPrintBusConnections(bus1);
icmPrintf("\nbus2 CONNECTIONS\n");
icmPrintBusConnections(bus2);
icmPrintf("\n");

Most of each processor’s memory map is mapped onto the shared memory object; only a small section is local memory for each stack. This means that icmLoadProcessorMemory need only be called for one processor: loading that processor’s memory loads the shared memory, from which both processors will execute the same code:

// load the processor object file - because all memory is shared, both
// processors will execute the same application code
icmLoadProcessorMemory(processor0, argv[1], False, False);

The platform is then simulated to completion using the function icmSimulatePlatform.

A set of standard Imperas function intercepts can be enabled by passing ICM_ENABLE_IMPERAS_INTERCEPTS as the first argument of icmInit. In this application, the intercepted function impProcessorId is used to obtain the ID of the processor on which the code is running and to vary the application’s execution accordingly. ICM_VERBOSE is also set in this example so that simulation runtime statistics are reported at the end of simulation:

// initialize CpuManager - require Imperas intercepts because the
// application uses impProcessorId() to get processor id
icmInit(ICM_VERBOSE | ICM_ENABLE_IMPERAS_INTERCEPTS, 0, 0);

When the test platform and application are compiled and simulated, the resulting output is as shown in Figure 3: cpu0 generates the Fibonacci series while cpu1 reads the results from the shared memory.

Peripheral support

A SoC usually includes numerous peripheral devices, such as UARTs and DMA controllers, that may use interrupts to communicate with the processors and have master access to the memory space used by the processors. These features are all supported within the OVP environment.

Peripheral devices are modeled using a Peripheral Simulation Engine (PSE). Each peripheral is added using a call to icmNewPSE.

icmPseP dmac = icmNewPSE("dmac", "dmacModel.pse", NULL, NULL, NULL);

The new peripheral is defined by five parameters. The first two give the peripheral’s name and the dynamic link library that is loaded by the simulator to define its behavior. The third allows attributes to be passed in order to configure the peripheral’s behavior; for example, these attributes could define a file to write data to, or select the mode of operation. The final two parameters define a semihosting library to load onto the peripheral, which exposes host native functions and allows access to host peripheral hardware such as an Ethernet NIC, USB devices or the keyboard.

Summary

Key attributes of OVP have been highlighted: simulation performance of hundreds of MIPS, the ease with which a complex multiprocessor/multiperipheral platform can be created, and the reusability and interoperability of OVP models afforded by the APIs; more are available. This open, free solution allows embedded software teams to quickly build and simulate complex multiprocessor/multiperipheral platforms, advancing them toward their goal of higher quality software on a tighter schedule.

The OVP-based platform gives software teams an open standard solution to quickly and inexpensively simulate embedded software on SoC designs. Its ability to handle multicore architectures with a robust set of APIs offers easy modeling of processors, components and platforms.

Imperas
Imperas Buildings
North Weston
Thame
Oxfordshire
OX9 2HA
UK

T: +44 1844 217114
W: www.imperas.com

Monday, September 15, 2008

Building reusable verification environments with OVM

Numerous methodologies are available to help engineers speed up the verification development process and make testbenches more efficient. Promoting reuse at sophisticated levels is becoming an increasingly important part of this landscape.

This article specifically reviews the reuse potential within the Open Verification Methodology (OVM), with special focus on four particularly fruitful areas: testbench architecture, testbench configuration control, sequences and class factories. A simple router verification example, pw_router, illustrates schemes for building reusable OVM testbenches.

Stephen D’Onofrio is a verification architect engineer at Paradigm Works. His main responsibilities include leading verification teams, writing verification plans and implementing tests at client sites. D’Onofrio holds a BSEE from the University of Massachusetts.

Ning Guo is a principal consulting engineer at Paradigm Works. She has been working on ASIC/FPGA design verification for about 10 years and holds a Ph.D in Electrical and Computer Engineering from Concordia University, Montreal, Canada.

A testbench developed with reusability in mind will save a lot of duplicated effort and time. Testbench code reusability from block level to system level is also an immediate need for larger systems. Reusability from project to project is similarly desirable.

In recent years, various methodologies have emerged to help engineers speed up the verification development process and make testbenches more efficient. The initial version of the Open Verification Methodology (OVM) includes features such as agents, virtual sequences and transaction-level modeling (TLM) that directly promote intelligent reuse. Most advanced reuse techniques in OVM are similar to those found in the proven eRM/sVM methodology. Users and managers may not know exactly what SystemVerilog coupled with OVM offers in the area of reusability. Based on our experience in setting up OVM-based environments, we have identified key areas where users can extract major benefits in terms of testbench reusability. This paper discusses four of them in detail: testbench architecture, testbench configuration control, sequences and class factories. A simple router verification example, pw_router, will illustrate our schemes for building reusable OVM testbenches.

Testbench architecture

A key aspect of developing efficient, reusable verification code is a testbench architecture made up of multiple layers of highly configurable components. This section describes how unit-level and system-level testbenches are built in multiple layers and highlights component reusability. Complex designs are typically broken up into multiple manageable and controllable unit-level testbenches plus a system-level testbench that takes in the entire design. Therefore, reuse of components across multiple unit-level testbenches and at the system level is vital. OVM promotes a layered architecture that consists of a testbench container and two types of verification component: interface components and module/system components.

Interface components

These are reusable verification components specific to a particular protocol. They comprise one or more ‘agents’, an agent being a component that connects to a port on the DUT. It is made up of a monitor, a sequencer and a driver (Figure 1). The testbench can configure the agent as either active or passive.

In active mode, the agent builds the sequencer, driver and monitor. In this configuration, stimulus is driven onto the bus via the sequencer/driver and the monitor captures bus activity. In passive mode, the agent includes only a monitor; the sequencer and driver are not included inside the agent. A passive agent only captures bus activity and does not drive stimulus into the DUT.

The topology of the pw_router DUT consists of one input port and four output ports. For our testbench, we developed an interface verification component called packet_env. The packet_env includes one master agent and four slave agents (Figure 2). The master agent is connected to the input port and the slave agents are connected to the output ports.

Module components

These are reusable verification components for verifying a DUT module. Module components promote reuse for multiple testbenches – they may be reused in both a unit-level testbench and a system-level testbench.

Module components encapsulate multiple interface components and monitor activity among the interfaces. The monitor typically observes abstract data activity such as register and memory contents. Also, a module component undertakes scoreboarding to verify end-to-end expected data against actual data. Occasionally, a module component may include a virtual sequence that coordinates stimulus activity among multiple interface components.

The pw_router testbench’s module component is shown in Figure 3. It consists of two interface components: packet_env and host_env. There are two scoreboards: a packet scoreboard and an interrupt scoreboard. Finally, the pw_router module component contains a pw_router monitor, which shadows the contents of the registers and memories inside the design. The scoreboards connect to host_env’s and packet_env’s monitors via TLM ‘analysis ports’. These ports allow transactions to be sent from a producer (publisher) to one or more target components (subscribers). TLM promotes component reuse in a number of ways. Its use of transactions eliminates the need for testbench-specific component references (pointers) within other components. There is a standard, proven interface API for sending and receiving transactions. The transactions’ abstraction level may vary, but it sits at the product-description rather than the physical-wire level.

The unit-level testbench

The unit-level testbench for pw_router is shown in Figure 4. The pw_router_sve and test-case layers are two additional container layers specific to this testbench. The pw_router_sve container encapsulates the pw_router_env module component and other components not intended for reuse. For example, the pw_router virtual sequence component is included in the pw_router_sve container. This sequence coordinates host and packet stimulus. The test layer gives users (or test writers) the opportunity to customize testbench control.

System-level testbench

Figure 5 shows the system-level testbench for the pw_top design module. This design encapsulates pw_router and another design module called pw_ahb. The system testbench for the pw_top design includes a top-level container called pw_top_sve and a system component called pw_top. A system component encapsulates a cluster of module components, performs scoreboarding, and may monitor activity among the module components. A system component also allows for further reuse (e.g., pw_top may be included as a module component in a larger system context). The pw_top system component includes and reuses the pw_router_env module component. In addition, the pw_top system component encapsulates another module component called pw_ahb_env. Finally, a scoreboard component is included inside the system container to verify the interface across the pw_ahb and pw_router designs. The top-level pw_top_sve container encapsulates the system component pw_top and includes the pw_top sequencer component, which is specific to the system-level testbench. This sequencer is responsible for coordinating AMBA bus (AHB) and host traffic at the system level.

In the system-level testbench, the packet master is configured as a passive agent. Because this is an internal interface inside the pw_system design, the testbench can monitor the interface but cannot drive data onto it. Note that putting the agent in passive mode does not affect the pw_router_env’s packet and interrupt scoreboards: they still verify expected data against actual data as they did in the unit-level testbenches.

Configuration control

OVM components are self-contained: their behavior and implementation are independent of the overall testbench, which facilitates component reuse. Typically, components operate in a variety of modes controlled by fields (sometimes referred to as ‘knobs’). It is important that the testbench environment and/or the test writers have the ability to configure component field settings, and OVM provides advanced capabilities for controlling the configuration of fields. Examples include hierarchy fields, such as the active_passive field inside an agent, and behavior fields that may control testbench activity such as the number of packets to generate. The primary purpose of the configuration mechanism is to control the setting of field values during the build phase, which occurs before any simulation time is advanced. The configuration mechanism gives test writers and higher-layer testbench components the ability to override the default field settings in the components. A testbench hierarchy is established in top-down fashion, where parent components are built before their child components (Figure 6). Higher-level testbench layers (test cases and containers) and components (system/module components) can therefore override the default configuration settings that govern the testbench hierarchy and its behavior.

OVM sequence mechanism

OVM sequences allow test writers to control and generate stimulus for the DUT. The sequence mechanism may be flat, layered or hierarchical (sometimes referred to as ‘nested’), and may be controlled from higher layers of abstraction using a mechanism called virtual sequences. All of these sequence capabilities promote reuse.

An OVM sequence mechanism has three entities: a sequence or sequences, a sequencer and a driver.

A sequence is a construct that generates and drives transfers (or sequence items) to a driver via a sequencer. This type of sequence is referred to as a flat sequence. Additionally, sequences can call other sequences; this is called a hierarchical sequence. Hierarchical sequences allow testbench developers to write new sequences that reuse other sequences. The sequencer is a verification component that mediates the generation and flow of data between the sequence(s) and the driver. The sequencer has a collection of sequences associated with it called a sequence library. The driver is a verification component that connects to the DUT’s pin-level interface. It includes one or more transaction-level interfaces that decode each transaction and drive it onto the DUT’s interface, and it is responsible for pulling transactions from the sequencer and driving them onto the DUT’s pin interface. The sequencer and driver communicate through a special TLM consumer/producer interface, which allows a single sequencer to be reused with different drivers.

A virtual sequence allows stimuli to be managed across multiple sequencers. For example, the design requires the host to initialize the DUT before routing packet traffic; moreover, while the packet_env’s sequencer has packet traffic flowing, the host_env’s sequencer needs to service interrupts. OVM virtual sequences provide the coordination needed here. Virtual sequences also allow sequencers to be reused in different testbenches: for example, the unit-level and system-level testbenches both reuse the packet_env’s slave agent sequencer and the host_env’s master sequence.

OVM class factory

The OVM class factory is a powerful mechanism that allows test writers to override the default behavior exhibited in the testbench. The class factory and the configuration mechanism can both override testbench behavior, but they have different charters. The configuration mechanism’s primary focus is to give the testbench hierarchy an opportunity to override default field values in a top-down manner during the testbench’s build phase. The class factory gives users the ability to override OVM objects during both the build and run phases.

An OVM class factory is a static singleton object. When OVM objects are created in the testbench, they may be registered with the class factory. Test writers can derive their own OVM objects and then perform type or instance overrides of the OVM objects in the testbench environment. This methodology is completely non-intrusive with regard to the testbench code. Test writers may change the behavior of an OVM object by overriding virtual functions, adding properties, or defining and adding additional constraints. This reduces the time needed to develop specific tests using a single verification environment and promotes reuse across multiple projects.

Conclusion

Upfront planning and knowledge of methodological best practice are crucial to the development of efficient OVM reusable code. It is important to plan all OVM testbench architectures early in the verification effort, before any code is implemented. When putting together testbench architectures, one must consider system-level and future project reuse. As noted in this article, OVM has features that greatly help with reuse such as the configuration mechanism, class factories, TLMs and sequences. The OVM best practice reuse capabilities will not become fully apparent just from reading the OVM Reference Manual, monitoring the OVM Forum, or looking through the OVM Examples. At time of writing (September 2008), it is only a little more than six months since the initial OVM release, so new material is starting to become available to aid users in developing reusable testbenches, and these releases need to be monitored carefully.

Paradigm Works
300 Brickstone Square
Suite 104
Andover
MA 01810
USA
T: +1 978 824 1400
W: www.paradigm-works.com