Or, Frequently Asked Questions I’ve heard on Embedded Systems, Digital, Programming, and ASIC.
Last Updated on 05/16/01
Q: What are some good books on ASICs, Embedded, and Digital Design?
ASIC/Verilog books I like:
“Digital Design and Synthesis With Verilog HDL” by Eli Sternheim
“HDL Chip Design: A Practical Guide for Designing, Synthesizing & Simulating ASICs & FPGAs Using VHDL or Verilog” by Douglas J. Smith
“Writing Testbenches - Functional Verification of HDL Models” by Janick Bergeron
“It's the Methodology, Stupid!” by Pran Kurup, Taher Abbasi, Ricky Bedi
Some books on Embedded Systems and Digital Design I’ve enjoyed:
“Design With PIC Microcontrollers” by John B. Peatman
“Design With Microcontrollers” by John B. Peatman
[Clive Maxfield writes some extremely engaging, yet substantial stuff. Try it!]
“Bebop Bytes Back: An Unconventional Guide to Computers” by Clive Max Maxfield and Alvin Brown
“Bebop to the Boolean Boogie : An Unconventional Guide to Electronics Fundamentals, Components, and Processes” by Clive Max Maxfield
“Designus Maximus Unleashed!” by Clive Max Maxfield
Several key magazines read by folks in the Embedded Systems field include:
The Microchip Corporation (http://www.microchip.com/) makes the PIC microcontroller and maintains a very good WWW site containing much free software, application notes, etc. PICs are just one avenue for implementing an Embedded System, but PICs are also one way to get your feet wet. There are many good and cheap PIC-based kits that allow you to program the PIC, connect up sensors, displays, etc. Check out this ready-to-go PIC tool from Parallax. The XILINX (www.xilinx.com) WWW site is packed with valuable Application Notes (sometimes very FPGA specific, but often very general).
Q: What is a SoC (System On Chip), ASIC, “full custom chip”, and an FPGA?
There are no precise definitions. Here is my sense of it all.
First, 15 years ago, people were unclear on exactly what VLSI meant. Was it 50000 gates? 100000 gates? Was it just anything bigger than LSI? My professor simply told me that VLSI is a level of complexity and integration in a chip that demands Electronic Design Automation tools in order to succeed. In other words, big enough that manually drawing lots of little blue, red and green lines is too much for a human to reasonably do. I think that, likewise, SoC is that level of integration onto a chip that demands expertise beyond the traditional skills of electronics. In other words, pulling off a SoC demands Hardware, Software, and Systems Engineering talent. So, trivially, SoCs aggressively combine HW/SW on a single chip. Maybe more pragmatically, SoC just means that ASIC and Software folks are learning a little bit more about each other’s techniques and tools than they did before. Two other interpretations of SoC are 1) a chip that integrates various IP (Intellectual Property) blocks on it and is thus highly centered on issues like Reuse, and 2) a chip integrating multiple classes of electronic circuitry such as Digital CMOS, mixed-signal digital and analog (e.g. sensors, modulators, A/Ds), DRAM memory, high voltage power, etc.
ASIC stands for “Application Specific Integrated Circuit”: a chip designed for a specific application. Usually, I think people associate ASICs with the Standard Cell design methodology. Standard Cell design and the typical “ASIC flow” usually mean that designers are using Hardware Description Languages, Synthesis and a library of primitive cells (e.g. libraries containing AND, NAND, OR, NOR, NOT, FLIP-FLOP, LATCH, ADDER, BUFFER, PAD cells) that are wired together (real libraries are not this simple, but you get the idea). Design usually is NOT done at a transistor level. There is a high reliance on automated tools because the assumption is that the chip is being made for a SPECIFIC APPLICATION where time is of the essence. But, the chip is manufactured from scratch in that no pre-made circuitry is being programmed or reused. The ASIC designer may, or may not, even be aware of the locations of various pieces of circuitry on the chip since the tools do much of the construction, placement and wiring of all the little pieces.
Full Custom, in contrast to ASIC (or Standard Cell), means that every geometric feature going onto the chip being designed (think of those pretty chip pictures we have all seen) is controlled, more or less, by the human designer. Automated tools are certainly used to wire up different parts of the circuit and maybe even manipulate (repeat, rotate, etc.) sections of the chip. But, the human designer is actively engaged with the physical features of the circuitry. Higher human crafting and less reliance on standard cells takes more time and implies higher NRE (non-recurring engineering) costs, but lowers RE (recurring) costs for standard parts like memories, processors, UARTs, etc.
FPGAs, or Field Programmable Gate Arrays, are completely designed chips that designers load a programming pattern into to achieve a specific digital function. A bit pattern (almost like a software program) is loaded into the already manufactured device, which essentially interconnects lots of available gates to meet the designer's purposes. FPGAs are sometimes thought of as a “Sea of Gates” where the designer specifies how they are connected. FPGA designers often use many of the same tools that ASIC designers use, even though the FPGA is inherently more flexible.
All these things can be intermixed in hybrid sorts of ways. For example, FPGAs are now available that have microprocessors embedded within them which were designed in a full custom manner, all of which now demands “SoC” types of HW/SW integration skills from the designer.
Q: How do I model Analog and Mixed-Signal blocks in Verilog?
First, this is a big area. Analog and Mixed-Signal designers use tools like Spice to fully characterize and model their designs. My only involvement with Mixed-Signal blocks has been to utilize behavioral models of things like PLLs, A/Ds, D/As within a larger SoC. There are some specific Verilog tricks to this which is what this FAQ is about (I do not wish to trivialize true Mixed-Signal methodology, but us chip-level folks need to know this trick).
A mixed-signal behavioral model might model the digital and analog input/output behavior of, for example, a D/A (Digital to Analog Converter). So, digital input in and analog voltage out. Things to model might be the timing (say, the D/A utilizes an internal Successive Approximation algorithm), output range based on power supply voltages, voltage biases, etc. A behavioral model may not have any knowledge of the physical layout and therefore may not offer any fidelity whatsoever in terms of noise, interference, cross-talk, etc. A model might be parameterized given a specific characterization for a block. Be very careful about the assumptions and limitations of the model!
Issue #1: how do we model analog voltages in Verilog? Answer: use the Verilog real data type, declare “analog wires” as wire [63:0] in order to use a 64-bit floating-point representation, and use the built-in conversion system functions:
$rtoi converts reals to integers w/truncation e.g. 123.45 -> 123
$itor converts integers to reals e.g. 123 -> 123.0
$realtobits converts reals to 64-bit vector
$bitstoreal converts bit pattern to real
That was a lot. This is a trick to be used in vanilla Verilog. The 64-bit wire is simply a way to actually interface to the ports of the mixed-signal block. In other words, our example D/A module may have an output called AOUT which is a voltage. Verilog does not allow us to declare an output port of type REAL. So, instead declare AOUT like this:
module dtoa (clk, reset..... aout.....);
....
wire [63:0] aout; // Analog output
....
We use 64 bits because we can use floating-point numbers to represent our voltage output (e.g. 1.22e-3 for 1.22 millivolts). The floating-point value is relevant only to Verilog and your workstation and processor, and the IEEE floating-point format has NOTHING to do with the D/A implementation. Note the disconnect in terms of the netlist itself. The physical “netlist” that you might see in GDS may have a single metal interconnect that is AOUT, and obviously NOT 64 metal wires. Again, this is a trick. The 64-bit bus is only for wiring. You may have to do some quick netlist substitutions when you hand off a netlist.
In Verilog, the real data type is basically a floating-point number (e.g. like double in C). If you want to model an analog value either within the mixed-signal behavioral model, or externally in the system testbench (e.g. the sensor or actuator), use the real data type. You can convert back and forth between real and your wire [63:0] using the system functions listed above. A trivial D/A model could simply take the digital input value, convert it to real, scale it according to some `defines, and output the value on AOUT as the 64-bit “pseudo-analog” value. Your testbench can then do the reverse and print out the value, or whatever. More sophisticated models can model the Successive Approximation algorithm, employ look-ups, equations, etc. etc.
That’s it. If you are getting a mixed-signal block from a vendor, then you may also receive (or you should ask for) the behavioral Verilog models for the IP.
Q: How do I interface one clock domain to another?
Depends... Here are 3 methods that I’ve used lately.
First, “synchronizers” are very common. A synchronizer is just a flip-flop that accepts signals from one clock domain but is clocked by the local clock domain. People argue about how many levels of these synchronizers to use (2 are common). The idea is that the incoming signal could hit the flop and violate a setup/hold time. When this happens, the flop could go “metastable” and basically “ring” for an indeterminate time. A series of synchronizers helps isolate the main circuitry from potential ringing and resynchronizes the inputs to the new clock domain. There are many, many papers on “Metastability” where you will find equations involving the two frequencies, setup/hold times of the flops, rise times, and MTBF. Some ASIC and FPGA libraries actually have flip-flops intended for use as synchronizers (they have tighter setup/hold times and higher internal gains).
Another approach is to use a Dual-Port RAM at the interface (usually as a FIFO). DPRAMs will usually have separate clocks for each domain. DPRAMs can also help with data rate conversions (e.g. smoothing out burstiness in an input).
Another scenario that arises is handshaking across an asynchronous interface. For example, one domain provides an AVAILABLE signal indicating data and the other domain acknowledges this data with some sort of ACKNOWLEDGE signal. Often in these types of interfaces, there are NO explicit clocks. Very fast handshaking interfaces can be designed by designing to the edges in the protocol and using standard flip-flops, with the clock inputs used to detect edges and the asynchronous sets/clears used to reset them. This is how, for example, a narrow pulse can be detected by a clock domain with a much slower period. Well, that wasn’t very clear and requires a schematic. ISD Magazine has had several good articles on this sort of thing in the last couple of years.
Also, look for several papers that have been presented at ESNUG (try www.deepchip.com). Be careful out there, but there are techniques that work.
Some good insight into FIFOs can be found at the XILINX site (XAPP 051) or just do some searches (also note some free FIFO IP at www.free-ip.com).
Q: What does "Big Endian" mean and how do I know if my computer is one?
Basically, if your computer can address a specific byte and you have an integer that spans more than 1 of those bytes, does the "most significant byte" come first? If yes, this is Big Endian, otherwise you have a Little Endian system.
Here is a C program that will tell you what you have:
//
// ENDIAN.C - Demonstrate big vs. little endian.
//
// Big Endian:
// The address where a multi-byte entity starts references the MSB (e.g. big).
// Increasing the address references less significant bytes.
//
// Little Endian:
// The address where a multi-byte entity starts references the LSB (e.g. little).
// Increasing the address references more significant bytes.
//
#include <stdio.h>

int main (void) {
    // Assumes a 2-byte short int, which is typical.
    union {
        unsigned char cvalues[2];   // The same storage, viewed byte by byte
        short int ivalue;
    } x;

    x.ivalue = 1;
    // So, the integer is encoded in hex as 0x0001. The MSB is 0x00 and the LSB is 0x01.
    //
    printf ("\nx.ivalue = %X", (unsigned) x.ivalue);
    printf ("\nx.cvalues[0] = %X, x.cvalues[1] = %X", x.cvalues[0], x.cvalues[1]);
    printf ("\n");
    return 0;
}
// *** Under DOS using Borland C++, output is:
//
// x.ivalue = 1
// x.cvalues[0] = 1, x.cvalues[1] = 0
//
// Therefore, DOS/Intel is LITTLE ENDIAN
//
//
// *** Under UNIX using 'CC', output is:
//
// x.ivalue = 1
// x.cvalues[0] = 0, x.cvalues[1] = 1
//
// Therefore, UNIX/Sparc is BIG ENDIAN
Q: How do I synthesize Verilog into gates with Synopsys?
A complete answer could, of course, occupy several lifetimes.. BUT.. a straight-forward Verilog module can be very easily synthesized using Design Compiler (e.g. dc_shell). Most ASIC projects will create very elaborate synthesis scripts, CSH scripts, Makefiles, etc. This is all important in order to automate the process and generalize the synthesis methodology for an ASIC project or an organization. BUT don't let this stop you from creating your own simple dc_shell experiments!
Let's say you create a Verilog module named foo.v that has a single clock input named 'clk'. You want to synthesize it so that you know it is synthesizable, know how big it is, how fast it is, etc. etc. Try this:
target_library = { CORELIB.db }   /* This part you need to get from your vendor... */
read -format verilog foo.v
create_clock -name clk -period 37.0 find(port, clk)
set_clock_skew -uncertainty 0.4 clk
set_input_delay 1.0 -clock clk all_inputs() - clk - reset
set_output_delay 1.0 -clock clk all_outputs()
compile
report_area
report_timing
write -format db -hierarchy -output foo.db
write -format verilog -hierarchy -output foo.vg
quit
You can enter all this in interactively, or put it into a file called 'synth_foo.scr' and then enter:
dc_shell -f synth_foo.scr
You can spend your life learning more and more Synopsys and synthesis-related commands and techniques, but don't be afraid to begin using these simple commands.
Q: What is "Scan" ?
Scan Insertion and ATPG help test ASICs (e.g. chips) during manufacture. If you know what JTAG boundary scan is, then Scan is the same idea except that it is done inside the chip instead of on the entire board. Scan tests for defects in the chip's circuitry after it is manufactured (e.g. Scan does not help you test whether your Design functions as intended). ASIC designers usually implement the scan themselves, and this occurs just after synthesis. ATPG (Automated Test Pattern Generation) refers to the creation of "Test Vectors" that the Scan circuitry enables to be introduced into the chip. Here's a brief summary:
· Scan Insertion is done by a tool and results in all (or most) of your design's flip-flops being replaced by special "Scan Flip-flops". Scan flops have additional inputs/outputs that allow them to be configured into a "chain" (e.g. a big shift register) when the chip is put into a test mode.
· The Scan flip-flops are connected up into a chain (perhaps multiple chains)
· The ATPG tool, which knows about the scan chain you've created, generates a series of test vectors.
· The ATPG test vectors include both "Stimulus" and "Expected" bit patterns. These bit vectors are shifted into the chip on the scan chains, and the chip's reaction to the stimulus is shifted back out again.
· The ATE (Automated Test Equipment) at the chip factory can put the chip into the scan test mode, and apply the test vectors. If any vectors do not match, then the chip is defective and it is thrown away.
· Scan/ATPG tools will strive to maximize the "coverage" of the ATPG vectors. In other words, given some measure of the total number of nodes in the chip that could be faulty (shorted, grounded, "stuck at 1", "stuck at 0"), what percentage of them can be detected with the ATPG vectors? Scan is a good technology and can achieve high coverage in the 90% range.
· Scan testing does not solve all test problems. Scan testing typically does not test memories (no flip-flops!), needs a gate-level netlist to work with, and can take a long time to run on the ATE.
· FPGA designers may be unfamiliar with scan since FPGA testing has already been done by the FPGA manufacturer. ASIC designers do not have this luxury and must handle all the manufacturing test details themselves.
· Check out the Synopsys WWW site for more info.
Q: How do I generate a random number either in a C program (e.g. if there is no access to standard rand() function) or in hardware?
Use an LFSR (Linear Feedback Shift Register). This is a useful building block for many things including random numbers. Check out the program, myrand.v in my File Archive.
Q: I need to sample an input or output something at different rates, but I need to vary the rate? What's a clean way to do this?
Many, many problems have this sort of variable rate requirement, yet we are usually constrained to a constant clock frequency. One trick is to implement a digital NCO (Numerically Controlled Oscillator). An NCO is actually very simple and, while it is most naturally understood as hardware, it also can be constructed in software. The NCO, quite simply, is an accumulator where you keep adding a fixed value on every clock (e.g. at a constant clock frequency). When the NCO "wraps", you sample your input or do your action. By adjusting the value added to the accumulator each clock, you finely tune the AVERAGE frequency of that wrap event. Now - you may have realized that the wrapping event may have lots of jitter on it. True, but you may use the wrap to increment yet another counter where each additional Divide-by-2 bit reduces this jitter. The DDS (Direct Digital Synthesizer) is a related technique. I have two examples showing both an NCO and a DDS in my File Archive. This is tricky to grasp at first, but tremendously powerful once you have it in your bag of tricks. NCOs also relate to digital PLLs, Timing Recovery, TDMA and other "variable rate" phenomena.
Q: How can I pass parameters to my simulation?
A testbench and simulation will likely need many different parameters and settings for different sorts of tests and conditions. It is definitely a good idea to concentrate on a single testbench file that is parameterized, rather than create a dozen separate, yet nearly identical, testbenches. Here are 3 common techniques:
· Use a define. This is almost exactly the same approach as the #define and -D compiler arg that C programs use. In your Verilog code, use a `define to define the variable condition and then use the Verilog preprocessor directives like `ifdef. Use the '+define+' Verilog command line option. For example:
... to run the simulation ..
verilog testbench.v cpu.v +define+USEWCSDF
... in your code ...
`ifdef USEWCSDF
initial $sdf_annotate ("cpuwc.sdf", testbench.cpu);
`endif
The +define+ can also be filled in from your Makefile invocation, which in turn can be
filled in from your UNIX prompt command line.
Defines are a blunt weapon because they are very global and you can only do so much with
them since they are a pre-processor trick. Consider the next approach before resorting to
defines.
· Use parameters and parameter definition modules. Parameters are not preprocessor definitions and they have scope (e.g. parameters are associated with specific modules). Parameters are therefore more clean, and if you are in the habit of using a lot of defines; consider switching to parameters. As an example, lets say we have a test (e.g. test12) which needs many parameters to have particular settings. In your code, you might have this sort of stuff:
module testbench_uart1 (....)
parameter BAUDRATE = 9600;
...
if (BAUDRATE > 9600) begin
... E.g. use the parameter in your code like you might any general variable
... BAUDRATE is completely local to this module and this instance. You might
... have the same parameters in 3 other UART instances and they'd all be different
... values...
Now, your test12 has all kinds of settings required for it. Let's define a special module
called testparams which specifies all these settings. It will itself be a module instantiated
under the testbench:
module testparams;
defparam testbench.cpu.uart1.BAUDRATE = 19200;
defparam testbench.cpu.uart2.BAUDRATE = 9600;
defparam testbench.cpu.uart3.BAUDRATE = 9600;
defparam testbench.clockrate.CLOCKRATE = 200; // Period in ns.
... etc ...
endmodule
The above module always has the same module name, but you would have many different
filenames; one for each test. So, the above would be kept in test12_params.v. Your
Makefile includes the appropriate params file given the desired make target. (BTW: You
may run across this sort of technique by ASIC vendors who might have a module containing
parameters for a memory model, or you might see this used to collect together a large
number of system calls that turn off timing or warnings on particular troublesome nets, etc.
etc.)
· Use memory blocks. Not as common a technique, but something to consider. Since Verilog has a very convenient syntax for declaring and loading memories, you can store your input data in a hex file and use $readmemh to read all the data in at once.
In your testbench:
module testbench;
...
reg [31:0] control[0:1023];
...
initial $readmemh ("control.hex", control);
...
endmodule
You could vary the filename using the previous techniques. The control.hex file is just a file
of hex values for the parameters. Luckily, $readmemh allows embedded comments, so you
can keep the file very readable:
A000 // Starting address to put boot code in
A // Activate all ten input pulse sources (hex A = ten)
... etc...
Obviously, you are limited to actual hex values with this approach. Note, of course, that
you are free to mix and match all of these techniques!
Q: What's a Scrambler?
See my scramble.c demo program in my file archive to see a simple one operate. A scrambler takes a user bit stream being transmitted and "scrambles" it (another term is "whitening"). Scrambling sort of randomizes the data and tends to ensure that the bit stream has a varied mix of 1s and 0s. Why? Receivers must often be able to extract the implicit clock from data by watching for when bits change. If the data happens to be long stretches of all 1s or all 0s, then such schemes may not work right. All that a scrambler really is, is a random number generator which generates random bits, which are then XORed with the data (selectively flipping each bit). Because the generator is a linear-feedback shift register loaded with an agreed upon seed, the receiver can easily reverse the scrambling. Typically, transmitter and receiver need only agree on when the data starts and what the seed is. Check out the program and you'll see..
Q: What's an Interleaver?
See my ileave.c demo program. Noise will often corrupt a bunch of bits in a burst. Many FEC techniques may not be able to detect or correct these longer bursts. What to do? Take your linear data which has this FEC, and simply "interleave" it. Arrange your data in a table of ROWs and COLUMNs, then rearrange it the opposite way so that your COLUMN data becomes your ROW data, and transmit that. Reverse this operation (deinterleave) at the receiver and you get your original data. A burst of errors on the channel is then spread out across the deinterleaved data, so the data won't be as susceptible to burst errors. That wasn't a very good explanation, but check out the little program if you need ideas on how to actually code this up. This operation can be done in a similar fashion in either SW or in hardware (usually using a RAM). The test program doesn't address any issues related to buffering. Typically, one must set up a "ping-pong" system where input data is interleaved (or deinterleaved) from one buffer (e.g. "Pong") while the previous block's data is streamed out to the next module from the other buffer (e.g. "Ping"). Again, this is often done both in SW and HW contexts.
Q: What are RTL, Gate, Metal and FIB fixes? What is a "sewing kit"?
There are several ways to fix an ASIC-based design. From easiest to most extreme:
RTL Fix -> Gate Fix -> Metal Fix -> FIB Fix
First, let's review fundamentals. A standard-cell ASIC consists of at least 2 dozen manufactured layers/masks. Lower layers consist of materials making up the actual CMOS transistors and gates of the design. The upper 3-6 layers are metal layers used to connect everything together. ASICs, of course, are not intended to be flexible like an FPGA, however, important "fixes" can be made during the manufacturing process. The progression of possible fixes in the manufacturing life cycle is as listed above.
An RTL fix means you change the Verilog/VHDL code and you resynthesize. This usually implies a new Place & Route. RTL fixes would also imply new masks, etc. etc. In other words - start from scratch.
A Gate Fix means that a select number of gates and their interconnections may be added or subtracted from the design (e.g. the netlist). This avoids resynthesis. Gate fixes preserve the previous synthesis effort and involve manually editing a gate-level netlist - adding gates, removing gates, etc. Gate level fixes affect ALL layers of the chip and all masks.
A Metal Fix means that only the upper metal interconnect layers are affected. Connections may be broken or made, but new cells may not be added. A Sewing Kit is a means of adding a new gate into the design while only affecting the metal layers. Sewing Kits are typically added into the initial design either at the RTL level or during synthesis by the customer and are part of the netlist. A Metal Fix affects only the top layers of the wafers and does not affect the "base" layers.
Sewing Kits are modules that contain an unused mix of gates, flip-flops or any other cells considered potentially useful for an unforeseen metal fix. A Sewing Kit may be specified in RTL by instantiating the literal cells from the vendor library. The cells in the kit are usually connected such that each cell's output is unconnected and the inputs are tied to ground. Clocks and resets may be wired into the larger design's signals, or not.
A FIB (Focused Ion Beam) Fix is only performed on a completed chip. FIB is a somewhat exotic technology where a particle beam is able to make and break connections on a completed die. FIB fixes are done on individual chips and would only be done as a last resort to repair an otherwise defective prototype chip. Masks are not affected since it is the final chip that is intrusively repaired.
Clearly, these sorts of fixes are tricky and risky. They are available to the ASIC developer, but must be negotiated and coordinated with the foundry. ASIC designers who have been through enough of these fixes appreciate the value of adding test and fault-tolerant design features into the RTL code so that Software Fixes can correct minor silicon problems!