This page belongs to EPFL's web archive and is no longer updated.

ThreeD FPGA

Three-dimensional Place and Route for 3D FPGAs

 TPR: Three-dimensional Place and Route for 3D FPGAs

This page gives information on TPR, a three-dimensional place-and-route tool for FPGAs.

http://mountains.ece.umn.edu/~kia/Download/tpr.html

 

 

 

 

Posted by Shashi Kanth Bobba at 11:45
Tabula Explains 3D FPGAs

 

The Spacetime Continuum

Tabula Explains 3D FPGAs

by Kevin Morris

 

Don't try to understand how Tabula's 3D FPGA fabric works.  You don't need to know that.  All you have to do is wait until they announce a family based on the new technology, and then you can buy bigger, faster FPGAs for less money.  And, yes, you can keep using your standard HDL-based FPGA design methodology.  

That's it.  

We're done.  

You can stop reading now.

Why are you still here?  Do you think we're going to explain what "spacetime" means, and how an FPGA could possibly be 3D?  Are you expecting to hear about logic cells being resource-shared and time-multiplexed at super-high frequencies, creating a virtual logic fabric with up to eight times the density of the physical silicon?

Nah.  

All you need to do is wait a little while, start your next project, check out Tabula's datasheet, and tell your boss "Hey, these are really big FPGAs.  I don't know how they make them this size for so cheap, but we could use some of that extra capacity right about now."

OK, you're determined to keep reading, aren't you?  You'll be sorry.  You won't have that nice, plausible deniability where you can safely claim not to understand. At some point, you'll find yourself drawing crazy diagrams on the whiteboard while your boss squints and drums his fingers on the table...

Tabula has been in "stealth mode" perhaps longer than any company in the history of the concept of "stealth mode."  For years now, every spring we wait and watch expectantly while Tabula peeks out of their burrow, sees their shadow, and goes back underground for another year of venture-funded silence.  In fact, Tabula was founded in 2003, has over 100 employees, and has secured a hefty $106M in venture funding to date.  Why so long and so expensive?  Because starting a new FPGA company is difficult.  As we've seen proven again and again, time, talent, and money are all three required - and with mask and NRE costs escalating exponentially, the challenge gets harder every year.  

Tabula is now breaking their silence and preparing to come to market with a radical new architecture they call "Spacetime."  Tabula describes Spacetime as a 3D FPGA architecture.  By doing some very clever work in the architecture and tools, the company claims some impressive improvements in effective density of their FPGAs.

Close your eyes and picture a LUT - the basic logic cell for all of FPGAdom.  Now, imagine that your LUT can connect to surrounding LUTs by programmable interconnect.  Got it?  You are now picturing a normal FPGA.  Now, imagine that there are LUTs also located vertically above and below your LUT.  (There aren't, but keep pretending.)  Now, imagine that your LUT also includes some hardware that allows its inputs and outputs to be stored, and its truth tables to be re-programmed very quickly.  Let's put a very fast clock and controller on that hardware so that your LUT is being constantly reprogrammed and cycled through eight different implementations.  We've just created a LUT that is 8-way resource shared.  

Tabula does exactly this with Spacetime.  A 1.6 GHz clock toggles every LUT on the device to cycle through 8 different programs, creating a virtual 3D fabric.  Any given LUT can connect to other logic on the same level with conventional interconnect, or to logic on other levels through a "time via" (a set of registers that hold the inputs and outputs until a subsequent reconfiguration).  The Spacetime clock and the reconfiguration are transparent to the user.  The user clock will be running at a much slower rate, typically something like 200 MHz (allowing all 8 folds of the Spacetime clock to complete in one user clock cycle).  
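To make the folding idea concrete, here is a minimal Python sketch of one physical LUT cycling through 8 configurations. It is only a toy model built from the description above, not Tabula's implementation: the class name, the 2-input LUT width, the example truth tables, and the list standing in for the "time via" registers are all assumptions made for illustration.

```python
from typing import List, Sequence

FOLDS = 8                      # configurations per physical LUT, per the article
SPACETIME_CLOCK_HZ = 1.6e9     # fast internal clock, per the article
USER_CLOCK_HZ = SPACETIME_CLOCK_HZ / FOLDS  # all 8 folds fit in one user cycle

# Hypothetical per-fold truth tables for a 2-input LUT, indexed by (a << 1) | b.
EXAMPLE_TABLES = [
    [0, 0, 0, 1],  # fold 0: AND
    [0, 1, 1, 1],  # fold 1: OR
    [0, 1, 1, 0],  # fold 2: XOR
    [1, 1, 1, 0],  # fold 3: NAND
    [1, 0, 0, 0],  # fold 4: NOR
    [1, 0, 0, 1],  # fold 5: XNOR
    [0, 0, 1, 1],  # fold 6: pass a
    [0, 1, 0, 1],  # fold 7: pass b
]


class TimeMultiplexedLUT:
    """One physical 2-input LUT that is reprogrammed on every fast-clock tick."""

    def __init__(self, truth_tables: Sequence[Sequence[int]]):
        if len(truth_tables) != FOLDS:
            raise ValueError("need exactly one truth table per fold")
        self.truth_tables = [list(t) for t in truth_tables]

    def evaluate(self, fold: int, a: int, b: int) -> int:
        # The two input bits index into the truth table active in this fold.
        return self.truth_tables[fold][(a << 1) | b]


def run_user_cycle(lut: TimeMultiplexedLUT, a: int, b: int) -> List[int]:
    """Step through all folds once, storing each result as a time via would."""
    time_via = []  # stands in for the registers that hold values between folds
    for fold in range(FOLDS):
        time_via.append(lut.evaluate(fold, a, b))
    return time_via
```

Calling `run_user_cycle(TimeMultiplexedLUT(EXAMPLE_TABLES), 1, 0)` returns eight results from a single physical LUT, and `USER_CLOCK_HZ` works out to 200 MHz, matching the figure quoted above.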

Every physical LUT on the device now counts as 8 possible LUTs, giving us a theoretical 8-fold density increase.  Sounds like a good trick, but how do we design for it?  That's the best news of all.  The complexity of the architecture is absorbed by the placement and routing tools.  Place-and-route views the device as a 3D array of logic cells.  Your HDL design is identical to what you'd use on a normal FPGA.  Your synthesis and simulation tools don't know the difference (OK, that's a bit of a lie on synthesis - we'll explain in a bit).  The place and route tools take the resulting netlist and map it to a 3D configuration of logic elements, managing the regular programmable interconnect and the time vias in the background.

There are some very nice side-benefits of this approach.  First, let's think about routing proximity.  Since LUTs are now arranged in a 3D space, the distance from any given LUT to the next LUT in the netlist is now much shorter, or the number of LUTs that are close by is much larger.  Shorter interconnect means easier timing closure.  In fact, the interconnect that goes through time vias has a very predictable timing profile because it is simply latching values through a known number of Spacetime clock cycles.  Also, the smaller size of the die relative to the number of "virtual" LUTs means the device requires less overall routing resources.  In most FPGAs, routing resources are the dominant consumer of area.  The knock-on effect of the physical LUT fabric being smaller is that the physical routing resources are proportionally smaller as well.  
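The proximity argument can be checked with a small counting exercise. The Python sketch below counts how many logic sites lie within a given Manhattan distance of a LUT on a flat grid versus a grid with 8 stacked levels. It assumes, purely for illustration, that a hop between adjacent folds costs the same as one hop in the plane; the article does not say how Tabula's tools actually weigh the two.

```python
def neighbours_within(radius: int, layers: int = 1, level: int = 0) -> int:
    """Count sites within Manhattan distance `radius` of a site at `level`,
    excluding the site itself, on an unbounded grid with `layers` levels."""
    count = 0
    for z in range(layers):
        remaining = radius - abs(z - level)  # distance budget left in the plane
        if remaining < 0:
            continue
        # Sites (dx, dy) with |dx| + |dy| <= remaining form a diamond.
        count += 2 * remaining * (remaining + 1) + 1
    return count - 1  # do not count the centre site itself
```

With a radius of 2, `neighbours_within(2)` gives 12 sites for a conventional 2D fabric, while `neighbours_within(2, layers=8, level=3)` gives 24 for a LUT sitting in one of the middle folds.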

Tabula does the same trick with RAM, which has the windfall bonus of every RAM being effectively 8-port memory.  Compounding the advantage - conventional FPGAs use 2-port RAM (with the corresponding 2X area penalty), but Tabula's architecture uses single-port memories to achieve the 8-port effect, so the result is 4x the ports of conventional FPGA memory with double the density.  
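The memory claim follows the same logic as the LUT folding: a single-port RAM that can be accessed once per fold looks like it has 8 ports when viewed from the slower user clock. The Python sketch below models that under stated assumptions; the class name and the tuple-based operation format are hypothetical, and nothing here reflects Tabula's actual memory interface.

```python
from typing import List, Optional, Sequence, Tuple


class FoldedSinglePortRAM:
    """Single-port RAM accessed once per fold, so up to `folds` times per user cycle."""

    def __init__(self, depth: int, folds: int = 8):
        self.mem = [0] * depth
        self.folds = folds

    def user_cycle(self, ops: Sequence[Tuple]) -> List[Optional[int]]:
        """Run one user clock cycle.

        `ops` holds at most one operation per fold, in fold order:
        ('w', addr, value) for a write or ('r', addr) for a read.
        """
        if len(ops) > self.folds:
            raise ValueError("more accesses than folds in one user cycle")
        results: List[Optional[int]] = []
        for op in ops:
            if op[0] == "w":
                _, addr, value = op
                self.mem[addr] = value
                results.append(None)   # writes return nothing
            else:
                _, addr = op
                results.append(self.mem[addr])
        return results
```

For example, `FoldedSinglePortRAM(16).user_cycle([("w", 3, 42), ("r", 3), ("w", 4, 7), ("r", 4), ("r", 3)])` performs five accesses in one user cycle. The "4x the ports" figure is simply 8 effective ports divided by the 2 ports of a conventional FPGA block RAM.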

There is always a price to be paid for a novel architecture.  Since just about every FPGA or programmable logic startup has some kind of radical new architecture, the key is always mitigating the negatives.  The most common mistake made by FPGA startups is allowing their architectural innovations to impact the development process.  Tabula appears to have avoided this mistake.  Their architecture works with conventional FPGA design techniques. The only difference is that the company plans to require us to use their proprietary synthesis as well as place-and-route.  If they did the job right, however, there are considerable benefits to be had from combining synthesis with place-and-route, particularly in these days of interconnect-dominated delay.

So, what is the Achilles' heel of Tabula's Spacetime?  We'd suspect it would be power consumption.  Having the entire chip toggling away at a brisk 1.6 GHz all the time sounds like a lot of dynamic power consumption to us.  The company claims it has mitigated the power consumption problem, however, so we'll have to wait for their actual device announcements, datasheets, and dev kits to find out what that means.  
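The worry can be framed with the standard first-order estimate of CMOS dynamic power, P = activity x capacitance x voltage squared x frequency. The Python sketch below only compares ratios under two illustrative assumptions about switched capacitance; none of the scenarios or numbers beyond the two clock rates come from Tabula.

```python
def dynamic_power(activity: float, capacitance_f: float,
                  vdd: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power estimate in watts."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def power_ratio(capacitance_scale: float,
                fast_hz: float = 1.6e9, user_hz: float = 200e6) -> float:
    """Folded fabric at `fast_hz` versus a conventional fabric at `user_hz`.

    `capacitance_scale` is the folded fabric's switched capacitance relative
    to the conventional one (an assumption, not a published figure).
    Activity and supply voltage are held equal, so they cancel in the ratio.
    """
    folded = dynamic_power(1.0, capacitance_scale, 1.0, fast_hz)
    conventional = dynamic_power(1.0, 1.0, 1.0, user_hz)
    return folded / conventional
```

If the folded fabric switched the same capacitance as a conventional one, `power_ratio(1.0)` gives 8x the dynamic power; if it switched only one eighth as much because there is less physical logic and routing, `power_ratio(1 / 8)` comes out at about 1x. Where a real device lands between those cases is exactly what the datasheets would have to show.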

Another typical hurdle for FPGA startups is field support.  The established FPGA companies rely heavily on their trained and seasoned worldwide AE teams.  Tabula's management team is made up of industry veterans, so they probably haven't forgotten that part.  

Overall, the new architecture is exciting and promising.  Tabula says they'll follow with an announcement of actual product in the near future, so we'll be waiting to report on that.  Tabula is attacking the communications infrastructure market - which is the cash cow of the entrenched FPGA superpowers, so we can expect a good fight.

Posted by Shashi Kanth Bobba at 11:32
Low-cost 3D FPGA from Tier Logic with some cool figures

Because I'm more than a little lazy (and I don't get paid to blog), permit me to quote from Clive Maxfield's blog:

Tier Logic's 3D-FPGA technology = low-cost FPGAs and no-risk ASICs

Written by Clive Maxfield

 

This is seriously cool – I am very excited. Actually, I'm tremendously excited about all sorts of innovations that are taking place in programmable logic space ("where no one can hear you scream"). As far as I'm concerned, the programmable logic domain (FPGAs and related devices) is absolutely the place to discover the coolest technology going.

Just today, for example, the folks at Tier Logic announced something really, really interesting. When we're in the design phase, we need a high degree of flexibility, the ability to implement changes quickly, the ability to work with real hardware and ... ultimately ... a fast way to get to the market. FPGAs address these needs, but traditional FPGAs are too expensive for volume production.

By comparison, when we're in the production phase, we need low manufacturing costs and low power consumption and high reliability and high security. So ASICs are the best alternative for volume production, but traditional ASICs are risky, expensive, and time-consuming to develop ("costly to obtain, cheap to sustain").

Another consideration is creating an FPGA prototype and then migrating that prototype into an ASIC implementation. Different vendors have different solutions to this – some are better than others and I could waffle on for hours, but let's not (a) wander off into the weeds or (b) dilute the message from Tier Logic.

The point is that the folks at Tier Logic say that they have the answer to all our problems. In fact they do have an incredibly cunning solution – one of those ideas that is so simple (conceptually) that as soon as you see it you slap yourself on the head and say: "Doh! Why didn’t I think of that?"

Introducing TierFPGAs
Let's start at the beginning and build the suspense. Consider a traditional SRAM-based FPGA. We might think of this as a 2D FPGA, because both the user logic (including look-up tables, registers, memory blocks, DSP blocks, etc.) and the configuration SRAM cells are created in the same silicon:

A standard 2D FPGA

The result is that a large proportion of the traditional 2D FPGA die is consumed by the configuration SRAM. This has several implications, not the least of which is the fact that the various user logic elements have to be spread out to make room for the configuration cells, which increases the length of the tracks between those logic elements, which increases the time it takes signals to pass through the tracks, which negatively affects performance. Another consideration is that when we come to convert the FPGA into an ASIC – in which all of the logic elements are closer together – the timing of our design will change.

So now let's consider the Tier Logic solution. In the case of the TierFPGA, what they've done is to separate the user circuits and configuration circuits into three-dimensional (3D) stacked layers. The user logic is implemented in the silicon, which has the standard metallization on top. In fact 90% of the process is standard CMOS as illustrated below:

The TierFPGA

The configuration cells are implemented using a standard LCD-based Thin-Film Transistor (TFT) process, which accounts for 10% of the process steps. These TFTs are implemented in amorphous silicon, which means they have low performance (not a problem because they are used only for configuration) and low power consumption (there's no leakage). (I would be very interested to hear how these larger TFTs behave in high-radiation environments – it might be that these devices are inherently radiation-tolerant with regard to their configuration bits, which are the bane of standard 2D SRAM-based FPGAs.) Reducing the configuration overhead from the base-layers of silicon allows Tier Logic to produce smaller, denser, faster, lower-power, and more reliable FPGAs. This means that if we were to implement a 3D TierFPGA device with the same capacity as a regular FPGA, it will be physically much smaller, which shortens the tracks, speeds the signals, and lowers the cost as illustrated below:

The TierFPGA is much smaller and cheaper

Another way of looking at this is that if you kept the same die size as your original 2D FPGA, your 3D TierFPGA could support a much greater capacity!

Introducing TierASICs
But wait, there's more (this is where things get really exciting). Just look at the image above and think about this for a moment. Suppose we want to convert our TierFPGA into a TierASIC ... all we have to do is replace the programmable configuration circuitry layer with a simple metal layer. And that's just what they do ... once the TierFPGA design is frozen and signed off, the bit-stream information is used to create a single custom-mask metal layer that replaces the SRAM programming layer, resulting in a cost-reduced TierASIC device for high-volume production as illustrated below:

The TierASIC

As we see, the original ninth layer of copper metal becomes the configuration layer with straps to Vcc and Ground, and the entire process is 100% standard CMOS. The really cool thing here is that we don't have to change our original FPGA design in any way. The resulting TierASIC is 100% timing-compatible, pin-compatible, and package-compatible (including parasitics). Thus, unlike any other type of ASIC conversion, the timing remains identical between the FPGA and ASIC, allowing zero-risk, zero-effort conversions.
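The conversion step described above, where each configuration bit becomes a hard strap to Vcc or Ground, can be pictured with a few lines of Python. This is a conceptual sketch only: the node names, the dictionary-based bitstream, and the helper functions are hypothetical and bear no relation to Tier Logic's real bitstream or mask data formats.

```python
from typing import Dict, Sequence


def straps_from_bitstream(bitstream: Dict[str, int]) -> Dict[str, str]:
    """Map each configuration bit to the rail its node would be strapped to."""
    return {node: ("VCC" if bit else "GND") for node, bit in bitstream.items()}


def rail_level(rail: str) -> int:
    """Logic level presented by a strap."""
    return 1 if rail == "VCC" else 0


def mux(select_bits: Sequence[int], inputs: Sequence[int]) -> int:
    """Routing mux: the select bits (most significant first) pick one input."""
    index = 0
    for bit in select_bits:
        index = (index << 1) | bit
    return inputs[index]


# Hypothetical configuration for one 4-input routing mux.
EXAMPLE_BITSTREAM = {"mux0.sel1": 1, "mux0.sel0": 0}
EXAMPLE_STRAPS = straps_from_bitstream(EXAMPLE_BITSTREAM)
```

Driving the mux select lines from `EXAMPLE_BITSTREAM` or from the levels of `EXAMPLE_STRAPS` picks the same input, which is the sense in which the user logic cannot tell the two devices apart.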

Migrating from the TierFPGA to the TierASIC

Come on... you have to admit that this is very, VERY clever!

Industry-Standard Tool Flow
Although Tier Logic's 3D structure is different from other FPGAs, users will find the architecture and tool flow very familiar when they design with the Mobius tools from Tier Logic, because the tools have the same features as those from existing FPGA providers and the design flow is exactly the same. New or existing FPGA designs are easily synthesized, packed, placed, and routed into Tier Logic devices using industry-standard design tools, such as Precision Synthesis from Mentor Graphics, combined with Tier Logic's Mobius design tool suite. Mobius tools also create the bitstream for TierFPGA devices and the metal-mask data for TierASIC devices.

As Tier Logic's CTO Raminda Madurawe says, "The innovation of Tier Logic's monolithic 3D-FPGA significantly enhances the value of programmable solutions. By moving programmable overhead into the third dimension, we improve cost, power, performance, and security – all of the drawbacks associated with traditional FPGAs – without losing programmability. We can remove this overhead completely from the device without altering implemented designs to offer users timing-exact, very-low-cost ASICs – something traditional FPGAs simply can't offer."

TierASIC Benefits

  • Low NRE cost: <$50K
  • Low unit cost: Up to 75 percent less than typical FPGA pricing
  • Easy conversion from TierFPGA devices: no engineering effort and identical, timing-exact performance
  • Fast time-to-volume: Four weeks to first delivery
  • High reliability: Single-event upset (SEU) immune

Capital-Efficient Startup
Tier Logic was founded by FPGA process-technology pioneer Raminda Madurawe and is led by Doug Laird, formerly CEO of Cswitch and a founder of Transmeta. Matrix Partners and Walden International provided Tier Logic's Series A funding. It is extremely unusual for a semiconductor startup to come to market and take initial orders while still on a first round of financing.

"FPGA startups, and semiconductor companies in general, tend to burn through a lot of cash. Tier Logic has been very cost-effective and careful with its funding," commented Doug Laird, President and CEO of Tier Logic. "Not only have we developed silicon and tools, but unlike most semiconductor startups, we also developed a process technology first. And we’ve already been granted over 50 patents on fundamental 3D concepts and architectures. All this has been achieved for less than $20 million."

Early Access Program
Tier Logic is making a special offer to customers wishing to get early access to its technology. TierFPGA devices will be sampling in Q2 of this year, with production qualified in Q4. However, TierASIC devices are available immediately and will be in volume production in Q2. Until the sample TierFPGA devices are shipping, Tier Logic is offering customers with existing FPGAs who wish to take advantage of immediate conversion to TierASIC devices a free NRE if they place an order for $50k or more of production. In addition, for an order of $100k or more, Tier Logic will also create a custom pin-compatible package to avoid customers having to alter existing PCBs. More information on this offer is available at www.tierlogic.com/launch.

Posted by Shashi Kanth Bobba at 11:32
Tier Logic 3D FPGA

Because I'm more than a little lazy (and I don't get paid to blog), permit me to quote from Ron Wilson's and Dean Stevens's blogs.

  

TierLogic lifts the veil: another take on the 3D FPGA

Mar 10 2010 9:42AM | from Ron Wilson |


TierLogic, yet another large and expensive FPGA start-up that has been in stealth mode for years, today unveiled a radical approach to increasing the density and utility of large programmable logic devices. Like previously-announced Tabula, TierLogic describes their design as a 3D FPGA. But the two approaches are totally unlike each other, and neither is related to the concept of 3D ICs—involving stacked dice and through-silicon vias—that is currently the hot topic in SoC-of-the-future circles.

TierLogic's big idea is elegant and audacious: increase the density of FPGAs by moving all the configuration memory—not the data memory or the look-up-table (LUT) memory, but the RAM cells that control the interconnect muxes—out of the silicon. Removing these memory bits by itself can cut die area—at least the die area occupied by logic fabric—more than in half, according to the company's vice president of sales and marketing, Paul Hollingworth. TierLogic employs this advantage to use a mature 90nm process node and still deliver a smaller die area than a conventional SRAM FPGA would require, making it possible to offer the FPGAs at about half the cost of equivalent conventional parts.

But those SRAM cells have to go somewhere. That's where TierLogic's foundry partner Toshiba comes into the picture. Toshiba has developed a unique back-end-of-line process that puts a layer of amorphous-silicon thin-film transistors (TFTs) on top of the interconnect stack. The proprietary process uses virtually none of the wafer's thermal budget, so it's compatible with advanced CMOS. Yet at 180nm dimensions Toshiba can produce sufficiently fast and dense TFT SRAM cells to accommodate all the configuration memory required for the FPGA below. And since the configuration SRAM just sits there providing steering bits to the muxes (no user delay paths pass through the configuration memory), the slower, more stable TFT SRAM has no impact on user timing, except for the significant benefit of allowing the active die area to be much smaller.

So this is what TierLogic means by 3D: the chips have two separate layers of active circuitry. The substrate holds the logic cells, interconnect muxes, block memory, and other user-accessible features. The TFT layer on top of Metal-8 holds the configuration memory. The result is an FPGA that can be functionally equivalent to industry-standard devices, but potentially on smaller dice, and so significantly lower in cost and power. Hollingworth said that in practice, TierLogic parts will be about 30 percent denser than economy FPGAs and 2.6 times the logic density of high-end conventional devices. For reasons we'll discuss later, TierLogic is also claiming about a third better logic-cell utilization, so overall the company boasts over three times the logic density of existing high-end FPGAs.
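The "over three times" figure appears to come from multiplying the two quoted gains together. The short Python check below makes that arithmetic explicit; treating the gains as multiplicative is an assumption about how the company combined them, and the function name is made up for this example.

```python
def combined_density(raw_density_ratio: float, utilization_gain: float) -> float:
    """Effective density ratio when a raw density gain and a fractional
    utilization gain are assumed to multiply."""
    return raw_density_ratio * (1.0 + utilization_gain)


# 2.6x raw density over high-end parts, "about a third" better utilization.
ABOUT_A_THIRD = combined_density(2.6, 1.0 / 3.0)
# Same calculation with the 36 percent figure quoted later in the article.
WITH_36_PERCENT = combined_density(2.6, 0.36)
```

Both results land at roughly 3.5, which is consistent with the claim of over three times the logic density of existing high-end FPGAs.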

There is a second major advantage to this two-layer implementation: ASIC conversion. Since the TFT SRAM cells are not in any user timing paths, TierLogic can replace the TFT layer with a simple metal layer containing hard straps to power and ground busses, and have no impact on user timing (except, of course, for eliminating the need for a power-up configuration mode). Eliminating the TFT layer reduces cost further, creating a mask-programmed device that is functionally and timing-equivalent to the field-programmable device, but cheaper. "This is the first time there has been an ASIC solution that really fits for volumes between a hundred and ten-thousand units," Hollingworth maintains.

The turn-around time for reducing a fuse map to a metal-mask and delivering the mask-programmed parts is four weeks. No redesign is necessary, nor should there be any need to reclose timing, although some customers will still have to requalify the parts. The quick turn-around is in part because TierLogic can bank all its wafers at Metal-8, and simply send the wafers to either the TFT line or to Metal-9 fabrication, as needed.

This capability gives TierLogic the equivalent of Altera's Hardcopy capability (with die size and cost intermediate between an FPGA and a cell-based ASIC) but without requiring the customer to redo timing closure with a new set of timing files. The company is underlining this point by offering early adopters a conversion service: TierLogic will convert an existing production or prototype FPGA design or ASIC design to a TierLogic metal-programmed part. For a minimum order of 50k units the service is free. The company will give you complete pin-compatibility with your existing part for a small NRE, or throw pin-compatibility in as well on a 100k-unit minimum order.

The tool flow for the devices is familiar: Mentor Precision Synthesis followed by proprietary mapping, routing, and analysis. One interesting point in the mapping process is that TierLogic's LUTs are fracturable. If a path requires only a portion of a LUT—an inverter, say—the rest of the LUT is available to other nets. "Fracturing is known to be valuable—it improves our logic-cell utilization by 36 percent," Hollingworth said. "But if your configuration RAM is on your die, it's just too costly to support fracturing."
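For readers who have not met fracturable LUTs, the idea is that one truth table can serve either as a single wide function or as two narrower functions that share inputs. The Python sketch below shows this for a 3-input LUT split into two 2-input halves. It is a generic illustration; the article does not describe TierLogic's actual LUT size or fracturing scheme, and the class and table contents here are invented for the example.

```python
from typing import Sequence, Tuple


class FracturableLUT3:
    """8-entry truth table usable as one 3-input LUT or two 2-input LUTs."""

    def __init__(self, table: Sequence[int]):
        if len(table) != 8:
            raise ValueError("a 3-input LUT needs 8 truth-table bits")
        self.table = list(table)

    def whole(self, a: int, b: int, c: int) -> int:
        # Unfractured mode: c selects the half, (a, b) index within it.
        return self.table[(c << 2) | (a << 1) | b]

    def fractured(self, a: int, b: int) -> Tuple[int, int]:
        # Fractured mode: both halves are read at once with shared inputs,
        # giving two independent 2-input functions.
        index = (a << 1) | b
        return self.table[index], self.table[4 + index]


# Lower half implements NOT a (an inverter that ignores b),
# upper half implements a AND b, so the inverter wastes no extra LUT.
INVERTER_PLUS_AND = FracturableLUT3([1, 1, 0, 0, 0, 0, 0, 1])
```

`INVERTER_PLUS_AND.fractured(1, 1)` returns the inverter output and the AND output together, which is how an inverter can leave the rest of the LUT free for another net.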

Apparently Tabula's announcement persuaded TierLogic to announce a little earlier than they had intended. The company is not ready to give detailed product descriptions yet. Hollingworth did say that the mask-programmed version of the parts is available today, so TierLogic invites interested prospects to register on their site and get more detailed information. The company has already done one design that includes an on-chip MIPS R4000 CPU implemented in the logic fabric, for example.

Hollingworth expects to ship engineering samples of the field-programmable part with the TFT SRAM layer by the end of June this year, but it will be a while longer before those devices are qualified for full production. There are still issues with TFT yield, he admitted, but the company has seen a new run that appears to solve the problem. It just has to be fully evaluated.

In the future, TierLogic has several options. Hollingworth said that the engineering team has done critical-dimension analysis that indicates the TFT approach will scale to at least the 40nm node, giving the idea lots of room for evolution. And there are at least two more revolutionary ideas afoot. First, since the TFTs are relatively low-performance devices processed at low temperature, the TFT layer is compatible with just about anybody's advanced CMOS process. So TierLogic can license a field-programmable logic fabric as IP for use inside a cell-based SoC. You could have your wafers built at your favorite foundry, passivated and shipped to Toshiba, who could strip the passivation and fabricate the TFT layer. Hollingworth said that the company has already had discussions along these lines with some prospects.

The second point Hollingworth mentioned is that Toshiba is looking at the laser-annealing process that is coming on-stream for the 32nm process node. Once the laser-annealing systems are developed and in place, the high-speed laser annealing could create the local high temperatures necessary to produce a much higher-performance TFT without impacting the thermal budget of the underlying wafer. This would in principle allow TierLogic to put not just configuration memory but signal-path devices such as embedded SRAM blocks and even some logic structures or analog circuits in the top layer, creating an even denser FPGA. But that is for the future.

 

 

Tier Logic Uncloaks – It’s a 3D FPGA Company!

Mar 11 2010 | from Dean Stevens |
 
Just as I was getting ready to head over to the IEEE CPMT-SCV 3D IC Integration meeting last night, this fascinating story popped up.  Tier Logic, a Santa Clara 3D FPGA venture, has finally dropped its cloak of invisibility to reveal another 3D FPGA company.

Because I'm more than a little lazy (and I don't get paid to blog), permit me to quote from Ron Wilson's excellent post over on EDN:

TierLogic's big idea is elegant and audacious: increase the density of FPGAs by moving all the configuration memory—not the data memory or the look-up-table (LUT) memory, but the RAM cells that control the interconnect muxes—out of the silicon.  Removing these memory bits by itself can cut die area—at least the die area occupied by logic fabric—more than in half, according to the company's vice president of sales and marketing, Paul Hollingworth.  TierLogic employs this advantage to use a mature 90nm process node and still deliver a smaller die area than a conventional SRAM FPGA would require, making it possible to offer the FPGAs at about half the cost of equivalent conventional parts.

But those SRAM cells have to go somewhere.  That's where TierLogic's foundry partner Toshiba comes into the picture.  Toshiba has developed a unique back-end-of-line process that puts a layer of amorphous-silicon thin-film transistors (TFTs) on top of the interconnect stack.  The proprietary process uses virtually none of the wafer's thermal budget, so it's compatible with advanced CMOS.  Yet at 180nm dimensions Toshiba can produce sufficiently fast and dense TFT SRAM cells to accommodate all the configuration memory required for the FPGA below.  And since the configuration SRAM just sits there providing steering bits to the muxes (no user delay paths pass through the configuration memory), the slower, more stable TFT SRAM has no impact on user timing, except for the significant benefit of allowing the active die area to be much smaller.

Illustration of Tier Logic Technology

Mr. Wilson goes on to observe that this approach not only provides much better density (as much as 3x standard FPGAs), but that it provides a very quick, and accurate, path to creating mask-programmed devices.  If you're interested in the subject, you should check out the original post.  There's also a story in EETimes.

Using TFTs for the configuration memory is a fascinating approach.  4D Chips has looked at similar architectures, where the configuration memory was actually in a second die (conceptually, we're thinking more along the lines of the work going on over at NuPGA), but the Tier Logic architecture is certainly intriguing.

By the way, as Mr. Wilson ably points out, both NuPGA and Tier Logic are developing FPGAs that leverage vertical integration.  The third dimension in the recent announcement from Tabula is time, not stacked die.  I was going to do a post about Tabula, but it doesn't ever seem to bubble up on the priority list.  Just in case it never does, EETimes has an excellent article describing the company and the technology.

This is certainly an exciting time for 3D.

 

 

 

Posted by Shashi Kanth Bobba at 10:42