Need to Explore Parallelism Capability on Multi-core Platforms for IC Design Flow Tools

Date: Monday, November 02, 2009

The world is moving towards ever more complex chip designs. High-end devices are required for scientific computing, telecommunications, multimedia, graphics, and consumer electronics. Such state-of-the-art designs require complicated systems and high-end IC design EDA tools. Current IC design EDA tools and hardware platforms are stretched to their limits handling these flows, and it takes too long to reach the market. This is affecting the growth of the semiconductor industry and restricting us from moving into a new era of electronics design. The EDA industry has received a wake-up call and has taken a step towards developing parallel, multi-core-capable IC design tools.

The world of high-end technical computing talks a great deal about parallel programming and focuses on tailoring algorithms to utilize hardware in the most efficient way. We often discuss the factors behind high-performance systems, such as multi-core and multiprocessor machines and the growing availability of computer clusters, for high-end applications like scientific computing, military, and aviation; but it is time to think about using such multi-core platforms and techniques to produce more capable EDA tools for the next generation of complex devices.

Looking at history, commercial high-end tools to support the development of technical computing applications for high-performance systems did not exist. Parallel programming was rarely used in applications; its development was restricted to a small, technically skilled group of people and was considered an art practiced by specialists who achieved maximum performance by using custom setups and tuning their applications for specific hardware. Parallel applications such as IC design tools are now being developed to assist design engineers in designing, developing, debugging, and evolving hardware, with a focus beyond custom algorithms and raw performance. To succeed, we need to extend the functionality of standard serial-architecture IC design tools to support multi-core and multiprocessor platforms without extensively modifying the code, so that we end up with a robust IC design and development environment. In practice, this is nearly impossible to achieve with minimal changes to legacy, stable software code. One alternative is to force major software rewrites and shift market momentum to a new generation of EDA startups. This option is very expensive, and achieving it in the time available requires highly skilled, experienced engineers who understand the complexity of multi-core programming.

We all acknowledge that multi-core IC design tool support will be essential in the future, and all EDA vendors claim some multi-threading and multi-core capabilities today. But we still need to work hard to incorporate this capability at every stage of the IC design flow: design, verification, synthesis, scan insertion, place and route, physical verification, and mask generation, so that the intended functionality is realized on silicon at each development stage.

Fig. 1 shows how multi-core platforms have been targeted at high-end scientific or compute-intensive applications for greater efficiency, because single-core systems cannot deliver the same throughput, for example in terms of the time required to complete a job. IC design tools are equally important in this process and need the same attention; if we do not take steps to develop multi-core IC design tools, the entire IC design process will suffer.

Adopting and exploiting multiple CPUs is a challenging activity in the IC design EDA space. Some applications for functional and physical verification effectively use distributed processing over 'farms' or clusters of suitably configured, networked workstations, although managing the power of dozens or hundreds of CPUs is itself a challenging task. Some of these applications have also adopted multi-threading to take advantage of workstations with multiple CPUs.

A few multi-threaded applications, such as Mentor Graphics' Calibre DRC physical verification tool, run equally well on distributed networks, multiple-CPU workstations, and multi-core CPUs. Customers are now placing dual-core and quad-core CPU-based workstations in compute farms, combining distributed networks with multi-core CPUs. One of the difficulties with distributed processing is that the latency between processors is very high. Latency can be reduced with a workstation that contains multiple single-CPU chips, but the bus then becomes the limiting factor. The greatest speed benefits come from multiple cores on the same die, even though there is little difference between multiple-CPU and multi-core architectures from a software implementation point of view.

Programming for multi-core architectures is a complex task, and efforts to adapt and modify legacy applications may prove fruitless. Usually, multi-core programming involves the use of threads to distribute work; it is equally important to collect and coordinate the responses from all the threads, as sketched below. From a software point of view, particularly in the EDA landscape, many algorithms are inherently sequential and show only limited gains when multi-threaded. To achieve significant gains, they will need to be rewritten.
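
As a minimal sketch of the distribute-and-collect pattern described above, the following hypothetical C++ fragment partitions a workload across std::thread workers and merges their partial results. The workload itself (counting "violations" in slices of an array) is purely illustrative and is not taken from any actual EDA tool.

```cpp
#include <algorithm>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical per-chunk workload: count values above a threshold.
// In a real EDA tool this might be a rule check or a netlist pass.
static std::size_t count_violations(const std::vector<int>& data,
                                    std::size_t begin, std::size_t end) {
    std::size_t count = 0;
    for (std::size_t i = begin; i < end; ++i)
        if (data[i] > 100) ++count;
    return count;
}

int main() {
    std::vector<int> data(1000000, 42);
    data[12345] = 200;  // one "violation" for demonstration

    const std::size_t num_threads =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    std::vector<std::size_t> partial(num_threads, 0);
    std::vector<std::thread> workers;

    // Distribute: give each thread a contiguous slice of the data.
    const std::size_t chunk = data.size() / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (t + 1 == num_threads) ? data.size() : begin + chunk;
        workers.emplace_back([&, t, begin, end] {
            partial[t] = count_violations(data, begin, end);
        });
    }

    // Collect: join all threads, then merge the partial results.
    for (auto& w : workers) w.join();
    std::size_t total = std::accumulate(partial.begin(), partial.end(),
                                        std::size_t{0});
    std::cout << "violations: " << total << "\n";
}
```

Even in this toy example, the partitioning, joining, and merging steps are extra work that a serial implementation simply does not have; that overhead is exactly what must be amortized for multi-threading to pay off.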

Multi-core EDA system design has known drawbacks, and addressing them is a hot research topic for all of us. It will require a fair amount of investment in this area.

Explicitly parallelizing complex IC design EDA software at this stage would be a formidable task and could create critical problems of its own. There is significant overhead in partitioning a problem into parallel tasks, and further post-processing overhead in assembling and integrating the results. If the computational workload is not partitioned and distributed properly, the communication overhead can wipe out any gains brought by parallelism. There is also the complication of poor debugger support for distributed, threaded work: multi-threading can create race conditions that are extremely hard to debug, as the small example below illustrates, and delays in debugging tool issues can prolong the mainline IC design cycle.
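
To make the race-condition point concrete, here is a deliberately broken, hypothetical C++ fragment in which several threads update a shared counter without synchronization. The final count is usually wrong and varies from run to run, which is exactly why such bugs are hard to reproduce and debug; one common fix, indicated in the comments, is to make the counter atomic.

```cpp
#include <iostream>
#include <thread>
#include <vector>
// #include <atomic>   // needed for the fix suggested below

int main() {
    long counter = 0;                 // shared, unsynchronized state
    // std::atomic<long> counter{0}; // fix: atomic increments do not race

    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&counter] {
            for (int i = 0; i < 100000; ++i)
                ++counter;            // data race: read-modify-write is not atomic
        });
    }
    for (auto& w : workers) w.join();

    // Expected 400000, but with the plain long the result is usually
    // smaller and changes from run to run.
    std::cout << "counter = " << counter << "\n";
}
```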

Building threaded EDA software generally requires ground-up development. It is hard to implement; it can fairly be called rocket science. Memory and data management will be a challenge for multi-core and multiprocessor IC design EDA software. Coding styles that rely on global variables and do not separate data from execution make the rewrite task difficult, as the sketch below illustrates.
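
The following hypothetical C++ sketch contrasts the two styles mentioned above: a legacy-style routine that accumulates into a global variable, which is unsafe to call from multiple threads, versus a refactored version that takes its data and returns its result explicitly, so each thread can work on its own state. The names and the "cell area" workload are illustrative only.

```cpp
#include <vector>

// Legacy style: global mutable state mixed with execution.
// Two threads calling legacy_accumulate_area() would race on g_total_area.
static double g_total_area = 0.0;

void legacy_accumulate_area(const std::vector<double>& cell_areas) {
    for (double a : cell_areas)
        g_total_area += a;
}

// Refactored style: data in, result out, no hidden shared state.
// Each thread can call this on its own slice; results are merged afterwards.
double accumulate_area(const std::vector<double>& cell_areas) {
    double total = 0.0;
    for (double a : cell_areas)
        total += a;
    return total;
}
```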

Some theoretical analysis suggests that the scope for parallelism in EDA analysis tools is narrow. Synthesis is one of the major problem areas: it appears more resistant to parallelization because synthesis tools deal with many problem and data interdependencies.

Large EDA vendors acknowledge multi-core's challenges but say they are making good progress. Mentor Graphics, in fact, has one of the earliest multi-threaded EDA products in Calibre, a state-of-the-art tool with multi-core capability, and the market is going to take advantage of that type of architecture. But, as discussed, we also need to understand the infrastructure cost required for actual use of multi-core systems.

One of the known limitations on the adoption of multi-core IC design EDA tools is Amdahl's Law. We have to understand and anticipate Amdahl's Law, which imposes a kind of speed limit on parallel processing systems. Amdahl's Law, also known as Amdahl's Argument, is named after the computer architect Gene Amdahl and is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup achievable with multiple processors. The speedup of a program using multiple processors is limited by the time needed for the sequential fraction of the program. For example, if a program needs 20 hours on a single processor core, and a particular one-hour portion cannot be parallelized, while the remaining 19 hours (i.e. 95 percent) can be parallelized, then regardless of how many processors we devote to the parallelized portion, the execution time cannot fall below that critical one hour. Hence the speedup is limited to 20x, as the accompanying diagram illustrates.
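
Written out, Amdahl's Law gives the overall speedup S for a parallelizable fraction P of the work run on N processors; plugging in the figures from the example above (P = 0.95) reproduces the 20x ceiling:

\[
S(N) = \frac{1}{(1 - P) + \dfrac{P}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - P} = \frac{1}{0.05} = 20.
\]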

The EDA industry needs to explore further and parallelize its code so that we see better scaling; this creates a very good opportunity for research that looks beyond the horizon.

The requirement for high frequency and low power dissipation is the main driving force behind multi-core configurations. The IC design EDA industry must stay focused and ahead of the curve. We must put more effort into exploring parallelism on multi-core platforms for IC design EDA tool development; this will give us a better and faster design cycle and deliver time-to-market savings for the high-revenue semiconductor industry.

The author of the article is Sachin Pathak, Mentor Graphics.