VSD - Distributed timing analysis within 100 lines of code
What you'll learn
- Learn, code, and analyze a distributed framework
- Take up and run STA for challenging designs with huge instance counts and witness the benefits of distributed STA
- Be able to install and run OpenTimer, which is an open-source STA tool
- Be able to understand Unix commands
- Be able to understand basic STA terminology, which can be learnt on Udemy from the Static timing analysis - Part 1 and Part 2 courses
This webinar was conducted on 26th May 2018.
1) What happens when you type set_multi_cpu_usage -localCpu 4 on your EDA timing shell?
2) What happens when you type set_multi_cpu_usage -localCpu 4 -numThreads 4 on your EDA timing shell?
While working at my previous design companies, I was curious about how jobs get spawned on different machines. What if there are fewer machines and more jobs, or vice versa? How does the algorithm of a timing engine handle this?
I myself used to set up the entire distributed MMMC framework for timing tools at customer sites, which was just a matter of setting the right variables (set_multi_cpu_usage), but I never knew what goes on behind the tools. It's curiosity that leads to queries, which lead to exploration and, finally, to answers. I found my answers from Tsung-Wei, who is the architect of the popular open-source STA tool OpenTimer.
We all know that timing analysis is a really important task in the overall chip design flow, and that it is a complex and difficult one. The chips we build today have billions of transistors, so the resulting timing analysis runtime is too large. Also, we need to analyze timing under different conditions, so it's not just a single run that gives you a final result. While there are several solutions to mitigate this computation issue, the problem is that most of the work is architecturally constrained by a single machine. And as design complexity continues to grow larger and larger, we have to add more and more CPUs and memory to the machine, which is not very cost-efficient.
There are multiple places where we can introduce distributed computing to timing, and the major motivation is to speed up timing closure. We have to analyze timing under a range of conditions, typically quantified as modes (test mode, functional mode) and corners (PVT). The number of combinations (timing views) you have to run typically increases exponentially at lower nodes. That's where you need to distribute timing analyses across different machines.
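To make the idea of timing views concrete, here is a minimal sketch of how mode and corner combinations multiply into views, and how those views could be spread over a small pool of machines. It is only an illustration under stated assumptions: the mode, corner and host names are hypothetical, and the simple round-robin policy is not the scheduling algorithm of DTCraft, OpenTimer or any commercial timing tool.

```python
from itertools import product


def build_views(modes, corners):
    """Return every timing view, i.e. every (mode, corner) pair."""
    return list(product(modes, corners))


def assign_views(views, machines):
    """Assign views to machines in round-robin order (illustrative policy only)."""
    if not machines:
        raise ValueError("at least one machine is required")
    schedule = {machine: [] for machine in machines}
    for index, view in enumerate(views):
        # When there are more views than machines, indices wrap around,
        # so some machines receive more than one view.
        schedule[machines[index % len(machines)]].append(view)
    return schedule


# Hypothetical names, chosen only for this example.
modes = ["functional", "test"]
corners = ["ss_0.9V_125C", "tt_1.0V_25C", "ff_1.1V_-40C"]
machines = ["host0", "host1", "host2", "host3"]

views = build_views(modes, corners)
schedule = assign_views(views, machines)

for machine, assigned in schedule.items():
    print(machine, assigned)
```

With 2 modes and 3 corners there are 6 views for 4 machines, so two of the machines end up with two views each. A real engine would also have to account for per-view runtime, memory and machine availability, which this sketch ignores.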
So let's distribute it, and do it within 100 lines of code using DTCraft, a high-performance cluster computing engine. Welcome to the webinar on "Distributed timing analysis within 100 lines of code".
Do you want to find your answers too? Enroll in the upcoming webinar on "Distributed timing analysis" with Tsung-Wei, do the labs on your own, understand the framework, and I can guarantee you will be a better STA engineer or lead than you were before.
Speaker Profile: Tsung-Wei Huang
Tsung-Wei Huang is a Research Assistant Professor in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign (UIUC), IL, USA. He received his PhD in Electrical and Computer Engineering from UIUC. He holds 2 patents and has more than 30 conference and journal publications.
Who this course is for:
- This course is for people who are proficient with timing concepts and want to move a level ahead, and stay ahead of the curve
- Anyone enthusiastic to learn about distributed timing analysis from scratch i.e. from C++ code level
Kunal Ghosh is the Director and co-founder of VLSI System Design (VSD) Corp. Pvt. Ltd. Prior to launching VSD in 2017, Kunal held several technical leadership positions at Qualcomm's Test-chip business unit. He joined Qualcomm in 2010 and led the physical design and STA flow development of 28nm and 16nm test-chips. In 2013, he joined Cadence as Lead Sales Application Engineer for the Tempus STA tool. Kunal holds a Master's degree in Electrical Engineering from the Indian Institute of Technology (IIT), Bombay, India, where he specialized in VLSI Design & Nanotechnology.
Hands on with Technology @
1) MSM (mobile station modem) chips - MSM chips are used for CDMA modulation/demodulation. They consist of DSPs and microprocessors for running applications such as web-browsing, video conferencing, multimedia services, etc.
2) Memory test chips - Memory test chips are used to validate functionality of 28nm custom/compiler memory as well as characterize their timing, power and yield.
3) DDR-PHY test chips - DDR-PHY test chips are basically tested for high-speed data transfer.
4) Timing and physical design flow development from the 130nm MOSFET technology node down to the 16nm FinFET technology node.
5) “IR aware STA” and “Low power STA”
6) Analyzed STA engine behavior for design sizes up to 850 million instance count
ACADEMIC
1) Research Assistant to Prof. Richard Pinto and Prof. Anil Kottantharayil on “Sub-100nm optimization using Electron Beam Lithography”, which intended to optimize RAITH-150TWO Electron Beam Lithography tool and the process conditions to attain minimum resolution, use the mix-and-match capabilities of the tool for sub-100nm MOSFET fabrication and generate mask plates for feature sizes above 500nm.
2) Research Assistant to Prof. Madhav Desai, to characterize RTL generated from the C-to-RTL AHIR compiler in terms of power, performance and area. This was done by passing the RTL generated by the AHIR compiler through a standard ASIC tool chain, including synthesis and place & route. The resulting netlist out of PNR was characterized using standard software.
1) “A C-to-RTL Flow as an Energy Efficient Alternative to Embedded Processors in Digital Systems” submitted in the conference “13th Euromicro Conference on Digital System Design, Architectures, Methods and Tools, DSD 2010, 1-3 September 2010, Lille, France”
2) Concurrent + Distributed MMMC STA for 'N' views
3) Signoff Timing and Leakage Optimization On 18M Instance Count Design With 8000 Clocks and Replicated Modules Using Master Clone Methodology With EDI Cockpit
4) Placement-aware ECO Methodology - No Slacking on Slack
Tips on order in which you need to learn VLSI and become a CHAMPION:
If I were you, I would have started with the Physical Design and Physical Design webinar courses, where I would understand the entire flow first, and then moved to CTS-1 and CTS-2 to look into the details of how the clock is built.
Then, as you all know how crosstalk impacts functioning at lower nodes, I would have gone for the Signal Integrity course to understand the impacts of scaling and fix them. Once I had done that, I would want to know how to analyze the performance of my design, and I would have gone for the STA-1, STA-2 and Timing ECO webinar courses, respectively.
Once you do STA, an internal curiosity arises that makes us want to understand what goes on inside timing analysis at the transistor level. To fulfill that, I would have taken the Circuit design and SPICE simulations Part 1 and Part 2 courses.
And finally, to understand pre-placed cells, IPs and STA in even more detail, I would have taken the custom layout course and the Library Characterization course.
All of the above needs to be implemented using a CAD tool and needs to be done faster, for which I would have written TCL or Perl scripts. So for that, I would start the TCL-Part1 and TCL-Part2 courses, either at the very beginning or in the middle.
Finally, if I want to learn RTL and synthesis, from specifications to layout, the RISC-V ISA course will teach the best way to define specs for a complex system like a microprocessor.
Connect with me for more guidance!
Hope you enjoy the session. Best of luck for the future.
Tsung-Wei Huang is an Assistant Professor at the Department of Electrical and Computer Engineering (ECE) in the University of Utah. His research helps software developers boost application performance through parallel and heterogeneous computing.
Tsung-Wei Huang received his PhD degree from the ECE Department at the University of Illinois at Urbana-Champaign (UIUC). He received his MS and BS degrees from the Department of Computer Science at Taiwan's National Cheng-Kung University.