Curriculum Vitae

Weirong Zhu
Email: weirong at gmail dot com
Homepage: http://www.capsl.udel.edu/~weirong
June 25, 2008

Contents
1 Research Interests
2 Education
3 Professional Experience
4 Selected Projects
5 Publications
6 Awards
7 Courses
8 Computer Skills
9 Activities
10 References

1 Research Interests

* Parallel Computing, Parallel Systems, and Parallel Applications
* Computer Architecture and Systems
* Advanced Parallel Programming Models

2 Education

2001 Aug. - 2007 May. University of Delaware
Ph.D. in Electrical & Computer Engineering
Major in Computer Architecture and Parallel Systems
Advisor: Prof. Guang R. Gao
GPA: 3.939
Ph.D. in Electrical Engineering, May 2007
Ph.D. Dissertation: Efficient Synchronization for a Large-Scale Multi-Core Chip Architecture
Master of Electrical Engineering, May 2004
Master's Thesis: Multithreaded Parallel Implementation of HMMPFAM on EARTH

1997 Aug. - 2001 Jul. Tsinghua University
Bachelor of Engineering
Major in Automation

3 Professional Experience

2007 Jun. - present
Software Development Engineer II, Parallel Computing Platform group, Developer Division, Microsoft Corporation.

2005 Jun. - 2005 Aug.
Intern, ET International, Inc. Developed the system software toolset for the IBM Cyclops-64 architecture, including the communication kernel, SHMEM, and an automated test suite. Also worked on a cluster market survey and on system administration.

2005 Jan.
Intern, ET International, Inc. Worked on the system software toolset for the IBM Cyclops-64 architecture; independently developed the SHMEM shared memory access library for C64.

2004 Jun. - 2004 Aug.
Intern, ET International, Inc. Worked on the system software toolset for a large-scale multi-core architecture, including the multithreading runtime library, communication library, parallel benchmarks, regression test framework, and simulator.

2004 Jan.
Intern, ET International, Inc. Developed the multithreading runtime library and benchmarks for the IBM Cyclops-64 large-scale multi-core architecture.

2001 Aug. - 2007 May.
Research Assistant, Computer Architecture and Parallel Systems Laboratory (CAPSL), Newark, Delaware, advised by Prof. Guang R. Gao.

2002 Jun. - 2007 May.
System Administrator, University of Delaware, Newark, Delaware. In charge of all CAPSL servers and the internal network, including NIS/NFS/Mail/Web/FTP/Print/Backup servers and two Beowulf clusters.

1999 Sep. - 2001 Jul.
Research Assistant, Department of Automation, Tsinghua University, Beijing, P.R. China, advised by Prof. Ling Wang.

2001 Feb. - 2001 Jul.
Part-time project group manager and database programmer, Tsinghua Tongzhou Computer System Development Co., Ltd., Beijing, P.R. China.

2000 Apr. - 2000 Jun.
Part-time programmer, Legend Group (now Lenovo Group), Beijing, P.R. China.

4 Selected Projects

2003 Jun. - 2007 May. Large-Scale Multi-Core Architecture: IBM Cyclops-64
IBM Cyclops-64 takes a large-scale multi-core (or many-core) approach, with 160 thread units on a single chip, to build a next-generation cellular supercomputer. As a member of the system software group, I worked on a software framework for large-scale multi-core architectures. The software infrastructure facilitates multithreaded programming by providing light-weight kernels, compilers, user run-time software and libraries, simulators, debuggers, an automated test framework, and performance tools. I mainly worked on the multithreading runtime library, the C64 functionally accurate simulator, the low-level communication library, the SHMEM shared memory access library, the OpenMP multithreading environment, and the automated regression test framework. I also ported and developed a set of parallel benchmarks for the C64 system. My research interest is developing fine-grained multithreaded program execution models for large-scale multi-core architectures.
In particular, I focus on efficient fine-grain synchronization models and hardware/software mechanisms for this new generation of architectures.

2001 Jul. - 2005 Jul. EARTH (Efficient Architecture for Running THreads)
EARTH is a multithreaded architecture and execution model that supports fine-grain, non-preemptive fibers. In its current implementations, the EARTH multithreaded execution model is built with off-the-shelf microprocessors in a distributed memory environment. The EARTH runtime system (RTS) provides the interface between an explicitly multithreaded program and a distributed memory hardware platform. I was the main author of EARTH RTS 2.5 and of several new benchmarks. Besides developing the RTS, I also conducted research on applying EARTH multithreading techniques to important bioinformatics applications such as HMMPFAM, and on evaluating the performance portability of EARTH on various large-scale clusters.

2003 Apr. - 2004 May. Next Generation Programming Models
This was a joint project among three national laboratories and eight universities. It aimed to propose an array of attractive options for the convenient, efficient development of scalable scientific applications for terascale computers, facilitating application development by providing standard libraries, convenient parallel programming languages, and advanced programming models targeting petaflops systems. Our group focused on future programming models based on advanced execution models, such as an extension of the EARTH model. I worked on developing a Parallel Neocortex Model, intended as a motivating application.

2003 Feb. - 2003 Jul. Parallelization of HMMPFAM
Hmmpfam is a widely used, computation-intensive bioinformatics program for sequence classification. Our EARTH-based parallel implementation is the first highly scalable and robust cluster-based solution for parallel hmmpfam.
This implementation is also performance portable across various platforms.

2003 Sep. - 2004 May. DIMES
DIMES (Delaware Iterative Multiprocessor Emulation System) is an FPGA-based, cost-effective rapid prototyping platform for large logic system designs consisting of repetitive functional blocks, such as multiprocessor cores. The first emulation target of DIMES is the IBM Cyclops32 architecture. I worked on a fine-grain multithreaded, dataflow-style programming model targeting DIMES.

5 Publications

Selected Publications

ISCA2007
Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao, Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization for Many-Core Architectures, the 34th International Symposium on Computer Architecture (ISCA2007), San Diego, CA, USA, June 9-13, 2007. (Acceptance rate: 22% (46/204)). ACM SIGARCH Computer Architecture News, Volume 35, Issue 2, June 2007. DOI.

LCPC2008
Yuan Zhang, Vugranam C. Sreedhar, Weirong Zhu, Vivek Sarkar, and Guang R. Gao, Minimum Lock Assignment: A Method for Exploiting Concurrency Among Critical Sections, 21st Annual Workshop on Languages and Compilers for Parallel Computing (LCPC2008), Edmonton, Alberta, Canada, July 31 - August 2, 2008.

Cluster Comput. (Journal)
Weirong Zhu, Yanwei Niu, and Guang R. Gao, Performance Portability on EARTH: A Case Study across Several Parallel Architectures, Cluster Computing (Journal), Volume 10, Number 2, June 2007. DOI.

EuroPar2006
Ziang Hu, Juan del Cuvillo, Weirong Zhu, and Guang R. Gao, Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences, the 12th International European Conference on Parallel Processing (Euro-Par2006), Dresden, Germany, August 29 - September 1, 2006.

CF2006
Juan del Cuvillo, Weirong Zhu, and Guang R. Gao, Landing OpenMP on Cyclops-64: An Efficient Mapping of OpenMP to a many-core System-on-a-chip, the 3rd ACM International Conference on Computing Frontiers (CF2006), Ischia, Italy, May 2-5, 2006.
(Acceptance rate: 25% (30/120)). pdf, DOI.

IJHPCN (Journal)
Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, and Guang R. Gao, A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model, International Journal of High Performance Computing and Networking, Vol. 2, Issue 2/3/4, 2004. DOI.

Cluster2003
Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, and Guang R. Gao, A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model, IEEE International Conference on Cluster Computing (CLUSTER2003), Hong Kong, P.R. China, December 2003. (Acceptance rate: 29.3% (48/164)). pdf

Full List of Publications

1. Yuan Zhang, Vugranam C. Sreedhar, Weirong Zhu, Vivek Sarkar, and Guang R. Gao, Minimum Lock Assignment: A Method for Exploiting Concurrency Among Critical Sections, 21st Annual Workshop on Languages and Compilers for Parallel Computing (LCPC2008), Edmonton, Alberta, Canada, July 31 - August 2, 2008.
2. Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao, Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization for Many-Core Architectures, the 34th International Symposium on Computer Architecture (ISCA2007), San Diego, CA, USA, June 9-13, 2007. (Acceptance rate: 22% (46/204)). ACM SIGARCH Computer Architecture News, Volume 35, Issue 2, June 2007. DOI.
3. Weirong Zhu, Ziang Hu, and Guang R. Gao, On the Role of Deterministic Fine-Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era, First Workshop on Multithreaded Architectures and Applications, in conjunction with the 21st International Parallel and Distributed Processing Symposium (IPDPS2007), Long Beach, CA, USA, March 2007.
4. Yuan Zhang, Vugranam C. Sreedhar, Weirong Zhu, Vivek Sarkar, and Guang R.
Gao, Towards Optimal Lock Assignment and Allocation - A Method for Exploiting Concurrency among Multiple Critical Sections, ACM Conference on Principles and Practice of Parallel Programming, 2007 (short paper and poster). DOI.
5. Guang R. Gao, Thomas Sterling, Rick Stevens, Mark Hereld, and Weirong Zhu, ParalleX: A Study of A New Parallel Computation Model, Workshop on NSF Next Generation Software Program (NSFNGS'07), in conjunction with the 21st International Parallel and Distributed Processing Symposium (IPDPS2007), Long Beach, CA, USA, March 2007.
6. Weirong Zhu, Yanwei Niu, and Guang R. Gao, Performance Portability on EARTH: A Case Study across Several Parallel Architectures, Cluster Computing (Journal), Volume 10, Number 2, June 2007. DOI.
7. Weirong Zhu, Parimala Thulasiraman, Ruppa K. Thulasiram, and Guang R. Gao, Exploring Financial Applications on Many-core-on-a-chip Architecture: A First Experiment, 4th International Symposium on Parallel and Distributed Processing and Applications (ISPA-06), Sorrento, Italy, Dec. 4-6, 2006. Lecture Notes in Computer Science, Volume 4331/2006. DOI.
8. Ziang Hu, Juan del Cuvillo, Weirong Zhu, and Guang R. Gao, Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences, the 12th International European Conference on Parallel Processing (Euro-Par2006), Dresden, Germany, August 29 - September 1, 2006.
9. Weirong Zhu, Juan del Cuvillo, and Guang R. Gao, Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture, the Second International Workshop on OpenMP (IWOMP 2006), Reims, France, June 12-15, 2006. Lecture Notes in Computer Science, Volume 4315/2008. DOI.
10. Juan del Cuvillo, Weirong Zhu, Ziang Hu, and Guang R. Gao, Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture, the 20th International Symposium on High Performance Computing Systems and Applications (HPCS2006), St. John's, Newfoundland and Labrador, Canada, May 14-17, 2006.
pdf, DOI.
11. Juan del Cuvillo, Weirong Zhu, and Guang R. Gao, Landing OpenMP on Cyclops-64: An Efficient Mapping of OpenMP to a many-core System-on-a-chip, the 3rd ACM International Conference on Computing Frontiers (CF2006), Ischia, Italy, May 2-5, 2006. (Acceptance rate: 25% (30/120)). pdf, DOI.
12. Guang R. Gao, Thomas Sterling, Rick Stevens, Mark Hereld, and Weirong Zhu, Hierarchical Multithreading: Programming Model and System Software, Workshop on NSF Next Generation Software Program (NSFNGS'06), in conjunction with the 20th International Parallel and Distributed Processing Symposium (IPDPS2006), Rhodes Island, Greece, April 25-29, 2006.
13. Juan B. del Cuvillo, Weirong Zhu, Ziang Hu, and Guang R. Gao, FAST: a Functionally Accurate Simulator Toolset for the Cyclops-64 Cellular Architecture, Workshop on Modeling, Benchmarking, and Simulation (MoBS2005), in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA2005), Madison, Wisconsin, June 4, 2005. pdf
14. Juan B. del Cuvillo, Weirong Zhu, Ziang Hu, and Guang R. Gao, TiNy Threads: a Thread Virtual Machine for the Cyclops64 Cellular Architecture, 5th Workshop on Massively Parallel Processing (WMPP05), in conjunction with the 19th International Parallel and Distributed Processing Symposium (IPDPS2005), Denver, Colorado, April 4-8, 2005. pdf, DOI.
15. Weirong Zhu, Yanwei Niu, and Guang R. Gao, Performance Portability on EARTH: A Case Study across Several Parallel Architectures, 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS'05), in conjunction with the 19th International Parallel and Distributed Processing Symposium (IPDPS2005), Denver, Colorado, April 4-8, 2005. pdf, DOI.
16. Yuan Zhang, Weirong Zhu, Fei Chen, Ziang Hu, and Guang R.
Gao, Sequential Consistency Revisit: the Sufficient Condition and Method to Reason the Consistency Model of a Multiprocessor-on-a-chip Architecture, The IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN2005), Innsbruck, Austria, February 15-17, 2005. pdf
17. Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, and Guang R. Gao, A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model, International Journal of High Performance Computing and Networking, Vol. 2, Issue 2/3/4, 2004. DOI.
18. Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen, and Guang R. Gao, A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model, IEEE International Conference on Cluster Computing (CLUSTER2003), Hong Kong, P.R. China, December 2003. (Acceptance rate: 29.3% (48/164)). pdf
19. Weirong Zhu, Yanwei Niu, Jizhu Lu, and Guang R. Gao, Implementing Parallel Hmm-pfam on the EARTH Multithreaded Architecture, in Proceedings of the 2nd Computational Systems Bioinformatics Conference (CSB2003), Stanford University, Stanford, CA, Aug. 2003.
20. Ziang Hu, Yuan Zhang, Yan Xie, Hongbo Yang, Guang R. Gao, Weirong Zhu, and Haiping Wu, Code Size Reduction Using Global Code Motion, 1st Workshop on Compilers and Tools for Constrained Embedded Systems (CTCES2003), in conjunction with the 6th International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES2003), San Jose, CA, Oct. 2003.

Technical Memos

* CAPSL Technical Memo 67 Revised: Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao, Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization on Many-Core Architectures, CAPSL Technical Memo 67 Revised, Nov. 20, 2006.
* CAPSL Technical Memo 67: Weirong Zhu, Ziang Hu, and Guang R. Gao, Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look, July 17, 2006. pdf
* CAPSL Technical Memo 65: Yuan Zhang, Vugranam C.
Sreedhar, Weirong Zhu, Vivek Sarkar, and Guang R. Gao, Optimized Lock Assignment and Allocation for Productivity: A Method for Exploiting Concurrency among Critical Sections, May 10, 2006. pdf
* CAPSL Technical Memo 62: Juan B. del Cuvillo, Weirong Zhu, Ziang Hu, and Guang R. Gao, FAST: a Functionally Accurate Simulator Toolset for the Cyclops-64 Cellular Architecture, June 2005. pdf
* CAPSL Technical Memo 55: Juan B. del Cuvillo, Ziang Hu, Weirong Zhu, Fei Chen, and Guang R. Gao, Toward a Software Infrastructure for the Cyclops64 Cellular Architecture, April 2004. pdf
* CAPSL Technical Memo 53: Yuan Zhang, Weirong Zhu, Fei Chen, Ziang Hu, and Guang R. Gao, Lamport Order Revisit: A Study on How to Efficiently Achieve Sequential Consistency on a Modern Multiprocessor-on-a-Chip Architecture, March 2004. pdf

6 Awards

2007 May. Computer Systems & Networking Graduate Faculty Award 2007, University of Delaware.
2005 May. Student Travel Award to IPDPS2005 from the IEEE Technical Committee on Parallel Processing (TCPP).
2003 Nov. Student Travel Award to Cluster2003 (Hong Kong) from the IEEE Technical Committee on Parallel Processing (TCPP).

7 Courses

ELEG651 Computer Networking and Communication (A-)
ELEG652 Principles of Parallel Computer Architecture (A)
ELEG662 Linear System Theory (A)
CPEG621 Compiler Design (A)
ELEG667 Programming for Supercomputer (A)
CPEG622 Computer System Design II (A)
ELEG631 Digital Signal Processing (A)
ELEG867 Information Theory (A-)
CISC663 Operating System (A)
ELEG622 Electronic Materials Processing (A)
ELEG866 Special Problem (A)
ELEG869 Master's Thesis (A)
CISC601 Elements of Theory of Computation (A)
CISC872 Advanced Program Analysis and Transformations (A)

8 Computer Skills

Programming Languages: C, C++, C#, Perl, VHDL, Unix shell script, various assembly languages.
Parallel Programming: Pthreads, MPI, PVM, OpenMP, UPC, Threaded-C.
Platforms: Linux, Sun Solaris, Windows NT/2000/XP.

9 Activities

2007 Jun. 8th Attended the 2007 Federated Computing Research Conference (FCRC), San Diego, CA. Presentation at the 34th International Symposium on Computer Architecture (ISCA2007).
2007 Mar. 26th - 30th Attended the 21st International Parallel & Distributed Processing Symposium (IPDPS2007), Long Beach, California. Presentation at the Workshop on Multithreaded Architectures and Applications.
2005 Apr. - present IEEE Computer Society Student Member.
2003 Nov. - present IEEE Student Member.
2005 Apr. 29th Attended the IBM X10 Tutorial and Programming Exercise, Newark, Delaware.
2005 Apr. 8th - 10th Attended the 19th International Parallel & Distributed Processing Symposium (IPDPS2005), Denver, Colorado. Presentation at the 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS'05).
2004 Nov. Member of the Delaware team exhibiting DIMES at SC2004, Pittsburgh, PA.
2003 Nov. Member of the Delaware team exhibiting DIMES at SC2003, Phoenix, AZ.
2003 Nov. SC2003 Tutorial S09: Programming with the Partitioned Global Address Space Model: UPC, Co-Array Fortran, and Titanium.
2003 Apr. Visited Argonne National Laboratory, Argonne, IL, for the Next Generation Programming Models project.
2002 Jan. SunTune Tutorial: Application Performance Optimization on Sun Systems, Newark, Delaware.
Conference paper reviews for IWOMP07, ICS06, IWOMP06, NPC06, IPDPS06, ISPAN05, NPC05, IPDPS-PMEO05, IPDPS04, ICS04, NPC04, SBAC-PAD04, PDCS2004, and MASCOTS02.

10 References

Available on request.