Resume


Table of Contents

Curriculum Vitae

Academic Experience
Education
Professional Experience
Current Research Areas
National Recognition
Institutional Recognition

Section A: Teaching and Research Supervision

A.1 Teaching
    A.1.a Teaching at University of Delaware
    A.1.b Other Teaching Experience

  A.2. Research Supervision

Section B: Scholarship

  B.1: Research Activity and Interests

  B.2: List of Research Contributions
    Refereed Journal Publications
    Publications in Refereed Conference Proceedings (Last Six Years Only)
    Monographs, Books and Book Chapters

  B.3 Research Significance

  B.4 Research Support

Section C: Services

  C.1 University Activities and Services

  C.2 Professional Services

CURRICULUM VITAE

 

 

NAME: Guang R. Gao

 

OFFICE ADDRESS:

 

Department of Electrical Engineering

104 Evans Hall

University of Delaware

Newark, DE 19716

Tel: 302-831-8218

Fax: 302-831-4316

ggao@eecis.udel.edu

 

EDUCATION

 

Ph.D. Degree in Electrical Engineering and Computer Science

Massachusetts Institute of Technology, August 1986.

Member of the Computational Structures Group at the Laboratory for Computer Science, MIT, June 1982 to August 1986.

 

Master's Degree in Electrical Engineering and Computer Science

Massachusetts Institute of Technology, June 1982.

 

BS in Electrical Engineering

Tsinghua University, Beijing.

 

PROFESSIONAL EXPERIENCE

 

University of Delaware

Newark, DE.

Associate Professor, Department of Electrical and Computer Engineering, Sept. 1996 - present

Founder and leader of the Computer Architecture and Parallel Systems Laboratory (CAPSL).

 

 

McGill University

Montreal, Canada

Associate Professor, School of Computer Science, June 1992 - August 1996

Assistant Professor, School of Computer Science, Aug. 1987 - June 1992

Founder and leader of the Advanced Compilers, Architectures and Parallel Systems Group (ACAPS) at McGill since 1988.

 

Philips Research Laboratories

Sept. 1986 - June 1987

Briarcliff Manor, NY, USA

Senior member of the research staff in the Computer Architecture and Programming Systems Group. Played a major role in founding a multiprocessor system project and in research on parallelizing compilers.

 

Massachusetts Institute of Technology

June 1980 - Aug. 1986

Member of the Computational Structures Group at the Laboratory of Computer Science, MIT. Participated in the MIT Static Dataflow Architecture Project and other projects.

Proposed a novel methodology of organizing array operations to exploit the fine-grain parallelism of dataflow computation models. Developed a unique pipelined code mapping scheme for dataflow machines (later known as dataflow software pipelining).

 

Center for Advanced Studies, IBM Toronto Lab

Aug 1993 - June 1994

Visiting scientist with an NSERC Senior Industrial Fellowship.

 

CURRENT RESEARCH AREAS:

 

Computer Architecture and Systems

Parallel and Distributed Systems

Optimizing and Parallelizing Compilers, Parallel Programming

VLSI and Application-Specific System Design

 

PROFESSIONAL MEMBERSHIP

 

I am a Senior Member of the IEEE and a member of the ACM, ACM SIGARCH, and ACM SIGPLAN.

I am currently a Distinguished Visitor of the IEEE Computer Society.

 

NATIONAL RECOGNITION:

 

 

  • IEEE, Senior Member

 

  • Program Committee Member of Recognized International Conferences

I have given seminars at many industrial and academic organizations: IBM T.J. Watson Research Center, IBM Toronto Lab, AT&T Bell Laboratories, BNR, HP Labs, SGI, DEC, NRL (Naval Research Laboratory), MIT, Stanford, UC Berkeley, NYU, Cornell University, and the University of Victoria, to name just a few.

 

 

 

Section A: Teaching and Research Supervision

 

A.1 Teaching

 

A.1.a Teaching at University of Delaware

 

 

CPEG 324 Computer System Design

 

This is a new undergraduate CE design course that I developed and taught in Spring 1997. It will be offered again under the title CPEG-422 this fall.

 

This course stresses the principal design concepts embodied in modern computer architectures, and emphasizes ideas that we believe will continue to apply in the future, despite a rapidly changing technological environment. The primary objective of the course is to show how the design and evaluation of architectural features, based on both qualitative and quantitative studies, can be used to achieve balanced, efficient systems well matched to the class of problems they address.

ELEG-652: Topics in High-Performance Architecture

 

This is a graduate core course, and I taught it for the first time in the fall of 1997.

This course examines the basic principles and methodology used in the design and evaluation of high-performance computer architectures, and their relation to the underlying program execution and architecture models. Topics include pipelining and vector processing, instruction-level parallelism (ILP) architectures, multiprocessor architectures and high-speed networking, memory consistency models and cache-coherence issues, fine-grain parallelism and multithreaded architectures, and the role of optimizing and parallelizing compilers.

 

ELEG 867-14: Topics in Hardware/Software Codesign

 

This new course introduces the concepts, principles and methods of digital system design from both a hardware and a software viewpoint. In the context of general-purpose computer systems, the principles studied in this course include the close interaction between compiler technology and architecture design. In the context of special-purpose systems, such as embedded systems, the course will deal with the close interaction between software synthesis and hardware system design.

 

Topics to be discussed include the fundamentals of analysis, generation, synthesis, and optimization of computer code. Specific topics in this area include dependency analysis, code motion, scheduling, and register and resource allocation. Among the hardware micro-architecture topics studied are pipeline co-design and memory models. Important case studies that illustrate the basic principles of software/hardware co-design will be introduced. Topics in the emerging field of adaptive computing system design will also be discussed.

 

  • Activities

 

- New hardware/software tools introduced or developed for teaching laboratories:

 

Modern computer architecture and system design involves intensive software and hardware design activities. In the new courses introduced, students are exposed both to software tools and methodology for computer architecture design (e.g., software simulation toolsets) and to hardware design tools and methodology (e.g., VHDL tools and environments) for digital systems.

Students are expected to learn modern design tools and related skills through lab assignments and course projects. To this end, we have invested extensive effort in developing the laboratory and introducing the VHDL design environment into the course.

- The CAPSL laboratory seminar series.

 

I established the Computer Architecture and Parallel Systems Laboratory (CAPSL) when I joined UDel. In addition to performing research, one important objective of this laboratory has been to facilitate the teaching of the computer architecture and digital system courses, and the training of graduate research and teaching assistants. The new courses and software tools described above depend directly on this laboratory. The laboratory is now equipped with various workstations and a wide variety of research and teaching software, and a number of my best graduate students have actively participated in and contributed to teaching. Activities organized include:

 

 

 

A.1.b Other Teaching Experience

 

At McGill University, I introduced and developed a set of new courses (308-505, 308-605, 308-622) on high-performance computer architectures, parallel systems and parallelizing compilers. These courses have been consolidated and improved over time, forming a core for students interested in the related subject areas. I have also taught a number of graduate seminar courses (details can be provided upon request). The excellence of my teaching has been recognized through the following outstanding teaching award nominations:

 

 

 

A.2. Research Supervision

 

Current graduate students under my supervision include:

 

Gieger, Thomas (processing in memory and multithreading)

Marquez, Andres (multithreaded architectures)

Ryan, Sean (optimizing compilers)

Stouchinin, Artour (instruction-level parallelism, software pipelining)

Tang, Xi-Nan (compilers for multithreading)

Thulasiraman, Parimala (parallel algorithms and applications)

Yang, Hongbo (instruction-level parallelism)

Douillet, Alban (compiling for multithreading)

 

Current postdoctoral fellows under my supervision include:

 

Amaral, Nelson (system software, compilers)

Theobald, Kevin (computer architecture, parallel systems)

Thulasiram, Rupak (parallel applications)

 

Already Completed:

 

The applicant has completed the supervision of 7 Ph.D. and 18 M.Sc. students, and 5 postdoctoral fellows in the proposed research areas of high-performance computing.

Post-Doctoral Fellows (4 completed), Ph.D. (7 graduated), and M.Sc. (18 graduated):

G. Liao
E. Altman (1991 - 1996)
H. Cai (1995 - 1997)
R. Shanker (1991 - 1993)
(1991 - 1993)
H. Hum (1988 - 1992)
N. Elmasri (1992 - 1995)
N. Shiri (1990 - 1992)
O. Maquelin
S. Nemawarkar (1989 - 1996)
A. Emtage (1988 - 1991)
R. Silvera (1996 - 1997)
(1994 - 1998)
Q. Ning (1990 - 1993)
S.H. Han (1996 - 1997)
A. Stouchinin (1994 - 1996)
G. Ramaswamy
V. C. Sreedhar (1990 - 1995)
A. Jimenez (1993 - 1996)
R. Wen (1993 - 1995)
(1990 - 1994)
G. Tremblay (1988 - 1994)
L. Lozano (1992 - 1994)
Y-B Wong (1989 - 1991)
X. Tian
R. Yates (1988 - 1992)
S. Merali (1993 - 1996)
(1993 - 1996)
C. Moura (1991 - 1993)
J. Wang
C. Mukerji (1991 - 1994)
(1995 - 1997)
R. Olsen (1989 - 1992)
Z. Paraskevas (1987 - 1989)
H. Petry (1995 - 1997)

 

 

Those who have graduated are highly trained in the field of parallel architectures and compilers, as evidenced by the fact that they have worked (or are working) as tenure-track university professors (Ramaswamy, Tremblay); as engineers in key industrial sectors, e.g., Intel (Hum), Nortel (Wang), IBM (Altman, Nemawarkar, Sreedhar), BNR (Liao, Wen), HP (Lozano), Convex (Ning), NCUBE (Olsen), CAE (Nassur), AT&T (Petry); as researchers in government labs, e.g., LLNL (Yates); or in other professional positions.

 

Section B: Scholarship

 

B.1 Research Activity and Interests

 

1. Computer Architecture and Systems.

 

One main question facing modern computer architects is: Is it ever possible to build a high-performance parallel architecture that combines the power of hundreds, or even thousands, of processors to solve real-world applications (regular or irregular) with scalable performance?

My research has been seeking an answer to this challenge. In particular, our primary work has concentrated on multithreaded program execution models and architectures. To this end, I have initiated, led, or played a major role in a number of research projects in this area.

 

 

 

 

 

2. Optimizing compiler technology.

 

 

 

 

 

has been conducted. Currently, we are extending our method and studying new thread partitioning algorithms that integrate scheduling and register allocation under the same framework (partially funded via an NSF-CCR grant).

 

3. Other areas

 

protocol based on such a memory model. My colleague and I have defined a new memory consistency model, called Location Consistency (LC), in which the state of a memory location is modeled as a partially ordered multiset (pomset) of write and synchronization operations. We have proved that LC is strictly weaker than existing memory models, but is still equivalent to stronger models for parallel programs that have no data races. We also introduced a new multiprocessor cache consistency protocol based on the LC memory model.

 

 

 

 

 

B.2: List of Research Contributions

 

Refereed Journal Publications

 

X. Tang and Guang R. Gao, Automatically Partitioning Threads for Multithreaded Architectures, Special Issue on Compilation and Architectural Support for Parallel Applications, accepted for publication, June 1999.

 

  1. Vugranam C. Sreedhar, Guang R. Gao and Yong-Fong Lee, A New Framework for Elimination-Based Dataflow Analysis Using DJ Graphs, ACM Transactions on Programming Languages and Systems, 20(2):388-433, March 1998.

  2. Erik Altman and Guang R. Gao, Optimal Modulo Scheduling Through Enumeration, International Journal of Parallel Programming, accepted for publication, 1998.

  3. Erik Altman and Guang R. Gao, A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards, Journal of Parallel and Distributed Computing, 39:259-293, 1998.

  4. Vugranam C. Sreedhar, Guang R. Gao, and Yong-fong Lee. Incremental computation of dominator trees. ACM Transactions on Programming Languages and Systems, 19(2):239-252, March 1997.

  5. Vugranam C. Sreedhar, Guang R. Gao, and Yong-fong Lee. A quadratic time algorithm for computing multiple node immediate dominators. Journal of Programming Languages, accepted for publication, 1996.

  6. R. Govindarajan, Erik R. Altman, and Guang R. Gao. A framework for resource-constrained rate-optimal software pipelining. IEEE Transactions on Parallel and Distributed Systems, pages 1133-1149, November 1996.

  7. Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao, and Laurie J. Hendren. A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, 24(4):319-347, August 1996.

  8. Vugranam C. Sreedhar, Guang R. Gao, and Yong-fong Lee. Identifying loops using DJ graphs. ACM Transactions on Programming Languages and Systems, accepted for publication, 1996.

  9. Vugranam C. Sreedhar and Guang R. Gao. A linear time algorithm for placing φ-nodes. Journal of Programming Languages, accepted for publication, 1995.

  10. Qi Ning, Vincent Van Dongen, and Guang R. Gao. Automatic data and computation decomposition for distributed memory machines. Parallel Processing Letters, 5(4):539-550, April 1995.

  11. Vugranam C. Sreedhar and Guang R. Gao. Computing φ-nodes in linear time using DJ-graphs. Journal of Programming Languages, 3:191-213, 1995.

  12. E. Arjomandi, W. O'Farrell, I. Kalas, G. Koblents, F. Ch. Eigler, and G. R. Gao. ABC++: Concurrency by inheritance in C++. IBM Systems Journal, 34(1):120-137, 1995.

  13. R. Govindarajan and Guang R. Gao. Rate-optimal schedule for multi-rate DSP computations. Journal of VLSI Signal Processing, 9(3):211-232, April 1995.

  14. G. R. Gao. An efficient hybrid dataflow architecture model. Journal of Parallel and Distributed Computing, 19(4):293-307, December 1993.

  15. Laurie J. Hendren, Guang R. Gao, Erik R. Altman, and Chandrika Mukerji. A register allocation framework based on hierarchical cyclic interval graphs. The Journal of Programming Languages, 1(3):155-185, 1993.

  16. Qi Ning and Guang R. Gao. Optimal loop storage allocation for argument-fetching dataflow machines. International Journal of Parallel Programming, 21(6):421-448, December 1992.

  17. H. H. J. Hum and G. R. Gao. A high-speed memory organization for hybrid dataflow/von Neumann computing. Future Generation Computer Systems, 8:287-301, 1992.

  18. G. R. Gao, H. H. J. Hum, and Y-B Wong. Toward efficient fine-grain software pipelining and the limited balancing techniques. International Journal of Mini and Microcomputers, 13(2):57-68, 1991.

  19. Guang R. Gao. Exploiting fine-grain parallelism on dataflow architectures. Parallel Computing, 13(3):309-320, March 1990.

 

 

Publications in Refereed Conference Proceedings (Last Six Years Only)

 

I have more than 80 publications in refereed conferences. Due to space limitations, only those from the last six years are listed; the rest can be provided upon request.

 

 

 

 

  1. G. Heber, R. Biswas, and Guang R. Gao, Self-Adative Walks over Adaptive Unstructured Grids, In the Proceedings of Irregular'99 in conjuction to the International Parallel Processing Symposium (IPPS/SPDP), pp 969-977, San Juan, Puerto Rico, April 12-16, 1999.
  2.  

     

  3. G. Heber, R. Biswas, P. Thulasiram and Guang R. Gao, Using Multithreading for Automatic Load Balancing of Adaptive Finite Element Meshes, In the Proceedings of Irregular'99 in conjuction to the International Parallel Processing Symposium (IPPS/SPDP), pp 969-977, San Juan, Puerto Rico, April 12-16, 1999.
  4.  

     

  5. A. Khokhar, G. Heber, Parimala Thulasiraman and Guang R. Gao, Load Adaptive Algorithms and Implementation for the 2D Discret e Wavelet Transform on Fine-Grain Multihtreaded Architctures, In the Proceedings of the International Parallel Processing Symposium (IPPS/SPDP), pp 360-364, San Juan, Puerto Rico, April 12-16, 1999.
  6.  

     

  7. G. Heber, R. Biswas, and Guang R. Gao, Self-Avoiding Walks over Adaptive Triangular Grids, In Proceedings of SIAM Parallel Processing Conference for Scientific Computing, San Antonio, Texas, April, 1999.
  8.  

  9. Chihong Zhang, R. Govindarajan, and Guang R. Gao, Efficient State-Diagram Construction Methods for Software Pipelining, In Proceedings of the International Conference on Compiler Construction, CC'99, held as part of ETAPS'99, Amsterdam, The Netherland, March 22 - 26, 1999.
  10.  

     

  11. K. Theobald, Guang R. Gao and T. Sterling, Superconducting Processors for HTMT: Issues and Challenges, In Proceedings of The Seventh Symposium on The Frontiers of Massively Parallel Computation (Frontiers'99), pp 260-267, Annopolis, Maryland, February 21-25, 1999.
  12.  

     

  13. H. Cai, O. Maquelin, P. Kakulavarapu and Guang R. Gao, Design and Evaluation of Dynamic Load Balancing Schemes under a Fine-Grain Multithreaded Execution Model, In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation (MTEAC), in conjunction to the 1999 IEEE Symposium on High-Performance Computer Architecture (HPCA99), Orlando, Florida, Janurary, 1999.
  14.  

  15. A. Marquez, K. Theobald, X. Tang and Guang R. Gao, The Superstrand Model, In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation (MTEAC), in conjunction to the 1999 IEEE Symposium on High-Performance Computer Architecture (HPCA99), Orlando, Florida, Janurary, 1999.
  16.  

     

     

  17. Sylvain Lelait, Guang R. Gao and Christine Eisenbeis, A New Fast Algorithm for Optimal Register Allocation in Modulo Scheduled Loops, In Proceedings of the International Conference on Compiler Construction, CC'98, held as part of ETAPS'98, 1998, Kai Koskimies, volume 1383, Lecture Notes in Computer Science, pp 204--218, Springer, Lisbon, Portugal, March 28 - April 4.
  18.  

  19. R. Govindrarajan, Narasimba Rao, E.R. Altman and Guang R. Gao, An Enhanced Co-Scheduling Method using Reduced MS-State Diagrams, In the Proceedings of the International Parallel Processing Symposium (IPPS/SPDP), pp 168-175, Orlando, Florida, April, 1998.
  20.  

  21. Maria-Dana Tarlescu, Kevin Theobald and Guang R. Gao, Elastic History Buffer: A Low Cost Method to Improce Branch Prediction Accuracy, In the Proceedings of the International Conference on Computer Design (ICCD'97), pp 82-87, Austin, TX., Oct. 1997.
  22.  

  23. Rauls Silvera, Jian Wang, Guang R. Gao and R. Govindarajan, A Register Pressure Sensitive Instruction Scheduler for Dynamic Issue Processors, In the Proceedings of the International Conference on Parallel Architecture and Compiler Techniques (PACT'97), San Francisco, CA, Nov. 1997.
  24.  

  25. X,N. Tang, Rakesh Ghiya, Laurie Hendren, Guang R. Gao, Heap Analysis and Optimizations for Threaded Programs, In the Proceedings of the International Conference on Parallel Architecture and Compiler Techniques (PACT'97), San Francisco, CA, Nov. 1997.
  26.  

  27. Xinan Tang, Guang R. Gao, How "Hard" is Thread Partitioning and How "Bad" is a List Scheduling Based Partitioning Algorithm, In Proceedings of Tenth Annual ACM Symposium on Parallel Algorithms and Architectures,Puerto Vallarta, Mexico, pp130--139, June,1998
  28.  

  29. Angela Sodan, Guang R. Gao, Olivier Maquelin, Jens-Uwe Schultz, and Xin-Min Tian. Experience with non-numeric applications on multithreaded architectures. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Las Vegas, Nevada, pp124-135, June, 1997
  30.  

  31. X. N. Tang, J. Wang, K. Theobald, and Guang R. Gao. Thread Partition and Schedule Based on Cost Model. In Proceedings of the 9th Annual Symposium on Parallel Algorithms and Architectures (SPAA), Newport, Rhode Island, pp272-281, July 1997.
  32.  

  33. Shashank S. Nemawarkar and Guang R. Gao. Latency tolearance: A metric for performance analysis of multithreaded architecture. In Proceedings of the International Parallel Processing Symposium, April 1997.
  34.  

  35. Parimala Thulasiraman, Xin-Min Tian, and Guang R. Gao. Multithreading implementation of a distributed shortest path algorithm on earth multiprocessor. In Proc. of the Internatinal Conference on High Performance Computing, Trivandrum, India, pp336-341, December 1996.
  36.  

  37. Xin-Min Tian, Shashank S. Nemawarkar, Guang R. Gao, et al. Quantitive studies of data locality sensitivity on the EARTH multithreaded architecture: Preliminary results. In Proc. of the Internatinal Conference on High Performance Computing, Trivandrum, India, pp362-367December 1996.
  38.  

  39. Guang Gao, Konstantin K. Likharev, Paul C. Messina, and Thomas L. Sterling. Hybrid technology multi-threaded architecture. In Proceedings of Frontiers '96: The Sixth Symposium on the Frontiers of Massively Parallel Computation, pages 98-105, Annapolis, Maryland, October 1996.
  40.  

  41. Laurie J. Hendren, Xinan Tang, Yingchun Zhu, Guang R. Gao, Xun Xue, Haiying Cai, and Pierre Ouellet. Compiling C for the EARTH multithreaded architecture. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), pages 12-23, Boston, Massachusetts, October 1996. IEEE Computer Society Press.
  42.  

  43. Erik R. Altman and Guang R. Gao. Optimal software pipelining through enumeration of schedules. In Proceedings of Euro-Par'96, pages 833-840, Lyon, France, August 1996.
  44.  

  45. Vivek Sarkar, Guang R. Gao, and Shaohua Han. Data locality analysis for distributed shared memory multiprocessors. In Proceedings of the Ninth Workshop on Languages and Compilers for Parallel Computing, San Jose, California, August 1996.
  46.  

  47. Olivier Maquelin, Guang R. Gao, Herbert H. J. Hum, Kevin B. Theobald, and Xin-Min Tian.Polling Watchdog: Combining polling and interrupts for efficient message handling. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 178-188, Philadelphia, Pennsylvania, May 1996.
  48.  

  49. John Ruttenberg, G. R. Gao, A. Stouchinin, and W. Lichtenstein. Software pipelining showdown: Optimal vs. heuristic methods in a production compiler. In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pages 1-11, Philadelphia, Pennsylvania, May 1996.
  50.  

  51. Vugranam C. Sreedhar, Guang R. Gao, and Yong fong Lee. A new framework for exhaustive and incremental data flow analysis using DJ graphs. In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pages 278-290, Philadelphia, Pennsylvania, May 1996.
  52.  

  53. Jian Wang and Guang R. Gao. Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In Proceedings of the 6th International Conference on Compiler Construction, Lecture Notes in Computer Science, Linkoping, Sweden, April 1996. Springer-Verlag.
  54.  

  55. R. Govindarajan, Erik R. Altman, and Guang R. Gao. Instruction scheduling in the presence of structureal hazards: An integer programming approach to software pipeline. In Proc. of the nternatinal Conference on High Performance Computing, Goa, India, December 1995.
  56.  

  57. R. Govindarajan, Erik R. Altman, and Guang R. Gao. Co-scheduling hardware and software pipelines. In Second International Symposium on High-Performance Computer Architecture, San Jose, California, February 1996.
  58.  

  59. Shashank S. Nemawarkar and Guang R. Gao. Measurement and modeling of EARTH-MANNA multithreaded architecture. In Proceedings of the Fourth International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 109-114, San Jose, California, February 1996. IEEE Computer Society TCCA and TCS.
  60.  

  61. Luis A. Lozano C. and Guang R. Gao. Exploiting short-lived variables in superscalar processors. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 292-302, Ann Arbor, Michigan, November-December 1995.
  62.  

  63. J. B. Dennis and G.R. Gao. On memory models and cache management for shared-memory multi-processors. In Proceedings of Seventh IEEE International Symposium on Parallel and Distributed Proccesing. IEEE, October 1995.
  64.  

  65. Olivier C. Maquelin, Herbert H. J. Hum, and Guang R. Gao. Costs and benefits of multithreading with off-the-shelf RISC processors. In Proceedings of the First International EURO-PAR Conference, number 966 in Lecture Notes in Computer Science, pages 117-128, Stockholm, Sweden, August 1995. Springer-Verlag.
  66.  

  67. R. Wen, Guang R. Gao, and Vincent V. Dongen. The design and implementation of the accurate array data-flow analysis in the HPC compiler. In Proceedings of High Performance Computing Symposium '95, Canada's Ninth Annual International High Performance Computing Conference and Exhibition, pages 144-155, Montr'eal, Qu'ebec, July 1995. Centre de recherche informatique de Montr'eal.
  68.  

  69. Nasser Elmasri, Herbert H. J. Hum, and Guang R. Gao. The Threaded Communication Library: Preliminary experiences on a multiprocessor with dual-processor nodes. In Conference Proceedings, 1995 International Conference on Supercomputing, pages 195-199, Barcelona, Spain, July 1995.
  70.  

  71. Erik R. Altman, R. Govindarajan, and Guang R. Gao. An experimental study of an ILP-based exact solution method for software pipelining. In Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science, pages 2.1 - 2.15, Columbus, Ohio, August 1995. Springer-Verlag.
  72.  

  73. Guang R. Gao and Vivek Sarkar. Location consistency: Stepping beyond the memory coherence barrier. In 24th International Conference on Parallel Processing, pages II-73-II-76, University Park, Pennsylvania, August 1995.
  74.  

  75. Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Xinan Tang, Guang R. Gao, Phil Cupryk, Nasser Elmasri, Laurie J. Hendren, Alberto Jimenez, Shoba Krishnan, Andres Marquez, Shamir Merali, Shashank S. Nemawarkar, Prakash Panangaden, Xun Xue, and Yingchun Zhu. A design study of the EARTH multiprocessor. In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT '95, pages 59-68, Limassol, Cyprus, June 1995. ACM Press.
  76.  

  77. E. R. Altman, R. Govindarajan, and G. R. Gao. Scheduling and mapping: Software pipelining in the presence of structual hazards. In ACM SIGPLAN Symposium on Programming Language Design and Implementation, June 1995. page 139-150.
  78.  

  79. G. Tremblay and G. R. Gao. The impact of laziness on parallelism and the limits of strictness analysis. In Proceedings of the High Performance Functional Computing Conference, pages 119- 133, Denver, Colorado, April 1995. Lawrence Livermore National Laboratory. CONF-9504126.
  80.  

  81. Vugranam C. Sreedhar and Guang R. Gao. A linear time algorithm for placing OE-nodes. In Conference Record of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 62-73, San Francisco, California, January 1995.
  82.  

  83. Vugranam C. Sreedhar, Guang R. Gao, and Yong fong Lee. Incremental computation of dominator trees. In Proceedings of the ACM SIGPLAN Workshop on Intermediate Representations (IR'95), pages 1-12, San Francisco, California, January 22, 1995. SIGPLAN Notices, 30(3), March 1995.
  84.  

  85. Kevin B. Theobald, Herbert H. J. Hum, and Guang R. Gao. A design framework for hybrid-access caches. In Proceedings of the First International Symposium on High-Performance Computer Architecture, pages 144-153, Raleigh, North Carolina, January 1995.
  86.  

  87. R. Govindarajan, Erik R. Altman, and Guang R. Gao. Minimizing register requirements under resource-constrained rate-optimal software pipelining. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 85-94, San Jose, California, November-December 1994.
  88.  

  89. R. Govindarajan, Erik R. Altman, and Guang R. Gao. A framework for resource-constrained rate optimal software pipelining. In Proceedings of the Third Joint International Conference on Vector and Parallel Processing (CONPAR 94 - VAPP VI), number 854 in Lecture Notes in Computer Science, pages 640-651, Linz, Austria, September 1994. Springer-Verlag.
  90.  

  91. R. Govindarajan, Guang R. Gao, and Palash Desai. Minimizing memory requirements in rate optimal schedules. In Proceedings of the 1994 International Conference on Application Specific Array Processors, pages 75-86, San Francisco, California, August 1994. IEEE Computer Society.
  92.  

  93. S. S. Nemawarkar, R. Govindarajan, G. R. Gao, and V. K. Agarwal. Performance of interconnection network in multithreaded architectures. In Proceedings of PARLE '94 - Parallel Architectures and Languages Europe, number 817 in Lecture Notes in Computer Science, pages 823-826, Athens, Greece, July 1994. Springer-Verlag.
  94.  

  95. V. Van Dongen, C. Bonello, and Guang R. Gao. Data parallelism with High Performance C. In Proceedings of Supercomputing Symosium '94, Canada's Eighth Annual High Performance Computing Conference, pp 128-135, Toronto, Ontario, June 1994. University of Toronto.
  96.  

  97. Herbert H. J. Hum, Kevin B. Theobald, and Guang R. Gao. Building multithreaded architectures with off-the-shelf microprocessors. In Proceedings of the 8th International Parallel Processing Symposium, pp 288-294, Cancun, Mexico, April 1994. IEEE Computer Society.
  98.  

  99. G. Liao, E.R. Altman, V.K. Agarwal, and Guang R. Gao. A comparative study of DSP multiprocessor list sheduling heuristics. In Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Kihei, Hawaii, 1994.
  100.  

  101. S. S. Nemawarkar, R. Govindarajan, Guang R. Gao, and V. K. Agarwal. Analysis of multithreaded multiprocessors with distributed shared memory. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, pp 114-121, Dallas, Texas, December 1993.
  102.  

  103. R. Govindarajan and Guang R. Gao. A novel framework for multi-rate sheduling in DSP applications. In Proceedings of the 1993 International Conference on Application Specific Array Processors, pp 77-88, Venice, Italy, October 1993. IEEE Computer Society.
  104.  

  105. Guang R. Gao, Vivek Sarkar, and Lelia A. Vazquez. Beyond the data parallel paradigm: Issues and options. In W.K. Giloi, S. Jahnichen, and B.D. Shriver, Editors Proceedings - 1993 Programming Models for Massively Parallel Computers, pp 191-197, Berlin, Germany, September 20-23, 1993. IEEE Computer Society Press.

  107. Guang R. Gao, Qi Ning, and Vincent Van Dongen. Extending software pipelining techniques for scheduling nested loops. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, number 768 in Lecture Notes in Computer Science, pp 340-357, Portland, Oregon, August 1993. Springer-Verlag.

  109. Erik R. Altman, Vinod K. Agarwal, and Guang R. Gao. A novel methodology using genetic algorithms for the design of caches and cache replacement policy. In Stephanie Forrest, editor, Proceedings of the 5th International Conference on Genetic Algorithms, pp 392-399. Morgan Kaufmann Publishers, Inc., July 1993. University of Illinois at Urbana-Champaign.

  111. Kevin B. Theobald, Guang R. Gao, and Laurie J. Hendren. Speculative execution and branch prediction on parallel machines. In Conference Proceedings, 1993 ACM International Conference on Supercomputing, pp 77-86, Tokyo, Japan, July 1993.

  113. Robert Kim Yates and Guang R. Gao. A Kahn principle for networks of nonmonotonic real-time processes. In Proceedings of PARLE '93 - Parallel Architectures and Languages Europe, number 694 in Lecture Notes in Computer Science, pp 209-227, Munich, Germany, June 1993. Springer-Verlag.

  115. Herbert H. J. Hum and Guang R. Gao. Supporting a dynamic SPMD model in a multi-threaded architecture. In Digest of Papers, 38th IEEE Computer Society International Conference, COMPCON Spring '93, pp 165-174, San Francisco, California, February 1993.

  117. Qi Ning and Guang R. Gao. A novel framework of register allocation for software pipelining. In Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp 29-42, Charleston, South Carolina, January 1993.

  119. Kevin B. Theobald, Guang R. Gao, and Laurie J. Hendren. On the limits of program parallelism and its smoothability. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pp 10-19, Portland, Oregon, December 1992.

  121. V. Van Dongen, Guang R. Gao, and Q. Ning. A polynomial time method for optimal software pipelining. In Proceedings of the Conference on Vector and Parallel Processing, CONPAR-92, number 634 in Lecture Notes in Computer Science, pp 613-624, Lyon, France, September 1-4, 1992. Springer-Verlag.

  123. J. M. Monti and Guang R. Gao. Efficient interprocessor synchronization and communication on a dataflow multiprocessor architecture. In Proceedings of the 1992 International Conference on Parallel Processing, pp I-220-224, St. Charles, IL, August 1992.

  125. Guang R. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, number 757 in Lecture Notes in Computer Science, pp 281-295, New Haven, Connecticut, August 1992. Springer-Verlag.

  127. L. Hendren, C. Donawa, M. Emami, Guang R. Gao, Justiani, and B. Sridharan. Designing the McCAT compiler based on a family of structured intermediate representations. In Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, number 757 in Lecture Notes in Computer Science, pp 406-420, New Haven, Connecticut, August 1992. Springer-Verlag.

 

Monographs, Books and Book Chapters

 

  1. G. R. Gao, J.-L. Gaudiot, and L. Bic, editors. Advanced Topics in Dataflow and Multithreaded Computers. IEEE Computer Society Press, 1995.

  3. Jack B. Dennis and Guang R. Gao. Multithreaded architectures: Principles, projects, and issues. In Robert A. Iannucci, Guang R. Gao, Robert H. Halstead, Jr., and Burton Smith, editors, Multithreaded Computer Architecture: A Summary of the State of the Art, chapter 1, pages 1-72. Kluwer Academic Publishers, Norwell, Massachusetts, 1994.

  5. Robert A. Iannucci, Guang R. Gao, Robert H. Halstead, Jr., and Burton Smith, editors. Multi-threaded Computer Architecture: A Summary of the State of the Art. Kluwer Academic Publishers, Norwell, Massachusetts, 1994. Book contains papers presented at the Workshop on Multithreaded Computers, Albuquerque, New Mexico, November 1991.

  7. G. R. Gao. A Code Mapping Scheme for Dataflow Software Pipelining. Kluwer Academic Publishers, Boston, Massachusetts, December 1990.

 

 

 

 

 

B.3 Research Significance

 

The theme of my research in computer architecture and systems, compiler technology, and memory models not only enriches the field of parallel computing and encompasses a host of new techniques for high-performance architectures and compiling technology, but also provides a new horizon for mapping applications, both regular and irregular, onto these architectures. Furthermore, the research activities are not only intellectually stimulating, interesting, and competitive in themselves, but also expose students to a dynamic new field with excellent prospects for employment and a productive career.

 

My work on the EARTH model and architecture has important relevance to the design and development of future generations of parallel computer architectures. The research results have been published widely in a range of recognized international professional conferences and journals. The work has attracted a considerable level of research support from NSF through four research grants, encompassing architecture and memory support, the efficient implementation of multithreaded execution models on parallel systems based on clusters of SMP workstations, the application of the EARTH model to large irregular applications such as the crack propagation problem, and compilation technology for multithreading. It has also attracted industry interest and funding, such as the DRP grant we received with support from ACORN Inc. An extension of our work on fine-grain multithreading and EARTH to high-end supercomputing has become an important component of the HTMT project, one of the nation's few on-going petaflops architecture projects, funded by DARPA, NSA, and NASA.

 

My work on modulo scheduling and software pipelining also has immediate relevance to the computer industry in its effort to exploit high performance through instruction-level parallelism.

The research results have been published widely in a range of recognized international professional conferences and journals. The technology developed in our group has been used in the evaluation of the software pipelining techniques in the SGI production compiler, and, to foster future collaboration, we have received the donation of two SGI workstations with special SGI software.

 

The co-scheduling technique has been funded by NSF through a research grant. The co-scheduling technology developed by my colleagues and me has also attracted strong industry attention: Rockwell Semiconductor Systems has already committed funding to this research, and a DRP grant on retargetable compilers for DSP architectures, with Rockwell funding and university matching, has just been awarded.

 

The significance and novelty of my work on program analysis and memory models have also been recognized by the research community. Three papers from the work on program analysis have been accepted for publication in the most prestigious journal in the field, the ACM Transactions on Programming Languages and Systems.

 

 

B.4 Research Support

 

Agency          Grant Number   Title                                                                     Amount     Period          Status
NSF             CCR 9808522    Compiling Irregular Applications on a Multithreaded Architecture          $319,156   08/97 - 07/00   co-PI
NSF             MIPS 9707125   A New Generation Multithreaded Processors                                 $400,000   07/97 - 06/00   co-I
NSF             CDA 9703088    Parallel and Distributed Computing: Systems and Applications Development  $633,513   07/97 - 06/02   co-PI
                               (Infrastructure Grant)
DARPA/NSA/NASA  ASC 9612105    Hybrid Technology Multithreaded Architecture for Petaflops                $800,000   06/97 - 05/99   co-I
NSF             CCR 971147     A Framework of Modulo Scheduling Based on Finite Automaton                $139,263   06/97 - 05/99   PI
                               (with REU)                                                                $6,250     06/97 - 05/98
DRP             (approved)     Retargetable Compilers for Embedded DSP Processors                        $75,000    98 - 00         PI
                               (with Rockwell Semiconductor Systems Inc.)

 

Section C: Services

 

C.1 University Activities and Services

 

 

 

 

 

 

C.2 Professional Services

 

 

  • IEEE, Senior Member (since 1997)

 

  • Program Committee Members of Recognized International Conferences

 

 

 

 

  • Other Activities in Recognized Professional Conferences

 

I have served as a workshop chair, a session chair, and an organizing or steering committee member of many international conferences.

 

 

 

 

I have given seminars at many industrial and academic organizations: IBM T.J. Watson Research Center, IBM Toronto Lab, AT&T Bell Laboratories, BNR, HP Labs, SGI, DEC, NRL (Naval Research Laboratory), MIT, Stanford, UC Berkeley, and the University of Victoria, to name just a few.