List of Research Contributions

Refereed Journal Publications

X. Tang and Guang R. Gao, Automatic partitioning threads for multithreaded architectures, Special Issues on Compilation and Architectural Support for Parallel Applications, Accepted for publication, June 1999.
1. Vugranam C. Sreedhar, Guang R. Gao and Yong-Fong Lee, A New Framework for Elimination-Based Dataflow Analysis Using DJ Graphs, ACM Transactions on Programming Languages and Systems, Vol. 20, No. 2, pp. 388-433, March 1998.
2. Erik Altman, Guang R. Gao, Optimal Modulo Scheduling Through Enumeration, International Journal of Parallel Programming, Accepted for publication, 1998.
3. Erik Altman, Guang R. Gao, A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards, Journal of Parallel and Distributed Computing, No. 39, pp. 259-293, 1998.
4. Vugranam C. Sreedhar, Guang R. Gao, and Yong-Fong Lee. Incremental computation of dominator trees. ACM Transactions on Programming Languages and Systems, Vol. 19, No. 2, pp. 239-252, March 1997.
5. Vugranam C. Sreedhar, Guang R. Gao, and Yong-Fong Lee. A quadratic time algorithm for computing multiple node immediate dominators. Journal of Programming Languages, 1996. Accepted for publication.
6. R. Govindarajan, Erik R. Altman, and Guang R. Gao. A framework for resource-constrained rate-optimal software pipelining. IEEE Transactions on Parallel and Distributed Systems, pages 1133-1149, November 1996.
7. Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao, and Laurie J. Hendren. A study of the EARTH-MANNA multithreaded system. International Journal of Parallel Programming, 24(4):319-347, August 1996.
8. Vugranam Sreedhar, Guang R. Gao, and Yong-Fong Lee. Identifying loops using DJ graphs. ACM Transactions on Programming Languages and Systems, 1996. Accepted.
9. Vugranam C. Sreedhar and Guang R. Gao. A linear time algorithm for placing φ-nodes. Journal of Programming Languages, 1995.
Accepted.
10. Qi Ning, Vincent Van Dongen, and Guang R. Gao. Automatic data and computation decomposition for distributed memory machines. Parallel Processing Letters, 5(4):539-550, April 1995.
11. Vugranam Sreedhar and Guang Gao. Computing φ-nodes in linear time using DJ graphs. Journal of Programming Languages, 3:191-213, April 1995.
12. E. Arjomandi, W. O'Farrell, I. Kalas, G. Koblents, F. Ch. Eigler, and G. R. Gao. ABC++: Concurrency by inheritance in C++. IBM Systems Journal, 34(1):120-137, 1995.
13. R. Govindarajan and Guang R. Gao. Rate-optimal schedule for multi-rate DSP computations. Journal of VLSI Signal Processing, 9(3):211-232, April 1995.
14. G. R. Gao. An efficient hybrid dataflow architecture model. Journal of Parallel and Distributed Computing, 19(4):293-307, December 1993.
15. Laurie J. Hendren, Guang R. Gao, Erik R. Altman, and Chandrika Mukerji. A register allocation framework based on hierarchical cyclic interval graphs. The Journal of Programming Languages, 1(3):155-185, 1993.
16. Qi Ning and Guang R. Gao. Optimal loop storage allocation for argument-fetching dataflow machines. International Journal of Parallel Programming, 21(6):421-448, December 1992.
17. H. H. J. Hum and G. R. Gao. A high-speed memory organization for hybrid dataflow/von Neumann computing. Future Generation Computer Systems, 8:287-301, 1992.
18. G. R. Gao, H. H. J. Hum, and Y-B Wong. Toward efficient fine-grain software pipelining and the limited balancing techniques. International Journal of Mini and Microcomputers, 13(2):57-68, 1991.
19. Guang R. Gao. Exploiting fine-grain parallelism on dataflow architectures. Parallel Computing, 13(3):309-320, March 1990.

Publications in Refereed Conference Proceedings (Last Six Years Only)

I have more than 80 publications in refereed conferences. Due to space limitations, only those from the last six years are listed. The rest can be provided on request.

1. G. Heber, R. Biswas, and Guang R.
Gao, Self-Avoiding Walks over Adaptive Unstructured Grids, In the Proceedings of Irregular'99 in conjunction with the International Parallel Processing Symposium (IPPS/SPDP), pp. 969-977, San Juan, Puerto Rico, April 12-16, 1999.
2. G. Heber, R. Biswas, P. Thulasiraman and Guang R. Gao, Using Multithreading for Automatic Load Balancing of Adaptive Finite Element Meshes, In the Proceedings of Irregular'99 in conjunction with the International Parallel Processing Symposium (IPPS/SPDP), pp. 969-977, San Juan, Puerto Rico, April 12-16, 1999.
3. A. Khokhar, G. Heber, Parimala Thulasiraman and Guang R. Gao, Load Adaptive Algorithms and Implementation for the 2D Discrete Wavelet Transform on Fine-Grain Multithreaded Architectures, In the Proceedings of the International Parallel Processing Symposium (IPPS/SPDP), pp. 360-364, San Juan, Puerto Rico, April 12-16, 1999.
4. G. Heber, R. Biswas, and Guang R. Gao, Self-Avoiding Walks over Adaptive Triangular Grids, In Proceedings of SIAM Parallel Processing Conference for Scientific Computing, San Antonio, Texas, April 1999.
5. Chihong Zhang, R. Govindarajan, and Guang R. Gao, Efficient State-Diagram Construction Methods for Software Pipelining, In Proceedings of the International Conference on Compiler Construction, CC'99, held as part of ETAPS'99, Amsterdam, The Netherlands, March 22-26, 1999.
6. K. Theobald, Guang R. Gao and T. Sterling, Superconducting Processors for HTMT: Issues and Challenges, In Proceedings of The Seventh Symposium on The Frontiers of Massively Parallel Computation (Frontiers'99), pp. 260-267, Annapolis, Maryland, February 21-25, 1999.
7. H. Cai, O. Maquelin, P. Kakulavarapu and Guang R. Gao, Design and Evaluation of Dynamic Load Balancing Schemes under a Fine-Grain Multithreaded Execution Model, In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation (MTEAC), in conjunction with the 1999 IEEE Symposium on High-Performance Computer Architecture (HPCA99), Orlando, Florida, January 1999.
8.
A. Marquez, K. Theobald, X. Tang and Guang R. Gao, The Superstrand Model, In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation (MTEAC), in conjunction with the 1999 IEEE Symposium on High-Performance Computer Architecture (HPCA99), Orlando, Florida, January 1999.
9. Sylvain Lelait, Guang R. Gao and Christine Eisenbeis, A New Fast Algorithm for Optimal Register Allocation in Modulo Scheduled Loops, In Kai Koskimies, editor, Proceedings of the International Conference on Compiler Construction, CC'98, held as part of ETAPS'98, volume 1383 of Lecture Notes in Computer Science, pp. 204-218, Lisbon, Portugal, March 28 - April 4, 1998. Springer.
10. R. Govindarajan, Narasimha Rao, E. R. Altman and Guang R. Gao, An Enhanced Co-Scheduling Method using Reduced MS-State Diagrams, In the Proceedings of the International Parallel Processing Symposium (IPPS/SPDP), pp. 168-175, Orlando, Florida, April 1998.
11. Maria-Dana Tarlescu, Kevin Theobald and Guang R. Gao, Elastic History Buffer: A Low Cost Method to Improve Branch Prediction Accuracy, In the Proceedings of the International Conference on Computer Design (ICCD'97), pp. 82-87, Austin, TX, Oct. 1997.
12. Raul Silvera, Jian Wang, Guang R. Gao and R. Govindarajan, A Register Pressure Sensitive Instruction Scheduler for Dynamic Issue Processors, In the Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'97), San Francisco, CA, Nov. 1997.
13. X. N. Tang, Rakesh Ghiya, Laurie Hendren, Guang R. Gao, Heap Analysis and Optimizations for Threaded Programs, In the Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'97), San Francisco, CA, Nov. 1997.
14. Xinan Tang, Guang R. Gao, How "Hard" is Thread Partitioning and How "Bad" is a List Scheduling Based Partitioning Algorithm, In Proceedings of Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, Puerto Vallarta, Mexico, pp. 130-139, June 1998.
15.
Angela Sodan, Guang R. Gao, Olivier Maquelin, Jens-Uwe Schultz, and Xin-Min Tian. Experience with non-numeric applications on multithreaded architectures. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Las Vegas, Nevada, pp. 124-135, June 1997.
16. X. N. Tang, J. Wang, K. Theobald, and Guang R. Gao. Thread Partition and Schedule Based on Cost Model. In Proceedings of the 9th Annual Symposium on Parallel Algorithms and Architectures (SPAA), Newport, Rhode Island, pp. 272-281, July 1997.
17. Shashank S. Nemawarkar and Guang R. Gao. Latency tolerance: A metric for performance analysis of multithreaded architecture. In Proceedings of the International Parallel Processing Symposium, April 1997.
18. Parimala Thulasiraman, Xin-Min Tian, and Guang R. Gao. Multithreading implementation of a distributed shortest path algorithm on EARTH multiprocessor. In Proc. of the International Conference on High Performance Computing, Trivandrum, India, pp. 336-341, December 1996.
19. Xin-Min Tian, Shashank S. Nemawarkar, Guang R. Gao, et al. Quantitative studies of data locality sensitivity on the EARTH multithreaded architecture: Preliminary results. In Proc. of the International Conference on High Performance Computing, Trivandrum, India, pp. 362-367, December 1996.
20. Guang Gao, Konstantin K. Likharev, Paul C. Messina, and Thomas L. Sterling. Hybrid technology multi-threaded architecture. In Proceedings of Frontiers '96: The Sixth Symposium on the Frontiers of Massively Parallel Computation, pages 98-105, Annapolis, Maryland, October 1996.
21. Laurie J. Hendren, Xinan Tang, Yingchun Zhu, Guang R. Gao, Xun Xue, Haiying Cai, and Pierre Ouellet. Compiling C for the EARTH multithreaded architecture. In Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT '96), pages 12-23, Boston, Massachusetts, October 1996. IEEE Computer Society Press.
22. Erik R. Altman and Guang R. Gao.
Optimal software pipelining through enumeration of schedules. In Proceedings of Euro-Par'96, pages 833-840, Lyon, France, August 1996.
23. Vivek Sarkar, Guang R. Gao, and Shaohua Han. Data locality analysis for distributed shared memory multiprocessors. In Proceedings of the Ninth Workshop on Languages and Compilers for Parallel Computing, San Jose, California, August 1996.
24. Olivier Maquelin, Guang R. Gao, Herbert H. J. Hum, Kevin B. Theobald, and Xin-Min Tian. Polling Watchdog: Combining polling and interrupts for efficient message handling. In Proceedings of the 23rd Annual International Symposium on Computer Architecture, pages 178-188, Philadelphia, Pennsylvania, May 1996.
25. John Ruttenberg, G. R. Gao, A. Stouchinin, and W. Lichtenstein. Software pipelining showdown: Optimal vs. heuristic methods in a production compiler. In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pages 1-11, Philadelphia, Pennsylvania, May 1996.
26. Vugranam C. Sreedhar, Guang R. Gao, and Yong-Fong Lee. A new framework for exhaustive and incremental data flow analysis using DJ graphs. In Proceedings of the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation, pages 278-290, Philadelphia, Pennsylvania, May 1996.
27. Jian Wang and Guang R. Gao. Pipelining-dovetailing: A transformation to enhance software pipelining for nested loops. In Proceedings of the 6th International Conference on Compiler Construction, Lecture Notes in Computer Science, Linköping, Sweden, April 1996. Springer-Verlag.
28. R. Govindarajan, Erik R. Altman, and Guang R. Gao. Instruction scheduling in the presence of structural hazards: An integer programming approach to software pipeline. In Proc. of the International Conference on High Performance Computing, Goa, India, December 1995.
29. R. Govindarajan, Erik R. Altman, and Guang R. Gao. Co-scheduling hardware and software pipelines.
In Second International Symposium on High-Performance Computer Architecture, San Jose, California, February 1996.
30. Shashank S. Nemawarkar and Guang R. Gao. Measurement and modeling of EARTH-MANNA multithreaded architecture. In Proceedings of the Fourth International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, pages 109-114, San Jose, California, February 1996. IEEE Computer Society TCCA and TCS.
31. Luis A. Lozano C. and Guang R. Gao. Exploiting short-lived variables in superscalar processors. In Proceedings of the 28th Annual International Symposium on Microarchitecture, pages 292-302, Ann Arbor, Michigan, November-December 1995.
32. J. B. Dennis and G. R. Gao. On memory models and cache management for shared-memory multi-processors. In Proceedings of Seventh IEEE International Symposium on Parallel and Distributed Processing. IEEE, October 1995.
33. Olivier C. Maquelin, Herbert H. J. Hum, and Guang R. Gao. Costs and benefits of multithreading with off-the-shelf RISC processors. In Proceedings of the First International EURO-PAR Conference, number 966 in Lecture Notes in Computer Science, pages 117-128, Stockholm, Sweden, August 1995. Springer-Verlag.
34. R. Wen, Guang R. Gao, and Vincent Van Dongen. The design and implementation of the accurate array data-flow analysis in the HPC compiler. In Proceedings of High Performance Computing Symposium '95, Canada's Ninth Annual International High Performance Computing Conference and Exhibition, pages 144-155, Montréal, Québec, July 1995. Centre de recherche informatique de Montréal.
35. Nasser Elmasri, Herbert H. J. Hum, and Guang R. Gao. The Threaded Communication Library: Preliminary experiences on a multiprocessor with dual-processor nodes. In Conference Proceedings, 1995 International Conference on Supercomputing, pages 195-199, Barcelona, Spain, July 1995.
36. Erik R. Altman, R. Govindarajan, and Guang R. Gao.
An experimental study of an ILP-based exact solution method for software pipelining. In Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science, pages 2.1-2.15, Columbus, Ohio, August 1995. Springer-Verlag.
37. Guang R. Gao and Vivek Sarkar. Location consistency: Stepping beyond the memory coherence barrier. In 24th International Conference on Parallel Processing, pages II-73-II-76, University Park, Pennsylvania, August 1995.
38. Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Xinan Tang, Guang R. Gao, Phil Cupryk, Nasser Elmasri, Laurie J. Hendren, Alberto Jimenez, Shoba Krishnan, Andres Marquez, Shamir Merali, Shashank S. Nemawarkar, Prakash Panangaden, Xun Xue, and Yingchun Zhu. A design study of the EARTH multiprocessor. In Proceedings of the IFIP WG 10.3 Working Conference on Parallel Architectures and Compilation Techniques, PACT '95, pages 59-68, Limassol, Cyprus, June 1995. ACM Press.
39. E. R. Altman, R. Govindarajan, and G. R. Gao. Scheduling and mapping: Software pipelining in the presence of structural hazards. In ACM SIGPLAN Symposium on Programming Language Design and Implementation, pages 139-150, June 1995.
40. G. Tremblay and G. R. Gao. The impact of laziness on parallelism and the limits of strictness analysis. In Proceedings of the High Performance Functional Computing Conference, pages 119-133, Denver, Colorado, April 1995. Lawrence Livermore National Laboratory. CONF-9504126.
41. Vugranam C. Sreedhar and Guang R. Gao. A linear time algorithm for placing φ-nodes. In Conference Record of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 62-73, San Francisco, California, January 1995.
42. Vugranam C. Sreedhar, Guang R. Gao, and Yong-Fong Lee. Incremental computation of dominator trees.
In Proceedings of the ACM SIGPLAN Workshop on Intermediate Representations (IR'95), pages 1-12, San Francisco, California, January 22, 1995. SIGPLAN Notices, 30(3), March 1995.
43. Kevin B. Theobald, Herbert H. J. Hum, and Guang R. Gao. A design framework for hybrid-access caches. In Proceedings of the First International Symposium on High-Performance Computer Architecture, pages 144-153, Raleigh, North Carolina, January 1995.
44. R. Govindarajan, Erik R. Altman, and Guang R. Gao. Minimizing register requirements under resource-constrained rate-optimal software pipelining. In Proceedings of the 27th Annual International Symposium on Microarchitecture, pages 85-94, San Jose, California, November-December 1994.
45. R. Govindarajan, Erik R. Altman, and Guang R. Gao. A framework for resource-constrained rate optimal software pipelining. In Proceedings of the Third Joint International Conference on Vector and Parallel Processing (CONPAR 94 - VAPP VI), number 854 in Lecture Notes in Computer Science, pages 640-651, Linz, Austria, September 1994. Springer-Verlag.
46. R. Govindarajan, Guang R. Gao, and Palash Desai. Minimizing memory requirements in rate optimal schedules. In Proceedings of the 1994 International Conference on Application Specific Array Processors, pages 75-86, San Francisco, California, August 1994. IEEE Computer Society.
47. S. S. Nemawarkar, R. Govindarajan, G. R. Gao, and V. K. Agarwal. Performance of interconnection network in multithreaded architectures. In Proceedings of PARLE '94 - Parallel Architectures and Languages Europe, number 817 in Lecture Notes in Computer Science, pages 823-826, Athens, Greece, July 1994. Springer-Verlag.
48. V. Van Dongen, C. Bonello, and Guang R. Gao. Data parallelism with High Performance C. In Proceedings of Supercomputing Symposium '94, Canada's Eighth Annual High Performance Computing Conference, pp. 128-135, Toronto, Ontario, June 1994. University of Toronto.
49. Herbert H. J. Hum, Kevin B. Theobald, and Guang R.
Gao. Building multithreaded architectures with off-the-shelf microprocessors. In Proceedings of the 8th International Parallel Processing Symposium, pp. 288-294, Cancun, Mexico, April 1994. IEEE Computer Society.
50. G. Liao, E. R. Altman, V. K. Agarwal, and Guang R. Gao. A comparative study of DSP multiprocessor list scheduling heuristics. In Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Kihei, Hawaii, 1994.
51. S. S. Nemawarkar, R. Govindarajan, Guang R. Gao, and V. K. Agarwal. Analysis of multithreaded multiprocessors with distributed shared memory. In Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, pp. 114-121, Dallas, Texas, December 1993.
52. R. Govindarajan and Guang R. Gao. A novel framework for multi-rate scheduling in DSP applications. In Proceedings of the 1993 International Conference on Application Specific Array Processors, pp. 77-88, Venice, Italy, October 1993. IEEE Computer Society.
53. Guang R. Gao, Vivek Sarkar, and Lelia A. Vazquez. Beyond the data parallel paradigm: Issues and options. In W. K. Giloi, S. Jähnichen, and B. D. Shriver, editors, Proceedings - 1993 Programming Models for Massively Parallel Computers, pp. 191-197, Berlin, Germany, September 20-23, 1993. IEEE Computer Society Press.
54. Guang R. Gao, Qi Ning, and Vincent Van Dongen. Extending software pipelining techniques for scheduling nested loops. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, number 768 in Lecture Notes in Computer Science, pp. 340-357, Portland, Oregon, August 1993. Springer-Verlag.
55. Erik R. Altman, Vinod K. Agarwal, and Guang R. Gao. A novel methodology using genetic algorithms for the design of caches and cache replacement policy. In Stephanie Forrest, editor, Proceedings of the 5th International Conference on Genetic Algorithms, pp. 392-399. Morgan Kaufmann Publishers, Inc., July 1993. University of Illinois at Urbana-Champaign.
56. Kevin B.
Theobald, Guang R. Gao, and Laurie J. Hendren. Speculative execution and branch prediction on parallel machines. In Conference Proceedings, 1993 ACM International Conference on Supercomputing, pp. 77-86, Tokyo, Japan, July 1993.
57. Robert Kim Yates and Guang R. Gao. A Kahn principle for networks of nonmonotonic real-time processes. In Proceedings of PARLE '93 - Parallel Architectures and Languages Europe, number 694 in Lecture Notes in Computer Science, pp. 209-227, Munich, Germany, June 1993. Springer-Verlag.
58. Herbert H. J. Hum and Guang R. Gao. Supporting a dynamic SPMD model in a multi-threaded architecture. In Digest of Papers, 38th IEEE Computer Society International Conference, COMPCON Spring '93, pp. 165-174, San Francisco, California, February 1993.
59. Qi Ning and Guang R. Gao. A novel framework of register allocation for software pipelining. In Conference Record of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 29-42, Charleston, South Carolina, January 1993.
60. Kevin B. Theobald, Guang R. Gao, and Laurie J. Hendren. On the limits of program parallelism and its smoothability. In Proceedings of the 25th Annual International Symposium on Microarchitecture, pp. 10-19, Portland, Oregon, December 1992.
61. V. Van Dongen, Guang R. Gao, and Q. Ning. A polynomial time method for optimal software pipelining. In Proceedings of the Conference on Vector and Parallel Processing, CONPAR-92, number 634 in Lecture Notes in Computer Science, pp. 613-624, Lyon, France, September 1-4, 1992. Springer-Verlag.
62. J. M. Monti and Guang R. Gao. Efficient interprocessor synchronization and communication on a dataflow multiprocessor architecture. In the Proceedings of 1992 International Conference on Parallel Processing, pp. I-220-224, St. Charles, IL, August 1992.
63. Guang R. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction.
In Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, number 757 in Lecture Notes in Computer Science, pp. 281-295, New Haven, Connecticut, August 1992. Springer-Verlag.
64. L. Hendren, C. Donawa, M. Emami, Guang R. Gao, Justiani, and B. Sridharan. Designing the McCAT compiler based on a family of structured intermediate representations. In Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing, number 757 in Lecture Notes in Computer Science, pp. 406-420, New Haven, Connecticut, August 1992. Springer-Verlag.

Monographs, Books and Book Chapters

1. G. R. Gao, J-L. Gaudiot, and L. Bic, editors. Advanced Topics in Dataflow and Multithreaded Computers. IEEE Computer Society Press, 1995.
2. Jack B. Dennis and Guang R. Gao. Multithreaded architectures: Principles, projects, and issues. In Robert A. Iannucci, Guang R. Gao, Robert H. Halstead, Jr., and Burton Smith, editors, Multithreaded Computer Architecture: A Summary of the State of the Art, chapter 1, pages 1-72. Kluwer Academic Publishers, Norwell, Massachusetts, 1994.
3. Robert A. Iannucci, Guang R. Gao, Robert H. Halstead, Jr., and Burton Smith, editors. Multithreaded Computer Architecture: A Summary of the State of the Art. Kluwer Academic Publishers, Norwell, Massachusetts, 1994. Book contains papers presented at the Workshop on Multithreaded Computers, Albuquerque, New Mexico, November 1991.
4. G. R. Gao. A Code Mapping Scheme for Dataflow Software Pipelining. Kluwer Academic Publishers, Boston, Massachusetts, December 1990.