Paper Session I (10:00am ~ 12:00pm)
[10:00am ~ 10:20am]
Title: Code Size reduction by difference classification and customized look-up table generation [Slides]
Authors: Subrato K. De and Kun Zhang and Tony Linthicum
Abstract: Code size has become an important constraint for applications on mobile devices. Not only should applications be responsive and execute fast, they should also consume little power and be compact enough to fit in the limited memory of mobile devices. In this paper we focus on a specific code size optimization opportunity found in software for mobile devices, such as wireless networking protocols and modems. These applications have a large number of code regions across the control-flow graph that are inherently similar, but with slight differences that are compile-time constants.
Examples of such similar code regions are: case statements labeled by "jump tables", tail regions merging to a common point, cascaded sequences of similar code regions with a unique predecessor-successor relationship, etc. This paper presents a novel approach in which the seemingly similar code regions across the control-flow graph are classified into disjoint sets based on their differences, and then replaced by a single representative code region with the differences encoded in customized look-up tables, resulting in reduced code size.
The algorithm uses heuristics to determine the profitability of performing the transformation. In particular, the methodology shows a remarkable reduction in code size for large switch-case constructs that have very similar case statements, which are traditionally implemented using jump tables by most compilers.
The implementation is done in the Open64 compiler, although it could be integrated into any other compiler (e.g., GCC). We also provide some code size reduction results achieved by the algorithm on some functions in software applications for mobile devices: 30% to 80% for pure text (instructions only) and 5% to 60% for total text (instructions and read-only data).
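The idea of replacing similar case regions with one representative region plus a look-up table can be illustrated at the source level. The sketch below is only a hand-written analogy: the paper's transformation works inside the compiler, and every name here (decode_switch, decode_table, diff_table, self_check) as well as the constants are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: every case has the same structure and differs only in
   two compile-time constants (a scale factor and an offset). */
int decode_switch(unsigned msg_type, int payload)
{
    switch (msg_type) {
    case 0:  return payload * 3 + 17;
    case 1:  return payload * 5 + 42;
    case 2:  return payload * 7 + 99;
    case 3:  return payload * 9 + 128;
    default: return -1;
    }
}

/* Transformed shape: one representative region, with the per-case
   differences moved into a read-only table (hypothetical layout). */
struct diff_entry { int scale; int offset; };

static const struct diff_entry diff_table[] = {
    {3, 17}, {5, 42}, {7, 99}, {9, 128},
};

int decode_table(unsigned msg_type, int payload)
{
    size_t count = sizeof diff_table / sizeof diff_table[0];
    if (msg_type >= count)
        return -1;                      /* same result as the default case */
    return payload * diff_table[msg_type].scale + diff_table[msg_type].offset;
}

/* Checks that both versions agree on a small range of inputs,
   including message types that fall through to the default. */
void self_check(void)
{
    for (unsigned t = 0; t < 6; t++)
        for (int p = -4; p <= 4; p++)
            assert(decode_switch(t, p) == decode_table(t, p));
}
```

The table version trades instructions for read-only data, which is why the abstract reports separate figures for pure text and total text; whether the trade pays off for a given set of regions is what the profitability heuristics would have to decide.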
[10:20am ~ 10:40am]
[10:40am ~ 11:00am]
Title: A Demand-driven Alias Analysis Implementation Based on Open64 [Slides]
Authors: Xiaomi An
Abstract: In this paper, an implementation of a demand-driven alias analysis in Open64 is presented. In the algorithm, a program expression graph is constructed from all the expressions and assignments in the program, and the memory alias problem is formulated as a CFL-reachability problem. To deal with field accesses of structs, which are common in multimedia applications, a field-sensitive extension of the original algorithm is also implemented. Currently, the field sensitivity assumes ANSI-compliant programs. By evaluating the implementation on some SPEC2000 programs, we found that the original algorithm does not scale well enough to give sufficient precision within a reasonable compile time. To improve scalability, a new one-level flow demand-driven algorithm is developed, in which the hierarchical state machine is simplified by making machine M transitive while keeping machine V non-transitive. The algorithm can achieve the same precision as Das’s algorithm, but uses a demand-driven approach. Although the new one-level flow algorithm loses some precision by making machine M transitive, it can finish the analysis of more queries within a given reasonable compile time and give more definite alias results. In other words, in practice it can give better precision than the original algorithm and is more practical for production compilers. Experimental results show that the algorithm can finish about half of the queries in a short compile time and gives many more “not aliased” results than the original algorithm.
[11:00am ~ 11:20am]
Title: Extending Open64 with Transactional Memory Features [Slides]
Authors: Jiaqi Zhang and Wenguang Chen and Weimin Zheng
Abstract: The fast development of parallel platforms is demanding more parallelism in modern applications. However, the handling of mutually exclusive memory accesses stands in the way of high productivity in parallel software development for shared-memory systems. Transactional Memory (TM) is a promising paradigm that helps abstract the complexity of concurrency while preserving scalability. Compilers for TM are needed to support both parallel programmers and TM researchers. This paper describes the design, detailed implementation, and optimization of our extension of TM features in Open64. The preliminary experimental results show that our optimized implementation of the compiler, together with a high-performance TM runtime, has the potential to be competitive with parallel programs that use fine-grained locks.
[11:20am ~ 11:40am]
Title: Aggressive Program Analysis Framework for Static Error Checking in Open64 [Slides]
Authors: Hong-Tao Yu and Wei Huo and Zhao-Qing Zhang and Xiao-Bing Feng
Abstract: Error checking is becoming more and more significant for constructing highly reliable software. In this paper, we introduce our work on integrating static error checking into Open64. We aim to construct an aggressive program analysis framework for error checking in the compiler. We integrate the intraprocedural analysis into the interprocedural phase in order to do flow- and context-sensitive whole-program analysis. The precision of alias information can heavily impact many subsequent analyses and error checks. We have also improved the original alias analysis to be field-sensitive, so that field members of the same structural object can be distinguished in the resulting points-to graph.
[11:40am ~ 12:00pm]
Title: A SoC simulator, the newest component in Open64 Report and Experience in Design and Development of a baseband SoC [Slides]
Authors: Wendong Wang, Tony Tuo, Kevin Lo, Dongchen Ren, Gary Hau, Jun Zhang, Dong Huang
Abstract: There are many challenges in designing and developing complex SoCs such as wireless baseband processors: time-to-market requirements, flexibility to support multiple standards, power consumption, and cost. Hardware/software co-design is usually used to address these design problems. A good simulation environment is necessary for performance evaluation and correctness validation in a hardware/software co-design approach. In this paper we report on a SoC simulation infrastructure that is used for hardware/software co-design and is flexible to retarget. An experiment in designing a baseband processor based on the simulation environment is also presented. It shows that design time is clearly reduced when the proposed co-design approach is used.
Paper Session II (1:30pm ~ 2:30pm)
[1:30pm ~ 1:50pm]
Title: Opencj: A research Java static compiler based on Open64 [Slides]
Authors: Keqiao Yang and Zhemin Yang and Zhiwei Cao and Zeng Huang and Di Wang and Min Yang and Binyu Zang
Abstract: As Java becomes more pervasive in the programming landscape, even in HPC applications, it is very important to provide optimizing compilers and more efficient runtime systems. To this end, we try to leverage the synergy between static and dynamic optimization to exploit more optimization opportunities and improve Java runtime performance, especially for server applications. This paper presents our first achievement: a Java static compiler, Opencj, which can perform full optimization of Java applications. Opencj is based on Open64, with some optimizations added for Java applications. Efficient support for exception handling and virtual method call resolution meets the demands imposed by the dynamic features of the Java programming language. Opencj takes Java source files or class files as input and generates machine-dependent executable code for Linux/IA32. It enables us to evaluate the performance gap between Java and C/C++ programs. So far, the evaluation shows similar peak performance on the scientific SciMark 2.0 benchmark suite for Java and C compiled by Opencj and Open64, respectively. The evaluation also shows that Opencj performs better than GCJ on the SPECjvm98 benchmark suite.
[1:50pm ~ 2:10pm]
Title: OpenMP 3.0 Tasking Implementation in OpenUH [Slides]
Authors: Cody Addison and James LaGrone and Lei Huang and Barbara Chapman
Abstract: As multicore technology dominates the processor market, new methodologies are being explored to exploit the parallelism inherent in these architectures, and shared memory programming models are gaining in popularity. The ratification of the OpenMP 3.0 API has provided compiler developers with another challenge as the multicore revolution reshapes the landscape in scientific computing. The introduction of explicit tasking in this latest revision of the de facto standard for shared memory programming brings new capabilities for parallel programming. Tasking in OpenMP allows irregular applications with pointer-based data and recursive algorithms to be executed in parallel, and provides alternative parallelization techniques for traditional loop-centric codes. This paper outlines the implementation of OpenMP 3.0 tasking features in OpenUH, a branch of the Open64 compiler suite.
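As an illustration of the explicit tasking described above, the following minimal sketch parallelizes a recursive function with the standard OpenMP 3.0 task and taskwait constructs. It is not taken from the paper and says nothing about how OpenUH implements tasks; the function names fib_task and fib_parallel are hypothetical, and the naive Fibonacci recursion is chosen only because it is short.

```c
#include <assert.h>

/* Each recursive call is spawned as an explicit task. The locals x and y
   must be shared so that the child tasks can write their results back. */
long fib_task(int n)
{
    long x, y;

    if (n < 2)
        return n;

    #pragma omp task shared(x) firstprivate(n)
    x = fib_task(n - 1);

    #pragma omp task shared(y) firstprivate(n)
    y = fib_task(n - 2);

    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return x + y;
}

/* A parallel region creates the thread team; the single construct lets
   one thread generate the root task while the others execute tasks. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

With an OpenMP-enabled compiler (for example, the -fopenmp flag in GCC) the calls run as tasks; without OpenMP support the pragmas are ignored and the code runs sequentially with the same result. A practical version would usually stop creating tasks below some problem size, since tiny tasks cost more to schedule than to execute.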
[2:10pm ~ 2:30pm]
Title: Open64 on MIPS: porting and enhancing Open64 for Loongson II [Slides]
Authors: Zhou Shuchang and Liu Ying and Lu Fang and Yin Le and Huang Lei and Li Shuai and Ma Chunhui and Gao Zhitao and Lian Ruiqi
Abstract: Loongson II is a MIPS III-compatible platform with various modern features, including a state-of-the-art memory subsystem. In the process of porting and enhancing Open64 on Loongson II, we observe that some optimizations that are missing or rudimentary in Open64, such as edge profiling and array contraction, prove crucial for performance. We also develop LIDO, the Locality Inspired Data Optimizer, to facilitate the optimization of data structures by source-to-source transformation. Combined with the powerful optimization framework of Open64, our work results in a 28.5% ~ 78.4% performance improvement in SPEC2000INT/FP over the latest GCC on Loongson 2F.
Short Paper Session (2:30pm ~ 3:15pm)
[2:30pm ~ 2:45pm]
Title: A Context-Sensitive Pointer Analysis Phase in Open64 Compiler [Slides]
Authors: Tianwei Sheng and Wenguang Chen and Weimin Zheng
Abstract: The precision of pointer analysis plays an important role in compilers and other software understanding tools. Context sensitivity has proven to be an effective way to improve the precision of the final results of pointer analysis. Many context-sensitive pointer analysis methods have been proposed; however, few of them have been incorporated into production compilers. The Open64 compiler still uses a flow-insensitive, context-insensitive, and field-insensitive pointer analysis. In this paper, we design and implement a cloning-based inter-procedural context-sensitive pointer analysis in the Open64 compiler. We expect that the new pointer analysis phase will improve performance and provide a good infrastructure for any program analysis tool based on the Open64 compiler.
[2:45pm ~ 3:00pm]
Title: Retargeting Open64 to A RISC processor - a Student's Perspective [Slides]
Authors: Huimin Cui and Xiaobing Feng
Abstract: This paper presents a student's experience in developing an Open64-Mips prototype and summarizes three retargeting observations. Open64 is easy to retarget, and the procedure takes only a short time. Once the retargeting is done, the compiler can achieve good and stable performance. Open64 also provides extensive debugging support, with which a beginner can debug the compiler without difficulty. We also share some experiences from our retargeting work, including a methodology for verifying the compiler framework, attention to switches in Open64, and the importance of debugging and of reading the generated code.
[3:00pm ~ 3:15pm]
Title: A Software Pipelining Framework for Simple Processor Cores [Slides]
Authors: Juergen Ributzka and David Stephenson and Tim Kong and Dee Lee and Fred Chow and Guang R. Gao
Abstract: Current trends in many-core architectures show a switch from a small number of architecturally sophisticated cores (e.g. Intel Core2, IBM PowerPC) to many simple cores (e.g. SiCortex and Tilera multiprocessors). These simple cores lack many of the advanced features of the complex cores (e.g. out-of-order execution, rotating register files, predication, speculation, etc.), which puts an extra burden on the compiler to produce an efficient schedule, especially for loops. We provide an advanced loop scheduling framework using software pipelining, which does not rely on any special hardware support. This open-source framework makes software pipelining available to simple cores and helps to mitigate the lack of advanced hardware features. The paper also provides measurements to demonstrate the effectiveness of this framework.
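To make the term concrete, the sketch below shows software pipelining written out by hand at the source level: the load, multiply, and store of three different iterations are overlapped in one loop body, with a prologue to fill the pipeline and an epilogue to drain it. This is an assumption-laden illustration only. The framework in the paper schedules instructions inside the compiler, and the function names (scale_plain, scale_pipelined, self_check) and the three-stage split are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: each iteration does load, multiply, store in order. */
void scale_plain(const int *restrict a, int *restrict b, size_t n, int k)
{
    for (size_t i = 0; i < n; i++) {
        int v = a[i];       /* stage 1: load     */
        int r = v * k;      /* stage 2: multiply */
        b[i] = r;           /* stage 3: store    */
    }
}

/* Hand-pipelined loop: in the steady state one pass through the body
   stores iteration i-2, multiplies iteration i-1, and loads iteration i.
   The restrict qualifiers state the assumption that a and b do not overlap. */
void scale_pipelined(const int *restrict a, int *restrict b, size_t n, int k)
{
    if (n == 0)
        return;
    if (n == 1) {
        b[0] = a[0] * k;
        return;
    }

    /* Prologue: fill the pipeline with iterations 0 and 1. */
    int product = a[0] * k;     /* iteration 0 is ready to store    */
    int loaded  = a[1];         /* iteration 1 is ready to multiply */

    /* Kernel: three iterations in flight at once. */
    for (size_t i = 2; i < n; i++) {
        b[i - 2] = product;     /* store,    iteration i-2 */
        product  = loaded * k;  /* multiply, iteration i-1 */
        loaded   = a[i];        /* load,     iteration i   */
    }

    /* Epilogue: drain the last two iterations. */
    b[n - 2] = product;
    b[n - 1] = loaded * k;
}

/* Compares both versions for every length from 0 to 8. */
void self_check(void)
{
    int a[8] = {1, -2, 3, -4, 5, -6, 7, -8};
    for (size_t n = 0; n <= 8; n++) {
        int plain[8] = {0}, piped[8] = {0};
        scale_plain(a, plain, n, 3);
        scale_pipelined(a, piped, n, 3);
        for (size_t i = 0; i < 8; i++)
            assert(plain[i] == piped[i]);
    }
}
```

On an in-order core, the three statements in the kernel belong to different iterations and do not depend on each other's results within the same pass, which is what gives a scheduler room to hide load and multiply latency without special hardware support.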