Tutorials

The International Conference on High Performance Computing & Simulation (HPCS 2012)

http://hpcs2012.cisedu.info or http://cisedu.us/rp/hpcs12

July 2 – July 6, 2012

Madrid, Spain

In Cooperation with ACM, IEEE, and IFIP

HPCS 2012 TUTORIALS

T1: Programming the GPU with CUDA (4.0 Hours)

Manuel Ujaldon, Department of Computer Architecture, School of Computer Engineering

University of Malaga, Spain

T2: Distributed Virtual Environments: From Client Server to P2P Architectures (2.0 Hours)

Laura Ricci, Department of Computer Science

University of Pisa, Italy

T3: High Performance Computing in Biomedical Informatics (3.5 Hours)

Hesham Ali, College of Information Science and Technology,

University of Nebraska at Omaha, USA

T4: How to correctly deal with pseudorandom numbers in manycore environments? Application to GPU programming with Shoverand (2.0 Hours)

Jonathan Passerat-Palmbach and David R.C. Hill

ISIMA/LIMOS -- Blaise Pascal University, Clermont-Ferrand, France

T5: Prototype of Grid Environment for Earth System Models (1.5 Hours)

Italo Epicoco

University of Salento & Euro-Mediterranean Center for Climate Change (CMCC), Italy

T6: Rare event simulation: the RESTART method (2.0 Hours)

Manuel Villen-Altamirano

Technical University of Madrid, Madrid, Spain

T1: Programming the GPU with CUDA (4.0 Hours)

Manuel Ujaldon, Department of Computer Architecture, School of Computer Engineering

University of Malaga, Spain

BRIEF TUTORIAL DESCRIPTION

This tutorial gives a comprehensive introduction to programming the GPU architecture using the Compute Unified Device Architecture (CUDA). CUDA is an architecture and software paradigm designed for programming a GPU many-core architecture through SIMD extensions of the C language, and it is available to Windows, Linux and MacOS users. A compiler generates executable code for the GPU, which the CPU sees as a co-processor/accelerator. Since its inception in late 2006, CUDA has achieved extraordinary speed-up factors in a wide range of grand-challenge applications and has continuously grown in popularity within the High Performance Computing community; it is now taught in more than 500 universities worldwide. CUDA also shares the GPU computing arena with two competing interfaces: OpenCL, championed by the Khronos Group, and DirectCompute, led by Microsoft. Third-party wrappers are also available for Python, Perl, Java, Fortran, Ruby, Lua, Haskell, MATLAB and IDL.

The tutorial is organized into two parts. First, we describe the CUDA architecture through its hardware generations, up to the Fermi models. Second, we illustrate how to program applications on those resources, transforming typical sequential CPU programs into parallel codes. We emphasize the CUDA thread hierarchy, structured into blocks, grids and kernels, and the CUDA memory hierarchy, decomposed into caches, texture, constant and shared memory, plus a large register file. For the programmer, the CUDA model is a collection of threads running in parallel that can access any memory location, but, as expected, performance improves when threads use memory that is closer to them and/or read it collectively in groups. Illustrative examples will be used to discuss fundamental building blocks in CUDA, programming tricks, memory optimizations and performance issues. The tutorial concludes with a high-level comparison of the CUDA model with other GPU programming models.
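
To give a flavor of the programming model described above, the following minimal sketch (an illustration of ours, not taken from the tutorial materials) shows a SAXPY kernel launched over a grid of thread blocks; the array size and launch configuration are illustrative choices.

    // saxpy.cu -- each thread of the grid processes one array element.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        // Position of this thread inside the grid of blocks.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    int main()
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

        // The GPU is a co-processor with its own memory: allocate and copy.
        float *dx, *dy;
        cudaMalloc(&dx, bytes);
        cudaMalloc(&dy, bytes);
        cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

        // Launch configuration: a 1D grid of 1D blocks covering n elements.
        const int threadsPerBlock = 256;
        const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
        saxpy<<<blocksPerGrid, threadsPerBlock>>>(n, 3.0f, dx, dy);

        cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
        printf("y[0] = %f (expected 5.0)\n", hy[0]);

        cudaFree(dx); cudaFree(dy); free(hx); free(hy);
        return 0;
    }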

T2: Distributed Virtual Environments: From Client Server to P2P Architectures (2.0 Hours)

Laura Ricci, Department of Computer Science

University of Pisa, Italy

BRIEF TUTORIAL DESCRIPTION

Currently, most Distributed Virtual Environments (DVEs) rely on a centralized architecture, which supports straightforward management of the DVE's main functionalities, such as user login, state management, synchronization between players and billing. However, as the number of simultaneous users keeps growing, centralized architectures reveal their scalability and applicability limits.

To overcome these limitations, server clusters have to be bought and operated to withstand service peaks, while balancing computational and electrical power constraints. However, a cluster-based centralized architecture concentrates all communication bandwidth at one data center, requiring the static provisioning of a large bandwidth capability. Furthermore, such large static provisioning leaves DVE operators with unused resources whenever the load on the platform is not at its peak.

Cloud Computing makes it possible to solve the aforementioned scalability and hardware-ownership problems thanks to on-demand resource provisioning. On the other hand, much research effort has been devoted to designing pure P2P-based infrastructures for DVEs. These are inherently scalable and may relieve the load on centralized servers by exploiting the capacity of the peers. Both Cloud-based and P2P-based approaches have their drawbacks, so there are incentives to combine on-demand (Cloud) computing and Peer-to-Peer (P2P) infrastructures, and some recent proposals combine the two paradigms to define scalable support for Distributed Virtual Environments. The tutorial will introduce the main problems in defining scalable support for DVEs, present the main architectures proposed in recent years, and discuss some recent approaches that combine the P2P and cloud paradigms.

T3: High Performance Computing in Biomedical Informatics (3.5 Hours)

Hesham Ali, College of Information Science and Technology,

University of Nebraska at Omaha, USA

BRIEF TUTORIAL DESCRIPTION

The last decade has witnessed significant developments in various aspects of Biomedical Informatics, including Bioinformatics, Medical Informatics, Public Health Informatics, and Biomedical Imaging. The explosion of biomedical data requires an associated increase in the scale and sophistication of automated systems and intelligent tools if researchers are to take full advantage of the available databases. The availability of vast amounts of data continues to represent unlimited opportunities as well as great challenges in biomedical research. Developing innovative data mining techniques, and clever parallel computational methods to implement them, will surely play an important role in efficiently extracting useful knowledge from the raw data currently available. The proper integration of carefully developed algorithms with the efficient utilization of High Performance Computing (HPC) systems forms the key ingredient in the process of reaching new discoveries from biological data. This tutorial addresses several key issues related to the effective utilization of HPC in biomedical research. We focus on using HPC to efficiently solve key problems in Biomedical Informatics, particularly using network models for the analysis of massive biological data under the systems biology approach. Various models for implementing data analysis algorithms will be introduced and compared using several biological datasets. The tutorial also addresses energy awareness in the context of Biomedical Informatics: processing and analyzing such massive data consumes considerable computational resources and, with them, significant amounts of energy. We present dynamic models for conserving energy, with minimal impact on performance, when scheduling computationally intensive biomedical applications.

T4: How to correctly deal with pseudorandom numbers in manycore environments? Application to GPU programming with Shoverand (2.0 Hours)

Jonathan Passerat-Palmbach and David R.C. Hill

ISIMA/LIMOS -- Blaise Pascal University, Clermont-Ferrand, France

BRIEF TUTORIAL DESCRIPTION

This tutorial will present, from the practitioner's point of view, the need for sound partitioning techniques for stochastic streams in the domain of stochastic simulation (L’Ecuyer 2010) (Hill 2010). For more than five years, the High Performance Computing domain has seen increasing use of multi-core and manycore processor architectures. This tutorial will help modelers who are not specialists in parallelizing stochastic simulations to distribute their experimental plans and replications rigorously, according to the state of the art in pseudorandom number partitioning techniques.

We intend to inform attendees about good practices when dealing with pseudorandom streams in parallel. Such considerations are often difficult to take into account without a minimum of knowledge on the subject. Attendees will be introduced to our Shoverand framework, which enables the safe use of pseudorandom number facilities on GPU hardware. At the end of the tutorial, they should know how to use Shoverand to feed their GPU-enabled simulations with random streams, and how to integrate their own RNG developments into Shoverand.
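
As a taste of the kind of discipline the tutorial advocates, the sketch below (an illustration of ours, using NVIDIA's standard cuRAND device API rather than the Shoverand interface itself) gives every GPU thread its own non-overlapping pseudorandom substream: the same seed is combined with a distinct per-thread sequence number, so the numbers drawn by different threads come from statistically independent streams.

    // rng_streams.cu -- one independent pseudorandom substream per GPU thread.
    #include <cstdio>
    #include <cuda_runtime.h>
    #include <curand_kernel.h>

    __global__ void init_states(curandState *states, unsigned long long seed)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        // Same seed, distinct sequence number: cuRAND guarantees that the
        // resulting substreams do not overlap.
        curand_init(seed, tid, 0, &states[tid]);
    }

    __global__ void mean_of_uniforms(curandState *states, float *out, int draws)
    {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        curandState local = states[tid];      // work on a register copy
        float sum = 0.0f;
        for (int k = 0; k < draws; ++k)
            sum += curand_uniform(&local);    // uniform numbers in (0, 1]
        out[tid] = sum / draws;
        states[tid] = local;                  // save the state for later kernels
    }

    int main()
    {
        const int blocks = 64, threads = 256, nThreads = blocks * threads;
        curandState *states;
        float *dOut, hOut[4];
        cudaMalloc(&states, nThreads * sizeof(curandState));
        cudaMalloc(&dOut, nThreads * sizeof(float));

        init_states<<<blocks, threads>>>(states, 1234ULL);
        mean_of_uniforms<<<blocks, threads>>>(states, dOut, 10000);

        cudaMemcpy(hOut, dOut, 4 * sizeof(float), cudaMemcpyDeviceToHost);
        printf("per-thread means: %f %f %f %f (each should be close to 0.5)\n",
               hOut[0], hOut[1], hOut[2], hOut[3]);

        cudaFree(states);
        cudaFree(dOut);
        return 0;
    }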

T5: Prototype of Grid Environment for Earth System Models (1.5 Hours)

Italo Epicoco

University of Salento & Euro-Mediterranean Center for Climate Change (CMCC), Italy

BRIEF TUTORIAL DESCRIPTION

Users as well as developers of Earth System Models rely on an infrastructure consisting of high-end computing, data storage and network resources to perform complex and demanding simulations. We have designed and developed a grid prototype for testing a distributed environment for running ensemble experiments. It supports job submission and monitoring for ensemble climate experiments. Moreover, job status information has been enriched with details about the experiment, so that the progress of each ensemble member, the resources involved, etc., are known at run time.

The prototype has been tested on a case study related to a global coupled ocean-atmosphere general circulation model (AOGCM) and deployed across three HPC sites: CMCC in Italy, BSC in Spain and DKRZ in Germany. The grid portal is one of the services provided by v.E.R.C.: https://verc.enes.org/computing/job-submission-and-monitoring-portal.

During the tutorial, the grid portal and the related technologies will be described, and a user-friendly tool for executing and monitoring climate change ensemble experiments will be presented. Considering existing grid infrastructures and services, the tutorial will show how the European HPC ecosystem can be exploited efficiently for real computational science experiments. Hence, a tutorial on this environment is relevant for HPCS 2012 attendees, in particular climate end-users and people skilled in HPC and grid computing.

T6: Rare event simulation: the RESTART method (2.0 Hours)

Manuel Villen-Altamirano

Technical University of Madrid, Madrid, Spain

BRIEF TUTORIAL DESCRIPTION

Performance requirements of broadband communication networks and ultra-reliable systems are often expressed in terms of events with very low probability, of the order of 10^-10, such as packet losses or system failures. Simulation is an effective means for their evaluation, but the computational time may be prohibitive if acceleration methods are not used.

One such acceleration method for rare event simulation is importance sampling. The basic idea behind this approach is to alter the probability measure governing events so that the formerly rare event occurs more often, but the difficulty of selecting an appropriate change of measure makes it unsuitable for complex systems. Another method is RESTART, introduced by the instructor in 1991, which is more widely applicable. This method is based on performing a number of simulation retrials when the process enters regions of the state space where the chance of occurrence of the rare event is higher. These regions, called importance regions, are defined by comparing the value taken by a function of the system state, the importance function, with certain thresholds. The key point in applying this method is the choice of a suitable importance function.
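
To fix ideas, the host-side C sketch below (an illustration of ours under simplifying assumptions, not the instructor's implementation) applies a single-threshold splitting scheme in the spirit of RESTART to a toy problem: estimating the probability that a random walk with downward drift, started at 1, reaches level L before returning to 0. The walk's position serves as the importance function; crossing the threshold T triggers R retrials, each carrying weight 1/R. The full RESTART method uses several thresholds and cancels retrials when they fall back below the threshold that generated them, which is omitted here for brevity.

    /* restart_sketch.c -- single-threshold splitting for a rare level-crossing */
    #include <stdio.h>
    #include <stdlib.h>

    #define P_UP 0.4   /* probability of a +1 step (so the drift is downwards) */
    #define L    20    /* rare-event level                                     */
    #define T    10    /* threshold bounding the importance region             */
    #define R    8     /* number of retrials triggered by an upcrossing of T   */

    /* Simulate one trajectory from position x with the given weight.
       "may_split" tells whether this trajectory may still spawn retrials.
       Returns the weighted contribution of the trajectory (and its retrials)
       to the rare-event probability estimate. */
    static double walk(int x, double weight, int may_split)
    {
        while (x > 0) {
            x += (rand() / (double)RAND_MAX < P_UP) ? 1 : -1;
            if (x >= L)
                return weight;                  /* rare event reached */
            if (may_split && x == T) {
                /* Entering the importance region: launch R retrials from
                   this state, each with weight/R; retrials do not split. */
                double acc = 0.0;
                for (int r = 0; r < R; ++r)
                    acc += walk(x, weight / R, 0);
                return acc;
            }
        }
        return 0.0;                             /* absorbed at 0 first */
    }

    int main(void)
    {
        const int N = 100000;                   /* trajectories at the main level */
        double estimate = 0.0;
        for (int i = 0; i < N; ++i)
            estimate += walk(1, 1.0, 1);
        printf("estimated P(reach %d before 0) = %g\n", L, estimate / N);
        return 0;
    }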

In the tutorial, the method will be described and its efficiency analyzed, showing formulas for the acceleration gain obtained and for the parameter values, i.e., the threshold values and the number of retrials, that maximize the gain. Guidelines will be provided for the choice of a suitable importance function, and several application examples will be presented.