Applications Exploiting e-Infrastructures Across Europe and India Within the EU-IndiaGrid Project



Introduction
In the last few years, e-Infrastructures across Europe and India have seen remarkable developments. Both national and international connectivity improved considerably, and Grid computing also profited from significant advances.
As a consequence, scientific applications were in a position to benefit substantially from this progress. The most relevant cases are High Energy Physics (with the contribution to the programme of the Large Hadron Collider at CERN, Geneva) and Nano Science (exploiting the NKN-TEIN3-GÉANT interconnection for crystallography experiments, with remote access and control of an experimental facility at the ESRF synchrotron in Grenoble, France, directly from Mumbai, India). Other relevant application areas include Climate Change research, Biology, and several areas of Material Science.
Within this framework, over the last five years two EU-funded projects (EU-IndiaGrid and EU-IndiaGrid2) played a bridging role, supporting several applications that exploited these advanced e-Infrastructures for the benefit of joint Euro-Indian research programmes. EU-IndiaGrid2 – Sustainable e-Infrastructures across Europe and India – is a project funded by the European Commission under the Research Infrastructure Programme of the Information Society Directorate General, with the specific aim of promoting international interoperation between European and Indian e-Infrastructures. The project started in January 2010 and will close at the end of 2011. EU-IndiaGrid2 builds and capitalises on the achievements of its precursor, the EU-IndiaGrid project (2006-2009), whose contribution in bringing forward EU-Indian collaboration in e-Science and effectively mobilising actors on both sides was widely recognised both in Europe and India. A crucial part of the project activity was the support offered to selected applications, ranging from training the user communities behind them to porting their scientific applications onto the grid computing infrastructure.
This article aims to present and review the main e-Infrastructure developments in India and their full exploitation by scientific applications, with a focus on the role played by the EU-IndiaGrid and EU-IndiaGrid2 projects.

Fig. 1. NKN Layers
This approach responds to a vision where different activities, in the research domain but also in other areas such as Healthcare or e-Governance, can move from a silo-like structure to a gradual sharing of the upper layers, from the network to computing (grid and HPC) and data management. Such an e-Infrastructure can provide a core of services not affordable for any individual application, spanning geographical, administrative and academic boundaries (see Figure 2).
The NKN infrastructure is entirely fibre based and owned by the Government of India. It relies on a high-capacity, highly scalable backbone and covers the entire country. NKN will connect more than 5000 sites across the country, serving millions of end-users and all major e-science projects. In the vision of Prof. Raghavan, Scientific Secretary to the Principal Scientific Adviser to the Government of India and Chief Architect and Chairman of the Technical Advisory Committee of NKN, NKN represents a great integrator for education, making the collective wisdom of institutions and laboratories reachable and available to every Indian, irrespective of geographical location, through access to this vast intellectual resource. These features are also of paramount impact for research and for health-related applications. In this case NKN provides "clear visibility" of whatever medical records are generated at the remote end: X-rays, 2D and 3D MRIs, CT scans, PET scans and so on. Moreover, thanks to the very high bandwidth and very low latency, a patient requiring critical attention and an expert opinion can be remotely seen, examined, diagnosed, and treated.

The NKN implementation strategy consists of two phases: the initial phase and the final phase. The initial phase is already operational, with a fully meshed core backbone spanning the country with twenty-three points of presence (PoPs) and connecting 90 institutions. Efforts are underway to scale the number of connected institutions to 550 in the near future. In the initial phase NKN is already providing services for virtual classrooms and for grid computing applications ranging from High Energy Physics to climate modelling and healthcare, as well as for the collaborative design of advanced complex engineering systems.

International connectivity
NKN is connected to the pan-European research network, GÉANT (www.geant.net), by means of a 2.5 Gbps link co-funded by the Government of India and the European Commission within the framework of the Trans-Eurasia Information Network project, phase 3 (TEIN3, http://www.tein3.net). TEIN3 provides a dedicated network for the research and education communities in the Asia-Pacific area, serving more than 8000 research and academic centres and 45 million users. TEIN3 includes 19 partners and relies on four hubs: Mumbai, Singapore, Hong Kong and Beijing (see Figure 4). For India, TEIN3 provides a 2.5 Gbps link from Europe to Mumbai and a 2.5 Gbps link from Mumbai to Singapore. In this vision, for research collaborations, Mumbai represents at the same time the Gateway of India and the gateway to the Asia-Pacific area.

In addition, an agreement is under way to connect to the Open Science Grid in the USA, in cooperation with the US National Science Foundation, and to connect to Japan through a dedicated high-speed link from Chennai. The NKN-TEIN3 connectivity is successfully exploited by several research applications, as described in the sections below.

Grid infrastructures
Two main grid initiatives are present in India: the regional component of the Worldwide LHC Computing Grid (WLCG, http://lcg.web.cern.ch/LCG/) and the GARUDA national grid initiative (http://www.garudaindia.in/). Both are strongly connected with the EU-IndiaGrid2 project thanks to the presence, as project partners, of the leading actors of both initiatives.

The Worldwide LHC Computing Grid in India
WLCG is the world's largest grid infrastructure and, with the start of data taking at the CERN Large Hadron Collider (LHC), it entered full production phase (Bonacorsi, 2011). The data from the LHC experiments are distributed around the globe according to a four-tiered model. India participates with two Tier-2 centres supported by the Department of Atomic Energy (DAE). One, at the Tata Institute of Fundamental Research (TIFR) in Mumbai, provides the services for the CMS experiment. The other, dedicated to the ALICE experiment, is located in Kolkata and managed by the Saha Institute of Nuclear Physics in collaboration with the Variable Energy Cyclotron Centre (VECC).
These Tier-2 centres provide CMS and ALICE users working from Tier-3 centres at universities and national labs with access to LCG data grid services for analysis. TIFR is presently connected to CERN through the 2.5 Gb/s TEIN3 link via NKN. The ALICE Tier-2 centre is also connected via NKN at 1 Gb/s. Thanks to a coordinated action of the main actors involved, who also participate in the EU-IndiaGrid2 project, TIFR has been successfully exploiting the TEIN3 connectivity for LHC data transfer since December 2010.

GARUDA: The national grid initiative of India
GARUDA is India's first national grid initiative, bringing together academic, scientific and research communities to develop their data- and compute-intensive applications with guaranteed quality of service. The project is coordinated by the Centre for Development of Advanced Computing (C-DAC) with funding from the Department of Information Technology (DIT) of the Government of India. GARUDA involves 36 partner institutions with 45 research and academic centres. GARUDA ended its proof-of-concept phase in 2008 and concluded the foundation phase in August 2009. GARUDA has since moved successfully to the operational phase, thereby providing a sustainable, production-grade computing infrastructure as a service to its partners. With this endeavour, the GARUDA grid is playing a key role in accelerating research by interconnecting academics, researchers, policy makers and society at large. Applications of national importance are hosted on the vast infrastructure offered by the GARUDA grid, to solve grand-challenge problems of national priority such as disaster management (DMSAR) and bioinformatics. At present the GARUDA network infrastructure is provided by NKN (see section 2.1), consolidating the integration between NKN and the Indian grid infrastructures and related applications.
Within the grid infrastructure, various resources such as High Performance Computing (HPC) systems and satellite-based communication systems have been committed by different C-DAC centres and GARUDA partners. The GARUDA grid is composed of various heterogeneous computing resources such as HPC clusters and SMP systems running AIX, Linux and Solaris. At the moment GARUDA offers several thousand cores, for a total computational power of approximately 65 Teraflops.
GARUDA provides a customized middleware stack that can effectively harness this diverse range of resources available on the grid. The middleware stack is enabled with C-DAC's in-house developed, efficient reservation managers, which ensure high availability and reliability of the resources for the applications running on the production grid infrastructure. Basic grid middleware services are provided by Globus Toolkit 4.0.8, on top of which the GridWay metascheduler has been enabled.

The role of EU-IndiaGrid & EU-IndiaGrid2 projects
As discussed in the sections above, during the EU-IndiaGrid project activity and right at the start of EU-IndiaGrid2, e-Infrastructures in India made considerable progress (Masoni, 2011). The leading responsibilities of the EU-IndiaGrid Indian partners, and the project's bridging role between European and Indian e-Infrastructures, gave the EU-IndiaGrid project the opportunity to be at the core of this development and to contribute effectively to improving cooperation between Europe and India in this domain.
In particular, the activities related to High Energy Physics applications were fully integrated in the framework of the Worldwide LHC Computing Grid collaboration, with particular regard to the ALICE and CMS experiments active in India.
Moreover, a dedicated Virtual Organization (the euindia VO) was made available by the project to users from the beginning of 2006. This VO included several grid resources distributed across Europe and India and was fundamental in allowing users to deploy and then use the grid infrastructure for their research.
Grid resources available to the user communities via the dedicated VO comprise grid sites installed and configured mainly with gLite 3.2 middleware running on 64-bit Linux operating systems. The available hardware is relatively recent and includes several multicore (4, 8, 12, 24 CPU-core) worker nodes. A snapshot taken at the end of December 2010 shows that VO members have at their disposal, on a best-effort basis, about 7300 CPU cores (1800 CPU sockets), representing a computing power of around 20 MSI00; on the storage side, the VO can use up to 44 TB of total space.
The EU-IndiaGrid2 project installed and currently maintains all the gLite central services needed to support the operation of the user applications belonging to the euindia Virtual Organization.
As discussed in the next section, on top of this standard gLite infrastructure some advanced services have been installed and configured. We stress here the importance of having our own dedicated services, which allow the project to easily experiment with and configure additional services at the request of users.
The increased usage of the EU-IndiaGrid infrastructure, combined with the scientific results obtained and presented at relevant international conferences or published in journals, represents a clear measure of the success of the user communities' activity. The project workshops and conferences dedicated to the different applications were an important vehicle for disseminating results and for fostering the adoption of grid technology within the scientific community and beyond. In addition, supporting and addressing the problem of interoperability at the application level further promoted the use of advanced grid technology and the cooperation between different projects and institutes. Applications, and the user communities behind them, can thus be regarded as a key to sustainability, and they help motivate the investment in e-Infrastructures.
EU-IndiaGrid2, which started in January 2010, leveraged the EU-IndiaGrid project achievements and the strong cooperation links established with the foremost European and Indian e-Infrastructure initiatives, paving the way for successful, sustainable cooperation across European and Indian e-Infrastructures.
EU-IndiaGrid2 is strongly integrated in the Indian e-Infrastructure scenario. Its partners take leading roles in NKN, WLCG and GARUDA and a solid and fruitful cooperation has been established between these initiatives and the EU-IndiaGrid2 project.
EU-IndiaGrid2 provided specific support to ensure that the progress in connectivity was exploited to favour Euro-Indian cooperation in e-Science. The project supports interoperation and interoperability between the European and Indian grid infrastructures, as well as four main application areas in the domains of Biology, Climate Change, High Energy Physics and Material Science. The main landmarks during the lifetime of the projects include:
- the interoperation between GARUDA and the worldwide grid infrastructure;
- the exploitation of the TEIN3 link for LHC data transfer;
- the exploitation of grid services to study the Indian monsoon;
- the exploitation of grid services to develop advanced seismic hazard assessment in India.
With the transition from ERNET to NKN for the network layer, an interoperation problem occurred, since all the nodes within the GARUDA grid became invisible to the external world. Thanks to an effort coordinated by the EU-IndiaGrid2 project, ERNET, NIC and C-DAC, it was possible to solve this issue in the context of the EU-IndiaGrid2 Workshop held in Delhi in December 2010, and since the end of 2010 the whole GARUDA infrastructure has been visible to worldwide grids. In addition, the project supported the interoperability between the European Grid Initiative, EGI (www.egi.eu), and the GARUDA grid infrastructure, which is now possible using a metascheduler based on GridWay (Huedo, 2005).
The TEIN3 link from Europe to Mumbai was commissioned in March 2010. However, a number of issues related to the connectivity between the TEIN3 PoP and the WLCG Tier-2 at TIFR needed to be solved. Again, with the coordinated effort of NIC and the EU-IndiaGrid2 partners, it has been possible since autumn 2010 to exploit the TEIN3 links for LHC data transfers.
Considering that the Academia Sinica computing centre acts as the reference Tier-1 for the CMS Tier-2 at TIFR, both TEIN3 links (to Europe for CERN and to Singapore for the Academia Sinica Tier-1) are crucial for WLCG operation in India. In addition, the commissioning of the 1 Gbps NKN connectivity from Kolkata to Mumbai makes the international connectivity available to the ALICE experiment as well.

Finally, the collaboration between the Bhabha Atomic Research Centre (BARC) in Mumbai and the Commissariat à l'Énergie Atomique (CEA) in Grenoble represents an excellent showcase for the use of the NKN-TEIN3-GÉANT connectivity for remote control and data collection at the Grenoble beam facility. The BARC and CEA research groups collaborate in experiments dedicated to the crystallography of biological macromolecules using protein crystallography beamlines. Two facilities have been set up in India allowing remote operation of the FIP beamline at the ESRF, Grenoble. Good X-ray diffraction data have been collected on crystals of the drug-resistant HIV-1 protease enzyme. Both BARC and CEA are EU-IndiaGrid2 partners and this activity is fully supported by the EU-IndiaGrid2 project.

Tools and methodologies within EU-IndiaGrid projects
In this section we briefly discuss some tools and methodologies successfully developed during the lifetime of the two EU-IndiaGrid projects in order to enable full exploitation by the scientific applications we promoted. The motivation behind this development effort lies in the requirements of the user communities involved in the projects. Users issued several requests; in particular they wanted:
i. training: additional tools and methods to learn how to use the grid;
ii. specific advanced services to better implement their computational scientific packages on the grid;
iii. tools to use all the available grid infrastructures easily and seamlessly.
In the following subsections we highlight three different actions, one for each category listed above.

Parallel support on GRID
Many scientific applications, such as climate modelling simulations, require a parallel computing approach, and many tasks are of the tightly coupled type. The question of how to run in parallel on the grid is therefore of great importance, and we address it here. Multicore architectures are nowadays widely available, even on the European grid, but they are only suitable for small and medium-size jobs. Distributed-memory, multi-node clusters are still the only viable tool for serious scientific computing, generally through the MPI paradigm.
Our aim was thus to provide a simple, transparent and efficient mechanism to exploit MPI distributed-memory parallelism on capable grid computing elements (CEs).
As of today, the gLite middleware does not yet provide full MPI support. gLite is now integrating the MPI-Start mechanism, a set of scripts intended to make it easy to detect and use site-specific MPI-related configuration. It can select the proper MPI distribution and the proper batch scheduler, and it can distribute files if there is no shared disk space. The MPI-Start scripts also handle the user's pre- and post-execution actions. However, from the users' point of view, the JDL attributes that characterize MPI-enabled CEs in job description files are misleading and describe the wrong level of abstraction. The EGEE MPI working group therefore proposed, more than one year ago, three new attributes, in addition to CPUNumber, to request MPI-type distributed resources explicitly:
- WholeNodes: ask for all the cores on a specified node;
- SMPGranularity: determine how many cores to use on every single node;
- HostNumber: specify the total number of nodes to run on.
Even if it is still an open question whether WholeNodes has priority over SMPGranularity, and whether SMPGranularity has priority over CPUNumber, these attributes could nevertheless greatly improve the submission of parallel jobs on the gLite infrastructure. Unfortunately, they are still to be implemented in the gLite middleware.
There are, however, patches available to enable the WMS and the CREAM CE to recognize these new attributes. The EU-India WMS and some computing elements have therefore been patched, and the patches are now available and distributed within our Virtual Organization. The new attributes allow grid users to submit their MPI parallel jobs transparently to MPI-capable resources and, furthermore, to fine-tune their requests to match the job requirements.
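As a sketch, a JDL job description using the proposed attributes on a patched WMS/CREAM CE might look as follows; the executable and argument names are illustrative, and only the last three attributes are the new ones:

```
Type           = "Job";
JobType        = "Normal";
Executable     = "mpi-start-wrapper.sh";   // illustrative MPI-Start wrapper
Arguments      = "my_mpi_app OPENMPI";     // illustrative application and flavour
CPUNumber      = 32;    // total number of cores requested
WholeNodes     = true;  // reserve whole nodes only
SMPGranularity = 8;     // cores to use on each node
HostNumber     = 4;     // total number of nodes (4 x 8 = 32 cores)
```

In this sketch the scheduler is asked for four whole nodes with eight cores each, so that the tightly coupled MPI job is never co-located with other workloads.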

GRIDSEED training tool
The GRIDSEED tool (Gregori, 2011), a set of virtual machines preconfigured with several grid services, is by far the most successful training tool developed within the EU-IndiaGrid projects. GRIDSEED is not just a software package but a complete training tool, containing the software and all the information needed to use it as a training platform, targeting different user communities and different kinds of training (users vs. system administrators).
The development of this tool, coordinated by the CNR/IOM laboratory for e-science (eLab), continued after the closing of the first project and was quite active during the EU-IndiaGrid2 project as well. The tool was used in several ICTP training events and in all the official training events of the projects. GRIDSEED is now an open laboratory where interoperability solutions and ideas are tested and experimented with jointly by European and Indian partners (Amarnath, 2010). Version 1.6.2 was released in December 2010. This latest version includes not only gLite services but also Globus/GARUDA services, allowing the setup of a portable and interoperable environment spanning the two grid infrastructures of the project. Components of GRIDSEED have been downloaded more than 2000 times so far.
GRIDSEED provides a simple way to set up a portable and interoperable grid infrastructure based on virtual machines. The tool was originally developed to easily deploy a gLite training grid infrastructure almost anywhere in the world, with a set of locally connected machines (simple PCs) as the only requirement. It uses standard virtualization tools (VirtualBox and/or VMware) that are easily and widely available. The virtual environment has recently been enriched with other middleware (ARC and the Globus Toolkit), making it the first virtual training laboratory for interoperability and interoperation among different middleware stacks. GRIDSEED is therefore a complete training environment formed by a virtual infrastructure complemented by demo applications and training materials, ready to be used both in standard training events and in advanced interoperation demo sessions.

General services toward interoperability: Milu software and Gridway installation
GARUDA and gLite users use different methods to access the resources belonging to the two different infrastructures. To remove this burden and make the use of both infrastructures as simple as possible, the MILU tool, originally conceived as a portable gLite user interface, was further developed and enhanced in collaboration with eLab.
Miramare Interoperable Lite User interface (MILU) is now a software tool which allows seamless usage of different grid infrastructures from the same Linux workstation.
MILU is a repackaging of the user-interface software provided by gLite, ARC and the Globus Toolkit (version 4), providing access to the functionality of all three middlewares concurrently. Extensive testing and an ingenious use of UNIX tricks allow MILU binaries to run on a large variety of Linux distributions (CentOS, Debian, Ubuntu, etc.), including some that are not supported by the original upstream middleware packages. MILU is packaged as a single archive that users can extract and install into their home directories by running a single shell script; no super-user privileges or technicalities are needed. MILU is distributed with a configuration ready for use for several VOs; new configurations can be added by the users (and we encourage submission upstream, so that more people can benefit from prepackaged configuration).
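As an illustration, an installation session might look like the following sketch; the archive, script and environment file names are hypothetical, not the actual MILU file names:

```
# hypothetical MILU installation session (no super-user privileges needed)
$ tar xzf milu-1.1.tar.gz          # extract the single distributed archive
$ cd milu-1.1
$ ./install.sh                     # the single installation shell script
$ source milu-env.sh               # load the preconfigured UI environment
$ glite-wms-job-submit -a job.jdl  # gLite client tools are now on the PATH
```

After the last step the gLite, ARC and Globus client commands are all available from the same shell, which is the point of the tool.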
MILU is already in use by the EU-IndiaGrid and the e-NMR VOs, plus other groups of local grid users.
We think that MILU could be interesting to community developers, in addition to the non-technical users who have historically been the target audience of MILU. Indeed, MILU can be the ideal tool for quickly enabling grid client access on a Linux machine, for rapidly prototyping a new tool, or for deploying test/debug instances of running services.
We believe that MILU, possibly extended in the future to include the forthcoming EMI "unified client", can have an impact for users and developers belonging to emerging communities, as a lower-level tool upon which more sophisticated grid access mechanisms can be built. There is a clear need for a tool like MILU:
- to provide a smooth transition between different middleware systems, e.g., when the old software still needs to be around because the new one is too immature to completely replace it, or when two infrastructures with different access methods have to be bridged;
- to provide a preconfigured environment that can satisfy the needs of the average scientific user, who does not care about the technical details and only needs a tool that "just works".
MILU 1.1 is now distributed with the GridWay middleware bundled and properly configured to access both the GARUDA and gLite infrastructures. User communities therefore have at their disposal a simple and efficient interoperability tool to use both infrastructures at the same time. MILU has been downloaded more than 2500 times so far (see http://eforge.escience-lab.org/gf/project/milu/).

Applications exploiting e-infrastructures
In the course of the last five years a specific effort was dedicated to the support of several user communities including Biology, High Energy Physics, Material Science, and Earth & Atmospheric Sciences.
For each user community specific applications were deployed on the grid infrastructure, and each application was supported by a collaboration of European and Indian partners. Scientific and technical results were presented at relevant international conferences or published in journals, and represent a clear measure of the success of the user communities' activity.
A short guideline on how to enable applications on a grid infrastructure has also been drawn up by the EU-IndiaGrid projects, based on the experience collected within the various user communities. The document's goal is to offer first support to users interested in making the best use of the EU-IndiaGrid infrastructure. In this document we propose that a successful grid-enabling procedure for an application should be performed in the following five major steps:
- Initial step: awareness of grid computing opportunities and analysis of the computational requirements;
- Step 1: technical deployment on the infrastructure;
- Step 2: benchmarking procedures and assessment of efficiency;
- Step 3: production runs and final evaluation;
- Final step: dissemination of the results among peers.
At the end of each intermediate step an evaluation takes place to decide whether the action should move to the next step, should be stopped with the execution of the final step, or should go back and repeat the previous step. It is worth noting that this procedure was elaborated keeping in mind the users' point of view: this is why we insist that the procedure should in any case (even if it is stopped after the initial step) end with a dissemination phase. The results (even negative ones) can be of great importance to other users as initial input when starting the grid-enabling procedure. Several successful porting stories followed this approach, as reported in the EU-IndiaGrid deliverables dedicated to applications and in other publications as well; see e.g. a few papers in (Cozzini, 2009).
In the following subsections we report two successful and outstanding examples of scientific applications within the project, which made the best use of the Euro-Indian infrastructure in a joint collaboration among Indian and European partners.

Climate change simulations
Climate research activities were mainly conducted in a close collaboration between the ICTP EU-IndiaGrid team and the Computational Atmospheric Science (CAS) department at IIT Delhi, led by Professor Dash.
Climate modelling is among the most computationally and data-intensive fields and is one of the key users of High Performance Computing infrastructures. Grid infrastructure is well suited to climate experiments, since it is a good alternative to dedicated HPC facilities, providing the computing power and storage space required; moreover, unlike volunteer-computing hosts, it allows running a full state-of-the-art model and storing its regular output. Grid usage for climate simulation is becoming more important but still has some major limitations:
- limited MPI/parallel support;
- complexity and heterogeneity of the middleware;
- data management issues: moving terabytes of data is not easy.
Within the EU-IndiaGrid2 project we addressed and analysed the issues above, presenting novel methodologies to overcome some of them: data management through OPeNDAP protocols, MPI support on multicore platforms using a "relocatable package" mechanism, and interoperability by means of GridWay to address the complexity and heterogeneity of the middleware. Scientific work was then carried out to study the Indian monsoon.
The monsoon is one of the most important phenomena influencing many aspects of life in India. India has an agro-based economy, and this makes it crucial that the various facets of the monsoon and the associated rains be predicted as accurately as possible. Developing and tuning models to obtain beneficial forecasts is a challenging task for scientists. The Indian monsoon extends over a distance of nearly 3000 km, directed to the southwest from the Himalayas [13], and lasts from June to September. The season is dominated by the humid southwest summer monsoon, which slowly sweeps across the country beginning in late May or early June; monsoon rains begin to recede from North India at the beginning of October, and South India typically receives more rainfall. Monsoon features are difficult to simulate with a Global Climate Model (GCM), primarily because of large temporal and spatial variations.
For this reason, the ICTP regional climate model RegCM, in its latest version (4.1.1), was used to study the Indian monsoon.
Specifically, careful tuning of the RegCM4.1.1 package was performed to obtain the correct parameterization for Indian summer monsoon simulations.
In the following subsections we discuss the strategy for porting the package to the Euro-Indian infrastructures and the innovative solutions we developed.

RegCM 4.1.1 exploitation on the GRID
RegCM4 is the fourth generation of the regional climate model originally developed at the National Center for Atmospheric Research (NCAR) and currently developed at ICTP, Trieste, Italy. It was released in June 2010 as a prototype version, RegCM4.0, and as a complete version, RegCM4.1, in May 2011 (Giorgi, 2011). A bug-fix release, RegCM4.1.1, followed in June 2011.
The RegCM4.1 package now uses dynamic memory allocation in all its components. The whole output subsystem is now able to produce NetCDF files, in better compliance with the standard Climate and Forecast conventions. The RegCM modelling system has four components, namely Terrain, SST, ICBC and RegCM itself, plus some postprocessing utilities. Terrain, SST and ICBC are the three components of the RegCM preprocessor. The first step defines the domain of the simulation: the terrain executable horizontally interpolates the landuse and elevation data from a latitude-longitude grid to the Cartesian grid of the chosen domain.
Sst executable then creates the Sea Surface Temperature file. Finally the ICBC program interpolates Sea Surface Temperature (SST) and global re-analysis data to the model grid. These files are used for the initial and boundary conditions during the simulation.
The pre-processing phase therefore needs to read global re-analysis data (of the order of several dozen GB for each year of simulation) and produces large input files in turn; RegCM itself then produces a large amount of output data. As an example, a monthly run on a domain of 160x192 points produces around 10 GBytes of data, which means of the order of 10 TBytes for a climate simulation spanning a century. These data need to be locally available, as a remote file system would slow the simulation down dramatically.
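As a sanity check, the storage requirement can be reproduced with a few lines of shell arithmetic; the ~10 GB/month figure is the one quoted above, and the result confirms the order of magnitude of roughly ten TBytes per simulated century:

```shell
# Back-of-the-envelope check of the storage figures quoted above:
# ~10 GB per simulated month for the 160x192 domain, over a century.
GB_PER_MONTH=10
MONTHS=$(( 12 * 100 ))                 # a century of monthly output
TOTAL_GB=$(( GB_PER_MONTH * MONTHS ))
echo "total: ${TOTAL_GB} GB (~$(( TOTAL_GB / 1000 )) TB)"
```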
The greatest challenge in running RegCM on the GRID is therefore handling the data properly.

RegCM on gLite
In their present state, the RegCM software and the gLite Grid infrastructure are not really suitable for long production runs, which require a number of CPUs in the 64-256 range; however, a well implemented MPI handling mechanism, such as MPI-Start, makes small to medium size RegCM simulations feasible. Data transfer from and to the Grid Storage Elements remains a matter of concern, and its impact on performance should be investigated thoroughly in the future.
The GRID can therefore be used profitably for physical and technical testing of the code by developers and users, as well as for "parametric" simulations, i.e. running many shorter/smaller simulations with different parameterisations at the same time.
As said, the RegCM pre-processing part requires a large data ensemble (several TBytes) to be locally available every time it is run. This is hardly feasible on the GRID, so the pre-processing always has to be performed locally before submitting the job. A typical run starts by retrieving the input data previously stored on a Storage Element (SE); this is accomplished by a pre-run hook script. The actual execution is then handled transparently by MPI-Start on the requested resources. Once the execution is over, a post-run hook script transfers the data produced by the simulation to an SE and the job terminates. In this way a RegCM simulation can be run on GRID resources.
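The hook mechanism can be sketched as follows. This is a minimal illustration assuming the standard gLite data management commands (lcg-cp, lcg-cr); the VO name "euindia" and the LFN paths are hypothetical, and the run() wrapper merely prints each command so the sketch can be inspected without grid credentials (drop it for real use):

```shell
#!/bin/bash
# Sketch of MPI-Start pre/post-run hooks for a RegCM job.
# VO name and LFN paths are illustrative, not the project's actual values.
run() { echo "+ $*"; }   # dry-run wrapper: remove for real execution

# pre-run hook: fetch the locally pre-processed input from a Storage Element
pre_run_hook() {
  run lcg-cp --vo euindia \
      lfn:/grid/euindia/regcm/icbc_input.tar.gz "file://$PWD/input.tar.gz"
  run tar xzf input.tar.gz
}

# post-run hook: copy and register the simulation output on a Storage Element
post_run_hook() {
  run tar czf output.tar.gz output/
  run lcg-cr --vo euindia -d se.example.org \
      -l lfn:/grid/euindia/regcm/output.tar.gz "file://$PWD/output.tar.gz"
}

pre_run_hook
post_run_hook
```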
Running RegCM properly on an MPI-Start enabled resource actually requires a compilation to be performed in advance. This means that the RegCM software should be made available on the Grid resources by the VO software managers, and the Grid site will then publish the availability of such software.
Not many CEs support MPI-Start and the new MPI-related attributes in the JDL script files, so we also provide the possibility to run RegCM (or, in fact, any other MPI parallel application) through a "relocatable package" approach.
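For a CE that does support MPI-Start, a minimal JDL sketch could look as follows; the CPU count, wrapper script names and the published runtime environment tags are illustrative assumptions, following the common MPI-Start submission pattern rather than the project's actual files:

```
JobType       = "Normal";
CPUNumber     = 16;
Executable    = "mpi-start-wrapper.sh";
Arguments     = "regcmMPI OPENMPI";
InputSandbox  = {"mpi-start-wrapper.sh", "mpi-hooks.sh", "regcm.in"};
StdOutput     = "std.out";
StdError      = "std.err";
OutputSandbox = {"std.out", "std.err"};
Requirements  = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
                && Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);
```

The Requirements expression restricts matchmaking to sites that publish both MPI-Start and an OpenMPI runtime.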
With this approach all the software needed, starting from a minimal OpenMPI distribution, is moved to the CE by the job itself. All the libraries the program needs are precompiled elsewhere and packaged for easy deployment on whatever architecture the job lands on; the code itself is compiled against the same "relocatable" libraries and shipped to the CE by the job. The main advantage of this solution is that it runs on almost every machine available on the GRID, and the user does not even need to know which resource the GRID has assigned.
This alternative approach allows a user to run a small RegCM simulation on any kind of available resource, even one that is not MPI-Start enabled, although it is mainly aimed at SMP resources, which are quite widely available nowadays. The main drawback is that a precompiled MPI distribution cannot take advantage of a high speed interconnect and is generally unable to use more than one computing node. The "relocatable" solution can, however, fully use an SMP resource, making it a reasonably good way to run small RegCM simulations on any available GRID resource.
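A job wrapper for the relocatable approach can be sketched as below. All file and directory names are illustrative, and the guards simply let the sketch degrade gracefully where the tarball or mpirun is absent:

```shell
#!/bin/bash
# Sketch of a "relocatable package" job wrapper: a minimal OpenMPI build
# and a RegCM binary compiled against it travel with the job in a single
# tarball (names are illustrative).
if [ -f regcm-relocatable.tar.gz ]; then
  tar xzf regcm-relocatable.tar.gz      # shipped via the job input sandbox
fi
BASE="$PWD/regcm-relocatable"
export PATH="$BASE/openmpi/bin:$PATH"
export LD_LIBRARY_PATH="$BASE/openmpi/lib:$BASE/lib:${LD_LIBRARY_PATH:-}"

# use however many cores the (typically SMP) worker node offers
NCORES=$(getconf _NPROCESSORS_ONLN)
if [ -x "$BASE/bin/regcmMPI" ] && command -v mpirun >/dev/null 2>&1; then
  mpirun -np "$NCORES" "$BASE/bin/regcmMPI" regcm.in
fi
```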
RegCM4.1.1 manages all its I/O operations through the NetCDF data format, provided by the netCDF library. This allows the use of the OPeNDAP Data Access Protocol (DAP), a protocol for requesting and transporting data across the web, which means that all RegCM input/output operations can be done remotely with no need to upload/download data through Grid data tools. Any data server implementing the OPeNDAP protocol (for instance a THREDDS server) can therefore provide the global dataset for creating the DOMAIN and ICBC files: instead of downloading the whole global dataset, only the required subset in space and time is transferred. Even the pre-processing phase can thus be easily carried out on any e-infrastructure that provides outbound connectivity. To use this approach the netCDF library must be compiled with OPeNDAP support, and a command line web downloader such as curl, together with its development libraries, must be installed on the system as a prerequisite. The scheme then simply consists in submitting an input file in which a URL is used in place of the usual local data path in the inpter and inpglob variables of the regcm.in file.
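The corresponding regcm.in fragment could look as follows; the namelist group names are assumed from the RegCM 4 input layout, and the THREDDS URLs are purely illustrative placeholders:

```
! Illustrative regcm.in fragment: OPeNDAP URLs in place of local paths.
! Group names assumed from the RegCM 4 namelist layout; URLs are examples.
 &terrainparam
  inpter  = 'http://thredds.example.org/thredds/dodsC/regcm/SURFACE/'
 /
 &globdatparam
  inpglob = 'http://thredds.example.org/thredds/dodsC/regcm/EIN15/'
 /
```

With a DAP-enabled netCDF library, terrain and ICBC then read just the requested space/time subset over HTTP.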
We performed several experiments to estimate how feasible it is to carry out all the steps of a RegCM climate simulation on the Grid, hiding all the data management complexity within the netCDF library and the OPeNDAP protocol. A detailed report is in preparation.

RegCM across Garuda and gLite by means of GridWay
On the GARUDA Grid infrastructure, RegCM can easily be compiled by users themselves by connecting to the MPI clusters they plan to use; once the executable is available on the cluster as local software, simulations can be submitted by means of the GridWay metascheduler.
At the moment, however, data storage and movement remain a big concern, because data management tools and services on GARUDA have yet to be provided. Users can only move simulation data back and forth between the Grid resources and their local resources by means of the globus-url-copy command, a solution far from acceptable for large simulations that produce terabytes of data.
For this reason, the approach based on netCDF and the OPeNDAP protocol allows us to run the model easily, without any concern about the status of data management.
As said, the actual submission is done through the GridWay scheduler, which can reach all the MPI resources made available on the GARUDA Grid. Since GridWay is now perfectly able to manage both gLite and Globus resources, RegCM simulations can be easily and transparently submitted to both Indian and European Grid resources: a single job description file can be used, modifying just the hostname of the resource where the job should run.
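A GridWay job template for such a submission might look like this sketch; the wrapper script and hostname are illustrative, and only the REQUIREMENTS line needs editing to steer the same job to a different Indian or European resource:

```
# Hypothetical GridWay job template for a RegCM run (names illustrative).
EXECUTABLE   = regcm_wrapper.sh
ARGUMENTS    = regcm.in
INPUT_FILES  = regcm_wrapper.sh, regcm.in
OUTPUT_FILES = regcm.log
STDOUT_FILE  = std.out
STDERR_FILE  = std.err
REQUIREMENTS = HOSTNAME = "ce.garuda.example.in"
```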
This working environment is currently made available through MILU, the interoperable user interface, with the GridWay package preconfigured as discussed in the sections above.

Overall achievements
The regional climate model version 4.1.1 was ported and tested on both the Indian and the European infrastructures, and the feasibility and performance of executing the code on both HPC and Grid infrastructures were assessed. The performance of the code, measured in terms of execution speed and time to complete one run, proved comparable across the different platforms for a given number of CPU cores.
The data management issues were solved by means of the OPeNDAP approach, which proved efficient and feasible. This is a considerable result, allowing all the available computational resources to be exploited in a uniform manner without moving terabytes of data back and forth.
Finally, different simulations were performed on the South Asia CORDEX domain to find the best suited configuration of parameters and tune the model for the best possible results over the Indian sub-continent. The tuning is being done by performing various experiments with a different set of parameters in each simulation, for instance using different convective precipitation schemes on land and ocean, modifying the values of the parameters in the regcm.in file, and changing the landuse pattern. Scientific results for the Indian Summer Monsoon obtained with RegCM4.1, compared against observational datasets such as GPCP and CRU, are under study and will be published soon.

Advanced seismic hazard assessment in India
Another remarkable example of the fruitful collaborations established within EU-IndiaGrid is the activity performed on the EU-IndiaGrid computing infrastructure by the ICTP/SAND group and their Indian partners (the Institute of Seismological Research (ISR) in Gujarat and the CSIR Centre for Mathematical Modelling and Computer Simulation (C-MMACS) in Bangalore) in the area of advanced seismic hazard assessment in the Indian region of Gujarat.
Seismic risk mitigation is a worldwide concern, and the development of effective mitigation strategies requires sound seismic hazard assessment. The purpose of seismic hazard analysis is to provide a scientifically consistent estimate of seismic ground shaking for engineering design and other considerations. The performance of the classical probabilistic approach to seismic hazard assessment (PSHA), currently in use in several countries worldwide, turned out to be fatally inadequate for the earthquakes that occurred worldwide during the last decade, including the recent destructive earthquakes in Haiti (2010), Chile (2010) and Japan (2011).
Therefore the need for an appropriate estimate of the seismic hazard, aimed not only at the seismic classification of the national territory, but also capable of properly accounting for the local amplifications of ground shaking (with respect to bedrock), as well as for the fault properties (e.g. directivity) and the near-fault effects, is a pressing concern for seismic engineers.
Current computational resources and physical knowledge of the processes of seismic wave generation and propagation, along with the improving quantity and quality of geophysical data (from seismological to satellite observations), nowadays allow for viable numerical and analytical alternatives to the probabilistic approaches. A set of scenarios of expected ground shaking due to a wide set of potential earthquakes can be defined by means of full waveform modelling, based on the possibility of efficiently computing synthetic seismograms in complex, laterally heterogeneous, anelastic media. In this way, scenarios of ground motion can be defined at both national and local scale, the latter taking into account the 2D and 3D heterogeneities of the medium travelled by the seismic waves.
The considered scenario-based approach to seismic hazard assessment, the neo-deterministic seismic hazard assessment (NDSHA) approach, builds on a rigorous theoretical basis and exploits the currently available computational resources, which permit the computation of realistic synthetic seismograms. The integrated NDSHA approach aims to provide a fully formalized operational tool for effective seismic hazard assessment, readily applicable to computing complete time series of expected ground motion (i.e. the synthetic seismograms) for seismic engineering analysis and other mitigation actions.

e-Infrastructures represent a critical means to provide access to important computing resources and specialized software to the worldwide seismological community. In fact, e-science removes some of the infrastructural barriers that prevent collaborative work at the international level. Accordingly, the proposed scientific and computational tools and networking will permit a widespread application of the advanced methodologies for seismic hazard assessment, particularly useful for urban planning and risk mitigation actions in developing countries, and, in turn, will allow for a faster development and verification of the models.
The use of the mentioned seismological methodologies can be optimized by modern computational infrastructures based on Grid computing paradigms. Advanced computational facilities may in fact enable scientists to compute a wide set of synthetic seismograms, dealing efficiently with the variety and complexity of the potential earthquake sources, and to implement parametric studies to characterize the related uncertainties.
The application of the NDSHA approach to the territory of India had already started in the framework of long-term Italy-India bilateral cooperation projects involving the ICTP/SAND group and CSIR C-MMACS (Bangalore). In that framework, a neo-deterministic hazard map has been produced for India, and specific studies have been performed to estimate the ground motion amplifications along selected profiles in the cities of Delhi and Kolkata. The collaboration has recently been extended to the Institute of Seismological Research (ISR, Gandhinagar, Gujarat).

Porting and optimization of codes
Preliminary studies have been devoted to exposing seismologists and seismic engineers to modern e-infrastructures (including both HPC and Grid environments), so that the potential these infrastructures offer for seismic hazard assessment research can be assessed and exploited. These activities aim to bring the computational seismology user community to the use of modern e-infrastructures and to the core innovations emerging in this framework, for example the development of a European and worldwide e-infrastructure for advanced applications in seismic hazard assessment driven by European Union projects such as EU-IndiaGrid2. The major goals of this new collaboration are to: facilitate the development and application of a scientifically consistent approach to seismic hazard assessment; disseminate, in scientific and engineering practice, advanced reliable tools for seismic hazard estimates; and exploit, as much as possible, the advantages provided by computational resources and e-Infrastructures.
Activities carried out so far have been dedicated to a general introduction to the e-Infrastructures for Grid and HPC and to a preliminary assessment of their use in seismological research, with special emphasis on methods for the advanced definition of ground shaking scenarios based on physical modelling of seismic wave generation and propagation processes. Researchers gained some practice in the use of the e-infrastructure for neo-deterministic seismic hazard assessment at different scales and levels of detail, working actively on Italian data and testing the specialized seismological software in an e-Infrastructure environment, leveraging the work performed within the EU-IndiaGrid projects.
The use of the EU-IndiaGrid infrastructure allows massive parametric tests to be conducted, exploring the influence not only of the deterministic source parameters and structural models but also of the random properties of the source model, so as to enable realistic estimates of the seismic hazard and of its uncertainty. The random properties of the source are especially important in the simulation of the high frequency part of the seismic ground motion.
We have ported and tested the national-scale seismological codes on the Grid infrastructure; the flowchart of this code is illustrated in figure 5. The first step of the work, performed at an EU-IndiaGrid side-training event, was the optimization of the whole package through the identification of the critical programs and of the hot spots within them. The critical point in the algorithm was the computation of synthetic seismograms. The optimization was performed in two ways: first by removing repeated formatted disk I/O, and second by sorting the seismograms by source depth, to avoid the repeated computation of depth-dependent quantities. Figure 6 shows the remarkable improvement obtained. The second step was porting the optimized hazard package onto the EU-IndiaGrid infrastructure. Two different types of parametric tests were developed: on the deterministic source parameters and on the random properties of the source model. The first experiment is performed by perturbing the properties of the seismic sources selected by the algorithm before the computation of the synthetic seismograms; in the second test, different sets of source spectrum curves generated by a Monte Carlo simulation of the source model are used to scale the seismograms. In both cases many independent runs have to be executed, so a script for generating the input, and further scripts for checking the status of the jobs, retrieving the results and relaunching aborted jobs, were developed.
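The relaunching logic can be sketched as a small shell routine built on the standard gLite WMS command line tools; the JDL file name, directory names and the status parsing are illustrative, not the project's actual scripts:

```shell
#!/bin/bash
# Sketch of the job "babysitting" loop: check each parametric job,
# retrieve its output when Done, resubmit it when Aborted.
check_job() {
  local jobid="$1" jdl="$2" status
  status=$(glite-wms-job-status "$jobid" | awk '/Current Status:/ {print $3}')
  case "$status" in
    Done*)    glite-wms-job-output --dir results/ "$jobid" ;;
    Aborted*) glite-wms-job-submit -a -o jobids.txt "$jdl" ;;   # relaunch
    *)        echo "$jobid still $status" ;;
  esac
}

# loop over the identifiers collected at submission time, if any
if [ -f jobids.txt ]; then
  while read -r jobid; do
    check_job "$jobid" regcm_param.jdl
  done < jobids.txt
fi
```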

Preliminary results
Two preliminary tests on the deterministic source parameters, for a restricted area ("persut") and for the whole of Italy ("persut Italy"), and two tests on the random properties ("seed1Hz" and "seed10Hz") for the whole Italian territory, with different frequency content and different maximum distance for the computation of the seismograms, were conducted. In this way the performance of the package on the Grid, in terms of computational time and number of successful jobs, was tested, and the procedures for job submission and output retrieval were refined. The number of seismograms to be computed determines the duration and the storage requirements of a run, and this parameter appears critical for the success of the job. The test runs on the random component of the source gave an indication of the effective number of jobs that must be computed to obtain a good estimate of the distribution of the ground shaking peaks at each receiver. The first runs provided a preliminary evaluation of the uncertainty of the hazard maps due to the random representation of the source and to the uncertainty in the source parameters. Figure 3 shows an example of the results of the test on the random component of the source model: the variability over the different random realizations of the source model (right) is shown in terms of the ratio between standard deviation and average at each receiver.

Future perspective
The NDSHA methodology has been successfully applied to strategic buildings, lifelines and cultural heritage sites, and for the purpose of seismic microzoning in several urban areas worldwide. Several international projects based on the NDSHA methodology have been carried out and are still in progress, including: the "MAR VASTO" project, with the participation of Italian (ENEA, Universities of Ferrara and Padua, ICTP) and Chilean (University Federico Santa Maria in Valparaiso, University of Chile in Santiago) partners; the UNESCO/IUGS/IGCP projects "Realistic Modelling of Seismic Input for Megacities and Large Urban Areas", "Seismic Hazard and Risk Assessment in North Africa" and "Seismic microzoning of Latin America cities"; and the multilateral network project "Unified seismic hazard mapping for the territory of Romania, Bulgaria, Serbia and Republic of Macedonia", supported by the CEI (Central European Initiative). The very positive outcomes of this collaborative seismological research call for an improvement of such interactions, to be attained by integrating and formalizing the existing scientific and computing networks. The e-Infrastructures provide an innovative and unique approach to address this problem: they have proved to be an efficient way to share and access resources of different types, which can effectively enhance the capability to define realistic scenarios of seismic ground motion, i.e. to compute the reliable seismic input necessary for seismic risk mitigation. Such facilities, in fact, may enable scientists to compute a wide set of synthetic seismograms, dealing efficiently with the variety and complexity of the potential earthquake sources, and to implement parametric studies to characterize the related uncertainties.
A Cooperation Project, aimed at the definition of seismic and tsunami hazard scenarios by means of Indo-European e-infrastructures in the Gujarat region (India), has recently been funded by the Friuli Venezia Giulia Region. This two-year project, starting in November 2011, involves three Italian partners (DiGeo, University of Trieste; ICTP SAND Group; CNR/IOM uos Democritos) and two Indian partners (ISR, Gujarat; CSIR C-MMACS, Bangalore). The project aims to set up a system for seismic characterization, integrated with the e-infrastructures distributed between India and Europe, allowing the computation of ground shaking and tsunami scenarios to be optimized. This goal will be attained thanks to the close connection with the European project EU-IndiaGrid2, which provides the necessary infrastructure. The project will thus permit the development of an integrated system, with high scientific and technological content, for the definition of ground shaking scenarios, providing at the same time the local community (local authorities and engineers) with advanced information for seismic and tsunami risk mitigation in the study region. Advanced services for the use of the computational resources will be developed, integrating the seismological computer codes into the Grid infrastructure of the EU-IndiaGrid project. Synthetic seismograms, and the related ground shaking maps and microzonation analyses (which define the seismic input), will be generated using the above-mentioned advanced services.

Conclusions
The progress of e-Infrastructures in India in recent years has had a considerable impact on the evolution of Grid computing, with significant benefits for scientific applications. Internal and international connectivity scaled up by two orders of magnitude, the National Grid Initiative moved from proof-of-concept to operational phase, and essential services such as a national Grid certification authority, interoperability tools and MPI support were established. This created the essential conditions for an effective use of these services by applications and for the development of international cooperation in several strategic scientific areas.
Europe has a long-term, coordinated and shared e-Infrastructures R&D vision, mission, strategy, roadmap and funding, driven by the European Commission's Framework Programmes.
As clearly stated in the latest e-Infrastructures Roadmap released by the e-Infrastructures Reflection Group (e-IRG, 2010), the fundamental contribution of research e-Infrastructure to European competitiveness is almost universally acknowledged. Sustainable and integrated networking, grid, data and high performance and commodity computing services are now essential tools for 40 million users in research and academia across Europe. The same document also remarks: "Increasingly, new and diverse user communities are relying on e-Infrastructure services; as such, the common e-Infrastructure must cater to new and updated requirements. This junction between leading-edge research and the e-Infrastructure that supports it is an area where considerable socio-economic benefits can be realised."
This vision is also present in the Government of India policy driven by the Office of the Principal Scientific Adviser to the Government of India. Thanks to this action, a major e-Infrastructure programme, based on the Indian National Knowledge Network (NKN) project, has been launched; it provides the high-speed network backbone to the Indian Grid infrastructures, such as the GARUDA NGI and the regional component of the LHC Computing Grid.
These developments are expected to have a substantial impact on research applications in strategic and socially relevant domains such as eHealth, climate change and the study of seismic hazard. In eHealth, Europe and India are taking the first steps towards cooperation on e-Infrastructure-based neuroscience projects.
Climate simulation on the Euro-Indian infrastructures is, in perspective, the driving application for integrating a mixed HPC/HTC paradigm across India and Europe. The variety and complexity of the simulations that could be performed by means of regional climate models can provide a dataset of unprecedented quality to assess the potential effects of global warming on the South Asia region and the associated uncertainties, in particular as they relate to simulation aspects relevant to water, food and health security. Such an ambitious task clearly requires, in the near future, a seamless approach to both HPC and HTC infrastructures. Our experience and the work described here can be regarded as the starting point for such an integration, in which the key role will be played by data management.
A similar line of reasoning applies to the seismic hazard applications: at the moment, the EU-IndiaGrid computing infrastructure enables more detailed and complex computational investigations than would otherwise be possible. In perspective, however, the complexity of the simulations can easily grow once 2D and 3D approaches are used; at that point HPC resources will again be needed, and their seamless usage within the same Grid infrastructure will be necessary. Seismic hazard also has interesting perspectives for future cloud computing exploitation, with possible links to industry: an advanced service through which seismic hazard maps could be easily and seamlessly provided on demand could be of great appeal for industry players.
We remark, as a final note, that the EU-IndiaGrid and EU-IndiaGrid2 projects played a crucial role in promoting fruitful international collaboration in Grid computing involving Europe, India and Asia-Pacific countries. It is expected that the driving key applications (like climate simulations) established in the course of the projects will further enhance such cooperation and extend it to HPC infrastructures as well.