General information

Project aims

The main aim of the project is to provide Polish scientific users with easy, uniform and effective access to the mathematical libraries installed in educational computing centres.

Motivation

One of the motivations for our work is the need for computing power that is easily, transparently and uniformly accessible to end-users. We see this need in our own scientific environment.
Therefore, we aim to provide a remote mathematical function execution service for computational applications in a heterogeneous, production Grid environment.

Relation to the existing solutions

Several systems have been developed to make remote mathematical function calls possible. The most important are NetSolve and Ninf. The Grid RPC protocol was developed within these projects. The protocol lets end-users call mathematical functions residing on remote machines in an easy and transparent way, regardless of where the mathematical library is physically located. All the programmer has to do is specify the name of the mathematical function and provide its input arguments. More about the Grid RPC protocol and its reference implementations can be found in the GridRPC section.
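
To illustrate the calling model described above, here is a minimal Python sketch of "call by name": the caller passes only a function name and its inputs, and a dispatcher decides where the work is done. This is not the Grid RPC, NetSolve or Ninf API. The names `REGISTRY` and `remote_call` and the server names are hypothetical, and the "remote" library is simulated in-process.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: function name -> (server that hosts it, implementation).
# In a real Grid RPC system an agent would hold this mapping; here it is a dict.
REGISTRY: Dict[str, Tuple[str, Callable[..., Any]]] = {
    "dot": ("server-a.example", lambda x, y: sum(a * b for a, b in zip(x, y))),
    "norm1": ("server-b.example", lambda x: sum(abs(a) for a in x)),
}


def remote_call(name: str, *args: Any) -> Any:
    """Call a mathematical function by name, without knowing where it lives."""
    if name not in REGISTRY:
        raise KeyError(f"no server offers function {name!r}")
    _server, implementation = REGISTRY[name]
    # A real client would ship the arguments to the server; this sketch
    # simply runs the implementation in the current process.
    return implementation(*args)


# The caller names the function and supplies inputs, nothing else.
result = remote_call("dot", [1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The point of the design is that the caller never mentions a machine or a library path; only the registry (the agent, in Grid RPC terms) knows where each function is installed.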

Our project is intended to:
- extend the functionality of Grid RPC systems
- improve some of their features
- expand the range of their possible applications

Therefore, we DO NOT claim to offer yet another solution to the remote function calling problem. We build on an existing Grid RPC system, NetSolve. That is why we use the "powered by NetSolve/GridSolve" logo.

Areas of R&D work:

1. execution time prediction & scheduling techniques
2. fault tolerance techniques
3. integration with SGIgrid cluster

Execution time prediction & scheduling techniques

We are developing a new mechanism for predicting the execution time of mathematical library functions in Grid RPC systems.

In our opinion, the existing mechanism is relatively simple. It relies on theoretical, static performance models, so it takes into account neither dynamic changes in the characteristics of the computing resources nor their performance as observed in practice.

We are working on giving the prediction mechanism the ability to:

  • dynamically discover the performance-related features of particular mathematical function implementations,
  • dynamically discover the performance characteristics of the computing systems (machines and clusters).

The solution we are working on is based on collecting and analysing historical information about the performance of previous function executions.

The new scheduling mechanism we are developing will use the new prediction mechanism when planning the execution of the mathematical functions requested by users.
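
The idea of predicting from history and then scheduling on the prediction can be sketched as follows. This is only an illustration under strong assumptions, not the mechanism described in the project's papers. It assumes that execution time grows linearly with an abstract problem size, and the names `HistoryPredictor` and `choose_server` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


class HistoryPredictor:
    """Predict execution time from past runs of a function on a server."""

    def __init__(self) -> None:
        # (function, server) -> list of (problem_size, measured_seconds)
        self._runs: Dict[Tuple[str, str], List[Tuple[float, float]]] = defaultdict(list)

    def record(self, function: str, server: str, size: float, seconds: float) -> None:
        """Store one observed execution."""
        self._runs[(function, server)].append((size, seconds))

    def predict(self, function: str, server: str, size: float) -> Optional[float]:
        """Return the predicted seconds, or None when there is no history."""
        runs = self._runs.get((function, server))
        if not runs:
            return None
        # Assumed model: seconds = rate * size, with the rate fitted by
        # least squares through the origin over all recorded runs.
        numerator = sum(n * t for n, t in runs)
        denominator = sum(n * n for n, _ in runs)
        if denominator == 0:
            return None
        return (numerator / denominator) * size


def choose_server(predictor: HistoryPredictor, function: str,
                  servers: Iterable[str], size: float) -> Optional[str]:
    """Pick the server with the smallest predicted time; None if nothing is known."""
    known = {}
    for server in servers:
        estimate = predictor.predict(function, server, size)
        if estimate is not None:
            known[server] = estimate
    return min(known, key=known.get) if known else None
```

A caller would invoke `record` after every completed remote call and `choose_server` before the next one, so the estimates follow the observed behaviour of each machine rather than a fixed theoretical model.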

Details of the work on prediction and scheduling can be found in the PPAM'03 conference paper and presentation. Please visit the documents section.

Fault tolerance techniques

The current implementations of Grid RPC clients assume that the Grid RPC system (i.e. the agent and the computational servers) is always accessible. In our opinion, this approach has the following drawback: if the Grid RPC system becomes unavailable (e.g. because of an agent failure or a network link going down), the client application that calls the remote mathematical function stops.
We are working on a mechanism that extends the functionality of the Grid RPC client. When the Grid RPC system is unavailable, the mechanism will use a local copy of the library, so that user applications can continue to run.
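
The fallback behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's client code. `GridUnavailable` and `call_with_fallback` are hypothetical names, and the remote call is simulated by a function that always fails.

```python
from typing import Any, Callable


class GridUnavailable(Exception):
    """Hypothetical error raised when the agent or servers cannot be reached."""


def call_with_fallback(remote: Callable[..., Any],
                       local: Callable[..., Any],
                       *args: Any) -> Any:
    """Try the remote function first; use the local library if the grid is down."""
    try:
        return remote(*args)
    except (GridUnavailable, ConnectionError, TimeoutError):
        # The Grid RPC system is unreachable: continue with the local copy
        # of the library instead of stopping the application.
        return local(*args)


def unreachable_remote_sum(values):
    """Simulated remote call whose agent never answers."""
    raise GridUnavailable("agent did not respond")


# The application still gets its result, computed by the local function.
total = call_with_fallback(unreachable_remote_sum, sum, [1, 2, 3])
```

Only availability errors trigger the fallback; an error raised by the mathematical function itself (for example, invalid input) is deliberately left to propagate to the caller.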

Integration with SGIgrid cluster

The SGIgrid cluster is the first environment in which we plan to introduce the Grid RPC system. Some of the machines in this environment are controlled by LSF queue systems and form local LSF clusters. All the computing resources, however, are managed by the SGIgrid broker.
These specific features require some modifications and extensions to the Grid RPC mechanisms. The extensions will make it possible to execute functions of mathematical libraries installed on computers controlled by the queue systems. The Grid RPC system agent must also be integrated with the SGIgrid broker in order to avoid double scheduling and an inconsistent view of the environment state.

The work includes the development of:

  • methods for running computations on the "indirectly accessible" resources: transporting input/output data to and from them, running the computational processes and controlling their execution,
  • techniques for monitoring and predicting the performance of computations on the "indirectly accessible" resources,
  • a method for integrating the Grid RPC system agent with the SGIgrid broker.

The details of the integration are the subject of current work. For more information, please refer to the documents section.
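
As a rough illustration of what running a computation on an "indirectly accessible" resource involves, the following Python sketch walks through the steps named in the list above: stage the input, submit the job to a queue, wait for it to finish and bring the output back. It does not talk to LSF or to the SGIgrid broker. `FakeQueue` and `run_via_queue` are hypothetical, and the queue is an in-memory stand-in.

```python
from typing import Any, Callable, Dict


class FakeQueue:
    """In-memory stand-in for a batch queue; a job runs when it is first polled."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Dict[str, Any]] = {}
        self._next_id = 1

    def submit(self, function: Callable[..., Any], staged_input: tuple) -> int:
        """Accept a job together with its staged input and return a job id."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"fn": function, "args": staged_input, "state": "PENDING"}
        return job_id

    def poll(self, job_id: int) -> str:
        """Report the job state; the fake queue runs a pending job here."""
        job = self._jobs[job_id]
        if job["state"] == "PENDING":
            job["result"] = job["fn"](*job["args"])
            job["state"] = "DONE"
        return job["state"]

    def fetch_output(self, job_id: int) -> Any:
        """Return the output of a finished job."""
        job = self._jobs[job_id]
        if job["state"] != "DONE":
            raise RuntimeError("job has not finished")
        return job["result"]


def run_via_queue(queue: FakeQueue, function: Callable[..., Any], *args: Any) -> Any:
    """Run a function through the queue instead of calling the machine directly."""
    job_id = queue.submit(function, args)   # transport input and submit
    while queue.poll(job_id) != "DONE":     # control execution: wait for completion
        pass
    return queue.fetch_output(job_id)       # transport output back


answer = run_via_queue(FakeQueue(), max, 3, 7, 5)
```

A real implementation would replace `FakeQueue` with calls to the queue system and would wait between polls rather than spinning.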