TechnicalDoc

From XtremWebCH Wiki

How does XWCH work?

This page briefly presents how XWCH works. A more detailed document can be found among the publications (XWCH at a glance).

XWCH components

XWCH consists of three main components: the coordinator, the worker, and the warehouse. There is also a command-line client and an API for submitting applications.

Figure 1 : XWCH Architecture

The coordinator

The coordinator accepts execution requests coming from clients, assigns the tasks to the workers according to a scheduling policy and the availability of data, transfers binary codes to workers (if necessary), and supervises task execution on workers. It also detects worker crashes and disconnections, and re-launches the affected tasks on any other available worker.
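
The crash detection and re-launch behaviour can be illustrated with a small sketch. It is not XWCH code: the Coordinator class, its method names, and the timeout value are hypothetical, and it assumes that a worker counts as lost once no alive signal has arrived within the timeout.

```python
import time


class Coordinator:
    """Toy coordinator: tracks alive signals and reassigns orphaned tasks."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # assumed value, not an XWCH default
        self.clock = clock           # injectable so the sketch can be tested
        self.last_seen = {}          # worker id -> time of last alive signal
        self.assigned = {}           # task id -> worker id

    def work_alive(self, worker_id):
        """Record an alive signal sent by a worker."""
        self.last_seen[worker_id] = self.clock()

    def assign(self, task_id, worker_id):
        """Record that a task runs on a given worker."""
        self.assigned[task_id] = worker_id

    def lost_workers(self):
        """Workers that have been silent for longer than the timeout."""
        now = self.clock()
        return {w for w, t in self.last_seen.items() if now - t > self.timeout_s}

    def relaunch_orphans(self):
        """Move tasks from lost workers to another worker that is still alive."""
        lost = self.lost_workers()
        live = sorted(set(self.last_seen) - lost)
        moved = {}
        for task_id, worker_id in self.assigned.items():
            if worker_id in lost and live:
                moved[task_id] = live[0]
        self.assigned.update(moved)
        return moved
```

A real scheduler would pick the replacement worker according to its scheduling policy; the sketch simply takes the first live worker.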

The worker

The worker extracts the assigned task, starts the computation and waits for the task to complete. Workers use a "pull" method only: they initiate connections with the coordinator in order to get jobs and input data (Work Request signal, see the figure below), or to notify the coordinator about the status of the job (Work Alive signal in the figure below). Workers can therefore run in a NAT environment.
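
A minimal sketch of the pull model follows. Only the signal names (Work Request, Work Alive) come from this page; the CoordinatorStub class, the run_worker function, the job dictionaries, and the status strings are assumptions made for illustration.

```python
from collections import deque


class CoordinatorStub:
    """Passive side: it only answers calls and never contacts a worker."""

    def __init__(self, jobs):
        self.queue = deque(jobs)
        self.status = {}   # job id -> last status reported by a worker

    def work_request(self, worker_id):
        """Answer a Work Request: hand out a job, or None if none is left."""
        return self.queue.popleft() if self.queue else None

    def work_alive(self, worker_id, job_id, status):
        """Answer a Work Alive: store the status reported by the worker."""
        self.status[job_id] = status


def run_worker(worker_id, coordinator, execute):
    """Pull jobs until the coordinator has none left; return the ids that ran."""
    done = []
    while True:
        job = coordinator.work_request(worker_id)   # initiated by the worker
        if job is None:
            break
        coordinator.work_alive(worker_id, job["id"], "RUNNING")
        execute(job)
        coordinator.work_alive(worker_id, job["id"], "COMPLETE")
        done.append(job["id"])
    return done
```

Every exchange in the sketch originates at the worker, so the worker only needs outbound connectivity. This is the property that makes operation behind NAT possible.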

The warehouse

Warehouses are used by workers to download input data needed to execute their allocated task and/or upload output data produced by the task. A warehouse node acts as a repository or file server. The following figure demonstrates the components:

Logical view

From the user’s point of view, there are three levels of abstraction in XWCH.

Application

An application is a set of jobs, which may be subject to precedence constraints. An application is identified by an “application identifier”.
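
Precedence constraints can be pictured as a dependency graph between jobs. The sketch below is an illustration only: the job names and the execution_order helper are hypothetical, and it uses Python's standard graphlib module rather than anything from XWCH.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return job ids in an order that respects the precedence constraints.

    `dependencies` maps each job id to the set of jobs that must finish first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical application: one split job, two compute jobs, one merge job.
jobs = {
    "split": set(),
    "compute_a": {"split"},
    "compute_b": {"split"},
    "merge": {"compute_a", "compute_b"},
}
order = execution_order(jobs)
```

Here "split" always comes first and "merge" last; the two compute jobs do not depend on each other and could run on different workers at the same time.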

Module

A module is a set of binary codes having, in general, the same source code. Each binary code targets a specific (OS, CPU) platform. Once a module is created, it’s available for use. Developers can use it as a building block to develop their applications. A module can be "fed" with binary codes either through a client application or by the web interface.
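
As a sketch, a module can be modelled as a table that maps each (OS, CPU) platform to one binary. The Module class, its method names, and the file names below are hypothetical and do not reflect the XWCH API.

```python
class Module:
    """Toy module: one binary per (OS, CPU) platform."""

    def __init__(self, name):
        self.name = name
        self.binaries = {}   # (os, cpu) -> file name of the binary

    def add_binary(self, os_name, cpu, filename):
        """Feed the module with a binary for one platform."""
        self.binaries[(os_name, cpu)] = filename

    def binary_for(self, os_name, cpu):
        """Return the binary for a platform, or None if none was provided."""
        return self.binaries.get((os_name, cpu))


module = Module("M")
module.add_binary("linux", "x86_64", "m-linux-x86_64.zip")
module.add_binary("windows", "x86_64", "m-windows-x86_64.zip")
```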

Job

A job is the execution of a binary on a given worker.

FAQ

What is the difference between an application, a module and a task (a job)?

Let's start with a module: you prepare a module for something that you want to do quite frequently. Typically, you will want to prepare executables for some platforms (Linux, Windows), create a module M using the Web GUI, zip your executables and add the ZIP files (once again using the Web GUI).

In order to use your module, you will need an application. An application can use many modules and have many steps, but let's have a simple example: your application uses just one module M just once. This means that you prepare an input ZIP file I, use the API to create an application and then add (create) a job. When you add a job, you declare that you will use module M and your input I.

A job is simply an activation of your module, using your input. It starts when you "add" it through the API, and you can observe how it runs with the API's GetJobStatus call.
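
Observing a job usually means polling its status until it reaches a final state. The helper below is a generic sketch, not the XWCH API: get_status stands for whatever function wraps the GetJobStatus call, and the status strings and polling interval are placeholders to be replaced with the values the real API uses.

```python
import time


def wait_for_job(get_status, job_id, terminal=("COMPLETE", "KILLED"),
                 poll_s=5.0, sleep=time.sleep, max_polls=None):
    """Poll `get_status(job_id)` until it returns one of the terminal states.

    `terminal` holds placeholder status names (an assumption of this sketch).
    """
    polls = 0
    while True:
        status = get_status(job_id)
        polls += 1
        if status in terminal:
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"job {job_id} still {status} after {polls} polls")
        sleep(poll_s)   # wait before asking again
```

Passing the status function as a parameter keeps the loop independent of the client library, so it can be tried out with a fake function first.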

What is the division of labour between client, coordinator, warehouse and worker?

In a simplified way, as follows:

  • the client asks the server to "ping warehouses". If no warehouse replies, the client should quit.
  • the client calls "create application" on the server.
  • the client creates or attaches a module in the application. If a module is created, the binary files related to the module are sent to warehouses.
  • the client calls AddJob with applicationID, moduleID, a ZIP file containing input files, and instructions on how to run the job and what to recover.
  • AddJob sends the input files to warehouses and the instructions to the coordinator.
  • the coordinator assigns the job to a worker.
  • the worker recovers the input files and executables.
  • the worker runs the task and places the outputs in a warehouse.
  • the coordinator probes the worker, learns that the job has finished, and informs the client.
  • the client recovers the output files from a warehouse.

For a scenario diagram of the communication, see: http://www.xtremwebch.net/clientcomm.svg
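
The steps above can also be followed in a small in-memory simulation. Everything in it is a stand-in: the Warehouse class, the run_scenario function, the storage keys, and the log messages are invented for illustration, and the "binary" is just a Python function.

```python
class Warehouse:
    """In-memory stand-in for a warehouse (a simple file server)."""

    def __init__(self):
        self.files = {}

    def put(self, key, data):
        self.files[key] = data

    def get(self, key):
        return self.files[key]


def run_scenario(input_data, run_binary):
    """Walk through the client / coordinator / worker / warehouse steps."""
    log = []
    warehouse = Warehouse()

    # Client: ping warehouses, create the application, create the module.
    log.append("ping warehouses")
    log.append("create application")
    warehouse.put("module/binary", run_binary)   # binaries go to a warehouse
    log.append("create module")

    # Client: AddJob sends inputs to a warehouse, instructions to coordinator.
    warehouse.put("job/input", input_data)
    instructions = {"binary": "module/binary", "input": "job/input",
                    "output": "job/output"}
    log.append("add job")

    # Coordinator: assigns the job to a worker.
    log.append("assign job to worker")

    # Worker: recover inputs and executable, run, store the outputs.
    binary = warehouse.get(instructions["binary"])
    data = warehouse.get(instructions["input"])
    warehouse.put(instructions["output"], binary(data))
    log.append("worker stored output")

    # Coordinator: learns that the job has finished and informs the client.
    log.append("job finished")

    # Client: recover the output from the warehouse.
    return warehouse.get(instructions["output"]), log


result, log = run_scenario([1, 2, 3], sum)
```

Note that data only ever moves through the warehouse, while the coordinator handles instructions and status.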


How does the scheduler in the coordinator select the worker(s) in which the jobs will run?

Currently, the potential workers are first selected based on the requirements stated by the client and the module (including the operating systems for which executables have been provided). As of now (Nov 2010), the worker is selected randomly from this set. Other selection criteria will be implemented soon.
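
The two-stage selection (filter, then random pick) can be sketched as follows. The worker records, the memory requirement, and the function names are assumptions for illustration; the actual requirement fields used by XWCH are not described on this page.

```python
import random


def eligible_workers(workers, module_platforms, min_memory_mb=0):
    """Keep workers whose platform has a binary and that meet the requirement."""
    return [
        w for w in workers
        if (w["os"], w["cpu"]) in module_platforms
        and w["memory_mb"] >= min_memory_mb
    ]


def select_worker(workers, module_platforms, min_memory_mb=0, rng=random):
    """Pick one eligible worker at random, or None if no worker qualifies."""
    candidates = eligible_workers(workers, module_platforms, min_memory_mb)
    return rng.choice(candidates) if candidates else None


# Hypothetical worker pool and module platforms.
workers = [
    {"id": "w1", "os": "linux", "cpu": "x86_64", "memory_mb": 2048},
    {"id": "w2", "os": "windows", "cpu": "x86_64", "memory_mb": 4096},
    {"id": "w3", "os": "linux", "cpu": "x86_64", "memory_mb": 1024},
    {"id": "w4", "os": "linux", "cpu": "x86_64", "memory_mb": 512},
]
platforms = {("linux", "x86_64")}
```

With these values, a call such as select_worker(workers, platforms, min_memory_mb=1024) returns either w1 or w3.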

How does replication work?

Replication of output files is performed by the workers; the relevant code is in the file ThreadJob.java.

The method of replication can be influenced by the extrafields' "replication" value.

How can I program..?

Please see the FAQ in the DeveloperGuide.
