02/05/2009 - 16:10
02/05/2009 - 17:30

STA / BST 290

STATISTICS COLLOQUIUM

THURSDAY, February 5th, 2009 at 4.10pm, MSB 1147 (Colloquium Room)

Refreshments: 3.30pm, MSB 4110 (Statistics Lounge)

Speaker: Karim Chine (Cloud Era Ltd, Cambridge UK; visiting UC Berkeley)

Title: Statistical computing in the cloud: towards a federative and collaborative platform

Abstract: We propose to build an open platform for computing and data analysis on top of R, the highly popular statistical environment. Using a rich workbench within the browser, the statistician can work with an R server running at any location as if it were local to their machine. The platform hides the complexity of high-performance computing and cloud computing infrastructures, and the computational resource is abstracted as a simple URL. The R server can run near the large files to be analyzed, or within the database where the terabytes of data to mine are stored. R packages can extend the computational capabilities of the server, and the workbench's plugins can improve the user experience and the productivity of the statistician. Biocep provides the tools required to democratize grid and cloud computing and to deal with the data deluge.

The new platform makes distributed computing accessible to a larger number of statisticians. Easy-to-use functions let the user control, from within an R session, several R servers running anywhere, either as additional workers or as a cluster for solving embarrassingly parallel problems. The platform also widens the scope of computational research resources that can be easily shared. Besides R packages, which are interoperable software components, the statistician can share functions and algorithms as web services or as nodes for workflow workbenches. An R server can also be shared: a statistician and collaborators can connect their workbenches to the same R server and analyze shared data collaboratively via a set of broadcast, highly interactive views.

The seminar will give an overview of the new platform and demonstrate Biocep's deployments on Amazon EC2 and on the British National Grid Service.