SCIENTIFIC APPLICATIONS PERFORMANCE MODELING AND PREDICTION

Marlon W. Bright, Javier Delgado, E. Javier Figueroa, S. Masoud Sadjadi*

Florida International University, School of Computing and Information Sciences, Miami, FL 33199

sadjadi@cs.fiu.edu


Abstract

To cope with the high demand for more accurate weather forecasts, meteorologists have spent the past decades developing and continually improving weather simulation software. Their latest effort has resulted in the Weather Research and Forecasting (WRF) model, a complex software application with more than 160,000 lines of legacy Fortran 90 and C. WRF is based on its predecessor, the Mesoscale Model 5 (MM5), and inherits its architecture and older stub functions. Our initial goal was to enable WRF in a grid environment so that it could provide more accurate forecasts in less time, and this required a detailed code inspection to discover its actual functionality and flow of execution. We continued the effort started by UCAR (University Corporation for Atmospheric Research, the originator and coordinator of the WRF code) to develop a document that gives in-depth knowledge to software engineers who need to understand the functionality of WRF. Once we had gained sufficient understanding of the application, we began our original task of adapting WRF to a grid computing environment. Using a transparent approach to the code, we profiled how WRF's behavior varies with processor power and number of nodes, and from this profile we developed a simplified mathematical model that predicts application execution time in a distributed environment. The model was implemented in a set of monitoring and prediction tools, most notably Amon (A Monitoring Tool) and Aprof (A Profiling Tool). Amon is a process-monitoring program that runs on each compute node of a cluster. Aprof is a regression-analysis-based program that runs either on the head node of the same cluster or offline, and it makes predictions from the output data collected by Amon.
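The abstract says that Aprof predicts execution time by regression analysis but does not give the model's form. The sketch below shows only what a regression-based predictor of this kind could look like. It assumes a hypothetical model T(n, p) = a + b / (n * p), where n is the number of nodes and p is the fraction of CPU allocated to the process, and it fits a and b by ordinary least squares. The model form, the function names, and the (nodes, cpu_fraction, seconds) sample format are illustrative assumptions. They are not the published Aprof model or Amon output format.

```python
# Minimal sketch of a regression-based execution-time predictor.
# ASSUMPTION: the model T(n, p) = a + b / (n * p) is hypothetical, chosen for
# illustration; it is not the published Aprof model.
from typing import List, Tuple


def fit_time_model(samples: List[Tuple[int, float, float]]) -> Tuple[float, float]:
    """Fit T = a + b * x, with x = 1 / (nodes * cpu_fraction), by least squares.

    samples: (nodes, cpu_fraction, measured_seconds) triples, a hypothetical
    format for what a per-node monitor such as Amon might record.
    """
    xs = [1.0 / (n * p) for n, p, _ in samples]
    ys = [t for _, _, t in samples]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples with at least two distinct nodes*cpu_fraction values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope: cost of the work that scales with total CPU power
    a = mean_y - b * mean_x  # intercept: time that does not shrink with more CPU
    return a, b


def predict_time(a: float, b: float, nodes: int, cpu_fraction: float) -> float:
    """Predict execution time for a node count and CPU fraction (0 < p <= 1)."""
    return a + b / (nodes * cpu_fraction)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from a = 20, b = 400 (not WRF measurements).
    data = [(n, p, 20.0 + 400.0 / (n * p)) for n in (1, 2, 4, 8) for p in (0.5, 1.0)]
    a, b = fit_time_model(data)
    print(round(a, 6), round(b, 6))
    print(round(predict_time(a, b, 16, 0.75), 6))
```

Collapsing nodes and CPU share into the single product n * p keeps the fit to a one-variable linear regression. A real model would likely also need terms for communication overhead that grows with node count.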
To validate our approach, we conducted experiments with WRF on two relatively small clusters at Florida International University and on a large cluster, the MareNostrum supercomputer at the Barcelona Supercomputing Center. The data generated in these experiments reflect different effective CPU speeds across a cluster. These differences in CPU power were produced with the cpulimit tool, which limits the percentage of CPU time allocated to a specified process, in our case WRF. We compared the gathered data against Aprof's predictions for different combinations of node counts and CPU utilization percentages. The results validated our modeling approach, showing it to be successful within clusters and scalable to larger environments. Additionally, our proposed prediction process achieves an error of less than 10%, is fast, and is architecture agnostic.
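The validation reports prediction error below 10%. The abstract does not define the exact error metric. The sketch below assumes it is the absolute relative error of each predicted run time against the measured one. The helper names `percent_error` and `within_threshold` are hypothetical, and the numbers in the example are illustrative, not experimental results.

```python
# Hypothetical helpers: percentage error of a prediction against a measurement,
# and a check of a set of runs against an error threshold (10% by default).
from typing import Iterable, Tuple


def percent_error(predicted: float, measured: float) -> float:
    """Absolute relative error in percent; measured must be nonzero."""
    if measured == 0:
        raise ValueError("measured time must be nonzero")
    return abs(predicted - measured) * 100.0 / abs(measured)


def within_threshold(pairs: Iterable[Tuple[float, float]], threshold_pct: float = 10.0) -> bool:
    """True if every (predicted, measured) pair has an error below threshold_pct."""
    return all(percent_error(p, m) < threshold_pct for p, m in pairs)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the WRF experiments.
    print(percent_error(95.0, 100.0))                        # 5.0
    print(within_threshold([(95.0, 100.0), (212.0, 200.0)]))  # True
```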
