Advances in computational infrastructure over the last decade have established big-data analysis and computational biology as key research methodologies in both academia and industry. The use of computers in biology has improved our understanding of the mechanisms underlying health and disease and has accelerated the development of novel therapeutics. In this project, the Life Sciences Research Community was chosen because of its central role in achieving a higher quality of life in the SEEM region. The aim of the VRE is to create and provide the necessary services over a capable infrastructure to facilitate research into disease mechanisms in the SEE and EM populations. The overall goal is the analysis of datasets integrated within the VRE, which could ultimately reveal regional characteristics that would support the development of personalized medicine in SEE and EM. Project participants and related institutes will assist in data collection and analysis, run and optimize computational codes, and use the research results to understand the molecular basis of diseases associated with the SEE and EM areas, with a view to developing personalized therapies. By bringing their own expertise into the consortium, the VRE partners will combine their research interests to pursue this common goal through an interdisciplinary approach. Such an undertaking requires computational resources for a) data production, b) data analysis, c) storage, and d) visualization.
The Life Sciences Research Community in the SEEM region needs a variety of infrastructures. Large amounts of data must be stored and made available to researchers for processing in the compute centres of the region. Therefore, apart from storage resources, a fast and reliable networking infrastructure is important for moving large datasets from data archives to the computing centres, and for moving simulation results to the researchers’ facilities for further post-processing and acquisition of results. In terms of compute infrastructure, the models and services to be used by the research groups require both capacity and capability computing, as well as computing resources for the installation of user-facing services. For example, codes such as NAMD and NWChem scale up to hundreds or thousands of cores and can benefit from scalable HPC clusters or supercomputers such as the IBM Blue Gene. Molecular dynamics applications are also known to perform well on GPU systems and are being ported to Intel’s new Xeon Phi accelerator platform. On the other hand, parametric codes for human genome sequence analysis can benefit greatly from the Grid or Cloud IaaS computing model. Finally, user-facing services can also be installed in the IaaS infrastructure that will be available in the project. The Life Sciences Scientific Community therefore requires a variety of infrastructure resources, all of which will be available in the VI-SEEM VRE.
The large amount of data that is publicly available through the current literature can be further utilized, with the help of computational infrastructures, to address the region’s medical needs. The following table summarizes the potential global data sources for the life sciences community of the project.