High-Performance Object Oriented Framework for Scientific Computing


KATHOLIEKE UNIVERSITEIT LEUVEN FACULTEIT INGENIEURSWETENSCHAPPEN DEPARTEMENT COMPUTERWETENSCHAPPEN AFDELING NUMERIEKE ANALYSE EN TOEGEPASTE WISKUNDE Celestijnenlaan 200 A — B-3001 Leuven

A Component Environment for High-Performance Scientific Computing Design and Implementation

Jury:
Prof. Dr. ir. D. Vandermeulen, chairman
Prof. Dr. ir. S. Vandewalle, promotor
Prof. Dr. ir. H. Deconinck, promotor (VKI)
Prof. Dr. S. Poedts
Prof. Dr. ir. K. Meerbergen
Prof. Dr. ir. Ph. Dutré
Prof. Dr. T. Holvoet
Prof. Dr. ir. H. P. Langtangen (Simula, Oslo)

U.D.C. 681.3∗D2

December 2008

Dissertation submitted to obtain the degree of Doctor in Engineering by Tiago Quintino

Contact information:
Tiago Quintino
von Karman Institute for Fluid Dynamics
72 Chaussée de Waterloo
1640 Rhode-St-Genèse
BELGIUM
email: [email protected]
webpage: http://www.vki.ac.be/~quintino

This research work was financially supported by the Portuguese Fundação para a Ciência e Tecnologia, with the doctoral scholarship SFRH/BD/9080/2002. This manuscript was typeset in LaTeX with KOMA-Script. All the research work was done on the GNU/Linux operating system.

© Katholieke Universiteit Leuven – Faculteit Ingenieurswetenschappen
Arenbergkasteel, B-3001 Heverlee (Belgium)

All rights reserved. No part of this publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the publisher.

D/2008/7515/127
ISBN 978-94-6018-019-4

For Maria do Carmo, who never had the opportunity to attend school.

For Lucas, may the future open many more opportunities to you.

This book had two authors, and they were both the same person. Terry Pratchett (1948-), The Carpet People

Summary

This dissertation focuses on the design and implementation of a framework for high-performance scientific and engineering computing. The framework creates an environment of modular components. These components can be reused and assembled together to form dedicated simulation tools that solve specific problems governed by partial differential equations.

This research is part of a long-term investment by the von Karman Institute and many university partners in the development of a platform for the solution of multi-physics systems using multiple numerical methods. The platform, named COOLFluiD, provides the environment where these different numerical techniques can coexist and work together. The present work covers the design and development of the platform core that allows such cooperation.

The main aim is the maximisation of three software quality objectives: efficiency, flexibility and usability. These objectives are usually considered to be somewhat in conflict with each other. This thesis tries to show that they can be independently and simultaneously achieved within a single framework, provided certain techniques are employed. This research concentrates on the development of these techniques. Three sets of techniques are presented, and each set separates the software issues in a different direction. To derive these techniques, the research followed certain key lines: first, it designed a framework architecture where components are reused in different applications; second, it focused on computationally complex problems that demand high-performance computing; and, finally, it defined a platform which is easily usable and developer friendly, where non-expert programmers can implement and research new numerical algorithms.

An assessment and validation of the three software qualities is provided to support the thesis, and multiple applications are shown to demonstrate the capabilities of the framework.

Contents

1 Overview  1
  1.1 Motivation  1
  1.2 Application Examples  4
  1.3 Problem statement  7
  1.4 The Importance of Software Engineering  11
  1.5 Summary of Developed Techniques  12
  1.6 COOLFluiD Project  14
  1.7 How to read this dissertation  15

2 Background on Software Engineering  17
  2.1 A Not So Abstract Introduction  17
  2.2 Object-Oriented Methodology  20
    2.2.1 Object-Oriented Analysis and Design  20
    2.2.2 Object-Oriented Programming  23
  2.3 Generic Programming and Metaprogramming  28
  2.4 Patterns  31
    2.4.1 Design patterns  32
    2.4.2 Architectural patterns  32
  2.5 Framework Architecture  34
  2.6 Component-Based Software Engineering  35
  2.7 Generative Programming and Domain Engineering  37
    2.7.1 Generative Programming  37
    2.7.2 Domain Engineering  39

3 Research Problem and Thesis definition  43
  3.1 Problem definition  43
    3.1.1 Initial Concepts  43
    3.1.2 Problem description  44
  3.2 Research Goals  48
    3.2.1 Primary goals  48
    3.2.2 Derived goals  49
    3.2.3 Solution space  50
    3.2.4 Concrete Research Objectives  51
  3.3 Thesis statement  52
  3.4 Related work  53
  3.5 Validation Method  55
  3.6 Contributions of the Thesis  56

4 Domain engineering contributions  59
  4.1 Runtime extension of functionality  60
    4.1.1 Problem definition  60
    4.1.2 Decoupling object construction from type definition  64
    4.1.3 Applying to component architectures  67
    4.1.4 Mixing dynamic and static polymorphism  72
    4.1.5 Summary of the technique  74
  4.2 Dynamic Configuration of Components  77
    4.2.1 Motivation  77
    4.2.2 Problem definition  77
    4.2.3 Self-configuration of objects  82
    4.2.4 Configuration hierarchy  88
    4.2.5 Centralised option registration  90
    4.2.6 Language for Component Configuration  92
    4.2.7 Summary of the technique  96
  4.3 Automatic runtime instantiation of static components  97
    4.3.1 Problem statement  98
    4.3.2 Automatic instantiation  104
    4.3.3 Template class loader  106
    4.3.4 Source repository, Code builder and Compiler  106
    4.3.5 Template strategy pattern  111
    4.3.6 Application  113
    4.3.7 Summary of the technique  114

5 Framework Design  115
  5.1 Enabling Components  115
    5.1.1 Problem Statement  115
    5.1.2 Unsatisfactory solutions  118
    5.1.3 Combined solution  122
    5.1.4 Summary of the technique  138
  5.2 Managing Communication  139
    5.2.1 Problem Statement  140
    5.2.2 Partitioning the Domain  143
    5.2.3 Handling Local and Global Data  147
    5.2.4 Storing Local and Global Data  149
    5.2.5 Summary of the technique  152
  5.3 Representing the domain of computation  154
    5.3.1 Problem Statement  154
    5.3.2 Proposed Solution  158
    5.3.3 Representing the Topology  159
    5.3.4 Representing the Geometry  163
    5.3.5 Representing the Analytical Model  174
    5.3.6 Representing the Numerical Data  175
    5.3.7 Summary of the technique  176

6 Component based architecture  177
  6.1 Decoupling Numerics from Physics  177
    6.1.1 Problem Statement  178
    6.1.2 Different Perspectives  179
    6.1.3 Extending Physical Models  187
    6.1.4 Summary of technique  187
  6.2 Handling multi-physics  189
    6.2.1 Problem Statement  189
    6.2.2 Exporting and Importing Data  193
    6.2.3 Data Sockets  194
    6.2.4 Data Broker  200
    6.2.5 Multiplicity and the Context Architecture  201
    6.2.6 Context Switching  204
    6.2.7 Connection algorithm  206
    6.2.8 Summary of the technique  209

7 COOLFluiD Project  211
  7.1 Short Description  211
  7.2 Architecture  213
    7.2.1 Conceptual View  213
    7.2.2 Software Products  214
  7.3 Implementation: Bringing it all together  217
  7.4 Available Components  219

8 Thesis validation  223
  8.1 Case-Studies  223
    8.1.1 Using Multiple Numerical Methods  223
    8.1.2 Using Multiple Physical Models  224
    8.1.3 Using Other Components  229
    8.1.4 Simulating Coupled Multi-Physics problems  232
    8.1.5 Solving Industrial Problems on Complex Geometries  235
    8.1.6 Researching with the platform  236
  8.2 Analysis of Software Products  238
    8.2.1 Quantitative assessments  238
    8.2.2 Qualitative assessments  240

9 Conclusions  243
  9.1 Synthesis  243
    9.1.1 Summary of Problems and Proposed Solutions  243
    9.1.2 Remaining Challenges  245
  9.2 Thesis Evaluation  246
  9.3 Key Lessons  248
  9.4 Future research  249

Bibliography  253

Acknowledgements  269

Chapter 1

Overview

They both savoured the strange warm glow of being much more ignorant than ordinary people, who were only ignorant of ordinary things.
Terry Pratchett (1948-), Discworld scientists in Equal Rites

This chapter gives a general overview of the dissertation. In section 1.1 we give the motivation behind the research work, followed by some examples of applications in section 1.2. Next, in section 1.3, we identify the domain of research and the problems to be addressed. To solve them, section 1.4 explains why software engineering should be used. Section 1.5 presents a summary of the techniques developed in detail throughout the text. In section 1.6 we describe the software project to which the research work is connected. Section 1.7 explains the contents of each chapter, helping the reader choose how to read this dissertation.

1.1 Motivation

Why scientific computing
Scientific computing is nowadays one of the most important tools for research and development in any field of engineering. It is so pervasive that any improvement in the prediction of how a linear or non-linear complex system will behave may lead to improved construction processes, enhanced efficiency, or even revolutionary breakthroughs. It is also becoming common for numerical simulations of complicated physical systems to precede and guide the design of expensive experiments by predicting unexpected behaviours of those systems.

Design process
Numerical simulations will never fully replace experimental results, because the latter are always needed as final proof of the system at work. It is clear, however, that the purpose of experimentation is moving towards validation, leaving numerical simulations to drive the design process. Figure 1.1 shows a schematic of the design process: the initial parameters of the design help define a geometrical model, usually developed within a Computer Aided Design (CAD) modelling tool. The CAD tool produces a topology definition that is modelled under some assumptions and fed into a discretisation procedure within a Computer Aided Engineering (CAE) tool. Then a simulation is performed and the results are analysed in light of the project objectives. Back in the CAD tool, these objectives are evaluated and subsequently transformed into new input parameters for the design phase, thus re-initiating the loop. The whole process may be driven automatically by an optimisation framework or performed interactively by an engineer. The numerical simulation thus plays an important role in the design loop, and the quality of the final design depends on how well these numerical tools can predict the real system behaviour. In recent years these predictions have grown in accuracy, increasing the confidence of industry in their use.

Figure 1.1: Feedback loop for designing systems based on scientific and engineering numerical simulations.

Advantages
There are many advantages to integrating numerical simulations in the design process. First, the feedback loop is shorter compared to changing design parameters in the setup of an experiment, where scale models usually have to be rebuilt and large teams of people involved. A shorter feedback loop promotes a smaller time-to-market for the final product. Secondly, the price of running software and buying a large High Performance Computing (HPC) facility is considerably smaller than the cost of operating large experimental facilities. Finally, a flexible numerical simulation tool may allow radical changes in the design process, which are harder to implement in an experimental facility. The same numerical tool and computation hardware may also be reused for other design projects of a similar nature with minimal change.

Solving PDEs
Within the realm of scientific and engineering computing, the solution of field equations described by partial derivatives is extremely appealing because an extraordinary number of systems can be modelled with such equations. Systems ranging from aerodynamics, electromagnetism, space weather phenomena and sound propagation to heat transfer can all be modelled with Partial Differential Equations (PDEs). These equations are generally difficult or impossible to solve analytically, so numerical algorithms are employed to obtain practical solutions.

Multiple physics
A certain set of PDEs, together with closure constitutive relations and appropriate boundary conditions, forms a Physical Model of a specific system. Sometimes the engineering problem at hand is complex and involves multiple interacting phenomena, each modelled with a different Physical Model. We therefore think of the simulation as one system composed of multiple sub-systems. Interest in these types of problems, called Multi-Physics simulations, has been increasing as growing available computing power renders them more feasible to tackle. Nevertheless, they remain hard to implement and require special numerical algorithms to model their interactions.

Multiple methods
These equations are discretised by a numerical method suited to their inherent mathematical nature. Since each Physical Model might have a different mathematical character, an optimal Multi-Physics simulation often implies discretisation by multiple methods. This increases the complexity of the implementation of the software tool. Moreover, the coupling procedure between the multiple sub-systems can itself be modelled in various ways, adding another dimension to the problem.

Unstructured meshes
The numerical discretisation is done across the spatial domain of interest, usually by subdividing the domain into control volumes, often called elements or cells. The set of control volumes defines a mesh covering the spatial domain. When systems modelled by these types of equations have complex geometries, one very common practice is to use unstructured meshes. These bear the advantage that the process of subdividing the domain, called meshing, can be almost fully automated even for complex topologies, requiring few interactions with the user.

Parallelisation
Simulating these complex processes often requires large computational resources not present in a single computer, so some sort of partitioning of the problem must be performed. Such partitioning can follow several approaches [89], some more adequate to certain types of problems. The most common for this class of PDE problems is based on data partitioning: large-scale problems are broken down into smaller chunks of synchronised data, processed on parallel distributed computers (clusters).

Efficiency
The type of problems solved usually requires long-running iterative procedures that may take days or weeks to produce a single solution. Numerical efficiency is therefore of paramount importance. This efficiency is measured not only by the processing time but also by the physical resources required (computer memory and bandwidth for communication and input/output). These efficiency concepts are often difficult to balance, and they depend heavily on the other issues previously mentioned.

Focus of the dissertation
The focus of the present work is on the efficient solution of multi-physics problems, discretised by multiple methods on unstructured meshes of complex geometries, and solved on parallel distributed computers.
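The data-partitioning idea mentioned above can be illustrated with a toy example. The following sketch (plain C++ in a single process, with invented names; a real cluster run would exchange the ghost values through message passing, e.g. MPI, instead of direct memory copies) splits a one-dimensional field over two partitions. Each partition owns a chunk of the data plus "ghost" copies of its neighbour's boundary values, and every relaxation sweep is preceded by a ghost synchronisation:

```cpp
#include <cstddef>
#include <vector>

// One partition of a 1-D mesh: it owns a chunk of values and keeps ghost
// copies of the boundary values of its neighbours (here one on each side).
struct Partition {
    std::vector<double> own;  // values this partition is responsible for
    double leftGhost;         // copy of right-most value of left neighbour
    double rightGhost;        // copy of left-most value of right neighbour
};

// Refresh the ghost values between two adjacent partitions. In a real
// distributed run this is the synchronisation step between processes.
void exchangeGhosts(Partition& a, Partition& b) {
    a.rightGhost = b.own.front();
    b.leftGhost  = a.own.back();
}

// One Jacobi-style averaging sweep over a partition, using the ghost
// values at the partition boundary exactly like interior neighbours.
void smooth(Partition& p) {
    std::vector<double> next(p.own.size());
    for (std::size_t i = 0; i < p.own.size(); ++i) {
        const double left  = (i == 0) ? p.leftGhost : p.own[i - 1];
        const double right = (i + 1 == p.own.size()) ? p.rightGhost : p.own[i + 1];
        next[i] = 0.5 * (left + right);
    }
    p.own = next;
}
```

With fixed outer boundary values, repeated exchange-then-smooth iterations relax the partitioned field towards the same linear profile a single-partition solve would produce; the only coupling between partitions is the narrow ghost exchange, which is what makes the approach scale on distributed memory.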

1.2 Application Examples

We now consider two concrete examples of such scientific simulations. The examples involve, in one way or another, all the issues listed in the previous section. Although they come from different engineering fields and are of disparate nature, they share the complexity of the modelling procedure.

Fluid-structure interaction
Fluids and structures interact when a fluid exerts pressure upon a solid structure, causing deformation. By changing its form, the solid in turn affects the flow field geometry, closing a tightly coupled system. Such systems come into play in many engineering applications, ranging from the construction of buildings to airplane design and the simulation of bio-medical flows. We take the example of the flow over a wing, as seen in figure 1.2. This system simulates the vibrations produced on the wing by the air flow. Such an application could be, for instance, the study of flutter for an aircraft wing design.

Figure 1.2: Example of a fluid-structure interaction problem for the study of aeroelastic vibrations on an aircraft wing.

The fluid flow can be modelled by the compressible Navier-Stokes equations, possibly completed by a turbulence model for high Reynolds number flows. If viscous effects are neglected, these equations are of hyperbolic nature and their solutions may feature discontinuities, requiring special numerical treatment. The wing itself is modelled by equations of linear or non-linear solid elasticity in three dimensions. These equations exhibit an elliptic mathematical nature, different from the flow equations, and therefore necessitate another discretisation method. The discretisation of the wing subsystem is itself quite different, as the wing is composed of spars, frames, stringers and skin elements, each with different computational representations. Since the geometry of the flow field changes, the mesh over the discretised domain must adapt to this change, either by moving its discretised control volumes or by rebuilding the discretisation when confronted with large deformations. The methodologies to perform such tasks vary greatly with the application and with the topology of the system to simulate. Therefore, this wing-air flow interaction must be simulated not by two but by three subsystems: fluid, structure and domain mesh. The system just described presents the following challenges:

• multiple methods for discretisation of different physical models
• multiple physics need flexible coupling algorithms
• unstructured mesh adapts to changes in geometry by the structure
• parallel algorithms need to tackle long-running unsteady simulations

Electrochemical process
Electrochemical processes are an important part of many industrial production techniques such as electroplating, corrosion, electrochemical machining and treatment of industrial waste products [93]. An electrochemical system is composed of an ionic solution surrounded by electrodes and insulators. An outer electric circuit is closed by the flow of ions in the electrolyte, causing transfer of charge from the solution to the electrodes.

Figure 1.3: Example of an electrochemical process problem.

In figure 1.3 we have a parallel-flow electrochemical reactor with a series of anodes on the top wall and a series of cathodes on the bottom wall. The flow subsystem is simulated by a fluid flow analysis based on the incompressible Navier-Stokes equations. These equations are coupled with the multi-ion transport equations, which possess not only convective and diffusive terms but also terms for the migration of ions and the chemical reactions between species. The discretisation of the multi-ion subsystem involves a finer spatial discretisation than the flow subsystem, because the mass transport boundary layer (BL) is about one order of magnitude smaller than the hydrodynamic BL, and the chemical reaction BL can become several orders of magnitude smaller than the hydrodynamic BL. Present in this system are many complex modelling issues that directly project complexity into the software landscape, from multiple discretised domains to coupling strategies between different modelling equations and the different time scales involved. For more realistic results, we can compute the potential within the anodes and cathodes, as separate domains, by solving a Laplace problem for the potential. Special boundary conditions are added at the interface between the electrode domains and the fluid domain. We can also compute the heating effects by solving the heat transfer equations on those anodes and cathodes together with a new heat equation in the fluid. The set of flow equations may be discretised by a Finite Volume Method (FVM), while the equations for the potential may require a different discretisation such as the Finite Element Method (FEM). This introduces another level of software complexity, which we call multiple methods. In this example we have used the description of a straight channel flow, but the reader should take into consideration that real electrochemical systems are of a more complex geometrical nature, such as the one presented in figure 1.4.

This complex system presents the following software challenges:

• multiple methods for discretisation of different physical models
• multiple physics need multiple coupling algorithms
• multiple domains for different space scales
• efficient solution of strongly coupled systems
• complex geometries for real industrial applications

1.3 Problem statement

Domains of addressed problems
In the previous sections we defined the focus of this dissertation and its applicability. We described the goals that motivate the work, which raise several software issues that add to the complexity of the software tool. We restate the goals here by conveniently rearranging them into four problem domains and by summarising the issues they raise:

Multi-Physics The domain of multi-physics contains the physical models that describe the behaviour of the systems, as well as the interactions between those physical models when present together in the same simulation. Such simultaneous presence increases software complexity because each phenomenon may require a different discretisation.

Numerical methods This domain includes the multiplicity of numerical methods, their many variations and specific techniques. It adds to software complexity mainly through the many degrees of variance in the functionality and implementation of the algorithms, especially when they need to coexist within the same application. This is related to different numerical methods requiring different data structures and functionality from the underlying infrastructure.

Geometry representation This domain is concerned with the representation of the system within the software environment, in order to abstract the geometry of the problem from the numerical tools that will simulate the system. Because geometries are complex, this representation is not straightforward, certainly if a link with the CAD representation is required by the numerical methods.

Software infrastructure The domain of software infrastructure wraps the issues related to the implementation of concepts that do not in themselves contribute directly to the simulation but serve as support for the numerical methods. These include the complexity of parallelisation, due to the concurrent nature of the numerical processes involved, and the efficient implementation of the numerical algorithms.

Software complexity
The nature of the software issues raised within each of these domains will be addressed and detailed throughout the dissertation, mainly in chapter 3. For now, it is sufficient to note that whereas each of the domains poses software problems in itself, their interaction within a single application has not only a cumulative but often a multiplicative effect on software complexity.

Figure 1.4: Example of an unstructured mesh applied to a complex-geometry problem of electrochemical plating.

Idealised environment
The ideal situation would be to isolate these issues and address them independently. Each isolated solution would then be reused automatically and would not affect the solutions in the other domains. For example, once multi-physics is introduced in the numerical tool, reusing the physical models together with multiple methods should be automatic. Such a cooperative environment must focus mainly on code reuse and on keeping software concerns isolated.

Crude reality
The reality is that, in most cases, scientists and engineers developing numerical tools are not preoccupied with these software issues, and the resulting code is often monolithic and not cooperative. But the point is that these scientists should not need to be aware of these issues. Indeed, ideally the numerical developers should be able to contribute their work without stumbling into the above-mentioned complexity issues.

Importance of software processes
For that purpose the choice of software process plays an important part. Numerical tools with such complex interacting domains are hard to develop and even harder to maintain. If no attention is dedicated to the problem of software complexity, the quality of the software degrades and quickly gets out of hand. This implies an ever bigger investment of development time for smaller changes, which in the end causes spiralling budgets and unfinished projects [21]. In a word, ignoring the underlying software issues makes software more expensive in the long run.

Problem definition
The problem addressed by this dissertation can be stated as: How to develop numerical tools for scientific computing that involve multi-physics, multiple methods, complex geometry representations and intricate software infrastructures, without introducing undesirable software complexity and without exposing these issues to the developer of numerical algorithms?

Figure 1.5: Domain of the problems addressed by the thesis: the intersection of Multi-Physics (fluid dynamics, plasma phenomena, structural analysis, heat transfer, ...), Numerical Methods (multiple discretisations, high-order approximations, error estimation, acceleration techniques, ...), Geometry Representation (unstructured grids, mesh adaptation, CAD topology parameters, arbitrary elements, ...) and Software Infrastructure (reusable components, efficient algorithms, parallel communication, dynamic load balancing, ...).

Thesis domain of application
The aim of this dissertation is to identify and develop the software techniques that address the problem stated above. These techniques apply to the interdisciplinary domain that results from the interaction of Numerical Methods, Multi-Physics, Geometry Representation and Software Infrastructure, as seen in figure 1.5.

1.4 The Importance of Software Engineering New techniques Over the years, in the field of Software Engineering (SE), many new techniques and methodologies have evolved to address different types of software problems. These techniques allowed to build complex software systems that today are pervasive in our daily lives. From banking systems to relational databases, from desktop environments to web services, all these systems employ well know software development techniques, like object-orientation and component architectures, that reduce the complexity of the system. Old practices We would expect that in the field of HPC, where raw number crunching computation is the standard practice, the revolutions would have been equally surprising, however it has been recently recognised [36] that the dramatic increases in processor speed and memory access times have not necessarily correlated with high-level increases in the productivity of HPC applications. While the machines are getting faster, the developer effort necessary to fully exploit these advances is becoming prohibitive. Main assumption The work developed in this thesis assumes that one of the reasons HPC has not seen these fast evolutions in productivity is due to the slow adoption of modern software techniques that promoted the revolutions in the other application fields. We suppose that this adoption was slow because the typical developer of HPC applications is not familiar with the techniques. This dissertation tries to bridge this gap by adapting these techniques to the requirements of HPC. Comparing examples As an example, nowadays a user can easily join an instant messaging service and communicate with other people over the internet regardless of different existing platforms. The user may exchange files, text, voice and video. Whenever a new software version is available, it pops up for immediate upgrade. 
When a new protocol is developed, it can be introduced in the form of a plug-in, which can be downloaded on the fly and integrated into the running application. We can compare this user-friendly operation with the troublesome HPC simulations on clusters, where platform-dependent tools dominate, which


are developed into monolithic pieces of code with little flexibility and modularity. Whenever an algorithm is updated or improved, the time to implement and deliver it to the computing machine may be of the same order as the time to compute the actual solution. The interactivity with the design engineer and the short feedback loop one seeks are far from achieved when it comes to HPC systems for scientific computing.

Using software engineering techniques
To tackle the software issues mentioned in section 1.3, we shall make use of several techniques from the discipline of software engineering. With these techniques, we expect to increase not only the productivity but also the applicability of the developed software, specifically with multi-physics simulations in mind.

Adapting techniques
The development of HPC software systems differs significantly from the systems typically developed by the software engineering community. The latter systems often have different objectives in mind, like safety, security or scale. Therefore these techniques need to be adapted when importing them into the field of HPC, to keep objectives like efficiency and simplicity under consideration.

1.5 Summary of Developed Techniques

Car Industry Analogy
To present the developed techniques in layman's terms, we will use an analogy with the car industry. Let us imagine a very demanding client who enters a car dealership to order a new car. This car will be customised to his requirements. The client will choose the car segment and an assortment of available options for that segment. These options are, for example, the chassis (3 or 5 doors), the engine, the gearbox, the colour, the air conditioning, etc. This configuration is sent to the factory, where the car will be assembled on an industrial product line.

Framework Design
Some techniques developed in this thesis and described in chapter 5 focus on Framework Design for the scientific computing domain. Framework design is analogous to the construction of the car chassis, which functions as the skeleton of the car. One chassis can support different engines or gearboxes. We can build a different chassis for each segment (sports or family car), but using the same technology, like composite materials. Each chassis may be different, but the technology and purpose are common: to give structural integrity to the car.


In this work we develop the technology to build a framework that supports multiple interacting numerical methods and physical models. We designed a structure that allows numerical methods to share similar data structures and communication channels. We also developed a consistent way to attach new functionality to this structure.

Component Based Architecture
Software components are analogous to car components like the engine, gearbox and suspension. These components are designed to fit together, as the engine connects to the gearbox through the transmission shaft. Nevertheless, these components are interchangeable. The client can choose a more powerful engine, or diesel instead of gasoline. The client has a selection of gearboxes, with 5 or 6 gears. Even the colour of the car can be considered a component with many available choices.

We created a component architecture made of reusable numerical methods. We developed technology that allows components to connect and interact with each other. These components are the different numerical methods and different physical models that compose the final simulation tool. The system allows the end-user to pick numerical methods and combine them with physical models: for example, using FVM with fluid flow equations to simulate the cooling of a metal part modelled with a FEM thermoelastic stress analysis. The result is that the end-user has an assortment of numerical methods and physical models to choose from, to customise the simulation tool to his requirements. These techniques are described in chapter 6.

Domain Engineering and C++ Meta-Programming
Car components, like the engine, are built from smaller mechanical parts. These parts are constructed according to standards that make them interchangeable. You may buy an oil filter from your car brand or use a generic one. We developed C++ meta-programming techniques to build interchangeable software parts.
These software parts are then used to construct the numerical components. The meta-programming techniques specify criteria for parts to match and substitute each other. To automate the assembly of the parts, much like in a car assembly line, we use Domain Engineering techniques. Domain engineering uses the building experience of a particular domain to automate the reuse of components and interchangeable parts. In the car example, it transforms the client's car configuration into the assembly of the car. In our case, it takes the end-user requirements and assembles the final simulation tool by matching numerical methods and physical models. These techniques also allow the end-user to


extend the simulation tool with his own user-defined functionality. So, while the C++ meta-programming techniques provide the way to build, Domain Engineering provides the way to automate the assembly. Both work together to create customised simulation tools. These techniques are described in chapter 4.

1.6 COOLFluiD Project

All the developed techniques have been researched and implemented within a software development project called Computational Object Oriented Libraries for Fluid Dynamics (COOLFluiD).

Objectives
The project started in January 2002 at the von Karman Institute for Fluid Dynamics (VKI) in collaboration with the Katholieke Universiteit Leuven (KULeuven). The idea was to introduce novel software engineering techniques in the design and implementation of a new generation of applications for Computational Fluid Dynamics (CFD). The aim was to increase productivity and collaboration between different fields of scientific computing. Initially the project objective was to develop a general purpose solver that would support multiple methods of discretisation and multiple physics. Later the objectives were refocused to build a cooperative environment where those solvers could interact to handle complex multi-physics simulations.

Evolution
The first years of the project, 2002 and 2003, were dedicated to the analysis, design and prototype implementations of the first simulation solver [31]. Initially, the Object-Oriented Analysis and Design (OOA/D) process followed a modified waterfall model as described in [80], but after a year it was switched to an adapted Agile process [7] to promote quicker, even if incomplete, deployment of functionality.

Creating a Solver Application
Starting from October 2003 and during 2004, the first solver application was developed. The first techniques used were object-orientation, design patterns and C++ template metaprogramming, chosen for the combined flexibility and efficiency we estimated they could deliver to the final application.


From an Application to a Framework
Driven by the need to discretise different PDEs and to use diverse numerical methods, we realised that the initial techniques were insufficient to express that degree of flexibility. Around 2004-2005, the research was expanded to techniques such as architectural patterns and framework design. Applying these techniques, we created a platform that provided reusable structure and functionality. The outcome of this transition was a framework capable of accommodating different numerical methods and physical models.

From Framework to Component Environment
Pressed by the need to handle multi-physics simulations, from 2005 to 2006 we introduced component-oriented techniques to transform the numerical methods into reusable components. The existence of well-defined interfaces in the framework made it possible to restructure the existing code into components. These components could then be loaded on demand to satisfy the user requirements. The newly created component environment allowed the simultaneous existence and cooperation of components.

From Components to Customised Applications
Along the way, some C++ metaprogramming techniques were developed by necessity. From 2006 to 2007, it became evident that these techniques were interconnected and that together they formed the basis of a domain engineering environment for scientific simulations. Applying these techniques allows the environment to generate customised simulation tools for each user-specific application.

1.7 How to read this dissertation

While chapters 2 and 3 provide the basis of the work, chapters 4, 5 and 6 form the core of the dissertation and present the techniques developed within this research. Chapters 7, 8 and 9 close by showing case studies and explaining the interactions between all the described techniques.

Chapter 2 reviews important concepts of software engineering which are relevant to the domain of scientific computing. These include framework technology, component based architectures, object-oriented techniques and meta-programming.


Chapter 3 states the researched problem in detail and presents the proposed solutions along three distinct axes of development. This chapter is key to understanding why this research was conducted.

Chapter 4 presents the first set of contributions, which can be applied to different domains of interest, not only to scientific computing. These are contributions to methodologies of domain engineering using the C++ language.

Chapter 5 presents the second set of contributions, which are specific to the domain of scientific computing: general techniques of framework development that promote reuse of structure.

Chapter 6 presents the third set of contributions, also specific to the domain of scientific computing. These enable the creation of a component based architecture which promotes reuse of implemented functionality in the creation of different computing applications.

Chapter 7 explains how the three sets of contributions work together to create a flexible, yet highly efficient and user friendly family of applications for computational science and engineering. This family of applications takes the form of the COOLFluiD project.

Chapter 8 deals with the proof of the thesis, providing many applications and case studies to validate the assertions presented in the previous chapters.

Chapter 9 presents an analysis with conclusions, successes and failures of the research work. It also discusses future directions of the research.


Chapter 2

Background on Software Engineering

Each pattern describes a problem which occurs over and over again [. . . ], and then describes the core of the solution [. . . ] in such a way that you can use this solution a million times over, without ever doing it the same way twice.

Christopher Alexander (1936-), A Pattern Language: Towns, Buildings, Construction

This chapter introduces some basic concepts of software engineering to the reader with a background in numerical methods, by means of simple examples from that domain. These examples do not represent the software entities developed within the COOLFluiD project.

2.1 A Not So Abstract Introduction

Software Engineering involves many interconnected abstract concepts for the construction of software. Before presenting them in their more generic form, we shall introduce them with a more familiar analogy: LEGO™ constructions.

Classes and Objects
Much like the basic construction pieces of LEGO™ are the classic plastic bricks, the smallest building blocks of modern software engineering are objects. They are put together and interact with each other via interfaces. Interfaces are like the little cylinders on top of the LEGO™ bricks: they define the way blocks fit together. All the objects of the same type form a class. The class is a mold or blueprint from which many objects of the same type can be created. Each object has its own specific attributes, like colour or material, but they are all made from the same mold. For example, in figure 2.1, there is a class definition of a 2x4 brick, which is used to construct many bricks of different colours.

Design patterns
When assembling LEGO™, some constructions are repetitive. Put some bricks on top of each other in staggered positions and you get a wall. Put in some transparent bricks and you get a window. Skip a whole section, and you make a door. These assemblies repeat, but they show a pattern: the design is always the same. In software engineering, these are called design patterns. They are assemblies of objects that show a common design. The objects (bricks) are different each time, but the design idea is always the same.



Figure 2.1: Class of Classic LEGOTM 2x4 bricks. A single mold (class) can produce multiple bricks (objects) of different colours.

Frameworks
LEGO™ is bought in box sets. Although there are some generic ones, usually each set has a certain theme: bricks to assemble trucks, houses, airplanes, etc. Inside you will find a manual that helps you assemble multiple trucks from the same bricks, using some typical assemblies. These assemblies are the design patterns: four rubber wheels, a base and some bricks, and you have the bottom of a car. But the box does not limit your options, because you can imagine new ideas on how to use those pieces and patterns. Likewise, software frameworks provide ready-made designs to be reused for your specific applications. They provide structured solutions to ease your work, but without limiting your options.

Components
Components are ready-made constructions that fit together to create a more complex construction. For instance, a LEGO™ city is made of many cars, houses and LEGO™ people. Some of these are built from smaller bricks (houses, cars) while others are parts by themselves (people). Software components are used to build more complex software applications, typically to satisfy a specific requirement. The main idea behind components is reuse. Once a component is constructed, it may be reused in different settings, therefore saving work and increasing productivity. Components are like objects, but at a much coarser level.

Relations between concepts
The relations between these concepts are presented in figure 2.2. Classes are used as building blocks that compose patterns of design and architecture. Frameworks are ways to reuse patterns to provide structure to the applications. Components build on top of frameworks to implement functionality that is reused to assemble very specific, customised applications.

[Figure 2.2 is a layered diagram: Classes compose Patterns, Patterns structure Frameworks, Components build on Frameworks, and Components assemble into Customized Applications.]

Figure 2.2: Relation between software concepts.

2.2 Object-Oriented Methodology

Object-oriented methodology is a software engineering approach that models a system as a group of objects and their interactions. An object contains encapsulated private (hidden) data and presents a public behaviour (functions). The only way a client (another object, a user) can interact with the object is via its interface. An interface is a specification of the functions that the object implements. The advantage of this rule is that replacing the implementation of the object by another one that meets the same interface specification does not require changes in the client. This is called the Liskov substitution principle [82].

Figure 2.3: Client interacts with the object via the public functions that make the interface.

2.2.1 Object-Oriented Analysis and Design

When applying object-orientation modelling techniques to the design of a system, the methodology is called OOA/D [80, 90]. This method analyses the parts of the system as black boxes and designs the interactions between them. The concepts presented in this chapter and in figure 2.3 are all involved in this method of analysis and design. While the analysis focuses on what the system does, the design focuses on how the system does it.

Analysis
The analysis is the first phase of the construction of software. First, we look at the problem the software should solve, the so-called problem


domain. Because real-life problems are complex and present unnecessary details, the analysis goes through an abstraction process to achieve a simplified model of the problem, as depicted in figure 2.4. In the same figure, we can follow the example of the abstraction of the PDE that describes a certain physics, complemented by physical properties. We realise that although mathematically these equations are self-sufficient, in a finite domain they require closure conditions on the boundary. Therefore, in the software, we choose to abstract them with the Physical Model concept, together with accompanying Boundary Conditions. What the analysis phase does is assign certain responsibilities to these abstract concepts. In this example, they could be:

Physical Model: provides the terms of the equation and the coefficients for those terms. It also describes the variables in which the equations are solved.

Boundary Conditions: provide closure of the domain discretisation.

[Figure 2.4 shows the problem (partial differential equations, physical properties) passing through abstraction into the model (Physical Model, Boundary Conditions).]

Figure 2.4: Abstracting the problem to create the model.

Finally, the analysis does not focus on any design issues, that is, on how the system is actually built. Design issues, like how to perform communication or how to interface with the user, are left for the design phase.

Design
The design phase is the second phase of software construction. It transforms the conceptual model produced by the analysis into an architecture (blueprint) of the whole system. This process takes into account the constraints of the system, like efficiency and flexibility. For example, if efficiency is important, a very expensive computation may be performed beforehand and stored in a fast look-up table. Choosing to compute beforehand rather than on demand is a design decision. Design


decisions strongly affect the final behaviour of the system, making this a critical phase in software construction. During this phase, attention is paid to the interactions between the objects. These interactions dictate what the object interfaces should look like and what the responsibilities of each object are. This is expressed by the Object-Oriented (OO) design principle [51]:

Design for interfaces not for implementation

Focusing on interactions rather than on how to implement each object forces the designer to think about decoupling the objects. Decoupled objects are easier to change, because as long as the interface stays the same, we can substitute the implementation. In the example of figure 2.5, the client uses the interface compute. This allows switching freely between a Lookup Table and an On Demand implementation without changing the client software. Because both implementations respect the substitution principle, since both have the function compute, they are said to have the same Application Programming Interface (API). An API is similar to a contract between objects, specifying what each function should do, but not how it should be done.


Figure 2.5: Example of object substitution. Both On Demand and Lookup Table objects satisfy the interface compute but the table has yet another function precompute to be used before the computation starts.
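The substitution of figure 2.5 can be sketched in C++ as follows. The class and function names come from the figure; the exponential is an invented stand-in for the expensive computation, and the client function is hypothetical:

```cpp
#include <cmath>
#include <map>

// Interface: clients depend only on compute()
class Computer {
public:
  virtual ~Computer() {}
  virtual double compute(double x) = 0;
};

// Computes the value each time it is requested
class OnDemand : public Computer {
public:
  double compute(double x) { return std::exp(x); } // stand-in for an expensive function
};

// Precomputes the values once, then answers from a fast table
class LookupTable : public Computer {
  std::map<double, double> table;
public:
  void precompute() {                              // extra function, used before the computation
    for (int i = 0; i <= 100; ++i) {
      double x = i / 100.0;
      table[x] = std::exp(x);
    }
  }
  double compute(double x) { return table[x]; }
};

// The client is written against the interface only,
// so either implementation can be substituted
double client(Computer& c) { return c.compute(0.5); }
```

Because client only sees the interface Computer, swapping one implementation for the other requires no change in the client code.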

The end result of the design phase is the whole map of collaborations between the objects (defining their APIs). For example, in figure 2.6, we see a simplified design where the interactions between the concepts are described. The numerical method discretises the physical model by accessing its terms. It is composed of boundary conditions (BCs) that also discretise the terms on the domain boundary. Both the method and the BCs write into the matrix and right-hand side of the linear system, which then gets solved.


This graphical description is known as a collaboration diagram, and there are more types of diagrams that describe different aspects of the design. For a complete description see [63, 80].

[Figure 2.6 shows a Physical Model that has Equation Terms; a Numerical Method and its Boundary Conditions discretise those terms, and both write into the Linear System.]

Figure 2.6: Simplified example of design for interaction between numerical method and physical model.
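The collaborations of figure 2.6 can be sketched in C++. Only the relations mirror the diagram; the class names follow the figure, while the single matrix entry, the coefficient value and the boundary contribution are invented for illustration:

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-ins for the concepts of figure 2.6
struct EquationTerm { double coefficient; };

class PhysicalModel {                    // has equation terms
  std::vector<EquationTerm> terms;
public:
  PhysicalModel() { terms.push_back(EquationTerm{2.0}); } // invented coefficient
  const std::vector<EquationTerm>& equation_terms() const { return terms; }
};

class LinearSystem {                     // matrix and right-hand side, reduced to one entry each
public:
  double matrix_entry;
  double rhs_entry;
  LinearSystem() : matrix_entry(0.0), rhs_entry(0.0) {}
};

class BoundaryCondition {                // discretises the terms on the domain boundary
public:
  void discretize(const PhysicalModel&, LinearSystem& ls) {
    ls.rhs_entry += 1.0;                 // invented closure contribution
  }
};

class NumericalMethod {                  // discretises the model; composed of BCs
  std::vector<BoundaryCondition> bcs;
public:
  NumericalMethod() : bcs(1) {}
  void discretize(const PhysicalModel& pm, LinearSystem& ls) {
    for (std::size_t i = 0; i < pm.equation_terms().size(); ++i)
      ls.matrix_entry += pm.equation_terms()[i].coefficient;  // writes into the matrix
    for (std::size_t i = 0; i < bcs.size(); ++i)
      bcs[i].discretize(pm, ls);                              // BCs write into the RHS
  }
};
```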

The design phase does not focus on implementation details, such as memory usage or programming functions. These details are handled during the implementation phase, which can use many different styles of programming, among which is Object-Oriented Programming.

2.2.2 Object-Oriented Programming

Object-Oriented Programming (OOP) is a programming style that uses objects and their interactions to create computer programs. Some alternatives to OOP are procedural and functional programming. While those focus on decomposing the program into procedures or functions, OOP decomposes the program into classes.

Class
A class is the smallest building block of an OO system and represents a well-defined concept with responsibilities within the system. A class defines a group of objects that have the same properties and behaviour. The properties of a class are given by its attributes, or simply speaking, its member data. The behaviour of a class is given by its methods, which are the (member) functions defined within the class. The class NumericDiff presented in figure 2.7 has two attributes: h is a real and f is a string. The class has three methods (functions): differentiate, set_precision and get_precision.


NumericDiff
- h: real
- f: string
+ differentiate(x: real): real
+ set_precision(h: real)
+ get_precision(): real

Figure 2.7: Example of a Class Diagram, featuring NumericDiff, a class that performs numerical differentiation.
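The class diagram of figure 2.7 translates into the following C++ sketch; the typedef for real and the default precision value are assumptions, since the diagram does not specify them:

```cpp
#include <string>

typedef double real;   // the diagrams use "real"; assumed to be double

// Sketch of the class in figure 2.7: data marked '-' is hidden,
// functions marked '+' are public
class NumericDiff {
  real h;              // precision of the differentiation (hidden)
  std::string f;       // textual form of the function, e.g. "sin(x)" (hidden)
public:
  NumericDiff(const std::string& func) : h(1.e-8), f(func) {}
  real differentiate(real x);               // algorithm hidden from the client
  void set_precision(real hh) { h = hh; }   // mutator
  real get_precision() const { return h; }  // accessor
};
```

Clients can only reach h through the accessor and mutator, so its representation could later change without affecting client code.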

A class should be recognisable to a non-programmer familiar with the problem domain. This means that the responsibilities of the class should be connected to the abstraction it provides. In the example, NumericDiff has a clear task: numerical differentiation of mathematical functions. On the other hand, creating a class that does both numerical integration and differentiation is not good design, because it mixes two different concepts, each requiring a different type of algorithm. Moreover, good class design keeps the code self-contained, using encapsulation. Encapsulation affects both data and the implementation of functions.

Encapsulation of Data
The encapsulation concept, introduced informally in section 2.2, states that a class should hide its data from the clients that use the class. For example, the user of class NumericDiff does not see the attributes h and f. They are presented in the class with a '-' sign and are therefore hidden. The function set_precision is visible to the user ('+' sign) and is used to change the value of h. The function get_precision may be used to access the value. The functions that access and change the encapsulated data of a class are called accessors and mutators. The advantage of encapsulation of data is that we can change the data representation without affecting the client code. For instance, we could change the attribute h from real to complex, and the change will be localised.

Encapsulation of Functions
Encapsulation also applies to functions. It states that the implementation of each function should not be exposed to the client. For example, we know that the function differentiate performs a numerical differentiation of a mathematical function, but we don't know which algorithm it uses. It could use finite differences or spline differentiation. To perform the algorithm it may use other functions of the class that are hidden from the user, for instance compute_spline. But because the only


visible function is differentiate, we can change its implementation to use compute_finite_diff instead. The advantage is that we can make this change without affecting the client code.

Object
A class is a definition. An object is the software entity built according to that definition. The class is like a blueprint that defines how to build the object and how it should behave. The object is the concretisation of that blueprint. An object is defined by the state of all its attributes and by its identity (name or variable). In code listing 2.1 two objects are created from the class NumericDiff. The first is named dsinx and the second dxlogx.

NumericDiff dsinx("sin(x)");      // first object
NumericDiff dxlogx("x*log(x)");   // second object

dsinx.set_precision(1.e-8);       // set the step size h
real dx = dsinx.differentiate(0.5);

Code Listing 2.1: Creation of two objects of type NumericDiff.

As shown in figure 2.8, both objects have the same behaviour (the differentiate function) but different states. The state of dsinx makes it apply its behaviour to the function sin(x), while dxlogx applies it to x log(x). Each object is built with different data, 'sin(x)' and 'x*log(x)', which is then associated with the string of characters f. Since each object has a different state for f, it also has a different overall state.

Composition
One object may contain another object. This is useful for decomposing complex systems. If an object has a complex task to perform, it may delegate parts of that task to objects that are more specialised. The advantage is that each object is responsible for handling one task, thus simplifying the system. For example, the class NumericDiff needs to transform the string 'sin(x)' into the value returned by the mathematical function sin(x). This means parsing the text 'sin(x)' and processing it into a function call. This task should be delegated to an object that knows how to parse and call mathematical functions based on strings of characters. Assuming that the class FunctionParser is capable of that, we can put an object of that type into NumericDiff, as shown in figure 2.9. In this case, each object NumericDiff contains and owns an object FunctionParser. An object may have ownership of another object, in which case the lifetimes of both objects are connected: when one is destroyed, so is the other.


Figure 2.8: Example of two objects with the same behaviour but different state.

[Figure 2.9 shows class NumericDiff (- h: real, - f: string, + differentiate(x: real): real, + set_precision(h: real)) delegating to class FunctionParser (+ parse(func: string, x: real): real).]

Figure 2.9: Example of a class that is composed of another class to which it delegates one task.
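The composition of figure 2.9 can be sketched as follows. The parse body is a trivial stand-in (a real parser would evaluate the string), and the evaluate helper and Report class are invented here to show delegation and the ownership/acquaintance distinction:

```cpp
#include <string>

// Stand-in parser: a real one would evaluate e.g. "sin(x)" at x
class FunctionParser {
public:
  double parse(const std::string& /*func*/, double x) { return x; } // illustrative only
};

class NumericDiff {
  FunctionParser parser;   // composition: owned by value, lifetimes are tied
  std::string f;
public:
  NumericDiff(const std::string& func) : f(func) {}
  double evaluate(double x) { return parser.parse(f, x); } // task delegated to the parser
};

// Acquaintance: the object merely knows the parser; lifetimes are independent
class Report {
  FunctionParser* parser;  // not owned, must outlive the Report elsewhere
public:
  Report(FunctionParser* p) : parser(p) {}
};
```

When a NumericDiff object is destroyed, its embedded FunctionParser goes with it; a Report, in contrast, leaves the parser it points to untouched.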

Alternatively, an object may simply have knowledge of the other, called acquaintance, in which case both have independent lifetimes.

Inheritance
Inheritance is a way to create new classes using classes that have already been defined. The new classes, called derived classes, inherit the attributes and behaviour of the classes from which they derive, called base classes. Inheritance is employed to reuse or improve on preexisting code. Moreover, a derived class may specialise or change the behaviour of the base class. For example, suppose we want numeric differentiation of first and second order. We create two new classes, each with its own differentiation algorithm. Both need to make calls to 'sin(x)', so the usage of the FunctionParser stays the same. Here we make use of inheritance. We remove the function


differentiate from NumericDiff and leave the rest. We then derive each new class from NumericDiff and implement the new differentiate functions there. Figure 2.10 represents the class diagram with this inheritance relation.

[Figure 2.10 shows the base class NumericDiff (- h: real, - f: string, + set_precision(h: real)), which delegates to FunctionParser, and the derived classes FirstOrder and SecondOrder, each providing + differentiate(x: real): real.]

Figure 2.10: Two child classes FirstOrder and SecondOrder inheriting functionality from a base class NumericDiff. Both reuse the access to Function parser.

Polymorphism
Polymorphism is the ability of objects of different classes to behave differently when called through the same function. The client calling the function does not need to know the exact type (class) of the object executing the function. The different objects involved in the polymorphic behaviour need only respect the same interface. This means the functions must have the same name and take the same arguments. In C++, polymorphism is achieved with class inheritance and the declaration of virtual functions. Virtual functions are functions whose behaviour the derived classes can change. In that case, the base classes work as interface classes. Suppose we want to change the NumericDiff class to support any type of numerical differentiation. We transform the existing class into an interface class, with a virtual function differentiate. Next we derive three new classes from the interface, which implement the concrete algorithms: ForwardDiff, CentralDiff and SplineDiff. This class hierarchy is presented in figure 2.11. In code listing 2.2 we see the code using this polymorphic structure: the first line declares a pointer to the polymorphic type NumericDiff, which in the last line is used to call the function differentiate independently of the algorithm the user chose.


Chapter 2 - Background on Software Engineering

Figure 2.11: Example of an inheritance tree used to create a family of algorithms accessed via an Interface Class with polymorphic behaviour.

1  NumericDiff * ndiff ;
2  switch ( user_choice ) {
3    case ( FORWARD ) : ndiff = new ForwardDiff ( "sin(x)" ) ; break ;
4    case ( CENTRAL ) : ndiff = new CentralDiff ( "sin(x)" ) ; break ;
5    case ( SPLINE )  : ndiff = new SplineDiff ( "sin(x)" ) ; break ;
6  }
7
8  // calling the function is independent of whatever algorithm
9  ndiff -> differentiate (0.5) ;

Code Listing 2.2: Example of polymorphic behaviour.
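The declarations behind such a polymorphic call might look as follows. This is a hedged sketch of figure 2.11: the delegation to FunctionParser is replaced by a direct call to std::sin, the SplineDiff variant is omitted, and the difference formulas are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Interface class: differentiate() is declared virtual, so each
// derived class can supply its own algorithm (dynamic polymorphism).
class NumericDiff {
public:
  explicit NumericDiff(const std::string& f) : h_(1e-6), f_(f) {}
  virtual ~NumericDiff() {}
  virtual double differentiate(double x) = 0; // pure virtual hook
protected:
  double eval(double x) const { return std::sin(x); } // parser stub
  double h_;
private:
  std::string f_;
};

class ForwardDiff : public NumericDiff {
public:
  explicit ForwardDiff(const std::string& f) : NumericDiff(f) {}
  virtual double differentiate(double x) {
    return (eval(x + h_) - eval(x)) / h_;
  }
};

class CentralDiff : public NumericDiff {
public:
  explicit CentralDiff(const std::string& f) : NumericDiff(f) {}
  virtual double differentiate(double x) {
    return (eval(x + h_) - eval(x - h_)) / (2.0 * h_);
  }
};
```

A caller holding a NumericDiff* never needs to know which concrete algorithm was instantiated.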

This type of polymorphism is called dynamic polymorphism, and its advantage is flexibility. We can enrich the functionality by adding new algorithms to the class hierarchy without needing to change or recompile the existing code. In fact, techniques presented within this dissertation (section 4.3) will show how to add new algorithms while the simulation tool is running. Since software evolution is strongly dependent on the addition of new classes, using polymorphism to isolate and localise changes is the most important aspect of Object-Oriented Programming.

2.3 Generic Programming and Metaprogramming

Generic Programming Generic Programming (GP) is a programming style where the algorithms are written without specific reference to the involved classes or objects. Later these algorithms are applied to concrete classes to generate the desired code. With GP, we are able to program an algorithm for multiple types of data and with many degrees of functional variance, and reuse it in different situations.


Multiple data types For example, in numerical methods we often deal with lists of objects over which we iterate to solve some problem. We can loop over a list of coordinates or a list of boundary conditions. A generic algorithm allows us to program these lists only once, instead of programming a list of coordinates and a list of BCs separately. Figure 2.12 shows a class List which can be parametrised with Coordinates or BoundaryConditions objects.

Figure 2.12: Example of a Generic Class for a List of objects.
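A minimal sketch of such a generic List follows, assuming a std::vector-based implementation; the element types Coordinates and BoundaryCondition are placeholders for illustration, not the thesis API.

```cpp
#include <cassert>
#include <vector>

// Generic List: the element type is a template parameter, so the same
// code serves Coordinates, BoundaryConditions, or any other class.
template <typename Element>
class List {
public:
  void add(const Element& e) { items_.push_back(e); }
  typename std::vector<Element>::iterator begin() { return items_.begin(); }
  typename std::vector<Element>::iterator end()   { return items_.end(); }
  std::size_t size() const { return items_.size(); }
private:
  std::vector<Element> items_;
};

// Hypothetical element types, just to show the substitution.
struct Coordinates { double x, y, z; };
struct BoundaryCondition { int id; };
```

Instantiating List<Coordinates> and List<BoundaryCondition> generates two concrete list classes from the single generic implementation.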

Functional Variance GP can be used for purposes other than constructing algorithms that work with multiple data types. Using a technique commonly called trait classes, a generic algorithm can be configured to work in different manners. For example, we can implement the class NumericDiff with multiple orders, not using inheritance but with GP. We make the functional parts of the algorithm depend on the order, placed in separate classes like FirstOrder. These classes substitute those functional parts in the generic NumericDiff, as seen in figure 2.13. Because those classes lend behaviour to the generic class, they are considered trait classes.

C++ Templates Generic programming in C++ is achieved through the use of templates. While the polymorphism presented in section 2.2.2 is dynamic, and therefore changeable at runtime, templates provide the C++ language with static polymorphism, which cannot be changed once compiled. Nevertheless, C++ template code can take advantage of many compiler optimisations that dynamic code cannot, like inlining. This makes templates a popular candidate for writing efficient numerical codes.


Figure 2.13: Example of a Template Trait Class providing the order of differentiation to the NumericDiff algorithm.
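A sketch of this trait-based design: the Order template parameter lends the differentiation formula to the generic NumericDiff. The apply functions and the SinFunc helper are illustrative assumptions, standing in for the parsed function of the running example.

```cpp
#include <cassert>
#include <cmath>

// Trait classes lending the differentiation formula to the generic
// algorithm. FUNC::eval stands in for the FunctionParser.
struct FirstOrder {
  template <typename FUNC>
  static double apply(double x, double h) {
    return (FUNC::eval(x + h) - FUNC::eval(x - h)) / (2.0 * h);
  }
};

struct SecondOrder {
  template <typename FUNC>
  static double apply(double x, double h) {
    return (FUNC::eval(x + h) - 2.0 * FUNC::eval(x) + FUNC::eval(x - h))
           / (h * h);
  }
};

// Generic NumericDiff: the Order trait substitutes the functional part.
template <typename Order, typename FUNC>
class NumericDiff {
public:
  NumericDiff() : h_(1e-4) {}
  void set_precision(double h) { h_ = h; }
  double differentiate(double x) const {
    return Order::template apply<FUNC>(x, h_);
  }
private:
  double h_;
};

// Example function, hard-coded instead of a parsed string.
struct SinFunc { static double eval(double x) { return std::sin(x); } };
```

Unlike the inheritance version, the order is chosen at compile time, so the compiler can inline the whole call chain.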

Template restrictions When algorithms are written using C++ templates, not every type can be used to instantiate them. For example, a sorting algorithm can only be applied to a C++ type that has a comparison operator. If we define a class Colour, we cannot sort a collection of colours unless we can compare them; a suitable operator in this case would compare the equivalent colour wavelengths. If the developer does not program this operator in the class Colour, then the compilation of the sorting code fails. Concepts are an addition to C++ planned for the next ISO release [58]. They are used to describe the restrictions placed on a template parameter; in other words, they describe what a template can or cannot do. For more information refer to [57].

Metaprogramming Metaprogramming is the creation of programs that handle other programs as their data. The aim is to automate error-prone, repetitive programming actions and so increase developer productivity. It can be part of a Generative Programming (GenP) procedure (see section 2.7), when used to generate source code at runtime. Within C++, metaprogramming uses generic programming techniques with C++ templates; in this case we call it Template Metaprogramming (TMP). For example, with TMP we can write code that evaluates any factorial at compile time: the compiler evaluates the expression and substitutes it by the constant value, so that when we run the code the expression is already a constant, making the numerical calculations more efficient. The implementation of


the factorial is in code listing 2.3.

/// Template metaprogramming function for factorial function
template < int N >
struct Factorial {
  enum { value = N * Factorial< N - 1 >::value };
};

template <>
struct Factorial<0> {
  enum { value = 1 };
};

/// usage
x = Factorial<4>::value ; // x = 24;

Code Listing 2.3: Example of metaprogramming.

Applications of TMP go far beyond this example. TMP is mostly used to improve reuse of algorithms and provide fast-running code. Nevertheless it has some drawbacks. One drawback is that the generated code is static: once compiled its behaviour cannot be changed. Using this code in a dynamic tool poses some problems. Section 4.1 presents a technique that allows easy integration of TMP code in standard dynamic code.

2.4 Patterns

For small systems, classes and objects are enough to describe the system architecture. But as the system becomes larger, other levels of abstraction are needed to help developers cope with the complexity. Patterns are the next level of abstraction in a system (see figure 2.2). When common arrangements of classes recur to solve recurring problems, we say we have a pattern. Alexander, the creator of the concept, describes them as:

Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.
– Christopher Alexander [3]

What this means is that patterns are not finished solutions and must be adapted to each specific problem. Patterns can occur in the Design and in the Architecture of the system. While an architectural pattern expresses a fundamental structural organisation of the software system, a design pattern describes


relationships and interactions between smaller-scale entities (classes). Other classifications of patterns also exist.

2.4.1 Design patterns

At the lower level, when designing how classes interact, many solutions may be used. Each solution carries some benefits and trade-offs. Design patterns, as described in [51], are common solutions to those recurring design problems. They are very useful because they standardise the solutions without imposing strict constraints. This means that the programmer has the freedom to adapt them to best fit the problem. It is typical in complex systems to use many design patterns in conjunction. Frameworks use design patterns to provide solid and clear relations between classes. For example, frameworks commonly use the Template Method design pattern.

Template Method The Template Method pattern ([51], pp. 325) defines the skeleton of an algorithm in an operation, deferring some steps to derived classes. The structure of the algorithm is therefore fixed, but the steps that compose it can be tuned by the derived classes. For example, we can implement a general FEM with a template method pattern. First we identify the structure of the FEM discretisation:

1) clean the linear system
2) assemble the jacobian matrix
3) apply boundary conditions
4) solve the system

Then we create a class FEM that implements this algorithm in the function discretize, see figure 2.14. The function discretize calls the 4 functions in order. The functions clean_system and solve_system are implemented, but the functions assemble_jacobian and apply_bcs are not: these last 2 functions must be implemented by the derived classes. They are called hook functions because they are where the developer hangs, or attaches, his extension to the FEM.
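A minimal sketch of this Template Method follows. The class and function names follow the text; the log of strings is an assumption added only so the fixed ordering of the steps is observable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Template Method pattern: discretize() fixes the skeleton of the
// algorithm, while two hook functions are deferred to derived classes.
class FEM {
public:
  virtual ~FEM() {}
  // the template method: the ordering of the four steps is frozen here
  void discretize() {
    clean_system();
    assemble_jacobian();
    apply_bcs();
    solve_system();
  }
  std::vector<std::string> log; // hypothetical, records the calls
protected:
  void clean_system() { log.push_back("clean"); }
  void solve_system() { log.push_back("solve"); }
  // hook functions: must be supplied by the derived classes
  virtual void assemble_jacobian() = 0;
  virtual void apply_bcs() = 0;
};

// A concrete method fills in only the two hooks.
class GalerkinFEM : public FEM {
protected:
  virtual void assemble_jacobian() { log.push_back("galerkin-jacobian"); }
  virtual void apply_bcs() { log.push_back("galerkin-bcs"); }
};
```

Calling discretize() on any derived class always executes the four steps in the frozen order, with the derived class contributing only steps 2 and 3.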

2.4.2 Architectural patterns

Architectural patterns are a level above design patterns in terms of complexity. They describe the structure of the system, therefore they are used


more in isolation. Their advantage is to provide a structure with clearly defined software qualities and known implications for the system behaviour.

Figure 2.14: Class diagram for the Template Method pattern.

Microkernel The Microkernel pattern [26] describes the structure of a system that needs to adapt to changing design requirements. It separates the system into a kernel with minimal functionality, components with extended functionality, and client-customised parts. The kernel also serves as a socket for plugging in these extensions and coordinating their collaboration.

For example, we could use the Microkernel pattern to implement a more flexible version of the FEM design of the previous section. In figure 2.15 we have created a kernel holding the FEM code and made the BCs into flexible plug-ins. The client can request the FEM to discretise the equations, and the FEM will in turn use the LinearSystem and the various BCs. The client can also add his own MovingBC, which comes in as an externally loaded plug-in. Due to the centralised structure, whenever a BC needs access to the LinearSystem it must request it from the FEM kernel. This architecture ensures independence between the participants, and so allows for flexible changes in the structure and functionality of each pluggable component.

This work uses an adapted version of this pattern in section 6.2 as a way for numerical components to work together. It allows the engineer to assemble a customised simulation tool according to specific needs.


Figure 2.15: Collaboration diagram for the Microkernel pattern.
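The plug-in idea can be sketched as follows. All names here (FEMKernel, plug, MovingBC) are hypothetical illustrations of the pattern; a real microkernel would also handle dynamic loading of the plug-ins, which is beyond this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal stand-in for the shared resource coordinated by the kernel.
struct LinearSystem { int entries; };

// Plug-in interface: a BC never touches the LinearSystem on its own;
// the kernel hands it over when discretising.
class BoundaryCondition {
public:
  virtual ~BoundaryCondition() {}
  virtual void apply(LinearSystem& ls) = 0;
};

// The kernel is a socket for plug-ins and coordinates their collaboration.
class FEMKernel {
public:
  void plug(const std::string& name, BoundaryCondition* bc) {
    bcs_[name] = bc;
  }
  void discretize(LinearSystem& ls) {
    for (std::map<std::string, BoundaryCondition*>::iterator it = bcs_.begin();
         it != bcs_.end(); ++it)
      it->second->apply(ls); // kernel mediates access to the LinearSystem
  }
private:
  std::map<std::string, BoundaryCondition*> bcs_;
};

// A client-supplied plug-in, e.g. loaded from a shared library.
class MovingBC : public BoundaryCondition {
public:
  virtual void apply(LinearSystem& ls) { ++ls.entries; }
};
```

Because all plug-ins interact only through the kernel, adding or replacing a BC requires no change to the kernel or to the other plug-ins.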

2.5 Framework Architecture

A software framework is a reusable design together with an implementation, consisting of a set of collaborating classes, both abstract and concrete [2], that helps developers write applications within a certain domain faster and better. Frameworks are domain specific: we can create frameworks for numerical simulation tools, data visualisation or signal processing, and they will provide the basis for the construction of applications in each domain.

Provides implementation Frameworks implement the repetitive details common to all applications of their domain in a reusable package. Using the framework, the developer concentrates on the problems related to his specific application, and avoids reimplementing all the common machinery.

Provides structure But frameworks are more than just collections of classes. Frameworks also provide structure, by having architectures adjusted to the application domain. Within each architecture, the responsibilities and interactions of each class and object are soundly defined. So, frameworks dictate the architecture of the applications built on top of them.


Use of patterns Frameworks make heavy use of patterns, both design and architectural, to provide a consistent structure. Moreover, patterns help the developer understand better what extensions he needs to implement.

Extension points A framework is by definition an incomplete work. It needs to be extended in order to form concrete applications. For instance, a framework may provide functionality to read, write and manipulate meshes for numerical simulations, but it still requires the developer to code the numerical method that uses the mesh. A framework is composed of a frozen structure, with some base functionality, and a set of hot-spots [26]. A hot-spot is the part of the framework that the developer is supposed to extend and specialise to his application. Hot-spots are similar to hook methods in the Template Method pattern (see section 2.4.1), but instead of a single function, they may be a whole set of classes to extend.

Framework example In figure 2.16, we present a simplified framework for numerical methods. The abstract class NumericalMethod loops over objects of the MeshEntity class, and uses the Integrator class to discretise the element contributions. When we use this framework to construct different applications, we are forced to extend these classes. To construct a FEM application, we must extend NumericalMethod with the new class FiniteElement. We also provide the extension of the MeshEntity class into LagrangeP1 elements, which the method will loop over. Then the extension of Integrator into ElementMatrix creates the algorithm to assemble the element contributions to the linear system. The specialisation for the FiniteVolume class is different because the loop is performed over CellCentered entities, which have their fluxes integrated by FaceFlux.

Figure 2.16: Example of Framework usage by deriving classes and adapting to specific application.

2.6 Component-Based Software Engineering

Component-Based Software Engineering (CBSE) focuses on the decomposition of systems into functional blocks with well-defined interfaces used for communication between components. The main idea is to reuse these components as building blocks to construct many different applications, thereby reducing the effort of developing those applications.

Components and Object-Orientation A component architecture is always the result of OOA/D, but it does not imply the implementation of the component using OOP. A component can be implemented with a procedural language such as FORTRAN. But when implemented with OOP, components sit at a higher level of abstraction than objects, because they do not share state and communicate by exchanging messages which carry the data.

Components and Frameworks Frameworks and components are cooperating technologies [2]. Frameworks provide a reusable structure in the form of component specifications and templates for their implementation, thereby making the development of new components easier.

Example of Components In figure 2.17, we can see a Unified Modelling Language (UML) component diagram describing the interactions between the components that compose a simulation of fluid flow modelled by the Navier-Stokes equations and discretised with FVM. Each component declares what interfaces it provides to other components (the lollipop symbols) and what interfaces it requires (the antenna symbols). These are connected according to some criteria, normally depending on user requirements.
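The provided/required wiring of such a diagram can be sketched in C++ with abstract interface classes. The component names follow figure 2.17, but the member functions and the wiring through constructor arguments are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Interfaces that components provide or require (the lollipops and
// antennas of the UML diagram). Members are illustrative assumptions.
struct Matrix { virtual ~Matrix() {} virtual void solve() = 0; };
struct Terms  { virtual ~Terms() {}  virtual std::string name() const = 0; };

// PETSc-like component: provides the Matrix interface.
struct PETScComponent : Matrix {
  bool solved;
  PETScComponent() : solved(false) {}
  virtual void solve() { solved = true; }
};

// Navier-Stokes-like component: provides the Terms interface.
struct NavierStokesComponent : Terms {
  virtual std::string name() const { return "Navier-Stokes"; }
};

// Finite-Volume-like component: requires Matrix and Terms, and is wired
// to concrete providers according to the user requirements.
class FiniteVolumeComponent {
public:
  FiniteVolumeComponent(Matrix* m, Terms* t) : matrix_(m), terms_(t) {}
  std::string discretize() {
    matrix_->solve();      // use the required Matrix interface
    return terms_->name(); // use the required Terms interface
  }
private:
  Matrix* matrix_;
  Terms* terms_;
};
```

The Finite Volume component never names PETSc or Navier-Stokes directly; it only sees the interfaces, so any other provider of Matrix or Terms could be wired in instead.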


Figure 2.17: Example of a UML Component diagram with interacting components.

2.7 Generative Programming and Domain Engineering

While CBSE creates isolated, reusable software blocks, Generative Programming (GenP) provides the way to combine those blocks into a highly customised software application. Domain Engineering (DE) uses domain knowledge to interpret user requirements and drive the customisation process.

2.7.1 Generative Programming

GenP is a programming style that uses programs to automatically generate other programs from a set of user requirements, rather than have a human developer write them [35]. It uses generic classes, templates and code generators to improve developer productivity or to perform repetitive tasks that would otherwise not be realisable. In C++, GenP mostly uses metaprogramming techniques (see section 2.3). This technique is used in section 4.3 for


the automatic creation of algorithms based on existing C++ template code.

typedef double real ;

/// file Trapezium.hh
template < typename FUNC , int N >
struct TrapeziumRule
{
  static real integrate ( real a , real b )
  {
    real sum_intg = FUNC::eval( xk(a,b,0) );
    for ( int k = 1 ; k < N ; ++k ) { sum_intg += 2 * FUNC::eval( xk(a,b,k) ); }
    sum_intg += FUNC::eval( xk(a,b,N) );
    return (b - a) / (2.*N) * sum_intg;
  }
  static real xk ( real a , real b , int k ) { return a + k*(b - a)/N; }
};

Code Listing 2.4: Trapezium rule algorithm for generic functions.

#include <cstdlib>
#include <iostream>
#include <fstream>
using namespace std;

int main ()
{
  double a, b;
  int user_pts;
  string user_func;

  // request user parameters
  cout << "Input the function of 'x'" << endl;
  cin >> user_func;
  cout << "Input the number of points" << endl;
  cin >> user_pts;
  cout << "Input the interval points" << endl;
  cin >> a >> b;

  // generate source file based on user requirements
  ofstream fout ( "userint.cxx" );
  fout << "#include <iostream>\n#include <cmath>\n";
  fout << "#include \"Trapezium.hh\"\n";
  fout << "struct UserFunc { static real eval ( real x ) "
       << "{ return " << user_func << "; } };\n";
  fout << "int main () {\n";
  fout << "  std::cout << TrapeziumRule< UserFunc, " << user_pts
       << " >::integrate( " << a << ", " << b << " ) << std::endl;\n";
  fout << "}\n";
  // the listing continues in the original with compiling and
  // running the generated userint.cxx
}

Code Listing 2.5: Program that generates programs to integrate user defined functions.

Example of GenP To give an example, using the code for the trapezium rule for function integration as in code listing 2.4, we can create a program that reads a user defined mathematical function and integrates it. This program, given in code listing 2.5, generates another program using the integrator and places the user defined function in the integrator. It will also place the interval of integration and the number of integration points, as provided by the user. Given the following user input:


Input the function of 'x'
> x*sin(x)
Input the number of points
> 1000
Input the interval points
> 0 6.283185

The program generates the code in code listing 2.6, then compiles and runs the generated program, giving the result −6.28316 ≈ −2π. Because the code is compiled with the user's choice of parameters, a whole range of compiler optimisations is possible, resulting in highly optimised code.

#include <iostream>
#include <cmath>
#include "Trapezium.hh"

struct UserFunc { static real eval ( real x ) { return x * sin ( x ); } };

int main ()
{
  std::cout << TrapeziumRule< UserFunc, 1000 >::integrate( 0, 6.28318 )
            << std::endl;
}

Code Listing 2.6: Generated program according to user requirements.

2.7.2 Domain Engineering

Domain Engineering is the process of using domain knowledge to produce new software systems by systematic software reuse. It makes strong use of GenP and Domain Specific Languages (DSLs). The domain engineering process, depicted in figure 2.18, is used to drive an automatic generation of solutions. The process starts with an analysis of the problem space, which is mapped with some configuration knowledge to a solution space. The problem space consists of the required concepts and features, and the solution space is the set of all possible component combinations that satisfy those requirements. The configuration knowledge specifies illegal feature combinations, default settings, construction rules and optimisation rules. For a more complete explanation, see chapter 5 of [35].

Example of Domain Engineering In the example presented in figure 2.19, we start off with a collection of available components that respect certain interfaces. For instance, all numerical methods stick to the same definition: they provide a Solution and require the Matrix, Vector, Mesh and Terms interfaces.


Figure 2.18: Domain Engineering Process.

Now suppose that we have 3 different clients: FluidFlow, Thermal and Multi-Physics. The client FluidFlow is only interested in simulating fluid flow problems, whereas the client Thermal is interested in heat conduction analysis on solids. Finally, client Multi-Physics is interested in solving oil and water cooling of engines. Each client gives his requirements through the following parameters:

Client FluidFlow: Navier-Stokes
Client Thermal: Heat Transfer
Client Multi-Physics: Conjugate Heat Transfer, Tecplot Visualization

Using the domain knowledge, the software system is able to interpret these inputs and choose the appropriate components to assemble the simulations. For instance, to solve Navier-Stokes the system automatically selects the Finite Volume component. To solve the linear system there is no preferential solver, so Trilinos is chosen. For the solution of Heat Transfer, the Finite Element component is automatically selected. This component produces matrices which are better solved with the PETSc solver, so the system immediately chooses the best option.

What about the special client Multi-Physics? He requests a special component that couples together a solution from fluid flow and a solution from heat transfer, to simulate a fluid cooling a solid. The system then treats each solution as another request, which is satisfied in the same way as for the first two clients. On top of the conjugate heat transfer solution, he asks for a Tecplot visualisation component.


Notice that all the connections in figure 2.19 are made automatically by the simulation system, although they need not be. If the client FluidFlow so requested, he could override the system choice and, for instance, impose the usage of the PETSc solver.

Figure 2.19: UML Component diagram for Domain Engineering of Components for customised simulation tools.


Domain Specific Language A Domain Specific Language is a specialised, problem-oriented language and is thus always connected to a specific domain. This limits the scope of application of such languages, but allows them to have semantically rich elements that can express complex arrangements very simply. These languages are used, for example, to describe user requirements or to model the interfaces within the system. Within this thesis, a language for the description of scientific simulations is presented in section 4.2.


There is a central quality which is the root criterion of life and spirit in a man, a town, a building, or a wilderness. This quality is objective and precise, but it cannot be named.

Christopher Alexander (1936-), The Timeless Way of Building

Chapter 3

Research Problem and Thesis definition

In this chapter we define the problem of developing software for numerical scientific simulations (section 3.1) and the goals we set for our research work (section 3.2). We proceed to define in section 3.2.4 some concrete objectives that will help us evaluate the final outcome. In section 3.3 we state the thesis to be developed, and we compare it with previous related work in section 3.4. In section 3.6 we summarise the contributions of this thesis towards the specified goals, as a foreword to chapters 4, 5 and 6.

3.1 Problem definition

Knowing why we take the effort of researching is as important as what we research. The objective of this section is to explain the motivation behind the goals defined in section 3.2. For this purpose we shall describe our problem within a problem space without reference to software qualities, and define the causes that justify the introduction of such software qualities as goals for the research. It is important to keep in mind that this methodology of describing the problem is not unique, and even if it may seem so, it is not intended as a formal mathematical description of the problem. For a more formal approach, see the work of Veldhuizen [136] on the quantification of software complexity.

3.1.1 Initial Concepts

We shall clarify some concepts to properly define our problem. These concepts are only relevant for the current section and should not be transposed to the remaining parts of this dissertation.

Requirement is an external objective that the software system must fulfil in order to solve a specific scientific simulation.


Action is an external observable behaviour of the overall system which satisfies some requirement. It could be understood as external functionality, but we shall refrain from using that concept as it is too close to the concept of function.

Interaction is a point of contact between two actions, where one action depends on and/or affects the other.

Complexity is the ratio between the effort to implement a new (or change an existing) action to satisfy a new requirement and the perceived effort for that same implementation. In other words, a software system is complex if a small change (increment) in requirements implies a large change in the existing implementation of the actions.

The advantage of the previous definitions is that they rely only on the external view of the system and do not invoke any specific methodology for the analysis, design and implementation of the actions.

3.1.2 Problem description

We now proceed to describe the problem. The interactions between two actions can be of four different types:

Affecting - An action is said to affect another action if a change in the API of the former imposes changes in the latter. This is related to delegation, i.e., to the reuse of interfaces via a non-intrusive relationship.

Dependent - An action is dependent upon another action if the requirements of the former depend on satisfying the requirements of the latter. In other words, one action uses the other action to accomplish part of its own requirements. This is related to composition and inheritance, i.e., to the reuse of implementation via an intrusive relationship.

Bidirectional - When two actions are affecting and depending on each other, they are said to be bidirectional.

Null - A null interaction occurs when it is neither dependent nor affecting.

Matrix of interactions We can arrange the existing actions of a system in a matrix form. As an example we take a system that has three externally visible actions: one to discretise a PDE with a Finite Volume Method [FV], another


action to provide the Navier-Stokes [NS] equations as such a PDE and a third to solve the linear system using the PETSc [9] external library [PE].

        FV  NS  PE
FV      B   D   −
NS      A   B   −
PE      A   −   B

The rows represent the action we are considering, and each column the action it relates to. We fill the entries with the types of interactions that occur between the actions. What we get is a simple description of the interactions in the system, which can be read as:

- the Finite Volume action depends on the Navier-Stokes implementation;
- changes in the API of Navier-Stokes affect the Finite Volume action;
- changes in the API of PETSc affect the Finite Volume action.

The entries on the diagonal are always marked as bidirectional, because changes in an action both affect and depend on changes in the action itself. Note that an affecting interaction is not always paired with a depending interaction. In the example, Finite Volume simply uses the API of PETSc; it does not need to change if PETSc changes the implementation of its algorithms. On the other hand, Finite Volume uses not only API functions from Navier-Stokes but also, for example, arrays bound to that specific physical model, where each specific entry has a defined meaning.

More interactions This methodology can now be used to help us visualise the interactions of more complex systems. Say we add some new requirements: the system must also solve problems with the Magnetohydrodynamics (MHD) equations [MHD] and use the Trilinos [60] library [TR] as an alternative to PETSc. The matrix of interactions grows to the following:

        FV  NS  MHD  PE  TR
FV      B   D   D    −   −
NS      A   B   −    −   −
MHD     A   D   B    −   −
PE      A   −   −    B   −
TR      A   −   −    −   B

We suppose that MHD reuses part of the implementation of Navier-Stokes, and therefore depends on it. MHD also affects Finite Volume. Adding Trilinos


has less impact: only Finite Volume is affected, by having to implement the calls to the Trilinos API. The added entries in the matrix show the effort to design and implement the MHD and Trilinos actions. The diagonals show the direct (perceived) effort. Nevertheless, we must deal with 4 new interactions by adapting the code to the new requirements.

We now add another discretisation method, and require that both equations be discretised using the Finite Element Method and solved by both solver libraries. The matrix of interactions would again grow, to incorporate all the possible interactions between all actions. These would add 6 interactions to handle, besides the effort to implement the FEM [FE].

        FV  FE  NS  MHD  PE  TR
FV      B   −   D   D    −   −
FE      −   B   D   D    −   −
NS      A   A   B   −    −   −
MHD     A   A   D   B    −   −
PE      A   A   −   −    B   −
TR      A   A   −   −    −   B

Number of interactions We start to visualise the problem. Every new action added has many interactions to handle, and is itself a source of additional interactions for future actions. The work to add such future actions increases quadratically. Thus, small changes in the requirements start to imply ever larger changes in the implementation of existing and future actions. According to our initial definition, this means increased complexity of the software.

Although some of these interactions should be easy to deal with, we must keep in mind that this is a simplified exercise. In fact, the size of such a matrix of interactions is much bigger, as each of those actions has many different observable behaviours: each numerical method has many schemes, PDEs have different formulations, linear solvers have many preconditioners, etc. Other types of requirements must also be taken into account. For instance, we might require to solve the simulations in a parallel environment.
This would add a row and a column to the matrix which, given the pervasive nature of the parallel environment, tend to be filled with affecting if not bidirectional interactions.

Concrete problem The domain of application of this research involves the solution of multi-physics problems, discretised by multiple methods, in a parallel environment handling complex topology representations.


This is a long list of requirements. Since the number of interactions can grow quadratically with the number of requirements, the effort to develop a software solution is potentially very high. This means the system will become ever more expensive to develop and maintain. The challenge is how to reduce the number of interactions without compromising on the number of requirements. The problem addressed by this dissertation is: How to reduce the complexity of the software without compromising the requirements of the system?

Simplifying the matrix One way is to simplify the matrix of interactions. Such a simplification corresponds to transforming it from a dense matrix into a sparse matrix. To do that, we must find improvements to the software that transform complex interactions into simple ones, and ideally into null interactions. We can also try to reduce the problem, accepting the same requirements but satisfying them with fewer actions. Reducing the number of actions will reduce the number of interactions. Another way is to introduce actions that shield the dependencies. This is done by creating API interfaces under which different implementations may exist. In the example, we can add an action Physical Model [PM] that abstracts Navier-Stokes and MHD from FEM and FVM. Another abstraction can be Linear Solver [LS], to shield the dependencies on Trilinos and PETSc. Although this increases the number of actions, it transforms many interactions into null ones. As seen below, the number of interactions in the expanded system is the same as in the previous matrix. The advantage of introducing the abstractions becomes evident when adding a third discretisation method, for instance, Discontinuous Galerkin: instead of dealing with 6 interactions we only need to deal with 2.

        FV   FE   PM   NS   MHD  LS   PE   TR
  FV    B    −    −    −    −    −    −    −
  FE    −    B    −    −    −    −    −    −
  PM    A    A    B    −    −    −    −    −
  NS    −    −    D    B    −    −    −    −
  MHD   −    −    D    D    B    −    −    −
  LS    A    A    −    −    −    B    −    −
  PE    −    −    −    −    −    D    B    −
  TR    −    −    −    −    −    D    −    B

47
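The growth of interactions described above can be made concrete with a small counting sketch. The following illustrative program (not part of the thesis; the matrices are transcribed from this section, with '-' standing for a null interaction) counts the non-null off-diagonal entries of the first two matrices, showing the jump from 7 to 13 interactions when only two actions are added:

/* interaction_count.cxx - illustrative sketch, assuming the matrix
 * encoding of this section: 'B' bidirectional, 'A' affecting,
 * 'D' depending, '-' null. */
#include <cassert>
#include <string>
#include <vector>

// Count the non-null off-diagonal entries of an interaction matrix.
int count_interactions(const std::vector<std::string>& m)
{
    int n = 0;
    for (std::size_t i = 0; i < m.size(); ++i)
        for (std::size_t j = 0; j < m[i].size(); ++j)
            if (i != j && m[i][j] != '-') ++n;
    return n;
}

int main()
{
    // 5x5 matrix: FV, NS, MHD, PE, TR
    std::vector<std::string> m5 = {"BDD--", "AB---", "ADB--",
                                   "A--B-", "A---B"};
    // 6x6 matrix after adding FE: FV, FE, NS, MHD, PE, TR
    std::vector<std::string> m6 = {"B-DD--", "-BDD--", "AAB---",
                                   "AADB--", "AA--B-", "AA---B"};
    assert(count_interactions(m5) == 7);
    assert(count_interactions(m6) == 13);
    // In the worst case every off-diagonal pair interacts, i.e.
    // n*(n-1) entries: quadratic growth in the number of actions n.
    return 0;
}

Running the sketch confirms the counts used in the text: 7 interactions before, 13 after, i.e. the 6 new interactions mentioned above.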

Chapter 3 - Research Problem and Thesis definition

3.2 Research Goals Finding the most effective methodologies to reduce software complexity is the general goal of this research.

3.2.1 Primary goals The research will focus on the development of software methodologies to be applied in the domain of scientific computing. Most of these methodologies have been developed outside this domain of application and need to be adapted for effective use in scientific computing. To evaluate the characteristics of these methodologies, we select a set of guiding goals, each based on a software quality. Each quality is defined below, similarly to the definitions from the literature [90].

Flexibility is the ease of adapting existing software to changes in its requirements.

Efficiency is the ability of a software system to place as few demands as possible on hardware resources, such as processor time, memory space, and communication bandwidth.

Usability is the ease with which people of different backgrounds can learn to use the software products and apply them to solve problems. It covers both development effort and end-user interaction.

Flexibility Flexibility is chosen because it carries the notion that, in order to simplify the problem, no limitation should be placed on the requirements. On the contrary, whatever solutions we find must be open to further expansion of the requirements, within the scope of the domain of application. This quality also has a more philosophical meaning. As mentioned in [90], software is ‘soft’, indicating it is supposed to evolve, to support changes in requirements along its life cycle.

Efficiency The second goal is strongly related to our domain of application. High performance is a major goal of any scientific simulation involving the solution of PDE's. The complexity of the physical systems we solve demands large numbers of mathematical operations, taking long periods of time to reach any solution. This goal is measured not only in processing time, but also in memory consumption. Memory availability will limit the size of the problems to be


solved. Communication bandwidth between processing entities will adversely affect the scalability of the solutions to large numbers of processors. Traditionally, efficiency is an objective at odds with flexibility. We will show that flexibility is not opposed to efficiency and that both objectives can be achieved simultaneously.

Usability The end system will be used not only by the developers that conceived it, but also by non-expert developers whose work is to implement algorithms without concentrating on software issues. This goal also expresses the need for the end-user to have a simple software system to work with. The configuration and adaptation of the system to its multiple applications should be easy and seamless. This quality is often called user-friendliness.

3.2.2 Derived goals From the goals presented in the previous section, further goals can be derived. They help us better understand the effectiveness of the different methodologies.

Reusability Reusability is the ability of software elements to serve for the construction of many different applications. In contrast with other domains, reusability is not a goal by itself in the domain of scientific computing. Many end-users will not be interested in whether a certain software product can be used for other applications, as long as it does what they need. It is our belief, and we will demonstrate, that reusability of software components improves usability from the developer's point of view: the developer can do more by reusing existing components.

Modularity Modularity is the quality of a system composed of autonomous software components connected by coherent, simple structures. In [90] a whole chapter is devoted to describing this rather intricate concept. Although flexibility does not imply modularity, the two objectives go hand in hand. Moreover, combined with reusability, flexibility yields a system very close to being modular. Throughout this research we will look for solutions that bring modularity to the system, thereby satisfying multiple objectives simultaneously.



3.2.3 Solution space Over the years, software engineering has seen the emergence of many methodologies. These methodologies address the problem of developing software that satisfies qualities like the ones we selected in section 3.2.1. Searching for the solution to our problem within the whole spectrum of methodologies is beyond the scope of this work. Based on previous experience and input from similar projects, we shall reduce our solution space to a few methodologies, and try to find, within those, which techniques are most effective to attain our objectives.

3.2.3.1 A case for OOA/D Meyer argues for OOA/D over Structured Systems Analysis and Design (SSA/D) on the basis of the flexibility of the end design [90]. The top-down approach of SSA/D, based on the analysis of the functions of a system, gives a static architecture that is not easily adaptable to changes in requirements. This violates the first goal of this research. OOA/D, on the other hand, is focused on building a system that can evolve. It achieves this by concentrating the analysis effort on the data types, to abstract and generalise them, and by defining the relationships between them. It is more accurate to say that OOA/D is about object types, rather than objects themselves. For complete descriptions of the method, please refer to [80, 90]. The success of OOA/D in other domains of application has led, over the last two decades, to many proposals to use it in the domain of high-performance scientific computing. Some of these efforts have been partially successful [22, 24, 94, 100, 142], but it has become evident that some issues remain to be solved, especially regarding efficiency. It seems natural to continue the research where others left off, by adopting OOA/D as our starting methodology.

3.2.3.2 Frameworks and Components In [51], a framework is defined as “a set of cooperating classes, both abstract and concrete, that make up a reusable design for a specific class of software”.
Frameworks provide a reusable structure to applications. This reduces the development time of many applications, at the cost of an initial investment in developing the framework. Some projects have already shown that this investment pays off in the long run, with a remarkable capacity


for reuse [33, 34, 46]. We will design a framework for scientific computing, hoping to create an architecture that simplifies the development of our multi-method, multi-physics environment. Szyperski defines software components as independent units of development that interact to form a functioning system [125]. Components are used to increase modularity and to provide a pool of functionality to be reused in different applications. Many research projects [5, 17, 64, 104] have already shown the potential of this methodology in scientific computing. We hope to increase software reuse within this project by developing a component-based environment. Fayad and Schmidt argue in [118] for the synergistic relationship between frameworks and components. They suggest that an “effective pattern to organise a common software architecture is to use (1) frameworks in the horizontal middleware platform layers and (2) components in the vertical application layers”. Following this idea, we will explore the creation of a component environment based on the architecture provided by a common framework.

3.2.3.3 Domain Engineering and Generative Programming Some authors advocate the use of Domain Engineering (DE) techniques together with Generative Programming (GenP) to reduce development costs [35]. In scientific computing, DE has been applied by Gerlach in his thesis [53]. Gerlach used domain analysis for the data representation of a FEM PDE solver. This was used in the design of Janus: a C++ template library for parallel mesh-based PDE solvers [54–56]. We intend to use these techniques to guide the automatic generation of numerical algorithms as software components. Haveraaen has applied GenP in the development of CodeBoost [8] to automatically generate and optimise code for the Sophus coordinate-free numerical library [8]. This approach has shown substantial savings in the amount of code needed to develop numerical algorithms.
We will use it to speed up the creation of numerical components.

3.2.4 Concrete Research Objectives Having restricted the search for solutions to the software methodologies mentioned in section 3.2.3, we proceed to define the concrete research objectives for this work:


1. To create a component-based, object-oriented framework architecture for scientific simulations.
2. To analyse the software methodologies and improve our understanding of how they reduce software complexity, within the thesis domain.
3. To identify which techniques, within the chosen software methodologies, best satisfy the research goals.
4. To understand how the employed techniques cooperate and integrate.

3.3 Thesis statement Having set the concrete objectives for the research, we are now in a position to propose a hypothesis that we will try to validate throughout this dissertation: With judicious execution of multiple techniques, from different software methodologies, it is possible to construct a family of systems within the domain of application as defined in section 1.3, which simultaneously satisfy the goals of flexibility, efficiency and usability. The definition of the thesis uses terms which are not precise, so we should clarify their meaning.

What does ‘judicious’ mean? By judicious we mean that each technique will be used where it provides the best results, adapted on a case-by-case basis. As argued in [20, 21], we will not search for one-size-fits-all techniques that alone appear to achieve all goals.

What are the ‘software methodologies’? The methodologies we will investigate are OOA/D, framework architectures, component-based software construction, Generative Programming and Domain Engineering.

What are the referred ‘multiple techniques’? Within each methodology, several techniques are available. Sometimes a technique is part of more than one methodology. Examples of such techniques are design patterns, abstract data types, inheritance, meta-programming, generic types, etc. By combining different techniques we expect to find a solution that satisfies all the goals simultaneously.


What will this ‘family of systems’ look like? The family of systems will be made of software components. These components will be reused as building blocks of different applications for the solution of scientific multi-physics simulations.

To what degree will the goals be satisfied? The goals of flexibility, efficiency and usability should be satisfied to the same extent that other applications in the field satisfy them, even if those applications satisfy them only individually. That is to say, we will measure each goal by comparing our efficiency with the most efficient, our flexibility with the most flexible, and our usability with the most usable application in the field.

3.4 Related work

Diffpack Diffpack is an object-oriented C++ framework for solving PDE's by finite element methods [23, 24]. Much of the initial research of this dissertation was based on the publications of Langtangen and Bruaset. The difference between that project and the present work is that COOLFluiD aims at supporting multiple methods. Nevertheless, Diffpack shows good abstractions of the physical models and a long list of possible applications.

Overture Overture was developed by Henshaw and Quinlan [22, 59, 100] as a C++ framework for the solution of PDE's. It supports Finite Difference and Finite Volume methods for multiple physics. This project differs from our objectives in that it focuses on structured grids.

ELEMD Nelissen and Vankeirsbilck [94] developed a generic solver based on the Finite Element method. They developed a multiple-method, multiple-physics solver based on OOA/D techniques. The authors recognised that the final application had a slow-down of a factor 2 compared to procedural FORTRAN code, and a non-optimal use of computer memory. Their early work formed much of the historical background used in the analysis phase of this research.

Cogito Thuné and his colleagues used OO techniques for developing parallel PDE solvers [126]. These have been demonstrated in the Cogito platform, which implements an MPI solver in Fortran 90 interfaced through the Compose C++ abstract data types. This work has also been an initial source of


information for the present research. This group continues active research in parallel PDE solvers with the work of Ljungberg [84].

OpenFOAM Weller and his colleagues developed the FOAM solver, later distributed in open-source form as OpenFOAM [141, 142]. The solver uses C++ generic programming techniques to achieve high performance. Their objectives differ from ours in the sense that they concentrate on one space discretisation (the Finite Volume method), whereas the research in this thesis focuses on multiple discretisation methods. Nevertheless, both handle multiple physics within the same platform.

FEniCS FEniCS is a set of software tools to automate the solution of differential equations [45, 49]. It is mainly focused on automating the construction and optimisation of FEM formulations [85]. It has shown great speed improvements over the typical quadrature approaches. It does so by taking user-defined PDE problems, evaluating the corresponding FEM matrices and generating optimised code that is compiled into a customised application [73]. One interesting feature is that it couples a dynamic language (Python) to manage and control modules developed in highly performant C++ code.

SIERRA The SIERRA framework [46], developed by Edwards at Sandia National Laboratories, provides essential services for managing the mesh data structure, computational fields and physical models of scientific computing applications. SIERRA manages the services that the application selects in massively parallel multi-physics simulations.

Cactus Cactus is a modular simulation environment, consisting of a core (called the flesh) and plug-in modules (called thorns) [27]. Although it was initially developed for solving relativity problems using finite differences, it has an open architecture that allows user extensions and has been applied to many diverse fields.
Its parallel environment is focused on distributed computing, and it provides support for running on Grid environments. It has shown remarkable scalability to hundreds of processors [96].

OOFELIE OOFELIE [97] is a multi-physics toolkit which supports coupled multi-physics problems such as piezoelasticity, acoustics, fluid-structure interaction, etc. It uses both Boundary and Finite Element methods. It is a commercial product, but with a special license granting source access to universities. It seems


to be developed through an OOA/D approach, and its range of functionality, especially its connection to CAE, is similar to the aims of our environment.

3.5 Validation Method To validate the thesis we will perform an extensive set of case studies, covering multiple aspects of the research work, hoping that the separate validation of each aspect yields a satisfactory validation of the thesis as a whole.

Diversity of implemented components One aspect to evaluate is the diversity of components implemented within the presented solution. The more types of numerical algorithms are present, the more flexible the solution is. For example, we will look at how many different space discretisations exist, or how many linear system solvers are interfaced.

Number of interacting components Another aspect to analyse is the number of components that can be matched together by the end-user. This shows the usability of the platform as well as the flexibility of its architecture.

Coverage of Application Domain Covering many features of the domain of application will show not only the flexibility, but also the usability of the platform. We shall evaluate the diversity of disciplines in which the platform is applicable, such as aerodynamics, chemistry, elasticity, etc.

Communication Layer Another case study should cover the independence and transparency of the parallel communication layer, by assessing what percentage of the code (possibly in number of lines) is dedicated to parallel communication, and how much of that code interacts with the implementation of other components.

Efficiency Finally, some case studies should be devoted to estimating efficiency, in terms of memory consumption, parallel speedup and numerical computation time.
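For the speedup measurements, the standard textbook definitions of parallel speedup and efficiency apply (these formulas are general definitions, not taken from this thesis):

\[
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p},
\]

where $T_1$ is the runtime on one processor and $T_p$ the runtime on $p$ processors; ideal scalability corresponds to $S(p) = p$, i.e. $E(p) = 1$.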



3.6 Contributions of the Thesis Chapters 4, 5 and 6 present several contributions that answer the questions raised in this chapter. These contributions span essentially three directions, which we can view as software engineering techniques lying along orthogonal axes, see figure 3.1. Each axis is a group of related contributions, described in a different chapter.

[Figure 3.1: Contributions of this dissertation follow three axes: Domain Engineering, Framework Design and Component Based Architectures. Vertical axis (component architecture for reusable methods): handle multi-physics, decouple numerics from physics, enable components. Horizontal axis (framework design for the HPC domain): manage communication, represent the domain of computation. Cross-domain axis (Domain Engineering and C++ meta-programming): runtime extension of functionality, dynamic configuration of components, automatic runtime instantiation of static components.]

Chapter 4 focuses on contributions that are not specific to the domain of scientific computation. These contributions belong to the disciplines of Meta-Programming and Domain Engineering. They promote dynamic extension of functionality (section 4.1), dynamic configuration of components (section 4.2), and automatic runtime instantiation of components (section 4.3).


Chapter 5 presents contributions that support the construction of an object-oriented framework for scientific computing. The framework provides structure and common functionality for different applications. In section 5.1, we define how specific functionality is plugged into the framework using software components. In section 5.2, we factor out the dependencies between the parallel communication and the numerical methods by defining an underlying communication layer. In section 5.3, we define a mesh data structure and topology that support multiple numerical discretisations. The contributions in chapter 6 promote the creation of an environment of multiple interacting components. In section 6.1 we focus on the separation of the discretisation algorithms from the equations they discretise. In section 6.2 we present a technique to create a multi-physics simulation by plugging different components together.


Chapter 4

Domain engineering contributions

Within C++, there is a much smaller and cleaner language struggling to get out.
Bjarne Stroustrup (1950-), The design and evolution of C++

This chapter introduces a number of techniques which are not specific to the domain of scientific computation and can be reused across different domains of application. Consider the example of a software-radio system. The end-user selects components from a library of pre-existing components (filters, encryption protocols, etc.) and combines them to automatically generate a radio system with the desired features. By automatic we mean that, by satisfying given interfaces, the components match and work together without extra effort from the end-user. The end-user needs only to specify how to fit the components together. Such an automatic process needs:

• a well-defined language to describe the end-user requirements, capable of detailing the qualities of the desired features;

• an automatic configuration facility that uses the requirements to select, from the library of pre-existing components, those that satisfy the desired features;

• a process to assemble the components together and produce the final application.

An analogous example, within the scientific computing domain, is the simulation of the complex physical phenomena described in section 1.2. They involve multiple PDE's and discretisation methods. If we consider that each is a predefined component, an automatic process to set up a simulation has the same needs as stated above for the radio system. In the following sections, three different contributions will address these topics. Section 4.2 introduces a configuration language for the selection and setup of components. These components can be automatically generated by the automatic runtime instantiation technique defined in section 4.3 and


loaded by the procedure of dynamic extension of functionality presented in section 4.1. Note that this chapter does not describe a full Domain Engineering process as defined in [35]. These are contributions to a Domain Engineering process that the author would like to develop for scientific computing, but which is not yet complete.

4.1 Runtime extension of functionality We first look at the dynamic extension of functionality within an existing application. This is a fundamental problem in many large software engineering projects. We will extend the known technique of self-registering objects to component environments. This technique is only relevant for the C++ language, because it remedies one of the language's shortcomings.

Motivation Suppose we have developed a numerical simulation tool based on a general methodology like FEM, and that this tool is shared between multiple research groups in dispersed locations. Assume that each group is researching new algorithms for element discretisation and assembly into the FEM linear system, and that the number of possible algorithms is not limited. Each research group may add and remove algorithms as it sees fit, following a life cycle of trial and error. Because of their different locations, the groups cannot rely on close communication to avoid conflicts when integrating new developments. Moreover, because the set of algorithms is open-ended, we cannot foresee all conflicts and solve them beforehand. To overcome this integration effort, we will see that the software must allow the runtime extension of functionality by third-party developments, without changing the supporting framework.

4.1.1 Problem definition To define the problem, we assume that each algorithm is encapsulated in an object class. If an algorithm is implemented with multiple classes, it is always possible to wrap them within a single class, thereby reducing the problem again to a single object class.

Object creation In C++, objects are created by a static call to the class constructor. By static, we mean that the object class name is explicitly present in the call, and thus the class name is compiled into the code. The example


in code listing 4.1 illustrates this by calling the constructor of the concrete class GaussLobatto to build an algorithm of the abstract class Integrator. When we need to build different concrete types, based on runtime user-defined options, some logic must be put in place to select the correct concrete type. The problem is that this logic often has a fixed set of possible types to build, and thus it is not extendable.

/* file Integrator.hh */
class Integrator {
public:
  virtual Matrix compute_term(Element& e) = 0;
};

/* file GaussLobatto.hh */
class GaussLobatto : public Integrator {
public:
  virtual Matrix compute_term(Element& e) { /* implementation */ }
};

int main()
{
  ...
  Integrator* integrator = new GaussLobatto();
  ...
}

Code Listing 4.1: C++ object creation.

Naive solution A naive solution would be to use logic for object creation explicitly based on a switch-case (or equivalent), as presented in code listing 4.2, where a switch statement selects the concrete type based on a user parameter.

Adding more algorithms The problem becomes clear when we add more algorithms. Suppose we want to include a GaussLobattoOrder3 algorithm. We would need to change the logic in the file ReadUserConfig.cxx, adding another case to the switch statement to allocate the object with the class GaussLobattoOrder3. We would also need to add another inclusion of the new header defining the algorithm, see code listing 4.3. We would then implement the new algorithm in a new file like GaussLobattoOrder3.cxx, compile it and link it to the existing application.



/* file GaussLobattoOrder1.hh */
class GaussLobattoOrder1 : public Integrator {
public:
  virtual Matrix compute_term(Element& e) { /* order1 implementation */ }
};

/* file GaussLobattoOrder2.hh */
class GaussLobattoOrder2 : public Integrator {
public:
  virtual Matrix compute_term(Element& e) { /* order2 implementation */ }
};

/* file ReadUserConfig.cxx */
#include "GaussLobattoOrder1.hh"
#include "GaussLobattoOrder2.hh"

void read_user_params(UserParams user)
{
  // many constructions depending on user parameters
  ...
  // construct the Integrator
  Integrator* intg;
  switch (user.order) {
    case ORDER1: intg = new GaussLobattoOrder1(); break;
    case ORDER2: intg = new GaussLobattoOrder2(); break;
  }
}

/* file AssembleFemMatrix.cxx */
#include "Integrator.hh"

void assemble_matrix(vector<Element>& elems, LSSMatrix& matrix)
{
  // assemble element contributions, using the Integrator selected
  // during configuration
  Matrix contrib;
  vector<Element>::iterator e = elems.begin();
  for (; e != elems.end(); ++e) {
    contrib = intg->compute_term(*e);
    matrix.add(contrib);
  }
}

Code Listing 4.2: Naive object creation.



#include "GaussLobattoOrder1.hh"
#include "GaussLobattoOrder2.hh"
#include "GaussLobattoOrder3.hh"
...
switch (user.order) {
  case ORDER1: intg = new GaussLobattoOrder1(); break;
  case ORDER2: intg = new GaussLobattoOrder2(); break;
  case ORDER3: intg = new GaussLobattoOrder3(); break;
}

Code Listing 4.3: Adding algorithms to the switch-case.

Disadvantages This type of solution has several disadvantages. The logic for the creation of objects is bound to, and dependent on, the logic that interprets the user parameters, and it grows more complex as more algorithm options are added. The creation of the object must explicitly name the concrete class type, because in C++ constructors cannot be virtual functions. In the example, this means that a line with new GaussLobattoOrder3() must be introduced. That piece of code is therefore dependent on the definition of the class. It is also dependent on the logic that interprets the user options, which creates a binding between two unrelated parts of the application: the configuration based on user parameters and the definition of the algorithms. This binding is evident from the need to include the header defining the algorithm, in this case GaussLobattoOrder3.hh. In summary, the addition of functionality implies changing and recompiling existing software components. This means that third-party developers cannot contribute their own algorithms without access to the code containing the configuration logic. Clearly, this solution does not scale to many developers and an open-ended set of possible algorithms.

Objectives We require a facility that allows a new algorithm to be implemented, compiled and linked to the existing application immediately, without changing the configuration logic of that application. Consequently, the application does not need to be recompiled. This is summarised by the objectives below:

• create polymorphic objects in a simple and automatic way;

• reduce implementation and compilation dependencies;

• allow the dynamic addition of new components.


Assumptions To work on the solution of this problem we assume:

• the use of the C++ language, because we tackle one of its shortcomings and use one of its language features;

• an open-ended or large enough number of algorithms to implement. If the number were small, implementing the logic to cover all possibilities would suffice;

• that some of the algorithms may be added by third-party developers without access to the application configuration logic, possibly due to Intellectual Property Rights (IPR) restrictions.

4.1.2 Decoupling object construction from type definition To solve this problem we shall use self-registering objects. The self-registration technique decouples the logic of object construction from the object definition. We will introduce the technique as first described in [18], and then extend it in the context of component architectures.

Object construction We start by recognising that the problem is that C++ does not have virtual constructors. As a statically typed language, its objects must be constructed with the static definition of the concrete type. This implies that somewhere in the code a hardcoded construction of the concrete object type must exist, in the form of an instruction of the kind new OBJTYPE(). Moreover, the full type definition must be accessible at the point of construction, therefore binding together the object creation and the class definition. In a component architecture, this binding between definition and object creation implies that a component needs to be recompiled to use classes from another component. To ensure component isolation, a solution must be found for the independent creation of objects.

Separating definition from construction Self-registration proposes that each polymorphic type is responsible for registering a static, globally accessible object which knows how to instantiate objects of that type. This static object is then used to create instances of that type upon request from other components, in effect bridging, but not binding, definition and construction.


Runtime extension of functionality (4.1)

Self-registering objects To enable self-registering objects, we first create a class that is responsible for the creation of objects of the type of interest. Following our prototype example, this would be IntegratorProvider which, through the function create, returns a new instance of the type Integrator. To avoid repeating code, we make IntegratorProvider a template class, as shown in code listing 4.4.


/* file IntegratorProviderBase.hh */
#include "IntegratorStore.hh"

class IntegratorProviderBase {
public:
  IntegratorProviderBase(string name)
  {
    IntegratorStore::regist(name, this);
  }
  virtual Integrator* create() const = 0;
};

template <class T>
class IntegratorProvider : public IntegratorProviderBase {
public:
  IntegratorProvider(string name) : IntegratorProviderBase(name) {}
  virtual Integrator* create() const { return new T(); }
};

Code Listing 4.4: Definition of the provider for object creation.

IntegratorProvider derives from IntegratorProviderBase, which is an interface class. Its constructor registers the concrete provider object into a storage, IntegratorStore. This storage, defined in code listing 4.5, associates a name to each IntegratorProvider and registers them in a database.

/* file IntegratorStore.hh */
class IntegratorStore {
public:
  static void regist(string name, IntegratorProviderBase*);
  static IntegratorProviderBase* get_provider(string name);
};

Code Listing 4.5: Definition of the provider storage.

Using self-registration Next, to use self-registration, we need to place a variable of the IntegratorProvider type in the global scope, as in line 6 of code listing 4.6. According to the C++ standard [62], variables in the global scope are created before execution of the main code starts. Therefore, by

65

placing the registration in the constructor of the provider, we guarantee that the provider registers itself in the storage before anything else happens.

1  /* file GaussLobatto.hh */
2  #include "IntegratorProviderBase.hh"
3  class GaussLobatto : public Integrator { ... };
4
5  /* file GaussLobatto.cxx */
6  IntegratorProvider<GaussLobatto> aGaussLobattoProvider("GaussLobatto");

Code Listing 4.6: Registering a type in the storage.

Note how the variable aGaussLobattoProvider is created with the name "GaussLobatto". This name will be used for the registration in the database. Later, the end-user uses this name as a parameter (user.integrator) to retrieve the registered provider, as in line 7 of code listing 4.7. The provider is then used to build an object of the concrete GaussLobatto, as in line 9.

Analysis Analysing code listing 4.7, we see that no reference to the concrete type GaussLobatto is present, and that the only included definition is that of the provider abstract class, IntegratorProviderBase. This separates the definition of the type from the point of creation. It uses a provider object that functions as a polymorphic constructor, by emulating a virtual table dispatch based on the entries registered in the IntegratorStore. Since the function create holds the call to the operator new, as in code listing 4.4, no cast operation is needed and therefore type safety is preserved. As we will see, type safety is especially important across components.

1  /* file ReadUserConfig.cxx */
2  #include "IntegratorProviderBase.hh"
3  void read_user_params(UserParams user)
4  {
5    ...
6    // construct the Integrator
7    IntegratorProviderBase* provider = IntegratorStore::get_provider(user.integrator);
8
9    Integrator* intg = provider->create();
10 }

Code Listing 4.7: Using the provider to dynamically create objects.



4.1.3 Applying to component architectures

In the previous section, we saw that the technique of self-registration decouples the object construction from the type definition. However, to apply this technique within a component architecture, some problems still need to be solved. Components must rely on other components to correctly construct objects of polymorphic type through a common interface. Therefore, the following issues need careful treatment:

Generic application The self-registration technique implies the construction of additional classes to deal with the object creation. This process is repetitive and error-prone. A single, generic procedure for the creation of objects across components is desirable.

Constructor parameters Each type may need different construction parameters. Therefore, the create function must allow the component requesting the object creation to pass an arbitrary number and type of parameters to the component performing the creation.

Memory management Self-registration uses a polymorphic create function to allocate objects across components. Likewise, for consistency, a polymorphic function to deallocate the objects must also be used. This is not present in the original technique [18] and is necessary to implement different memory management strategies, as explained in section 4.1.3.2.

4.1.3.1 Template factory design pattern

To handle generically the creation of polymorphic objects, we first recognise that the original self-registration technique uses a variant of the abstract factory design pattern [51]. We will use this pattern, and generalise it with C++ templates. This will maximise code reuse and retain type safety.

Provider hierarchy Where before there were two classes, IntegratorProviderBase and the template IntegratorProvider, we extend this class hierarchy to introduce the base type (Integrator) also as a template parameter. The class relations are represented in the class diagram of figure 4.1.
ProviderBase and Provider together make up a reusable database that stores the providers. ConcreteProvider and ObjectProvider make up a convenient, reusable way to create providers.



[Figure: UML class boxes — ProviderBase (+freeInstance(ptr: void*): void, +getProviderName(): string, +getProviderType(): string); Provider<BASE> (same operations); ConcreteProvider<BASE, NARGS> (+create(): SelfRegistPtr); ObjectProvider<CONCRETE, BASE, NARGS> (+create(): SelfRegistPtr, +freeInstance(ptr: void*)).]

Figure 4.1: Class diagram for the template provider class hierarchy.
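As a complement to the diagram, the hierarchy might be sketched in C++ as below. This is a simplified illustration, not the framework code: the NARGS specialisation and the SelfRegistPtr return type are elided, create() returns a raw pointer, and registration in the factory is omitted.

```cpp
#include <string>

// Most basic provider: knows neither BASE nor CONCRETE; used by the
// environment to inspect all registered functionality uniformly.
class ProviderBase {
public:
    explicit ProviderBase(const std::string& name) : m_name(name) {}
    virtual ~ProviderBase() {}
    virtual void freeInstance(void* ptr) const = 0;
    std::string getProviderName() const { return m_name; }
private:
    std::string m_name;
};

// Parametrised on the abstract BASE type; this is the pointer type a
// Factory<BASE> would store (registration elided in this sketch).
template <class BASE>
class Provider : public ProviderBase {
public:
    explicit Provider(const std::string& name) : ProviderBase(name) {}
};

// Interface holding the polymorphic constructor; in the full design a
// second NARGS template parameter selects a specialisation whose
// create() takes the right number of constructor arguments.
template <class BASE>
class ConcreteProvider : public Provider<BASE> {
public:
    explicit ConcreteProvider(const std::string& name) : Provider<BASE>(name) {}
    virtual BASE* create() const = 0;
};

// Bottom of the hierarchy: knows the CONCRETE type, so it can both
// allocate and free instances without any cast on the client side.
template <class CONCRETE, class BASE>
class ObjectProvider : public ConcreteProvider<BASE> {
public:
    explicit ObjectProvider(const std::string& name) : ConcreteProvider<BASE>(name) {}
    virtual BASE* create() const { return new CONCRETE(); }
    virtual void freeInstance(void* ptr) const { delete static_cast<CONCRETE*>(ptr); }
};
```

Client code then holds only a ConcreteProvider<BASE>*, so it never sees the CONCRETE definition, while deallocation is dispatched back to the provider that knows the concrete type.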

Class ObjectProvider is at the bottom of the hierarchy and provides the creation of the concrete type. It also provides the implementation of the freeInstance function, which polymorphically handles the release of the object memory. As a consequence, this class must be parametrised not only with the BASE polymorphic type but also with the CONCRETE type to be constructed. Class ConcreteProvider is the interface that provides the polymorphic constructor create with the correct number and type of parameters. It has the BASE type as template parameter plus the number of arguments of the constructor. This number is used to specialise the template, thus selecting the implementation that has the create function with the exact number of arguments. Class Provider defines the type of the pointers stored in the Factory, where it registers itself. Via its BASE parameter, it identifies the type used


in the process of registration. This selects the appropriate factory in which to register. ProviderBase is the most basic provider class, and has neither the abstract BASE type nor the CONCRETE type. It serves for the storage of all providers together, for inspection of the available functionality by the component environment.

Factory Next we need to adapt what in the original technique was the IntegratorStore to hold generic providers of class Provider. We will call this storage a Factory, to reflect the usage of the abstract factory design pattern. Figure 4.2 shows the class diagram of this implementation, where the following classes interact:

Class Factory is templated with the polymorphic BASE type as a parameter, and serves as a registry point for all the providers of that type. Therefore, it holds a database of all the providers for the BASE type. This class is implemented as a Singleton, accessed by the static function getInstance, to allow global access by any component.

Class FactoryBase has no knowledge of the polymorphic BASE type that it builds. This class, together with ProviderBase, allows the component environment to inspect all the available factories, and to determine which providers are present in each factory.

4.1.3.2 Memory management

Components may use different memory management strategies. For instance, one component may have the ConcreteProvider create function not actually allocate a new object, but reuse cached memory from a pool of existing instances. Therefore we need to provide a polymorphic function to free the instances, thus allowing the creator to return the instances to the pool. Such memory techniques are often used to speed up memory allocation. On the other hand, a component may need to hold many references to an instance created by another component.
Correct reference counting is necessary to prevent the creator component from deleting an instance while it is still being used by other components. To address both the issue of memory management strategies and that of multiple references, we introduce a smart pointer class.

[Figure: UML class boxes — FactoryBase (+getTypeName(): string, +getAllProviders(): std::vector) iterates over ProviderBase (+freeInstance(ptr: void*): void, +getProviderName(): string, +getProviderType(): string); Factory<BASE> (+getTypeName(): string, +getAllProviders(): std::vector, +getProvider(): BaseProvider, +regist(prov: Provider): void, +getInstance(): Factory&) holds the Provider database; Provider<BASE> registers into the Factory.]

Figure 4.2: Class diagram for the template factory class hierarchy.

The smart pointer uses an intrusive technique for reference counting and provides a release function for consistent memory deallocation. To use it, each abstract type is required to derive from the class OwnedObject, which provides the interface for reference counting. Next, we impose that all create functions return the created objects wrapped in this pointer class, which we call SelfRegistPtr. This class uses the OwnedObject interface for the accounting of references. Whenever a hold on a reference is acquired the counter is incremented; the counter is decremented upon release. When no owners are left, SelfRegistPtr frees the object. To do this properly, it uses the provider's polymorphic freeInstance function. To this end, the class wraps not only the pointer to the object contents, but also a pointer to the object's Provider<>. The details of the implementation of the SelfRegistPtr class are given in code listing 4.8 and its collaborations with other classes are represented in figure 4.3.



template <typename BASE>
class SelfRegistPtr {
public:
  /// Constructors
  SelfRegistPtr() : m_contents(CFNULL), m_provider(CFNULL) {}

  SelfRegistPtr(BASE* obj, ProviderBase* prov) : m_contents(obj), m_provider(prov)
  {
    if (m_contents != CFNULL) { m_contents->addOwner(); }
  }

  SelfRegistPtr(const SelfRegistPtr& src) : m_contents(src.m_contents), m_provider(src.m_provider)
  {
    if (m_contents != CFNULL) { m_contents->addOwner(); }
  }

  /// Destructor
  ~SelfRegistPtr() { release(); }

  /// access to the raw pointer
  BASE* getPtr() const { return m_contents; }

  /// dereference operators
  BASE* operator->() const { return m_contents; }
  BASE& operator*() const { return *m_contents; }

  /// Release the ownership over the pointer contents
  void release()
  {
    if (m_contents != CFNULL) {
      m_contents->removeOwner();
      if (m_contents->hasNoOwner()) {
        m_provider->freeInstance(m_contents);
      }
      m_contents = CFNULL;
      m_provider = CFNULL;
    }
  }

  /// Reset the contents of the pointer to another pointer
  void reset(const SelfRegistPtr<BASE>& source)
  {
    release();
    m_contents = source.m_contents;
    m_provider = source.m_provider;
    if (m_contents != CFNULL) { m_contents->addOwner(); }
  }

private:
  /// pointee object aggregated by this smart pointer
  BASE* m_contents;
  /// provider object that created m_contents
  ProviderBase* m_provider;
};

Code Listing 4.8: Definition of the SelfRegistPtr smart pointer class.


[Figure: UML class boxes — OwnedObject (-m_owners: int, +addOwner(): void, +removeOwner(): void, +hasNoOwner(): bool); SelfRegistPtr<BASE> (-m_contents: BASE*, -m_provider: ProviderBase*, +getPtr(): BASE*, +release(): void, +reset(source: SelfRegistPtr): void) owns its m_contents and holds the m_provider (a ProviderBase: +freeInstance(ptr: void*): void, +getProviderName(): string, +getProviderType(): string) that frees it; AlgorithmBase (+compute(): void) is an example BASE type; Provider<BASE> specialises ProviderBase.]

Figure 4.3: Class diagram for the wrapper class for self-registering object pointers.
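To make the role of the polymorphic freeInstance concrete, the sketch below shows a hypothetical pooling provider: released instances are parked in a free list and recycled by the next create() call. The class and member names here are illustrative, not taken from the framework.

```cpp
#include <cstddef>
#include <vector>

// Abstract side: paired allocation/release hooks, mirroring ProviderBase.
template <class BASE>
class PoolingProviderBase {
public:
    virtual ~PoolingProviderBase() {}
    virtual BASE* create() = 0;
    virtual void freeInstance(BASE* ptr) = 0;
};

// Concrete side: freeInstance() does not delete; it returns the object
// to a pool, and create() hands pooled objects out again. Client code
// is unaware of the strategy hidden behind the interface.
template <class CONCRETE, class BASE>
class PoolingProvider : public PoolingProviderBase<BASE> {
public:
    virtual BASE* create() {
        if (!m_pool.empty()) {
            CONCRETE* obj = m_pool.back();   // recycle a cached instance
            m_pool.pop_back();
            return obj;
        }
        return new CONCRETE();               // pool empty: allocate
    }
    virtual void freeInstance(BASE* ptr) {
        m_pool.push_back(static_cast<CONCRETE*>(ptr));
    }
    virtual ~PoolingProvider() {
        for (std::size_t i = 0; i < m_pool.size(); ++i) delete m_pool[i];
    }
    std::size_t poolSize() const { return m_pool.size(); }
private:
    std::vector<CONCRETE*> m_pool;
};
```

Because SelfRegistPtr always routes deallocation through the provider's freeInstance, such a strategy can be introduced in one component without any change to the components that hold the pointers.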

Analysis By introducing templates in the class Provider, this technique becomes generic, in the sense that the construction of any abstract algorithm is possible. Correct use of constructor parameters is ensured by the ConcreteProvider class, and memory is managed by the SelfRegistPtr wrapper class. Thus, all the issues raised in section 4.1.3 are addressed by the technique.
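The interplay of factory, provider and self-registration can be compressed into the following sketch. It is again simplified with respect to the framework (no FactoryBase, no constructor arguments, raw pointers instead of SelfRegistPtr); the regist and getInstance names follow figure 4.2.

```cpp
#include <map>
#include <string>

template <class BASE>
class Provider;   // forward declaration

// One singleton factory per abstract BASE type: a database mapping
// provider names to providers, globally reachable via getInstance().
template <class BASE>
class Factory {
public:
    static Factory& getInstance() {
        static Factory instance;          // constructed on first use
        return instance;
    }
    void regist(const std::string& name, Provider<BASE>* prov) {
        m_database[name] = prov;
    }
    Provider<BASE>* getProvider(const std::string& name) {
        typename std::map<std::string, Provider<BASE>*>::iterator it =
            m_database.find(name);
        return (it != m_database.end()) ? it->second : 0;
    }
private:
    Factory() {}                          // only getInstance() constructs
    std::map<std::string, Provider<BASE>*> m_database;
};

// The provider registers itself in the matching factory at construction,
// so a provider defined as a global variable self-registers before main().
template <class BASE>
class Provider {
public:
    explicit Provider(const std::string& name) {
        Factory<BASE>::getInstance().regist(name, this);
    }
    virtual ~Provider() {}
    virtual BASE* create() const = 0;
};

template <class CONCRETE, class BASE>
class ObjectProvider : public Provider<BASE> {
public:
    explicit ObjectProvider(const std::string& name) : Provider<BASE>(name) {}
    virtual BASE* create() const { return new CONCRETE(); }
};
```

A component then only writes, at global scope, ObjectProvider<GaussLobatto, Integrator> p("GaussLobatto"); and any other component can obtain instances through Factory<Integrator>::getInstance().getProvider("GaussLobatto")->create() without ever seeing the GaussLobatto definition.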

4.1.4 Mixing dynamic and static polymorphism

Template metaprogramming has many advantages for the implementation of generic algorithms with a high degree of reusability and efficiency, as discussed in section 2.3.

Static polymorphism In C++, static polymorphism is achieved with templates. This has a significant drawback: the code must specify the template parameters at the point of instantiation of the algorithm. This means that the full type must be known at that point. This is similar to the object


construction problem defined in the previous section 4.1.2. The template parameter is hardcoded in the client code, therefore statically binding it to a specific instance of the template algorithm. Take as example another version of the assembly of the element matrix contributions, initially set in code listing 4.2. In code listing 4.9, we have reimplemented it using template metaprogramming. The call to the algorithm in line 7 has been bound statically to the template parameter ORDER1. As a consequence this code is hardcoded to use this specific algorithm.

1  void assemble_matrix(vector<Elements>& elems,
2                       LSSMatrix& matrix)
3  {
4    Matrix contrib;
5    vector<Elements>::iterator e = elems.begin();
6    for (; e != elems.end(); ++e) {
7      contrib = compute_fem_term<ORDER1>(e);
8      matrix.add(contrib);
9    }
10 }

Code Listing 4.9: Template object creation.

Spread of template parameters Hardcoding the algorithm creates a different problem: to be able to vary this algorithm, we need to make the function assemble_matrix itself a template, as seen in code listing 4.10. The template parameter TYPE_ORDER is now a parameter of the function assemble_matrix. The template parameter spreads upstream to the client code, and the code using the function must itself declare which algorithm to use, or provide a configuration logic similar to the one in line 23 of code listing 4.2.

Using self-registration and the template factory design The technique of self-registration with the template factory design can be used to avoid the spreading of template parameters, as shown in code listing 4.11. First we recast the algorithm assemble_matrix into a template class Assembler (line 7), which implements an interface AssemblerBase, declared in line 2. Next, based on the user's choice of algorithm, we create a SelfRegistPtr using the Factory (line 26). This pointer is later used to execute the algorithm in line 34.



/* file AssembleFemMatrix.hh */
template <typename TYPE_ORDER>
void assemble_matrix(vector<Elements>& elems, LSSMatrix& matrix)
{
  Matrix contrib;
  vector<Elements>::iterator e = elems.begin();
  for (; e != elems.end(); ++e) {
    contrib = compute_fem_term<TYPE_ORDER>(e);
    matrix.add(contrib);
  }
}

/* file Solve.cxx */
void solve()
{
  ...
  switch (user.order) {
    case ORDER1 : assemble_matrix<ORDER1>(elems, mat); break;
    case ORDER2 : assemble_matrix<ORDER2>(elems, mat); break;
    case ORDER3 : assemble_matrix<ORDER3>(elems, mat); break;
  }
  ...
}

Code Listing 4.10: Spreading of template parameter to client code.

Advantages This technique allows mixing algorithms that use static polymorphism with dynamic polymorphism. No template parameters spread upstream, because the self-registration and the factory design work as a template firewall. This is achieved without sacrificing the efficiency of the template implementation, because the added cost is only one virtual call per algorithm execution. Since the algorithm incorporates the loop over the elements, this call cost does not scale with problem size and is small in comparison with the execution time of the algorithm itself.

4.1.5 Summary of the technique

The technique presented merges the self-registration technique from [18] with an abstract factory design from [51], employing template metaprogramming. It provides a solution to the constructor problem, by creating polymorphic constructors that separate the logic of object creation from the object definition. This is applicable to component architectures, where one component needs to allocate polymorphic objects from another component. It also serves as a template firewall to avoid spreading the template parameters of algorithms upstream to the client code. The self-registration technique was also used by J. Dijk in the Plasimo [130] simulation code, where it was formulated with an abstract factory pattern. It differs from our technique in that it is not applied in a component



1  /* file AssembleFemMatrix.hh */
2  class AssemblerBase {
3    virtual void assemble_matrix(vector<Elements>&,
4                                 LSSMatrix&) = 0;
5  };
6  template <typename TYPE_ORDER>
7  class Assembler : public AssemblerBase {
8    /// assembles the FEM matrix
9    virtual void assemble_matrix(vector<Elements>& elems,
10                                LSSMatrix& matrix)
11   {
12     Matrix contrib;
13     vector<Elements>::iterator e = elems.begin();
14     for (; e != elems.end(); ++e) {
15       contrib = compute_fem_term<TYPE_ORDER>(e);
16       matrix.add(contrib);
17     }
18   }
19 };
20
21 /* file ReadUserConfig.cxx */
22 void read_user_params(UserParams user)
23 {
24   ...
25   // construct the Assembler
26   Provider<AssemblerBase>* p = Factory<AssemblerBase>::getInstance().getProvider(user.assembly);
27   SelfRegistPtr<AssemblerBase> assmb = p->create(param1);
28 }
29
30 /* file Solve.cxx */
31 void solve()
32 {
33   ...  // one virtual call, no template spread
34   assmb->assemble_matrix(elems, mat);
35   ...
36 }

Code Listing 4.11: Using the provider to dynamically create objects.


environment, therefore missing the issues named in section 4.1.3. It is also more verbose, because it does not use templates to reduce code duplication. Other authors have also explored the use of self-registration, namely Bellingham [15] and Norton [95]. The solution by Norton is specific to the Linux OS and not generic, whereas Bellingham's solution does not use static global objects, and thus has to provide workarounds for the registration. An alternative to using self-registration in component environments is for each component to provide an initialization and a closure function. These functions can then register, in a centralized place, a callback function for each object type to be created. The advantage of using self-registration with the template factory design is that it preserves type safety, while the callbacks, to be generic, must cast the pointers from void*.
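For comparison, the callback-based alternative might be sketched as below; all names here are hypothetical. Because the central registry must accept creators for any type, it traffics in void*, and the client has to cast the result back, trusting that name and type agree — precisely the check the template factory keeps.

```cpp
#include <map>
#include <string>

// A generic callback registry must erase the object type: every
// registered creator returns a plain void*.
typedef void* (*CreateCallback)();

class CallbackRegistry {
public:
    static void regist(const std::string& name, CreateCallback cb) {
        table()[name] = cb;
    }
    static void* create(const std::string& name) {
        return table()[name]();
    }
private:
    static std::map<std::string, CreateCallback>& table() {
        static std::map<std::string, CreateCallback> t;
        return t;
    }
};

struct Integrator { virtual ~Integrator() {} };
struct GaussLobatto : Integrator {};

static void* makeGaussLobatto() {
    // cast up to the interface first, so the void* round-trip is well defined
    Integrator* obj = new GaussLobatto();
    return obj;
}

// The component's initialization function registers its creators.
void component_init() {
    CallbackRegistry::regist("GaussLobatto", &makeGaussLobatto);
}
```

The client must then write static_cast<Integrator*>(CallbackRegistry::create("GaussLobatto")) and simply trust that the registered callback really produced an Integrator, whereas a typed Factory<Integrator> can only ever hand out providers of that interface.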



4.2 Dynamic Configuration of Components

4.2.1 Motivation

Most computer systems need user input of some sort to perform their tasks. This input takes many forms: commands, database entries, data files, graphical user interfaces (GUI), etc.

Assembling components However, component oriented systems need another level of user input. A component system is by definition incomplete. It is a mere collection of components waiting to be assembled into a usable system. Therefore, these systems need information on how to assemble them. We can consider this information as a configuration input.

Configuration We define configuration as the act of providing enough information to a system of components to assemble them into a usable system. Consider the purchase of a car: before buying, the client chooses all sorts of options for the final car: engine power, automatic or manual gearbox, paint colour, with or without car stereo, alloy wheels, GPS navigation, etc. This process is a configuration process, which results in the car being assembled to the client's specifications. Component systems must go through a similar process, where the user assembles the system to his specifications by making choices among the available options. This section describes how to build a configuration facility to handle the description and processing of these specifications on the component system.

4.2.2 Problem definition

Specifying the configuration of a component system is a challenging task. Depending on the system complexity, the configuration facility must handle more or less complex configuration cases. To better understand them, we will follow the configuration process of a numerical component and identify the problematic configuration cases. Along the way, we will gradually introduce a rudimentary pseudo-language for component configuration, which will hopefully help visualise the descriptions. Based on this language, we will develop an actual language for configuration in section 4.2.6.

Configuration Example In the domain of scientific computing, the Newton method, defined in equation (4.1), is used for the solution of systems of


non-linear equations. This method requires the evaluation of derivatives of the residual of the target functional, r(u), to form the Jacobian of the system, J. Assume that this method is implemented in a component-based environment.

\( J \equiv \frac{\partial r(u)}{\partial u}, \qquad J(u^k) \cdot \delta u^{k+1} = -r(u^k), \qquad u^{k+1} = u^k + \delta u^{k+1} \)

Valued option The first configuration case we consider will be called a valued option. A valued option is a basic configuration involving the assignment of a value to an option of a certain type. In our example, k is the Newton iteration number. Assume that there is a variable with the maximum number of iterations. This is a user option of the component NewtonMethod. When this number is reached the method returns the last computed solution. This configuration case needs special care to satisfy type safety in the conversion of the value from the user input. Typically the user gives a string that needs to be cast to the correct type that the component understands. In this case the type is a positive integer. Note that this applies also to the conversion of user parameters to any other type of variable. For example, the input could be a string describing an equation. To represent the configuration cases we create a rudimentary pseudo-language. In this language the component NewtonMethod is declared to have an option MaxIter of type int to which the user assigns the value "10":

NewtonMethod(int:MaxIter) := 10;

Component composition Component composition is the description of how components are composed of other components. Composition delegates part of a component's task to another component. At least two component instances are involved. This is usually achieved via a dynamic binding between components. We assume that in the example the way to compute the Jacobian J is a configuration option. It may use one component, which we call AnalyticJ, to perform the computation analytically by automatic differentiation. Or it may use another component, NumericJ, that computes the Jacobian via numerical differentiation:



\( J e \equiv \frac{\partial r(u)}{\partial u}\, e \approx \frac{r(u + \epsilon e) - r(u)}{\epsilon} \)   (4.2)

where e is a basis unit vector of the solution space and \( \epsilon \) a small perturbation. We express composition in the same manner as valued options, but instead of a value type (int) the option declares an interface type. The interface type in this case is Jacobian and the associated option name is Compute.

NewtonMethod(Jacobian:Compute) := AnalyticJ;

Component parametrisation Component parametrisation is the description of how a parameter is used in a component. A parameter may be another component or a valued option. Parametrisation differs from composition in that the parameter is used to rebuild the component. It is usually achieved with static binding of the component to the parameters. The outcome of a parametrisation is a new component customised for a more specialised task. Consider the functional r(u). Because it will be evaluated many times over, we introduce it as a parameter of NewtonMethod. NewtonMethod will be rebuilt and hardwired to r(u). For example, we introduce the system of non-linear equations

\( u = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad r(u) = \begin{pmatrix} 2x_1^2 + 5x_2 - 10 \\ x_1 + 3x_2^2 - 3 \end{pmatrix} \)   (4.3)

This expression will be parsed and compiled into object code and introduced into the NewtonMethod, providing efficient calls for r(u). We name the component describing these equations SysA. In our pseudo-language we express a parametrisation the same way as a composition but with square brackets. This nomenclature indicates that the option is statically bound to the component and cannot be changed after the configuration.

NewtonMethod[ResModel:Function] := SysA;
SysA(string:Equation) := 2x_1^2+5x_2-10 x_1+3x_2^2-3;

Dynamic option A dynamic option is an option that may be reconfigured multiple times over the lifetime of the component. Usually these options are connected with interactive inputs from the user, who monitors the system and adjusts it to meet certain requirements. Not all options may be dynamically configured. Parametrisations are fixed after the initial configuration. Valued options are easy to reconfigure dynamically, because they require a simple reassignment. Dynamic composition configuration is complex, as components may be switched in real time


by other components. An option which has side effects outside its component cannot be changed dynamically without also reconfiguring the affected components. Within the example of the NewtonMethod, there is another way to compute the Jacobian. An approximate Jacobian,

\( r(u) \approx \sum_i C_i(\hat{u})\, u, \qquad J \approx C_i(\hat{u}) \)   (4.4)

may be used for equations that are difficult to converge to a solution. We call this component ApproxJ. Usually this Jacobian is used in the first iterations for robustness, and then switched to a NumericJ near the solution, because the latter converges faster once the initial phase is passed. Dynamic options should be declared explicitly, such that the configuration facility informs the user which options may be changed dynamically over the lifetime of the component. In our pseudo-language, we indicate them as normal options prepended with an asterisk *.

NewtonMethod(Jacobian*:Compute) := AnalyticJ;

Encapsulated option An option is encapsulated if it is only relevant to a certain component implementation. Encapsulation of options means that the component must not expose its configuration logic. The configuration logic is what the component does with the option value. Because each option may have a unique form of configuration, exposing the logic outside of the component would force the client to adapt to that specific option. Imposing this adaptation reduces the potential for reuse of the component. In our example we see that equation (4.2) involves a value \( \epsilon \) which is only present in the NumericJ implementation. This option should not be exposed to the component NewtonMethod, and should be completely handled inside NumericJ. In the configuration pseudo-language below, the Epsilon option only appears in the NumericJ expression:

NewtonMethod(Jacobian:Compute) := NumericJ;
NumericJ(real:Epsilon) := 10E-6;

Available options Available options are the set of options that are possible to configure for one given component. Although encapsulated, options still need to be advertised outside of the component. For example, to construct a user interface, the environment must list all the options of each component.


Passing this information to the user provides him with full control over the component features. We declare all options in a component by writing:

NewtonMethod(Jacobian*:Compute,int:MaxIter);

Moreover, we must include the possible choices for each option, in case there are limitations. For instance, assume that NewtonMethod may only accept the components AnalyticJ and NumericJ. We would write the component declaration as:

NewtonMethod(Jacobian*:Compute <- AnalyticJ | NumericJ);

Note that this expression does not use an assignment operator := but a restriction operator <- within the component declaration.

Invalid options Options may have restrictions on their values. For example, a number may only exist inside a certain interval. Moreover, an option, when set, may deactivate other options. The configuration must specify which restrictions apply to each option and signal the user if some are not respected. In our pseudo-language, we express this using the restriction operator in the component declaration:

NewtonMethod(int:MaxIter <- MaxIter > 0);

Choosing an option may deactivate other options, if they are incompatible. In the NewtonMethod example, another stop condition for the algorithm can be a measure of the residual, say with an L2 norm. In that case, we invalidate the MaxIter option if the L2 is chosen and vice versa. We express this by using the negation operator ! as:

NewtonMethod(int:MaxIter <- MaxIter > 0 & !L2 );
NewtonMethod(real:L2 <- !MaxIter );

Analysis The configuration facility must handle the above cases. Moreover it must be simple for the developers to use, and easy for the end-user to describe a full system configuration. To satisfy these requirements we present a design of a configuration facility that is based on four concepts, described in the following sections:

• the Self-Configuration of objects (section 4.2.3) provides encapsulation, and option assignment with type safety.
It also provides option validation and restriction.


• the Configuration Hierarchy (section 4.2.4) complements the self-configuration and provides the component composition.
• the Centralised Option Registration (section 4.2.5) provides option availability and dynamic configuration.
• the Configuration Language (section 4.2.6) describes the valued options, the composition and the parametrisation of components.

4.2.3 Self-configuration of objects

Need of encapsulation Encapsulation avoids exposure of configuration details to the client. For example, in code listing 4.12, we see that the configuration of ComponentA is not encapsulated, because it exposes the use of the function allocateSorting and the fact that the parameter is a string, 'BubbleSort'. If another component were to take the place of ComponentA and needed different parameters, the allocateSorting function would not be sufficient. We would have to change the interface of IComponent, or the two components would not be interchangeable.

// client code
IComponent* comp = new ComponentA();
comp->allocateSorting("BubbleSort");
Sorting* psort = comp->getSorting();

Code Listing 4.12: Non-encapsulated configuration.

Self-configuration To ensure encapsulation, the configuration interface of the components needs to be minimal. The client code should only pass the parameters and rely on the component to do the rest. We call this concept self-configuration. An object is said to be self-configurable if it accepts an arbitrary set of parameters from the client code and is responsible for its own configuration logic based on those parameters. In practice, the self-configuration concept states that the best object to configure a certain functionality according to end-user choices is the object where that functionality resides. This means that the object must be able to configure itself.


Configuration parameters For an object to be able to configure itself, all information about end-user options must be passed to the object. This information is the set of configuration parameters. The object then chooses from the parameters which ones are relevant to its configuration. All end-user choices must be passed because, due to encapsulation, the client has no information on the relevance of the options. We opted to pass the configuration parameters in a sorted map for faster search. Each parameter is a key-value pair, in the form of strings, as defined in code listing 4.13. It is convenient to store all options as strings: a string can be converted into any user-defined type by some mapping function, and it is in a human-readable form, which makes configuration files and scripts more readable.

/// definition for option key
typedef std::string OPTION_KEY;
/// definition for option value
typedef std::string OPTION_VALUE;
/// define map of configuration parameters
typedef std::map<OPTION_KEY, OPTION_VALUE> ConfigParam;

Code Listing 4.13: Type definition of the configuration parameters.
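The claim that a string value can be mapped onto any user-defined type can be sketched with a small helper built on std::istringstream; the name fromString is ours, not part of the framework:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

typedef std::string OPTION_KEY;
typedef std::string OPTION_VALUE;
typedef std::map<OPTION_KEY, OPTION_VALUE> ConfigParam;

// hypothetical conversion helper: any type readable from an input
// stream can be recovered from its string representation
template <typename T>
T fromString(const OPTION_VALUE& s)
{
  std::istringstream iss(s);
  T value;
  iss >> value;
  return value;
}
```

Any type with a stream extraction operator, including user-defined ones, works with this scheme, which is what makes the all-strings representation viable.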

The OPTION_KEY is the name of the option. It is used to identify to which option the OPTION_VALUE should be assigned.

Configuration Interface To interpret these parameters we create a configuration interface called ConfigObject. It has a function configure which accepts the ConfigParam parameters and uses them to configure the object. All components that need configuration derive from this class. The configure function call is generic because any parameters may be passed within the ConfigParam function argument. In the previous example, this allows substituting ComponentA by ComponentB even if the latter has parameters other than the sorting algorithm string. In figure 4.4, the class diagram for the ConfigObject is presented. This class holds an object OptionList, which is a group of all the options present in the component. Each option is represented by an instance of the class Option. The base class ConfigObject handles the interpretation of the ConfigParam and the assignment of each value to the appropriate Option in the OptionList. The class OptionList is responsible for processing the passed parameters and assigning the values to the respective options. This functionality is provided by the function processArgs.


When an option is added to the ConfigObject, this request is dispatched to the OptionList, which allocates a new Option object. The constructor takes a label as parameter, which is attached to the new option and used henceforward as the OPTION_KEY that identifies the option. Each Option is given a reference to the variable to change with the user input. Therefore, when the option is modified, the value is assigned directly to the variable.

[Class diagram: ConfigObject (configure, configureNested, addConfigOptionsTo, setParameter) holds an OptionList (processArgs, addConfigOption, linkToParam), which holds many Option objects (setValue, linkToValue, getLabel, getValueAsString); IComponent derives from ConfigObject and ComponentA derives from IComponent, each defining configure and defineConfigOptions.]

Figure 4.4: Class diagram for ConfigObject.

Using ConfigObject Any component that needs to be configured should derive from the class ConfigObject, and implement the configure function if it needs a more specific configuration. This implementation should first call configure on the parent class and then extend it with more configuration logic. In code listing 4.14 we see how ComponentA first calls the configuration of IComponent, because it derives from it. IComponent calls configure on ConfigObject, where the user parameters get assigned to the options. Once the parent classes are configured, ComponentA proceeds to use the user-defined name to allocate the sorting algorithm. In the class ConfigObject, the template function addConfigOptionsTo is called to add the options to the OptionList. It uses a static function defineConfigOptions that defines the options. This function must be defined at each level of the inheritance hierarchy to add the options of that class to the OptionList. This is shown in code listing 4.15. Defining these


class IComponent : public ConfigObject {
  virtual void configure(ConfigParams args)
  {
    // call parent class configure
    // m_size gets assigned here from the args
    ConfigObject::configure(args);
    // use the configured integer to resize a vector
    vec.resize(m_size);
  }
};

class ComponentA : public IComponent {
  virtual void configure(ConfigParams args)
  {
    // first call parent class configure
    IComponent::configure(args);
    // use string to allocate the sorting algorithm
    allocateSorting(m_sort_name);
    Sorting* psort = getSorting();
    ...
  }
};

Code Listing 4.14: Implementing the configure function along the inheritance hierarchy.
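A condensed, runnable sketch of this mechanism is shown below. It is simplified to integer options only; the class and method names follow the listings, but the implementation details are our own assumptions, not the thesis code:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> ConfigParam;

// minimal self-configurable base class: options are linked to member
// variables, and configure() assigns any matching user parameters
class ConfigObject {
public:
  virtual ~ConfigObject() {}
  virtual void configure(const ConfigParam& args)
  {
    for (std::map<std::string, int*>::iterator it = m_intOptions.begin();
         it != m_intOptions.end(); ++it) {
      ConfigParam::const_iterator p = args.find(it->first);
      if (p != args.end()) {
        std::istringstream iss(p->second);
        iss >> *(it->second);  // assign through the stored reference
      }
    }
  }
protected:
  void setParameter(const std::string& name, int* var)
  {
    m_intOptions[name] = var;
  }
private:
  std::map<std::string, int*> m_intOptions;
};

class IComponent : public ConfigObject {
public:
  IComponent() : m_size(0) { setParameter("VecSize", &m_size); }
  int size() const { return m_size; }
private:
  int m_size;
};
```

Parameters whose keys are not registered are simply ignored, which is what lets the client pass the full parameter set to every component without knowing which options are relevant.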

options in a static function allows retrieving the available options of a class without having to instantiate it.

class IComponent {
  static void defineConfigOptions(OptionList& optlst)
  {
    optlst.addConfigOption<int>("VecSize", "Size of the sorting vector.");
  }
};

class ComponentA : public IComponent {
  static void defineConfigOptions(OptionList& optlst)
  {
    optlst.addConfigOption<string>("SortName", "Name of sorting algorithm.");
  }
};

Code Listing 4.15: Defining options.

Option Interface The class Option is a generic interface to any type of option. We assume that any option value may be converted to and from a string. This interface provides a setValue function to change the value associated with the option. The value is not stored in the Option. The constructor of the class ComponentA places a variable reference in the Option via the


function linkToValue, as shown in code listing 4.16. Later, the Option uses the reference to change the value.

IComponent::IComponent() // constructor
{
  // defines options in this object
  addConfigOptionsTo(this);
  // vector default size
  m_size = 0;
  // link the vector size option with the variable
  setParameter("VecSize", &m_size);
}

Code Listing 4.16: Linking a variable to a value.
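A minimal sketch of this linking idea is given below, using a typed template option instead of the void* plus type-string scheme of the actual Option class; the member names follow figure 4.4, the rest is illustrative:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// sketch of a typed option: the value lives in the component, the
// option only keeps a pointer to it and assigns through that pointer
template <typename T>
class OptionT {
public:
  OptionT(const std::string& label, T* value)
    : m_label(label), m_value(value) {}

  // convert the user string and assign directly to the linked variable
  void setValue(const std::string& val)
  {
    std::istringstream iss(val);
    iss >> *m_value;
  }
  std::string getLabel() const { return m_label; }
  std::string getValueAsString() const
  {
    std::ostringstream oss;
    oss << *m_value;
    return oss.str();
  }
private:
  std::string m_label;
  T* m_value;  // reference to the component's variable
};
```

Because the option never owns the value, the component sees the new value immediately after configuration, without any explicit copying step.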

As shown in figure 4.5, Option is just an interface to more refined implementations. The function setValue is implemented by the child classes, which convert the user-defined string to the value. In the case of the class OptionComponent, the string is used to create a new component, using the self-registration technique of section 4.1. The ConfigObject class uses the getLabel function in Option to match the entries in the ConfigParam map to the correct option.

[Class diagram: the interface Option (setValue, linkToValue, getLabel, getValueAsString, and the hooks onProcess, onUpdate, onComplete) is implemented by OptionT, parametrised on VALUE:TYPE, and by OptionComponent, parametrised on COMPONENT:TYPE.]

Figure 4.5: Class diagram for Option.


Configuration sources The sources of configuration parameters may be many:

User Interface - A user interface may be as simple as options passed on the command line, or as complex as a graphical interface providing visualisation and editing of the system.

Configuration Files - A more experienced user requires finer control over the component behaviour. When advanced options are not present in the user interface, finer control is achieved by parameters given in configuration files.

Independence from sources To keep the configuration independent of the source of the information, the class ConfigObject provides an interface function to handle each case. Each function processes the information in its own way, but outputs the parameters in the format defined in code listing 4.13. Therefore OptionList ignores the origin of the options and the configuration logic remains encapsulated.

Option validation When the values are finally attributed to the options, a conversion from a string to the value type is necessary. It is possible that the converted value does not satisfy some validity condition. For example, the user may input a value for a temperature which, on the Kelvin scale, cannot be negative. We allow OptionValidation objects to be attached to each option. When all the options of an object are configured, each option is individually validated by its associated validations. In figure 4.6 the class diagram for the example of a positive-number validation is given.

[Class diagram: BiggerThanZero implements the interface OptionValidation (isValid, getCondition, getReason).]

Figure 4.6: Class diagram for OptionValidation.


The interface class OptionValidation provides three functions: isValid performs the validation, getCondition gives a description of the condition to be valid, and getReason gives the reason why the validation was not successful and can be used as an error message for the user.
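What such a validation could look like for the temperature example is sketched below; the interface follows figure 4.6, while the strings and the pointer-to-value design are our own assumptions:

```cpp
#include <cassert>
#include <string>

// interface from figure 4.6
class OptionValidation {
public:
  virtual ~OptionValidation() {}
  virtual bool isValid() = 0;
  virtual std::string getCondition() = 0;
  virtual std::string getReason() = 0;
};

// hypothetical concrete validation: the value must be strictly positive,
// e.g. a temperature on the Kelvin scale
class BiggerThanZero : public OptionValidation {
public:
  explicit BiggerThanZero(const double* value) : m_value(value) {}
  virtual bool isValid() { return *m_value > 0.0; }
  virtual std::string getCondition() { return "value > 0"; }
  virtual std::string getReason()
  {
    return "value must be strictly positive";
  }
private:
  const double* m_value;  // points to the option's linked variable
};
```

After configuration, the owning object can walk its options, call isValid on each attached validation, and report getReason to the user when a check fails.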

4.2.4 Configuration hierarchy

How do we differentiate which option goes to which object? In a multi-physics simulation, suppose there are two objects of the type StopCondition with the option MaxIter, one for the iterative process of the fluid and another for the solid. If the user provides the values 100 and 2000, which value is assigned to which object? Because there are two Stopper assignments, writing the following would be ambiguous:

NewtonMethod(StopCondition:Stopper) := MaximumIter;
Stopper(int:MaxIter) := 100;
ForwardEuler(StopCondition:Stopper) := MaximumIter;
Stopper(int:MaxIter) := 2000;

Hierarchic Configuration To differentiate which user choice is assigned to which object, we introduce the concept of hierarchic configuration. Hierarchic configuration states that the user parameters are given to the top object in the hierarchy. This object configures itself and passes its unused parameters to the objects it holds, and so on to the bottom of the hierarchy. Thus, options are propagated from parent object to nested object. We use a '.' to express the nesting of objects by other objects. If objects belong to the context of another object, and that context is unique, then the assignment is not ambiguous:

NewtonMethod(StopCondition:Stopper) := MaximumIter;
NewtonMethod.Stopper(int:MaxIter) := 100;
ForwardEuler(StopCondition:Stopper) := MaximumIter;
ForwardEuler.Stopper(int:MaxIter) := 2000;

Because only ConfigObject's can accept parameters via the configure function, if a deeply nested object needs to be configured, all its parent objects must also become ConfigObject's. This hierarchic chain of objects configuring their nested objects is depicted in figure 4.7. This is a


drawback of this hierarchic structure, which will be overcome in the following section.

[Figure: the end-user's configuration parameters (NewtonMethod.Stopper(int:MaxIter) := 100 and ForwardEuler.Stopper(int:MaxIter) := 2000, together with the two Stopper declarations) are delivered to the NewtonMethod and ForwardEuler objects, which each configure their nested MaximumIter stop condition with MaxIter = 100 and MaxIter = 2000 respectively.]

Figure 4.7: Objects configure their nested objects in a hierarchic chain.

The mechanism of hierarchic configuration is shown in code listing 4.17. The class ComponentA configures a nested Sorting object by passing the ConfigParams that it received. 1 3 5 7 9 11

class ComponentA : public IComponent { virtual void configure ( ConfigParams args ) { // first call parent class configure IComponent :: configure ( args ) ; ... Sorting psort = getSorting () ; // configure the nested Sorting object psort - > configure ( args ) ; } };

Code Listing 4.17: Hierarchic configuration.
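The '.'-prefixed propagation can be sketched with a helper that selects, for a nested object, the parameters addressed to it by stripping its name prefix. This helper is illustrative of the idea, not the thesis implementation:

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> ConfigParam;

// hypothetical helper: keep only the parameters whose key starts with
// "name.", and strip that prefix before handing them to the nested object
ConfigParam nestedParams(const ConfigParam& args, const std::string& name)
{
  ConfigParam out;
  const std::string prefix = name + ".";
  for (ConfigParam::const_iterator it = args.begin();
       it != args.end(); ++it) {
    if (it->first.compare(0, prefix.size(), prefix) == 0)
      out[it->first.substr(prefix.size())] = it->second;
  }
  return out;
}
```

With this filtering, the two MaxIter assignments of the Stopper example are no longer ambiguous: each parent only sees, and passes down, the parameters in its own context.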



4.2.5 Centralised option registration

The hierarchic configuration does not have global knowledge of all the available options. To inspect the existence of an option, it would need to traverse the hierarchy tree searching for that option.

Central Configuration Facility An extension to this technique is based on an additional registration step into a centralised configuration facility. As a consequence, the kernel of the application becomes aware of the possible options available to the end-user. This opens the possibility of many potential new features that require this centralised knowledge:

• check the existence of options and objects, useful when parsing blocks of options that are input by a script or configuration language
• dynamically define options for access by a dynamic GUI
• dynamically generate documentation

With the centralised registration, we also lift the requirement that all objects in the configuration hierarchy derive from ConfigObject. By registering in a centralised facility, the objects can be configured via that facility and no longer by their owner objects.

How it works As already seen, each ConfigObject calls addConfigOptionsTo(this) in its constructor. Listing 4.18 presents the code of this function. The first step is to add all options of the class to the OptionList, using the static function defineConfigOptions. The second step is to register the options in the facility.

template <typename CLASS>
void addConfigOptionsTo(const CLASS* ptr)
{
  CLASS::defineConfigOptions(m_options);
  ConfigFacility::getInstance().registConfigObj(this);
}

Code Listing 4.18: Registering the options of a class in the configuration facility.

The ConfigFacility is a Singleton object [51], which keeps a database of all registered ConfigObjects. The function configureDynamicOptions configures options on-the-fly during the application run, based on interactive inputs from the user.
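The Singleton and its registry can be sketched as follows; ConfigObject is reduced to an empty stand-in, and the method names follow figure 4.8 while the body is our own minimal version:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

class ConfigObject {};  // stand-in for the real configurable base class

// minimal Singleton sketch of the central registry
class ConfigFacility {
public:
  static ConfigFacility& getInstance()
  {
    static ConfigFacility instance;  // created on first use, unique
    return instance;
  }
  void registConfigObj(ConfigObject* obj)   { m_registry.insert(obj); }
  void unregistConfigObj(ConfigObject* obj) { m_registry.erase(obj); }
  std::size_t size() const { return m_registry.size(); }
private:
  ConfigFacility() {}                     // clients cannot construct directly
  ConfigFacility(const ConfigFacility&);  // not copyable
  std::set<ConfigObject*> m_registry;
};
```

Because every object registers on construction and unregisters on destruction, the registry is, at any moment, an exact inventory of the configurable objects alive in the application.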


[Class diagram: each ConfigObject (addConfigOptionsTo) registers itself with the singleton ConfigFacility (getInstance, registConfigObj, unregistConfigObj, configureDynamicOptions), which keeps a registry of all ConfigObjects.]

Figure 4.8: Class diagram of the Configuration Facility.

Dynamic Options Dynamic options are options that can change interactively as the application is running. The developer may declare an option to be dynamic in the static function defineConfigOptions, by adding a trait class when registering the option. In code listing 4.19, we parametrise the option Value with a DynamicOption<> trait class. This marks the option as able to change during the lifetime of the object CFL.

void CFL::defineConfigOptions(Config::OptionList& options)
{
  options.addConfigOption<double, DynamicOption<> >
    ("Value", "CFL number (it can be changed interactively).");
}

Code Listing 4.19: Making an option dynamic.

Option Traits To attach validation classes to the options or mark them as dynamic, we use trait classes. This is a common [4, 132] meta-programming technique used to change the behaviour of classes. With trait classes we add characteristics to the Options when we register them in the OptionList. Each trait marks one characteristic of the option. We build a chain of markers, each applying its mark and calling the next trait. The chain finishes with a closure trait that stops the recursion. The code listing 4.20 shows the template metaprogram that marks the CFL Value option as a dynamic option with validation of values bigger than zero. Three trait classes are present: EndMarkOption is the closure


trait, ValidateOption is a trait that adds a validation to the option (any number may be added), and DynamicOption makes the option dynamic. Unfortunately, this technique cannot represent the case where an option deactivates another option. The developer has to write the explicit logic involving the two (or more) options. When one of them is assigned, the developer deactivates the other linked options inside the hook action onProcess.

struct EndMarkOption {
  static void mark(Option* opt) {}
};

template <typename ValidationType, typename NEXT = EndMarkOption>
struct ValidateOption {
  static void mark(Option* opt)
  {
    opt->addValidation(new ValidationType(opt));
    NEXT::mark(opt);
  }
};

template <typename NEXT = EndMarkOption>
struct DynamicOption {
  static void mark(Option* opt)
  {
    opt->makeOptionDynamic();
    NEXT::mark(opt);
  }
};

/// usage:
options.addConfigOption<double, DynamicOption<ValidateOption<BiggerThanZero> > >
  ("Value", "CFL number (it can be changed interactively).");

Code Listing 4.20: Trait classes used to characterise Options.
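The marker chain can be exercised in isolation with a stub Option that simply records which marks were applied; the stub and the string tags are our own, while the trait structure mirrors listing 4.20:

```cpp
#include <cassert>
#include <string>
#include <vector>

// stub standing in for the real Option class: records applied marks
struct Option {
  std::vector<std::string> marks;
  void addValidationTag(const std::string& s) { marks.push_back(s); }
  void makeOptionDynamic() { marks.push_back("dynamic"); }
};

struct EndMarkOption {
  static void mark(Option*) {}  // closure trait: stops the recursion
};

struct BiggerThanZero {};  // tag type naming the validation

template <typename ValidationType, typename NEXT = EndMarkOption>
struct ValidateOption {
  static void mark(Option* opt)
  {
    opt->addValidationTag("validate");  // stub: real code allocates a validator
    NEXT::mark(opt);
  }
};

template <typename NEXT = EndMarkOption>
struct DynamicOption {
  static void mark(Option* opt)
  {
    opt->makeOptionDynamic();
    NEXT::mark(opt);
  }
};
```

Instantiating DynamicOption<ValidateOption<BiggerThanZero> > and calling mark applies "dynamic" first, then "validate", then terminates at EndMarkOption, which is exactly the chained-marker behaviour described above.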

Option availability The client can use the ConfigFacility to inspect which objects are configurable, and what options are available. Since the objects inform the facility when they are constructed, and unregister when destroyed, the facility is always up-to-date about which options are available.

4.2.6 Language for Component Configuration

To describe how components fit together in each setup, the user must pass this information to the system. To express the information without ambiguities, a language with appropriate rules must be specified. Because


this language is dedicated to configuring components of a certain domain, like CFD, it can be considered a Domain Specific Language.

4.2.6.1 CFcase configuration language

We developed a human-readable configuration language called CFcase. This language directly reflects the hierarchic nature of the ConfigObject structure. It is defined as a set of assignment operations. Each option has a key, and to this key a value is assigned. The language parser creates the set of key-value pairs and builds the ConfigParams, as defined in code listing 4.13, which are passed along the hierarchy of objects. To avoid name clashes, the hierarchy is directly mapped in each language assignment: each key is preceded by the full identity of the owner object (see code listing 4.21).

# this is a comment
SimulationA.PhysicalModel = NavierStokes
SimulationA.NavierStokes.Pressure = 10000
SimulationA.NavierStokes.Temperature = 288.15
# the option method of SimulationA is FiniteVolume
SimulationA.Method = FiniteVolume
# the option scheme of FiniteVolume is RoeFlux
SimulationA.FiniteVolume.Scheme = RoeFlux

Code Listing 4.21: Example of CFcase language.
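A minimal parser for this assignment syntax can be written in a few lines. The sketch below is a simplification of what the real CFcase parser must do; it handles only comments and single-token key = value assignments:

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> ConfigParam;

// toy CFcase-style parser: '#' starts a comment, each remaining line
// is "key = value"; malformed lines are silently skipped
ConfigParam parseCFcase(const std::string& text)
{
  ConfigParam params;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::string::size_type hash = line.find('#');
    if (hash != std::string::npos) line.erase(hash);  // strip comment
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) continue;            // not an assignment
    std::istringstream ks(line.substr(0, eq));
    std::istringstream vs(line.substr(eq + 1));
    std::string key, value;
    ks >> key;   // trims surrounding whitespace
    vs >> value;
    if (!key.empty() && !value.empty()) params[key] = value;
  }
  return params;
}
```

The output is exactly the ConfigParam map of code listing 4.13, ready to be handed to the top object of the configuration hierarchy.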

Incomplete language The CFcase language is an incomplete configuration language because it is based only on assignment operations. The pseudo-language we used in section 4.2.2 would be a complete configuration language, but more complicated to implement. A complete configuration language is able to directly express all the configuration cases we listed before. These configuration cases are similar to the feature types described in the feature diagrams of Czarnecki and Eisenecker in [35]. For example, the CFcase language cannot express options that disable other options. If an option contradicts another, the language parser is not able to detect the contradiction. In that case the developer has to program the code to safeguard that condition, as described in section 4.2.5. The same happens with invalid values for an option: the user cannot specify that a certain parameter is only valid within a certain interval.


Non-strict language Moreover, the language does not enforce the existence of all the options mentioned in the text. The language parser only checks for duplicate assignments. Assignments to options that do not exist are ignored. Therefore, writing a CFcase file directly is error-prone. Users of the language rely on multiple indirect methods to assert the validity of the text. Such methods include using the dump of the information generated by the parser, or adapting preexisting files describing simulations of a similar type. Despite these limitations, we consider the CFcase language a valuable tool due to its simple form, consisting only of assignments. This language is used daily within the COOLFluiD project [78].

4.2.6.2 Application of XML

Alternative language Considering the hierarchic structure of this configuration technique, a possible alternative to the CFcase language is the Extensible Markup Language (XML). XML is a standard that has gained much acclaim in recent years. For example, in Service Oriented Architecture (SOA) applications, which are also based on components, XML became a language for the messages exchanged between interacting services.

Characteristics XML is a human-readable format, but more structured than the CFcase language. An advantage is that XML allows for multiple methods of parsing the text. The first method, called Document Object Model (DOM), reads the full structure as a whole. The second method, called Simple API for XML (SAX), reads the text as a stream of data as it reaches the destination, and is used for messages transmitted in real-time over computer networks. Nevertheless, XML is a pure syntax language: the semantics (meaning) has to be interpreted by some programmed logic. We use it also as an assignment language, like CFcase. In code listing 4.22 we show the XML equivalent of code listing 4.21.
Current application of XML We have extended the configuration technique to also support XML case files (XCFcase), using DOM parsing. Within COOLFluiD, we use them to configure the applications dynamically while running. However, since XCFcase is less readable than CFcase, users choose to keep using CFcase for hand-written configuration files.



<XCFcase>
  <SimulationA PhysicalModel="NavierStokes" Method="FiniteVolume"/>
  <NavierStokes Pressure="10000" Temperature="288.15"/>
  <FiniteVolume Scheme="RoeFlux"/>
</XCFcase>

Code Listing 4.22: Example of XCFcase language.

Future application of XML A future use of the XML configuration is to transfer dynamic configuration messages. As an example, suppose that a component-based application is deployed on a server. An end-user sits in front of a Graphical User Interface (GUI) client which connects to the remote server. The dynamic configuration messages sent by the client to the server should be described in XML rather than CFcase, because XML is suitable for being parsed in parts, using the SAX method. XML will also be used for dynamic generation of documentation of component options. Consider again the client-server configuration mentioned above. Assuming the user might load user-defined components, the separation of client and server makes it complicated to inform the GUI client about the options of the new components. A description of the available options, their format and their documentation should be done with XML, and not CFcase, because the information is descriptive and not imperative (composed of assignments). Moreover, the structure must be received and parsed as a whole, which suits the DOM parsing method.

Future interactive dynamic client-server Both previous situations could be combined to form an interactive dynamic client-server architecture, as depicted in figure 4.9. Also exemplified in figure 4.9 is a scenario of a scientific simulation, where a design engineer works within a CAD/CAE software tool. The GUI of this tool could be generated automatically based on the description of the components that the user selects. As an example, the selection of a heat transfer simulation model would trigger the GUI to show the options related to heat conduction coefficients and boundary conditions related to heat transfer. The options would be sent to the server, which would initiate the simulation on a high-performance machine, e.g., a computation cluster. When the solution is ready, the server sends the relevant information to the GUI client for visualisation.



[Figure: an end-user interacts with a GUI client, which exchanges XML SAX configuration messages and XML DOM option documentation with a server over the network; the server runs the simulation on a cluster. In the example, a design engineer uses a CAD/CAE client program that sends loaded components and simulation options to a simulation server, and receives option descriptions and simulation results from the high-performance computation.]

Figure 4.9: Client-server architecture using dynamic generation of options in GUI and dynamic component configuration.

4.2.7 Summary of the technique

The technique presented in this section describes how to configure objects in a component environment according to user runtime decisions. Some configurations are simple assignments of values, while others are more complicated static parametrisations of components. A centralised configuration facility was developed to handle dynamic user configurations. This allows an engineer to interactively steer a simulation, changing options without the need to restart it. A simple configuration language was developed, expressed in text assignments and XML structure. This language is used by advanced users to set up their simulations.

On component encapsulation The main innovation of the technique is the self-configuration concept. It ensures that in a component architecture the kernel is not exposed to details of specific components. This maximises component reuse. Encapsulation of configuration options is especially important if it is not possible to foresee all the options a component may have. Exposing them would clutter the kernel with unnecessary component details.

On usability We did not implement the full language that was described in section 4.2.2. The reason is that we want to provide a configuration system that is simple to use. This is the argument for making CFcase (and XCFcase) a simple assignment language. A full language, with syntax and complete configuration semantics, would be difficult to parse and even more difficult for non-experts to read.



4.3 Automatic runtime instantiation of static components

Multidisciplinary simulations Multidisciplinary scientific simulations have recently seen wide acceptance by industry as tools integrated into the design cycle of products. The multidisciplinary character of these tools requires the interaction of many different types of discretisation methods for partial differential equations (PDE). Not only may the numerical methods differ in type, but multiple PDEs may be involved in a given simulation.

Flexibility and runtime extension This multitude of numerical methods and physical models requires great flexibility of the simulation environment. Moreover, to reduce investment in development, numerical algorithms should be reused as often as possible. Given the diverse nature of the simulations, it is desirable that end-users be able to extend the capabilities of the simulation environment themselves at runtime. These extensions allow simulations tailored for very specific applications. For example, in high-enthalpy flow simulations, such as when an aerospace vehicle re-enters the earth's atmosphere, the dissociation of the gas into ions can be modelled with varying numbers of ion species. This modelling involves the discretisation of many chemical reactions. The end-user may want to model air with 5, 11 or 21 species, or simply add a newly developed model. Defining beforehand all the numerical schemes that discretise such equations is error-prone and does not provide the flexibility to extend the available functionality.

High performance Such concerns of flexibility and capacity to extend functionality have to be balanced with the necessity for high performance of the final application tool. These simulations involve many hours of computation on large clusters of distributed memory computers. Therefore, flexibility and the runtime extension of functionality cannot be included at the expense of efficiency.
Frameworks and components To build such flexible, high-performance simulation environments, techniques using frameworks and component architectures are of great interest. Frameworks are employed to promote the reuse of design solutions. For example, there are frameworks for parallel simulation of PDEs in distributed


memory computers, and others for deployment in grid environments. Component architectures are, on the other hand, employed to promote reuse of implementations of specific algorithms. Components are building blocks that can be reused to build user-tailored applications. For example, in a PDE simulation environment one component can describe the set of PDEs to solve, say the Navier-Stokes equations, while another can describe PDE integration algorithms, say the Finite Element Method (FEM). The first component can be used to configure the second and produce a tailored application, where the FEM is used to solve the Navier-Stokes equations. When we construct another PDE component, say the magnetohydrodynamics (MHD) equations, the advantage is that we reuse the same FEM component with the new MHD component, by simply plugging them together. In such a flexible environment, a system for binding these components has to be chosen. This binding of components can be done either via dynamic or static binding.

4.3.1 Problem statement

Dynamic binding Dynamically bound code allows building applications which delay configuration options to runtime, and thus allow the end-user to control them. As already stated, this runtime control is advantageous for a multi-purpose simulation environment where, for example, a user can choose multiple discretisation methods or multiple physical models (PDEs). However, in C++, dynamically bound code relies on abstractions, virtual dispatches and mechanisms of runtime identification. All these mechanisms hinder the task of the compiler when optimising the code, and impact the overall efficiency.

Static binding On the other hand, statically bound code has recently seen growing interest from the scientific computing community in the form of C++ templates. Using meta-programming, these techniques are based on compile-time configuration decisions which bind components to each other without abstractions. The lack of abstractions enables a multitude of compiler optimisations, leading to extremely efficient code. Nevertheless, this static binding introduces limitations in the flexibility and usability of the code, as each application is a static combination of components which can no longer be changed. This freezing of the combination of components means that, after deployment, the end-user has little room for manoeuvre in the configuration of the simulations.
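The two binding styles can be contrasted on a toy numerical scheme; the classes and the flux formula below are invented for illustration only:

```cpp
#include <cassert>

// dynamic binding: the scheme is selected at runtime through a virtual call
struct Scheme {
  virtual ~Scheme() {}
  virtual double flux(double u) const = 0;
};
struct Upwind : Scheme {
  virtual double flux(double u) const { return 2.0 * u; }  // toy formula
};

double applyDynamic(const Scheme& s, double u) { return s.flux(u); }

// static binding: the scheme is fixed at compile time as a template
// parameter, so the (non-virtual) call can be resolved and inlined
struct UpwindStatic {
  double flux(double u) const { return 2.0 * u; }  // same toy formula
};

template <typename SCHEME>
double applyStatic(const SCHEME& s, double u) { return s.flux(u); }
```

Both variants compute the same result; the difference is that applyDynamic must go through a virtual dispatch on every call, while each instantiation of applyStatic is a fixed combination known to the compiler, which is precisely the flexibility-versus-performance trade-off discussed above.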


Automatic runtime instantiation of static components (4.3)

Goals Ideally, one would like to have the advantages of both bindings and none of the disadvantages. This would be code that delays to runtime the configuration options of which components are bound together, but removes the dynamic dependencies that affect the performance of the final product. The C++ language, which is used to build large numerical frameworks and components [11, 38, 141], features both paradigms of development: dynamic binding, via language features like class inheritance, virtual dispatch and function overriding, and static binding, via template types, template metaprogramming and overloading. Making these paradigms work together would require that, at runtime, the functionality present in the application could be enhanced with a new combination of components, statically bound together. However, C++ offers runtime configuration only through dynamic polymorphism, via virtual functions. Moreover, all functionality is encoded in the object code and therefore predefined at compile time. C++ does not have direct language features to dynamically load classes, although applying certain techniques, as described in [15, 18, 95], mitigates this problem. Summarising, the dynamic creation and selection of new components requires language features for runtime modification of the code, which are missing in the C++ language. Therefore it is not possible, based purely on language constructs, to defer configuration decisions of static code to runtime. In this section we present a technique that uses delayed, runtime instantiation of C++ template code to overcome this problem.

4.3.1.1 Using static components

Advantages Using statically defined component parameters is a way of improving performance. To illustrate this we look at the results of a benchmark of several matrix-vector libraries.
The benchmark was performed over a matrix-vector operation of the form:

z = 2 · A · x + 1.5 · y    (4.5)
where A is a square matrix and x, y, z are vectors of given size. This operation is noteworthy because it can be implemented with a vector assignment followed by a single call to a BLAS [42] library function. Therefore, this operation is a strong candidate for diverse compiler optimisations. Some of these optimisations, such as loop-unrolling, are based on knowledge of the matrix and vector sizes.


The vector size varied from 1 to 50 for all libraries, which were compiled with the same compilation and linking options. In figure 4.10, each data point resulted from the average of 10 tests on a collection of 800 thousand matrices and vectors on which the operation in equation (4.5) was performed.

[Figure: benchmark of the matrix-vector libraries Blitz, Flens, TinyBlitz and tvmet for the operation z = 2Ax + 1.5y; vertical axis Time/Elem x 10E-6 [s/e], horizontal axis vector size. (a) Full range of tested sizes (1-50); (b) zoomed plot for small sizes (up to 20).]

Figure 4.10: Matrix-vector library benchmark

The libraries Blitz [133, 135] and Flens [81] define the size parameter as a dynamic variable, chosen at runtime. The libraries TinyBlitz [133] and tvmet [102] define it as a static template parameter, hardwired at compile time. Figure 4.10 shows that the libraries that define the size parameter statically perform better than the ones that define it dynamically, for vector sizes up to 20. The efficiency gained in this particular case seems to be connected to loop unrolling and memory cache optimisation. For certain types of scientific applications, these optimisations are of utmost importance. At the core of these applications there often exists a computation kernel performing operations on matrices and vectors of small sizes. These sizes are given, for instance, by the number of partial differential equations being modelled. An example of such a kernel is the element matrix assembly in a finite element method [146]. This size is usually small, ranging from 3 up to 10 equations in most cases and up to 20 for very specific problems [101]. The code implementing this computation kernel is sometimes referred to as fast-path code because it is executed many times over. The number of executions scales with the size of the overall problem, for instance with


the number of elements. Therefore, this is a critical path in terms of high performance of the simulation. Making these computation kernels as efficient as possible is a high priority. A conclusion from this benchmark is that the implementation of these fast-path codes benefits in efficiency if the code is statically parametrised with the size of the matrix and vectors. For static parametrisation to be possible, the size must be known when the code is compiled. This improved efficiency of static parametrisation is confirmed by evidence [103, 134, 137] that C++ template matrix-vector operation kernels perform well, sometimes better than hand-coded Fortran77 and C routines, and better than some BLAS implementations for small sizes.

void dgemv (double* a, double* x, double* y,
            double cax, double cy, double* z, unsigned int size)
{
  for (unsigned int i = 0; i < size; ++i) {
    z[i] = cy * y[i];
    for (unsigned int j = 0; j < size; ++j) {
      z[i] += cax * a[i + j * size] * x[j];
    }
  }
}
// usage
dgemv (a, x, y, cax, cy, z, size);

Code Listing 4.23: Dynamic sized algorithm for equation (4.5).

template <typename TYPE, size_t SIZE>
void dgemv (TYPE* a, TYPE* x, TYPE* y, TYPE cax, TYPE cy, TYPE* z)
{
  for (size_t i = 0; i < SIZE; ++i) {
    z[i] = cy * y[i];
    for (size_t j = 0; j < SIZE; ++j) {
      z[i] += cax * a[i + j * SIZE] * x[j];
    }
  }
}
// usage
dgemv<double, 10> (a, x, y, cax, cy, z);

Code Listing 4.24: Static sized C++ template algorithm.

In listing 4.23, the algorithm for equation (4.5) using a dynamic size is shown. The size is passed into the function as a variable during execution. Because this size information is not available at compile time, the compiler is not able to unroll any of the loops. The same algorithm is reproduced in listing 4.24,


where it is implemented as a template function whose parameters are the TYPE and SIZE of the matrix and vectors. This parametrisation allows the compiler to optimise the algorithm for a given size, in this case 10, as shown in the last line. It also allows the same function to be used for other types, for example matrices of complex numbers.

Problem with static components A problem arises: if the size is predefined at compilation time and it depends on the physical model, no choice is left for the end-user as to which physics to simulate. More generally speaking, statically defining components limits the flexibility for the user of the final application. We are interested in the development of a multidisciplinary, general purpose simulation environment, where components can be dynamically configured at runtime. As an example, the selection of the physical model has to be deferred to the end-user. This implies that the size of the system of PDE's will only be known at runtime. Another problem is related to the runtime addition of externally developed functionality. It is of interest that the simulation environment allows the end-user to plug in functionality not initially present. For example, the existing components may allow solving the MHD equations with an FVM component, but it should be possible for users to plug in their own set of PDE's, or to provide another discretisation method like FEM. All of this should be possible without sacrificing performance, thus by allowing the use of statically defined algorithms where it matters most.

4.3.1.2 Solution based on template algorithms

First solution A recurrent solution to the problem defined in the previous section is to devise a general algorithm and implement it using C++ templates. The template parameters are selected at the point of instantiation, during compilation.
This allows the compiler to infer information about the configuration of the algorithm and hopefully provide good optimisations.

AlgorithmA<Euler::size>   aEulerAlgo;
AlgorithmA<NStokes::size> aNStokesAlgo;
...
AlgorithmA<Air21::size>   aAir21Algo;

Code Listing 4.25: Multiple instances of same algorithm.

The next step is to provide a list of instantiation points, each defining the algorithm with a specific parameter, as seen in listing 4.25. At each point


the compiler re-compiles and optimises AlgorithmA with the size of each physical model. An analysis of listing 4.25 raises many questions. How long should that list of instances be? What is the criterion to add more instances? Should we add one instance for every model or just for the ones which require improved efficiency? The problem becomes more difficult if we take into account that, sometimes, physical models are dynamic by nature, not having a fixed number of PDE's. For instance, a physical model for high-enthalpy air flow can consider multiple species present in the reactive fluid, e.g. air modelled with 5, 11 or more species; therefore the number of equations to be solved changes with the user parameter. Another problem is how to handle extensions at runtime. If an end-user provides a physical model of his own authorship, how to dynamically extend the list of instances?

More problems In reality, an algorithm may have more than one template parameter. Creating an instance for all the possible combinations of algorithms quickly leads to an explosion of the number of instances. This is illustrated in listing 4.26, where an algorithm with 3 template parameters for the integration of finite element matrices is presented. This algorithm implies S × P × R combinations, where S, P and R are respectively the number of possible occurrences of the template parameters SIZE, INTERPOL and RULE.

template <int SIZE, class INTERPOL, class RULE>
class Integrate { ... };

Integrate <Euler::size,   LagrangeP1, Gauss1> EulerP1G1;
Integrate <Euler::size,   LagrangeP2, Gauss1> EulerP2G1;
Integrate <Euler::size,   LagrangeP1, Gauss2> EulerP1G2;
Integrate <Euler::size,   LagrangeP2, Gauss2> EulerP2G2;
...
Integrate <NStokes::size, LagrangeP1, Gauss1> NStokesP1G1;
...
Integrate <Air21::size,   LagrangeP1, Gauss1> Air21P1G1;

Code Listing 4.26: Explosion of number of instances of same algorithm.

Adding and maintaining an up-to-date list of all these possible combinations is not only tiresome, but potentially error-prone. Moreover, such an implementation is still not truly extensible. An end-user would not be able to plug in his own physical model, for instance one provided by some externally developed library, and hope that it works with the predefined algorithms.



4.3.2 Automatic instantiation

Alternative solution To remedy the problems presented in the previous section, another solution is proposed. This solution acknowledges the limitation of the C++ language that new functionality cannot be loaded dynamically using pure language constructs. Nevertheless, it still uses the same definition of the algorithms as C++ templates. However, it introduces a process by which these algorithms are parametrised only at runtime, based on the end-user choices. This parametrisation is immediately compiled into new object code, in the form of a loadable library. This library is dynamically loaded to enrich the running application with the new functionality. This procedure follows four steps:

1. Delay to runtime the instantiation of templates, by storing the template code in a source repository;
2. Automatically generate and compile the code, based on end-user choices;
3. Assemble a dynamically loadable library with the newly compiled algorithm;
4. Load the library and use the freshly built algorithm.

Composing entities This procedure works as an automatic instantiation and loading device for algorithms implemented using C++ templates. It is composed of four interacting entities, each responsible for one of the steps of the mechanism. The mechanism and the interacting entities are depicted in figure 4.11.

1. SourceRepository: implemented as a collection of SourceProvider's which store the source code.
2. CodeBuilder: responsible for assembling the algorithm with the end-user parameters.
3. Compiler: compiles the assembled code and creates a loadable library.
4. LibraryLoader: dynamically loads the library into the application.

The design of this technique, which we named template class loading, and its intervening entities are described in the following sections.
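The code-generation step of this procedure can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function build_source and its arguments are assumptions, and only quoted includes are emitted for brevity; the BScheme and NS3DAir11 identifiers follow the running example of this section.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of the CodeBuilder step: given the algorithm name,
// the headers it needs, the textual template arguments chosen at runtime,
// and a mangled unique name, emit a translation unit that declares a
// self-registering Provider (as in the generated source of listing 4.28).
std::string build_source(const std::string& algo,
                         const std::vector<std::string>& headers,
                         const std::vector<std::string>& args,
                         const std::string& mangled)
{
    std::ostringstream src;
    for (const auto& h : headers)
        src << "#include \"" << h << "\"\n";
    src << "Provider< " << algo << "< ";
    for (std::size_t i = 0; i < args.size(); ++i)
        src << (i ? ", " : "") << args[i];
    src << " > > a" << mangled << "Prov(\"" << mangled << "\");\n";
    return src.str();
}
```

For the running example, calling build_source("BScheme", {"BScheme.hh", "NS3DAir11.hh"}, {"NS3DAir11", "3"}, "BScheme_NS3DAir11_3") produces text equivalent to the generated source shown later in listing 4.28; the real CodeBuilder would write this string to a .cxx file for the Compiler entity.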



[Figure: the end-user choices (physical model: NavierStokes 3D Air11; solver algorithm: RDS B scheme, 3rd order) are translated into derived parameters (Model NS3DAir11; BScheme NS3DAir11 ORDER3). The TemplateClassLoader then drives the chain: SourceProvider BScheme (BScheme.hh) → CodeBuilder (BScheme_NS3DAir11_3.cxx) → Compiler (libBScheme_NS3DAir11_3.so) → LibraryLoader.]

Figure 4.11: Automatic loading mechanism of static components.



4.3.3 Template class loader

The TemplateClassLoader class performs two interconnected tasks. It stores the template algorithms in a SourceRepository and generates the code assembled from those algorithms (see section 4.3.4). The template algorithms are stored as C++ source files and must be organised following a template strategy pattern, as defined in section 4.3.5. The combined use of these two tasks results in the automatic runtime template instantiation technique explained in section 4.3.2. These two tasks can be identified with the two interface functions load_class and create_object, as presented in the class declaration in code listing 4.27. The class diagram, with the collaborations between all entities, is presented in figure 4.12.

template <class BASETYPE>
class TemplateClassLoader {
public:
  /// list of headers
  typedef list<string> THeaders;
  /// template parameters
  typedef pair<string, THeaders> TPar;

  /// loads the concrete algorithm class
  BASETYPE::PROVIDER load_class (string classname, list<TPar> p);

  /// creates an instance of the algorithm
  /// and loads the class if not yet loaded
  BASETYPE* create_object (string classname, list<TPar> p);

private:
  /// loads a dynamically loadable library
  /// keeps track of the loaded libraries
  void load_lib (string classname, list<TPar> p);

  /// generates the code, creates the library,
  /// and keeps track of the generated libraries
  void create_lib (string classname, list<TPar> p);
};

Code Listing 4.27: Template class loader definition.

4.3.4 Source repository, Code builder and Compiler

Example To understand how the mechanism works as a whole, we will go through the process as presented in figure 4.11, following the depicted example. Suppose that the end-user chooses to solve a simulation of high-enthalpy flows, modelled by the Navier-Stokes equations on a real gas, air composed


[Figure: class diagram. TemplateClassLoader<BASEALGO> (load_class(): BASEALGO::PROVIDER, create_object(): BASEALGO*) collaborates with the CodeBuilder (build_source()), the SourceRepository holding SourceProvider<BASEALGO>'s (algo_name: string, get_parameters(): list, get_include_paths(): list, get_link_libraries(): list, get_headers(): list), the Compiler (compile_source(), link_library()) and the LibraryLoader (load_library(libname:string), set_search_paths(list:path)).]

Figure 4.12: Class diagram of the Template class loader.

of 11 species. These equations are to be discretised using a Residual Distribution Scheme (RDS), in particular the third order B scheme. Somehow these choices are passed to the simulation interface, and translated into the parameters that a scheme should receive:

• SIZE: NSAir11::size — headers: NSAir11.hh
• ORDER: 3 — headers: none

To define the first parameter, the header that defines the physical model is needed; therefore the environment will pass to the TemplateClassLoader a parameter with NSAir11.hh. For the second parameter, no headers are needed to define it, since it is an integer. The environment informs the TemplateClassLoader that it intends to load the algorithm BScheme with the mentioned parameters. This is done via the load_class function.


Source repository Next the TemplateClassLoader requests the sources associated with the key BScheme from the SourceRepository. These are C++ template sources stored in the repository within a database. The database holds the SourceProvider's sorted by keys based on their names. At this point, the TemplateClassLoader performs some sanity checks on the parameters, namely checking if the parameters passed by the environment match the ones the template algorithm needs. This is done using the function get_parameters from the SourceProvider. The source code of the algorithm is defined in a header file holding the declaration and definition of the algorithm. This is accessed by the function get_headers, for which the first header of the list is BScheme.hh. The remaining entries are headers that need to be included in order to compile successfully. Typically these are system headers like the standard template library (STL) vector and algorithm.

Code builder The code builder receives the template algorithm from the SourceRepository in the form of the list of headers, and creates a C++ source file. It places at the beginning the headers that describe the algorithm, together with the headers that describe the parameters. For the ongoing example this results in the source code shown in listing 4.28. The meaning of the Provider is defined in section 4.1.

// headers that the algorithm needs
#include <vector>
#include <algorithm>

// header that defines the algorithm
#include "BScheme.hh"

// headers that define parameters
#include "NS3DAir11.hh"

Provider< BScheme<NS3DAir11, 3> >
  aBScheme_NS3DAir11_3Prov ("BScheme_NS3DAir11_3");

Code Listing 4.28: Generated source code.

The generated source code is put into a file whose name is generated from the names of the intervening parameters. In this case, the generated source file is BScheme_NS3DAir11_3.cxx. This parametrisation of the BScheme component is weak in the sense that no check is made for the validity of the parametrisation. For instance, a third order parameter might be invalid or make no sense. But usually that would, or should, result in a compilation error that is interactively signalled to the end-user, informing him that the compilation failed.
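The naming scheme is not spelled out beyond the example, but the pattern visible in BScheme_NS3DAir11_3 can be reproduced with a small helper. This is a sketch under that assumption; the function names mangle_name, source_file and linux_library are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of deriving a unique name from the algorithm name and its
// textual template arguments, mirroring the example BScheme_NS3DAir11_3.
std::string mangle_name(const std::string& algo,
                        const std::vector<std::string>& args)
{
    std::string name = algo;
    for (const auto& a : args)
        name += "_" + a;  // underscore-joined, as in the thesis example
    return name;
}

// File names derived from the mangled name: the generated translation
// unit, and the loadable library name following Linux conventions.
std::string source_file(const std::string& mangled)   { return mangled + ".cxx"; }
std::string linux_library(const std::string& mangled) { return "lib" + mangled + ".so"; }
```

With this scheme, mangle_name("BScheme", {"NS3DAir11", "3"}) yields "BScheme_NS3DAir11_3", from which both the .cxx file and the libBScheme_NS3DAir11_3.so library name of the next paragraph follow.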


Compiler The source code is then compiled into object code and linked into a loadable library. The name of the library depends on the operating system. For instance, on Linux it results in libBScheme_NS3DAir11_3.so. Therefore, to allow this technique to be cross-platform, the way the object code is built must be abstracted. This is done by the class Compiler. In fact, the Compiler class wraps the usage not only of the available C++ compiler but also of the operating system linker. The C++ compiler also needs the location of the definition headers, and possibly of some external headers. This information is accessible through the SourceProvider function get_include_paths. The algorithm to be compiled may optionally use some external library, for example BLAS. Therefore, the operating system linker needs the name and location of those libraries. This information is also accessible through the SourceProvider, via the function get_link_libraries. The Compiler class inheritance structure has two levels, as shown in figure 4.13. The first level abstracts the operating system dependencies, and defines information like library and object prefixes and extensions, which linker to use, etc. The second level abstracts the compiler toolset, since multiple toolsets may be present on each platform.

[Figure: class diagram of the Compiler hierarchy. The abstract Compiler (compile_source(), link_library(), with hooks libprefix(): string, libext(): string, objext(): string, compiler_command(): string, linker_command(): string, compiler_opts(): string, linker_opts(): string) is specialised first per operating system (LinuxCompiler, MacOSXCompiler, Win32Compiler) and then per concrete toolset (LinuxGnuCC, LinuxIntelCC, MacGnuCC, MSVC, Win32IntelCC).]

Figure 4.13: Class diagram of the Compiler interface class hierarchy.

Preferably, the build system should automatically generate the concrete compiler implementation classes based on the actual platform being used. This is possible with cross-platform build systems, such as CMake [88], which are knowledgeable of the compiler toolset being used.
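The two-level hierarchy can be sketched as follows. This is an illustrative reduction, not the thesis implementation: the command lines and flags (g++, -fPIC, -shared) are assumptions for a GNU toolchain on Linux, and only the string-building part is shown; a real Compiler would hand these strings to a system call.

```cpp
#include <cassert>
#include <string>

// Sketch of the two-level Compiler hierarchy: the base class assembles
// generic compile/link command lines from virtual hooks; subclasses fill
// in OS conventions (prefixes, extensions) and toolset commands.
class Compiler {
public:
    virtual ~Compiler() {}
    // Build the command that compiles one generated source file.
    std::string compile_source(const std::string& src) {
        return compiler_command() + " " + compiler_opts() + " -c " + src;
    }
    // Build the command that links the object into a loadable library.
    std::string link_library(const std::string& obj, const std::string& name) {
        return linker_command() + " " + linker_opts() + " -o "
             + libprefix() + name + libext() + " " + obj;
    }
protected:
    virtual std::string libprefix() const = 0;        // OS level
    virtual std::string libext() const = 0;           // OS level
    virtual std::string compiler_command() const = 0; // toolset level
    virtual std::string linker_command() const = 0;   // toolset level
    virtual std::string compiler_opts() const = 0;
    virtual std::string linker_opts() const = 0;
};

// Concrete leaf for the GNU toolchain on Linux (illustrative flags only).
class LinuxGnuCC : public Compiler {
protected:
    std::string libprefix() const override { return "lib"; }
    std::string libext() const override { return ".so"; }
    std::string compiler_command() const override { return "g++"; }
    std::string linker_command() const override { return "g++"; }
    std::string compiler_opts() const override { return "-fPIC -O2"; }
    std::string linker_opts() const override { return "-shared"; }
};
```

For the running example, a LinuxGnuCC instance turns BScheme_NS3DAir11_3.cxx into "g++ -fPIC -O2 -c BScheme_NS3DAir11_3.cxx" and links the result into libBScheme_NS3DAir11_3.so.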


The Compiler will compile and generate a dynamically loadable library and place it in the filesystem. This is performed by the compile_source and link_library functions. The first constructs a command using the compiler, which gets executed by a system call. The second also makes use of a system call, this time to use the linker to construct the loadable library.

Library loader LibraryLoader is the class responsible for loading the libraries. Its class diagram is presented in figure 4.14. There are many procedures to dynamically load libraries into a running application. The most challenging problem is to abstract the operating system dependencies. We chose to delegate all the work to the lower classes of the hierarchy, which are operating system dependent. For this reason the interface of the parent class is reduced to a minimal set of functions.

[Figure: class diagram of the LibraryLoader hierarchy. The interface (load_library(libname:string), set_search_paths(list:path)) is implemented by LibTool (cross-platform: win32, linux, macosx), PosixDlOpen (POSIX compliant: unix, linux, macosx) and Win32LoadLibrary (Microsoft WIN32 API).]

Figure 4.14: Class diagram of the LibraryLoader class hierarchy.
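The name-resolution half of this interface can be sketched portably. This is an assumption-laden reduction: the candidates helper is hypothetical, POSIX "lib<name>.so" conventions are hard-coded, and the actual loading call (dlopen on POSIX, LoadLibrary on Win32) is only indicated in a comment so the sketch stays platform-neutral.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of LibraryLoader name resolution: given a plain library name,
// produce the OS-specific candidate paths that a concrete loader
// (PosixDlOpen, Win32LoadLibrary, ...) would try in order.
class LibraryLoader {
public:
    void set_search_paths(const std::vector<std::string>& paths) {
        m_paths = paths;
    }
    // Candidate file names, assuming POSIX "lib<name>.so" conventions;
    // a Win32 subclass would instead produce "<name>.dll".
    std::vector<std::string> candidates(const std::string& name) const {
        std::vector<std::string> out;
        for (const auto& p : m_paths)
            out.push_back(p + "/lib" + name + ".so");
        return out;
    }
    // load_library(name) would walk candidates(name) and, on POSIX,
    // call dlopen(path, RTLD_NOW | RTLD_GLOBAL) until one succeeds,
    // which runs the global Provider constructors in that library.
private:
    std::vector<std::string> m_paths;
};
```

The search-path list is supplied once via set_search_paths; each freshly linked library such as libBScheme_NS3DAir11_3.so is then found by name alone.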

One function, set_search_paths, serves to inform the loader where to search for the libraries in the local filesystem. This is done by supplying a list of operating system independent file paths. Another function, load_library, will load the library with a given name, automatically choosing the correct procedure and file suffix and prefix according to the operating system. The name of the library to load is returned by the call to the function link_library of the Compiler class. Once the library is loaded, the dynamic extension of functionality makes use of the self-registration technique described in section 4.1. In this case, the variable aBScheme_NS3DAir11_3Prov is a global variable that registers how to create BScheme's with those specific parameters, under


the name "BScheme_NS3DAir11_3". After this step, the TemplateClassLoader can answer any requests to the function create_object by retrieving the correct provider from the factory and calling the function create to return instances of objects of the correct type.

4.3.5 Template strategy pattern

The technique described so far automatically instantiates algorithms organised in a template strategy design pattern. This is a strategy pattern implemented using C++ templates, based on the strategy pattern described in [51], hereafter called the reference strategy pattern.

Reference strategy pattern This design allows a family of algorithms with a common interface to be exchanged without affecting the client code. The design is very effective at factoring out common functionality within the algorithm and avoiding sub-classing in the client code. The family of algorithms may provide different behaviours, for instance different quadrature integrations under the same integrate interface. Or the family of algorithms may provide the same visible effects but with different implementations, featuring different time and space trade-offs, for instance the same sorting algorithm using less or more memory.

[Figure: class diagram of the pattern. A BaseAlgorithm interface (getClassName(): string, compute_stuff()) with template children SomeAlgo<T1, T2, ..., TM> and OtherAlgo<T1, T2, ..., TN>, each with different template parameters.]

Figure 4.15: Class diagram for the Template Strategy pattern.


Strategy pattern with templates The difference between the design pattern used here and the reference strategy pattern is that each child of the inheritance family tree, see figure 4.15, is itself a family of algorithms. It is a family because it is an algorithm templatised with different static parameters, to be configured at runtime by the choices of the end-user. Applying the strategy pattern to the template algorithm BScheme from the example, we get the source code in listing 4.29. Notice that the functions getClassName exist to give a name that identifies the algorithm in the SourceRepository.

// BaseScheme.hh header
#include "Provider.hh"

class BaseScheme {
public:
  // provider for these algorithms
  typedef BaseScheme BASETYPE;
  // interface to the algorithm
  virtual void compute_something () = 0;
  // class name
  static string getClassName () { return "BaseScheme"; }
};

// BScheme.hh header stored into SourceRepository
#include "BaseScheme.hh"

template <class SIZE, class ORDER>
class BScheme : public BaseScheme {
public:
  // definition of the interface class
  typedef BaseScheme BASETYPE;
  // implementation of algorithm
  virtual void compute_something ();
  // class name
  static string getClassName () { return "BScheme"; }
};

Code Listing 4.29: Example source code for the template strategy.

Note that each algorithm may have a different set of template parameters. Each parameter may be a class behaving as a policy or trait class, as described in [4]. To define each parameter type, a different set of header inclusions is needed. This information is provided to the TemplateClassLoader via the parameters of the functions load_class() or create_object(). These functions allow the end-user to pass the information about an externally defined template parameter. Therefore it enables the runtime extension of functionality by the end-user. There is a limitation to using this pattern with the template class loader technique. All the child classes must be atomic, meaning they are defined with only one class definition and do not require multiple class instances


interacting together. This limitation is not strong, and it helps to simplify the process of storing the algorithms in the SourceRepository by assuming one file per algorithm and no inter-dependencies.

4.3.6 Application

This technique was employed within a concrete application, similar to the example followed in the previous sections. A physical model for the Navier-Stokes equations of a perfect gas in 2D was selected. This model was then discretised with a finite volume method, using a Roe flux scheme. This test case is of interest because it uses not only matrix-vector operations in its kernel, but also matrix-matrix operations. Some of the operations involved are described in the equations below:

|A| = R · |Λ| · L    (4.6)

f = 1/2 (fR + fL − |A| · (uR − uL))    (4.7)

where uppercase letters represent matrices and lowercase letters represent vectors. |A| is computed from the multiplication of 3 matrices, where |Λ| is a diagonal matrix. Note that this computation involves other scalar calculations not described in the above equations. In particular, the entries of the vectors fR and fL are computed using the analytical definition of the physical flux and depend on the values of u. Therefore, the performance of this case is closer to the performance of a real application. A benchmark test using the above setup was implemented with the same matrix-vector libraries used in section 4.3.1. The computations were performed on 800,000 elements and averaged over 7 tests for each library.

Library    Blitz   Flens   TinyBlitz   tvmet
Time [s]   2.203   3.536   0.936       0.581

Table 4.1: Benchmark to a Roe flux algorithm, in seconds.

The results of the benchmark, presented in table 4.1, show that the statically parametrised libraries, TinyBlitz and tvmet, outperform both dynamic libraries, Blitz and Flens. Therefore, static parametrisation of the computation kernel is a good candidate for improving the performance of the simulation environment. In such a case, the technique presented makes it possible to retain the full flexibility of the environment.
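The kernel operations of equations (4.6) and (4.7) can be illustrated with a toy fixed-size sketch. This is not the benchmarked code: the 2x2 size, the function names and the identity eigenvector matrices in the usage are placeholder assumptions; in a real Roe scheme R, Λ and L come from the eigendecomposition of the flux Jacobian of the physical model.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat2 = std::array<std::array<double, 2>, 2>;
using Vec2 = std::array<double, 2>;

// |A| = R * |Lambda| * L, where |Lambda| is the diagonal matrix of
// absolute eigenvalues (eq. 4.6); sizes are fixed at compile time, so
// a compiler can fully unroll these loops.
Mat2 absA(const Mat2& R, const Vec2& lambda, const Mat2& L) {
    Mat2 A{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                A[i][j] += R[i][k] * std::fabs(lambda[k]) * L[k][j];
    return A;
}

// f = 1/2 (fR + fL - |A| (uR - uL))  (eq. 4.7)
Vec2 roe_flux(const Vec2& fL, const Vec2& fR, const Mat2& A,
              const Vec2& uL, const Vec2& uR) {
    Vec2 f{};
    for (int i = 0; i < 2; ++i) {
        double jump = 0.0;
        for (int j = 0; j < 2; ++j)
            jump += A[i][j] * (uR[j] - uL[j]);
        f[i] = 0.5 * (fR[i] + fL[i] - jump);
    }
    return f;
}
```

With R = L = identity and eigenvalues {2, -3}, |A| reduces to diag(2, 3), which makes the arithmetic easy to verify by hand; the static 2x2 sizes are what the benchmarked libraries exploit for unrolling.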



4.3.7 Summary of the technique

The presented technique provides an automatic facility for generating statically bound C++ template code, according to user runtime requirements. The aim is to maximise code efficiency without sacrificing the flexibility of the final application. With such a facility, the problem described in section 4.3.1 is overcome by deferring the binding decision to runtime, giving the user the choice of which algorithm to bind with which configuration parameters. By deferring the compilation, it allows the compiler to perform all its optimisations and avoids dynamic dispatch in the computation kernel. This is accomplished with a code builder, which instantiates providers of static template combinations that are compiled on demand and dynamically loaded into the running application. In practice, this extends the available functionality at runtime. We can apply this technique whenever a component (algorithm) needs to be parametrised with user defined information. This information can either be in the form of values (integers, sizes) or, more generally, in the form of other components. We must point out that our technique is not equivalent to, nor does it aim at, providing the C++ language with reflection capabilities. This technique does not allow class and method discovery and identification. The class interface and its methods must be clearly known and preexisting.

Relation to other work Dawes [37] developed a library that loads C++ classes and uses a factory pattern similar to our ClassLoader, but does not support automatic code generation. Carzaniga [29] developed a dynamic class loader with some reflection capabilities, where function discovery is possible, but it also does not support code generation. A similar technique, described by Langtangen and Cai in [28, 75], is based on using an interpreted language like Python, from which C or C++ modules are automatically generated.
The end result is a flexible scripting environment for creating programs, but with tailored implementations for the CPU intensive tasks, written in compiled languages.


Chapter 5

Concepts that have proven useful in ordering things easily achieve such authority over us that we forget their earthly origins and accept them as unalterable givens. Albert Einstein (1879-1955)

Framework Design

This chapter presents three techniques to develop the architectural structure of scientific computing tools. These techniques answer a number of questions. In section 5.1, we address what structure a numerical method should have to integrate well with other methods and physical models. We also focus on how each numerical method can be a component of a larger simulation tool. In section 5.2 we look at how different numerical methods can share data and communicate without depending on a specific communication process. We also investigate how communication can be made transparent to the developer. In section 5.3 we try to answer how different numerical methods can share data structures and how these data structures can satisfy each method's specific needs.

5.1 Enabling Components

Often, codes in high performance computing are developed in a monolithic way, with a single purpose in mind. This is sufficient as long as the purpose remains static and the methodologies stay the same. But as the problems to be solved keep pushing the boundaries of the state of the art, new methodologies quickly arise. As a consequence, monolithic codes have to be adapted to the new reality or become legacy codes.

5.1.1 Problem Statement

Dependency problem
The development of monolithic codes is expensive because each functional sub-part is inherently related and tied to the other sub-parts. Changes and evolutions are not localised and propagate throughout the code, requiring many developers to implement and maintain them. Moreover, the entangled dependencies may be responsible for limiting the growth capability of the code. We call this problem the dependency problem.


Object oriented techniques
Object oriented techniques have slowly been brought into the high performance computing community, see [9, 11, 23, 94]. These techniques are helpful because they break the dependencies between the sub-parts of the algorithms: first, by creating objects which have certain responsibilities and perform certain actions, and then by defining interfaces and establishing how objects interact among themselves. The effect is to localise the functionality and reduce the interdependencies within the code. These techniques also promote code re-usability via inheritance, reducing the size of the maintained code.

Type optimisation problem
Nevertheless, OO techniques introduce another problem: applying them indiscriminately leads to undesired impacts on numerical performance due to the overuse of abstractions. An abstraction is a powerful way to disentangle dependencies, but it can also be perilous, because it hides type identification. Compilers rely strongly on type identification for optimisation, and in this case OO may prevent it. We refer to this problem as the type optimisation problem.

Modularity problem
When applying OO techniques, we are often surprised by the large number of objects that appear in the design. Complex arrangements of objects form ad-hoc facilities, for example to handle the mesh data structure, implement a finite element method or perform parallel communication. These arrangements of objects do not enforce barriers to dependencies, and thus do not completely solve the dependency problem. For example, how can we be sure that the mesh data structure is independent of the finite element method? We can implement these arrangements of objects using limited language features such as C++ namespaces [123] or Python modules [75]. But these techniques do not produce a unique interface that defines a code unit with certain defined responsibilities.
By applying the OO techniques we solve part of the dependency problem, and we are left with a new, although reduced, problem, which we refer to as the modularity problem.

Component Architecture
To solve the modularity problem, we have to define a role for arrangements of objects and give them a unique interface that increases the separation of code dependencies. A Component Architecture implements this separation of coarse-grained roles, and it allows certain large structures of the code to be interchanged


without affecting the other structures [125]. These interchangeable structures are called Components. They connect to a central Kernel whose responsibility is to bring all components together and orchestrate their functionality. Some scientific software environments have been developed with a Component Architecture in mind [5], although some are focused on a single discretisation method [141].

Objectives
Our objective is to design a component environment that is an extendable and modular framework for scientific computing, where multiple highly performant discretisation methods and physical models coexist. To accomplish that, we must solve the modularity problem and the type optimisation problem simultaneously. Therefore, the central question in this chapter is: how to develop a numerical code that is modular and extendable but retains the peak performance of a dedicated solver? To answer this question, we choose definitions of a modular, extendable and performant code, to serve as guidelines in the evaluation of the different strategies.

Modular
A code is said to be modular if different functional sub-parts, possibly with different external behaviour, can be interchanged without affecting the other sub-parts. As a consequence, different sub-parts can be developed separately and do not have direct interactions.

Extendable
A code is said to be extendable if the client code does not change as more algorithms are added, i.e., the decision tree does not need to be updated to take the new additions into account. In the scope of a Component Architecture, the Kernel does not need to be changed as more components are added.

Performant
A code is said to be performant if its implementation is as efficient in CPU and memory usage as a typical monolithic code dedicated to a single purpose.

Outline
We will follow the evolution of a prototype code, with the purpose of gradually introducing the various concepts. We will start with a monolithic finite element code, which we assume can be applied to multiple types of equations. We will refactor the existing structure to address the problems of modularity and type optimisation by introducing OO techniques. Then,


we will change the code to support multiple discretisation methods, while still dealing with multiple physics. This will lead us to the implementation of the Component Architecture. The result will be a combination of design patterns [51] which is especially well suited to the implementation of numerical method components. We consider that the present technique can be applied recurrently to interface multiple and diverse numerical methods, such as time discretisation methods, error estimation methods, mesh adaptation and even input-output methods.

5.1.2 Unsatisfactory solutions

5.1.2.1 Starting with Procedural Programming
Consider an existing numerical code for the solution of PDEs based on the Finite Element Method, capable of discretising multiple physical models. A hypothetical prototype of the code is given in code listing 5.1. Observe that the method loops over a large collection of elements, integrating some quantity on each of them, assembling this contribution into a large linear system, solving the linear system, and then updating the nodal vector with the solution.

5.1.2.2 Introducing Object-Oriented Programming
The initial code is implemented in a procedural programming (PP) style, in the C/C++ language. We could argue that the preceding code listing 5.1 could be better designed, say by using function pointers. But our main concern is that the decision tree present in the function ComputeElem() is based on the type of element to be integrated. This decision tree is always needed to select the correct pointer or function. In a way, OO languages do the same with hidden tables of (virtual) functions. Implementing the previous code in an OO way using abstract inheritance, with the C++ language, would yield something like code listing 5.2.

OO Advantage
This solution decouples the client (ComputeElem) from the algorithm implementation (the integrate function). This decoupling allows extending the application with added functionality (a new way of integrating the elements) without the need to update the client.

Disadvantages
This decoupling has a price: the type information is lost. We only know that the object is of type Shape. We have lost the knowledge of what concrete


// global variables
struct Elem e [];   int size_e;
struct Node n [];   int size_n;
struct Result u [];

void FEMSolve ()
{
  ...
  ComputeElem();
  ...
  lss->assemble();
  lss->solve( u );
  updateSolution();
}

void ComputeElem ()
{
  struct Result r;
  int i = 0;
  for ( i = 0; i < size_e; ++i, ++e ) {
    switch ( e->Type ) {
      case TRIAG : r = integrateTriag( e ); break;
      case QUAD  : r = integrateQuad( e );  break;
      /* many more */
    }
    copyToLinearSystem( r );
  }
}

void updateSolution ()
{
  for ( i = 0; i < size_n; ++i, ++n, ++u )
    n += u;
}

Code Listing 5.1: Algorithm implemented with procedural programming.

void ComputeElem ( std::vector< Shape* > s )
{
  Result r;
  for ( int i = 0; i < s.size(); ++i ) {
    // here you have a virtual call
    r = s[i]->integrate();
    copyToLinearSystem( r );
  }
}

Code Listing 5.2: Algorithm implemented with object oriented programming.


shape is to be integrated. Certain algorithmic simplifications become impossible, for example avoiding the computation of expensive Jacobian transformation terms for the simpler elements. Furthermore, it removes the chance for certain compiler optimisations like inlining or interprocedural analysis [44]. It also increases the time spent executing the call, since virtual calls involve more indirections [25].

5.1.2.3 Trying Generic Programming
We try another solution using GP techniques based on type parametrisation, e.g. template meta-programming with C++. In that case, one can recast ComputeElem into code listing 5.3:

template < class TYPE >
void ComputeElem ( std::vector< TYPE* > s )
{
  Result r;
  for ( int i = 0; i < s.size(); ++i ) {
    // every TYPE will implement its own integrate
    r = s[i]->integrate();
    copyToLinearSystem( r );
  }
}

Code Listing 5.3: Algorithm implemented with generic programming.

GP Advantage
With the previous code there is no type discovery at all. The use of template techniques enables several code optimisations, most importantly inlining.

Disadvantage
On the other hand, this solution affects the client code, as the template parameter needs to be specified at some point. A hypothetical adaptation of the client is given in code listing 5.4. Observe that the concrete types are explicitly specified in the client code. For every new type added to the implementation, an adaptation and recompilation of the client code is necessary. According to the definition given in section 5.1.1, this code is not extendable, because adding new functionality requires changing the existing skeleton of the method, the FEMSolve function.


// global variables
struct Node n [];
struct Result u [];

// template parameter specified
std::vector< TRIAG* > elem_tg;
std::vector< QUAD* >  elem_qd;

void FEMSolve ()
{
  ...
  // explicit calls to the template code
  ComputeElem( elem_tg );
  ComputeElem( elem_qd );
  ...
  lss->assemble();
  lss->solve( u );
  updateSol();
}

Code Listing 5.4: Client code adapts to GP parameters.

5.1.2.4 Bringing OO and GP together

Comparison
Neither OO nor GP is fully satisfactory on its own. OO prioritises abstraction to achieve modularity, but at the cost of losing the type identification that allows precious optimisations. The GP technique keeps the type knowledge, which allows multiple optimisations, but exposes the types in the client code, thus sacrificing modularity. The two techniques can be thought of as orthogonal to each other: each prioritises one axis of development, either type abstraction or type binding. Therefore one favours the solution of the modularity problem while the other favours the solution of the type optimisation problem.

Type discovery
Obviously some sort of type discovery is always needed: be it dynamic, as in the OO case, or static, as in the GP case. There are places where we would prefer static type discovery in order to preserve performance. There are other places where dynamic type discovery should be kept for modularity. Bearing this in mind, we ask: can we develop a flexible design that implements modular numerical methods, where type discovery is adapted to each specific situation, balancing between modularity and performance?


5.1.3 Combined solution

In this section we derive a strategy using both OO and GP to create components out of numerical methods while retaining high performance. First, we decompose the numerical method into Commands (section 5.1.3.1) and Strategies (section 5.1.3.2), and show how to keep static type binding (section 5.1.3.3) and share common data (section 5.1.3.4). All is structured with a template method pattern (section 5.1.3.5). The technique is summarised in section 5.1.3.6, where we show how to use it to create different types of components.

5.1.3.1 Adding flexible commands

Command Design Pattern
We apply an adaptation of the Command pattern from [51]. This design pattern is used to encapsulate actions and their parameters. The interesting point is that the client that commands the action does not know how and what action is performed, because the only visible function is execute. This decouples the act of commanding from the act of execution. We use this pattern to customise the numerical methods by selecting which groups of tasks are executed. For example, a Finite Volume solver uses a command to compute the full Jacobian matrix of the problem. This can be done either analytically or by numerical differentiation. Both approaches are implemented as different commands, and the user customises the method by choosing one.

Analogy
This Command pattern works like an uncaring factory manager (the client) who does not care about his employees as long as they do some job. He is there to tell them when to do things, not how or what to do. The employees (the commands) just obey and, for better or worse, do something. You can substitute one employee (command) for another, and the manager will not notice: he will keep ordering him to do something.

Description
To distinguish the adaptation from the original design pattern, we call this design NCommand, denoting that it applies to Numerical methods. To use it, we first create a base class with one virtual function execute.
From this class, we derive all the commands of this method. In figure 5.1 we see that three commands are derived, each doing different actions: one prepares the computation, the second performs elementwise computations and the third applies the boundary conditions (BC).

[Figure: UML class diagram. A Client calls execute() on the abstract NCommand; the commands Prepare, ComputeElements and ApplyBCs derive from it, each implementing execute().]

Figure 5.1: Class hierarchy for Numerical Commands

Example of application
In code listing 5.5 we refactored our example with this design. All commands of the FEM method are derived from the FEMCommand class. Since all have the same interface, the method can use any command for any task. In the example, one command computes the element contributions while the other updates the solution. We also realise that some smart configuration facility must be used to correctly initialise the FEMCommand pointers, for instance following the technique of section 4.2.

Specialisation
The NCommand design allows for the specialisation of actions. If a simulation only involves triangles, we exchange the generic compute command for a more specialised one that takes advantage of that information (see code listing 5.6). This new command is more efficient because it keeps the type information and avoids type discovery. Notice that although this command is conceptually related to the more generic ComputeElem, they do not share any inheritance tree. In this design, we allow commands to be unrelated, to maximise flexibility. For full flexibility, we also need to instantiate the commands without hardcoding their construction in the client code. We can use the self-registration technique (section 4.1) to abstract the constructor, together with the self-configuration technique to let the user select the command by passing a configuration string with its name. An example of this is shown in code listing 5.7. The setParameter call defines a user option holding the name of the computer command as a string. In configure, this option is used to retrieve the matching provider from the factory, and the provider then creates the instance of the command.

Analysis
The differences to the initial prototype are small, yet they bring high flexibility: we can now treat any action in a method equally. The advantage is that we can specialise the actions without influencing the method structure. The disadvantage is that this adds the responsibility of setting up


// same global variables as code listing 5.1

struct FEMCommand {
  virtual void execute () = 0;
};

struct ComputeElem : public FEMCommand {
  virtual void execute () {
    struct Result r;
    for ( i = 0; i < size_e; ++i ) {
      switch ( e[i].Type ) {
        case TRIAG : r = integrateTriag( e[i] ); break;
        case QUAD  : r = integrateQuad( e[i] );  break;
        /* many more */
      }
      copyToLinearSystem( r );
    }
  }
};

struct UpdateSol : public FEMCommand {
  virtual void execute () {
    for ( i = 0; i < size_n; ++i ) n[i] += u[i];
  }
};

void FEMSolve ()
{
  // should be done by a configuration object
  FEMCommand* computer  = new ComputeElem();
  FEMCommand* updatesol = new UpdateSol();
  ...
  computer->execute();
  ...
  lss->assemble();
  lss->solve( u );
  updatesol->execute();
}

Code Listing 5.5: Commands are flexible tasks that can be substituted.

struct ComputeTriag : public FEMCommand {
  virtual void execute () {
    struct Result r;
    int i = 0;
    for ( i = 0; i < size_e; ++i, ++e ) {
      r = integrateTriag( e );
      copyToLinearSystem( r );
    }
  }
};

Code Listing 5.6: Specialised command only for triangles.


// this class allocates the commands for the finite element method
// see section 4.2 for ConfigObject
class FEMConfig : public ConfigObject {

  FEMConfig () // constructor
  {
    // defines options in this object
    addConfigOptionsTo( this );
    // sets the default name for the computer
    computer_name = "ComputeElem";
    setParameter( "Computer", &computer_name );
  }

  static void defineConfigOptions ( OptionList& optlst )
  {
    optlst.addConfigOption< string >( "Computer",
      "Name of the Computer Command." );
  }

  virtual void configure ( ConfigParams user_opts )
  {
    ConfigObject::configure( user_opts );

    // construct the computer command
    // we want to avoid : computer = new ComputeElem();
    Provider< FEMCommand >* provider =
      Factory< FEMCommand >::getInstance().getProvider( computer_name );
    SelfRegistPtr< FEMCommand > computer = provider->create();
  }

}; // end class FEMConfig

Code Listing 5.7: Configuration object that selects and allocates the command.

the actions correctly, using some configuration process. We have thus developed the first contribution towards a modular and extendable design, where NCommands can be substituted with more suitable ones, either with the intent of varying the algorithm or for increased performance.

5.1.3.2 Decoupling strategies

Strategy Design Pattern
Using NCommands we vary algorithms which share the same interface (execute). That allows unrelated actions to be interchanged even if they perform different tasks. Now we see how to vary related algorithms using the Strategy pattern.

Definition
The Strategy design pattern can be considered the dual of the Command pattern, because it provides a specific interface (instead of a generic one) to a family of related algorithms (as opposed to unrelated ones) that perform the same function (rather than alternative functions).


This design constructs a family of algorithms which implement the same task in different ways. The clients use it through the abstract interface, where the algorithm functions are declared. In figure 5.2, we see a hierarchy of integration strategies which has the class Integrator as interface, and three different algorithm implementations (TrapeziumRule, etc.).

[Figure: UML class diagram. ComputeElem holds an Integrator* and binds dynamically to the abstract Integrator (the NStrategy), which is implemented by TrapeziumRule, SimpsonRule and GaussQuadrature (GaussOrder1 . . . GaussOrder4); the template command ComputeGeneric<INTEGRATOR_TYPE> binds statically to a concrete strategy.]

Figure 5.2: Class hierarchy for Numerical Strategies

NCommands use NStrategies
The clients are the NCommands, who use these strategies as helpers in the realisation of their tasks. By varying the strategy implementation, we can vary the effect of each command. Therefore, we say that the NCommands are parametrised with the NStrategies. This parametrisation can be either dynamic or static, depending on the type binding. If the command accesses the strategy through the abstract interface, as ComputeElem does in figure 5.2, then it is dynamic. If a command is parametrised with a concrete strategy, as ComputeGeneric is parametrised with TrapeziumRule, then it is static. Further discussion is presented in section 5.1.3.3.

Example of application
In our example, we are integrating the elements with one single algorithm which needs small adaptations to be used on the different types (triangles, quadrilaterals, etc.). Suppose that we would like to extend this functionality to support multiple types of integration, say the trapezium rule and Gauss quadrature. We would also like to use the latter at many orders of approximation, say up to fourth order. The algorithm has increased its order of variance from one


to three. It would be tedious and error-prone to implement this decision tree explicitly in each NCommand that computes element contributions.

class Integrator {
  virtual Result integrate ( Elem* ) = 0;
};

class GaussLegendre : public Integrator {
  virtual Result integrate ( Elem* ) { ... }
};

class TrapeziumRule : public Integrator {
  virtual Result integrate ( Elem* ) { ... }
};

Code Listing 5.8: Inheritance hierarchy for the integration algorithm.

We extract that functionality into an NStrategy which will be responsible for integrating the elements. Code listing 5.8 presents an implementation of the inheritance tree. Notice how the abstract class Integrator exposes the specific interface integrate(), and how the derived classes implement it.

struct ComputeElem : public FEMCommand {
  virtual void execute () {
    // get or construct the correct integrator
    Integrator* integrator = new GaussLegendre();
    ...
    struct Result r;
    int i = 0;
    for ( i = 0; i < size_e; ++i, ++e ) {
      // here is a virtual function call
      r = integrator->integrate( e );
      copyToLinearSystem( r );
    }
  }
};

Code Listing 5.9: Use of the integration strategy.

In code listing 5.9, we have adapted the command that computes the element contributions using the integration strategy. The allocation of the concrete strategy (TrapeziumRule or GaussLegendre) is left again to some configuration facility depending on user input.


Analysis
Using NStrategies avoids subclassing in the NCommands. Because the Strategy pattern uses inheritance, it factors out functionality common to the algorithms. In the example, the integration algorithm was moved into a strategy hierarchy, which allows other commands to share the same algorithms, for instance for the element or face integration in the boundary conditions.

5.1.3.3 Binding Algorithms by Type

Dynamic Type binding
Analysing the evolution of the example code from section 5.1.3.1 to section 5.1.3.2, we see that we moved the decision tree (the case-switch) from the command into the object Integrator. The code in ComputeElem (code listing 5.9) is dynamically bound via the abstract interface, which prevents type-based optimisation techniques and thus hinders performance.

Static Type binding
To regain performance we can combine the flexible substitution of NCommand with static type binding using GP techniques. This is how the ComputeGeneric command is bound to the strategy TrapeziumRule in figure 5.2. The code, presented in code listing 5.10, defines a command with C++ templates that can be bound to any specific instance of the Integrator family. This binding is done at compilation time and enables the compiler to perform type-based optimisations.

template < class INTEGRATOR_TYPE >
class ComputeGeneric : public FEMCommand {
  virtual void execute () {
    INTEGRATOR_TYPE integrator;
    struct Result r;
    int i = 0;
    for ( i = 0; i < size_e; ++i, ++e ) {
      // here is a static function call
      r = integrator.integrate( e );
      copyToLinearSystem( r );
    }
  }
};

Code Listing 5.10: Command using Generic Programming.

In code listing 5.11, the client code allocates the specific command that the user requests. We can automate the instantiation of these static commands as described in section 4.3.


void FEMSolve ()
{
  // this allocation should be done by
  // a configuration object like FEMConfig
  FEMCommand* computer = new ComputeGeneric< TrapeziumRule >();
  ...
  computer->execute();
  ...
  lss->assemble();
  lss->solve( u );
  updatesol->execute();
}

Code Listing 5.11: Client code using GP command.

Analysis
We have parametrised the NCommand ComputeGeneric with the concrete type of the NStrategy TrapeziumRule. As a consequence, no virtual calls are performed within the function execute, and the type handled is concrete rather than abstract, allowing more compiler optimisations. It is now possible to choose between two ways to bind the strategies into the commands:

Dynamic
Strategies are bound to the commands via their abstract interfaces and provide full flexibility by using runtime type discovery.

Static
Strategies are bound to the commands via their concrete types, thus enabling type-based optimisations, but they can only be used when certain conditions of the simulation are met, for example on a problem discretised only with triangles.

Synergies
We can now appreciate the full flexibility of these two design patterns put together, the NCommand and the NStrategy. The NCommand provides the flexibility to choose between the available commands to customise the numerical methods. This contributes to a modular environment and to the solution of the modularity problem. The NStrategy decouples the algorithms but allows them to be statically bound to the commands, thereby contributing to the solution of the type optimisation problem.

5.1.3.4 Sharing Method Data

Common Data
Commands are isolated tasks, but their actions sum up to the full effect of the numerical method, and often they need to use and modify the same data. Since commands delegate part of their algorithms to strategies, they also need to share this data with those strategies.


Common Strategies
To improve reuse, commands also need to share and reuse strategies. If an integrator is created for the elements, it could be reused for the boundary faces. Putting the strategies in a central place, accessible to all commands, allows them to be reused.

Need of a Grouping Object
To allow commands and strategies to share data and access to strategies, we introduce another concept: the NMethodData. This is a class with only one instance per numerical method, in which all the data relative to the method is grouped and shared. In practice, the NMethodData consists of several accessor functions and a configuration function, where, according to user input, the method algorithms are configured. This is the central point of parametrisation of the numerical method. Moreover, the NMethodData holds ownership of the strategies, which can in turn access the NMethodData. This allows any strategy to access any other strategy. Figure 5.3 shows the collaboration between the commands, the strategies and the NMethodData.

[Figure: UML diagram. The FEMCom commands (ComputeElements, ApplyBCs), deriving from NCommand, access one FEMData instance (the NMethodData) through getMethodData(); FEMData owns the Integrator strategy (NStrategy) and the shared element data.]

Figure 5.3: NCommands share access to a common NMethodData.

Example of application
Continuing our example, suppose we have the ComputeElem command that uses the Integrator strategy, and we now add the ApplyBC command. We would like the new command to reuse the same integration strategy. In code listing 5.12, we create the FEMData class that will act as the NMethodData. We place there the accessor function getIntegrator, which returns the integrator. The integrator is configured with the user options in the function configure. Next, in code listing 5.13, we adapt the commands to access the method data with the getMethodData function and retrieve the integrator.

class FEMData {
public:
  void configure () {
    // allocate integrator according to user input
    integrator = new TrapeziumRule();
  }
  // accessor functions
  Integrator* getIntegrator () { return integrator; }
  Elem* getElements () { return elems; }
private:
  Integrator* integrator;
  Elem* elems;
};

Code Listing 5.12: Example of an NMethodData.

Note that we can reuse the self-registration and self-configuration techniques, used in code listing 5.7 for FEMConfig, with the class FEMData: FEMData can derive from ConfigObject and, according to the user parameters, instantiate all the strategies using providers, as described in section 4.1.

Analysis
All that should be shared within a method should be centralised in the NMethodData. This allows sharing without breaking the encapsulation of the NCommand design or polluting the interface of the NStrategies with accessor and mutator functions. Centralisation also makes the configuration simpler: if a user selects fourth-order integration, the method only needs to configure it in one place, and all commands are instantly updated.

5.1.3.5 Giving structure to the Numerical Method

Each numerical method has invariant parts in its algorithm: for example, FEM must compute the contribution of the elements, then apply the boundary conditions, then solve, etc. The invariants are certain actions that must be done in a certain fixed order.

Providing structure
To vary how the element contributions are computed, we may use the NCommand solution. What remains is to decide when to execute it. What we are asking is: what provides structure to the algorithm? To answer this question, we apply an adaptation of the Template Method


struct ComputeElem : public FEMCommand {
  virtual void execute () {
    // get the shared integrator
    Integrator* integrator = getMethodData().getIntegrator();
    Elem* els = getMethodData().getElements();
    ...
    struct Result r;
    int i = 0;
    for ( i = 0; i < size_e; ++i, ++e ) {
      r = integrator->integrate( e );
      copyToLinearSystem( r );
    }
  }
};

struct ApplyBC : public FEMCommand {
  virtual void execute () {
    // get the shared integrator
    Integrator* integrator = getMethodData().getIntegrator();
    // loop over boundary faces
    for ( i = 0; i < size_f; ++i, ++f ) {
      r = integrator->integrate( f );
      addToLinearSystem( r );
    }
  }
};

Code Listing 5.13: Commands retrieving the shared integrator from the method data.

design pattern [51]. This design provides a skeleton for algorithms, fixing their invariant parts and providing extension points for their variations.

Numerical Method Design
We create a class that represents the NMethod and has a number of public functions, for instance: discretise the domain, adapt the mesh or solve the linear system. Each of these functions is implemented with the template method design, defining the invariant structure of the algorithm. Parts of that algorithm are extension points, where we can vary the behaviour. The extension points are abstract functions or hook functions. Abstract extensions must be specified, while hooks provide a default that may be specialised. For example, in a FEM code a hook method may be the function updating the solution: usually it just takes the solution of the linear system, but we may override it with another implementation that does the same and additionally writes the solution to a file. In figure 5.4, we see how the NMethod class relates to the other participants of the design. It holds ownership of the NCommands and the NMethodData,


Enabling Components (5.1)

[Figure: class diagram. FEM derives from NMethod; it holds method_data: FEMData, the commands compute_elem, apply_bcs and update_sol (of type FEMCom*), solver: LinearSolver, and the functions discretise(), assembleSys(), solveSys() and updateSol(). FEMData derives from NMethodData and owns the shared integrator: Integrator* and elems: Elem*, exposing configure(), getIntegrator() and getElements(). FEMCom derives from NCommand with execute() and getMethodData(): FEMData&; ComputeElements and ApplyBCs derive from FEMCom. Integrator derives from NStrategy with integrate(): Result.]

Figure 5.4: NMethod provides structure to the Commands and Strategies

which in turn holds the ownership over the NStrategy’s. Besides providing structure, the NMethod also works as a wrapper of the method objects, providing a single interface to the outside environment.

Applying the template method pattern Take the example of the FEM code we have been evolving. The algorithm for discretising the computational domain with FEM is given in general terms by:

1) Assemble linear system
   a) Compute element contributions
   b) Apply boundary conditions
2) Solve linear system
3) Update the solution

These actions are the invariants of the algorithm and together they make up the discretise function as in code listing 5.14. Their order is always the same, but what each action concretely does might vary. Each action is implemented as a function. Each function may implement part of the algorithm statically (if invariant) or delegate to an NCommand which works as an extension point (hook function). For example, assembleSys has two hook points:


compute_elem and apply_bcs, but the usage of solver to assemble the system is fixed.

class FEM {
public:
  void discretise() {
    // algorithm structure
    assembleSys();
    solveSys();
    updateSol();
  }
protected:
  virtual void assembleSys() {
    compute_elem->execute(); // extension point (hook)
    solver->assemble();
    apply_bcs->execute();    // extension point (hook)
  }
  virtual void solveSys() {
    solver->assemble();
    solver->solve();
  }
  virtual void updateSol() {
    update_sol->execute();   // extension point (hook)
  }
private:
  FEMData method_data;
  FEMCom* compute_elem;
  FEMCom* apply_bcs;
  FEMCom* update_sol;
  LinearSolver* solver;
};

Code Listing 5.14: Example of NMethod with FEM.

5.1.3.6 Achieving the Component Architecture

Now, assume that our objective is to refactor this code to support alternative discretisation methods, for instance the cell-centred Finite Volume Method (FVM). We would like to isolate the development of all numerical methods into components. First we create an abstract interface that the central Kernel will use to interact with the discretisation methods. In the example, we call this interface SpaceMethod and we present it in code listing 5.15. The SpaceMethod provides a service to the other components in the kernel via the discretise function. We call this kind of abstract class on top of the NMethod’s an AbstractNMethod. This class must derive from the Component class where we place the interac-



class SpaceMethod : public Component {
public:
  virtual void discretise() = 0;
}; // class SpaceMethod

class FEM : public SpaceMethod {
public:
  virtual void discretise() {
    // algorithm structure
    assembleSys();
    solveSys();
    updateSol();
  }
  ...
}; // class FEM

class FVM : public SpaceMethod {
public:
  virtual void discretise() {
    // algorithm structure
    computeFluxes();
    solveSys();
    updateSol();
  }
  ...
}; // class FVM

Code Listing 5.15: Abstraction of the space discretisation.

tion protocol with the Kernel. This protocol is a set of functions that plugs the component into the Kernel and declares its services and data (see section 6.2). Each AbstractNMethod interfaces a collection of algorithms of a certain type. Under the SpaceMethod interface we have implemented discretisation methods like Discontinuous Galerkin, Spectral Finite Volume, Fluctuation Splitting, etc. We can create other AbstractNMethod classes for other tasks in the component environment. Other examples could be an ErrorEstimator, providing an abstract estimate function, or a LinearSystem solver presenting solve, getMatrix and getRHS interface functions.

Method-Command-Strategy The designs presented so far, the NCommand, the NStrategy and the NMethod, form under the AbstractNMethod what we call the Method-Command-Strategy (MCS) pattern. This architectural design is reused for the implementation of each numerical component. In figure 5.5 we see the collaboration diagram of the MCS pattern. The class AbstractNMethod provides the interface visible to the Kernel. In the figure this interface is templateMethodA and templateMethodB. Following



[Figure: collaboration diagram. The Kernel holds one AbstractNMethod exposing the pure virtual functions configure(Args), templateMethodA() and templateMethodB(). NMethod implements them; each template method calls private actions (doActionA(), doActionB(), doActionC()) which in turn execute AbstractNCommand members (com1->execute(), com3->execute(), ...). NMethod holds one NMethodData, which exposes getSomeStrategy(), getOtherStrategy() and getSomeData() and owns the NStrategy's (doStuff(), computeMore()); each NCommand exposes execute().]

Figure 5.5: Method-Command-Strategy (MCS) Pattern

the technique of section 4.2, it also provides a configure function to allow the Kernel to pass user options. Next we see in figure 5.5 the NMethod deriving from the AbstractNMethod and implementing its functionality. Each of the interface functions (templateMethodA and templateMethodB) is implemented by a template method pattern. This design provides structure to the algorithms. This structure is composed of function calls, e.g., doActionA and doActionB. Some actions have static algorithms but others have flexible points (hooks) that can be configured by the user. In the diagram, the flexible points are represented by calls to command executions, e.g., com1->execute(). For example, in the FVM method the discretise function has the following algorithm:

1) Compute fluxes
   a) Apply boundary conditions
   b) Compute RHS and Jacobian from fluxes in faces
2) Solve linear system


3) Update the solution

where steps (2) and (3) are fixed actions, and both (1a) and (1b) are configurable actions (hook points). The first hook point provides flexibility in choosing different boundary conditions. The second hook point allows the user to change the way the global Jacobian is computed, for instance switching from numerical differentiation to an analytical formulation of the Jacobian entries. The above algorithm is implemented as functions and presented in code listing 5.16.

void FVM::discretise() {
  // algorithm structure
  computeFluxes();
  solveSys();
  updateSol();
}

void FVM::computeFluxes() {
  applyBCs();        // hook point
  computeJacobRHS(); // hook point
}

void FVM::computeJacobRHS() {
  compute_jacobrhs->execute();
}

void FVM::applyBCs() {
  for (int i = 0; i < bcs.size(); ++i)
    bcs[i]->execute();
}

Code Listing 5.16: Implementation of the FVM generic algorithm.

In figure 5.5 we see that NMethod holds the NCommand’s. The end-user configures the behaviour of NMethod by selecting which commands are run. NCommand’s can be used to implement variations of numerical algorithms, boundary conditions, preprocessing steps, pre-allocation and deallocation of memory, etc. The NMethod class also holds the NMethodData, such that all commands can share common data via this object.

NStrategy’s are also shared via the NMethodData. These are families of algorithms with specific interfaces. For example, the FVM method might create a family of flux Limiter’s that can be applied by the FVM commands to fluxes on the cell faces. NStrategy’s contribute to solving the type optimisation problem, because they can be statically bound to the commands. Typically the command implements the loop over the computational entities (cells, faces, unknowns) and the strategy handles the concrete numerical action in each of them.



5.1.4 Summary of the technique

The presented technique results in the Method-Command-Strategy pattern, also presented in [105]. It is a composition of adapted design patterns that provides a structured way of creating components out of numerical methods. These numerical methods are not necessarily just space discretisation methods. This technique has been applied in COOLFluiD to interface components that perform tasks as diverse as input/output, handling convergence procedures, solving linear systems, and moving and adapting meshes. The technique is based on (1) giving structure to the algorithm with fixed steps and hook points via the NMethod design; (2) enabling configuration of those hook points with flexible actions via the NCommand design and (3) allowing the commands to be parametrised either dynamically or statically to ensure high performance.

Related work Notice that components within COOLFluiD are not components according to the definition by [125], because they are context specific. They cannot be used outside of the COOLFluiD environment. They can only be reused within the environment to construct complex simulations. True components are able to operate in different environments. The Common Component Architecture (CCA) project provides such components [5, 30] for scientific simulations and high performance computing [17]. SCIRun is a problem solving environment that allows interactive construction and steering of large-scale scientific simulations [64]. It has been developed by Johnson and colleagues, and it features a component environment based on generalised data-flow programming. SCIRun can also export its components within the CCA environment. Its data structure management provides memory management techniques that deal with issues similar to those presented in section 4.1.3.2. Quinlan and colleagues developed the Scientific Interface Definition Language (SIDL) to provide interfaces between components implemented in different languages [104].
Together with ROSE, which provides source-to-source translators, it aims at automating the generation of components from existing numerical algorithms. These can later be used within the CCA environment. In [86], MacDonald and colleagues showed how to use patterns in an automated way to generate code for parallel computing frameworks. This technique could be reused to automatically generate the structure of components, possibly connected with the techniques of section 4.3.


Managing Communication (5.2)

5.2 Managing Communication

In this section we present a technique that abstracts the parallel environment from the numerical methods. This technique separates the parallelisation details from the implementation of numerical algorithms, but does not deal with implementing the parallel environment itself.

Parallel Environment A scientific application has a parallel environment when it computes parts of the same simulation simultaneously on different processing units. Processing units may refer to multiple computers, multiple CPU’s or multiple CPU cores. These applications are often just called parallel applications. Throughout this section we will refer to the parallel environment. In their book [89], Mattson et al. give a clear definition of this concept. We have chosen to describe it as: the group of entities that together manage the concurrency issues of the parallel execution of a scientific simulation. The environment entities are formed by algorithms, objects, functions, libraries, scripts, etc.

Partition Paradigms Partitioning of scientific simulations follows one of two different paradigms: Task Partition or Data Partition [89].

Definition 5.1 (Task Partition). A simulation is partitioned by tasks when different algorithms execute simultaneously, operating on the same set of data.

To use this paradigm, the tasks must be independent of each other. An example is the computation of the volume and surface terms on a domain. The algorithms are different and independent but operate on the same domain. They can be executed simultaneously.

Definition 5.2 (Data Partition). A simulation is partitioned by data when the same algorithm is executed simultaneously on different sets of data.

To use this paradigm, the data of the problem must be decomposable into independent parts. An example of data partition is a FEM simulation on a discretised domain where one group of elements is assigned to each processing unit. The present technique focuses only on the data partition paradigm since it is the most commonly used in the solution of PDE problems.


Reasons for partition One reason to partition the problem is to increase the speed of computation by employing more processing units. Another reason is to make the problem more manageable, by dividing it into simpler problems. For example, a CFD simulation around a full aircraft might be partitioned to provide a quicker solution, but also because the whole simulation data might not fit in the memory of a single computer.

5.2.1 Problem Statement

It has been recognised [89] that developing any kind of partition paradigm raises many difficulties. These difficulties stem from different but interacting issues:

Parallelism is pervasive The most important issue is that the implementation of a parallel environment impacts the application in many points: the input and output functions; the computation of algorithms; the access to the data; etc. This impact is often at a very low programming level, for instance, with the handling of distributed data. Since the impact is spread over the code, we say that the parallelism concern is pervasive. If it were localised, it would be easier to handle by employing common OO techniques.

Execution efficiency We partition the simulation to speed up its solution. The problem is that not all algorithms in a simulation can be made parallel (if they are not independent). These algorithms must be performed by one single processing unit in a serial execution. The ratio between parallel and serial execution affects the speed-up of the solution. According to Amdahl’s law the theoretical maximum speed-up η achievable by N processors is:

    η = 1 / ((1 − P) + P/N)    (5.1)

where P is the parallel portion of the application and 1 − P the serial portion. In the limit of infinite processors the maximum speed-up tends to (1 − P)^(-1). For a code with a parallel portion of 95% the theoretical maximum is 20.

Communication time The communication time between processing units is accounted as serial execution, since it is not present in the unpartitioned simulation. If the amount of communication increases with the number of partitions, the efficiency of the parallel speed-up will degrade as shown in


figure 5.6. A parallel application is said to be scalable if the speed-up curve is linear, i.e., adding more processors always improves the execution speed.

[Figure: plot of speed-up versus number of processors N, contrasting the ideal linear behaviour with a sub-linear curve.]

Figure 5.6: Parallel speed-up behaviours

Efficient storage Data partition may be used to tackle large simulations that involve large data sets. Efficient storage of this data, regardless of its type, is a complex issue. For instance, efficient storage of large collections of objects of abstract types is more difficult than efficient storage of large arrays of floating point numbers.

Multiple Parallel Environments Over the last decades many parallel environments [33, 52, 74, 92, 98, 99] have been developed for scientific simulations. Some implement international standards like MPI [91] or OpenMP [99] while others have a very specific nature. Each environment may be better suited for a specific problem or hardware. For instance, if the application is executed on a shared memory machine then OpenMP is preferable to MPI. Moreover, certain environments are better suited for data partition and others for task partition. This multitude of parallel environments, coupled with the need to vary the environment according to the concrete problem, complicates the development of parallel applications.

Multiple Numerical Methods Different numerical methods require different functionality from the parallel environment. To develop a platform with multiple numerical methods we cannot tailor the parallel environment to


a specific algorithm. It must provide generic functions to be used by all numerical methods, while still executing efficiently.

Different developers All the issues above become more complex because few developers of numerical algorithms are familiar with parallel programming. For the purpose of defining the research problem, we divide the population of developers into two categories:

parallel developers: these developers not only develop numerical algorithms but also know about parallel programming. Typically they only use one type of parallel environment (OpenMP, MPI, PVM, etc).

serial developers: these developers develop numerical algorithms but do not know, or are not interested in, parallel programming. Nevertheless, they are interested in making their algorithms run faster or on larger domains.

One reason for the serial developer not to use parallel programming is the complexity of the subject. It diverts his attention from his main concern. The arguments are often along the lines of: “My objective is not to program parallel algorithms, but to solve some problem”. Therefore the serial developer concentrates on developing the algorithms that solve the concrete problem, and avoids programming them in a parallel environment.

Problem statement The problem is the difference of objectives: the serial developer wants to concentrate solely on problem-oriented algorithms, while the parallel developer is willing to devote some effort to the parallel environment. This division of interests helps us identify the need for a software interface that will bridge both interests. Ideally, the serial developer should develop his algorithms thinking in a serial way, but be able to run them immediately in parallel. To accomplish this automation he should use an interface which conceals any parallel operation. This interface should be totally transparent, so that it appears to conceal nothing.
In fact it should seem as if the algorithm is implemented in a serial way. Taking into account the multiple issues listed above, we formulate the following research problem:

1. How can we create an interface under which parallel developers can program, and which serial developers can transparently use with minimal effort?


2. Can we make it efficient in communication time?

3. How can we vary the underlying parallel environment in order to use the most appropriate one for each application?

The solution for this problem must have the following qualities:

Generic: the solution must allow the implementation of multiple numerical methods and allow the underlying parallel environment to vary.

Transparent: the solution must be non-intrusive in the implementation of the numerical algorithms. The parallel interface should be transparent, such that developing parallel algorithms is no different from developing them in a serial environment. If so, we say that it is a developer-friendly interface.

Efficient: the solution must be efficient in terms of execution speed and data storage while minimising the time used for communication.

Conceptual Solution We introduce the concept of a parallel layer which exposes an interface that the numerical methods use, but under which the concrete parallel environment operates. The parallel environment can be substituted (MPI, POSIX Threads, serial backend, etc.) without affecting the code of the numerical algorithms. Thus, as seen in figure 5.7, our conceptual design of the parallel layer is composed of the parallel interface and the underlying parallel environment.

5.2.2 Partitioning the Domain

Partition strategies Data partitioning of PDE problems on unstructured meshes results in the decomposition of the domain of computation. For example, this decomposition can be accomplished by the following two strategies [66, 129], where each strategy is equivalent to partitioning the dual graph of the other strategy:

nodewise decomposition states that the partition boundary lies across nodes of an element. This implies that elements are uniquely owned by a partition. See figure 5.8.

elementwise decomposition states that the partition boundary lies across the elements of the mesh. This implies that nodes are uniquely owned by a partition. See figure 5.9.



Numerical Methods

FVM

DG

FEM

Parallel Interface

Parallel Layer

MPI

Threads

Serial

Figure 5.7: Conceptual diagram of the Parallel layer and its interface to the parallel environments: MPI, Threads, . . .

[Figure: the original domain discretisation split into Partition A and Partition B; legend: element, node, shared element, partition interface.]

Figure 5.8: Nodewise decomposition


[Figure: the original domain discretisation split into Partition A and Partition B; legend: element, node, shared node, partition interface.]

Figure 5.9: Elementwise decomposition

Traditionally, choosing a partition strategy has deep implications for the way the parallel paradigm will be handled. The important criterion is the location of the degrees of freedom (DOF) of the method. Each DOF should uniquely belong to a partition. For example, in a cell-centred FVM the preferable strategy is nodewise, while a cell-vertex FEM method, having the DOF in the cell nodes, works better with an elementwise decomposition. As one of our goals is to support multiple numerical methods, we cannot rely on the knowledge of DOF positions to predetermine our choice. Instead, we note that a DOF always belongs to an element, either in the element interior or on its nodes. Because the discretisation of the element, independent of the method, can be cast into a finite element discretisation (see section 5.3.4), we choose the elementwise decomposition. As we will see, this affects neither the flexibility nor the efficiency of the design.

Locality and Transparency To be transparent, the parallel layer must allow the algorithms to be developed as local (serial computation). Therefore the data that they use must appear to be local, even if it will be globally


decomposed and the algorithms executed in parallel. Thus the data management must be:

Local: All data necessary for the computations must be accessible locally in that partition, such that the algorithms need not explicitly fetch data from other partitions.

Transparent: All data must be accessed in the same way, irrespective of its origin. In other words, the developer must not be forced to use special functions or objects to access shared data entities.

To have local data, partitions must be large enough to accommodate all necessary data entities. Moreover, since different numerical methods require different sets of entities, the size of the partition must adapt to each numerical method. For example, the FEM has a compact computation stencil, needing only the local element and DOF’s. However, the FVM needs the local element, its DOF’s and the neighbour elements. Therefore, the partitioning for FVM needs an additional layer of elements.

Dynamic Overlap Region To vary the partition size we allow the partitions to overlap each other. We call this the overlap region. It means that no communication takes place to access any data relevant to the computation: all necessary data is already present in that partition, thus satisfying the locality requirement. Since the numerical method is a user-defined component, the width of the overlap region has to be a dynamic parameter, computed at runtime. Before partitioning the domain, we ask the numerical method how many elements the partitions should overlap. The partitions are then created, and the correct overlap is added to each partition, thus providing all necessary data to the numerical method. The partitioning procedure is delegated to a special PartitionMesh object that implements many algorithms (see [68]), or uses an external library like ParMetis [65, 66]. An overlap region, as depicted in figure 5.10, is composed of overlap elements and shared DOF’s.
Each DOF is owned by a unique partition and when a DOF is present in another partition without being owned by it, it is called a ghost DOF of that partition. Ghost DOF’s are updated by their counterparts during a synchronisation phase. Suppose the case where the DOF’s are placed at the nodes. The synchronisation happens between the owned nodes in one processor and the ghost nodes of another (see figure 5.10). The owned nodes send their data to the receiving nodes.



[Figure: two partitions A and B with their overlap region around the partition interface; legend: owned element, overlap element, owned node, ghost node, sending node, receiving node, with data synchronisation arrows from sending to receiving nodes.]

Figure 5.10: Example of overlap region and data synchronisation. DOF’s are assumed to be placed at the nodes.

5.2.3 Handling Local and Global Data

Hiding Data Location We are concerned with how to hide from the developer the knowledge of whether the data is local or global. To this end, we access all data via a special handle. This handle, named DataHandle, is part of the classes that form the parallel interface. The DataHandle class provides an interface of accessors and mutators to the data, which the numerical algorithms use. This interface avoids exposing the communication layer to the numerics. In figure 5.11 we see that DataHandle inherits from a DataHandleInternal which is not visible to the developer. This class implements the common functions of DataHandle. DataHandle is a template class parametrised with two different types. It works as an STL container [121, 122], where the first template parameter (TYPE) declares the type held by the container (the handle). The second parameter (COMTYPE) is a policy class, declaring the underlying parallel environment. This policy has a default type selected, so the casual developer is not required to declare it. The second branch of the inheritance tree of figure 5.11 shows that the DataHandle can be specialised to perform actions specific to a certain parallel environment. For example, the beginSync and endSync functions are empty in the default implementation, but will contain the synchronisation


[Figure: class diagram. DataHandleInternal<TYPE, COMTYPE> offers size(): int, resize(size:int): void and operator[](idx:int): TYPE&. The generic DataHandle<TYPE, COMTYPE> derives from it with empty beginSync()/endSync(); a specialisation DataHandle<TYPE, MPI> implements beginSync()/endSync() for the MPI environment.]

Figure 5.11: Class diagram for DataHandle

algorithms in the MPI implementation.

DataHandle provides access to an array of data of generic type. The developer views the data as if no domain partition existed. Looping over this array, he imagines the algorithm looping over the whole domain. If the simulation is serial this is in fact true. In a parallel simulation the array only contains the local entities, including the overlap, and only those entities are visited.

Changing the Parallel Environment By changing the COMTYPE parameter, the developer can switch parallel environments and recompile the application. For example, the developer may change from OpenMPI [98] to MPICH2 [92], recompile without parallel support or recompile with shared memory support. COMTYPE is defined as a policy class. It exposes a ContainerType which is used as the underlying container to store the entities. The parallel environments use it to declare their own storage strategies. In code listing 5.17 we present multiple policies. The first one, LocalStlVector, is used in serial environments and stores data in an STL vector. The second class, LocalGrowArray, uses a special container to take advantage of specific operating system memory functions (see [70]). The third policy is used in case of a compilation using MPI. It exposes a container developed using MPI constructs. In the end, this code defines two types, LOCAL and GLOBAL. LOCAL is always the default COMTYPE template parameter, so a developer allocating a DataHandle need not declare it. If no parallel communication is defined (no MPI) then GLOBAL reverts to LOCAL, and all data is allocated locally.

Synchronisation The DataHandle provides functions to synchronise that array: beginSync and endSync. As described in the previous section, this synchronisation happens between the ghost and owned DOF’s in the overlap region. For a complete description of how the synchronisation is performed, see [68]. For data arrays of the LOCAL type, which need not be synchronised, this reverts to an empty call. For the data arrays that have been allocated as GLOBAL, these calls instruct the parallel environment to synchronise the data according to its paradigm. An MPI environment will use network communication to pass the information. A threaded environment will do nothing, because the memory is shared between all processes.

Decoupling Numerics from Data Representation As seen above, DataHandle is independent of the underlying parallel environment. From the developer’s point of view, this creates a consistent way to access parallel distributed data sets. Therefore, the DataHandle interface plays another role: it provides the numerical algorithms with a unique form to represent data. This improves not only code reuse, but also promotes the cooperation between components which need to share data even though they are isolated. This is described in detail in section 5.3.6.

5.2.4 Storing Local and Global Data

Sharing Data All DataHandles are stored in a DataStorage facility which provides a centralised access point. It works like a database of all the large data sets that a simulation is using. This allows components to use the database to share data without needing to call functions of other components or even know about one another’s existence. Similar to the DataHandle, the DataStorage is also a template class, but with only one template parameter, as shown in figure 5.12. This parameter is the same COMTYPE of the DataHandle and it declares the selected parallel



/// policy class for simulations with no communication
struct LocalStlVector {
  template < typename ETYPE >
  struct StoragePolicy {
    typedef std::vector<ETYPE> ContainerType;
    typedef ETYPE ElemType;
  };
};

/// If GrowArray is supported set it to the default LOCAL
#if HAVE_GROWARRAY
/// policy class for simulations with no communication
struct LocalGrowArray {
  template < typename ETYPE >
  struct StoragePolicy {
    typedef Utils::GrowArray<ETYPE> ContainerType;
    typedef ETYPE ElemType;
  };
};
typedef LocalGrowArray LOCAL;
#else
typedef LocalStlVector LOCAL;
#endif

///////////////////////////

#if HAVE_MPI
/// policy for simulations with MPI communication
struct GlobalMpiVector {
  template < typename ETYPE >
  struct StoragePolicy {
    typedef ::MPI::Par_Vector_MPI<ETYPE> ContainerType;
    typedef ETYPE ElemType;
  };
};
/// this is the default MPI interface
typedef GlobalMpiVector MPI;
/// With MPI support, GLOBAL is set to MPI
typedef MPI GLOBAL;
#else
/// Without MPI support, GLOBAL is set to LOCAL
typedef LOCAL GLOBAL;
#endif // HAVE_MPI

Code Listing 5.17: Policy classes to change the parallel environment.


environment. Each DataStorage only holds handles parametrised with the same COMTYPE. The DataStorage can be specialised for specific parallel environments, as also shown in figure 5.12. COMTYPE is a policy class that allows the DataStorage to select appropriate algorithms for allocating and aligning memory to improve performance.

[Figure 5.12 shows the DataStorageInternal template (getData, createData, deleteData), its generic DataStorage implementation and a DataStorage specialisation for the MPI environment.]

Figure 5.12: Class diagram for DataStorage.

When a component needs a large amount of memory allocated for a data array, it requests the centralised DataStorage to create the data. In the setup function of code listing 5.18, the component MethodA allocates the DataHandle volumes from the DataStorage by calling the createData template function. The same component later requests the deallocation of volumes, again from the DataStorage.

Using Data Storage

Many different data sets need to be stored: solution unknowns, coordinates, temporary vectors and matrices, elements, etc. Which ones should be placed in the DataStorage facility? Whenever the data falls into one of the three cases below, it should be allocated within the DataStorage; otherwise it should be allocated within each component.

1. If the data needs to be synchronised between processors, usually because it is the result of a coupled computation effort.

2. If the data needs to be shared between isolated components. For example, a mesh moving algorithm may reuse the volumes computed by



// Method A
void MethodA::setup()
{
  DataStorage* ds = MeshData::getInstance().getDataStorage();
  DataHandle<real> volumes = ds->createData<real>("volumes", size);
}

void MethodA::unsetup()
{
  DataStorage* ds = MeshData::getInstance().getDataStorage();
  ds->deleteData<real>("volumes");
}

Code Listing 5.18: MethodA ignores the data representation from creation to deletion.

the discretisation method.

3. Whenever the data is big or scales with the size of the problem. Placing it in the storage allows the use of optimised memory allocation.

Memory management

Because the DataHandle interface hides the concrete data representation and the DataStorage handles allocation, the parallel environment is free to manage memory as it sees fit. This is used to optimise the memory layout of large data sets according to the operating system and hardware in use. Dedicated memory allocation algorithms can be employed, for instance to improve memory alignment and reduce memory fragmentation. For example, we can use techniques like the EVector developed by Kimpe [70].

Type safety

Another advantage of this design is that it preserves type safety. As seen in code listing 5.19, the function getData of DataStorage is a template function. It casts the data, stored with the generic type void*, back to the correct type. If the developer calls the function with the wrong template parameter, a runtime exception is thrown.

5.2.5 Summary of the technique

The technique presented in this section provides a way to hide the parallel environment from the developer of numerical algorithms. The DataHandle design creates an abstraction of the parallel data, such that the developer does not realise that the data is distributed. A dynamic overlap region ensures that the data necessary for the computations is always present, and



template <class COMTYPE>
template <class TYPE>
DataHandleInternal<TYPE, COMTYPE>
DataStorageInternal<COMTYPE>::getData(const string& name)
{
  typedef typename COMTYPE::template StoragePolicy<TYPE>::ContainerType
    ContainerType;

  string fullname = name + typeid(TYPE).name();
  if (m_database.count(fullname) == 0)
    throw NoSuchStorageException("didn't find " + name);

  ContainerType* ptr = static_cast<ContainerType*>(m_database[fullname]);
  return DataHandleInternal<TYPE, COMTYPE>(ptr);
}

Code Listing 5.19: Type safe mechanism for retrieving the data from DataStorage.

synchronised when requested. The DataStorage design also provides a centralised place for components to exchange data, irrespective of whether they run on the same processing unit or not. Note that we have not presented how the underlying parallel environment is implemented. This has been the research work of Dries Kimpe [67, 68, 71, 72] within the same COOLFluiD project.

Performance

This technique does not allow runtime selection of the parallel models or paradigms, for example changing from MPI to threads at runtime. This is not considered a limitation, since compile-time selection offers much better support for aggressive optimisation and high performance. We have tested a polymorphic DataHandle with virtual accessor and mutator functions for the overloaded [] and () operators. This simple experiment showed an increase in global computational time of approximately 15%, which is in agreement with the observations on the cost of virtual functions [44]. Therefore, the idea of supporting runtime selection of parallel models was discarded.


5.3 Representing the domain of computation

This section presents a technique to represent the domain of computation. The technique takes into account that the simulation environment may be composed of multiple discretisation methods and that the data may be either continuous or discretised.

5.3.1 Problem Statement

The domain of computation is the representation of the idealised physical domain within the computer simulation. For instance, when computing the heat transfer across a metal rod, the idealised rod has some representation within the computer. This representation often involves numerous simplifications and assumptions. For example, we may strip the rod of any screws and approximate its shape by a piecewise linear representation.

Domain representation

The domain representation is the collection of software entities that provides information about part of, or the complete, domain of computation: for example, all the cells of a discretised mesh, all the degrees of freedom, or the faces at the boundary. Usually, the domain representation is associated with the data that describes the discretised domain (mesh). The definition above is broader: it includes not only data, but also functionality that may be called upon to provide additional information, for instance computing derivatives along the domain boundary using analytical CAD representations.

Different representations

The discretisation of the governing PDE's on the domain may be done with alternative numerical methods. Each numerical method requires different discrete representations of the domain. For example, the FEM requires elements forming a continuous solution space, while the cell-centred FVM requires elements that form a discontinuous solution space. This difference in requirements poses a problem. The representation must be generic enough, but considering large-scale problems, it must also be efficient in terms of memory usage. This means that data structures must only be present when a method requires them. For example, a connectivity from cell to degree of freedom (DOF) may be needed for a FEM computation, but it is unnecessary for a cell-centred FVM.

Also, for multiple numerical methods to coexist within the same framework, the domain representation must be uniform. Therefore the framework must provide


a unique and generic interface from which each numerical method chooses the subset of functionality that it needs. Because methods are likely to require common functionality, the potential for reuse is maximised. As an example, the framework may provide a function to store a linear system in sparse format. Any cell-vertex method, Fluctuation Splitting Method (FSM), FEM or cell-vertex FVM will reuse this functionality.

One domain, Many views

Depending on the type of computation, different views of the domain are required. For example, computing the volume of the domain requires a global view, whereas computing the tangent at a surface point requires a localised view of the domain surface.

The first distinction we introduce is between topology and geometry. Topology studies the intrinsic structure of geometric objects. The word topology was introduced by Listing [83] to distinguish qualitative geometry from ordinary geometry, in which quantitative problems are treated. We understand the need for topology if we consider that some geometric problems depend not on the exact shape of objects, but on how they are assembled together. For instance, the topological view of the domain will show that an aircraft has wings, engines and a fuselage, or that the landing gear is retracted. Geometry is the study of questions related to the size, shape and relative position of objects. The geometric view of the domain will show the length of an aircraft and its wingspan.

Another distinction we introduce is between analytical and numerical information about the domain. Analytical information is described in terms of functions or sets of functions. It provides information about the domain as a globally continuous object. For instance, the analytical view will show the normal vector to the aircraft surface or the curvature of the leading edge of the wing.

Numerical information is described in terms of isolated sets of numbers. It provides local information about points in the domain. For instance, the numerical view will show how big the deflection of the aircraft wing tip is, or what the coordinates of the aircraft nose are. As such, we have distinguished four different but complementary views of the domain. The topological view is global and identifies discrete parts of


the domain, while the geometrical view provides information on locally continuous domain subdivisions. The analytical view is global and continuous, composed of surfaces and bodies connected together, while the numerical view is local and discrete, made of raw data in the form of collections of numbers. These views are depicted in figure 5.13, where they present different information about the same domain.

[Figure 5.13 arranges the four views on global/local and continuous/discrete axes: the topological view (fuselage, wings, engines), the analytical view (248 surfaces, surface area 3,5 m2), the geometrical view (36000 triangles, 17,5 k nodes) and the numerical view (coordinates [XYZ], pressure [P], velocity [UVW]).]

Figure 5.13: Domain Represented by different Views.

Problem

Many views of the domain are required by the numerical methods. Can we create a reduced number of concepts generic enough to support multiple numerical methods? Do these four views provide a complete view of the domain, or are other views needed? Even if they are complete, how can these views be efficiently represented in the computer? In answer to the stated problem, we search for an interface that represents the computational domain and has the following qualities:

Generic

The interface provides abstractions able to support different numerical methods. These abstractions should be complete, representing enough information about the domain to satisfy the requirements of the methods.


Reusable

A reusable domain representation allows, where possible, numerical methods to reuse each other's implementations. This reduces the effort of developing new numerical methods. While generic means reuse of interface functions, reusable implies the reuse of implementations. Consider the following example: the framework provides an interface JacobianSparsity, with two implementations for cell-centred and cell-vertex meshes, as shown in figure 5.14.

[Figure 5.14 shows the JacobianSparsity interface with a build() function, implemented by the CellCenter and CellVertex subclasses.]

Figure 5.14: Class diagram for interface to build Jacobian matrix sparsity.

This representation of the sparsity of the domain is both generic and reusable. It is generic because it provides an interface that can be used by any numerical method, by calling the build function. It is reusable because each implementation covers a wide range of numerical methods: CellCenter for FVM, Discontinuous Galerkin and Spectral Finite Volume; CellVertex for FSM, FEM, etc.

Flexible

A flexible domain representation allows the numerical method to choose which data structures to build and does not impose the existence of unnecessary structures. Flexibility also implies the ability to change the data structures at runtime, for example as a result of a mesh enrichment process.

Efficient

An efficient domain representation requires minimal memory and allows the numerical method to access the data structures rapidly. The interface neither forces the allocation of unnecessary data structures nor requires unnecessary execution of code. This implies a balance between allocation and recomputation of data. Consider, for example, a method that needs cell volumes. These can be precomputed and stored, or computed at each iteration. It is not always straightforward to recognise which is more efficient.


5.3.2 Proposed Solution

Our solution is composed of four concepts, one per view. While analysing the solution, we will show that they are sufficient for our class of problems.

One Concept per View

For each view, we introduce a concept to represent it. Each concept generates one or more interfaces in the design, which the numerical methods use to access the domain. In figure 5.15, we show these concepts together with the views they represent, and we summarise them below.

Domain Model is the concept that represents the Analytical View. Its role is to provide access to the continuous definition of the model surfaces, possibly giving access to a CAD representation.

Topological Region is the concept that represents the Topological View. It defines regions of the domain where the numerical methods perform their tasks. Regions are grouped in sets for easier handling.

Geometric Entity is the concept that represents the Geometrical View. Using a generic connectivity storage, it provides the geometrical information the numerical methods need to build the entities (cells, faces, edges). The numerical method uses these geometric entities to perform its actions.

Data Storage is the concept that provides the Numerical View. It stores the arrays that contain all the raw numeric data. It also provides ways for different components to share data.

Many Concepts, One Access

All these concepts relate to the same domain, so it is sensible and convenient to group their access under one generic interface, which we call MeshData. This interface provides access to each of the underlying concepts in a centralised manner, and allows multiple MeshData objects to coexist in the case of multi-domain simulations. MeshData follows an adapted Facade design pattern from [51]. The class diagram is represented in figure 5.16.

All concepts can be accessed through dedicated functions, with the exception of GeometricEntity, which is not explicitly present and is instead represented by the ConnectivityStorage. Geometric entities depend on the numerical method: each method creates its own by using the connectivity information, therefore only the latter is present.


[Figure 5.15 overlays the solution concepts on the four views: Topological Region Sets (TRS's for wings, fuselage, engines) on the topological view, the DomainModel CAD interface (CAD NURBS) on the analytical view, Geometric Entities (GeoE's) with shape functions and the Connectivity Storage on the geometrical view, and the Data Storage with DataHandles on the numerical view.]

Figure 5.15: Relation of Domain Views to Solution Concepts (in red).

Building the MeshData

We allow each numerical method to create its own specialised data structures. Therefore, the framework loads only the minimum mesh information into memory: the coordinates of each mesh point, the variables of the solution, and the cell-to-node connectivity. The MeshCreator then delegates the rest of the building process to a builder object, the MeshDataBuilder. As described by the class diagram of figure 5.17, this object is provided by each numerical method, and it builds the remaining structures according to the specific needs of the method. For example, the method may need a reverse connectivity from nodes to cells, or maps from surfaces to boundary faces.

5.3.3 Representing the Topology

Topological Regions

To represent the topology of the domain, we decompose the domain into regions of interest, which we call TopologicalRegion's (TR). These regions represent the surfaces or volumes that form the domain. If the domain is a simple sphere, then the surface of the sphere is one TR and the volume is another TR, albeit of different dimensionality. It is to these regions that the numerical methods apply their computation


[Figure 5.16 shows the MeshData facade (getTRS, getDomainModel, getDataStorage, getConnStorage) giving access to TopologicalRegionSet, DomainModel, DataStorage and ConnectivityStorage.]

Figure 5.16: Class Diagram of MeshData Concepts.

[Figure 5.17 shows the MeshCreator obtaining a MeshDataBuilder from the NumericalMethod and delegating the build to it; FEM_DataBuilder, FVM_DataBuilder and FSM_DataBuilder specialise the build() function.]

Figure 5.17: Collaboration diagram of MeshBuilder.


actions. For instance, we can calculate the surface area or the volume by numerical integration.

Sets of Topological Regions

It is convenient to group together regions that share the same actions. Since order is not relevant, this group forms a set, called a TopologicalRegionSet (TRS). In figure 5.18, one TRS is the "Wings", composed of all the CAD surfaces that define the aircraft wings; likewise for the fuselage and the engines. Each surface is a TR. Note that a TRS may form a discontinuous representation, in this case two separate wings.

Figure 5.18: TRS's are composed of TR's.

In figure 5.19, we show the relation between the classes that implement these concepts. A TRS provides an interface to access each TR. It also gives access to all the states (variables) and nodes (coordinates) in that TRS as a global group. The TR provides the information that the numerical methods need to build GeometricEntity's (see section 5.3.4). This information is stored in large connectivity tables (class ConnTable) and indicates which Node's and State's compose a given entity.

Numerical Actions

Defining these TRS's makes it easier to specify where to apply the numerical actions. For example, consider applying the wall Boundary Condition (BC) to the fluid flow around the aircraft. It is simpler


[Figure 5.19 shows the TopologicalRegionSet (getName, getAllStates, getAllNodes, getTR) containing TopologicalRegion's (getNodeConn, getStateConn, getNodesInGeo, getStatesInGeo), which refer to ConnTable's (nbRows, nbCols, operator(row,col)) for the states and nodes connectivity.]

Figure 5.19: Class diagram of TopologicalRegion’s.

to define that the BC is applied to the TRS's "Wings", "Fuselage" and "Engines" than to define that it applies to all 248 surfaces that make up the aircraft. Moreover, if we want to compute the isolated force on the wings, it is more convenient to apply this computation to the "Wings" TRS. Thus, one of the objectives of the TR and TRS concepts is to create a logical representation of the domain to which numerical actions can be applied.

If we use the technique presented in section 5.1.3.1, numerical actions are designed as Commands with a generic execute function. For some commands, this function calls an executeOnTRS function for each TRS that is passed to the command (at configuration). This design proves very flexible, because it decouples the knowledge of what to do from where to apply it. For example, we can define a command that adapts the mesh but apply it only to a small region of the domain; or, in a fluid-structure simulation, we can define a command that applies to the fluid-structure interface and discretises the transfer of forces and heat flux from one system to the other.

Bridge to CAD

The other objective of the TR concept is to bridge the discrete view of the mesh to the continuous CAD view provided by the DomainModel presented in section 5.3.5. Each surface of the CAD model is interfaced by a TR and, if required by the numerical methods, a CAD link is established. The methods can use this interface to interpolate points in the CAD model, using surface parametrisations. As shown in figure 5.20, this interface is used for moving boundary nodes from a piecewise linear approximation to a higher-order boundary


approximation. Complemented with proper high-order boundary conditions, this leads to highly accurate simulation of boundary phenomena [140].

[Figure 5.20 shows a piecewise linear P2P2 triangle at the boundary being curved onto the CAD surface.]

Figure 5.20: Using CAD definition for high-order boundary representation.

5.3.4 Representing the Geometry

Most numerical discretisations are performed on some geometric entity: compute the flux through a face, compute the heat source in a cell, compute the distance to a wall, etc. All these computations require geometric information, such as coordinates, areas, volumes and solutions at certain points.

Geometric Entities

Within our design, GeometricEntity is the interface that provides such geometric information. Geometric entities decouple the numerical computations from the elements on which they are computed. This allows the element characteristics to vary without changing the numerical algorithms. Some examples are:

• program an algorithm that computes the element distance to the nearest wall, independent of dimension (it will work in 2D as well as 3D).

• program a finite element discretisation, independent of the element shapes: triangles, quadrilaterals, tetrahedra, prisms, etc.

• program the same finite element discretisation, independent of the order of approximation of the solution, for example first order (P1) or higher order (P2) elements.


• select different solution spaces, for instance Lagrange nodal functions or Chebyshev modal functions.

As presented in figure 5.21, GeometricEntity provides geometry- and solution-related functions. It is composed of Node's and State's: nodes represent coordinates and states represent solution variables. Given an ordered set of Node's and a function definition, it provides a continuous representation of the geometry. Given an ordered set of State's and another function definition, it provides a locally continuous solution representation.

[Figure 5.21 shows the GeometricEntity interface (getStates, getNodes, getNeighbourGeoEnts, getShape, computeVolume, computeSolution), composed of Node's and State's, with the template subclasses Cell, Face and Edge parametrised by a geometric and a solution ShapeFunction (GEOSF, SOLSF).]

Figure 5.21: Class diagram for Geometric Entity.

This element representation can, for example, provide solution interpolation or compute the gradients on a finite element, given the coordinates in the parametric space inside the element. This is done by mapping the values in the parametric space K̂ to the physical space K via a transformation


Representing the domain of computation (5.3) F , as shown in figure 5.22. To implement this, the concrete entities Cell, Face and Edge derive from GeometricEntity and are parametrised by the functions that define the solution and the geometry. These functions are C++ template parameters to ensure minimal performace loss. Only the function call across the GeometricEntity interface is virtual. Inside, the implementation is made of static calls to the functions which increase the probability of compiler optimisations. For this class this is very important since it is used in the innermost computation loops.

Figure 5.22: Transformation x = F(ξ, η) from parametric space K̂ to physical space K.

Using functions to describe the geometry and the solution independently is the standard Finite Element way of describing elements [41, 109, 146]. Because this element representation is very generic, we assume it is sufficient to describe the elements of every numerical method. Section 8.1 confirms this by showing different numerical methods using this element description. For example, the cell-centred finite volume element is described by a P1 Lagrange function for the geometry and a P0 (constant) Lagrange function for the solution.

Shape Functions

The finite element representation is generic because it uses a very powerful mathematical abstraction: shape functions. They are used to describe the solution or the geometrical space. This allows a generic numerical method to be written that can change its behaviour by choosing the proper function spaces. For example, in structural analysis continuous Lagrangian spaces are preferred, but for MHD applications the Raviart-Thomas [108] solution space solves the divergence-free problem implicitly. As shown in figure 5.21, the concrete entities Cell, Face and Edge are generic templates


which take two ShapeFunction's as type parameters. By avoiding excessive subclassing, we reduce the size of the inheritance tree and simplify the code. In code listing 5.20 we see example code for the interface class GeometricEntity and the implementation of Cell. Note how the pure virtual functions, like computeGeoShapeFunc, are implemented in Cell by statically calling the function computeShapeFunction of the shape function template.

typedef std::valarray<double> DVector;

class GeometricEntity {
public:
  std::vector<State*>& getStates() { return m_states; }
  std::vector<Node*>&  getNodes()  { return m_nodes; }

  virtual double computeVolume() = 0;
  virtual DVector computeGeoShapeFunc(const DVector& mappedCoord) = 0;
  virtual DVector computeSolShapeFunc(const DVector& mappedCoord) = 0;
};

template <typename GEO_SHAPE_FUNCTION, typename SOL_SHAPE_FUNCTION>
class Cell : public GeometricEntity {
public:
  virtual double computeVolume()
  {
    return GEO_SHAPE_FUNCTION::computeVolume(m_nodes);
  }
  virtual DVector computeGeoShapeFunc(const DVector& mappedCoord)
  {
    return GEO_SHAPE_FUNCTION::computeShapeFunction(mappedCoord);
  }
  virtual DVector computeSolShapeFunc(const DVector& mappedCoord)
  {
    return SOL_SHAPE_FUNCTION::computeShapeFunction(mappedCoord);
  }
};

Code Listing 5.20: Geometric Entity example code.

Together with the self-registration technique presented in section 4.1, the developer can register his own elements into the Kernel by varying the parametrisation of the Geometric Entities. In code listing 5.21 we see a list of providers that are registered by compiling and linking this code.

High order elements

High-order shape functions, see figure 5.23, are commonly used to increase the solution accuracy by improving the finite element approximation. In space, high-order shape functions are used to define curved elements, suitable for curved geometries. By combining solution and space variations, we can construct many different types of finite elements.


/// Lagrange Triangle cell P1 geometry, P3 solution
ObjectProvider< Cell< LagrangeShapeFunctionTriagP1,
                      LagrangeShapeFunctionTriagP3 > >
  CellTriagLagrangeP1LagrangeP3("CellTriagLagrangeP1LagrangeP3");

/// Lagrange Quadrilateral cell P1 geometry, P2 solution
ObjectProvider< Cell< LagrangeShapeFunctionQuadP1,
                      LagrangeShapeFunctionQuadP2 > >
  CellQuadLagrangeP1LagrangeP2("CellQuadLagrangeP1LagrangeP2");

/// Lagrange Tetrahedron cell P2 geometry, P2 solution
ObjectProvider< Cell< LagrangeShapeFunctionTetraP2,
                      LagrangeShapeFunctionTetraP2 > >
  CellTetraLagrangeP2LagrangeP2("CellTetraLagrangeP2LagrangeP2");

Code Listing 5.21: Registering different types of Geometric Entities.

[Figure 5.23 shows the node layout of P2 and P3 Lagrange triangles and of P1 and P2 Lagrange quadrilaterals.]

Figure 5.23: Example of shape functions of multiple orders.


Parametrisations

The element type on the left of figure 5.24 is constructed with a first-order polynomial shape function (P1) for defining the geometry and a second-order one (P2) for the solution. Because the space parametrisation is of lower order than the solution, we call it a subparametric element. The element on the right has the inverse setup, a P2 space parametrisation and a P1 solution approximation, and is therefore a superparametric element. The element in the middle we call isoparametric, because it has the same order (P2) for space and solution. This technique allows all three types of parametrisation.

[Figure 5.24 shows subparametric (P1P2), isoparametric (P2P2) and superparametric (P2P1) elements; states define the solution, nodes define the geometry.]

Figure 5.24: Types of Element Parametrisations.

States and Nodes

The classes Node and State provide access to the numerical data that the shape functions use to compute concrete values. They are stored globally in large arrays of data (see section 5.3.6), but accessed locally via the GeometricEntity class. Using nodes and states, we can interpolate the geometry and the solution in the entity respectively, following for instance the expressions

x(\xi) = \sum_i N_i^x(\xi) \, x_i \qquad (5.2)

u(\xi) = \sum_i N_i^u(\xi) \, u_i \qquad (5.3)

Here N_i^{x,u}(\xi), x_i and u_i are respectively the shape functions, the nodal values and the state values.

168

Storage of Connectivity

To correctly associate the nodes and states to each geometric entity, it is necessary to know the mesh connectivity information, which is stored in ConnTable objects. This information is among the most memory consuming in an unstructured simulation, therefore ConnTable has been developed strictly for low memory consumption. Figure 5.25 exemplifies the construction of a ConnTable for a small mesh. Note that we support hybrid meshes with elements of different types, so the table carries this mixed information. Nevertheless, to optimise memory alignment and avoid indirections, we designed the table as a rectangular matrix. The invalid entries are marked with the maximum integer representable by the computer. In principle, these entries are never accessed directly by the algorithms; the marker is used to obtain the number of valid columns in each row of the matrix. This design has been tested against many alternatives, such as a vector of vectors or a vector of pointers, and it showed the fastest access time and the lightest memory footprint. The code that implements this class is presented in code listings 5.22 and 5.23.

Each ConnTable object is stored in a ConnectivityStorage facility, accessed via the MeshData interface. This allows each numerical method to create the dedicated connectivity tables needed to build its own geometric entities. This is done by the MeshDataBuilder object that the numerical method provides to the MeshCreator. The method associates a name with the connectivity when placing it in the storage, so it can be reused by any other numerical method. Sharing the connectivity tables avoids unnecessary double storage of large objects.

Building the entities

When we first implemented the GeometricEntity design and compared the memory usage to similar solvers, we were surprised to see how much memory was used. Further investigation showed that storing all GeometricEntity objects in a simulation consumes too much memory. We therefore introduced another concept in the design, the GeoBuilder. Its responsibility is to build the geometric entities on the fly, at the request of the numerical method, using the connectivity information. This avoids storing the GeometricEntity's. Accessing the connectivity information is not done globally, but via the TopologicalRegionSet interface. This allows looping only over the subset of entities that form a TRS. This is the reason why the TR class provides access only to the connectivity, states and nodes, although we defined a TR as being composed of GeometricEntity's. In this design, we do not store the


Chapter 5 - Framework Design

ConnTable (one row per element, padded to the widest row; MAX marks unused entries):

  ID : 1 | 1 4 9 MAX MAX
  ID : 2 | 1 2 3 6 4
  ID : 3 | 4 5 8 MAX MAX
  ID : 4 | 9 4 8 5 MAX
  ID : 5 | 4 6 7 MAX MAX

[The accompanying sketch of the hybrid mesh, with its nine numbered nodes, is not reproducible in text form.]

Figure 5.25: Example of a connectivity table (ConnTable).

objects but just what they are made of. In figure 5.26 we see that the client, in this case FEMCommand, accesses a FEMGeoBuilder object. This builder has a pool of GeometricEntity's, which are pre-built; the command obtains them by calling buildGE. When the command has done its job, it returns the entity to the pool by calling releaseGE. Code listing 5.24 shows how client code uses the GeoBuilder. The client is usually a numerical command, as described in section 5.1.3.1. First the client gets the GeoData from the GeoBuilder and sets the TRS on it; this informs the GeoBuilder from where to retrieve the connectivity information to build the GeometricEntity. When the loop starts, at each iteration the client assigns the ID of the cell to build to the GeoData, then requests the builder to build the GeometricEntity. Once built, the client uses the entity for its computations. In the end, before continuing to the next entity, it must release the current one. This allows




typedef unsigned int uint;

template <class T>
class ConnectivityTable {
public: // methods

  ConnectivityTable() : m_nbentries(0), m_nbrows(0), m_nbcols(0), m_table() {}

  ConnectivityTable(const valarray<uint>& colpattern, T value = T()) {
    resize(colpattern, value);
  }

  ~ConnectivityTable() { deallocate(); }

  void resize(const valarray<uint>& colpattern, T value = T()) {
    deallocate();
    const uint maxCol = findMaxCol(colpattern);
    allocate(colpattern.size(), maxCol);
    putPattern(colpattern, value);
  }

  const ConnectivityTable& operator= (const ConnectivityTable& other) {
    deallocate();
    allocate(other.m_nbrows, other.m_nbcols);
    copy(other.m_table);
    m_nbentries = other.m_nbentries;
    return *this;
  }

  T& operator() (uint iRow, uint jCol)       { return m_table[jCol * m_nbrows + iRow]; }
  T  operator() (uint iRow, uint jCol) const { return m_table[jCol * m_nbrows + iRow]; }

  uint nbRows() const { return m_nbrows; }

  uint nbCols(uint iRow) const {
    uint nbcol = m_nbcols;
    for (; nbcol >= 1; --nbcol)
      if (m_table[(nbcol - 1) * m_nbrows + iRow] != NOVALUE) return nbcol;
    return 0;
  }
};

Code Listing 5.22: Code for the hybrid connectivity table, public functions.
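The access pattern of this table can also be sketched in isolation. The following simplified, self-contained class (SimpleConnTable is a hypothetical name, not the framework's ConnectivityTable) demonstrates the two key points of the design: the contiguous column-major storage and the sentinel-based count of valid columns per row.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Simplified sketch of the rectangular connectivity storage: every row is
// padded to the widest row, unused slots hold the sentinel NOVALUE, and the
// data lives in one contiguous, column-major buffer.
class SimpleConnTable {
public:
  static const unsigned NOVALUE;

  explicit SimpleConnTable(const std::vector<std::vector<unsigned> >& rows) {
    m_nbrows = rows.size();
    m_nbcols = 0;
    for (std::size_t i = 0; i < rows.size(); ++i)
      if (rows[i].size() > m_nbcols) m_nbcols = rows[i].size();
    m_table.assign(m_nbrows * m_nbcols, NOVALUE);
    for (std::size_t i = 0; i < rows.size(); ++i)
      for (std::size_t j = 0; j < rows[i].size(); ++j)
        (*this)(i, j) = rows[i][j];
  }

  unsigned& operator()(std::size_t iRow, std::size_t jCol) {
    return m_table[jCol * m_nbrows + iRow]; // column-major indexing
  }
  unsigned operator()(std::size_t iRow, std::size_t jCol) const {
    return m_table[jCol * m_nbrows + iRow];
  }

  std::size_t nbRows() const { return m_nbrows; }

  // scan backwards until the first valid entry: number of valid columns
  std::size_t nbCols(std::size_t iRow) const {
    for (std::size_t c = m_nbcols; c >= 1; --c)
      if ((*this)(iRow, c - 1) != NOVALUE) return c;
    return 0;
  }

private:
  std::size_t m_nbrows, m_nbcols;
  std::vector<unsigned> m_table;
};

const unsigned SimpleConnTable::NOVALUE =
    std::numeric_limits<unsigned>::max();
```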

the builder to reuse it again. This design has shown not only a reduction of memory usage but, under certain conditions, also a speed-up of the loop over the mesh elements, by reducing the amount of memory the processor needs to fetch.

Entities for every method
We now have all the ingredients to enable flexible building of specialised elements for each numerical method. We can create a special GeoBuilder for each numerical method, which assembles the entities on-the-fly. This builder gets the nodes, states and connectivity via the TRS to build each entity, and selects the concrete type of entity according to the shape functions best suited for the problem. If special connectivity tables are needed, they have already been created and registered in the ConnectivityStorage by the MeshDataBuilder of the same numerical method. This way, we can create specialised discrete entities, potentially




template <class T>
class ConnectivityTable {
private: // helper functions

  void allocate(uint nbrows, uint nbcols) {
    m_nbrows = nbrows;
    m_nbcols = nbcols;
    m_table.resize(nbrows * nbcols);
  }

  void deallocate() {
    m_nbentries = 0; m_nbrows = 0; m_nbcols = 0;
    if (m_table.size() != 0) std::vector<T>().swap(m_table);
  }

  uint findMaxCol(const valarray<uint>& colpattern) {
    uint maxCols = 0;
    for (uint iRow = 0; iRow < colpattern.size(); ++iRow)
      if (maxCols < colpattern[iRow]) maxCols = colpattern[iRow];
    return maxCols;
  }

  void putPattern(const valarray<uint>& colpattern, const T& value) {
    m_nbentries = 0;
    for (uint iRow = 0; iRow < m_nbrows; ++iRow) {
      m_nbentries += colpattern[iRow];
      for (uint jCol = 0; jCol < m_nbcols; ++jCol) {
        if (jCol < colpattern[iRow]) (*this)(iRow, jCol) = value;
        else                         (*this)(iRow, jCol) = NOVALUE;
      }
    }
  }

  void copy(const std::vector<T>& other) {
    const uint tsize = m_nbrows * m_nbcols;
    for (uint i = 0; i < tsize; ++i) m_table[i] = other[i];
  }

  // data
  uint m_nbentries;       /// number of entries in the table
  uint m_nbrows;          /// row size
  uint m_nbcols;          /// maximum column size
  std::vector<T> m_table; /// storage of the table
  static const T NOVALUE; /// marks places in memory that shouldn't be used
};

template <typename T>
const T ConnectivityTable<T>::NOVALUE = std::numeric_limits<T>::max();

Code Listing 5.23: Code for the hybrid connectivity table, private functions and data.



[Figure content: the client, FEMCommand, calls buildGE/releaseGE on a FEMGeoBuilder (states: DataHandle, nodes: DataHandle, pool: GeoEntPool, data: FEMGeoData; operations buildGE(): GeometricEntity*, releaseGE(), getGeoData(): FEMGeoData). The GeoEntPool holds the pre-built geometric entities (getGeoEnt(): GeometricEntity*, returnGeoEnt(geoent: GeometricEntity*): void), and the GeoData carries trs: TopologicalRegionSet* and cellID: unsigned int.]

Figure 5.26: Example of a GeoBuilder for the FEM method.

/// these usually are given to the function
TopologicalRegionSet* cells = MeshData::getInstance().getTrs("cells");
FEMGeoBuilder geoBuilder;

// **** LOOP OVER CELLS **** //

FEMGeoBuilder::GeoData& geoData = geoBuilder.getGeoData();
geoData.trs = cells; // choose on which TRS to loop

const CFuint nbGeos = cells->getNbGeoEnts();
for (CFuint cellID = 0; cellID < nbGeos; ++cellID)
{
  // build the GeometricEntity
  geoData.idx = cellID;
  GeometricEntity& cell = *geoBuilder.buildGE();

  // ... computation on the cell goes here ... //
  vector<State*>& cell_states = cell.getStates();
  vector<Node*>&  cell_nodes  = cell.getNodes();
  ...

  // release the GeometricEntity when done
  geoBuilder.releaseGE();
}

Code Listing 5.24: Example of usage of the FEMGeoBuilder.
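The build/release protocol of the pool can be sketched in a few self-contained lines. All names below (PoolBuilder, Entity, sumCellIDs) are illustrative stand-ins, not the framework's classes; the point is that the loop performs no per-iteration allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, stripped-down sketch of the build/release pool protocol:
// entities are pre-allocated once; buildGE() hands out and configures the
// next free one, releaseGE() puts it back.
struct Entity { std::size_t cellID; };

class PoolBuilder {
public:
  explicit PoolBuilder(std::size_t poolSize)
    : m_pool(poolSize), m_inUse(0), m_built(0) {}

  Entity* buildGE(std::size_t cellID) {
    Entity* e = &m_pool[m_inUse++]; // take the next free slot
    e->cellID = cellID;             // configure on-the-fly
    ++m_built;
    return e;
  }

  void releaseGE() { --m_inUse; }   // return the last entity to the pool

  std::size_t inUse() const { return m_inUse; }
  std::size_t totalBuilt() const { return m_built; }

private:
  std::vector<Entity> m_pool;
  std::size_t m_inUse;
  std::size_t m_built;
};

// loop over nbCells cells, reusing one pooled entity per iteration
inline std::size_t sumCellIDs(PoolBuilder& b, std::size_t nbCells) {
  std::size_t sum = 0;
  for (std::size_t i = 0; i < nbCells; ++i) {
    Entity* cell = b.buildGE(i);
    sum += cell->cellID;  // ... computation on the cell goes here ...
    b.releaseGE();        // mandatory release before the next iteration
  }
  return sum;
}
```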


very performant, without allocating unnecessary data structures. Moreover, the GeometricEntity provides convenient interpolation functions that allow the developer to program arbitrarily high-order discretisations with less effort. This satisfies all the solution qualities defined in section 5.3.1.

5.3.5 Representing the Analytical Model

Interfacing CAD information
The analytical view over the domain is represented by the DomainModel abstraction. This class, shown in figure 5.27, provides functions to access the definition of the domain surfaces. This representation is usually piecewise analytical: behind each surface there is a definition, in a parametric space, that maps each parametric point to a point in physical space. This parametric function uses primitives like planes, spline curves, NURBS surfaces, etc. Using this parametrisation we can compute not only positions, but also derivatives and curvatures of the surfaces.

[Figure content: the abstract DomainModel declares computeCoord(tr:TR*,xi:Coord): Coord, compute1stD(tr:TR*,xi:Coord): Vector and compute2ndD(tr:TR*,xi:Coord): Matrix. AnalyticalModel, GET_Model (wrapping the GET CAD library) and VTM_Model (wrapping the VTM CAD interface) implement this interface.]

Figure 5.27: Class diagram for the DomainModel.

For some definitions the underlying representation is not analytical, but an approximation by a fine geometric discretisation. This is the case of the STL [1] triangulated surface format, very popular due to its robustness.

There are many CAD systems on the market capable of representing complex geometries, so one of the objectives of the design is to be generic and independent from any specific system. To enable the use of any system, the DomainModel class is abstract and leaves the work to the concrete implementations, for example the VTM CAD library of Dorochenko [43] or the GET library of Majewski, developed as a component. The functions assume that the underlying model is continuous and analytical; it is up to the concrete implementation to mimic this behaviour in the case of non-analytical representations. One implementation, provided with the solution, is the AnalyticalModel, which allows the user to define analytical functions for each surface. It is intended for the simplest models used in testing and research.
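A minimal sketch of this idea, assuming a hypothetical SurfaceModel/CircleModel pair in place of the real DomainModel implementations, could look as follows: the abstract class maps a parametric coordinate to physical space and exposes derivatives, and one analytical implementation is the kind of surface definition an AnalyticalModel would let the user supply for testing.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch of the DomainModel idea: an abstract mapping from a
// parametric coordinate xi to physical space, with derivatives, plus one
// analytical implementation (a unit circle).
struct Point2D { double x, y; };

class SurfaceModel {            // stands in for DomainModel
public:
  virtual ~SurfaceModel() {}
  virtual Point2D computeCoord(double xi) const = 0;
  virtual Point2D compute1stD(double xi) const = 0;  // tangent vector
};

class CircleModel : public SurfaceModel {
public:
  virtual Point2D computeCoord(double xi) const {
    Point2D p = { std::cos(xi), std::sin(xi) };
    return p;
  }
  virtual Point2D compute1stD(double xi) const {
    Point2D d = { -std::sin(xi), std::cos(xi) };
    return d;
  }
};
```

Client code then works only against the abstract interface, exactly as the numerical methods work against DomainModel.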

5.3.6 Representing the Numerical Data

Storing Numerical Data
To provide general access to stored data related to the geometry, like coordinates and normals, we created the DataStorage interface (described in detail in section 5.2.4). DataStorage allows numerical methods to declare a data array of arbitrary type and size, and to store it under an associated name. This name works like a contract: any method that knows the name and the type of the data can access it. This allows numerical methods to share data without having to declare it explicitly in their interfaces, as shown in code listing 5.25.

// Method A
void MethodA::do_something()
{
  DataStorage* ds = MeshData::getInstance().getDataStorage();
  DataHandle<real> volumes = ds->createData<real>("volumes", size);

  // computes the volumes
  volumes[i] = ...
  // and uses them for some computation
}

// Method B
void MethodB::do_stuff()
{
  DataStorage* ds = MeshData::getInstance().getDataStorage();
  DataHandle<real> volumes = ds->getData<real>("volumes");

  // reuses the volumes without knowing who computed them
}

Code Listing 5.25: Using and reusing numerical data between methods.
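The name-as-contract mechanism can be sketched with a plain map of named arrays. NamedStorage and the two free functions below are illustrative simplifications (one value type, no DataHandle indirection, no parallel synchronisation), not the framework's DataStorage.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Minimal sketch of the name-as-contract idea behind DataStorage: arrays are
// registered under a string key by one method and fetched by name by another,
// with no compile-time coupling between the two.
class NamedStorage {
public:
  std::vector<double>& createData(const std::string& name, std::size_t size) {
    m_store[name] = std::vector<double>(size, 0.0);
    return m_store[name];
  }
  std::vector<double>& getData(const std::string& name) {
    return m_store.find(name)->second; // contract: the name must exist
  }
  bool hasData(const std::string& name) const {
    return m_store.find(name) != m_store.end();
  }
private:
  std::map<std::string, std::vector<double> > m_store;
};

// "Method A" computes the volumes and stores them under a name
inline void methodA(NamedStorage& ds) {
  std::vector<double>& volumes = ds.createData("volumes", 3);
  volumes[0] = 1.0; volumes[1] = 2.0; volumes[2] = 3.0;
}

// "Method B" reuses them without knowing who computed them
inline double methodB(NamedStorage& ds) {
  std::vector<double>& volumes = ds.getData("volumes");
  double sum = 0.0;
  for (std::size_t i = 0; i < volumes.size(); ++i) sum += volumes[i];
  return sum;
}
```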


Common Data Representation
More important than the sharing itself, this design establishes a standardised data format that all components in the environment use, thus improving the coexistence of components. To handle the data in a uniform way, the methods access it via the DataHandle interface, explained in section 5.2.3. This interface also shields the methods from the concrete communication paradigm that synchronises the data in a parallel simulation. The storage facility additionally decouples the way the data is placed in memory from the handling of the data, so the storage algorithm may be changed to more efficient memory layouts without affecting the numerical methods. For example, when supported by the operating system, we are able to use growable arrays, as described in [69].

5.3.7 Summary of the technique

Each numerical method requires different information about the domain of computation. In our design, this information is categorised in different views of the domain. To completely represent the domain we selected four views, each implemented by an abstract interface. All four interfaces are connected to each other, both conceptually and in the code. The TR links to the CAD definition provided by the DomainModel. The TRS groups the TR's and holds the connectivity of the GeometricEntity's, which are created by a method-specific GeoBuilder. The methods use the geometric entities together with numerical data, like coordinates and variables, stored in DataStorage and accessed via DataHandle's, to perform their computations.

Alternative Representations
The AOMD project [6] developed by Remacle et al. [110] focuses on managing the mesh as a collection of discretised entities. It provides many more services than our solution, like automatic mesh refinement and coarsening, and does so in parallel, dealing with issues of load balancing. However, it does not provide a CAD view of the computational domain. The CGAL project [32], developed mainly by INRIA, deals with operations on geometrical objects with a focus on mathematical correctness. Its design [48] makes heavy use of template metaprogramming. It is targeted at triangulation and convex-hull problems, and is more appropriate for mesh generation or adaptation than for representing meshes for numerical discretisation. It does, however, introduce interesting concepts: for example the circulators, which represent circular sequences of edges and faces, could be used in our geometric entities.


Chapter 6

A system is a network of interdependent components that work together to try to accomplish the aim of the system. A system must have an aim. Without an aim, there is no system. W. Edwards Deming (1900-1993)

Component based architecture

This chapter presents two techniques to construct a component environment for scientific computing. These techniques try to break dependencies between different aspects of the environment, so that components can be developed in isolation. In section 6.1 we provide a technique to decouple the numerical discretisation methods from the components that describe the physical models, in order to support an environment where multiple methods can discretise multiple physical models developed in isolation. In section 6.2 we look at how to combine multiple numerical discretisations within one simulation involving many different physical models; we call these multi-physics simulations. We search for solutions that reuse components with minimal modification of the existing code.

6.1 Decoupling Numerics from Physics

Numerical methods discretise PDE's that represent some laws of physics. The PDE's constitute a physical model of reality, composed of diverse entities like transport coefficients, constitutive laws and physical properties, that together form the description of some physical phenomenon. For example, equation (6.1) shows a generic PDE of a conservation law,

  ∂u/∂t + ∇ · F(u) = S(u)    (6.1)

where u, F(u) and S(u) are physical entities: respectively, the solution variables, the conserved flux and the source term. These entities need to be modelled in the software.


Chapter 6 - Component based architecture

6.1.1 Problem Statement

Different Methods
PDE's may be of different types and have different mathematical behaviour. For each mathematical type, some numerical methods are more appropriate than others. In a multi-physics environment it is important to allow multiple methods in order to deal with multiple kinds of physics.

Different Formulations
Moreover, each physical model may be recast in different formulations. For example, equation (6.1) is in differential form. It can be rewritten in integral form (equation (6.2)) or quasi-linear form (equation (6.3)):

  ∫_V ∂u/∂t dV + ∮_∂V F(u) · n d∂V = ∫_V S(u) dV    (6.2)

  ∂u/∂t + ∂F/∂u · ∇u = S(u)    (6.3)

We use different formulations because each numerical method has a preferential form: the FVM discretises integral forms, the FSM uses quasi-linear forms, and the FEM is based on a weak formulation of the differential form.

Models extend other models
Another important issue is that quite often physical models are written as extensions of other physical models. For instance, the turbulent k-ε Reynolds-averaged Navier-Stokes equations are an extension of the laminar Navier-Stokes equations: all the terms remain the same, but additional equations for the variables k and ε, and some source terms, are added. To avoid reimplementing the basic models, the extended models must be able to reuse the existing descriptions.

Multiple vs Multiple
Our ultimate goal is to have, within the same environment, multiple discretisation methods that can be applied to any physical model whenever mathematically applicable. This implies that the numerical methods are not directly coupled to the physics but access it through some API definition. This multiplicity of connections is depicted in figure 6.1; it gives an idea of how complex it is to design such an API.

Incompatible connections
Further evidence of the complex relation between numerics and physics is that some numerical components cannot be coupled with some physical models due to mathematical limitations. This



[Figure content: inside the Component Environment, the numerical methods (FiniteVolume, FiniteElement, FluctSplit) connect to the physical models (NavierStokes, MHD, HeatTransfer, Elasticity); the links from FluctSplit to HeatTransfer and to Elasticity are marked as mathematically incompatible components.]

Figure 6.1: Multiplicity of connections between Numerical Methods and Physical Models.

is exemplified in figure 6.1 by the incompatible links between FluctSplit and the HeatTransfer and Elasticity models.

6.1.2 Different Perspectives

Difficult Interface
A unique interface for the physical model concept is hard to define due to the multiplicity of numerics and formulations. Moreover, such an interface would be hard pressed to foresee all types of requirements: in a pluggable architecture, new numerical methods can be added that request different treatments of the physics.

Perspectives
Therefore, we design the physical model as a collection of Perspectives over the modelled physics. Each perspective offers a different view, according to the specific needs of different numerical methods, and each numerical method uses a certain set of perspectives. For example, a method based on the integral form of conservation laws requires the computation of the conserved fluxes (mass flux, energy, etc). A Flux perspective is therefore created to provide this feature, and it can be reused by other types of methods that also require fluxes.


[Figure content: the abstract PhysicalModel (getNbEquations(), validate()) is implemented by ConcreteModel (physicalData, someCoeff; getNbEquations(), validate(), getPhysicalData(), getCoeff()). The abstract perspectives VariableSet (computeEigenValues(), projectJacobian()), VarTransformer (transform(), computeMatrix()) and OtherPerspective (function_1(), function_2()) are implemented by MyVariableSet, MyVarTransformer and MyPerspective respectively, each holding a model: ConcreteModel* pointer.]

Figure 6.2: Class diagram of Perspectives of a Physical Model.

In figure 6.2 we see the class diagram of the perspectives of a generic physical model. The base PhysicalModel defines a general abstract interface. ConcreteModel derives from it, implements the virtual methods, and defines another interface to which the concrete Perspective objects are statically bound. This second interface offers data and functionality typical of a certain physics but invariant to all its possible Perspectives.

Static Binding
This design does not prevent numerical objects, like Strategies and Commands (see section 5.1), from binding statically to a given physics. This is used for high-performance algorithms, special dedicated schemes or boundary conditions. In such cases, the use of perspective objects can still improve reuse and avoid excessive subclassing.

Adding Perspectives
If a new numerical method requires a different formulation, not yet foreseen, the model description can be enhanced with a new perspective. In this case a new physical component must be created to provide the new perspective, while reusing the existing ones. For example, the FEM requires the weak formulation of the PDE's, but it reuses the same sets of variables as the FVM.

Comparison to Adapter Design Pattern
This design can be considered a modified composition-based Adapter pattern [51]. It is similar to the standard pattern, but with a single shared Adaptee object (ConcreteModel) and multiple abstract Targets, which we call Perspectives, each with a number of derived classes that work as Adapters.
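A minimal sketch of this adapted pattern can be given with illustrative names (GasModel as the shared Adaptee, two perspective Targets, and their statically bound Adapters); none of these classes are the framework's, and the toy gas relations are assumptions made only for the example.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the Perspectives-as-Adapters idea: one shared concrete model
// (the Adaptee) and several abstract perspective interfaces (the Targets),
// each adapted by a class that binds statically to the concrete model.
struct GasModel {             // plays the role of ConcreteModel
  double gamma;               // shared physical data
  explicit GasModel(double g) : gamma(g) {}
};

class EigenPerspective {      // one abstract Target
public:
  virtual ~EigenPerspective() {}
  virtual double maxEigenValue(double u, double p, double rho) const = 0;
};

class FluxPerspective {       // another abstract Target
public:
  virtual ~FluxPerspective() {}
  virtual double pressure(double rho, double e) const = 0;
};

class GasEigen : public EigenPerspective {  // Adapter, statically bound
public:
  explicit GasEigen(GasModel* m) : model(m) {}
  virtual double maxEigenValue(double u, double p, double rho) const {
    return u + std::sqrt(model->gamma * p / rho);  // u + sound speed
  }
private:
  GasModel* model;
};

class GasFlux : public FluxPerspective {    // Adapter on the same Adaptee
public:
  explicit GasFlux(GasModel* m) : model(m) {}
  virtual double pressure(double rho, double e) const {
    return (model->gamma - 1.0) * rho * e;  // ideal-gas relation
  }
private:
  GasModel* model;
};
```

Both adapters share the single GasModel instance, while clients see only the abstract perspective interfaces.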


Analysis
The fact that several objects may be defined to describe the same physics may look like a disadvantage but, in our experience, it improves reusability, allows easier decoupling of physics from numerics, and provides better support for a runtime plug-in policy. Nevertheless, we make the effort of keeping the number of perspectives small, to avoid each numerical method requiring its own interface, which would lead to extra effort to maintain a large number of method-physics combinations, or create many incompatibilities.

6.1.2.1 Physical Model Abstract Interface

We present code listing 6.1 as an example of the construction of the physical model interface. A polymorphic BaseTerm object and a corresponding enumerated variable are assigned to each physical term of the equation (convection, diffusion, reaction, dispersion, inertia, etc) and registered in an associative container. This means that a physical model equation is composed of terms.

class PhysicalModel {
public:
  // enumerated variables used to define equation term types
  enum TermID { CONVECTION, DIFFUSION, REACTION, DISPERSION };

  PhysicalModel(string name);             // constructor
  virtual ~PhysicalModel();               // destructor
  virtual void setup() = 0;               // set up private data
  virtual int getDimension() const = 0;   // geometric dimension
  virtual int getNbEquations() const = 0; // number of equations

  // accessor to the terms
  BaseTerm* getTerm(TermID t) { return term_map.find(t)->second; }

protected:
  // registers the polymorphic BaseTerm object
  void registerTerm(TermID t, BaseTerm* bt) {
    term_map.insert(std::make_pair(t, bt));
  }

private:
  // map between TermID and a polymorphic physical term
  std::map<TermID, BaseTerm*> term_map;
};

Code Listing 6.1: Physical Model Interface.

Physical Terms Presented in code listing 6.2, BaseTerm manages a number of arrays with physics-dependent data, which contain thermodynamic quantities, transport coefficients, eigenvectors, eigenvalues, etc. These arrays are


continuously recomputed during the simulation and used to compute quantities of interest, like conserved fluxes or variable transformations. They are used, for example, to evaluate physical values at many quadrature points in one pass; therefore, the maximum number of these arrays is determined by the numerical method. The size and the content of these arrays are defined by the classes deriving from BaseTerm.

class BaseTerm {
public:
  // constructor, virtual destructor ...

  // sets the number of required data arrays
  void setNbDataArrays(int max_size);

  // gets the size of the arrays
  virtual int getDataSize() const = 0;

  // gets the array of data of the given index
  double* getPhysicalData(int array_idx);
};

Code Listing 6.2: BaseTerm Interface.

Composing the Physical Terms
A template compositor object is defined for each possible combination of physical equation terms (Convection, Diffusion, ConvectionDiffusion, etc). It derives from PhysicalModel and aggregates trait classes [4] which are subclasses of BaseTerm. In code listing 6.3 we consider a ConvectionDiffusion compositor. The equation terms aggregated by the compositor are registered in the parent class (see the constructor in code listing 6.3), and each of them is made accessible polymorphically through the getTerm function defined in PhysicalModel.

6.1.2.2 Example of Application

Consider a Navier-Stokes model. We implement the convective term, named EulerTerm, and the diffusive term, named NSTerm, as in code listing 6.4. The terms provide accessor functions to useful parameters (Mach number, pressure, etc) and to libraries that compute thermodynamic and transport properties. They also define the size and the enumerated entries of the arrays of physical data (stored in BaseTerm). The last block of code listing 6.4 defines the NavierStokesModel in the role of the ConcreteModel described in the previous section. It is this model that is created and used by the perspectives. NavierStokesModel uses the




template <class CT, class DT>
class ConvectionDiffusion : public PhysicalModel {
public:
  // constructor, virtual destructor,
  // overridden parent class pure virtual methods
protected:
  std::auto_ptr<CT> m_convTerm; // convective term
  std::auto_ptr<DT> m_diffTerm; // diffusive term
};

template <class CT, class DT>
ConvectionDiffusion<CT, DT>::ConvectionDiffusion(string name) :
  PhysicalModel(name),
  m_convTerm(new CT(CT::getTermName())),
  m_diffTerm(new DT(DT::getTermName()))
{
  // register all the equation terms and their TermIDs
  registerTerm(PhysicalModel::CONVECTION, m_convTerm.get());
  registerTerm(PhysicalModel::DIFFUSION,  m_diffTerm.get());
}

Code Listing 6.3: ConvectionDiffusion compositor of equation terms.

knowledge of both its concrete convective and diffusive terms to compute, for instance, the coefficients to adimensionalise the equations.

6.1.2.3 Variables

Numerical methods do not access the ConcreteModel directly. They use the polymorphic perspective objects to perform the physics-dependent tasks; it is the perspectives that are statically bound to the ConcreteModel. An example of such perspectives are the variable sets.

Variable Sets
A given set of PDE's can be written in different variable sets. A simple example are the variables for the Navier-Stokes equations:

  u = [ρ, ρu, ρv, ρE]ᵀ ,   v = [ρ, u, v, p]ᵀ ,   ∂w = [∂p/(ρa), ∂u, ∂v, ∂p − a²∂ρ]ᵀ    (6.4)

where u are the conservative variables, v are the primitive and ∂w are the symmetrising variables.



class EulerTerm : public BaseTerm {
public:
  // enumerated variables: density, pressure, enthalpy, etc.
  enum { RHO=0, P=1, H=2, E=3, A=4, T=5, V=6, VX=7, VY=8, VZ=9 };
  // number of the enumerated variables
  virtual int getDataSize() const { return 10; }

  double getMachInf()  const { return m_machInf; }
  double getTempRef()  const { return m_tempRef; }
  double getPressRef() const { return m_pRef; }
  static string getTermName() { return "Euler"; }

  // get library to compute thermodynamic quantities
  TDLibrary* getThermodynamicLibrary() const { return m_tdLibrary; }

private:
  double m_tempRef;       // free stream temperature
  double m_pRef;          // free stream pressure
  double m_machInf;       // free stream Mach number
  TDLibrary* m_tdLibrary; // library
};

///////////////////////////////////////////////////////////////////

class NSTerm : public BaseTerm {
public:
  // dynamic viscosity, thermal conductivity
  enum { MU=0, LAMBDA=1 };
  // number of the enumerated variables
  virtual int getDataSize() const { return 2; }

  double getPrandtl()     const { return m_Prandtl; }
  double getReynoldsInf() const { return m_ReynoldsInf; }
  static string getTermName() { return "NS"; }

  // get library to compute transport properties
  TPLibrary* getTransportPropertyLibrary() const { return m_tpLibrary; }

private:
  double m_ReynoldsInf;   // reference Reynolds number
  double m_Prandtl;       // Prandtl number
  TPLibrary* m_tpLibrary; // library
};

///////////////////////////////////////////////////////////////////

class NavierStokesModel : public ConvectionDiffusion<EulerTerm, NSTerm> {
public:
  virtual int getDimension()   const; // geometric dimension
  virtual int getNbEquations() const; // number of equations
  virtual int setUp()          const; // set up data
};

Code Listing 6.4: Convective and Diffusive Terms for the Navier-Stokes equations.
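The registration and polymorphic retrieval of terms can be sketched in a self-contained form. The class names below (Term, ConvTerm, DiffTerm, Model) are illustrative, not the listings' classes; the sketch only shows the mechanism: terms registered under enumerated keys and accessed through the base interface.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative sketch of the term-composition idea of listings 6.1-6.4:
// a model registers polymorphic term objects under enumerated keys, and
// clients retrieve them through the base interface only.
class Term {
public:
  virtual ~Term() {}
  virtual int getDataSize() const = 0;   // size of the physical-data array
  virtual std::string name() const = 0;
};

class ConvTerm : public Term {
public:
  virtual int getDataSize() const { return 10; } // e.g. Euler-like data
  virtual std::string name() const { return "Euler"; }
};

class DiffTerm : public Term {
public:
  virtual int getDataSize() const { return 2; }  // e.g. mu, lambda
  virtual std::string name() const { return "NS"; }
};

class Model {
public:
  enum TermID { CONVECTION, DIFFUSION };
  Model() {
    registerTerm(CONVECTION, &m_conv);
    registerTerm(DIFFUSION,  &m_diff);
  }
  Term* getTerm(TermID t) { return m_terms.find(t)->second; }
private:
  void registerTerm(TermID t, Term* term) { m_terms[t] = term; }
  ConvTerm m_conv;
  DiffTerm m_diff;
  std::map<TermID, Term*> m_terms;
};
```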


VarSet Perspective
To interface this concept, we created a perspective called VarSet, which decouples the usage of a physical model from the knowledge of the variables being used. It allows the numerical method, for example, to solve equations formulated in conservative variables, update the solution in primitive variables, and compute eigenvalues with the symmetrising set of variables.

[Figure content: the abstract perspectives ConvectiveVarSet and DiffusiveVarSet are implemented by EulerVarSet and NavierStokesVarSet, which are statically bound to the EulerTerm and NavierStokesTerm of the NavierStokes physical model; the concrete variable sets EulerCons, EulerPrim, NSCons and NSPrim derive from them.]

Figure 6.3: Class diagram for variable sets of the Navier-Stokes model.

In figure 6.3 we see two interfaces to variable sets, ConvectiveVarSet and DiffusiveVarSet, from which EulerVarSet and NavierStokesVarSet respectively derive; their class definitions are given in code listing 6.5. The function setPhysicalData takes a given state vector (the unknown variables) and sets the corresponding physics-dependent data; the function setFromPhysicalData does the opposite. The entries of the array of physical data are defined by the subclasses of BaseTerm. In the case of the EulerTerm, for instance, if we assume that the state variable holds primitive variables,

  v = [p, vx, vy, vz, T]    (6.5)

then setPhysicalData will compute an array of physical quantities, like density, pressure, total enthalpy, total internal energy, sound speed, temperature, velocity modulus, etc, given by:

  w = [ρ, p, H, e, a, T, v, vx, vy, vz]    (6.6)

from v by means of thermodynamic relations. w is then reused to compute eigenvalues, fluxes, etc.

class ConvectiveVarSet {
public:
  virtual void setup() = 0; // set up private data
  // set physical data starting from a state vector
  virtual void setPhysicalData(double* state, double* data) = 0;
  // set state vector starting from physical data
  virtual void setFromPhysicalData(double* data, double* state) = 0;
  virtual void computeJacobians(...) = 0;
  virtual void computeEigenSystem(...) = 0;
  virtual void computeEigenValues(...) = 0;
  virtual void computeFlux(...) = 0;
};

class DiffusiveVarSet {
public:
  virtual void setup() = 0; // set up private data
  // set physical data starting from a state vector
  virtual void setPhysicalData(double* state, double* data) = 0;
  // set state vector starting from physical data
  virtual void setFromPhysicalData(double* data, double* state) = 0;
  virtual void computeWeakDiffMat(...) = 0;
  virtual void computeFlux(...) = 0;
};

class EulerVarSet : public ConvectiveVarSet {
public:
  // constructor, virtual destructor, overridden functions
protected:
  EulerTerm* physical_term;
};

class NavierStokesVarSet : public DiffusiveVarSet {
public:
  // constructor, virtual destructor, overridden functions
protected:
  NSTerm* physical_term;
};

Code Listing 6.5: Convective and Diffusive Variable Sets.

Analysis

The design is flexible because each VarSet has acquaintance of the corresponding physical equation term (see figure 6.3) but not of the resulting ConcreteModel. This makes it possible to specialise the ConcreteModel by composing progressively more complex models, while fully reusing the individual equation terms.
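The composition idea can be sketched with a minimal compositor template. The term classes and their data sizes below are illustrative stand-ins, not the real COOLFluiD types:

```cpp
#include <cassert>

// Stand-in equation terms with illustrative physical-data sizes.
struct EulerTerm { int getDataSize() const { return 10; } };
struct NSTerm    { int getDataSize() const { return 3;  } };

// Sketch of the compositor: the ConcreteModel aggregates equation terms,
// while each VarSet only knows its own term, never the ConcreteModel.
template <typename CONV, typename DIFF>
class ConvectionDiffusion {
public:
  CONV* getConvectiveTerm() { return &m_conv; }
  DIFF* getDiffusiveTerm()  { return &m_diff; }
  // the composed model exposes one physical-data budget covering both terms
  int getDataSize() const { return m_conv.getDataSize() + m_diff.getDataSize(); }
private:
  CONV m_conv;
  DIFF m_diff;
};

// A composed concrete model, reusing the individual terms unchanged.
typedef ConvectionDiffusion<EulerTerm, NSTerm> NavierStokesModel;
```

Swapping either template argument yields a different concrete model without touching the terms themselves, which is the reuse the text describes.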


Decoupling Numerics from Physics (6.1)

The specialisation of the VarSet's is done via subclassing. For example, it is useful to have derived classes for the NavierStokesVarSet, since the transport properties can be computed differently in chemically reactive or non-reactive flows, in local thermal equilibrium (LTE) or non-equilibrium (NEQ). The method computeFlux calculates the diffusive flux and delegates the computation of the transport properties to subclasses, which have more specific knowledge. Some of these classes can then link to external libraries specialised in such computations, like ChemKin [116] or Mutation [87].

Variable Transformations

Because numerical methods may use different variable sets, we must provide a way to transform between them. Another polymorphic perspective is the VariableTransformer. This allows the client code to apply linear matrix transformations (∂U/∂V) between variables and to compute one set of variables from another, for example transforming primitive variables to conservative ones and vice versa.
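A transformer from primitive to conservative variables might look like the following. This is a sketch: the interface name follows the text, but the member layout and the calorically perfect gas constants are our assumptions, not necessarily the COOLFluiD API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical transformer interface: computes one variable set from another.
class VariableTransformer {
public:
  virtual ~VariableTransformer() {}
  virtual void transform(const double* in, double* out) const = 0;
};

// Primitive [p, vx, vy, vz, T] -> conservative [rho, rho*vx, rho*vy, rho*vz, rho*E],
// assuming a calorically perfect gas.
class PrimToConsTransformer : public VariableTransformer {
public:
  void transform(const double* v, double* u) const {
    const double R = 287.05, gamma = 1.4;  // assumed gas constants
    const double p = v[0], vx = v[1], vy = v[2], vz = v[3], T = v[4];
    const double rho = p / (R * T);                  // ideal gas law
    const double v2  = vx * vx + vy * vy + vz * vz;
    u[0] = rho;
    u[1] = rho * vx;
    u[2] = rho * vy;
    u[3] = rho * vz;
    u[4] = p / (gamma - 1.0) + 0.5 * rho * v2;       // rho * total energy
  }
};
```

Client code holds only a VariableTransformer pointer, so the same algorithm can be run with any pair of variable sets for which a transformer is registered.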

6.1.3 Extending Physical Models

In code listing 6.6, we see how to extend an existing physical model to a more complex model while reusing the equation terms. The example features a multi-species, chemically reactive, turbulent k-ε model. The compositor object is now a ConvectionDiffusionReaction and the ConcreteModel aggregates three terms: the EulerTurbKETerm, the TurbKEDiffTerm and the TurbKESourceTerm, which are defined in code listing 6.6. Notice how the enum descriptor of the data set is extended in line 4 to accommodate the entries of the new model.

6.1.4 Summary of technique

The combined use of perspective objects like the VarSet's and the transformers gives the freedom to use different variables to update the solution, compute residuals or linearise Jacobians. This requires no modifications in the numerical algorithms, since all calls are dynamic. Using dynamically bound perspective objects, the numerical methods are unaware of the actual physics. The variable-independent algorithms in COOLFluiD support the combination of multiple numerical methods with multiple physical models. Other CFD platforms like DiffPack [23, 76], Deal.II [11, 12], OpenFOAM [141, 142] or FEniCS [45, 49] also support multiple physical models but do not offer this variability of the numerical method.
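This decoupling can be sketched with a deliberately reduced VarSet interface and two hypothetical one-equation models; the algorithm below is variable independent and sees only the abstract type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reduced, illustrative VarSet interface (the real one is in listing 6.5).
struct VarSet {
  virtual ~VarSet() {}
  virtual double computeFlux(double state) const = 0;
};

// Hypothetical model: linear advection, f(u) = a * u.
struct LinearAdvection : VarSet {
  double a;
  explicit LinearAdvection(double a_) : a(a_) {}
  double computeFlux(double u) const { return a * u; }
};

// Hypothetical model: Burgers equation, f(u) = u^2 / 2.
struct Burgers : VarSet {
  double computeFlux(double u) const { return 0.5 * u * u; }
};

// A "numerical method": sums fluxes over states through dynamic calls,
// never knowing which physics it is operating on.
double totalFlux(const VarSet& vs, const std::vector<double>& states) {
  double sum = 0.0;
  for (std::size_t i = 0; i < states.size(); ++i)
    sum += vs.computeFlux(states[i]);
  return sum;
}
```

The same totalFlux routine serves both models; adding a third physics requires only a new VarSet subclass, not a change to the algorithm.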


Chapter 6 - Component based architecture

class EulerTurbKETerm : public EulerTerm {
public:
  // turbulent kinetic energy, viscous dissipation
  enum { K = 11, EPS = 12 };
  // first species
  int startSpecies() const { return EulerTerm::getDataSize() + 2; }
  // number of species
  int getNbSpecies() const { return m_nbSpecies; }
  // size of the physical data array, including
  // the entries declared by the parent class
  virtual int getDataSize() const { return startSpecies() + m_nbSpecies; }
  // free stream turbulent kinetic energy
  double getKref() const { return m_kRef; }
  // free stream viscous dissipation
  double getEpsRef() const { return m_epsRef; }
  static string getTermName() { return "EulerTurbKE"; }
};

class TurbKEDiffTerm : public NSTerm {
public:
  // turbulent dynamic viscosity
  enum { MUT = 3 };
  virtual int getDataSize() const { return NSTerm::getDataSize() + 1; }
  double getAlpha() const { return m_alpha; }
  double getBeta() const { return m_beta; }
  static string getTermName() { return "TurbKEDiff"; }
};

class TurbKESourceTerm : public BaseTerm {
public:
  // pairs of forward/backward reaction rates corresponding to each
  // chemical reaction are stored in the physical data array
  virtual int getDataSize() const { return m_nbReactions * 2; }
  double* getCoeffs() const { return m_coeffs; }
  static string getTermName() { return "TurbKESource"; }
};

class TurbKEModel : public ConvectionDiffusionReaction
                    <EulerTurbKETerm, TurbKEDiffTerm, TurbKESourceTerm> {
public:
  virtual void setup() const;
  virtual int getDimension() const;
  virtual int getNbEquations() const;
};

Code Listing 6.6: Example of extension of existing physical model.
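The layout of the composed physical-data array can be checked with a small self-contained sketch. The base size of 11 for EulerTerm is an assumption inferred from the enum entries K=11 and EPS=12 in listing 6.6; the species count is likewise illustrative.

```cpp
#include <cassert>

// Stand-in for the parent term; its data size of 11 is an assumed value
// consistent with K = 11 and EPS = 12 extending it in listing 6.6.
struct EulerTerm {
  virtual ~EulerTerm() {}
  virtual int getDataSize() const { return 11; }
};

// Mirror of the extension pattern of listing 6.6: the child appends its
// own entries (K, EPS, then the species) after the parent's entries.
struct EulerTurbKETerm : EulerTerm {
  enum { K = 11, EPS = 12 };  // turbulent kinetic energy, viscous dissipation
  int m_nbSpecies;            // e.g. a 5-species air mixture (assumed)
  EulerTurbKETerm() : m_nbSpecies(5) {}
  int startSpecies() const { return EulerTerm::getDataSize() + 2; }
  int getNbSpecies() const { return m_nbSpecies; }
  int getDataSize() const { return startSpecies() + m_nbSpecies; }
};
```

The child's getDataSize never hard-codes the parent's size, so further extensions of EulerTerm propagate automatically to the derived layout.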



6.2 Handling multi-physics

Handling the concurrent existence of multiple physical models within the same simulation environment poses certain problems for the platform that we have been developing over the previous chapters. In this section we describe the problem and propose a solution.

6.2.1 Problem Statement

During the presentation of the design solution we will resort to an example to make the explanations clearer. The example is selected from a real-life simulation, and chosen to be as simple as possible while still containing the ingredients to demonstrate all the features of the design.

6.2.1.1 Application example

Consider the solution of the fluid flow around a moving airfoil, with some prescribed motion. The Navier-Stokes equations are solved via the FVM. The movement of the mesh cells is accomplished by the solution of a problem of pseudo-elasticity by the FEM.

There are two physical models involved in this simulation: the Navier-Stokes equations and the elasticity model. These are solved by two distinct methods: FVM and FEM. The strong interdependence between these two methods is evident, as one is responsible for the geometry of the cells upon which the other performs the discretisation.

Both methods have different needs from the point of view of geometrical mesh data, and build different software entities to accomplish their tasks. This means that some sort of isolation would be advisable from the software design point of view, most probably in the form of independent components. Section 5.1 introduces a design that promotes the implementation of independent components with a common structured architecture. On the other hand, their dependence implies some communication, which can take the form of function calls or data interchange. Since this data often scales with the enormous size of the problem, it will potentially be in a parallel distributed form. The data may be owned by each method, but also shared with some entity representing the mesh. Section 5.3 introduces such design techniques.

From the FEM point of view we are solving a structural mechanics problem, while the FVM is completely oblivious to which method is moving the mesh. The geometry of the mesh is the solution of the FEM, i.e., the coordinates of the nodes of the FVM are the degrees of freedom of the FEM.

In figure 6.6 we can visualise the components that will be involved in the simulation: one MeshReader, which will be responsible for loading the mesh data from a file and accomplishing the parallel partitioning; two space discretisation methods, FiniteVolume and FiniteElement; one method for dealing with the time marching procedure, named BackwardEuler; one method for controlling the prescribed movement of the geometry, MoveMesh; two LinearSystemSolver's to solve the systems assembled by each space discretisation method respectively; and finally a method to write the solution periodically to files for visualisation, TecplotFormat.

6.2.1.2 Problem statement

Suppose that both FEM and FVM are already implemented as distinct stand-alone components, the former able to solve structural mechanics problems, the latter able to solve fluid flow problems. For simplicity, assume also that both have been implemented within a common framework, such as the one defined in chapter 5. Both make use of similar data storage strategies, as in section 5.3, and assume that the parallel communication for these data collections is already in place for single physics simulations, as presented in section 5.2.

Question 1: How can we set up an environment where both methods can coexist and interact, with no significant change needed to either method? Is it possible for both to share some data and operate on it without mutual acquaintance?

Question 2: Is it possible to have both methods within the same process? If so, how can we solve the problem of concurrency of global static and singleton objects? Or are we obliged to resort to process isolation and/or parallel communication?
The problem is then to reuse the existing implementations of each method within the enlarged scope of a multi-physics simulation, with minimal change to both methods.

6.2.1.3 Goals

We propose a solution to the stated problem, but before proceeding, we define some goals which should be met.


Several approaches have been suggested in the past for the same or similar problems, of which the Common Component Architecture (CCA) [5] is probably one of the most successful. These solutions may have had different goals. Efficiency was surely one of them, while modularity and isolation of components were certainly others.

What about simplicity? Experience in previous projects shows that the developers of numerical algorithms often dislike programming in complex environments. Large interfaces, multiple tools, mixed programming languages, even object-oriented structures are viewed as diverting precious time from their main goal: solving a scientific problem. Hence it is our goal to keep the solution as simple as possible. This should be achieved by having a reduced interface, avoiding multiple languages, removing explicit parallel programming, and not resorting to external tools like parsers or code generators.

Another issue is simplicity for the end-user. A complex multi-physics simulation should not imply a complex process of configuration. Both concerns of simplicity, for developers and for end-users, are in harmony with the goal of usability defined in section 3.2.

Objectives

Taking all this into account, the proposed solution should have the following features and qualities:

• allow multiple methods to handle multiple physics,
• maintain high performance, both in processing time and memory,
• maintain independence of the data communication paradigm,
• simplicity of implementation for developers,
• minimal configuration for end-users.

6.2.1.4 Assumptions

We assume the system is already capable of single physics, single method scientific simulations in a parallel environment.


Component architecture

To maintain some generality, we assume the existence of some sort of component architecture in the general sense. This means that components are independent and connect to some central Kernel. Without losing generality, we divide components into two types: the ones which bring basic functionality to extend the bare-bone kernel, henceforth called Basic Components, and the ones that plug in, possibly on user demand, to bring numerical functionality, which we'll call Plug-In Components. This is depicted in figure 6.4. The way these components connect to the Kernel or inter-operate is irrelevant for the design solution. For an example of how to implement a component architecture we recommend the Micro-Kernel [26] architecture.
Figure 6.4: Component architecture - Basic components are present for complementing the Kernel with basic functionality, while Plug-in components are external and assumed to be loaded on end-user demand.

Parallel communication layer

We also assume that a parallel communication layer is implemented, and that it deals with the communication of the large data collections across the parallel environment. We assume this layer does not expose the communication details to the numerical algorithms of the single physics / single method applications. The solution we present must maintain this independence.

Static or Singleton Components

Singleton objects [51] and static data are often used in OO designs. When implementing numerical algorithms, it is often convenient to define static global data that can be accessed from multiple points in the code. We do this to avoid passing much of this data to functions that are called a large number of times, which would result in a prohibitive performance penalty and complex function signatures.


Another strong reason is that certain types of functionality are pervasive, i.e., there are objects whose responsibility is to perform tasks that influence the code throughout. Examples of such tasks are logging, parallel communication, locking of resources, interaction with the operating system, memory management, building objects through factories, etc. These are good candidates for Aspect-Oriented Programming (AOP), as these tasks are interwoven with the numerical algorithms but can be expressed as independent concerns. In our solution we will not use AOP, because AspectC++ [120] adds extra complexity and another set of tools (an additional compiler and a weaver), which would complicate the developer environment.

Since we expect the existence of singletons to introduce difficulties in our solution, we assume that the system has some singleton objects and/or some static data. For the example at hand, we'll define two components, MeshData and PhysicalModel, and state that they behave as singletons and are essential to the single physics simulation. We assume also that they are difficult to factor out of the current implementation, so we cannot remove them or turn them into normal data or objects. After all these assumptions, the system in its present single physics, single method implementation is conceptually similar to what is presented in figure 6.5.
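The difficulty singletons introduce can be made concrete with a minimal sketch. With a classic singleton [51] there is exactly one instance per process, so a Navier-Stokes model and a pseudo-elastic model cannot coexist; the class below is an illustrative stand-in, not the actual COOLFluiD PhysicalModel.

```cpp
#include <cassert>
#include <string>

// Minimal Meyers-style singleton, as commonly used for pervasive objects.
class PhysicalModel {
public:
  static PhysicalModel& getInstance() {
    static PhysicalModel instance;  // exactly one instance per process
    return instance;
  }
  void setName(const std::string& name) { m_name = name; }
  const std::string& getName() const { return m_name; }
private:
  PhysicalModel() {}
  PhysicalModel(const PhysicalModel&);  // non-copyable
  std::string m_name;
};
```

If a second method configures the singleton for its own physics, the first method silently sees its model replaced, which is precisely the concurrency problem raised in Question 2.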

6.2.2 Exporting and Importing Data

Following the assumptions on the existing system and the example description in section 6.2.1.1, we can assemble the set of components needed for the simulation described in the example. The new system should look like the one depicted in figure 6.6. Each of these components operates on large collections of data that represent multiple geometrical and numerical information.

We consider first the problem of having components which do not know about each other but must interchange and operate on common data. We propose a simple interface, as shown in figure 6.7, through which each component declares which data it imports and which data it exports. This interface is used by the Kernel to satisfy the needs of some components with the data provided by the other components.
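A sketch of the matching the Kernel performs between exported and imported data follows. Here a socket is identified by a (name, type) pair of strings; this is a simplification of the real interface, and the component and socket names in the usage are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified socket identity: (data name, data type).
typedef std::pair<std::string, std::string> SocketId;

struct Component {
  std::string name;
  std::vector<SocketId> provides;  // data this component exports
  std::vector<SocketId> needs;     // data this component imports
};

// Kernel-side matching: connect every need to a provider of the same
// (name, type); return the needs that no component provides.
std::vector<SocketId> connect(const std::vector<Component>& comps) {
  std::map<SocketId, std::string> providers;
  for (std::size_t i = 0; i < comps.size(); ++i)
    for (std::size_t j = 0; j < comps[i].provides.size(); ++j)
      providers[comps[i].provides[j]] = comps[i].name;

  std::vector<SocketId> unmatched;
  for (std::size_t i = 0; i < comps.size(); ++i)
    for (std::size_t j = 0; j < comps[i].needs.size(); ++j)
      if (providers.find(comps[i].needs[j]) == providers.end())
        unmatched.push_back(comps[i].needs[j]);
  return unmatched;
}
```

The components never reference each other directly: each only declares what it provides and what it needs, and the Kernel wires them together.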



[Diagram: framework components MeshReader, FiniteElement, LinearSystemSolver and TecplotFormat, together with the global objects DataStorage and PhysicalModel, forming the single method, single physics system.]
Figure 6.5: Single Physics Single Method system with two singleton components.

6.2.3 Data Sockets

At this point we introduce the concept of a socket. A socket is an object of one type which has a counterpart type into which it can plug. Sockets wrap around the data that we want to export to the other components and describe the name and type of this data. Since in this technique we only deal with connections of data, we name our sockets DataSocket's. They come in two abstract types: sources and sinks.

Source socket

We define a source socket as a socket that exports data out of a component. This socket is responsible for allocating and deallocating the data, possibly through a memory management facility. In order to maintain acquaintance of the data, the source socket holds a handle to it. We name the class that implements this concept BaseDataSocketSource.
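The source/sink pairing can be sketched as follows. The class names follow the text, but the template shape, member layout and plugging protocol are our assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Source: owns the exported data and keeps a handle to it.
template <typename T>
class DataSocketSource {
public:
  DataSocketSource(const std::string& name, std::size_t size)
    : m_name(name), m_data(new T[size]()), m_size(size) {}
  ~DataSocketSource() { delete [] m_data; }  // source allocates and deallocates
  const std::string& getName() const { return m_name; }
  T* getHandle() { return m_data; }          // handle kept to the data
private:
  std::string m_name;
  T* m_data;
  std::size_t m_size;
};

// Sink: imports data by plugging into a source of matching name and type.
template <typename T>
class DataSocketSink {
public:
  explicit DataSocketSink(const std::string& name) : m_name(name), m_data(0) {}
  bool plugTo(DataSocketSource<T>& source) {
    if (source.getName() != m_name) return false;  // names must match
    m_data = source.getHandle();
    return true;
  }
  T* getHandle() { return m_data; }
private:
  std::string m_name;
  T* m_data;
};
```

After plugging, source and sink share the same underlying storage, so the importing component operates on the exporter's data without copies and without knowing which component provided it.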


Handling multi-physics (6.2)

Framework

Legend: Basic Component Plug-in Component

MeshReader

Finite Volume

Finite Element

Backward Euler Time Stepper

Linear System Solver

Linear System Solver

MeshMove

TecplotFormat

Pseudo-Elastic PhysicalModel

Navier-Stokes PhysicalModel

Figure 6.6: Components present in a minimal multi-physics multi-method simulation. Component +needsSockets(): vector +providesSockets(): vector

BackwardEulerTimeStep

FEM -m_socketNode: BaseDataSocketSink -m_socketSolution: BaseDataSocketSink -m_socketRHS: BaseDataSocketSource +needsSockets(): vector +providesSockets(): vector

...

-m_socketRHS: BaseDataSocketSink -m_socketTime: BaseDataSocketSource