Submitted 4 May 2017 Accepted 24 October 2017 Published 20 November 2017 Corresponding author Sándor Szénási, szenasi.sandor@nik.uni-obuda.hu Academic editor Miriam Leeser Additional Information and Declarations can be found on page 18 DOI 10.7717/peerj-cs.138 Copyright 2017 Szénási Distributed under Creative Commons CC-BY 4.0 OPEN ACCESS Solving the inverse heat conduction problem using NVLink capable Power architecture Sándor Szénási John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary ABSTRACT The accurate knowledge of Heat Transfer Coefficients is essential for the design of precise heat transfer operations. The determination of these values requires Inverse Heat Transfer Calculations, which are usually based on heuristic optimisation techniques, like Genetic Algorithms or Particle Swarm Optimisation. The main bottleneck of these heuristics is the high computational demand of the cost function calculation, which is usually based on heat transfer simulations producing the thermal history of the workpiece at given locations. This Direct Heat Transfer Calculation is a well parallelisable process, making it feasible to implement an efficient GPU kernel for this purpose. This paper presents a novel step forward: based on the special requirements of the heuristics solving the inverse problem (executing hundreds of simulations in a parallel fashion at the end of each iteration), it is possible to gain a higher level of parallelism using multiple graphics accelerators. The results show that this implementation (running on 4 GPUs) is about 120 times faster than a traditional CPU implementation using 20 cores. The latest developments of the GPU-based High Power Computations area were also analysed, like the new NVLink connection between the host and the devices, which tries to solve the long time existing data transfer handicap of GPU programming. Subjects Distributed and Parallel Computing, Graphics, Scientific Computing and Simulation, Software Engineering Keywords GPU, CUDA, Inverse heat conduction problem, Heat transfer, Parallelisation, Data- parallel algorithm, Simulation, NVLink, Graphics accelerator, Optimisation INTRODUCTION As a fundamental experience of modern materials science, material properties are influenced by the microstructure; therefore, these can be altered to improve the mechanical attributes (Oksman et al., 2014). One of the most widely used methods for this purpose is heat treatment which usually consists of two consecutive steps: heating up the work object to a given high temperature and cooling it down in a precisely controlled environment. It is necessary to know the attributes of the given material and the environment, to achieve the best results, especially the Heat Transfer Coefficient (HTC) which shows the amount of heat exchanged between the object and the surrounding cooling medium. The Inverse Heat Conduction Problem (IHCP—the determination of the HTC) is a typical ill-posed problem (Beck, Blackwell & Clair st, 1985; Alifanov, 1994; Felde, 2016b). Without any known analytical solution, most methods are based on the comparison How to cite this article Szénási (2017), Solving the inverse heat conduction problem using NVLink capable Power architecture. PeerJ Comput. Sci. 3:e138; DOI 10.7717/peerj-cs.138 https://peerj.com mailto:szenasi.sandor@nik.uni-obuda.hu https://peerj.com/academic-boards/editors/ https://peerj.com/academic-boards/editors/ http://dx.doi.org/10.7717/peerj-cs.138 http://creativecommons.org/licenses/by/4.0/ http://creativecommons.org/licenses/by/4.0/ http://dx.doi.org/10.7717/peerj-cs.138 of temperature signals recorded during real heat treatment processes and estimated by simulations. The aim of these methods is to find the HTC function giving the minimal deviation of the measured and predicted temperature data. It is usual to use heuristic algorithms, like Genetic Algorithms (GAs) (Szénási & Felde, 2017), Particle Swarm Optimisation (PSO) (Felde & Szénási, 2016; Szénási & Felde, 2015) or some hybrid approaches (Felde, 2016a) to find this parameter. Kim & Baek (2004) presented a hybrid genetic algorithm for the analysis of inverse surface radiation in an axisymmetric cylindrical enclosure. Verma & Balaji (2007) used a stochastic Particle Swarm Optimization to estimate several parameters in the field of inverse conduction-radiation. Both papers have significant contribution in the field of inverse methods. In the case of GAs, every chromosome of the population encodes one possible HTC function in its genes. These are two-dimensional continuous functions given by the time and location. Therefore, a limited number of control points have been used to approximate these (340 floating point values per HTC). With the already existing Direct Heat Conduction Problem (DHCP) solver methods (based on finite-elements or finite-difference techniques), it is feasible to simulate the cooling process and to record the thermal history for each chromosome. The difference between this generated thermal history and the measured one produces the cost value for the individual. The purpose of the IHCP process is to find the best gene values resulting in minimal cost. The bottleneck of this process is the high computational demand. The runtime of one cooling process simulation is about 200 ms using one traditional CPU core, and it is necessary to run these simulations for each chromosome in each iteration. Assuming a population of 2,000 chromosomes and a GA with 3,000 iterations, it takes several days to finish the search. Furthermore, according to the random behaviour of the GA and the enormously large search space, it is worth running multiple searches. As a result, an overall IHCP process can take many weeks. There are several attempts at using graphics accelerators to speed up physical simulations, and there are several substantial achievements in this field too. Satake, Yoshimori & Suzuki (2012) presented a related method to solve heat conduction equations using the CUDA Fortran language. They worked out a very well optimised method (analysing the PTX code generated by the compiler), but they only dealt with the one-dimensional unsteady heat conduction equation for temperature distribution. Klimeš & Štětina (2015) presented another model using the finite difference method to simulate the three-dimensional heat transfer. Their results showed that the GPU implementation is 33–68 times faster than the same CPU-based simulation using one Tesla C2075 GPU for running kernels. This significant speed up makes it possible to use their method in a real-time fashion. Narang, Wu & Shakur (2012) and Narang, Wu & Shakur (2016) also used programmable graphics hardware to speed up the heat transfer calculations based on a similar finite difference method. They used a Quadro FX 4800 card, and the speed up was still significant (about 20× in the case of a large number of nodes). There are also several papers from similar areas. Humphrey et al. (2012) and He et al. (2013) have very significant contributions to the field of radiative heat transfer problems. These also show that it is worth implementing GPU codes for physical simulations. Szénási (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.138 2/20 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.138 The main difference between these studies and this research is that this paper is focusing on the two-dimensional IHCP. Heat transfer simulation is a major part of the IHCP solving process; moreover, it is necessary to run thousands of simulations. Accordingly, it is feasible to use a higher level of parallelism by using multi-GPU architectures (the presented papers are usually deal with only one device). It is possible to install multiple graphics cards into a standard PC motherboard, and the CUDA programming environment can handle all of them. Using multiple GPUs can double/triple/quadruple the processing power, but it is necessary to adapt the algorithms to this higher level of parallelism. One of the most interesting developments of 2016 in HTC computing is the result of the IBM and NVIDIA collaboration, the NVLink high-speed interface between IBM’s POWER8 CPUs and NVIDIA’s Pascal GPUs. Data transfer between the host and device memory was an important bottleneck in GPU programming, making several promising applications practically unusable. This high-speed connection and the existence of multiple Pascal based graphics cards give developers the ability to accelerate applications more easily. The assumption was that it is worth implementing an IHCP solver system based on this architecture. Based on these advancements, a novel numerical approach and a massively parallel implementation to estimate the theoretical thermal history are outlined. The rest of the paper is structured as follows: the next section presents the novel parallel DHCP and IHCP solver methods; ‘Results and Discussion’ presents the raw results of the benchmarks and the detailed analysis; finally, the last section contains the conclusions and further plans. MATERIALS & METHODS Direct heat conduction problem There are various fundamental modes of heat transfer, but this paper deals only with transient conduction when the temperature of the workpiece changes as a function of time. The determination of the temperature of any points of the object at any moment often calls for some computer-assisted approximation methods. In the case of three-dimensional objects, these calculations can be very complicated and resource intensive, preventing the efficient usage as a GA cost function. For cylindrical objects, an essential simplification can significantly decrease the computational effort. As can be seen in Fig. 1 it is enough to model the middle cross-section of the cylinder, resulting in the two-dimensional axis-symmetrical heat conduction model. The radius of the cylinder is noted by R and Z. The cylinder is subjected to a longitudinal local coordinate and time varying Heat Transfer Coefficient HTC(z,t) on all its surfaces. Both the thermal conductivity, density and the heat capacity are varying with the temperature, k(T), ρ(T) and Cp(T). It has to be noted that phase transformations of the materials applied do not occur during the experiments, therefore latent heat generation induced by phase transformations is not considered. Based on this simplified model, the mathematical formulation of the nonlinear transient heat conduction can be described as Eq. (1), with the following initial and boundary Szénási (2017), PeerJ Comput. Sci., DOI 10.7717/peerj-cs.138 3/20 https://peerj.com http://dx.doi.org/10.7717/peerj-cs.138 top bottom in s id e s u rf a c e z=0 z=Z r=R r=0 Figure 1 Two-dimensional axis-symmetrical heat conduction model. Full-size DOI: 10.7717/peerjcs.138/fig-1 conditions Eqs. (2)–(5): ∂ ∂r (k ∂T ∂r )+ k r ∂T ∂r + ∂ ∂z (k ∂T ∂z )+qv =ρCp ∂T ∂t (1) T(r =0,z,t =0)=T0, (2) k ∂T ∂z |0≤z≤L,r=R=HTC(z,t)[Tq−T(r =R,z,t)], (3) k ∂T ∂z |z=0,0≤r