The race for more CPU power through higher clock rates has stagnated in recent years. Instead of increasing the clock speed, CPU manufacturers now build more powerful computers by increasing the number of cores. Nowadays, almost any new desktop or even notebook computer has at least two CPU cores. Unfortunately, this extra computational power can only be exploited when the software running on these computers is developed to do so. If it is not, you will lose half (2 cores) or 75% (4 cores) of your computational power because only one core can be used.
LMS Virtual.Lab allows you to use parallel processing in order to take full advantage of the available CPU power as well as memory. However, configuring parallel processing is not straightforward and should be done with care. It is not easy for a user to know the optimal settings for a parallel job, so the goal of this section is to give some guidance on choosing among the available options.
The different types of parallelism available in LMS Virtual.Lab are:
- Single Process: means that only one process is used for computing the analysis cases.
- Multi Process Frequency Level: means that the frequency range to be computed is divided into equal parts and each part is computed on one computation node. This accelerates the solution of problems of any size. It provides the best speed-up (almost linear) and is the preferred choice if there is enough memory to solve each process "in-core". For example, suppose 3 GB of memory is needed to solve a system at one frequency. If 16 GB of physical memory and 4 CPUs are available, all 4 CPUs can be used at frequency level: 4 x 3 GB = 12 GB fits in the available 16 GB, so 4 frequencies are solved at the same time, one per CPU.
- Multi Process Matrix Level: means that the assembled system matrix at one frequency is processed on several nodes. This is mostly beneficial for very large problems on a system that does not have enough memory to solve each process "in-core" at frequency level. For example, suppose 12 GB is needed to solve the system at one frequency. If 16 GB of physical memory and 4 CPUs are available, only one frequency fits in memory at a time, so use the 4 CPUs at matrix level: each of the 4 CPUs then solves a sub-matrix of the same frequency at the same time.
- Multi Process Domain Level: means that the mesh is split into several parts and reduced-size problems are solved on each part by different processors. A domain decomposition technique (METIS) is used for this domain-level parallelism. Domain decomposition is only supported in uncoupled FEM, provided that there are no PML, no infinite elements, and no acoustic-acoustic transfer admittances, since these are technically difficult to handle with geometric partitioning.
- Multi-threading: this delivers a poorer speed-up (in LMS Sysnoise, it is only available in some parts of the code) and is the last possible choice. While multi-threading can easily decrease the computation time for a given computation, it does not reduce memory consumption, nor does it allow the use of several nodes of a cluster. The idea here is to distribute a run over several processors so that the solver can run where memory consumption does not allow running it sequentially. This issue is critical for computing 3D cases, which can be solved by using a parallel out-of-core solver.
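
The memory reasoning in the frequency-level and matrix-level examples above can be sketched as a small Python helper. This is only an illustration and not part of LMS Virtual.Lab: the names Machine, suggest_parallelism and split_frequencies are hypothetical. The rule it applies (use frequency level when at least two frequencies fit in memory at once, otherwise matrix level) is an assumption drawn from the two examples. Domain-level parallelism and multi-threading are not covered.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Machine:
    """Hypothetical description of the available hardware."""
    cpus: int          # number of CPUs (cores) available
    memory_gb: float   # physical memory in GB


def suggest_parallelism(mem_per_frequency_gb: float,
                        machine: Machine) -> Tuple[str, int]:
    """Return (mode, process_count) using the assumed rule described above."""
    if machine.cpus <= 1:
        return ("single process", 1)
    # Number of frequencies that can be solved in-core at the same time.
    in_core_slots = int(machine.memory_gb // mem_per_frequency_gb)
    if in_core_slots >= 2:
        # Enough memory for several frequencies: one frequency per process.
        return ("frequency level", min(machine.cpus, in_core_slots))
    # At most one frequency fits in memory: share its matrix across all CPUs.
    return ("matrix level", machine.cpus)


def split_frequencies(frequencies: List[float],
                      n_processes: int) -> List[List[float]]:
    """Divide the frequency list into contiguous parts of near-equal size."""
    base, extra = divmod(len(frequencies), n_processes)
    chunks: List[List[float]] = []
    start = 0
    for i in range(n_processes):
        # The first `extra` parts receive one additional frequency.
        size = base + (1 if i < extra else 0)
        chunks.append(frequencies[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    machine = Machine(cpus=4, memory_gb=16.0)
    print(suggest_parallelism(3.0, machine))    # 3 GB per frequency
    print(suggest_parallelism(12.0, machine))   # 12 GB per frequency
    print(split_frequencies([100.0 * k for k in range(1, 11)], 4))
```

With 16 GB and 4 CPUs, the helper suggests frequency level with 4 processes for the 3 GB case and matrix level with 4 processes for the 12 GB case, matching the two examples in the list. When the number of frequencies is not a multiple of the process count, the parts differ in size by at most one frequency.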