19.01.2016: Invited keynote talk “Symbolic Loop Parallelization for Adaptive Multi-Core Systems – Recent Advances and Benefits”
Prof. Dr.-Ing. Jürgen Teich presented his invited keynote talk “Symbolic Loop Parallelization for Adaptive Multi-Core Systems – Recent Advances and Benefits” at IMPACT 2016 in Prague, Czech Republic.
With the advent of heterogeneous many-core systems that combine CPUs with GPUs and coarse-grain accelerator processor arrays, massively parallel computing on a chip is becoming more and more attractive, even for multiple concurrent parallel applications that compete dynamically for a certain number and type of processor and memory resources. In this realm, current research initiatives such as Invasive Computing investigate novel solutions for allocating available resources dynamically among competing applications upon their request, in order to obtain the smallest execution times and achieve high resource utilization.
In this context, nested loop programs not only form an important source of workload for the above class of emerging many-core platforms, owing to their regular computations over polyhedral domains of iterations, but they still pose a number of difficult problems: the schedule and mapping of a loop nest must be adapted to an available region of processors whose size and location are not known until run-time.
In this keynote, symbolic (parametric) loop parallelization techniques are proposed as a remedy that avoids any time- or memory-intensive in-situ compilation on the chip at run-time. Recent results are summarized showing how an important class of nested loop programs with parameterized loop bounds may be scheduled and assigned optimally to virtual regions of processors without any recompilation at run-time, by producing parameterized assembly programs together with run-time schedule candidate selection code that initializes the processor codes.
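The core idea of parameterized scheduling can be illustrated with a minimal, hypothetical sketch (not the actual TCPA tool flow): a loop with a symbolic bound N is block-partitioned onto P processors, where the tile size is an expression of (N, P) evaluated only at run-time, so the same per-processor program serves any processor region without recompilation.

```python
import math

def symbolic_tile(N, P):
    """Return per-processor iteration ranges for a loop over 0..N-1,
    block-partitioned onto P processors.

    The tile size ceil(N / P) is a symbolic expression of the
    parameters (N, P); it is only evaluated once their values
    become known at run-time."""
    tile = math.ceil(N / P)
    return [(p * tile, min((p + 1) * tile, N)) for p in range(P)]

def run_on_processor(lo, hi, a):
    # Every "processor" executes the same parameterized loop body;
    # only its tile bounds (lo, hi) differ.
    for i in range(lo, hi):
        a[i] = 2 * a[i]  # illustrative loop statement

# Run-time parameter binding: N and P are only known here, yet no
# code above needs to be regenerated or recompiled.
N, P = 10, 3
a = list(range(N))
for (lo, hi) in symbolic_tile(N, P):
    run_on_processor(lo, hi, a)
```

This captures only the partitioning aspect; the actual techniques additionally select among precomputed symbolic schedule candidates and handle multi-dimensional iteration domains.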
These results may be applied to a multitude of loop nests, from numerical benchmarks to signal processing applications, to provide predictable and low-cost solutions with adaptive speed and ultra-low power consumption. The presented symbolic loop parallelization techniques are applied to a class of massively parallel processor arrays called tightly coupled processor arrays (TCPAs), which allow for non-atomic inter-processor data transfers that are scheduled together with the loop statements. Finally, it is shown that symbolic loop transformations in the polyhedral model not only enable processing of loop nests with predictable execution times, but also allow fault-tolerance aspects to be specified adaptively.