Computation-intensive image processing applications need to be implemented on multicore architectures. To execute efficiently on such platforms, their underlying data and/or functions must be partitioned and distributed among the processors. The best partitioning minimizes inter-processor communication while keeping the load balanced. As the number of cores keeps growing, and with it the demand for more complex memory hierarchies, non-uniform memory access, and similar mechanisms, on-chip communication has gained a significant role in exploiting multicore chips. Partitioning decisions based only on conventional performance results, without communication profiling, are therefore suboptimal. In this paper, we explore the communication and computation behavior of a mesh decoder as a case study, and propose models that allow early prediction of the application's behavior. With these models, it is no longer necessary to profile the application for every input sample. As a result, communication- and computation-aware parallelization can be performed faster and more easily.
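The trade-off between inter-processor communication and load balance can be made concrete with a small sketch. The task names, compute costs, and traffic volumes below are hypothetical values chosen for illustration; they are not taken from the paper, and the `evaluate` helper is an assumed scoring function, not the authors' model.

```python
# Hypothetical task graph (illustrative values only, not from the paper).
# Node weights are compute costs; edge weights are data volumes exchanged.
compute = {"parse": 4, "decode": 9, "render": 7, "filter": 4}
traffic = {
    ("parse", "decode"): 120,
    ("decode", "render"): 300,
    ("decode", "filter"): 40,
    ("filter", "render"): 60,
}


def evaluate(partition, compute, traffic):
    """Return (cut_traffic, imbalance) for a task-to-core mapping.

    cut_traffic: total data volume on edges whose endpoints sit on different cores.
    imbalance:   heaviest core load divided by the mean core load (1.0 is perfect).
    """
    loads = {}
    for task, core in partition.items():
        loads[core] = loads.get(core, 0) + compute[task]
    cut = sum(w for (a, b), w in traffic.items() if partition[a] != partition[b])
    mean = sum(loads.values()) / len(loads)
    return cut, max(loads.values()) / mean


# Mapping chosen for even load, ignoring traffic.
balanced = {"parse": 0, "decode": 0, "render": 1, "filter": 1}
# Mapping that keeps the heaviest edge (decode -> render) on one core.
comm_aware = {"parse": 0, "filter": 0, "decode": 1, "render": 1}

for name, mapping in [("balanced", balanced), ("comm_aware", comm_aware)]:
    cut, imbalance = evaluate(mapping, compute, traffic)
    print(f"{name}: cut traffic = {cut}, imbalance = {imbalance:.2f}")
```

In this toy graph the load-balanced mapping sends more data across cores than the communication-aware one, while the communication-aware mapping leaves one core with twice the work of the other. A profile that reports only compute load cannot reveal that difference.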
21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2013)
2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
27 February 2013 to 1 March 2013
- communication profiling
- dynamic data flow graph
- load balance