

ask different folks and observe different takes.

I've been doing a lot of experiments with OpenCL in the last two months or so. More specifically, I've been using the NOpenCL library (created by Tunnel Vision Labs) to perform OpenCL tasks in C# applications, on a low-end laptop (Intel i7-4510U CPU / Intel HD Graphics 4400 + AMD Radeon R7 M260).

Being an application developer, most of my work won't fit a SIMD model. However, the performance gains of using the 4400 GPU instead of the CPU (even when using kernels with several branching points) are so significant that the issue becomes irrelevant.

Unfortunately, all the OpenCL-related documentation I've read so far is quite vague about the relationship between its concepts and the specific hardware, and I couldn't find a single piece of information about how Intel chose to implement OpenCL in its GPU line(s). As such, I must say I'm completely blind when I'm preparing the command queues. By establishing a local work size of 1, am I in some way forcing the use of a single thread inside a compute unit? How do compute units fit in the picture? Are work-groups in some way related to the 4400's 20 pipelines?

For pure SIMD problems, I'd say this type of question is probably moot, as long as one follows some general rule about the division of the task size. In any other case, it would perhaps be important to be aware of the penalties involved and the best strategies to minimize them. And to do that, it would be important to understand some OpenCL implementation details on specific chips or architectures. So, if someone could share one or more links to relevant documentation on these issues, I'd be very grateful.
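
To make the local-work-size question concrete, here is a rough toy model in Python. It is not a description of how Intel actually schedules work. It assumes that the work-items of one work-group are packed into hardware threads of `simd_width` lanes each, and that a hardware thread is never shared between work-groups. The function names and the SIMD width of 8 are hypothetical, chosen only for illustration; the first helper shows the usual "round the task size up to a multiple of the local size" rule.

```python
def padded_global_size(n_items: int, local_size: int) -> int:
    """Round n_items up to the next multiple of local_size.

    OpenCL 1.x requires the global work size to be a multiple of the
    local work size, so the kernel has to skip the padding work-items
    itself (e.g. with an `if (gid < n_items)` guard).
    """
    if n_items <= 0 or local_size <= 0:
        raise ValueError("sizes must be positive")
    return (n_items + local_size - 1) // local_size * local_size


def lane_utilization(global_size: int, local_size: int, simd_width: int) -> float:
    """Fraction of reserved SIMD lanes that carry a real work-item.

    ASSUMPTION (toy model): each work-group occupies whole hardware
    threads of `simd_width` lanes, and lanes left over in a thread
    cannot be used by another work-group.
    """
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    groups = global_size // local_size
    # Hardware threads needed per work-group, rounded up.
    threads_per_group = (local_size + simd_width - 1) // simd_width
    lanes_reserved = groups * threads_per_group * simd_width
    return global_size / lanes_reserved


if __name__ == "__main__":
    n = padded_global_size(1000, 64)  # 1000 items padded for local size 64
    for local in (1, 8, 64):
        # simd_width=8 is an assumed value, not a measured one.
        print(local, lane_utilization(n, local, simd_width=8))
```

Under these assumptions, 1024 work-items with a local size of 1 give a utilization of 0.125, while local sizes of 8 or 64 give 1.0. On a real device the relevant width is better queried than guessed: `clGetKernelWorkGroupInfo` with `CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE` reports a suggested multiple for a given kernel, `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS` reports the number of compute units, and passing `NULL` as the local work size to `clEnqueueNDRangeKernel` lets the implementation pick one.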
