OpenCL -> Vulkan: A Porting Guide (#1)
Vulkan is the newest kid on the block when it comes to cross-platform, widely supported GPGPU compute. Vulkan’s primacy as the high performance rendering API powering the latest versions of Android, coupled with Windows and Linux desktop drivers from all major vendors, means that we have a good way to run compute workloads on a wide range of devices.
OpenCL is the venerable old boy of GPGPU these days - it has been around since 2009. A huge variety of software projects have made use of OpenCL as their way to run compute workloads, enabling them to speed up their applications.
Given Vulkan’s rising prominence, how does one port from OpenCL to Vulkan?
This is part 1 of my guide for how things map between the APIs!
cl_platform_id -> VkInstance⌗
In OpenCL, the first thing you do is get the platform identifiers (using clGetPlatformIDs).
Each cl_platform_id is a handle into an individual vendor’s OpenCL driver - if you had an AMD and an NVIDIA implementation of OpenCL on your system, you’d get two cl_platform_id’s returned.
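A minimal sketch of that two-step query looks something like this (error checking omitted for brevity):

```c
#include <stdlib.h>

#include <CL/cl.h>

// First ask how many platforms are present, then fetch them all.
cl_uint numPlatforms = 0;
clGetPlatformIDs(0, NULL, &numPlatforms);

cl_platform_id* platforms = malloc(numPlatforms * sizeof(cl_platform_id));
clGetPlatformIDs(numPlatforms, platforms, NULL);
```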
Vulkan is different here - instead of getting one or more handles to individual vendors implementations, we instead create a single VkInstance (via vkCreateInstance).
This single instance allows us to access multiple vendor implementations of the Vulkan API through a single object.
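Creating that instance is a case of filling out a couple of structs (a minimal sketch - the application name here is just a placeholder, and error checking is again omitted):

```c
#include <vulkan/vulkan.h>

// Describe our application to the Vulkan loader.
VkApplicationInfo applicationInfo = {0};
applicationInfo.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
applicationInfo.pApplicationName = "opencl-to-vulkan-port"; // placeholder name
applicationInfo.apiVersion = VK_API_VERSION_1_0;

VkInstanceCreateInfo instanceCreateInfo = {0};
instanceCreateInfo.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
instanceCreateInfo.pApplicationInfo = &applicationInfo;

// One instance gives us access to every vendor's Vulkan implementation.
VkInstance instance = VK_NULL_HANDLE;
vkCreateInstance(&instanceCreateInfo, NULL, &instance);
```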
cl_device_id -> VkPhysicalDevice⌗
In OpenCL, we can query one or more cl_device_id’s from each cl_platform_id that we previously queried (via clGetDeviceIDs). When querying for a device, we can specify a cl_device_type - we can basically ask the driver for its default device (normally a GPU) or for a specific device type. We’ll use CL_DEVICE_TYPE_ALL, which instructs the driver to return all the devices it knows about so that we can choose from them.
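Gathering every device from every platform means looping twice - once to count, once to fill (a sketch building on the platforms array from before):

```c
// Count the devices across all platforms.
cl_uint numDevicesTotal = 0;
for (cl_uint i = 0; i < numPlatforms; i++) {
  cl_uint numDevices = 0;
  clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices);
  numDevicesTotal += numDevices;
}

cl_device_id* devices = malloc(numDevicesTotal * sizeof(cl_device_id));

// Now gather them all into one flat list.
cl_uint deviceIndex = 0;
for (cl_uint i = 0; i < numPlatforms; i++) {
  cl_uint numDevices = 0;
  clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices);
  clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, numDevices,
                 devices + deviceIndex, NULL);
  deviceIndex += numDevices;
}
```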
The code above is a bit of a mouthful - but it is the easiest way to get every device that the system knows about.
In contrast, since Vulkan gave us a single VkInstance, we query that single instance for all of the VkPhysicalDevice’s it knows about (via vkEnumeratePhysicalDevices). A Vulkan physical device is a link to the actual hardware that the Vulkan code is going to execute on.
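On the Vulkan side, reusing the instance from earlier, the sketch is rather shorter:

```c
// Ask how many physical devices the instance knows about.
uint32_t physicalDeviceCount = 0;
vkEnumeratePhysicalDevices(instance, &physicalDeviceCount, NULL);

// The same count parameter now tells the driver how many handles to fill out.
VkPhysicalDevice* physicalDevices =
    malloc(physicalDeviceCount * sizeof(VkPhysicalDevice));
vkEnumeratePhysicalDevices(instance, &physicalDeviceCount, physicalDevices);
```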
A prominent API design fork can be seen between vkEnumeratePhysicalDevices and clGetDeviceIDs - Vulkan reuses the integer count parameter to the function (the parameter that lets you query the number of physical devices present) to also tell the driver how many physical devices we want filled out. In contrast, OpenCL uses an extra parameter for this. These patterns are repeated throughout both APIs.
cl_context -> VkDevice⌗
Here is where it gets trickier between the APIs. OpenCL has a notion of a context - you can think of this object as your way, as the user, to view and interact with what the system is doing. OpenCL allows multiple devices that belong to a single platform to be shared within a context. In contrast, Vulkan is fixed to having a single physical device per its ‘context’, which Vulkan calls a VkDevice.
To make the porting easier, and because in all honesty I’ve yet to see any real use-case or benefit from having multiple OpenCL devices in a single context, we’ll make our OpenCL code create its cl_context using a single cl_device_id (via clCreateContext).
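Taking the first device from our earlier list as an illustration, context creation is a one-liner:

```c
// Note where the error code ends up: not in the return value, but in an
// optional pointer parameter at the very end of the signature.
cl_int errorCode = CL_SUCCESS;
cl_context context =
    clCreateContext(NULL, 1, &devices[0], NULL, NULL, &errorCode);
```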
The above highlights the single biggest travesty in the OpenCL API - the error code has changed from being something returned from the API call to an optional pointer parameter at the end of the signature. In API design, I’d say avoiding this is rule #1 of how not to mess up an API (if you’re interested, two great API talks on this are Designing and Evaluating Reusable Components by Casey Muratori and Hourglass Interfaces for C++ APIs by Stefanus Du Toit).
For Vulkan, when creating our VkDevice object, we specifically enable the features we want to use from the device upfront. The easy way to do this is to first call vkGetPhysicalDeviceFeatures, and then pass the result of this into our create device call, enabling all features that the device supports.
When creating our VkDevice, we need to explicitly request which queues we want to use. OpenCL has no real analogous concept to this - the naive approach is to compare VkQueue’s against cl_command_queue’s, but I’ll show in a later post that this is a wrong conflation. Suffice to say, for our purposes we’ll query for the queues that support compute functionality, as that is roughly what OpenCL is doing behind the scenes in the cl_context.
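Putting that together, using the first physical device from the earlier enumeration (this sketch just grabs the first compute-capable queue family and a single queue from it; a fuller port would request every compute-capable queue):

```c
VkPhysicalDevice physicalDevice = physicalDevices[0];

// Enable every feature the physical device supports.
VkPhysicalDeviceFeatures features;
vkGetPhysicalDeviceFeatures(physicalDevice, &features);

// Find a queue family that supports compute.
uint32_t queueFamilyCount = 0;
vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &queueFamilyCount, NULL);

VkQueueFamilyProperties* queueFamilies =
    malloc(queueFamilyCount * sizeof(VkQueueFamilyProperties));
vkGetPhysicalDeviceQueueFamilyProperties(physicalDevice, &queueFamilyCount,
                                         queueFamilies);

uint32_t computeQueueFamilyIndex = 0;
for (uint32_t i = 0; i < queueFamilyCount; i++) {
  if (queueFamilies[i].queueFlags & VK_QUEUE_COMPUTE_BIT) {
    computeQueueFamilyIndex = i;
    break;
  }
}

// Request a single queue from that family.
const float queuePriority = 1.0f;
VkDeviceQueueCreateInfo queueCreateInfo = {0};
queueCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueCreateInfo.queueFamilyIndex = computeQueueFamilyIndex;
queueCreateInfo.queueCount = 1;
queueCreateInfo.pQueuePriorities = &queuePriority;

// Create the device, enabling all the features we queried above.
VkDeviceCreateInfo deviceCreateInfo = {0};
deviceCreateInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
deviceCreateInfo.queueCreateInfoCount = 1;
deviceCreateInfo.pQueueCreateInfos = &queueCreateInfo;
deviceCreateInfo.pEnabledFeatures = &features;

VkDevice device = VK_NULL_HANDLE;
vkCreateDevice(physicalDevice, &deviceCreateInfo, NULL, &device);
```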
Vulkan’s almost legendary verbosity strikes here - we’re having to write a lot more code than the equivalent in OpenCL to get an almost analogous handle. The plus side is that the Vulkan driver can do a lot more of its allocations upfront, because a much higher proportion of its state is known at creation time. That is the fundamental approach of Vulkan: we trade upfront verbosity for a more efficient application overall.
Ok - so we’ve now got both APIs to the point where we can think about actually using the plethora of hardware available to us! Stay tuned for the next in the series, where I’ll cover porting from OpenCL’s cl_command_queue to Vulkan’s VkQueue.