Interface KernelInfo

All Known Implementing Classes:
ImmutableKernelInfo

@Immutable public interface KernelInfo
Represents kernel-specific execution characteristics and resource requirements for an OpenCL kernel on a specific device.

KernelInfo encapsulates the device-specific compilation and execution characteristics of an OpenCL kernel, providing the information needed to configure work groups and allocate resources in GPU-accelerated evolutionary algorithms. The values become available once the kernel has been compiled for a device and can differ from one device to another.

Key kernel characteristics include:

  • Work group constraints: Maximum and preferred work group sizes for efficient execution
  • Memory usage: Local memory requirements per work group and private memory requirements per work-item
  • Performance optimization: Preferred work group size multiples for optimal resource utilization
  • Resource validation: Constraints for validating kernel launch parameters

Kernel optimization considerations for evolutionary algorithms:

  • Work group sizing: Configure launch parameters within device-specific limits
  • Memory allocation: Ensure sufficient local memory for parallel fitness evaluation
  • Performance tuning: Align work group sizes with preferred multiples
  • Resource planning: Account for per-work-item memory requirements

Common usage patterns for kernel configuration:


 // Query kernel information after compilation
 KernelInfo kernelInfo = kernelInfoReader.read(deviceId, kernel, "fitness_evaluation");
 
 // Stay within both the kernel-specific and the device-wide work group limits
 long maxWorkGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
 
 // Round down to the preferred multiple; keep the limit itself if it is smaller than one multiple
 long preferredMultiple = kernelInfo.preferredWorkGroupSizeMultiple();
 long optimalWorkGroupSize = maxWorkGroupSize >= preferredMultiple
     ? (maxWorkGroupSize / preferredMultiple) * preferredMultiple
     : maxWorkGroupSize;
 
 // Pad the global size to a multiple of the work group size, as OpenCL 1.x requires;
 // the kernel must ignore work-items whose global id is >= populationSize
 long populationSize = 1000;
 long globalWorkSize = ((populationSize + optimalWorkGroupSize - 1) / optimalWorkGroupSize) * optimalWorkGroupSize;
 
 // Estimate memory use: local memory is reported per work group, private memory per work-item
 long localMemPerGroup = kernelInfo.localMemSize();
 long totalPrivateMem = kernelInfo.privateMemSize() * globalWorkSize;
 
 // Configure kernel execution with the computed parameters
 clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
     new long[]{globalWorkSize}, new long[]{optimalWorkGroupSize}, 0, null, null);
 

Performance optimization workflow:

  1. Kernel compilation: Compile kernel for target device
  2. Information query: Read kernel-specific execution characteristics
  3. Work group optimization: Calculate optimal work group size based on preferences
  4. Memory validation: Ensure memory requirements fit within device limits
  5. Launch configuration: Configure kernel execution with optimized parameters
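
Steps 3 to 5 of this workflow can be sketched with plain arithmetic. The following is a minimal illustration, not part of the documented API: `LaunchPlanner` and `LaunchPlan` are hypothetical names, and the inputs are assumed to come from `workGroupSize()`, `preferredWorkGroupSizeMultiple()`, and a device-level maximum work group size query.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class LaunchPlanner {

    /** Planned launch sizes: work group (local) size and padded global size. */
    static final class LaunchPlan {
        final long localSize;
        final long globalSize;

        LaunchPlan(long localSize, long globalSize) {
            this.localSize = localSize;
            this.globalSize = globalSize;
        }
    }

    static LaunchPlan plan(long kernelMaxWorkGroupSize, long deviceMaxWorkGroupSize,
                           long preferredMultiple, long workItems) {
        if (kernelMaxWorkGroupSize <= 0 || deviceMaxWorkGroupSize <= 0
                || preferredMultiple <= 0 || workItems <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Step 3: respect both the kernel-specific and the device-wide limit.
        long limit = Math.min(kernelMaxWorkGroupSize, deviceMaxWorkGroupSize);
        // Round down to the preferred multiple; if the limit is smaller than one
        // multiple, use the limit itself so the size never becomes zero.
        long localSize = limit >= preferredMultiple
                ? (limit / preferredMultiple) * preferredMultiple
                : limit;
        // Step 5: pad the global size up to a whole number of work groups.
        long globalSize = ((workItems + localSize - 1) / localSize) * localSize;
        return new LaunchPlan(localSize, globalSize);
    }
}
```

With a padded global size, the kernel itself has to skip work-items whose global id is not below the real item count. Step 4 (memory validation) is left out of this sketch on purpose.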

Memory management considerations:

  • Local memory: Shared among work-items in the same work group
  • Private memory: Individual memory per work-item
  • Total allocation: Local memory counted once per work group plus private memory for each of its work-items
  • Device limits: Validate against device memory constraints
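
The distinction between the two memory kinds matters when estimating usage: the local figure already covers a whole work group, while the private figure has to be scaled by the number of work-items. A minimal sketch, assuming the inputs come from `localMemSize()`, `privateMemSize()`, and a device query such as OpenCL's `CL_DEVICE_LOCAL_MEM_SIZE`; `MemoryEstimate` is a hypothetical name:

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class MemoryEstimate {

    /** Local memory is reported per work group, so it is compared to the device limit as is. */
    static boolean localFits(long kernelLocalMemSize, long deviceLocalMemSize) {
        return kernelLocalMemSize <= deviceLocalMemSize;
    }

    /**
     * Lower-bound estimate of the private memory used by one work group.
     * The per-work-item value is a minimum, so the real footprint may be larger.
     */
    static long privateBytesPerGroup(long privateMemSizePerItem, long workGroupSize) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(privateMemSizePerItem, workGroupSize);
    }
}
```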

Error handling and validation:

  • Work group limits: Ensure launch parameters don't exceed kernel limits
  • Memory constraints: Validate total memory usage against device capabilities
  • Performance degradation: Monitor for suboptimal work group configurations
  • Resource conflicts: Handle multiple kernels competing for device resources
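
The first two checks in this list can be run before a launch is enqueued. The sketch below is one possible shape for such a check and is not part of the documented API: `LaunchValidator` is a hypothetical name, and the divisibility rule assumes OpenCL 1.x style uniform work groups.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class LaunchValidator {

    /** Returns a description of every violated constraint; an empty list means the launch looks valid. */
    static java.util.List<String> check(long localSize, long globalSize,
                                        long kernelMaxWorkGroupSize,
                                        long kernelLocalMemSize, long deviceLocalMemSize) {
        java.util.List<String> problems = new java.util.ArrayList<>();
        // Work group limit: the kernel-specific maximum may be below the device maximum.
        if (localSize <= 0 || localSize > kernelMaxWorkGroupSize) {
            problems.add("work group size " + localSize
                    + " is outside 1.." + kernelMaxWorkGroupSize);
        }
        // Uniform work groups: the global size must be a whole number of work groups.
        if (localSize > 0 && globalSize % localSize != 0) {
            problems.add("global size " + globalSize
                    + " is not a multiple of work group size " + localSize);
        }
        // Memory constraint: local memory is per work group and must fit on the device.
        if (kernelLocalMemSize > deviceLocalMemSize) {
            problems.add("kernel needs " + kernelLocalMemSize
                    + " bytes of local memory, device offers " + deviceLocalMemSize);
        }
        return problems;
    }
}
```

Collecting all problems in a list, rather than throwing on the first one, makes it easier to report a misconfigured launch in a single message.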
  • Method Summary

    Modifier and Type                     Method                              Description
    static ImmutableKernelInfo.Builder    builder()                           Creates a new builder for constructing KernelInfo instances.
    long                                  localMemSize()                      Returns the amount of local memory in bytes used by this kernel.
    String                                name()                              Returns the name of the kernel function.
    long                                  preferredWorkGroupSizeMultiple()    Returns the preferred work group size multiple for optimal kernel execution performance.
    long                                  privateMemSize()                    Returns the minimum amount of private memory in bytes used by each work-item.
    long                                  workGroupSize()                     Returns the maximum work group size that can be used when executing this kernel on the device.
  • Method Details

    • name

      String name()
      Returns the name of the kernel function.
      Returns:
      the kernel function name as specified in the OpenCL program
    • workGroupSize

      long workGroupSize()
      Returns the maximum work group size that can be used when executing this kernel on the device.

      This value represents the maximum number of work-items that can be in a work group when executing this specific kernel on the target device. It may be smaller than the device's general maximum work group size due to kernel-specific resource requirements.

      Returns:
      the maximum work group size for this kernel
    • preferredWorkGroupSizeMultiple

      long preferredWorkGroupSizeMultiple()
      Returns the preferred work group size multiple for optimal kernel execution performance.

      For best performance, the work group size should be a multiple of this value. On GPUs it typically corresponds to the warp or wavefront size of the device, so work group sizes aligned to it avoid partially filled execution units and tend to use device resources more efficiently.

      Returns:
      the preferred work group size multiple for performance optimization
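
When tuning, it can help to list every work group size that is both aligned to this multiple and within the kernel's limit, then benchmark each. A minimal sketch of that enumeration; `WorkGroupCandidates` is a hypothetical name, and the two arguments are assumed to come from `workGroupSize()` and `preferredWorkGroupSizeMultiple()`:

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class WorkGroupCandidates {

    /** Returns all multiples of preferredMultiple up to maxWorkGroupSize, in ascending order. */
    static long[] multiplesUpTo(long maxWorkGroupSize, long preferredMultiple) {
        if (maxWorkGroupSize <= 0 || preferredMultiple <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        int count = (int) (maxWorkGroupSize / preferredMultiple);
        if (count == 0) {
            // The limit is below one multiple: the limit itself is the only candidate.
            return new long[] { maxWorkGroupSize };
        }
        long[] sizes = new long[count];
        for (int i = 0; i < count; i++) {
            sizes[i] = (i + 1) * preferredMultiple;
        }
        return sizes;
    }
}
```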
    • localMemSize

      long localMemSize()
      Returns the amount of local memory in bytes used by this kernel.

      Local memory is shared among all work-items in a work group and includes both statically allocated local variables and dynamically allocated local memory passed as kernel arguments. This value is used to validate that the total local memory usage doesn't exceed the device's local memory capacity.

      Returns:
      the local memory usage in bytes per work group
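
One use of this value is to work out how much local memory is left for further buffers in a work group. The sketch below assumes the device capacity is obtained separately (for example from OpenCL's `CL_DEVICE_LOCAL_MEM_SIZE`); `LocalMemoryBudget` is a hypothetical name and not part of the documented API.

```java
/** Hypothetical helper for illustration; not part of the documented API. */
final class LocalMemoryBudget {

    /** Bytes of local memory still unused by the kernel in one work group; never negative. */
    static long headroomBytes(long deviceLocalMemSize, long kernelLocalMemSize) {
        return Math.max(0L, deviceLocalMemSize - kernelLocalMemSize);
    }

    /** Number of 32-bit floats that fit into the given number of bytes. */
    static long maxFloats(long bytes) {
        return bytes / Float.BYTES;
    }
}
```

Because the reported usage includes local buffers whose size has already been set as kernel arguments, the headroom is only meaningful for buffers that have not been counted yet.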
    • privateMemSize

      long privateMemSize()
      Returns the minimum amount of private memory in bytes used by each work-item.

      Private memory is individual to each work-item and includes local variables, function call stacks, and other per-work-item data. This value helps estimate the total memory footprint when launching kernels with large work group sizes.

      Returns:
      the private memory usage in bytes per work-item
    • builder

      static ImmutableKernelInfo.Builder builder()
      Creates a new builder for constructing KernelInfo instances.
      Returns:
      a new builder for creating kernel information objects