package net.bmahe.genetics4j.gpu.opencl.model;

import org.immutables.value.Value;

/**
 * Represents kernel-specific execution characteristics and resource requirements for an OpenCL kernel on a specific
 * device.
 *
 * <p>KernelInfo encapsulates the device-specific compilation and execution characteristics of an OpenCL kernel,
 * providing the information needed to choose work group configurations and allocate resources in GPU-accelerated
 * evolutionary algorithms. These values are determined at kernel compilation time and vary by device.
 *
 * <p>Key kernel characteristics include:
 * <ul>
 * <li><strong>Work group constraints</strong>: Maximum and preferred work group sizes for efficient execution</li>
 * <li><strong>Memory usage</strong>: Local memory required per work group and private memory required per
 * work-item</li>
 * <li><strong>Performance optimization</strong>: Preferred work group size multiples for optimal resource
 * utilization</li>
 * <li><strong>Resource validation</strong>: Constraints for validating kernel launch parameters</li>
 * </ul>
 *
 * <p>Kernel optimization considerations for evolutionary algorithms:
 * <ul>
 * <li><strong>Work group sizing</strong>: Configure launch parameters within device-specific limits</li>
 * <li><strong>Memory allocation</strong>: Ensure sufficient local memory for parallel fitness evaluation</li>
 * <li><strong>Performance tuning</strong>: Align work group sizes with preferred multiples</li>
 * <li><strong>Resource planning</strong>: Account for per-work-item memory requirements</li>
 * </ul>
 *
 * <p>Common usage patterns for kernel configuration:
 *
 * <pre>{@code
 * // Query kernel information after compilation
 * KernelInfo kernelInfo = kernelInfoReader.read(deviceId, kernel, "fitness_evaluation");
 *
 * // Configure work group size within device limits
 * long maxWorkGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
 *
 * // Optimize for the preferred work group size multiple
 * long preferredMultiple = kernelInfo.preferredWorkGroupSizeMultiple();
 * long optimalWorkGroupSize = (maxWorkGroupSize / preferredMultiple) * preferredMultiple;
 *
 * // Validate memory requirements: local memory is consumed per work group,
 * // private memory per work-item
 * long populationSize = 1000;
 * long localMemPerWorkGroup = kernelInfo.localMemSize();
 * long totalPrivateMem = kernelInfo.privateMemSize() * populationSize;
 *
 * // Round the global size up to a multiple of the work group size (required by
 * // OpenCL prior to 2.0), then launch with the validated parameters
 * long globalSize = ((populationSize + optimalWorkGroupSize - 1) / optimalWorkGroupSize) * optimalWorkGroupSize;
 * clEnqueueNDRangeKernel(commandQueue,
 * 		kernel,
 * 		1,
 * 		null,
 * 		new long[] { globalSize },
 * 		new long[] { optimalWorkGroupSize },
 * 		0,
 * 		null,
 * 		null);
 * }</pre>
 *
 * <p>Performance optimization workflow:
 * <ol>
 * <li><strong>Kernel compilation</strong>: Compile the kernel for the target device</li>
 * <li><strong>Information query</strong>: Read kernel-specific execution characteristics</li>
 * <li><strong>Work group optimization</strong>: Calculate the optimal work group size based on preferences</li>
 * <li><strong>Memory validation</strong>: Ensure memory requirements fit within device limits</li>
 * <li><strong>Launch configuration</strong>: Configure kernel execution with the optimized parameters</li>
 * </ol>
 *
 * <p>Memory management considerations:
 * <ul>
 * <li><strong>Local memory</strong>: Shared among work-items in the same work group</li>
 * <li><strong>Private memory</strong>: Individual memory per work-item</li>
 * <li><strong>Total allocation</strong>: Sum of all work-items' memory requirements</li>
 * <li><strong>Device limits</strong>: Validate against device memory constraints</li>
 * </ul>
 *
 * <p>Error handling and validation:
 * <ul>
 * <li><strong>Work group limits</strong>: Ensure launch parameters don't exceed kernel limits</li>
 * <li><strong>Memory constraints</strong>: Validate total memory usage against device capabilities</li>
 * <li><strong>Performance degradation</strong>: Monitor for suboptimal work group configurations</li>
 * <li><strong>Resource conflicts</strong>: Handle multiple kernels competing for device resources</li>
 * </ul>
 *
 * @see Device
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoReader
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoUtils
 */
@Value.Immutable
public interface KernelInfo {

	/**
	 * Returns the name of the kernel function.
	 *
	 * @return the kernel function name as specified in the OpenCL program
	 */
	String name();

	/**
	 * Returns the maximum work group size that can be used when executing this kernel on the device.
	 *
	 * <p>This value represents the maximum number of work-items that can be in a work group when executing this
	 * specific kernel on the target device. It may be smaller than the device's general maximum work group size due to
	 * kernel-specific resource requirements.
	 *
	 * @return the maximum work group size for this kernel
	 */
	long workGroupSize();

	/**
	 * Returns the preferred work group size multiple for optimal kernel execution performance.
	 *
	 * <p>For optimal performance, the work group size should be a multiple of this value. It typically corresponds to
	 * the native vector width or wavefront size of the device and helps achieve better resource utilization and memory
	 * coalescing.
	 *
	 * @return the preferred work group size multiple for performance optimization
	 */
	long preferredWorkGroupSizeMultiple();

	/**
	 * Returns the amount of local memory in bytes used by this kernel.
	 *
	 * <p>Local memory is shared among all work-items in a work group and includes both statically allocated local
	 * variables and dynamically allocated local memory passed as kernel arguments. This value is used to validate that
	 * the total local memory usage doesn't exceed the device's local memory capacity.
	 *
	 * @return the local memory usage in bytes per work group
	 */
	long localMemSize();

	/**
	 * Returns the minimum amount of private memory in bytes used by each work-item.
	 *
	 * <p>Private memory is individual to each work-item and includes local variables, function call stacks, and other
	 * per-work-item data. This value helps estimate the total memory footprint when launching kernels with large work
	 * group sizes.
	 *
	 * @return the private memory usage in bytes per work-item
	 */
	long privateMemSize();

	/**
	 * Creates a new builder for constructing KernelInfo instances.
	 *
	 * @return a new builder for creating kernel information objects
	 */
	static ImmutableKernelInfo.Builder builder() {
		return ImmutableKernelInfo.builder();
	}
}
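
// A hypothetical, self-contained sketch (not part of the genetics4j API): the
// work-group rounding logic described in the Javadoc above, extracted into a
// small package-private helper so the arithmetic can be tested in isolation.
// The class and method names below are illustrative assumptions.
class WorkGroupSizing {

	/**
	 * Rounds {@code maxWorkGroupSize} down to the nearest multiple of
	 * {@code preferredMultiple}. When the maximum is smaller than one multiple
	 * (or the multiple is not positive), the maximum itself is returned so the
	 * launch still stays within the kernel's limit.
	 */
	static long optimalWorkGroupSize(final long maxWorkGroupSize, final long preferredMultiple) {
		if (preferredMultiple <= 0 || maxWorkGroupSize < preferredMultiple) {
			return maxWorkGroupSize;
		}
		return (maxWorkGroupSize / preferredMultiple) * preferredMultiple;
	}
}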