package net.bmahe.genetics4j.gpu.opencl.model;

import org.immutables.value.Value;

/**
 * Represents kernel-specific execution characteristics and resource requirements for an OpenCL kernel on a specific
 * device.
 *
 * <p>KernelInfo encapsulates the device-specific compilation and execution characteristics of an OpenCL kernel,
 * providing essential information for optimal work group configuration and resource allocation in GPU-accelerated
 * evolutionary algorithms. This information is determined at kernel compilation time and varies by device.
 *
 * <p>Key kernel characteristics include:
 * <ul>
 * <li><strong>Work group constraints</strong>: Maximum and preferred work group sizes for efficient execution</li>
 * <li><strong>Memory usage</strong>: Local memory required per work group and private memory required per
 * work-item</li>
 * <li><strong>Performance optimization</strong>: Preferred work group size multiples for optimal resource
 * utilization</li>
 * <li><strong>Resource validation</strong>: Constraints for validating kernel launch parameters</li>
 * </ul>
 *
 * <p>Kernel optimization considerations for evolutionary algorithms:
 * <ul>
 * <li><strong>Work group sizing</strong>: Configure launch parameters within device-specific limits</li>
 * <li><strong>Memory allocation</strong>: Ensure sufficient local memory for parallel fitness evaluation</li>
 * <li><strong>Performance tuning</strong>: Align work group sizes with preferred multiples</li>
 * <li><strong>Resource planning</strong>: Account for per-work-item memory requirements</li>
 * </ul>
 *
 * <p>Common usage patterns for kernel configuration:
 *
 * <pre>{@code
 * // Query kernel information after compilation
 * KernelInfo kernelInfo = kernelInfoReader.read(deviceId, kernel, "fitness_evaluation");
 *
 * // Configure work group size within kernel and device limits
 * long maxWorkGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
 *
 * // Align the work group size with the preferred multiple, falling back when the limit is smaller
 * long preferredMultiple = kernelInfo.preferredWorkGroupSizeMultiple();
 * long optimalWorkGroupSize = (maxWorkGroupSize / preferredMultiple) * preferredMultiple;
 * if (optimalWorkGroupSize == 0) {
 *     optimalWorkGroupSize = maxWorkGroupSize;
 * }
 *
 * // Estimate memory requirements for the population size
 * long populationSize = 1000;
 * long localMemPerWorkGroup = kernelInfo.localMemSize();               // shared by each work group
 * long totalPrivateMem = kernelInfo.privateMemSize() * populationSize; // one allocation per work-item
 *
 * // Round the global size up to a multiple of the work group size
 * long globalSize = ((populationSize + optimalWorkGroupSize - 1) / optimalWorkGroupSize) * optimalWorkGroupSize;
 *
 * // Configure kernel execution with validated parameters
 * clEnqueueNDRangeKernel(commandQueue,
 *     kernel,
 *     1,
 *     null,
 *     new long[] { globalSize },
 *     new long[] { optimalWorkGroupSize },
 *     0,
 *     null,
 *     null);
 * }</pre>
 *
 * <p>Performance optimization workflow:
 * <ol>
 * <li><strong>Kernel compilation</strong>: Compile the kernel for the target device</li>
 * <li><strong>Information query</strong>: Read kernel-specific execution characteristics</li>
 * <li><strong>Work group optimization</strong>: Calculate the optimal work group size based on preferences</li>
 * <li><strong>Memory validation</strong>: Ensure memory requirements fit within device limits</li>
 * <li><strong>Launch configuration</strong>: Configure kernel execution with the optimized parameters</li>
 * </ol>
 *
 * <p>Memory management considerations:
 * <ul>
 * <li><strong>Local memory</strong>: Shared among work-items in the same work group</li>
 * <li><strong>Private memory</strong>: Individual memory per work-item</li>
 * <li><strong>Total allocation</strong>: Local memory per work group plus private memory summed over all
 * work-items</li>
 * <li><strong>Device limits</strong>: Validate against device memory constraints</li>
 * </ul>
 *
 * <p>Error handling and validation:
 * <ul>
 * <li><strong>Work group limits</strong>: Ensure launch parameters don't exceed kernel limits</li>
 * <li><strong>Memory constraints</strong>: Validate total memory usage against device capabilities</li>
 * <li><strong>Performance degradation</strong>: Monitor for suboptimal work group configurations</li>
 * <li><strong>Resource conflicts</strong>: Handle multiple kernels competing for device resources</li>
 * </ul>
 *
 * @see Device
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoReader
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoUtils
 */
@Value.Immutable
public interface KernelInfo {

	/**
	 * Returns the name of the kernel function.
	 *
	 * @return the kernel function name as specified in the OpenCL program
	 */
	String name();

	/**
	 * Returns the maximum work group size that can be used when executing this kernel on the device.
	 *
	 * <p>This value represents the maximum number of work-items that can be in a work group when executing this
	 * specific kernel on the target device. It may be smaller than the device's general maximum work group size due to
	 * kernel-specific resource requirements.
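	 *
	 * <p>As an illustrative sketch, a configured work group size can be clamped to this kernel-specific limit before
	 * launching (the {@code requestedLocalSize} value below is a placeholder, not part of this API):
	 *
	 * <pre>{@code
	 * long requestedLocalSize = 256; // desired work group size, e.g. from configuration
	 * long localSize = Math.min(requestedLocalSize, kernelInfo.workGroupSize());
	 * }</pre>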
	 *
	 * @return the maximum work group size for this kernel
	 */
	long workGroupSize();

	/**
	 * Returns the preferred work group size multiple for optimal kernel execution performance.
	 *
	 * <p>For optimal performance, the work group size should be a multiple of this value. It typically corresponds to
	 * the device's native SIMD width (e.g. warp or wavefront size) and helps achieve better resource utilization and
	 * memory coalescing.
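	 *
	 * <p>A minimal sizing sketch (illustrative values): round a candidate work group size down to the nearest multiple
	 * of this value, falling back to the candidate itself when it is smaller than the multiple:
	 *
	 * <pre>{@code
	 * long candidate = Math.min(kernelInfo.workGroupSize(), 256);
	 * long multiple = kernelInfo.preferredWorkGroupSizeMultiple();
	 * long aligned = (candidate / multiple) * multiple;
	 * long chosenWorkGroupSize = aligned > 0 ? aligned : candidate;
	 * }</pre>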
	 *
	 * @return the preferred work group size multiple for performance optimization
	 */
	long preferredWorkGroupSizeMultiple();

	/**
	 * Returns the amount of local memory in bytes used by this kernel.
	 *
	 * <p>Local memory is shared among all work-items in a work group and includes both statically allocated local
	 * variables and dynamically allocated local memory passed as kernel arguments. This value is used to validate that
	 * the total local memory usage does not exceed the device's local memory capacity.
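	 *
	 * <p>For example, this value can be checked against the device's local memory capacity; the
	 * {@code deviceLocalMemSize} variable below stands in for a {@code CL_DEVICE_LOCAL_MEM_SIZE} query and is not part
	 * of this API:
	 *
	 * <pre>{@code
	 * long deviceLocalMemSize = 49152; // e.g. 48 KB, as reported by the device
	 * boolean fitsInLocalMemory = kernelInfo.localMemSize() <= deviceLocalMemSize;
	 * }</pre>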
	 *
	 * @return the local memory usage in bytes per work group
	 */
	long localMemSize();

	/**
	 * Returns the minimum amount of private memory in bytes used by each work-item.
	 *
	 * <p>Private memory is individual to each work-item and includes local variables, function call stacks, and other
	 * per-work-item data. This value helps estimate the total memory footprint when launching kernels with large work
	 * group sizes.
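	 *
	 * <p>As a rough, illustrative estimate of the aggregate private memory footprint of a launch:
	 *
	 * <pre>{@code
	 * long globalWorkSize = 1000; // e.g. the population size
	 * long estimatedPrivateBytes = kernelInfo.privateMemSize() * globalWorkSize;
	 * }</pre>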
	 *
	 * @return the private memory usage in bytes per work-item
	 */
	long privateMemSize();

	/**
	 * Creates a new builder for constructing KernelInfo instances.
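	 *
	 * <p>Example usage (illustrative values; the generated Immutables builder exposes one initializer per attribute):
	 *
	 * <pre>{@code
	 * KernelInfo kernelInfo = KernelInfo.builder()
	 *     .name("fitness_evaluation")
	 *     .workGroupSize(256)
	 *     .preferredWorkGroupSizeMultiple(32)
	 *     .localMemSize(2048)
	 *     .privateMemSize(64)
	 *     .build();
	 * }</pre>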
	 *
	 * @return a new builder for creating kernel information objects
	 */
	static ImmutableKernelInfo.Builder builder() {
		return ImmutableKernelInfo.builder();
	}
}