package net.bmahe.genetics4j.gpu.opencl.model;

import org.immutables.value.Value;

/**
 * Represents kernel-specific execution characteristics and resource requirements for an OpenCL kernel on a specific
 * device.
 *
 * <p>KernelInfo encapsulates the device-specific compilation and execution characteristics of an OpenCL kernel,
 * providing essential information for optimal work group configuration and resource allocation in GPU-accelerated
 * evolutionary algorithms. This information is determined at kernel compilation time and varies by device.
 *
 * <p>Key kernel characteristics include:
 * <ul>
 * <li><strong>Work group constraints</strong>: Maximum and preferred work group sizes for efficient execution</li>
 * <li><strong>Memory usage</strong>: Local memory requirements per work group and private memory requirements per
 * work-item</li>
 * <li><strong>Performance optimization</strong>: Preferred work group size multiples for optimal resource
 * utilization</li>
 * <li><strong>Resource validation</strong>: Constraints for validating kernel launch parameters</li>
 * </ul>
 *
 * <p>Kernel optimization considerations for evolutionary algorithms:
 * <ul>
 * <li><strong>Work group sizing</strong>: Configure launch parameters within device-specific limits</li>
 * <li><strong>Memory allocation</strong>: Ensure sufficient local memory for parallel fitness evaluation</li>
 * <li><strong>Performance tuning</strong>: Align work group sizes with preferred multiples</li>
 * <li><strong>Resource planning</strong>: Account for per-work-item memory requirements</li>
 * </ul>
 *
 * <p>Common usage patterns for kernel configuration:
 *
 * <pre>{@code
 * // Query kernel information after compilation
 * KernelInfo kernelInfo = kernelInfoReader.read(deviceId, kernel, "fitness_evaluation");
 *
 * // Configure work group size within device limits
 * long maxWorkGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
 *
 * // Optimize for preferred work group size multiple
 * long preferredMultiple = kernelInfo.preferredWorkGroupSizeMultiple();
 * long optimalWorkGroupSize = (maxWorkGroupSize / preferredMultiple) * preferredMultiple;
 *
 * // Estimate memory requirements for the population size
 * long populationSize = 1000;
 * long localMemPerWorkGroup = kernelInfo.localMemSize();
 * long totalPrivateMem = kernelInfo.privateMemSize() * populationSize;
 *
 * // Round the global size up to a multiple of the local size, as required when an explicit local size is given
 * long globalSize = ((populationSize + optimalWorkGroupSize - 1) / optimalWorkGroupSize) * optimalWorkGroupSize;
 *
 * // Configure kernel execution with validated parameters
 * clEnqueueNDRangeKernel(commandQueue,
 *         kernel,
 *         1,
 *         null,
 *         new long[] { globalSize },
 *         new long[] { optimalWorkGroupSize },
 *         0,
 *         null,
 *         null);
 * }</pre>
 *
 * <p>Performance optimization workflow:
 * <ol>
 * <li><strong>Kernel compilation</strong>: Compile kernel for target device</li>
 * <li><strong>Information query</strong>: Read kernel-specific execution characteristics</li>
 * <li><strong>Work group optimization</strong>: Calculate optimal work group size based on preferences</li>
 * <li><strong>Memory validation</strong>: Ensure memory requirements fit within device limits</li>
 * <li><strong>Launch configuration</strong>: Configure kernel execution with optimized parameters</li>
 * </ol>
 *
 * <p>Memory management considerations (a validation sketch follows the list below):
 * <ul>
 * <li><strong>Local memory</strong>: Shared among work-items in the same work group</li>
 * <li><strong>Private memory</strong>: Individual memory per work-item</li>
 * <li><strong>Total allocation</strong>: Sum of all work-items' memory requirements</li>
 * <li><strong>Device limits</strong>: Validate against device memory constraints</li>
 * </ul>
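 *
 * <p>As an illustration, the following sketch checks the kernel's memory usage against device limits. It assumes the
 * {@link Device} model exposes {@code maxWorkGroupSize()} (used in the example above) and {@code localMemSize()}
 * accessors; adjust the names to the actual Device API:
 *
 * <pre>{@code
 * // Local memory is consumed per work group; private memory per work-item
 * long workGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
 * long localMemPerWorkGroup = kernelInfo.localMemSize();
 * long privateMemPerWorkGroup = kernelInfo.privateMemSize() * workGroupSize;
 *
 * // device.localMemSize() is assumed to report the device's local memory capacity
 * if (localMemPerWorkGroup > device.localMemSize()) {
 *     throw new IllegalStateException("Kernel requires more local memory than the device provides");
 * }
 * }</pre>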
 *
 * <p>Error handling and validation (see the launch-parameter sketch after this list):
 * <ul>
 * <li><strong>Work group limits</strong>: Ensure launch parameters don't exceed kernel limits</li>
 * <li><strong>Memory constraints</strong>: Validate total memory usage against device capabilities</li>
 * <li><strong>Performance degradation</strong>: Monitor for suboptimal work group configurations</li>
 * <li><strong>Resource conflicts</strong>: Handle multiple kernels competing for device resources</li>
 * </ul>
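 *
 * <p>A minimal launch-parameter validation sketch; the requested sizes are illustrative values, not part of this API:
 *
 * <pre>{@code
 * long requestedLocalSize = 128;
 * long requestedGlobalSize = 1000;
 *
 * if (requestedLocalSize > kernelInfo.workGroupSize()) {
 *     throw new IllegalArgumentException("Requested local work size exceeds the kernel's limit");
 * }
 * if (requestedGlobalSize % requestedLocalSize != 0) {
 *     throw new IllegalArgumentException("Global work size must be a multiple of the local work size");
 * }
 * }</pre>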
 *
 * @see Device
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoReader
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoUtils
 */
@Value.Immutable
public interface KernelInfo {

    /**
     * Returns the name of the kernel function.
     *
     * @return the kernel function name as specified in the OpenCL program
     */
    String name();

    /**
     * Returns the maximum work group size that can be used when executing this kernel on the device.
     *
     * <p>This value represents the maximum number of work-items that can be in a work group when executing this specific
     * kernel on the target device. It may be smaller than the device's general maximum work group size due to
     * kernel-specific resource requirements.
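     *
     * <p>For example, a requested launch size can be clamped to this limit (a sketch; {@code 256} is an arbitrary
     * illustrative request):
     *
     * <pre>{@code
     * long requested = 256;
     * long localWorkSize = Math.min(requested, kernelInfo.workGroupSize());
     * }</pre>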
     *
     * @return the maximum work group size for this kernel
     */
    long workGroupSize();

    /**
     * Returns the preferred work group size multiple for optimal kernel execution performance.
     *
     * <p>For optimal performance, the work group size should be a multiple of this value. It typically corresponds to
     * the device's native SIMD width (warp or wavefront size) and helps achieve better resource utilization and memory
     * coalescing.
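     *
     * <p>A rounding sketch: the candidate size is clamped to the kernel limit and then rounded down to the nearest
     * multiple ({@code 256} is an illustrative cap, not part of this API):
     *
     * <pre>{@code
     * long candidate = Math.min(kernelInfo.workGroupSize(), 256);
     * long multiple = kernelInfo.preferredWorkGroupSizeMultiple();
     * // assumes candidate >= multiple, which holds on typical devices
     * long rounded = (candidate / multiple) * multiple;
     * }</pre>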
     *
     * @return the preferred work group size multiple for performance optimization
     */
    long preferredWorkGroupSizeMultiple();

    /**
     * Returns the amount of local memory in bytes used by this kernel.
     *
     * <p>Local memory is shared among all work-items in a work group and includes both statically allocated local
     * variables and dynamically allocated local memory passed as kernel arguments. This value is used to validate that
     * the total local memory usage doesn't exceed the device's local memory capacity.
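     *
     * <p>For example, the budget left for dynamically allocated local buffers can be estimated as follows (a sketch;
     * it assumes the device's local memory capacity is exposed as {@code device.localMemSize()}):
     *
     * <pre>{@code
     * // remaining local memory after the kernel's static usage
     * long dynamicLocalBudget = device.localMemSize() - kernelInfo.localMemSize();
     * }</pre>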
     *
     * @return the local memory usage in bytes per work group
     */
    long localMemSize();

    /**
     * Returns the minimum amount of private memory in bytes used by each work-item.
     *
     * <p>Private memory is individual to each work-item and includes local variables, function call stacks, and other
     * per-work-item data. This value helps estimate the total memory footprint when launching kernels with large work
     * group sizes.
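     *
     * <p>For instance, the private memory footprint of a single work group can be estimated as below (a sketch;
     * {@code localWorkSize} stands for the chosen work group size):
     *
     * <pre>{@code
     * long privateMemPerWorkGroup = kernelInfo.privateMemSize() * localWorkSize;
     * }</pre>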
     *
     * @return the private memory usage in bytes per work-item
     */
    long privateMemSize();

    /**
     * Creates a new builder for constructing KernelInfo instances.
     *
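     * <p>A minimal construction sketch; the builder methods mirror the accessors and are generated by the Immutables
     * annotation processor, and the numeric values are purely illustrative:
     *
     * <pre>{@code
     * KernelInfo kernelInfo = KernelInfo.builder()
     *         .name("fitness_evaluation")
     *         .workGroupSize(256)
     *         .preferredWorkGroupSizeMultiple(32)
     *         .localMemSize(1024)
     *         .privateMemSize(64)
     *         .build();
     * }</pre>
     *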
     * @return a new builder for creating kernel information objects
     */
    static ImmutableKernelInfo.Builder builder() {
        return ImmutableKernelInfo.builder();
    }
}