package net.bmahe.genetics4j.gpu.opencl.model;

import org.immutables.value.Value;

/**
 * Represents kernel-specific execution characteristics and resource requirements for an OpenCL kernel on a specific device.
 *
 * <p>KernelInfo encapsulates the device-specific compilation and execution characteristics of an OpenCL kernel,
 * providing essential information for optimal work group configuration and resource allocation in GPU-accelerated
 * evolutionary algorithms. This information is determined at kernel compilation time and varies by device.
 *
 * <p>Key kernel characteristics include:
 * <ul>
 * <li><strong>Work group constraints</strong>: Maximum work group size this kernel can be launched with on the device</li>
 * <li><strong>Memory usage</strong>: Local memory requirements per work group and private memory requirements per work-item</li>
 * <li><strong>Performance optimization</strong>: Preferred work group size multiple for optimal resource utilization</li>
 * <li><strong>Resource validation</strong>: Constraints for validating kernel launch parameters</li>
 * </ul>
 *
 * <p>Kernel optimization considerations for evolutionary algorithms:
 * <ul>
 * <li><strong>Work group sizing</strong>: Configure launch parameters within device-specific limits</li>
 * <li><strong>Memory allocation</strong>: Ensure sufficient local memory for parallel fitness evaluation</li>
 * <li><strong>Performance tuning</strong>: Align work group sizes with preferred multiples</li>
 * <li><strong>Resource planning</strong>: Account for per-work-item memory requirements</li>
 * </ul>
 *
 * <p>Common usage patterns for kernel configuration:
 * <pre>{@code
 * // Query kernel information after compilation
 * KernelInfo kernelInfo = kernelInfoReader.read(deviceId, kernel, "fitness_evaluation");
 *
 * // Configure work group size within both the kernel and the device limits
 * long maxWorkGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
 *
 * // Round down to the preferred multiple, unless the limit is smaller than the multiple itself
 * long preferredMultiple = Math.max(1, kernelInfo.preferredWorkGroupSizeMultiple());
 * long optimalWorkGroupSize = maxWorkGroupSize >= preferredMultiple
 *         ? (maxWorkGroupSize / preferredMultiple) * preferredMultiple
 *         : maxWorkGroupSize;
 *
 * // OpenCL 1.x requires the global size to be a multiple of the local size, so round it up;
 * // the kernel must then skip work-items whose global id is >= populationSize
 * long populationSize = 1000;
 * long globalWorkSize = ((populationSize + optimalWorkGroupSize - 1) / optimalWorkGroupSize)
 *         * optimalWorkGroupSize;
 *
 * // Estimate memory requirements: local memory is per work group, private memory is per work-item
 * long localMemPerWorkGroup = kernelInfo.localMemSize();
 * long privateMemPerWorkGroup = kernelInfo.privateMemSize() * optimalWorkGroupSize;
 *
 * // Configure kernel execution with validated parameters
 * clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
 *         new long[]{globalWorkSize}, new long[]{optimalWorkGroupSize}, 0, null, null);
 * }</pre>
 *
 * <p>Performance optimization workflow:
 * <ol>
 * <li><strong>Kernel compilation</strong>: Compile kernel for target device</li>
 * <li><strong>Information query</strong>: Read kernel-specific execution characteristics</li>
 * <li><strong>Work group optimization</strong>: Calculate the work group size from the kernel limit and the preferred multiple</li>
 * <li><strong>Memory validation</strong>: Ensure memory requirements fit within device limits</li>
 * <li><strong>Launch configuration</strong>: Configure kernel execution with optimized parameters</li>
 * </ol>
 *
 * <p>Memory management considerations:
 * <ul>
 * <li><strong>Local memory</strong>: Shared among work-items in the same work group and allocated once per work group</li>
 * <li><strong>Private memory</strong>: Individual memory per work-item</li>
 * <li><strong>Total allocation</strong>: Private memory scales with the number of work-items, local memory with the number of resident work groups</li>
 * <li><strong>Device limits</strong>: Validate against device memory constraints</li>
 * </ul>
 *
 * <p>Error handling and validation:
 * <ul>
 * <li><strong>Work group limits</strong>: Ensure launch parameters don't exceed kernel limits</li>
 * <li><strong>Memory constraints</strong>: Validate total memory usage against device capabilities</li>
 * <li><strong>Performance degradation</strong>: Monitor for suboptimal work group configurations</li>
 * <li><strong>Resource conflicts</strong>: Handle multiple kernels competing for device resources</li>
 * </ul>
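 *
 * <p>A minimal sketch of the first check, validating a requested local work size against this kernel's
 * limit. {@code validateLocalWorkSize} is a hypothetical helper used for illustration, not part of this library,
 * and a complete check would also consider the device's per-dimension limits:
 * <pre>{@code
 * // Hypothetical helper: fails fast when the kernel cannot be launched with this local work size
 * static void validateLocalWorkSize(KernelInfo kernelInfo, long localWorkSize) {
 *     long maxWorkGroupSize = kernelInfo.workGroupSize();
 *     if (localWorkSize < 1 || localWorkSize > maxWorkGroupSize) {
 *         throw new IllegalArgumentException("Local work size " + localWorkSize + " must be in [1, "
 *                 + maxWorkGroupSize + "] for kernel " + kernelInfo.name());
 *     }
 * }
 * }</pre>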
 *
 * @see Device
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoReader
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoUtils
 */
@Value.Immutable
public interface KernelInfo {

	/**
	 * Returns the name of the kernel function.
	 *
	 * @return the kernel function name as specified in the OpenCL program
	 */
	String name();

	/**
	 * Returns the maximum work group size that can be used when executing this kernel on the device.
	 *
	 * <p>This value represents the maximum number of work-items that can be in a work group when
	 * executing this specific kernel on the target device. It may be smaller than the device's
	 * general maximum work group size due to kernel-specific resource requirements.
	 *
	 * @return the maximum work group size for this kernel
	 */
	long workGroupSize();

	/**
	 * Returns the preferred work group size multiple for optimal kernel execution performance.
	 *
	 * <p>For optimal performance, the work group size should be a multiple of this value. It is a performance hint
	 * rather than a hard requirement and typically reflects the device's SIMD execution width, such as the warp
	 * size on NVIDIA GPUs or the wavefront size on AMD GPUs; respecting it helps achieve better resource
	 * utilization and memory coalescing.
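	 *
	 * <p>For example, with hypothetical values of 100 for {@link #workGroupSize()} and 32 for this multiple,
	 * rounding down to the nearest multiple gives {@code (100 / 32) * 32 = 96} work-items per group.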
	 *
	 * @return the preferred work group size multiple for performance optimization
	 */
	long preferredWorkGroupSizeMultiple();

	/**
	 * Returns the amount of local memory in bytes used by this kernel.
	 *
	 * <p>Local memory is shared among all work-items in a work group and includes both
	 * statically allocated local variables and dynamically allocated local memory passed
	 * as kernel arguments. This value is used to validate that the total local memory
	 * usage doesn't exceed the device's local memory capacity.
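	 *
	 * <p>A minimal check, assuming {@code deviceLocalMemSize} is a hypothetical variable holding the device's
	 * {@code CL_DEVICE_LOCAL_MEM_SIZE} value, queried separately from this object:
	 * <pre>{@code
	 * // Local memory is allocated per work group, so it is compared directly with the device capacity
	 * if (kernelInfo.localMemSize() > deviceLocalMemSize) {
	 *     throw new IllegalStateException("Kernel " + kernelInfo.name() + " needs " + kernelInfo.localMemSize()
	 *             + " bytes of local memory, but the device provides only " + deviceLocalMemSize);
	 * }
	 * }</pre>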
	 *
	 * @return the local memory usage in bytes per work group
	 */
	long localMemSize();

	/**
	 * Returns the minimum amount of private memory in bytes used by each work-item.
	 *
	 * <p>Private memory is individual to each work-item and includes function-scope variables in the private
	 * address space, function call stacks, and other per-work-item data. This value helps estimate
	 * the total memory footprint when launching kernels with large work group sizes.
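	 *
	 * <p>As a sketch, a lower bound on the private memory needed by one work group can be computed from this
	 * value; {@code localWorkSize} is a hypothetical variable holding the chosen work group size:
	 * <pre>{@code
	 * // privateMemSize() is a per-work-item minimum, so the product is a lower bound rather than an exact figure
	 * long privateMemPerWorkGroup = kernelInfo.privateMemSize() * localWorkSize;
	 * }</pre>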
	 *
	 * @return the private memory usage in bytes per work-item
	 */
	long privateMemSize();

	/**
	 * Creates a new builder for constructing KernelInfo instances.
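	 *
	 * <p>A sketch of building an instance by hand, for example in a unit test. The setter names and
	 * {@code build()} follow the usual Immutables builder conventions, and the values are placeholders rather than
	 * measurements from a real device:
	 * <pre>{@code
	 * KernelInfo kernelInfo = KernelInfo.builder()
	 *         .name("fitness_evaluation")
	 *         .workGroupSize(256)
	 *         .preferredWorkGroupSizeMultiple(32)
	 *         .localMemSize(0)
	 *         .privateMemSize(0)
	 *         .build();
	 * }</pre>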
	 *
	 * @return a new builder for creating kernel information objects
	 */
	static ImmutableKernelInfo.Builder builder() {
		return ImmutableKernelInfo.builder();
	}
}