package net.bmahe.genetics4j.gpu.opencl.model;

import org.immutables.value.Value;

/**
 * Represents kernel-specific execution characteristics and resource requirements for an OpenCL kernel on a specific
 * device.
 *
 * <p>KernelInfo encapsulates the device-specific compilation and execution characteristics of an OpenCL kernel,
 * providing the information needed to configure work groups and allocate resources in GPU-accelerated evolutionary
 * algorithms. These values are only available once the kernel has been built for a device, and they vary by device.
 *
 * <p>Key kernel characteristics include:
 * <ul>
 * <li><strong>Work group constraints</strong>: Maximum work group size the kernel can be launched with</li>
 * <li><strong>Memory usage</strong>: Local memory used per work group and private memory used per work-item</li>
 * <li><strong>Performance optimization</strong>: Preferred work group size multiple for optimal resource
 * utilization</li>
 * <li><strong>Resource validation</strong>: Constraints for validating kernel launch parameters</li>
 * </ul>
 *
 * <p>Kernel optimization considerations for evolutionary algorithms:
 * <ul>
 * <li><strong>Work group sizing</strong>: Configure launch parameters within device-specific limits</li>
 * <li><strong>Memory allocation</strong>: Ensure sufficient local memory for parallel fitness evaluation</li>
 * <li><strong>Performance tuning</strong>: Align work group sizes with preferred multiples</li>
 * <li><strong>Resource planning</strong>: Account for per-work-item memory requirements</li>
 * </ul>
 *
 * <p>Common usage patterns for kernel configuration:
 *
 * <pre>{@code
 * // Query kernel information after compilation
 * KernelInfo kernelInfo = kernelInfoReader.read(deviceId, kernel, "fitness_evaluation");
 *
 * // Configure work group size within device limits
 * long maxWorkGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
 *
 * // Round down to the preferred multiple; fall back to the limit itself when it is smaller than the multiple
 * long preferredMultiple = kernelInfo.preferredWorkGroupSizeMultiple();
 * long alignedWorkGroupSize = (maxWorkGroupSize / preferredMultiple) * preferredMultiple;
 * long optimalWorkGroupSize = alignedWorkGroupSize > 0 ? alignedWorkGroupSize : maxWorkGroupSize;
 *
 * // OpenCL 1.x requires the global size to be a multiple of the work group size, so round it up.
 * // The kernel must then ignore work-items whose global id is >= populationSize.
 * long populationSize = 1000;
 * long workGroupCount = (populationSize + optimalWorkGroupSize - 1) / optimalWorkGroupSize;
 * long globalWorkSize = workGroupCount * optimalWorkGroupSize;
 *
 * // Estimate memory requirements: local memory is already reported per work group,
 * // private memory is a per-work-item minimum
 * long localMemPerWorkGroup = kernelInfo.localMemSize();
 * long totalPrivateMem = kernelInfo.privateMemSize() * globalWorkSize;
 *
 * // Configure kernel execution with validated parameters
 * clEnqueueNDRangeKernel(commandQueue,
 *         kernel,
 *         1,
 *         null,
 *         new long[] { globalWorkSize },
 *         new long[] { optimalWorkGroupSize },
 *         0,
 *         null,
 *         null);
 * }</pre>
 *
 * <p>Performance optimization workflow:
 * <ol>
 * <li><strong>Kernel compilation</strong>: Compile kernel for target device</li>
 * <li><strong>Information query</strong>: Read kernel-specific execution characteristics</li>
 * <li><strong>Work group optimization</strong>: Calculate optimal work group size based on preferences</li>
 * <li><strong>Memory validation</strong>: Ensure memory requirements fit within device limits</li>
 * <li><strong>Launch configuration</strong>: Configure kernel execution with optimized parameters</li>
 * </ol>
 *
 * <p>Memory management considerations:
 * <ul>
 * <li><strong>Local memory</strong>: Shared among work-items in the same work group</li>
 * <li><strong>Private memory</strong>: Individual memory per work-item</li>
 * <li><strong>Total footprint</strong>: Private memory scales with the number of work-items, local memory with the
 * number of work groups</li>
 * <li><strong>Device limits</strong>: Validate against device memory constraints</li>
 * </ul>
 *
 * <p>Error handling and validation:
 * <ul>
 * <li><strong>Work group limits</strong>: Ensure launch parameters don't exceed kernel limits</li>
 * <li><strong>Memory constraints</strong>: Validate total memory usage against device capabilities</li>
 * <li><strong>Performance degradation</strong>: Monitor for suboptimal work group configurations</li>
 * <li><strong>Resource conflicts</strong>: Handle multiple kernels competing for device resources</li>
 * </ul>
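 *
 * <p>The work group and local memory checks could be sketched as follows. This is an illustration only: the
 * {@code isValidLaunch} helper and its {@code deviceLocalMemSize} parameter are hypothetical and not part of this
 * library, and the device's local memory capacity is assumed to have been queried separately.
 *
 * <pre>{@code
 * // Hypothetical helper: true when the requested work group size respects the kernel's limits
 * static boolean isValidLaunch(KernelInfo kernelInfo, long workGroupSize, long deviceLocalMemSize) {
 *     return workGroupSize > 0
 *             && workGroupSize <= kernelInfo.workGroupSize()
 *             && kernelInfo.localMemSize() <= deviceLocalMemSize;
 * }
 * }</pre>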
 *
 * @see Device
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoReader
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoUtils
 */
@Value.Immutable
public interface KernelInfo {

    /**
     * Returns the name of the kernel function.
     *
     * @return the kernel function name as specified in the OpenCL program
     */
    String name();

    /**
     * Returns the maximum work group size that can be used when executing this kernel on the device.
     *
     * <p>This value represents the maximum number of work-items that can be in a work group when executing this
     * specific kernel on the target device. It may be smaller than the device's general maximum work group size due
     * to kernel-specific resource requirements.
     *
     * @return the maximum work group size for this kernel
     */
    long workGroupSize();

    /**
     * Returns the preferred work group size multiple for optimal kernel execution performance.
     *
     * <p>For optimal performance, the work group size should be a multiple of this value. It is a performance hint
     * that typically corresponds to the device's warp or wavefront size. Work group sizes that are not a multiple of
     * it remain valid but may leave execution resources underutilized.
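     *
     * <p>As a worked sketch with made-up numbers (a limit of 1000 work-items and a preferred multiple of 64), the
     * largest aligned work group size is obtained by integer division:
     *
     * <pre>{@code
     * long limit = 1000;    // illustrative value for workGroupSize()
     * long multiple = 64;   // illustrative value for preferredWorkGroupSizeMultiple()
     * long aligned = (limit / multiple) * multiple; // 15 * 64 = 960
     * }</pre>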
     *
     * @return the preferred work group size multiple for performance optimization
     */
    long preferredWorkGroupSizeMultiple();

    /**
     * Returns the amount of local memory in bytes used by this kernel.
     *
     * <p>Local memory is shared among all work-items in a work group and includes both statically allocated local
     * variables and dynamically allocated local memory passed as kernel arguments. Dynamically sized arguments only
     * count once their size has been set on the kernel. Because the value is already expressed per work group, it
     * can be compared directly against the device's local memory capacity.
     *
     * @return the local memory usage in bytes per work group
     */
    long localMemSize();

    /**
     * Returns the minimum amount of private memory in bytes used by each work-item.
     *
     * <p>Private memory is individual to each work-item and includes variables declared inside the kernel as well as
     * other per-work-item data needed to execute it. Since this is a minimum, multiplying it by the number of
     * work-items gives a lower bound on the total private memory footprint of a launch.
     *
     * @return the minimum private memory usage in bytes per work-item
     */
    long privateMemSize();

    /**
     * Creates a new builder for constructing KernelInfo instances.
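     *
     * <p>A minimal usage sketch, assuming the default Immutables naming in which each builder setter is named after
     * its attribute. The values shown are made up for illustration:
     *
     * <pre>{@code
     * KernelInfo kernelInfo = KernelInfo.builder()
     *         .name("fitness_evaluation")
     *         .workGroupSize(256)
     *         .preferredWorkGroupSizeMultiple(32)
     *         .localMemSize(0)
     *         .privateMemSize(0)
     *         .build();
     * }</pre>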
     *
     * @return a new builder for creating kernel information objects
     */
    static ImmutableKernelInfo.Builder builder() {
        return ImmutableKernelInfo.builder();
    }
}