KernelInfo.java

1
package net.bmahe.genetics4j.gpu.opencl.model;
2
3
import org.immutables.value.Value;
4
5
/**
6
 * Represents kernel-specific execution characteristics and resource requirements for an OpenCL kernel on a specific device.
7
 * 
8
 * <p>KernelInfo encapsulates the device-specific compilation and execution characteristics of an OpenCL kernel,
9
 * providing essential information for optimal work group configuration and resource allocation in GPU-accelerated
10
 * evolutionary algorithms. This information is determined at kernel compilation time and varies by device.
11
 * 
12
 * <p>Key kernel characteristics include:
13
 * <ul>
14
 * <li><strong>Work group constraints</strong>: Maximum and preferred work group sizes for efficient execution</li>
15
 * <li><strong>Memory usage</strong>: Local memory required per work group and private memory required per work-item</li>
16
 * <li><strong>Performance optimization</strong>: Preferred work group size multiples for optimal resource utilization</li>
17
 * <li><strong>Resource validation</strong>: Constraints for validating kernel launch parameters</li>
18
 * </ul>
19
 * 
20
 * <p>Kernel optimization considerations for evolutionary algorithms:
21
 * <ul>
22
 * <li><strong>Work group sizing</strong>: Configure launch parameters within device-specific limits</li>
23
 * <li><strong>Memory allocation</strong>: Ensure sufficient local memory for parallel fitness evaluation</li>
24
 * <li><strong>Performance tuning</strong>: Align work group sizes with preferred multiples</li>
25
 * <li><strong>Resource planning</strong>: Account for per-work-item memory requirements</li>
26
 * </ul>
27
 * 
28
 * <p>Common usage patterns for kernel configuration:
29
 * <pre>{@code
30
 * // Query kernel information after compilation
31
 * KernelInfo kernelInfo = kernelInfoReader.read(deviceId, kernel, "fitness_evaluation");
32
 * 
33
 * // Configure work group size within device limits
34
 * long maxWorkGroupSize = Math.min(kernelInfo.workGroupSize(), device.maxWorkGroupSize());
35
 * 
36
 * // Optimize for preferred work group size multiple
37
 * long preferredMultiple = kernelInfo.preferredWorkGroupSizeMultiple();
38
 * long optimalWorkGroupSize = maxWorkGroupSize >= preferredMultiple ? (maxWorkGroupSize / preferredMultiple) * preferredMultiple : maxWorkGroupSize;
39
 * 
40
 * // Validate memory requirements for population size
41
 * long populationSize = 1000;
42
 * long totalLocalMem = kernelInfo.localMemSize(); // already reported per work group
43
 * long totalPrivateMem = kernelInfo.privateMemSize() * populationSize;
44
 * 
45
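 * // Launch with the global size rounded up to a multiple of the work group size (the kernel must bounds-check its index)
46
 * clEnqueueNDRangeKernel(commandQueue, kernel, 1, null,
47
 *     new long[]{((populationSize + optimalWorkGroupSize - 1) / optimalWorkGroupSize) * optimalWorkGroupSize}, new long[]{optimalWorkGroupSize}, 0, null, null);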
48
 * }</pre>
49
 * 
50
 * <p>Performance optimization workflow:
51
 * <ol>
52
 * <li><strong>Kernel compilation</strong>: Compile kernel for target device</li>
53
 * <li><strong>Information query</strong>: Read kernel-specific execution characteristics</li>
54
 * <li><strong>Work group optimization</strong>: Calculate optimal work group size based on preferences</li>
55
 * <li><strong>Memory validation</strong>: Ensure memory requirements fit within device limits</li>
56
 * <li><strong>Launch configuration</strong>: Configure kernel execution with optimized parameters</li>
57
 * </ol>
58
 * 
59
 * <p>Memory management considerations:
60
 * <ul>
61
 * <li><strong>Local memory</strong>: Shared among work-items in the same work group</li>
62
 * <li><strong>Private memory</strong>: Individual memory per work-item</li>
63
 * <li><strong>Total allocation</strong>: Private memory scales with the number of work-items; local memory scales with the number of work groups</li>
64
 * <li><strong>Device limits</strong>: Validate against device memory constraints</li>
65
 * </ul>
66
 * 
67
 * <p>Error handling and validation:
68
 * <ul>
69
 * <li><strong>Work group limits</strong>: Ensure launch parameters don't exceed kernel limits</li>
70
 * <li><strong>Memory constraints</strong>: Validate total memory usage against device capabilities</li>
71
 * <li><strong>Performance degradation</strong>: Monitor for suboptimal work group configurations</li>
72
 * <li><strong>Resource conflicts</strong>: Handle multiple kernels competing for device resources</li>
73
 * </ul>
74
 * 
75
 * @see Device
76
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoReader
77
 * @see net.bmahe.genetics4j.gpu.opencl.KernelInfoUtils
78
 */
79
@Value.Immutable
80
public interface KernelInfo {
81
82
	/**
83
	 * Returns the name of the kernel function.
84
	 * 
85
	 * @return the kernel function name as specified in the OpenCL program
86
	 */
87
	String name();
88
89
	/**
90
	 * Returns the maximum work group size that can be used when executing this kernel on the device.
91
	 * 
92
	 * <p>This value represents the maximum number of work-items that can be in a work group when
93
	 * executing this specific kernel on the target device. It may be smaller than the device's
94
	 * general maximum work group size due to kernel-specific resource requirements.
95
	 * 
96
	 * @return the maximum work group size for this kernel
97
	 */
98
	long workGroupSize();
99
100
	/**
101
	 * Returns the preferred work group size multiple for optimal kernel execution performance.
102
	 * 
103
	 * <p>For optimal performance, the work group size should be a multiple of this value.
104
	 * This typically corresponds to the warp or wavefront size of the device and helps
105
	 * achieve better resource utilization and memory coalescing.
106
	 * 
107
	 * @return the preferred work group size multiple for performance optimization
108
	 */
109
	long preferredWorkGroupSizeMultiple();
110
111
	/**
112
	 * Returns the amount of local memory in bytes used by this kernel.
113
	 * 
114
	 * <p>Local memory is shared among all work-items in a work group and includes both
115
	 * statically allocated local variables and dynamically allocated local memory passed
116
	 * as kernel arguments. This value is used to validate that the total local memory
117
	 * usage doesn't exceed the device's local memory capacity.
118
	 * 
119
	 * @return the local memory usage in bytes per work group
120
	 */
121
	long localMemSize();
122
123
	/**
124
	 * Returns the minimum amount of private memory in bytes used by each work-item.
125
	 * 
126
	 * <p>Private memory is individual to each work-item and includes local variables,
127
	 * function call stacks, and other per-work-item data. This value helps estimate
128
	 * the total memory footprint when launching kernels with large work group sizes.
129
	 * 
130
	 * @return the private memory usage in bytes per work-item
131
	 */
132
	long privateMemSize();
133
134
	/**
135
	 * Creates a new builder for constructing KernelInfo instances.
136
	 * 
137
	 * @return a new builder for creating kernel information objects
138
	 */
139
	static ImmutableKernelInfo.Builder builder() {
140
		return ImmutableKernelInfo.builder();
141
	}
142
}
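
The work group sizing steps described in the Javadoc above can be sketched as a small standalone helper. This is a minimal sketch under stated assumptions, not part of Genetics4j: the class name WorkGroupSizing and its three methods are hypothetical, and the inputs are assumed to come from KernelInfo.workGroupSize(), KernelInfo.preferredWorkGroupSizeMultiple(), KernelInfo.localMemSize() and the device's own limits. It assumes the runtime requires the global size to be a multiple of the local size, which holds for OpenCL 1.x.

```java
/** Hypothetical helper for illustration only; not part of Genetics4j. */
final class WorkGroupSizing {

	private WorkGroupSizing() {
	}

	/**
	 * Largest multiple of preferredMultiple that does not exceed maxWorkGroupSize.
	 * Falls back to maxWorkGroupSize when it is smaller than preferredMultiple,
	 * so the result is never zero.
	 */
	static long alignedLocalSize(final long maxWorkGroupSize, final long preferredMultiple) {
		if (maxWorkGroupSize < 1 || preferredMultiple < 1) {
			throw new IllegalArgumentException("Sizes must be strictly positive");
		}
		final long aligned = (maxWorkGroupSize / preferredMultiple) * preferredMultiple;
		return aligned > 0 ? aligned : maxWorkGroupSize;
	}

	/**
	 * Rounds the number of work-items up to the next multiple of localSize.
	 * The kernel is then expected to ignore indices >= workItems.
	 */
	static long paddedGlobalSize(final long workItems, final long localSize) {
		if (workItems < 1 || localSize < 1) {
			throw new IllegalArgumentException("Sizes must be strictly positive");
		}
		final long groups = (workItems + localSize - 1) / localSize;
		return groups * localSize;
	}

	/**
	 * Local memory is reported per work group, so it is compared directly
	 * against the device's local memory capacity without any multiplication.
	 */
	static boolean fitsLocalMemory(final long kernelLocalMemSize, final long deviceLocalMemSize) {
		return kernelLocalMemSize <= deviceLocalMemSize;
	}
}
```

With a kernel limit of 250 work-items and a preferred multiple of 64, alignedLocalSize returns 192, and a population of 1000 is padded to a global size of 1152 (six work groups of 192).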

Mutations

140

1.1
Location : builder
Killed by : none
replaced return value with null for net/bmahe/genetics4j/gpu/opencl/model/KernelInfo::builder → NO_COVERAGE

2.2
Location : builder
Killed by : none
removed call to net/bmahe/genetics4j/gpu/opencl/model/ImmutableKernelInfo::builder → NO_COVERAGE

Active mutators

Tests examined


Report generated by PIT 1.19.6