Class MultiStageFitness<T extends Comparable<T>>
- Type Parameters:
T - the fitness value type; must be Comparable so that optimization algorithms can compare fitness values
MultiStageFitness provides a framework for implementing fitness evaluation that requires multiple sequential GPU kernel executions, where each stage can use results from previous stages as input. This is ideal for complex fitness functions that require multiple computational phases, such as neural network training, multi-objective optimization, or hierarchical problem decomposition.
Key features:
- Sequential execution: Multiple OpenCL kernels executed in sequence
- Inter-stage data flow: Results from earlier stages used as inputs to later stages
- Memory optimization: Automatic cleanup and reuse of intermediate results
- Pipeline processing: Support for complex computational pipelines
- Stage configuration: Individual configuration for each computational stage
Multi-stage computation architecture:
- Stage descriptors: Each stage defines its kernel, data loaders, and result allocators
- Data reuse patterns: Previous stage results can be reused as arguments or size parameters
- Memory lifecycle: Automatic management of intermediate results between stages
- Static data sharing: Algorithm parameters shared across all stages
Typical usage pattern:
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Define multi-stage descriptor with sequential kernels
MultiStageDescriptor descriptor = MultiStageDescriptor.builder()
    .addStaticDataLoader("parameters", parametersLoader)
    .addStage(StageDescriptor.builder()
        .kernelName("preprocessing")
        .addDataLoader(0, inputDataLoader)
        .addResultAllocator(1, preprocessedResultAllocator)
        .build())
    .addStage(StageDescriptor.builder()
        .kernelName("fitness_evaluation")
        .reusePreviousResultAsArgument(1, 0) // Use previous result as input
        .addResultAllocator(1, fitnessResultAllocator)
        .build())
    .build();

// Define fitness extraction from final stage results
FitnessExtractor<Double> extractor = (context, kernelCtx, executor, generation, genotypes, results) -> {
    float[] fitnessValues = results.extractFloatArray(context, 1);
    // Arrays.stream has no float[] overload, so widen each value by index
    return IntStream.range(0, fitnessValues.length)
        .mapToObj(i -> Double.valueOf(fitnessValues[i]))
        .collect(Collectors.toList());
};

// Create multi-stage fitness evaluator
MultiStageFitness<Double> fitness = MultiStageFitness.of(descriptor, extractor);
Stage execution workflow:
- Initialization: Load shared static data once before all evaluations
- Stage iteration: For each stage in sequence:
  - Context computation: Calculate kernel execution parameters for the stage
  - Data preparation: Load stage-specific data and map previous results
  - Kernel execution: Execute the stage kernel with configured parameters
  - Result management: Store results for potential use in subsequent stages
- Final extraction: Extract fitness values from the last stage results
- Cleanup: Release all intermediate and final result memory
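The workflow above can be illustrated with a CPU-only sketch. It does not use the genetics4j API: `StagePipelineSketch`, `runStages`, `extract`, and `demo` are hypothetical names, buffers are plain `float[]` arrays keyed by argument index, and each "kernel" is an ordinary Java function. It only shows how static data and the previous stage's results might be combined before each stage runs, and how fitness values could be read from the last stage.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class StagePipelineSketch {

    // Runs the stages in order. Each stage receives the shared static buffers plus
    // the buffers produced by the stage before it, keyed by argument index.
    static Map<Integer, float[]> runStages(Map<Integer, float[]> staticData,
            Map<Integer, float[]> input, List<UnaryOperator<Map<Integer, float[]>>> stages) {
        Map<Integer, float[]> previous = input;
        for (UnaryOperator<Map<Integer, float[]>> stage : stages) {
            Map<Integer, float[]> arguments = new HashMap<>(staticData); // data preparation
            arguments.putAll(previous);                                   // map previous results
            previous = stage.apply(arguments);                            // simulated kernel execution
        }
        return previous; // results of the last stage
    }

    // Final extraction: widen the float results stored at the given index to Double.
    static List<Double> extract(Map<Integer, float[]> results, int index) {
        List<Double> fitness = new ArrayList<>();
        for (float value : results.get(index)) {
            fitness.add((double) value);
        }
        return fitness;
    }

    static List<Double> demo() {
        Map<Integer, float[]> staticData = Map.of(2, new float[] { 0.5f });
        Map<Integer, float[]> input = Map.of(0, new float[] { 1f, 2f, 3f });

        // Stage 1: square every input value, write the result at index 1
        UnaryOperator<Map<Integer, float[]>> preprocessing = arguments -> {
            float[] in = arguments.get(0);
            float[] out = new float[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = in[i] * in[i];
            }
            return Map.of(1, out);
        };
        // Stage 2: read the previous result at index 1 and add the static offset
        UnaryOperator<Map<Integer, float[]>> fitnessEvaluation = arguments -> {
            float[] in = arguments.get(1);
            float offset = arguments.get(2)[0];
            float[] out = new float[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = in[i] + offset;
            }
            return Map.of(1, out);
        };
        return extract(runStages(staticData, input, List.of(preprocessing, fitnessEvaluation)), 1);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this simplified model a stage sees only the static data and the immediately preceding results; the real descriptor may allow other mappings between result and argument indexes.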
Inter-stage data flow patterns:
- Result reuse: Use previous stage output buffers as input to subsequent stages
- Size propagation: Use previous stage result sizes as parameters for memory allocation
- Memory optimization: Automatic cleanup of intermediate results no longer needed
- Data type preservation: Maintain OpenCL data types across stage boundaries
Memory management strategy:
- Static data persistence: Shared parameters allocated once across all stages
- Intermediate cleanup: Automatic release of stage results when no longer needed
- Result chaining: Efficient memory reuse between consecutive stages
- Final cleanup: Complete memory cleanup after fitness extraction
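One way to picture this strategy is a loop that releases each intermediate result as soon as the following stage has produced its own output. The sketch below is an assumption about the ordering, not the library's implementation: `Buffer` is a hypothetical stand-in for device memory that only records allocate and release events in a log.

```java
import java.util.ArrayList;
import java.util.List;

class IntermediateCleanupSketch {

    // Hypothetical stand-in for a device buffer; it logs its allocation and release.
    static final class Buffer implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Buffer(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("allocate " + name);
        }

        @Override
        public void close() {
            log.add("release " + name);
        }
    }

    static List<String> run(int stageCount) {
        List<String> log = new ArrayList<>();
        Buffer staticData = new Buffer("static", log); // allocated once, shared by all stages
        Buffer previous = null;
        try {
            for (int stage = 0; stage < stageCount; stage++) {
                // The stage would read `previous` and write into `result`
                Buffer result = new Buffer("stage" + stage, log);
                if (previous != null) {
                    previous.close(); // intermediate cleanup: no later stage needs it
                }
                previous = result;
            }
            log.add("extract fitness");
        } finally {
            if (previous != null) {
                previous.close(); // final cleanup of the last stage's results
            }
            staticData.close();
        }
        return log;
    }

    public static void main(String[] args) {
        run(2).forEach(System.out::println);
    }
}
```

With two stages the log shows the first stage's buffer released before extraction, while the static buffer outlives every stage.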
Performance optimization features:
- Pipeline efficiency: Minimized memory transfers between stages
- Memory coalescing: Optimized data layouts for GPU memory access
- Stage-specific tuning: Individual work group optimization per stage
- Asynchronous execution: Non-blocking fitness computation
-
Field Summary
Fields (Modifier and Type, Field):
- private final FitnessExtractor<T> fitnessExtractor
- static final org.apache.logging.log4j.Logger logger
- private final MultiStageDescriptor multiStageDescriptor
-
Constructor Summary
Constructors (Constructor, Description):
- MultiStageFitness(MultiStageDescriptor _multiStageDescriptor, FitnessExtractor<T> _fitnessExtractor) - Constructs a MultiStageFitness with the specified stage descriptor and fitness extractor.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- void afterAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService) - Per-device cleanup hook called for each OpenCL execution context at the end.
- void afterEvaluation(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes) - Per-device cleanup hook called after each device partition evaluation.
- private void allocateLocalMemory(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor, long generation, List<Genotype> genotypes, KernelExecutionContext kernelExecutionContext)
- void beforeAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService) - Per-device initialization hook called for each OpenCL execution context.
- protected void clearData
- protected void clearResultData(Map<Integer, CLData> resultData)
- protected void clearStaticData(Device device)
- CompletableFuture<List<T>> compute(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes) - Performs the actual fitness computation using OpenCL kernels on the GPU.
- protected void loadData(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor, Map<Integer, CLData> data, long generation, List<Genotype> genotypes)
- static <U extends Comparable<U>> MultiStageFitness<U> of(MultiStageDescriptor multiStageDescriptor, FitnessExtractor<U> fitnessExtractor) - Creates a new MultiStageFitness instance with the specified configuration.
- protected void prepareStaticData(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor)
Methods inherited from class net.bmahe.genetics4j.gpu.spec.fitness.OpenCLFitness
afterAllEvaluations, afterEvaluation, beforeAllEvaluations, beforeEvaluation, beforeEvaluation
-
Field Details
-
logger
public static final org.apache.logging.log4j.Logger logger
-
multiStageDescriptor
-
fitnessExtractor
-
staticData
-
-
Constructor Details
-
MultiStageFitness
public MultiStageFitness(MultiStageDescriptor _multiStageDescriptor, FitnessExtractor<T> _fitnessExtractor)
Constructs a MultiStageFitness with the specified stage descriptor and fitness extractor.
- Parameters:
_multiStageDescriptor - configuration for multi-stage kernel execution and data management
_fitnessExtractor - function to extract fitness values from final stage results
- Throws:
IllegalArgumentException - if any parameter is null
-
-
Method Details
-
clearStaticData
protected void clearStaticData(Device device)
-
clearData
-
clearResultData
protected void clearResultData(Map<Integer, CLData> resultData)
-
prepareStaticData
protected void prepareStaticData(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor)
-
allocateLocalMemory
private void allocateLocalMemory(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor, long generation, List<Genotype> genotypes, KernelExecutionContext kernelExecutionContext)
-
loadData
protected void loadData(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor, Map<Integer, CLData> data, long generation, List<Genotype> genotypes)
-
beforeAllEvaluations
public void beforeAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
Description copied from class: OpenCLFitness
Per-device initialization hook called for each OpenCL execution context.
This method is called once for each OpenCL device that will be used for fitness evaluation. It allows device-specific initialization such as memory allocation, buffer creation, and device-specific resource setup.
Typical use cases:
- Allocate GPU memory buffers that persist across generations
- Pre-load static data to GPU memory
- Initialize device-specific data structures
- Set up device-specific kernels or configurations
Memory allocated in this method should typically be released in the corresponding OpenCLFitness.afterAllEvaluations(OpenCLExecutionContext, ExecutorService) method.
- Overrides:
beforeAllEvaluations in class OpenCLFitness<T extends Comparable<T>>
- Parameters:
openCLExecutionContext - the OpenCL execution context for a specific device
executorService - the executor service for asynchronous operations
-
compute
public CompletableFuture<List<T>> compute(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
Description copied from class: OpenCLFitness
Performs the actual fitness computation using OpenCL kernels on the GPU.
This is the core method that implements GPU-based fitness evaluation. It receives a partition of the population and must return corresponding fitness values using OpenCL kernel execution on the specified device.
Implementation requirements:
- Return order: Fitness values must correspond to genotypes in the same order
- Size consistency: Return exactly one fitness value per input genotype
- Asynchronous execution: Use the executor service for non-blocking GPU operations
- Error handling: Handle GPU errors gracefully and provide meaningful exceptions
Common implementation pattern:
- Data transfer: Copy genotype data to GPU memory
- Kernel setup: Configure kernel arguments and work group parameters
- Kernel execution: Launch OpenCL kernels for fitness computation
- Result retrieval: Read fitness values from GPU memory
- Data conversion: Convert GPU results to appropriate fitness type
- Specified by:
compute in class OpenCLFitness<T extends Comparable<T>>
- Parameters:
openCLExecutionContext - the OpenCL execution context providing device access
executorService - the executor service for asynchronous operations
generation - the current generation number for context
genotypes - the genotypes to evaluate on this device
- Returns:
a CompletableFuture that will complete with fitness values for each genotype
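The return-order, size, and asynchronous-execution requirements can be sketched without any OpenCL code. The example below is a hypothetical stand-in, not the actual compute signature: genotypes are plain `int[]` arrays, the "kernel" is a CPU loop that sums the genes, and `AsyncComputeSketch` and `demo` are illustrative names. It relies only on the standard `CompletableFuture.supplyAsync(Supplier, Executor)` call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncComputeSketch {

    // Produces one fitness value per genotype, in input order, on the supplied
    // executor so that the calling thread is not blocked.
    static CompletableFuture<List<Double>> compute(ExecutorService executorService, List<int[]> genotypes) {
        return CompletableFuture.supplyAsync(() -> {
            List<Double> fitness = new ArrayList<>(genotypes.size());
            for (int[] genotype : genotypes) { // iterating in input order preserves the mapping
                double sum = 0;
                for (int gene : genotype) {
                    sum += gene;
                }
                fitness.add(sum);
            }
            return fitness;
        }, executorService);
    }

    static List<Double> demo() {
        ExecutorService executorService = Executors.newSingleThreadExecutor();
        try {
            List<int[]> genotypes = List.of(new int[] { 1, 2 }, new int[] { 3, 4, 5 });
            return compute(executorService, genotypes).join(); // block only in the demo
        } finally {
            executorService.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A failure inside the supplier completes the future exceptionally, which is one way to surface GPU errors to the caller instead of losing them on a worker thread.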
-
afterEvaluation
public void afterEvaluation(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
Description copied from class: OpenCLFitness
Per-device cleanup hook called after each device partition evaluation.
This method is called for each device after its partition evaluation completes, providing an opportunity for device-specific cleanup and resource management.
Typical use cases:
- Clean up temporary GPU memory allocations
- Log device-specific performance metrics
- Update device-specific statistics or state
- Perform device-specific validation or debugging
- Overrides:
afterEvaluation in class OpenCLFitness<T extends Comparable<T>>
- Parameters:
openCLExecutionContext - the OpenCL execution context for this device
executorService - the executor service for asynchronous operations
generation - the current generation number (0-based)
genotypes - the partition of genotypes that were evaluated on this device
-
afterAllEvaluations
public void afterAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
Description copied from class: OpenCLFitness
Per-device cleanup hook called for each OpenCL execution context at the end.
This method is called once for each OpenCL device when fitness evaluation is complete, providing an opportunity to clean up device-specific resources that were allocated in OpenCLFitness.beforeAllEvaluations(OpenCLExecutionContext, ExecutorService).
Typical use cases:
- Release GPU memory buffers and resources
- Clean up device-specific data structures
- Log device-specific performance summaries
- Ensure no GPU memory leaks occur
This method should ensure proper cleanup even if exceptions occurred during evaluation, as it may be the only opportunity to prevent resource leaks.
- Overrides:
afterAllEvaluations in class OpenCLFitness<T extends Comparable<T>>
- Parameters:
openCLExecutionContext - the OpenCL execution context for this device
executorService - the executor service for asynchronous operations
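The pairing of the two per-device hooks, and the need to clean up even after a failed evaluation, might look like the sketch below. Everything in it is hypothetical: devices are identified by a `String` instead of an OpenCL execution context, the "buffer" is a plain array, and the `evaluate` driver is an assumption about how a caller could invoke the hooks, not a description of how the framework does it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeviceLifecycleSketch {

    // Per-device state, keyed by a device name rather than a real OpenCL device.
    private final Map<String, float[]> staticDataByDevice = new HashMap<>();

    void beforeAllEvaluations(String device) {
        staticDataByDevice.put(device, new float[] { 1f, 2f, 3f }); // stands in for a GPU buffer
    }

    void afterAllEvaluations(String device) {
        staticDataByDevice.remove(device); // stands in for releasing the buffer
    }

    int liveDevices() {
        return staticDataByDevice.size();
    }

    // Driver: the finally block runs the cleanup hook for every device that was
    // initialized, whether or not the evaluation threw.
    static void evaluate(DeviceLifecycleSketch fitness, List<String> devices, Runnable evaluation) {
        List<String> initialized = new ArrayList<>();
        try {
            for (String device : devices) {
                fitness.beforeAllEvaluations(device);
                initialized.add(device);
            }
            evaluation.run();
        } finally {
            for (String device : initialized) {
                fitness.afterAllEvaluations(device);
            }
        }
    }

    public static void main(String[] args) {
        DeviceLifecycleSketch fitness = new DeviceLifecycleSketch();
        try {
            evaluate(fitness, List.of("gpu0", "gpu1"), () -> {
                throw new IllegalStateException("simulated kernel failure");
            });
        } catch (IllegalStateException e) {
            System.out.println("evaluation failed: " + e.getMessage());
        }
        System.out.println("live devices after cleanup: " + fitness.liveDevices());
    }
}
```

Tracking which devices were actually initialized keeps the cleanup symmetric when initialization itself fails partway through the device list.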
-
of
public static <U extends Comparable<U>> MultiStageFitness<U> of(MultiStageDescriptor multiStageDescriptor, FitnessExtractor<U> fitnessExtractor)
Creates a new MultiStageFitness instance with the specified configuration.
- Type Parameters:
U - the fitness value type
- Parameters:
multiStageDescriptor - configuration for multi-stage kernel execution and data management
fitnessExtractor - function to extract fitness values from final stage results
- Returns:
a new MultiStageFitness instance
- Throws:
IllegalArgumentException - if any parameter is null
-