Class MultiStageFitness<T extends Comparable<T>>

java.lang.Object
net.bmahe.genetics4j.gpu.spec.fitness.OpenCLFitness<T>
net.bmahe.genetics4j.gpu.spec.fitness.MultiStageFitness<T>
Type Parameters:
T - the fitness value type; it must be Comparable so that optimization algorithms can order fitness values

public class MultiStageFitness<T extends Comparable<T>> extends OpenCLFitness<T>
GPU-accelerated fitness evaluator that executes multiple sequential OpenCL kernels for complex fitness computation.

MultiStageFitness provides a framework for implementing fitness evaluation that requires multiple sequential GPU kernel executions, where each stage can use results from previous stages as input. This is ideal for complex fitness functions that require multiple computational phases, such as neural network training, multi-objective optimization, or hierarchical problem decomposition.

Key features:

  • Sequential execution: Multiple OpenCL kernels executed in sequence
  • Inter-stage data flow: Results from earlier stages used as inputs to later stages
  • Memory optimization: Automatic cleanup and reuse of intermediate results
  • Pipeline processing: Support for complex computational pipelines
  • Stage configuration: Individual configuration for each computational stage

Multi-stage computation architecture:

  • Stage descriptors: Each stage defines its kernel, data loaders, and result allocators
  • Data reuse patterns: Previous stage results can be reused as arguments or size parameters
  • Memory lifecycle: Automatic management of intermediate results between stages
  • Static data sharing: Algorithm parameters shared across all stages

Typical usage pattern:


 // Define multi-stage descriptor with sequential kernels
 MultiStageDescriptor descriptor = MultiStageDescriptor.builder()
     .addStaticDataLoader("parameters", parametersLoader)
     .addStage(StageDescriptor.builder()
         .kernelName("preprocessing")
         .addDataLoader(0, inputDataLoader)
         .addResultAllocator(1, preprocessedResultAllocator)
         .build())
     .addStage(StageDescriptor.builder()
         .kernelName("fitness_evaluation")
         .reusePreviousResultAsArgument(1, 0)  // Use previous result as input
         .addResultAllocator(1, fitnessResultAllocator)
         .build())
     .build();
 
 // Define fitness extraction from final stage results
 FitnessExtractor<Double> extractor = (context, kernelCtx, executor, generation, genotypes, results) -> {
     float[] fitnessValues = results.extractFloatArray(context, 1);
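     // Arrays.stream has no float[] overload, so build the list by index (needs java.util.stream.IntStream)
     return IntStream.range(0, fitnessValues.length)
         .mapToObj(i -> Double.valueOf(fitnessValues[i]))
         .collect(Collectors.toList());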
 };
 
 // Create multi-stage fitness evaluator
 MultiStageFitness<Double> fitness = MultiStageFitness.of(descriptor, extractor);
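
The extractor in the usage pattern has to turn the float[] read back from the device into a List<Double>. java.util.Arrays.stream offers overloads for int[], long[], and double[] but none for float[], so the conversion needs a loop or an index-based stream. A minimal, library-independent sketch of that step follows; the class and method names are hypothetical and not part of Genetics4j.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Genetics4j.
final class FloatFitnessConversion {

    private FloatFitnessConversion() {
    }

    // Widens each float to a double and boxes it, preserving the input order
    // so that the i-th fitness still corresponds to the i-th genotype.
    static List<Double> toFitnessList(float[] values) {
        List<Double> fitness = new ArrayList<>(values.length);
        for (float value : values) {
            fitness.add((double) value);
        }
        return fitness;
    }
}
```

Inside an extractor, the body could then reduce to a single call such as `FloatFitnessConversion.toFitnessList(fitnessValues)`.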
 

Stage execution workflow:

  1. Initialization: Load shared static data once before all evaluations
  2. Stage iteration: For each stage in sequence:
       • Context computation: Calculate kernel execution parameters for the stage
       • Data preparation: Load stage-specific data and map previous results
       • Kernel execution: Execute the stage kernel with configured parameters
       • Result management: Store results for potential use in subsequent stages
  3. Final extraction: Extract fitness values from the last stage results
  4. Cleanup: Release all intermediate and final result memory
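
The workflow above can be pictured with a small, library-independent sketch. It is not the MultiStageFitness implementation: plain float arrays stand in for device buffers, and the Stage interface, the evaluate method, and the argument index 1 are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the workflow; float arrays play the role of device buffers.
final class StagePipelineSketch {

    // One stage: reads the shared static data and the previous stage's results
    // (keyed by kernel argument index) and returns its own results.
    interface Stage {
        Map<Integer, float[]> run(float[] staticData, Map<Integer, float[]> previousResults);
    }

    static List<Double> evaluate(float[] staticData, List<Stage> stages, int fitnessIndex) {
        Map<Integer, float[]> previous = Map.of(); // nothing to reuse before the first stage
        for (Stage stage : stages) {
            // Per-stage steps collapsed into one call: prepare inputs, execute, keep results.
            previous = stage.run(staticData, previous);
        }
        // Final extraction: read the fitness buffer produced by the last stage.
        float[] raw = previous.get(fitnessIndex);
        if (raw == null) {
            throw new IllegalStateException("No result at argument index " + fitnessIndex);
        }
        List<Double> fitness = new ArrayList<>(raw.length);
        for (float value : raw) {
            fitness.add((double) value);
        }
        return fitness;
    }

    public static void main(String[] args) {
        float[] input = {1f, 2f, 3f};
        // Stage 1 scales the input by the shared parameter; stage 2 squares stage 1's output.
        Stage preprocessing = (params, prev) -> {
            float[] out = new float[input.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = input[i] * params[0];
            }
            return Map.of(1, out);
        };
        Stage fitnessEvaluation = (params, prev) -> {
            float[] in = prev.get(1);
            float[] out = new float[in.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = in[i] * in[i];
            }
            return Map.of(1, out);
        };
        System.out.println(evaluate(new float[] {2f}, List.of(preprocessing, fitnessEvaluation), 1));
        // prints [4.0, 16.0, 36.0]
    }
}
```

Cleanup is implicit here because the arrays are garbage collected; with real device buffers each dropped result would need an explicit release.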

Inter-stage data flow patterns:

  • Result reuse: Use previous stage output buffers as input to subsequent stages
  • Size propagation: Use previous stage result sizes as parameters for memory allocation
  • Memory optimization: Automatic cleanup of intermediate results no longer needed
  • Data type preservation: Maintain OpenCL data types across stage boundaries
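
As an illustration of size propagation, the sketch below sizes a second stage's output from the length of the first stage's result. It is plain Java with arrays in place of OpenCL buffers; the class name, the method name, and the per-genotype sum are assumptions chosen for the example, not library behavior.

```java
// Hypothetical illustration of size propagation between two stages.
final class SizePropagationSketch {

    private SizePropagationSketch() {
    }

    // The previous stage produced `valuesPerGenotype` values per genotype.
    // This stage allocates one output slot per genotype, so its allocation
    // size is derived from the previous result's length.
    static float[] reducePerGenotype(float[] previousResult, int valuesPerGenotype) {
        if (valuesPerGenotype <= 0 || previousResult.length % valuesPerGenotype != 0) {
            throw new IllegalArgumentException("Result length must be a multiple of valuesPerGenotype");
        }
        float[] out = new float[previousResult.length / valuesPerGenotype];
        for (int g = 0; g < out.length; g++) {
            float sum = 0f;
            for (int k = 0; k < valuesPerGenotype; k++) {
                sum += previousResult[g * valuesPerGenotype + k];
            }
            out[g] = sum;
        }
        return out;
    }
}
```

For example, a six-element result with three values per genotype yields a two-element output buffer.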

Memory management strategy:

  • Static data persistence: Shared parameters allocated once across all stages
  • Intermediate cleanup: Automatic release of stage results when no longer needed
  • Result chaining: Efficient memory reuse between consecutive stages
  • Final cleanup: Complete memory cleanup after fitness extraction
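
The release order implied by this strategy can be sketched as follows. This is a hypothetical model, not the library's code: Buffer is a made-up handle that only records its release, and the whole lifetime (static data allocation, stage loop, final cleanup) is compressed into one method for brevity. The exact moment at which the library releases each buffer is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the buffer lifecycle described above.
final class BufferLifecycleSketch {

    // Made-up buffer handle that records whether it has been released.
    static final class Buffer implements AutoCloseable {
        final String name;
        boolean released;

        Buffer(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            released = true;
        }
    }

    // Runs `stageCount` stages and returns the order in which buffers were released.
    static List<String> run(int stageCount) {
        List<String> releaseOrder = new ArrayList<>();
        Buffer staticData = new Buffer("static"); // allocated once, shared by all stages
        Buffer previous = null;
        try {
            for (int i = 0; i < stageCount; i++) {
                // The stage would read `previous` and `staticData` while producing `current`.
                Buffer current = new Buffer("stage-" + i);
                if (previous != null) {
                    previous.close(); // intermediate result no longer needed
                    releaseOrder.add(previous.name);
                }
                previous = current;
            }
            // Fitness extraction would read `previous` here.
        } finally {
            // Final cleanup runs even if a stage failed.
            if (previous != null) {
                previous.close();
                releaseOrder.add(previous.name);
            }
            staticData.close();
            releaseOrder.add(staticData.name);
        }
        return releaseOrder;
    }
}
```

With three stages, the buffers are released in the order stage-0, stage-1, stage-2, static.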

Performance optimization features:

  • Pipeline efficiency: Minimized memory transfers between stages
  • Memory coalescing: Optimized data layouts for GPU memory access
  • Stage-specific tuning: Individual work group optimization per stage
  • Asynchronous execution: Non-blocking fitness computation
  • Field Details

  • Constructor Details

    • MultiStageFitness

      public MultiStageFitness(MultiStageDescriptor _multiStageDescriptor, FitnessExtractor<T> _fitnessExtractor)
      Constructs a MultiStageFitness with the specified stage descriptor and fitness extractor.
      Parameters:
      _multiStageDescriptor - configuration for multi-stage kernel execution and data management
      _fitnessExtractor - function to extract fitness values from final stage results
      Throws:
      IllegalArgumentException - if any parameter is null
  • Method Details

    • clearStaticData

      protected void clearStaticData(Device device)
    • clearData

      protected void clearData(Map<Integer,CLData> data)
    • clearResultData

      protected void clearResultData(Map<Integer,CLData> resultData)
    • prepareStaticData

      protected void prepareStaticData(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor)
    • allocateLocalMemory

      private void allocateLocalMemory(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor, long generation, List<Genotype> genotypes, KernelExecutionContext kernelExecutionContext)
    • loadData

      protected void loadData(OpenCLExecutionContext openCLExecutionContext, StageDescriptor stageDescriptor, Map<Integer,CLData> data, long generation, List<Genotype> genotypes)
    • beforeAllEvaluations

      public void beforeAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
      Description copied from class: OpenCLFitness
      Per-device initialization hook called for each OpenCL execution context.

      This method is called once for each OpenCL device that will be used for fitness evaluation. It allows device-specific initialization such as memory allocation, buffer creation, and device-specific resource setup.

      Typical use cases:

      • Allocate GPU memory buffers that persist across generations
      • Pre-load static data to GPU memory
      • Initialize device-specific data structures
      • Set up device-specific kernels or configurations

      Memory allocated in this method should typically be released in the corresponding OpenCLFitness.afterAllEvaluations(OpenCLExecutionContext, ExecutorService) method.

      Overrides:
      beforeAllEvaluations in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context for a specific device
      executorService - the executor service for asynchronous operations
    • compute

      public CompletableFuture<List<T>> compute(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Description copied from class: OpenCLFitness
      Performs the actual fitness computation using OpenCL kernels on the GPU.

      This is the core method that implements GPU-based fitness evaluation. It receives a partition of the population and must return corresponding fitness values using OpenCL kernel execution on the specified device.

      Implementation requirements:

      • Return order: Fitness values must correspond to genotypes in the same order
      • Size consistency: Return exactly one fitness value per input genotype
      • Asynchronous execution: Use the executor service for non-blocking GPU operations
      • Error handling: Handle GPU errors gracefully and provide meaningful exceptions

      Common implementation pattern:

      1. Data transfer: Copy genotype data to GPU memory
      2. Kernel setup: Configure kernel arguments and work group parameters
      3. Kernel execution: Launch OpenCL kernels for fitness computation
      4. Result retrieval: Read fitness values from GPU memory
      5. Data conversion: Convert GPU results to appropriate fitness type
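
The return-order, size-consistency, and asynchronous-execution requirements can be illustrated without any OpenCL calls. In the sketch below a ToDoubleFunction stands in for the kernel launch and result read-back; the class name, the generic parameter G, and the method signature are assumptions for illustration and do not mirror the actual compute signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of an asynchronous, order-preserving fitness computation.
final class AsyncComputeSketch {

    private AsyncComputeSketch() {
    }

    // Evaluates every genotype on the executor and completes with exactly one
    // fitness value per genotype, in the same order as the input list.
    static <G> CompletableFuture<List<Double>> compute(
            ExecutorService executorService, List<G> genotypes, ToDoubleFunction<G> kernel) {
        return CompletableFuture.supplyAsync(() -> {
            List<Double> fitness = new ArrayList<>(genotypes.size());
            for (G genotype : genotypes) {
                fitness.add(kernel.applyAsDouble(genotype));
            }
            return fitness;
        }, executorService);
    }
}
```

A failure inside the supplier completes the future exceptionally, so callers see it when they join or chain further stages.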
      Specified by:
      compute in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context providing device access
      executorService - the executor service for asynchronous operations
      generation - the current generation number, provided for context
      genotypes - the genotypes to evaluate on this device
      Returns:
      a CompletableFuture that will complete with fitness values for each genotype
    • afterEvaluation

      public void afterEvaluation(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Description copied from class: OpenCLFitness
      Per-device cleanup hook called after each device partition evaluation.

      This method is called for each device after its partition evaluation completes, providing an opportunity for device-specific cleanup and resource management.

      Typical use cases:

      • Clean up temporary GPU memory allocations
      • Log device-specific performance metrics
      • Update device-specific statistics or state
      • Perform device-specific validation or debugging
      Overrides:
      afterEvaluation in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
      generation - the current generation number (0-based)
      genotypes - the partition of genotypes that were evaluated on this device
    • afterAllEvaluations

      public void afterAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
      Description copied from class: OpenCLFitness
      Per-device cleanup hook called once for each OpenCL execution context after all evaluations have finished.

      This method is called once for each OpenCL device when fitness evaluation is complete, providing an opportunity to clean up device-specific resources that were allocated in OpenCLFitness.beforeAllEvaluations(OpenCLExecutionContext, ExecutorService).

      Typical use cases:

      • Release GPU memory buffers and resources
      • Clean up device-specific data structures
      • Log device-specific performance summaries
      • Ensure no GPU memory leaks occur

      This method should ensure proper cleanup even if exceptions occurred during evaluation, as it may be the only opportunity to prevent resource leaks.
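
One way to meet this expectation is to attempt every release even when an earlier one fails, and report the failures afterwards. The sketch below shows that pattern with AutoCloseable handles; the class and method names are hypothetical, and real OpenCL buffers would be released through the binding's own release call rather than close().

```java
import java.util.List;

// Hypothetical helper showing a release loop that does not stop at the first failure.
final class SafeReleaseSketch {

    private SafeReleaseSketch() {
    }

    // Closes every resource. If any close() fails, the remaining resources are
    // still closed; the first failure is rethrown with later ones suppressed.
    static void releaseAll(List<? extends AutoCloseable> resources) {
        RuntimeException failure = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (failure == null) {
                    failure = new IllegalStateException("Failed to release resources", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```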

      Overrides:
      afterAllEvaluations in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
    • of

      public static <U extends Comparable<U>> MultiStageFitness<U> of(MultiStageDescriptor multiStageDescriptor, FitnessExtractor<U> fitnessExtractor)
      Creates a new MultiStageFitness instance with the specified configuration.
      Type Parameters:
      U - the fitness value type
      Parameters:
      multiStageDescriptor - configuration for multi-stage kernel execution and data management
      fitnessExtractor - function to extract fitness values from final stage results
      Returns:
      a new MultiStageFitness instance
      Throws:
      IllegalArgumentException - if any parameter is null
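
Note that the documented exception is IllegalArgumentException, not the NullPointerException that java.util.Objects.requireNonNull would raise. A check with that contract could look like the sketch below; the helper is hypothetical and says nothing about how the library performs its validation internally.

```java
// Hypothetical argument check that reports null arguments as IllegalArgumentException.
final class ArgumentChecks {

    private ArgumentChecks() {
    }

    // Returns the value unchanged, or throws if it is null.
    static <V> V requireNonNullArgument(V value, String name) {
        if (value == null) {
            throw new IllegalArgumentException(name + " must not be null");
        }
        return value;
    }
}
```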