Class SingleKernelFitness<T extends Comparable<T>>

java.lang.Object
net.bmahe.genetics4j.gpu.spec.fitness.OpenCLFitness<T>
net.bmahe.genetics4j.gpu.spec.fitness.SingleKernelFitness<T>
Type Parameters:
T - the fitness value type; must be Comparable so that fitness values can be ranked against each other

public class SingleKernelFitness<T extends Comparable<T>> extends OpenCLFitness<T>
GPU-accelerated fitness evaluator that executes a single OpenCL kernel for fitness computation.

SingleKernelFitness implements fitness evaluation with a single OpenCL kernel. It manages the full lifecycle of the GPU computation: data loading, kernel execution, and result extraction. It fits any fitness function that can be expressed as one kernel.

Key features:

  • Single kernel execution: Executes one OpenCL kernel per fitness evaluation
  • Data management: Handles static data, dynamic data, and result allocation
  • Memory lifecycle: Automatic cleanup of OpenCL memory objects
  • Multi-device support: Runs concurrently across multiple devices
  • Local memory: Configurable local memory allocation for kernel optimization

Data flow architecture:

  • Static data: Algorithm parameters loaded once before all evaluations
  • Dynamic data: Population data loaded before each generation
  • Local memory: Work group local memory allocated based on kernel requirements
  • Result data: Output buffers allocated for fitness results and intermediate data

Typical usage pattern:


 import java.util.stream.Collectors;
 import java.util.stream.IntStream;

 // Define kernel and data configuration
 SingleKernelFitnessDescriptor descriptor = SingleKernelFitnessDescriptor.builder()
     .kernelName("fitness_evaluation")
     .addDataLoader(0, populationDataLoader)
     .addStaticDataLoader(1, parametersDataLoader)
     .addResultAllocator(2, fitnessResultAllocator)
     .kernelExecutionContextComputer(executionContextComputer)
     .build();
 
 // Define fitness extraction from GPU results
 FitnessExtractor<Double> extractor = (context, kernelCtx, executor, generation, genotypes, results) -> {
     // Read the float results written to kernel argument 2
     float[] fitnessValues = results.extractFloatArray(context, 2);
     // Arrays.stream has no float[] overload, so index into the array instead
     return IntStream.range(0, fitnessValues.length)
         .mapToObj(i -> Double.valueOf(fitnessValues[i]))
         .collect(Collectors.toList());
 };
 
 // Create single kernel fitness evaluator
 SingleKernelFitness<Double> fitness = SingleKernelFitness.of(descriptor, extractor);
 

Kernel execution workflow:

  1. Initialization: Load static data once before all evaluations
  2. Data preparation: Load generation-specific data and allocate result buffers
  3. Kernel setup: Configure kernel arguments with data references
  4. Execution: Launch kernel with optimized work group configuration
  5. Result extraction: Extract fitness values from GPU memory
  6. Cleanup: Release generation-specific memory resources
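
The ordering of these steps can be illustrated with a small simulation. This is only a sketch: LifecycleOrderSketch is a hypothetical class that records hook names as strings, and the mapping of workflow steps onto the hook methods is an assumption based on the method descriptions in this page, not a copy of the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that records the order of lifecycle calls; not part of genetics4j. */
final class LifecycleOrderSketch {

    /** Simulates the hook order for one device over the given number of generations. */
    static List<String> run(int generations) {
        List<String> calls = new ArrayList<>();
        // Step 1: static data is loaded once, before any evaluation
        calls.add("beforeAllEvaluations");
        for (long generation = 0; generation < generations; generation++) {
            // Step 2: generation-specific data is loaded and result buffers are allocated
            calls.add("beforeEvaluation:" + generation);
            // Steps 3 to 5 (assumed to happen in compute): argument setup, launch, extraction
            calls.add("compute:" + generation);
            // Step 6: generation-specific memory is released
            calls.add("afterEvaluation:" + generation);
        }
        // Static data is released once, after the last evaluation
        calls.add("afterAllEvaluations");
        return calls;
    }

    public static void main(String[] args) {
        run(2).forEach(System.out::println);
    }
}
```

With two generations the simulation records eight calls: one initialization, three calls per generation, and one final cleanup.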

Memory management strategy:

  • Static data persistence: Static data remains allocated across generations
  • Dynamic allocation: Generation data is allocated and released per evaluation
  • Result buffer reuse: Result buffers can be reused with proper sizing
  • Automatic cleanup: Memory is automatically released in lifecycle methods
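
The difference between persistent static data and per-generation data can be sketched with plain maps. Everything below is hypothetical: BufferRegistrySketch and its Buffer type are stand-ins invented for illustration, devices are identified by a string, and the method names merely echo clearData and clearStaticData. The actual bookkeeping inside SingleKernelFitness may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-device buffer bookkeeping; not the genetics4j implementation. */
final class BufferRegistrySketch {

    /** Stand-in for an OpenCL memory object. */
    static final class Buffer {
        private boolean released = false;

        void release() {
            released = true;
        }

        boolean isReleased() {
            return released;
        }
    }

    // device name -> kernel argument index -> buffer
    private final Map<String, Map<Integer, Buffer>> staticData = new HashMap<>();
    private final Map<String, Map<Integer, Buffer>> data = new HashMap<>();

    /** Registers a buffer that should live across generations. */
    Buffer putStatic(String device, int argumentIndex) {
        Buffer buffer = new Buffer();
        staticData.computeIfAbsent(device, k -> new HashMap<>()).put(argumentIndex, buffer);
        return buffer;
    }

    /** Registers a buffer that only lives for one evaluation. */
    Buffer putData(String device, int argumentIndex) {
        Buffer buffer = new Buffer();
        data.computeIfAbsent(device, k -> new HashMap<>()).put(argumentIndex, buffer);
        return buffer;
    }

    /** Releases per-generation buffers for one device; static buffers are untouched. */
    void clearData(String device) {
        releaseAll(data.remove(device));
    }

    /** Releases static buffers for one device. */
    void clearStaticData(String device) {
        releaseAll(staticData.remove(device));
    }

    private static void releaseAll(Map<Integer, Buffer> buffers) {
        if (buffers != null) {
            buffers.values().forEach(Buffer::release);
        }
    }
}
```

Keeping the two maps separate means a per-generation cleanup cannot accidentally release parameters that later generations still need.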

Performance optimization features:

  • Asynchronous execution: compute returns a CompletableFuture, so kernel execution can be chained with other asynchronous stages
  • Work group optimization: Configurable work group sizes for optimal device utilization
  • Memory coalescing: Support for optimized memory access patterns
  • Local memory utilization: Efficient use of device local memory for performance
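
As an illustration of work group sizing, the sketch below pads a global work size up to a multiple of the work group size. It assumes an OpenCL version that requires the global size to be evenly divisible by an explicitly specified local size, and it assumes the kernel itself skips work items whose index is past the real population size. WorkSizeSketch and paddedGlobalSize are hypothetical names, not part of genetics4j.

```java
/** Hypothetical helper, not part of genetics4j. */
final class WorkSizeSketch {

    /** Rounds itemCount up to the next multiple of workGroupSize. */
    static long paddedGlobalSize(long itemCount, long workGroupSize) {
        if (itemCount < 0 || workGroupSize <= 0) {
            throw new IllegalArgumentException("itemCount must be >= 0 and workGroupSize must be > 0");
        }
        // Integer ceiling division, then scale back up to a whole number of groups
        return ((itemCount + workGroupSize - 1) / workGroupSize) * workGroupSize;
    }

    public static void main(String[] args) {
        // 1000 genotypes with groups of 64 need 16 groups, i.e. 1024 work items
        System.out.println(paddedGlobalSize(1000, 64)); // prints 1024
    }
}
```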
  • Field Details

  • Constructor Details

    • SingleKernelFitness

      public SingleKernelFitness(SingleKernelFitnessDescriptor _singleKernelFitnessDescriptor, FitnessExtractor<T> _fitnessExtractor)
      Constructs a SingleKernelFitness with the specified kernel descriptor and fitness extractor.
      Parameters:
      _singleKernelFitnessDescriptor - configuration for kernel execution and data management
      _fitnessExtractor - function to extract fitness values from GPU computation results
      Throws:
      IllegalArgumentException - if any parameter is null
  • Method Details

    • clearStaticData

      protected void clearStaticData(Device device)
    • clearData

      protected void clearData(Device device)
    • clearResultData

      protected void clearResultData(Device device)
    • beforeAllEvaluations

      public void beforeAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
      Description copied from class: OpenCLFitness
      Per-device initialization hook called for each OpenCL execution context.

      This method is called once for each OpenCL device that will be used for fitness evaluation. It allows device-specific initialization such as memory allocation, buffer creation, and device-specific resource setup.

      Typical use cases:

      • Allocate GPU memory buffers that persist across generations
      • Pre-load static data to GPU memory
      • Initialize device-specific data structures
      • Set up device-specific kernels or configurations

      Memory allocated in this method should typically be released in the corresponding OpenCLFitness.afterAllEvaluations(OpenCLExecutionContext, ExecutorService) method.

      Overrides:
      beforeAllEvaluations in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context for a specific device
      executorService - the executor service for asynchronous operations
    • beforeEvaluation

      public void beforeEvaluation(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Description copied from class: OpenCLFitness
      Per-device preparation hook called before each device partition evaluation.

      This method is called for each device before evaluating its assigned partition of the population. It provides access to the device context and the specific genotypes that will be evaluated on this device.

      Typical use cases:

      • Transfer genotype data to device memory
      • Update device-specific parameters for this generation
      • Prepare input buffers with population data
      • Set up kernel arguments that vary by generation
      Overrides:
      beforeEvaluation in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
      generation - the current generation number (0-based)
      genotypes - the partition of genotypes to be evaluated on this device
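
      One common way to prepare an input buffer is to pack the partition into a single contiguous array before transfer. The sketch below assumes fixed-length chromosomes already available as float arrays; FlattenSketch is a hypothetical helper and does not use the library's Genotype or data loader types.

```java
import java.util.List;

/** Hypothetical helper, not part of genetics4j. */
final class FlattenSketch {

    /** Packs chromosomes row-major: genotype i occupies [i * length, (i + 1) * length). */
    static float[] flatten(List<float[]> chromosomes, int chromosomeLength) {
        float[] flat = new float[chromosomes.size() * chromosomeLength];
        for (int i = 0; i < chromosomes.size(); i++) {
            float[] genes = chromosomes.get(i);
            if (genes.length != chromosomeLength) {
                throw new IllegalArgumentException(
                        "Chromosome " + i + " has length " + genes.length + ", expected " + chromosomeLength);
            }
            // Copy this genotype's genes into its slot of the flat array
            System.arraycopy(genes, 0, flat, i * chromosomeLength, chromosomeLength);
        }
        return flat;
    }
}
```

      With this layout, a kernel work item with global id i can locate its genotype at offset i * chromosomeLength.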
    • compute

      public CompletableFuture<List<T>> compute(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Description copied from class: OpenCLFitness
      Performs the actual fitness computation using OpenCL kernels on the GPU.

      This is the core method that implements GPU-based fitness evaluation. It receives a partition of the population and must return corresponding fitness values using OpenCL kernel execution on the specified device.

      Implementation requirements:

      • Return order: Fitness values must correspond to genotypes in the same order
      • Size consistency: Return exactly one fitness value per input genotype
      • Asynchronous execution: Use the executor service for non-blocking GPU operations
      • Error handling: Handle GPU errors gracefully and provide meaningful exceptions

      Common implementation pattern:

      1. Data transfer: Copy genotype data to GPU memory
      2. Kernel setup: Configure kernel arguments and work group parameters
      3. Kernel execution: Launch OpenCL kernels for fitness computation
      4. Result retrieval: Read fitness values from GPU memory
      5. Data conversion: Convert GPU results to appropriate fitness type
      Specified by:
      compute in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context providing device access
      executorService - the executor service for asynchronous operations
      generation - the current generation number (0-based)
      genotypes - the genotypes to evaluate on this device
      Returns:
      a CompletableFuture that will complete with fitness values for each genotype
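
      The return-order and size requirements can be sketched without a GPU. In the example below a plain Java method stands in for the kernel, and genotypes are represented as int arrays; both are assumptions made to keep the sketch self-contained. ComputeContractSketch is a hypothetical class, not the library's implementation of compute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/** Hypothetical illustration of the compute contract; not part of genetics4j. */
final class ComputeContractSketch {

    /** CPU stand-in for the kernel: writes one float per genotype at the genotype's index. */
    static float[] fakeKernel(List<int[]> genotypes) {
        float[] out = new float[genotypes.size()];
        for (int i = 0; i < out.length; i++) {
            int sum = 0;
            for (int gene : genotypes.get(i)) {
                sum += gene;
            }
            out[i] = sum;
        }
        return out;
    }

    /** Runs the stand-in kernel on the executor and converts results in input order. */
    static CompletableFuture<List<Double>> compute(ExecutorService executor, List<int[]> genotypes) {
        return CompletableFuture.supplyAsync(() -> {
            float[] raw = fakeKernel(genotypes);
            // Size consistency: exactly one fitness value per genotype
            if (raw.length != genotypes.size()) {
                throw new IllegalStateException(
                        "Expected " + genotypes.size() + " results but got " + raw.length);
            }
            // Return order: result i belongs to genotype i
            List<Double> fitness = new ArrayList<>(raw.length);
            for (float value : raw) {
                fitness.add((double) value);
            }
            return fitness;
        }, executor);
    }
}
```

      A caller can chain further stages on the returned future, or call join() when it needs the values.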
    • afterEvaluation

      public void afterEvaluation(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Description copied from class: OpenCLFitness
      Per-device cleanup hook called after each device partition evaluation.

      This method is called for each device after its partition evaluation completes, providing an opportunity for device-specific cleanup and resource management.

      Typical use cases:

      • Clean up temporary GPU memory allocations
      • Log device-specific performance metrics
      • Update device-specific statistics or state
      • Perform device-specific validation or debugging
      Overrides:
      afterEvaluation in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
      generation - the current generation number (0-based)
      genotypes - the partition of genotypes that were evaluated on this device
    • afterAllEvaluations

      public void afterAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
      Description copied from class: OpenCLFitness
      Per-device cleanup hook called once for each OpenCL execution context after all evaluations have completed.

      This method is called once for each OpenCL device when fitness evaluation is complete, providing an opportunity to clean up device-specific resources that were allocated in OpenCLFitness.beforeAllEvaluations(OpenCLExecutionContext, ExecutorService).

      Typical use cases:

      • Release GPU memory buffers and resources
      • Clean up device-specific data structures
      • Log device-specific performance summaries
      • Ensure no GPU memory leaks occur

      This method should ensure proper cleanup even if exceptions occurred during evaluation, as it may be the only opportunity to prevent resource leaks.
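
      One way to make such cleanup robust is to attempt every release even when an earlier one fails, and to report the failures afterwards. The sketch below uses AutoCloseable as a stand-in for GPU resources; CleanupSketch and releaseAll are hypothetical names and this is not the library's cleanup code.

```java
import java.util.List;

/** Hypothetical best-effort cleanup helper; not part of genetics4j. */
final class CleanupSketch {

    /** Closes every resource, then throws once if any close failed. */
    static void releaseAll(List<? extends AutoCloseable> resources) {
        RuntimeException failure = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                // Remember the failure but keep releasing the remaining resources
                if (failure == null) {
                    failure = new IllegalStateException("Failed to release some resources");
                }
                failure.addSuppressed(e);
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```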

      Overrides:
      afterAllEvaluations in class OpenCLFitness<T extends Comparable<T>>
      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
    • of

      public static <U extends Comparable<U>> SingleKernelFitness<U> of(SingleKernelFitnessDescriptor singleKernelFitnessDescriptor, FitnessExtractor<U> fitnessExtractor)
      Creates a new SingleKernelFitness instance with the specified configuration.
      Type Parameters:
      U - the fitness value type
      Parameters:
      singleKernelFitnessDescriptor - configuration for kernel execution and data management
      fitnessExtractor - function to extract fitness values from GPU computation results
      Returns:
      a new SingleKernelFitness instance
      Throws:
      IllegalArgumentException - if any parameter is null
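
      The factory shape and the documented null handling can be sketched with stand-in types. EvaluatorSketch is hypothetical: it takes a kernel name and a java.util.function.Function in place of the descriptor and FitnessExtractor. The explicit checks are an assumption chosen to match the documented IllegalArgumentException, since Objects.requireNonNull would throw NullPointerException instead.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for an evaluator built by a static factory; not the genetics4j class. */
final class EvaluatorSketch<T extends Comparable<T>> {

    private final String kernelName;
    private final Function<float[], List<T>> extractor;

    private EvaluatorSketch(String kernelName, Function<float[], List<T>> extractor) {
        // Explicit checks so that null arguments raise IllegalArgumentException
        if (kernelName == null) {
            throw new IllegalArgumentException("kernelName must not be null");
        }
        if (extractor == null) {
            throw new IllegalArgumentException("extractor must not be null");
        }
        this.kernelName = kernelName;
        this.extractor = extractor;
    }

    /** Static factory: the fitness type is inferred from the extractor or the assignment target. */
    static <U extends Comparable<U>> EvaluatorSketch<U> of(String kernelName,
            Function<float[], List<U>> extractor) {
        return new EvaluatorSketch<>(kernelName, extractor);
    }

    String kernelName() {
        return kernelName;
    }

    /** Converts raw kernel output into fitness values. */
    List<T> extract(float[] raw) {
        return extractor.apply(raw);
    }
}
```

      A call such as EvaluatorSketch.of("fitness_evaluation", extractor) then reads the same way as the SingleKernelFitness.of call in the usage pattern for this class.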