Class OpenCLFitness<T extends Comparable<T>>

java.lang.Object
net.bmahe.genetics4j.gpu.spec.fitness.OpenCLFitness<T>
Type Parameters:
T - the type of fitness values produced, must be comparable for selection operations
Direct Known Subclasses:
MultiStageFitness, SingleKernelFitness

public abstract class OpenCLFitness<T extends Comparable<T>> extends Object
Abstract base class for implementing OpenCL-based fitness evaluation in GPU-accelerated evolutionary algorithms.

OpenCLFitness provides the framework for evaluating population fitness using OpenCL kernels executed on GPU devices. This class defines the lifecycle and coordination patterns needed for efficient GPU-based fitness computation, including resource management, data transfer, and kernel execution orchestration.

The fitness evaluation lifecycle consists of several phases:

  1. Global initialization: One-time setup before any evaluations (beforeAllEvaluations())
  2. Per-device initialization: Setup for each OpenCL device context (beforeAllEvaluations(OpenCLExecutionContext, ExecutorService))
  3. Generation setup: Preparation before each generation evaluation (the beforeEvaluation overloads)
  4. Computation: Actual fitness evaluation using OpenCL kernels (compute(OpenCLExecutionContext, ExecutorService, long, List<Genotype>))
  5. Generation cleanup: Cleanup after each generation evaluation (the afterEvaluation overloads)
  6. Per-device cleanup: Cleanup for each OpenCL device context (afterAllEvaluations(OpenCLExecutionContext, ExecutorService))
  7. Global cleanup: Final cleanup after all evaluations (afterAllEvaluations())
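
The phase ordering above can be sketched with a small stand-in that only records which hook would fire and when. This is not the framework's driver: the class name LifecycleOrderSketch and the string device names are hypothetical, the sketch assumes the global beforeEvaluation runs before the per-device one, and it serializes per-device calls that a real run may execute concurrently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: records the order in which lifecycle hooks would fire.
// It does not call the real framework; devices are represented by plain strings.
final class LifecycleOrderSketch {

    static List<String> run(List<String> devices, int generations) {
        List<String> calls = new ArrayList<>();

        calls.add("beforeAllEvaluations()");                            // 1. global initialization
        for (String device : devices) {
            calls.add("beforeAllEvaluations(" + device + ")");          // 2. per-device initialization
        }

        for (int g = 0; g < generations; g++) {
            calls.add("beforeEvaluation(gen " + g + ")");               // 3. generation setup (global)
            for (String device : devices) {
                calls.add("beforeEvaluation(" + device + ", gen " + g + ")"); // 3. generation setup (device)
                calls.add("compute(" + device + ", gen " + g + ")");          // 4. computation
                calls.add("afterEvaluation(" + device + ", gen " + g + ")");  // 5. generation cleanup (device)
            }
            calls.add("afterEvaluation(gen " + g + ")");                // 5. generation cleanup (global)
        }

        for (String device : devices) {
            calls.add("afterAllEvaluations(" + device + ")");           // 6. per-device cleanup
        }
        calls.add("afterAllEvaluations()");                             // 7. global cleanup
        return calls;
    }
}
```

Calling LifecycleOrderSketch.run with two device names and one generation yields a list that starts with the global initialization entry and ends with the global cleanup entry, with every per-device entry in between.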

Key responsibilities for implementations:

  • Data preparation: Convert genotypes to GPU-compatible data formats
  • Memory management: Allocate and manage GPU memory buffers
  • Kernel execution: Configure and execute OpenCL kernels with appropriate parameters
  • Result extraction: Retrieve and convert fitness values from GPU memory
  • Resource cleanup: Ensure proper cleanup of GPU resources

Common implementation patterns:


 // Illustrative skeleton: transferGenotypesToGPU, executeKernel and extractFitnessValues
 // are helper methods the implementation must supply; OpenCLFitness does not declare them.
 public class MyGPUFitness extends OpenCLFitness<Double> {

     // Problem dimensions supplied by the caller; OpenCLFitness does not provide them
     private final int chromosomeSize;
     private final int maxPopulationSize;

     private CLData inputBuffer;
     private CLData outputBuffer;

     public MyGPUFitness(int chromosomeSize, int maxPopulationSize) {
         this.chromosomeSize = chromosomeSize;
         this.maxPopulationSize = maxPopulationSize;
     }

     @Override
     public void beforeAllEvaluations(OpenCLExecutionContext context, ExecutorService executor) {
         // Allocate GPU memory buffers that persist across generations
         inputBuffer = CLData.allocateFloat(context, maxPopulationSize * chromosomeSize);
         outputBuffer = CLData.allocateFloat(context, maxPopulationSize);
     }

     @Override
     public CompletableFuture<List<Double>> compute(OpenCLExecutionContext context,
             ExecutorService executor, long generation, List<Genotype> genotypes) {

         return CompletableFuture.supplyAsync(() -> {
             // Transfer genotype data to GPU
             transferGenotypesToGPU(context, genotypes, inputBuffer);

             // Execute fitness evaluation kernel, one work-item per genotype
             executeKernel(context, "fitness_kernel", genotypes.size());

             // Retrieve results from GPU, in the same order as the input genotypes
             return extractFitnessValues(context, outputBuffer, genotypes.size());
         }, executor);
     }

     @Override
     public void afterAllEvaluations(OpenCLExecutionContext context, ExecutorService executor) {
         // Clean up GPU memory allocated in beforeAllEvaluations(context, executor)
         inputBuffer.release();
         outputBuffer.release();
     }
 }
 

Performance optimization strategies:

  • Memory reuse: Allocate buffers once in beforeAllEvaluations(OpenCLExecutionContext, ExecutorService) and reuse them across generations
  • Asynchronous execution: Use CompletableFuture for non-blocking GPU operations
  • Batch processing: Process entire populations in single kernel launches
  • Memory coalescing: Organize data layouts for optimal GPU memory access patterns
  • Kernel optimization: Design kernels to maximize GPU utilization and minimize divergence
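
As an illustration of the memory coalescing point, the following sketch packs a population in gene-major order, so that work-items with consecutive ids read consecutive addresses when they all fetch the same gene. It assumes one work-item per individual and chromosomes already converted to double[]; the class GeneMajorLayoutSketch and that representation are hypothetical, and converting from Genotype is not shown.

```java
import java.util.List;

// Hypothetical helper: flattens a population into a gene-major float array.
final class GeneMajorLayoutSketch {

    // Gene g of individual i is stored at index g * populationSize + i.
    // A kernel running one work-item per individual would then read
    // genes[g * populationSize + get_global_id(0)], so neighbouring
    // work-items touch neighbouring addresses.
    static float[] pack(List<double[]> population, int chromosomeSize) {
        int populationSize = population.size();
        float[] flat = new float[populationSize * chromosomeSize];
        for (int i = 0; i < populationSize; i++) {
            double[] genes = population.get(i);
            if (genes.length != chromosomeSize) {
                throw new IllegalArgumentException(
                        "Individual " + i + " has " + genes.length + " genes, expected " + chromosomeSize);
            }
            for (int g = 0; g < chromosomeSize; g++) {
                flat[g * populationSize + i] = (float) genes[g];
            }
        }
        return flat;
    }
}
```

Whether this layout beats the individual-major one depends on the kernel's access pattern, so it is worth measuring on the target device.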

Error handling and robustness:

  • GPU errors: Handle OpenCL errors gracefully and provide meaningful error messages
  • Memory management: Ensure proper cleanup even in exceptional circumstances
  • Device failures: Support graceful degradation when GPU devices fail
  • Timeout handling: Implement appropriate timeouts for long-running kernels

Multi-device considerations:

  • Device-specific setup: Separate contexts and buffers for each device
  • Load balancing: Coordinate with the framework's automatic population partitioning
  • Resource isolation: Ensure proper isolation of resources between devices
  • Synchronization: Coordinate results from multiple devices
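
The framework partitions the population itself, so implementations do not normally write this code. The sketch below only illustrates what order-preserving synchronization across devices amounts to: contiguous chunks are evaluated independently and the results are concatenated in chunk order. PartitionMergeSketch and the perDevice function are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration of partitioning a population and merging per-device results in order.
final class PartitionMergeSketch {

    // Splits items into `parts` contiguous chunks whose sizes differ by at most one.
    static <G> List<List<G>> partition(List<G> items, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<G>> chunks = new ArrayList<>();
        int base = items.size() / parts;
        int extra = items.size() % parts;
        int start = 0;
        for (int p = 0; p < parts; p++) {
            int size = base + (p < extra ? 1 : 0);
            chunks.add(items.subList(start, start + size));
            start += size;
        }
        return chunks;
    }

    // Evaluates every chunk and concatenates the results in chunk order,
    // regardless of which chunk finishes first.
    static <G, F> CompletableFuture<List<F>> evaluateAll(
            List<List<G>> chunks,
            Function<List<G>, CompletableFuture<List<F>>> perDevice) {
        List<CompletableFuture<List<F>>> futures = new ArrayList<>();
        for (List<G> chunk : chunks) {
            futures.add(perDevice.apply(chunk));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<F> merged = new ArrayList<>();
                    for (CompletableFuture<List<F>> future : futures) {
                        merged.addAll(future.join()); // already complete at this point
                    }
                    return merged;
                });
    }
}
```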
  • Field Details

    • logger

      public static final org.apache.logging.log4j.Logger logger
  • Constructor Details

    • OpenCLFitness

      public OpenCLFitness()
  • Method Details

    • beforeAllEvaluations

      public void beforeAllEvaluations()
      Global initialization hook called once before any fitness evaluations begin.

      This method is called once at the beginning of the evolutionary algorithm execution, before any OpenCL contexts are created or evaluations are performed. Use this method for global initialization that applies to all devices and generations.

      Typical use cases:

      • Initialize problem-specific constants or parameters
      • Load reference data or configuration
      • Set up logging or monitoring infrastructure
      • Validate problem constraints or requirements

      This method is called on the main thread before any concurrent operations begin.

    • beforeAllEvaluations

      public void beforeAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
      Per-device initialization hook called for each OpenCL execution context.

      This method is called once for each OpenCL device that will be used for fitness evaluation. It allows device-specific initialization such as memory allocation, buffer creation, and device-specific resource setup.

      Typical use cases:

      • Allocate GPU memory buffers that persist across generations
      • Pre-load static data to GPU memory
      • Initialize device-specific data structures
      • Set up device-specific kernels or configurations

      Memory allocated in this method should typically be released in the corresponding afterAllEvaluations(OpenCLExecutionContext, ExecutorService) method.

      Parameters:
      openCLExecutionContext - the OpenCL execution context for a specific device
      executorService - the executor service for asynchronous operations
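
      Because this hook runs once per device, state kept in a single instance field would be overwritten if the same fitness instance serves several devices. One possible arrangement, sketched below, keys device-specific buffers by the device they belong to. PerDeviceBuffersSketch is hypothetical, the type parameter D stands in for whatever identifies a device (for example the execution context), and plain float arrays stand in for GPU buffers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of per-device buffers; float[] stands in for real GPU buffers.
final class PerDeviceBuffersSketch<D> {

    static final class Buffers {
        final float[] input;
        final float[] output;

        Buffers(float[] input, float[] output) {
            this.input = input;
            this.output = output;
        }
    }

    // Concurrent map, since per-device hooks may run on different threads
    private final Map<D, Buffers> buffersByDevice = new ConcurrentHashMap<>();

    // Would be called from the per-device beforeAllEvaluations hook
    void allocate(D device, int maxPopulationSize, int chromosomeSize) {
        buffersByDevice.put(device,
                new Buffers(new float[maxPopulationSize * chromosomeSize], new float[maxPopulationSize]));
    }

    // Would be called from compute and the per-device evaluation hooks
    Buffers get(D device) {
        Buffers buffers = buffersByDevice.get(device);
        if (buffers == null) {
            throw new IllegalStateException("No buffers allocated for " + device);
        }
        return buffers;
    }

    // Would be called from the per-device afterAllEvaluations hook
    void release(D device) {
        buffersByDevice.remove(device);
    }

    int allocatedDevices() {
        return buffersByDevice.size();
    }
}
```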
    • beforeEvaluation

      public void beforeEvaluation(long generation, List<Genotype> genotypes)
      Global preparation hook called before each generation evaluation.

      This method is called before fitness evaluation of each generation, providing an opportunity for global preparation that applies across all devices. It receives the generation number and complete population for context.

      Typical use cases:

      • Update generation-specific parameters or configurations
      • Log generation start or population statistics
      • Prepare global data structures for the upcoming evaluation
      • Implement adaptive behavior based on generation number
      Parameters:
      generation - the current generation number (0-based)
      genotypes - the complete population to be evaluated
    • beforeEvaluation

      public void beforeEvaluation(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Per-device preparation hook called before each device partition evaluation.

      This method is called for each device before evaluating its assigned partition of the population. It provides access to the device context and the specific genotypes that will be evaluated on this device.

      Typical use cases:

      • Transfer genotype data to device memory
      • Update device-specific parameters for this generation
      • Prepare input buffers with population data
      • Set up kernel arguments that vary by generation
      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
      generation - the current generation number (0-based)
      genotypes - the partition of genotypes to be evaluated on this device
    • compute

      public abstract CompletableFuture<List<T>> compute(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Performs the actual fitness computation using OpenCL kernels on the GPU.

      This is the core method that implements GPU-based fitness evaluation. It receives a partition of the population and must return corresponding fitness values using OpenCL kernel execution on the specified device.

      Implementation requirements:

      • Return order: Fitness values must correspond to genotypes in the same order
      • Size consistency: Return exactly one fitness value per input genotype
      • Asynchronous execution: Use the executor service for non-blocking GPU operations
      • Error handling: Handle GPU errors gracefully and provide meaningful exceptions

      Common implementation pattern:

      1. Data transfer: Copy genotype data to GPU memory
      2. Kernel setup: Configure kernel arguments and work group parameters
      3. Kernel execution: Launch OpenCL kernels for fitness computation
      4. Result retrieval: Read fitness values from GPU memory
      5. Data conversion: Convert GPU results to appropriate fitness type
      Parameters:
      openCLExecutionContext - the OpenCL execution context providing device access
      executorService - the executor service for asynchronous operations
      generation - the current generation number for context
      genotypes - the genotypes to evaluate on this device
      Returns:
      a CompletableFuture that will complete with fitness values for each genotype
      Throws:
      RuntimeException - if GPU evaluation fails or setup errors occur
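
      The contract described above (same order, one value per genotype, work submitted to the executor) can be sketched without any OpenCL calls. In the block below a CPU loop stands in for the kernel, genotypes are assumed to be double[] already, and the sum-of-squares fitness is purely illustrative; ComputeContractSketch is a hypothetical name, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical CPU stand-in for a compute() implementation.
final class ComputeContractSketch {

    static CompletableFuture<List<Double>> compute(Executor executor, List<double[]> genotypes) {
        return CompletableFuture.supplyAsync(() -> {
            int n = genotypes.size();

            // Stand-in for steps 1 to 3: each index i plays the role of one work-item
            // and writes only results[i], which keeps input and output order aligned.
            float[] results = new float[n];
            for (int i = 0; i < n; i++) {
                float sum = 0f;
                for (double gene : genotypes.get(i)) {
                    sum += (float) (gene * gene);
                }
                results[i] = sum;
            }

            // Steps 4 and 5: read the raw values back and convert them to the fitness type
            List<Double> fitness = new ArrayList<>(n);
            for (float value : results) {
                fitness.add((double) value);
            }
            return fitness;
        }, executor);
    }
}
```

      Any exception thrown inside the supplier completes the returned future exceptionally, which is one way to surface evaluation failures to the caller.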
    • afterEvaluation

      public void afterEvaluation(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService, long generation, List<Genotype> genotypes)
      Per-device cleanup hook called after each device partition evaluation.

      This method is called for each device after its partition evaluation completes, providing an opportunity for device-specific cleanup and resource management.

      Typical use cases:

      • Clean up temporary GPU memory allocations
      • Log device-specific performance metrics
      • Update device-specific statistics or state
      • Perform device-specific validation or debugging
      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
      generation - the current generation number (0-based)
      genotypes - the partition of genotypes that were evaluated on this device
    • afterEvaluation

      public void afterEvaluation(long generation, List<Genotype> genotypes)
      Global cleanup hook called after each generation evaluation.

      This method is called after fitness evaluation of each generation completes across all devices, providing an opportunity for global cleanup and statistics collection that applies to the entire population.

      Typical use cases:

      • Log generation completion and performance metrics
      • Update global statistics or progress tracking
      • Perform global validation or debugging
      • Clean up generation-specific global resources
      Parameters:
      generation - the current generation number (0-based)
      genotypes - the complete population that was evaluated
    • afterAllEvaluations

      public void afterAllEvaluations(OpenCLExecutionContext openCLExecutionContext, ExecutorService executorService)
      Per-device cleanup hook called for each OpenCL execution context at the end.

      This method is called once for each OpenCL device when fitness evaluation is complete, providing an opportunity to clean up device-specific resources that were allocated in beforeAllEvaluations(OpenCLExecutionContext, ExecutorService).

      Typical use cases:

      • Release GPU memory buffers and resources
      • Clean up device-specific data structures
      • Log device-specific performance summaries
      • Ensure no GPU memory leaks occur

      This method should ensure proper cleanup even if exceptions occurred during evaluation, as it may be the only opportunity to prevent resource leaks.

      Parameters:
      openCLExecutionContext - the OpenCL execution context for this device
      executorService - the executor service for asynchronous operations
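
      When several resources must be released, a failure while releasing one of them should not prevent the others from being released. The sketch below shows one way to do that with plain Java: every resource is closed, the first failure is wrapped, and later failures are attached as suppressed exceptions. ReleaseAllSketch is hypothetical, and AutoCloseable stands in for whatever release mechanism the GPU buffers expose.

```java
import java.util.List;

// Hypothetical helper: releases every resource even if some releases fail.
final class ReleaseAllSketch {

    static void releaseAll(List<? extends AutoCloseable> resources) {
        RuntimeException failure = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (failure == null) {
                    // Remember the first failure and keep releasing the rest
                    failure = new RuntimeException("Failed to release one or more resources", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```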
    • afterAllEvaluations

      public void afterAllEvaluations()
      Global cleanup hook called once after all fitness evaluations complete.

      This method is called once at the end of the evolutionary algorithm execution, after all OpenCL contexts have been cleaned up and all evaluations are complete. Use this method for final global cleanup and resource deallocation.

      Typical use cases:

      • Clean up global resources and data structures
      • Log final performance summaries and statistics
      • Save results or generate reports
      • Perform final validation or cleanup

      This method is called on the main thread after all concurrent operations complete.
