Class GPUFitnessEvaluator<T extends Comparable<T>>

java.lang.Object
net.bmahe.genetics4j.gpu.GPUFitnessEvaluator<T>
Type Parameters:
T - the type of fitness values produced, must be comparable for selection operations
All Implemented Interfaces:
FitnessEvaluator<T>

public class GPUFitnessEvaluator<T extends Comparable<T>> extends Object implements FitnessEvaluator<T>
GPU-accelerated fitness evaluator that leverages OpenCL for high-performance evolutionary algorithm execution.

GPUFitnessEvaluator implements the core FitnessEvaluator interface to provide GPU acceleration for fitness computation in evolutionary algorithms. This evaluator manages the complete OpenCL lifecycle, from device discovery and kernel compilation to memory management and resource cleanup.

Key responsibilities include:

  • OpenCL initialization: Platform and device discovery, context creation, and kernel compilation
  • Resource management: Managing OpenCL contexts, command queues, programs, and kernels
  • Population partitioning: Distributing work across multiple OpenCL devices
  • Asynchronous execution: Coordinating concurrent GPU operations with CPU-side logic
  • Memory lifecycle: Ensuring proper cleanup of GPU resources

Architecture overview:

  1. Initialization (preEvaluation()): Discover platforms/devices, create contexts, compile kernels
  2. Evaluation (evaluate(long, List<Genotype>)): Partition population, execute fitness computation on GPU
  3. Cleanup (postEvaluation()): Release all OpenCL resources and contexts
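
The three phases above can be sketched as a small driver loop. This is only an illustration: LifecycleEvaluator is a hypothetical stand-in that mirrors the three method names listed above, genotypes are simplified to double[], and the way the EA system actually drives the evaluator is not shown in this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in mirroring the three lifecycle methods listed above.
// Genotypes are simplified to double[] so the sketch needs no library types.
interface LifecycleEvaluator<T extends Comparable<T>> {
    void preEvaluation();
    List<T> evaluate(long generation, List<double[]> genotypes);
    void postEvaluation();
}

final class LifecycleSketch {

    // Runs the phases in order; the finally block ensures cleanup still runs if evaluation throws.
    static <T extends Comparable<T>> List<List<T>> run(LifecycleEvaluator<T> evaluator,
            List<double[]> population, int generations) {
        List<List<T>> fitnessPerGeneration = new ArrayList<>();
        evaluator.preEvaluation();
        try {
            for (long generation = 0; generation < generations; generation++) {
                fitnessPerGeneration.add(evaluator.evaluate(generation, population));
            }
        } finally {
            evaluator.postEvaluation();
        }
        return fitnessPerGeneration;
    }

    // Toy evaluator that records lifecycle calls and scores a genotype by the sum of its genes.
    static final class RecordingEvaluator implements LifecycleEvaluator<Double> {
        final List<String> calls = new ArrayList<>();

        @Override public void preEvaluation() { calls.add("pre"); }

        @Override public List<Double> evaluate(long generation, List<double[]> genotypes) {
            calls.add("evaluate:" + generation);
            List<Double> fitness = new ArrayList<>();
            for (double[] genes : genotypes) {
                double sum = 0;
                for (double gene : genes) sum += gene;
                fitness.add(sum);
            }
            return fitness;
        }

        @Override public void postEvaluation() { calls.add("post"); }
    }
}
```

Placing postEvaluation() in a finally block is one way to keep the cleanup guarantee even when a generation fails part-way through.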

Multi-device support:

  • Device filtering: Selects devices based on user-defined criteria (type, capabilities)
  • Load balancing: Automatically distributes population across available devices
  • Parallel execution: Concurrent fitness evaluation on multiple GPUs or devices
  • Asynchronous coordination: Non-blocking execution with CompletableFuture-based results
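
As a rough illustration of CompletableFuture-based coordination, the sketch below merges per-device result futures into a single future without blocking the caller. The class and method names are hypothetical and use only the JDK; the evaluator's internal coordination may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class AsyncAggregationSketch {

    // Combines per-device futures into one future, preserving device order.
    // The caller is not blocked: the merge runs once every input future has completed.
    static <T> CompletableFuture<List<T>> combine(List<CompletableFuture<List<T>>> perDevice) {
        CompletableFuture<Void> all =
                CompletableFuture.allOf(perDevice.toArray(new CompletableFuture[0]));
        return all.thenApply(ignored -> {
            List<T> merged = new ArrayList<>();
            for (CompletableFuture<List<T>> future : perDevice) {
                merged.addAll(future.join()); // does not block: already complete here
            }
            return merged;
        });
    }
}
```

Iterating the futures in list order, rather than completion order, is what keeps the merged fitness values aligned with the original device order.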

Resource management patterns:

  • Lazy initialization: OpenCL resources are created in preEvaluation() rather than in the constructor
  • Automatic cleanup: Guaranteed resource release through lifecycle methods
  • Error recovery: Robust handling of OpenCL errors and device failures
  • Memory optimization: Efficient GPU memory usage and transfer patterns

Example usage in GPU EA system:


 // GPU configuration with OpenCL kernel
 Program fitnessProgram = Program.ofResource("/kernels/optimization.cl");
 GPUEAConfiguration<Double> config = GPUEAConfigurationBuilder.<Double>builder()
     .program(fitnessProgram)
     .fitness(new MyGPUFitness())
     // ... other EA configuration
     .build();
 
 // Execution context with device preferences
 GPUEAExecutionContext<Double> context = GPUEAExecutionContextBuilder.<Double>builder()
     .populationSize(2000)
     .deviceFilter(device -> device.type() == DeviceType.GPU)
     .platformFilter(platform -> platform.profile() == PlatformProfile.FULL_PROFILE)
     .build();
 
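 // Executor that coordinates asynchronous CPU/GPU work (pool size here is illustrative)
 ExecutorService executorService = Executors.newFixedThreadPool(4);
 
 // Evaluator manages the full OpenCL lifecycle
 GPUFitnessEvaluator<Double> evaluator = new GPUFitnessEvaluator<>(context, config, executorService);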
 
 // Used by EA system - lifecycle managed automatically
 EASystem<Double> system = EASystemFactory.from(config, context, executorService, evaluator);
 

Performance characteristics:

  • Initialization overhead: One-time setup cost for OpenCL compilation and context creation
  • Scalability: The benefit of GPU evaluation grows with population size and per-individual fitness cost
  • Memory bandwidth: Best suited to problems with high computational intensity, where kernel compute time outweighs host-to-device transfer cost
  • Concurrency: Supports concurrent evaluation across multiple devices

Error handling:

  • Device failures: Graceful degradation when devices become unavailable
  • Memory errors: Proper cleanup and error reporting for GPU memory issues
  • Compilation errors: Clear error messages for kernel compilation failures
  • Resource leaks: Guaranteed cleanup even in exceptional circumstances
  • Field Details

    • logger

      public static final org.apache.logging.log4j.Logger logger
    • gpuEAExecutionContext

      private final GPUEAExecutionContext<T> gpuEAExecutionContext
    • gpuEAConfiguration

      private final GPUEAConfiguration<T> gpuEAConfiguration
    • executorService

      private final ExecutorService executorService
    • selectedPlatformToDevice

      private List<org.apache.commons.lang3.tuple.Pair<Platform,Device>> selectedPlatformToDevice
    • clContexts

      final List<org.jocl.cl_context> clContexts
    • clCommandQueues

      final List<org.jocl.cl_command_queue> clCommandQueues
    • clPrograms

      final List<org.jocl.cl_program> clPrograms
    • clKernels

      final List<Map<String,org.jocl.cl_kernel>> clKernels
    • clExecutionContexts

      final List<OpenCLExecutionContext> clExecutionContexts
  • Constructor Details

    • GPUFitnessEvaluator

      public GPUFitnessEvaluator(GPUEAExecutionContext<T> _gpuEAExecutionContext, GPUEAConfiguration<T> _gpuEAConfiguration, ExecutorService _executorService)
      Constructs a GPU fitness evaluator with the specified configuration and execution context.

      Initializes the evaluator with GPU-specific configuration and execution parameters. The evaluator will use the provided executor service for coordinating asynchronous operations between CPU and GPU components.

      The constructor performs minimal initialization; the actual OpenCL setup is deferred to preEvaluation(), in line with the fitness evaluator lifecycle.

      Parameters:
      _gpuEAExecutionContext - the GPU execution context with device filters and population settings
      _gpuEAConfiguration - the GPU EA configuration with OpenCL program and fitness function
      _executorService - the executor service for managing asynchronous operations
      Throws:
      IllegalArgumentException - if any parameter is null
  • Method Details

    • loadResource

      private String loadResource(String filename)
    • grabProgramSources

      private List<String> grabProgramSources()
    • preEvaluation

      public void preEvaluation()
      Initializes OpenCL resources and prepares GPU devices for fitness evaluation.

      This method performs the complete OpenCL initialization sequence:

      1. Platform discovery: Enumerates available OpenCL platforms
      2. Device filtering: Selects devices based on configured filters
      3. Context creation: Creates OpenCL contexts for selected devices
      4. Queue setup: Creates command queues with profiling and out-of-order execution enabled
      5. Program compilation: Compiles OpenCL kernels from source code
      6. Kernel preparation: Creates kernel objects and queries execution info
      7. Fitness initialization: Calls lifecycle hooks on the fitness function

      Device selection process:

      • Applies platform filters to discovered OpenCL platforms
      • Enumerates devices for each qualifying platform
      • Applies device filters to select appropriate devices
      • Validates that at least one device is available

      The method creates separate OpenCL contexts for each selected device to enable concurrent execution and optimal resource utilization. Each context includes compiled programs and kernel objects ready for fitness evaluation.

      Specified by:
      preEvaluation in interface FitnessEvaluator<T extends Comparable<T>>
      Throws:
      IllegalStateException - if no compatible devices are found
      RuntimeException - if OpenCL initialization, program compilation, or kernel creation fails
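
      The device selection process described above can be sketched with plain predicates. DeviceInfo and PlatformInfo are hypothetical, simplified descriptors introduced for this illustration; the library's own Platform and Device types are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class DeviceSelectionSketch {

    // Hypothetical, simplified descriptors used only for this illustration.
    record DeviceInfo(String name, String type) {}
    record PlatformInfo(String name, String profile, List<DeviceInfo> devices) {}

    // Applies the platform filter first, then the device filter, and rejects an empty selection.
    static List<Map.Entry<PlatformInfo, DeviceInfo>> select(List<PlatformInfo> platforms,
            Predicate<PlatformInfo> platformFilter, Predicate<DeviceInfo> deviceFilter) {
        List<Map.Entry<PlatformInfo, DeviceInfo>> selected = new ArrayList<>();
        for (PlatformInfo platform : platforms) {
            if (!platformFilter.test(platform)) {
                continue; // platform rejected: its devices are never inspected
            }
            for (DeviceInfo device : platform.devices()) {
                if (deviceFilter.test(device)) {
                    selected.add(Map.entry(platform, device));
                }
            }
        }
        if (selected.isEmpty()) {
            throw new IllegalStateException("No compatible OpenCL device found");
        }
        return selected;
    }
}
```

      Keeping each selected device paired with its platform mirrors the selectedPlatformToDevice field listed earlier, which stores platform/device pairs.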
    • evaluate

      public List<T> evaluate(long generation, List<Genotype> genotypes)
      Evaluates fitness for a population of genotypes using GPU acceleration.

      This method implements the core fitness evaluation logic by distributing the population across available OpenCL devices and executing fitness computation concurrently. The evaluation process follows these steps:

      1. Population partitioning: Divides genotypes across available devices
      2. Parallel dispatch: Submits evaluation tasks to each device asynchronously
      3. GPU execution: Executes OpenCL kernels for fitness computation
      4. Result collection: Gathers fitness values from all devices
      5. Result aggregation: Combines results preserving original order

      Load balancing strategy:

      • Automatically calculates partition size based on population and device count
      • Assigns partitions to devices round-robin to balance the workload
      • Executes asynchronously so that each device works at its own pace

      The method coordinates with the configured fitness function through lifecycle hooks:

      • beforeEvaluation(): Called before each device partition evaluation
      • compute(): Executes the actual GPU fitness computation
      • afterEvaluation(): Called after each device partition completes

      Concurrency and performance:

      • Multiple devices execute evaluation partitions concurrently
      • CompletableFuture-based coordination for non-blocking execution
      • Automatic workload distribution across available GPU resources
      Specified by:
      evaluate in interface FitnessEvaluator<T extends Comparable<T>>
      Parameters:
      generation - the current generation number for context and logging
      genotypes - the population of genotypes to evaluate
      Returns:
      fitness values corresponding to each genotype in the same order
      Throws:
      IllegalArgumentException - if genotypes is null or empty
      RuntimeException - if GPU evaluation fails or OpenCL errors occur
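
      A minimal sketch of the partition, dispatch, and ordered aggregation steps, using only the JDK. The partition size formula (ceiling of population size over device count) and the deviceEvaluator callback are assumptions made for illustration; the callback stands in for running the OpenCL kernel on one device.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiFunction;

final class PartitionedEvaluationSketch {

    // Splits the population into contiguous chunks of ceil(size / deviceCount) genotypes.
    static <G> List<List<G>> partition(List<G> genotypes, int deviceCount) {
        if (genotypes == null || genotypes.isEmpty()) {
            throw new IllegalArgumentException("genotypes must not be null or empty");
        }
        if (deviceCount < 1) {
            throw new IllegalArgumentException("deviceCount must be at least 1");
        }
        int partitionSize = (genotypes.size() + deviceCount - 1) / deviceCount;
        List<List<G>> partitions = new ArrayList<>();
        for (int start = 0; start < genotypes.size(); start += partitionSize) {
            partitions.add(genotypes.subList(start, Math.min(start + partitionSize, genotypes.size())));
        }
        return partitions;
    }

    // deviceEvaluator is a hypothetical stand-in: (deviceIndex, chunk) -> fitness values for that chunk.
    static <G, T> List<T> evaluate(List<G> genotypes, int deviceCount,
            BiFunction<Integer, List<G>, List<T>> deviceEvaluator, ExecutorService executor) {
        List<List<G>> partitions = partition(genotypes, deviceCount);
        List<CompletableFuture<List<T>>> futures = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            final int deviceIndex = i % deviceCount; // round-robin assignment
            final List<G> chunk = partitions.get(i);
            futures.add(CompletableFuture.supplyAsync(() -> deviceEvaluator.apply(deviceIndex, chunk), executor));
        }
        // Joining in partition order keeps results aligned with the input genotypes.
        List<T> fitness = new ArrayList<>(genotypes.size());
        for (CompletableFuture<List<T>> future : futures) {
            fitness.addAll(future.join());
        }
        return fitness;
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        try {
            List<Integer> population = List.of(1, 2, 3, 4, 5);
            List<Integer> fitness = evaluate(population, 2,
                    (device, chunk) -> chunk.stream().map(g -> g * g).toList(), executor);
            System.out.println(fitness); // prints [1, 4, 9, 16, 25]
        } finally {
            executor.shutdown();
        }
    }
}
```

      Because the chunks are contiguous and joined in submission order, the output order matches the input order even when a later partition finishes first.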
    • postEvaluation

      public void postEvaluation()
      Cleans up OpenCL resources and releases GPU memory after evaluation completion.

      This method performs comprehensive cleanup of all OpenCL resources in the proper order to prevent memory leaks and ensure clean shutdown. The cleanup sequence follows OpenCL best practices for resource deallocation:

      1. Fitness cleanup: Calls lifecycle hooks on the fitness function
      2. Kernel release: Releases all compiled kernel objects
      3. Program release: Releases compiled OpenCL programs
      4. Queue release: Releases command queues, including any operations still pending on them
      5. Context release: Releases OpenCL contexts and associated memory
      6. Reference cleanup: Clears internal data structures and references

      Resource management guarantees:

      • All GPU memory allocations are properly released
      • OpenCL objects are released in dependency order to avoid errors
      • No resource leaks occur even if individual cleanup operations fail
      • Evaluator returns to a clean state ready for potential reinitialization

      The method coordinates with the configured fitness function to ensure any fitness-specific resources (buffers, textures, etc.) are also properly cleaned up through the afterAllEvaluations() lifecycle hooks.

      Specified by:
      postEvaluation in interface FitnessEvaluator<T extends Comparable<T>>
      Throws:
      RuntimeException - if cleanup operations fail (such failures are logged rather than propagated, so that they do not interfere with EA system shutdown)
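
      The release sequence and the "continue on failure" guarantee can be sketched as follows. Releasable and the resource(...) helper are hypothetical names for this illustration; real cleanup would call the corresponding OpenCL release functions instead.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderedCleanupSketch {

    // Hypothetical handle for any releasable resource (kernel, program, queue, context).
    interface Releasable {
        String name();
        void release();
    }

    // Releases every resource in the given order. A failure is recorded and the loop
    // continues, so one bad release does not leak the resources that follow it.
    static List<RuntimeException> releaseAll(List<? extends Releasable> resourcesInReleaseOrder) {
        List<RuntimeException> failures = new ArrayList<>();
        for (Releasable resource : resourcesInReleaseOrder) {
            try {
                resource.release();
            } catch (RuntimeException e) {
                failures.add(new RuntimeException("Failed to release " + resource.name(), e));
            }
        }
        return failures;
    }

    // Test helper: a resource that appends its name to releaseLog, or throws when failOnRelease is set.
    static Releasable resource(String name, List<String> releaseLog, boolean failOnRelease) {
        return new Releasable() {
            @Override public String name() { return name; }
            @Override public void release() {
                if (failOnRelease) {
                    throw new IllegalStateException("simulated failure");
                }
                releaseLog.add(name);
            }
        };
    }
}
```

      Passing the resources as kernels, then programs, then queues, then contexts reproduces the dependency order listed above; the returned failures can be logged by the caller instead of being rethrown.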