net.bmahe.genetics4j.gpu.GPUFitnessEvaluator<T>

Type Parameters: T - the type of fitness values produced; must be comparable for selection operations

All Implemented Interfaces: FitnessEvaluator<T>

public class GPUFitnessEvaluator<T extends Comparable<T>> extends Object implements FitnessEvaluator<T>

GPU-accelerated fitness evaluator that leverages OpenCL for high-performance evolutionary algorithm execution.

GPUFitnessEvaluator implements the core FitnessEvaluator interface to provide GPU acceleration for fitness computation in evolutionary algorithms. This evaluator manages the complete OpenCL lifecycle, from device discovery and kernel compilation to memory management and resource cleanup.

Key responsibilities include:

OpenCL initialization: Platform and device discovery, context creation, and kernel compilation
Resource management: Managing OpenCL contexts, command queues, programs, and kernels
Population partitioning: Distributing work across multiple OpenCL devices
Asynchronous execution: Coordinating concurrent GPU operations with CPU-side logic
Memory lifecycle: Ensuring proper cleanup of GPU resources

Architecture overview:

Initialization (preEvaluation()): Discover platforms/devices, compile kernels, create contexts
Evaluation (evaluate(long, java.util.List<net.bmahe.genetics4j.core.Genotype>)): Partition population, execute fitness computation on GPU
Cleanup (postEvaluation()): Release all OpenCL resources and contexts
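
The three phases above can be sketched as a small driver loop. This is only an illustration under stated assumptions: `LifecycleSketch`, its nested `Evaluator` interface, and `RecordingEvaluator` are hypothetical stand-ins that mirror the three method names, not the genetics4j types, and the real EA system may sequence the calls differently.

```java
import java.util.ArrayList;
import java.util.List;

final class LifecycleSketch {

    // Hypothetical stand-in mirroring the three lifecycle phases described above.
    interface Evaluator {
        void preEvaluation();
        List<Double> evaluate(long generation, List<String> genotypes);
        void postEvaluation();
    }

    // Records the order of calls; the "fitness" is a placeholder (string length).
    static final class RecordingEvaluator implements Evaluator {
        final List<String> calls = new ArrayList<>();

        @Override
        public void preEvaluation() {
            calls.add("preEvaluation");
        }

        @Override
        public List<Double> evaluate(long generation, List<String> genotypes) {
            calls.add("evaluate:" + generation);
            List<Double> fitness = new ArrayList<>();
            for (String genotype : genotypes) {
                fitness.add((double) genotype.length());
            }
            return fitness;
        }

        @Override
        public void postEvaluation() {
            calls.add("postEvaluation");
        }
    }

    // Initializes once, evaluates every generation, and always cleans up.
    static void runGenerations(Evaluator evaluator, int generations, List<String> population) {
        evaluator.preEvaluation();
        try {
            for (long generation = 0; generation < generations; generation++) {
                evaluator.evaluate(generation, population);
            }
        } finally {
            evaluator.postEvaluation();
        }
    }
}
```

The `try`/`finally` is the design point: cleanup runs even when an evaluation throws, which matches the cleanup guarantee the class description states.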

Multi-device support:

Device filtering: Selects devices based on user-defined criteria (type, capabilities)
Load balancing: Automatically distributes population across available devices
Parallel execution: Concurrent fitness evaluation on multiple GPUs or devices
Asynchronous coordination: Non-blocking execution with CompletableFuture-based results

Resource management patterns:

Lazy initialization: OpenCL resources created only when needed
Automatic cleanup: Guaranteed resource release through lifecycle methods
Error recovery: Robust handling of OpenCL errors and device failures
Memory optimization: Efficient GPU memory usage and transfer patterns

Example usage in GPU EA system:


 // GPU configuration with OpenCL kernel
 Program fitnessProgram = Program.ofResource("/kernels/optimization.cl");
 GPUEAConfiguration<Double> config = GPUEAConfigurationBuilder.<Double>builder()
     .program(fitnessProgram)
     .fitness(new MyGPUFitness())
     // ... other EA configuration
     .build();
 
 // Execution context with device preferences
 GPUEAExecutionContext<Double> context = GPUEAExecutionContextBuilder.<Double>builder()
     .populationSize(2000)
     .deviceFilter(device -> device.type() == DeviceType.GPU)
     .platformFilter(platform -> platform.profile() == PlatformProfile.FULL_PROFILE)
     .build();
 
 // Evaluator handles all OpenCL lifecycle automatically
 // (executorService is a java.util.concurrent.ExecutorService supplied by the caller)
 GPUFitnessEvaluator<Double> evaluator = new GPUFitnessEvaluator<>(context, config, executorService);
 
 // Used by EA system - lifecycle managed automatically
 EASystem<Double> system = EASystemFactory.from(config, context, executorService, evaluator);

Performance characteristics:

Initialization overhead: One-time setup cost for OpenCL compilation and context creation
Scalability: The benefit of GPU acceleration grows with population size and per-individual computational cost
Memory bandwidth: Best suited to computationally intensive problems, where kernel execution time outweighs host-to-device transfer cost
Concurrency: Supports concurrent evaluation across multiple devices

Error handling:

Device failures: Graceful degradation when devices become unavailable
Memory errors: Proper cleanup and error reporting for GPU memory issues
Compilation errors: Clear error messages for kernel compilation failures
Resource leaks: Guaranteed cleanup even in exceptional circumstances

Field Summary

Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| (package private) final List<org.jocl.cl_command_queue> | clCommandQueues | |
| (package private) final List<org.jocl.cl_context> | clContexts | |
| (package private) final List<OpenCLExecutionContext> | clExecutionContexts | |
| (package private) final List<Map<String,org.jocl.cl_kernel>> | clKernels | |
| (package private) final List<org.jocl.cl_program> | clPrograms | |
| private final ExecutorService | executorService | |
| private final GPUEAConfiguration<T> | gpuEAConfiguration | |
| private final GPUEAExecutionContext<T> | gpuEAExecutionContext | |
| static final org.apache.logging.log4j.Logger | logger | |
| private List<org.apache.commons.lang3.tuple.Pair<Platform,Device>> | selectedPlatformToDevice | |

Constructor Summary

Constructors

| Constructor | Description |
| --- | --- |
| GPUFitnessEvaluator(GPUEAExecutionContext<T> _gpuEAExecutionContext, GPUEAConfiguration<T> _gpuEAConfiguration, ExecutorService _executorService) | Constructs a GPU fitness evaluator with the specified configuration and execution context. |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| List<T> | evaluate(long generation, List<Genotype> genotypes) | Evaluates fitness for a population of genotypes using GPU acceleration. |
| private List<String> | grabProgramSources() | |
| private String | loadResource(String filename) | |
| void | postEvaluation() | Cleans up OpenCL resources and releases GPU memory after evaluation completion. |
| void | preEvaluation() | Initializes OpenCL resources and prepares GPU devices for fitness evaluation. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- logger
  
  public static final org.apache.logging.log4j.Logger logger
- gpuEAExecutionContext
  
  private final GPUEAExecutionContext<T> gpuEAExecutionContext
- gpuEAConfiguration
  
  private final GPUEAConfiguration<T> gpuEAConfiguration
- executorService
  
  private final ExecutorService executorService
- selectedPlatformToDevice
  
  private List<org.apache.commons.lang3.tuple.Pair<Platform,Device>> selectedPlatformToDevice
- clContexts
  
  final List<org.jocl.cl_context> clContexts
- clCommandQueues
  
  final List<org.jocl.cl_command_queue> clCommandQueues
- clPrograms
  
  final List<org.jocl.cl_program> clPrograms
- clKernels
  
  final List<Map<String,org.jocl.cl_kernel>> clKernels
- clExecutionContexts
  
  final List<OpenCLExecutionContext> clExecutionContexts
Constructor Details
- GPUFitnessEvaluator
  
  public GPUFitnessEvaluator(GPUEAExecutionContext<T> _gpuEAExecutionContext, GPUEAConfiguration<T> _gpuEAConfiguration, ExecutorService _executorService)
  
  Constructs a GPU fitness evaluator with the specified configuration and execution context.
  Initializes the evaluator with GPU-specific configuration and execution parameters. The evaluator will use the provided executor service for coordinating asynchronous operations between CPU and GPU components.
  The constructor performs minimal initialization; the actual OpenCL setup occurs during preEvaluation(), following the fitness evaluator lifecycle pattern.
  
  Parameters:
  
  _gpuEAExecutionContext - the GPU execution context with device filters and population settings
  
  _gpuEAConfiguration - the GPU EA configuration with OpenCL program and fitness function
  
  _executorService - the executor service for managing asynchronous operations
  
  Throws:
  
  IllegalArgumentException - if any parameter is null
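
The constructor contract described above (reject null arguments, defer heavy setup to `preEvaluation()`) can be sketched as follows. `ConstructorSketch` is a hypothetical class, its `Object` parameters stand in for the real context and configuration types, and the `initialized` flag is an assumption used to make the deferred setup visible; none of this is taken from the actual implementation.

```java
import java.util.concurrent.ExecutorService;

final class ConstructorSketch {

    private final Object executionContext;   // stands in for GPUEAExecutionContext<T>
    private final Object configuration;      // stands in for GPUEAConfiguration<T>
    private final ExecutorService executorService;

    // Assumption: a flag that shows setup has not happened at construction time.
    private boolean initialized = false;

    ConstructorSketch(Object executionContext, Object configuration, ExecutorService executorService) {
        // Only validation and field assignment happen here; no device work.
        this.executionContext = requireArgument(executionContext, "executionContext");
        this.configuration = requireArgument(configuration, "configuration");
        this.executorService = requireArgument(executorService, "executorService");
    }

    // Hypothetical helper: rejects null with the exception type the documentation names.
    private static <V> V requireArgument(V value, String name) {
        if (value == null) {
            throw new IllegalArgumentException(name + " must not be null");
        }
        return value;
    }

    // The expensive setup would happen here, once, when the lifecycle starts.
    void preEvaluation() {
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }
}
```

Keeping the constructor cheap means an evaluator can be built and passed around without touching any device until the EA system starts the lifecycle.
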
Method Details
- loadResource
  
  private String loadResource(String filename)
- grabProgramSources
  
  private List<String> grabProgramSources()
- preEvaluation
  
  public void preEvaluation()
  Initializes OpenCL resources and prepares GPU devices for fitness evaluation.
  This method performs the complete OpenCL initialization sequence:
  
  Platform discovery: Enumerates available OpenCL platforms
  
  Device filtering: Selects devices based on configured filters
  
  Context creation: Creates OpenCL contexts for selected devices
  
  Queue setup: Creates command queues with profiling and out-of-order execution
  
  Program compilation: Compiles OpenCL kernels from source code
  
  Kernel preparation: Creates kernel objects and queries execution info
  
  Fitness initialization: Calls lifecycle hooks on the fitness function
  
  Device selection process:
  
  Applies platform filters to discovered OpenCL platforms
  
  Enumerates devices for each qualifying platform
  
  Applies device filters to select appropriate devices
  
  Validates that at least one device is available
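
The device selection process above can be sketched with plain predicates. The `Platform` and `Device` classes below are simplified, hypothetical stand-ins (a profile string and a type string), not the genetics4j types, and the sketch returns only devices, whereas the evaluator's `selectedPlatformToDevice` field suggests it keeps platform/device pairs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class DeviceSelectionSketch {

    // Hypothetical, simplified device description.
    static final class Device {
        final String name;
        final String type;

        Device(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // Hypothetical, simplified platform description with its devices.
    static final class Platform {
        final String profile;
        final List<Device> devices;

        Platform(String profile, List<Device> devices) {
            this.profile = profile;
            this.devices = devices;
        }
    }

    // Filters platforms first, then their devices, and fails if nothing qualifies.
    static List<Device> select(List<Platform> platforms,
                               Predicate<Platform> platformFilter,
                               Predicate<Device> deviceFilter) {
        List<Device> selected = new ArrayList<>();
        for (Platform platform : platforms) {
            if (!platformFilter.test(platform)) {
                continue;
            }
            for (Device device : platform.devices) {
                if (deviceFilter.test(device)) {
                    selected.add(device);
                }
            }
        }
        if (selected.isEmpty()) {
            throw new IllegalStateException("No compatible device found");
        }
        return selected;
    }
}
```

Filtering platforms before enumerating devices means devices on a rejected platform are never inspected, which mirrors the order of the steps listed above.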
  
  The method creates separate OpenCL contexts for each selected device to enable concurrent execution and optimal resource utilization. Each context includes compiled programs and kernel objects ready for fitness evaluation.
  Specified by:
  
  preEvaluation in interface FitnessEvaluator<T extends Comparable<T>>
  
  Throws:
  
  IllegalStateException - if no compatible devices are found
  
  RuntimeException - if OpenCL initialization, program compilation, or kernel creation fails
- evaluate
  
  public List<T> evaluate(long generation, List<Genotype> genotypes)
  Evaluates fitness for a population of genotypes using GPU acceleration.
  This method implements the core fitness evaluation logic by distributing the population across available OpenCL devices and executing fitness computation concurrently. The evaluation process follows these steps:
  
  Population partitioning: Divides genotypes across available devices
  
  Parallel dispatch: Submits evaluation tasks to each device asynchronously
  
  GPU execution: Executes OpenCL kernels for fitness computation
  
  Result collection: Gathers fitness values from all devices
  
  Result aggregation: Combines results preserving original order
  
  Load balancing strategy:
  
  Automatically calculates partition size based on population and device count
  
  Round-robin assignment of partitions to devices for balanced workload
  
  Asynchronous execution allows devices to work at their optimal pace
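
A minimal sketch of the load balancing strategy above, assuming the partition size is the population size divided by the device count, rounded up. The documentation does not state the exact formula, so the ceiling division, the class name `PartitionSketch`, and its helper methods are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class PartitionSketch {

    // Assumed formula: ceiling of populationSize / deviceCount.
    static int partitionSize(int populationSize, int deviceCount) {
        if (populationSize <= 0 || deviceCount <= 0) {
            throw new IllegalArgumentException("population and device count must be positive");
        }
        return (populationSize + deviceCount - 1) / deviceCount;
    }

    // Splits the population into consecutive chunks of at most `size` elements.
    static <G> List<List<G>> partition(List<G> population, int size) {
        List<List<G>> partitions = new ArrayList<>();
        for (int start = 0; start < population.size(); start += size) {
            int end = Math.min(start + size, population.size());
            partitions.add(new ArrayList<>(population.subList(start, end)));
        }
        return partitions;
    }

    // Round-robin assignment: partition i goes to device i modulo deviceCount.
    static int deviceFor(int partitionIndex, int deviceCount) {
        return partitionIndex % deviceCount;
    }
}
```

For example, 10 genotypes on 3 devices give a partition size of 4 and partitions of 4, 4, and 2 genotypes. With ceiling division there are never more partitions than devices, so the modulo only matters if a smaller partition size is chosen.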
  
  The method coordinates with the configured fitness function through lifecycle hooks:
  
  beforeEvaluation(): Called before each device partition evaluation
  
  compute(): Executes the actual GPU fitness computation
  
  afterEvaluation(): Called after each device partition completes
  
  Concurrency and performance:
  
  Multiple devices execute evaluation partitions concurrently
  
  CompletableFuture-based coordination for non-blocking execution
  
  Automatic workload distribution across available GPU resources
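
The CompletableFuture-based coordination and order-preserving aggregation described above might look like the sketch below. It is an assumption-laden illustration: `AsyncEvaluationSketch` is hypothetical, and the `deviceEvaluation` function stands in for whatever runs the kernels for one partition, including the per-partition lifecycle hooks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

final class AsyncEvaluationSketch {

    // Dispatches every partition asynchronously, then joins the futures in
    // submission order so the combined result keeps the original genotype order.
    static <G, F> List<F> evaluate(List<List<G>> partitions,
                                   Function<List<G>, List<F>> deviceEvaluation,
                                   ExecutorService executor) {
        List<CompletableFuture<List<F>>> futures = new ArrayList<>();
        for (List<G> partition : partitions) {
            futures.add(CompletableFuture.supplyAsync(() -> deviceEvaluation.apply(partition), executor));
        }

        List<F> fitness = new ArrayList<>();
        for (CompletableFuture<List<F>> future : futures) {
            // join() rethrows a task failure wrapped in CompletionException.
            fitness.addAll(future.join());
        }
        return fitness;
    }
}
```

All tasks are submitted before any result is awaited, so partitions can run concurrently. Ordering comes from iterating the futures in the order they were created, not from the order in which they complete.
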
  Specified by:
  
  evaluate in interface FitnessEvaluator<T extends Comparable<T>>
  
  Parameters:
  
  generation - the current generation number for context and logging
  
  genotypes - the population of genotypes to evaluate
  
  Returns:
  
  fitness values corresponding to each genotype in the same order
  
  Throws:
  
  IllegalArgumentException - if genotypes is null or empty
  
  RuntimeException - if GPU evaluation fails or OpenCL errors occur
- postEvaluation
  
  public void postEvaluation()
  Cleans up OpenCL resources and releases GPU memory after evaluation completion.
  This method performs comprehensive cleanup of all OpenCL resources in the proper order to prevent memory leaks and ensure clean shutdown. The cleanup sequence follows OpenCL best practices for resource deallocation:
  
  Fitness cleanup: Calls lifecycle hooks on the fitness function
  
  Kernel release: Releases all compiled kernel objects
  
  Program release: Releases compiled OpenCL programs
  
  Queue release: Releases command queues and pending operations
  
  Context release: Releases OpenCL contexts and associated memory
  
  Reference cleanup: Clears internal data structures and references
  
  Resource management guarantees:
  
  All GPU memory allocations are properly released
  
  OpenCL objects are released in dependency order to avoid errors
  
  No resource leaks occur even if individual cleanup operations fail
  
  Evaluator returns to a clean state ready for potential reinitialization
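
One way to get the "release in dependency order, keep going on failure" behaviour listed above is to run each release step inside its own try/catch. The sketch is hypothetical: `CleanupSketch` and the named steps are illustrative, and real code would call the corresponding OpenCL release functions inside each step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CleanupSketch {

    // Runs every release step in the map's iteration order. A failing step is
    // recorded and the remaining steps still run, so one error cannot leak the rest.
    static List<String> releaseAll(Map<String, Runnable> releaseSteps) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Runnable> step : releaseSteps.entrySet()) {
            try {
                step.getValue().run();
            } catch (RuntimeException e) {
                failures.add(step.getKey() + ": " + e.getMessage());
            }
        }
        return failures;
    }
}
```

Passing a `LinkedHashMap` populated as kernels, programs, queues, contexts keeps the dependency order from the sequence above, and the returned list can be logged instead of being rethrown.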
  
  The method coordinates with the configured fitness function to ensure any fitness-specific resources (buffers, textures, etc.) are also properly cleaned up through the afterAllEvaluations() lifecycle hooks.
  Specified by:
  
  postEvaluation in interface FitnessEvaluator<T extends Comparable<T>>
  
  Throws:
  
  RuntimeException - if cleanup operations fail; such failures are logged rather than propagated, to prevent interference with EA system shutdown
