Clusters of GPU nodes are widely used to run high-performance computing applications. Many disciplines rely on these clusters to improve the performance of computationally intensive services. Running such applications becomes harder when they execute concurrently and each may demand multiple GPU nodes for a different period of time. To tackle this challenge, we propose a multi-agent architecture for scheduling multiple services on a heterogeneous GPU cluster. We present simulation results for our agent-based system using three commonly used scheduling heuristics under several configuration settings.