Proxy-Oriented Thermal and Acoustic Intelligence for Cloud GPU Orchestration in AI-Guided Scientific Workflows
Keywords:
Cloud GPU performance, thermal and acoustic proxies, AI-guided simulation, proxy-based modeling
Abstract
The accelerating convergence of artificial intelligence, cloud computing, and large-scale scientific simulation has fundamentally altered the operational, architectural, and epistemological foundations of high-performance computing. Contemporary workloads, ranging from genome-scale language modeling and molecular dynamics to physics-informed reservoir simulation and data-driven optimization, increasingly rely on cloud-hosted graphics processing units (GPUs) whose performance, reliability, and sustainability are mediated not only by digital metrics such as throughput and latency but also by physical phenomena such as thermal dissipation, acoustic vibration, and hardware-induced stochasticity. This article advances a comprehensive theoretical and methodological framework for understanding how proxy-based thermal and acoustic evaluation of cloud GPUs can be systematically integrated into AI-driven scientific workflows to improve computational efficiency, reliability, and scientific validity. Building on the seminal contribution of Lulla, Chandra, and Sirigiri, who demonstrated that thermal and acoustic proxies offer predictive insight into the performance and degradation patterns of cloud GPUs under AI training loads (Lulla et al., 2025), this study situates hardware-aware proxies within a broader ecosystem of task-based execution frameworks, data movement systems, surrogate modeling, and AI-guided simulation pipelines.
The paper synthesizes literature from high-performance computing, cloud services, molecular simulation, generative AI for materials and biology, and proxy-based optimization in engineering domains to argue that physical proxies represent an underexplored but theoretically powerful layer of observability for distributed AI workloads. Through an extended conceptual methodology grounded in heterogeneous computing theory, streaming AI–HPC coupling, and task-based performance modeling, the article proposes a multi-layer proxy architecture in which thermal and acoustic signals are interpreted as latent variables reflecting computational stress, scheduling inefficiency, and data movement bottlenecks. The Results section offers a detailed interpretive analysis of how such proxies can be aligned with existing performance suites, workflow engines, and AI-driven adaptive simulations to enable anticipatory scheduling, energy-aware orchestration, and fault-tolerant execution, drawing on studies of ProxyStore, TaPS, Globus Compute, and AI-guided biomolecular and materials design (Pauloski et al., 2024; Ward et al., 2023; Zvyagin et al., 2023).
References
Apache Kafka. 2024. https://kafka.apache.org/. Accessed Feb 2024.
Pauloski, J. G., Rydzy, K., Hayot-Sasson, V., Foster, I., and Chard, K. 2024. Accelerating Python Applications with Dask and ProxyStore. https://arxiv.org/abs/2410.12092.
Alpak, F. O., and Jain, V. 2021. Support-Vector Regression Accelerated Well Location Optimization: Algorithm, Validation, and Field Testing. Computational Geosciences, 25, 2033–2054.
Lulla, K. L., Chandra, R., and Sirigiri, K. S. 2025. Proxy-based thermal and acoustic evaluation of cloud GPUs for AI training workloads. The American Journal of Applied Sciences, 7(7), 111–127. https://doi.org/10.37547/tajas/Volume07Issue07-12
Godoy, W. F., Podhorszki, N., Wang, R., Atkins, C., Eisenhauer, G., Gu, J., Davis, P., Choi, J., Germaschewski, K., Huck, K., Huebl, A., Kim, M., Kress, J., Kurc, T., Liu, Q., Logan, J., Mehta, K., Ostrouchov, G., Parashar, M., Poeschel, F., Pugmire, D., Suchyta, E., Takahashi, K., Thompson, N., Tsutsumi, S., Wan, L., Wolf, M., Wu, K., and Klasky, S. 2020. ADIOS 2: The Adaptable Input Output System. A framework for high-performance data management. SoftwareX, 12, 100561.
Dharuman, G., Ward, L., Ma, H., Setty, P. V., Gokdemir, O., Foreman, S., Emani, M., Hippe, K., Brace, A., Keipert, K., Gibbs, T., Foster, I., Anandkumar, A., Vishwanath, V., and Ramanathan, A. 2023. Protein generation via genome-scale language models with bio-physical scoring. Proceedings of the SC ’23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis.
Kolajoobi, R. A., Niri, M. E., Amini, S., and Haghshenas, Y. 2023. A Data-Driven Proxy Modeling Approach Adapted to Well Placement Optimization Problem. Journal of Energy Resources Technology, 145, 013401.
Lee, H., Turilli, M., Jha, S., Bhowmik, D., Ma, H., and Ramanathan, A. 2019. DeepDriveMD: Deep-learning driven adaptive molecular simulations for protein folding. IEEE/ACM Third Workshop on Deep Learning on Supercomputers.
Brace, A., Yakushin, I., Ma, H., Trifan, A., Munson, T., Foster, I., Ramanathan, A., Lee, H., Turilli, M., and Jha, S. 2022. Coupling streaming AI and HPC ensembles to achieve 100–1000x faster biomolecular simulations. IEEE International Parallel and Distributed Processing Symposium.
Mohaghegh, S. D. 2022. Smart Proxy Modeling. CRC Press, Boca Raton, FL.
Zvyagin, M., Brace, A., Hippe, K., Deng, Y., Zhang, B., Bohorquez, C. O., Clyde, A., Kale, B., Perez-Rivera, D., Ma, H., et al. 2023. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. The International Journal of High Performance Computing Applications, 37(6), 683–705.
Ward, L., Pauloski, J. G., Hayot-Sasson, V., Chard, R., Babuji, Y., Sivaraman, G., Choudhury, S., Chard, K., Thakur, R., and Foster, I. 2023. Cloud services enable efficient AI-guided simulation workflows across heterogeneous resources. Heterogeneity in Computing Workshop.
Pauloski, J. G., Hayot-Sasson, V., Gonthier, M., Hudson, N., Pan, H., Zhou, S., Foster, I., and Chard, K. 2024. TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks. IEEE 20th International Conference on e-Science.
Park, H., Yan, X., Zhu, R., Huerta, E. A., Chaudhuri, S., Cooper, D., Foster, I., and Tajkhorshid, E. 2024. A generative artificial intelligence framework based on a molecular diffusion model for the design of metal-organic frameworks for carbon capture. Communications Chemistry, 7(1).
Foster, I. 2011. Globus Online: Accelerating and democratizing science through cloud-based services. IEEE Internet Computing, 15(3), 70–73.
Chard, K., Tuecke, S., and Foster, I. 2014. Efficient and secure transfer, synchronization, and sharing of big data. IEEE Cloud Computing, 1(3), 46–55.
Bauer, A., Pan, H., Chard, R., Babuji, Y., Bryan, J., Tiwari, D., Foster, I., and Chard, K. 2024. The Globus Compute Dataset: An open function-as-a-service dataset from the edge to the cloud. Future Generation Computer Systems, 153, 558–574.
Tang, H., and Durlofsky, L. J. 2024. Graph Network Surrogate Model for Subsurface Flow Optimization. Journal of Computational Physics, 512, 113132.
Qi, J., Liu, Y., Ju, Y., Zhang, K., Liu, L., Liu, Y., Xue, X., Zhang, L., Zhang, H., Wang, H., et al. 2023. A Transfer Learning Framework for Well Placement Optimization Based on Denoising Autoencoder. Geoenergy Science and Engineering, 222, 211446.
Zhuang, X., Wang, W., Su, Y., Yan, B., Li, Y., Li, L., and Hao, Y. 2024. Multi-Objective Optimization of Reservoir Development Strategy with Hybrid Artificial Intelligence Method. Expert Systems with Applications, 241, 122707.
Mendez, M. A. 2023. Linear and Nonlinear Dimensionality Reduction from Fluid Mechanics to Machine Learning. Measurement Science and Technology, 34, 042001.
Hintjens, P. 2013. ZeroMQ: Messaging for Many Applications. O’Reilly Media.
Redis. 2023. https://redis.io/. Accessed Mar 2023.
Snap Inc. 2023. KeyDB: A database built for scale. https://github.com/Snapchat/KeyDB. Accessed Mar 2023.
Copik, M., Bohringer, R., Calotoiu, A., and Hoefler, T. 2022. FMI: Fast and Cheap Message Passing for Serverless Functions. Scalable Parallel Computing Laboratory, ETH Zurich.
License
Copyright (c) 2026 Marek Zielinski

This work is licensed under a Creative Commons Attribution 4.0 International License.