Pruning and Optimizing Large Language Models in an Era of GPU Scarcity

Date:

This talk, presented at ICAI’24, part of the 2024 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE’24), focuses on the critical issue of optimizing large language models (LLMs) during a time of GPU scarcity. It discusses novel pruning techniques, including “evolution of weights” and “smart pruning,” aimed at reducing the computational and environmental costs of training and deploying these models.

Watch the talk here

The abstract highlights the importance of network optimization in balancing performance with ecological responsibility, particularly for deep neural networks used in embedded devices. By evaluating parameter importance throughout the training process, the proposed methods achieve higher compression rates and faster computations with minimal loss in accuracy. The techniques have been successfully applied to LLMs with around 10 million parameters, and the experimental results are available for replication on GitHub.
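The core idea of scoring parameters by how they evolve over training can be sketched in a few lines. The snippet below is an illustrative simplification, not the authors' published algorithm: it records weight magnitudes at successive training snapshots, fits a linear trend per weight (a stand-in for the curve fitting mentioned in the cited paper), and keeps only the weights whose final magnitude and trend suggest they remain important. The function name, the linear fit, and the scoring rule are all assumptions chosen for clarity.

```python
import numpy as np

def prune_by_weight_evolution(weight_history, keep_ratio=0.5):
    """Score each weight by its magnitude trend across training.

    weight_history: array of shape (snapshots, n_weights) holding the
    value of each weight recorded at successive training steps.
    Returns a boolean mask of weights to keep (True = keep).

    Illustrative sketch only: a linear fit stands in for the curve
    fitting used in the cited paper.
    """
    steps = np.arange(weight_history.shape[0])
    mags = np.abs(weight_history)            # magnitude at each snapshot
    # Fit |w| ≈ a*t + b per weight; the slope 'a' captures the evolution.
    slopes = np.polyfit(steps, mags, 1)[0]
    # Importance score: final magnitude nudged by its trend.
    scores = mags[-1] + slopes
    k = max(1, int(keep_ratio * scores.size))
    threshold = np.partition(scores, -k)[-k]
    return scores >= threshold
```

In practice the surviving mask would be applied to the weight tensor (zeroing or removing pruned entries), and the model fine-tuned briefly to recover any lost accuracy.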

Cite: Islam, A., & Belhaouari, S. B. (2022, September). Smart Pruning of Deep Neural Networks Using Curve Fitting and Evolution of Weights. In *International Conference on Machine Learning, Optimization, and Data Science* (pp. 62–76). Cham: Springer Nature Switzerland.