Pruning and Optimizing Large Language Models in an Era of GPU Scarcity
Conference proceedings talk, The 2024 World Congress in Computer Science, Computer Engineering, & Applied Computing (CSCE'24) - ICAI'24 - The 26th Int'l Conf on Artificial Intelligence, Las Vegas, Nevada
This talk, presented at ICAI’24, focuses on the critical issue of optimizing large language models (LLMs) during a period of GPU scarcity. It discusses novel pruning techniques, including “evolution of weights” and “smart pruning,” aimed at reducing the computational and environmental costs of training and deploying these models.
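The talk itself does not spell out the mechanics of “evolution of weights” or “smart pruning,” so as background, a minimal sketch of the general idea of pruning (here, a standard unstructured magnitude-pruning baseline, not the talk’s specific methods) might look like the following; the function name and interface are illustrative assumptions:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    Generic magnitude-pruning baseline for illustration only; the
    "evolution of weights" and "smart pruning" techniques from the talk
    are not specified here.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(weights.size * sparsity)  # number of weights to zero out
    if k == 0:
        return weights.copy()
    magnitudes = np.abs(weights).ravel()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(magnitudes, k - 1)[k - 1]
    # keep only weights strictly above the threshold (ties are pruned)
    mask = np.abs(weights) > threshold
    return weights * mask
```

Pruning a weight matrix this way reduces the number of active parameters, which is one route to lowering the compute and memory demands the talk highlights under GPU scarcity.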