Apple and Nvidia Partner to Speed Up AI Model Production
Apple's latest machine learning research, conducted in collaboration with Nvidia, could significantly accelerate text generation for Apple Intelligence, reports AppleInsider. The partnership centers on integrating Apple's "ReDrafter" technology into Nvidia's TensorRT-LLM inference acceleration framework, with the goal of making large language model (LLM) generation markedly more efficient.
A central challenge in deploying LLMs is that generating text is resource-intensive and slow. Throwing more hardware at the problem works, but it carries substantial financial and energy costs.
Earlier this year, Apple open-sourced "ReDrafter," a speculative decoding method that accelerates LLM inference. ReDrafter uses a recurrent neural network (RNN) draft model to propose candidate tokens along multiple paths, which the main model then verifies in parallel, yielding up to a 3.5x increase in token generation speed over traditional auto-regressive decoding.
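The draft-and-verify idea can be illustrated with a minimal sketch. This is not Apple's ReDrafter or the TensorRT-LLM API; it uses toy stand-in functions for the "target" and "draft" models and greedy verification, just to show why accepting several cheap draft tokens per expensive target-model step speeds up generation without changing the output:

```python
# Minimal sketch of greedy speculative decoding.
# `target_next` and `draft_propose` are toy stand-ins (assumptions), not real models.

def target_next(prefix):
    """Toy 'target model': deterministic next token, sum of prefix mod 10."""
    return sum(prefix) % 10

def draft_propose(prefix, k):
    """Toy 'draft model': cheap guesser that agrees with the target most of the time."""
    out, p = [], list(prefix)
    for _ in range(k):
        # Deliberately wrong whenever the context length is divisible by 3.
        guess = sum(p) % 10 if len(p) % 3 else (sum(p) + 1) % 10
        out.append(guess)
        p.append(guess)
    return out

def speculative_decode(prompt, n_tokens, k=4):
    """Generate n_tokens greedily, verifying k draft tokens per target step."""
    seq = list(prompt)
    while len(seq) < len(prompt) + n_tokens:
        draft = draft_propose(seq, k)
        ctx = list(seq)
        accepted = []
        for t in draft:
            correct = target_next(ctx)       # one verification per draft token
            if t == correct:
                accepted.append(t)           # draft token matches: accept it
                ctx.append(t)
            else:
                accepted.append(correct)     # first mismatch: keep the correction, stop
                break
        else:
            # All k drafts accepted: the target yields one extra token for free.
            accepted.append(target_next(ctx))
        seq.extend(accepted)
    return seq[: len(prompt) + n_tokens]
```

Because verification is greedy, the output is token-for-token identical to plain greedy decoding; the speedup comes from verifying a whole draft in one batched target-model pass instead of one token per pass.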
Nvidia GPUs are widely used in LLM generation, but their high cost presents a significant barrier. Apple and Nvidia's collaboration integrates ReDrafter into Nvidia's TensorRT-LLM, enabling developers using Nvidia GPUs to leverage ReDrafter's accelerated token generation for production-level LLMs.
Benchmarking a production model on Nvidia GPUs revealed a 2.7-fold increase in generated tokens per second for greedy decoding. This improvement could significantly reduce latency for users and lower hardware requirements, meaning faster cloud-based query results and potentially lower operating costs for companies.
"This collaboration makes TensorRT-LLM more powerful and more flexible, enabling the LLM community to innovate more sophisticated models and easily deploy them," Nvidia stated in its technical blog on the subject.
This announcement follows Apple's recent confirmation that it is evaluating Amazon's Trainium2 chip for training the LLMs behind Apple Intelligence. Apple anticipates a 50% efficiency improvement in pretraining with these chips compared to its existing hardware.