NVIDIA Utilizes TensorRT To Expand Generative AI Dominance

NVIDIA boosts AI performance using TensorRT.


NVIDIA is using TensorRT to boost the performance of its artificial intelligence/machine learning (AI/ML) and large language model (LLM) tools.

TensorRT and TensorRT-LLM are designed to optimize the performance of consumer GPUs, including many of the best graphics cards on the market, for workloads such as Stable Diffusion image generation and Llama 2 text generation.

NVIDIA's Generative AI Dominance

Tests conducted on some of the company's latest GPUs using TensorRT found that Stable Diffusion performance improved by up to 70%. The new TensorRT release is expected to be available for download on NVIDIA's GitHub page.

There has been a lot of movement in Stable Diffusion over the past year or so, including Automatic1111's webui, which initially supported only NVIDIA GPUs under Windows. Since then, the number of forks and alternative text-to-image AI generation tools has grown quickly, and both AMD and Intel have released more finely tuned libraries that have somewhat closed the performance gap with NVIDIA, as per Tom's Hardware.

But now, NVIDIA is again stepping up to widen the gap with TensorRT. The basic premise is similar to what AMD and Intel have already done: by leveraging ONNX, an open format for AI and ML models and operators, the base Hugging Face Stable Diffusion model is converted into the ONNX format.

After that, users can further optimize performance for the specific GPU they are using. TensorRT takes only a few minutes to tune things, but once it completes, users should see a substantial performance boost and better memory utilization.

TensorRT can speed up inference, the stage in which a pre-trained model works through its learned information and calculates probabilities to produce a result. Experts said TensorRT-LLM's acceleration significantly improves the experience for more sophisticated LLM uses, such as writing and coding assistants.
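The "calculating probabilities to produce a result" step can be sketched in miniature. In the toy example below (the vocabulary, scores, and function names are illustrative, not part of any NVIDIA API), the model's raw scores are turned into a probability distribution and the most likely next token is picked, which is the step that inference engines like TensorRT-LLM accelerate at scale:

```python
import numpy as np

def softmax(logits):
    # Convert raw model scores into a probability distribution.
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

def greedy_next_token(logits):
    # Greedy decoding: pick the single most probable token.
    return int(np.argmax(softmax(logits)))

# Toy vocabulary and raw scores (illustrative values only).
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([1.2, 3.5, 0.3, 2.1])

print(vocab[greedy_next_token(logits)])  # prints "cat"
```

A real LLM repeats this step once per generated token over a vocabulary of tens of thousands of entries, which is why accelerating it matters so much for chat-style applications.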

With this, NVIDIA can provide both the GPUs that train and run LLMs and the software that allows those models to run and work faster. According to The Verge, users will not have to seek other ways to make generative AI cost-efficient.

Upgrading Artificial Intelligence

The company also noted that TensorRT-LLM will become publicly available to anyone who wants to use or integrate it, with the SDK accessible on its site. NVIDIA currently has a near monopoly on the powerful chips used to train LLMs such as GPT-4.

Training one of these models typically requires a large number of GPUs, and demand for the company's H100 GPUs has skyrocketed, with estimated prices reaching up to $40,000 per chip.

Jesse Clayton, NVIDIA's director of product management and product marketing for Windows AI, said the company is currently at a big moment, calling the latest development one of the most important moments in the history of technology, particularly for AI on PCs.

Clayton said that artificial intelligence delivers new experiences, unlocks creativity, and makes it easier for people to get more done in less time, according to VentureBeat.

Tags
Nvidia, Artificial intelligence, AI