Normal view

There are new articles available, click to refresh the page.
Before yesterdayNVIDIA deep learning blog

What Is Retrieval-Augmented Generation?

15 November 2023 at 16:00

To understand the latest advance in generative AI, imagine a courtroom.

Judges hear and decide cases based on their general understanding of the law. Sometimes a case — like a malpractice suit or a labor dispute —  requires special expertise, so judges send court clerks to a law library, looking for precedents and specific cases they can cite.

Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers that cite sources, the model needs an assistant to do some research.

The court clerk of AI is a process called retrieval-augmented generation, or RAG for short.

The Story of the Name

Patrick Lewis, lead author of the 2020 paper that coined the term, apologized for the unflattering acronym that now describes a growing family of methods across hundreds of papers and dozens of commercial services he believes represent the future of generative AI.

Picture of Patrick Lewis, lead author of RAG paper
Patrick Lewis

“We definitely would have put more thought into the name had we known our work would become so widespread,” Lewis said in an interview from Singapore, where he was sharing his ideas with a regional conference of database developers.

“We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea,” said Lewis, who now leads a RAG team at AI startup Cohere.

So, What Is Retrieval-Augmented Generation?

Retrieval-augmented generation is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

In other words, it fills a gap in how LLMs work. Under the hood, LLMs are neural networks, typically measured by how many parameters they contain. An LLM’s parameters essentially represent the general patterns of how humans use words to form sentences.

That deep understanding, sometimes called parameterized knowledge, makes LLMs useful in responding to general prompts at light speed. However, it does not serve users who want a deeper dive into a current or more specific topic.

Combining Internal, External Resources

Lewis and colleagues developed retrieval-augmented generation to link generative AI services to external resources, especially ones rich in the latest technical details.

The paper, with coauthors from the former Facebook AI Research (now Meta AI), University College London and New York University, called RAG “a general-purpose fine-tuning recipe” because it can be used by nearly any LLM to connect with practically any external resource.

Building User Trust

Retrieval-augmented generation gives models sources they can cite, like footnotes in a research paper, so users can check any claims. That builds trust.

What’s more, the technique can help models clear up ambiguity in a user query. It also reduces the possibility a model will make a wrong guess, a phenomenon sometimes called hallucination.

Another great advantage of RAG is it’s relatively easy. A blog by Lewis and three of the paper’s coauthors said developers can implement the process with as few as five lines of code.

That makes the method faster and less expensive than retraining a model with additional datasets. And it lets users hot-swap new sources on the fly.

How People Are Using Retrieval-Augmented Generation 

With retrieval-augmented generation, users can essentially have conversations with data repositories, opening up new kinds of experiences. This means the applications for RAG could be multiple times the number of available datasets.

For example, a generative AI model supplemented with a medical index could be a great assistant for a doctor or nurse. Financial analysts would benefit from an assistant linked to market data.

In fact, almost any business can turn its technical or policy manuals, videos or logs into resources called knowledge bases that can enhance LLMs. These sources can enable use cases such as customer or field support, employee training and developer productivity.

The broad potential is why companies including AWS, IBM, Glean, Google, Microsoft, NVIDIA, Oracle and Pinecone are adopting RAG.

Getting Started With Retrieval-Augmented Generation 

To help users get started, NVIDIA developed a reference architecture for retrieval-augmented generation. It includes a sample chatbot and the elements users need to create their own applications with this new method.

The workflow uses NVIDIA NeMo, a framework for developing and customizing generative AI models, as well as software like NVIDIA Triton Inference Server and NVIDIA TensorRT-LLM for running generative AI models in production.

The software components are all part of NVIDIA AI Enterprise, a software platform that accelerates development and deployment of production-ready AI with the security, support and stability businesses need.

Getting the best performance for RAG workflows requires massive amounts of memory and compute to move and process data. The NVIDIA GH200 Grace Hopper Superchip, with its 288GB of fast HBM3e memory and 8 petaflops of compute, is ideal — it can deliver a 150x speedup over using a CPU.

Once companies get familiar with RAG, they can combine a variety of off-the-shelf or custom LLMs with internal or external knowledge bases to create a wide range of assistants that help their employees and customers.

RAG doesn’t require a data center. LLMs are debuting on Windows PCs, thanks to NVIDIA software that enables all sorts of applications users can access even on their laptops.

Chart shows running RAG on a PC
An example application for RAG on a PC.

PCs equipped with NVIDIA RTX GPUs can now run some AI models locally. By using RAG on a PC, users can link to a private knowledge source – whether that be emails, notes or articles – to improve responses. The user can then feel confident that their data source, prompts and response all remain private and secure.

A recent blog provides an example of RAG accelerated by TensorRT-LLM for Windows to get better results fast.

The History of Retrieval-Augmented Generation 

The roots of the technique go back at least to the early 1970s. That’s when researchers in information retrieval prototyped what they called question-answering systems, apps that use natural language processing (NLP) to access text, initially in narrow topics such as baseball.

The concepts behind this kind of text mining have remained fairly constant over the years. But the machine learning engines driving them have grown significantly, increasing their usefulness and popularity.

In the mid-1990s, the Ask Jeeves service, now, popularized question answering with its mascot of a well-dressed valet. IBM’s Watson became a TV celebrity in 2011 when it handily beat two human champions on the Jeopardy! game show.

Picture of Ask Jeeves, an early RAG-like web service

Today, LLMs are taking question-answering systems to a whole new level.

Insights From a London Lab

The seminal 2020 paper arrived as Lewis was pursuing a doctorate in NLP at University College London and working for Meta at a new London AI lab. The team was searching for ways to pack more knowledge into an LLM’s parameters and using a benchmark it developed to measure its progress.

Building on earlier methods and inspired by a paper from Google researchers, the group “had this compelling vision of a trained system that had a retrieval index in the middle of it, so it could learn and generate any text output you wanted,” Lewis recalled.

Picture of IBM Watson winning on "Jeopardy" TV show, popularizing a RAG-like AI service
The IBM Watson question-answering system became a celebrity when it won big on the TV game show Jeopardy!

When Lewis plugged into the work in progress a promising retrieval system from another Meta team, the first results were unexpectedly impressive.

“I showed my supervisor and he said, ‘Whoa, take the win. This sort of thing doesn’t happen very often,’ because these workflows can be hard to set up correctly the first time,” he said.

Lewis also credits major contributions from team members Ethan Perez and Douwe Kiela, then of New York University and Facebook AI Research, respectively.

When complete, the work, which ran on a cluster of NVIDIA GPUs, showed how to make generative AI models more authoritative and trustworthy. It’s since been cited by hundreds of papers that amplified and extended the concepts in what continues to be an active area of research.

How Retrieval-Augmented Generation Works

At a high level, here’s how an NVIDIA technical brief describes the RAG process.

When users ask an LLM a question, the AI model sends the query to another model that converts it into a numeric format so machines can read it. The numeric version of the query is sometimes called an embedding or a vector.

NVIDIA diagram of how RAG works with LLMs
Retrieval-augmented generation combines LLMs with embedding models and vector databases.

The embedding model then compares these numeric values to vectors in a machine-readable index of an available knowledge base. When it finds a match or multiple matches, it retrieves the related data, converts it to human-readable words and passes it back to the LLM.

Finally, the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user, potentially citing sources the embedding model found.

Keeping Sources Current

In the background, the embedding model continuously creates and updates machine-readable indices, sometimes called vector databases, for new and updated knowledge bases as they become available.

Chart of a RAG process described by LangChain
Retrieval-augmented generation combines LLMs with embedding models and vector databases.

Many developers find LangChain, an open-source library, can be particularly useful in chaining together LLMs, embedding models and knowledge bases. NVIDIA uses LangChain in its reference architecture for retrieval-augmented generation.

The LangChain community provides its own description of a RAG process.

Looking forward, the future of generative AI lies in creatively chaining all sorts of LLMs and knowledge bases together to create new kinds of assistants that deliver authoritative results users can verify.

Get a hands on using retrieval-augmented generation with an AI chatbot in this NVIDIA LaunchPad lab.

NVIDIA artist's concept of retrieval-augmented generation aka RAG

Igniting the Future: TensorRT-LLM Release Accelerates AI Inference Performance, Adds Support for New Models Running on RTX-Powered Windows 11 PCs

15 November 2023 at 16:00

Artificial intelligence on Windows 11 PCs marks a pivotal moment in tech history, revolutionizing experiences for gamers, creators, streamers, office workers, students and even casual PC users.

It offers unprecedented opportunities to enhance productivity for users of the more than 100 million Windows PCs and workstations that are powered by RTX GPUs. And NVIDIA RTX technology is making it even easier for developers to create AI applications to change the way people use computers.

New optimizations, models and resources announced at Microsoft Ignite will help developers deliver new end-user experiences, quicker.

An upcoming update to TensorRT-LLM — open-source software that increases AI inference performance — will add support for new large language models and make demanding AI workloads more accessible on desktops and laptops with RTX GPUs starting at 8GB of VRAM.

TensorRT-LLM for Windows will soon be compatible with OpenAI’s popular Chat API through a new wrapper. This will enable hundreds of developer projects and applications to run locally on a PC with RTX, instead of in the cloud — so users can keep private and proprietary data on Windows 11 PCs.

Custom generative AI requires time and energy to maintain projects. The process can become incredibly complex and time-consuming, especially when trying to collaborate and deploy across multiple environments and platforms.

AI Workbench is a unified, easy-to-use toolkit that allows developers to quickly create, test and customize pretrained generative AI models and LLMs on a PC or workstation. It provides developers a single platform to organize their AI projects and tune models to specific use cases.

This enables seamless collaboration and deployment for developers to create cost-effective, scalable generative AI models quickly. Join the early access list to be among the first to gain access to this growing initiative and to receive future updates.

To support AI developers, NVIDIA and Microsoft will release DirectML enhancements to accelerate one of the most popular foundational AI models, Llama 2. Developers now have more options for cross-vendor deployment, in addition to setting a new standard for performance.

Portable AI

Last month, NVIDIA announced TensorRT-LLM for Windows, a library for accelerating LLM inference.

The next TensorRT-LLM release, v0.6.0 coming later this month, will bring improved inference performance — up to 5x faster — and enable support for additional popular LLMs, including the new Mistral 7B and Nemotron-3 8B. Versions of these LLMs will run on any GeForce RTX 30 Series and 40 Series GPU with 8GB of RAM or more, making fast, accurate, local LLM capabilities accessible even in some of the most portable Windows devices.

TensorRT-LLM V0.6 Windows Perf Chart
Up to 5X performance with the new TensorRT-LLM v0.6.0.

The new release of TensorRT-LLM will be available for install on the /NVIDIA/TensorRT-LLM GitHub repo. New optimized models will be available on

Conversing With Confidence 

Developers and enthusiasts worldwide use OpenAI’s Chat API for a wide range of applications — from summarizing web content and drafting documents and emails to analyzing and visualizing data and creating presentations.

One challenge with such cloud-based AIs is that they require users to upload their input data, making them impractical for private or proprietary data or for working with large datasets.

To address this challenge, NVIDIA is soon enabling TensorRT-LLM for Windows to offer a similar API interface to OpenAI’s widely popular ChatAPI, through a new wrapper, offering a similar workflow to developers whether they are designing models and applications to run locally on a PC with RTX or in the cloud. By changing just one or two lines of code, hundreds of AI-powered developer projects and applications can now benefit from fast, local AI. Users can keep their data on their PCs and not worry about uploading datasets to the cloud.

Perhaps the best part is that many of these projects and applications are open source, making it easy for developers to leverage and extend their capabilities to fuel the adoption of generative AI on Windows, powered by RTX.

The wrapper will work with any LLM that’s been optimized for TensorRT-LLM (for example, Llama 2, Mistral and NV LLM) and is being released as a reference project on GitHub, alongside other developer resources for working with LLMs on RTX.

Model Acceleration

Developers can now leverage cutting-edge AI models and deploy with a cross-vendor API. As part of an ongoing commitment to empower developers, NVIDIA and Microsoft have been working together to accelerate Llama on RTX via the DirectML API.

Building on the announcements for the fastest inference performance for these models announced last month, this new option for cross-vendor deployment makes it easier than ever to bring AI capabilities to PC.

Developers and enthusiasts can experience the latest optimizations by downloading the latest ONNX runtime and following the installation instructions from Microsoft, and installing the latest driver from NVIDIA, which will be available on Nov. 21.

These new optimizations, models and resources will accelerate the development and deployment of AI features and applications to the 100 million RTX PCs worldwide, joining the more than 400 partners shipping AI-powered apps and games already accelerated by RTX GPUs.

As models become even more accessible and developers bring more generative AI-powered functionality to RTX-powered Windows PCs, RTX GPUs will be critical for enabling users to take advantage of this powerful technology.

New Class of Accelerated, Efficient AI Systems Mark the Next Era of Supercomputing

13 November 2023 at 20:00

NVIDIA today unveiled at SC23 the next wave of technologies that will lift scientific and industrial research centers worldwide to new levels of performance and energy efficiency.

“NVIDIA hardware and software innovations are creating a new class of AI supercomputers,” said Ian Buck, vice president of the company’s high performance computing and hyperscale data center business, in a special address at the conference.

Some of the systems will pack memory-enhanced NVIDIA Hopper accelerators, others a new NVIDIA Grace Hopper systems architecture. All will use the expanded parallelism to run a full stack of accelerated software for generative AI, HPC and hybrid quantum computing.

Buck described the new NVIDIA HGX H200 as “the world’s leading AI computing platform.”

Image of H200 GPU system
NVIDIA H200 Tensor Core GPUs pack HBM3e memory to run growing generative AI models.

It packs up to 141GB of HBM3e, the first AI accelerator to use the ultrafast technology. Running models like GPT-3, NVIDIA H200 Tensor Core GPUs provide an 18x performance increase over prior-generation accelerators.

Among other generative AI benchmarks, they zip through 12,000 tokens per second on a Llama2-13B large language model (LLM).

Buck also revealed a server platform that links four NVIDIA GH200 Grace Hopper Superchips on an NVIDIA NVLink interconnect. The quad configuration puts in a single compute node a whopping 288 Arm Neoverse cores and 16 petaflops of AI performance with up to 2.3 terabytes of high-speed memory.

Image of quad GH200 server node
Server nodes based on the four GH200 Superchips will deliver 16 petaflops of AI performance.

Demonstrating its efficiency, one GH200 Superchip using the NVIDIA TensorRT-LLM open-source library is 100x faster than a dual-socket x86 CPU system and nearly 2x more energy efficient than an X86 + H100 GPU server.

“Accelerated computing is sustainable computing,” Buck said. “By harnessing the power of accelerated computing and generative AI, together we can drive innovation across industries while reducing our impact on the environment.”

NVIDIA Powers 38 of 49 New TOP500 Systems

The latest TOP500 list of the world’s fastest supercomputers reflects the shift toward accelerated, energy-efficient supercomputing.

Thanks to new systems powered by NVIDIA H100 Tensor Core GPUs, NVIDIA now delivers more than 2.5 exaflops of HPC performance across these world-leading systems, up from 1.6 exaflops in the May rankings. NVIDIA’s contribution on the top 10 alone reaches nearly an exaflop of HPC and 72 exaflops of AI performance.

The new list contains the highest number of systems ever using NVIDIA technologies, 379 vs. 372 in May, including 38 of 49 new supercomputers on the list.

Microsoft Azure leads the newcomers with its Eagle system using H100 GPUs in NDv5 instances to hit No. 3 with 561 petaflops. Mare Nostrum5 in Barcelona ranked No. 8, and NVIDIA Eos — which recently set new AI training records on the MLPerf benchmarks — came in at No. 9.

Showing their energy efficiency, NVIDIA GPUs power 23 of the top 30 systems on the Green500. And they retained the No. 1 spot with the H100 GPU-based Henri system, which delivers 65.09 gigaflops per watt for the Flatiron Institute in New York.

Gen AI Explores COVID

Showing what’s possible, the Argonne National Laboratory used NVIDIA BioNeMo, a generative AI platform for biomolecular LLMs, to develop GenSLMs, a model that can generate gene sequences that closely resemble real-world variants of the coronavirus. Using NVIDIA GPUs and data from 1.5 million COVID genome sequences, it can also rapidly identify new virus variants.

The work won the Gordon Bell special prize last year and was trained on supercomputers, including Argonne’s Polaris system, the U.S. Department of Energy’s Perlmutter and NVIDIA’s Selene.

It’s “just the tip of the iceberg — the future is brimming with possibilities, as generative AI continues to redefine the landscape of scientific exploration,” said Kimberly Powell, vice president of healthcare at NVIDIA, in the special address.

Saving Time, Money and Energy

Using the latest technologies, accelerated workloads can see an order-of-magnitude reduction in system cost and energy used, Buck said.

For example, Siemens teamed with Mercedes to analyze aerodynamics and related acoustics for its new electric EQE vehicles. The simulations that took weeks on CPU clusters ran significantly faster using the latest NVIDIA H100 GPUs. In addition, Hopper GPUs let them reduce costs by 3x and reduce energy consumption by 4x (below).

Chart showing the performance and energy efficiency of H100 GPUs

Switching on 200 Exaflops Beginning Next Year

Scientific and industrial advances will come from every corner of the globe where the latest systems are being deployed.

“We already see a combined 200 exaflops of AI on Grace Hopper supercomputers going to production 2024,” Buck said.

They include the massive JUPITER supercomputer at Germany’s Jülich center. It can deliver 93 exaflops of performance for AI training and 1 exaflop for HPC applications, while consuming only 18.2 megawatts of power.

Chart of deployed performance of supercomputers using NVIDIA GPUs through 2024
Research centers are poised to switch on a tsunami of GH200 performance.

Based on Eviden’s BullSequana XH3000 liquid-cooled system, JUPITER will use the NVIDIA quad GH200 system architecture and NVIDIA Quantum-2 InfiniBand networking for climate and weather predictions, drug discovery, hybrid quantum computing and digital twins. JUPITER quad GH200 nodes will be configured with 864GB of high-speed memory.

It’s one of several new supercomputers using Grace Hopper that NVIDIA announced at SC23.

The HPE Cray EX2500 system from Hewlett Packard Enterprise will use the quad GH200 to power many AI supercomputers coming online next year.

For example, HPE uses the quad GH200 to power OFP-II, an advanced HPC system in Japan shared by the University of Tsukuba and the University of Tokyo, as well as the DeltaAI system, which will triple computing capacity for the U.S. National Center for Supercomputing Applications.

HPE is also building the Venado system for the Los Alamos National Laboratory, the first GH200 to be deployed in the U.S. In addition, HPE is building GH200 supercomputers in the Middle East, Switzerland and the U.K.

Grace Hopper in Texas and Beyond

At the Texas Advanced Computing Center (TACC), Dell Technologies is building the Vista supercomputer with NVIDIA Grace Hopper and Grace CPU Superchips.

More than 100 global enterprises and organizations, including NASA Ames Research Center and Total Energies, have already purchased Grace Hopper early-access systems, Buck said.

They join previously announced GH200 users such as SoftBank and the University of Bristol, as well as the massive Leonardo system with 14,000 NVIDIA A100 GPUs that delivers 10 exaflops of AI performance for Italy’s Cineca consortium.

The View From Supercomputing Centers

Leaders from supercomputing centers around the world shared their plans and work in progress with the latest systems.

“We’ve been collaborating with MeteoSwiss ECMWP as well as scientists from ETH EXCLAIM and NVIDIA’s Earth-2 project to create an infrastructure that will push the envelope in all dimensions of big data analytics and extreme scale computing,” said Thomas Schultess, director of the Swiss National Supercomputing Centre of work on the Alps supercomputer.

“There’s really impressive energy-efficiency gains across our stacks,” Dan Stanzione, executive director of TACC, said of Vista.

It’s “really the stepping stone to move users from the kinds of systems we’ve done in the past to looking at this new Grace Arm CPU and Hopper GPU tightly coupled combination and … we’re looking to scale out by probably a factor of 10 or 15 from what we are deploying with Vista when we deploy Horizon in a couple years,” he said.

Accelerating the Quantum Journey

Researchers are also using today’s accelerated systems to pioneer a path to tomorrow’s supercomputers.

In Germany, JUPITER “will revolutionize scientific research across climate, materials, drug discovery and quantum computing,” said Kristel Michelson, who leads Julich’s research group on quantum information processing.

“JUPITER’s architecture also allows for the seamless integration of quantum algorithms with parallel HPC algorithms, and this is mandatory for effective quantum HPC hybrid simulations,” she said.

CUDA Quantum Drives Progress

The special address also showed how NVIDIA CUDA Quantum — a platform for programming CPUs, GPUs and quantum computers also known as QPUs — is advancing research in quantum computing.

For example, researchers at BASF, the world’s largest chemical company, pioneered a new hybrid quantum-classical method for simulating chemicals that can shield humans against harmful metals. They join researchers at Brookhaven National Laboratory and HPE who are separately pushing the frontiers of science with CUDA Quantum.

NVIDIA also announced a collaboration with Classiq, a developer of quantum programming tools, to create a life sciences research center at the Tel Aviv Sourasky Medical Center, Israel’s largest teaching hospital.  The center will use Classiq’s software and CUDA Quantum running on an NVIDIA DGX H100 system.

Separately, Quantum Machines will deploy the first NVIDIA DGX Quantum, a system using Grace Hopper Superchips, at the Israel National Quantum Center that aims to drive advances across scientific fields. The DGX system will be connected to a superconducting QPU by Quantware and a photonic QPU from ORCA Computing, both powered by CUDA Quantum.

Logos of NVIDIA CUDA Quantum partners

“In just two years, our NVIDIA quantum computing platform has amassed over 120 partners [above], a testament to its open, innovative platform,” Buck said.

Overall, the work across many fields of discovery reveals a new trend that combines accelerated computing at data center scale with NVIDIA’s full-stack innovation.

“Accelerated computing is paving the path for sustainable computing with advancements that provide not just amazing technology but a more sustainable and impactful future,” he concluded.

Watch NVIDIA’s SC23 special address below.


Image of JUPITER supercomputer in Germany

Gen AI for the Genome: LLM Predicts Characteristics of COVID Variants

13 November 2023 at 14:00

A widely acclaimed large language model for genomic data has demonstrated its ability to generate gene sequences that closely resemble real-world variants of SARS-CoV-2, the virus behind COVID-19.

Called GenSLMs, the model, which last year won the Gordon Bell special prize for high performance computing-based COVID-19 research, was trained on a dataset of nucleotide sequences — the building blocks of DNA and RNA. It was developed by researchers from Argonne National Laboratory, NVIDIA, the University of Chicago and a score of other academic and commercial collaborators.

When the researchers looked back at the nucleotide sequences generated by GenSLMs, they discovered that specific characteristics of the AI-generated sequences closely matched the real-world Eris and Pirola subvariants that have been prevalent this year — even though the AI was only trained on COVID-19 virus genomes from the first year of the pandemic.

“Our model’s generative process is extremely naive, lacking any specific information or constraints around what a new COVID variant should look like,” said Arvind Ramanathan, lead researcher on the project and a computational biologist at Argonne. “The AI’s ability to predict the kinds of gene mutations present in recent COVID strains — despite having only seen the Alpha and Beta variants during training — is a strong validation of its capabilities.”

In addition to generating its own sequences, GenSLMs can also classify and cluster different COVID genome sequences by distinguishing between variants. In a demo available on NGC, NVIDIA’s hub for accelerated software, users can explore visualizations of GenSLMs’ analysis of the evolutionary patterns of various proteins within the COVID viral genome.


Reading Between the Lines, Uncovering Evolutionary Patterns

A key feature of GenSLMs is its ability to interpret long strings of nucleotides — represented with sequences of the letters A, T, G and C in DNA, or A, U, G and C in RNA — in the same way an LLM trained on English text would interpret a sentence. This capability enables the model to understand the relationship between different areas of the genome, which in coronaviruses consists of around 30,000 nucleotides.

In the NGC demo, users can choose from among eight different COVID variants to understand how the AI model tracks mutations across various proteins of the viral genome. The visualization depicts evolutionary couplings across the viral proteins — highlighting which snippets of the genome are likely to be seen in a given variant.

“Understanding how different parts of the genome are co-evolving gives us clues about how the virus may develop new vulnerabilities or new forms of resistance,” Ramanathan said. “Looking at the model’s understanding of which mutations are particularly strong in a variant may help scientists with downstream tasks like determining how a specific strain can evade the human immune system.”


GenSLMs was trained on more than 110 million prokaryotic genome sequences and fine-tuned with a global dataset of around 1.5 million COVID viral sequences using open-source data from the Bacterial and Viral Bioinformatics Resource Center. In the future, the model could be fine-tuned on the genomes of other viruses or bacteria, enabling new research applications.

To train the model, the researchers used NVIDIA A100 Tensor Core GPU-powered supercomputers, including Argonne’s Polaris system, the U.S. Department of Energy’s Perlmutter and NVIDIA’s Selene.

The GenSLMs research team’s Gordon Bell special prize was awarded at last year’s SC22 supercomputing conference. At this week’s SC23, in Denver, NVIDIA is sharing a new range of groundbreaking work in the field of accelerated computing. View the full schedule and catch the replay of NVIDIA’s special address below.

NVIDIA Research comprises hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics. Learn more about NVIDIA Research and subscribe to NVIDIA healthcare news.

Main image courtesy of Argonne National Laboratory’s Bharat Kale. 

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. DOE Office of Science and the National Nuclear Security Administration. Research was supported by the DOE through the National Virtual Biotechnology Laboratory, a consortium of DOE national laboratories focused on response to COVID-19, with funding from the Coronavirus CARES Act.

Acing the Test: NVIDIA Turbocharges Generative AI Training in MLPerf Benchmarks

8 November 2023 at 17:00

NVIDIA’s AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks.

Among many new records and milestones, one in generative AI stands out: NVIDIA Eos — an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking — completed a training benchmark based on a GPT-3 model with 175 billion parameters trained on one billion tokens in just 3.9 minutes.

That’s a nearly 3x gain from 10.9 minutes, the record NVIDIA set when the test was introduced less than six months ago.

NVIDIA H100 training results over time on MLPerf benchmarks

The benchmark uses a portion of the full GPT-3 data set behind the popular ChatGPT service that, by extrapolation, Eos could now train in just eight days, 73x faster than a prior state-of-the-art system using 512 A100 GPUs.

The acceleration in training time reduces costs, saves energy and speeds time-to-market. It’s heavy lifting that makes large language models widely available so every business can adopt them with tools like NVIDIA NeMo, a framework for customizing LLMs.

In a new generative AI test ‌this round, 1,024 NVIDIA Hopper architecture GPUs completed a training benchmark based on the Stable Diffusion text-to-image model in 2.5 minutes, setting a high bar on this new workload.

By adopting these two tests, MLPerf reinforces its leadership as the industry standard for measuring AI performance, since generative AI is the most transformative technology of our time.

System Scaling Soars

The latest results were due in part to the use of the most accelerators ever applied to an MLPerf benchmark. The 10,752 H100 GPUs far surpassed the scaling in AI training in June, when NVIDIA used 3,584 Hopper GPUs.

The 3x scaling in GPU numbers delivered a 2.8x scaling in performance, a 93% efficiency rate thanks in part to software optimizations.

Efficient scaling is a key requirement in generative AI because LLMs are growing by an order of magnitude every year. The latest results show NVIDIA’s ability to meet this unprecedented challenge for even the world’s largest data centers.

Chart of near linear scaling of H100 GPUs on MLPerf training

The achievement is thanks to a full-stack platform of innovations in accelerators, systems and software that both Eos and Microsoft Azure used in the latest round.

Eos and Azure both employed 10,752 H100 GPUs in separate submissions. They achieved within 2% of the same performance, demonstrating the efficiency of NVIDIA AI in data center and public-cloud deployments.

Chart of record Azure scaling in MLPerf training

NVIDIA relies on Eos for a wide array of critical jobs. It helps advance initiatives like NVIDIA DLSS, AI-powered software for state-of-the-art computer graphics and NVIDIA Research projects like ChipNeMo, generative AI tools that help design next-generation GPUs.

Advances Across Workloads

NVIDIA set several new records in this round in addition to making advances in generative AI.

For example, H100 GPUs were 1.6x faster than the prior-round training recommender models widely employed to help users find what they’re looking for online. Performance was up 1.8x on RetinaNet, a computer vision model.

These increases came from a combination of advances in software and scaled-up hardware.

NVIDIA was once again the only company to run all MLPerf tests. H100 GPUs demonstrated the fastest performance and the greatest scaling in each of the nine benchmarks.

List of six new NVIDIA records in MLPerf training

Speedups translate to faster time to market, lower costs and energy savings for users training massive LLMs or customizing them with frameworks like NeMo for the specific needs of their business.

Eleven systems makers used the NVIDIA AI platform in their submissions this round, including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Lenovo, QCT and Supermicro.

NVIDIA partners participate in MLPerf because they know it’s a valuable tool for customers evaluating AI platforms and vendors.

HPC Benchmarks Expand

In MLPerf HPC, a separate benchmark for AI-assisted simulations on supercomputers, H100 GPUs delivered up to twice the performance of NVIDIA A100 Tensor Core GPUs in the last HPC round. The results showed up to 16x gains since the first MLPerf HPC round in 2019.

The benchmark included a new test that trains OpenFold, a model that predicts the 3D structure of a protein from its sequence of amino acids. OpenFold can do in minutes vital work for healthcare that used to take researchers weeks or months.

Understanding a protein’s structure is key to finding effective drugs fast because most drugs act on proteins, the cellular machinery that helps control many biological processes.

In the MLPerf HPC test, H100 GPUs trained OpenFold in 7.5 minutes.  The OpenFold test is a representative part of the entire AlphaFold training process that two years ago took 11 days using 128 accelerators.

A version of the OpenFold model and the software NVIDIA used to train it will be available soon in NVIDIA BioNeMo, a generative AI platform for drug discovery.

Several partners made submissions on the NVIDIA AI platform in this round. They included Dell Technologies and supercomputing centers at Clemson University, the Texas Advanced Computing Center and — with assistance from Hewlett Packard Enterprise (HPE) — Lawrence Berkeley National Laboratory.

Benchmarks With Broad Backing

Since its inception in May 2018, the MLPerf benchmarks have enjoyed broad backing from both industry and academia. Organizations that support them include Amazon, Arm, Baidu, Google, Harvard, HPE, Intel, Lenovo, Meta, Microsoft, NVIDIA, Stanford University and the University of Toronto.

MLPerf tests are transparent and objective, so users can rely on the results to make informed buying decisions.

All the software NVIDIA used is available from the MLPerf repository, so all developers can get the same world-class results. These software optimizations get continuously folded into containers available on NGC, NVIDIA’s software hub for GPU applications.

Learn more about MLPerf and the details of this round.

How AI-Based Cybersecurity Strengthens Business Resilience

3 November 2023 at 15:00

The world’s 5 billion internet users and nearly 54 billion devices generate 3.4 petabytes of data per second, according to IDC. As digitalization accelerates, enterprise IT teams are under greater pressure to identify and block incoming cyber threats to ensure business operations and services are not interrupted — and AI-based cybersecurity provides a reliable way to do so.

Few industries appear immune to cyber threats. This year alone, international hotel chains, financial institutions, Fortune 100 retailers, air traffic-control systems and the U.S. government have all reported threats and intrusions.

Whether from insider error, cybercriminals, hacktivists or other threats, risks in the cyber landscape can damage an enterprise’s reputation and bottom line. A breach can paralyze operations, jeopardize proprietary and customer data, result in regulatory fines and destroy customer trust.

Using AI and accelerated computing, businesses can reduce the time and operational expenses required to detect and block cyber threats while freeing up resources to focus on core business value operations and revenue-generating activities.

Here’s a look at how industries are applying AI techniques to safeguard data, enable faster threat detection and mitigate attacks to ensure the consistent delivery of service to customers and partners.

Public Sector: Protecting Physical Security, Energy Security and Citizen Services

AI-powered analytics and automation tools are helping government agencies provide citizens with instant access to information and services, make data-driven decisions, model climate change, manage natural disasters, and more. But public entities managing digital tools and infrastructure face a complex cyber risk environment that includes regulatory compliance requirements, public scrutiny, large interconnected networks and the need to protect sensitive data and high-value targets.

Adversary nation-states may initiate cyberattacks to disrupt networks, steal intellectual property or swipe classified government documents. Internal misuse of digital tools and infrastructure combined with sophisticated external espionage places public organizations at high risk of data breach. Espionage actors have also been known to recruit inside help, with 16% of public administration breaches showing evidence of collusion. To protect critical infrastructure, citizen data, public records and other sensitive information, federal organizations are turning to AI.

The U.S. Department of Energy’s (DOE) Office of Cybersecurity, Energy Security and Emergency Response (CESER) is tasked with strengthening the resilience of the country’s energy sector by addressing emerging threats and improving energy infrastructure security. The DOE-CESER has invested more than $240 million in cybersecurity research, development and demonstration projects since 2010.

In one project, the department developed a tool that uses AI to automate and optimize security vulnerability and patch management in energy delivery systems. Another project for artificial diversity and defense security uses software-defined networks to enhance the situational awareness of energy delivery systems, helping ensure uninterrupted flows of energy.

The Defense Advanced Research Projects Agency (DARPA), which is charged with researching and investing in breakthrough technologies for national security, is using machine learning and AI in several areas. The DARPA CASTLE program trains AI to defend against advanced, persistent cyber threats. As part of the effort, researchers intend to accelerate cybersecurity assessments with approaches that are automated, repeatable and measurable. The DARPA GARD program builds platforms, libraries, datasets and training materials to help developers build AI models that are resistant to deception and adversarial attacks.

To keep up with an evolving threat landscape and ensure physical security, energy security and data security, public organizations must continue integrating AI to achieve a dynamic, proactive and far-reaching cyber defense posture.

Financial Services: Securing Digital Transactions, Payments and Portfolios 

Banks, asset managers, insurers and other financial service organizations are using AI and machine learning to deliver superior performance in fraud detection, portfolio management, algorithmic trading and self-service banking.

With constant digital transactions, payments, loans and investment trades, financial service institutions manage some of the largest, most complex and most sensitive datasets of any industry. Behind only the healthcare industry, these organizations suffer the second highest cost of a data breach, at nearly $6 million per incident. This cost grows if regulators issue fines or if recovery includes legal fees and lawsuit settlements. Worse still, lost business may never be recovered if trust can’t be repaired.

Banks and financial institutions use AI to improve insider threat detection, detect phishing and ransomware, and keep sensitive information safe.

FinSec Innovation Lab, a joint venture by Mastercard and Enel X, is using AI to help its customers defend against ransomware. Prior to working with FinSec, one card-processing customer suffered a LockBit ransomware attack in which 200 company servers were infected in just 1.5 hours. The company was forced to shut down servers and suspend operations, resulting in an estimated $7 million in lost business.

FinSec replicated this attack in its lab but deployed the NVIDIA Morpheus cybersecurity framework, NVIDIA DOCA software framework for intrusion detection and NVIDIA BlueField DPU computing clusters. With this mix of AI and accelerated computing, FinSec was able to detect the ransomware attack in less than 12 seconds, quickly isolate virtual machines and recover 80% of the data on infected servers. This type of real-time response helps businesses avoid service downtime and lost business while maintaining customer trust.

With AI to help defend against cyberattacks, financial institutions can identify intrusions and anticipate future threats to keep financial records, accounts and transactions secure.

Retail: Keeping Sales Channels and Payment Credentials Safe

Retailers are using AI to power personalized product recommendations, dynamic pricing and customized marketing campaigns. Multichannel digital platforms have made in-store and online shopping more convenient: up to 48% of consumers save a card on file with a merchant, significantly boosting card-not-present transactions. While digitization has brought convenience, it has also made sensitive data more accessible to attackers.

Sitting on troves of digital payment credentials for millions of customers, retailers are a prime target for cybercriminals looking to take advantage of security gaps. According to a recent Data Breach Investigations Report from Verizon, 37% of confirmed data disclosures in the retail industry resulted in stolen payment card data.

Malware attacks, ransomware and distributed denial of service attacks are all on the rise, but phishing remains the favored vector for an initial attack. With a successful phishing intrusion, criminals can steal credentials, access systems and launch ransomware.

Best Buy manages a network of more than 1,000 stores across the U.S. and Canada. With multichannel digital sales across both countries, protecting consumer information and transactions is critical. To defend against phishing and other cyber threats, Best Buy began using customized machine learning and NVIDIA Morpheus to better secure their infrastructure and inform their security analysts.

After deploying this AI-based cyber defense, the retail giant improved the accuracy of phishing detection to 96% while reducing false-positive rates. With a proactive approach to cybersecurity, Best Buy is protecting its reputation as a tech expert focused on customer needs.

From complex supply chains to third-party vendors and multichannel point-of-sale networks, expect retailers to continue integrating AI to protect operations as well as critical proprietary and customer data.

Smart Cities and Spaces: Protecting Critical Infrastructure and Transit Networks

IoT devices and AI that analyze movement patterns, traffic and hazardous situations have great potential to improve the safety and efficiency of spaces and infrastructure. But as airports, shipping ports, transit networks and other smart spaces integrate IoT and use data, they also become more vulnerable to attack.

In the last couple of years, there have been distributed denial of service (DDoS) attacks on airports and air traffic control centers and ransomware attacks on seaports, city municipalities, police departments and more. Attacks can paralyze information systems, ground flights, disrupt the flow of cargo and traffic, and delay the delivery of goods to markets. Hostile attacks could have far more serious consequences, including physical harm or loss of life.

In connected spaces, AI-driven security can analyze vast amounts of data to predict threats, isolate attacks and provide rapid self-healing after an intrusion. AI algorithms trained on emails can halt threats in the inbox and block phishing attempts like those that delivered ransomware to seaports earlier this year. Machine learning can be trained to recognize DDoS attack patterns to prevent the type of incoming malicious traffic that brought down U.S. airport websites last year.

Organizations adopting smart space technology, such as the Port of Los Angeles, are making efforts to get ahead of the threat landscape. In 2014, the Port of LA established a cybersecurity operations center and hired a dedicated cybersecurity team. In 2021, the port followed up with a cyber resilience center to enhance early-warning detection for cyberattacks that have the potential to impact the flow of cargo.

The U.S. Federal Aviation Administration has developed an AI certification framework that assesses the trustworthiness of AI and ML applications. The FAA also implements a zero-trust cyber approach, enforces strict access control and runs continuous verification across its digital environment.

By bolstering cybersecurity and integrating AI, smart space and transport infrastructure administrators can offer secure access to physical spaces and digital networks to protect the uninterrupted movement of people and goods.

Telecommunications: Ensure Network Resilience and Block Incoming Threats

Telecommunications companies are leaning into AI to power predictive maintenance and maximum network uptime, network optimization, equipment troubleshooting, call-routing and self-service systems.

The industry is responsible for critical national infrastructure in every country, supports over 5 billion customer endpoints and is expected to constantly deliver above 99% reliability. As reliance on cloud, IoT and edge computing expands and 5G becomes the norm, immense digital surface areas must be protected from misuse and malicious attack.

Telcos can deploy AI to ensure the security and resilience of networks. AI can monitor IoT devices and edge networks to detect anomalies and intrusions, identify fake users, mitigate attacks and quarantine infected devices. AI can continuously assess the trustworthiness of devices, users and applications, thereby shortening the time needed to identify fraudsters.

Pretrained AI models can be deployed to protect 5G networks from threats such as malware, data exfiltration and DOS attacks.

Using deep learning and NVIDIA BlueField DPUs, Palo Alto Networks has built a next-generation firewall addressing 5G needs, maximizing cybersecurity performance while maintaining a small infrastructure footprint. The DPU powers accelerated intelligent network filtering to parse, classify and steer traffic to improve performance and isolate threats. With more efficient computing that deploys fewer servers, telcos can maximize return on investment for compute investments and minimize digital attack surface areas.

By putting AI to work, telcos can build secure, encrypted networks to ensure network availability and data security for both individual and enterprise customers.

Automotive: Insulate Vehicle Software From Outside Influence and Attack 

Modern cars rely on complex AI and ML software stacks running on in-vehicle computers to process data from cameras and other sensors. These vehicles are essentially giant, moving IoT devices — they perceive the environment, make decisions, advise drivers and even control the vehicle with autonomous driving features.

Like other connected devices, autonomous vehicles are susceptible to various types of cyberattacks. Bad actors can infiltrate and compromise AV software both on board and from third-party providers. Denial of service attacks can disrupt over-the-air software updates that vehicles rely on to operate safely. Unauthorized access to communications systems like onboard WiFi, Bluetooth or RFID can expose vehicle systems to the risk of remote manipulation and data theft. This can jeopardize geolocation and sensor data, operational data, driver and passenger data, all of which are crucial to functional safety and the driving experience.

AI-based cybersecurity can help monitor in-car and network activities in real time, allowing for rapid response to threats. AI can be deployed to secure and authenticate over-the-air updates to prevent tampering and ensure the authenticity of software updates. AI-driven encryption can protect data transmitted over WiFi, Bluetooth and RFID connections. AI can also probe vehicle systems for vulnerabilities and take remedial steps.

Ranging from AI-powered access control to unlock and start vehicles to detecting deviations in sensor performance and patching security vulnerabilities, AI will play a crucial role in the safe development and deployment of autonomous vehicles on our roads.

Keeping Operations Secure and Customers Happy With AI Cybersecurity 

By deploying AI to protect valuable data and digital operations, industries can focus their resources on innovating better products, improving customer experiences and creating new business value.

NVIDIA offers a number of tools and frameworks to help enterprises swiftly adjust to the evolving cyber risk environment. The NVIDIA Morpheus cybersecurity framework provides developers and software vendors with optimized, easy-to-use tools to build solutions that can proactively detect and mitigate threats while drastically reducing the cost of cyber defense operations. To help defend against phishing attempts, the NVIDIA spear phishing detection AI workflow uses NVIDIA Morpheus and synthetic training data created with the NVIDIA NeMo generative AI framework to flag and halt inbox threats.

The Morpheus SDK also enables digital fingerprinting to collect and analyze behavior characteristics for every user, service, account and machine across a network to identify atypical behavior and alert network operators. With the NVIDIA DOCA software framework, developers can create software-defined, DPU-accelerated services, while leveraging zero trust to build more secure applications.

AI-based cybersecurity empowers developers across industries to build solutions that can identify, capture and act on threats and anomalies to ensure business continuity and uninterrupted service, keeping operations safe and customers happy.

Learn how AI can help your organization achieve a proactive cybersecurity posture to protect customer and proprietary data to the highest standards.

Turing’s Mill: AI Supercomputer Revs UK’s Economic Engine

1 November 2023 at 16:04

The home of the first industrial revolution just made a massive investment in the next one.

The U.K. government has announced it will spend £225 million ($273 million) to build one of the world’s fastest AI supercomputers.

Called Isambard-AI, it’s the latest in a series of systems named after a legendary 19th century British engineer and hosted by the University of Bristol. When fully installed next year, it will pack 5,448 NVIDIA GH200 Grace Hopper Superchips to deliver a whopping 21 exaflops of AI performance for researchers across the country and beyond.

The announcement was made at the AI Safety Summit, a gathering of over 100 global government and technology leaders, held in Bletchley Park, the site of the world’s first digital programmable computer, which reflected the work of innovators like Alan Turing, considered the father of AI.

AI “will bring a transformation as far-reaching as the industrial revolution, the coming of electricity or the birth of the internet,” said British Prime Minister Rishi Sunak in a speech last week about the event, designed to catalyze international collaboration.

Propelling the Modern Economy

Like one of Isambard Brunel’s creations — the first propeller-driven, ocean-going iron ship — the AI technology running on his namesake is already driving countries forward.

AI contributes more than £3.7 billion to the U.K. economy and employs more than 50,000 people, said Michelle Donelan, the nation’s Science, Innovation and Technology Secretary, in an earlier announcement about the system.

The investment in the so-called AI Research Resource in Bristol “will catalyze scientific discovery and keep the U.K. at the forefront of AI development,” she said.

Like AI itself, the system will be used across a wide range of organizations tapping the potential of machine learning to advance robotics, data analytics, drug discovery, climate research and more.

“Isambard-AI represents a huge leap forward for AI computational power in the U.K.,” said Simon McIntosh-Smith, a Bristol professor and director of the Isambard National Research Facility. “Today, Isambard-AI would rank within the top 10 fastest supercomputers in the world and, when in operation later in 2024, it will be one of the most powerful AI systems for open science anywhere.”

The Next Manufacturing Revolution

Like the industrial revolution, AI promises advances in manufacturing. That’s one reason why Isambard-AI will be based at the National Composites Centre (NCC, pictured above) in the Bristol and Bath Science Park, one of the country’s seven manufacturing research centers.

The U.K.’s Frontier AI Taskforce, a research group leading a global effort on how frontier AI can be safely developed, will also be a major user of the system.

Hewlett Packard Enterprise, which is building Isambard-AI, is also collaborating with the University of Bristol on energy-efficiency plans that support net-zero carbon targets mandated by the British government.

Energy-Efficient HPC

A second system coming next year to the NCC will show Arm’s energy efficiency for non-accelerated high performance computing workloads.

Isambard-3 will deliver an estimated 2.7 petaflops of FP64 peak performance and consume less than 270 kilowatts of power, ranking it among the world’s three greenest non-accelerated supercomputers. That’s because the system — part of a research alliance among universities of Bath, Bristol, Cardiff and Exeter — will sport 384 Arm-based NVIDIA Grace CPU Superchips to power medical and scientific research.

“Isambard-3’s application performance efficiency of up to 6x its predecessor, which rivals many of the 50 fastest TOP500 systems, will provide scientists with a revolutionary new supercomputing platform to advance groundbreaking research,” said Bristol’s McIntosh-Smith, when the system was announced in March.

Site for Isambard-AI supercomputer at the University of Bristol

Unlocking the Power of Language: NVIDIA’s Annamalai Chockalingam on the Rise of LLMs

1 November 2023 at 13:00

Generative AI and large language models are stirring change across industries — but according to NVIDIA Senior Product Manager of Developer Marketing Annamalai Chockalingam, “we’re still in the early innings.”

In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Chockalingam about LLMs: what they are, their current state and their future potential.

Describing LLMs as a “subset of the larger generative AI movement,” Chockalingam says they can do five things with language: generate, summarize, translate, instruct or chat. With a combination of “these modalities and actions, you can build applications” to solve any problem, he said.

Enterprises are tapping LLMs to drive innovation, develop new customer experiences and gain a competitive advantage, he said. They’re also exploring what safe deployment of those models looks like, aiming to achieve responsible development, trustworthiness and repeatability.

New techniques like retrieval augmented generation could boost LLM development. RAG involves feeding models with up-to-date data sources or third-party APIs to achieve “more appropriate responses,” Chockalingam said — granting them current context so that they can “generate better” answers.

Chockalingam encourages those interested in LLMs to “get your hands dirty and get started” — whether that means using popular applications like ChatGPT or playing with pretrained models in the NVIDIA NGC catalog.NVIDIA offers a full-stack computing platform for developers and enterprises experimenting with LLMs, with an ecosystem of over 4 million developers and 1,600 generative AI organizations. To learn more, register for LLM Developer Day on Nov. 17 to hear from NVIDIA experts about how best to develop applications.

Subscribe to the AI Podcast: Now Available on Amazon Music

The AI Podcast is now available through Amazon Music.

In addition, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better. Have a few minutes to spare? Fill out this listener survey.


Silicon Volley: Designers Tap Generative AI for a Chip Assist

30 October 2023 at 16:00

A research paper released today describes ways generative AI can assist one of the most complex engineering efforts: designing semiconductors.

The work demonstrates how companies in highly specialized fields can train large language models (LLMs) on their internal data to build assistants that increase productivity.

Few pursuits are as challenging as semiconductor design. Under a microscope, a state-of-the-art chip like an NVIDIA H100 Tensor Core GPU (above) looks like a well-planned metropolis, built with tens of billions of transistors, connected on streets 10,000x thinner than a human hair.

Multiple engineering teams coordinate for as long as two years to construct one of these digital megacities.

Some groups define the chip’s overall architecture, some craft and place a variety of ultra-small circuits, and others test their work. Each job requires specialized methods, software programs and computer languages.

A Broad Vision for LLMs

“I believe over time large language models will help all the processes, across the board,” said Mark Ren, an NVIDIA Research director and lead author on the paper.

Bill Dally, NVIDIA’s chief scientist, announced the paper today in a keynote at the International Conference on Computer-Aided Design, an annual gathering of hundreds of engineers working in the field called electronic design automation, or EDA.

“This effort marks an important first step in applying LLMs to the complex work of designing semiconductors,” said Dally at the event in San Francisco. “It shows how even highly specialized fields can use their internal data to train useful generative AI models.”

ChipNeMo Surfaces

The paper details how NVIDIA engineers created for their internal use a custom LLM, called ChipNeMo, trained on the company’s internal data to generate and optimize software and assist human designers.

Long term, engineers hope to apply generative AI to each stage of chip design, potentially reaping significant gains in overall productivity, said Ren, whose career spans more than 20 years in EDA.

After surveying NVIDIA engineers for possible use cases, the research team chose three to start: a chatbot, a code generator and an analysis tool.

Initial Use Cases

The latter — a tool that automates the time-consuming tasks of maintaining updated descriptions of known bugs — has been the most well-received so far.

A prototype chatbot that responds to questions about GPU architecture and design helped many engineers quickly find technical documents in early tests.

Animation of a generative AI code generator using an LLM
A code generator will help designers write software for a chip design.

A code generator in development (demonstrated above)  already creates snippets of about 10-20 lines of software in two specialized languages chip designers use. It will be integrated with existing tools, so engineers have a handy assistant for designs in progress.

Customizing AI Models With NVIDIA NeMo

The paper mainly focuses on the team’s work gathering its design data and using it to create a specialized generative AI model, a process portable to any industry.

As its starting point, the team chose a foundation model and customized it with NVIDIA NeMo, a framework for building, customizing and deploying generative AI models that’s included in the NVIDIA AI Enterprise software platform. The selected NeMo model sports 43 billion parameters, a measure of its capability to understand patterns. It was trained using more than a trillion tokens, the words and symbols in text and software.

Diagram of the ChipNeMo workflow for training a custom model
ChipNeMo provides an example of how one deeply technical team refined a pretrained model with its own data.

The team then refined the model in two training rounds, the first using about 24 billion tokens worth of its internal design data and the second on a mix of about 130,000 conversation and design examples.

The work is among several examples of research and proofs of concept of generative AI in the semiconductor industry, just beginning to emerge from the lab.

Sharing Lessons Learned

One of the most important lessons Ren’s team learned is the value of customizing an LLM.

On chip-design tasks, custom ChipNeMo models with as few as 13 billion parameters match or exceed performance of even much larger general-purpose LLMs like LLaMA2 with 70 billion parameters. In some use cases, ChipNeMo models were dramatically better.

Along the way, users need to exercise care in what data they collect and how they clean it for use in training, he added.

Finally, Ren advises users to stay abreast of the latest tools that can speed and simplify the work.

NVIDIA Research has hundreds of scientists and engineers worldwide focused on topics such as AI, computer graphics, computer vision, self-driving cars and robotics. Other recent projects in semiconductors include using AI to design smaller, faster circuits and to optimize placement of large blocks.

Enterprises looking to build their own custom LLMs can get started today using NeMo framework available from GitHub and NVIDIA NGC catalog.

Diagram of NVIDIA Hopper GPU

Next-Gen Neural Networks: NVIDIA Research Announces Array of AI Advancements at NeurIPS

25 October 2023 at 13:00

NVIDIA researchers are collaborating with academic centers worldwide to advance generative AI, robotics and the natural sciences — and more than a dozen of these projects will be shared at NeurIPS, one of the world’s top AI conferences.

Set for Dec. 10-16 in New Orleans, NeurIPS brings together experts in generative AI, machine learning, computer vision and more. Among the innovations NVIDIA Research will present are new techniques for transforming text to images, photos to 3D avatars, and specialized robots into multi-talented machines.

“NVIDIA Research continues to drive progress across the field — including generative AI models that transform text to images or speech, autonomous AI agents that learn new tasks faster, and neural networks that calculate complex physics,” said Jan Kautz, vice president of learning and perception research at NVIDIA. “These projects, often done in collaboration with leading minds in academia, will help accelerate developers of virtual worlds, simulations and autonomous machines.”

Picture This: Improving Text-to-Image Diffusion Models

Diffusion models have become the most popular type of generative AI models to turn text into realistic imagery. NVIDIA researchers have collaborated with universities on multiple projects advancing diffusion models that will be presented at NeurIPS.

  • A paper accepted as an oral presentation focuses on improving generative AI models’ ability to understand the link between modifier words and main entities in text prompts. While existing text-to-image models asked to depict a yellow tomato and a red lemon may incorrectly generate images of yellow lemons and red tomatoes, the new model analyzes the syntax of a user’s prompt, encouraging a bond between an entity and its modifiers to deliver a more faithful visual depiction of the prompt.
  • SceneScape, a new framework using diffusion models to create long videos of 3D scenes from text prompts, will be presented as a poster. The project combines a text-to-image model with a depth prediction model that helps the videos maintain plausible-looking scenes with consistency between the frames — generating videos of art museums, haunted houses and ice castles (pictured above).
  • Another poster describes work that improves how text-to-image models generate concepts rarely seen in training data. Attempts to generate such images usually result in low-quality visuals that aren’t an exact match to the user’s prompt. The new method uses a small set of example images that help the model identify good seeds — random number sequences that guide the AI to generate images from the specified rare classes.
  • A third poster shows how a text-to-image diffusion model can use the text description of an incomplete point cloud to generate missing parts and create a complete 3D model of the object. This could help complete point cloud data collected by lidar scanners and other depth sensors for robotics and autonomous vehicle AI applications. Collected imagery is often incomplete because objects are scanned from a specific angle — for example, a lidar sensor mounted to a vehicle would only scan one side of each building as the car drives down a street.

Character Development: Advancements in AI Avatars

AI avatars combine multiple generative AI models to create and animate virtual characters, produce text and convert it to speech. Two NVIDIA posters at NeurIPS present new ways to make these tasks more efficient.

  • A poster describes a new method to turn a single portrait image into a 3D head avatar while capturing details including hairstyles and accessories. Unlike current methods that require multiple images and a time-consuming optimization process, this model achieves high-fidelity 3D reconstruction without additional optimization during inference. The avatars can be animated either with blendshapes, which are 3D mesh representations used to represent different facial expressions, or with a reference video clip where a person’s facial expressions and motion are applied to the avatar.
  • Another poster by NVIDIA researchers and university collaborators advances zero-shot text-to-speech synthesis with P-Flow, a generative AI model that can rapidly synthesize high-quality personalized speech given a three-second reference prompt. P-Flow features better pronunciation, human likeness and speaker similarity compared to recent state-of-the-art counterparts. The model can near-instantly convert text to speech on a single NVIDIA A100 Tensor Core GPU.

Research Breakthroughs in Reinforcement Learning, Robotics

In the fields of reinforcement learning and robotics, NVIDIA researchers will present two posters highlighting innovations that improve the generalizability of AI across different tasks and environments.

  • The first proposes a framework for developing reinforcement learning algorithms that can adapt to new tasks while avoiding the common pitfalls of gradient bias and data inefficiency. The researchers showed that their method — which features a novel meta-algorithm that can create a robust version of any meta-reinforcement learning model — performed well on multiple benchmark tasks.
  • Another by an NVIDIA researcher and university collaborators tackles the challenge of object manipulation in robotics. Prior AI models that help robotic hands pick up and interact with objects can handle specific shapes but struggle with objects unseen in the training data. The researchers introduce a new framework that estimates how objects across different categories are geometrically alike — such as drawers and pot lids that have similar handles — enabling the model to more quickly generalize to new shapes.

Supercharging Science: AI-Accelerated Physics, Climate, Healthcare

NVIDIA researchers at NeurIPS will also present papers across the natural sciences — covering physics simulations, climate models and AI for healthcare.

  • To accelerate computational fluid dynamics for large-scale 3D simulations, a team of NVIDIA researchers proposed a neural operator architecture that combines accuracy and computational efficiency to estimate the pressure field around vehicles — the first deep learning-based computational fluid dynamics method on an industry-standard, large-scale automotive benchmark. The method achieved 100,000x acceleration on a single NVIDIA Tensor Core GPU compared to another GPU-based solver, while reducing the error rate. Researchers can incorporate the model into their own applications using the open-source neuraloperator library.


  • A consortium of climate scientists and machine learning researchers from universities, national labs, research institutes, Allen AI and NVIDIA collaborated on ClimSim, a massive dataset for physics and machine learning-based climate research that will be shared in an oral presentation at NeurIPS. The dataset covers the globe over multiple years at high resolution — and machine learning emulators built using that data can be plugged into existing operational climate simulators to improve their fidelity, accuracy and precision. This can help scientists produce better predictions of storms and other extreme events.
  • NVIDIA Research interns are presenting a poster introducing an AI algorithm that provides personalized predictions of the effects of medicine dosage on patients. Using real-world data, the researchers tested the model’s predictions of blood coagulation for patients given different dosages of a treatment. They also analyzed the new algorithm’s predictions of the antibiotic vancomycin levels in patients who received the medication — and found that prediction accuracy significantly improved compared to prior methods.

NVIDIA Research comprises hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

Street View to the Rescue: Deep Learning Paves the Way to Safer Buildings

23 October 2023 at 20:30

Images such as those in Google Street View are taking on a new purpose in the hands of University of Florida Assistant Professor of Artificial Intelligence Chaofeng Wang.

He’s using them, along with deep learning, in a research project to automate the evaluation of urban buildings. The project aims to help governments mitigate natural disaster damage by providing the information needed for decision-makers to bolster building structures or perform post-disaster recovery.

After a natural disaster such as an earthquake, local governments send teams to check and evaluate building conditions. Manually done, it can take up to months to go through the full stock of a city.

Wang’s project uses AI to accelerate the evaluation process — cutting the time needed to a few hours. The AI model is trained using images sourced from Google Street View and local governments to assign scores to buildings based on Federal Emergency Management Agency (FEMA) P-154 standards, which provide assessment guidelines based on factors like wall material, structure type, building age and more. Wang also collaborated with the World Bank Global Program for Resilient Housing to collect images and perform annotations, which were used to improve the model.

The collected images are placed in a data repository. The AI model reads the repository and performs inference on the images — a process accelerated by NVIDIA DGX A100 systems.

“Without NVIDIA GPUs, we wouldn’t have been able to do this,” Wang said. “They significantly accelerate the process, ensuring timely results.”

Wang used the DGX A100 nodes in the University of Florida’s supercomputer, HiPerGator. HiPerGator is one of the world’s fastest AI supercomputers in academia, delivering 700 petaflops of AI performance, and was built with the support of NVIDIA founder and UF alumnus Chris Malachowsky and hardware, software, training and services from NVIDIA.

The AI model’s output is compiled into a database that feeds into a web portal, which shows information — including the safety assessment score, building type and even roof or wall material — in a map-based format.

Wang’s work was funded by the NVIDIA Applied Research Accelerator Program, which supports research projects that have the potential to make a real-world impact through the deployment of NVIDIA-accelerated applications adopted by commercial and government organizations.

A Helping Eye

Wang says that the portal can serve different needs depending on the use case. To prepare for a natural disaster, a government can use predictions solely from street view images.

“Those are static images — one example is Google Street View images, which get updated every several years,” he said. “But that’s good enough for collecting information and getting a general understanding about certain statistics.”

But for rural areas or developing regions, where such images aren’t available or not frequently updated, governments can collect the images themselves. Powered by NVIDIA GPUs, the timely delivery of building assessments can help accelerate analyses.

Wang also suggests that with enough refinement, his research could also create ripples for the urban planning and insurance industries.

The project is currently being tested by a few local governments in Mexico and is garnering interest in some African, Asian and South American countries. At its current state, it can achieve over 85% accuracy in its assessment scores, per ‌FEMA P-154 standards.

Survey of the Land

One challenge Wang cites is the variation in urban landscapes in different countries. Different regions have their own cultural and architectural styles. Not trained on a large or diverse enough pool of images, the AI model could be thrown off by factors like paint color when performing wall material analysis. Another challenge is urban density variation.

“It is a very general limitation of current AI technology,” Wang said. “In order to be useful, it requires enough training data to represent the distribution of the real world, so we’re putting efforts into the data collection process to solve the generalization issue.”

To overcome this challenge, Wang aims to train and test the model for more cities. So far, he’s tested about eight cities in different countries.

“We need to generate more detailed and high-quality annotations to train the model with,” he said. “That is the way we can improve the model in the future so that it can be used more widely.”

Wang’s goal is to get the project to a point where it can be deployed as a service for more general industry use.

“We are creating application programming interfaces that can estimate and analyze buildings and households to allow seamless integration with other products,” he said. “We are also building a user-friendly application that all government agencies and organizations can use.”

Eureka! NVIDIA Research Breakthrough Puts New Spin on Robot Learning

By: Angie Lee
20 October 2023 at 13:00

A new AI agent developed by NVIDIA Research that can teach robots complex skills has trained a robotic hand to perform rapid pen-spinning tricks — for the first time as well as a human can.

The stunning prestidigitation, showcased in the video above, is one of nearly 30 tasks that robots have learned to expertly accomplish thanks to Eureka, which autonomously writes reward algorithms to train bots.

Eureka has also taught robots to open drawers and cabinets, toss and catch balls, and manipulate scissors, among other tasks.

The Eureka research, published today, includes a paper and the project’s AI algorithms, which developers can experiment with using NVIDIA Isaac Gym, a physics simulation reference application for reinforcement learning research. Isaac Gym is built on NVIDIA Omniverse, a development platform for building 3D tools and applications based on the OpenUSD framework. Eureka itself is powered by the GPT-4 large language model.

“Reinforcement learning has enabled impressive wins over the last decade, yet many challenges still exist, such as reward design, which remains a trial-and-error process,” said Anima Anandkumar, senior director of AI research at NVIDIA and an author of the Eureka paper. “Eureka is a first step toward developing new algorithms that integrate generative and reinforcement learning methods to solve hard tasks.”

AI Trains Robots

Eureka-generated reward programs — which enable trial-and-error learning for robots — outperform expert human-written ones on more than 80% of tasks, according to the paper. This leads to an average performance improvement of more than 50% for the bots.

Robot arm taught by Eureka to open a drawer.

The AI agent taps the GPT-4 LLM and generative AI to write software code that rewards robots for reinforcement learning. It doesn’t require task-specific prompting or predefined reward templates — and readily incorporates human feedback to modify its rewards for results more accurately aligned with a developer’s vision.

Using GPU-accelerated simulation in Isaac Gym, Eureka can quickly evaluate the quality of large batches of reward candidates for more efficient training.

Eureka then constructs a summary of the key stats from the training results and instructs the LLM to improve its generation of reward functions. In this way, the AI is self-improving. It’s taught all kinds of robots — quadruped, bipedal, quadrotor, dexterous hands, cobot arms and others — to accomplish all kinds of tasks.

The research paper provides in-depth evaluations of 20 Eureka-trained tasks, based on open-source dexterity benchmarks that require robotic hands to demonstrate a wide range of complex manipulation skills.

The results from nine Isaac Gym environments are showcased in visualizations generated using NVIDIA Omniverse.

Humanoid robot learns a running gait via Eureka.

“Eureka is a unique combination of large language models and NVIDIA GPU-accelerated simulation technologies,” said Linxi “Jim” Fan, senior research scientist at NVIDIA, who’s one of the project’s contributors. “We believe that Eureka will enable dexterous robot control and provide a new way to produce physically realistic animations for artists.”

It’s breakthrough work bound to get developers’ minds spinning with possibilities, adding to recent NVIDIA Research advancements like Voyager, an AI agent built with GPT-4 that can autonomously play Minecraft.

NVIDIA Research comprises hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

Learn more about Eureka and NVIDIA Research.

NVIDIA AI Now Available in Oracle Cloud Marketplace

19 October 2023 at 19:00

Training generative AI models just got easier.

NVIDIA DGX Cloud AI supercomputing platform and NVIDIA AI Enterprise software are now available in Oracle Cloud Marketplace, making it possible for Oracle Cloud Infrastructure customers to access high-performance accelerated computing and software to run secure, stable and supported production AI in just a few clicks.

The addition — an industry first — brings new capabilities for end-to-end development and deployment on Oracle Cloud. Enterprises can get started from the Oracle Cloud Marketplace to train models on DGX Cloud, and then deploy their applications on OCI with NVIDIA AI Enterprise.

Oracle Cloud and NVIDIA Lift Industries Into Era of AI

Thousands of enterprises around the world rely on OCI to power the applications that drive their businesses. Its customers include leaders across industries such as healthcare, scientific research, financial services, telecommunications and more.

Oracle Cloud Marketplace is a catalog of solutions that offers customers flexible consumption models and simple billing. Its addition of DGX Cloud and NVIDIA AI Enterprise lets OCI customers use their existing cloud credits to integrate NVIDIA’s leading AI supercomputing platform and software into their development and deployment pipelines.

With DGX Cloud, OCI customers can train models for generative AI applications like intelligent chatbots, search, summarization and content generation.

The University at Albany, in upstate New York, recently launched its AI Plus initiative, which is integrating teaching and learning about AI across the university’s research and academic enterprise, in fields such as cybersecurity, weather prediction, health data analytics, drug discovery and next-generation semiconductor design. It will also foster collaborations across the humanities, social sciences, public policy and public health. The university is using DGX Cloud AI supercomputing instances on OCI as it builds out an on-premises supercomputer.

“We’re accelerating our mission to infuse AI into virtually every academic and research disciplines,” said Thenkurussi (Kesh) Kesavadas, vice president for research and economic development at UAlbany. “We will drive advances in healthcare, security and economic competitiveness, while equipping students for roles in the evolving job market.”

NVIDIA AI Enterprise brings the software layer of the NVIDIA AI platform to OCI. It includes NVIDIA NeMo frameworks for building LLMs, NVIDIA RAPIDS for data science and NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server for supercharging production AI. NVIDIA software for cybersecurity, computer vision, speech AI and more is also included. Enterprise-grade support, security and stability ensure a smooth transition of AI projects from pilot to production.

NVIDIA DGX Cloud generative AI training
NVIDIA DGX Cloud provides enterprises immediate access to AI supercomputing platform and software hosted by their preferred cloud provider.

AI Supercomputing Platform Hosted by OCI

NVIDIA DGX Cloud provides enterprises immediate access to an AI supercomputing platform and software.

Hosted by OCI, DGX Cloud provides enterprises with access to multi-node training on NVIDIA GPUs, paired with NVIDIA AI software, for training advanced models for generative AI and other groundbreaking applications.

Each DGX Cloud instance consists of eight NVIDIA Tensor Core GPUs interconnected with network fabric, purpose-built for multi-node training. This high-performance computing architecture also includes industry-leading AI development software and offers direct access to NVIDIA AI expertise so businesses can train LLMs faster.

OCI customers access DGX Cloud using NVIDIA Base Command Platform, which gives developers access to an AI supercomputer through a web browser. By providing a single-pane view of the customer’s AI infrastructure, Base Command Platform simplifies the management of multinode clusters.

NVIDIA AI Enterprise software
NVIDIA AI Enterprise software powers secure, stable and supported production AI and data science.

Software for Secure, Stable and Supported Production AI

NVIDIA AI Enterprise enables rapid development and deployment of AI and data science.

With NVIDIA AI Enterprise on Oracle Cloud Marketplace, enterprises can efficiently build an application once and deploy it on OCI and their on-prem infrastructure, making a multi- or hybrid-cloud strategy cost-effective and easy to adopt. Since NVIDIA AI Enterprise is also included in NVIDIA DGX Cloud, customers can streamline the transition from training on DGX Cloud to deploying their AI application into production with NVIDIA AI Enterprise on OCI, since the AI software runtime is consistent across the environments.

Qualified customers can purchase NVIDIA AI Enterprise and NVIDIA DGX Cloud with their existing Oracle Universal Credits.

Visit NVIDIA AI Enterprise and NVIDIA DGX Cloud on the Oracle Cloud Marketplace to get started today.

Making Machines Mindful: NYU Professor Talks Responsible AI

18 October 2023 at 13:00

Artificial intelligence is now a household term. Responsible AI is hot on its heels.

Julia Stoyanovich, associate professor of computer science and engineering at NYU and director of the university’s Center for Responsible AI, wants to make the terms “AI” and “responsible AI” synonymous.

In the latest episode of the NVIDIA AI Podcast, host Noah Kravitz ‌spoke with Stoyanovich about responsible AI, her advocacy efforts and how people can help.

Stoyanovich started her work at the Center for Responsible AI with basic research. She soon realized that what was needed were better guardrails, not just more algorithms.

As AI’s potential has grown, along with the ethical concerns surrounding its use, Stoyanovich clarifies that the “responsibility” lies with people, not AI.

“The responsibility refers to people taking responsibility for the decisions that we make individually and collectively about whether to build an AI system and how to build, test, deploy and keep it in check,” she said.

AI ethics is a related concern, used to refer to “the embedding of moral values and principles into the design, development and use of the AI,” she added.

Lawmakers have taken notice. For example, New York recently implemented a law that makes job candidate screening more transparent.

According to Stoyanovich, “the law is not perfect,” but “we can only learn how to regulate something if we try regulating” and converse openly with the “people at the table being impacted.”

Stoyanovich wants two things: for people to recognize that AI can’t predict human choices and that AI systems be transparent and accountable, carrying a “nutritional label.”

That process should include considerations on who is using AI tools, how they’re used to make decisions and who is subjected to those decisions, she said.

Stoyanovich urges people to “start demanding actions and explanations to understand” how AI is used at local, state and federal levels.

“We need to teach ourselves to help others learn about what AI is and why we should care,” she said. “So please get involved in how we govern ourselves, because we live in a democracy. We have to step up.”

You Might Also Like

Jules Anh Tuan Nguyen Explains How AI Lets Amputee Control Prosthetic Hand, Video Games
A postdoctoral researcher at the University of Minnesota discusses his efforts to allow amputees to control their prosthetic limb — right down to the finger motions — with their minds.

Overjet’s Ai Wardah Inam on Bringing AI to Dentistry
Overjet, a member of NVIDIA Inception, is moving fast to bring AI to dentists’ offices. Dr. Wardah Inam, CEO of the company, discusses using AI to improve patient care.

Immunai CTO and Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs
Luis Voloch talks about tackling the challenges of the immune system with a machine learning and data science mindset.

Subscribe to the AI Podcast: Now Available on Amazon Music

The AI Podcast is now available through Amazon Music.

In addition, get the AI Podcast through iTunesGoogle PodcastsGoogle PlayCastbox, DoggCatcher, OvercastPlayerFM, Pocket Casts, PodbayPodBean, PodCruncher, PodKicker, SoundcloudSpotifyStitcher and TuneIn.

Make the AI Podcast better. Have a few minutes to spare? Fill out this listener survey.


Striking Performance: Large Language Models up to 4x Faster on RTX With TensorRT-LLM for Windows

17 October 2023 at 13:00

Generative AI is one of the most important trends in the history of personal computing, bringing advancements to gaming, creativity, video, productivity, development and more.

And GeForce RTX and NVIDIA RTX GPUs, which are packed with dedicated AI processors called Tensor Cores, are bringing the power of generative AI natively to more than 100 million Windows PCs and workstations.

Today, generative AI on PC is getting up to 4x faster via TensorRT-LLM for Windows, an open-source library that accelerates inference performance for the latest AI large language models, like Llama 2 and Code Llama. This follows the announcement of TensorRT-LLM for data centers last month.

NVIDIA has also released tools to help developers accelerate their LLMs, including scripts that optimize custom models with TensorRT-LLM, TensorRT-optimized open-source models and a developer reference project that showcases both the speed and quality of LLM responses.

TensorRT acceleration is now available for Stable Diffusion in the popular Web UI by Automatic1111 distribution. It speeds up the generative AI diffusion model by up to 2x over the previous fastest implementation.

Plus, RTX Video Super Resolution (VSR) version 1.5 is available as part of today’s Game Ready Driver release — and will be available in the next NVIDIA Studio Driver, releasing early next month.

Supercharging LLMs With TensorRT

LLMs are fueling productivity — engaging in chat, summarizing documents and web content, drafting emails and blogs — and are at the core of new pipelines of AI and other software that can automatically analyze data and generate a vast array of content.

TensorRT-LLM, a library for accelerating LLM inference, gives developers and end users the benefit of LLMs that can now operate up to 4x faster on RTX-powered Windows PCs.

At higher batch sizes, this acceleration significantly improves the experience for more sophisticated LLM use — like writing and coding assistants that output multiple, unique auto-complete results at once. The result is accelerated performance and improved quality that lets users select the best of the bunch.

TensorRT-LLM acceleration is also beneficial when integrating LLM capabilities with other technology, such as in retrieval-augmented generation (RAG), where an LLM is paired with a vector library or vector database. RAG enables the LLM to deliver responses based on a specific dataset, like user emails or articles on a website, to provide more targeted answers.

To show this in practical terms, when the question “How does NVIDIA ACE generate emotional responses?” was asked of the LLaMa 2 base model, it returned an unhelpful response.

Better responses, faster.

Conversely, using RAG with recent GeForce news articles loaded into a vector library and connected to the same Llama 2 model not only returned the correct answer — using NeMo SteerLM — but did so much quicker with TensorRT-LLM acceleration. This combination of speed and proficiency gives users smarter solutions.

TensorRT-LLM will soon be available to download from the NVIDIA Developer website. TensorRT-optimized open source models and the RAG demo with GeForce news as a sample project are available at and

Automatic Acceleration

Diffusion models, like Stable Diffusion, are used to imagine and create stunning, novel works of art. Image generation is an iterative process that can take hundreds of cycles to achieve the perfect output. When done on an underpowered computer, this iteration can add up to hours of wait time.

TensorRT is designed to accelerate AI models through layer fusion, precision calibration, kernel auto-tuning and other capabilities that significantly boost inference efficiency and speed. This makes it indispensable for real-time applications and resource-intensive tasks.

And now, TensorRT doubles the speed of Stable Diffusion.

Compatible with the most popular distribution, WebUI from Automatic1111, Stable Diffusion with TensorRT acceleration helps users iterate faster and spend less time waiting on the computer, delivering a final image sooner. On a GeForce RTX 4090, it runs 7x faster than the top implementation on Macs with an Apple M2 Ultra. The extension is available for download today.

The TensorRT demo of a Stable Diffusion pipeline provides developers with a reference implementation on how to prepare diffusion models and accelerate them using TensorRT. This is the starting point for developers interested in turbocharging a diffusion pipeline and bringing lightning-fast inferencing to applications.

Video That’s Super

AI is improving everyday PC experiences for all users. Streaming video — from nearly any source, like YouTube, Twitch, Prime Video, Disney+ and countless others — is among the most popular activities on a PC. Thanks to AI and RTX, it’s getting another update in image quality.

RTX VSR is a breakthrough in AI pixel processing that improves the quality of streamed video content by reducing or eliminating artifacts caused by video compression. It also sharpens edges and details.

Available now, RTX VSR version 1.5 further improves visual quality with updated models, de-artifacts content played in its native resolution and adds support for RTX GPUs based on the NVIDIA Turing architecture — both professional RTX and GeForce RTX 20 Series GPUs.

Retraining the VSR AI model helped it learn to accurately identify the difference between subtle details and compression artifacts. As a result, AI-enhanced images more accurately preserve details during the upscaling process. Finer details are more visible, and the overall image looks sharper and crisper.

RTX Video Super Resolution v1.5 improves detail and sharpness.

New with version 1.5 is the ability to de-artifact video played at the display’s native resolution. The original release only enhanced video when it was being upscaled. Now, for example, 1080p video streamed to a 1080p resolution display will look smoother as heavy artifacts are reduced.

RTX VSR now de-artifacts video played at its native resolution.

RTX VSR 1.5 is available today for all RTX users in the latest Game Ready Driver. It will be available in the upcoming NVIDIA Studio Driver, scheduled for early next month.

RTX VSR is among the NVIDIA software, tools, libraries and SDKs — like those mentioned above, plus DLSS, Omniverse, AI Workbench and others — that have helped bring over 400 AI-enabled apps and games to consumers.

The AI era is upon us. And RTX is supercharging at every step in its evolution.

UK Tech Festival Showcases Startups Using AI for Creative Industries

12 October 2023 at 19:58

At one of the U.K.’s largest technology festivals, top enterprises and startups are this week highlighting their latest innovations, hosting workshops and celebrating the growing tech ecosystem based in the country’s southwest.

The Bristol Technology Festival today showcased the work of nine startups that recently participated in a challenge hosted by Digital Catapult — the U.K. authority on advanced digital technology — in collaboration with NVIDIA.

The challenge, which ran for four months, supported companies in developing a prototype or extending an innovation that could transform experiences using reality capture, real-time collaboration and creation, or cross-platform content delivery.

It’s part of MyWorld, an initiative for pioneering creative technology focused on the western U.K.

Each selected startup was given £50,000 to help develop projects that foster the advancement of generative AI, digital twins and other groundbreaking technologies for use in creative industries.

Lux Aeterna Explores Generative AI for Visual Effects

Emmy Award-winning independent visual effects studio Lux Aeterna — which is using gen AI and neural networks for VFX production — deployed its funds to develop a generative AI-powered text-to-image toolkit for creating maps, or 2D images used to represent aspects of a scene, object or effect.

At the Bristol Technology Festival, Lux Aeterna demonstrated this technology, powered by NVIDIA RTX 40 Series GPUs, with a focus on its ability to generate parallax occlusion maps, a method of creating the effect of depth for 3D textured surfaces.

“Our goal is to tackle the unique VFX challenges with bespoke AI-assisted solutions, and to put these tools of the future into the hands of our talented artists,” said James Pollock, creative technologist at Lux Aeterna. “NVIDIA’s insightful feedback on our work as a part of the MyWorld challenge has been invaluable in informing our strategy toward innovation in this rapidly changing space.”

Meaning Machine Brings AI to Game Characters, Dialogue

Meaning Machine, a studio pioneering gameplay that uses natural language AI, used its funds from the challenge to develop a generative AI system for in-game characters and dialogue. Its Game Consciousness technology enables in-game characters to accurately talk about their world, in real time, so that every line of dialogue reflects the game developer’s creative vision.

Meaning Machine’s demo at today’s showcase invited attendees to experience its interrogation game, “Dead Meat,” in which players must chat with an in-game character — a murder suspect — with the aim of manipulating them into giving a confession.

A member of the NVIDIA Inception program for cutting-edge startups, Meaning Machine powers its generative AI technology for game development using the NVIDIA NeMo framework for building, customizing and deploying large language models.

“NVIDIA NeMo enables us to deliver scalable model tuning and inference,” said Ben Ackland, cofounder and chief technology officer at Meaning Machine. “We see potential for Game Consciousness to transform blockbuster games — delivering next-gen characters that feel at home in bigger, deeper, more complex virtual worlds — and our collaboration with NVIDIA will help us make this a reality sooner.”

More Startups Showcase AI for Creative Industries

Additional challenge participants that hosted demos today at the Bristol Technology Festival include:

  • Black Laboratory, an NVIDIA Inception member demonstrating a live puppet-performance capture system, puppix, that can seamlessly transfer the physicality of puppets to digital characters.
  • IMPRESS, which is developing an AI-powered launchpad for self-publishing indie video games. It offers data-driven market research for game development, marketing campaign support, press engagement tools and more.
  • Larkhall, which is expanding Otto, its AI system that generates live, reactive visuals based on musical performances, as well as automatic, expressive captioning for speech-based performances.
  • Motion Impossible, which is building a software platform for centralized control of its AGITO systems — free-roaming, modular, camera dolly systems for filmmaking.
  • Zubr and Uninvited Guests, two companies collaborating on the development of augmented- and virtual-reality tools for designing futuristic urban environments.

“NVIDIA’s involvement in the MyWorld challenge, led by Digital Catapult, has created extraordinary value for the participating teams,” said Sarah Addezio, senior innovation partner and MyWorld program lead at Digital Catapult. “We’ve seen the benefit of our cohort having access to industry-leading technical and business-development expertise, elevating their projects in ways that would not have been possible otherwise.”

Learn more about NVIDIA Inception and NVIDIA generative AI technologies.

Take the Wheel: NVIDIA NeMo SteerLM Lets Companies Customize a Model’s Responses During Inference

Developers have a new AI-powered steering wheel to help them hug the road while they drive powerful large language models (LLMs) to their desired locations.

NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it lets a single training run create one model that can serve dozens or even hundreds of use cases, saving time and money.

NVIDIA researchers created SteerLM to teach AI models what users care about, like road signs to follow in their particular use cases or markets. These user-defined attributes can gauge nearly anything — for example, the degree of helpfulness or humor in the model’s responses.

One Model, Many Uses

The result is a new level of flexibility.

With SteerLM, users define all the attributes they want and embed them in a single model. Then they can choose the combination they need for a given use case while the model is running.

For example, a custom model can now be tuned during inference to the unique needs of, say, an accounting, sales or engineering department or a vertical market.

The method also enables a continuous improvement cycle. Responses from a custom model can serve as data for a future training run that dials the model into new levels of usefulness.

Saving Time and Money

To date, fitting a generative AI model to the needs of a specific application has been the equivalent of rebuilding an engine’s transmission. Developers had to painstakingly label datasets, write lots of new code, adjust the hyperparameters under the hood of the neural network and retrain the model several times.

SteerLM replaces those complex, time-consuming processes with three simple steps:

  • Using a basic set of prompts, responses and desired attributes, customize an AI model that predicts how those attributes will perform.
  • Automatically generating a dataset using this model.
  • Training the model with the dataset using standard supervised fine-tuning techniques.

Many Enterprise Use Cases

Developers can adapt SteerLM to nearly any enterprise use case that requires generating text.

With SteerLM, a company might produce a single chatbot it can tailor in real time to customers’ changing attitudes, demographics or circumstances in the many vertical markets or geographies it serves.

SteerLM also enables a single LLM to act as a flexible writing co-pilot for an entire corporation.

For example, lawyers can modify their model during inference to adopt a formal style for their legal communications. Or marketing staff can dial in a more conversational style for their audience.

Game On With SteerLM

To show the potential of SteerLM, NVIDIA demonstrated it on one of its classic applications — gaming (see the video below).

Today, some games pack dozens of non-playable characters — characters that the player can’t control — which mechanically repeat prerecorded text, regardless of the user or situation.

SteerLM makes these characters come alive, responding with more personality and emotion to players’ prompts. It’s a tool game developers can use to unlock unique new experiences for every player.

The Genesis of SteerLM

The concept behind the new method arrived unexpectedly.

“I woke up early one morning with this idea, so I jumped up and wrote it down,” recalled Yi Dong, an applied research scientist at NVIDIA who initiated the work on SteerLM.

While building a prototype, he realized a popular model-conditioning technique could also be part of the method. Once all the pieces came together and his experiment worked, the team helped articulate the method in four simple steps.

It’s the latest advance in model customization, a hot area in AI research.

“It’s a challenging field, a kind of holy grail for making AI more closely reflect a human perspective — and I love a new challenge,” said the researcher, who earned a Ph.D. in computational neuroscience at Johns Hopkins University, then worked on machine learning algorithms in finance before joining NVIDIA.

Get Hands on the Wheel

SteerLM is available as open-source software for developers to try out today. They can also get details on how to experiment with a Llama-2-13b model customized using the SteerLM method.

For users who want full enterprise security and support, SteerLM will be integrated into NVIDIA NeMo, a rich framework for building, customizing and deploying large generative AI models.

The SteerLM method works on all models supported on NeMo, including popular community-built pretrained LLMs such as Llama-2 and BLOOM.

Read a technical blog to learn more about SteerLM.

See notice regarding software product information.

Image for NVIDIA NeMo SteerLM

Keeping an AI on Quakes: Researchers Unveil Deep Learning Model to Improve Forecasts

6 October 2023 at 16:00

A research team is aiming to shake up the status quo for earthquake models.

Researchers from the Universities of California at Berkeley and Santa Cruz, and the Technical University of Munich recently released a paper describing a new model that delivers deep learning to earthquake forecasting.

Dubbed RECAST, the model can use larger datasets and offer greater flexibility than the current model standard, ETAS, which has improved only incrementally since its development in 1988, it argues.

The paper’s authors — Kelian Dascher-Cousineau, Oleksandr Shchur, Emily Brodsky and Stephan Günnemann — trained the model on NVIDIA GPU workstations.

“There’s a whole field of research that explores how to improve ETAS,” said Dacher-Cousineau, a postdoctoral researcher at UC Berkeley. “It’s an immensely useful model that has been used a lot, but it’s been frustratingly hard to improve on it.”

AI Drives Seismology Ahead 

The promise of RECAST is that its model flexibility, self-learning capability and ability to scale will enable it to interpret larger datasets and make better predictions during earthquake sequences, he said.

Model advances with improved forecasts could help agencies such as the U.S. Geological Survey and its counterparts elsewhere offer better information to those who need to know. Firefighters and other first responders entering damaged buildings, for example, could benefit from more reliable forecasts on aftershocks.

“There’s a ton of room for improvement within the forecasting side of things. And for a variety of reasons, our community hasn’t really dove into the machine learning side of things, partly because of being conservative and partly because these are really impactful decisions,” said Dacher-Cousineau.

RECAST Model Moves the Needle

While past work on aftershock predictions has relied on statistical models, this doesn’t scale to handle the larger datasets becoming available from an explosion of newly enhanced data capabilities, according to the researchers.

The RECAST model architecture builds on developments in neural temporal point processes, which are probabilistic generative models for continuous time event sequences. In a nutshell, the model has an encoder-decoder neural network architecture used for predicting the timing of a next event based on a history of past events.

Dacher-Cousineau said that releasing and benchmarking the model in the paper demonstrates that it can quickly learn to do what ETAS can do, while it holds vast potential to do more.

“Our model is a generative model that, just like a natural language processing model, you can generate paragraphs and paragraphs of words, and you can sample it and make synthetic catalogs,” said Dacher-Cousineau. “Part of the paper is there to convince old-school seismologists that this is a model that’s doing the right thing — we’re not overfitting.”

Boosting Earthquake Data With Enhanced Catalogs 

Earthquake catalogs, or records of earthquake data, for particular geographies can be small. That’s because to this day many come from seismic analysts who interpret scribbles of raw data that comes from seismometers. But this, too, is an area where AI researchers are building models to autonomously interpret these P waves and other signals in the data in real time.

Enhanced data is meanwhile helping to fill the void. With the labeled data in earthquake catalogs, machine learning engineers are revisiting these sources of raw data and building enhanced catalogs to get 10x to 100x the number of earthquakes for training data and categories.

“So it’s not necessarily that we put out more instruments to gather data but rather that we enhance the datasets,” said Dacher-Cousineau.

Applying Larger Datasets to Other Settings

With the larger datasets, the researchers are starting to see improvements from RECAST over the standard ETAS model.

To advance the state of the art in earthquake forecasting, Dascher-Cousineau is working with a team of undergraduates at UC Berkeley to train earthquake catalogs on multiple regions for better predictions.

“I have the natural language processing analogies in mind, where it seems very plausible that earthquake sequences in Japan are useful to inform earthquakes in California,” he said. “And you can see that going in the right direction.”

Learn about synthetic data generation with NVIDIA Omniverse Replicator

Brains of the Operation: Atlas Meditech Maps Future of Surgery With AI, Digital Twins

5 October 2023 at 13:00

Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation.

Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations.

Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI medical imaging framework and NVIDIA Omniverse 3D development platform — to build AI-powered decision support and high-fidelity surgery rehearsal platforms. Its mission: improving surgical outcomes and patient safety.

“The Atlas provides a collection of multimedia tools for brain surgeons, allowing them to mentally rehearse an operation the night before a real surgery,” said Dr. Aaron Cohen-Gadol, founder of Atlas Meditech and its nonprofit counterpart, Neurosurgical Atlas. “With accelerated computing and digital twins, we want to transform this mental rehearsal into a highly realistic rehearsal in simulation.”

Neurosurgical Atlas offers case studies, surgical videos and 3D models of the brain to more than a million online users. Dr. Cohen-Gadol, also a professor of neurological surgery at Indiana University School of Medicine, estimates that more than 90% of brain surgery training programs in the U.S. — as well as tens of thousands of neurosurgeons in other countries — use the Atlas as a key resource during residency and early in their surgery careers.

Atlas Meditech’s Pathfinder software is integrating AI algorithms that can suggest safe surgical pathways for experts to navigate through the brain to reach a lesion.

And with NVIDIA Omniverse, a platform for connecting and building custom 3D pipelines and metaverse applications, the team aims to create custom virtual representations of individual patients’ brains for surgery rehearsal.

Custom 3D Models of Human Brains

A key benefit of Atlas Meditech’s advanced simulations — either onscreen or in immersive virtual reality — is the ability to customize the simulations, so that surgeons can practice on a virtual brain that matches the patient’s brain in size, shape and lesion position.

“Every patient’s anatomy is a little different,” said Dr. Cohen-Gadol. “What we can do now with physics and advanced graphics is create a patient-specific model of the brain and work with it to see and virtually operate on a tumor. The accuracy of the physical properties helps to recreate the experience we have in the real world during an operation.”

To create digital twins of patients’ brains, the Atlas Pathfinder tool has adopted MONAI Label, which can support radiologists by automatically annotating MRI and CT scans to segment normal structures and tumors.

“MONAI Label is the gateway to any healthcare project because it provides us with the opportunity to segment critical structures and protect them,” said Dr. Cohen-Gadol. “For the Atlas, we’re training MONAI Label to act as the eyes of the surgeon, highlighting what is a normal vessel and what’s a tumor in an individual patient’s scan.”

With a segmented view of a patient’s brain, Atlas Pathfinder can adjust its 3D brain model to morph to the patient’s specific anatomy, capturing how the tumor deforms the normal structure of their brain tissue.

Based on the visualization — which radiologists and surgeons can modify to improve the precision — Atlas Pathfinder suggests the safest surgical approaches to access and remove a tumor without harming other parts of the brain. Each approach links out to the Atlas website, which includes a written tutorial of the operative plan.

“AI-powered decision support can make a big difference in navigating a highly complex 3D structure where every millimeter is critical,” Dr. Cohen-Gadol said.

Realistic Rehearsal Environments for Practicing Surgeons 

Atlas Meditech is using NVIDIA Omniverse to develop a virtual operating room that can immerse surgeons into a realistic environment to rehearse upcoming procedures. In the simulation, surgeons can modify how the patient and equipment are positioned.

Using a VR headset, surgeons will be able to work within this virtual environment, going step by step through the procedure and receiving feedback on how closely they are adhering to the target pathway to reach the tumor. AI algorithms can be used to predict how brain tissue would shift as a surgeon uses medical instruments during the operation, and apply that estimated shift to the simulated brain.

“The power to enable surgeons to enter a virtual, 3D space, cut a piece of the skull and rehearse the operation with a simulated brain that has very similar physical properties to the patient would be tremendous,” said Dr. Cohen-Gadol.

To better simulate the brain’s physical properties, the team adopted NVIDIA PhysX, an advanced real-time physics simulation engine that’s part of NVIDIA Omniverse. Using haptic devices, they were able to experiment with adding haptic feedback to the virtual environment, mimicking the feeling of working with brain tissue.

Envisioning AI, Robotics in the Future of Surgery Training

Dr. Cohen-Gadol believes that in the coming years AI models will be able to further enhance surgery by providing additional insights during a procedure. Examples include warning surgeons about critical brain structures that are adjacent to the area they’re working in, tracking medical instruments during surgery, and providing a guide to next steps in the surgery.

Atlas Meditech plans to explore the NVIDIA Holoscan platform for streaming AI applications to power these real-time, intraoperative insights. Applying AI analysis to a surgeon’s actions during a procedure can provide the surgeon with useful feedback to improve their technique.

In addition to being used for surgeons to rehearse operations, Dr. Cohen-Gadol says that digital twins of the brain and of the operating room could help train intelligent medical instruments such as microscope robots using Isaac Sim, a robotics simulation application developed on Omniverse.

View Dr. Cohen-Gadol’s presentation at NVIDIA GTC.

Subscribe to NVIDIA healthcare news.