Title: NVIDIA Support

URL Source: http://nvidia.custhelp.com/app/answers/detail/a_id/5487/~/tensorrt-extension-for-stable-diffusion-web-ui

Markdown Content:
This guide explains how to install and use the [TensorRT extension](https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT) for Stable Diffusion Web UI, using Automatic1111, the most popular Stable Diffusion distribution, as the example. The extension doubles the performance of Stable Diffusion by leveraging the Tensor Cores in NVIDIA RTX GPUs.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_01.png)

About Stable Diffusion and Automatic1111

[Stable Diffusion](https://stability.ai/stable-diffusion) is a generative AI image model that lets users generate images from simple text descriptions. Users typically access the model through distributions that wrap it in a UI and add advanced features. The most popular of these distributions is the [Web UI from Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui).

Support

The TensorRT Extension supports:

* Text-2-image and Image-2-image
* Stable Diffusion 1.5 and 2.1
* LoRA

Support for SDXL is coming in a future patch.

* GPU: NVIDIA RTX GPUs with 8GB of VRAM
* RAM: 16GB
* Connection: Internet connectivity during installation
* Driver: NVIDIA Studio Driver 537.58, Game Ready Driver 537.58, or NVIDIA RTX Enterprise Driver 537.58, or later

Setup Guide

To use the TensorRT Extension for Stable Diffusion, follow these steps:

1\. Install Stable Diffusion web UI from Automatic1111.

2\. Install the TensorRT Extension.

3\. Generate the TensorRT Engines for your desired resolutions.

4\. Configure Stable Diffusion web UI to use the TensorRT pipeline.

1\. Install Stable Diffusion Web UI from Automatic1111

If you already have the Stable Diffusion Web UI from Automatic1111 installed, skip to the next step. These instructions use the standalone installation.
There are other methods available to install the Web UI on [Automatic1111’s GitHub page](https://github.com/AUTOMATIC1111/stable-diffusion-webui).

2\. Move the sd.webui.zip file to a location on your local machine with enough free disk space (20GB or more).

3\. Extract the sd.webui.zip file.

4\. In the extracted folder, double-click the update.bat file. This updates the web UI to the most recent version.

a. When you run update.bat, you may get a warning pop-up from Windows. If so, click on More info \> Run Anyway.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_02.png)

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_03.png)

5\. Once the files have been updated, close the command prompt window.

6\. Double-click run.bat. The install process begins, and the necessary files are downloaded and installed on your computer. This can take several minutes and also downloads Stable Diffusion 1.5 (~4GB).

a. If the Windows warning appears, repeat the process above.

b. The installation will look like this:

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_04.png)

c. The installation is complete when you see this message in the console: Running on local URL: [http://127.0.0.1:7860](http://127.0.0.1:7860/). A webpage like the one below should also open. If it doesn’t, open a browser manually and go to [http://127.0.0.1:7860](http://127.0.0.1:7860/).

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_05.png)

2\. Installing the TensorRT Extension for Automatic1111

Next, install the TensorRT extension into the Automatic1111 installation.

1\. From the main UI tabs, click on Extensions \> Install from URL:

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_06.png)

2\. In the URL for extension’s git repository text box, enter the URL of the TensorRT extension repository (linked at the top of this guide) and click Install. The installation will download and install the necessary files.
This process will take several minutes depending on your internet connection.

3\. Once installation is complete, click on the Installed tab, make sure the TensorRT extension box is checked, and click on the Apply and restart UI button. The UI will refresh. This may take up to 2 minutes.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_07.png)

4\. After the UI restarts, there will be a new TensorRT tab.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_08.png)

3\. Building TensorRT Engines

TensorRT is the fastest way to run AI on NVIDIA RTX GPUs. TensorRT can generate optimizations specific to your exact GPU and to the AI model you want to run. These optimizations are called TensorRT Engines. The steps below explain how to generate a general-purpose default engine and how to create additional custom ones.

1\. Click on the TensorRT tab.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_09.png)

2\. The default engine will be automatically selected in the Preset dropdown. Click on Export Default Engine. This process should take between 4 and 10 minutes, depending on your GPU.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_10.png)

a. This generates a TensorRT Engine for Stable Diffusion 1.5 that covers image sizes between 512x512 and 768x768 and batch sizes from 1 to 4. This should cover the majority of use cases.

b. The first time you run this process, the extension also generates a TensorRT-optimized ONNX model. This only has to be done once.

c. The Output section at the bottom of the screen shows status information on the build process.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_11.png)

d. You can also check the console output to see the exact status.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_12.png)

3\. Once the process completes, you can generate other engines that you may want to use.
The extension automatically uses the best of the available engines, so you can generate as many as you want. Just note that each engine is ~2GB.

a. Note: There are Static and Dynamic presets. Static presets provide better performance but only work with the exact settings they are optimized for. Dynamic presets also consume more VRAM. You can read more about this in the Additional Information section below.

b. You can see the engines that have already been built at the bottom of the page, under the Available TensorRT engine-profiles section. Click on a checkpoint to see the engines available for it.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_13.png)

Note that this section only updates when the web UI starts, so if you have just generated engines you will need to restart the web UI.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_14.png)

4\. Activating TensorRT Image Generation

1\. From the main UI tabs, select Settings \> User Interface.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_15.png)

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_16.png)

2\. Add sd\_unet to the list of Quick Settings. This makes it possible to select the new TensorRT pipeline for image generation:

a. Click on Show all pages, then locate the \[info\] Quicksettings list.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_17.png)

b. Click in the Quicksettings list area and type: sd\_unet

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_18.png)

c. Then click on the sd\_unet item displayed to add it to the Quicksettings.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_19.png)

d. At the top of the Settings page, click on Apply Settings and then Reload UI.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_20.png)

e. The top area of the UI will now have a new dropdown for SD Unet. Click on the refresh button to load your new engines.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_21.png)

3\.
In the new SD Unet dropdown, select the \[TRT\] v1-5-pruned-emaonly engine.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_22.png)

a. The engine name shows the Stable Diffusion checkpoint the engine was generated for. Always verify that the TRT engine matches your currently loaded checkpoint.

b. When you select the TRT SD\_Unet from the dropdown, the extension automatically chooses the best TensorRT engine you have built, as long as at least one engine matches your output configuration settings.

And you are good to go! You can now go to the txt2img tab and start generating images with optimized performance. Happy prompting!

Additional Information

Creating Custom Engines

The TensorRT Exporter extension allows users to create custom engines that cover ranges beyond those provided by the engine presets. In the TensorRT Extension tab, expand the Advanced Settings section.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_23.png)

Users can create a new engine by setting the Min, Optimal and Max Batch Size; the Min, Optimal and Max Height; the Min, Optimal and Max Width; and the Min, Optimal and Max Prompt Size. For the optimal sizes, choose the value within the specified range that you use most often. To build a static-shape engine, check the Use static shapes checkbox.

![](http://nvidia.custhelp.com/rnt/rnw/img/enduser/aid_5487_24.png)

When building a static-shape engine, you can only enter the optimal values, and the engine will be limited to exactly those values.

Dynamic Vs Static Engines

The TensorRT Export extension provides presets for both Static and Dynamic engine creation.

* Static Engines can only be configured to match a single resolution and batch size. Static engines provide the best performance at the cost of flexibility.
Static engines use the least amount of VRAM.
* Dynamic Engines can be configured for a range of height and width resolutions and a range of batch sizes. Dynamic engines generally offer slightly lower performance than static engines, but a single dynamic engine covers a much wider range of output settings.
* Performance or Flexibility…why not both? The TensorRT extension allows you to create both static and dynamic engines and will automatically choose the best engine for your needs. For example, you can create a dynamic engine that covers heights and widths from 512 to 768 with batch sizes of 1 to 4, and also create a static engine for 768x768 with a batch size of 2.
  * In this example, if you choose an output of 768x768 with a batch size of 1, the dynamic engine will automatically be used.
  * But if you choose an output of 768x768 with a batch size of 2, the static engine will automatically be used, providing better performance.
  * Any other configuration inside the dynamic range, such as 768 width by 704 height with a batch size of 3, will automatically use the dynamic engine.
  * By building static engines for the output configurations you use most often, along with dynamic engines that cover a range of output options, you get both the best performance and the best flexibility.

LoRA (Experimental)

To use LoRA checkpoints with TensorRT, install them normally and go to the LoRA tab within the TensorRT Extension.

1\. Select an available LoRA checkpoint from the dropdown.

2\. Export the model. This should take about a minute.

3\. Select the LoRA checkpoint from the sd\_unet dropdown in the main UI.
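
The automatic engine choice described under Dynamic Vs Static Engines can be pictured with a short Python sketch. This is only an illustration of the behavior described in this guide, not the extension's actual code: the `EngineProfile` class, the `select_engine` function, and the "tightest matching range wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class EngineProfile:
    """Hypothetical record of the ranges one engine was built for."""
    name: str
    batch: Tuple[int, int]   # (min, max), inclusive
    height: Tuple[int, int]  # (min, max), inclusive
    width: Tuple[int, int]   # (min, max), inclusive

    def covers(self, batch: int, height: int, width: int) -> bool:
        """True if the requested output settings fall inside every range."""
        return (self.batch[0] <= batch <= self.batch[1]
                and self.height[0] <= height <= self.height[1]
                and self.width[0] <= width <= self.width[1])

    def span(self) -> int:
        """Total width of all ranges; 0 for a static engine."""
        return sum(hi - lo for lo, hi in (self.batch, self.height, self.width))


def select_engine(engines: Sequence[EngineProfile],
                  batch: int, height: int, width: int) -> Optional[EngineProfile]:
    """Pick the tightest engine that covers the request (assumed rule).

    A static engine has span 0, so it wins over any dynamic engine
    that also covers the same settings. Returns None if nothing matches.
    """
    candidates = [e for e in engines if e.covers(batch, height, width)]
    return min(candidates, key=EngineProfile.span, default=None)


# The two engines from the example in this guide.
engines = [
    EngineProfile("dynamic 512-768, batch 1-4", (1, 4), (512, 768), (512, 768)),
    EngineProfile("static 768x768, batch 2", (2, 2), (768, 768), (768, 768)),
]

for batch, height, width in [(1, 768, 768), (2, 768, 768), (3, 704, 768)]:
    chosen = select_engine(engines, batch, height, width)
    print(f"{width}x{height}, batch {batch}: {chosen.name if chosen else 'no engine'}")
```

With the two example engines, the first and third requests resolve to the dynamic engine and the second to the static one, matching the cases listed above. A request outside every range, such as 1024x1024, returns `None` in this sketch, which corresponds to the case where no built engine matches the output configuration.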