Downloading an LLM model manually

Download pre-trained LLM models in GGUF format.

Before you begin

Alternatively, you can configure dominoiq.nsf so that the Domino IQ administration server downloads LLM models for you. For more information, see the Adding an LLM Model document.

About this task

Make sure that you have more available system RAM than the size of the model.
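As a quick sanity check before loading a model, you can compare the model file size against the RAM the system reports as available. This is a minimal sketch for Linux hosts; `check_ram` is a hypothetical helper, not part of Domino IQ, and it assumes GNU `stat` and `/proc/meminfo`.

```shell
# Sketch (Linux only): verify that available RAM exceeds the model file size.
# check_ram is a hypothetical helper; pass it the path to a downloaded .gguf file.
check_ram() {
    model="$1"
    model_bytes=$(stat -c %s "$model") || return 2   # GNU stat: size in bytes
    # /proc/meminfo reports MemAvailable in kB; convert to bytes.
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    avail_bytes=$((avail_kb * 1024))
    if [ "$avail_bytes" -gt "$model_bytes" ]; then
        echo "OK: model ${model_bytes} B, RAM available ${avail_bytes} B"
    else
        echo "WARNING: model is larger than available RAM" >&2
        return 1
    fi
}
```

For example, `check_ram /local/notesdata/llm_models/model.gguf` (an assumed path) prints a warning and returns nonzero if the model would not fit in currently available memory.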

Procedure

  1. Go to the Models tab on the Hugging Face site.
  2. Filter the downloadable GGUF models by selecting Text Generation under Tasks, GGUF under Libraries, any specific language, and specific license types under Licenses. Here is an example of the results if you filter Text Generation GGUF files in English with an MIT license.
  3. Select a model that fits your application needs. If you selected the Llama 3.x license, you can choose, for example, the lmstudio-community/Llama-3.2-1B-Instruct-GGUF model.
  4. On the same page, select one of the available quantized variants of the model. We recommend using 3B or 7B Llama 3.x models with 3-bit or 4-bit quantization levels, which are much smaller to load while still providing acceptable text generation quality. This example shows the metadata on the models.
  5. When you've made your choice, click the Download button to save the file to your computer.
  6. Copy the downloaded models in GGUF format (.gguf file extension) to the llm_models subdirectory under the Domino data directory, as described in Enabling Domino IQ servers. Make sure these files are readable by the user account running the Domino server.
  7. Save the document.
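On a Linux server, the download-and-copy steps above can be sketched as shell commands. The repository name, file name, data directory, and the "notes" user account below are examples only; substitute the model you chose and your server's actual Domino data path.

```shell
# Sketch of the download and staging steps (Linux). All names are examples.
REPO="lmstudio-community/Llama-3.2-1B-Instruct-GGUF"   # example model repository
FILE="Llama-3.2-1B-Instruct-Q4_K_M.gguf"               # example 4-bit quantized file
DATADIR="${DOMINO_DATA:-/local/notesdata}"             # assumed Domino data directory
URL="https://huggingface.co/$REPO/resolve/main/$FILE"  # Hugging Face file download URL
DEST="$DATADIR/llm_models/$FILE"

echo "Downloading $URL to $DEST"
# Uncomment to perform the actual download and make the file readable by the
# account running the Domino server (assumed here to be "notes"):
#   mkdir -p "$DATADIR/llm_models"
#   curl -fL -o "$DEST" "$URL"
#   chown notes:notes "$DEST" && chmod 644 "$DEST"
```

Hugging Face serves individual repository files under `/resolve/main/<filename>`, so the same pattern works for any GGUF file you picked in the steps above.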