Hugging Face Profile

The Hugging Face Profile supports both hosted models and custom endpoints, offering the most flexible model configuration in the library. This page describes how to select and configure Hugging Face models for use within the UnO Agentic AI Builder. To ensure seamless interaction within the chat interface and agentic workflows, users must select models specifically optimized for instruction following.

Before you begin

  • You must have a valid Hugging Face Credential configured in the Credential Library to authenticate this connection.

  • Ensure that all mandatory fields (marked with *) are completed accurately.

Tested Models

While the UnO Agentic AI Builder can connect to various models hosted on Hugging Face, only the following models currently support the Chat functionality required for interactive agents:

  • Qwen/Qwen2.5-7B-Instruct
  • Qwen/Qwen2.5-1.5B-Instruct
  • openai/gpt-oss-20b

The following models have been tested but are not supported for Chat functionality in the current version:

  • Incompatible Instruct models: Qwen/Qwen2.5-3B-Instruct, meta-llama/Llama-3.1-8B-Instruct, and dphn/dolphin-2.9.1-yi-1.5-34b.

  • Base models: Qwen/Qwen3-4B, Qwen/Qwen3-0.6B, and openai-community/gpt2. These models lack the instruction tuning required to follow conversational logic and will not function correctly within the Agentic AI Builder.
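Because only the tested models above support Chat, it can be useful to validate a model ID before saving a profile. The following is a minimal, hypothetical helper: the model lists are taken from this page, but the function itself is illustrative and not part of the UnO Agentic AI Builder API.

```python
# Hypothetical helper: classify a Hugging Face model ID against the
# tested-model lists on this page. These sets mirror the lists above;
# they are not fetched from the UnO Agentic AI Builder itself.
CHAT_SUPPORTED = {
    "Qwen/Qwen2.5-7B-Instruct",
    "Qwen/Qwen2.5-1.5B-Instruct",
    "openai/gpt-oss-20b",
}

KNOWN_UNSUPPORTED = {
    # Incompatible Instruct models
    "Qwen/Qwen2.5-3B-Instruct",
    "meta-llama/Llama-3.1-8B-Instruct",
    "dphn/dolphin-2.9.1-yi-1.5-34b",
    # Base models (no instruction tuning)
    "Qwen/Qwen3-4B",
    "Qwen/Qwen3-0.6B",
    "openai-community/gpt2",
}

def check_chat_support(model_id: str) -> str:
    """Return 'supported', 'unsupported', or 'untested' for a model ID."""
    if model_id in CHAT_SUPPORTED:
        return "supported"
    if model_id in KNOWN_UNSUPPORTED:
        return "unsupported"
    return "untested"
```

A model outside both lists is reported as "untested" rather than rejected, since the page only documents the models verified so far.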

Table 1. Mandatory fields
Option            Description
Profile name      A unique identifier for this configuration instance. This name will be used to reference this specific Hugging Face model setup in the Agentic AI Builder.
Models available  Select a supported model from the dropdown list.
Table 2. Optional fields
Option Description
Other model (overrides mandatory selection above) Type a specific model name if it does not appear in the standard dropdown list. This overrides the "Models available" selection.
Temperature Controls the randomness of the output (range: 0.0 to 1.0). Lower values make the output more deterministic.
Max New Tokens The maximum number of tokens to generate in the response.
Top P Nucleus sampling: The model considers the smallest set of tokens whose cumulative probability exceeds the threshold top_p.
Top K Samples from the k tokens with the highest probability.
Repetition Penalty Discourages the model from repeating the same text. Values above 1.0 penalize repetition; a value of 1.0 applies no penalty.
Do Sample (Checkbox) Enables sampling mode (checked by default). When cleared, the model picks the most likely token at each step (greedy decoding).
Streaming (Checkbox) Enables real-time token streaming for faster perceived response times.
Endpoint URL Provide a custom inference endpoint URL if you are using a dedicated Hugging Face Inference Endpoint.
Task Defines the task type sent to the endpoint. The default is typically text-generation.
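Taken together, the optional fields above correspond closely to the keyword arguments of a Hugging Face text-generation request (for example, huggingface_hub's InferenceClient.text_generation accepts similarly named parameters). The sketch below only assembles and validates such a parameter set locally; the function name, its defaults, and the range checks are illustrative assumptions, not the builder's actual implementation.

```python
# Illustrative sketch: collect this profile's optional fields into a
# parameter dict shaped like a Hugging Face text-generation request.
# The defaults here are assumptions, not the builder's documented defaults.
def build_generation_params(
    temperature: float = 0.7,
    max_new_tokens: int = 256,
    top_p: float = 0.9,
    top_k: int = 50,
    repetition_penalty: float = 1.0,
    do_sample: bool = True,
    stream: bool = False,
) -> dict:
    """Validate and bundle generation settings for an inference call."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in the range 0.0 to 1.0")
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "temperature": temperature,
        "max_new_tokens": max_new_tokens,
        "top_p": top_p,
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "do_sample": do_sample,
        "stream": stream,
    }
```

A lower temperature (for example, 0.2) makes agent responses more deterministic, which is often preferable for tool-calling workflows where reproducibility matters.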