Supported GitLab Duo Self-Hosted models and hardware requirements
- Tier: Premium, Ultimate
- Add-on: GitLab Duo Enterprise
- Offering: GitLab Self-Managed
Version history

- Introduced in GitLab 17.1 with a flag named `ai_custom_model`. Disabled by default.
- Enabled on GitLab Self-Managed in GitLab 17.6.
- Changed to require GitLab Duo add-on in GitLab 17.6 and later.
- Feature flag `ai_custom_model` removed in GitLab 17.8.
- Generally available in GitLab 17.9.
- Changed to include Premium in GitLab 18.0.
GitLab Duo Self-Hosted supports integration with industry-leading models from Mistral, Meta, Anthropic, and OpenAI through your preferred serving platform.
You can choose from these supported models to match your specific performance needs and use cases.
In GitLab 18.3 and later, you can also bring your own compatible model, giving you the flexibility to experiment with additional language models beyond the officially supported options.
Supported models
Support for the following GitLab-supported large language models (LLMs) is generally available.
- Fully compatible: The model can likely handle the feature without any loss of quality.
- Largely compatible: The model supports the feature, but there might be compromises or limitations.
- Not compatible: The model is unsuitable for the feature, likely resulting in significant quality loss or performance issues. Models that are marked not compatible for a feature will not receive GitLab support for that specific feature.
Model family | Model | Supported platforms | Code completion | Code generation | GitLab Duo Chat |
---|---|---|---|---|---|
Mistral Codestral | Codestral 22B v0.1 | vLLM | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | Not applicable |
Mistral | Mistral 7B-it v0.3 1 | vLLM | {check-circle-dashed} Largely compatible | {check-circle-filled} Fully compatible | {dash-circle} Not compatible |
Mistral | Mixtral 8x7B-it v0.1 1 | vLLM, AWS Bedrock | {check-circle-dashed} Largely compatible | {check-circle-filled} Fully compatible | {check-circle-dashed} Largely compatible |
Mistral | Mixtral 8x22B-it v0.1 1 | vLLM | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-dashed} Largely compatible |
Mistral | Mistral Small 24B Instruct 2506 | vLLM | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible |
Claude 3 | Claude 3.5 Sonnet | AWS Bedrock | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible |
Claude 3 | Claude 3.7 Sonnet | AWS Bedrock | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible |
Claude 4 | Claude 4 Sonnet | AWS Bedrock | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible |
GPT | GPT-4 Turbo | Azure OpenAI | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-dashed} Largely compatible |
GPT | GPT-4o | Azure OpenAI | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible |
GPT | GPT-4o-mini | Azure OpenAI | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-dashed} Largely compatible |
Llama | Llama 3 8B | vLLM | {check-circle-dashed} Largely compatible | {check-circle-filled} Fully compatible | {dash-circle} Not compatible |
Llama | Llama 3.1 8B | vLLM | {check-circle-dashed} Largely compatible | {check-circle-filled} Fully compatible | {check-circle-dashed} Largely compatible |
Llama | Llama 3 70B | vLLM | {check-circle-dashed} Largely compatible | {check-circle-filled} Fully compatible | {dash-circle} Not compatible |
Llama | Llama 3.1 70B | vLLM | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible |
Llama | Llama 3.3 70B | vLLM | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible | {check-circle-filled} Fully compatible |
Footnotes:

1. This model is scheduled for deprecation in GitLab 18.5. Mistral Small 24B Instruct 2506 is the recommended alternative.
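If you want to sanity-check one of the vLLM-served models above before wiring it into GitLab, a minimal offline-inference sketch with vLLM's Python API might look like the following. The model ID and prompt are illustrative, and an actual GitLab Duo Self-Hosted deployment serves the model through vLLM's OpenAI-compatible server rather than this offline mode:

```python
# Minimal sketch: confirm a supported model loads and generates under vLLM.
# Assumes vLLM is installed and the GPU has enough VRAM for the checkpoint.
# The model ID below is illustrative; substitute the model you plan to deploy.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
params = SamplingParams(temperature=0.2, max_tokens=128)

outputs = llm.generate(["Write a short Python function that reverses a string."], params)
print(outputs[0].outputs[0].text)
```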
Bring your own compatible model
- Status: Beta
Version history
- Introduced in GitLab 18.3 as a beta.
You can bring your own compatible models to use with GitLab Duo features.
The general model family provides support for compatible models and platforms that adhere to the OpenAI API specification. Use this model family to try language models that are not explicitly supported by GitLab.
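As a quick compatibility smoke test, you can send a standard chat completion request to your platform's OpenAI-compatible endpoint. This is a minimal sketch; the base URL, API key, and model name are placeholders for your own deployment:

```python
# Minimal sketch: check that a self-hosted endpoint speaks the OpenAI chat API.
# BASE_URL, the API key, and the model name are placeholders, not real values.
import requests

BASE_URL = "http://your-serving-platform:8000/v1"

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "your-model-name",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()

# An OpenAI-compatible response nests the text under choices[0].message.content.
print(resp.json()["choices"][0]["message"]["content"])
```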
This feature is in beta and is therefore subject to change as we gather feedback and improve the integration:
- GitLab does not provide technical support for issues specific to your chosen model or platform.
- Not all GitLab Duo features are guaranteed to work optimally with every compatible model.
- Response quality, speed, and performance overall might vary significantly based on your model choice.
Model family | Model requirements | Supported platforms | Code completion | Code generation | GitLab Duo Chat |
---|---|---|---|---|---|
General | Any model compatible with the OpenAI API specification | Any platform that provides OpenAI-compatible API endpoints | {check-circle-dashed} Beta | {check-circle-dashed} Beta | {check-circle-dashed} Beta |
Experimental and beta models
The following models can be configured for the features marked below. They are in beta or experimental status, are still under evaluation, and are excluded from the "Customer Integrated Models" definition in the AI Functionality Terms:
Model family | Model | Supported platforms | Status | Code completion | Code generation | GitLab Duo Chat |
---|---|---|---|---|---|---|
OpenAI GPT | GPT OSS 20b | vLLM, AWS Bedrock, Azure OpenAI | Experimental | {check-circle} Yes | {check-circle} Yes | {check-circle} Yes |
OpenAI GPT | GPT OSS 120b | vLLM, AWS Bedrock, Azure OpenAI | Experimental | {check-circle} Yes | {check-circle} Yes | {check-circle} Yes |
CodeGemma | CodeGemma 2b | vLLM | Experimental | {check-circle} Yes | {dotted-circle} No | {dotted-circle} No |
CodeGemma | CodeGemma 7b-it | vLLM | Experimental | {dotted-circle} No | {check-circle} Yes | {dotted-circle} No |
CodeGemma | CodeGemma 7b-code | vLLM | Experimental | {check-circle} Yes | {dotted-circle} No | {dotted-circle} No |
Code Llama | Code-Llama 13b | vLLM | Experimental | {dotted-circle} No | {check-circle} Yes | {dotted-circle} No |
DeepSeek Coder | DeepSeek Coder 33b Instruct | vLLM | Experimental | {check-circle} Yes | {check-circle} Yes | {dotted-circle} No |
DeepSeek Coder | DeepSeek Coder 33b Base | vLLM | Experimental | {check-circle} Yes | {dotted-circle} No | {dotted-circle} No |
Mistral | Mistral 7B-it v0.2 | vLLM, AWS Bedrock | Experimental | {check-circle} Yes | {check-circle} Yes | {check-circle} Yes |
GitLab AI vendor models
- Status: Beta
Version history
- Introduced in GitLab 18.3, with a feature flag named `ai_self_hosted_vendored_features`. Disabled by default.
The availability of this feature is controlled by a feature flag. For more information, see the history.
GitLab AI vendor models integrate with GitLab-hosted AI gateway infrastructure to provide access to AI models curated and made available by GitLab. Instead of using your own self-hosted models, you can choose to use GitLab AI vendor models for specific GitLab Duo features.
To choose which features use GitLab AI vendor models, see Configure GitLab AI vendor models.
When enabled for a specific feature:
- All calls to those features configured with a GitLab AI vendor model use the GitLab-hosted AI gateway, not the self-hosted AI gateway.
- No detailed logs are generated in the GitLab-hosted AI gateway, even when AI logs are enabled. This prevents unintended leaks of sensitive information.
Hardware requirements
The following hardware specifications are the minimum requirements for running GitLab Duo Self-Hosted on-premises. Requirements vary significantly based on the model size and intended usage:
Base system requirements
- CPU:
  - Minimum: 8 cores (16 threads)
  - Recommended: 16+ cores for production environments
- RAM:
  - Minimum: 32 GB
  - Recommended: 64 GB for most models
- Storage:
  - SSD with sufficient space for model weights and data
GPU requirements by model size
Model size | Minimum GPU configuration | Minimum VRAM required |
---|---|---|
7B models (for example, Mistral 7B) | 1x NVIDIA A100 (40 GB) | 35 GB |
22B models (for example, Codestral 22B) | 2x NVIDIA A100 (80 GB) | 110 GB |
Mixtral 8x7B | 2x NVIDIA A100 (80 GB) | 220 GB |
Mixtral 8x22B | 8x NVIDIA A100 (80 GB) | 526 GB |
Use Hugging Face's memory utility to verify memory requirements.
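As a rough cross-check of these minimums, the dominant cost is the model weights: parameter count multiplied by bytes per parameter, plus headroom for the KV cache and serving buffers. The sketch below uses an assumed overhead multiplier rather than the exact methodology behind the table, so treat the Hugging Face utility as the authoritative source:

```python
# Back-of-the-envelope VRAM estimate: weights = params * bytes per parameter.
# The 1.2x overhead factor is an assumption, not a GitLab-published figure.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def estimate_vram_gb(params_billion: float, dtype: str = "fp16", overhead: float = 1.2) -> float:
    weights_gb = params_billion * BYTES_PER_PARAM[dtype]  # 1B fp16 params ~ 2 GB
    return weights_gb * overhead

# A 22B model in fp16: ~44 GB of weights, ~53 GB with overhead. The table's
# 110 GB minimum adds further room for long contexts and concurrent requests.
print(f"{estimate_vram_gb(22):.0f} GB")
```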
Response time by model size and GPU
Small machine
With an `a2-highgpu-2g` (2x NVIDIA A100 40 GB - 150 GB vRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 7.09 | 717.0 | 101.19 | 7.09 | 101.17 |
Mistral-7B-Instruct-v0.3 | 10 | 8.41 | 764.2 | 90.35 | 13.70 | 557.80 |
Mistral-7B-Instruct-v0.3 | 100 | 13.97 | 693.23 | 49.17 | 20.81 | 3331.59 |
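The Total TPS column follows from the others: total tokens generated across all concurrent requests divided by total wall-clock time. A quick check against the 10-request row above:

```python
# Reproduce "Total TPS" from the table: total tokens / total wall-clock time.
# Values come from the Mistral-7B-Instruct-v0.3 row with 10 requests.
num_requests = 10
avg_tokens_per_response = 764.2
total_time_sec = 13.70

total_tps = num_requests * avg_tokens_per_response / total_time_sec
print(f"{total_tps:.1f} tokens/sec")  # ~557.8, matching the table
```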
Medium machine
With an `a2-ultragpu-4g` (4x NVIDIA A100 40 GB - 340 GB vRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.80 | 499.0 | 131.25 | 3.80 | 131.23 |
Mistral-7B-Instruct-v0.3 | 10 | 6.00 | 740.6 | 122.85 | 8.19 | 904.22 |
Mistral-7B-Instruct-v0.3 | 100 | 11.71 | 695.71 | 59.06 | 15.54 | 4477.34 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.50 | 400.0 | 61.55 | 6.50 | 61.53 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 16.58 | 768.9 | 40.33 | 32.56 | 236.13 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 25.90 | 767.38 | 26.87 | 55.57 | 1380.68 |
Large machine
With an `a2-ultragpu-8g` (8x NVIDIA A100 80 GB - 1360 GB vRAM) machine on GCP or equivalent:
Model name | Number of requests | Average time per request (sec) | Average tokens in response | Average tokens per second per request | Total time for requests (sec) | Total TPS |
---|---|---|---|---|---|---|
Mistral-7B-Instruct-v0.3 | 1 | 3.23 | 479.0 | 148.41 | 3.22 | 148.36 |
Mistral-7B-Instruct-v0.3 | 10 | 4.95 | 678.3 | 135.98 | 6.85 | 989.11 |
Mistral-7B-Instruct-v0.3 | 100 | 10.14 | 713.27 | 69.63 | 13.96 | 5108.75 |
Mixtral-8x7B-Instruct-v0.1 | 1 | 6.08 | 709.0 | 116.69 | 6.07 | 116.64 |
Mixtral-8x7B-Instruct-v0.1 | 10 | 9.95 | 645.0 | 63.68 | 13.40 | 481.06 |
Mixtral-8x7B-Instruct-v0.1 | 100 | 13.83 | 585.01 | 41.80 | 20.38 | 2869.12 |
Mixtral-8x22B-Instruct-v0.1 | 1 | 14.39 | 828.0 | 57.56 | 14.38 | 57.55 |
Mixtral-8x22B-Instruct-v0.1 | 10 | 20.57 | 629.7 | 30.24 | 28.02 | 224.71 |
Mixtral-8x22B-Instruct-v0.1 | 100 | 27.58 | 592.49 | 21.34 | 36.80 | 1609.85 |
AI gateway hardware requirements
For recommendations on AI gateway hardware, see the AI gateway scaling recommendations.