
Xference products

Private AI Inference Platform: Powering Intelligent Solutions with Security and Performance




Xference's advanced inference engine transcends traditional sector boundaries, delivering precision-optimized AI solutions tailored to the unique data landscapes of each industry.

Become one of our early adopters by selecting the right plan for your needs and pre-registering for early access to our platform. You will then join a waiting list; once approved, you will receive an email with a login link. Happy Xferencing!

BETA

Standard

VIRTUAL PRIVATE INFRASTRUCTURE

Ideal for those who want to start using generative AI while keeping their privacy protected. Inference runs securely on your own server, and all your data remains stored within your private server.

Included in this plan:

  • 2 MB max file size
  • 100 MB file upload
  • 1M tokens / month
  • Unlimited prompts
  • 1 user
COMING SOON
BETA

Pro

VIRTUAL PRIVATE INFRASTRUCTURE

Designed for users who already rely on generative AI platforms such as GPT and now demand full data protection, whether for work or personal use.

Included in this plan:

  • 10 MB max file size
  • 1 GB file upload
  • 3M tokens / month
  • Unlimited prompts
  • 1 user
  • Plugs in to your mail, messaging, storage and database servers
COMING SOON
BETA

Team

VIRTUAL PRIVATE INFRASTRUCTURE

Built for professionals and enterprises that run large-scale inference workloads and demand uncompromising data protection.

Included in this plan:

  • 20 MB max file size
  • 3 GB file upload
  • 10M tokens / month
  • Unlimited prompts
  • Up to 3 users included, then pay per user
  • Plugs in to your mail, messaging, storage and database servers
COMING SOON

Enterprise

HOUSING OR ON-PREMISES

For large organizations that need full data sovereignty: choose between hosting your server in your own data center (on-premises) or deploying a dedicated physical server in Xference's data center (housing), ensuring your data stays fully isolated and under your control.

Included in this plan:

  • Unlimited file upload
  • Unlimited tokens / month
  • Unlimited prompts
  • Unlimited users
  • Unlimited API connections
BETA

API

DEDICATED SERVER SOLUTIONS

For developers and teams who rely on AI APIs but demand absolute data privacy, Xference's Custom API delivers the power of generative AI without the risks. Built for experts, our API enables seamless integration with your existing workflows, while ensuring all prompts, responses, and data are processed entirely within your private infrastructure. No data leaves your environment. No third-party exposure. Just high-performance, secure inference, tailored to your exact needs.

Included in this plan:

  • API-first design: Easy integration with existing systems
  • No model selection needed: Xference optimizes the best LLM for your use case
  • Enterprise-grade security: End-to-end encryption and full control
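As an illustration of the API-first design described above, a single inference call might be assembled as follows. Xference has not published its API schema, so the endpoint path, header names, and payload fields below are assumptions chosen to mirror common inference APIs; only the absence of a model field reflects the plan description (Xference selects the LLM server-side).

```python
import json

# Assumed endpoint on your private infrastructure (hypothetical URL).
API_URL = "https://your-private-server.example/v1/inference"


def build_inference_request(prompt: str, api_key: str, max_tokens: int = 256):
    """Assemble the URL, headers, and JSON body for one inference call.

    Note there is deliberately no 'model' field: per the plan description,
    Xference optimizes the best LLM for your use case server-side.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens})
    return API_URL, headers, body


url, headers, body = build_inference_request("Summarize this contract.", "sk-demo")
```

The request would then be sent with any HTTP client to your private server, so prompts and responses never leave your environment.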

Not sure which plan to choose? The comparison chart below can help you decide.

Feature          | Standard | Pro   | Team                                | Enterprise
Max file size    | 2 MB     | 10 MB | 20 MB                               | Unlimited
Max file upload  | 100 MB   | 1 GB  | 3 GB                                | Unlimited
Tokens / month   | 1M       | 10M   | 15M                                 | Unlimited
Prompt limit     | Unlimited | Unlimited | Unlimited                      | Unlimited
# of users       | 1        | 1     | Up to 3 included, then pay per user | Unlimited
API connections  | None     | Mail, messaging and database server | Mail, messaging and database server | Unlimited
Customer support | None     | Email | Email                               | Custom
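For a rough sense of where your workload lands, the chart's monthly token quotas can be checked programmatically. This is an illustrative sketch, not an official sizing tool: the quota figures are taken from the comparison chart above, and the prompt-size estimate in the example is invented.

```python
# Monthly token quotas per plan, from the comparison chart.
# float("inf") stands in for "Unlimited".
PLAN_QUOTAS = {
    "Standard": 1_000_000,
    "Pro": 10_000_000,
    "Team": 15_000_000,
    "Enterprise": float("inf"),
}


def pick_plan(tokens_per_month: int) -> str:
    """Return the first listed plan whose monthly quota covers the estimate."""
    for plan, quota in PLAN_QUOTAS.items():
        if tokens_per_month <= quota:
            return plan
    return "Enterprise"


# Example: ~2,000 prompts per month averaging 900 tokens each
# (prompt plus response) comes to 1.8M tokens.
estimate = 2_000 * 900
print(pick_plan(estimate))  # prints "Pro"
```

Other limits (file size, users, API connections) matter too, so treat the token quota as only one input to the decision.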

Other common questions

What are tokens?

Tokens are the small pieces that text is split into when it is processed by an AI model. Think of them like words or pieces of words. For example, the sentence "Hello, how are you?" might be split into 5 tokens: "Hello", ",", "how", "are", "you?".

AI systems count tokens to measure how much text you send and how much they process. The more tokens, the more computing power and time a request takes.

In Xference, all token usage is tracked to your account, and your data stays private: it is never shared or stored outside your server.
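The idea of counting tokens can be sketched as follows. This is an illustrative word-and-punctuation splitter, not Xference's actual tokenizer; real LLM tokenizers use learned subword vocabularies (e.g. byte-pair encoding), so their counts will differ from this rule's.

```python
import re

def rough_token_count(text: str) -> int:
    """Count word and punctuation chunks as stand-in 'tokens'.

    Illustrative only: a subword tokenizer would split text differently
    (for instance, long or rare words become several tokens).
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


print(rough_token_count("Hello, how are you?"))  # prints 6 with this rule
```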

Which models does Xference use?

Xference supports multiple open-source LLMs. The best models are tested and selected based on performance, speed, accuracy, and resource efficiency. You don't need to choose: Xference selects the optimal model for your use case.

Do I need to configure the hardware myself?

No. Xference's infrastructure is already optimized with the best GPUs for inference. You don't need to worry about hardware configuration; everything is handled for you.

Can Xference be installed on my own servers?

Yes. Xference can currently be installed on dedicated servers chosen together with the client, either in an on-premises setup or in a hosting facility.

Can I share my account with others?

No. All chats and token usage are linked to your account. Sharing access is not supported, to ensure data privacy and security.

Why does the Enterprise plan have no usage limits?

Because the only limit is the hardware configuration, which is defined together with the client to meet all their requirements.

Still not sure? Don't hesitate to reach out and talk to a human.

GET IN TOUCH