Enterprise-grade infrastructure

From prototype
to production scale.

Pay-as-you-go from $10. Committed plans from $499/mo. Dedicated GPU instances from $5,000/mo. All 200+ models, one API.

Growth
Scaling teams
$499 / mo
monthly commitment
For teams moving past prototyping. Committed spend unlocks volume discounts and higher throughput.
15% discount on all models
600 req/min rate limit
All 200+ models
Priority support
Custom webhooks
Usage analytics
Get Started
Scale
Production workloads
$2,499 / mo
monthly commitment
For production systems demanding guaranteed performance, priority inference, and dedicated account management.
25% discount on all models
2,400 req/min rate limit
Priority inference queue
Dedicated account manager
99.9% SLA
Custom fine-tuning
Get Started
Dedicated
GPU instances
$5,000+ / mo
dedicated GPU · billed per GPU-hour
Isolated GPU instances (H100, A100, L40S) with dedicated capacity. No shared compute, no noisy neighbors.
Dedicated GPU instances
H100 ~$3.49/hr · A100 ~$2.09/hr
Custom fine-tuning included
99.99% SLA
Single-tenant isolation
Private endpoints
Talk to Sales
Enterprise
Custom everything
Custom
volume pricing + dedicated cluster
For organizations processing millions of tokens monthly. Full infrastructure control, compliance, and named engineering support.
30-50% volume discounts
Dedicated cluster
On-premises deployment option
SSO / SAML
SOC 2 · HIPAA (health-faith applications)
Named engineer
Custom SLA
Contact Sales

All tiers include

OpenAI-compatible API
Streaming
Function calling
31 languages
Usage monitoring
JSON mode
Batch processing (50% off)
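Because the API is OpenAI-compatible, requests use the standard chat-completions payload shape. A minimal sketch of a request body; the model identifier and endpoint path here are illustrative assumptions, not confirmed names:

```python
import json

# Minimal OpenAI-style chat completion request body. The model identifier
# "vedika-standard" and the endpoint path are illustrative assumptions.
payload = {
    "model": "vedika-standard",
    "messages": [{"role": "user", "content": "Summarize the Bhagavad Gita."}],
    "stream": True,                               # token-by-token streaming
    "response_format": {"type": "json_object"},   # JSON mode
}
body = json.dumps(payload)  # POST to the chat-completions endpoint with your API key
```

Any OpenAI-compatible client library should work unchanged once pointed at the XALEN base URL.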
Compute

GPU infrastructure on demand

Provision dedicated GPU instances for inference, fine-tuning, and training. Available on Dedicated and Enterprise tiers.

| GPU | VRAM | Price / hr | Best For |
| --- | --- | --- | --- |
| NVIDIA H100 SXM | 80 GB | $3.49/hr | Large models, fine-tuning, training |
| NVIDIA A100 SXM | 80 GB | $2.09/hr | Training, high-throughput inference |
| NVIDIA L40S | 48 GB | $1.19/hr | Inference, cost-efficient production |
| NVIDIA A10G | 24 GB | $0.75/hr | Lightweight inference, experimentation |

GPU instances billed per hour. Minimum 1-hour commitment. Multi-GPU clusters available on Enterprise. Pricing may vary by availability and region.
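As a rough illustration of per-GPU-hour billing, a short sketch using the rates from the table above (730 approximates the average hours in a month of continuous use; actual pricing may vary by availability and region):

```python
# Rough monthly cost for continuously running GPU instances, using the
# per-hour rates from the table above. 730 ≈ average hours per month.
HOURLY_RATES = {"H100": 3.49, "A100": 2.09, "L40S": 1.19, "A10G": 0.75}

def monthly_gpu_cost(gpu: str, gpus: int = 1, hours: float = 730) -> float:
    """Estimated monthly cost in USD for `gpus` instances of `gpu`."""
    return round(HOURLY_RATES[gpu] * gpus * hours, 2)

# e.g. a single H100 running around the clock:
# monthly_gpu_cost("H100")  -> 2547.7
```

This is only an estimate against list rates; it ignores the 1-hour minimum granularity and any negotiated discounts.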

Tenancy

Choose your isolation level

From shared multi-tenant pools to fully isolated on-premises deployments. Pick the tenancy model that matches your compliance and performance needs.

Default
Shared
Multi-tenant infrastructure with shared GPU pool. Automatic load balancing and scaling. Best for most workloads.
Available on all tiers
Scale+
Reserved
Guaranteed capacity with your own GPU allocation. No cold starts, predictable performance. No shared queue contention.
Scale tier and above
Dedicated
Dedicated
Isolated cluster with private network. Your own hardware, your own endpoints. Full network isolation and data sovereignty.
Dedicated tier and above
Enterprise
On-Premises
Deploy XALEN on your own infrastructure. Air-gapped deployments, custom compliance, full operational control.
Enterprise tier only
Per-Token Pricing

Transparent pricing per model

Every model has clear input and output pricing. Use our Vedika models for faith-domain tasks, or route to any open-source model at competitive rates.

Cost Calculator
Example at Vedika Standard rates: 1M input tokens ($0.60) + 500K output tokens ($0.90) = $1.50 total.
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Context |
| --- | --- | --- | --- | --- |
| Vedika Standard | XALEN | $0.60 | $1.80 | 128K |
| Vedika Fast | XALEN | $0.10 | $0.30 | 128K |
| Vedika Voice | XALEN | $0.02/sec | n/a | 31 langs |
| Llama 3.1 405B | Meta | $0.88 | $2.64 | 128K |
| Llama 3.1 70B | Meta | $0.54 | $1.62 | 128K |
| Mixtral 8x22B | Mistral AI | $0.60 | $1.80 | 65K |
| Qwen 2.5 72B | Alibaba | $0.54 | $1.62 | 128K |
| DeepSeek V3 | DeepSeek | $0.27 | $0.81 | 128K |
| Gemma 2 27B | Google | $0.20 | $0.60 | 8K |
| Command R+ | Cohere | $2.50 | $7.50 | 128K |

+190 more models: full pricing in the docs.

Batch processing pricing is 50% of the rates shown above. All prices in USD. Volume discounts available on Enterprise plans.
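The per-token math above can be sketched in a few lines. Rates are taken from the table; batch processing is billed at 50% of list, per the note above:

```python
# Token-cost estimator using per-1M-token rates from the pricing table.
# Batch processing is billed at 50% of the listed rates.
def token_cost(input_tokens: int, output_tokens: int,
               in_rate: float, out_rate: float,
               batch: bool = False) -> float:
    """Cost in USD; in_rate and out_rate are per 1M tokens."""
    cost = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate
    return round(cost * (0.5 if batch else 1.0), 4)

# Vedika Standard ($0.60 in / $1.80 out), 1M input + 500K output:
# token_cost(1_000_000, 500_000, 0.60, 1.80)              -> 1.5
# token_cost(1_000_000, 500_000, 0.60, 1.80, batch=True)  -> 0.75
```

Committed-tier discounts (15% on Growth, 25% on Scale) would apply on top of these list rates.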

FAQ

Frequently asked questions

How does Pay As You Go work?

Add a minimum of $10 to your wallet via Razorpay (UPI, cards, net banking). Use any of the 200+ models and pay per token consumed. Credits are valid for 1 year from purchase. No monthly fees, no commitments. When your balance hits zero, API requests return HTTP 402 (Payment Required) until you top up.
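On the client side, a 402 should be treated as a terminal "top up your wallet" signal, not a transient error worth retrying. A minimal sketch of that distinction; the `send` callable is injected here so the example stays self-contained:

```python
# Sketch of client-side handling for a zero balance: on HTTP 402, stop
# retrying and surface a top-up prompt; retry only on transient 5xx errors.
# `send` is any callable returning an HTTP status code (injected for testing).
def call_with_balance_check(send, max_retries: int = 3) -> str:
    for _attempt in range(max_retries):
        status = send()
        if status == 402:      # wallet empty: retrying will not help
            return "top-up-required"
        if status >= 500:      # transient server error: retry
            continue
        if status == 200:
            return "ok"
        return f"error-{status}"
    return "retries-exhausted"
```

In a real client, `send` would wrap the HTTP request to the API and the "top-up-required" branch would prompt the user to add credits.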

What do I get with Growth vs. Scale?

Growth ($499/mo) gives you 15% off all models, 600 req/min, priority support, and custom webhooks. Scale ($2,499/mo) upgrades that to 25% off, 2,400 req/min, priority inference queue, a dedicated account manager, 99.9% SLA, and custom fine-tuning. Both tiers include all 200+ models.
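The practical effect of the tier discounts is a lower per-1M-token rate on every model. A quick sketch applying the FAQ's percentages to a list rate (figures from the pricing table above):

```python
# Effective per-1M-token rate under a committed tier's discount.
def discounted_rate(list_rate: float, tier_discount: float) -> float:
    """list_rate is per 1M tokens; tier_discount is a fraction (0.15 = 15%)."""
    return round(list_rate * (1 - tier_discount), 4)

# Vedika Standard input, $0.60 per 1M tokens at list:
# discounted_rate(0.60, 0.15)  -> 0.51   (Growth, 15% off)
# discounted_rate(0.60, 0.25)  -> 0.45   (Scale, 25% off)
```

Whether a commitment is worthwhile then comes down to comparing these effective rates against your expected monthly token volume.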

How does Dedicated GPU pricing work?

Dedicated tier starts at $5,000/mo. You get isolated GPU instances billed per GPU-hour: H100 at ~$3.49/hr, A100 at ~$2.09/hr, L40S at ~$1.19/hr. Includes custom fine-tuning, 99.99% SLA, single-tenant isolation, and private endpoints. Contact sales for exact pricing based on your configuration.

Can I upgrade or downgrade my tier?

Upgrades are immediate — your new rate limits and discounts apply instantly. Downgrades take effect at the end of your current billing cycle. You can always fall back to Pay As You Go with no penalty. Contact billing@xalen.io for tier changes.

Do you offer discounts for temples and nonprofits?

Yes. Verified religious organizations and registered nonprofits receive 30% off all token pricing on any tier. Contact enterprise@xalen.io with your organization verification documents and we will apply the discount within 48 hours.

What compliance certifications do you support?

Enterprise tier includes SOC 2 compliance and HIPAA support for health-faith applications. We also offer SSO/SAML integration, data residency options, and custom compliance documentation. Contact our enterprise team for specific certification requirements.

Enterprise

Talk to our team

Processing millions of tokens monthly? Need dedicated infrastructure, custom SLAs, or on-premises deployment? Let us build a plan for your organization.

We typically respond within 1 business day.

Ready to build?
Start at $10 or talk to sales.

From pay-as-you-go prototyping to dedicated GPU clusters. Pick the tier that fits your stage.