Slurm Workload Manager: The go-to scheduler for HPC and AI workloads
Slurm Workload Manager is a cornerstone of high-performance computing (HPC) infrastructure, trusted by supercomputing centers worldwide for its scalability and flexibility. As AI workloads grow in size and complexity, Slurm is gaining traction among ML teams as well. In this article, we will look at why it remains relevant, how it supports GPU clusters and what to consider when using it in AI workflows.
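For readers who have not used Slurm before, here is a minimal sketch of what submitting a multi-GPU training job can look like; the script contents, resource counts and file names are illustrative assumptions, not details from the article.

```python
import subprocess
import tempfile

# Hypothetical Slurm batch script: 2 nodes with 8 GPUs each (--gres=gpu:8).
# The job name, script name and time limit are placeholder assumptions.
job_script = """#!/bin/bash
#SBATCH --job-name=llm-train
#SBATCH --nodes=2
#SBATCH --gres=gpu:8
#SBATCH --time=04:00:00

srun python train.py
"""

with tempfile.NamedTemporaryFile("w", suffix=".sbatch", delete=False) as f:
    f.write(job_script)
    path = f.name

# sbatch prints "Submitted batch job <id>" on success.
result = subprocess.run(["sbatch", path], capture_output=True, text=True, check=True)
print(result.stdout.strip())
```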
Agent 101: Launching production-grade agents at scale
To go from prototype to production, AI agents need more than just a good model. In this guide, we break down the four components that matter most: reliable LLMs, orchestration frameworks, evaluation tools, and memory systems. We cover how teams are using Nebius AI Studio with CrewAI, ADK, LangChain, and more to ship scalable, observability-friendly agent workflows, all powered by fast, cost-efficient inference.
From genome analysis to quantum chemistry: Nebius powers the next generation of biotech research with NVIDIA
As part of NVIDIA GTC Paris at VivaTech, Nebius announced deeper integration of the NVIDIA AI Enterprise software suite into its platform. This includes NVIDIA BioNeMo, a collection of tools, applications, generative AI solutions and pre-trained microservices (NVIDIA NIM) designed specifically for the biopharma sector.
What is object storage: Key differences from traditional storage explained
Learn the fundamentals of object storage, how it differs from traditional block storage and why it is becoming the go-to choice for modern data management. As data volumes explode and cloud deployments become the norm, traditional storage struggles to keep up with today’s largely unstructured data. That’s where object storage comes in — built for the cloud and designed to scale horizontally. This guide cuts through the jargon to show how object storage outperforms block storage where it counts.
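To make the access model concrete, here is a minimal sketch of reading and writing objects through an S3-compatible API with boto3; the endpoint URL, bucket and key are placeholder assumptions.

```python
import boto3

# Object storage is addressed by bucket + key over HTTP, not by file paths.
# The endpoint URL and bucket name below are placeholder assumptions.
s3 = boto3.client("s3", endpoint_url="https://storage.example.com")

# Write an object: the key is a flat identifier, not a directory path.
s3.put_object(Bucket="demo-bucket", Key="datasets/train.jsonl",
              Body=b'{"text": "hello"}')

# Read it back; metadata travels with the object.
obj = s3.get_object(Bucket="demo-bucket", Key="datasets/train.jsonl")
print(obj["Body"].read())
```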
Kubernetes: How to use it for AI workloads
Building and deploying AI systems at scale means juggling complex infrastructure — and that’s where Kubernetes shines. From managing GPU resources to scaling inference endpoints, Kubernetes brings structure and automation to the chaos of machine learning pipelines. In this article, we’ll break down how Kubernetes works, why it’s a natural fit for AI workloads and what best practices help keep things resilient, reproducible and production-ready.
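As a taste of what this looks like in practice, the sketch below requests a single GPU for a pod through the official Kubernetes Python client; it assumes a reachable cluster with the NVIDIA device plugin installed, and the image and namespace are placeholders.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (assumes a configured cluster).
config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-inference"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="server",
                image="my-registry/inference-server:latest",  # placeholder image
                # GPUs are scheduled like any other resource, via limits.
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```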
The role of compute cluster networking for AI training and inference
While earlier machine learning models could be trained on CPU servers with one or two GPUs, today’s generative AI models have billions of parameters — orders of magnitude more than their predecessors. Such models require terabytes of training data that can only be processed in parallel across multiple GPU servers. These servers work together in clusters to run the underlying computations that make the models work. This article explores GPU cluster networking technologies and their critical role in accelerating AI workloads.
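To illustrate why the network matters, here is a minimal sketch of the gradient all-reduce that data-parallel training performs on every step; it assumes a PyTorch job launched with torchrun and NCCL as the backend, and the tensor size is an arbitrary stand-in for a gradient buffer.

```python
import torch
import torch.distributed as dist

# This collective is what actually crosses the cluster network during
# data-parallel training. Assumes a launch via torchrun, which sets
# RANK/WORLD_SIZE/MASTER_ADDR environment variables for us.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

# Stand-in for a gradient tensor; 1 GiB of fp16 to make bandwidth visible.
grad = torch.ones(512 * 1024 * 1024, dtype=torch.float16, device="cuda")

dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # sums the tensor across all ranks
dist.destroy_process_group()
```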
Introduction to model distillation: Efficient knowledge transfer for AI applications
Model distillation is a powerful technique in machine learning where a compact “student” model learns to replicate the behavior of a larger, more complex “teacher” model on a given task. In this tutorial, we demonstrate how to perform distillation by using Nebius AI Studio to create a grammar-correcting model.
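As background for the tutorial, the sketch below shows the classic soft-label distillation loss in PyTorch, where the student matches the teacher’s softened output distribution; the temperature and weighting values are illustrative defaults, not settings from the tutorial.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend KL divergence to the teacher's softened distribution with
    ordinary cross-entropy on the ground-truth labels. T and alpha are
    illustrative defaults, not values from the tutorial."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales the soft-label gradients to match the hard-label term.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T ** 2)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```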
Incident post-mortem analysis: Outage of the S3 service in the eu-north1 region
A detailed analysis of the incident on May 5, 2025, that led to an outage of the S3 service in the eu-north1 region.
Serving Qwen3 models on Nebius AI Cloud by using SkyPilot and SGLang
Explore how to get Qwen3 running on Nebius AI Cloud with SkyPilot and SGLang. This setup lets you deploy both the massive 235B MoE model and the efficient 32B variant, combining high throughput, cost-effective scaling and robust multilingual support.
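Once the SkyPilot service is up, SGLang exposes an OpenAI-compatible endpoint, so querying the deployment can be as simple as the sketch below; the endpoint address and model name are placeholders for whatever your deployment reports.

```python
from openai import OpenAI

# SGLang serves an OpenAI-compatible API; the base URL here is a placeholder
# for the address your SkyPilot deployment reports (port 30000 is SGLang's default).
client = OpenAI(base_url="http://<your-endpoint>:30000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-32B",  # or the 235B MoE variant, depending on the deployment
    messages=[{"role": "user", "content": "Summarize what SGLang does."}],
)
print(response.choices[0].message.content)
```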
Serving Llama 4 models on Nebius AI Cloud with SkyPilot and SGLang
Let’s walk through how to get Llama 4 running on Nebius AI Cloud (recently integrated with SkyPilot) by using SGLang as the serving framework. This combo provides high throughput, efficient memory usage and none of the typical deployment headaches.
Understanding the Model Context Protocol: Architecture
As LLM-powered agents become more complex, integrating them with tools, APIs, and private data sources remains a major challenge. Model Context Protocol (MCP) offers a clean, open standard for connecting language models to real-world systems through a modular, plug-and-play interface. In this article, we explore how MCP works.
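To give a feel for the plug-and-play interface, here is a minimal MCP server sketched with the official Python SDK; the server name and tool are invented for illustration.

```python
from mcp.server.fastmcp import FastMCP

# Minimal MCP server sketch using the official Python SDK.
mcp = FastMCP("weather-demo")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a (stubbed) forecast for the given city."""
    return f"Sunny in {city}"  # a real server would call a weather API here

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio, so any compatible client can connect
```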
What is Apache Spark and how can it help with LLMs?
Large language models (LLMs) rely on fast data processing and distributed computing, making the efficiency of data processing tools a critical factor. Apache Spark streamlines text data preparation, enables parallel processing of massive datasets and simplifies the development of scalable ML workflows. This article explores Spark’s architecture, its advantages for data preparation and solutions to common limitations when working with LLM-scale models.
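As a small illustration of the kind of preparation Spark handles well, the sketch below loads a raw text corpus, filters out short documents and deduplicates in parallel; the paths and length threshold are placeholder assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Sketch of a typical LLM data-prep step in Spark: load raw text, drop
# trivially short documents and remove exact duplicates. Paths and the
# length threshold are placeholder assumptions.
spark = SparkSession.builder.appName("llm-data-prep").getOrCreate()

docs = spark.read.text("s3a://my-bucket/raw-corpus/*.txt")  # one row per line

cleaned = (
    docs.withColumn("length", F.length("value"))
        .filter(F.col("length") > 200)   # drop very short documents
        .dropDuplicates(["value"])       # exact deduplication, done in parallel
)

cleaned.write.mode("overwrite").parquet("s3a://my-bucket/clean-corpus/")
```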