AI innovation is moving in-house. As you shift from leveraging cloud AI platforms to building powerful "AI factories" in your own data centers, you're increasing both opportunity and risk.
AI factories promise greater control, customization, and cost savings. However, they also introduce challenges for AI governance, IT, and security teams: autonomous AI agents, AI identity management complexities, unsanctioned model deployments, and excessive access to sensitive data.
If you’re not accustomed to securing highly specialized, high-performance environments, understanding the controls and resources necessary to reduce risk is more important now than ever.
This blog explores how to mitigate the risks of AI factories using three pillars: NIST's HPC security guidance, NVIDIA's secure-by-design AI factory architecture, and identity-centric security controls.
AI factories are on-premise or hybrid data centers purpose-built to develop, train, and deploy AI at scale.
Unlike traditional cloud-based AI services, AI factories give you full control over your data, infrastructure, and model pipelines. They typically combine high-performance GPU clusters, ultra-fast networking, parallel storage, and orchestration frameworks to industrialize the AI lifecycle, from raw data ingestion to model deployment, much like a factory produces goods from raw materials.
Unlike managed cloud services (where a third-party provider secures much of the stack), the organization running an AI factory is fully responsible for locking down these advanced environments.
Without effective identity management and privileged access controls, the AI engines driving innovation can become vectors for insider threats, data breaches, and compliance failures. In fact, insider misuse is a significant concern in high-performance computing (HPC)/AI clusters. Even authorized users might abuse high-performance systems, or malicious hackers might steal credentials to gain unauthorized access.
To address the unique nature of AI factories, NIST has developed the draft SP 800-234, High-Performance Computing (HPC) Security Overlay. This framework recognizes that large-scale AI training and simulation environments have requirements beyond traditional IT.
The goal is practical, performance-conscious security that can safeguard AI models and sensitive data without hindering the mission. This emphasis on performance-conscious security is particularly significant for organizations that tend to perceive security as a bottleneck to progress.
NIST's guidance provides a crucial roadmap, allowing you to confidently accelerate your AI initiatives, knowing security is built in rather than an afterthought. By offering a structured framework, NIST reduces the uncertainties and ad-hoc security decisions that can impede fast-paced AI development.
This approach transforms security from a reactive impediment into a proactive enabler of AI progress.
NIST's guidance emphasizes a zone-based architecture (isolating access, management, compute, and storage zones), with role-based access and least privilege enforced across those zones. It also highlights strong authentication, software governance, and comprehensive auditing, tailored to the performance and scale of AI environments. The table below highlights the most important controls in NIST SP 800-234.
| NIST Tailored Control | Risk Addressed in AI Factory |
| --- | --- |
| AC-2 Account Management (zone-based roles) | Prevents unmanaged or excessive accounts by tying every identity to an authorized role and zone access. |
| AC-6 Least Privilege (separate admin and user roles) | Avoids over-privileged machine learning (ML) workloads by enforcing clear separation of duties (admins shouldn't run jobs with root privileges). |
| CM-11 User-Installed Software (isolation and monitoring) | Blocks unsanctioned or "shadow AI" deployments by restricting unauthorized software installation and requiring oversight for user-developed code. |
| IA-2 / SC-8 Strong Authentication (Kerberos) | Prevents impersonation and spoofing by requiring multi-factor/Kerberos authentication for all users and using secure, non-routable or encrypted channels for sensitive data. |
| AU-2 Comprehensive Audit Logging | Ensures visibility into AI operations and helps detect unauthorized actions or anomalies, while balancing HPC performance (e.g., prioritizing critical logs in management zones). |
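To make the zone and role concepts concrete, here is a minimal, illustrative Python sketch (not taken from the NIST overlay itself) of how an AI factory might tie each identity to a role, limit that role to specific zones, and reserve privileged actions for administrators. The role names, zones, and actions are hypothetical examples.

```python
from dataclasses import dataclass

# Hypothetical zone/role model for illustration only; a real deployment would
# source these mappings from a directory service, not hard-coded dictionaries.
ZONES = {"access", "management", "compute", "storage"}

# Each role is allowed into a subset of zones (AC-2: accounts tied to roles/zones).
ROLE_ZONES = {
    "ml-engineer": {"access", "compute"},
    "data-steward": {"access", "storage"},
    "cluster-admin": {"access", "management"},
}

# Privileged actions are restricted to admin roles (AC-6: least privilege).
PRIVILEGED_ACTIONS = {"install_software", "modify_scheduler", "rotate_keytab"}

@dataclass
class Request:
    user: str
    role: str
    zone: str
    action: str

def is_allowed(req: Request) -> bool:
    """Return True only if the role may enter the zone and perform the action."""
    if req.zone not in ROLE_ZONES.get(req.role, set()):
        return False  # zone isolation: this role has no business in this zone
    if req.action in PRIVILEGED_ACTIONS and req.role != "cluster-admin":
        return False  # least privilege: only admins perform privileged actions
    return True

if __name__ == "__main__":
    print(is_allowed(Request("alice", "ml-engineer", "compute", "submit_job")))      # True
    print(is_allowed(Request("alice", "ml-engineer", "management", "submit_job")))   # False
    print(is_allowed(Request("bob", "data-steward", "storage", "install_software"))) # False
```

In a production environment, these mappings would come from a central directory and policy engine rather than in-memory dictionaries, but the decision logic is the same.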
NIST's HPC overlay makes it clear that identity-centric controls are non-negotiable in high-performance AI environments. Following these tailored controls allows you to create zone-isolated, least-privilege ecosystems where every account, process, and dataset is governed. The overlay effectively puts an "umbrella" of best practices over all your AI factories, ensuring that fundamentals like account management, authentication, and auditing are never sidelined even at extreme scale and speed.
NVIDIA's AI factory concept aligns seamlessly with NIST's guidance. It's essentially a secure-by-design blueprint for AI data centers. These AI-centric facilities integrate large GPU clusters with ultra-fast networks (for example, NVLink, Spectrum-X, and InfiniBand for RDMA), shared high-performance storage (often NFS or a parallel file system accelerated by RDMA), and AI-focused orchestration software (such as the NVIDIA AI Enterprise suite).
In this industrialized approach, data is the raw material, GPUs are the machinery, and AI models are the products. Organizations are increasingly adopting this on-premise AI factory model to regain control over sensitive data, manage costs, and customize AI development pipelines to their needs.
However, all is not rosy.
While NVIDIA's AI factory represents a revolutionary step for innovation, its immense power and flexibility paradoxically amplify security challenges. The very features that make these environments powerful—massive GPU clusters, complex software stacks, and operations requiring elevated privileges for performance—also create unique attack surfaces and identity management complexities that traditional security models often struggle to address.
In short, the AI factory can become a wild west of identities, credentials, and code if left unchecked.
NIST's zone-based approach directly helps here.
However, you need strong identity security tooling alongside the architecture to implement these practices effectively. This is where solutions like Delinea identity security come in.
Delinea pioneered security for large-scale computing clusters: Hadoop environments such as Cloudera, Hortonworks (since merged with Cloudera), and MapR (acquired by HPE), all of which share many traits with AI factories. The same identity and privilege controls that kept big data platforms in check apply directly to today's AI data centers.
Delinea integrates Linux-based AI clusters with Active Directory (AD), providing unified identities and Kerberos single sign-on across all nodes. A unified identity service is often required so that job processes on any node run as the user who submitted the job.
Users and services authenticate via centrally managed Kerberos tickets, eliminating hard-coded passwords and ensuring trust between components. This also enables strong authentication from the cluster to network-attached storage via NFSv4 where Kerberos and unified identity control access to the remote file systems. This addresses AC-2 and IA-2 controls by tying accounts to a single directory and enabling strong, token-based authentication.
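As a rough illustration of what "no hard-coded passwords" means in practice, the Python sketch below (not Delinea code) verifies that the submitting user already holds a valid Kerberos ticket before a job touches Kerberized NFSv4 storage, relying on the standard MIT Kerberos klist utility rather than embedded credentials. The user and workflow are placeholders.

```python
import getpass
import subprocess
import sys

def has_valid_kerberos_ticket() -> bool:
    """Return True if the invoking user holds a valid Kerberos ticket.

    `klist -s` (MIT Kerberos) exits 0 when the credential cache contains a
    valid, unexpired ticket; no passwords are read or stored by this script.
    """
    try:
        return subprocess.run(["klist", "-s"], check=False).returncode == 0
    except FileNotFoundError:
        return False  # Kerberos client tools are not installed on this node

def main() -> int:
    user = getpass.getuser()
    if not has_valid_kerberos_ticket():
        print(f"{user}: no valid Kerberos ticket; authenticate via SSO or kinit "
              "before accessing Kerberized NFSv4 storage.", file=sys.stderr)
        return 1
    # At this point the job can run as the submitting user, and access to the
    # remote file system is gated by that user's Kerberos identity rather than
    # a shared or hard-coded credential.
    print(f"{user}: ticket present; proceeding with job submission.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```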
AI factories' unique scale and automation introduce a critical new dimension to identity management—the proliferation of machine and service identities.
High-performance systems often rely on service accounts for schedulers, data movers, AI microservices, or Model Context Protocol (MCP) servers. These often-overlooked accounts, running with elevated privileges for performance, become significant attack vectors if they are not centrally managed and secured.
Delinea's Server Suite automates account provisioning, credential rotation, and lifecycle management of these accounts (including Kerberos keytab distribution). This prevents credential sprawl and human error—no orphaned or default passwords—mitigating the risk of unmanaged identities or forgotten backdoor accounts.
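The following simplified Python sketch illustrates the lifecycle idea in the abstract; it does not call any Delinea API. It reviews a hypothetical inventory of service accounts, flags orphaned accounts, and triggers a placeholder rotation step for credentials older than the rotation window.

```python
from datetime import datetime, timedelta, timezone

ROTATION_WINDOW = timedelta(days=30)

# Hypothetical inventory for illustration; a real deployment would pull this
# from a central identity platform rather than a literal list.
SERVICE_ACCOUNTS = [
    {"name": "svc-scheduler", "owner": "batch-system",
     "last_rotated": datetime(2025, 1, 5, tzinfo=timezone.utc)},
    {"name": "svc-datamover", "owner": "storage-team",
     "last_rotated": datetime(2025, 3, 20, tzinfo=timezone.utc)},
    {"name": "svc-legacy", "owner": None,
     "last_rotated": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]

def rotate_credential(account_name: str) -> None:
    """Placeholder for the actual rotation step (e.g., reissuing a keytab)."""
    print(f"rotating credential for {account_name}")

def review_service_accounts(now: datetime) -> None:
    for acct in SERVICE_ACCOUNTS:
        if acct["owner"] is None:
            # Orphaned accounts are prime backdoor candidates; flag for removal.
            print(f"ORPHANED: {acct['name']} has no owning service; disable and investigate")
            continue
        if now - acct["last_rotated"] > ROTATION_WINDOW:
            rotate_credential(acct["name"])

if __name__ == "__main__":
    review_service_accounts(datetime.now(timezone.utc))
```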
By mapping AD Groups to role-based privileges on cluster resources, Delinea enforces least privilege (fulfilling AC-6). Administrators have just-in-time privileged access: for example, an admin can elevate to root on a specific node for maintenance with MFA and audit logging enforced, but cannot use that privilege to run regular AI workloads. Unprivileged users cannot escalate their rights.
This separation and control closes the door on root or sudo rights abuse, tackling the insider threat head-on. Multi-factor authentication (MFA) is applied to all sensitive actions, adding an extra hurdle against compromised credentials.
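For illustration, here is a conceptual Python sketch of the just-in-time pattern described above; it is not Delinea's implementation. It grants a short-lived, node-scoped elevation only after an MFA check, refuses to let that grant run ordinary AI workload commands, and logs every decision. The MFA check, node names, and command list are placeholders.

```python
from datetime import datetime, timedelta, timezone

ELEVATION_TTL = timedelta(minutes=30)
WORKLOAD_COMMANDS = {"python", "torchrun", "mpirun"}  # jobs must not run as root

def mfa_verified(user: str) -> bool:
    """Stand-in for a real MFA challenge (push, OTP, or hardware token)."""
    return True  # assume the challenge succeeded for this illustration

def request_elevation(user: str, node: str, reason: str) -> dict | None:
    """Issue a short-lived, node-scoped elevation grant, or None if refused."""
    if not mfa_verified(user):
        return None
    grant = {
        "user": user,
        "node": node,
        "expires": datetime.now(timezone.utc) + ELEVATION_TTL,
        "reason": reason,
    }
    print(f"AUDIT: elevation granted to {user} on {node} for '{reason}'")
    return grant

def run_as_root(grant: dict, node: str, command: str) -> bool:
    """Allow a privileged command only within the grant's node and lifetime."""
    now = datetime.now(timezone.utc)
    if grant["node"] != node or now > grant["expires"]:
        print(f"AUDIT: denied root on {node}: grant out of scope or expired")
        return False
    if command.split()[0] in WORKLOAD_COMMANDS:
        print("AUDIT: denied: elevation may not be used to run AI workloads")
        return False
    print(f"AUDIT: {grant['user']} ran '{command}' as root on {node}")
    return True

if __name__ == "__main__":
    g = request_elevation("admin1", "gpu-node-07", "replace failed NIC driver")
    if g:
        run_as_root(g, "gpu-node-07", "modprobe -r old_nic_driver")   # allowed
        run_as_root(g, "gpu-node-07", "torchrun train.py")            # refused
```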
With Delinea, every access and administrative action on your AI factory infrastructure is recorded.
Delinea captures session logs, command histories, and security events across the environment (addressing AU-2 and related audit controls). In an AI context, this means the ability to trace which user or service accessed a training dataset, who modified a configuration, or which process initiated an unusual data transfer.
Comprehensive audit trails support compliance and are essential for forensic investigations in the event of an incident.
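As a simple illustration of the kinds of questions such an audit trail answers ("who touched this dataset, and did anything move somewhere unexpected?"), the Python sketch below filters a few hypothetical session events. Real deployments would query centrally collected logs; the event fields, destinations, and values here are placeholders.

```python
# Hypothetical audit events for illustration; field names are placeholders.
EVENTS = [
    {"ts": "2025-05-01T09:12:00Z", "actor": "alice", "action": "read",
     "target": "/data/train/medical-v3"},
    {"ts": "2025-05-01T09:40:00Z", "actor": "svc-datamover", "action": "copy",
     "target": "/data/train/medical-v3", "dest": "nfs://archive01/backups"},
    {"ts": "2025-05-01T22:03:00Z", "actor": "bob", "action": "copy",
     "target": "/data/train/medical-v3", "dest": "s3://personal-bucket/export"},
]

APPROVED_DESTINATIONS = ("nfs://archive01/",)

def who_accessed(dataset: str):
    """Return every actor, action, and timestamp recorded against a dataset."""
    return [(e["actor"], e["action"], e["ts"]) for e in EVENTS if e["target"] == dataset]

def flag_unusual_transfers() -> None:
    """Flag copies of sensitive data to destinations outside the approved list."""
    for e in EVENTS:
        dest = e.get("dest")
        if e["action"] == "copy" and dest and not dest.startswith(APPROVED_DESTINATIONS):
            print(f"ALERT: {e['actor']} copied {e['target']} to {dest} at {e['ts']}")

if __name__ == "__main__":
    print(who_accessed("/data/train/medical-v3"))
    flag_unusual_transfers()
```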
In practice, Delinea's Privilege Control for Servers (PCS) and Server Suite bring all these capabilities into an AI factory deployment:
1. Seamless AD/Kerberos identity integration across AI compute nodes, storage servers, and orchestration tools, ensuring one source of truth for user identities and credentials.
2. Automated provisioning, credential rotation, and lifecycle management for service and machine accounts, including Kerberos keytab distribution.
3. Role-based least privilege with just-in-time, MFA-protected elevation and clear separation between administrative and workload duties.
4. Session recording and comprehensive audit logging across the environment.

By leveraging these capabilities, you can map directly to the NIST SP 800-234 controls and mitigate the risks unique to AI factories. A scalable, centralized platform handles the heavy lifting of identity security that is often overlooked in high-performance environments.
The rapid growth of enterprise AI is indeed creating new security blind spots. As you build AI factories, you are effectively establishing mini-supercomputers on-premise, and the responsibility for securing this dynamic, high-performance infrastructure rests entirely with you.
NIST SP 800-234 provides a much-needed roadmap to harden these AI factories, encompassing both technical controls and process guidance. NVIDIA's reference architectures demonstrate that performance and security can coexist, particularly when adhering to a zone-based, secure-by-design philosophy.
It is the identity and access layer, however, that truly ties these components together.
Delinea brings the identity security rigor these environments demand, ensuring that only authorized individuals and services have the appropriate access at the right time, and that all actions are fully accountable.
Ultimately, robust, identity-centric security is not merely a defensive measure but also a strategic enabler. Organizations that proactively integrate security into their AI factories from Day One will mitigate risks and build a foundation of trust and compliance that accelerates their AI initiatives, turning security into a competitive differentiator in the race for AI leadership.
By treating identity as the connective tissue of your AI security strategy, you can confidently embrace the AI factory model to accelerate insights while safeguarding critical assets. The organizations that succeed with on-premise AI will embed security and identity into the fabric of their AI factories from the outset, rather than treating them as an afterthought.
This outcome is achievable by adhering to frameworks like NIST SP 800-234 and using purpose-built identity security tools from Delinea.