Job title: 75% remote: Storage Operations Specialist (f/m/d)
Pay interval: Hourly
Pay rate: Negotiable
Location: Frankfurt am Main, Remote
Job posted: 20-03-2026
Job ID: 70821
Name: Niklas Machens
Phone: +4915119501867
Email: niklas.machens@nemensis.de

Job description

For our client, we are looking for a Storage Operations Specialist (f/m/d).
 
Start: 04.05.2026
Duration: until 31.12.2026 (extension possible)
Capacity: 100%
Location: 75% Remote, 25% Frankfurt or Berlin (1 week Frankfurt / 3 weeks remote in rotation), up to 50% onsite in peak times
Language: English is a must, German is a must (both C1)
 
Local Operations manages the on-premises production platform, which serves as the primary host for all mission-critical business applications. Local Operations is responsible for the following core areas:
- Platform Stability: Ensuring the high availability and performance of the on-premises private cloud environment.
- Application Hosting: Consulting on the seamless operation of Germany-specific productive business applications.
- Incident Management: Resolving technical issues within standard business hours to minimize operational downtime.
- Lifecycle Maintenance: Executing routine updates, patches, and system optimizations within the local infrastructure.
 
Objective 1: Provide Tier-3 operational ownership for Storage Products for Local Production (DE)
Tasks:
• Handling of complex incidents, deep troubleshooting, and root cause analysis; drive permanent fixes and preventive measures.
• Ensuring operational readiness for storage changes.
• Monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, runbooks.
• Automation of standard operational tasks (capacity checks, validation procedures, provisioning workflows where applicable).
 
Objective 2: Ensure operational readiness for deployments
Tasks:
• Validation of deployment artifacts from an operations perspective.
• Defining and enforcing quality assurance measures (e.g. required documentation of standard operating procedures, successful test reports, …) to ensure the high quality of delivered products and services.
 
Objective 3: Monitoring, Incident, Problem and Change Management / Ensure operational stability and responsiveness for the managed Kubernetes platform
Tasks:
• Monitoring system health, performance metrics, and service availability across multi-tenant environments.
 
Objective 4: Automation / Reduce operational toil and improve service reliability
Tasks:
• Addressing recurring operational issues by automating standard remediation procedures.
• Validating all automated procedures according to the established software development lifecycle, including staging, testing, and validation reviews.
 
Objective 5: Ensure platform operations adhere to security and compliance standards
Tasks:
• Implementing monitoring and logging strategies to support audit and compliance requirements.
• Performing routine security scans and remediating identified vulnerabilities.
 
The contractor must be a senior-level professional with proven experience in operations management of private cloud solutions and proficiency in managing storage operations on the platform:
 
Skills (must-have):
- 5+ years in IT storage operations / service delivery / platform operations with demonstrated leadership in mission-critical environments.
- Proven experience implementing/leading Incident, Problem, Change, Release governance in production.
- Experience supporting platform workloads that rely on shared storage services.
- Expertise with storage types: File Storage, Block Storage, Object Storage.
- Expertise with protocols/services: NFS; object storage operations (S3-like concepts).
- Experience with Kubernetes storage integration: CSI driver concepts and troubleshooting (PV/PVC lifecycle understanding).
- Virtualization (Storage): Experience operating storage virtualization in enterprise environments.
- Expertise within ITSM: Jira Service Management (JSM), Jira, Confluence.
- Fundamental understanding of core operations processes (incident management, change management, problem management, IT Service Management) as well as SRE concepts.
- Experience in gathering operational insights from monitoring or observability, including SLI/SLA/SLO management and tracking.
- Hands-on experience in documenting procedures properly and enforcing clear runbooks or playbooks.
- Observability: Hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Mimir, Loki).
- Familiarity with enterprise DevOps toolchains is a plus (GitLab, JFrog Artifactory, Backstage, Harness).
- Strong understanding of modern platform operations (Kubernetes/containers, automation, observability), sufficient to govern specialists.
- Platform delivery concepts: GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm) to govern deployment/readiness standards.
 
Skills (should-have):
- Experience operating in regulated / high-availability industries (banking, telco, public sector, healthcare).
- Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management.
- Experience operating storage services that integrate with Kubernetes platforms.
- Familiarity with IaC-based provisioning and GitOps-driven operational patterns.