Frankfurt am Main, Remote
Job ID: 70821
Job posted on: 20-03-2026
Summary
For our client we are looking for a Storage Operations Specialist (f/m/d)
Start: 04.05.2026
Duration: until 31.12.2026 (extension possible)
Capacity: 100%
Location: 75% Remote, 25% Frankfurt or Berlin (1 week Frankfurt / 3 weeks remote in rotation), up to 50% onsite in peak times
Language: English and German are both required (C1 level)
Local Operations manages the on-premises production platform, which serves as the primary host for all mission-critical business applications. Local Operations is responsible for the following core areas:
- Platform Stability: Ensuring the high availability and performance of the on-premises private cloud environment.
- Application Hosting: Consulting on the seamless operation of Germany-specific productive business applications.
- Incident Management: Resolving technical issues within standard business hours to minimize operational downtime.
- Lifecycle Maintenance: Executing routine updates, patches, and system optimizations within the local infrastructure.
Objective 1: Provide Tier-3 operational ownership for Storage Products for Local Production (DE)
Tasks:
• Handling of complex incidents, deep troubleshooting, and root cause analysis; drive permanent fixes and preventive measures.
• Ensuring operational readiness for storage changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks.
• Automation of standard operational tasks (capacity checks, validation procedures, provisioning workflows where applicable).
Objective 2: Ensure operational readiness for deployments
Tasks:
• Validation of deployment artifacts from an operations perspective.
• Defining and enforcing quality assurance measures (e.g. required documentation of standard operating procedures, successful test reports, …) to ensure the high quality of delivered products and services.
Objective 3: Monitoring, Incident, Problem, and Change Management / Ensure operational stability and responsiveness for the managed Kubernetes platform
Tasks:
• Monitoring system health, performance metrics, and service availability across multi-tenant environments.
Objective 4: Automation / Reduce operational toil and improve service reliability
Tasks:
• Addressing recurring operational issues by automating standard remediation procedures.
• Validating all automated procedures according to the established software development lifecycle, including staging, testing, and validation reviews.
Objective 5: Ensure platform operations adhere to security and compliance standards
Tasks:
• Implementing monitoring and logging strategies to support audit and compliance requirements.
• Performing routine security scans and remediating identified vulnerabilities.
The contractor must be a senior-level professional with proven experience in operations management of private cloud solutions and proficiency in managing storage operations on the platform.
Skills (must-have):
- 5+ years in IT storage operations / service delivery / platform operations with demonstrated leadership in mission-critical environments
- Proven experience implementing/leading Incident, Problem, Change, Release governance in production.
- Experience supporting platform workloads that rely on shared storage services.
- Expertise with storage types: File Storage, Block Storage, Object Storage.
- Expertise with protocols/services: NFS; object storage operations (S3-like concepts).
- Experience with Kubernetes storage integration: CSI driver concepts and troubleshooting (PV/PVC lifecycle understanding).
- Virtualization (Storage): Experience operating storage virtualization in enterprise environments.
- Expertise within ITSM: Jira Service Management (JSM), Jira, Confluence.
- Fundamental understanding of core operations processes (incident management, change management, problem management, IT Service Management) as well as SRE concepts
- Experience in gathering operational insights from monitoring or observability data, including SLI/SLA/SLO management and tracking.
- Hands-on experience in documenting procedures properly and enforcing clear runbooks or playbooks.
- Observability: hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Mimir, Loki).
- Familiarity with enterprise DevOps toolchains is a plus (GitLab, JFrog Artifactory, Backstage, Harness).
- Strong understanding of modern platform operations (Kubernetes/containers, automation, observability), sufficient to govern specialists.
- Platform delivery concepts: GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm) to govern deployment/readiness standards.
Skills (should-have):
- Experience operating in regulated / high-availability industries (banking, telco, public sector, healthcare).
- Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management.
- Experience operating storage services that integrate with Kubernetes platforms.
- Familiarity with IaC-based provisioning and GitOps-driven operational patterns.