Job title: 75% remote: Operations Specialist (f/m/d) Compute & OS Operations
Pay interval: Hourly
Pay rate: Negotiable
Location: Remote & Frankfurt/Berlin
Job posted: 23-03-2026
Job ID: 70855
Name: Saifeddine Zitouni
Phone number: +4915119535177
Email: Saifeddine.Zitouni@nemensis.de

Job description

On behalf of our client, we are looking for an Operations Specialist (f/m/d), Compute & OS Operations.
 
Start: 04.05.2026
Duration: until 31.12.2026, with an option to extend
Capacity: 100%
Location: 75% Remote, 25% Frankfurt or Berlin (1 week Frankfurt / 3 weeks remote in rotation), up to 50% onsite in peak times
Language: English and German are both required (C1 level)
 
Local Operations manages the on-premises production platform, which serves as the primary host for all mission-critical business applications. The team is responsible for the following core areas:
- Platform Stability: Ensuring the high availability and performance of the on-premises private cloud environment.
- Application Hosting: Consulting on the seamless operation of Germany-specific productive business applications.
- Incident Management: Resolving technical issues within standard business hours to minimize operational downtime.
- Lifecycle Maintenance: Executing routine updates, patches, and system optimizations within the local infrastructure.
 
Objectives:
- Provide Tier-3 operational ownership for Network & Security services for Local Production (DE).
- Ensure operational readiness for deployments.
- Ensure operational stability and responsiveness for the managed Kubernetes Platform.
- Reduce operational toil and improve service reliability.
- Ensure platform operations adhere to security and compliance standards.
 
Skills (must-have):
- 5-10+ years in IT operations / service delivery / platform operations with demonstrated leadership in mission-critical environments.
- Proven experience implementing/leading Incident, Problem, Change, and Release governance in production.
- Expertise with ITSM tooling: Jira Service Management (JSM), Jira, Confluence.
- Experience with core operations processes (incident management, change management, problem management, IT Service Management) as well as SRE concepts.
- Experience gathering operational insights from monitoring and observability data, including SLI/SLO/SLA management and tracking.
- Hands-on experience documenting procedures properly and enforcing clear runbooks or playbooks.
- Observability: hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Mimir, Loki).
- Familiarity with enterprise DevOps toolchains is a plus (GitLab, JFrog Artifactory, Backstage, Harness).
- Expertise within modern platform operations (Kubernetes/containers, automation, observability), sufficient to govern specialists.
- Platform delivery concepts: GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm) to govern deployment/readiness standards.
 
Skills (should-have):
- Experience operating in regulated / high-availability industries (banking, telco, public sector, healthcare).
- Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management.