Job title:	75% remote: Operations Specialist (f/m/d) Compute & OS Operations
Pay interval:	Hourly
Pay rate:	Negotiable
Location:	remote & Frankfurt/Berlin
Job posted:	23-03-2026
Job ID:	70855
Name:	Saifeddine Zitouni
Phone number:	+4915119535177
Email:	Saifeddine.Zitouni@nemensis.de

Job description

For our client, we are looking for an Operations Specialist (f/m/d), Compute & OS Operations.

Start: 04.05.2026

Duration: 31.12.2026++

Capacity: 100%

Location: 75% Remote, 25% Frankfurt or Berlin (1 week Frankfurt / 3 weeks remote in rotation), up to 50% onsite in peak times

Language: English and German are both required (C1 level)

Local Operations manages the on-premises production platform, which serves as the primary host for all mission-critical business applications. Local Operations is responsible for the following core areas:

- Platform Stability: Ensuring the high availability and performance of the on-premises private cloud environment.

- Application Hosting: Consulting on the seamless operation of Germany-specific productive business applications.

- Incident Management: Resolving technical issues within standard business hours to minimize operational downtime.

- Lifecycle Maintenance: Executing routine updates, patches, and system optimizations within the local infrastructure.

Objectives:

- Provide Tier-3 operational ownership for Network & Security services for Local Production (DE).

- Ensure operational readiness for deployments.

- Ensure operational stability and responsiveness for the managed Kubernetes platform.

- Reduce operational toil and improve service reliability.

- Ensure platform operations adhere to security and compliance standards.

Skills (must-have):

- 5-10+ years in IT operations / service delivery / platform operations with demonstrated leadership in mission-critical environments.

- Proven experience implementing or leading Incident, Problem, Change, and Release governance in production.

- Expertise with ITSM tooling: Jira Service Management (JSM), Jira, Confluence.

- Experience with core operations processes (incident management, change management, problem management, IT service management) as well as SRE concepts.

- Experience in gathering operational insights from monitoring or observability data, including SLI/SLO/SLA management and tracking.

- Hands-on experience in documenting procedures properly and enforcing clear runbooks or playbooks.

- Observability: hands-on experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Mimir, Loki).

- Familiarity with enterprise DevOps toolchains is a plus (GitLab, JFrog Artifactory, Backstage, Harness).

- Expertise in modern platform operations (Kubernetes/containers, automation, observability), sufficient to govern specialists.

- Platform delivery concepts: GitOps and IaC awareness (Terraform/OpenTofu, ArgoCD, Helm) to govern deployment/readiness standards.

Skills (should-have):

- Experience operating in regulated / high-availability industries (banking, telco, public sector, healthcare).

- Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management.
