DevOps/Site Reliability Engineer: The Sports Market

December 17, 2025

Job Description

Location: Anywhere

We’re looking for a DevOps/Site Reliability Engineer (SRE) to own the reliability, scalability, and automation of The Sports Market’s cloud infrastructure. This is a full-time contract-to-hire role: after an initial contract period, it can transition into a long-term core engineering position.

You’ll be instrumental in our migration from a managed ledger platform to a fully self-hosted, Kubernetes-driven architecture on AWS. This role puts you at the heart of The Sports Market’s next evolution: building the platform that powers a global sports trading ecosystem.

If you thrive in cloud-native environments, enjoy automating complex systems, and want to help architect a next-generation platform, this role is built for you.

Key Responsibilities

Cloud Infrastructure & AWS Platform Engineering

– Build and maintain AWS infrastructure using Terraform (VPC, EKS, networking, IAM, Secrets Manager, Route 53, ALBs/NLBs).

– Operate and optimize production-grade EKS clusters, including node groups, autoscaling, RBAC, and OIDC integration.

– Implement TLS and certificate management, ingress controllers, and network policies.

– Ensure secure, consistent multi-environment deployments across staging and production.

Kubernetes Operations

– Deploy and manage workloads for integrations, adapters, backend services, ledger components, and payment orchestration.

– Configure Helm charts/manifests, resource limits, autoscaling (HPA/VPA), and pod governance.

– Support distributed ledger components managed through Catalyst Blockchain Manager (CBM), including Canton participant and sequencer nodes.

– Maintain operational reliability for critical workloads: event ingestion, trading integrations, settlement flows, payment orchestration, and automations.

CI/CD, GitOps & Automation

– Build and maintain CI/CD pipelines (GitLab → ArgoCD) for automated deployments and infrastructure provisioning.

– Implement GitOps patterns and progressive delivery strategies (blue/green, canary).

– Automate secrets management, configuration flows, and cluster operations.

Observability, Monitoring & System Insights

– Expand platform observability using Datadog, Prometheus/Grafana, and log aggregation pipelines.

– Build dashboards and alerts for Kubernetes, ledger nodes, integrations, payment workflows, and API workloads.

– Establish SLIs/SLOs and ensure system reliability targets are consistently met.

System Stability & Reliability Engineering

– Investigate incidents, identify root causes, and implement long-term reliability improvements.

– Improve resiliency through redundancy, autoscaling, and failure recovery strategies.

– Maintain deployment safety, rollback strategies, and operational runbooks.

Security, Compliance & Operational Hardening

– Implement IAM least-privilege policies, encryption, secrets management, and secure network segmentation.

– Maintain secure ingress patterns for third-party services (payments, KYC, trading).

– Ensure operational readiness and compliance alignment with platform standards.

Collaboration & Platform Support

– Work closely with backend and full-stack teams to ensure smooth deployments and runtime reliability.

– Support teams during platform migration efforts and environment transitions.

– Participate in incident response, observability improvements, and overall DevOps best practices.

Qualifications

– Bachelor’s degree in Computer Science, Engineering, or equivalent experience.

– 5+ years of DevOps/SRE experience operating production-grade systems.

– Strong hands-on experience with:
  – Kubernetes operations
  – AWS services (EKS, VPC, IAM, load balancers, Secrets Manager, Route 53)
  – Terraform (IaC)
  – GitOps tooling (ArgoCD)
  – CI/CD pipelines (GitLab preferred)
  – Docker and containerized systems
  – Datadog (APM, logs, dashboards)

Preferred Qualifications

– Experience with distributed systems or ledger/blockchain platforms (experience with Canton or Catalyst Blockchain Manager is a plus).

– Familiarity with Prometheus/Grafana and alerting best practices.

– Experience supporting payment/KYC integrations from an infrastructure perspective.

– Background in high-availability or real-time platforms.

– AWS, Terraform, Kubernetes, or DevOps certifications.

Work Environment

– 100% remote workforce

– Modern cloud-native architecture

– High ownership, fast-moving environment

– Direct influence on the next generation of our platform
