Senior Site Reliability (Onsite / Hybrid / Remote) - Europe at Kaiko

The Challenge

You will be joining a fast-paced engineering team made up of people with significant experience working with terabytes of data. We believe that everybody has something to bring to the table, and therefore put collaborative effort and teamwork above all else (and not just from an engineering perspective).

You will be able to work autonomously as an equally trusted member of the team, and participate in efforts such as:

Addressing high availability problems: cross-region data replication, disaster recovery, etc.
Addressing “big data” problems: 200+ million messages/day, 160B data points since 2010 (currently growing at a rate of 10B per month)
Improving our development workflow, continuous integration, continuous delivery and, more broadly, our team practices
Expanding our platform’s observability through monitoring, logging, alerting and tracing

What you’ll be doing

Deploy, maintain and evolve our infrastructure (we have 2 autonomous regions) for optimum data consistency and availability while keeping costs down
Automate what is not yet automated, fix what needs fixing, and contribute ideas
Adapt fast

Our tech stack

Alerting: AlertManager, Karma, PagerDuty
Logging: Vector, Loki
Caching: FoundationDB
Secrets management and PKI: Vault
Configuration management and provisioning: Terraform, Ansible
Service discovery: Consul
Messaging: Kafka
Proxying: HAProxy, Traefik
Service deployment: Terraform, Nomad (plugged into Consul and Vault)
Database systems: ClickHouse (main datastore), FoundationDB (caching, deduplication), replicated PostgreSQL
Operating System: Ubuntu 20.04
Protocols: gRPC, HTTP (phasing out in favor of gRPC), WebSocket (phasing out in favor of gRPC)
Platform: containers

About You

Significant experience as a DevOps/System Engineer
Experience with Linux system administration and automation (Ansible at a minimum)
Experience, in no particular order, with: troubleshooting crashes & performance issues, load-balancing, VIPs/fail-over IPs, RAID

You’ll notice that we don’t have any “hard” requirements in terms of development platforms or technologies: this is because we are primarily interested in people capable of adapting to an ever-changing landscape of technical requirements, who learn fast and are not afraid to constantly push our technical boundaries.

It is not uncommon for us to benchmark new technologies for a specific feature, or to change our infrastructure in a big way to better suit our needs.

The most important skills for us revolve around two things:

What we like to call “core” knowledge: what a software process is, how it interacts with a machine’s or the network’s resources, what kind of constraints we can expect for certain workloads, etc.
How fast you can adapt to a technology you didn’t know existed 10 minutes ago

In short, we are looking for someone able to spot early on that, when the goal is better long-term performance, spending 10 days migrating data to a more efficient schema is a better solution than scaling out a database cluster in a matter of minutes.

Nice to have

Experience with HashiCorp tools (Terraform, Vault, Consul, Nomad)
Experience with orchestrating containers, micro-services
Experience with recent Ubuntu, systemd
Knowledge of networking, routing (BGP, static, …), tunneling
Knowledge of encryption (PGP/TLS/SSH/WireGuard/…)
Basic knowledge of cryptocurrencies

Personal Skills

Honest: receiving and giving feedback is very important to you
Humble: making new errors is an essential part of your journey
Empathetic: you feel a sense of responsibility for all the team’s endeavors rather than focus on individual contributions
Committed: as an equally important member of the team, you want to make yourself heard while respecting everybody’s point of view
Fluent in written and spoken English
You have the utmost respect for legacy code and infrastructure, with some occasional and perfectly understandable respectful complaints

What we offer

An entrepreneurial environment with a lot of autonomy and responsibilities
Opportunity to work with an internationally diverse team
Hardware of your choice
Perks: meal vouchers, multiple team events and staff surprises

Process

Introduction call (30 min)
Meeting with members of the team for a technical/product RPG: you read that right, no written test, no whiteboard quicksort implementation (1h30)
Cross-team interviews (2-3 people, 2 x 45 min)
Meeting with VP of Engineering (20 min)

As our working language is English, we would appreciate it if you send us your application and any accompanying documents in English.

Location

On-site in our Paris office, or fully remote (within ±2 hours of CET).

Diversity & Inclusion

At Kaiko, we believe in diversity of thought because we appreciate that it makes us stronger. Therefore, we encourage applications from everyone who can offer their unique experience to our collective achievements.
