Jan 26th, 2022

Senior Software Engineer - Site Reliability

Full Time

Hatch is partnering with JupiterOne to find a Site Reliability Engineer. See below for more details:

As a Site Reliability Engineer (SRE), you'll combine your software and systems engineering experience to help our software engineers build, deploy, and monitor distributed systems. You'll be tasked with increasing overall system reliability and fault tolerance, automating system continuity and recoverability, and improving system observability. Finally, we want you to be comfortable speaking up, identifying problems, and offering suggestions on how to improve.
 
The Site Reliability team works with internal development teams to maintain and support their services. Our work ranges from system instrumentation and infrastructure design to developing reliability best practices. Ultimately, we want to remove the burden (or at least greatly reduce the friction) of developing more reliable and secure systems. If wearing a lot of hats and learning sounds fun and enticing, then this is the team for you!

Responsibilities

  • Serve as a subject matter expert for one or more of the SRE team's core initiatives (observability, infrastructure as code, CI/CD, system resilience, etc.).
  • Help engineering teams define, measure, and meet Service Level Objectives (SLOs) around availability, reliability, performance, and time to resolution.
  • Help engineering teams identify Service Level Indicators (SLIs) that will help them meet objectives related to availability, reliability, and performance.
  • Drive standardization of service and application instrumentation to help development teams gain system observability.
  • Develop and maintain reusable infrastructure components to guide best practices.
  • Develop and maintain tooling to make our system more reliable, secure, and performant.
  • Create a framework for incident management to standardize how teams respond to and document outages.

Requirements

  • 3+ years of experience in a Site Reliability or Platform Engineering role.
  • Proficient in writing code in one or more languages such as TypeScript, Go (Golang), or Kotlin.
  • Experience using Infrastructure as Code tools such as Terraform, Pulumi, or CloudFormation.
  • Experience managing cloud infrastructure in AWS, Azure, or Google Cloud.
  • Experience with containerization and container orchestration platforms (ECS, Kubernetes, Nomad, OpenShift).
  • Experience using observability tooling such as Grafana, Datadog, CloudWatch, or Honeycomb for diagnosing production issues.
  • Experience using security tools to keep infrastructure and services secure.

Desired qualifications

  • Experience with the OpenTelemetry standard and its ecosystem.
  • Experience with eBPF and its ecosystem.
  • Experience using Loki for log aggregation.
  • Experience using Prometheus for metric aggregation and alerting.
  • Experience using Tempo for trace aggregation.
  • Experience helping development teams instrument their services to gain deep insights into runtime behavior.
  • Experience making Kubernetes a seamless PaaS for software engineers.

If you are interested in learning more about this company or other startups and small businesses in the area, please contact us.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
