Senior Site Reliability Engineer
Company: Gearbox Software
Location: Frisco
Posted on: April 2, 2026
Job Description:
The Gearbox Entertainment Company is an award-winning creator
and distributor of entertainment for people around the world.
Gearbox Entertainment develops and publishes products through its
subsidiaries, Gearbox Software and Gearbox Publishing. Gearbox
Entertainment has become widely known for successful game
franchises including Brothers in Arms and Borderlands, as well as
acquired properties Duke Nukem and Homeworld. Gearbox’s ambition is
to entertain the world and its key driving objectives include the
pursuit of happiness for our talent, partners and customers, the
prioritization of entertainment and creativity and a measured
respect for profitability. For more information, visit
www.Gearbox.com.

To further drive our vision of premier stability
and rapid feature delivery, we are looking for a Senior Site
Reliability Engineer to join our team. As a Senior SRE, you should
feel exceptionally comfortable bringing architectural design
proposals to the table for consideration among your colleagues on
our platform and infrastructure development teams. You will be one
of the principal technical designers helping push our cloud-native
platform toward the future. You will be responsible for driving the
implementation of flexible cloud architectures with an
automation-first emphasis; manual user intervention likely makes
you uneasy and maybe even a little twitchy. We would expect a
successful candidate for this position to be a self-starter with
the ability to complete tasks independently. Though you will have
technical leadership and senior engineers at your disposal, you
should be comfortable tackling complex problems without significant
oversight.

Observability is paramount. If we can't measure it, we can't prove
it works; if we can't prove it works, we must assume it doesn't.
This is a philosophy you hopefully love (and preferably obsess
over). If we can't
observe how a new feature is behaving, our SRE team is excited to
dive into the application code and make the necessary improvements.

Typical Day
TL;DR: You will be deeply immersed in Go and Python
observability stacks; plenty of AWS and Terraform sprinkled in as
well. This is a very hands-on Senior Engineering role where your
days will be filled with building solutions to technical challenges
in the observability and availability of our SHiFT online services.
You will evangelize for and be obsessed with user experience as it
relates to the services you support. You will help manage and
orchestrate each of these by leaning heavily on technologies like
Go, Terraform, Docker, and Bash. On any given day, you should
expect to spend at least 80% of your time actively engineering and
developing solutions; the rest will be a mixture of planning,
reviewing code from your colleagues, participating in design
meetings, documentation, and self-development. This position will
eventually require you to carry a company-paid mobile device and
participate in 24/7 on-call rotations alongside your engineering
colleagues. Don't worry though, our on-call experience doesn't
suck.

Core Responsibilities:
- Design, engineer, and develop solutions for ensuring the
  observability and reliability of our online platform
- Be a trusted voice in the evangelism of reliability engineering
  throughout the team, with an eagerness for mentoring other
  developers on the team
- Help define and oversee short- and mid-term project roadmaps for
  the future of our SRE team
- Participate in after-hours on-call support rotations

Must Have (the non-negotiable parts):
- Candidates must have at least 4 years of professional experience
  instrumenting complex observability stacks in object-oriented
  programming languages, preferably Go.
- Proficiency in AWS container management, orchestration, and
  observability features (ECS, Fargate, Aurora, AppConfig,
  CloudWatch, etc.)
- Professional experience managing AWS access and security services
  (IAM, KMS, Secrets Manager, WAFv2, etc.)
- Professional experience in Terraform and/or CloudFormation
- Minimum of 2 years of experience with containers in a professional
  setting, preferably Docker
- Adept understanding of observability stack management (OTel,
  tracing, monitoring, alerting, structured logging, APM, etc.)
- Comfortable communicator, able to clearly detail designs and
  implementations on an individual level and in large group settings

Should Have (some wiggle room):
- Extensive hands-on experience with OpenTelemetry
- Hands-on experience developing and maintaining CI/CD pipelines,
  preferably in Git/GitLab
- Understanding of RESTful and WebSocket-based APIs
- Bachelor's degree in computer science, related field, or equivalent
  training and professional experience

Now you're just showing off:
- Familiarity with Datadog
- Familiarity with Atlassian products (Opsgenie, Jira, Confluence)
- Experience working with developers in an agile environment
- Experience in the games industry, preferably launching multiple
  online-enabled AAA titles
- Knowledge about Gearbox-owned IPs

Gearbox Entertainment believes that all team
members should be able to enjoy a work environment free from all
forms of discrimination and harassment. We are committed to
reflecting the diversity of the world we strive to entertain. As an
Equal Opportunity Employer, we provide fair and equal treatment to
all team members and applicants. We do not discriminate on the
basis of race, color, religion, sex, sexual orientation, gender
identity or expression, national origin, disability, genetic
information, pregnancy or maternity, veteran status, or any other
status protected by applicable national, federal, state or local
law.