Develop tools to improve developer experience and productivity.
Contribute to dev infrastructure such as CI/CD pipelines.
Design and build performance tools for teams to leverage.
Develop innovative tools and automation to minimize manual work.
Monitor site availability and reliability on a daily basis.
Investigate and triage production issues and bottlenecks, identify root causes, and implement (or help teams implement) solutions that prevent future incidents.
Design and test disaster recovery strategies.
Ensure infrastructure standards are being followed.
You should have:
Understanding of AWS or another cloud platform, and of Docker
Extensive experience with logging, Application Performance Management, and other monitoring tools
Experience working with complex, enterprise-level architectures
Proficiency in at least one programming or scripting language (e.g., Java, Ruby, Python)
Preferably 6+ years of relevant work experience in Linux environments
Team player with the ability to collaborate effectively across organizations