Older Talks


Intro to SRE

Reliability is a critical feature of most software, and maintenance, rather than initial development, dominates the cost of software. Yet a large number of development teams treat operations as an afterthought instead of integrating operations into their development processes.

Error budgets and Site Reliability Engineering practices can improve the reliability, maintainability, and, yes, feature velocity, of products. This talk is an introduction to the basics of bringing SRE practices into your organization -- who to hire, how to organize, what projects to work on, how to measure reliability, and how to assess reliability risks.

Video: The Lead Developer NYC 2018 (slides)

Also presented at Code As Craft at Etsy (slides), PDX Women Talking Tech meetup, Toronto and Chicago Google Cloud Summits, and privately as training to dozens of current and prospective Google Cloud Platform customers. Co-developed with Alesia Braga.

An enterprise-flavored version of this talk was co-presented with Dave Rensin at Velocity NYC 2018 (slides).

Debugging Microservices

When tens or hundreds of microservices provide an application's critical functionality, diagnosing which interaction between components is causing an outage can be challenging. Engineers invest heavily in building dashboards to improve monitoring, yet still spend a lot of time trying to figure out what’s going on and how to fix it when they get paged. Building more dashboards isn’t the solution; using dynamic query evaluation and integrating tracing is. Learn how SREs discover and debug problems at Google during outages, and hear real stories about our experiences.

Video: Systems at Scale 2018 (slides), All Day DevOps 2018 (slides)

Also presented at QCon NYC 2018, DevOpsDays NYC 2018, Gluecon 2018, and SREcon Americas 2018. Co-developed with George Talbot and Adam Mckaig.

Reliable Inclusion

Making your team safe and inclusive doesn’t end with unconscious bias training and learning to defuse harmful interpersonal interactions. Your codebase, design documents, and technical communications are likely littered with pitfalls that prevent everyone from feeling included. Liz discusses common inclusivity anti-patterns in code and technical communication and how to avoid them.

Presented at Flawless Hacks 2018 (slides), Velocity NY 2016, and privately as training within Google.

Effective Service Level Objectives

Service level objectives and error budgets are the cornerstone of Site Reliability Engineering and a critical tool for organizations to find an appropriate balance between reliability and rates of feature development. In this talk, you will learn how to set and measure useful service level indicators and objectives for needs ranging from interactive, latency-sensitive, query-based systems to batch throughput-oriented systems. You will learn how to set high-signal-to-noise-ratio alerting based on the error budget, and how to make longer-term changes to development priorities if your budget is overspent or underspent.

Video: Datadog Dash 2018 (slides) and Google Cloud Next SF 2018 (slides)

Also presented at Code As Craft at Etsy (slides).

Co-developed with the CRE team at Google, including Kristina Bennett, Alex Bramley, David Ferguson, and Marie Cosgrove-Davies.

Relieving Tech Debt w/ Interrupt Reduction Projects

It's easy to plan out month-long or year-long projects, or to have an interrupts rotation for dealing with oncall/tickets, but how do you make sure you're also doing the short week-long projects that can relieve your technical debt? I'll cover a planning approach my team found that makes room for all three kinds of work and reduces the long-term operational burden of the services we operate.

Presented at BoSRE in Boston, MA (slides), SREcon Europe 2016 (video), and internal Google summits.

Concepts co-developed with John Tobin and Dave O'Connor.

Managing Up and Sideways

Ever have a bad manager? Or have a project go off the rails but feel powerless to stop the trainwreck? I'll talk about why knowing a little bit about management can help you as an individual contributor or tech lead, and about a few ways that you can help yourself and your team without ever formally becoming a manager yourself.

Video: Lesbians Who Tech NYC 2018 keynote (slides, a11y notes); also delivered at SREcon Europe 2016

Build skills through hobbies! Bring them to work!

Building technical and leadership skills doesn’t only happen in the workplace! I became a better technical leader and Site Reliability Engineer by playing games such as Puzzle Pirates, World of Warcraft, EVE Online, and Factorio. I will share what I learned from these experiences, and how both hiring managers and employees can talk about non-traditional forms of experience.

Video: !!con NYC 2018 keynote (slides)