Irfan's Blog
Book Summary: Increment Reliability

Issue 16 - February 2021

Building a reliable system starts with the right team culture. Team culture is the collective behavior of team members, and it reflects the team's values. To sustain a culture of reliability, a team needs three things:

  1. A collective, centralized knowledge base

  2. Mature tools, technologies, and processes

  3. A psychologically safe environment

Then, the book goes into detail on pseudo-tested methods: methods whose tests are inadequate because they do not cover enough variety of cases, so the methods are executed by the test suite but not really verified. Pseudo-tested methods pose a reliability risk to teams. One way to check for them is a tool called Descartes, which has been tried on a public repository, Apache Commons Collections.
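To make the idea concrete, here is a minimal sketch in Python. It is not Descartes itself (which targets Java projects); it only imitates the idea by hand: strip a method's body down to a constant return and see whether the test still passes. The function `apply_discount`, the two tests, and the helper `is_pseudo_tested` are all made up for illustration.

```python
from typing import Callable

PriceFn = Callable[[float, float], float]


def apply_discount(price: float, percent: float) -> float:
    """Original method (hypothetical): reduce price by a percentage."""
    return price * (1 - percent / 100)


def apply_discount_stripped(price: float, percent: float) -> float:
    """Hand-made mutant: the whole body is replaced by a constant."""
    return 0.0


def weak_test(fn: PriceFn) -> bool:
    """Runs the method but only checks the type of the result."""
    try:
        assert isinstance(fn(200.0, 10.0), float)
        return True
    except AssertionError:
        return False


def strong_test(fn: PriceFn) -> bool:
    """Checks the actual value, so a stripped body is detected."""
    try:
        assert abs(fn(200.0, 10.0) - 180.0) < 1e-9
        return True
    except AssertionError:
        return False


def is_pseudo_tested(test: Callable[[PriceFn], bool],
                     original: PriceFn, mutant: PriceFn) -> bool:
    """True if the test passes on both the original and the stripped body."""
    return test(original) and test(mutant)


print(is_pseudo_tested(weak_test, apply_discount, apply_discount_stripped))    # True
print(is_pseudo_tested(strong_test, apply_discount, apply_discount_stripped))  # False
```

With the weak test, the method counts as covered but nothing would fail if its logic disappeared. That is the kind of gap this sort of check is meant to surface.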

Reliability is a responsibility shared among all engineers, not only managers, so trust is required to achieve a reliable system at scale. Peer review and trust in the reliability of the tools are among the things that need to be built. Once that is achieved, the team releases more often and the system fails less. There are four metrics each team should calculate on a regular basis:

  1. Lead time

  2. Deployment frequency

  3. Change failure rate

  4. Recovery time
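As a rough sketch of how these four numbers could be calculated from a deployment log: the `Deployment` record, its field names, and the choice of the median as the summary statistic are assumptions made for this example, not something the issue prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, List, Optional


@dataclass
class Deployment:
    """Hypothetical record of one release."""
    committed_at: datetime            # when the change was committed
    deployed_at: datetime             # when it reached production
    failed: bool = False              # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored


def four_metrics(deploys: List[Deployment], period_days: int) -> Dict[str, Optional[float]]:
    """Summarize the four metrics for a non-empty list of deployments."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_hours = [(d.deployed_at - d.committed_at).total_seconds() / 3600
                  for d in deploys]
    failures = [d for d in deploys if d.failed]
    recovery_hours = [(d.restored_at - d.deployed_at).total_seconds() / 3600
                      for d in failures if d.restored_at is not None]
    return {
        "lead_time_hours": median(lead_hours),
        "deploys_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys),
        # None when nothing failed (or nothing has been restored yet)
        "recovery_time_hours": median(recovery_hours) if recovery_hours else None,
    }


start = datetime(2021, 2, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
log = [
    Deployment(start, start + 4 * hour),
    Deployment(start + day, start + day + 2 * hour,
               failed=True, restored_at=start + day + 3 * hour),
    Deployment(start + 2 * day, start + 2 * day + 6 * hour),
    Deployment(start + 3 * day, start + 3 * day + 8 * hour),
]
metrics = four_metrics(log, period_days=4)
print(metrics)
```

For this made-up log, the median lead time is 5 hours, there is one deployment per day, one in four changes fails, and the single failure took 1 hour to recover from.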

What separates great teams from good ones is that great teams score well on the above metrics. It is impossible to eliminate every risk in a large system, so what a team should do is build mechanisms for handling failure. The difference between robustness and resilience is that robustness is measured against known cases, while resilience is measured against the unexpected. One way to improve system resilience is chaos engineering: introducing a planned, controlled amount of failure into the system to check for bottlenecks.
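A toy sketch of the chaos engineering idea, assuming a single in-process dependency: a wrapper injects failures at a chosen rate, and an experiment measures how often the caller ends up on its fallback. All names here (`with_chaos`, `fetch_price`, `resilient_fetch`, `run_experiment`) are hypothetical, and real chaos experiments run against live infrastructure rather than a function call.

```python
import random
from typing import Callable


class InjectedFailure(Exception):
    """Raised by the chaos wrapper to simulate an outage."""


def with_chaos(call: Callable[[str], float], failure_rate: float,
               rng: random.Random) -> Callable[[str], float]:
    """Wrap a dependency so it fails with the given probability."""
    def wrapped(item: str) -> float:
        if rng.random() < failure_rate:
            raise InjectedFailure("injected fault")
        return call(item)
    return wrapped


def fetch_price(item: str) -> float:
    """Hypothetical dependency that normally always succeeds."""
    return {"apple": 1.5}.get(item, 0.0)


FALLBACK = -1.0  # sentinel returned when every attempt fails


def resilient_fetch(fetch: Callable[[str], float], item: str,
                    retries: int = 2) -> float:
    """Caller under test: retry a few times, then degrade to a fallback."""
    for _ in range(retries + 1):
        try:
            return fetch(item)
        except InjectedFailure:
            continue
    return FALLBACK


def run_experiment(failure_rate: float, requests: int = 1000,
                   seed: int = 42) -> float:
    """Return the fraction of requests that ended on the fallback."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    chaotic = with_chaos(fetch_price, failure_rate, rng)
    degraded = sum(1 for _ in range(requests)
                   if resilient_fetch(chaotic, "apple") == FALLBACK)
    return degraded / requests


for rate in (0.0, 0.3, 1.0):
    print(f"failure rate {rate:.1f} -> degraded {run_experiment(rate):.3f}")
```

The point of the experiment is the comparison: if a 30% injected failure rate produces far fewer degraded responses, the retry logic is absorbing the faults; if it does not, the experiment has found a weak point before a real outage did.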


Last updated 1 year ago
