1. Reliable, Scalable, and Maintainable Applications

The Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free?

— Alan Kay, in interview with Dr. Dobb's Journal (2012)

Many applications today are data-intensive, as opposed to compute-intensive. Raw CPU power is rarely a limiting factor for these applications; bigger problems are usually the amount of data, the complexity of data, and the speed at which it is changing.

A data-intensive application is typically built from standard building blocks that provide commonly needed functionality. For example, many applications need to:

  • Store data so that they, or another application, can find it again later (databases)

  • Remember the result of an expensive operation, to speed up reads (caches)

  • Allow users to search data by keyword or filter it in various ways (search indexes)

  • Send a message to another process, to be handled asynchronously (stream processing)

  • Periodically crunch a large amount of accumulated data (batch processing)

If that sounds painfully obvious, that's just because these data systems are such a successful abstraction: we use them all the time without thinking too much. When building an application, most engineers wouldn't dream of writing a new data storage engine from scratch, because databases are a perfectly good tool for the job.

But reality is not that simple. There are many database systems with different characteristics, because different applications have different requirements. There are various approaches to caching, several ways of building search indexes, and so on. When building an application, we still need to figure out which tools and which approaches are the most appropriate for the task at hand. And it can be hard to combine tools when you need to do something that a single tool cannot do alone.
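To make that concrete, the following sketch (not from the original text) shows application-managed caching in front of a database: the application code itself is responsible for keeping the two data systems consistent. The `cache` and `db` objects and their methods are hypothetical stand-ins for real client libraries, such as a memcached client and a SQL database driver.

```python
# Sketch of an application-managed cache in front of a database.
# `cache` and `db` are hypothetical stand-ins for real client libraries;
# the point is that the application code glues two data systems together.

class UserStore:
    def __init__(self, cache, db):
        self.cache = cache  # fast, in-memory key-value store (hypothetical client)
        self.db = db        # slower, durable system of record (hypothetical client)

    def get_user(self, user_id):
        key = f"user:{user_id}"
        user = self.cache.get(key)          # fast path: cache hit
        if user is None:                    # slow path: cache miss
            user = self.db.query_one(
                "SELECT id, name, email FROM users WHERE id = ?", user_id)
            self.cache.set(key, user, ttl=300)  # populate cache for later reads
        return user

    def update_email(self, user_id, email):
        # Write to the system of record first, then invalidate the cached copy
        # so that subsequent reads do not return stale data.
        self.db.execute(
            "UPDATE users SET email = ? WHERE id = ?", email, user_id)
        self.cache.delete(f"user:{user_id}")
```

Even in this tiny example, awkward questions arise: what happens if the process crashes between the database write and the cache invalidation? Composing tools into a larger system is exactly where such questions come from.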

This book is a journey through both the principles and the practicalities of data systems, and how you can use them to build data-intensive applications. We will explore what different tools have in common, what distinguishes them, and how they achieve their characteristics.

In this chapter, we will start by exploring the fundamentals of what we are trying to achieve: reliable, scalable, and maintainable data systems. We'll clarify what those things mean, outline some ways of thinking about them, and go over the basics that we will need for later chapters. In the following chapters we will continue layer by layer, looking at different design decisions that need to be considered when working on a data-intensive application.

……

Summary

In this chapter, we have explored some fundamental ways of thinking about data-intensive applications. These principles will guide us through the rest of the book, where we dive into deep technical detail.

An application has to meet various requirements in order to be useful. There are functional requirements (what it should do, such as allowing data to be stored, retrieved, searched, and processed in various ways), and some nonfunctional requirements (general properties like security, reliability, compliance, scalability, compatibility, and maintainability). In this chapter we discussed reliability, scalability, and maintainability in detail.

Reliability means making systems work correctly, even when faults occur. Faults can be in hardware (typically random and uncorrelated), software (bugs are typically systematic and hard to deal with), and humans (who inevitably make mistakes from time to time). Fault-tolerance techniques can hide certain types of faults from the end user.

Scalability means having strategies for keeping performance good, even when load increases. In order to discuss scalability, we first need ways of describing load and performance quantitatively. We briefly looked at Twitter's home timelines as an example of describing load, and response time percentiles as a way of measuring performance. In a scalable system, you can add processing capacity in order to remain reliable under high load.
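For illustration (this sketch is not from the original text), here is one simple way to compute such percentiles from a batch of measured response times, using the nearest-rank method. Production monitoring systems typically approximate percentiles over a rolling time window with histogram-like data structures rather than sorting every sample.

```python
import math

# Sketch: computing response time percentiles from a batch of measurements
# using the nearest-rank method.

def percentile(samples, p):
    """Return the p-th percentile (0 < p <= 1) of a list of samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

response_times_ms = [12, 15, 17, 20, 22, 25, 30, 45, 80, 1500]

p50 = percentile(response_times_ms, 0.50)  # median: half of requests are faster
p95 = percentile(response_times_ms, 0.95)  # 95% of requests complete within this time
p99 = percentile(response_times_ms, 0.99)  # the tail: slowest 1% of requests

print(f"p50={p50} ms, p95={p95} ms, p99={p99} ms")
```

Note how a single outlier (the 1500 ms request) barely affects the median but dominates the high percentiles, which is why tail latency is measured separately.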

Maintainability has many facets, but in essence it's about making life better for the engineering and operations teams who need to work with the system. Good abstractions can help reduce complexity and make the system easier to modify and adapt for new use cases. Good operability means having good visibility into the system's health, and having effective ways of managing it.

There is unfortunately no easy fix for making applications reliable, scalable, or maintainable. However, there are certain patterns and techniques that keep reappearing in different kinds of applications. In the next few chapters we will take a look at some examples of data systems and analyze how they work toward those goals.

Later in the book, in Part III, we will look at patterns for systems that consist of several components working together, such as the one in Figure 1-1.
