BOF session on "Should we rearchitect cloud systems with stronger consistency guarantees?". The discussion in this session centered on consistency guarantees and the kinds of assurances cloud providers give their customers. Ken Birman (Cornell) opened by posing a question to the audience: given that strong consistency is a desirable property for programmability, can we build a scalable PaaS-style platform in which strong consistency is built into the platform itself? Or should we accept that only weaker consistency models scale to datacenter levels, and learn to deal with it at the application level, as large providers such as Google, Amazon, and Facebook do today? Marcos Aguilera (MSR) opined that we may have reached a tipping point, where outages and data losses are causing people to lose confidence in existing cloud systems and driving the community towards building strongly consistent systems. He emphasized that it is not only necessary to build strongly consistent systems but also vital to retain scalability; the research challenge, as Ken stated, lies therein. Gregory Chockler (MIT) had a different take on the situation. He recounted his experiences with systems that attempted to provide strong consistency while struggling to achieve scale and availability. Ironically, not only did the workloads running on these systems not require strong consistency, but the attempt to provide it was itself what caused the availability problems. Ken recounted a similar story of how a broadcast-based system that relied on application-level filtering of packets caused so much congestion that it failed at scale.

Indranil Gupta (UIUC) responded to these observations by arguing that system designers in industry stick to stronger consistency simply because they do not understand what it means to support weaker consistency models. He emphasized that we need to educate the community on the various consistency models, so that people can make the right choice for their situation. Marcos countered that customers trying to meet SLAs with hosted applications may demand stronger assurances from the provider. Robbert van Renesse (Cornell) raised the tension between availability and consistency: it is easy to design a system that is strongly consistent or highly available, but not one that has both properties. In fact, trying to provide both can leave you worse off than giving up one of them: a weakly consistent system might limp along with some data taking longer to propagate, while a strongly consistent system might simply fail and make all of the data inaccessible.

Returning to the discussion of weaker consistency models that Indranil started, Ken asked the audience for their opinions on systems that push the onus of consistency onto the client side, citing Windows Azure as an example. Marcos' response was that this approach works when humans are in the loop: if a user notices an inconsistency in part of a web page, she simply reloads the page. In large and complex systems, however, where software components may be the ones receiving the inconsistent data, it becomes extremely hard to distinguish consistent from inconsistent data.
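Marcos' distinction between human and programmatic consumers of inconsistent data can be made concrete with a small sketch. The code below is purely illustrative, and the EventuallyConsistentStore and read_at_least names are invented for this example rather than taken from any provider's API; it shows the explicit version tracking and re-reading a software client must do where a human would simply reload the page.

    import random
    import time

    class EventuallyConsistentStore:
        """Toy stand-in for an eventually consistent store: writes land on one
        replica immediately and reach the others only when propagate() runs."""
        def __init__(self, replicas=3):
            self.replicas = [dict() for _ in range(replicas)]

        def write(self, key, value, version):
            self.replicas[0][key] = (value, version)   # only the primary sees it now

        def propagate(self):
            for r in self.replicas[1:]:                 # simplistic anti-entropy step
                r.update(self.replicas[0])

        def read(self, key):
            replica = random.choice(self.replicas)      # may return stale data
            return replica.get(key, (None, 0))

    def read_at_least(store, key, min_version, retries=25, delay=0.01):
        """What 'pushing consistency to the client' looks like in code: the caller
        must remember the last version it observed and keep re-reading until the
        store catches up, instead of a human simply reloading the page."""
        for _ in range(retries):
            value, version = store.read(key)
            if version >= min_version:
                return value
            time.sleep(delay)                           # back off, try again
        raise RuntimeError(f"stale read: never observed {key} at version {min_version}")

    store = EventuallyConsistentStore()
    store.write("profile", "new photo", version=2)
    store.propagate()                                   # simulate replicas eventually converging
    print(read_at_least(store, "profile", min_version=2))   # -> "new photo"

The point is not the particular mechanism but that, in this style of system, the burden of detecting and tolerating staleness lives in every software component that consumes the data.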
There was a brief digression into the social factors around these problems: eBay and Amazon, for example, can get away with weak consistency because the monetary repercussions are relatively small. Simultaneously accepted orders for a single item are resolved by simply cancelling one of the orders later, and discrepancies of a few dollars in a bidding process are unlikely to cause controversy. Should Sotheby's use such a system while auctioning a million-dollar painting, however, inconsistencies would cause far more of an uproar. Eventually, most people agreed that the demand for consistency would only arise when companies could be driven bankrupt by the consequences of its absence. For other systems, free-market dynamics apply: as long as systems are "good enough" and users put up with occasional errors, there is little incentive to change.

Kaustubh Joshi (AT&T Research) felt that consistency decisions should be made by applications, not by the underlying system. Someone else in the audience countered that developers would find it extremely hard to deal with the various consistency models, but other industry attendees (NetApp, Facebook) agreed with Kaustubh, describing their own experiences building large-scale systems in which different consistency properties were enforced on different parts of the application's data. Another audience member suggested it would be better still if the system (the IaaS/PaaS platform) supported different consistency models and applications were allowed to take advantage of these properties as they desired. Robbert observed that we often make assumptions when building systems and are unhappy when they are violated; what system designers need to do is catalogue those assumptions, so that application developers are better informed when making consistency versus scalability tradeoffs. Marcos observed that a system could be consistent end-to-end without being built entirely of consistent components; likewise, wiring together strongly consistent building blocks would not ensure that the system as a whole remained consistent.

The discussion then shifted to the economics of providing strong consistency: Google has systems like Spanner, but customers cannot buy anything that provides similar guarantees. While cost may be a factor in such decisions, it is not the only one; for instance, people use medium-sized Amazon EC2 instances, rather than micro instances, for mission-critical services. Ken noted that there is a cost tipping point for customers, and the question was whether a strongly consistent system could be built under that threshold. The discussion later turned to the "one size fits all" model currently on offer. Not only do different applications require different consistency guarantees, but different components of the same system may also want different guarantees; Facebook, often a poster child in discussions of eventual consistency, still uses strong consistency for the login page and for users' settings, while 99% of its data is eventually consistent. Supporting multiple consistency models, however, would significantly increase the complexity of the programming interfaces of these kinds of services.
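One common way the "different consistency for different data" idea surfaces in practice is a per-operation consistency parameter, in the spirit of quorum-tunable stores such as Dynamo or Cassandra. The sketch below is illustrative only and is not any particular provider's API; the ReplicatedRegister class, its Consistency levels, and the quorum arithmetic are assumptions made for the example. It shows both the flexibility such an interface buys and the extra decisions it pushes onto the developer.

    from enum import Enum

    class Consistency(Enum):
        ONE = 1        # contact a single replica: fast, may return stale data
        QUORUM = 2     # overlapping read/write quorums: strongly consistent register

    class ReplicatedRegister:
        """Illustrative N-replica register with a tunable consistency level.
        With N = 3, QUORUM means 2 replicas, so any QUORUM read intersects any
        QUORUM write and sees the latest value; ONE gives no such guarantee."""
        def __init__(self, n=3):
            self.n = n
            self.replicas = [(None, 0)] * n     # (value, version) held by each replica
            self.version = 0

        def _count(self, level):
            return 1 if level is Consistency.ONE else self.n // 2 + 1

        def write(self, value, level=Consistency.QUORUM):
            self.version += 1
            for i in range(self._count(level)):          # ack after this many replicas
                self.replicas[i] = (value, self.version)

        def read(self, level=Consistency.QUORUM):
            contacted = self.replicas[: self._count(level)]
            return max(contacted, key=lambda vv: vv[1])[0]   # freshest version wins

    # Per-data choices, in the spirit of the Facebook example above:
    settings = ReplicatedRegister()
    settings.write("2fa=on", level=Consistency.QUORUM)   # user settings: strong
    print(settings.read(level=Consistency.QUORUM))       # always the latest value

    newsfeed = ReplicatedRegister()
    newsfeed.write("post #1", level=Consistency.ONE)     # feed data: eventual
    print(newsfeed.read(level=Consistency.ONE))          # may be stale in a real store

With three replicas, QUORUM reads and writes each touch two replicas and therefore always intersect, so settings-style data stays strongly consistent, while ONE-level operations trade that guarantee for lower latency; every caller, however, now has to decide which level each piece of data deserves, which is exactly the interface complexity raised above.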
An audience member likened this to the microkernel interface for OSes, where having a limited set of well-defined interfaces was arguably more important than being overly permissive in the kinds of models supported. The session concluded with Marcos' observation that while the jury was still out on whether strong consistency was necessary or weak consistency would be good enough, precisely defining the properties a system provides to its hosted applications was imperative and long overdue.