Thursday 20 November 2014

Idempotence

More revision, a few musings and another aide-mémoire.

Idempotence is a subject of much debate and seems to mean different things to different people when used in different contexts. This post is just a quick mile-high overview by way of a bit of revision.

Idempotence defined

Lets pop over to Wikipedia and get a definition of idempotence:

“Idempotence (/ˌaɪdɨmˈpoʊtəns/ EYE-dəm-POH-təns) is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application.” [1]

“In computer science, the term idempotent is used more comprehensively to describe an operation that will produce the same results if executed once or multiple times. This may have a different meaning depending on the context in which it is applied. In the case of methods or subroutine calls with side effects, for instance, it means that the modified state remains the same after the first call…

This is a very useful property in many situations, as it means that an operation can be repeated or retried as often as necessary without causing unintended effects. With non-idempotent operations, the algorithm may have to keep track of whether the operation was already performed or not.” [2]

So for our purposes idempotence is the property of an operation that means if it is executed once or multiple times the result will always be the same. Now it’s the result that seems to be cause of the debate.

There are often side effects – sometimes quite subtle - that could be considered part of the result. For example, if you were to create an audit record every time an apparently idempotent operation is executed is that still idempotent? The answer is ‘probably not’ but will depend on the expected behaviour of the application. Logging, auditing and monitoring may be considered side effects of message handling:

“These side effects are not relevant to the semantics of the application behavior, so the processing of an idempotent request is still considered idempotent even if side effects exist.” [6]

Idempotence in HTTP

In HTTP there are some methods that are considered idempotent.

“Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE should not have side effects, and so are inherently idempotent.” [3]

Actually in practice there are some potential pitfalls with methods like DELETE. Although the result of a delete operation on the server may be idempotent – deleting the same resource multiple times has the same effect - it is possible for repeated DELETE operations on the same resource to return different HTTP status codes. An initial call to DELETE may return a status code of 200 but subsequent calls could return 404. From the client side the operation may not appear to be idempotent.  

Idempotency is a concept that features in REST quite a bit and is regarded as an important characteristic of fault-tolerant APIs. In REST the result may well be the resource representation so issues about audit records etc. may not be a concern. To illustrate here’s a quote from http://restcookbook.com:

“An idempotent HTTP method is a HTTP method that can be called many times without different outcomes. It would not matter if the method is called only once, or ten times over. The result should be the same. Again, this only applies to the result, not the resource itself. This still can be manipulated (like an update-timestamp, provided this information is not shared in the (current) resource representation.” [4]

So the author is drawing a distinction here between the actual resource and it’s representation. As long as the representation doesn’t change then the operation is regarded as idempotent.

 

Idempotence in messaging

Idempotence in message-based systems is very useful because it helps avoid the necessity of using strategies such as two-phase commit (2PC) to manage distributed transactions. Essentially 2PC in a message-based system means that a message must be processed exactly once by the cohorts participating in the distributed transaction. As we know there are potential drawbacks to using 2PC because it is a blocking protocol that potentially ties up resources.

With a 2PC strategy in place the coordinating process has to determine if a failure condition has occurred and abort the distributed transaction. This causes all participating cohorts to abort their part of the distributed transaction. In a message-based system this effectively means that the messages sent to the cohorts are regarded as not having been processed. Recovering from the failure requires the coordinator to replay the whole distributed transaction by republishing the messages.

An alternative approach is to use a strategy where a message is processed at least once. With this strategy if a failure occurs the original message may be published again by the coordinating process but recipients that have already processed the message are expected to ignore it. This can be achieved 2 ways: have a mechanism in place to remove duplicate messages from being processed by participating cohorts, or messages have to be idempotent. [5]

A typical pattern for handling messages is for a message to be read from a queue, processed, and then removed from the queue.[6] In this scenario if a failure occurs after the message has been processed but before the message has been removed from the queue (e.g. there’s a power failure) the message will be processed twice. If the messages are idempotent this is no longer a concern.

Strategies for implementing idempotent messages include:

  • Natural idempotency – some messages are naturally idempotent (processing them multiple times has the same effect). In much the same way as an HTTP DELETE can be idempotent so could a message to delete a resource.
  • Use a correlation identifier – add a unique identifier to each message and use it to see if the message has been processed before.

 

Idempotence and SOA

Just a quick note here that idempotence has a place to play in a service oriented architecture because it facilities fault-tolerance:

“Idempotency guarantees that repeated invocations of a service capability are safe and will have no negative effect.

Idempotent capabilities are generally limited to read-only data retrieval and queries. For capabilities that do request changes to service state, their logic is generally based on "set", "put" or "delete" actions that have a post-condition that does not depend on the original state of the service.

The design of an idempotent capability can include the use of a unique identifier with each request so that repeated requests (with the same identifier value) that have already been processed will be discarded or ignored by the service capability, rather than being processed again.” [7]

 

References

[1] Idempotence (Wikipedia)

[2] Idempotence – Computer science meaning (Wikipedia)

[3] RFC 2616 Part 9

[4] The RESTful CookBook – Idempotency

[5] (Un) Reliability in messaging: idempotency and de-duplication (Jimmy Bogard, Los Techies)

[6] Idempotence Is Not a Medical Condition (Pat Helland, acm.org)

[7] Idempotent Capability