Thursday, 20 November 2014

Idempotence

More revision, a few musings and another aide-mémoire.

Idempotence is a subject of much debate and seems to mean different things to different people when used in different contexts. This post is just a quick mile-high overview by way of a bit of revision.

Idempotence defined

Let’s pop over to Wikipedia and get a definition of idempotence:

“Idempotence (/ˌaɪdɨmˈpoʊtəns/ EYE-dəm-POH-təns) is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application.” [1]

“In computer science, the term idempotent is used more comprehensively to describe an operation that will produce the same results if executed once or multiple times. This may have a different meaning depending on the context in which it is applied. In the case of methods or subroutine calls with side effects, for instance, it means that the modified state remains the same after the first call…

This is a very useful property in many situations, as it means that an operation can be repeated or retried as often as necessary without causing unintended effects. With non-idempotent operations, the algorithm may have to keep track of whether the operation was already performed or not.” [2]

So for our purposes idempotence is the property of an operation whereby executing it once or multiple times always produces the same result. Now it’s the result that seems to be the cause of the debate.

There are often side effects – sometimes quite subtle – that could be considered part of the result. For example, if you were to create an audit record every time an apparently idempotent operation is executed, is that operation still idempotent? The answer is ‘probably not’ but it will depend on the expected behaviour of the application. Logging, auditing and monitoring may be considered side effects of message handling:

“These side effects are not relevant to the semantics of the application behavior, so the processing of an idempotent request is still considered idempotent even if side effects exist.” [6]

Idempotence in HTTP

In HTTP there are some methods that are considered idempotent.

“Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE should not have side effects, and so are inherently idempotent.” [3]

In practice, however, there are some potential pitfalls with methods like DELETE. Although the result of a delete operation on the server may be idempotent – deleting the same resource multiple times has the same effect – it is possible for repeated DELETE requests on the same resource to return different HTTP status codes. An initial call to DELETE may return a status code of 200 but subsequent calls could return 404. From the client’s perspective the operation may not appear to be idempotent.
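To see what that looks like from the client side, here’s a minimal sketch using Python’s requests library (the URL and resource are invented for illustration):

import requests

# A hypothetical resource URL - for illustration only.
url = "https://api.example.com/orders/42"

first = requests.delete(url)
print(first.status_code)   # typically 200 (or 204) - the resource was deleted

second = requests.delete(url)
print(second.status_code)  # often 404 - the resource no longer exists

# The server-side effect is the same both times (the resource is gone), so
# the operation is idempotent, but a naive client that treats the 404 as an
# error may conclude otherwise.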

Idempotency is a concept that features quite a bit in REST and is regarded as an important characteristic of fault-tolerant APIs. In REST the result may well be the resource representation, so concerns about audit records and the like may not apply. To illustrate, here’s a quote from http://restcookbook.com:

“An idempotent HTTP method is a HTTP method that can be called many times without different outcomes. It would not matter if the method is called only once, or ten times over. The result should be the same. Again, this only applies to the result, not the resource itself. This still can be manipulated (like an update-timestamp, provided this information is not shared in the (current) resource representation.” [4]

So the author is drawing a distinction here between the actual resource and its representation. As long as the representation doesn’t change, the operation is regarded as idempotent.

 

Idempotence in messaging

Idempotence in message-based systems is very useful because it helps avoid the need for strategies such as two-phase commit (2PC) to manage distributed transactions. Essentially, 2PC in a message-based system means that a message must be processed exactly once by the cohorts participating in the distributed transaction. As we know, there are potential drawbacks to using 2PC because it is a blocking protocol that can tie up resources.

With a 2PC strategy in place the coordinating process has to determine if a failure condition has occurred and abort the distributed transaction. This causes all participating cohorts to abort their part of the distributed transaction. In a message-based system this effectively means that the messages sent to the cohorts are regarded as not having been processed. Recovering from the failure requires the coordinator to replay the whole distributed transaction by republishing the messages.

An alternative approach is to use a strategy where a message is processed at least once. With this strategy, if a failure occurs the original message may be published again by the coordinating process, but recipients that have already processed the message are expected to ignore it. This can be achieved in two ways: put a mechanism in place to prevent duplicate messages from being processed by participating cohorts, or make the messages themselves idempotent. [5]

A typical pattern for handling messages is for a message to be read from a queue, processed, and then removed from the queue. [6] In this scenario, if a failure occurs after the message has been processed but before it has been removed from the queue (e.g. there’s a power failure), the message will be processed twice. If the messages are idempotent this is no longer a concern.
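To make that failure window concrete, here’s a toy model of at-least-once delivery in Python (the broker class is invented for illustration; real brokers achieve the same effect with explicit acknowledgements):

class ToyBroker:
    # A toy broker modelling at-least-once delivery: a message is only
    # removed once it has been explicitly acknowledged, so a crash between
    # processing and acknowledgement leads to redelivery.
    def __init__(self):
        self.messages = []

    def publish(self, message):
        self.messages.append(message)

    def receive(self):
        # Peek without removing - the broker keeps the message until ack.
        return self.messages[0] if self.messages else None

    def ack(self, message):
        self.messages.remove(message)

broker = ToyBroker()
broker.publish({"id": "msg-1", "body": "charge customer 42"})

msg = broker.receive()          # read
print("processing", msg["id"])  # process (a crash here means msg-1 is redelivered)
broker.ack(msg)                 # remove/acknowledge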

Strategies for implementing idempotent messages include:

  • Natural idempotency – some messages are naturally idempotent, in that processing them multiple times has the same effect. In much the same way as an HTTP DELETE can be idempotent, so can a message to delete a resource.
  • Use a correlation identifier – add a unique identifier to each message and use it to check whether the message has been processed before (see the sketch below).
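Here’s a minimal sketch of the correlation identifier strategy in Python (the message shape is an assumption, and in production the set of processed identifiers would need to live in a durable store so it survives restarts):

processed_ids = set()  # in production this would be a durable store

def apply_change(message):
    # The actual work, which may not be idempotent in itself.
    print("applying", message["body"])

def handle(message):
    # Correlation identifier check: ignore messages seen before.
    if message["id"] in processed_ids:
        return  # duplicate delivery - ignore it
    apply_change(message)
    processed_ids.add(message["id"])  # note: a crash between these two
                                      # lines still leaves a small window

# A redelivered message is only processed once:
handle({"id": "msg-1", "body": "set order 42 to SHIPPED"})
handle({"id": "msg-1", "body": "set order 42 to SHIPPED"})  # ignored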

 

Idempotence and SOA

Just a quick note here that idempotence has a part to play in a service-oriented architecture because it facilitates fault tolerance:

“Idempotency guarantees that repeated invocations of a service capability are safe and will have no negative effect.

Idempotent capabilities are generally limited to read-only data retrieval and queries. For capabilities that do request changes to service state, their logic is generally based on "set", "put" or "delete" actions that have a post-condition that does not depend on the original state of the service.

The design of an idempotent capability can include the use of a unique identifier with each request so that repeated requests (with the same identifier value) that have already been processed will be discarded or ignored by the service capability, rather than being processed again.” [7]
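To illustrate the point about ‘set’-style actions, here’s a small Python sketch (the function names and data are invented): a capability whose post-condition does not depend on the original state can be repeated safely, whereas an increment cannot.

orders = {42: {"status": "NEW", "total": 0}}

def set_order_status(order_id, status):
    # Idempotent: the post-condition (status == "SHIPPED") does not
    # depend on the original state, so repeating the call is safe.
    orders[order_id]["status"] = status

def add_to_order_total(order_id, amount):
    # Not idempotent: every repeated call moves the state further.
    orders[order_id]["total"] += amount

set_order_status(42, "SHIPPED")
set_order_status(42, "SHIPPED")  # no further effect
add_to_order_total(42, 10)
add_to_order_total(42, 10)       # total is now 20, not 10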

 

References

[1] Idempotence (Wikipedia)

[2] Idempotence – Computer science meaning (Wikipedia)

[3] RFC 2616 Part 9

[4] The RESTful CookBook – Idempotency

[5] (Un) Reliability in messaging: idempotency and de-duplication (Jimmy Bogard, Los Techies)

[6] Idempotence Is Not a Medical Condition (Pat Helland, acm.org)

[7] Idempotent Capability

Friday, 9 March 2012

Validation in web services

A colleague recently informed me that doing validation in a web service was a bad idea and should be avoided. I think his reasoning was that, in systems requiring massive throughput, performing validation at a web service entry point would be a blocking operation that would hold resources and impact negatively on overall system performance. Sorry, but the idea of not validating a request to a web service struck me as stupid. Anyway, I like to get my ducks in a row, so I thought I’d take the opportunity to revisit some SOA concerns around validation.

My usual source of reference on matters pertaining to SOA is Thomas Erl. So, I dug out my copy of “SOA – Principles of Service Design” and had a look at what it had to say on the matter.

“Regardless of the extent of indirect coupling a service contract imposes, there will always be the requirement for the consumer program to comply to the data model defined in the technical service contract definitions.

In the case of a Web service, this form of validation coupling refers to the XML schema complex types that represent individual incoming and outgoing messages. Schemas establish data types, constraints, and validation rules based on the size and complexity of the information being exchanged as well as the validation requirements of the service itself.” [1]

So, in the case of a SOAP-based XML Web service the schema specifies the nature and format of the data contained in the request. A Web service will perform validation against the schema and reject the request if it does not comply.
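As a minimal sketch of validation at the service boundary, assuming the Python lxml library (the schema and message are invented for illustration):

from lxml import etree

# An invented schema for an incoming request message.
xsd = etree.XMLSchema(etree.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="GetOrderRequest">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="OrderId" type="xs:positiveInteger"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

def handle_request(payload):
    doc = etree.fromstring(payload)
    if not xsd.validate(doc):
        # Reject non-compliant requests at the boundary, e.g. by
        # returning a SOAP fault to the consumer.
        raise ValueError(str(xsd.error_log))
    # The request is structurally valid - hand off to the service logic.

handle_request("<GetOrderRequest><OrderId>42</OrderId></GetOrderRequest>")

Erl goes on to say,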

“The extent of validation coupling required by each individual service capability can vary dramatically and is often tied directly to the measure of constraint granularity of service capabilities. As per the validation-related design patterns, each contract can be individually assessed as to the quantity of actual constraints required to increase its longevity.” [2]

When Erl refers to constraint granularity he is talking about “the amount of detail [with] which a particular constraint is expressed” [3]. Erl provides the example of a constraint applied to a product code; you could mandate that it is a string value between 1 and 4 characters (a coarse-grained constraint) or that it is exactly 4 characters with each character being numeric (a fine-grained constraint).
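Erl’s product code example can be expressed directly as XML Schema constraints. A rough sketch, again assuming lxml (the element name and sample value are invented):

from lxml import etree

def product_code_schema(restriction):
    # Wrap the given xs:restriction facets in a one-element schema.
    return etree.XMLSchema(etree.fromstring(f"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ProductCode">
    <xs:simpleType>
      <xs:restriction base="xs:string">{restriction}</xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""))

# Coarse-grained: any string of 1 to 4 characters.
coarse = product_code_schema('<xs:minLength value="1"/><xs:maxLength value="4"/>')

# Fine-grained: exactly 4 characters, each one numeric.
fine = product_code_schema('<xs:pattern value="[0-9]{4}"/>')

doc = etree.fromstring("<ProductCode>12A4</ProductCode>")
print(coarse.validate(doc))  # True  - it is a string of 1 to 4 characters
print(fine.validate(doc))    # False - 'A' is not numeric

The fine-grained schema rejects more bad data at the boundary, but any change to the product code format now forces a change to the contract itself.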

Erl’s concern about service longevity appears to be that if constraint granularity is high, so is the validation coupling. Changes to validation rules could pose a threat to service longevity because they would require a change to the service contract. As Erl puts it, “By reducing the overall quantity of constraints and especially filtering out those more prone to change, the longevity of a service contract can be extended.” [4]

Erl offers the Validation Abstraction pattern to help alleviate this issue.

So, does all of this mean that performing validation in a Web service is a bad idea? No, of course it doesn’t. What it means is that reducing the validation detail within the schema can help increase the longevity of service contracts.

However, if you do this you must defer more detailed validation to the underlying service logic, not ignore it altogether. It is also worth noting that some validation logic may be better suited to being performed within the processing boundary of a business service (i.e. moving the validation logic closer to where the specific business processing takes place). Use of a dedicated validation service is also an option, so that validation logic can be maintained in a single location and managed separately.

What I take away from this:

  1. Validation rules are expressed for service contracts using schema.
  2. Keeping validation constraints coarse-grained can help extend service contract longevity.
  3. Consideration should be given to moving detailed validation logic into the underlying service logic.
  4. Moving some validation logic within the business process boundary should be considered.
  5. Use of an external validation service is an option.

As for my colleague’s original assertion, I think I’ll stick to performing validation in my Web services as and when required.

References

[1] “SOA – Principles of Service Design”, Thomas Erl, pp190-191

[2] “SOA – Principles of Service Design”, Thomas Erl, p191

[3] “SOA – Principles of Service Design”, Thomas Erl, p117

[4] http://www.soapatterns.org/validation_abstraction.php

 

See also

http://searchsoa.techtarget.com/answer/Validation-abstraction


Tuesday, 10 August 2010

Notes on Service-Orientation

I was thumbing through my copy of SOA Principles of Service Design by Thomas Erl, and thought I’d make a few notes on some of the standout concepts as well as notes picked up from other locations.

As usual this is going to be a work in progress…

OASIS (Organization for the Advancement of Structured Information Standards) defines SOA as:

“A paradigm for organizing and utilizing distributed capabilities that may be under the control of different ownership domains. It provides a uniform means to offer, discover, interact with and use capabilities to produce desired effects consistent with measurable preconditions and expectations.”

Principles of Service-Orientation as a design paradigm

Erl describes some of the principles of a service-oriented design paradigm in the following terms:

  • Standardised Service Contract
    • The purpose and capabilities of a service are exposed by a service contract.
  • Service Loose Coupling
    • Emphasis should be placed on reducing dependencies between the service contract, its implementation and its consumers.
  • Service Abstraction
    • Hide the underlying details of a service from its consumers.
    • Enables and preserves loose coupling. 
  • Service Reusability
    • Emphasises the positioning of services as enterprise resources “with agnostic functional contexts”.
  • Service Autonomy
    • Underlying solution logic needs to have a significant degree of control over its environment and resources.
  • Service Statelessness
    • “The management of excessive state information can compromise the availability of a service and undermine its scalability and potential.”
    • Ideally, services should remain stateful only when required.
  • Service Discoverability
    • “For services to be positioned as IT assets with repeatable ROI they need to be easily identified and understood when opportunities present themselves.”
  • Service Composability
    • Services should be capable of participating as effective composition members.