Sunday, 13 August 2017

Quick RabbitMq using Docker

I needed a quick RabbitMq installation so I could play with MassTransit, the free open-source .Net message bus framework. Docker to the rescue. I was using the Docker Toolbox on Windows 10 Home.

First things first - RabbitMq is available on the Docker Store as a Docker image called rabbitmq. The documentation is reasonable and I decided I wanted RabbitMq installed with the management plugin. For this test I decided to leave the default RabbitMq username (guest) and password in place. I also elected to expose the default ports for the management plugin (15672) and the standard RabbitMq port (5672) to the host.

Having scanned the documentation the following Docker run command would seem to be in order:

docker run -d --hostname my-rabbit --name some-rabbit -p 15672:15672 -p 5672:5672 rabbitmq:3-management

Note the two -p command line arguments exposing the ports to the host from the Docker container. So, time to crack open the Docker Quickstart Terminal and run the command.

With the Docker container up-and-running I can now go to my local machine and access the RabbitMq management UI running on port 15672.
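
With Docker Toolbox the container runs inside a VirtualBox VM, so the published ports are reached on the VM's address rather than on localhost. Here is a rough sketch for working out the management URL. It assumes the Toolbox VM has the default name "default", and the function name is made up for illustration.

```shell
# Print the management UI URL for the Docker Toolbox VM.
# Assumption: the machine is called "default" unless a name is passed in.
rabbit_management_url() {
  local machine="${1:-default}"
  # docker-machine ip prints the IP address of the named VM.
  echo "http://$(docker-machine ip "$machine"):15672"
}

# Usage: rabbit_management_url
```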

Cool! Looks like we’ve got RabbitMq running in a container.

A quick test application

I decided to run a quick test using a console application to check everything was working. Firstly, I set up a virtual host for the test in the management UI remembering to add the guest user to the virtual host.
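
The same virtual host setup can also be done from the command line with rabbitmqctl running inside the container. This is a sketch rather than what I actually did: the function name is hypothetical, and it assumes the container name from the docker run command above.

```shell
# Create a virtual host and give the guest user full permissions on it.
# The three ".*" patterns are the configure, write and read permissions.
create_vhost() {
  local container="$1" vhost="$2"
  docker exec "$container" rabbitmqctl add_vhost "$vhost" &&
    docker exec "$container" rabbitmqctl set_permissions -p "$vhost" guest ".*" ".*" ".*"
}

# Usage: create_vhost some-rabbit console-test
```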

Then, using the RabbitMq .Net client I created the ‘Hello World’ application.

using System;
using System.Text;
using RabbitMQ.Client;

namespace RabbitMqTest
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                ConnectionFactory factory = new ConnectionFactory();
                // Host is the Docker Toolbox VM address; console-test is the virtual host.
                factory.Uri = new Uri("amqp://guest:guest@192.168.99.100:5672/console-test");

                Console.WriteLine("Connecting...");
                // Dispose the connection and channel so they are closed cleanly on exit.
                using (IConnection conn = factory.CreateConnection())
                using (IModel model = conn.CreateModel())
                {
                    Console.WriteLine("Connected.");

                    var exchangeName = "console-test-exchange";
                    var queueName = "console-test-queue";
                    var consoleTestRoutingKey = "console-test-routing-key";

                    // Declare a direct exchange and a non-durable queue, then bind them.
                    model.ExchangeDeclare(exchangeName, ExchangeType.Direct);
                    model.QueueDeclare(queueName, false, false, false, null);
                    model.QueueBind(queueName, exchangeName, consoleTestRoutingKey, null);

                    byte[] messageBodyBytes = Encoding.UTF8.GetBytes("Hello, world!");
                    model.BasicPublish(exchangeName, consoleTestRoutingKey, null, messageBodyBytes);
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
            }
        }
    }
}

I ran the application and headed back into the RabbitMq management UI to check the results. Firstly, was the exchange created?

It was. And the queue?

Success again. You can see there’s one message ready too. By drilling in to the queue you can get messages in the management UI. Let’s see what we got.
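
If you prefer the command line to the management UI, the queue depth can be checked with rabbitmqctl as well. This is a small sketch with a made-up function name, assuming the container and virtual host names used earlier.

```shell
# List each queue in a virtual host with the number of messages ready for delivery.
queue_depths() {
  local container="$1" vhost="$2"
  docker exec "$container" rabbitmqctl list_queues -p "$vhost" name messages_ready
}

# Usage: queue_depths some-rabbit console-test
```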

Bingo! So, all is well with RabbitMq.

The Sample-ShoppingWeb application

MassTransit has a sample application called Sample-ShoppingWeb which you can get from GitHub.

Firstly, I created a new virtual host in RabbitMq and added the guest user to it. I then made simple changes to the configuration to change the RabbitMqHost setting in the App.config and Web.config files of the TrackingService and Shopping.Web projects respectively.

I ran the example and added a few items to the cart using the Shopping.Web MVC application. I watched as the TrackingService picked up the items via RabbitMq. An examination of the RabbitMq management UI showed a bunch of new exchanges and two new queues.

Done. 


Docker recipes

This post is a quick aide-mémoire for basic command-line Docker operations. It’s well worth reading the Docker documentation for the real deal. I’ve been running Docker on Windows 10 Home – yes, Home – so I’ve had to use Docker Toolbox. I’ve run the commands listed here using the Docker Quickstart Terminal that comes with the Toolbox.

List containers

To list the running containers, run the following command (add -a to include stopped containers):

docker container ls

See the docker container ls documentation.

Run a Bash shell on a container

To get access to a container using a Bash shell run the following command:

docker exec -it <container-name-here> bash

Friday, 21 July 2017

Setting up SSH for BitBucket on Windows

Back to basics for me today. I’m rebuilding a machine and want to set up SSH to access my BitBucket account (I use BitBucket for my Git repositories). The new machine already has Git installed. I simply used Chocolatey to install Git.

You can skip to full instructions in the BitBucket help if you like.

Step 1 – Check the .ssh directory

The first step is to check that you’ve got a folder called .ssh in your home directory. If it’s missing you need to create it.

Step 2 – Create the default identity

Run ssh-keygen to create the key. If this is a fresh install there won’t be a default key so you can just hit enter to accept the default name or enter a new one if you want. Enter the passphrase when prompted.

Step 3 – Create an SSH config file

Create a config file for SSH.

Open the file and edit it. Add the following:

Host bitbucket.org
 IdentityFile ~/.ssh/id_rsa


Step 4 – Update the .bashrc file

Check you’ve got a .bashrc file in your home directory. Create one if you don’t. Open the .bashrc file and edit it. Add the following:

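#! /bin/bash
# Start ssh-agent and load its environment variables into this shell.
eval `ssh-agent -s`
# Add every private key whose file name ends in _rsa (prompts for the passphrase).
ssh-add ~/.ssh/*_rsa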

See also Enter RSA passphrase once when using Git bash.

Close and reopen Git Bash. You’ll be prompted for the passphrase.

Step 5 – Configure BitBucket to use the new key

Go to your BitBucket account and navigate to your settings. Adding the key is easy. Follow the steps here.

Check you can access BitBucket using the new key.
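
One way to do that check from Git Bash is to ask BitBucket to authenticate over SSH. A minimal sketch follows. The function name is only for illustration, while ssh-add -l and ssh -T are standard OpenSSH options.

```shell
check_bitbucket_ssh() {
  # Show which keys the running ssh-agent currently holds.
  ssh-add -l
  # Try to authenticate; -T skips allocating a terminal since no shell is offered.
  ssh -T git@bitbucket.org
}

# Usage: check_bitbucket_ssh
```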

Done.

Thursday, 26 January 2017

Unscrambling a mess of .Net Core installations

I've had a variety of .Net Core installations on my laptop. The end result was .Net Core not working in Visual Studio 2015 anymore despite uninstalling and reinstalling .Net Core and associated Visual Studio tools. The problem was probably exacerbated by uninstallers not completing correctly and other sundry issues.

In a perfect world I would have paved the machine and started again but at this point that’s not practical. Working in VMs might also be an option but for now I just want a laptop with .Net Core up-and-running.

Investigation

Firstly I decided to find out what was hanging around on the machine. I opened a command prompt and ran dotnet --version to find out.

So this is saying I’ve got version 1.0.0-beta-001598. That seems very old to me and something that should have gone ages ago.

Looking in Add/Remove Programs I see something different.

This suggests 2 other versions have been installed: 1.0.0 Preview2-003131 and 1.0.0 Preview2-003133. Hmm…

So, I had a look at the PATH environment variable on the machine and saw some interesting things. The following directories – that all seem related to .Net Core – were listed in this order:

  • c:\Program Files\dotnet\bin
  • c:\Program Files\dotnet
  • c:\Program Files\Microsoft DNX\Dnvm
  • c:\Users\username\.dnx\runtimes\dnx-coreclr-win-x86.1.0.0-rc1-update2\bin
  • c:\Users\username\.dnx\bin

 

Looking in the c:\Program Files\dotnet folder confirmed the installed SDKs but there was something odd. The dotnet.exe appeared in c:\Program Files\dotnet and in c:\Program Files\dotnet\bin. Running them from each location separately revealed different .Net Core versions.
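
Because the first match on the PATH wins, it helps to see every directory that offers a dotnet executable, in order. Below is a rough sketch for Git Bash. The function name is made up, and it assumes a colon-separated PATH as Git Bash presents it.

```shell
# list_on_path NAME
# Prints every directory on PATH that contains an executable called NAME
# (or NAME.exe), in PATH order. The first one printed is the one that runs.
list_on_path() {
  local name="$1" dir
  local IFS=:
  for dir in $PATH; do
    if [ -x "$dir/$name" ] || [ -x "$dir/$name.exe" ]; then
      echo "$dir"
    fi
  done
}

# Usage: list_on_path dotnet
```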

Yep. Things are in a mess.

The c:\Users\username\.dnx\ directory is also interesting. That isn’t used anymore. What’s in there I wonder?

Turns out it’s got the old – now redundant – dnvm application that gives a different version again. There’s a stack of other stuff in there too. Sheesh, what a mess I’ve made.

Solution

My solution was to do the following:

  • Uninstall the .Net Core and VS Tooling using Add/Remove Programs.
  • Run dnvm uninstall from the old .dnx folder (probably not necessary but what the heck).
  • Manually delete the following directories (and contents):
    • c:\Program Files\dotnet
    • c:\Program Files\Microsoft DNX
    • c:\Users\username\.dnx
  • Remove the same directories as above (and/or sub directories) from both the user and system PATH environment variables.
  • Restart the machine (probably not necessary - belt and braces).
  • Install the latest .Net Core SDK and Visual Studio 2015 Tools from here.

 

The result? Working .Net Core in Visual Studio!

There was a nice tidy c:\Program Files\dotnet folder with the dotnet.exe and a subfolder containing the SDKs. That’s it, no other folders required or present.

Thursday, 12 January 2017

What is Enterprise Architecture as described by TOGAF 9.1?

 

What is Architecture?

The TOGAF documentation initially refers to the original version of ISO/IEC 42010:2007 (Systems and software engineering) which defines architecture in the following terms:

“The fundamental organization of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution.” [1] [2]

However, TOGAF defines an architecture as follows:

“1. A formal description of a system, or a detailed plan of the system at component level to guide its implementation
2. The structure of components, their inter-relationships, and the principles and guidelines governing their design and evolution over time” [2] [3]

What is an Enterprise?

An enterprise is defined as follows:

“The highest level (typically) of description of an organization and typically covers all missions and functions. An enterprise will often span multiple organizations.” [4]

What kinds of architecture are dealt with by TOGAF?

TOGAF deals with 4 kinds of architecture:

  • Business Architecture
    • defines the business strategy, governance, organization, and key business processes [2]
    • a description of the structure and interaction between the business strategy, organization, functions, business processes, and information needs [5]
  • Data Architecture
    • describes the structure of an organization's logical and physical data assets and data management resources [2]
    • a description of the structure and interaction of the enterprise's major types and sources of data, logical data assets, physical data assets, and data management resources [6]
  • Application Architecture
    • provides a blueprint for the individual applications to be deployed, their interactions, and their relationships to the core business processes of the organization [2]
    • a description of the structure and interaction of the applications as groups of capabilities that provide key business functions and manage the data assets [7]
  • Technology Architecture
    • describes the logical software and hardware capabilities that are required to support the deployment of business, data, and application services [2]
    • includes IT infrastructure, middleware, networks, communications, processing, standards, etc. [2]
    • a description of the structure and interaction of the platform services, and logical and physical technology components [8]

 

References

Monday, 26 December 2016

Amazon gift cards (how Amazon can take your money and give nothing in return)

Customer service? I think not

Final update (26/12/16 16:30): Following a Tweet from me Amazon’s Social Media Team got in touch and resolved the issue, finally. Well done them and thanks to the lady who sorted it out for me. Case closed. I’ve left the blog post below for information.

My son received 2 gift cards from a good friend of ours this Christmas. Both the cards were purchased at the same time direct from Amazon and were delivered direct to us. Both cards arrived at the same time in the same package. No 3rd party retailers were involved.

My son was able to redeem the first card but the second one – with a value of £20 - simply wouldn’t work.

So, I contact Amazon customer services using their online email facility but the results are shockingly poor and have left me feeling like I’m being taken for a ride.

So far they have strung me along with an exchange of 11 emails and one long phone call. I’ve included the whole thing below because of its absurdity. In short they start by saying the card was not authorised by the ‘retailer’ and that I should contact them. I point out Amazon was the retailer – no 3rd parties involved. They respond by asking for a PDF of the card which I send. They then ask me to provide the following gem of information:

“If you're able to see the any 3 consecutive digits of the claim code, other than the first 2 or last 4, please reply to this e-mail with these numbers along with the 16 digit card number located on the back of the card.”

Remember, I’d just sent them a PDF of the whole card. You have to assume I’m being strung along at this point.

To rub salt in my wounds Amazon includes the following message at the bottom of each email:

“Your feedback is helping us build Earth's Most Customer-Centric Company.”

That starts to look like a bit of a joke. Anyway, so far I’ve not been able to resolve the issue. To cut a long story short, Amazon has taken money from a friend of mine and provided nothing in return. What a great way to make money!

The whole email trail follows if you are interested (I’ve omitted the card number for obvious reasons).

The email trail

25/12/16 13:25:20 from me

A gift card given to my son does not work. It's for £20 and has number ****-******-*****. Please advise.

25/12/16 15:34 from Amazon Customer Services

Hello,

I'm sorry to hear you've had trouble using the Gift Card you received for your son and I’ll be happy to help you today.

I’ve checked your account, and can see that according to our records the Gift Card wasn't activated by the retailer it was purchased from.

Unfortunately, we can't activate this Gift Card for you as it was not created on our system.

The best action to take in this situation is to bring the Gift Card back to the point of purchase with your receipt, where the retailer can reissue a new card for you to use. If you no longer have the receipt, please contact the shop where the card was purchased to resolve this.

Your patience and understanding is greatly appreciated.

If you need any further information or assistance, please let us know by replying to this e-mail so that we'll be happy to help you further.
We look forward to seeing you again soon.

25/12/16 17:43 from me

Sorry, but I think you are mistaken.

Two gift cards arrived at the same time and were purchased by a friend direct from amazon.co.uk, not a 3rd party. The 2 cards arrived in the same package and came direct from you. One card worked, the second did not. I believe the mistake is at your end.

Andrew French

25/12/16 18:12 from Amazon Customer Services

Hello,

I'm sorry to hear that you were unable to use one of your Gift Cards.

We will be happy to take the necessary action.

If you have received the Physical Gift card, I request you to attach the picture of the front and back side of Gift Card.

Please attach the picture as a PDF, JPG or PNG file.

If you have received the Email Gift card, I request you to just copy and past the entire gift card information, and send it to us.

I am sorry in making you to write back to us, but this will help us in resolving this issue for you in an efficient manner.

Thank you for you patience and understanding in this regard.

We look forward to seeing you again soon.

25/12/16 19:08 from me

Please find the PDF attached as requested.
[I attached the PDF to the email]

25/12/16 20:54 from Amazon Customer Services

Hello Andy,

Firstly, please accept my sincere apologies for any inconvenience caused by this situation.

I understand the level of disappointment this has caused to you. If I had been in this situation, I would have felt the same.

You have been a loyal and supportive customer with us since a long time, I highly appreciate your support with us.

It's never our intention to cause inconvenience to our honest and valuable customer like you.

Further to your email, I understand that the 2 Gift card were purchased by your friend but one Gift card is working another one is not working.

In this situation to help you further, I have checked  your friend account omitted@omitted.co.uk  and see that he/she purchased only one gift card from our direct store and the order number for the one is #***-*******-*******.

I have checked the image you have provided to us and can see that according to our records the Gift Card wasn't activated by the retailer it was purchased from.
Unfortunately, we can't activate this Gift Card for you as it was not created on our system.

The best action to take in this situation is to bring the Gift Card back to the point of purchase with your receipt, where the retailer can reissue a new card for you to use. If you no longer have the receipt, please contact the shop where the card was purchased to resolve this.

Should you require any additional information or assistance, please do not hesitate to contact us.

Once again, please let me apologies for any inconvenience this has caused. It is never our intention to cause any sort on inconvenience to our valued customers like you.

We look forward to seeing you again soon.

25/12/16 21:31 from me

No, you have misunderstood again.

The account you reference (omitted@omitted.co.uk) is not the purchaser of the gift cards but the recipient! The card you site as having been ‘purchased’ was not purchased at all but was redeemed. It is the one card that did work (as per email trail below). The card we are talking about here did not work.

I have not given you the purchaser’s account name because I do not have it.
What I can say with a certainty is that both cards were purchased at the same time. They both arrived together in the same package direct from Amazon, not a 3rd party. THERE IS NO 3RD PARTY RETAILER TO CONTACT. AMAZON WAS THE RETAILER OF BOTH CARDS.

I suggest you credit my son’s account (omitted@omitted.co.uk), the intended recipient of the card, with the £20 value and cancel the card itself. You can then take whatever steps are necessary to sort out the confusion at your end.

If you are unable to resolve this matter – which is of your making – please provide details of how I may make a formal complaint. At this point Amazon have taken £20 for nothing in return.

25/12/16 23:02 from Amazon Customer Services

Hello Andy,

I'm sorry you weren't able to redeem the Gift Card to your account.

Please accept my sincere apologies for any inconvenience caused by this.

If you're able to see the any 3 consecutive digits of the claim code, other than the first 2 or last 4, please reply to this e-mail with these numbers along with the 16 digit card number located on the back of the card.

If you're not able to provide 3 consecutive digits, please reply to this e-mail with the serial number and attach a scanned copy of the card as a PDF, JPG or PNG file.
Please reply to this e-mail with the serial number located on the back of the card and attach a scanned copy of the card as a PDF, JPG or PNG file.

Once received, we'll attempt to validate the card and claim it to the account associated with this e-mail address.

As a representative of Amazon.co.uk, I want to assure you that as our valued customer, your satisfaction is our top priority and be assured that your future order would better reflect our commitment to your satisfaction.

I highly appreciate your patience, cooperation and understanding in this matter.
If you need any further information or assistance, please let us know by replying to this e-mail so that we'll be happy to help you further.

We value your business with us and we are looking forward to serve you more in the future.

25/12/16 23:50 from me

This is a disgrace. Just look how long this email trail is. You are asking for information ALREADY SUPPLIED.

I have ALREADY SENT YOU A PDF of the card which includes the full serial number so why you are asking for that is beyond me. Anyway, I have reattached the PDF. The serial number is ****************, but you can see that in the PDF anyway.

And of course I can see the claim code. We have been trying to enter it into your system to redeem the card. I am pretty sure you are making unreasonable requests now simply to string me along without you actually doing anything. Anyway, in answer to your question about the claim code here are 3 consecutive digits other than the first 2 or last 4: ***. You can check them against the attached PDF if you’ve got nothing better to do. They appear after the first hyphen in the claim code.

And as a reminder, there is no 3rd party retailer. Amazon sold the card.  

Do not reply to me asking for any further information. YOU HAVE IT ALL. Credit my son’s account (omitted@omitted.co.uk) with the monies owed (£20) immediately.

26/12/16 04:45 from Amazon Customer Services

Hello,

I'm sorry to learn about the issue you experienced in relation to the Gift Card. I've reviewed our previous correspondence with you.

The information provided in our last message correctly represents our policy at this time.

As my colleague previously mentioned, I’ve checked your account, and can see that according to our records the Gift Card wasn't activated by the retailer it was purchased from.

Unfortunately, we can't activate this Gift Card for you as it was not created on our system.

The best action to take in this situation is to bring the Gift Card back to the point of purchase with your receipt, where the retailer can reissue a new card for you to use. If you no longer have the receipt, please contact the shop where the card was purchased to resolve this.

Your patience and understanding is greatly appreciated.

If you still face any issue then I kindly ask you to get in touch via phone. This way, you can speak to our live customer support team who can ensure we resolve this concern to your satisfaction. I'm sorry we don't share account and order information through email address due to security reasons.

I realise that, at this point, asking you to contact us again would be disappointing; however, we really feel that the best way to assist you with this concern is over the phone.

We're available 7 days a week 06.00 to midnight, local UK time. Freephone (within the UK): 0800 496 1081 International customers can reach us at +44 (0) 207 084 7911.

Amazon cares about our customers, and we're working to improve our service and selection.

Your patience and understanding is highly appreciated in this matter.
I hope this helps. We look forward to seeing you again soon.

26/12/16 10:31 from me

Why don’t any of you read the previous emails?

As I said many times, AMAZON IS THE RETAILER.

Anyway, you can read all about it here: http://www.andyfrench.info/2016/12/amazon-gift-cards-how-amazon-can-take.html

Pay the money you owe.

26/12/16 13:30 No email this time – phoned Amazon customer support

I called Amazon Customer Support but they refused to deal with me even though I tried explaining that my son was only 16. They insisted on talking to him direct. Frankly, that’s outrageous.

Anyway, a long call ensued with my son having to read all the numbers on the card several times to the customer services representative and answering many questions.

The outcome? Contact the person who bought the card and ask them to contact Amazon, despite the fact that I told them he’s out of the country on an extended holiday.

Fobbed off again.

26/12/16 16:30 Contacted direct by the Amazon Social Media Team

OK, following a Tweet from me Amazon’s Social Media Team got in touch and sorted the issue out. Success at last and well done the nice Amazon lady who dealt with it. It was nice to talk to someone who could deviate from the script!

Monday, 3 October 2016

Election algorithms for clustered software

The problem

I’ve recently been looking at a problem with some software that was written to work in a cluster. This particular software service runs background jobs against a SQL Server database and in order to support fail-over scenarios the software was written to work in a cluster. Only one service instance (the master) was actually doing any work at any given time with the other instances (the slaves) providing redundancy in the case of a failure. In other words, one instance would be nominated as the master and would take responsibility for running the background jobs. If the master crashed or became unavailable one of the other instances in the cluster would take over as master.

From now on I’m going to continue to use the term service instance to describe a software component that participates in a cluster. Each service instance is probably a separate process.

The problem was that the mechanism used to elect and monitor the master was based on UDP broadcast, and broadcast is something that can be problematic in cloud-based environments such as AWS. Given there was a need to migrate this service to the cloud this was a significant issue.

At a high level the election algorithm being used by the cluster was for service instances to use UDP broadcast to exchange messages between themselves to agree which instance would be the master. Once the master had been nominated it took over the work of running the background jobs. The other service instances would then periodically poll the master to check that it was still alive. The first instance to find the master to be unavailable would claim the master role, take over the responsibility of running the jobs and broadcast the change in master.

The use of UDP broadcast in this context was useful because it meant that service instances didn’t need to know about each other. To use more direct addressing it would be necessary to store the addresses of all instances in the cluster in some form of registry or configuration. Configuration management across multiple environments is itself a challenge so reducing the amount of configuration can be an advantage.

However, in this case the use of UDP broadcast was an issue that needed to be addressed to facilitate a move to the cloud. This provided a good opportunity to review clustering election patterns and approaches to writing clustered software in general to see what options are available.

Note: There are alternatives to writing software that behaves as a cluster natively (e.g. using ZooKeeper). This article does not deal with these alternative approaches but focuses on the creation of natively clustered software.

Reasons for clustering

There are typically 2 reasons for writing software that supports clustering:

  • Failover – to prevent outages it would be advantageous to build in redundancy so that if one service instance crashes there’s another available to take up the slack. Note that in this case it isn’t necessary for all instances to be doing useful work. Some may be on stand-by, available to take over if the primary fails but not doing anything while the primary is active.
  • Performance – to facilitate greater application performance running separate software instances (probably on separate servers) may be advantageous. In this case work can be distributed between instances and processed in parallel.

 

Of course, these two aspects are not mutually exclusive; a cluster may support both high availability and distributed processing.

Characteristics of clustered software

Typically when running software as a cluster one instance will be nominated as the coordinator (leader or master). Note that this instance does not have to perform the work itself, it may choose to delegate the work to one of the other instances in the cluster. Alternatively – such as in our example above – the coordinator may perform the work itself exclusively.

This is somewhat analogous to server clustering which can be either symmetrical or asymmetrical. In the symmetrical case every server in the cluster is performing useful work. To distribute work between the servers in the cluster a load balancer is required. In the case of a software cluster it’s the instance elected as the coordinator that’s probably performing this task.

In the asymmetrical case only one server will be active with the other server instances in the cluster being passive. A passive instance will only be activated in the event of a failure of the primary. In the case of a software cluster the coordinator would be the active instance with other instances being passive.

Whichever basic topology is chosen it will be necessary for the software cluster to elect a coordinator when the cluster starts. It will also be necessary for the cluster to recognise when a coordinator has crashed or become unavailable and for this to trigger the election of a new coordinator.

When designing a system like this care should be taken to avoid the coordinator becoming a bottleneck itself. There are also other considerations. For example, in auto-scaling scenarios what happens if the coordinator is shut down as a result of downsizing the infrastructure?

Election patterns

How do software clusters go about managing the election of a coordinator? Below is a discussion of 3 possible approaches:

  • Distributed mutex – a shared mutex is made available to all service instances and is used to manage which instance is the coordinator. Essentially, all service instances race to grab the mutex. The first to succeed becomes the coordinator.
  • Bully algorithm – use messaging between instances in the cluster to elect the coordinator. The election is based on some unique property of each instance (e.g. a process identifier). The process with the highest value ‘wins’. The winning instance bullies the other instances into submission by claiming the coordinator role.
  • Ring algorithm – use messaging between instances in the cluster to elect the coordinator. Service instances are ordered (either physically or logically) so each instance knows its successors. Ordering in the ring is significant with election messages being passed around the ring to figure out which one is ‘at the top’. That instance is elected the coordinator.
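
The ‘highest value wins’ rule at the heart of the Bully algorithm is easy to sketch. The function below is purely illustrative (the name is made up, and the messaging between instances is left out). It just picks the winner from the identifiers of the instances that responded, assuming the identifiers are integers.

```shell
# elect_coordinator ID...
# Prints the highest numeric identifier among the instances that responded.
# Returns non-zero if no identifiers were supplied.
elect_coordinator() {
  local winner="" id
  for id in "$@"; do
    if [ -z "$winner" ] || [ "$id" -gt "$winner" ]; then
      winner="$id"
    fi
  done
  [ -n "$winner" ] && echo "$winner"
}

# Usage: elect_coordinator 12 7 31 4
```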

 

More detailed descriptions of the approaches are provided below. As you’d expect each has its pros and cons.

Distributed mutex

A mutex “ensures that multiple processes that share resources do not attempt to share the same resource at the same time”. In this case the ‘resource’ is really a role – that of coordinator - that one service instance adopts.

Using a distributed mutex has the advantage that it works in situations where there is no natural leader (e.g. no suitable process identifier which would be required for the Bully Algorithm). Under some circumstances (e.g. when the coordinator is the only instance performing any work) the service instances need not know about each other either; the shared mutex is the only thing an instance needs to know about. In cases where the coordinator needs to distribute work amongst the other instances in the cluster then the coordinator must be able to contact – and therefore know about – the other instances.

The algorithm essentially follows this process:

  1. Service instances race to get a lease over a distributed mutex (e.g. a database object).
  2. The first instance to get the mutex is elected as the coordinator. Other instances are prevented from becoming the coordinator because they are blocked from getting a lease on the mutex.
  3. The coordinator performs the task of coordinating the distribution of work (or executing it itself depending on requirements).
  4. The lease must be set to expire after a period of time and the coordinator must periodically renew the lease. If the coordinator crashes or becomes unavailable it won’t be able to renew the lease on the mutex, so the lease will eventually expire and the mutex will become available again.
  5. All service instances periodically check the mutex to see if the lease has expired. If a service instance finds the lease on the mutex to be available it attempts to secure the lease. If it succeeds the instance becomes the new coordinator.
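
The steps above can be sketched in a few lines of C#. This is only a minimal sketch: InMemoryLeaseStore, TryAcquire and TryRenew are hypothetical names, and the in-memory object stands in for whatever shared resource (a database row, a blob lease and so on) a real cluster would use. The current time is passed in explicitly so the example is deterministic.

```csharp
using System;

// Hypothetical in-memory stand-in for the shared mutex. In a real cluster this
// would be something every instance can reach, such as a database object.
public sealed class InMemoryLeaseStore
{
    private readonly object _gate = new object();
    private string _owner;              // null until somebody takes the lease
    private DateTime _expiresAtUtc;

    // Steps 1, 2 and 5: take the lease if it is free or has expired.
    public bool TryAcquire(string instanceId, TimeSpan duration, DateTime nowUtc)
    {
        lock (_gate)
        {
            if (_owner != null && nowUtc < _expiresAtUtc)
                return false;           // another instance holds a live lease

            _owner = instanceId;
            _expiresAtUtc = nowUtc + duration;
            return true;
        }
    }

    // Step 4: only the current holder may extend a lease that is still live.
    public bool TryRenew(string instanceId, TimeSpan duration, DateTime nowUtc)
    {
        lock (_gate)
        {
            if (_owner != instanceId || nowUtc >= _expiresAtUtc)
                return false;

            _expiresAtUtc = nowUtc + duration;
            return true;
        }
    }
}

public static class LeaseDemo
{
    public static void Main()
    {
        var store = new InMemoryLeaseStore();
        var lease = TimeSpan.FromSeconds(30);
        var t0 = new DateTime(2017, 1, 1, 0, 0, 0, DateTimeKind.Utc);

        Console.WriteLine(store.TryAcquire("A", lease, t0));                // A wins the race
        Console.WriteLine(store.TryAcquire("B", lease, t0.AddSeconds(1)));  // B is blocked
        Console.WriteLine(store.TryRenew("A", lease, t0.AddSeconds(20)));   // A renews in time

        // A now crashes and stops renewing, so the lease runs out at t0 + 50s.
        Console.WriteLine(store.TryAcquire("B", lease, t0.AddSeconds(60))); // B takes over
    }
}
```

In a real service each instance would call TryAcquire (or TryRenew if it is already the coordinator) on a timer, with the renewal interval comfortably shorter than the lease duration.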

 

Note that the mutex becomes a potential single point of failure, so consideration should be given to the scenario where unavailability of the mutex prevents the cluster from electing a coordinator.

Another characteristic of using a shared mutex in this way is that election of the leader is non-deterministic. Any service instance in the cluster could take on the role of coordinator.

A good explanation of the shared mutex approach can be found in this article from MSDN.

Bully algorithm

There are some assumptions for the Bully Algorithm:

  • Each instance in the cluster has a unique identifier which must be an ordinal. This could be a process number or even a network address but whatever it is we should be able to order instances in the cluster using this identifier.
  • Each instance knows the identifiers of the other instances that should be participating in the cluster (some may be dead for whatever reason).
  • Service instances don’t know which ones are available and which are not.
  • Service instances must be able to send messages to each other.

 

The basis of the Bully Algorithm is that the service instance with the highest identifier will be the coordinator. The algorithm provides a mechanism for service instances to discover which of them has the highest identifier and for that instance to bully the others into submission by claiming the coordinator role. It follows this basic process:

  1. A service instance sends an ELECTION message to all instances with identifiers greater than its own and awaits responses.
  2. If no service instances respond the originator can conclude it has the highest identifier and is therefore safe to assume the role of coordinator. The instance sends a COORDINATOR message to all other instances announcing the fact. Other instances will then start to periodically check that the coordinator is still available. If it isn’t, the instance that finds the coordinator unavailable will start a new election (back to step 1).
  3. Any service instance receiving an ELECTION message and having an identifier greater than the originator will respond with an OK message indicating it’s available.
  4. If in response to an ELECTION message the originator receives an OK response back it knows there’s at least one service instance with a higher identifier than itself. The following then happens:
    1. The original service instance abandons the election (because it knows there’s at least one process with a higher identifier than itself).
    2. Any instances that responded to the ELECTION message with OK now issue ELECTION messages themselves (they start at step 1) and the process repeats until the service with the highest identifier has been elected.
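
To make the flow of messages a little more concrete, here is a rough single-process simulation of the steps above. It is a sketch under some loud assumptions: there is no real messaging, 'sending' an ELECTION message is just a look-up against a set of live identifiers, and BullyCluster and StartElection are names invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Single-process simulation of a Bully election. Message passing is replaced
// by direct look-ups; all type and member names are illustrative only.
public sealed class BullyCluster
{
    private readonly List<int> _knownIds;  // every instance that should be in the cluster
    private readonly HashSet<int> _alive;  // the ones that would actually answer

    public BullyCluster(IEnumerable<int> knownIds, IEnumerable<int> aliveIds)
    {
        _knownIds = knownIds.OrderBy(id => id).ToList();
        _alive = new HashSet<int>(aliveIds);
    }

    // Runs an election started by a live instance and returns the coordinator's identifier.
    public int StartElection(int starterId)
    {
        if (!_alive.Contains(starterId))
            throw new ArgumentException("Only a live instance can start an election.", nameof(starterId));

        int coordinator = starterId;
        var started = new HashSet<int> { starterId };  // instances that have begun an election
        var pending = new Queue<int>();
        pending.Enqueue(starterId);

        while (pending.Count > 0)
        {
            int current = pending.Dequeue();

            // Step 1: ELECTION goes to every known instance with a higher identifier.
            // Step 3: only the live ones reply with OK.
            List<int> responders = _knownIds
                .Where(id => id > current && _alive.Contains(id))
                .ToList();

            if (responders.Count == 0)
            {
                // Step 2: silence means 'current' has the highest live identifier,
                // so it would now send COORDINATOR to everyone else.
                coordinator = current;
                continue;
            }

            // Step 4: 'current' stands down and each responder runs its own election.
            foreach (int id in responders)
            {
                if (started.Add(id)) pending.Enqueue(id);
            }
        }

        return coordinator;
    }
}

public static class BullyDemo
{
    public static void Main()
    {
        // Instance 5 (the old coordinator) has died; instance 2 notices and starts an election.
        var cluster = new BullyCluster(new[] { 1, 2, 3, 4, 5 }, new[] { 1, 2, 3, 4 });
        Console.WriteLine(cluster.StartElection(2));
    }
}
```

Whichever live instance starts the election, the result is the highest live identifier, which is the deterministic behaviour that distinguishes this approach from the shared mutex.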

A nice description of the process can be found in this article.

Ring algorithm

As with the Bully Algorithm there are some basic assumptions for the Ring Algorithm.

  • The service instances are ordered in some way.
  • Each service instance uses the ordering to know who its successor is (in fact it needs to know about all the instances in the ring, as we will see below).

 

The Ring Algorithm basically works like this:

  1. All service instances monitor the coordinator.
  2. If any service instance finds the coordinator is not available it sends an ELECTION message to its successor. If the successor is not available the message is sent to the next instance in the ring until an active one is found.
  3. Each service instance that receives the ELECTION message adds its identifier to the message and passes it on as in step 2.
  4. Eventually the message gets back to the originating instance, which recognises the fact because its own identifier is already in the list. It examines the list of active instances and finds the one with the highest identifier. The instance then issues a COORDINATOR message informing all the instances in the ring which one is now coordinator (the one with the highest identifier).
  5. The service instance with the highest identifier has now been elected as the coordinator and processing resumes.
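
The same idea can be sketched for the ring. Again this is a simplified simulation rather than a real implementation: the ring is just a list of identifiers in ring order, unavailable instances are modelled by leaving them out of a set of live identifiers, and RingElection.Elect is a hypothetical name. It assumes identifiers are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RingElection
{
    // ring:  instance identifiers in ring order (each instance's successor is the next entry).
    // alive: identifiers of the instances that are currently reachable.
    // Returns the identifier elected when 'starterId' finds the coordinator unavailable.
    public static int Elect(List<int> ring, HashSet<int> alive, int starterId)
    {
        int start = ring.IndexOf(starterId);
        if (start < 0 || !alive.Contains(starterId))
            throw new ArgumentException("The starter must be a live member of the ring.", nameof(starterId));

        // The ELECTION message carries the identifiers of the instances it has visited.
        var visited = new List<int> { starterId };

        // Steps 2 and 3: pass the message from successor to successor, skipping
        // instances that are not available, until it arrives back at the starter.
        for (int i = (start + 1) % ring.Count; i != start; i = (i + 1) % ring.Count)
        {
            if (alive.Contains(ring[i]))
                visited.Add(ring[i]);
        }

        // Step 4: the starter picks the highest identifier in the list and
        // would now announce it with a COORDINATOR message.
        return visited.Max();
    }
}

public static class RingDemo
{
    public static void Main()
    {
        var ring = new List<int> { 3, 7, 1, 9, 4 };
        var alive = new HashSet<int> { 3, 7, 1, 4 };   // 9 was the coordinator and has gone away

        Console.WriteLine(RingElection.Elect(ring, alive, 1));
    }
}
```

Note that skipping an unavailable successor is only possible because the sketch has the whole ring to hand, which is why each instance needs to know about more than just its immediate successor.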

 

Note that multiple instances could recognise that the coordinator is unavailable, resulting in multiple ELECTION and COORDINATOR messages being sent around the ring. This doesn’t matter; every election arrives at the same result because each one picks the highest identifier among the active instances.

Other things to look at

A NuGet package is available for a light-weight non-intrusive leader election library for .Net called NanoCluster. Source code is available on GitHub here:

https://github.com/ruslander/NanoCluster

It’s a small project and doesn’t seem to have been used a great deal but might provide some ideas.

References