Wednesday 25 May 2011

SQL revision - the GROUP BY clause

Every now and then I like to go back and have a look at something I think I know and double check my assumptions and understanding. In this case I’m going to look at the GROUP BY clause in SQL.

Partitions

Firstly, let’s look at partitions in sets. This is important because it directly relates to the ‘groups’ returned by a GROUP BY clause.

“A partition of a set X is a set of nonempty subsets of X such that every element x in X is in exactly one of these subsets.” *

We can derive certain properties of partitions:

  • The union of all the subsets gives back the original set
  • Any two distinct subsets are disjoint (their intersection is empty)

We can think of this like dividing a pizza into pieces. Each piece is one of the subsets, and joining the pieces together gives us back the original pizza.
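We can sketch these two properties in a few lines of C# (the set and its subsets are arbitrary examples):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PartitionDemo
{
    static void Main()
    {
        var original = new HashSet<int> { 1, 2, 3, 4, 5, 6 };

        // One possible partition: three non-empty subsets
        var pieces = new List<HashSet<int>>
        {
            new HashSet<int> { 1, 2 },
            new HashSet<int> { 3 },
            new HashSet<int> { 4, 5, 6 }
        };

        // Property 1: the union of the subsets gives back the original set
        var union = new HashSet<int>(pieces.SelectMany(p => p));
        Console.WriteLine(union.SetEquals(original)); // True

        // Property 2: any two distinct subsets are disjoint
        bool disjoint = true;
        for (int i = 0; i < pieces.Count; i++)
            for (int j = i + 1; j < pieces.Count; j++)
                if (pieces[i].Overlaps(pieces[j])) disjoint = false;
        Console.WriteLine(disjoint); // True
    }
}
```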


The ‘groups’ returned by a GROUP BY clause are effectively the subsets of a partition of the original set of data.

The GROUP BY clause

When we use a GROUP BY clause we take the result set of a query (produced by the FROM and WHERE clauses) and put its rows into groups (partitions) based on the values of the columns specified in the GROUP BY clause.

Each group becomes a single row in the result table. Each column in that row must be a characteristic of the group, not of a single row in the group, so the SELECT list may only contain grouping columns and aggregate functions. Note also that groups, by definition, must have at least one row (i.e. they can’t be empty). This means that a COUNT will never return zero when used in a query against a non-empty table. Groups are also distinct.

The resulting table of a GROUP BY is called a group table. All subsequent operations are executed on the rows in the group table rather than the original rows.

NULL values are generally treated as a single group.


The non-aggregated columns in the SELECT list must be a subset of the columns in the GROUP BY clause; the GROUP BY clause can include columns that do not appear in the SELECT list, but not the other way round.
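LINQ’s GroupBy has essentially the same grouping semantics, so we can sketch the behaviour in C# (the table and data here are invented for illustration):

```csharp
using System;
using System.Linq;

class GroupByDemo
{
    static void Main()
    {
        // Hypothetical equivalent SQL:
        //   SELECT Department, COUNT(*)
        //   FROM Employees
        //   GROUP BY Department;
        var employees = new[]
        {
            new { Name = "Ann",  Department = "Sales" },
            new { Name = "Bob",  Department = "Sales" },
            new { Name = "Cath", Department = "IT" }
        };

        // The rows are partitioned by Department; each group becomes
        // a single result row of grouping column plus aggregate
        foreach (var g in employees.GroupBy(e => e.Department))
        {
            // g.Count() can never be zero - a group only exists
            // because at least one row belongs to it
            Console.WriteLine("{0}: {1}", g.Key, g.Count());
        }
        // Prints:
        //   Sales: 2
        //   IT: 1
    }
}
```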

References

* Partition of a set, Wikipedia


Monday 23 May 2011

What is the difference between the various WCF programming models?

OK, I’m getting confused again about the differences between the various WCF-based programming models. There are so many variations on the theme - WCF Data Services, WCF RIA Services, WCF Web API, etc. – that I can no longer see the wood for the trees. Microsoft’s insistence on renaming things doesn’t help. So, here’s a quick aide-mémoire for me based on WCF, Data Services and RIA Services Alignment Questions and Answers.

WCF (also known as Indigo)
  • SOAP-based services
  • Full flexibility for building operation-centric services
  • Supports interoperability (e.g. with Java)

WCF WebHttp Services (also known as WCF REST)
  • Operation centric
  • Exposes WCF service operations to non-SOAP endpoints
  • Best when deploying operation-centric HTTP services at web scale, or when building a RESTful service and you want full control over the URI/format/protocol

WCF Data Services (also known as Astoria and ADO.Net Data Services)
  • Data centric
  • Best when exposing a data model through a RESTful interface
  • Includes a full implementation of the OData protocol

WCF RIA Services (also known as .Net RIA Services)
  • For building end-to-end Silverlight applications
  • Provides a prescriptive pattern that defaults many options for the best experience in the common cases

WCF Web API (the new kid on the block)
  • Exposes applications, data and services to the web directly over HTTP
  • Replaces the REST Starter Kit
  • Supporting SOAP is not a goal



Wednesday 18 May 2011

MEF basics

I’m digging into the Managed Extensibility Framework (MEF) and need to keep track of some simple terms, definitions and concepts. This post is just an aide-mémoire so don’t expect any detail. I’ve crushed the MEF documentation into some bullet points.

What is MEF?

“MEF offers discovery and composition capabilities that you can leverage to load application extensions.” *

The basic namespace for everything MEF is System.ComponentModel.Composition.

Basic terms and definitions

  • Catalog
    • Responsible for discovering extensions (ComposableParts)
    • Assembly Catalog
      • Discovers all the exports in a given assembly
    • Directory Catalog
      • Discover all the exports in all the assemblies in a directory
      • Does a one-time scan of the directory and will not automatically refresh when there are changes in the directory (can call Refresh() to rescan)
      • Not supported in Silverlight
    • Aggregate Catalog
      • Use when a combination of catalogs is needed
    • Type Catalog
      • Discovers all the exports in a specific set of types
    • Deployment Catalog
      • Silverlight only
      • For dynamically downloading remote XAPs
  • CompositionContainer
    • Interacts with Catalogs
    • Resolves a part's dependencies and exposes Exports to the outside world
  • ComposablePart
    • A composable unit within MEF
    • Offers up one or more Exports
    • May depend on one or more externally provided Imports
    • Declare their exports and imports with the [System.ComponentModel.Composition.Export] and [System.ComponentModel.Composition.Import] attributes
    • Either added to the container explicitly or created through the use of catalogs
    • A common pattern is for a ComposablePart to export an interface or an abstract type contract rather than a concrete type
      • Allows the importer to be decoupled from the specific implementation of the export
  • Contract
    • The bridge between exports and imports
    • A contract is a string identifier
    • If no contract is specified, MEF will implicitly use the fully qualified name of the type as the contract
    • Every export has a contract, and every import declares the contract it needs
  • Contract Assembly
    • An assembly which contains contract types that extenders can use for extending your application
  • Exports
    • Composable Part export
      • Used when a Composable Part needs to export itself
      • Decorate the Composable Part class with the [Export] attribute
    • Property export
      • Decorate a property with the [Export] attribute
      • Allows exporting sealed types such as the core CLR types, or other third party types
      • Allows decoupling the export from how the export is created
      • Allows having a family of related exports in the same Composable Part
    • Method export
      • Methods are exported as delegates which are specified in the Export contract
      • Allows finer grained control as to what is exported
      • Shields the caller from any knowledge of the type
      • Can be generated through light code gen
    • Inherited Exports
      • Base class / interface defines exports which are automatically inherited by implementers
      • Use [InheritedExport]
    • Lazy Exports
      • Can delay instantiation
      • Can prevent recursive composition down the graph
      • Import a System.Lazy<T> instead of T directly
  • Imports
    • Property import
      • Decorate the property with [Import]
    • Constructor parameters
      • Specify imports through constructor parameters
      • Add [ImportingConstructor] to the constructor
      • Add parameters to the constructor for each import
    • Field imports
      • Decorate the field with [Import]
    • Optional imports
      • MEF allows you to specify that an import is optional ([Import(AllowDefault=true)])
      • The container will provide an export if one is available otherwise it will set the import to Default(T)
    • Importing collections
      • Can import collections with the [ImportMany] attribute
      • All instances of the specific contract will be imported from the container and added to the collection
      • Recomposition
        • As new exports become available in the container, collections are automatically updated with the new set
        • [ImportMany(AllowRecomposition=true)]
    • IPartImportsSatisfiedNotification
      • Defines an OnImportsSatisfied method, which is called when all imports that could be satisfied have been satisfied
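To tie a few of these terms together, here’s a minimal sketch of the attributed model in C# (the interface and class names are my own inventions):

```csharp
using System;
using System.ComponentModel.Composition;
using System.ComponentModel.Composition.Hosting;
using System.Reflection;

// The contract - identified here by the interface type
public interface IMessageSender
{
    void Send(string message);
}

// A ComposablePart offering an export under the IMessageSender contract
[Export(typeof(IMessageSender))]
public class ConsoleSender : IMessageSender
{
    public void Send(string message) { Console.WriteLine(message); }
}

public class Notifier
{
    // A property import; the container supplies a matching export
    [Import]
    public IMessageSender Sender { get; set; }
}

class Program
{
    static void Main()
    {
        // The AssemblyCatalog discovers the exports in this assembly
        var catalog = new AssemblyCatalog(Assembly.GetExecutingAssembly());
        using (var container = new CompositionContainer(catalog))
        {
            var notifier = new Notifier();
            container.ComposeParts(notifier); // satisfies notifier's imports
            notifier.Sender.Send("Hello from MEF");
        }
    }
}
```

Here ConsoleSender exports itself under the IMessageSender contract and the container satisfies Notifier’s [Import] from the catalog.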

References

* MEF Overview

See also

Managed Extensibility Framework, on CodePlex

Tuesday 17 May 2011

Refreshing attribute table data in ArcMap

I’ve been working on a piece of code that updates data in an attribute table in ArcGIS. As part of integration testing it was necessary to set up test data using ArcMap and then run the code, which would modify other data. Unfortunately, even though the code appeared to be working, the data in ArcMap didn’t look like it had changed. Even closing the ArcMap attribute table view and reopening it didn’t have any effect. I recalled that ArcGIS can use versioning so it occurred to me that this might be part of the problem. I realised I needed to refresh the data but couldn’t remember how to do it. It turns out I needed to use the Versioning toolbar.

  • Right-click on the toolbar area at the top of the ArcMap screen.
  • In the pop-up menu scroll to the bottom and tick Versioning to open the Versioning tool bar.
  • Click the Refresh icon on the Versioning toolbar.
Figure 1 – Right-click on the toolbar area.

Figure 2 – The Versioning tools (the Refresh icon circled).
Tuesday 17 May 2011

Service locator anti-pattern

Some time ago I blogged about Breaking dependencies on specific DI containers. That post was concerned with resolving a WCF service’s dependencies using an IInstanceProvider and how to break a tight dependency on Unity. I chose to use the interface provided by the Common Service Locator library on CodePlex. But there is still a problem, and one that didn’t occur to me as I was writing the code because I was so wrapped up in the details of WCF.

In short, the use of a Service Locator is considered by some to be an anti-pattern. Having thought things over I think I have to agree.

The problem with a Service Locator is that it hides dependencies in your code, making them difficult to figure out and potentially leading to errors that only manifest themselves at runtime. If you use a Service Locator your code may compile but hide the fact that the Service Locator has been incorrectly configured. At runtime, when your code makes use of the Service Locator to resolve a dependency that hasn’t been configured, an error will occur. The compiler wasn’t able to help. Moreover, by using a Service Locator you can’t see from the API exactly what dependencies a given class may have.

At least one proposed solution to this problem is to use Constructor Injection to explicitly pass in all of a class’s dependencies when it is instantiated. I like the logic of this solution. You can see from the constructor signature exactly what dependencies a class has, and if one is missing you’ll get a compiler error.

Let’s look at an example. Imagine some code like this:

class Program
{
    static void Main(string[] args)
    {
        var needsDependencies = new NeedsDependencies(new ServiceLocator());
        needsDependencies.CallMe();
    }
}

Can you tell from the above code what dependencies the NeedsDependencies class actually has? How can you tell if the ServiceLocator has been configured correctly without running the code and seeing if it fails? In fact you can only see the dependencies by looking at the internal code of NeedsDependencies:

public class NeedsDependencies
{
    private IServiceLocator _serviceLocator;

    public NeedsDependencies(IServiceLocator serviceLocator)
    {
        _serviceLocator = serviceLocator;
    }

    public void CallMe()
    {
        var dependency = (Dependency)_serviceLocator.GetInstance(typeof(Dependency));
        dependency.CallMe();
    }
}

From the source code we can see that the NeedsDependencies class actually needs an instance of the Dependency class. If we didn’t have access to the source code (e.g. we were using a library provided by a 3rd party) we’d be none the wiser and would only see that the class needs a Service Locator.

If we remove the Service Locator and replace it with the actual dependencies we can see exactly what a class needs up front:

public class NeedsDependencies
{
    private Dependency _dependency;

    public NeedsDependencies(Dependency dependency)
    {
        _dependency = dependency;
    }

    public void CallMe()
    {
        _dependency.CallMe();
    }
}

To call this version we’d do something like this:

class Program
{
    static void Main(string[] args)
    {
        var needsDependencies = new NeedsDependencies(new Dependency());
        needsDependencies.CallMe();
    }
}

We can see right away that NeedsDependencies has a dependency on the Dependency class – because the constructor makes this clear – and if we fail to provide one the compiler will complain.

It is important to note that we can still use IoC to resolve dependencies and we can even use a Service Locator. The point is we shouldn’t pass the Service Locator around but should restrict its use to one place in the application, usually at start-up when dependencies are being resolved.


Monday 16 May 2011

Best .Net podcasts and screen casts

Podcasts

Here are my favourite .Net podcasts:

  • .Net Rocks - .NET Rocks is a weekly talk show presented by Carl Franklin and Richard Campbell for anyone interested in programming on the Microsoft .NET platform. The show can get a bit chatty but overall it’s an entertaining and informative podcast.
  • Hanselminutes – hosted by Scott Hanselman, .Net’s answer to the stand-up comedian. Short programs that get right to the point.
  • Herding Code – Presented by an ensemble of .Net luminaries this podcast covers a lot of ground. Subject matter is diverse.
  • Deep Fried Bytes – As they say about themselves, “Deep Fried Bytes is an audio talk show with a Southern flavor hosted by technologists and developers Keith Elder and Chris Woodruff.”
  • Yet Another Podcast – This podcast has a strong Silverlight skew but covers other topics as well. Described as “Intelligent Technical Conversation Focused on Windows Phone 7, Silverlight, And Best Practices”.

Screen casts

The best screen casts come from conferences and Microsoft has done a great job at putting conference content online. Here are my favourites:

For a wide range of video content you can always stop by MSDN’s Channel9.

Thursday 5 May 2011

The mythical ‘attributed relationship’ in ArcGIS

Following on from a recent post on Relationship classes (and other things) in ArcGIS I thought I’d take a look at a confusing term that’s bandied around in ArcGIS documentation, namely the attributed relationship.
Firstly I have noticed that there are 2 variations on the term: attribute relationship and attributed relationship. These 2 terms, although painfully similar, refer to quite different concepts.
The book Modeling Our World (Second Edition) draws a distinction between spatial relationships and attribute relationships:
“Modeling with spatial relationships
Your first choice for modeling relationships is to use the GIS to manage the spatial relationships inherent among features… [snip] …
Modeling with attribute relationships
There are also many associations that require attribute relationships to be defined. You can have an association between a geographic feature, such as a parcel of land, and a non-geographic entity, such as one or more parcel owners… [snip] …
Features in feature classes and rows in tables can have attribute relationships established between them through a common field called a key… [snip] … You can make these associations in several ways, including joining or relating tables temporarily in your map or by creating relationship classes in your geodatabase that maintain more permanent associations.” *
So here the definition of an attribute relationship is quite simple; an attribute relationship refers to a relationship between a feature class and a table. That’s it.
A few pages on you get this:
“Attributed relationship classes
An attributed relationship class is a type of relationship class that uses a table to store the key values for each individual relationship. Because of this, attributes can be optionally added to each relationship.” **
So, the definition of an attributed relationship is also simple but more specific; an attributed relationship is a relationship class with additional attributes other than just the keys.
I also came across this explanation in an old ESRI PDF:
If a relationship class is not attributed and does not have a cardinality of many-to-many, it is stored as a set of foreign keys on the feature or object classes. Attributed relationships are stored in tables.
Again this seems to confirm that an attributed relationship is just a relationship class (which I visualise as just a database link table) with some extra columns for additional attributes.
NB: If you are programming against ArcGIS you will find that there is an AttributedRelationship class, so from an API point of view the attributed relationship is a first class entity. I’m afraid I can’t comment on this aspect because I’ve yet to encounter it.
If you search ESRI documentation you’ll see the two terms coming up quite frequently. They express the notion that there can be relationships other than pure spatial ones between entities in the geodatabase and that relationship classes specifically can have attributes other than key values.

References

* p.78 – p79, Modeling Our World (Second Edition), Michael Zeiler, ISBN 978-1-58948-278-4
** p.85, Modeling Our World (Second Edition), Michael Zeiler, ISBN 978-1-58948-278-4

Wednesday 4 May 2011

Unit of Work

Health warning – The code featured in this post hasn’t been tested. These are just my musings on the subject of the Unit of Work pattern.

The Unit of Work pattern has been around for quite a while. I first encountered this pattern when I adopted NHibernate as the basis of my data access code. In NHibernate the Session object represents a Unit of Work. I’ve been investigating the Microsoft Entity Framework again - the latest code first option in particular - and I’ve noticed that in many examples the Unit of Work pattern is being implemented explicitly. This is particularly prevalent when the Repository pattern is being used.

So, what is the Unit of Work pattern?

Martin Fowler defines Unit of Work in the following terms:

"Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

…When you're pulling data in and out of a database, it's important to keep track of what you've changed; otherwise, that data won't be written back into the database. Similarly you have to insert new objects you create and remove any objects you delete…

…A Unit of Work keeps track of everything you do during a business transaction that can affect the database. When you're done, it figures out everything that needs to be done to alter the database as a result of your work." *

Fowler goes on to suggest an interface for the Unit of Work pattern. In C# – and taking advantage of generics – the interface might look something like this:

public interface IUnitOfWork<T>
{
    void RegisterNew(T instance);
    void RegisterDirty(T instance);
    void RegisterClean(T instance);
    void RegisterDeleted(T instance);
    void Commit();
    void Rollback();
}

For my purposes I think I can drop the RegisterClean(T instance) and Rollback() methods leaving:

public interface IUnitOfWork<T>
{
    void RegisterNew(T instance); // for tracking new instances
    void RegisterDirty(T instance); // for tracking updated instances 
    void RegisterDeleted(T instance); // for tracking deleted instances
    void Commit(); // to flush registered changes to the backing store
}

Many people seem to remove the Rollback() method and simply choose to not commit changes when something goes wrong. If you need to work with transactions you might choose differently.

Once you have an interface for your Unit of Work you need to decide what is going to implement it. I have observed 2 basic approaches to this:

  1. Create a Unit of Work class that implements the IUnitOfWork<T> interface and pass this to a Repository. In this case the IUnitOfWork<T> acts as a wrapper or adapter for the NHibernate session or EF context etc.
  2. Have your repository implement IUnitOfWork<T>. In this case the Repository is the Unit of Work.

I do not like the second option because I think it blurs the distinction between the 2 patterns. When we look at things from the point of view of separation of concerns or single responsibility the first option looks better to me.

So, given our IUnitOfWork<T> interface, let’s imagine an NHibernate implementation of the Unit of Work as well as a Repository:

public class NHibernateUnitOfWork<T> : IUnitOfWork<T>
{
    private ISession _session;

    public NHibernateUnitOfWork(ISession session)
    {
        _session = session;
    }

    public void RegisterNew(T instance)
    {
        _session.Save(instance);
    }

    public void RegisterDirty(T instance)
    {
        _session.Update(instance);
    }

    public void RegisterDeleted(T instance)
    {
        _session.Delete(instance);
    }

    public void Commit()
    {
        _session.Flush();
    }
}

public interface IRepository<T>
{
    void Add(T instance);
    void Remove(T instance);
    void Update(T instance);
}

public class Repository<T> : IRepository<T>
{
    private IUnitOfWork<T> _unitOfWork;

    public Repository(IUnitOfWork<T> unitOfWork)
    {
        _unitOfWork = unitOfWork;
    }

    public void Add(T instance)
    {
        _unitOfWork.RegisterNew(instance);
    }

    public void Remove(T instance)
    {
        _unitOfWork.RegisterDeleted(instance);
    }

    public void Update(T instance)
    {
        _unitOfWork.RegisterDirty(instance);
    }
}

public class User
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

To make use of the Unit of Work we might do something like this:

//Configure NHibernate
var configuration = new Configuration();
configuration.Configure();

ISessionFactory sessionFactory = configuration.BuildSessionFactory();
ISession session = sessionFactory.OpenSession();

// Create and use a Unit of Work
IUnitOfWork<User> unitOfWork = new NHibernateUnitOfWork<User>(session);
IRepository<User> repository = new Repository<User>(unitOfWork);

var user = new User {FirstName = "Andy", LastName = "French"};

repository.Add(user);
// Maybe do other things with the repository here...
            
unitOfWork.Commit();

Note that in this example we are simply delegating to the NHibernate session which already represents a Unit of Work. Rather than do this some people like to keep lists of the affected entities in the Unit of Work class and then process the lists as a batch in the Commit() method. Perhaps such an implementation would look like this:

public class NHibernateUnitOfWork<T> : IUnitOfWork<T>
{
    private ISession _session;
    private IList<T> _added = new List<T>();
    private IList<T> _deleted = new List<T>();
    private IList<T> _updated = new List<T>();

    public NHibernateUnitOfWork(ISession session)
    {
        _session = session;
    }

    public void RegisterNew(T instance)
    {
        _added.Add(instance);
    }

    public void RegisterDirty(T instance)
    {
        _updated.Add(instance);
    }

    public void RegisterDeleted(T instance)
    {
        _deleted.Add(instance);
    }

    public void Commit()
    {
        using (ITransaction transaction = _session.BeginTransaction())
        {
            ProcessAdded();
            ProcessUpdated();
            ProcessDeleted();

            transaction.Commit();
        }
    }

    private void ProcessAdded()
    {
        foreach(var instance in _added)
        {
            _session.Save(instance);
        }
    }

    private void ProcessUpdated()
    {
        foreach (var instance in _updated)
        {
            _session.Update(instance);
        }
    }

    private void ProcessDeleted()
    {
        foreach (var instance in _deleted)
        {
            _session.Delete(instance);
        }
    }
}

Note that in this version we’ve been able to use NHibernate transaction support. Choose your poison.

References

* Unit of Work, Martin Fowler, http://martinfowler.com/eaaCatalog/unitOfWork.html