Azure Table Storage with Repository Pattern

I’m a great fan of Azure. I love the flexibility it gives, and the option to start small with low costs and scale as needed. Azure Table Storage is a great example of this, as you only pay for what you need. I’m also a fan of the Repository Pattern, and so in this post we’re going to look at how we can wrap a repository pattern around Azure Table Storage.

I was recently engaged to build a pilot application.  The brief was that, to begin with, the costs had to be as low as possible. In particular we needed cheap data storage, but with the potential for the requirement to grow to terabytes’ worth of storage in the future.

These two requirements pretty much ruled out Azure SQL for a backend.  Although Azure SQL is great, it isn’t the cheapest option, and there are finite limits on the sizes of databases you can have.  So if we did go down that route we’d need to look at partitioning, sharding or some other managed way of splitting the data across databases.

Finally, because the majority of the data would be rows of transactions, Azure Table Storage seemed to fit the bill nicely.

And so we started to design/build a new WebAPI with an Azure Table Storage backend.

There is also something else I love: SOLID principles.  Requirements change.  And I like to be able to change one bit of my code without it causing headaches throughout the application.

If I were going to use SQL I would employ a Unit Of Work/Repository pattern and Entity Framework for the data persistence layer, and so I decided I wanted to employ the same approach on this project.

I wanted Table Storage at the back, with a WebAPI at the front. However, I wanted the WebAPI to be totally ignorant of the storage mechanism used. After all, it may turn out that I need to change to something else at some point in the future.

So what I need to do is create a Table Storage provider that I can pass any Entity to and have it perform the relevant CRUD operations.

I then want to stick a repository and Unit Of Work in front of that provider.  That way if I change to SQL in the future I just have to swap out my repository implementation for one that uses Entity Framework instead.

Finally, I could also just implement the Azure stuff directly within a repository, but putting it in a provider and referencing that provider gives me more flexibility if I want to mix providers, e.g. half SQL, half NoSQL.
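To make the "ignorant WebAPI" idea concrete, here is a minimal sketch of the kind of contract the API layer could depend on. The article never defines an interface, so `IRepository<TEntity>` and `InMemoryRepository<TEntity>` are hypothetical names; only the method names (GetByID, GetAll, Insert) mirror the ones used later in this post. The in-memory class is just a stand-in to show that an implementation can be swapped without the caller noticing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical contract: callers would depend on this, not on any storage SDK.
public interface IRepository<TEntity> where TEntity : class
{
    TEntity GetByID(object id);
    List<TEntity> GetAll();
    void Insert(TEntity entity);
}

// Stand-in implementation that keeps entities in memory.
// A Table Storage or Entity Framework version would implement the same interface.
public class InMemoryRepository<TEntity> : IRepository<TEntity> where TEntity : class
{
    private readonly Dictionary<string, TEntity> store = new Dictionary<string, TEntity>();
    private readonly Func<TEntity, object> keySelector;

    public InMemoryRepository(Func<TEntity, object> keySelector)
    {
        if (keySelector == null) throw new ArgumentNullException("keySelector");
        this.keySelector = keySelector;
    }

    public TEntity GetByID(object id)
    {
        TEntity found;
        // Returns null when no entity has been stored under this key.
        return store.TryGetValue(id.ToString(), out found) ? found : null;
    }

    public List<TEntity> GetAll()
    {
        return store.Values.ToList();
    }

    public void Insert(TEntity entity)
    {
        // Throws ArgumentException if the key is already present.
        store.Add(keySelector(entity).ToString(), entity);
    }
}
```

A controller written against `IRepository<Customer>` would not need to change if the in-memory version were later replaced by one backed by the provider built in this post.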

One other point before I begin: Azure Table Storage can store different ‘models’ all in the same table. So we could store customers AND invoiceLines in the same table!  Therefore we have a PartitionKey that allows you to separate the different models in some way.  You don’t HAVE to do it like this. You could still have a table per entity, but my app is going to be multi-tenant and so I want one table per tenant. The discussions as to why and what is best are quite in depth, and in fact there is a very good discussion on it in the Patterns and Practices Library.

For the purposes of this article we are going to have a ‘TenantId’ and this will be used to define our table name, and each entity’s name will be the PartitionKey.  In this way, if this were a traditional SQL app, the PartitionKey would in effect be our table name.  I hope that makes sense!
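The addressing scheme described above can be sketched in a few lines. This is only an illustration: `StorageAddress`, `Describe` and the two empty placeholder classes are made up for the example, and the real provider later in this post also pluralises the partition key.

```csharp
using System;

// Empty placeholder types, used only so typeof(...).Name has something to report.
public class Customer { }
public class InvoiceLine { }

public static class StorageAddress
{
    // Describes where an entity of type T would live under the scheme in this article:
    // table = tenant, partition = entity type name, row = the entity's id.
    public static string Describe<T>(string tenantId, object id)
    {
        string table = tenantId;
        string partitionKey = typeof(T).Name;
        string rowKey = id.ToString();

        return table + "/" + partitionKey + "/" + rowKey;
    }
}
```

For example, `StorageAddress.Describe<Customer>("test001", 42)` gives `test001/Customer/42`, while an `InvoiceLine` for the same tenant lands in the same table under a different partition.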

Building the Repository

This won’t be a complete blow by blow walkthrough.  I am going to assume you know some basics – how to create a project, a solution etc.

OK, so first I need my backend.  And as stated I want to use Windows Azure Table Storage rather than SQL as I want something very flexible, but cheap.  I’m also not too fussed about relationships (at least for now!).

So first of all we need an Azure Account and in particular a Storage Account.  There are lots of tutorials on how to do this, but in a nutshell: log in to Windows Azure and click New –> Data Services –> Storage.

Once created you’ll need your access keys.  With the Storage Account selected just click Manage Access Keys.

Next let’s create our WebAPI.  In Visual Studio create a new Project.  As this will be split into a number of projects, first create an Empty Solution.

Now add a new project to the solution.  Select Windows from the list of templates and ‘Class Library’ from the options presented.  Call it ‘Model’ and click OK.

Create a class called Customer and let’s create some basic stuff in it.

using System;

namespace Model
{
    public class Customer
    {
        public Guid CustomerId { get; set; }
        public string Name { get; set; }
        public string Company { get; set; }
        public string Email { get; set; }
        public string Telephone { get; set; }
        public string VATNumber { get; set; }
    }
}

OK, so now we have the model we need a way of persisting it.

Using Azure Tables is fairly straightforward.  You create a CloudStorageAccount object from a connection string, then create a CloudTableClient and a CloudTable.  The connection string info comes from the ‘Manage Access Keys’ step you did in the Azure Portal.

So for example:

string connectionString = "DefaultEndpointsProtocol=http;AccountName=<your storage account>;AccountKey=<your account key>";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("MyTable");
table.CreateIfNotExists();

You then use various methods to persist and read the data.

However, I don’t want to have to do this manually for every model.  What I really want to do is use the repository pattern to create a base that I can pass ANY model to and have it figure out what to do with it.

So what we need is an AzureTableStorageProvider that we can wrap all our functionality in, and then expose it via a Repository class.  So the first job is to create our provider.

Create a new Class Library Project called AzureTSProvider, and then create a new class called TableSet.

You also need to add the Azure NuGet packages – so right click the new project, select ‘Manage NuGet Packages’ and search for Windows Azure Storage.  Click Install, accept the Licenses and you’re good to go.

This is going to be a Generic Class, which allows us to pass in ANY class we construct and act on it regardless.  We do this by putting <TEntity> in the class declaration and stating ‘where TEntity : class’.  We also need the new() constraint in the declaration. It tells the compiler that TEntity has a public parameterless constructor, so our generic code is allowed to create new instances of it.

- NOTE – it doesn’t have to be TEntity by the way – you can call it whatever you like!

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace AzureTSProvider
{
    public class TableSet<TEntity>
    where TEntity : class,
        new()
    {
        private List<dynamic> internalList;
        private string partitionKey;
        private string tableName;
        private string connectionString;

        internal CloudTableClient tableClient;
        internal CloudTable table;

        public TableSet(string connectionString, string tableName)
        {
            this.partitionKey = typeof(TEntity).Name;
            this.tableName = tableName;
            this.connectionString = connectionString;

            //pluralise the partition key (because basically it is the 'table' name).
            if (partitionKey.Substring(partitionKey.Length - 1, 1).ToLower() == "y")
                partitionKey = partitionKey.Substring(0, partitionKey.Length - 1) + "ies";

            if (partitionKey.Substring(partitionKey.Length - 1, 1).ToLower() != "s")
                partitionKey = partitionKey + "s";

            CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
            tableClient = storageAccount.CreateCloudTableClient();
            table = tableClient.GetTableReference(tableName);
            table.CreateIfNotExists();
        }

        // First pass: these methods hand TEntity straight to the storage SDK.
        // That only works if TEntity is itself a table entity, which is the
        // problem tackled in the next section; the revised versions follow there.
        public virtual TEntity GetByID(object id)
        {
            var query = new TableQuery().Where(TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, id.ToString()));
            var result = table.ExecuteQuery(query).First();

            return result;
        }

        public virtual List<TEntity> GetAll()
        {
            var query = new TableQuery().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey)); //get all customers - because Customer is our partition key
            var result = table.ExecuteQuery(query).ToList();

            return result;
        }

        public virtual void Insert(TEntity entity)
        {
            TableOperation insertOperation = TableOperation.Insert(entity);
            table.Execute(insertOperation);
        }
    }
}
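The two generic constraints used on TableSet can be seen in isolation in the sketch below. `Factory`, `CreateIfNull` and `Widget` are made-up names for illustration only; nothing here touches Azure.

```csharp
using System;

public static class Factory
{
    // 'class' restricts T to reference types (so null is a legal value);
    // 'new()' requires a public parameterless constructor, which is what
    // makes the 'new T()' expression below compile.
    public static T CreateIfNull<T>(T candidate) where T : class, new()
    {
        return candidate ?? new T();
    }
}

public class Widget
{
    public string Label { get; set; }

    public Widget()
    {
        Label = "default";
    }
}
```

Calling `Factory.CreateIfNull<Widget>(null)` returns a fresh Widget whose Label is "default", while passing an existing instance returns it unchanged. A type without a public parameterless constructor, such as string, is rejected at compile time.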

**To DTO or not to DTO?**

I’m going to fast forward a bit here to explain why I did what I’m going to do next.  Imagine I’ve hooked this into my repository and Unit Of Work, and then a WebAPI Controller that creates a customer object and tries to persist it through my framework.

I immediately ran into an issue.  You see, in order for table storage to work with our model, the entities we create have to inherit from TableEntity (or implement ITableEntity themselves).  I don’t want to do this.  Why?  Because I want my API to have no reference to Azure whatsoever.  Or even my Repository.  Everything and anything to do with the persistence of the actual data needs to be neatly encapsulated in my provider classes.

OK, so what we need is a Data Transfer Object (DTO).  The DTO will inherit from TableEntity, and I could either manually copy properties between my Entity and the DTO or use something like AutoMapper.

But now my issue is that I have to pass in a DTO along with my Entity.  We could just say this is a requirement but apart from not being very neat I’m basically having to duplicate every entity I create with a DTO version – all the properties would be identical.

So I decided that the best way forward would be to just create a DTO on the fly using the properties of my passed in entity.

Thank goodness for Dynamic Objects – as these allow me to do just this!

First I create an empty ‘base DTO’, which is a class that inherits from DynamicObject and implements ITableEntity.  It gives us an object that implements everything Azure wants, but because it inherits from DynamicObject it is expandable and so allows us to add more properties to it.  This is created in my provider library, so create a new class called TableEntityDTO and paste in the following:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
using System;
using System.Collections.Generic;
using System.Dynamic;

namespace AzureTSProvider
{
    public class TableEntityDTO   : DynamicObject,ITableEntity
    {
        #region ITableEntity properties
        // Summary:
        //     Gets or sets the entity's current ETag. Set this value to '*' in order to
        //     blindly overwrite an entity as part of an update operation.
        public string ETag { get; set; }
        //
        // Summary:
        //     Gets or sets the entity's partition key.
        public string PartitionKey { get; set; }
        //
        // Summary:
        //     Gets or sets the entity's row key.
        public string RowKey { get; set; }
        //
        // Summary:
        //     Gets or sets the entity's time stamp.
        public DateTimeOffset Timestamp { get; set; }
        #endregion

        // Use this dictionary to store the table's properties.
        public IDictionary<string, EntityProperty> properties { get; private set; }

        public TableEntityDTO()
        {
            properties=new Dictionary<string,EntityProperty>();
        }

        public TableEntityDTO(string PartitionKey, string RowKey)
        {
            this.PartitionKey = PartitionKey;
            this.RowKey = RowKey;
            properties = new Dictionary<string, EntityProperty>();
        }

        #region override DynamicObject's methods
        public override bool TryGetMember(GetMemberBinder binder, out object result)
        {
            if (!properties.ContainsKey(binder.Name))
                properties.Add(binder.Name, ConvertToEntityProperty(binder.Name, null));
            result = properties[binder.Name];
            return true;
        }

        public override bool TrySetMember(SetMemberBinder binder, object value)
        {
            EntityProperty property = ConvertToEntityProperty(binder.Name, value);

            if (properties.ContainsKey(binder.Name))
                properties[binder.Name] = property;
            else
                properties.Add(binder.Name, property);

            return true;
        }

        public bool TrySetMember(string binder, object value)
        {
            EntityProperty property = ConvertToEntityProperty(binder, value);

            if (properties.ContainsKey(binder))
                properties[binder] = property;
            else
                properties.Add(binder, property);

            return true;
        }

        #endregion

        #region ITableEntity implementation

        public void ReadEntity(IDictionary<string, EntityProperty> properties, OperationContext operationContext)
        {
            this.properties = properties;
        }

        public IDictionary<string, EntityProperty> WriteEntity(OperationContext operationContext)
        {
            return this.properties;
        }

        #endregion

        /// <summary>
        /// Convert object value to EntityProperty.
        /// </summary>
        private EntityProperty ConvertToEntityProperty(string key, object value)
        {
            if (value == null) return new EntityProperty((string)null);
            if (value.GetType() == typeof(byte[])) 
                return new EntityProperty((byte[])value);
            if (value.GetType() == typeof(bool)) 
                return new EntityProperty((bool)value);
            if (value.GetType() == typeof(DateTimeOffset)) 
                return new EntityProperty((DateTimeOffset)value);
            if (value.GetType() == typeof(DateTime)) 
                return new EntityProperty((DateTime)value);
            if (value.GetType() == typeof(double)) 
                return new EntityProperty((double)value);
            if (value.GetType() == typeof(Guid)) 
                return new EntityProperty((Guid)value);
            if (value.GetType() == typeof(int)) 
                return new EntityProperty((int)value);
            if (value.GetType() == typeof(long)) 
                return new EntityProperty((long)value);
            if (value.GetType() == typeof(string)) 
                return new EntityProperty((string)value);
            // Anything else has no matching EntityProperty constructor above.
            throw new Exception(string.Format("This value type {0} is not supported for {1}", value.GetType(), key));
        }

         /// <summary>
         /// Get the CLR type for an EDM type; throws an exception if the EDM type is not supported.
         /// </summary>
        private Type GetType(EdmType edmType)
        {
            switch (edmType)
            {
                case EdmType.Binary : 
                    return typeof(byte[]);
                case EdmType.Boolean : 
                    return typeof(bool);
                case EdmType.DateTime : 
                    return typeof(DateTime);
                case EdmType.Double : 
                    return typeof(double);
                case EdmType.Guid : 
                    return typeof(Guid);
                case EdmType.Int32 : 
                    return typeof(int);
                case EdmType.Int64 : 
                    return typeof(long);
                case EdmType.String : 
                    return typeof(string);
                default: throw new TypeLoadException(string.Format("not supported edmType:{0}" ,edmType));
            }
        }
    }
}
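To see the DynamicObject mechanism on its own, here is a stripped-down sketch with no Azure dependency. `PropertyBag` is a hypothetical stand-in for TableEntityDTO: it stores plain objects rather than EntityProperty values, and unlike the class above it reports a missing member as a failure instead of adding an empty one.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Simplified stand-in for TableEntityDTO: members are created on first assignment.
public class PropertyBag : DynamicObject
{
    private readonly Dictionary<string, object> values = new Dictionary<string, object>();

    // A real (statically declared) member, so callers can inspect what was stored.
    public IDictionary<string, object> Values
    {
        get { return values; }
    }

    // Called by the runtime binder for 'bag.Something = value' when no real member matches.
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        values[binder.Name] = value;
        return true;
    }

    // Called for 'bag.Something' reads; returning false makes the binder throw.
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return values.TryGetValue(binder.Name, out result);
    }
}

public static class PropertyBagDemo
{
    // Builds a bag through dynamic member assignment and returns it.
    public static PropertyBag Build()
    {
        dynamic bag = new PropertyBag();
        bag.Name = "Brett";   // routed to TrySetMember
        bag.Visits = 3;       // routed to TrySetMember

        return (PropertyBag)bag;
    }
}
```

After `PropertyBagDemo.Build()` the Values dictionary holds two entries, "Name" and "Visits". The DTO above works the same way, except that each value is first converted to an EntityProperty so the storage client can serialise it.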

Now in my TableSet class I create a few helper methods. The first is CreateDTO, which takes my source entity, creates a dynamic DTO, copies all the properties from my source entity onto it and then sets the RowKey and PartitionKey.

It relies on an ‘IsId’ method that I use to check each source entity property name to see if it’s an ID. We’ll need this for the ‘RowKey’ that Azure uses, together with the PartitionKey, as the ‘primary key’.  It’s fairly simplistic but suits my needs.

Finally I have a StripDTO method that essentially maps our DTO back to the base Entity for return queries.

#region object mapping
        dynamic CreateDTO(object a)
        {
            TableEntityDTO dto = new TableEntityDTO();
            object rowKey = null;

            Type t1 = a.GetType();
            Type t2 = dto.GetType();

            //now set all the entity properties
            foreach (System.Reflection.PropertyInfo p in t1.GetProperties())
            {
                dto.TrySetMember(p.Name, p.GetValue(a, null) == null ? "" : p.GetValue(a, null));
                if (IsId(p.Name))
                    rowKey = p.GetValue(a, null);
            }

            if (rowKey == null)
                rowKey = Guid.NewGuid();

            dto.RowKey = rowKey.ToString();
            dto.PartitionKey = partitionKey;

            return dto;
        }

        TEntity StripDTO(Microsoft.WindowsAzure.Storage.Table.DynamicTableEntity a)
        {
            TEntity result = new TEntity();

            Type t1 = result.GetType();
            var dictionary = (IDictionary<string, EntityProperty>)a.Properties;

            foreach (System.Reflection.PropertyInfo p1 in t1.GetProperties())//for each property in the entity,
            {
                foreach (var value in dictionary)//see if we have a correspinding property in the DTO
                {
                    if (p1.Name == value.Key)
                    {
                        p1.SetValue(result, GetValue(value.Value));
                    }
                }

            }

            return result;
        }

        private object GetValue(EntityProperty source)
        {
            switch (source.PropertyType)
            {
                case EdmType.Binary:
                    return (object)source.BinaryValue;
                case EdmType.Boolean:
                    return (object)source.BooleanValue;
                case EdmType.DateTime:
                    return (object)source.DateTimeOffsetValue;
                case EdmType.Double:
                    return (object)source.DoubleValue;
                case EdmType.Guid:
                    return (object)source.GuidValue;
                case EdmType.Int32:
                    return (object)source.Int32Value;
                case EdmType.Int64:
                    return (object)source.Int64Value;
                case EdmType.String:
                    return (object)source.StringValue;
                default: throw new TypeLoadException(string.Format("not supported edmType:{0}", source.PropertyType));
            }
        }

        private bool IsId(string candidate)
        {
            //matches "Id" itself and anything ending in "id" (e.g. CustomerId), ignoring case.
            //EndsWith is safe for one-character names, where Substring(Length - 2, 2) would throw.
            return candidate.EndsWith("id", StringComparison.OrdinalIgnoreCase);
        }

        # endregion
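One thing to watch with a suffix test like IsId is that ordinary property names such as Paid or Valid also end in "id", so CreateDTO could pick up the wrong property for the RowKey. If that ever bites, a stricter convention is one option. The sketch below assumes the key is always called "Id" or "<TypeName>Id"; that convention, and the names `KeyFinder` and `Invoice`, are assumptions for illustration rather than anything the provider above requires.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class KeyFinder
{
    // Hypothetical convention: the key property is named "Id" or "<TypeName>Id".
    // Returns null when the type has no property matching either name.
    public static PropertyInfo FindKey(Type entityType)
    {
        string typedName = entityType.Name + "Id";

        return entityType.GetProperties().FirstOrDefault(p =>
            string.Equals(p.Name, "Id", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(p.Name, typedName, StringComparison.OrdinalIgnoreCase));
    }
}

// Sample type: 'Paid' ends in "id" but is clearly not a key.
public class Invoice
{
    public Guid InvoiceId { get; set; }
    public bool Paid { get; set; }
}
```

Here `KeyFinder.FindKey(typeof(Invoice))` picks InvoiceId and ignores Paid. The trade-off is that entities must follow the naming convention, so whether this is worth it depends on how much control you have over the model classes.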

Now we update our CRUD methods to use the DTOs thus;

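public virtual TEntity GetByID(object id)
        {
            //filter on PartitionKey as well as RowKey so we only ever match this entity type
            string filter = TableQuery.CombineFilters(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
                TableOperators.And,
                TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.Equal, id.ToString()));
            var query = new TableQuery().Where(filter);
            var dto = table.ExecuteQuery(query).First();
            TEntity mapped = StripDTO(dto);

            return mapped;
        }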

        public virtual List<TEntity> GetAll()
        {
            List<TEntity> mappedList = new List<TEntity>();
            var query = new TableQuery().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey)); //get all customers - because Customer is our partition key
            var result = table.ExecuteQuery(query).ToList();

            foreach (var item in result)
            {
                mappedList.Add(StripDTO(item));
            }
            return mappedList;
        }

        public virtual void Insert(TEntity entity)
        {
            dynamic mapped = CreateDTO(entity);
            TableOperation insertOperation = TableOperation.Insert(mapped);
            table.Execute(insertOperation);
        }

OK, the final step for the provider side is to create a Context class that our Repository will use.  Create a new Class called TSContext and enter the following;

using System;

namespace AzureTSProvider
{
    public abstract class TSContext
    {
        private string tableName { get; set; }
        private string connectionString { get; set; }

        public TSContext(string connectionString, string tableName)
        {
            this.tableName = tableName;
            this.connectionString = connectionString;
        }

        public virtual TableSet<TEntity> Set<TEntity>()
            where TEntity : class, new()
        {
            var set = new TableSet<TEntity>(connectionString, tableName);

            return set;
        }
    }
}

The final step on our journey is to create our abstracted Repository base, our actual CustomerRepository and our UnitOfWork. I’m not going to go into too much detail with these as there are lots of articles that really go into the nitty gritty of them.

Create a new Project called DAL.  Add in references to our Model and AzureTSProvider projects. Create a class called AzureContext that inherits from our TSContext and paste in:

using AzureTSProvider;
using Model;
using System;

namespace DAL
{
    public class AzureContext : TSContext
    {
        private static string tableName;
        private static string connection;

        public AzureContext(string connectionString, string table)
            : base(connectionString, table)
        {
            tableName = table;
            connection = connectionString;
        }

        public TableSet<Customer> Customers { get; set; }
    }
}

Create a new class called RepositoryBase.

using AzureTSProvider;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace DAL
{
    public abstract class RepositoryBase<TEntity> where TEntity : class, new()
    {
        internal AzureContext context;
        internal TableSet<TEntity> dbset;

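        public RepositoryBase(AzureContext context)
        {
            //fail fast here: this constructor runs before any derived constructor body
            if (context == null)
                throw new ArgumentNullException("context");

            this.context = context;
            this.dbset = context.Set<TEntity>();
        }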

        public virtual TEntity GetByID(object id)
        {
            return dbset.GetByID(id);
        }

        public virtual List<TEntity> GetAll()
        {
            return dbset.GetAll();
        }

        public virtual void Insert(TEntity entity)
        {
            dbset.Insert(entity);
        }

    }
}

Followed by our CustomerRepository that inherits from the base class.

using Model;
using System;

namespace DAL
{
    public class CustomerRepository : RepositoryBase<Customer>
    {

        public CustomerRepository(AzureContext context)
            : base(context)
        {
            if (context == null)
                throw new ArgumentNullException("context", "Context cannot be null!");
        }

    }
}

Last but not least is our Unit of Work that ties it all together.  Note I am manually entering my Azure details here.  You’d normally pass them through from your WebAPI or whatever else calls it, read them from a config file, or use Dependency Injection.  The ‘tenantId’, which is used as our table name, would come from some other table. For example, when a tenant logs in we’d grab a unique TEXT string for that tenant and pass it through, e.g. the user’s email address with all the non-alphanumerics stripped out. (Table names ONLY accept alphanumerics, cannot start with a number and must be between 3 and 63 characters long, so GUIDs or ints are no good!)

using System;

namespace DAL
{
    public class UnitOfWork: IDisposable
    {
        AzureContext context;

        private CustomerRepository customerRepository;

        public UnitOfWork(string tenantId)
        {
            string connectionString = "DefaultEndpointsProtocol=http;AccountName=<my storage name>;AccountKey=<my account key>";

            this.context = new AzureContext(connectionString, tenantId);
        }

        public CustomerRepository CustomerRepository
        {
            get
            {
                if (customerRepository == null)
                    customerRepository = new CustomerRepository(context);

                return customerRepository;
            }
        }

        public void Dispose()
        {
            this.Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected virtual void Dispose(bool disposing)
        {
            if (!disposing)
            {
                return;
            }
        }
    }
}
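The "strip out the non-alphanumerics" step for the tenant id could look something like the sketch below. `TableNames.FromTenant` is a hypothetical helper, and the "t" prefix used when the result would start with a digit is an arbitrary choice made for the example.

```csharp
using System;
using System.Linq;

public static class TableNames
{
    // Turns an arbitrary tenant string (e.g. an email address) into something
    // that satisfies the table naming rules: alphanumeric only, not starting
    // with a digit, and 3 to 63 characters long.
    public static string FromTenant(string raw)
    {
        if (raw == null) throw new ArgumentNullException("raw");

        // Keep ASCII letters and digits only.
        string cleaned = new string(raw.Where(c =>
            (c >= 'a' && c <= 'z') ||
            (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9')).ToArray());

        // Arbitrary prefix so the name never starts with a digit (or ends up empty).
        if (cleaned.Length == 0 || char.IsDigit(cleaned[0]))
            cleaned = "t" + cleaned;

        if (cleaned.Length > 63)
            cleaned = cleaned.Substring(0, 63);

        if (cleaned.Length < 3)
            throw new ArgumentException("Tenant id is too short to form a table name.", "raw");

        return cleaned;
    }
}
```

For example, "brett@example.com" becomes "brettexamplecom" and "123-abc" becomes "t123abc". Bear in mind that stripping characters can make two different inputs collide (a.b@example.com and ab@example.com give the same name), so a real system would want the tenant key stored and checked for uniqueness when the tenant is created.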

We are now good to go! To use it we just make the following call from our WebAPI or wherever:

UnitOfWork proxy = new UnitOfWork("test001");
Customer customer = new Customer()
{
    //becomes the RowKey; if left unset it would be Guid.Empty for every customer
    CustomerId = Guid.NewGuid(),
    Company = "Square Connection Ltd",
    Email = "Bretthargreaves@hotmail.com",
    Name = "Brett Hargreaves",
    Telephone = "12345 12345678",
    VATNumber = "123 456789 GB"
};
proxy.CustomerRepository.Insert(customer);

Obviously all this is just a starting point.  One of the biggest glaring differences between this and a ‘normal’ repository is that changes are immediately persisted.  What we need to do next is implement some change tracking in our Azure Provider and a ‘SaveChanges’ method that then commits everything.

But that’s a blog for another day!