Move Data Securely with Azure Data Factory

Azure Data Factory is a great tool for moving and transforming data between different storage and database services in Azure.

A common scenario, and one that’s easy to set up, is to import data from a CSV file into SQL Azure or CosmosDb.  In an on-premises world you could use a BULK import statement to perform this action; however, as you move to SQL Azure and need to start hooking into other Azure services, things can become a bit more complex.

As an overview, the process to perform imports this way is as follows (a rough T-SQL sketch follows the list):

  1. Create a master encryption key for your database
  2. Create a Database Scoped Credential using a shared access signature generated on your storage account.
  3. Create an external data source using the scoped credential generated above.
  4. Perform your BULK INSERT routine.
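
To make that concrete, here’s a rough, hedged T-SQL sketch of those four steps – the credential, data source, container and table names below are placeholders (not anything created later in this post), and the SAS token is whatever you generate on your storage account:

-- 1. Master encryption key for the database
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<a strong password>';

-- 2. Database scoped credential holding the storage SAS (without the leading '?')
CREATE DATABASE SCOPED CREDENTIAL UploadsCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
     SECRET = '<your SAS token>';

-- 3. External data source pointing at the blob container
CREATE EXTERNAL DATA SOURCE UploadsStorage
WITH (TYPE = BLOB_STORAGE,
      LOCATION = 'https://<yourstorageaccount>.blob.core.windows.net/uploads',
      CREDENTIAL = UploadsCredential);

-- 4. Bulk insert the CSV held in that container into a table
BULK INSERT [dbo].[SomeTable]
FROM 'somefile.csv'
WITH (DATA_SOURCE = 'UploadsStorage', FORMAT = 'CSV', FIRSTROW = 2);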

There are a number of issues with this.  The first is that it’s very difficult to debug – when anything goes wrong the messages you receive are not particularly verbose.

The second is one of security.  Many companies require services such as storage accounts and SQL Servers to be locked down to only allow access from specific IP addresses or ranges.  In addition, the use of shared access signatures is difficult to maintain, as once they expire you must run through the above setup process again with the newly generated SAS in order to re-run an import.

Luckily Azure Data Factory overcomes all these issues for simple data copies, and with the use of Managed Identities or Service Principals, the ability to specify Trusted Azure services, or even opening set IP ranges for Data Factory, many security concerns can also be overcome.

Let’s walk through copying data from a securely stored CSV file into a SQL Azure database using a secure connection and a Managed Identity.

I’m going to assume you’ve already created an Azure Data Factory (it’s very simple to do: in the Azure Portal click Create, search for and choose Azure Data Factory, and follow the wizard).

I’ll also assume you have a Storage account and Azure SQL Database created (again all easily created through the portal using the Create wizard).

Before we create a pipeline, I’m going to make sure my Storage account and SQL Azure database are locked down to only allow specific networks to access them.  Go to your storage account and change ‘Allow access from’ to Selected networks.

I’ve also added my client IP just so that I can view and load data from my computer.

In other words, I’ve basically blocked all access except from my computer.

Now I’m going to create a simple CSV file and upload it to an uploads container in my storage account.  The file is a simple text file with the following contents:

Id,FirstName,Surname,DOB
1,"Keith","Richards",18/12/1943
2,"Mick","Jagger",26/07/1943

Make sure the file is well formatted – i.e. proper quotes around strings, no spaces around the commas, etc.

I’ll then upload that to my storage account.

Now I’m going to go to my Azure Data Factory by browsing to

https://adf.azure.com/

You’ll be asked to select your subscription and your data factory.

The first time you sign in you’ll be presented with a Welcome screen – click ‘Create a Pipeline’

What we want to do first though is create a couple of Connections.  Connections define how we will connect to our source and destination endpoints – so we’ll want one for our Storage account and one for our SQL Server.

Down the bottom left you will see an option for Connections and Triggers – click Connections.

 At the top of the new pane is a +New button – click it.

Select Azure Blob Storage.

Fill in the details – give it a name, set Authentication Method to Account Key, then select the storage account in the options that appear.  Finally, at the bottom, click ‘Test connection’.  It should fail, and if you then click More it will tell you the connection is forbidden.

We need to do two things to get access – first we need to open the firewalls to allow ADF access to our account, and secondly we need to set up a Managed Identity to access it.

First, change the Authentication Method to ‘Managed Identity’ – you’ll need to select the storage account again.  However, it will now show you the name and ID of the Managed Identity that your ADF is using.

Now switch back to the Azure portal and go back to the Firewalls settings for your storage account.  Tick the ‘Allow trusted Microsoft services’ checkbox and click Save.  This rule allows certain Microsoft hosted and managed services access to your storage account – for details of those services see https://docs.microsoft.com/en-gb/azure/storage/common/storage-network-security#exceptions

Now, still on your Storage Account, go to the Access Control (IAM) pane.  Click Add a Role Assignment.  Set the role to Storage Blob Data Contributor, set ‘Assign access to’ to Data Factory, and select your subscription.  Select the Managed Identity that appears and click Save.

Switch back to the Data Factory portal, and click the Test Connection button again – the connection should now be successful – click Create.

Next we want to set up a connection to SQL Server.  This time we’ll lock down access to our SQL Server first.  In the Azure portal go to your SQL Server and again go to the Firewall blade.

First we need to set ‘Deny public network access’ to Yes.  However, I’m also going to add my client IP so that I can still log on from my computer.

Now go to the Data Factory and confirm access is blocked.  In the Data Factory go to Connections as before, click +New and choose Azure SQL Database.

As with the storage account, fill in the details and select the SQL database you want to connect to.  For now choose SQL Authentication and use the SQL username and password you used when creating the SQL Server (or some other user if you’ve set one up).  Again, click Test Connection and it should fail.

Now let’s open things up.  Go back to the Azure Portal, into your SQL Server Firewalls blade.

You’d be forgiven for wanting to choose the option ‘Allow Azure services and resources’ – and although this would work, the problem is that it does NOT only allow trusted Microsoft services such as ADF; it actually opens your SQL Server to the ENTIRE Azure network.  This means ANY service, be it your own or another Azure customer’s, could get network access to your SQL Server!  You could argue the authentication will secure you, but for many companies this is simply not good enough.

Instead, what we can do is open the specific IP ranges for ADF – these are broken down by region and can be found here;

https://docs.microsoft.com/en-gb/azure/data-factory/azure-integration-runtime-ip-addresses

Choose the region that you built your data factory in.

The addresses are listed as CIDR ranges, and we need to enter them as start and end addresses, so we need to convert the CIDR ranges.

Basically, the /xx notation tells you how many addresses are in the range – /25 is 128 addresses, /26 is 64 addresses and /28 is 16 addresses.

Thus, if we take UK South as an example – the ranges are:

51.104.24.128/25 is 51.104.24.128 to 51.104.24.255
51.104.25.0/26 is 51.104.25.0 to 51.104.25.63
51.104.9.32/28 is 51.104.9.32 to 51.104.9.47

If you can’t be bothered doing the math you can use this handy tool here

https://mxtoolbox.com/subnetcalculator.aspx

Once you’ve worked out the ranges, add them to the firewall rules alongside your client IP address and click Save.

Next we need to grant access to our Managed Identity.  Again, this is different to storage – we need to grant access within the database itself – therefore we don’t use the Azure Portal to do it; we need to create a SQL user using T-SQL.

To do this, we first need to set an Active Directory admin on our SQL Server.

In the Azure Portal, still on our SQL Server, select the Active Directory Admin blade.  At the top click Set Admin, and then choose an Active Directory account (note, this has to be an Active Directory account linked to your Azure tenant, for example an Office 365 account – you can’t use a guest account or ‘hotmail’ account).  If you don’t have one, you’ll need to create yourself an AD account within your tenant.

You now need to connect to your SQL Server from a tool such as SQL Server Management Studio or Azure Data Studio.  Connect to the SQL Server using the Active Directory admin credentials you just set.

You need to run a query against the database – NOT the master database.

First you create a user from the external provider, using the name of the Managed Identity you used earlier when setting the IAM role on the storage account – e.g.

CREATE USER [adf-cloudguru] FROM EXTERNAL PROVIDER;

Next you add that user to a database role – for simplicity I’m using db_owner, but you should use least privilege.

ALTER ROLE [db_owner] ADD MEMBER [adf-cloudguru];
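
For example, if the pipeline only needs to read and write table data, a more restrictive alternative (a sketch only – check what your own pipeline actually requires) would be the built-in data roles:

-- Grant only read and write access to table data
ALTER ROLE [db_datareader] ADD MEMBER [adf-cloudguru];
ALTER ROLE [db_datawriter] ADD MEMBER [adf-cloudguru];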

With that done, we can go back to our Azure Data Factory, tell the connection to use Managed Identity and retry the connection.

Hopefully this will now be successful, and you can Create the connection.

Before we go on to creating our pipeline we need to create a table to import our data into.  So go to your SQL editor (SQL Server Management Studio or Azure Data Studio) and create a Rockers table in your database:

CREATE TABLE [dbo].[Rockers](
    [Id] [int] NULL,
    [FirstName] [nvarchar](max) NULL,
    [Surname] [nvarchar](max) NULL,
    [DOB] [smalldatetime] NULL
)

Once done, go back to the Azure Data Factory. 

To effectively ‘save’ anything in ADF you need to publish it – so at the top of the page click the Publish all button.  You’ll be shown the changes you made – click publish again.

Creating a Data Copy Pipeline

With our connections set up we can go ahead and create our actual copy pipeline.

To the left of the screen, you see Factory Resources, with Pipelines, Datasets and Data flows underneath – click the + to the right of the filter box between the menu heading and Pipelines – then choose Pipeline from the tear off menu.

Call it CopyRockers.

To the left is now a list of activities – we want the top one, Move & Transform.  Expand that menu, then click and drag Copy Data over to the workspace on the right.

Give the Copy Data Activity a name, then click the source tab.

The source has a drop down list to select a dataset – but it’s empty, so click New.

A screen similar to the Connection window appears – select Blob Storage and continue.  Then select Delimited and continue.

In the next window give our source a name – call it RockersCSV – and in Linked Service select the MySecureStorage account we created earlier.  Then click the Browse button and navigate to the file we uploaded earlier.  Finally, tick the ‘First row as header’ checkbox.

Click OK.

You’re taken back to the main screen, and there is now a Preview Data button – click this to make sure it loads the file OK.

Now click Sink.

As with the Source, we need to create a new Sink – this is our destination.  Click Add, and Select Azure SQL Database from the options.

Call it RockersTable, for the Linked Service select the MySecureSQLServer connection you created earlier, then select the Rockers table from the drop-down list.  Ensure ‘Import schema from connection/store’ is selected.

If your table isn’t listed it’s because you need to create it – so go back to earlier in this post and create the table using the supplied script.

Once everything is set up click OK.

Now click Mapping.

This screen allows us to map our source and destination fields.  Start by clicking Import Schemas – this will then attempt to match up the fields automatically.  You may need to tweak them a bit.

An important part is setting the source data types, as everything defaults to String.  For the Id we want to change this to an Int32, and the DOB needs to be changed to a DateTime.  Datetimes are particularly hard work – you often need to specify the format – so to the right of DateTime there’s a double down arrow – click that for more options.  In the format box set your date time format – for me that’s dd/MM/yyyy.

Now click Publish all to save it.

You would normally ‘trigger’ the run, but first just click ‘Debug’.

This will run the import – and if all goes well, you’ll get a success.  If not, the output window will help you investigate why it’s failed.

Finally, let’s create a trigger.

Go to Triggers – New/Edit, then in the next window click the drop down and select +New.

Call our trigger – RockersUploadTrigger.

We want to set the trigger so that every time a rockers CSV is uploaded it runs the import.

Set the following:

Type : Event

Azure Subscription : Your subscription

Storage account name : Select your storage account

Container Name: your uploads container

Next we’ll tell it to fire whenever we upload a file that starts with ‘rockers’ and ends in ‘csv’ – in other words it will trigger when we upload rockers1.csv or rockers2.csv, etc.  However, because our pipeline is set to specifically import the file rockers.csv, this will actually only work properly if we always upload the file as rockers.csv.

Finally set the Event to Blob created.

Click continue.

A data preview screen will show that your existing rockers.csv file would be a match.  Click Continue.  You’ll then be prompted to enter any parameters (we don’t have any), and then you’ll need to publish your changes – click OK.

Then click publish all.

Now, go to SQL Server Management Studio and query the Rockers table – you should see the original data.
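
For example:

SELECT * FROM [dbo].[Rockers];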

Now edit your CSV file with the following data;

Id,FirstName,Surname,DOB
1,"Kurt","Cobain",20/02/1967
2,"Krist","Novoselic",16/05/1965

Save the file as rockers.csv again, and then re-upload it to your storage account.

Now switch back to your Data Factory, and down the left hand side you will see a Monitor option – click that.

That will show us a list of runs, and if any fail a reason why they failed.

If we re-query our table in SQL we will now see the additional records.

The final change I am going to make is to set the import file as a wildcard – as mentioned earlier, so far the pipeline is expecting a file called rockers.csv, so let’s change this so that we can upload, for example, rockers2.csv.

Go back to Author and then to our copy pipeline and edit the source.

Change the File path type from ‘File path in dataset’ to Wildcard file path, leave the wildcard folder path empty and change the wildcard file name to rockers*.csv.

Now delete any existing files in the storage account, and create a new file called rockers2.csv with the following contents;

Id,FirstName,Surname,DOB
1,"Freddy","Mercury",05/09/1946
2,"Brian","May",19/07/1947

Then upload that to the uploads container instead.

Wait a couple of minutes then check the Monitor to ensure the job ran OK, and re-query the table to confirm the new data.

Conclusion

Data Factory is a much easier way to ingest data from CSV files to data stores – which is a common practice for many organisations.

Although much of this can be done within SQL itself, using ADF enables you to create repeatable, automatic pipelines in a secure manner.  And of course the Data Copy option is the simplest of its capabilities – ADF provides a lot of power for much more complex transformations and data movements.

If you’d like to learn more about Azure, gaining a certification is the best way – check out my article on boosting your career with the Azure Solutions Expert Certification here.

Are you a Coder, a Hacker or a Developer?

Coder, Hacker or Developer – which do you think best sums you up?  I’ll wager that most would consider themselves a ‘Developer’.

But in my many years’ experience – from learning the basics of coding to my present-day role of Architect and teacher/mentor – I’ve come to the conclusion that the three titles are actually all part of the career path in software development.

For me the titles perform a very good job of defining the role, but I feel the perceptions associated with each can be misleading.

Coder

Coders code.  To code means to string some computer language together in logical structures to perform some function.  That’s it.  And in today’s modern Agile world this is an extremely important skill.  Agile is to some extent about breaking down the work into manageable tasks or ‘User Stories’: as a User I want to do x.

So a Coder will be assigned a task – write some code that completes a function.

All these functions are eventually strung together to make the whole.  But essentially most tasks come down to coding – hence it’s the first step for all developers.

Hacker

To some a hacker is cool.  To Business & Government hackers are evil!

But let’s examine the term.  Hacker.  To ‘hack’ into an existing system and, well, bend it to your will.

Yes, in the wrong hands this skill can cause some real damage – but so can ANY tool in the wrong (or inept) hands.

But to me Hackers represent the next level of coding skills.  Hackers can obviously code, but hackers can also look into the base foundations that all coders use day in day out, and enhance them. Hackers are able to get right there into the code and make it do things you wouldn’t normally be able to do – how?  Because Hackers truly understand the underlying system they are hacking.

This is also an incredibly important skill.  Rarely do business requirements fit any standard mould.  I have heard coders say, ‘it can’t be done, the system just won’t allow that’.  The Hacker will see the challenge.  They’ll learn the underlying mechanism and find a way to make it do what they need it to.

So don’t ever think ‘hacker’ is a detrimental term – for me it’s the next level in becoming a truly great developer.

Developer

For all their technical ability, Hackers still lack an important skill – business acumen.  Developers (in the varied sense) are often viewed as ‘techies’.  But a true Developer understands the business needs – and can draw upon their Coder and Hacker skills to find the most effective, efficient solution to the business problem.

The true Developer is the most misunderstood role.  A Developer not only understands the underlying possibilities, they can decipher the business speak and apply their knowledge to the problem at hand.

In this sense ‘Developer’ is often referred to as Senior Developer – and it’s not about age or years of service (although that often helps!) – it’s about having the full skill set.  The most successful developers, the ones who’ve built billion-dollar companies, are the ones who also understand Marketing, Sales, Finance and Leadership – not necessarily in the depth needed to specialise, but certainly enough to hold their own and talk the talk.

Anyone can learn the basic skills to lay some bricks, or plumb some pipes – but a Developer understands how the whole fits together!

Architect

To me an Architect is the natural progression of a Developer – they take the business skills to the next level and help drive the full Software Development Life Cycle (SDLC).

The next level is about seeing things beyond the current project, to the multitude of projects that businesses have to take on. It’s about having the knowledge to see where IT can help and drive the solution BEFORE being asked.

It’s about looking for efficiencies of scale – how sharing code, or better still services, between the different projects can help reduce overhead and development time and improve quality.

 

All these ‘levels’ are equally important – but if you want to push your career or technical ability – then you need to understand the different roles available – and focus on what’s needed to take you to the next level.

HTML and CSS For Everyone!

Web Development seems to many non-techies as a black art.

A core skill for any web developer is HTML and CSS – the ‘languages’ used to build websites and an increasing number of mobile apps.

But what if I were to tell you that these two skills – HTML and CSS – are actually becoming key for EVERYONE in business?

The world today is centered around communications – email, social media, blogs and more.  But to be heard in the noise of mass information, yours needs to stand out.

If you regularly send marketing emails, or manage a blog or social media account, then HTML and CSS are must-have skills – as only with an understanding of these can you hope to build something appealing and eye-catching.

Because HTML and CSS are often considered a ‘programming language’ – many shy away.

Learning the basic core skills to create stunning material using HTML and CSS is actually very easy – if you have learned how to use Word or Excel – then you can learn this!

Just think what you could do if you mastered the basics: create awesome emails for your customers, tweak your company’s or your personal WordPress site, speak on a par with your technical teams, create a stunning personalised resume website!

Business today demands an ever-increasing skill base – and if HTML and CSS aren’t at the top of your list of things to learn in 2016, they should be!

To learn more and get a discount off my own HTML and CSS Online course and book click Web Development Essentials – HTML and CSS

Visual Studio Code – Why?

All Articles in series
Visual Studio Code Why?
Installing Visual Studio Code on OSX – From Scratch

Visual Studio Code – Why?

I am a Microsoft developer. I have been for over 10 years, since Visual Basic was first released.

This article is the first in a series introducing the latest developer product from Microsoft (still in beta at the time of writing) – Visual Studio Code.  This wonderful piece of software runs on Windows, Linux and OSX.  As I use an Apple MacBook Pro as my workhorse, the potential for this is like suddenly discovering the wheel – it really is going to be a revolution!

But why? And perhaps more importantly WHY do I use a MacBook when I am a professional Microsoft developer!?

Originally I wasn’t a professional developer – it was a hobby – but it was something I always knew in my heart I wanted to be.

Unfortunately when I started working the opportunities weren’t really around – at least not where I lived in the North of England.

And so I did the next best thing – I became an IT consultant – and I focussed on Microsoft products – Windows 3.1 and Windows NT.

The point is, since then I have been Microsoft through and through. So I often ask myself why I own an iPhone, and 4 iPads, and two years ago I bought a MacBook Pro.

All I can say is the hardware is just gorgeous.  The OS to some extent doesn’t really bother me that much – which is odd considering my professional focus!  But then actually, if I think about it, it DOES make sense, as for the past 10 years my development career has been about Web Development – which at the back end IS Microsoft (.NET) but for the end user is actually just HTML, and therefore OS agnostic.

In 2012 I needed a new laptop.  I wanted a nice screen – after all, I spent my days staring at a screen all day long as I wrote code – and the MacBook Pro with Retina display just ticked all the boxes.  I also wanted something thin, light and powerful.  At the time the ONLY thing on the market was the MacBook.  Before then I’d never used a Mac, but now I couldn’t imagine NOT using a Mac.  I wonder if it’s because deep down I’ve always been a bit artistic – and Apple (at least originally) has always attracted the more artistic types because of the aesthetics of its products.

But this left me with a quandary. I was a Microsoft developer! And so for the past few years I’ve used Windows Virtual Machines running on my Mac. I hardly ever touched OSX at all! It was simply the shell for running my VMs.

Actually this worked really well – as being a developer you often have to build and rebuild your OS as you install various tools, then uninstall them or upgrade them. And using VMs makes this process sooo much easier!

And then there is the plain and simple fact that as a developer I need to understand various diverse platforms – Windows, OS X, Android, iOS.  In fact, in my current position I am involved with a number of applications that use iOS apps talking to a .NET back end.  This pretty much sealed the deal – by using a MacBook with Windows VMs on top I really could have my cake and eat it!

So then Visual Studio Code comes along.  And now I have a reason to be a Microsoft developer AND an OSX user.  Again the question – WHY?  Well, first of all, the fact that with ASP.NET 5 and Mono I can actually build applications using Microsoft .NET but deploy them to Linux/Unix – which tend to be a bit cheaper to host – not to mention the exciting possibilities Docker is giving.

Yes, Docker is now available on Windows, and IIS in the latest version of Windows is so much faster and has a much smaller memory overhead (not to mention how much faster it can spin up) – but there’s something about the flexibility of being able to deploy to either Windows OR Linux that gets me excited – I can’t help but feel that this, combined with Docker, is going to be a game changer in the industry.

And so the potential to build software completely on OSX, Windows or Linux, using the same code base?  Well now I’m feeling the sort of excitement lottery winners feel when they see their numbers come up!  Sad, I know.

But now here’s a problem.  I’m a Microsoft developer, ex Microsoft OS consultant, and I’ve just been using OSX as a shell for my Windows VMs.  I know practically NOTHING about OSX and, more importantly, the underlying Unix OS that OSX is built on.  I’ve spent the past few months playing with VS Code, and I finally think I’m starting to get to grips with it!

So this series of articles is really about my experience installing, setting up, and building full .NET websites using VS Code, .NET 5 and Mono.  All on an OS I don’t really know, and using technologies (like bower, nodejs, npm, grunt, gulp) that I’ve never had to really worry about before (because Visual Studio just DOES IT for you!).

I hope you enjoy the series, and glean some benefit from it!

The first article in the series – installing VS Code and its dependencies – can be found here:
Installing Visual Studio Code on OSX – From Scratch

Installing Visual Studio Code on OSX – from scratch!

All Articles in series:
Visual Studio Code Why?
Installing Visual Studio Code on OSX – From Scratch

Installing Visual Studio Code

This guide will assume a complete clean install of OSX.  That means that I will cover every single pre-requisite that is required.  I will also assume the reader has not much experience with OSX – if you are coming from a Windows .NET development background, installation of many of the pre-requisites is not as straightforward as you’ll be used to!

Installing Code

First of all – the easy bit. Installation of the main Visual Studio Code program.

Go to https://code.visualstudio.com – the website will detect whether you are running OSX, Windows or Linux and present you with the relevant link.  Simply click the link to download the program.

This will download Visual Studio Code.app.

This is a self-contained program and doesn’t require any installation.  But let’s move it from the default download location – which should be Downloads – to the Applications folder.  Simply drag it from the Downloads folder onto the Applications folder in your Dock (if you’ve created a shortcut for it) or to the Applications folder in Finder.

Now run the program by going into Applications (from finder or the Dock) and clicking it. The first time you run it you’ll be prompted to open the file as it was downloaded from the internet. Just click Open.

If you want it to be always in the Dock, right click the Icon choose options and ‘Keep In Dock’.

Another useful option for launching VS Code is to enable launching from a Terminal window.  On OSX a lot can and will be performed through Terminal, so it’s handy to be able to simply type ‘code .’ and have it launch.

To set this up, launch Terminal by going to the Applications folder, then the Utilities folder, and clicking ‘Terminal’.  Again, once it’s running it will be handy to create a shortcut to it by right-clicking it and choosing Options -> Keep In Dock.

What we need to do is edit a file called .bash_profile – this is like a startup file that you can use to set environment variables and other things.  It is also hidden.  So to edit it, in the Terminal window type nano ~/.bash_profile

You will be presented with a blank screen (unless you’ve installed some software that has already put some info in this file).

Enter the following

code () {
if [[ $# = 0 ]]
then
open -a "Visual Studio Code"
else
[[ $1 = /* ]] && F="$1" || F="$PWD/${1#./}"
open -a "Visual Studio Code" --args "$F"
fi
}

Now press CTRL+O and accept the default filename to save the changes.  Then press CTRL+X to exit nano.

Normally this script will execute on startup, but to run it now type

source ~/.bash_profile

so now if you type (with the period)

code .

VS Code will launch.

Node.js & NPM

Node.js is a JavaScript runtime execution engine. Using it you can create JavaScript applications. It is also used when building applications in VSCode to automate tasks.

Node also includes a utility called NPM – Node Package Manager. NPM is like NuGet – we can use it to install software and update software.

We will be using NPM a lot, so the next thing we need to do is install Node.

In a browser navigate to

http://nodejs.org

On the main page you will be presented with the current release of Node.js for your OS.  Click the link to download the installer, which will be called something like node-v0.12.7.pkg (the actual name will vary depending on what the current version is).  Simply run the installer by double clicking it.

In the dialog that appears click continue, then accept the license agreement, then click the install button.

Once installed you’ll be presented with a box telling you to make sure /usr/local/bin is in your path – this is so it can be executed from anywhere.

On a default OSX installation this is already set. You can confirm this by typing echo $PATH in the terminal window.

To confirm everything is working as we need now type the following

node

this will ‘launch’ nodejs and you’ll see a ‘>’. Now type

console.log('node is working')

If it’s working ‘node is working’ should be output to the window. You’ll also see ‘undefined’ appear – don’t worry about that.

To quit out of node press CTRL+C twice.

Mono

The next piece of software we need is Mono. Mono is an open source implementation of .NET. Basically this means it allows you to run .NET applications on Linux and OSX based systems!

To install Mono in a browser navigate to

http://mono-project.com

Again the front page will have a link to download Mono. Click the link then double click the .pkg file that is downloaded to start the installer.

Click continue, accept the License agreement and click install.

.NET Execution Environment (ASP.NET 5)

Now we have Mono, we can install the ASP.NET 5 runtime itself. This has to be done via a terminal Window and in stages. So launch terminal (if it’s not already running).

First install the .NET Version Manager (dnvm). To do this in terminal type the following

curl -sSL https://raw.githubusercontent.com/aspnet/Home/dev/dnvminstall.sh | DNX_BRANCH=dev sh && source ~/.dnx/dnvm/dnvm.sh

This will download dnvm and automatically update your .bash_profile with the required environment settings.

Now we can use dnvm to install or upgrade the .NET Core CLR – known as the .NET Execution Environment (DNX) – by typing the following

dnvm upgrade -r coreclr

Finally, we must install DNX for Mono by typing

dnvm upgrade -r mono

The final step is to update our .bash_profile again.  We need to ensure the dnvm and dnu commands are in our path, and also enable a setting to fix an IOException that we get with Mono and .NET.

So again in terminal edit our profile like we did before with

nano ~/.bash_profile

Make sure you have a reference to dnvm.sh – it will either simply have source dnvm.sh, or a longer, more verbose version.
After that line, add

export MONO_MANAGED_WATCHER=disabled

Save the file by pressing CTRL+O

Then quit out with CTRL+X

Bower

The final software we need to install is called Bower. Bower is another package manager – like NuGet – but specifically for Web Projects.

We install Bower using NPM (The Node Package Manager). From a terminal window type the following
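
npm install -g bower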

Entity Framework & Direct SQL

Entity Framework is a great time saver.

When I think back to the days when I had to manually construct SQL statements, parse the results, control updates, and of course ensure my actual database design was in sync with what my code expected – well it sends a shiver down my spine!

I remember back in the day I even built myself a little utility that would take my database and spit out boiler plate code for basic CRUD operations.

But then EF was released – and although it was a little bit flaky at first (with people tutting and muttering about NHibernate), it was still a lovely little thing.  These days of course I use it all the time.  In fact, as you may have seen in one of my earlier posts, I’ve even copied more than a little of it for an Azure Table Storage version.

And of course couple EF with decent patterns such as the repository pattern and Dependency Injection and you have a rock solid foundation.

But, (there’s always a but) – EF is sometimes a bit slow when compared with issuing SQL commands directly – especially with batch operations.

For this reason the EF context exposes a Database property, which in turn exposes a number of options for issuing SQL commands directly.

I will show you how to use the two most common ones – SqlQuery and ExecuteSqlCommand

As I like to decouple as much as possible, and because I like my interfaces to only depend on core libraries I’m going to hide away some of the EF specific stuff.

So, first of all I like to have an IRepository<TEntity> interface and a Repository<TEntity> base class, that way each repository gets the basic CRUD methods and anything else I might want;

public interface IRepository<TEntity> where TEntity : class
{
    void Delete(object id);
    void Delete(TEntity entity);
    System.Linq.IQueryable<TEntity> GetAll();
    System.Linq.IQueryable<TEntity> GetAll(object filter);
    IPagedResponseViewModel GetPaged(int take, int skip, string orderBy, bool orderByAscending, object filter);
    TEntity GetById(object id);
    TEntity GetFullObject(object id);
    void Insert(TEntity entity);
    void Update(TEntity entity);
    void Commit();
    void Dispose();
    // Direct SQL Stuff
    int ExecuteSqlCommand(string sql, object[] parameters);
    ICollection<TEntity> SqlQuery(string sql, object[] parameters);
    int ExecuteSqlCommand(string sql);
    ICollection<TEntity> SqlQuery(string sql);
}

public abstract class RepositoryBase<TEntity> : IRepository<TEntity> where TEntity : class
{
    internal DataContext context;
    internal DbSet<TEntity> dbSet;

    public RepositoryBase(DataContext context)
    {
        this.context = context;
        this.dbSet = context.Set<TEntity>();
    }

    .....

    public virtual int ExecuteSqlCommand(string sql)
    {
        return context.Database.ExecuteSqlCommand(sql);
    }
    public virtual int ExecuteSqlCommand(string sql, object[] parameters)
    {
        return context.Database.ExecuteSqlCommand(sql, parameters);
    }
    public virtual ICollection<TEntity> SqlQuery(string sql)
    {
        return context.Database.SqlQuery<TEntity>(sql).ToList();
    }
    public virtual ICollection<TEntity> SqlQuery(string sql, object[] parameters)
    {
        return context.Database.SqlQuery<TEntity>(sql, parameters).ToList();
    }
    public virtual void Commit()
    {
        context.SaveChanges();
    }
    public virtual void Dispose()
    {
        context.Dispose();
    }
}

As you can see all we are really doing is encapsulating the SqlQuery and ExecuteSqlCommand methods by taking in a string or a string and a list of parameters.

First let’s look at ExecuteSqlCommand – this is very straightforward; we simply pass in our SQL string, such as

ExecuteSqlCommand("UPDATE SomeTable SET SomeColumn='Some Value'");

EF issues the command and returns an int indicating the number of affected rows.
If you want you can pass in parameters like this

ExecuteSqlCommand("UPDATE SomeTable SET SomeColumn='Some Value' WHERE ID=@p0", new object[]{3});

Now for SqlQuery.  You may notice SqlQuery uses TEntity (if you’re not familiar with generics, we use TEntity to refer to whatever class we pass in during the instantiation of our repository – therefore anything that refers to TEntity refers to whatever object we want to use) – e.g. we would have

public class MyObject{
    public int Id { get; set; }
    public string SomeProperty { get; set; }
    ....
}

public class MyRepository : RepositoryBase<MyObject>{
...
}

so when we then instantiate the actual repository

var myrepo = new MyRepository();

We get all the commands from the base Repository class referencing our MyObject class model.

Anyway, if we ignore our repository for now, if we were to just query Database.SqlQuery directly we’d use

Database.SqlQuery<MyObject>("SELECT * FROM MyObjects");

this will result in a list of MyObject – EF actually attempts to convert the results it receives back to the model you pass in.
So all we have done now is automate that because we already know the model from when we instantiated our repository – thus when calling our encapsulated method we just use

SqlQuery("SELECT * FROM MyObjects");

Again we can pass in parameters just like with ExecuteSqlCommand.

So as you can see this gives you complete flexibility in using EF – as mentioned earlier – if I have to iterate through and update large datasets I sometimes construct my SQL Directly and use these methods instead.

Note : This post was created in response to a user question on my course about ASP.NET Development Techniques.  My Blog viewers can get the course for just $10 by clicking this link 

The Importance of Flexible Application Design

I don’t normally blow my own trumpet, but this week the team I work with won a prestigious ‘Innovation’ award at Capita Plc for an iPad/ASP.NET/WebAPI solution we have built to address a specific business need.

The application essentially allows building Surveyors to record information about properties using an iPad in offline or online mode, sync that data up to a backend SQL database via a ASP.NET WebAPI service, and then expose the data through an ASP.NET Web Portal.  There’s also the usual Reports and dashboards that managers and such like generally swoon over.

The product itself is a great time saver, it allows surveys to be taken and reported on in a fraction of the time compared to pen and paper, or even an excel spreadsheet (by hooking costs into items that are picked by the surveyor).

As good as the solution is from a business perspective, what really impressed the judges was how easily it could be adapted to run any kind of survey you want without having to re-code anything.

Let me explain a bit more.

Version 1 of the solution was built for a specific type of survey, and as such the original developer involved built the application with certain settings hard coded.  So for example, in this particular survey they wanted to segregate any one building into Blocks, Levels and Items, therefore to accommodate these business requirements the developer created Block, Level and Item entities.

The backend database used these entities, the front end iPad app used these entities.  And it worked fine.

Specific Entities

But then as these things go other areas of the business saw what had been done and naturally wanted the same thing for them.

Business versus Technical Design

The problem was that each different area of the business wanted something slightly different.  Some wanted 4 or 5 levels, some wanted only 1 block, some didn’t really want to record costs but rather just snippets of information, such as diary events during the course of a building project.

The original plan was to use the v1 as a core but then re-develop both the backend, and the iPad app for each client.

Now from a technical design point of view this is great.  We get to independently give each client exactly what they want.

However from a business perspective this really wasn’t very good.  You see, there are 3 major issues with this way forward.

  • Each new client would take 3-4 months to re-develop
  • Multiple codebases – both front and backend for each client
  • No central location for management

Ultimately these 3 issues really come down to one common business problem – cost.

You see, many times the business would want it NOW.  Or at the very least within a couple of weeks, obviously being told 3-4 months PLUS testing is no good.

Secondly, although some commissions were large value, some were only for a few thousand (as it was only a handful of small properties/locations).  Again at 3-4 months the cost just becomes prohibitive.

Third, with multiple sets of code bases and no central management location, looking after all these different implementations would require far more support overhead – and therefore cost.

Start from scratch or accept the negatives?

It is about this time that I got involved with the project.

Immediately I saw the issue but more importantly the solution.

To some this may be obvious, but when you’re a true techie, especially someone who has already invented the wheel once, it’s quite hard to see more flexible alternatives.  After all, what does a techie care about costs?

Now I think I’m quite lazy!  But believe me, this has often been a useful trait.

You see I’m lazy in that I hate boring repetitive tasks,  I also hate having to re-do the same thing again and again.  Once done I want it finished so I can move onto another challenge.

So to me the solution was to have a generic ‘level’ that could recurse itself.  Each level then has a type associated with it, and a parentId.

In this way a ‘level’ can be anything you want – a building, a floor, an area of ground, a room, a swing – whatever you want – and you link it in a parent-child hierarchy accordingly.  We can then define WHAT it actually is within a template that is also stored as a template entity.

The iPad app and the WebUI simply interrogated the template to work out what kind of control to use, and thus as the template changed the UI just flowed around it accordingly.
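
As a rough illustration only – the table and column names here are hypothetical, not our actual schema – the heart of that design is a single self-referencing entity whose meaning is driven by a type defined in the template:

CREATE TABLE [dbo].[Level](
    [LevelId] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [ParentLevelId] [int] NULL,       -- self-reference: NULL at the top of the hierarchy
    [LevelTypeId] [int] NOT NULL,     -- what this level represents (block, floor, room...) per the template
    [Name] [nvarchar](255) NOT NULL,
    CONSTRAINT [FK_Level_Parent] FOREIGN KEY ([ParentLevelId]) REFERENCES [dbo].[Level]([LevelId])
)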

Flexible Entities

So what we can now do is define HOW the app will work within the app itself, without having to rebuild either of the UIs or the backend.  We also get to keep everything in a central location and report across ALL the different survey types.

This is nothing new.  There are LOTS of apps that do this, but it’s surprising how often this design pattern can be used.

The Business MUST come first

Now, this of course caused quite a heated ‘discussion’ between myself and the original developer.  He (quite rightly) pointed out that such a method would complicate the code more, making reporting a bit more difficult and increasing the amount of required processing power.

However, although this is all true, the fact is, as developers we shouldn’t care about such things, it is our job (or at least should be) to bring the best possible value to the business – and if that means we have a harder time of it then so be it.

Now don’t get me wrong – I’m not suggesting we should embrace creating horribly complex code – quite the opposite – if you also apply SOLID patterns to your solutions then you can easily split up this complexity into nice manageable chunks.

And as for the processing power, well not really.  Even our 2nd generation iPads quite happily reconstructed the models according to a template on the fly – and we had some BIG templates with LOTS of data.

Reporting caused a few problems initially, but again Microsoft provide all the tools in the form of Analysis and Tabular services to help out when things got too hairy.

But let’s get back on track.  Eventually I won the discussion 🙂

The resultant solution now means new clients can be accommodated within DAYS, not months.  This has a huge impact on costs and allows smaller projects to make use of our system.  It also gives the business confidence that it can bid for new work knowing that if they win (and by being able to do the job faster and therefore cheaper, they often win), we can turn around the new system for them within a few days and let them get on with the job as efficiently as possible.

Because let’s not forget – software is about adding more value to the business, either with cost savings or a better proposition.

The Frustration of Tutorials and Walkthroughs

I have an issue with how most software developers learn to write software.

Like me, a lot of developers learn on their own by searching for how to do stuff on the internet.  This is great, and of course I doubt there is much you CAN’T find with this method.

However, when I start learning new technologies – for example when I first started looking at MVC quite a few years back (I started on MVC2) – I found that all the tutorials, including Microsoft’s, taught in such a way that was really quite bad programming practice.

Now I’m not saying it’s ALL bad, but recently I’ve been looking at various training courses online, and the majority all show the same thing – they teach you how to CODE, but they don’t really teach you how to write software.

Example: one MVC course which promises to teach you how to be a ‘complete’ developer went through the basics – create a project, create a class, here’s what a for loop is, here’s what a while loop does, etc. etc.
However there was NO mention of SOLID principles, no real explanation of Object Oriented programming, of using interfaces and abstractions.  Dependency Inversion and a whole multitude of good programming practices were simply ignored.

As a junior developer I found this VERY frustrating.

Of course, now I’ve been around a bit, I do understand a lot more – but it’s actually been quite a painful journey if truth be told.  It does however explain why a lot of software houses don’t really like to employ freelancers who have only ever worked on their own.

So anyway, to address this woeful shortcoming, I have actually created my own course.

I created it on a site called Udemy.com – if you’ve never used it have a look – it’s pay per course rather than subscription based, but they do often run very good deals – and some of the courses are really very good.

Please, checkout my course here

There’s a substantial discount on what will normally be charged, and I’ve had over 1000 students within a few days! So I guess there are some people out there who actually want to learn how to write software properly!

The great thing about Udemy.com is that you get life time access to the courses – and that includes UPDATES. For example in my course I’ll be adding lectures on Module Injection and splitting your views up using Partial Views – another underused technique.

And I would welcome input please – what do you think most programmers miss by learning from random searching?

Knockout Component Loading with requirejs won’t load html file

Knockout 3.2 introduces component loading – this is a great feature that allows you to load HTML and JS modules into your code, thus enabling you to split your code into self-contained modules.

Think of it like PartialViews in MVC but for KnockoutJS.

The first thing you need to do is ‘register’ your component, e.g. (And this is taken from the KnockoutJS documentation)

ko.components.register('like-widget', {
    viewModel: function(params) {
        // Data: value is either null, 'like', or 'dislike'
        this.chosenValue = params.value;
        
        // Behaviors
        this.like = function() { this.chosenValue('like'); }.bind(this);
        this.dislike = function() { this.chosenValue('dislike'); }.bind(this);
    },
    template:
        '<div class="like-or-dislike" data-bind="visible: !chosenValue()">\
            <button data-bind="click: like">Like it</button>\
            <button data-bind="click: dislike">Dislike it</button>\
        </div>\
        <div class="result" data-bind="visible: chosenValue">\
            You <strong data-bind="text: chosenValue"></strong> it\
        </div>'
});

 

Then your main page would implement the following
    <ul data-bind="foreach: products">
        <li class="product">
            <strong data-bind="text: name"></strong>
            <like-widget params="value: userRating"></like-widget>
        </li>
    </ul>
with the following javascript to load your view model
    function Product(name, rating) {
        this.name = name;
        this.userRating = ko.observable(rating || null);
    }

    function MyViewModel() {
        this.products = [
            new Product('Garlic bread'),
            new Product('Pain au chocolat'),
            new Product('Seagull spaghetti', 'like') // This one was already 'liked'
        ];
    }

    ko.applyBindings(new MyViewModel());
All pretty cool – except we’re embedding our ‘module’ in the component registration.  What we need to do is have it all in a separate file.  Again the Knockout documentation shows us how to do this simply – we use an AMD module loader such as RequireJS, so we can store everything in separate files:
component-like-widget.js 
define(['knockout'], function(ko) {

    function LikeWidgetViewModel(params) {
        this.chosenValue = params.value;
    }

    LikeWidgetViewModel.prototype.like = function() {
        this.chosenValue('like');
    };

    LikeWidgetViewModel.prototype.dislike = function() {
        this.chosenValue('dislike');
    };

    return LikeWidgetViewModel;

});
component-like-widget.html
<div class="like-or-dislike" data-bind="visible: !chosenValue()">
    <button data-bind="click: like">Like it</button>
    <button data-bind="click: dislike">Dislike it</button>
</div>
<div class="result" data-bind="visible: chosenValue">
    You <strong data-bind="text: chosenValue"></strong> it
</div>
Both the view model and the template are now loaded from external files, so our component registration needs to change to take account of them:
ko.components.register('like-or-dislike', {
    viewModel: { require: 'files/component-like-widget' },
    template: { require: 'text!files/component-like-widget.html' }
});
 And then finally our main HTML page just implements the component similar to above (except we’re now adding products dynamically as well)

HTML

    <ul data-bind="foreach: products">
        <li class="product">
            <strong data-bind="text: name"></strong>
            <like-or-dislike params="value: userRating"></like-or-dislike>
        </li>
    </ul>
    <button data-bind="click: addProduct">Add a product</button>

script

    function Product(name, rating) {
        this.name = name;
        this.userRating = ko.observable(rating || null);
    }

    function MyViewModel() {
        this.products = ko.observableArray(); // Start empty
    }

    MyViewModel.prototype.addProduct = function() {
        var name = 'Product ' + (this.products().length + 1);
        this.products.push(new Product(name));
    };

    ko.applyBindings(new MyViewModel());

Now, this is pretty much what the Knockout documentation says – it works lovely on their own pages, but could I get it to work in my project?  Nope.  Not at all.

So, time to start digging.  First thing, let’s have a look at what is being brought back from the server (I use Fiddler).

The javascript file loads fine – but when it tries to load the HTML file it doesn’t actually try to load it; instead it tries to get a file called ‘text’.

Well, I guess this makes sense – after all, we’ve told it to load ‘text!files/component-like-widget.html’.

So er, what am I missing?

Well, I assumed that when requirejs saw the text! prefix it would automatically know how to handle it.  Well, you know what they say about ASS U ME.

Now this is mainly because I’ve never really delved deep into requirejs – I just use it and out of the box it always works fine.  So, just in case anyone else out there is in the same boat here’s the fix.

To make requirejs know how to handle text! we need a text plugin – a NuGet package aptly named ‘Text Plugin for RequireJS’ – so install it with NuGet, and voila!  The world is back to normal.

Entity Framework and Interface issues

There is, and has been for quite some time, a serious issue with Entity Framework – namely that it doesn’t support complex interfaces.

A simple entity based on an interface such as this is fine

public class Root : IRoot
{
    public int RootId { get; set; }
    public string Data { get; set; }
}

However, as soon as you start to introduce child objects, for example an
ICollection<Leaf> Leafs
well now we have an issue, because the interface needs to define the collection as
ICollection<ILeaf> Leafs
and that’s where it all goes a bit pear-shaped, because EF can’t figure out that Leafs is a navigation property when it’s based on an interface instead of a concrete type.

I have spent many a wasted hour searching for a fix to this; the majority of the articles I find simply state that EF does not support interfaces.

In the past I’ve worked around it by using a DTO, so basically my DTO class is what EF will use to create my database mappings, but I then need to use something like AutoMapper, or roll my own mapper in order to convert my DTO to a concrete class based on an interface and vice versa.

This also has the issue that although retrieving and adding NEW entities works OK, as soon as you try to do an update to an existing entity with Attach it all starts to fall apart again.
Once again I managed to fix this by simply ensuring that when I’m mapping from my concrete class to the DTO I get the EXISTING DTO from the database, and then loop through all its properties, copying and updating as required.

So recently, in a fit of despair, I sat down and hammered out an alternative solution that is a bit more elegant.

OK, so first we have our standard interface thus:

public interface IRoot
{
    string Data { get; set; }
    System.Collections.Generic.ICollection<ILeaf> Leafs { get; }
    int RootId { get; set; }
}

So we have an interface properly defined, but EF doesn’t know how to map the navigation property.  So we need to create one alongside the collection that the interface expects, so let’s go with

public ICollection<ILeaf> Leafs { get; set; }
public ICollection<Leaf> LeafNavigation { get; set; }

OK, that’s great, but now of course these are two separate lists, so we’ll add an internal list and connect our two ICollections up to it.  We’ll also add some constructors for good measure.

public class Root : IRoot
{
    private List<Leaf> _leafs;

    public Root(ICollection<Leaf> leafs) {
        this._leafs = leafs as List<Leaf>;
    }

    public Root(List<Leaf> leafs)
    {
        this._leafs = leafs;
    }

    public Root() {
        this._leafs = new List<Leaf>();
    }

    public int RootId { get; set; }
    public string Data { get; set; }

    public ICollection<ILeaf> Leafs { get { return _leafs.ConvertAll(l => (ILeaf)l); } }

    public ICollection<Leaf> LeafNavigation { get { return _leafs; } set { _leafs = value.ToList(); } }
}

So now EF is happy, we’ve implemented the interface so we can use IRoot rather than Root. But now we have another issue – basically I can’t seem to ADD Leaf Entities to IRoot.Leafs – maybe someone can point out what I did wrong – because although the code lets me add the leaf with

IRoot root = new Root();
root.Leafs.Add(new Leaf(){ Data="Some data"});

it just doesn’t actually ADD it to the underlying collection (I’ve tried a number of permutations – like I said, if I have done something obvious PLEASE let me know!).

Anyway, I managed a simple workaround – basically let’s just add an AddLeaf method to the actual class that takes a Leaf and adds it to the underlying collection; thus the finished class and interface look like this.

public interface IRoot
{
    void AddLeaf(ILeaf leaf);

    string Data { get; set; }
    System.Collections.Generic.ICollection<ILeaf> Leafs { get; }
    int RootId { get; set; }
}

public class Root : IRoot
{
    private List<Leaf> _leafs;

    public Root(ICollection<Leaf> leafs) {
        this._leafs = leafs as List<Leaf>;
    }

    public Root(List<Leaf> leafs)
    {
        this._leafs = leafs;
    }

    public Root() {
        this._leafs = new List<Leaf>();
    }

    public void AddLeaf(ILeaf leaf) {
        _leafs.Add(leaf as Leaf);
    }

    public int RootId { get; set; }
    public string Data { get; set; }

    public ICollection<ILeaf> Leafs { get { return _leafs.ConvertAll(l => (ILeaf)l); } }

    public ICollection<Leaf> LeafNavigation { get { return _leafs; } set { _leafs = value.ToList(); } }
}

And voila, this works a treat: I can add Leafs and update them, along with the root, and EF just binds everything as normal.  Add, Update and Get all work seamlessly.

I’ve uploaded this to GitHub – and I hope this helps anyone who may be having similar issues!
https://github.com/squareconnection/EFInterfaces