NoSQL & Document databases

I am really loving the NoSQL movement at the moment; however, there seems to be a lot of confusion as to when it is appropriate to use these databases.

Significant things to consider are:

  • Atomic transactions are only for single operations. You can’t really have long running or nested transactions (the ACID rules are different).
  • Joins don’t make sense so don’t expect to use them when designing the system
  • Document databases are typically schema-less. This means adding new properties is much less of a hassle than in SQL-land, especially once you are in production.
  • The notion of an aggregate root fits perfectly with a document so the idea of using a DocDB for DDD is appealing (assuming transactions are not required)
  • DocDBs tend to scale horizontally very well unlike our SQL counterparts which tend to only scale vertically without huge headaches
  • Read and write performance is possibly the opposite of what you would expect, with very fast writes (as I understand it) being the norm.
  • Queries are done differently. MongoDB, for example, uses JavaScript as its query language (this does not mean it is used in the web tier!)

Several uses for document databases come to mind:

  • High volume, low value writes, e.g. user data entry on social sites. This is not business critical but potentially requires easy scaling options, i.e. no one is going to die if your last Facebook update doesn’t go through to all the DB servers instantaneously.
  • Auditing. One area I’m keen on is command persistence. I like the idea of having a trail of all commands sent to a component; it becomes a self documenting timeline of what users were trying to do to the system. When a command is handled by a component it can just write the whole serialized object to a DocDB, thereby capturing all the info without being bound to a schema (audit is version agnostic). The command can then be processed by the component. I will admit that an Object DB is also suitable for this.
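
The command audit idea above could be sketched roughly like this. It is only an illustration under a few assumptions: the PlaceOrder command and the CommandAuditLog class are made-up names, the in-memory list merely stands in for a document database collection, and System.Text.Json is used simply because it ships with current .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical command; any serializable object would do.
public class PlaceOrder
{
    public string CustomerId { get; set; }
    public string ProductCode { get; set; }
    public int Quantity { get; set; }
}

// Hypothetical audit log. The List<string> stands in for a document
// database collection; a real DocDB client would replace it.
public class CommandAuditLog
{
    private readonly List<string> _documents = new List<string>();

    public IReadOnlyList<string> Documents
    {
        get { return _documents; }
    }

    public void Record(object command)
    {
        // Store the whole command plus a little metadata as one document.
        // There is no schema, so old and new command versions can coexist.
        var document = JsonSerializer.Serialize(new
        {
            CommandType = command.GetType().Name,
            ReceivedAtUtc = DateTime.UtcNow,
            Payload = command
        });
        _documents.Add(document);
    }
}

public static class AuditDemo
{
    public static void Main()
    {
        var audit = new CommandAuditLog();
        var command = new PlaceOrder { CustomerId = "c-42", ProductCode = "WIDGET", Quantity = 3 };

        audit.Record(command);   // 1. capture what the user tried to do
        // 2. ...then hand the command to the component for processing.

        Console.WriteLine(audit.Documents[0]);
    }
}
```

The point of the sketch is the ordering: the command is written out in full before it is processed, so the trail records intent even when handling later fails.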

Things not to use a document database for are high value, transaction heavy workloads, i.e. banking transactions, or things that inherently require SQL… whatever use case that may be.

Hope this helps 🙂

Help Me Help You

We have been lucky here in Perth that we have a very active community, well run by people who have stepped up to the plate to provide us with these events. However these events are not free. We need venues, we often get prizes and some sweet swag. I don’t think some of our attendees quite understand that without that support we have no event.
We are also fortunate enough to have sponsors we actually like! For this reason when we make a plug it is not just because they are a sponsor but because we either use the product or service (or want to) ourselves. As our audience is largely made up of business developers I thought they would understand these basic back scratching processes.
Anyway, below is a list of companies or products that have made many lives better by helping out a technical community I’m involved in AND that I recommend professionally:
JetBrains-ReSharper, Teamcity
TekPub
Readify
Beacon
Excom
Redgate

Thanks. I like your stuff, recommend you and thank you for helping us out. I also want to thank Mitch Wheat and Mike @wolfbyte for organizing presentations for us every month; you are appreciated!

Testing with Domain Driven Design

DDD is a hairy beast. It feels like everyone has a slightly different opinion of what it is. I personally thought I had a pretty good grasp on it until I caught up with Udi Dahan earlier this year, where he (as per usual) completely turned my understanding on its head.
That being said, I believe the way I am approaching the current project is in line with what the majority of people believe DDD is, correctly or otherwise.
Key aspects in this project that help make it resemble a DDD domain are:

  • The level of understanding the devs have, the ubiquitous language that is in use, and the way it is constantly evolving amongst all team members, including our SMEs/Users
  • The general structure of the code – repositories, services, aggregate roots comprised of entities and value types etc etc

To be honest one could argue that, like most DDD projects, it is just clean OO coding. It is; that doesn’t mean that there isn’t something that others could glean from the project, such as what we are doing to keep things clean and help with creating nice tests.

First and foremost follow TDD and BDD if possible
If you have a good understanding of the component you are building then writing explicit specifications should be easy. Do this with your BA and SME and make sure the tests are not shallow, low value tests. Investigate what BDD is and see if one of the many frameworks fits your team’s needs. Personally I am still using xUnit frameworks and am creating elaborate contexts on which I make assertions. My fixture set ups can sometimes be complex but my test methods are very clear & clean and can often just be one line assert statements. I have not found a BDD framework that sits well with me in the .Net world like Cucumber does in my Rails development, so I will continue down this path till I find something I like better than explicit and somewhat elaborate xUnit styled tests.

The domain should remain as pure as humanly possible
Anything that is public should have a good reason for being so. Unfortunately most developers use public as their default. This is bad practice as it produces a very non intuitive API. Use access to help show the other developers how you intended the API to be used.
Typically, IMO, this means protected virtual by default. (I’m an NH fanboi). Also restrict what you return. I personally don’t like the idea of returning child entities from parent entities especially aggregate roots. What is the consumer going to do with these entities? Most of the time a value type or projection is more appropriate and helps keep the line of ownership clean.

Modifying the domain to be more testable should be frowned upon, especially if it confuses the API
This ties into the first point. NB: There are concessions that I make that I can live with (I will cover them soon), but I believe these do not negatively affect my domain and they make life easier and my intentions more explicit to the next developer.

Use subclass fakes to increase accessibility for testing
I often see people making fields, properties or methods public so tests can call them. Please don’t do this. If these are private, definitely do not expose them. This is an indication that you are not doing TDD. Private methods generally only come from refactoring once your tests pass, so testing private methods is a sure sign you are not doing TDD. Internals sometimes may warrant being tested. This is OK as we can make internals visible to the test project. I generally don’t mind doing this as I put it in the AssemblyInfo file, which is recreated in my deployment anyway, so there is no dodgy test orientated code in my final deployed assembly. I must say that I don’t really do this a lot and I think people use it as a crutch. True TDD generally will not warrant a lot of this. However I find myself doing it a bit in my domain, possibly out of fear more than for any real rational reason.
If you really need to make something accessible to test and don’t want to expose it fully, then create a fake that inherits from the class and override the accessor in that class. Sometimes I do this to check things like IDs. Again, question whether this is really needed; have faith in the TDD process!
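
As a rough sketch of the subclass fake idea (all names here are invented for illustration; Customer is a hypothetical entity, not one from the project), the fake lives in the test project and only widens access to a protected setter. This variation adds a small setter method rather than overriding the accessor, which keeps the example short.

```csharp
using System;

// Hypothetical domain entity: the Id is normally assigned by the ORM,
// so its setter is protected rather than public.
public class Customer
{
    public virtual int Id { get; protected set; }
    public virtual string Name { get; protected set; }

    protected Customer() { }   // for the ORM and for subclasses

    public Customer(string name)
    {
        Name = name;
    }
}

// Lives in the test project only. It widens access and adds no logic.
public class CustomerFake : Customer
{
    // Same ctor as the parent, calling straight into it.
    public CustomerFake(string name) : base(name) { }

    public void SetId(int id)
    {
        Id = id;   // allowed: protected setters are visible to subclasses
    }
}

public static class FakeDemo
{
    public static void Main()
    {
        var customer = new CustomerFake("Alice");
        customer.SetId(42);

        Customer asDomainType = customer;
        Console.WriteLine(asDomainType.Id);     // 42
        Console.WriteLine(asDomainType.Name);   // Alice
    }
}
```

The production Customer API is unchanged; only test code ever sees SetId.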

Allow hooks in the domain for creating child objects
One thing I have used in the current project is protected factory methods to create child entities. By opening up these one liners I can override the method in a fake subclass to create a fake subclass of the child, e.g. OrderFake will create and return an OrderLineFake in its CreateOrderLine method, as opposed to the Order creating and returning an OrderLine in its CreateOrderLine.

When using fakes of real domain objects make sure you are not hiding any of the real object functionality. The fakes should be as plain as possible. Adding additional logic will surely corrupt your tests. One thing to watch for is to make sure your fakes implement the same ctors as their parent and call into those ctors. Failure to do this will create a big PITA 🙂
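
The Order / OrderLine hook described above might look something like the following. Treat it as a sketch under assumptions: the members other than CreateOrderLine (AddLine, LineCount, LastCreatedLine and the OrderLine properties) are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public class OrderLine
{
    public string ProductCode { get; private set; }
    public int Quantity { get; private set; }

    public OrderLine(string productCode, int quantity)
    {
        ProductCode = productCode;
        Quantity = quantity;
    }
}

public class Order
{
    private readonly List<OrderLine> _lines = new List<OrderLine>();

    public int LineCount
    {
        get { return _lines.Count; }
    }

    public void AddLine(string productCode, int quantity)
    {
        _lines.Add(CreateOrderLine(productCode, quantity));
    }

    // The hook: a protected one-liner that a fake can override.
    protected virtual OrderLine CreateOrderLine(string productCode, int quantity)
    {
        return new OrderLine(productCode, quantity);
    }
}

// Test-project fakes. Same ctors as the parents, calling into them.
public class OrderLineFake : OrderLine
{
    public OrderLineFake(string productCode, int quantity)
        : base(productCode, quantity) { }
}

public class OrderFake : Order
{
    // Kept deliberately dumb: it only remembers what it created.
    public OrderLine LastCreatedLine { get; private set; }

    protected override OrderLine CreateOrderLine(string productCode, int quantity)
    {
        LastCreatedLine = new OrderLineFake(productCode, quantity);
        return LastCreatedLine;
    }
}

public static class FactoryHookDemo
{
    public static void Main()
    {
        var order = new OrderFake();
        order.AddLine("WIDGET", 2);

        Console.WriteLine(order.LineCount);                         // 1
        Console.WriteLine(order.LastCreatedLine is OrderLineFake);  // True
    }
}
```

Note that Order.AddLine is untouched by the fake, so the real behaviour is still what gets exercised.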

Next post I will talk about some issues and traps that we as a team have managed to fall into. Most of them are things that, when consciously questioned, we know we shouldn’t do but that have managed to creep into the solution; hopefully we can help you avoid repeating our mistakes!

What is Regression Testing?

We are on a journey. The company I work for is a very large one and has suffered some lag in migrating to modern ways of software development, a situation many companies of this size often find themselves in. Fortunately the team I work with have come together to make a conscious and consistent effort to improve our area of influence. First and foremost was the introduction of a structured, developer oriented testing plan. Because we are a development team we started with unit testing, integration testing and some build scripts to run these tests. From there we have improved the way we do those tests, streamlined our scripts and made CI a solid part of our process. Automated deployments have become easier too, leading to automated staged deployments. We are now at the stage where we are adding FitNesse and UI testing to the equation.

Each of these steps has been driven from the ground up. We have had to carefully maneuver these processes into our daily process and show the benefits to management. It has taken time but they have responded very well; so well that they are jumping on board! This is a great thing and has made all the hard work, and it was hard work, worthwhile.
However there are still political games to play. The language is fractured and management want to throw around terms that they don’t really understand and force their idea of these terms onto the very people that introduced them to the department, sound familiar? 😉

“Unit Testing” and “Regression Testing” are two of the terms du jour. Unit testing “means” something a developer does and regression testing is an automated UI test that a non techie can watch as it magically clicks away at the screen. This, as you may know, is not correct… well not strictly correct; and in a place where the language is so commonly fractured it is hard to justify fighting for seemingly such trivial victories.
So instead I vent here.

The things that I felt I most needed to express are:
Any automated test is a regression test
– and –
You do not write regression tests.
What I am referring to in that last statement is merely the fact that you write a repeatable test that tests a certain unit, module or scenario, that can be automated, and then you check it in to source control. The very fact that it is part of your build process/continuous integration means it now acts as a regression test.
So our tests that

  • Test a unit, i.e. with an xUnit framework, which may also use mocks and stubs etc
  • Test seams or integration points, i.e. DB tests, file system tests, web service tests etc
  • System smoke test
  • High level acceptance tests using tools like Cucumber, FitNesse etc
  • UI testing – i.e. a button clicking framework like QTP

are all regression tests because they are repeatable, automated tests.
Please don’t make the mistake of thinking you will explicitly write a regression test. The test should test that the code produces a specific outcome e.g. enforces a business rule.
The automation and maintenance of a healthy CI environment makes those tests regression tests.
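
To make that concrete, here is a small sketch of a test written purely to pin down one business rule. The Account class and the rule are invented for the example, and plain methods are used instead of a real xUnit framework just to keep it self-contained. Nothing in it says "regression"; it earns that label only once the build runs it on every check-in.

```csharp
using System;

// Hypothetical domain class with one business rule.
public class Account
{
    public decimal Balance { get; private set; }

    public Account(decimal openingBalance)
    {
        Balance = openingBalance;
    }

    // Business rule: an account may not be overdrawn.
    public void Withdraw(decimal amount)
    {
        if (amount > Balance)
            throw new InvalidOperationException("Insufficient funds.");
        Balance -= amount;
    }
}

public static class AccountSpecs
{
    // Written as an ordinary test of one rule, throwing on failure.
    public static void Withdrawing_more_than_the_balance_is_rejected()
    {
        var account = new Account(100m);
        try
        {
            account.Withdraw(150m);
        }
        catch (InvalidOperationException)
        {
            if (account.Balance != 100m)
                throw new Exception("Balance changed by a rejected withdrawal.");
            return;   // rule enforced
        }
        throw new Exception("Expected the withdrawal to be rejected.");
    }

    // Stand-in for the test runner that CI would invoke on each build.
    public static void Main()
    {
        Withdrawing_more_than_the_balance_is_rejected();
        Console.WriteLine("All specs passed.");
    }
}
```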

I guess the key here is really your CI process. If the tests can be dodged or accidentally not run somehow then this will need to be addressed.
Make the pit of success an easy one for your team to fall into and let them know regression tests are a side effect of good daily habits.

InfoQ

It still surprises me that people are not aware of the great resource that is InfoQ. If you are reading this post from my web page then there is probably a big InfoQ link right in front of you. If you are like the majority then this is probably a feed and you can’t see it! Anyway, it’s a great site that I have as my home page, and it gives a running report of the happenings of our industry. It doesn’t say anything about the latest Intel processors or have Dilbert cartoons; it’s just enterprise development and architecture, specifically .Net/Java/Ruby, SOA and Agile. Basically stuff I am interested in on a professional level.

Go check it out : http://www.infoq.com

Udi Dahan’s Advanced Distributed Systems Design with SOA & DDD

Over the last week I have been in Melbourne attending Udi Dahan’s Distributed Systems course and I thought, now that I am home, I will do a quick review of the course.

Firstly the course is not about how to build a 3 tier system on top of a Microsoft stack. Udi is a well known M$ MVP and has a well known open source .Net project; however, the course is an architectural course focusing extensively on architectural concerns that are largely technology agnostic. Most of the examples are in C# and use (an abstraction of) MSMQ, however I would guess near on 100% of the examples could work in Java (assuming tool support). The course is also not about how to build a Thomas Erl style web of web services.

To be honest leave all your preconceived ideas of architecture at home… it will just make life easier.

There were jokes of requiring a support group for attendees of the course, you will feel like Neo after he took the Red Pill; naked alone and very… well humbled.. or stupid, depends how diplomatic you want to be. Because people inherently don’t like change there was the typical amount of resistance; given there would have been several hundred years of experience in building large systems in the small room and I would consider these guys some of the best minds in the country in terms of .Net, the red pill was pretty hard to swallow. However like Neo with Morpheus, we put our faith in Udi and let him show us the way.

Seriously though, the course was incredible, I think the overwhelming majority learnt more in those 5 days about designing large scale scalable enterprise architectures than they had over the 5 years of their career.

Just a side note about Udi: he would have to be the most tolerant presenter and teacher I have ever had. These concepts were hard to grok all in one go; Udi allowed for lots and lots of questions and ensured that the audience knew enough to progress. Sure, a lot of the time I was thinking… “what the?” but typically the very next slide answered my question. In terms of raw ability, I have never met anyone like him. Yes he is an outstanding architect, truly outstanding; seriously, I have never met anyone that can walk the walk like Udi in those terms. However to be a truly great architect you have to be a bit more than the guy who is drawing the UML. He has a fantastic mind that has a great understanding of business, organisational management and psychology, and his technical understanding is better than all but the absolute best coders. This man is no Ivory Tower Architect. He would be just as at home in the largest corporation’s boardroom having discussions with CTOs as he would pair programming with the guy at the boiler plate writing the code. Possibly a genius.

Request Response is dead… well, it’s not but… well go on the course, it’s freaking awesome; just be ready to start looking for a support group… you’ll need it 😉

**********************

Course outline as it was for our course (from http://www.udidahan.com/training/ on 23 Jan 2010) :

The Rhys Campbell Course Rating: 10/10

Advanced Distributed Systems Design using SOA & DDD

Duration: 5 days

Introduction

Designing large-scale distributed systems is hard. New technologies make it easier to comply with today’s communications and security standards, but don’t auto-magically give you a robust and scalable system. Join Udi for a course packed with the wisdom of companies like SUN, Amazon, and EBay.

Tried-and-true theories and fallacies will be shown, keeping you from making those same costly mistakes today. Communications patterns like publish/subscribe and correlated one-way request/response will be used in conjunction with advanced object-oriented state management practices for long-running workflows. If you enjoy deep architectural discussion, if you are in charge of building a large-scale distributed system, if you want to know more about how the big guys run their systems, this is for you.

Audience

This workshop is targeted at team leads, application and solutions architects, as well as technologists who are involved in making decisions about the overall system design of software products and projects.

Course Topics

Module 1: Distributed Systems Theory
Decades of distributed systems development have taught us many lessons. In this module we’ll cover many historical mistakes as well as proven best practices for scalable and robust design. Topics include:

  • 8 fallacies of distributed systems
  • Transactions

Module 2: Coupling: Platform, Temporal, & Spatial
Loose coupling has become the watchword of complex systems development, yet few understand its multiple dimensions. In the module we’ll be covering the three different dimensions of coupling as well as patterns for dealing with them.

  • Platform Coupling – XML/SOAP
  • Temporal Coupling – Synchronous/Asynchronous
  • Spatial Coupling – Endpoints/Topics

Module 3: Asynchronous Messaging Patterns
Although scalability is achieved through the use of asynchronous message passing, more advanced message exchange patterns are required to handle today’s complex integration scenarios. This module will cover the most commonly used patterns:

  • One way
  • Correlated Request/Response
  • Publish/Subscribe

Module 4: Bus & Broker Architectural Styles
Enterprise Service Buses are all the rage these days. In this module we’ll be covering what’s the difference between the Bus architectural style, and the more well-known Broker, found commonly in many EAI projects. Topics will include:

  • Architectural advantages and disadvantages
  • Technological advantages and disadvantages

Module 5: SOA Building Blocks
One of the goals of SOA is to develop systems which are more closely aligned with Business. In this module we’ll be covering an analysis methodology from moving from the business domain to executable systems that comply with all the principles of loose-coupling.

  • Business Services
  • Business Components
  • Autonomous components & Queues

Module 6: Scalability and Flexibility
In order to enable agility, services must be able to scale up, out, and down quickly. In this module we’ll see how autonomous components can be configured including transactional and durable aspects of message handling.

  • Configuring autonomous components
  • Scaling up and out

Module 7: Long running processes
The distributed communications patterns wouldn’t be complete without a discussion on orchestration. In this module we’ll see how to manage the state of long-running distributed communication flows as well as:

  • Encapsulating process logic
  • Advantages & disadvantages of orchestration

Module 8: Service / Autonomous Component Solutions
As developers go to implement autonomous components, guidance is required as to which concepts need to implemented in which project, what dependencies are there between projects, and how to bridge the worlds of messaging, business logic, and reporting. Topics include:

  • Messages + Handlers
  • Databases

Module 9: Service Layer – Domain Model Interaction
Logic-rich services require the use of advanced techniques for logic componentization. The Domain Model Pattern enforces a high level of Separation of Concerns, yet it must eventually be connected with Service Layer code that supports many concurrent users. In this module, the topics covered will include:

  • Domain Model introduction
  • Testing Domain Models
  • Optimistic, Pessimistic, and Realistic Concurrency Models

Module 10: Creating High-Performance Domain Models
The strong separation between the Domain Model and the database which stores and retrieves its data may enable a high level of testability, yet often causes performance problems. In this module, we’ll see the various aspects impacting the performance of persistence:

  • Transactions and Isolation Levels
  • Lazy Loading, Eager Fetching
  • Databases Tips & Tricks

Module 11: Web Services and User Interfaces
The ease of interacting with users over the web drives the need for service to UI interactions. Also, many integrations require exposing synchronous web services to customers. In this module, we’ll see what is required in both cases:

  • ASP.NET 2.0 Asynchronous Tasks
  • Rich Internet Applications and Services
  • Web Services for integration

Module 12: Smart Client / Service Interaction
The publish/subscribe semantics with which services communicate require smart clients to perform a great deal of background work. Also, certain service contracts lead to more performant clients. In this module, we’ll cover the first part of these interactions:

  • Multi-threaded client challenges
  • Client-friendly Service Contracts
  • Service Agents and Client Repositories

Module 13: Notifications & Smart Clients
After Message Handlers in the Service Layer create or update the relevant Model objects in the client Repository, Supervising Controllers are in charge of getting Views to show the updated data. In this module, we’ll describe the parts and interactions of these flows:

  • Client-side Model Objects
  • Supervising Controllers
  • Views and their Interfaces

Module 14: Commands & Smart Clients
Capturing user intent and synchronization between views are at the core of smart clients. After describing solutions that use Events on the View Interfaces, the Command Pattern will be introduced to further decrease coupling between Supervising Controllers. In this module, we’ll describe the parts and interactions of these flows:

  • View Interfaces, and how Entity Cloning affects them
  • Supervising Controllers and clone reconciliation
  • Commands, and Event-Based programming

Summary & Review
In order to make sure that attendees are able to put into practice all that they’ve learned throughout the course, here we strengthen the seams between the various topics. Q&A is also a core part of this final section.

Relearning WCF

Of late I have been playing with WCF again. We have some projects here at work that require some integration and we are desperately trying to move away from the old ASMX based services. Unfortunately I have not touched WCF the whole time I have been here (12 months now, wow! That has gone fast!) and I have found myself at a point where I really need to look at WCF again and basically relearn it… oh well.
Anyway, here is a bunch of stuff that we have found to be useful at work, which you may not be able to do with ASMX or may not be aware you could do with WCF.

IOC and WCF

You can in fact use IoC with WCF. There are some good blog posts and accompanying videos to show what to do and if, like me, you just want one ready to go that uses the CSL then The Code Junkie has done it for you!

Dynamic KnownType Resolution

It always irked me that I had to state in the data contract that I knew of other types; it was like really bad tight coupling*. There are a bunch of ways to declare known types, with the bottom example being a seemingly little known alternative: a provider mechanism

in config

Data contract with attributes


[KnownType(typeof(PurchaseApprovalRequest))]
[DataContract]
public class ApprovalRequest
{...

Knowntype provider

The way I have just found out is by declaring a knowntype provider on the service contract:

[ServiceKnownType("GetKnownTypes", typeof(ApprovalRequestKnownTypesProvider))]
[ServiceContract]
public interface IApprovalService
{...

with the following class (change the implementation to suit yourself, this is from some of my demo code, it’s not recommended!)

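// needs: using System; using System.Collections.Generic; using System.Reflection;
internal static class ApprovalRequestKnownTypesProvider
{
    // WCF calls this method, named in the ServiceKnownType attribute, to discover the known types
    public static IEnumerable<Type> GetKnownTypes(ICustomAttributeProvider provider)
    {
        // collect and pass back every loaded type that is assignable to ApprovalRequest
        foreach (var module in Assembly.GetExecutingAssembly().GetLoadedModules())
        {
            foreach (var knownType in module.FindTypes(
                (t, f) => ((Type)f).IsAssignableFrom(t), typeof(ApprovalRequest)))
            {
                yield return knownType;
            }
        }
    }
}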

With these two little nuggets I have been able to produce a pretty handy little broker service that acts as a very basic content based router, keeps the client messages very clean and does not expose any implementation details (i.e. no passing of service or workflow names in the message header!)

*NB: to paraphrase Krzysztof: “Polymorphism is an OO term, not SOA term, so I don’t use it, and make my contracts explicit wherever possible.” Be wary: make sure you are using known types for the right reasons.

Functional .Net : Tuple

Tuples really don’t have a lot to do with functional programming; they are a common concept in many languages that, for some reason, are only making their way into .Net in version 4.0. One could argue that you could always have easily constructed your own Tuple class, but the unnecessary duplication of such a simple type has obviously become apparent to the BCL team. This is good. Simple classes like this should be present in the framework. 🙂

So what is a Tuple? It is basically a container of a finite list of objects; a Point could be described as a Tuple<int, int>, a list of 2 values where the int values are the X and the Y coordinates of the point. A Date could be described as a Tuple with the values being Year, Month & Day. It is not a list in the sense that you would typically enumerate through the items; instead you reference them by position (Item1, Item2 and so on). You may ask why this is different to an array, e.g. int[3]? Well, with a Tuple the list length is set at compile time and the type of each position is also set. A Tuple whose third type parameter is DateTime forces you to always have a DateTime as the 3rd item. This is all pretty underwhelming to be honest, but like anything cool it’s the simplicity that is its strength, and it is also why it pops up in functional styled programming a lot. Future demos are likely to include Tuples 🙂
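
A quick sketch of the Point and Date examples using the .Net 4.0 System.Tuple types (the TupleDemo and MakePoint names are just made up for the illustration):

```csharp
using System;

public static class TupleDemo
{
    // A "point" as a pair of ints: the length and the types are fixed at compile time.
    public static Tuple<int, int> MakePoint(int x, int y)
    {
        return Tuple.Create(x, y);
    }

    public static void Main()
    {
        Tuple<int, int> point = MakePoint(3, 4);
        Tuple<int, int, int> date = Tuple.Create(2010, 1, 23);   // year, month, day

        // Items are read by position, not enumerated.
        Console.WriteLine("X={0}, Y={1}", point.Item1, point.Item2);
        Console.WriteLine("{0}-{1}-{2}", date.Item1, date.Item2, date.Item3);

        // point.Item1 = 5;          // would not compile: the items are read-only
        // string s = point.Item1;   // would not compile: Item1 is an int
    }
}
```

Compared with an int[3], the compiler knows both how many items there are and what type sits in each slot.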

Functional .Net : Currying

Currying is another functional technique that is possible to achieve with C#. The technique basically allows the rewriting of a function that takes multiple arguments into one that takes one argument and returns a function, which may in turn take more arguments; the basic premise is being able to build up composite functions by splitting functions down and reducing the number of parameters dealt with. I will be honest and say that I have found the language you use (C#, F#, Haskell etc) is more influential on your predisposition to use this technique, as it is with many of the functional patterns, and IMO C# does not lend itself to it nearly as well as F#, for example. That being said it can still be done, so let’s look at a basic example. For starters, currying is not catered for explicitly out of the box in C# but can easily be done using extension methods, e.g.:

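// extension methods must live in a static class; the class name is arbitrary
public static class CurryExtensions
{
    public static Func<TArg1, Func<TArg2, TResult>> Curry<TArg1, TArg2, TResult>(this Func<TArg1, TArg2, TResult> func)
    {
        return a1 => a2 => func(a1, a2);
    }
    public static Func<TArg1, Action<TArg2>> Curry<TArg1, TArg2>(this Action<TArg1, TArg2> action)
    {
        return a1 => a2 => action(a1, a2);
    }
}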

These extension methods now allow you to take a 2 parameter delegate and split it into a one parameter delegate that returns an Action or Func that takes one argument. This can obviously be extended and helps facilitate the separation and composition of functions.

Now, from my understanding, currying is a specific form of partial application: currying splits its functions down to single argument delegates while partial application makes no such claim, i.e. a three argument function could be reduced to a single argument function returning a 2 argument delegate. As this is purely academic I don’t really care; the principle is the same.
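
For what it is worth, the difference can be sketched with a three argument function. The Partial method and the FunctionalExtensions class below are hypothetical helpers written for this illustration; neither is part of the BCL.

```csharp
using System;

public static class FunctionalExtensions
{
    // Currying: every step takes exactly one argument.
    public static Func<T1, Func<T2, Func<T3, TResult>>> Curry<T1, T2, T3, TResult>(
        this Func<T1, T2, T3, TResult> func)
    {
        return a => b => c => func(a, b, c);
    }

    // Partial application: fix the first argument and keep a
    // two argument delegate for the rest.
    public static Func<T2, T3, TResult> Partial<T1, T2, T3, TResult>(
        this Func<T1, T2, T3, TResult> func, T1 first)
    {
        return (b, c) => func(first, b, c);
    }
}

public static class PartialVsCurryDemo
{
    public static void Main()
    {
        Func<int, int, int, int> volume = (l, w, h) => l * w * h;

        // Curried: three calls of one argument each.
        Console.WriteLine(volume.Curry()(2)(3)(4));   // 24

        // Partially applied: one argument fixed, two remain.
        Func<int, int, int> withLengthTwo = volume.Partial(2);
        Console.WriteLine(withLengthTwo(3, 4));       // 24
    }
}
```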

A trivial example of currying (I’m lazy I stole it from Matt P and it is using the extension method above) :

Func<int, int, int> multiply = (x, y) => x * y;
var curriedMultiply = multiply.Curry();
var curriedMultiplyThree = curriedMultiply(3);
var curriedMultiplyResult = curriedMultiplyThree(15);
Console.WriteLine("Result of 3 * 15 = {0}", curriedMultiplyResult);

Unfortunately the verbosity of C# when approaching this style of coding very quickly begins to put me off. The equivalent in F# is much more readable, but hey, it’s what the language is strong at so it really should be a nicer experience. Either way it’s good to know the facilities are there if one day I do ever need to use them.

Basically the take away from this post is the extension methods at the top; without these there will be no curry love.

Links forwarding (yeah this was a lazy post):

http://codebetter.com/blogs/matthew.podwysocki/archive/2009/04/26/functional-c-forward-functional-composition.aspx

http://mikehadlow.blogspot.com/2008/03/currying-in-c-with-oliver-sturm.html

http://stackoverflow.com/questions/411572/proper-currying-in-c

Functional .Net : Closures

One of the more commonly used functional techniques that can be used in C# is the closure, a technique that, if you are currently using lambdas, you may be using inadvertently. My understanding of closures may be different to others’ as there seem to be so many subtly different definitions, especially when comparing languages. Anyway, in my mind the notes on JavaScript closures best align with my understanding (http://www.jibbering.com/faq/faq_notes/closures.html)

A closure is a delegate that has a reference to a variable that was not passed to it and that lives in a scope outside the delegate’s immediate scope.

Like any delegate, its definition and execution are not the same thing. You can define a closure and never use it, or just call it later.

A simple closure example I can think of is:

static void Main(string[] args)
{
    var timesToRepeat = 100;
    //Declare the Action
    Action<string> print = text => //text (string) is the only parameter
    {
        //using a variable declared outside of the Action
        for (int i = 0; i < timesToRepeat; i++)
        {
            Console.WriteLine(text);
        }
    };
    timesToRepeat = 3;//Let's modify the variable
    print("Hello!");//Call the action/evaluate the expression
    //Prints:
    //Hello!
    //Hello!
    //Hello!
}

Note that the timesToRepeat variable is declared outside of the declaration of the lambda statement. Think about this; the Action ‘print’ can be passed outside of this scope, it could be passed to another class which does not have visibility of the locally declared variable. The ‘print’ expression is bound to that variable declared outside of its scope. This obviously has ramifications in terms of holding a reference to that variable. Please also note that the expression ‘print’, like all delegates, is evaluated when it is called, not when it is declared; stepping over the above code will not print when declaring the ‘print’ Action but at the last line when it is called. One last thing to note is that the variable timesToRepeat is modified after defining the print Action and this is carried through when we call ‘print’ in the last line; “Hello!” is printed 3 times, not 100 times as the variable’s value would imply when the closure was declared.
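
The "passed outside of this scope" point can be sketched with a closure that outlives the method that created it. ClosureDemo and MakeCounter are names invented for this illustration.

```csharp
using System;

public static class ClosureDemo
{
    // 'count' is a local, yet it outlives this method because the
    // returned lambda holds a reference to it.
    public static Func<int> MakeCounter()
    {
        var count = 0;
        return () => ++count;
    }

    public static void Main()
    {
        Func<int> first = MakeCounter();
        Func<int> second = MakeCounter();

        Console.WriteLine(first());    // 1
        Console.WriteLine(first());    // 2
        Console.WriteLine(second());   // 1 (it has its own captured variable)
    }
}
```

The caller never sees the count variable, but each delegate keeps its own copy alive for as long as the delegate itself is referenced.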

You may have been using closures without knowing it. JavaScript and associated libraries like jQuery use this technique a lot, as do many open source libraries such as TopShelf, MassTransit etc.