Building Biggy: Resolving Dissonance Between Domain Objects and Backing Store Entities

Posted on March 31 2014 09:43 PM by jatten in C#, Biggy, Database   ||   Comments (0)

I've recently been hard at work contributing to an exciting new open source project, Rob Conery's Biggy. Biggy is just getting off the ground, so things are evolving rapidly, and getting things working, for the moment, trumps elegance. At this point, we are refining things and evolving the structure and capabilities of the project.

Biggy offers a fresh take on maintaining an in-memory, synchronized representation of your data, along with a hybrid document/relational persistence model.

The engine Biggy currently uses to talk to relational databases is a tuned-up, modified version of Massive. Originally a highly dynamic micro ORM, for Biggy we have added strong support for static types, and customized things so that Biggy can materialize relational data into domain POCO objects in a high-performance way.

Image by Dee Ashley | Some Rights Reserved

Biggy supports multiple database platforms. While currently the focus is on SQL Server and Postgresql, the notion is that ultimately, Biggy should be able to support any platform which implements the correct IBiggyStore interface (see K. Scott Allen's recent post proposing our upcoming new architecture in this regard).

One of the major problems we needed to solve was the mapping of database objects to domain objects. Yes, this is one of the ever-present issues with ORM design (note that Biggy is NOT an ORM, although it shares some handy features with ORMs).

Mapping Domain Object Properties Using Type Introspection

The original Massive, and Biggy as well, use a wee bit of type introspection to identify object property names. Massive originally assumed that each property would correspond to a like-named database field.

Of course, this is not a safe assumption, especially in cases where one is handed an existing database with names which don't follow convention. Also, certain Db platforms (Postgresql among them) are case-sensitive with respect to object names, and expect, by convention, that multi-part names will be separated using underscores.

Consider a domain object Artist:

An Example Artist Class:
public class Artist 
{
    public int ArtistId { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
}

 

SQL Server is fairly forgiving, and makes no distinction between a column named LastName and a column named lastName, or a column named lastname. SQL Server doesn't care about case. SQL Server will require delimiters for a column named Last Name, so this would have to be wrapped up like so: [Last Name].

Postgres, however, is a different story. Postgres comes from a Unix heritage, where all the things are case-sensitive. Postgres expects database object names to be all lower-case unless you tell it otherwise by using double-quotes as delimiters. On top of that, when you pass in raw SQL which includes mixed-case object names, Postgres "helpfully" down-cases everything for you, unless specific object names are wrapped in delimiters. This is all fine and dandy, until you have actual database columns which include upper-case characters.

In Postgres-land, the object naming convention is to use lower-case names, with words separated by underscores. We expect to see columns such as last_name and first_name, and we don't often see things like LastName or lastName. None of which is to say it doesn't happen, of course. In fact, the default Postgres version of our chosen test database, Chinook, arrives out-of-the-box with proper-cased table and column names. For example, the Artist table contains the columns ArtistId and Name.

NOTE: For those interested, I have created a modified version of the Chinook database with object names which conform to the Postgres convention, and which implements serial primary keys. The version available on Codeplex has neither (though it is a terrific sample database nonetheless!).

If you have an Artist table in Postgres with the columns ArtistId and Name, the following SQL will fail:

This SQL Statement will Fail Against Postgres:
SELECT ArtistId, Name FROM Artist

 

In order for the above query to work, it would need to be re-written as follows:

This SQL will Work Against the Artist Table, but YUCK!
SELECT "ArtistId", "Name" FROM "Artist"

 

Lastly, if one is using Postgres platform conventions, dutifully using all-lower-case table and column names, properly separated with underscores, then under the original implementation of Massive and Biggy, we would have to name our domain objects and object properties accordingly:

Example Artist Class With Postgres-Compatible Object and Property Names:
public class artist 
{
    public int artist_id { get; set; }
    public string last_name { get; set; }
    public string first_name { get; set; }
}

 

For the exceptionally simple artist class, this may not seem such a big deal. However, with a number of more complex classes, a bunch of property names which include underscores as word separators becomes simply painful. Not to mention, idiomatic C# code utilizes either proper-casing or camel-casing. It is nice to respect the conventions of whatever platform, language or framework one is using, even when they mix and clash.

Solving Two Distinct Problems

As we see in the examples above, we are actually faced with two distinct, yet overlapping problems:

  • Properly mapping domain objects and properties to the appropriate, corresponding database entity or field.
  • Properly handling cases where database object names need to be delimited in a platform-specific manner when serialized into dynamic SQL.

Since Biggy dynamically assembles SQL based on object properties, it was clear that we had a few choices to make. There were two potential approaches to this problem:

The "Opinionated" Approach

Decide which conventions Biggy will support, and encourage users to use these as standard defaults. Provide a mechanism to override the defaults, but throw unless client code either follows our default convention or explicitly applies the override mechanism. The "Rails" approach, if you will.

The "Kitchen Sink" approach

Assume that client code will sensibly use a few accepted conventions, and try to map those first. If this fails, then attempt to resolve the mapping through progressive permutation. If that fails, then look for an override mechanism. Fail only if we are unable to find a match using magic. More like the ASP.NET MVC approach, if you will.

I was surprised to find Rob was in favor of the second, "Kitchen Sink" approach. I fully expected him to favor a more opinionated policy. In any case, it was decided we would do everything we reasonably could to resolve column names to object properties.

It was decided Biggy would attempt to resolve name mappings along the following lines (a simplified sketch in code follows the list):

  1. See if there is a direct match between the column name and the property name. If so, we're done - map it.
  2. See if, by ignoring case and/or stripping spaces and underscores out of the Db Column name, a match could be determined. If so, use it.
  3. If neither of the above successfully produces a match, look for a custom attribute on the property in question which specifies the Db Column to map to.
  4. Throw an informative exception indicating the source of the problem.
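
Distilled to code, that resolution ladder looks something like the following. This is a simplified sketch for illustration only; the DbColumnAttribute type and Biggy's actual matching method both appear later in this post:

// Simplified sketch of the resolution order described above (illustration only):
static string ResolveColumnName(PropertyInfo property, IEnumerable<string> dbColumns)
{
    // 1. Direct match between column name and property name:
    var match = dbColumns.FirstOrDefault(c => c == property.Name);
    if (match != null) return match;
  
    // 2. Match ignoring case, with spaces and underscores stripped out:
    Func<string, string> flatten = 
        s => s.ToLower().Replace("_", "").Replace(" ", "");
    match = dbColumns.FirstOrDefault(c => flatten(c) == flatten(property.Name));
    if (match != null) return match;
  
    // 3. Look for a custom attribute specifying the column to map to:
    var attribute = property.GetCustomAttributes(false)
        .OfType<DbColumnAttribute>().FirstOrDefault();
    if (attribute != null) return attribute.Name;
  
    // 4. Throw an informative exception indicating the source of the problem:
    throw new InvalidOperationException(
        string.Format("No database column could be matched to property {0}", property.Name));
}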

It would have been easy to decide that seeking out a custom attribute should be the first fallback; under what I consider the "opinionated" approach, that would have been the next course of action after a failed initial match. However, in one of my favorite observations this year, Rob pointed out that using custom attributes in such a manner is distasteful because "It throws the DB stuff right into your model and it's the ultimate push of the database right through the abstraction."

"I really dislike attributes because it throws the DB stuff right into your model and it's the ultimate push of the database right through the abstraction."

-Rob Conery

I couldn't agree more. And I wish *I* had said that.

Get the Schema from the Source - Information Schema

All that said, at some level, the code needs to know some things about the database in order to use it. Just not at the domain model level.

Since Biggy is purpose-built to maintain an in-memory representation of application data and sync to the store as needed (on writes and updates), I figured, why not go straight to the source for a canonical schema against which to compare property names and column names. To my way of thinking, standard relational databases already provide canonical access to schema information - the ANSI-standard collection of views known as INFORMATION_SCHEMA.

At some level, Biggy needs to know about the database schema in order to do its job. We can query Information_Schema once, and cache the result at the lowest level possible. That being the case, we introduced some new class structures to the Biggy code base.

A Column Mapping Class

At the lowest level, I wanted to be able to represent everything Biggy would need to know about a database column and how it maps to its associated object property. Initially, I created the DbColumnMapping class, as follows:

The DbColumnMapping Class:
public class DbColumnMapping
{
    string _delimeterFormatString;
    public DbColumnMapping(string delimiterFormatString)
    {
        _delimeterFormatString = delimiterFormatString;
        this.IsAutoIncementing = false;
        this.IsPrimaryKey = false;
    }
  
    public bool IsAutoIncementing { get; set; }
    public bool IsPrimaryKey { get; set; }
    public Type DataType { get; set; }
    public string TableName { get; set; }
    public string ColumnName { get; set; }
    public string PropertyName { get; set; }
    public string DelimitedColumnName
    {
        get { return string.Format(_delimeterFormatString, this.ColumnName); }
    }
}

 

Notice that the database platform delimiter is passed in as a format string. For example, in the case of SQL Server, the format string would look like this:

SQL Server Delimiter Format String:
"[{0}]"

 

As you can see from the DbColumnMapping code, the DelimitedColumnName property uses the string.Format() method to wrap the actual column name in the delimiter characters specific to the platform before returning it to the client code.

The two properties related to Primary key status are set to sensible default values. Most columns are NOT primary keys, and most columns are NOT auto-incrementing.
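
For example, given the SQL Server format string, a mapping for a column with an embedded space would delimit itself like so (a quick illustrative snippet, not from the Biggy source - the table and column names are made up):

var mapping = new DbColumnMapping("[{0}]")
{
    TableName = "Client",
    ColumnName = "Last Name",
    PropertyName = "LastName"
};
// string.Format("[{0}]", "Last Name"):
Console.WriteLine(mapping.DelimitedColumnName);  // [Last Name]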

Next, I wanted a way to easily retrieve the column name for a specific property, or the property name for a specific column. To this end, I created the DbColumnMappingLookup class. I composed this around dual Dictionaries which contain references to the same set of DbColumnMapping objects, but which refer to them using different sets of keys. One is keyed with the property name associated with each DbColumnMapping object, and the other uses the column name as the key. In both cases, the actual DbColumnMapping instance contains all the information we might need about the association, once retrieved.

The DbColumnMappingLookup Class:
public class DbColumnMappingLookup
{
    Dictionary<string, DbColumnMapping> ByProperty;
    Dictionary<string, DbColumnMapping> ByColumn;
    string _delimiterFormatString;
  
    public DbColumnMappingLookup(string NameDelimiterFormatString)
    {
        _delimiterFormatString = NameDelimiterFormatString;
        this.ByProperty = new Dictionary<string, DbColumnMapping>
            (StringComparer.InvariantCultureIgnoreCase);
        this.ByColumn = new Dictionary<string, DbColumnMapping>
            (StringComparer.InvariantCultureIgnoreCase);
    }
  
    public int Count()
    {
        return this.ByProperty.Count();
    }
  
    public DbColumnMapping Add(string columnName, string propertyName)
    {
        var mapping = new DbColumnMapping(_delimiterFormatString);
        mapping.ColumnName = columnName;
        mapping.PropertyName = propertyName;
        // add the same instance to both dictionaries:
        this.ByColumn.Add(mapping.ColumnName, mapping);
        this.ByProperty.Add(mapping.PropertyName, mapping);
        return mapping;
    }
  
    public DbColumnMapping Add(DbColumnMapping mapping)
    {
        this.ByColumn.Add(mapping.ColumnName, mapping);
        this.ByProperty.Add(mapping.PropertyName, mapping);
        return mapping;
    }
  
    public DbColumnMapping FindByColumn(string columnName)
    {
        DbColumnMapping mapping;
        this.ByColumn.TryGetValue(columnName, out mapping);
        return mapping;
    }
  
    public DbColumnMapping FindByProperty(string propertyName)
    {
        DbColumnMapping mapping;
        this.ByProperty.TryGetValue(propertyName, out mapping);
        return mapping;
    }
  
    public bool ContainsPropertyName(string propertyName)
    {
        return this.ByProperty.ContainsKey(propertyName);
    }
  
    public bool ContainsColumnName(string columnName)
    {
        return this.ByColumn.ContainsKey(columnName);
    }
}

 

The Add() method adds each new DbColumnMapping instance to both dictionaries, using the appropriate property value as the key. There is also an overloaded Add() method by which a new DbColumnMapping can be added implicitly by passing a column name and a property name as arguments.

Given the above, if I have a particular column name and need to find the matching property, I can simply pass the column name as the argument to the FindByColumn() method, and the DbColumnMapping object will be returned, from which I can grab the associated property name (and any other column details I need).

Likewise, I can test whether a specific property or column is represented in the lookup by passing the appropriate string to the ContainsPropertyName() or ContainsColumnName() methods.
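
For example (using the Postgres double-quote delimiter this time), both lookups hand back the same mapping instance - a quick sketch, not from the Biggy source:

var lookup = new DbColumnMappingLookup("\"{0}\"");
lookup.Add("last_name", "LastName");
  
var mapping = lookup.FindByColumn("last_name");
Console.WriteLine(mapping.PropertyName);          // LastName
Console.WriteLine(mapping.DelimitedColumnName);   // "last_name"
  
Console.WriteLine(lookup.ContainsPropertyName("LastName"));   // True
Console.WriteLine(lookup.ContainsColumnName("first_name"));   // False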

But wait, John - what if the same column or property name exists for more than one table or object?

Well, that's where the DBTableMapping class comes in.

Mapping Columns to Tables, and Tables to Domain Types

The final piece of the puzzle, the DBTableMapping class, gathers all of the columns together for each table, and performs a similar name-mapping function for tables and domain objects.

The DBTableMapping Class:
public class DBTableMapping
{
    string _delimiterFormatString;
    public DBTableMapping(string delimiterFormatString)
    {
        _delimiterFormatString = delimiterFormatString;
        this.ColumnMappings = new DbColumnMappingLookup(_delimiterFormatString);
        this.PrimaryKeyMapping = new List<DbColumnMapping>();
    }
    public string DBTableName { get; set; }
    public string MappedTypeName { get; set; }
    public string DelimitedTableName
    {
        get { return string.Format(_delimiterFormatString, this.DBTableName); }
    }
    public List<DbColumnMapping> PrimaryKeyMapping { get; set; }
    public DbColumnMappingLookup ColumnMappings { get; set; }
}

 

Once again, we initialize DBTableMapping with the database platform delimiter format string. Also note that DBTableMapping contains a List<DbColumnMapping> specifically to hold references to the primary key(s) for the table in question. While not common, composite PKs do occur in the wild.
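
To make the structure concrete, here is a trivial, hand-assembled example of a populated mapping for our Artist table (for illustration only - in practice, Biggy builds these for us, as we'll see shortly):

var tableMapping = new DBTableMapping("[{0}]")
{
    DBTableName = "Artist",
    MappedTypeName = "Artist"
};
// The lookup's Add() returns the new mapping, so we can flesh it out:
var pk = tableMapping.ColumnMappings.Add("ArtistId", "ArtistId");
pk.IsPrimaryKey = true;
pk.IsAutoIncementing = true;
tableMapping.PrimaryKeyMapping.Add(pk);
tableMapping.ColumnMappings.Add("Name", "Name");
  
Console.WriteLine(tableMapping.DelimitedTableName);       // [Artist]
Console.WriteLine(tableMapping.PrimaryKeyMapping.Count);  // 1 (a composite PK would add more)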

Cache Schema Information and Inject into the Store

Coming down the pike very quickly is a revamped architecture for Biggy. We will see the in-memory list functionality cleanly separated from the backing store implementation. At the same time, we might add a "DBCache" object which would be (optionally) injected into the store. The DBCache object would allow a single round trip to the database to retrieve schema info into the structures described above, which could then be passed by reference to store instances at initialization.

The store itself would be concerned with managing the serialization and deserialization of data between the database, the domain objects consumed by Biggy, and your application code. Sitting on top of the store would be an instance of IBiggy itself. The concrete implementation of an IBiggy would be concerned only with managing the in-memory data, and calling into the store to retrieve, write, or synchronize data.

All that said, the DBCache would be the ideal place to cache all of our database schema info. With two quick pokes at Information_Schema at initialization, we can grab everything we need. Let's take a look at how this would work in a SQL Server implementation.

First, the cache (I've omitted non-relevant code to focus on retrieving and loading our column mapping data, and I've simplified things a bit). The cache consists of a base class (in our case, the DBCache) and a subclass specific to the database platform:

The DBCache Base Class (Simplified):
public abstract class DBCache
{
    // Each platform-specific subclass supplies its own delimiter,
    // connection, and schema-loading implementations:
    public abstract string DbDelimiterFormatString { get; }
    public abstract DbConnection OpenConnection();
    protected abstract void LoadDbColumnsList();
    protected abstract void LoadDbTableNames();
  
    public string ConnectionString { get; set; }
    public List<DbColumnMapping> DbColumnsList { get; set; }
    public List<string> DbTableNames { get; set; }
  
  
    public DBCache(string connectionStringName)
    {
        ConnectionString = ConfigurationManager
            .ConnectionStrings[connectionStringName].ConnectionString;
        this.LoadSchemaInfo();
    }
  
  
    public void LoadSchemaInfo()
    {
        this.LoadDbTableNames();
        this.LoadDbColumnsList();
    }
    // ... A bunch of other code here unrelated to mapping
}

 

In the above, you can see we have a List<DbColumnMapping> and a List<string> representing database columns and table names, respectively. We use the call to LoadSchemaInfo() to pull the table names and column info in from the database using Information_Schema. However, LoadSchemaInfo() in turn calls two abstract methods whose implementations live in our subclass - in this case, SQLServerCache:

The SQLServerCache Subclass of DBCache:
public class SQLServerCache : DBCache 
{
    public SQLServerCache(string connectionStringName) 
        : base(connectionStringName) { }
  
    public override string DbDelimiterFormatString 
    {
        get { return "[{0}]"; }
    }
  
    public override DbConnection OpenConnection() 
    {
        var conn = new SqlConnection(this.ConnectionString);
        conn.Open();
        return conn;
    }
  
    protected override void LoadDbColumnsList() 
    {
        this.DbColumnsList = new List<DbColumnMapping>();
        var sql = @"
            SELECT c.TABLE_NAME, c.COLUMN_NAME,
              CASE tc.CONSTRAINT_TYPE WHEN 'PRIMARY KEY' THEN CAST(1 AS Bit) ELSE CAST(0 AS Bit) END AS IsPrimaryKey,
              CASE (COLUMNPROPERTY(object_id(tc.TABLE_NAME), kcu.COLUMN_NAME, 'IsIdentity')) WHEN 1 THEN CAST(1 AS Bit) ELSE CAST(0 AS Bit) END AS IsAuto
            FROM INFORMATION_SCHEMA.COLUMNS c
            LEFT OUTER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
            ON c.TABLE_SCHEMA = kcu.CONSTRAINT_SCHEMA AND c.TABLE_NAME = kcu.TABLE_NAME AND c.COLUMN_NAME = kcu.COLUMN_NAME
            LEFT OUTER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS tc
            ON kcu.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA AND kcu.CONSTRAINT_NAME = tc.CONSTRAINT_NAME";
  
        using (var conn = this.OpenConnection()) 
        {
            using (var cmd = conn.CreateCommand()) 
            {
                cmd.CommandText = sql;
                using (var dr = cmd.ExecuteReader()) 
                {
                    while (dr.Read()) 
                    {
                        var clm = dr["COLUMN_NAME"] as string;
                        var newColumnMapping = new DbColumnMapping(this.DbDelimiterFormatString) {
                            TableName = dr["TABLE_NAME"] as string,
                            ColumnName = clm,
                            PropertyName = clm,
                            IsPrimaryKey = (bool)dr["IsPrimaryKey"],
                            IsAutoIncementing = (bool)dr["IsAuto"]
                        };
                        this.DbColumnsList.Add(newColumnMapping);
                    }
                }
            }
        }
    }
  
    protected override void LoadDbTableNames() 
    {
        this.DbTableNames = new List<string>();
        var sql = "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'dbo'";
        using (var conn = this.OpenConnection()) 
        {
            using (var cmd = conn.CreateCommand()) 
            {
                cmd.CommandText = sql;
                using (var dr = cmd.ExecuteReader()) 
                {
                    while (dr.Read()) 
                    {
                        this.DbTableNames.Add(dr.GetString(0));
                    }
                }
            }
        }
    }
}

 

In the subclass SQLServerCache, we find the implementations of LoadDbTableNames() and LoadDbColumnsList(), which fetch the table name and column info we need, respectively. That big, ugly blob of raw SQL retrieves all the columns in our database, and includes everything required to properly populate each DbColumnMapping object - except, of course, the mapped property name. That's next. For now, we just fill it in with the column name, and overwrite it later as appropriate.

Output from Big Ugly SQL from Information_Schema:

[Image: output of the column-info query against INFORMATION_SCHEMA - TABLE_NAME, COLUMN_NAME, IsPrimaryKey, and IsAuto for each column]

Mapping Table and Column Data at the Table Level

Now, after all that ceremony, we get to the heart of the matter. We will add a method to the base class DBCache which accepts a type argument <T>, and attempts to map that type to a database table, and then maps the table columns to the type properties.

Ready for a monolithic wall of code? Good! There are some potential refactorings in here, but for the most part, we have a lot of layered type introspection going on, and for now it made as much (more) sense to leave it all together. Here it is - I give you the getTableMappingForT() method:

The GetTableMappingForT Method:
public virtual DBTableMapping getTableMappingFor<T>() where T : new()
{
    // This is what we want to return to the caller:
    var tableMapping = new DBTableMapping(this.DbDelimiterFormatString);
  
    // Reflection is expensive - do these calls once, up front:
    var item = new T();
    var itemType = item.GetType();
    var properties = itemType.GetProperties();
  
    string replaceString = "[^a-zA-Z0-9]";
    var rgx = new Regex(replaceString);
  
    // Get a down-cased version of the type name
    string flattenedItemTypeName = 
        rgx.Replace(itemType.Name.ToLower(), "");
  
    // Get a pluralized version, in case the table is pluralized:
    string plural = Inflector.Inflector
        .Pluralize(flattenedItemTypeName);
  
    // Does the fully-lower-case type name match any fully-lower-case table name?
    var dbTableName = this.DbTableNames
        .FirstOrDefault(t => 
            rgx.Replace(t.ToLower(), "") == flattenedItemTypeName);
    if (dbTableName == null)
    {
        // If not, does the plural match any table name?
        dbTableName = this.DbTableNames
            .FirstOrDefault(t => rgx.Replace(t.ToLower(), "") == plural);
    }
    if (dbTableName == null)
    {
        // Still nothing - is the type decorated with a DbTable attribute?
        var tableNameAttribute = itemType.GetCustomAttributes(false)
            .FirstOrDefault(a => a.GetType() == 
                typeof(DbTableAttribute)) as DbTableAttribute;
        if (tableNameAttribute != null)
        {
            dbTableName = tableNameAttribute.Name;
        }
    }
  
    tableMapping.DBTableName = dbTableName;
    tableMapping.MappedTypeName = itemType.Name;
  
    // Find the column info for this table in the local DBColumnsList:
    var dbColumnInfo = from c in this.DbColumnsList 
                       where c.TableName == dbTableName select c;
    foreach (var property in properties)
    {
        // Downcase the property name
        string flattenedPropertyName = rgx.Replace(property.Name.ToLower(), "");
        // Does the down-cased property name match any down-cased column name?
        DbColumnMapping columnMapping = dbColumnInfo
            .FirstOrDefault(c => 
                rgx.Replace(c.ColumnName.ToLower(), "") == flattenedPropertyName);
        if (columnMapping != null)
        {
            // use it as-is
            columnMapping.PropertyName = property.Name;
            columnMapping.DataType = property.PropertyType;
        }
        else
        {
            // Look for a custom column name attribute:
            var attribute = property.GetCustomAttributes(false)
                .FirstOrDefault(a => a.GetType() == typeof(DbColumnAttribute));
            if (attribute != null)
            {
                // Use the column name found in the attribute:
                var mappedColumnAttribute = attribute as DbColumnAttribute;
                string matchColumnName = mappedColumnAttribute.Name;
                columnMapping = dbColumnInfo
                    .FirstOrDefault(c => c.ColumnName == matchColumnName);
                // Guard against an attribute naming a column that doesn't exist:
                if (columnMapping != null)
                {
                    columnMapping.PropertyName = property.Name;
                    columnMapping.DataType = property.PropertyType;
                }
            }
        }
        if (columnMapping != null)
        {
            // Add the column mapping instance to the 
            // DBTableMapping.ColumnMappings list:
            tableMapping.ColumnMappings.Add(columnMapping);
            if (columnMapping.IsPrimaryKey)
            {
                // Add the DBColumnMapping Instance to 
                // the list of primary key columns for the current table:
                tableMapping.PrimaryKeyMapping.Add(columnMapping);
            }
        }
    }
    return tableMapping;
}

 

The getTableMappingForT method above returns a complete mapping of the table represented by the type <T> to the caller (in this case, an instance of SQLServerStore). The store can then use the mapping to rapidly retrieve the proper database object mappings for various domain objects as needed to build SQL on-the-fly, serialize/de-serialize objects, and ensure that SQL pushed into the database is properly delimited by default.
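
For instance, a store might assemble a properly-delimited SELECT along these lines. This is a hypothetical helper, not actual Biggy code, and it assumes the caller supplies the column mappings, since the lookup class doesn't expose enumeration:

static string BuildSelect(DBTableMapping tableMapping, IEnumerable<DbColumnMapping> columns)
{
    // [ArtistId], [Name] ... each name wrapped per the platform format string:
    var columnList = string.Join(", ", columns.Select(c => c.DelimitedColumnName));
    // e.g., SELECT [ArtistId], [Name] FROM [Artist]
    return string.Format("SELECT {0} FROM {1}", columnList, tableMapping.DelimitedTableName);
}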

Custom Attributes as a Last Resort

Notice in the above code there is a point where, while attempting to match a table name, if we have not found a match by any other means, we look for a custom attribute of type DbTableAttribute. Likewise, when trying to match column names, we eventually end up looking for a DbColumnAttribute.

In other words, when all else fails, we look to see if the user has decorated a class or property with a custom attribute which provides a name mapping. In reality, between the down-cased name comparison, the pluralized comparison, and the regex comparison with odd characters stripped out, we have likely covered 95% of the name mismatch cases.

But sometimes you may need to map a property to a column name which, for whatever reason, is not going to match. In this case, you can break out the EF-style custom attributes we added for just this purpose.

Custom Attributes for when Nothing Else Works:
public class DbColumnAttribute : Attribute
{
    public string Name { get; protected set; }
    public DbColumnAttribute(string name)
    {
        this.Name = name;
    }
}
  
public class DbTableAttribute : Attribute
{
    public string Name { get; protected set; }
    public DbTableAttribute(string name)
    {
        this.Name = name;
    }
}
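
Usage looks much like the EF attributes you may already know. Here is a hypothetical mapping of our Artist class to a legacy table whose names can't be resolved by convention (the table and column names below are made up for the example):

[DbTable("tbl_recording_artists")]
public class Artist 
{
    [DbColumn("artist_no")]
    public int ArtistId { get; set; }
  
    [DbColumn("artist_nm")]
    public string Name { get; set; }
}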

 

Putting it All Together

Well, gee whiz, John, that was a long rambling tour. Why do I care?

Well, now Biggy can provide automatic table and column mapping out of the box, 95% of the time. In those edge cases where Biggy can't automagically figure out how to map database objects to domain objects, you can always add some custom attributes to make it work.

For example, something which happens frequently under the hood is that Biggy needs to de-serialize data incoming from the backing store into a proper domain object. If we take a look at some code from the BiggyRelationalStore class, we can see that after initialization, the store reaches into the injected cache to retrieve a DBTableMapping instance.

Code Excerpted from the Base Class BiggyRelationalStore:
// Constructor for Base Class:
public BiggyRelationalStore(DBCache dbCache) 
{
    this.Cache = dbCache;
    this.tableMapping = this.getTableMappingForT();
    // Is there an auto PK? If so, set the member variable:
    if(this.tableMapping.PrimaryKeyMapping.Count == 1) 
    {
        if(this.tableMapping.PrimaryKeyMapping[0].IsAutoIncementing) 
        {
            this.PrimaryKeyMapping = 
                this.tableMapping.PrimaryKeyMapping[0];
        }
    }
}
  
  
public virtual DBTableMapping getTableMappingForT() 
{
    return this.Cache.getTableMappingFor<T>();
}
  
// A bunch of other code...
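// Note: PropertyColumnMappings (used in MapReaderToObject below) is the
// DbColumnMappingLookup for the mapped table, initialized elsewhere in this class.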
  
internal T MapReaderToObject<T>(IDataReader reader) where T : new() 
{
    var item = new T();
    var props = item.GetType().GetProperties();
    foreach (var property in props) 
    {
        if (this.PropertyColumnMappings.ContainsPropertyName(property.Name)) 
        {
            string mappedColumn = 
                this.PropertyColumnMappings
                    .FindByProperty(property.Name).ColumnName;
            int ordinal = reader.GetOrdinal(mappedColumn);
            var val = reader.GetValue(ordinal);
            if (val.GetType() != typeof(DBNull)) 
            {
                property.SetValue(item, reader.GetValue(ordinal));
            }
        }
    }
    return item;
}

 

After initialization, the DBTableMapping instance is available locally, so that methods such as MapReaderToObject can map object properties to database columns, retrieve the data from a DataReader, and hydrate a domain object instance.

Explore the Code

Much of the code you see above is in the final stages of development, but has not yet been pushed to the master branch of the Github repo. I expect it will be soon. In the meantime, I strongly encourage you to pull down the current code and see what we've got happening. The project is young, and it will be evolving rapidly in the weeks to come. There is a lot to see up there, and I hope you find something useful - and also shoot us some feedback. Especially bug reports!



Biggy: A Fresh Look at High-Performance, Synchronized In-Memory Persistence for .NET

Posted on March 22 2014 04:58 PM by jatten in C#, Database, Biggy   ||   Comments (2)


About a month ago, Rob Conery popped off with the following tweet:

[Image: Rob Conery's tweet announcing Biggy]

And hence, Biggy was born. Rob's "most ridiculous thing he has ever created" has quickly morphed into a high-performance document/relational query tool that is fully LINQ-compliant.

Biggy is an open source project hosted on Github, and is in the early stages of active development. Things have evolved rapidly, and I am proud to say I have contributed some real code and fixes on this, from almost the beginning. This is an exciting project with some interesting, convention-challenging aspects and some cool and really, really knowledgeable leaders and contributors, and I am just damn stoked to be a part of it.

Look, ma! I'm an open source contributor!

What the Hell is it?


Well, we're not precisely sure . . . ok, we kind of are now. Biggy began as Rob's experiment with a flat-file JSON store, materialized into an in-memory, queryable list structure which is synchronized with the flat-file store for writes (inserts and updates).

Soon enough, he began experimenting with integrating document-style persistence of data as JSON with standard relational database structures. In Biggy Basics, Part I Rob discusses his thoughts on appropriate persistence models for different types of data:

Our store is a simple process - the "input" data (Products and Customers) generate "output", or "record" data. The input data changes fairly often - Customers logging in and changing things, store owners changing prices, etc.

The output data doesn't change much - in analytical terms this is called "slowly changing over time" - you might go in and tweak an order here and there, but mostly it's a matter of historical record and should never be changed.

To me the "input" stuff (Products/Customers) is perfect for a looser, document structure. The output stuff should not only be in a relational structure - it should be denormalized, ready for analytical export to CSV or some other reporting system.

It was around this time that I jumped in. I've been looking for an OSS project to jump into, for a variety of reasons, and this seemed perfect. The size and scope were such that one could easily grasp the whole of the project. Further, I cut my coding teeth on database stuff, so this was right up my alley.

Ultimately, what Biggy does is combine many of the best characteristics of several data management and persistence models in one package. Biggy is a high-performance, synchronized in-memory query and persistence tool for .NET.

Or, as the README file on Github states, Biggy is "A Very Fast Document/Relational Query Tool with Full LINQ Compliance"

Fast, In-Memory Queries

One of the fundamental aspects of Biggy as a data access tool is that your backing store is represented completely in memory using your domain object model. Where this truly shines is in query performance. Once your data is loaded up, queries are executed, via LINQ, against this in-memory object model.

For example, as things sit currently we could "new up" our data in-memory from the Chinook sample database, and run the following code, with a multi-join LINQ query against it (the Biggy API is evolving as we speak, but this code works against the current master branch available from the Biggy repo as of 3/22/2014):

Example Multi-Join LINQ Query against a Biggy Relational Store:
var _artists = new SQLServerList<Artist>(_connectionStringName, "artist");
var _albums = new SQLServerList<Album>(_connectionStringName, "album");
var _tracks = new SQLServerList<Track>(_connectionStringName, "track");
  
var recordsLoaded = _artists.Count() + _albums.Count() + _tracks.Count();
var actracks = from ar in _artists
               join a in _albums on ar.ArtistId equals a.ArtistId
               join t in _tracks on a.AlbumId equals t.AlbumId
               where ar.Name == "AC/DC"
               select t;
foreach (var track in actracks)
{
    Console.WriteLine("\t-{0}", track.Name);
}

Note, when we "new" up the table data into memory for the first time, we take a little hit as the connection pool is initialized. Not much of one, though! From there, things are FAAASSSST. Here are some really rough performance numbers from my machine (performance will differ from one machine to the next):

[Image: console timings - initial load 117 ms, triple-join query 1 ms]

Indeed, that triple-join query returned in 1 millisecond. As for the 117 ms load time, the bulk of that was the opening of the connection in the first place.

The idea here is that (within sensible parameters determined by our application needs), we can load our application data into memory, and eschew round trips to the database except to perform writes, to keep our in-store model in sync with the back-end.

Structured, Normalized Relational Storage

Let's face it. Despite the advent and popularity of the many flavors of NoSQL, the relational model of data persistence is not going anywhere. Further, Structured Query Language (SQL) also happens to be an excellent way to talk to a database. While often reviled by programmers, I say, man up. When it comes to describing what you want from your data store, SQL beats any "Map-Reduce" style DSL hands down for ease of use and readability.

Also, for a good many types of data, relational data storage is simply (as of this writing, anyway) the best available model, time-tested and mature.

Biggy plays nicely with relational databases. The current implementation focuses on SQL Server and PostgreSQL, but as it evolves, it becomes ever easier to port the implementation to other SQL database platforms.

As data is added, updated, or deleted in memory, Biggy maintains sync with the backing store without having to reload the in-memory cache.

Loosely Structured, De-normalized Document Storage

For more modest persistence needs (small web applications and the like) Biggy offers an evolved version of Rob's original flat-file JSON store concept. For tables of small to moderate size, performance is strong, and managing your data is as simple as pointing Biggy at the directory you want to use as your database - you're off and running. Biggy will de-serialize your JSON data into memory as domain objects. New data is appended to the end of the file as it is added to the in-memory list, and updates are written by flushing all data back to the file.

Biggy also brings the document storage model to your relational database. Domain objects can be persisted as documents within the relational structure, taking advantage of the existing indexing capabilities of the relational Db, and the efficient, flexible record structure of a document Db. Records are saved to a document table indexed with a standard Primary Key, with the actual data serialized as JSON in a "body" field.

This is especially well-suited to PostgreSQL, with its JSON data type. However, SQL Server does just fine with JSON serialized as text. As you might expect, you can save complex objects as a JSON document, and de-serialize them later with any parent/child relationships intact:

For example, consider two tables in our database, Artist and Album. We represent these in our domain as you would expect:

Example Artist and Album Classes:
public class Artist 
{
    public int ArtistId { get; set; }
    public string Name { get; set; }
}
  
public class Album 
{
    public int AlbumId { get; set; }
    public string Title { get; set; }
    public int ArtistId { get; set; }
}

 

Using Biggy's hybrid relational/document features, we could decide we need a handy way to store and retrieve artist data in a de-normalized format, such that the artist and their album catalog are persisted as a single JSON document, ready for retrieval. We can simply add a handy document container class to our model:

Extending the Artist Class for Document Persistence:
public class ArtistDocument : Artist 
{
    public ArtistDocument() 
    {
        this.Albums = new List<Album>();
    }
    public List<Album> Albums { get; set; }
}

 

Now, we could write something akin to the following code:

var _artists = new SQLServerList<Artist>(_connectionStringName, "artist");
var _albums = new SQLServerList<Album>(_connectionStringName, "album");
  
var list = new List<ArtistDocument>();
foreach(var artist in _artists) 
{
    var artistAlbums = from a in _albums
                       where a.ArtistId == artist.ArtistId
                       select a;
    var newArtistWithAlbums = new ArtistDocument() 
    {
        ArtistId = artist.ArtistId,
        Name = artist.Name,
        Albums = artistAlbums.ToList()
    };
    list.Add(newArtistWithAlbums);
}
var _artistDocuments = new SQLDocumentList<ArtistDocument>(_connectionStringName);
_artistDocuments.AddRange(list);

 

In the code above, we new up our Artist and Album data from the Chinook backing store, and use it to hydrate our document container class, ArtistDocument. We then create a new SQLDocumentList<ArtistDocument> instance.

But wait, John, you say, there's no table for this in the Chinook Db. Well, you're correct, there's not. But Biggy knows this, and takes care of that for us. When we initialize a SQLDocumentList<T>, Biggy creates a table for us if one does not already exist.

Once that's done, we just drop our list of complex objects into the AddRange() method and we're done. If we look at our backing store, we see the following table data:

Artist Document Data Persisted in SQL Server as Complex JSON:

[Image: document table rows - an integer primary key plus a JSON body for each artist]

The image above may be hard to see in detail, but we have persisted each ArtistDocument record with a serial integer Primary Key, and a body of JSON which includes the artist data, as well as a JSON array representing the album catalog for each artist.

A Closer look reveals standard JSON:

{"Albums":[
	{"AlbumId":163,"Title":"From The Muddy Banks Of The Wishkah [Live]","ArtistId":110},
	{"AlbumId":164,"Title":"Nevermind","ArtistId":110}],
"ArtistId":110,"Name":"Nirvana"}

 

And yet, in our in-memory list, each artist will be de-serialized back into its proper domain model, with the parent/child relation between artist and albums intact.

Picking Up Where Massive Left Off

Underlying the relational database integration in Biggy is a re-worked version of another of Rob's projects, Massive. We've re-worked some of the internals, and added strong support for static types (the original Massive was all dynamic). Much of the speed and power of Biggy with respect to relational data comes from the innate flexibility of Massive, coupled with some customization to meet the needs of Biggy.

In a way, one might say that Biggy is the next layer that Massive was always waiting for. With the addition of static type handling and easy, synchronized in-memory persistence, Biggy helps take Massive to the next level.

Check it Out, but Be Warned

There's a lot going on with Biggy right now, as I write this. We are exploring a new architecture proposed by none other than K. Scott Allen ( @OdeToCode on Twitter), and things are in flux. That said, if you enjoy messing about with innovative persistence and data access technologies, do visit Github and check out the code, pull it down, and play with it. We need to know how it works "in the wild" and under conditions we haven't thought of.

But, be warned. As I said, things are moving fast, and as far as I know, the API is not locked in yet.

What's in it for Me?

This is the first Open Source Project I have really jumped in to. As I mentioned earlier, the scope was right, the technology I know well, and I think that my views on data access jibe well with the project owner - Rob does interesting things that challenge convention, and I totally subscribe to that. I have been looking for a project to jump on for a while, and this was perfect.

I have been coding for myself (and some odd projects related to my day job) for a few years, and until now have never really worked with someone else's code. I recognized early on that the thing to do is to try to ascertain and stay in tune with the project owner's direction. This is one of the more interesting aspects of participation. I am accustomed to working my projects out my own way, to suit my needs; I have never really had to follow someone's "vision" other than my own "how can I make this work" approach.

In the few short weeks I have been contributing to Biggy, I have already grown as a coder, and I intend to keep it up.

What's Next?

Over the next few posts, I'll be taking a look at some of the more interesting things I have run into on this project, and some of the things I have learned. Hope you check out Biggy, and follow along (we can ALWAYS use GOOD bug reports - at present, I am certain there are plenty to be had!).



C#: Using Reflection and Custom Attributes to Map Object Properties

Posted on March 10 2014 05:40 PM by jatten in C#, CodeProject   ||   Comments (0)

I have a general distaste for decorating my code with Attributes and Annotations. Most of the time, I can't help but feel like there must be a better way to accomplish what I am trying to do, and/or that I have somewhere sprung a leak in what should be a helpful abstraction.

Other times, though, custom attributes can be just the tool for the job, and sometimes, the only practical way to solve a problem.

An easy-to-understand use case for Custom Attributes might be the mapping of object properties to database fields in a data access layer. You have no doubt seen this before when using Entity Framework. In EF, we often utilize System.ComponentModel.DataAnnotations to decorate the properties of our data objects.

Image by Elizabeth Briel | Some Rights Reserved

Here, we're going to take a quick look at creating our own custom attributes.

Use Custom Attributes to Give Hints or Property Metadata

Yes, EF and the System.ComponentModel.DataAnnotations namespace provide a ready-made means to do this, but for one, you may find yourself building your own data access layer or tool, and for another, this is an easy-to-understand example case.

Let's see how we might implement our own version of these data annotations as Custom Attributes. To create a Custom Attribute in C#, we simply create a class which inherits from System.Attribute. For example, if we wanted to implement our own [PrimaryKey] Attribute to indicate that a particular property on a class in our application represents the Primary Key in our database, we might create the following Custom Attribute:

public class PrimaryKeyAttribute : Attribute { }

 

Now, consider a class in our application, the Client class. Client has a ClientId property which corresponds to the Primary Key in the Clients database table:

public class Client 
{
    public int ClientId { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Email { get; set; }
}

 

We could simply decorate the ClientId property with our new attribute, and then access this from code as in the simple example following:

Decorate the ClientId property with the Custom Primary Key Attribute:
public class Client
{
    [PrimaryKey]
    public int ClientId { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }
    public string Email { get; set; }
}

 

Use Reflection to Examine the Properties and Attributes of an Object

Then, we can cook up a simple console app demo to see how this works. First, we'll take the long way around, iterating using a foreach structure so we can see more clearly what's going on. Then we will look at a more concise (and efficient) LINQ-based implementation.

Silly Example of Accessing a Custom Attribute:
static void WritePK<T>(T item) where T : new()
{
    // Just grabbing this to get hold of the type name:
    var type = item.GetType();
  
    // Get the PropertyInfo objects:
    var properties = type.GetProperties();
    Console.WriteLine("Finding PK for {0}", type.Name);
    foreach(var property in properties)
    {
        var attributes = property.GetCustomAttributes(false);
        foreach(var attribute in attributes)
        {
            if(attribute.GetType() == typeof(PrimaryKeyAttribute))
            {
                string msg = "The Primary Key for the {0} class is the {1} property";
                Console.WriteLine(msg, type.Name, property.Name);
            }
        }
    }
}

 

In the code above, we pass in a Generic object of type T (meaning, this method could be used with ANY domain object to check for the presence of a [PrimaryKey] attribute). We first use the GetType() method to find the object's Type information, and then we call the GetProperties() method of the Type instance, which returns an array of PropertyInfo objects.

Next, we iterate over each of the PropertyInfo instances, and call the GetCustomAttributes() method, which will return an array of objects representing the CustomAttributes found on that property. We can then check the type of each CustomAttribute object, and if it is of type PrimaryKeyAttribute, we know we have found a property that represents a primary key in our database.

LINQ Makes it Mo' Bettah

We could re-write the code above, using LINQ, for a more compact and efficient method as follows:

The WritePk Method, Re-Written Using LINQ:
static void WritePK<T>(T item) where T : new()
{
    var type = item.GetType();
    var properties = type.GetProperties();
    Console.WriteLine("Finding PK for {0}", type.Name);
    // This replaces all the iteration above:
    var property = properties
        .FirstOrDefault(p => p.GetCustomAttributes(false)
            .Any(a => a.GetType() == typeof(PrimaryKeyAttribute)));
    if (property != null)
    {
        string msg = "The Primary Key for the {0} class is the {1} property";
        Console.WriteLine(msg, type.Name, property.Name);
    }
}

 

This example is fairly simplistic, but illustrates well how we can access CustomAttributes to useful end. 

Another case, which I ran into recently, is mapping properties to database columns. In creating a general-purpose data access tool, you never know how database columns are going to align with the properties on your domain objects. In my case, we needed to dynamically build some SQL, using reflection to grab object properties, and map them to the database. However, there is no guarantee that the database column names will match the property names on the domain object.

In cases where column names differ from object properties in such a situation, Custom Attributes are one means of dealing with the situation (this is the part where the abstraction layer of the data access tool gets violated by the Db rearing its head into the business object domain . . .).

Use Custom Attributes to Map Properties to Database Columns

The previous example used a Custom Attribute simply as a sort of tag on a property. Attributes can also convey information if needed. Let's consider a means to map the property to a specific database column name.

Once again, we create a class which inherits from System.Attribute, but this time we will add a property and a constructor:

The Custom DbColumn Attribute:
public class DbColumnAttribute : Attribute
{
    public string Name { get; private set; }
    public DbColumnAttribute(string name)
    {
        this.Name = name;
    }
}

 

Now, let's pretend you inherit a database which you need to integrate with your existing code base. The table from which the client information will be sourced uses all lower-case column names, with underscores between segments instead of proper or camel casing:

SQL For a Table With Column Names Which Do Not Match Class Properties:
CREATE TABLE Clients (
    client_id int IDENTITY(1,1) PRIMARY KEY NOT NULL,
    last_name varchar(50) NOT NULL,
    first_name varchar(50) NOT NULL,
    email varchar(50) NOT NULL
);

 

Now you can do this:

The Client Class with Column Name Attributes:
public class Client
{
    [PrimaryKey]
    [DbColumn("client_id")]
    public int ClientId { get; set; }
  
    [DbColumn("last_name")]
    public string LastName { get; set; }
  
    [DbColumn("first_name")]
    public string FirstName { get; set; }
  
    [DbColumn("email")]
    public string Email { get; set; }
}

 

You can access these attributes, and their properties, from code like so:

Reading Custom Attribute Properties from Code:
static void WriteColumnMappings<T>(T item) where T : new()
{
    // Just grabbing this to get hold of the type name:
    var type = item.GetType();
  
    // Get the PropertyInfo objects:
    var properties = type.GetProperties();
    Console.WriteLine("Finding properties for {0} ...", type.Name);
    foreach(var property in properties)
    {
        var attributes = property.GetCustomAttributes(false);
        string msg = "the {0} property maps to the {1} database column";
        var columnMapping = attributes
            .FirstOrDefault(a => a.GetType() == typeof(DbColumnAttribute));
        if(columnMapping != null)
        {
            var mapsto = columnMapping as DbColumnAttribute;
            Console.WriteLine(msg, property.Name, mapsto.Name);
        }
    }
}

 

"But John," you say, "Entity Framework already does this!"

Precisely. But now you know how it works. Believe me, you may not always have EF at your disposal. Also, you WILL run into databases "in the wild" where column naming conventions do not align with C# Class and property naming conventions (Work with a Postgresql database for five minutes, and get back to me).

Cache Results from Calls Using Reflection Where Appropriate

A quick note, which is only marginally applicable to the examples above, but important in the design of a real-world application: calls using reflection can be expensive. Overall, machines these days are fast, and generally, the odd call to GetType() and GetCustomAttributes() is not all that significant. Except when it is.

For example, if the above code were used repeatedly in a larger application context, it might be better to walk through the object properties at object initialization (or even at application load), map each object property to its respective column name, and stash them all in a Dictionary<string, string>. Then, anywhere in your code where the mapping is needed, you can look up the column name for a specific object property by using the property name as a key. A sketch of this idea follows.
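
Here is a minimal sketch of that idea, using the DbColumn attribute from earlier, and assuming properties without the attribute map to like-named columns:

static Dictionary<string, string> BuildColumnMap<T>()
{
    var map = new Dictionary<string, string>();
    foreach (var property in typeof(T).GetProperties())
    {
        var attribute = property.GetCustomAttributes(false)
            .OfType<DbColumnAttribute>()
            .FirstOrDefault();
        // Use the attribute name if present; otherwise assume a like-named column:
        map[property.Name] = (attribute != null) ? attribute.Name : property.Name;
    }
    return map;
}
  
// Usage - build once (at startup, say), then every lookup is a cheap hash probe:
// var columnMap = BuildColumnMap<Client>();
// columnMap["ClientId"]  ->  "client_id"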

How you do this and where would depend heavily on what you are doing, and the larger structure of your application. For an example of what I am talking about, check out the Biggy project, where I recently had to do this very thing.

Custom Attributes Can Be an Architectural Trade-Off

I was working on an open-source project recently, and the project maintainer wisely pointed out that column-mapping attributes such as the above are "the database pushing right on up through the abstraction." Which is true. In the case of mapping database columns to object properties, we are attempting to solve but one aspect of the age-old impedance mismatch problem faced by all Object-Relational Mapping (ORM) frameworks. It ain't always elegant, but sometimes, it is the only way.

None of this is to say that Custom Attributes are only useful in the context of mapping database columns. There are any number of potential use-cases. While I personally dislike cluttering up my code with attributes and annotations, there will be times when it is the best way to solve a problem.

The next time you find yourself wishing you could know something extra about a particular property or method, Custom Attributes are one more tool in your chest.


