Use Cross-Platform/OSS ExcelDataReader to Read Excel Files with No Dependencies on Office or ACE

Posted on October 17 2013 07:28 PM by jatten in C#, CodeProject, Tools   ||   Comments (0)

Planet-Sun-Textures-500I recently posted an article about LinqToExcel,  a fantastic library I discovered while trying to solve a problem for a work-related project. The premise was that I wanted a library I could use to read data from MS Excel files which may be uploaded to a website, and LinqToExcel seemed to fit the bill nicely. Until, that is, I went to deploy to Windows Azure Websites (a requirement of the project in question).

Image by Elif Ayiter | Some Rights Reserved

My Problem with LinqToExcel

LinqToExcel takes a dependency on the ACE Database Engine (I knew this, but it didn't occur to me that this would be an issue until I attempted to deploy), a redistributable .exe which must be installed on the host machine, and which is not available under a standard Azure Website configuration (and likely not easily available under many shared hosting plans, either). In point of fact, one could not even use ADO.NET/OleDb to read Excel data under these circumstances, as the .NET OleDb provider also requires the ACE database engine in order to connect to Excel as a data source.

ExcelDataReader: Open Source, Cross-Platform, No Proprietary Dependencies

Enter ExcelDataReader, an Open Source project which facilitates reading data from Excel spreadsheets by working directly with the Binary Excel file (.xls files), and/or abstracting the OpenXml library to provide a more familiar Data Access paradigm (.xlsx files).

Also note that ExcelDataReader is Cross-Platform and will run on Windows, as well as (under Mono) OSX, and Linux.

I will admit, the API for using ExcelDataReader is not as intuitive or friendly as that exposed by LinqToExcel. That said, ExcelDataReader is considerably easier to use (but more limited in functionality) than the raw Office OpenXml interface, as well as other popular Open Source libraries such as NPOI. These two libraries offer significantly more functionality than does ExcelDataReader (or, for that matter, LinqToExcel). They are focused as much on document creation and content manipulation as they are on simply reading data. However they bring substantial complexity and are less intuitive.

ExcelDataReader, in keeping with its name, models the underlying workbook and worksheets using the somewhat dated ADO.NET DataSet and DataTables model.

Get ExcelDataReader from the Nuget Gallery

As with LinqToExcel, ExcelDataReader is available via Nuget. You can simply do:

Get ExcelDataReader via the Nuget Package Console:
PM> Install-Package ExcelDataReader

 

Or, you can right-click on your solution in the Visual Studio Solution Explorer, select "Manage Nuget Packages for Solution" and use the GUI. When the window opens, make sure the "Online" tab is selected in the left-hand pane, then search for "ExcelDataReader." Click Install:

Get ExcelDataReader via the Nuget Package Manager GUI:

get-excel-data-reader-via-nuget-package-manager

The Simplest Way to Access Data Using ExcelDataReader

Now that you have ExcelDataReader installed in your project, let's get right to some simple examples.

First off, as previously mentioned, ExcelDataReader works with no dependencies on either the ACE Database engine, or the Office Interop libraries. It starts with the Excel binary file, therefore, we need to provide a FileStream object to the ExcelReaderFactory.

It is also important to note that ExcelDataReader works with both .xls and .xlsx file formats. In doing so, the ExcelReaderFactory provides two methods, CreateBinaryReader() and CreateOpenXmlReader() to work with each format respectively. Both methods return an instance of the interface IExcelDataReader.

For our example, we will create a simple class, ExcelData, encapsulating the file and data access logic. We will pass a file path into the constructor, and build out some methods to get at our Excel data. So to being with, we create the new class, and implement the constructor and getExcelReader() method:

A Basic Class to provide access to ExcelDataReader Functionality
using Excel;
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;
  
namespace ExcelDataReaderExamples
{
    public class ExcelData
    {
        string _path;
  
        public ExcelData(string path)
        {
            _path = path;
        }
  
  
        public IExcelDataReader getExcelReader()
        {
            // ExcelDataReader works with the binary Excel file, so it needs a FileStream
            // to get started. This is how we avoid dependencies on ACE or Interop:
            FileStream stream = File.Open(_path, FileMode.Open, FileAccess.Read);
  
            // We return the interface, so that 
            IExcelDataReader reader = null;
            try
            {
                if (_path.EndsWith(".xls"))
                {
                    reader = ExcelReaderFactory.CreateBinaryReader(stream);
                }
                if (_path.EndsWith(".xlsx"))
                {
                    reader = ExcelReaderFactory.CreateOpenXmlReader(stream);
                }
                return reader;
            }
            catch (Exception)
            {
                throw;
            }
        }
    }
}

 

Note in the above we needed to add the using System.IO and using Excel to our using statements at the top of the file.

From here, we can add some more useful functionality to our class. Let's start with some Workbook metadata, and add a getWorksheetNames() method:

Adding a getWorksheetNames Method to the ExcelData Class:
public IEnumerable<string> getWorksheetNames()
{
    var reader = this.getExcelReader();
    var workbook = reader.AsDataSet();
    var sheets = from DataTable sheet in workbook.Tables select sheet.TableName;
    return sheets;
}

 

So now we begin to see the underlying DataSet and DataTable-based model employed by ExcelDataReader. The call to reader.AsDataSet() returns (wait for it . . .) a System.Data.DataSet object, which contains a DataTableCollection representing each worksheet within the workbook. In the code above, we grab the Name property of each using Linq and return a resulting IEnumerable.

Doing Real Work: Reading Spreadsheet Data Using ExcelDataReader

So, at its most basic, we now know that ExcelDataReader is going to hand us back our old favorite, the ADO.NET DataTable. For the purpose of our example, let's add a very simple getData() method. We're going to add two arguments here. The first is the file path to our Excel Workbook. The other is a Boolean argument used to tell ExcelDataReader whether or not the first row in our Worksheet represents column names. The IExcelDataReader interface conveniently defines a property for just this purpose.

Adding a getData() method to the ExcelData Class:
public IEnumerable<DataRow> getData(string sheet, bool firstRowIsColumnNames = true)
{
    var reader = this.getExcelReader();
    reader.IsFirstRowAsColumnNames = firstRowIsColumnNames;
    var workSheet = reader.AsDataSet().Tables[sheet];
    var rows = from DataRow a in workSheet.Rows select a;
    return rows;
}

 

We could then write some example client code as follows, just to see how it's all working. Say we had some sample spreadsheet data like this:

Sample Spreadsheet Data

sample-spreadsheet-data

We could write some (sadly) familiar-looking code to iterate over the returned IEnumerable<DataRow> and write the results out to the console:

Example Client Code Demonstrating the getData() Method:
void WriteData()
{
    string path = @"D:\Users\John\Documents\Albums.xlsx";
  
    var excelData = new ExcelData(path);
    var albums = excelData.getData("Sheet1");
    foreach (var row in albums)
    {
        Console.WriteLine(Environment.NewLine);
        for (int i = 0; i <= row.ItemArray.GetUpperBound(0); i++)
        {
            Console.WriteLine(row[i].ToString());
        }
    }
}

 

Materializing an Object from Excel Data

The code above is simple, and likely does not resemble what one would actually use in any real application. But you get the picture. It would be a simple matter to write a method mapping the Excel data into an existing POCO class, if were were confident the spreadsheet would contain the proper data and column names. On top of that, depending on our business case, we could implement some staging and validation which would inform the user that the spreadsheet in some way does not meet the requirements for import.

Materializing an Object from the Excel Data:
public class Album
{
    public string Name { get; set; }
    public string Artist { get; set; }
    public string Genre { get; set; }
}
  
  
IEnumerable<Album> getAlbums()
{
    List<Album> albumList = new List<Album>();
    string path = @"D:\Users\John\Documents\Albums.xlsx";
  
    var excelData = new ExcelData(path);
    var albums = excelData.getData("Sheet1");
    foreach (var row in albums)
    {
        var album = new Album()
        {
            Name = row["Name"].ToString(),
            Artist = row["Artist"].ToString(),
            Genre = row["Genre"].ToString()
        };
        albumList.Add(album);
    }
    return albumList;
}

 

Yeah. Remember all that DataTable/DataReader stuff? It's not pretty, but it works. We could re-write our WriteData() method as follows, and again output the data to the console:

void WriteData()
{
    foreach(var album in albums)
    {
        Console.WriteLine("{0} {1} - {2}", album.Name, album.Artist, album.Genre);
    }
}

 

ExcelDataReader is a Purpose-Driven Tool

ExcelDataReader is not as elegant as LinqToExcel, nor as functional as NPOI or the OpenXml libraries. However, unlike LinqToExcel, it has no dependencies on external database engines which may not be available on a remote server (including Windows Azure Websites!), and it affords an interface which is at least familiar. Unlike NPOI or OpenXml, there is very little learning curve.

In a perfect world, the LinqToExcel and ExcelDataReader libraries would be rolled into one. In fact, since they are both OSS, that may be something worth looking at . . .

If your use case involves deployment to Windows Azure Websites, or any other machine where the ACE Database engine is not available, ExcelDataReader will get the job done, and spare the increased complexity of OpenXml or NPOI if you need to import or otherwise read Excel data into your application. Also, ExcelDataReader is cross-platform and will run on Windows, OSX, and Linux (both of the latter using Mono).

If you expect to have (or be able to install) the ACE Database engine, and if cross-platform deployment is not a concern, then LinqToExcel is a better choice from a coding standpoint.

OpenXml and NPOI are both very flexible and widely supported libraries. However, for the simple reading/importing of Excel data, they are a little more difficult to learn and use, and you can count on a lot of ordinal index-based referencing of cells and such.

Additional Resources and Items of Interest

 

Posted on October 17 2013 07:28 PM by jatten     

Comments (0)

Comments are closed

About the author

My name is John Atten, and my username on many of my online accounts is xivSolutions. I am Fascinated by all things technology and software development. I work mostly with C#, Java, SQL Server 2012, learning ASP.NET MVC, html 5/CSS/Javascript. I am always looking for new information, and value your feedback (especially where I got something wrong!). You can email me at:

jatten@typecastexception.com

Web Hosting by