C# Developer Blog

Saturday, 3 March 2012

NDataFlow - Open Source .NET Dataflow Library

Last year, a colleague of mine showed me an open source .NET extract, transform and load (ETL) helper library that he is working on. It is called NDataFlow, and it lets you annotate methods with attributes to create a dataflow in your application. It's a nice, lightweight library that you can use when developing ETL programs, whether simple or complex. The example below simulates a very simple ETL scenario in which a set of people (hard-coded in the example) is filtered by location and then written to a file.

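using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
//Assumes Operation, Link and Output live alongside DataflowComponent
//in the NDataFlow namespace
using NDataFlow;

//Person is assumed to be a simple class with Name and City string properties
class Program : DataflowComponent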
{
  static void Main(string[] args)
  {
    new Program().Run();
  }

  //First method in the flow
  [Operation]
  public void DataSource(Output<Person> output)
  {
    //Imagine retrieving people from a real data source
    //e.g. database, xml file, csv file, etc.
    output.Emit(new Person() { Name = "Alice", City = "London" });
    output.Emit(new Person() { Name = "Bob", City = "New York" });
    output.Emit(new Person() { Name = "Foo", City = "London" });
    output.Emit(new Person() { Name = "Bar", City = "Sydney" });
  }

  [Operation]
  public IEnumerable<Person> FilterForLondonPeople
    ([Link("DataSource")] IEnumerable<Person> input)
  {
    return input.Where
      (p => p.City.Equals("London", 
        StringComparison.InvariantCultureIgnoreCase));
  }

  [Operation]
  public void OutputResults
    ([Link("FilterForLondonPeople")] IEnumerable<Person> results)
  {
    using (var sw = new StreamWriter(@"C:\LondonPeople.txt", false))
    {
      foreach (var p in results)
        sw.WriteLine("{0} {1}", p.Name, p.City);
    }
  }
}

The example shows that little work is needed to get a simple dataflow set up and running. You inherit the Run method by deriving from the NDataFlow.DataflowComponent class. Then, provided you've set up your methods and attributes correctly using the LinkAttribute, it's simply a case of calling Run to start your dataflow. In this case, the first method in the dataflow is DataSource; its output is sent to FilterForLondonPeople, whose output is in turn sent to the OutputResults method.
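To give a feel for how attribute-based linking of this kind can work, here is a minimal sketch that uses reflection to find annotated methods and run them in dependency order. It is not NDataFlow's implementation: SketchOperationAttribute, SketchLinkAttribute, SketchComponent and LondonFlow are hypothetical names invented for this illustration. For brevity the source method returns an IEnumerable instead of emitting to an Output, and the results go to an in-memory list rather than a file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for the library's attributes (not NDataFlow types).
[AttributeUsage(AttributeTargets.Method)]
public sealed class SketchOperationAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Parameter)]
public sealed class SketchLinkAttribute : Attribute
{
    public string Source { get; }
    public SketchLinkAttribute(string source) { Source = source; }
}

public abstract class SketchComponent
{
    // Runs each annotated method once all the methods it links to have run.
    public void Run()
    {
        var results = new Dictionary<string, object>();
        var pending = GetType()
            .GetMethods(BindingFlags.Instance | BindingFlags.Public)
            .Where(m => m.IsDefined(typeof(SketchOperationAttribute), false))
            .ToList();

        while (pending.Count > 0)
        {
            // Ready means every parameter names a source that already has a result.
            var ready = pending.FirstOrDefault(m => m.GetParameters().All(p =>
            {
                var link = p.GetCustomAttribute<SketchLinkAttribute>();
                return link != null && results.ContainsKey(link.Source);
            }));
            if (ready == null)
                throw new InvalidOperationException("Unresolved or circular link.");

            var args = ready.GetParameters()
                .Select(p => results[p.GetCustomAttribute<SketchLinkAttribute>().Source])
                .ToArray();
            results[ready.Name] = ready.Invoke(this, args); // null for void methods
            pending.Remove(ready);
        }
    }
}

public class Person
{
    public string Name { get; set; }
    public string City { get; set; }
}

public class LondonFlow : SketchComponent
{
    public List<string> Written { get; } = new List<string>();

    [SketchOperation]
    public IEnumerable<Person> DataSource()
    {
        yield return new Person { Name = "Alice", City = "London" };
        yield return new Person { Name = "Bob", City = "New York" };
        yield return new Person { Name = "Foo", City = "London" };
        yield return new Person { Name = "Bar", City = "Sydney" };
    }

    [SketchOperation]
    public IEnumerable<Person> FilterForLondonPeople(
        [SketchLink("DataSource")] IEnumerable<Person> input)
    {
        return input.Where(p =>
            p.City.Equals("London", StringComparison.OrdinalIgnoreCase));
    }

    [SketchOperation]
    public void OutputResults(
        [SketchLink("FilterForLondonPeople")] IEnumerable<Person> results)
    {
        foreach (var p in results)
            Written.Add(p.Name + " " + p.City);
    }
}

public static class Demo
{
    public static void Main()
    {
        var flow = new LondonFlow();
        flow.Run();
        foreach (var line in flow.Written)
            Console.WriteLine(line); // prints "Alice London" then "Foo London"
    }
}
```

The sketch resolves links by method name, so a misspelled name in a link attribute surfaces as an exception at run time rather than at compile time. Presumably the same care is needed with the string names passed to the real LinkAttribute.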

4 comments:

awchigee, 7 February 2013 at 16:32
Thank you for this wonderful post, Sir.
I have a few questions. I am developing a simple wireless sensor device simulator where I am going to simulate 2 nodes communicating with each other. My current problem is how to simulate the transfer of data from one node to the other. Do you think this Dataflow library will do the trick?
Ravinder Singh, 7 February 2013 at 19:35
Hi awchigee,

Are your nodes running on separate concurrent threads? Can there be more than two nodes? If there can be more than two nodes, does each node need to communicate the extracted data to all nodes (i.e. broadcast the data) or is each node only ever communicating with one other node?

I don't see why you couldn't use this ETL library in your simulator: each node can derive from DataflowComponent and run on a separate thread. Your simulation engine that spawns these threads/nodes can randomly call the Run method on each node to simulate a dataflow within a node (i.e. sensor activity). But this still leaves you with the problem of communicating between nodes - there are probably a number of solutions, depending on your answers to my questions :)
