Friday, 9 November 2012

Initial thoughts on NoSQL with MongoDB

I have heard about the NoSQL movement for a while now but only recently started to properly look into a NoSQL technology. A few weeks ago I found out that 10gen (the team behind the popular NoSQL database called MongoDB) are starting a free "MongoDB for developers" course. The course runs weekly, with each week containing a number of videos that you watch, take a quiz on and then submit homework for. So far it has been very useful despite some initial hiccups in administrating the course. We're now starting Week 3 and have already covered quite a lot about NoSQL in general and how MongoDB can be used in a CRUD sense.
 
Week 3 of the course dives deeper into schema design in MongoDB and compares this with schema design in the more familiar relational data model. The course also has a project that runs throughout - which is to create a simple blogging engine using MongoDB and Python. It seems like the future weekly homeworks are going to be geared towards extending the blogging engines' functionality. Although the course material is using Python to communicate with MongoDB, there are a number of driver libraries that you can download from the MongoDB website for other programming languages - such as C#. I haven't had a chance to play with the C# driver yet but plan on writing a blog post about my experience with it in the future.
 
My training at university and all of the work experience I have gained so far has been around working with relational database management systems. So it is quite exciting moving away from the relational mindset and thinking about a different way to persist application data. It is weirdly refreshing to work with JSON documents in collections than working with the tables and rows abstraction in relational databases.
 
One thing that the course instructors emphasise about MongoDB is that your schema design should be geared towards how your application will access your data. So unlike in a relational data model, where you typically normalise your model to the third normal form - in MongoDB your data model will reflect the access pattern your application will use to get or update the data. As MongoDB doesn't provide support for joining data, you typically model your data (or JSON documents) so that each document contains all of the necessary information your application will regularly need for a particular use case. This is interesting as I think there is an implicit assumption here that you'd only ever have one particular access pattern for your MongoDB database... for example what if I wanted to build another application that accesses the data for a purpose that is different from when the original MongoDB database was designed? I'm hoping the answer to this question is not along the lines of migrating/copying the data to a schema that's more appropriate for the new application!
 
At this stage of my exploration, another concern with MongoDB is that there doesn't seem to be a quick out-of-the-box method to generate a pretty looking report for the data stored in the collections. In the relational world, SQL with the table and rows abstraction lends itself nicely to generating reports that non-technical users can easily make sense of. With MondoDB, you can query your collections of JSON documents easily enough in the shell using mongo.exe and the find() method on a collection - but the resulting output is in JSON format which probably won't go down too well with a non-technical user. Having said this, there may still be solutions/explanations to these concerns that I haven't come across - so I'm definitely not letting them stop me from diving deeper into the NoSQL MongoDB world.