Apache CouchDB Crash Course

CouchDB is a NoSQL, document-based database system. When I first started using CouchDB, it was a little perplexing after my years of SQL and relational database use. I came away with a few lessons that I will share here, in the hope of flattening your learning curve and offering a short, simple explanation that might help you decide whether CouchDB fits your needs.

You might also find my V8 JavaScript crash course helpful.

Non-relational 

Relational databases have a schema, a set of enforced "best practices" if you like. The term means that certain structures and conventions are enforced, which helps you design your data in a relational manner.

Relational data just means that data sharing common fields is stored together, grouped into a table. Books all share common data, right? A title, an author. This makes the data inside the "books" table relational data.

In Couch, data is stored in a "document" instead. Documents don't enforce any schema, so you can store data however you like. Your design choices still affect things, obviously, but the data itself is stored as simple, easy-to-edit JSON items.

If you aren't familiar with JSON, it's JavaScript Object Notation. It uses key/value pairs (whose values can contain children, like a folder tree), and it is a hugely supported and readily available data interchange format.

JSON

{ "name" : "sven" }

See that? That's a simple JavaScript object. It holds a single key, called 'name', and its value is 'sven'. JSON can get pretty complex with larger data structures, so it is always good to have a JSON editor or "prettyfier" handy. CouchDB comes with a relatively useful editor for editing the data, but it can get pretty convoluted pretty quickly. Let's look back at our book example.

{ "title": "Pro JavaScript with MooTools", "author": "Mark Obcena" }

So, let's complicate the example by adding some release dates. The situation calls for more than one of the same thing, so let's give each release date its own object, making them easier to work with. Here are two release dates.

{ "releaseDate": "1/1/11" }, { "releaseDate": "6/1/12" }

So, two release dates, but they feel non-descriptive. Let's add the country each one applies to.

[
   {
      "country": "UK",
      "releaseDate": "1/1/11"
   },
   {
      "country": "ZA",
      "releaseDate": "6/1/12"
   }
]

 

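There, that makes a bit more sense. In order to "insert" these two into the JSON object, we enclose them in an array, using the familiar [ ] construct, and store that array under a releaseDates key. Since the parent key already says what the dates are, each releaseDate key is shortened to date. This leaves our final object looking like this: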

{
   "title": "Pro JavaScript with MooTools",
   "author": "Mark Obcena",
   "releaseDates": [
      {
         "country": "UK",
         "date": "1/1/11"
      },
      {
         "country": "ZA",
         "date": "6/1/12"
      }
   ]
}
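
To get a feel for working with this structure in code, here is a minimal sketch in plain JavaScript. It assumes nothing about CouchDB itself: it parses the book as strict JSON text (double-quoted keys and strings), reads a nested value, and pretty-prints the result with the built-in JSON.stringify. The variable names are only for illustration.

```javascript
// The book document as strict JSON text: keys and strings use double quotes.
var bookJson = '{"title":"Pro JavaScript with MooTools","author":"Mark Obcena",' +
  '"releaseDates":[{"country":"UK","date":"1/1/11"},{"country":"ZA","date":"6/1/12"}]}';

// Turn the text into a JavaScript object we can walk like a folder tree.
var book = JSON.parse(bookJson);

// Collect the countries from the nested array of release dates.
var countries = book.releaseDates.map(function (release) {
  return release.country;
});

// A poor man's "prettyfier": the third argument is the indent width.
var pretty = JSON.stringify(book, null, 2);

console.log(countries.join(', ')); // UK, ZA
console.log(pretty);
```

JSON.parse is also a quick validity check: it throws if the text is not strict JSON, for example when the keys are left unquoted.
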

Documents

Well, as you can see, a document just contains a JSON object. A whole tree of them? A single object? Couch doesn't mind.

Usually, you store one code-side object per document. Take a user, for example. Login details, private messages, personal info: they can all live in a single document. This immediately strikes a chord with the "a huge list is bad" crowd, which is what got me at first. I tried embedding as much as possible so I would have fewer documents, hoping to "speed up" the data flow. As it turns out, lots of documents is exactly what Couch thrives on.

The way it works is simple: it caches things. When you ask for all the documents of type "user", sorted by "user name", Couch builds an index for that query and keeps it around, which makes all subsequent accesses on that "query" super fast. It can cache other info about the users alongside for quick access too, and that's what makes lots of small documents work so well.
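
The "build once, reuse afterwards" idea can be sketched in a few lines of plain JavaScript. This is a toy illustration only, not how CouchDB is implemented: the real index is stored on disk and updated incrementally, while this one is just an in-memory array. The document ids, the type field and the usersByName helper are assumptions made up for the example.

```javascript
// Toy illustration only: an in-memory stand-in for a cached query result.
var docs = [
  { _id: 'user:sven', type: 'user', username: 'sven' },
  { _id: 'user:anna', type: 'user', username: 'anna' },
  { _id: 'book:1', type: 'book', title: 'Pro JavaScript with MooTools' }
];

var buildCount = 0;     // how many times the index had to be built
var cachedIndex = null; // the remembered result of the "query"

// Hypothetical helper: all user names, sorted, computed only once.
function usersByName() {
  if (cachedIndex === null) {
    buildCount += 1;
    cachedIndex = docs
      .filter(function (doc) { return doc.type === 'user'; })
      .map(function (doc) { return doc.username; })
      .sort();
  }
  return cachedIndex;
}

usersByName();
usersByName();
console.log(usersByName().join(', ')); // anna, sven
console.log(buildCount);               // 1
```

The first call pays for the filtering and sorting; every later call just hands back the stored result.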

Views

A view is, quite simply, a function that runs on each document of a database. Views follow a map and reduce flow (with rereduce for combining partial results). These are well explained in the docs, so I needn't go into much detail here, but let's say we wanted to view the data in our book document. We know what it has, so we can construct a view to do something specific. For example, list only the author and title. The emit function "returns" a row for the view: its first argument is the key and its second is the value.

function(doc) {
    emit(doc.title, doc.author);
}
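
If you want to poke at a map function without a running server, a rough local stand-in is easy to write. The sketch below is an assumption-heavy simulation, not CouchDB code: emit and runView are defined here by hand, the second book is made-up sample data, and the sort is a plain string comparison, whereas CouchDB applies its own collation rules.

```javascript
var rows = [];        // rows collected for the view being built
var currentId = null; // _id of the document currently being mapped

// Stand-in for the emit() that CouchDB provides to map functions.
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// The map function from above, given a name so we can pass it around.
function titleAndAuthor(doc) {
  emit(doc.title, doc.author);
}

// Hypothetical helper: run a map function over every document,
// then order the resulting rows by key.
function runView(map, docs) {
  rows = [];
  docs.forEach(function (doc) {
    currentId = doc._id;
    map(doc);
  });
  return rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
}

var books = [
  { _id: 'book-2', title: 'Pro JavaScript with MooTools', author: 'Mark Obcena' },
  { _id: 'book-1', title: 'An Example Book', author: 'Jane Doe' } // made-up sample
];

var result = runView(titleAndAuthor, books);
console.log(JSON.stringify(result, null, 2));
```

Each row carries the id of the document it came from, next to the key and value that were emitted.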

Or, let's return all books that have an unknown author:

function(doc) {
    // Only consider documents with a title whose author is missing or 'unknown'.
    if (doc.title && (!doc.author || doc.author == 'unknown')) {
        emit(doc.title, 'missing author');
    }
}
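
For the reduce half of the flow mentioned earlier, here is a small sketch that counts the rows such a view emits. The (keys, values, rereduce) signature is the shape CouchDB calls reduce functions with; the countRows name, the sample keys and the manual chunking below are assumptions for the sake of a runnable example.

```javascript
// Reduce function in the shape CouchDB calls it: (keys, values, rereduce).
function countRows(keys, values, rereduce) {
  if (rereduce) {
    // values are partial counts from earlier reduce calls: add them up.
    return values.reduce(function (total, n) { return total + n; }, 0);
  }
  // First pass: values are what the map function emitted, one per row.
  return values.length;
}

// Simulate reducing two chunks of rows, then combining the partial results.
// On the first pass each key is a [emitted key, document id] pair.
var chunkA = countRows([['Book A', 'id-1'], ['Book B', 'id-2']],
                       ['missing author', 'missing author'], false);
var chunkB = countRows([['Book C', 'id-3']], ['missing author'], false);
var total = countRows(null, [chunkA, chunkB], true);

console.log(total); // 3
```

The rereduce branch matters because the rows may be reduced in pieces first, with the partial answers combined afterwards. For a plain count like this, CouchDB also ships a built-in _count reduce, so you rarely need to write it by hand.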

Features

Couch doesn't just store bunches of JSON objects. It can do a heck of a lot more for you, and includes templating already built in (think Rails, Sinatra, CakePHP). It also supports internal versioning and conflict handling: every change to a document creates a new revision. It is also a peer-driven model, meaning databases can replicate with each other and changes made in different places can be synced back. If you add a second author to the book above, only that one document gets a new revision, which keeps things simple when distributed changes are synced back into the db.
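
The versioning idea is easier to see with a toy. The sketch below mimics the revision check in memory: an update must carry the revision it was based on, otherwise it is rejected, similar in spirit to the 409 Conflict response CouchDB gives. ToyStore is a made-up name, the revisions are bare counters rather than CouchDB's real "number-hash" strings, and the second author and edition are placeholder data.

```javascript
// Toy in-memory store that mimics CouchDB's revision check.
function ToyStore() {
  this.docs = {};
}

// Save a document. Updates must carry the current _rev or they are rejected.
ToyStore.prototype.put = function (doc) {
  var existing = this.docs[doc._id];
  if (existing && existing._rev !== doc._rev) {
    return { ok: false, error: 'conflict' };
  }
  var nextRev = existing ? existing._rev + 1 : 1;
  var stored = JSON.parse(JSON.stringify(doc)); // keep our own copy
  stored._rev = nextRev;
  this.docs[doc._id] = stored;
  return { ok: true, id: doc._id, rev: nextRev };
};

var store = new ToyStore();
var first = store.put({
  _id: 'book-1',
  title: 'Pro JavaScript with MooTools',
  authors: ['Mark Obcena']
});

// Two editors both start from revision 1.
var editA = store.put({
  _id: 'book-1', _rev: first.rev,
  title: 'Pro JavaScript with MooTools',
  authors: ['Mark Obcena', 'Second Author'] // placeholder name
});
var editB = store.put({
  _id: 'book-1', _rev: first.rev,
  title: 'Pro JavaScript with MooTools (2nd ed.)', // placeholder edit
  authors: ['Mark Obcena']
});

console.log(editA.ok, editB.ok, editB.error); // true false conflict
```

The second editor loses nothing silently: the store tells them their copy is stale, and they can fetch the latest revision and reapply the change.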

Conclusion

There are plenty of NoSQL database systems. Couch is particularly interesting to me because it speaks the most common format in my current engines: JSON.

It also keeps my entire engine JavaScript overall, with C++ in the background. Node.js has decent Couch support, and it ties together extremely well with my current toolchain. There are many alternatives that are just as good and widespread. For example, MongoDB is a great alternative to Couch, depending on your needs.

Thanks for reading.

It's real

Click here: the book I mentioned. It's gonna be awesome.

 
