Friday, May 23, 2014

Mongo Learning Series 1

Mongo Learning

First of all, I want to thank and congratulate the MongoDB team for hosting such a wonderful introductory interactive course.  Good job guys.
For those interested here is the url

It is a 7 week course. The syllabus follows:
Week 1: Introduction
Introduction & Overview - Overview, Design Goals, the Mongo Shell, JSON Intro, installing tools, overview of blog project. Bottle, Pymongo
Week 2: CRUD
CRUD (Creating, Reading and Updating Data) - Mongo shell, query operators, update operators and a few commands
Week 3: Schema Design
Schema Design - Patterns, case studies and tradeoffs
Week 4: Performance
Using indexes, monitoring and understanding performance. Performance in sharded environments.
Week 5: Aggregation Framework
Goals, the use of the pipeline, comparison with SQL facilities.
Week 6: Application Engineering
Drivers, impact of replication and Sharding on design and development.
Week 7 - Case Studies
Interview with Jon Hoffman, Foursquare and interview with Ryan Bubinski, Codecademy
Final Exam

My notes cover the important takeaways.

Week 1: Introduction

What is MongoDB?
MongoDB is a non-relational data store for JSON (JavaScript Object Notation) documents. MongoDB is document oriented.
Example JSON:
{"name":"Prashanth"}
{"a":1, "b":2, "c":3}
JSON document sample with hierarchy:
{"a":6,
 "b":7,
 "Fruit": ["apple","pear","mango"]}
JSON documents are stored within MongoDB. What differentiates MongoDB from a relational database is that a document is structured and stored the way you would use it in an application, in contrast to rows spread across tables
MongoDB is schemaless. (Dynamic Schema)
You can save {"a":1, "b":2} and {"a":1, "b":2, "c":3} in the same collection
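
As a rough illustration of the dynamic schema, here is a small sketch in plain Python (no MongoDB server needed). It parses the sample documents above with the standard json module and keeps them in one list. The list named collection is only a stand-in for a real MongoDB collection, which is my assumption for the sake of the example.

```python
import json

# A plain list stands in for a MongoDB collection (illustration only).
collection = [
    json.loads('{"a": 1, "b": 2}'),
    json.loads('{"a": 1, "b": 2, "c": 3}'),
    json.loads('{"a": 6, "b": 7, "Fruit": ["apple", "pear", "mango"]}'),
]

# Documents in the same collection may have different sets of keys.
for doc in collection:
    print(sorted(doc.keys()))

# Nested values such as arrays are reached by key, then by index.
second_fruit = collection[2]["Fruit"][1]
print(second_fruit)
```

Nothing forces the three documents to share the same fields, which is the point of a dynamic schema.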

MongoDB relative to relational
MongoDB does not support joins
MongoDB does not support transactions across multiple documents
You can access items in a collection atomically. Since data is hierarchical, something that requires multiple updates within a relational system can be handled as a single atomic operation on a single document.
Overview of building an app with MongoDB
The mongod process is the database server
The mongo process is the Mongo shell
Python was the language used in this class to build the app (note that there are other courses which use other languages)
Bottle, a lightweight WSGI (Web Server Gateway Interface) micro web framework for Python, was used to host the application

Quick Introduction to Mongo Shell

use test
test is the name of a db
you can use the command show dbs to list all the dbs
you can use the command show collections to list all the collections within a db
when you do a find, the JSON documents that match the query parameters are printed in the shell. You can make the output easier to read by using the pretty command
db.things.find().pretty()
pretty display as below
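
To give a feel for the difference, here is a minimal sketch in Python that prints one document in compact and in indented form using the standard json module. This only approximates what pretty() does in the shell (the shell output also shows types such as ObjectId), and the document itself is a made-up sample.

```python
import json

# Hypothetical document, similar to the samples earlier in these notes.
doc = {"a": 6, "b": 7, "Fruit": ["apple", "pear", "mango"]}

# Compact form: everything on one line.
compact = json.dumps(doc, sort_keys=True)

# Indented form: one field per line, nested values indented further.
pretty = json.dumps(doc, sort_keys=True, indent=4)

print(compact)
print(pretty)
```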



The Introduction to JSON chapter covers a little more of the JSON format
The Installing MongoDB, Installing Bottle and Python, and Installing PyMongo chapters cover the installation instructions for Mac and Windows
PyMongo is the MongoDB driver for Python
The documentation for the API for the MongoDB drivers is available at http://api.mongodb.org/


Hello World, Mongo style

import pymongo
from pymongo import MongoClient
# connect to database
connection = MongoClient('localhost', 27017)
db = connection.test
# handle to names collection
names = db.names
item = names.find_one()
print item['name']

An example of doing the same from JavaScript in the shell is shown in the figure below

An insight into the save method

If the document has no _id, save creates one and inserts the document; if it already has an _id, save updates the existing document with that _id
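
The behaviour described above can be modelled in a few lines of plain Python. This is only an in-memory sketch of the idea, not PyMongo's implementation: the save function, the names dict and the visits field are hypothetical, and uuid4 stands in for the ObjectId that MongoDB would generate.

```python
import uuid


def save(collection, doc):
    """Rough model of save(): insert when there is no _id, otherwise replace."""
    if "_id" not in doc:
        # uuid4 is only a stand-in here; MongoDB generates an ObjectId.
        doc["_id"] = uuid.uuid4().hex
    # Storing under the _id replaces any earlier document with the same _id.
    collection[doc["_id"]] = dict(doc)
    return doc["_id"]


names = {}  # hypothetical in-memory "collection", keyed by _id
doc = {"name": "Prashanth"}

first_id = save(names, doc)   # no _id yet, so one is created
doc["visits"] = 1             # hypothetical extra field
second_id = save(names, doc)  # _id is present, so the stored copy is replaced

print(first_id == second_id)
print(len(names))
```

Saving the same document twice leaves one stored document, because the second call finds the _id added by the first.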
Hello World on a Web Server


Hello.py

import bottle
import pymongo
# this is the handler for the default path of the web server
@bottle.route('/')
def index():
    # connect to mongoDB
    connection = pymongo.MongoClient('localhost', 27017)
    # attach to test database
    db = connection.test
    # get handle for names collection
    name = db.names
    # find a single document
    item = name.find_one()
    return 'Hello %s!' % item['name']
bottle.run(host='localhost', port=8082)

Mongo is Schemaless
In MongoDB, since the data is not stored in tables, there is no need for operations such as alter table when the data you need to store changes.
In the real world there might be scenarios where the data attributes are different for different items of the same entity. Take company data, for example. [Company A] might have an office in a different country and hence need to store a whole lot of additional details, while all the other companies in the database might not have offices in multiple countries. In the JSON documents these details can be added only to [Company A]. As long as there is a way to retrieve that information from the document, these attributes need not be entered into the other documents with empty data
The week then continues with a deep dive into JSON Arrays, Dictionaries, Sub Documents, and the JSON Spec
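
A small Python sketch of the company example follows. The field names (name, offices, country) and the country values are my own assumptions for illustration; only [Company A] carries the extra office details, and the other documents simply omit the field.

```python
# Hypothetical company documents; only Company A stores office details.
companies = [
    {"name": "Company A",
     "offices": [{"country": "India"}, {"country": "Germany"}]},
    {"name": "Company B"},
    {"name": "Company C"},
]

# dict.get() supplies an empty list when the field is absent, so the
# documents without offices need no placeholder data.
multi_country = [
    company["name"]
    for company in companies
    if len(company.get("offices", [])) > 1
]

print(multi_country)
```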
Introduction to class project : Building a Blog site

Blog comparison with respect to relational


By comparison, in MongoDB all of the above entities live in one single JSON document



Introduction to Schema Design
To Embed or not to Embed:
Looking at a document in the posts collection, let's say we have a tags array and a comments array. We could decide to keep them in separate documents; however, the rule of thumb is that if the data is typically accessed together, then we should put it together
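
Here is a sketch of what an embedded post document could look like, written as a Python dict. The field names and values are assumptions made for this example, not the exact schema used in the class project.

```python
# Hypothetical blog post with tags and comments embedded in one document.
post = {
    "title": "Mongo Learning Series 1",
    "body": "Notes from week 1",
    "tags": ["mongodb", "python"],
    "comments": [
        {"author": "reader1", "body": "Nice notes"},
        {"author": "reader2", "body": "Looking forward to week 2"},
    ],
}

# Everything needed to render the post page comes from this one document,
# so no second lookup (or join) is required for tags or comments.
tag_line = ", ".join(post["tags"])
comment_count = len(post["comments"])

print(tag_line)
print(comment_count)
```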

In MongoDB, documents cannot be larger than 16MB
If a document would end up being larger than 16MB, split the data into multiple documents
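
One possible way to split the data is sketched below in plain Python: the comments of a post are packed into several "bucket" documents, each kept under a size limit. The bucket layout, the function names and the post_id field are my own assumptions. Also note that the size is estimated from the JSON text, which is only a rough proxy, because MongoDB measures the BSON encoding.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # the 16MB document limit


def approx_size(doc):
    # JSON length is only a rough proxy; MongoDB measures BSON size.
    return len(json.dumps(doc).encode("utf-8"))


def split_comments(post_id, comments, limit=MAX_DOC_BYTES):
    """Pack comments into bucket documents that each stay under the limit.

    A single comment larger than the limit still gets its own bucket.
    """
    buckets = []
    current = {"post_id": post_id, "comments": []}
    for comment in comments:
        candidate = {"post_id": post_id,
                     "comments": current["comments"] + [comment]}
        if current["comments"] and approx_size(candidate) > limit:
            # The current bucket is full: close it and start a new one.
            buckets.append(current)
            current = {"post_id": post_id, "comments": [comment]}
        else:
            current = candidate
    buckets.append(current)
    return buckets


# Usage with a tiny limit so the split is visible on a small sample.
comments = [{"body": "x" * 40} for _ in range(5)]
buckets = split_comments("post-1", comments, limit=150)
print([len(bucket["comments"]) for bucket in buckets])
```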

The chapters that follow include chapters on Python, which I am not covering in detail in the blog because I want to concentrate mostly on Mongo
Python
Introduction
Lists
Slice Operator
Inclusion
Dicts
Dicts and Lists together
For loops
While loops
Function Calls
Exception handling
Bottle Framework
                URL Handlers
                Views
                Handling form Content

PyMongo Exception Processing
import sys
import pymongo
connection = pymongo.MongoClient("mongodb://localhost")
db = connection.test
users = db.users
doc = {'firstname':'Andrew', 'lastname':'Erlichson'}
print doc
print "about to insert the document"
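try:
    # insert() adds an _id field to doc in place
    users.insert(doc)
except:
    print "insert failed:", sys.exc_info()[0]
# doc is rebuilt here without an _id, so the second insert gets a fresh _id;
# re-inserting the first dict (which now has an _id) would raise DuplicateKeyError
doc = {'firstname':'Andrew', 'lastname':'Erlichson'}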
print doc
print "inserting again"
try:
    users.insert(doc)
except:
    print "second insert failed:", sys.exc_info()[0]

print doc



Please note: this is part 1 of a series of 6
Reference: All credit for the material goes to the course hosted by MongoDB
