Friday, May 23, 2014

Mongo Learning Series 2

Week 2: CRUD


CRUD
Operations
Mongo
SQL
Create
Insert
Insert
Read
Find
Select
Update
Update
Update
Delete
Remove
Delete

MongoDB does not use a separate query language
Secrets of the Mongo Shell
Mongo Shell is an interactive java script interpreter
The Mongo shell is designed to be similar to the bash shell. The key strokes are modeled after the emacs editor.
Up arrow brings the previous command
Ctrl A goes to first of the line, can also use the home key
Ctrl e  or Use the end key to the end of the line
Can also move around through the arrow keys or bash customary ctrl f, ctrl b
Type in help provides a list of topics to view
BSON Introduced
BSON stands for Binary JSON, is a serialization format designed to represent a super set of what can be transcribed in JSON format
MongoDB uses a binary representation to store the data in the database
Insert Docs
Db is a variable with a handle on the database
Collections as properties of the database
doc= {“name”:”smith”, “age”:30,”profession”:”hacker”}
db.people.insert(doc) – inserts the doc in to the database
db.people.find() – gets back all the documents in the people collection
“_id” – an object ID, when an document is inserted in to the database, every document needs a unique indentifier, _id is used for the same
_id is the primary key field, It is required to be present and the value in it is required to be unique, it is immutable. To change the value the document will have to be removed and added back with a different value, which would not be an atomic operation
_id : ObjectId (“50843730cb4cf4564b4671ce”)
Object Id is generated taken in to account the current time, identifier of the machine which is constructing the object , process Id of the process that is constructing the object id, a counter that is global to the process
db.people.insert({“name”:”smith”, “age”:30,”profession”:”hacker”})

Introduction to findOne()
findOne methods get one document out of random from the document collection
findOne first argument is a match criteria, analogous to the where clause
Second argument is to specify what fields to get from the database
If “_Id”:false is not explicitly stated then by default _id is always displayed

Introduction to find
First argument is where clause
db.scores.find({type:”essay”})
db.scores.find({student:19})
db.scores.find({type:”essay”,student:19}) – means that both the conditions have to match.
Second argument identifies the fields to get from the database
Querying using $gt, $lt
Db.scores.find({score: { $gt: 95}}) – will find score greater than 95
Db.scores.find({score: { $gt: 95,$lte: 98 }}) – will find score greater than 95 and less than or equal to 98

Inequalities on strings
The inequality operations $gt, $lt can also be applied to strings
db.people.find({name:{$lt:”D”}})
find the records which are lexicographically lesser than “D”
are sorted according to the total order of UTF 8 code units, lexicographically sorting of the bytes UTF 8 representation
MongoDB compares and sorts in an asciibetically correct fashion
All comparison operations in MongoDB are strongly typed and dynamically typed too


In the above figure, there is a document which has a number 42 for a name, please note that the query does not return that document in the result set
Using regex, exists, type
db.people.find({profession:{$exists:true} } )
will find the documents which has the field profession


db.people.find({profession:{$exists:false} } ) – return all documents in which the profession does not exist
db.people.find({name:{$type:2} } ) – type is represented as a number as specified in the specs – string being 2, this query will result in all documents which has name values of type string
patterns in string
Mongo supports PCRE (Perl Compatabile Regular Expression)library
db.people.find({name:{$regex:”a”} } ) – return the list of documents which has letter “a” in the name
db.people.find({name:{$regex:”e$”} } ) – return the list of documents which ends with letter “e” in the name
db.people.find({name:{$regex:”^A”} } ) – return the list of documents which starts with letter “A” in the name

Using $or
Grouping multiple documents together with a logical connective
Union of some documents
Find documents which names ended with an e or had a age
db.people.find({$or:[{name:{$regex:”e$”}},{age:{$exists:true}} ] } )
$or is a prefix operator
Using $and
Logical conjunction
Find only the documents which sorts after c and contains the letter “a” in it
db.people.find({ $and: [ { name:{$gt: “C”}}, {name:{$regex: “a” }} ] })
not used heavily because, there are simpler ways to query the same - db.people.find ({  name:{ $gt: “C”, $regex: “a” }}  ) will have the same result

Querying inside arrays

Query all documents that has the favorite as pretzels
Example: db.accounts.find({favorites: “pretzels” })
The querying is polymorphic, in the above example if the document had a field called favorites, which wasn’t an array it would check the value of the same for pretzels, and if the favorties happened top be an array as is in the above figure it looks at the elements of the array to find the value pretzels in the array
Using $in and $all
Query for more than one value in an array, say pretzels and beer in the above example
db.accounts.find({favorites: { $all: [“pretzels”,”beer”]  })
$in operator
db.accounts.find({name: { $in: [“Howard, “John””] } })

Dot Notation
Querying nested documents

The find query searches for the document byte by byte and performs an exact match search. When searching for subdocuments if you want to query as db.users.find({“email”: {“work”:richard@10gen.com,”personal”:kreuter@example.com}) you will find the document, however if the sub document is reversed db.users.find({“email”:”personal”:kreuter@example.com, {“work”:richard@10gen.com}) the result set will not find any document
Also, subsets of the sud document will not be abel to find a result
db.users.find({“email”:”personal”:kreuter@example.com)
will not find any  document
db.users.find({“email. work”:richard@10gen.com})

Querying Cursors
When you are using an interactive shell such as Mongo and you are executing a command such as db.people.find() in the background a cursor is being constructed and returned in the shell. Shell is configured to print out cursor by iterating through all of the elements that are retrieved from the cursor and printing out those elements
cur  = db.people.find(); null;
null;
Cursor object has a variety of methods
hasNext methods returns true if there is another document to visit on this cursor
next() method returns the next document
while (cur.hasNext()) printjson(cur.next()); -- prints out all the documents in the cursor
cur.limit(5) – imposes a limit of 5 records to iterate through
cur.sort( { name : -1 } ) – returns the sorted records in a lexicographically sorted in reverse for the name field
The sort and limit are NOT processed in memory, rather it is processed in the database engine.
cur.sort( { name : -1 } ).skip(2) – skips 2 records and return the rest

Counting results
db.scores.count ( { type : ”exam” } ) – gives the count of the result set

Wholesale updating of a document
db.people.update ( { name:”smith” } , { “name” : ”Thompson” , “salary”: 50000 }  ) – the first arguments acts as a where clause and the second argument the value to be replaced
Update is however a replacement method where in if you wanted to add one value, you will have to know all the other values and then add the new value.
using the $set command
db.people.update ( { name:”smith” } , { $set : { “age”: 50  }  } )
The above command will look to see if there is already a field called age, if so update the value to 50 else, it will create a field called age and store the value 50 against it.
If we wanted to increment a value then we can use a operator called $inc
db.people.update ( { name:”smith” } , { $inc : { “age”: 1  }  } )
Which in the above command will increment the age of smith by 1, $inc also sets the value if the field does not exist. For example if in the above sample smith did not have an age field in the document, the age will be set to the increment value, in this case 1

Using the $unset command
To remove a particular field from the document, you could use the update field have all the fields in the update command except the field that need to be removed, but is obviously very cumbersome.
db.people.update ( { name:”smith” } , { $unset : { “age”: 1  }  } )
The above command will remove the age field from the document with the name smith

Using $push, $pull, $pop, $pushAll, $pullAll, $addToSet
These operations are used to modify the arrays in a document
Lets use the following document as an example:
{ “_id”: 0, “a”: [ 1 , 2 , 3 , 4 ] }
db.arrays.update ( { _id : 0 } , { $set : {“a.2”: 5 } } )
will modify the third element in the array to 5
{ “_id”: 0, “a”: [ 1 , 2 , 5 , 4 ] }
To add an item in to the arry
db.arrays.update ( { _id : 0 } , { $push : {a: 6 } } )
will add 6 to the array
{ “_id”: 0, “a”: [ 1 , 2 , 5 , 4 , 6] }
db.arrays.update ( { _id : 0 } , { $pop : {a: 1 } } )
will remove the right most element of the array
{ “_id”: 0, “a”: [ 1 , 2 , 5 , 4 ] }
db.arrays.update ( { _id : 0 } , { $pop : {a: -1 } } )
specifying negative 1 will remove the left most element of the array
{ “_id”: 0, “a”: [  2 , 5 , 4 ] }
db.arrays.update ( { _id : 0 } , { $pushAll : {a: [7,8,9] } } )
adds all of the numbers to the specified array
{ “_id”: 0, “a”: [  2 , 5 , 4 ,7,8,9 ] }
db.arrays.update ( { _id : 0 } , { $pull : {a: 5 } } )
will remove the specified value from the array
{ “_id”: 0, “a”: [  2 ,  4 , 7 , 8 , 9 ] }

db.arrays.update ( { _id : 0 } , { $pullAll : {a: [2, 4, 8 ] } } )
will remove all the specified values from the array
{ “_id”: 0, “a”: [7 , 9 ] }

db.arrays.update ( { _id : 0 } , { $addToSet : {a: 5 } } )
will add values to the array by checking if it already exists in the array, if it does then it does nothing. Use addToSet if you want to ensure that duplicates aren’t stored in the array
{ “_id”: 0, “a”: [  5 , 7 ,  9 ] }

Upserts
db.people.update({name:”George”}, {$set :{age: 40} } , {upsert: true} )
This command checks to see if there exists a record matching the criteria and updates that record and if that matching criteria doesn’t exist it creates a new document with the given criteria

Multi-Update
db.people.update({}, {$set :{title: “Dr”} } , {multi: true} )
The update can be effectively applied to multiple documents that match a particular criteria. If the update has to be applied to all the documents , then the first argument can be {}. The above statement will be applied to every document in the collection. Please note that the statement multi:true is very important. If that is not provided then only the first document matching the criteria will be updated
Default behavior in contrast with SQL is to update only one document matching the criteria, unless the second argument multi to true is set.
Inside of mongodb there is a single thread for each operation that is executed. The multi update operation will happen sequentially in a single thread. However, every write operation that effects more than one document is carefully coded in a cooperative multi-tasking fashion to occasionally yield control to allow other operations to affect the same data set.
Lets say we had 10 documents in the collection, above multi update operation could update 3 documents, pause and let other operations, update another 5 documents pause and then complete all the updates.  It is a mutex lock. The concurrency consequences of this is that write operation even if they effect multiple documents are NOT isolated transactions. MongoDB guarantees individual document update is always atomic with respect to any concurrent reads or writes

Removing Data
Deleting documents from the collection.
db.people.remove(  {name:”Alice”}  )
The first argument is analogous to the where/find clause to find the document to remove, if nothing is provided, the command removes all the documents in the collection.
The above command removes the document with the name Alice
db.people.remove(  {name:  {$gt:  “M” } }  )
Documents with names like William or Thompson whose name falls in the given category are deleted
db.people.remove(  )
removes all documents
db.people.drop ()
removes all the documents. Drop is much faster. Remove requires a one by one update of an internal state for each document that was part of the collection. Drop includes freeing up much larger data structure inside of the databases data file.  However, collections can have some meta data in them for instance indexes, which gets discarded when the collection gets dropped, but remains when the documents are removed. Remove command are not atomic isolated transaction with respect to other readers or writers.
getLastError
_id is a primary key, immutable and unique.
db.people.insert(  {name:”Alice”}  ), if this command is run twice via the shell, on the second run the shell will give an duplicate key error message given that the name Alice doesn’t exist in the database the first time around.
getLastError return the outcome of the last write operation that that sent on the connection that is currently being used.
db.runCommand ( {getLastError : 1 } )

getLastError is available in the shell and in the drivers.  getLastError although called error writes the status of the previously run command, even if it was successful.

1 in the above diagram returned an error after running an insert with the same _id twice.
2 in the above diagram ran a successful insert and hence, n=0
3 in the above diagram ran an update successfully with a the details of the update in the getLastError command, please note the n=2, which means that because multi = true was given the update was done on 2 documents.
This command is also particularly useful in the upsert scenario

In the above diagram, you can see that the update was not performed on existing document, rather created a new document.
In a nut shell the command clarifies whether the last performed operation succeeded or failed.


The chapters that follow includes chapters on PyMongo, using Pymongo to find, find_one, cursors, field selection, $gt, $lt, regex, Dot Notation, Sort, Skip, Limit, Inserting, Updating, Upserts, find_and_modify



Please Note : This is a series of 6 
Reference: All the material credit goes to the course hosted by Mongo

No comments: