Week 4: Performance
Indexes
Database performance is driven by indexes, in MongoDB as in any
other database.
A database stores its data in large files on disk, which
represent the collection. The documents are in no particular order on
disk; any document could be anywhere. When you query for a particular document, what
the database has to do by default is scan through the entire collection
to find the data. This is called a table scan in a relational DB and a
collection scan in MongoDB, and it is death to performance: it will be extremely
slow. Instead, the data is indexed so that queries perform better.
How does indexing work:
If something is ordered/sorted then it is quick to find the
data. MongoDB keeps the keys ordered.
It does not keep them in a simple linear list, but in a
BTree. When looking for an item, MongoDB looks up the key in the index, which has a
pointer to the document, and uses that pointer to retrieve the document.
In MongoDB an index is an ordered list of keys
Example:
(name, Hair_Color, DOB)
In order to utilize an index, you have to give it a left-most
set of items
As in provide: name
or name and hair color
rather than just DOB
Every time data is inserted into the database,
the index also needs to be updated, and updating takes time. Reads are faster,
but writes take longer when you have an index.
Let's say we have an index on (a,b,c)
If a query is done on b, index cannot be used
If a query is done on a, index can be used
If a query is done on c, index cannot be used
If a query is done on a,b:
index can be used, it uses 2 parts of the index
If a query is done on a,c:
index can be used, it uses just the a part and ignores the c part
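The left-most rule in the list above can be sketched in plain JavaScript. This is only a toy model for equality queries, not MongoDB's query planner, and the function name usableIndexPrefix is made up for illustration.

```javascript
// Toy model of the left-most prefix rule; not MongoDB's actual planner.
// Returns the leading index keys that a query on queryFields can use.
function usableIndexPrefix(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const key of indexKeys) {
    if (!wanted.has(key)) break; // stop at the first index key the query does not supply
    prefix.push(key);
  }
  return prefix;
}

const index = ["a", "b", "c"];
console.log(usableIndexPrefix(index, ["b"]));      // empty prefix: index cannot be used
console.log(usableIndexPrefix(index, ["a", "b"])); // uses the a and b parts
console.log(usableIndexPrefix(index, ["a", "c"])); // uses only the a part, c is ignored
```

An empty result means the index cannot help with that query at all.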
Creating Indexes
db.students.ensureIndex({student_id:1})
db.students.ensureIndex({student_id:1,class:-1}) – Compound
index
Negative indicates descending. Ascending vs descending
does not make a big difference when you are searching, but it makes a huge
difference when you are sorting. If the database is to use the index for the sort,
the index needs to be in the right order.
You can also make it a 3-part index.
Discovering Indexes
db.system.indexes.find() - will give all the indexes in the database.
db.students.getIndexes() - will give all the indexes in the
given collection.
db.students.dropIndex({student_id:1}) - will delete/drop
the index
MultiKey Indexes
In MongoDB a key can hold an array
tags: ["cycling","tennis","football"]
ensureIndex({tags:1})
When you index a key which is an array, a MultiKey index is
created.
Rather than creating one index point per document, when
MongoDB sees an array while building an index it creates an index point for
every item in the array.
MongoDB also lets you create a compound index that includes an array key.
However, Mongo does not allow 2 indexed keys to both be arrays
in the same document. A compound index on 2 arrays is restricted.
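As a rough illustration of the index points described above, here is a toy model in plain JavaScript. It is not MongoDB's storage format; the function name indexPoints and the sample document are assumptions made up for this sketch.

```javascript
// Toy model of multikey index points; not MongoDB's storage format.
// Returns one [key values..., docId] entry per index point for the given fields.
function indexPoints(doc, fields) {
  const arrayFields = fields.filter((f) => Array.isArray(doc[f]));
  if (arrayFields.length > 1) {
    // mirrors the restriction on indexing two array keys together
    throw new Error("cannot index parallel arrays: " + arrayFields.join(", "));
  }
  if (arrayFields.length === 0) {
    // no array involved: a single index point for the document
    return [fields.map((f) => doc[f]).concat([doc._id])];
  }
  const arrayField = arrayFields[0];
  // one index point per array element, with the other keys repeated
  return doc[arrayField].map((item) =>
    fields.map((f) => (f === arrayField ? item : doc[f])).concat([doc._id])
  );
}

// Hypothetical sample document
const person = { _id: 1, name: "Ann", tags: ["cycling", "tennis", "football"] };
console.log(indexPoints(person, ["tags"]));         // one point per tag
console.log(indexPoints(person, ["name", "tags"])); // compound: name repeated for each tag
```

Passing a document where both indexed fields are arrays makes the sketch throw, which is the toy equivalent of the restriction on compound indexes over 2 arrays.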
Indexes are not restricted to the top level alone.
An index can be created on sub
areas of the document as well
For example:
db.people.ensureIndex({'addresses.tag':1})
db.people.ensureIndex({'addresses.phones':1})
Index creation
Option, Unique
A unique index enforces a constraint that each key can only
appear once in the index
db.stuff.ensureIndex({'thing':1}, {unique:true})
Removing duplicates when creating unique indexes
db.stuff.ensureIndex({'thing':1}, {unique:true,
dropDups:true})
Adding dropDups will delete all duplicates. There is no
control over which document is deleted, hence it is important to exercise
caution before using this command
Index creation
Option, Sparse
A sparse index is useful when an index is created on a collection and more than one
document in the collection is missing the key
{a:1, b:1, c:1}
{a:2, b:2}
{a:3, b:3}
If a unique index is created on c:
The first document has c in it and hence is ok. For the second
document Mongo considers c to be null, and the third document also does not have
c and hence is null as well. Since c is null twice and unique is specified, this cannot be
allowed
In scenarios where the duplicates cannot be dropped, this is a
problem for a unique index. The sparse option solves it: a sparse index only
has entries for documents that contain the key, so the documents missing c do not conflict
Querying documents in the collection with a sparse index will
not change the result set
However, sorting on the sparse-indexed key results in a
result set which omits the documents without the key
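The interaction between unique and sparse can be sketched with the three documents above. This is a toy model in plain JavaScript that only mimics "a missing key is indexed as null"; the function name buildUniqueIndex is made up, and none of this is MongoDB code.

```javascript
// Toy model of a unique index with and without the sparse option.
function buildUniqueIndex(docs, field, sparse) {
  const seen = new Set();
  for (const doc of docs) {
    const hasKey = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasKey && sparse) continue; // sparse: documents without the key get no entry
    const key = hasKey ? doc[field] : null; // regular: a missing key is stored as null
    if (seen.has(key)) {
      throw new Error("duplicate key: " + String(key));
    }
    seen.add(key);
  }
  return seen;
}

const docs = [{ a: 1, b: 1, c: 1 }, { a: 2, b: 2 }, { a: 3, b: 3 }];

// Sparse and unique on c: only the first document is indexed, so no conflict.
console.log(buildUniqueIndex(docs, "c", true).size);

// Unique without sparse on c: the two missing values collide on null.
try {
  buildUniqueIndex(docs, "c", false);
} catch (err) {
  console.log(err.message);
}
```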
Indexes can be created in the foreground or in the background. Default: foreground.
When an index is created in the foreground it blocks all
writers
Foreground index builds are faster
When building an index with the background:true option, the build
is slower but does not block writers
In production systems that have other writers to the
database and do not use replica sets, creating indexes as background tasks is
mandatory so that the other writers are not blocked.
Using Explain
Important query metrics such as the index usage pattern,
execution speed and number of scanned documents can be identified by using the explain
command
Explain details:
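{
  "cursor" : "<Cursor Type and Index>",
  "isMultiKey" : <boolean>,
  "n" : <num>,
  "nscannedObjects" : <num>,
  "nscanned" : <num>,
  "nscannedObjectsAllPlans" : <num>,
  "nscannedAllPlans" : <num>,
  "scanAndOrder" : <boolean>,
  "indexOnly" : <boolean>,
  "nYields" : <num>,
  "nChunkSkips" : <num>,
  "millis" : <num>,
  "indexBounds" : { <index bounds> },
  "allPlans" : [
    {
      "cursor" : "<Cursor Type and Index>",
      "n" : <num>,
      "nscannedObjects" : <num>,
      "nscanned" : <num>,
      "indexBounds" : { <index bounds> }
    },
    ...
  ],
  "oldPlan" : {
    "cursor" : "<Cursor Type and Index>",
    "indexBounds" : { <index bounds> }
  },
  "server" : "<host:port>",
  "filterSet" : <boolean>
}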
Choosing an Index
How does MongoDB choose an Index
Let's say the collection has indexes on a, b and c
We will call them query plan 1 for a, 2 for b, and 3 for c
When we run the query for the first time, Mongo runs all
three query plans 1, 2 and 3 in parallel.
Let's say query plan 2 was the fastest and completed
processing first. Mongo will return the answer to the query and remember that it
should use that index for similar queries.
Every 100-odd queries it will forget what it knows and rerun the
experiment to see which one performs better.
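The behaviour described above can be sketched as a small toy model in plain JavaScript. The real optimizer is more involved; the class name PlanChooser, the fixed plan costs and the exact retry count of 100 are all assumptions made up for illustration.

```javascript
// Toy model of "try all plans, remember the winner, retry every so often".
class PlanChooser {
  constructor(plans, retryAfter) {
    this.plans = plans;           // { planName: function returning a cost }
    this.retryAfter = retryAfter; // forget the winner after this many uses
    this.winner = null;
    this.uses = 0;
    this.races = 0;
  }

  pick() {
    if (this.winner === null || this.uses >= this.retryAfter) {
      // run every candidate plan and remember the cheapest one
      this.races += 1;
      this.uses = 0;
      let best = null;
      for (const [name, cost] of Object.entries(this.plans)) {
        const c = cost();
        if (best === null || c < best.cost) best = { name, cost: c };
      }
      this.winner = best.name;
    }
    this.uses += 1;
    return this.winner;
  }
}

// Hypothetical costs: the plan using the index on b is cheapest.
const chooser = new PlanChooser(
  { "index a": () => 500, "index b": () => 20, "index c": () => 300 },
  100
);
for (let i = 0; i < 250; i++) chooser.pick();
console.log(chooser.winner, chooser.races);
```

In this sketch the plans are compared by a made-up cost number rather than actually raced in parallel, which keeps the example deterministic.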
How Large is your index
The index should be in memory. If the index is not in memory but
on disk, and we are using all of it, performance will suffer severely.
db.students.totalIndexSize() gives the total size of the indexes on the collection
Index Cardinality
Cardinality is a measure of the number of elements of a set
Here it means: how many index points there are for each type of index that
MongoDB supports
In a regular index, there is an index point for every key you put in
the index, and if a document has no key there is
an index point under the null entry, so you get 1:1 relative to the documents
In a sparse index, a document that is missing the key being
indexed is not in the index, because its key is
null and nulls are not kept in a sparse index. So here the
index cardinality will be less than or equal to the number of
documents
In a multikey index, an index on an array value, there are
multiple index points for each document. Hence the cardinality can be
more than the number of documents.
Index Selectivity
Being selective with indexes is very important, which is no
different from an RDBMS
Let's see an example of logging with operation codes
(OpCodes) such as Save, Open, Run, Put, Get
You can have an index on, let's say, (timestamp, OpCodes) or the
reverse (OpCodes, timestamp)
If you know the particular time you are interested in, then
(timestamp, OpCodes) makes the most sense, because the timestamp narrows the search quickly, while the
reverse could have to go through millions of records for a certain operation.
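The difference in selectivity can be made concrete with a small made-up log. This plain JavaScript sketch just counts how many entries are left after applying only the leading key of each index; the data and the time window are invented for illustration.

```javascript
// Build a hypothetical log: 1000 entries, five OpCodes used in rotation.
const opCodes = ["Save", "Open", "Run", "Put", "Get"];
const log = [];
for (let ts = 0; ts < 1000; ts++) {
  log.push({ timestamp: ts, opCode: opCodes[ts % opCodes.length] });
}

// Leading key = OpCode, as in (OpCodes, timestamp): one fifth of the log still matches.
const afterOpCode = log.filter((e) => e.opCode === "Save").length;

// Leading key = timestamp, as in (timestamp, OpCodes): only a short window matches.
const afterTimestamp = log.filter((e) => e.timestamp >= 100 && e.timestamp < 110).length;

console.log(afterOpCode, afterTimestamp);
```

The more entries the leading key eliminates, the less work is left for the rest of the query.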
Hinting an Index
Generally, MongoDB uses its own algorithm to choose an
index. However, if you want to tell MongoDB to use a particular index you can
do so by using the hint command
hint({a:1,b:1})
If you want MongoDB to not use an index and instead use a cursor
that goes through all the documents in the collection, then you can hint
$natural
hint({$natural:1})
Hinting in Pymongo example
Efficiency of Index Use
Searching on regexes like /abcd/ without
stemming (that is, not anchored to the start of the string), and on comparison operators such as $gt, $ne etc. can be very inefficient even
with indexes
In such cases, based on your knowledge of the collection, you can
hint the appropriate index to use rather than the default index chosen by
Mongo
Geo Spatial indexes
Allow you to find things based on location
2D and spherical
2D: Cartesian plane (x
and y coordinates)
For example, you want to know the closest stores to a person.
In order to search based on location, you will need to store
'location': [x,y]
Index the locations
ensureIndex({'location':'2d',type:1})
When querying, you can then use
find({location:{$near:[x,y]}}).limit(20)
The database will return the documents in order of increasing
distance.
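What a $near query with a limit returns on a flat plane can be sketched in plain JavaScript. This is a toy model that sorts every document by straight-line distance; a real 2d index avoids scanning everything. The function name nearest and the store data are made up.

```javascript
// Toy model of $near with limit on a 2D plane: sort by distance, keep the closest.
function nearest(docs, point, limit) {
  const [px, py] = point;
  return docs
    .map((doc) => {
      const [x, y] = doc.location;
      return { doc, distance: Math.hypot(x - px, y - py) };
    })
    .sort((a, b) => a.distance - b.distance) // increasing distance
    .slice(0, limit)
    .map((entry) => entry.doc);
}

// Hypothetical stores
const stores = [
  { name: "north", location: [0, 10] },
  { name: "corner", location: [1, 1] },
  { name: "east", location: [5, 0] },
];

console.log(nearest(stores, [0, 0], 2).map((s) => s.name));
```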
Geospatial Spherical
Spherical geospatial queries consider the curvature of the earth.
In the database the order of the x and y coordinates is
longitude, then latitude
db.runCommand({geoNear: 'stores', near: [50,50],
spherical: true, maxDistance: 1})
Here stores is the collection
It is queried with runCommand instead of the find
command
Logging slow queries
MongoDB automatically logs queries which are slow, that is, those that take longer than 100
ms.
Profiling
The profiler writes entries/documents to system.profile for queries which
are slow (slower than a specified time)
There are three levels for the profiler: 0, 1 and 2
0 off (the default)
1 log slow running queries
2 log all queries - more for debugging than for
performance
db.system.profile.find().pretty()
db.getProfilingLevel()
db.getProfilingStatus()
db.setProfilingLevel(1,4)
1 sets it to log slow running queries and 4 sets the threshold to 4
milliseconds
Write the query to look in the system profile collection for
all queries that took longer than one second, ordered by timestamp descending.
db.system.profile.find({millis:{$gt:1000}}).sort({ts:-1})
Mongostat
Mongostat is named after iostat from the Unix world, and is similar to
perfmon in Windows
Mongotop
Named after the Unix top command. It provides a
high-level view of where Mongo is spending its time.
Sharding
Sharding is the technique of splitting up a large collection amongst
multiple servers
MongoS is the router that lets you shard
The way Mongo shards is that you choose a shard key; let's
say student_id is the shard key.
As a developer you need to know that for inserts you will
also need to send the shard key, the entire shard key if it is a multi-part
shard key, in order for the insert to complete.
For an update, a remove or a find, if MongoS is not given
a shard key then it will have to broadcast the request to all the shards. If
you know the shard key, passing it will increase the performance of
the queries
MongoS is usually co-located with the application, and you
can have more than one MongoS
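The difference between a targeted request and a broadcast can be sketched in plain JavaScript. This is a toy model of routing on the student_id shard key; the chunk ranges, the shard names and the function name targetShards are all made up for illustration.

```javascript
// Hypothetical split of student_id ranges across three shards.
const chunks = [
  { shard: "shard0", min: 0, max: 1000 },
  { shard: "shard1", min: 1000, max: 2000 },
  { shard: "shard2", min: 2000, max: Infinity },
];

// Returns the shards a request has to be sent to.
function targetShards(query) {
  if (!Object.prototype.hasOwnProperty.call(query, "student_id")) {
    return chunks.map((c) => c.shard); // no shard key: broadcast to every shard
  }
  const id = query.student_id;
  // shard key present: only the shard owning that range is contacted
  return chunks.filter((c) => id >= c.min && id < c.max).map((c) => c.shard);
}

console.log(targetShards({ student_id: 1500 })); // targeted at a single shard
console.log(targetShards({ class: "math" }));    // broadcast to all shards
```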
How to get all the keys of a document
var message = db.messages.findOne();
for (var key in message) {
print(key);
}
Please note: this is part of a series of 6
Reference: All the material credit goes to the course hosted by Mongo