Tutorial

Prerequisites

To start, you need to have Monary installed. In a Python shell, the following should run without raising an exception:

>>> import monary

You’ll also need a MongoDB instance running on the default host and port (localhost:27017). If you have downloaded and installed MongoDB, you can start the mongo daemon in your system shell:

$ mongod

Making a Connection with Monary

To use Monary, we need to create a connection to a running mongod instance. We can make a new Monary object:

>>> from monary import Monary
>>> client = Monary()

This connects to the default host and port; it can also be specified explicitly:

>>> client = Monary("localhost", 27017)

Monary can also accept MongoDB URI strings:

>>> client = Monary("mongodb://me@password:test.example.net:2500/database?replicaSet=test&connectTimeoutMS=300000")

See also

The MongoDB connection string format for more information about how these URI’s are formatted.

Performing Finds

To perform a “find” query, we first need a data set. We can insert some sample data using the mongo shell:

$ mongo

Then, at the shell prompt, insert some sample documents into the collection named “coll”:

> use test
switched to db test
> for (var i = 0; i < 5000; i++) {
... db.coll.insert({ a : Math.random(), b : NumberInt(i) })
... }

To check that you’ve successfully inserted documents, you can run:

> db.coll.find()

Which will print out the first batch of documents. Each document looks something like this:

{
    a : 0.34534613435643535,
    b : 1
}

To query the database using Monary, you need to specify the name and type of a MongoDB document field.

For example, to retrieve all of the b values with Monary, we use the query() function and specify the field name we want and its type:

>>> with Monary() as m:
...     arrays = m.query("test", "coll", {}, ["b"], ["int32"])

arrays is now a list containing a NumPy masked array with 5000 values:

>>> arrays
[masked_array(data = [0 1 2 ..., 4997 4998 4999],
             mask = [False False False ..., False False False],
       fill_value = 999999)
]

We can also query for both the a and b fields together:

>>> with Monary("localhost") as m:
...     arrays = m.query("test", "coll", {}, ["a", "b"], ["float64", "int32"])
...
>>> arrays
[masked_array(data = [0.7288538725115359 0.4277338122483343 0.5252409593667835 ...,
 0.36620052182115614 0.2733050910755992 0.16910275584086776],
             mask = [False False False ..., False False False],
       fill_value = 1e+20)
, masked_array(data = [0 1 2 ..., 4997 4998 4999],
             mask = [False False False ..., False False False],
       fill_value = 999999)
]

arrays is now an array containing two masked arrays: one for a and one for b. The indicies of the two masked arrays correspond to the same document: for example, arrays[0][250] and arrays[1][250] correspond to the values of a and b in the 250th document:

>>> with Monary("localhost") as m:
...     arrays = m.query("test", "coll", {}, ["a", "b"], ["float64", "int32"])
...
>>> a = arrays[0][250]
>>> b = arrays[1][250]
>>> print a, b
0.653997767251 250

If we return to the mongo shell to check that our document matches, we can run:

> use test
> db.coll.find({"b":250})
{ "_id" : ObjectId("553e815e5d1bdb50241c0e41"), "a" : 0.6539977672509849, "b" : 250 }