Insert Example¶
This example shows you how to use Monary’s insert
method to send documents
to MongoDB.
Any value that can be queried can also be inserted. Both nested field insertion (via fields containing ”.”) and BSON value insertion are supported as well.
Purpose of Insert¶
Inserts allow you to use Monary to convert data from NumPy masked arrays into
documents stored in MongoDB. Monary’s insert takes in a list of
MonaryParams
.
Monary inserts can also be used to store intermediate data. This can be useful when doing operations on blocks of data with block query.
Setup¶
For this example, let’s insert some unprocessed documents representing students’
test scores into MongoDB. Please see the
MonaryParam example to understand how to
create a MonaryParam
.
First we need to connect to our local DB:
>>> import monary
>>> client = monary.Monary()
Next, we generate the documents. Note that we are using bson.encode
to
store our subdocument:
>>> import bson
>>> import random
>>> al_num = '0123456789abcdefghijklmnopqrstuvwxyz'
>>> scores = []
>>> ids = []
>>> names = []
>>> for _ in range(1000):
... ids.append("".join(al_num[random.randint(0, len(al_num)-1)]
... for _ in range(14)))
... score = {"midterm": random.randint(0, 1000) / 10,
... "final": random.randint(0, 1000) / 10}
... scores.append(bson.BSON.encode(score))
... names.append("...")
Now that we have generated documents, we need to construct a MonaryParam
.
MonaryParams
represent one column, i.e. one field, for a set of BSON documents.
We need the data itself to be in numpy’s masked_array type:
>>> import numpy as np
>>> max_length = max(map(len, scores))
>>> scores_ma = np.ma.masked_array(scores, np.zeros(1000), "<V%d"%max_length)
>>> ids_ma = np.ma.masked_array(ids, np.zeros(1000), "S14")
>>> names_ma = np.ma.masked_array(names, np.zeros(1000), "S3")
Now we can create a MonaryParam
:
>>> types = ["bson:%d"%max, "string:14", "string:3"]
>>> fields = ["scores", "student_id", "student_name"]
>>> values = [scores_ma, ids_ma, names_ma]
>>> params = monary.MonaryParam.from_lists(values, fields, types)
And we can insert it into the database “monary_students”, and the collection “raw”:
>>> client.insert("monary_students", "raw", params)
Using Monary Insert¶
The semester has ended, and it’s time to assign grades to each student. Let’s first get all the raw test data back into NumPy arrays with Monary:
>>> import numpy as np
>>> from monary import Monary
>>> m = Monary()
>>> ids, midterm, final = \
... m.query("monary_students", "raw", {},
... ["student_id",
... "test_scores.midterm",
... "test_scores.midterm"],
... ["string:14", "float64",
... "float64"])
Now we process the scores and assign grades to each student:
>>> grades = [None, None]
>>> for i, arr in enumerate([midterm, final]):
... # curve to average of 2.3333
... mean, stdev = arr.mean(), arr.std()
... grades[i] = (arr - mean) / stdev
... grades[i] += 2.3333
... # bound grades within [0.0, 4.0]
... fours = np.argwhere(grades[i] > 4.0)
... zeros = np.argwhere(grades[i] < 0.0)
... grades[i][fours] = 4.0
... grades[i][zeros] = 0.0
Now weight both tests and assign overall grades:
>>> overall_grades = (grades[0] * 0.4 + grades[1] * 0.6).round(3)
Then we need to create MonaryParams
:
>>> from monary import MonaryParam
>>> id_mp = MonaryParam(ids, "student_id", "string:14")
>>> overall_mp = MonaryParam(overall_grades, "grades.overall")
>>> midterm_mp = MonaryParam(grades[0], "grades.midterm")
>>> final_mp = MonaryParam(grades[1], "grades.final_exam")
Finally, we insert the results to the database:
>>> ids = m.insert("monary_students", "graded",
... [id_mp, overall_mp, midterm_mp, final_mp])
>>> from monary import mvoid_to_bson_id
>>> oids = list(map(mvoid_to_bson_id, ids))
>>> oids[0]
ObjectId('53dba51e61155374af671dc1')
We can see that insert returns a Numpy array containing the ObjectId of the inserted documents.