[08:51:03] <kali> mongodb allows you to aggressively pick speed over data persistence, and in some cases, it's a perfectly valid choice
[08:52:06] <kali> at some point in the past, it was more or less what the default driver settings led you to do. that changed two years ago, so the current default is more biased towards persistence
[08:53:20] <kali> bottom line was: if you used mongodb at that time without reading the documentation and without configuring anything, then yes, under the stress of a hardware failure, you could show that write ops were lost
[08:53:42] <kali> is that... convincing somehow ? :)
[08:53:45] <chovy> i'm getting db.createUser() is not a function. i'm on 2.6.4
[09:14:33] <chovy> how do i create an admin user on a specific database, that has full privileges just for that one db?
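A hedged sketch of one way to do this in pymongo 2.x terms (the shell equivalent is db.createUser with a dbOwner role); database name and credentials here are hypothetical:

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["mydb"]  # hypothetical db

    # dbOwner = full privileges, scoped to this one database only
    # (add_user wraps the server's createUser command in pymongo 2.x)
    db.add_user("appadmin", "s3cret", roles=["dbOwner"])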
[09:25:58] <adsf> is the standard mongo time milliseconds since epoch?
[12:51:24] <drag0nius> what is the preferable way to store binary data in mongo?
[12:52:23] <drag0nius> i'm on pymongo, so i should just use bson.binary.Binary?
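A minimal sketch of exactly that approach (file and collection names hypothetical); note the 16 MB document cap, beyond which GridFS is the usual answer:

    from bson.binary import Binary
    from pymongo import MongoClient

    db = MongoClient().test  # hypothetical database

    with open("avatar.png", "rb") as f:
        payload = Binary(f.read())  # wrap raw bytes as a BSON binary value

    db.files.insert({"name": "avatar.png", "data": payload})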
[12:57:35] <bcrockett> Greetings. I am trying to configure SSL, and have followed the instructions (..../tutorial/configure-ssl/), but I'm having a bit of trouble wrapping my mind around the implementation.
[12:58:20] <bcrockett> Compared to, say, nginx or apache, where you place the cert/key pair (plus intermediate if needed) in the configuration and you are off to the races
[13:00:02] <bcrockett> I can't, for the life of me, generate a self-signed certificate (from my CA) that makes both the server & client happy
[13:00:16] <goucha> hi everyone. First of all, I'm still a beginner with mongodb. I'm having some trouble building a query accessing child elements when there are many parents. I couldn't find documentation for this; is there anyone that could check my code, please? I have the json structure and an explanation, I can post it somewhere
[13:00:58] <bcrockett> nor do I understand why the client needs a certificate
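For the driver side of an SSL setup, a hedged pymongo sketch (paths hypothetical); the client usually needs its own certificate only when the server is configured with a CA file and set to validate peer certificates:

    import ssl
    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://db.example.com:27017",
        ssl=True,
        ssl_certfile="/etc/ssl/mongo-client.pem",  # this client's cert+key (PEM)
        ssl_ca_certs="/etc/ssl/ca.pem",            # CA used to verify the server
        ssl_cert_reqs=ssl.CERT_REQUIRED,           # require a valid server cert
    )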
[13:04:51] <goucha> the url for my problem is here: http://pastebin.com/Ga90juv3 thanks for having a look
[13:18:23] <joannac> goucha: um, if you want multiple conditions, db.foo.find({a:1, b:1})
[14:31:28] <nylon> does anyone have a good reference video or docs that cover multiple map/reduce to combine 3-4 collections?
[14:32:42] <GothAlice> nylon: Sounds like you have a solution, but what's the problem you're trying to solve with that… complex map/reduce procedure? Why map/reduce and not aggregate queries (which will, in general, be faster).
[14:34:09] <GothAlice> http://docs.mongodb.org/v2.6/reference/operator/aggregation/out/#pipe._S_out is the aggregate pipeline command to output the results to a collection, which sounds like the latter part of your question.
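In pymongo terms, a minimal $out pipeline might look like this (database, collection, and field names all hypothetical):

    from pymongo import MongoClient

    db = MongoClient().sales  # hypothetical database

    # group sales lines per product and materialize the result as a collection
    db.saleslines.aggregate([
        {"$group": {"_id": "$product_id", "total": {"$sum": "$amount"}}},
        {"$out": "product_totals"},  # $out writes the pipeline output to product_totals
    ])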
[14:35:28] <nylon> GothAlice: it's a master/details scenario: there are sales orders and sales lines, which reduce well in the correct hierarchy; however, the products collection has a key of ProductID which is located in saleslines but does not reduce well
[14:46:05] <nylon> GothAlice: do you have any recommendations for a person coming from relational databases to better understand which approach is better for simulating joins: Aggregation Framework OR MapReduce ?
[14:47:15] <GothAlice> nylon: The question is difficult to answer because both approaches are effectively "fundamentally wrong"—MongoDB strongly tries to get you to really think about how your data is going to be used before jumping in to structure it, unlike SQL where you are strongly encouraged to throw things at the wall and see what sticks.
[14:47:22] <GothAlice> nylon: It always comes down to how you need to use the data.
[14:48:20] <GothAlice> nylon: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework is a tutorial that demonstrates this for a specific problem domain: gathering of statistical data over time (in this case from ocean buoys). It includes measurements of the impact of different structures in terms of disk space used and query performance.
[14:49:58] <GothAlice> nylon: From my own data, which is a typical "forum" like phpBB, I have four entities: categories, forums, topics/threads, and replies. Because it's far more likely the application will be viewing a thread (and comments on that thread) than anything else, I literally store replies to a topic/thread within the topic/thread record. But note, categories and forums are external to topics/threads, and topics/threads reference (via an ObjectId field) the forum they belong to.
[14:50:43] <GothAlice> In order to get this related data, I manually (or automatically since MongoEngine does some of this for me) need to re-query the database, e.g. given a thread, to get the forum for it: db.forum.find({_id: thread.forum})
[14:50:50] <GothAlice> No need for map/reduce or aggregate queries at all.
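A sketch of that layout and the follow-up query in pymongo terms (database and field names hypothetical; forum_id is an assumed ObjectId):

    from pymongo import MongoClient

    db = MongoClient().forumdb  # hypothetical database

    # a topic/thread embeds its replies, but *references* its forum
    db.threads.insert({
        "forum": forum_id,   # ObjectId pointing into the forums collection
        "title": "Welcome",
        "replies": [          # embedded: loaded with the thread in a single read
            {"author": "alice", "message": "First post!"},
        ],
    })

    # the manual "join": one extra find_one to pull the related forum
    thread = db.threads.find_one({"title": "Welcome"})
    forum = db.forum.find_one({"_id": thread["forum"]})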
[14:51:43] <the-gibson> Hi all, anyone here know how to get in contact with the rockmongo folks? looks like their website is down
[15:01:21] <nylon> GothAlice: you perform this using aggregation?
[15:01:42] <GothAlice> nylon: Nope, those "joins" are 100% just additional .find() queries, nothing fancy whatsoever.
[15:04:03] <GothAlice> nylon: In fact, I fake "joins" by performing multiple queries then manually combining them into one super query. I.e. users = db.users.find({age: {$gte: 18}}) — find all "of age" users, posts = db.threads.find({creator: {$in: users}}) — find all posts by "of age" users.
[15:04:49] <GothAlice> When things in MongoDB start to seem like they're getting too complex, there's probably a simpler way.
[15:08:35] <GothAlice> In MongoEngine the above example query is best written: Thread.objects(creator__in=User.objects(age__gte=18).scalar('id')) ("scalar" here means "I only care about the user's ID, no other data at all")
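In raw pymongo, the same two-step "join" might look like the sketch below; note the $in list needs the user IDs, not whole user documents, which is what scalar('id') accomplishes above (names hypothetical):

    from pymongo import MongoClient

    db = MongoClient().forumdb  # hypothetical database

    # step 1: fetch only the _ids of "of age" users
    adult_ids = [u["_id"] for u in db.users.find({"age": {"$gte": 18}}, {"_id": 1})]

    # step 2: feed those IDs into the second query
    posts = db.threads.find({"creator": {"$in": adult_ids}})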
[15:16:15] <ianp> is NoSQL an appropriate choice for a relationship-heavy datamodel ?
[15:16:40] <mike_edmr> ianp: based on your question alone, i would say no
[15:16:49] <mike_edmr> but really it's "it depends"
[15:17:17] <ianp> I didn't understand GothAlice's comment of "unlike SQL where you are strongly encouraged to throw things at the wall and see what sticks."
[15:17:20] <GothAlice> ianp: It can be, if you can determine a good two-step breakout for it. The "storing replies within the thread document" approach to that relationship has pros and cons, but mostly pros. (You can still append replies, get specific replies, update specific nested replies, etc., etc.)
[15:17:29] <ianp> funny - that's how ALL development is done by most people where I work ;)
[15:17:57] <mike_edmr> choosing the best data store depends on your feature set and what the performance characteristics of the most heavily used operations are
[15:18:01] <ianp> also I usually let ORMs do my SQL/NoSQL querying when I can
[15:18:04] <GothAlice> ianp: SQL is a spreadsheet. That's it. (Sure, you can have rules and triggers and such that enforce constraints, but… it's still just a series of two dimensional grids of data.)
[15:18:41] <ianp> or you mean the query results themselves (they both fit that definition I guess)
[15:19:01] <ianp> whereas you can get hierarchical data back without a lot of ERD in a document based store
[15:19:15] <GothAlice> Treating MongoDB like a spreadsheet may lead to headaches, nausea, loss of appetite, and possibly the kicking of your dog. ;)
[15:19:23] <ianp> the problem is all that data in the hierarchy needs to be constrained etc; leveraging the fine-tuned things in the RDBMS can be positive
[15:19:26] <mike_edmr> for highly relational data where you can't predict what queries you are going to need to make down the road, and not a lot of need for high write throughput, sql is usually a better starting point
[15:19:39] <mike_edmr> for an app you plan to scale big with a predefined set of features
[15:19:48] <mike_edmr> and document types that lend themselves to those features
[15:19:57] <mike_edmr> mongo might be a far better choice
[15:20:05] <ianp> sounds like a good rule of thumb mike_edmr
[15:25:41] <GothAlice> Compare https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/asset/model/__init__.py#L58-L63 (MongoDB schema) vs. https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L110-L114 (the SQL version of the exact same thing… mostly)
[15:26:06] <GothAlice> https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/asset/model/__init__.py#L153-L182 is the "attach" (or "reattach") code from the MongoDB version. Maintaining the relationship information is relatively easy.
[15:26:19] <GothAlice> Vs. https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L172-L267 in SQL.
[15:26:58] <GothAlice> It took two days of cooperation with a mathematician to get that stargate method stable.
[15:27:29] <GothAlice> (Well, stargate and the rest of it.)
[15:28:53] <GothAlice> I had the requirement of a hierarchy of documents with order preserved. In SQL I used a combination of nested sets (left/right) and an adjacency list (parent). This had the massive penalty of insertions updating potentially every record in the table. In MongoDB… I just store an ordered list of child IDs. Much, much easier.
[15:30:23] <GothAlice> The left/right structure let me query for "all descendants" (not just immediate children) and even count the number of descendants without performing an additional query ([right - left - 1] / 2). To do the same in MongoDB, I instead keep a list of all parent IDs (from the root to the immediate parent); I can query for the presence of an ID in that list, and count the matches, to get the same result.
[15:41:39] <GothAlice> The above diatribe (heh; apologies) is a perfect example of why thinking about how you use your data will strongly affect how you model your data. (Breadcrumb navigation is trivial when you have a list of the parents already in the right order…) The naive approach of simply having a parent reference, plus the SQL classic of an "order" column (an integer to sort on), would make using the data in typical CMS ways exceedingly difficult.
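As a document, the structure described above might look like this (db is an assumed pymongo Database handle; the *_id values are assumed ObjectIds):

    node = {
        "_id": page_id,
        "parents": [root_id, section_id],  # full ancestor trail, root first
        "children": [child_a, child_b],    # ordered list of direct child IDs
    }

    # all descendants of section_id, at any depth, plus a free count
    descendants = db.assets.find({"parents": section_id})
    total = descendants.count()
    # breadcrumb navigation is just node["parents"], already in order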
[15:45:19] <ddod> Is there a way to do a findAndModify to multiple documents at once?
[15:46:56] <GothAlice> ddod: Roughly, just with a multiple update. Getting the resulting records back would require a second query. findAndModify itself only operates on single documents, like findOne.
[15:48:03] <ddod> GothAlice: Okay, thanks, I'm still a bit of a noob
[15:48:56] <GothAlice> ddod: If you're familiar with SQL, this page can be a useful resource: http://docs.mongodb.org/manual/reference/sql-comparison/#update-records
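A rough pymongo sketch of that two-step pattern (collection and field names hypothetical):

    # db: a pymongo Database handle (assumed)

    # modify every matching document in one operation (pymongo 2.x style)...
    db.jobs.update({"state": "queued"},
                   {"$set": {"state": "running"}},
                   multi=True)

    # ...then issue a second query to read back the modified documents
    running = list(db.jobs.find({"state": "running"}))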
[16:19:09] <docdoak> how can I return the 3rd element of a sorted find() query?
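One common way, assuming an ascending sort on a hypothetical field: skip two, take one.

    # db: a pymongo Database handle (assumed)
    third = list(db.scores.find().sort("score", 1).skip(2).limit(1))
    doc = third[0] if third else None  # None if fewer than 3 documents exist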
[17:06:22] <cheeser> convert that document to ObjectId("53bfe04dca9547a13b08fe10")
[17:07:33] <Tug> yeah I did that but then the query works as expected
[17:07:44] <Tug> and is not as slow as the profiler says
[17:18:37] <synthosl> Hi, I'm not quite sure whether this is a bug or not... When I try to connect to mongo specifying `read_preference` as described in the docs (for example: 'PRIMARY'), I get an error: `ConfigurationError: Not a valid read preference`. At the same time, if I use the option in lower case, as in ReadPreference._mongos_modes, it works fine.
[17:19:14] <synthosl> For `PRIMARY_PREFERRED` I have the same issue. I.e. it is not converted to camel case from underscore case.
[17:19:29] <synthosl> from pymongo import MongoClient; c = MongoClient(host='mongodb://localhost:27017/mcp', read_preference='PRIMARY');
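Using the constant rather than the string sidesteps the parsing issue; a sketch against that era's pymongo:

    from pymongo import MongoClient, ReadPreference

    # pass the constant, not the string 'PRIMARY'; in the URI itself the
    # camelCase spelling works: ...?readPreference=primaryPreferred
    c = MongoClient('mongodb://localhost:27017/mcp',
                    read_preference=ReadPreference.PRIMARY_PREFERRED)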
[17:49:09] <docdoak2> I'd rather have just gotten out of a meeting :-P
[17:53:33] <GothAlice> Heh; thinking about it, I realized exactly what skill goes out the window. After grinding away through a meeting, my ability to reduce problems (and descriptions of solutions) to digestible components for non-developers has been completely sapped. I always end up writing more complicated code after one, and it takes about an hour to return to normal. XD
[17:57:04] <ssarah> i connected to a mongod and ran rs.initiate() by mistake
[18:13:14] <lhh> hey guys, have a question about concurrent DB writes. I have two scripts I want to run, one that imports data from an excel sheet and one that grabs data from an API. All of this data is being saved in the same db. if I run both of these processes simultaneously, what will happen with the concurrent writes?
[18:13:48] <lhh> am i going to be at risk of corrupting my database or running into some other issue?
[18:22:08] <GothAlice> lhh: MongoDB interleaves (using a lock) write operations on a per-database level.
[18:22:27] <GothAlice> lhh: So no, you won't damage your data, but the writes won't actually run in parallel.
[18:23:01] <drag0nius> erm, i'm getting "pymongo.errors.OperationFailure: The dollar ($) prefixed field '$push' in '$push' is not valid for storage."
[18:27:03] <lhh> GothAlice: awesome, thanks a lot. the writes themselves are pretty quick, it's the data processing that's time intensive. sounds like I should be good to run them simultaneously and the writes will just queue up
[19:05:37] <GothAlice> drag0nius: I'd need to see the whole query you are attempting to run. Sounds like you're trying to push a value into a field named "$push", which is bad.
[19:10:48] <drag0nius> i was afraid i'd need some aggregations and whatnot
[19:12:35] <drag0nius> how do i represent datetime in queries?
[19:12:54] <drag0nius> in js queries, i mean. in python i'm using something like datetime.utcnow()
[19:13:16] <GothAlice> Hmm, well, depending on the interpretation of what you want, you may still need an aggregate query. "retrieve entries where there are less than 10 log entries with a date higher than 1" — "record containing X log entries with no more than 10 having a date higher than 1" vs. "record containing no more than 10 entries, at least one of which is higher than 1".
[19:13:49] <GothAlice> (.find() will accommodate the latter, not the former)
[19:14:24] <GothAlice> drag0nius: In either you pass in the native date representation. In Python that's datetime instances, in JS that's Date instances.
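For example, a time-window query with a native datetime (collection and field names hypothetical):

    from datetime import datetime, timedelta

    # db: a pymongo Database handle (assumed)
    hour_ago = datetime.utcnow() - timedelta(hours=1)
    recent = db.requests.find({"when": {"$gte": hour_ago}})
    # the driver serializes datetime instances to BSON dates automatically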
[19:15:14] <drag0nius> i have an accounts pool (documents), and i'm throttling requests to the server, so i want to get the ones that made fewer than X requests in, say, the last hour
[19:16:08] <GothAlice> drag0nius: For that to be as efficient as possible, you may wish to pre-aggregate your data. Depends on if you want the number of requests to reset every hour on the hour (as an example) or be a literal "X requests in the last 60 minutes" type of affair.
[19:17:33] <GothAlice> One simple way to do that is to have a dedicated collection to store activity data. On each request, insert (with a w=0 write concern to save time) a {user: foo, expires: datetime.utcnow() + timedelta(hours=1)} and an auto-expiring index on "expires". Then to count activity you literally: db.activity.find({user: foo}).count()
[19:18:07] <GothAlice> The auto-expiring index (which is only minute accurate) will automatically cull old activity from the collection. :)
[19:20:33] <GothAlice> https://github.com/bravecollective/core/blob/develop/brave/core/account/model.py#L240-L263 — the method described above is how we track login attempts
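A minimal pymongo sketch of that pattern (collection name and user_id hypothetical):

    from datetime import datetime, timedelta

    # db: a pymongo Database handle (assumed)

    # TTL index: documents are culled (to the minute) once `expires` passes
    db.activity.ensure_index("expires", expireAfterSeconds=0)

    # record one request; w=0 skips waiting for acknowledgement
    db.activity.insert({"user": user_id,
                        "expires": datetime.utcnow() + timedelta(hours=1)}, w=0)

    # "requests in the last hour" is simply whatever hasn't expired yet
    count = db.activity.find({"user": user_id}).count()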
[19:22:24] <drag0nius> $size does not support $gt/$lt operators
[19:22:45] <GothAlice> Mmm, looks like a good use for $or, then. >:3
[19:22:54] <GothAlice> (Actually a terrible use for $or… but it should work.)
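The $or workaround spelled out for "fewer than 3" (ugly but functional, as noted), plus an alternative positional trick; "requests" is a hypothetical array field:

    # db: a pymongo Database handle (assumed)

    # $size only matches exact lengths, so "fewer than 3" becomes an $or of sizes
    db.accounts.find({"$or": [{"requests": {"$size": n}} for n in range(3)]})

    # alternative: "requests.2" only exists when the array has 3+ elements
    # (this variant also matches documents missing the field entirely)
    db.accounts.find({"requests.2": {"$exists": False}})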
[21:35:09] <Terabyte> i have a DBObject, and a pojo whose fields map directly onto the contents of the DBObject. is there a one-liner to turn the dbobject into my pojo?