#mongodb logs for Thursday the 30th of October, 2014

[00:09:15] <scrandaddy> thanks!
[03:14:29] <GothAlice> iapain: Heh, thanks for the CTCP requests I just noticed in my logs. ^_^
[03:14:35] <GothAlice> Have a great night everybody!
[08:44:51] <crashev> Hello, around the net there are claims that mongodb loses data - is this still valid?
[08:49:39] <kali> it's never really been valid
[08:50:07] <crashev> there is really a lot of noise on the web about it
[08:50:10] <kali> yeah
[08:50:13] <kali> i'll elaborate
[08:50:15] <crashev> so i wonder what is going on
[08:50:16] <kali> let me make a draft :)
[08:51:03] <kali> mongodb allows you to aggressively pick speed over data persistence, and in some cases, it's a perfectly valid choice
[08:52:06] <kali> at some point in the past, it was more or less what the default driver settings led you to do. that changed two years ago, so the current default is biased towards persistence
[08:53:20] <kali> bottom line: if you used mongodb at that time without reading the documentation and without configuring anything, then yes, under hardware failure, you could show that write ops were lost
[08:53:42] <kali> is that... convincing somehow ? :)
[08:53:45] <chovy> i'm getting db.createUser() is not a function. i'm on 2.6.4
[09:14:33] <chovy> how do i create an admin user on a specific database, that has full privileges just for that one db?
[09:25:58] <adsf> is the standard mongo time milliseconds since epoch?
[09:41:27] <kali> adsf: yes
[09:45:28] <adsf> kali: thank you :)
[09:45:47] <adsf> now to get my convert on :p
[12:51:24] <drag0nius> what is the preferable way to store binary data in mongo?
[12:52:23] <drag0nius> i'm on pymongo, so i should just use bson.binary.Binary?
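
A minimal pymongo sketch of what drag0nius describes, wrapping raw bytes in bson.binary.Binary (the 2.x-era insert() is used here; collection and file names are made up). For payloads approaching the 16 MB document limit, GridFS is the usual alternative:

    from pymongo import MongoClient
    from bson.binary import Binary

    db = MongoClient().test  # hypothetical database

    # Wrap the raw bytes in Binary so arbitrary byte strings
    # round-trip as BSON binary data.
    with open('photo.jpg', 'rb') as f:
        db.files.insert({'name': 'photo.jpg', 'data': Binary(f.read())})

    doc = db.files.find_one({'name': 'photo.jpg'})
    raw = bytes(doc['data'])  # Binary subclasses the byte-string type
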
[12:57:35] <bcrockett> Greetings. I am trying to configure SSL, and have followed the instructions (..../tutorial/configure-ssl/), but I'm having a bit of trouble wrapping my mind around the implementation.
[12:58:20] <bcrockett> Compared to, say, nginx or apache, where you place the cert/key pair (plus intermediate if needed) in the configuration and you are off to the races
[13:00:02] <bcrockett> I can't, for the life of me, generate a self-signed certificate (from my CA) that makes both the server & client happy
[13:00:16] <goucha> hi everyone. First of all, I'm still a beginner with mongodb. I'm having some trouble building a query accessing child elements when there are many parents. I couldn't find documentation for this; is there anyone that could check my code, please? I have the json structure and an explanation, I can post it somewhere
[13:00:58] <bcrockett> nor do I understand why the client needs a certificate
[13:04:51] <goucha> the url for my problem is here: http://pastebin.com/Ga90juv3 thanks for having a look
[13:18:23] <joannac> goucha: um, if you want multiple conditions, db.foo.find({a:1, b:1})
[13:19:58] <joannac> collectionObj.find({ "_id" : "myObjectID", 'GRANDPARENTS.PARENTS.KIDS._id': 98758})
[13:22:28] <joannac> goucha: wait
[13:22:38] <goucha> joannac, I don't understand. I think what I need is different
[13:22:38] <joannac> goucha: what are you expecting to get as output?
[13:22:55] <goucha> the kid with the ID i ask for
[13:23:07] <joannac> can't be done with your schema
[13:23:25] <joannac> you can't select a specific array element more than one array deep
[13:23:33] <goucha> so.. does that mean I will always get the same object?
[13:23:47] <joannac> yes, you will get the same toplevel document
[13:23:54] <joannac> you can get a specific grandparent back
[13:24:04] <joannac> but you can't get a specific parent or child back
[13:24:18] <goucha> ok, that clarifies it. thank you
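
To illustrate joannac's point, a Python sketch against the schema from goucha's paste (field names taken from the example above, everything else hypothetical): the find matches on the nested _id but always returns the whole top-level document, so the specific kid has to be picked out client-side.

    from pymongo import MongoClient

    collection = MongoClient().test.foo  # hypothetical collection

    # Matches on the deeply nested _id, but returns the whole document...
    doc = collection.find_one(
        {'_id': 'myObjectID', 'GRANDPARENTS.PARENTS.KIDS._id': 98758})

    # ...so the kid is extracted in application code.
    kids = [kid
            for gp in doc['GRANDPARENTS']
            for parent in gp['PARENTS']
            for kid in parent['KIDS']
            if kid['_id'] == 98758]
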
[14:26:36] <GothAlice> joannac: A higher than average number of people over the last two days with exceedingly deep (and thus un-queryable) models. :/
[14:30:23] <nylon> lo
[14:31:28] <nylon> does anyone have a good reference video or docs that cover multiple map/reduce to combine 3-4 collections?
[14:32:42] <GothAlice> nylon: Sounds like you have a solution, but what's the problem you're trying to solve with that… complex map/reduce procedure? Why map/reduce and not aggregate queries (which will, in general, be faster).
[14:34:09] <GothAlice> http://docs.mongodb.org/v2.6/reference/operator/aggregation/out/#pipe._S_out is the aggregate pipeline command to output the results to a collection, which sounds like the latter part of your question.
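
A sketch of how $out might apply to nylon's scenario; the saleslines collection is named in the conversation, but the Quantity field and the output collection name are guesses:

    from pymongo import MongoClient

    db = MongoClient().sales  # hypothetical database

    # Roll sales lines up by product and persist the result; the $out
    # stage (2.6+) plays the role of map/reduce's output collection.
    db.saleslines.aggregate([
        {'$group': {'_id': '$ProductID', 'total': {'$sum': '$Quantity'}}},
        {'$out': 'sales_by_product'},
    ])
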
[14:35:28] <nylon> GothAlice: it's a master/details scenario where there are sales orders and sales lines, which reduce well in the correct hierarchy; however, the products collection has a key of ProductID which is located in saleslines but does not reduce well
[14:46:05] <nylon> GothAlice: do you have any recommendations for a person coming from relational databases to better understand which approach is better for simulating joins: Aggregation Framework OR MapReduce?
[14:47:15] <GothAlice> nylon: The question is difficult to answer because both approaches are effectively "fundamentally wrong"—MongoDB strongly tries to get you to really think about how your data is going to be used before jumping in to structure it, unlike SQL where you are strongly encouraged to throw things at the wall and see what sticks.
[14:47:22] <GothAlice> nylon: It always comes down to how you need to use the data.
[14:48:20] <GothAlice> nylon: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework is a tutorial that demonstrates this for a specific problem domain: gathering statistical data over time (in this case from ocean buoys). It includes measurements of the impact of different structures in terms of disk space used and query performance.
[14:49:58] <GothAlice> nylon: From my own data, which is a typical "forum" like phpBB, I have four entities: categories, forums, topics/threads, and replies. Because it's far more likely the application will be viewing a thread (and comments on that thread) than anything else, I literally store replies to a topic/thread within the topic/thread record. But note, categories and forums are external to topics/threads, and topics/threads reference (ObjectId field) the related data.
[14:50:43] <GothAlice> In order to get this related data, I manually (or automatically since MongoEngine does some of this for me) need to re-query the database, e.g. given a thread, to get the forum for it: db.forum.find({_id: thread.forum})
[14:50:50] <GothAlice> No need for map/reduce or aggregate queries at all.
[14:51:43] <the-gibson> Hi all, anyone here know how to get in contact with the rockmongo folks? looks like their website is down
[14:52:10] <GothAlice> the-gibson: Try iwind.liu@gmail.com
[14:52:35] <GothAlice> Note that he's in Beijing, so replies may be delayed depending on your timezone.
[14:53:55] <the-gibson> GothAlice: thanks
[15:01:21] <nylon> GothAlice: you perform this using aggregation?
[15:01:42] <GothAlice> nylon: Nope, those "joins" are 100% just additional .find() queries, nothing fancy whatsoever.
[15:04:03] <GothAlice> nylon: In fact, I fake "joins" by performing multiple queries then manually combining them into one super query. I.e. users = db.users.find({age: {$gte: 18}}) — find all "of age" users, posts = db.threads.find({creator: {$in: users}}) — find all posts by "of age" users.
[15:04:49] <GothAlice> When things in MongoDB start to seem like they're getting too complex, there's probably a simpler way.
[15:08:35] <GothAlice> In MongoEngine the above example query is best written: Thread.objects(creator__in=User.objects(age__gte=18).scalar('id')) ("scalar" here means "I only care about the user's ID, no other data at all")
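
One detail the shell pseudo-code above glosses over: the $in list must hold the users' _id values, not whole user documents. A pymongo sketch of the same two-query "join" (database and field names assumed):

    from pymongo import MongoClient

    db = MongoClient().forum  # hypothetical database

    # First query: just the _ids of all "of age" users.
    adult_ids = db.users.find({'age': {'$gte': 18}}).distinct('_id')

    # Second query: their threads, via $in on those ids.
    posts = db.threads.find({'creator': {'$in': adult_ids}})
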
[15:16:15] <ianp> is NoSQL an appropriate choice for a relationship-heavy datamodel ?
[15:16:40] <mike_edmr> ianp: based on your question alone, i would say no
[15:16:49] <mike_edmr> but really it's "it depends"
[15:16:50] <ianp> fair answer, it's too vague
[15:17:17] <ianp> I didn't understand GothAlice's comment of "unlike SQL where you are strongly encouraged to throw things at the wall and see what sticks."
[15:17:20] <GothAlice> ianp: It can be, if you can determine a good two-step breakout for it. The "storing replies within the thread document" approach to that relationship has pros and cons, but mostly pros. (You can still append replies, get specific replies, update specific nested replies, etc., etc.)
[15:17:29] <ianp> funny - that's how ALL development is done by most people where I work ;)
[15:17:57] <mike_edmr> choosing the best data store depends on your feature set and what the performance characteristics of the most heavily used operations are
[15:18:01] <ianp> also I usually let ORMs do my SQL/NoSQL querying when I can
[15:18:04] <GothAlice> ianp: SQL is a spreadsheet. That's it. (Sure, you can have rules and triggers and such that enforce constraints, but… it's still just a series of two dimensional grids of data.)
[15:18:24] <ianp> an RDBMS, yes
[15:18:41] <ianp> or you mean the query results themselves (they both fit that definition I guess)
[15:19:01] <ianp> whereas you can get hierarchical data back without a lot of ERD in a document based store
[15:19:15] <GothAlice> Treating MongoDB like a spreadsheet may lead to headaches, nausea, loss of appetite, and possibly the kicking of your dog. ;)
[15:19:23] <ianp> the problem is all that data in the hierarchy needs to be constrained etc; leveraging the fine-tuned things in the RDBMS can be positive
[15:19:26] <mike_edmr> for highly relational data where you can't predict what queries you are going to need to make down the road, and there's not a lot of need for high write throughput, sql is usually a better starting point
[15:19:39] <mike_edmr> for an app you plan to scale big with a predefined set of features
[15:19:48] <mike_edmr> and document types that lend themselves to those features
[15:19:57] <mike_edmr> mongo might be a far better choice
[15:20:05] <ianp> sounds like a good rule of thumb mike_edmr
[15:25:41] <GothAlice> Compare https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/asset/model/__init__.py#L58-L63 (MongoDB schema) vs. https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L110-L114 (the SQL version of the exact same thing… mostly)
[15:26:06] <GothAlice> https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/asset/model/__init__.py#L153-L182 is the "attach" (or "reattach") code from the MongoDB version. Maintaining the relationship information is relatively easy.
[15:26:19] <GothAlice> Vs. https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L172-L267 in SQL.
[15:26:21] <GothAlice> Which blows the mind.
[15:26:58] <GothAlice> It took two days of cooperation with a mathematician to get that stargate method stable.
[15:27:29] <GothAlice> (Well, stargate and the rest of it.)
[15:28:53] <GothAlice> I had the requirement of a hierarchy of documents with order preserved. In SQL I used a combination of nested sets (left/right) and an adjacency list (parent). This had the massive penalty of insertions updating potentially every record in the table. In MongoDB… I just store an ordered list of child IDs. Much, much easier.
[15:30:23] <GothAlice> The left/right structure let me query for "all descendants" (not just immediate children) and even count the number of descendants without performing an additional query ([right - left] / 2), so to do that in MongoDB I instead have a list of all parent IDs (from the root to immediate parent). I can query for the presence of an ID in that list and get a count to perform the same thing in MongoDB.
[15:41:39] <GothAlice> The above diatribe (heh; apologies) is a perfect example of why thinking about how you use your data will strongly affect how you model your data. (Breadcrumb navigation is trivial when you have a list of the parents already in the right order…) The naive approach of simply having a parent reference, plus the SQL classic of an "order" column (an integer to sort on), would make typical CMS uses of the data exceedingly difficult under MongoDB.
[15:45:19] <ddod> Is there a way to do a findandmodify to multiple documents at once?
[15:46:56] <GothAlice> ddod: Roughly, just with a multiple update. Getting the resulting records back would require a second query. findAndModify itself only operates on single documents, like findOne.
[15:47:52] <GothAlice> ddod: db.foo.update({query}, {update}, {multi: true}); db.foo.find({query})
[15:48:03] <ddod> GothAlice: Okay, thanks, I'm still a bit of a noob
[15:48:56] <GothAlice> ddod: If you're familiar with SQL, this page can be a useful resource: http://docs.mongodb.org/manual/reference/sql-comparison/#update-records
[16:19:09] <docdoak> how can I return the 3rd element of a sorted find() query?
[16:20:18] <cheeser> skip()
[16:20:47] <GothAlice> If you really only want the third, you'll want to add limit() to that, as well.
[16:21:39] <docdoak> I only want the third though
[16:22:04] <docdoak> for example: db.zips.find( { state: "NY"} ).sort( {pop: -1} )
[16:22:10] <docdoak> returns them in descending sort
[16:22:17] <docdoak> if I wanted the third most populated city, how would I do that?
[16:22:23] <docdoak> I'm not sure where to put skip/limit
[16:22:29] <GothAlice> docdoak: http://docs.mongodb.org/manual/reference/method/cursor.skip/
[16:22:39] <GothAlice> docdoak: http://docs.mongodb.org/manual/reference/method/cursor.limit/
[16:22:46] <docdoak> thanks alice
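
Putting cheeser's and GothAlice's answer together against the zips example, as a pymongo sketch (database name assumed): sort descending, skip the first two, take one.

    from pymongo import MongoClient

    db = MongoClient().test  # hypothetical database holding 'zips'

    # Third most populated NY city.
    third = list(db.zips.find({'state': 'NY'})
                        .sort('pop', -1)
                        .skip(2)
                        .limit(1))
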
[17:03:28] <Tug> Is $oid an internal operator ?
[17:04:04] <Tug> I'm using the profiler in MMS and it shows queries like: { "streamer.cid" : "5uJAPUrmwQK5qHZP1Ijkxxs2.D04kUK3y6CmJmhUz6gf1Rek2acfPbTVArln8j4FGfFE" , "site" : { "$oid" : "53bfe04dca9547a13b08fe10"}}
[17:05:03] <Tug> I tried to copy it in the mongo shell but it fails: "Can't canonicalize query: BadValue unknown operator: $oid"
[17:06:02] <cheeser> i hate that.
[17:06:22] <cheeser> convert that document to ObjectId("53bfe04dca9547a13b08fe10")
[17:07:33] <Tug> yeah I did that but then the query works as expected
[17:07:44] <Tug> and is not as slow as the profiler says
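
What MMS displays is MongoDB Extended JSON; besides hand-converting to ObjectId(...) as cheeser suggests, bson.json_util can parse the {"$oid": ...} wrapper directly:

    from bson import json_util

    # json_util.loads understands Extended JSON, turning {"$oid": ...}
    # back into a real ObjectId the server will accept.
    query = json_util.loads(
        '{"site": {"$oid": "53bfe04dca9547a13b08fe10"}}')
    # query == {'site': ObjectId('53bfe04dca9547a13b08fe10')}
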
[17:18:37] <synthosl> Hi, I'm not quite sure whether it's a bug or not... When I try to connect to mongo specifying `read_preference` like the docs describe (for example: 'PRIMARY') I get an error `ConfigurationError: Not a valid read preference`. At the same time, if I use this option in lower case, as in ReadPreference._mongos_modes, it works fine.
[17:19:14] <synthosl> For `PRIMARY_PREFERRED` I have the same issue. I.e. it is not converted to camel case from underscore case.
[17:19:29] <synthosl> from pymongo import MongoClient; c = MongoClient(host='mongodb://localhost:27017/mcp', read_preference='PRIMARY');
[17:19:37] <synthosl> Is it a bug?
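
A sketch of the likely workaround: pass the driver's ReadPreference constant rather than the string 'PRIMARY' (whether the string form should also be accepted is exactly synthosl's question):

    from pymongo import MongoClient, ReadPreference

    c = MongoClient('mongodb://localhost:27017/mcp',
                    read_preference=ReadPreference.PRIMARY)
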
[17:26:53] <docdoak> is there something like $unique?
[17:27:00] <docdoak> if I'm trying to count how many cities are in each state
[17:27:39] <docdoak> db.zips.aggregate( {$group: { _id: "$state", cities: { $unique: "$city" }}})
[17:27:41] <docdoak> something like that?
[17:29:26] <chovy> how do i create an admin user on a specific database, that has full privileges just for that one db?
[17:35:37] <GothAlice> docdoak: You could $group on $city, then re-group on $state.
[17:36:29] <docdoak2> gothalice: group on city then state?
[17:36:36] <docdoak2> but what if multiple states have the same city name?
[17:37:27] <docdoak2> also i only saw one line, so you may have typed more (i lost conn)
[17:37:33] <GothAlice> chovy: db.createUser({user: 'foo', pwd: '', roles: ["dbOwner"]})
[17:37:48] <GothAlice> chovy: http://docs.mongodb.org/manual/reference/built-in-roles/#dbOwner — dbOwner combines readWrite, dbAdmin, and userAdmin.
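
The driver-side equivalent, as a pymongo 2.x-era sketch; run it against the target database so the user is scoped to that database alone (the same effect as `use dbname` before db.createUser in the shell):

    from pymongo import MongoClient

    db = MongoClient().dbname  # the one database the user should own
    db.add_user('appadmin', 'a-real-password', roles=['dbOwner'])
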
[17:38:20] <GothAlice> docdoak2: Then $group on {state: "$state", city: "$city"}. :3
[17:38:34] <docdoak2> ok, i have already done that group
[17:38:40] <docdoak2> but how do I count the rows?
[17:39:23] <GothAlice> That'll group on unique combinations of state + city. $sum: 1
[17:39:38] <GothAlice> docdoak2: http://docs.mongodb.org/manual/reference/operator/aggregation/sum/#grp._S_sum
[17:40:37] <docdoak2> ok so I have db.zips.aggregate( { $group: { _id: {state: "$state", city: "$city"}}}, { $group: {_id: 1, count: {$sum: 1}}})
[17:40:39] <docdoak2> but thats giving me a count of all the unique ones
[17:41:21] <docdoak2> is there a way to make it by state?
[17:42:04] <GothAlice> What, exactly, are you trying to get? (Answer without resorting to any query semantics.)
[17:42:14] <docdoak2> how many unique cities there are in each state
[17:42:33] <docdoak2> NY: 200, NJ: 300
[17:42:34] <docdoak2> etc.
[17:42:48] <GothAlice> Oh, $group on $state and $sum: 1 that.
[17:42:49] <docdoak2> the data is all zip codes though, so there are multiple entries for each city
[17:43:50] <GothAlice> $group on state and city, $sum: 1 the unique cities, then $group on state and $sum the result of the first sum?
[17:45:09] <GothAlice> Even easier. {$group: {_id: {state: "$state", city: "$city"}}}, {$group: {_id: "$_id.state", cities: {$sum: 1}}}
[17:45:24] <GothAlice> First group finds unique cities, second group counts them by state.
[17:46:09] <docdoak2> fantastic alice
[17:46:12] <docdoak2> thats what i was stuck on
[17:46:17] <docdoak2> thank you!
[17:46:19] <GothAlice> You'll get back pairs like: {_id: "NY", cities: 200}
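
The full pipeline as it might look from pymongo (database name assumed; on the 2.x drivers of the time aggregate() returns a {'result': [...]} document, while later drivers return a cursor):

    from pymongo import MongoClient

    db = MongoClient().test  # hypothetical database holding 'zips'

    pipeline = [
        {'$group': {'_id': {'state': '$state', 'city': '$city'}}},
        {'$group': {'_id': '$_id.state', 'cities': {'$sum': 1}}},
    ]
    for row in db.zips.aggregate(pipeline)['result']:
        print(row)  # e.g. {'_id': 'NY', 'cities': 200}
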
[17:46:25] <docdoak2> yeah, it worked
[17:46:28] <GothAlice> Yeah.
[17:46:29] <docdoak2> and it makes sense to me now
[17:46:36] <docdoak2> though I'm still pretty terrible with this mongodb thing
[17:46:38] <docdoak2> haha
[17:47:50] <GothAlice> Every time I come out of a meeting I feel like I've suffered brain damage.
[17:48:51] <docdoak2> I'm a substitute teacher
[17:49:00] <docdoak2> in the classroom right now
[17:49:09] <docdoak2> I'd rather have just gotten out of a meeting :-P
[17:53:33] <GothAlice> Heh; thinking about it, I realized exactly what skill goes out the window. After grinding away through a meeting, my ability to reduce problems (and descriptions of solutions) to digestible components for non-developers has been completely sapped. I always end up writing more complicated code after one, and it takes about an hour to return to normal. XD
[17:57:04] <ssarah> i connected to a mongod and i rs.initiated() by mistake
[17:57:06] <ssarah> how do i revert it?
[17:57:08] <ssarah> =)
[17:59:30] <decompiled> I think you have to re-setup the rs.conf stuff
[17:59:43] <GothAlice> ssarah: Hmm. That's an interesting problem. Have you tried rs.reconfig({}) ?
[18:02:56] <GothAlice> ssarah: Seems to be even easier: http://serverfault.com/questions/424465/how-to-reset-mongodb-replica-set-settings
[18:07:19] <ssarah> thanks, GothAlice
[18:09:15] <ssarah> by the time i saw your solution i had already deleted the dbpaths -_-
[18:11:26] <chovy> GothAlice: thanks, that user will be restricted to only the database in use? with `use dbname`?
[18:12:32] <GothAlice> chovy: Yes.
[18:13:14] <lhh> hey guys, have a question about concurrent DB writes. I have two scripts I want to run, one that imports data from an excel sheet and one that grabs data from an API. All of this data is being saved in the same db. if I run both of these processes simultaneously, what will happen with the concurrent writes?
[18:13:48] <lhh> am i going to be at risk of corrupting my database or running into some other issue?
[18:22:08] <GothAlice> lhh: MongoDB interleaves (using a lock) write operations on a per-database level.
[18:22:27] <GothAlice> lhh: So no, you won't damage your data, but it won't be able to do anything in literal "parallel".
[18:23:01] <drag0nius> erm, i'm getting "pymongo.errors.OperationFailure: The dollar ($) prefixed field '$push' in '$push' is not valid for storage."
[18:27:03] <lhh> GothAlice: awesome, thanks a lot. the writes themselves are pretty quick, it's the data processing that's time intensive. sounds like I should be good to run them simultaneously and the writes will just queue up
[19:05:37] <GothAlice> drag0nius: I'd need to see the whole query you are attempting to run. Sounds like you're trying to push a value into a field named "$push", which is bad.
[19:05:56] <drag0nius> i resolved this
[19:06:08] <drag0nius> problem was i was adding another plain field in the update document next to '$push'
[19:06:24] <drag0nius> {last_activity: ..., $push: {...}}
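
A sketch of the corrected update: plain fields cannot sit beside $push at the top level of an update document, so they go under $set (collection and field names guessed from the conversation):

    from datetime import datetime
    from pymongo import MongoClient

    db = MongoClient().test  # hypothetical database

    db.accounts.update(
        {'_id': 1},
        {'$set': {'last_activity': datetime.utcnow()},
         '$push': {'log': {'date': datetime.utcnow()}}})
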
[19:08:01] <drag0nius> how could i request a document where there are less than 10 entries in 'log' array matching a query?
[19:08:21] <cheeser> $size?
[19:08:39] <drag0nius> entries like: {'log':[{date: 0}, {date: 1}, {date: 2}]}
[19:09:24] <GothAlice> db.foo.find({log: {$size: {$lt: 10}}, "log.somefield": criteria})
[19:09:31] <drag0nius> and i want to retrieve those entries where there are less than 10 entries with date higher than 1
[19:09:48] <drag0nius> ok
[19:09:51] <drag0nius> let's try
[19:10:05] <GothAlice> In that case, "log.somefield": criteria turns into "log.date": {$gt: 1}
[19:10:27] <drag0nius> ok thanks
[19:10:48] <drag0nius> i was afraid i'd need some aggregations and whatnot
[19:12:35] <drag0nius> how do i represent datetime in queries?
[19:12:54] <drag0nius> in js queries, in python i'm using like datetime.utcnow()
[19:13:16] <GothAlice> Hmm, well, depending on the interpretation of what you want, you may still need an aggregate query. "retrieve entries where there are less than 10 log entries with a date higher than 1" — "record containing X log entries with no more than 10 having a date higher than 1" vs. "record containing no more than 10 entries, at least one of which is higher than 1".
[19:13:49] <GothAlice> (.find() will accommodate the latter, not the former)
[19:14:24] <GothAlice> drag0nius: In either you pass in the native date representation. In Python that's datetime instances, in JS that's Date instances.
[19:14:30] <drag0nius> k
[19:14:38] <drag0nius> basically what i want
[19:14:42] <GothAlice> (Or better yet, ISODate.)
[19:15:14] <drag0nius> i have an accounts pool (document), and i'm throttling requests to the server, so i want to get the ones that made fewer than X requests in, let's say, the last hour
[19:16:08] <GothAlice> drag0nius: For that to be as efficient as possible, you may wish to pre-aggregate your data. Depends on if you want the number of requests to reset every hour on the hour (as an example) or be a literal "X requests in the last 60 minutes" type of affair.
[19:17:02] <drag0nius> i want the literal part
[19:17:33] <GothAlice> One simple way to do that is to have a dedicated collection to store activity data. On each request, insert (with a w=0 write concern to save time) {user: foo, expires: datetime.utcnow() + timedelta(hours=1)}, and keep an auto-expiring index on "expires". Then to count activity you literally: db.activity.find({user: foo}).count()
[19:18:07] <GothAlice> The auto-expiring index (which is only minute accurate) will automatically cull old activity from the collection. :)
[19:20:33] <GothAlice> https://github.com/bravecollective/core/blob/develop/brave/core/account/model.py#L240-L263 — the method described above is how we track login attempts
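
The pattern GothAlice describes, as a pymongo 2.x-era sketch with illustrative names; the TTL monitor only runs about once a minute, hence "minute accurate":

    from datetime import datetime, timedelta
    from pymongo import MongoClient

    db = MongoClient().test  # hypothetical database

    # One-time setup: expireAfterSeconds=0 culls each document once
    # its 'expires' time has passed.
    db.activity.ensure_index('expires', expireAfterSeconds=0)

    # On every request, fire-and-forget (w=0) an activity record.
    db.activity.insert({'user': 'foo',
                        'expires': datetime.utcnow() + timedelta(hours=1)},
                       w=0)

    # "Requests in the last hour" is then just a count.
    recent = db.activity.find({'user': 'foo'}).count()
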
[19:22:24] <drag0nius> $size does not support $gt/$lt operators
[19:22:45] <GothAlice> Mmm, looks like a good use for $or, then. >:3
[19:22:54] <GothAlice> (Actually a terrible use for $or… but it should work.)
[20:13:10] <drag0nius> meh
[20:13:28] <drag0nius> $size: {$not:...} doesn't work either?
[20:14:04] <drag0nius> nvm
[20:14:06] <drag0nius> wrong order
[20:32:19] <drag0nius> that 'log': $size..., log.date... you suggested doesn't work at all
[20:32:31] <drag0nius> log.date does not limit size
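
Two working alternatives, since $size only accepts an exact number (pymongo sketch, collection name guessed). For a plain "fewer than 10 elements" test, asking whether a 10th element exists avoids $size entirely; counting only the entries that match a condition needs an aggregation, which as written drops documents with zero matching entries:

    from pymongo import MongoClient

    db = MongoClient().test  # hypothetical database

    # Fewer than 10 log entries in total: no element at index 9.
    short_logs = db.accounts.find({'log.9': {'$exists': False}})

    # Fewer than 10 log entries with date > 1: count them per document.
    pipeline = [
        {'$unwind': '$log'},
        {'$match': {'log.date': {'$gt': 1}}},
        {'$group': {'_id': '$_id', 'n': {'$sum': 1}}},
        {'$match': {'n': {'$lt': 10}}},
    ]
    result = db.accounts.aggregate(pipeline)
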
[21:23:05] <drag0nius> is DBRef anything but a data structure?
[21:23:19] <drag0nius> i don't see any dereferencing or anything else for queries
[21:24:13] <cheeser> you don't see what now?
[21:26:23] <drag0nius> automatically turning the reference into an instance
[21:27:56] <joannac> yes, it's just a data structure
[21:28:15] <joannac> you need to roll your own, or I think Mongoose has stuff to auto-populate references
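
In pymongo the manual dereference is one extra query; the driver also ships a small helper, Database.dereference(). A sketch with illustrative names:

    from bson import ObjectId
    from bson.dbref import DBRef
    from pymongo import MongoClient

    db = MongoClient().test  # hypothetical database

    # Normally read off a stored document; constructed here for the sketch.
    ref = DBRef('users', ObjectId())

    # Either let the driver resolve it...
    user = db.dereference(ref)
    # ...or do the extra find_one by hand:
    user = db[ref.collection].find_one({'_id': ref.id})
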
[21:34:33] <Terabyte> hey
[21:35:09] <Terabyte> i have a DBObject, and a pojo whose fields map directly onto the contents of DBObject, is there a 1 liner to turn the dbobject into my pojo?