#mongodb logs for Saturday the 12th of January, 2013

[03:06:56] <mrapple> I'm a bit confused, what's the difference (is there even one) between these two queries?
[03:07:00] <mrapple> db.foo.find({"views.user" : "Test", "views.unread" : 0})
[03:07:03] <mrapple> db.foo.find({"views" : {"$elemMatch" : {"user" : "Test", "unread" : 0}}})
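
A minimal shell sketch of the difference, using a hypothetical document (not from the log): with plain dot notation, each condition may be satisfied by a different array element, while $elemMatch requires a single element to satisfy all of them.

    // hypothetical sample document
    db.foo.insert({views: [{user: "Test", unread: 1}, {user: "Other", unread: 0}]})

    // matches: "views.user" is satisfied by the first element,
    // "views.unread" by the second
    db.foo.find({"views.user": "Test", "views.unread": 0})

    // no match: no single element has both user: "Test" and unread: 0
    db.foo.find({"views": {"$elemMatch": {"user": "Test", "unread": 0}}})

For documents where one element satisfies both conditions at once, the two queries return the same result.
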
[10:08:50] <phira> if I know I'm going to restart a replicaset member, ie for a system update or something, is it good practice to remove it from the RS first?
[10:08:55] <phira> or doesn't it really matter?
[10:24:24] <NodeX> doesn't really matter
[10:24:32] <NodeX> it will resync when it comes back
[10:41:06] <kali> +1, just restart it; the less you mess with the RS config, the less likely you are to make a mistake with it
[10:41:52] <NodeX> kali : you were correct about that TV series the other day, sorry - I forgot to reply
[10:42:02] <NodeX> it's cancelled already though :(
[10:42:16] <kali> ha :)
[10:42:19] <NodeX> a new one aired in USA last night called "Banshee" - it's pretty good too
[10:42:30] <kali> cancelled after the pilot ?
[10:42:30] <NodeX> from the writer of "True Blood" apparently
[10:42:34] <NodeX> yeh :(
[10:42:43] <NodeX> status = "Pilot not picked up"
[10:43:23] <NodeX> quick RS question .... is there any way to have a secondary not put indexes into RAM?
[10:43:52] <NodeX> I need a secondary purely as a backup and never intend to switch to it - just want the files
[10:44:39] <kali> well, mongodb does not do anything clever to control the way the RAM is used. it delegates everything to the kernel, so the quick answer is no
[10:45:00] <kali> but anyway, your replicas need to update the indexes to keep up with the primary. and for that, you need them in RAM anyway
[10:45:20] <kali> Alan Ball indeed
[10:45:43] <NodeX> no way I can connect to the shell on the secondary and drop the indexes without it messing things up?
[10:46:15] <kali> mmm i think you can do that :)
[10:46:40] <NodeX> ooo nice
[10:46:43] <kali> but if you need a failover, you'll have to rebuild the index first
[10:46:53] <NodeX> Yer
[10:47:30] <kali> if it does not work just with slaveOk(), I'm pretty sure it will work if you stop the instance, restart it without the replset option on a different port, drop the indexes, and restart with the normal setup
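
A sketch of that procedure, with hypothetical paths, ports, and names (the replica is assumed to normally run on port 27017 as part of replica set rs0, with a collection foo in database mydb):

    # stop the secondary, then restart it standalone on another port
    mongod --dbpath /data/db --port 37017        # note: no --replSet option

    # connect to the standalone instance and drop the indexes
    mongo --port 37017
    > use mydb
    > db.foo.dropIndexes()    // drops all indexes except the one on _id

    # shut down again, then restart with the normal replica set config
    mongod --dbpath /data/db --port 27017 --replSet rs0
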
[10:48:02] <NodeX> aggregation is killing my primary CPU so I have to move it to another box
[10:48:40] <NodeX> but that box also has a mongo instance on it so I need a low level instance that doesn't use much RAM
[11:28:18] <NodeX> anyone ever used ovh ?
[11:40:13] <elias_> Anyone aware of a comparison between MongoDB and Cassandra with some examples and a detailed explanation of why one is better than the other in certain circumstances? We are in the final round of a DB evaluation for a project and need to decide between the two. Briefly, within our team Cassandra is considered better with regard to non-functional requirements such as high availability, clustering etc, while MongoDB is considered better with regard to ease of use during development, data modeling and query flexibility. Since we need both, it's hard to make the decision. Any hint from your side or references to external material would be greatly appreciated.
[11:43:00] <NodeX> each comparison is meaningless as it's probably not related to your specific app/requirements
[11:43:42] <NodeX> perhaps explain what you're using it for / the types of queries and we can advise what Mongo will be like in terms of performance and scalability
[11:49:32] <elias_> Yes, I understand that. That's why I am interested in examples rather than a strict feature comparison
[11:50:20] <elias_> it's hard to go into example use cases, but some of our objects seem to have a depth of 4 or more levels
[11:50:50] <elias_> which seems to be trouble in cassandra - not blocking though
[11:51:48] <elias_> maybe let me start with a fundamental question on demineralization and its implications
[11:52:04] <elias_> say you have Twitter with a user having 100,000 followers
[11:52:56] <elias_> if you store tweets into a user_timeline collection, a tweet from this user will be copied 100,000 times
[11:54:02] <unsleep> I have a tag list for a suggestion box... the tags are in an indexed array.... if I have the word "word" I can find it by searching tag:"word".... how do I find it by looking for "w" only?
[11:54:27] <NodeX> regex
[11:54:29] <unsleep> the only way to do it is with a regexp? I don't see that as very appropriate
[11:54:39] <NodeX> tag : /^w/
[11:55:08] <unsleep> it's the... "common" way to do it?
[11:55:19] <NodeX> if you want all "w's" yes
[11:55:28] <unsleep> ;)
[11:55:48] <NodeX> you'll need an index on it and also the caret prefix (^), else it won't use the index and will be very slow
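
A sketch in the shell, with a hypothetical collection name:

    db.suggestions.ensureIndex({tag: 1})

    // anchored prefix regex: can walk the index on tag
    db.suggestions.find({tag: /^w/})

    // unanchored regex: has to scan everything, very slow
    db.suggestions.find({tag: /w/})
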
[11:56:30] <NodeX> elias_ : is there more to the question or is that it?
[11:56:37] <elias_> yes ...
[11:56:53] <NodeX> ok lol, wasn't sure if you wanted me to extrapolate from that
[11:56:59] <elias_> I meant denormalization, not demineralization ...
[11:57:01] <NodeX> me/us
[11:57:17] <elias_> so, in the twitter example, a user_timeline collection seems to be a good idea in terms of performance
[11:57:40] <elias_> because the timeline is kept readily available
[11:57:52] <unsleep> so doing it like that, I need to parse the word first.. don't I?
[11:57:57] <elias_> but for users with a lot of followers it can create so many replicas
[11:58:22] <NodeX> with a twitter app I wouldn't do it like that tbh
[11:58:26] <NodeX> (personaly)
[11:58:59] <NodeX> looking up those 100k followers will be messy and unperformant so that needs to be in a graph DB probably
[11:59:43] <elias_> any alternative to suggest - not for twitter per se - but for similar problem
[11:59:53] <elias_> how would you do it in mongo though
[11:59:56] <NodeX> Mongo currently has a 16MB per-document limit, and in order to keep things fast you would need to store the followers + a little info about them maybe (twitter name or w/e) - this may exceed the doc size limit
[12:00:20] <unsleep> I prefer to do 10 queries rather than 10 copies...
[12:00:27] <NodeX> which would mean you need to shard the user's followers
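
One common way to do that sharding is bucketing: splitting the follower list across several documents so no single one approaches the 16MB limit. A hypothetical sketch:

    // one bucket document per (say) 1000 followers
    db.followers.insert({
        user: "celebrity",
        bucket: 0,
        followers: [{name: "alice"}, {name: "bob"} /* ... up to 1000 */]
    })

    // fetch all of a user's followers, bucket by bucket
    db.followers.find({user: "celebrity"}).sort({bucket: 1})
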
[12:00:40] <NodeX> unsleep : that's bad performance
[12:00:42] <NodeX> very bad
[12:01:36] <unsleep> creating content on the disk again and again is better?
[12:01:36] <NodeX> elias_ : to be honest the best way will depend on your access pattern
[12:01:46] <elias_> sorry I need to go - will leave this open though to see any of your answers
[12:01:51] <elias_> thanks a lot anyway
[12:01:54] <NodeX> unsleep : yes because it's only done 100k times
[12:02:08] <NodeX> 10 queries could be done 100k times a day forever
[12:02:27] <unsleep> but if the content is always fresh....
[12:02:39] <NodeX> eh?
[12:02:53] <unsleep> these chat lines are always new
[12:02:54] <NodeX> I am not advocating either of these ways
[12:03:09] <NodeX> but 10 queries is bad performance and will always be slower
[12:03:43] <unsleep> 100 followers = 100 copies
[12:03:53] <unsleep> 10 guys seeing = 10 queries
[12:04:15] <NodeX> then another 100 queries to view the tweet
[12:04:32] <unsleep> we match xD
[12:04:39] <NodeX> as I said, I am not advocating either way
[12:05:18] <unsleep> writing to disk is much more expensive, isn't it? (I'm not entirely sure, but I think writes use the disk while queries are in memory)
[12:05:39] <NodeX> the write is not relevant because it (can be) fire and forget
[12:05:48] <NodeX> and/or trickling through a queue
[12:05:58] <NodeX> (which is also in memory)
[12:07:27] <unsleep> what I do is write the "username" into a tag array in the content and then search it with find
[12:07:42] <unsleep> rather than writing the content to all the users
[12:07:53] <unsleep> but I'm a newbie
[12:08:21] <NodeX> ^^ the normalised way
[12:08:35] <NodeX> which is not as performant as writing the content to each user
[12:08:49] <NodeX> but does save disk space
[12:09:41] <unsleep> but that looks like some relational sharded-MySQL model
[12:11:13] <NodeX> it's far from relational
[12:12:29] <kali> it groups data that is needed at the same time at the same place, which is the opposite of what a relational schema would do
[12:13:21] <NodeX> exactly, so why are you calling it relational?
[12:13:49] <NodeX> de-normalised data (mostly) scales better than normalised structures
[12:14:02] <NodeX> but it adds the headache of taking up a lot more space
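
The two models being debated, sketched in the shell with hypothetical collections:

    // normalised: one copy of the tweet, tagged with who should see it
    db.tweets.insert({author: "celeb", text: "hello", audience: ["alice", "bob"]})
    db.tweets.find({audience: "alice"})     // one indexed query per reader

    // denormalised: the tweet is copied into each follower's timeline
    db.timelines.update({user: "alice"},
                        {$push: {tweets: {author: "celeb", text: "hello"}}},
                        {upsert: true})
    db.timelines.find({user: "alice"})      // a single document read per reader
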
[12:14:03] <kali> phewww... grim news today
[12:14:12] <NodeX> grim news?
[12:14:17] <NodeX> the TV show?
[12:14:39] <kali> nope
[12:16:02] <kali> today's news... French forces at war in Mali, three Kurdish activists killed in Paris this week, one French hostage probably killed in Somalia during his attempted rescue, about one million religious zealots marching in Paris tomorrow against gay marriage
[12:16:07] <kali> + the usual economic crap
[12:22:10] <NodeX> :/
[12:30:00] <kali> omg
[12:30:05] <kali> peggy is zoey bartlet
[12:35:42] <NodeX> lol
[12:35:47] <NodeX> west wing :P
[12:36:39] <kali> and mad men
[12:40:05] <NodeX> not seen mad men
[16:16:55] <warrick> I have never used mongodb, but find the api meets my needs very well. However, I am unsure whether mongodb is suited to my use case.
[16:17:37] <warrick> can mongodb maintain 1000-1500 upserts/second?
[16:18:27] <warrick> small document size (< 1KB), and two daily collections, which will be cycled out of mongodb
[16:25:03] <ron> warrick: most likely.
[16:28:20] <warrick> ron: thanks
[16:28:44] <warrick> yeah, mainly I am concerned that it won't sustain the 1000+ upserts/second
[16:29:15] <warrick> I'm not seeing promising performance with the prototype, only 30 upserts/second
[16:29:23] <warrick> something must be terribly wrong
[16:30:47] <ron> well, without more details it would be difficult to say.
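
For what it's worth, two usual suspects for upsert throughput that low are a missing index on the field the upsert matches on (every upsert then does a full collection scan) and a driver configured to wait for acknowledgement on every write. A hypothetical sketch of the indexed case:

    // without this index, each upsert scans the whole collection
    db.metrics.ensureIndex({key: 1})

    db.metrics.update({key: "2013-01-12:pageviews"},
                      {$inc: {count: 1}},
                      {upsert: true})
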
[17:16:30] <manuelbieh> hey guys
[17:16:44] <manuelbieh> i'm trying to find a user from a users collection
[17:16:47] <manuelbieh> using $or
[17:17:15] <manuelbieh> but I don't find any users, although there definitely is a user with the email I use
[17:17:20] <manuelbieh> db.users.find({email: 'info@manuelbieh.dee', $or: [{email: 'info@manuelbieh.de'}]})
[17:17:28] <manuelbieh> finds nothing
[17:18:04] <manuelbieh> why is that?
[17:18:20] <manuelbieh> db.users.find({email: 'info@manuelbieh.de'}) works fine
[17:19:40] <manuelbieh> ah
[17:19:41] <manuelbieh> got it
[17:20:01] <manuelbieh> db.users.find({$or: [{email: 'foo'}, {email: 'bar'}]})
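
The reason the first query matched nothing: conditions outside $or are ANDed with it, and the top-level email had a typo ('.dee'), so no document could satisfy both parts.

    // requires email == 'info@manuelbieh.dee' AND the $or clause: impossible
    // for a single email field
    db.users.find({email: 'info@manuelbieh.dee', $or: [{email: 'info@manuelbieh.de'}]})

    // the corrected form: $or alone, one branch per alternative
    db.users.find({$or: [{email: 'foo'}, {email: 'bar'}]})
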
[17:47:27] <Mortah> hullo. We just added an index (background: true) to a replica set. The primary server added it fine; the secondaries have got 20% through (and still going), but all queries to the secondaries have suddenly started hanging
[17:47:29] <Mortah> any ideas?
[17:48:24] <nemothekid> What would be the best way to drop an entire collection on a single shard without affecting the other shards? I don't think a remove is possible either
[17:49:39] <kali> Mortah: yes. it's a bug
[17:49:49] <kali> it is fixed in 2.2.2 i think
[17:50:01] <Mortah> oh
[17:51:33] <kali> Mortah: https://jira.mongodb.org/browse/SERVER-7501
[17:52:01] <Mortah> ick
[17:53:02] <Mortah> that looks likely..
[17:53:16] <kali> Mortah: you can work around it by building the index offline: stop a replica, start it without the replset option and on a different port, build the index manually, then set the replica back in line
[17:53:20] <Mortah> (looking at how long we've had issues vs how long the index has been running)
[17:53:26] <kali> Mortah: and iterate over your replicas
[17:53:31] <kali> Mortah: or bump to 2.2.2 :)
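
The offline build kali describes, with hypothetical names (same pattern as the index-drop workaround earlier in the day):

    # restart the secondary standalone, without the replset option
    mongod --dbpath /data/db --port 37017

    # build the index manually
    mongo --port 37017
    > use mydb
    > db.foo.ensureIndex({field: 1})

    # then restart with the normal replica set configuration
    mongod --dbpath /data/db --port 27017 --replSet rs0
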
[17:53:59] <Mortah> as I understand that issue, it will build in foreground on secondaries and lock them out just the same?
[17:55:57] <kali> Mortah: well, the instance will be unreachable instead of stuck, so the various clients will get more sensible behaviour
[17:56:20] <Mortah> yes, that would be preferable
[18:09:27] <Mortah> kali: thanks for the help btw, I saw 2.2.2 come out and decided it wasn't an urgent upgrade :/
[18:09:35] <Mortah> hohum!
[19:51:19] <salentinux> hi guys, is it possible to express an "or and ands" as a find condition? for example ((bool1 && bool2) || (bool3 && bool4))
[19:52:17] <salentinux> errata corrige: "or of ands"
[19:54:40] <kali> salentinux: have you tried it with $or ? i think it works
[20:04:03] <salentinux> it works directly in the mongo shell! So there must be a problem in my CakePHP code
[20:04:11] <salentinux> sorry kali, you're right
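
For the record, the shape salentinux describes, ((bool1 && bool2) || (bool3 && bool4)), expressed in the shell with hypothetical field names:

    // each object inside $or is an implicit AND of its fields
    db.things.find({$or: [{bool1: true, bool2: true},
                          {bool3: true, bool4: true}]})
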
[20:12:44] <wting> I'm using multiple threads to read/write to a MongoDB. Should I have one db connection per thread or one total?
[20:13:03] <wting> *It's a Ruby app.
[20:20:14] <kali> wting: the driver is thread-safe, one instance of MongoClient is enough
[21:30:01] <wting> kali: thanks
[21:30:42] <borntyping> Hallo all
[21:30:42] <borntyping> I'm working on a project using MongoEngine, and am wondering whether it is better to use embedded documents or not for storing a set of related objects (that are unique to the parent object)?
[21:33:45] <mrpro> depends
[21:36:17] <borntyping> mrpro: on?
[21:36:25] <mrpro> if you want to retrieve everything at once
[21:36:27] <mrpro> it's nice
[21:36:35] <mrpro> cause you get everything with one round trip
[21:37:02] <mrpro> but you can also exclude fields if needed
[21:37:21] <mrpro> storing a set of related objects… sounds like you want a field which is an array of objects
[21:37:26] <mrpro> but yeah it depends how often you'll modify it etc
[21:37:40] <mrpro> and if it'll grow
[21:38:15] <borntyping> It'll definitely grow pretty often - it's basically a list of people following a user
[21:39:49] <mrpro> yea honestly i am not sure
[21:39:59] <mrpro> i'd think that if things grow, they'd need to be moved around and stuff
[21:40:14] <mrpro> i'd read more about storage and shit
[21:40:37] <borntyping> Hmm, thanks
[21:41:00] <borntyping> That's pretty much the same answer I've got in other places, but in a little more detail
[23:09:18] <owen1> let's say I have 3 hosts in my replica set. 1 can't see the other two. Will it become primary? What if the other 2 are OK and it's just a network issue?