#mongodb logs for Friday the 6th of February, 2015

[03:26:26] <Siraris> If I have a field with Andrew Jones and one with Andrew Jones Smith, and I do a find for Andrew Jones, how do I make sure I don’t get Andrew Jones Smith?
[03:40:40] <joannac> Siraris: erm, if you do an exact match, that won't happen?
[03:40:52] <Siraris> What do you mean?
[03:41:04] <joannac> Siraris: how are you structuring your query?
[03:41:41] <Siraris> db.questions.find({"study_number":{$regex : /Cro5/i}});
[03:41:51] <joannac> don't use regex
[03:42:01] <Siraris> sigh
[03:42:27] <Siraris> I have to, I don’t know what the case will be
[03:42:35] <joannac> then normalise it
[03:43:21] <Siraris> That’s what I was afraid of
[03:43:22] <joannac> or put a $ at the end
[03:43:31] <joannac> (not sure if that works)
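A minimal shell sketch of the anchoring idea joannac mentions, using the query and field from the discussion plus a hypothetical name example: an anchored, case-insensitive pattern matches the whole value, so "Andrew Jones" no longer matches "Andrew Jones Smith".

    db.questions.find({ study_number: { $regex: /^cro5$/i } })   // exact, case-insensitive match
    db.people.find({ name: /^andrew jones$/i })                  // hypothetical; "people"/"name" are assumed names

Note that even an anchored case-insensitive regex generally cannot use an index efficiently, which is why normalising the stored value is the other suggestion above.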
[03:44:09] <Siraris> Thanks joannac :)
[03:44:11] <Siraris> You’re a life saver
[03:55:42] <jackythanh> hi, can anyone please help me traceroute from mms.mongodb.com to 203.162.55.1? I need to check latency from MMS to our server.
[04:00:15] <joannac> jackythanh: why do you need to know latency from mms to you?
[04:02:23] <jackythanh> @joannac: because our DB admin said he had some alerts about connecting from mms to our mongo server
[04:03:20] <joannac> jackythanh: what alerts?
[04:03:40] <joannac> PM me if you want to talk privately
[05:02:21] <hahuang61> is there any way to see progress of a shardCollection command?
[05:04:58] <joannac> in terms of what? splits and migrates? you can check the log
[05:33:53] <hahuang61> sorry I meant balancer, but of course I can just check sh.status()
[05:33:54] <hahuang61> :p
[05:33:58] <hahuang61> derp moment.
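For reference, a rough sketch of the stock shell helpers for checking balancer state from a mongos:

    sh.status()               // chunk counts per shard, plus a balancer section
    sh.getBalancerState()     // whether the balancer is enabled
    sh.isBalancerRunning()    // whether a balancing round is in progress right now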
[06:39:11] <hahuang61> balancer seems to be not moving... very slowly at least. lots of disk io on the machine that has all the chunks and barely any network usage...
[06:39:22] <hahuang61> a mongorestore just finished, is it possible that it's still flushing something to disk?
[06:48:44] <joannac> check the logs
[06:49:37] <hahuang61> joannac: mongod or mongos?
[06:50:28] <Boomtime> i'd start with the one with lots of disk io
[06:51:03] <hahuang61> Boomtime: I see: cluster cw-rs2-1.internal.bv:27018,cw-rs4-1.internal.bv:27018,cw-rs6-1.internal.bv:27018 pinged successfully at Thu Feb 5 22:48:25 2015 by distributed lock pinger 'cw-rs2-1.internal.bv:27018,cw-rs4-1.internal.bv:27018,cw-rs6-1.internal.bv:27018/cw-rs1-1.internal.bv:27019:1422559530:1804289383', sleeping for 30000ms
[06:51:14] <hahuang61> why's it sleep for 30 seconds in between?
[06:51:19] <hahuang61> can I turn that down?
[06:51:23] <joannac> no
[06:51:31] <hahuang61> why is that
[06:52:02] <Boomtime> because you don't need to, it is working normally
[06:52:29] <hahuang61> Boomtime: I just did a giant import, with 3.5k chunks onto a database that's not in production
[06:52:33] <hahuang61> I'd like it to balance faster
[06:52:39] <joannac> just don't worry about it, it pings every 30 seconds, and the ping is successful
[06:52:51] <hahuang61> oh that's not related to balancing?
[06:52:52] <joannac> hahuang61: pastebin a sh.status()
[06:53:41] <hahuang61> joannac: whole thing?
[06:53:49] <joannac> yes
[06:54:19] <hahuang61> joannac: https://gist.github.com/hahuang65/3f806bb7469d05d07082
[06:54:42] <hahuang61> check out badgeville.players
[06:54:49] <joannac> okay.
[06:55:03] <joannac> what's the shard key?
[06:55:10] <hahuang61> joannac: hash of player's email.
[06:55:17] <joannac> and why is it so off-balance? did you shard after importing?
[06:55:18] <hahuang61> joannac: the field is shard_id
[06:55:26] <hahuang61> joannac: yeah -_- like a dop
[06:55:28] <hahuang61> dope*
[06:55:29] <joannac> right
[06:55:34] <joannac> well that would be why
[06:55:54] <hahuang61> so there's literally no way to speed it up
[06:55:55] <Boomtime> heh, that's probably going to be awhile
[06:56:26] <joannac> well, what do the mongos logs say?
[06:56:40] <Boomtime> does it matter?
[06:57:26] <joannac> in all honesty, drop and shard first
[06:57:35] <joannac> i think it'd be faster
[06:57:39] <Boomtime> ^ +1
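A sketch of the "drop and shard first" approach being recommended here, assuming the namespace and field from the conversation (badgeville.players, hashed shard_id). For a hashed shard key, numInitialChunks pre-splits the collection so a subsequent restore writes to all shards from the start; the chunk count below is arbitrary.

    // run via a mongos; database/field names and the chunk count are assumptions
    db.adminCommand({ enableSharding: "badgeville" })
    db.adminCommand({
      shardCollection: "badgeville.players",
      key: { shard_id: "hashed" },
      numInitialChunks: 3072
    })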
[06:58:17] <hahuang61> joannac: took 1 day to import, still faster?
[06:58:23] <joannac> depends
[06:58:29] <joannac> how long since the import finished?
[06:59:02] <hahuang61> 40 minutes
[06:59:09] <hahuang61> would it be faster to manually move chunks?
[06:59:14] <joannac> okay, so you moved 17 chunks in 40 minutes
[06:59:22] <joannac> and you need to move 3000 chunks to get balanced
[06:59:25] <joannac> so I say yes
[07:00:15] <Boomtime> manually moving chunks is little different from letting the balancer do it - it's the same mechanism
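For completeness, a manual chunk move is just the moveChunk command the balancer itself issues; a hedged sketch with assumed namespace and shard names:

    db.adminCommand({
      moveChunk: "badgeville.players",
      find: { shard_id: NumberLong(0) },   // any value that falls inside the chunk to move
      to: "cw-rs4"                         // destination shard name (assumed)
    })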
[07:00:18] <hahuang61> so this 30s sleep doesn't affect the sharding?
[07:00:36] <Boomtime> no, it's just checking to see if there is something to do, there isn't
[07:00:36] <joannac> no, the 30s sleep is for something completely unrelated to balancing
[07:00:38] <hahuang61> that's just some separate process to see if a machine is... available?
[07:01:00] <Boomtime> a chunk move is organised by a mongos, but the actual move occurs directly between the two mongods
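One way to watch those per-chunk moves as they complete is the config database's changelog, which records each migration step; a small sketch, run from a mongos:

    db.getSiblingDB("config").changelog
      .find({ what: /moveChunk/ })
      .sort({ time: -1 })
      .limit(5)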
[07:01:23] <hahuang61> why is it so slow though?
[07:01:28] <hahuang61> seems UNUSUALLY slow
[07:01:37] <hahuang61> this machine is literally doing nothing
[07:02:06] <joannac> which one?
[07:02:08] <Boomtime> yeah, it has to constantly verify, robustness costs speed
[07:02:15] <hahuang61> the machine with all the chunks
[07:02:25] <hahuang61> but 1 chunk per like 4 minutes seems way too long?
[07:03:48] <Boomtime> yeah, that isn't actually very good.. but it's not horrible either - a good shard key and pre-splitting will largely avoid the need to balance at all though
[07:04:22] <hahuang61> we're looking at network stats on the machine and it's kilobytes.
[07:04:26] <Boomtime> how big are the documents on average?
[07:04:28] <hahuang61> that's almost 0
[07:04:46] <hahuang61> avgObjSize" : 2333,
[07:05:12] <hahuang61> small
[07:05:19] <joannac> chunk size?
[07:05:41] <hahuang61> how do I find that out?
[07:05:49] <hahuang61> I never changed it, so it's default, though it's 64MB?
[07:05:53] <joannac> yup
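The configured chunk size can be read from the config database (no document means the 64MB default); at roughly 2333 bytes per document that works out to around 28,000 documents per 64MB chunk:

    db.getSiblingDB("config").settings.find({ _id: "chunksize" })   // value is in MB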
[07:06:32] <hahuang61> what's the process? these machines have gigabit transfer speed between them, so unless it's ALL disk IO, there's like no activity happening
[07:07:38] <joannac> what is the disk io on the one with all the chunks?
[07:08:00] <hahuang61> joannac: 3-5m on average
[07:08:27] <hahuang61> jumps to 12/13m for maybe 4 seconds
[07:08:30] <joannac> i think you mean megabit. or megabyte
[07:08:30] <hahuang61> then nothing for another 30
[07:08:52] <hahuang61> joannac: https://gist.github.com/hahuang65/f24cd950e65212de3dce
[07:14:44] <hahuang61> joannac: here's the story so far
[07:15:05] <hahuang61> joannac: we had everything sharded and ready to go, then we noticed that upsert is not an option for mongorestore.
[07:15:35] <hahuang61> joannac: we already set our maintenance window with our customers for tonight, so we had to do a last minute player collection dump (6.5 hours) and restore (26 hours) to the new database
[07:15:46] <hahuang61> joannac: which means I dropped the players collection, and lost the sharding settings I suppose
[07:16:09] <hahuang61> joannac: and now that we restored the players we have to wait for this thing to balance, cuz I didn't shard again after I dropped the collection.
[07:16:12] <hahuang61> joannac: so vexing
[07:19:57] <hahuang61> i suppose there's nothing I can really do except keep doing deltas for the other collections then try this again in a few days (we have to give 1 week of notice for maint windows).
[07:20:03] <joannac> hahuang61: right. not sure what to tell you. you're doing it the slow way
[07:20:19] <hahuang61> joannac: doesn't matter, it's 1 week either way it seems
[07:20:29] <joannac> I guess.
[07:20:42] <joannac> except you're trading 1 day of io to all disks for 1 week of io
[07:20:47] <hahuang61> joannac: man I wish it supported upserts. It says "trivial" on that ticket! :( why didn't any one implement itttttt
[07:20:52] <joannac> which is background load on your machines
[07:21:04] <joannac> hahuang61: well, you're free to implement it and submit a pull request :)
[07:21:07] <hahuang61> joannac: yeah I will drop it and reimport it
[07:21:26] <hahuang61> haha it's not trivial for me, simply because it'll take a bit to go through the code
[07:21:31] <hahuang61> maybe I will some day
[07:23:42] <hahuang61> joannac: looks like minute and a half between each one
[07:29:53] <hahuang61> joannac: seems like we can manually move a chunk and it just takes 8 seconds...
[07:30:13] <hahuang61> is that a bad idea?
[07:30:52] <joannac> can you reproduce that?
[07:31:09] <joannac> as in, if you moveChunk again, does it also complete in 8 seconds?
[07:31:32] <hahuang61> joannac: yup
[07:31:37] <hahuang61> 7k ms
[07:31:39] <hahuang61> reported
[07:31:52] <Boomtime> do a few
[07:36:14] <hahuang61> okay, lemme try
[07:46:33] <hahuang61> bah, yeah it's about a minute
[07:47:21] <Boomtime> goodo, that is what i'd expect
[07:48:07] <Boomtime> the shell command can determine pretty quick if the commit step is going to work, but it doesn't wait for the entire operation to finish
[07:49:06] <Boomtime> meanwhile, the source shard will not permit a new migration to or from it to begin until it has fully completed the previous one - this includes cleaning (removing) the dual copies of documents which have migrated away
[07:49:36] <Boomtime> you are bound by the fact that all migrations are occurring at a single mongod - something that normally does not happen
[07:59:24] <hahuang61> Boomtime: yeah
[07:59:25] <hahuang61> makes sense
[08:40:52] <hahuang61> joannac: thanks for the info and help today
[08:40:59] <hahuang61> joannac: I dropped the collection and am reimporting
[09:00:01] <tim___> Hey all. Using Morphia and Java. If I have several documents with Array<User> in each, how do I pull out all documents that contain arrays that contain a specific user?
[09:00:50] <tim___> i am hoping I will not have to iterate through them all in java but have the mongo layer find them for me
[09:39:38] <morenoh149> db.documents.find({ users: { $in: [_specific_user_id] }})
[09:39:58] <morenoh149> dammit he left :<
[09:40:27] <kali> db.documents.find({ users: _specific_user_id }) was enough anyway
[09:42:47] <morenoh149> kali: hey maybe he had many user's ;)
[10:09:25] <dsirijus> hey, i'm unsure how i would model this (mongoose, but whatever)...
[10:09:57] <dsirijus> user has authentications field, and that field contains objects specific to particular authentication method
[10:10:25] <dsirijus> however, i only know details of one authentication method for now, so it's kind of irksome to make it a schema
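A rough mongoose sketch of one way to keep that field loose for now: model the one known method explicitly and leave the rest as Mixed until the other providers are defined. The field names here are assumptions, not from the conversation.

    var mongoose = require('mongoose');

    var userSchema = new mongoose.Schema({
      name: String,
      authentications: {
        local: { email: String, passwordHash: String },  // the one method known today
        other: mongoose.Schema.Types.Mixed               // schema-less placeholder for future methods
      }
    });

    var User = mongoose.model('User', userSchema);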
[10:33:08] <amcsi_work> hi
[10:34:34] <amcsi_work> I have PHP end-to-end tests written, each starting with mongo wiping the test db and inserting a bunch of static test data. This setUp takes about half a second each time, which adds up quickly. Is there a way to speed this process up for testing?
[12:18:02] <Zelest> http://rebecca.blackfriday/
[12:31:12] <agend> hi - is there any chance that document size limit (16MB) would be changed or made configurable in future?
[12:33:25] <joannac> agend: unlikely
[12:33:29] <StephenLynx> if I use findOne to retrieve an object with a subarray, is it faster than an aggregate that retrieves a number of documents amounting to more or less the same size as the sub array in the first operation?
[12:34:05] <joannac> StephenLynx: an aggregate that retrieves full documents?
[12:34:25] <StephenLynx> no, I would use projection to retrieve only relevant data.
[12:34:53] <StephenLynx> I would use the same projection on both operations.
[12:35:38] <joannac> I am really confused by this usecase
[12:36:13] <StephenLynx> in the first case my data is {identifier:a, subArray[{datax:1},{datax:1}]} and in the second {identifiera:a, subidentifier:b, datax:1}
[12:36:42] <StephenLynx> correction: {identifier:a, subArray:[{datax:1},{datax:1}]}
[12:37:25] <StephenLynx> in the second case I would return multiple documents instead of just the subarray of a single document.
[12:37:36] <StephenLynx> in the second case I use indexes.
[12:37:40] <joannac> i don't see what this has to do with aggregation
[12:37:53] <StephenLynx> becase thats what I would use.
[12:37:57] <StephenLynx> to query.
[12:37:58] <joannac> you're returning 1 document vs many documents
[12:38:03] <StephenLynx> yes.
[12:38:05] <joannac> why do you need aggregation?
[12:38:28] <StephenLynx> because I might perform additional operations that are easier to do with aggregation.
[12:38:33] <joannac> db.foo.find({identifier:a})
[12:38:47] <joannac> okay, and you wouldn't in the first case?
[12:38:55] <joannac> then you're not comparing like with like
[12:39:10] <StephenLynx> no, in the first case I can use a findOne because I have just one document to be queried.
[12:39:27] <StephenLynx> performance wise, is there much of a difference?
[12:39:31] <StephenLynx> on these two cases?
[12:39:55] <joannac> yes, because in the second one you're doing "additional operations that are easier to do with aggregation"
[12:40:00] <joannac> and you didn't say what those are
[12:40:16] <StephenLynx> ok, let's say I just use a find in the second case and there are no additional operations.
[12:40:21] <StephenLynx> I just retrieve and project.
[12:40:40] <joannac> then yes, the first one will be better, because you're retrieving less data
[12:41:06] <StephenLynx> no, it would amount to the same
[12:41:07] <joannac> does the first one suit your usecase?
[12:41:29] <StephenLynx> because the subarray in the first case amounts to the sum of the documents in the second case.
[12:41:59] <StephenLynx> so volume is roughly equal.
[12:42:22] <StephenLynx> the difference is in between a subarray in one document vs a group of documents.
[12:43:48] <StephenLynx> I'm asking because recently in a project I had posts in threads stored as a sub array
[12:44:03] <StephenLynx> but because of the 16mb limit I decided to split posts in a separate collection.
[12:44:14] <StephenLynx> so I wouldn't had to deal with said limit.
[12:44:27] <StephenLynx> have*
[12:44:42] <joannac> if that's a concern, then split them out
[12:46:27] <joannac> you're asking the wrong questions btw
[12:46:29] <StephenLynx> I did; what I'm asking is, when retrieving, how much of an impact does it cause
[12:46:41] <StephenLynx> in performance. keeping in mind I use an index.
[12:46:45] <joannac> are you ever going to want only some of the documents in the array?
[12:46:50] <StephenLynx> yes.
[12:47:00] <joannac> then don't use an array
[12:47:04] <joannac> split them out
[12:47:37] <StephenLynx> yeah, I was thinking about that. I had to manually slice the posts because mongo can't slice a sub array.
[12:47:41] <joannac> an array is going to be smaller and less to retrieve, but none of that is going to matter if you have to parse your array and pull out only the bits you want
[12:48:30] <StephenLynx> when I retrieve posts, I take a parameters that dictates to return only posts with an id greater than the informed one.
[12:48:53] <StephenLynx> a parameter*
[12:50:02] <StephenLynx> so yeah, I guess I will keep posts on a separate array.
[12:50:15] <joannac> ...what?
[12:50:22] <StephenLynx> collection*
[12:50:23] <StephenLynx> :v
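To make the two shapes being compared concrete, a hedged sketch with assumed collection and field names (threads with an embedded posts array vs. a separate posts collection). $slice in a projection can only take a count or positional range, whereas the separate collection can filter "posts with an id greater than X" against an index:

    // embedded: one thread document, slice the last 50 posts by position
    db.threads.findOne({ identifier: threadId }, { posts: { $slice: -50 } })

    // separate collection: each post is its own document, filtered via an index
    db.posts.createIndex({ threadId: 1, postId: 1 })
    db.posts.find({ threadId: threadId, postId: { $gt: lastSeenId } })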
[14:36:15] <agend> hi - does the 16MB size limit concern the result of a group operation - meaning what the reducer returns?
[14:44:05] <cheeser> yes
[14:45:43] <agend> @cheeser - yes - about the 16MB limit?
[14:46:06] <cheeser> yes
[14:46:40] <agend> thx
[14:50:48] <agend> but what about: "Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files."
[14:51:15] <cheeser> what about it?
[14:51:34] <agend> doesn't it mean that a group result might be 100MB?
[14:51:53] <cheeser> sure. that could be a million documents though
[14:53:25] <agend> or one
[14:54:42] <cheeser> documents are limited to 16M even in memory afaik
[14:56:28] <agend> hmm
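For reference, the 100MB figure is a per-stage memory limit, not a document size limit; each document a stage outputs is still bound by the 16MB BSON limit. allowDiskUse is an option on the aggregate call itself; collection and field names below are placeholders:

    db.things.aggregate(
      [ { $group: { _id: "$key", total: { $sum: "$value" } } } ],
      { allowDiskUse: true }
    )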
[15:08:24] <LindaKendall> hi, has anyone ever had this error 11000 E11000 duplicate key error index: local.slaves.$_id_ in relation to slaves?
[15:08:49] <LindaKendall> it's locking my entire database every now and again, it's beginning to drive me mad!
[15:50:15] <LindaKendall> anyone?
[16:04:03] <LindaKendall> turns out I had a dupe slave somehow, thanks for the help ;)
[20:06:40] <dsirijus> if you have an array of {id:String, unlocked:Boolean}, and max size is ~20, is it better to allocate all 20 items, then just switch up the "unlocked" property, or just have an array of {id:String} and add/delete elements of it?
[20:07:16] <joannac> will you always get to ~20?
[20:08:07] <dsirijus> no, you will mostly fluctuate ~5 (good question, btw)
[20:08:33] <dsirijus> but it really will be just an id and a Boolean and maybe a Number there
[20:08:39] <joannac> are your ids fixed size?
[20:09:11] <dsirijus> yes, unless i wanna do "migrations"
[20:09:31] <joannac> i don't know what that means
[20:09:45] <dsirijus> ignore me
[20:10:09] <dsirijus> yes, ids are always of the form '?????'
[20:10:26] <joannac> 5 char string?
[20:10:31] <dsirijus> yeah
[20:10:49] <joannac> is there a Number in each array element?
[20:12:07] <dsirijus> (there are several such array types, but each in itself is of a fixed type)
[20:12:22] <dsirijus> but let's say this one is just a 5 char string and a Boolean
[20:12:39] <dsirijus> and other one is 5 char string, boolean and a number (but that's another array)
[20:16:41] <dsirijus> order of magnitude of all such arrays is 1, so 10-100 elements
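A sketch of the two approaches being weighed, with assumed field names. Pre-allocating the ~20 slots and flipping a flag keeps the document at a roughly fixed size (which, on the MMAPv1 storage engine of this era, tends to avoid document moves), while push/pull keeps documents smaller on average:

    // (a) pre-allocated slots: flip the flag in place with the positional operator
    db.users.update(
      { _id: userId, "achievements.id": "ab12c" },
      { $set: { "achievements.$.unlocked": true } }
    )

    // (b) grow and shrink the array as items unlock and lock
    db.users.update({ _id: userId }, { $push: { achievements: { id: "ab12c" } } })
    db.users.update({ _id: userId }, { $pull: { achievements: { id: "ab12c" } } })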
[23:39:12] <jeebster> I know it's generally standard practice to index foreign keys in a relational database, but does this hold true in mongo?
[23:41:14] <jeebster> or is there not much reason since there's no concept of 'join' in mongo?
[23:43:35] <Boomtime> jeebster: MongoDB doesn't recognise the concept "foreign key" (as you say, it arises from the idea of joins), but you should index whatever patterns you commonly do queries on
[23:48:23] <jeebster> Boomtime: gotcha. I generally add an _id attribute for what I'd consider a 'relation' and query against it so perhaps that's a candidate for indexing
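A minimal sketch of what that looks like in practice, with assumed names: index the reference field you query on, exactly as you would index a foreign key column:

    db.comments.createIndex({ post_id: 1 })      // ensureIndex() in older shells
    db.comments.find({ post_id: somePostId })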
[23:55:05] <esko> hi guys! could someone help with a quickie.. i'm just starting out with mongo. http://pastebin.com/tzH88wfB I need to insert some placeholder value inside all tags which are empty