#mongodb logs for Saturday the 13th of December, 2014

[02:52:15] <kataracha> kali: you weren't lying about it taking a long time to build
[03:35:27] <kataracha> does anyone know why I might be getting this: scons: *** [build/install/bin/bsondump] Source `src/mongo-tools/bsondump' not found, needed by target `build/install/bin/bsondump'.
[03:35:49] <kataracha> I built mongo and ran: scons install
[03:35:52] <kataracha> to get that error
[03:50:02] <cheeser> kataracha: you should ask the mongodb-dev list
[04:16:52] <blargypants> For some reason I thought it was possible to initiate a whole replica set as a shard from my mongos instance. Am I right that it's possible, or am I imagining things?
[06:34:40] <pyarun> I am new to mongodb and trying some queries. I have a collection named person, I want to get a dictionary having as key the last_name of the person and as value the rest of the information about the user. Here is the sample input/output: http://pastie.org/9777746#25
[08:43:34] <kali> pyarun: this is not the mongodb way. you may manage to format the answer this way, but at a very high price in terms of code complexity and performance (with map/reduce)
[08:43:58] <kali> pyarun: presenting results is the purpose of the application server, not the database
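
[editor's note: a minimal shell sketch of the client-side reshaping kali recommends, assuming last_name is unique per person; the byLastName variable is illustrative:]

    // build {last_name: rest-of-document} in the client, not in the database
    var byLastName = {};
    db.person.find().forEach(function (doc) {
        var key = doc.last_name;
        delete doc.last_name;
        byLastName[key] = doc;   // last write wins if last_name repeats
    });
    printjson(byLastName);
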
[12:00:07] <YaManicKill> Hey guys, I have a data structure for something which is similar to: {user: "sffwer", url: "http://mg.reddit.com"}, I want to get a list of these objects that match a certain regex for the url, but at most 1 per user.
[12:00:17] <YaManicKill> I got this far: db.reference.find( { url: {$regex: "mg.reddit.com"} } ).count()
[12:00:29] <YaManicKill> But obviously that doesn't do at most 1 per user. Any ideas?
[12:01:13] <YaManicKill> I thought I would just be able to stick on a ".group({key: user})", but that doesn't work...it seems to be a completely different function with a completely different way of doing things.
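
[editor's note: a sketch of one way to get "at most one per user" with the aggregation pipeline instead of group(); the collection and field names follow YaManicKill's example, the choice of $first is illustrative:]

    db.reference.aggregate([
        { $match: { url: { $regex: "mg.reddit.com" } } },      // same filter as the find()
        { $group: { _id: "$user", url: { $first: "$url" } } }  // one result per user
    ]);
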
[13:41:55] <alexgg> So, is there a good, robust way to have a stable "insertion order" sort?
[13:44:19] <jiffe> so when I add a shard, how can I tell if things are moving?
[13:53:03] <kali> alexgg: if you use auto generated ObjectId for _id, they will be "roughly" sorted by creation date, and it will be stable
[13:53:37] <kali> alexgg: it's rough as two documents created in the same second by different processes and/or servers will not necessarily be in the right order
[13:53:41] <alexgg> yes, roughly :) I'm implementing an event store and external systems will ask to be synchronized from event X
[13:53:57] <alexgg> so I'm looking for a sure way
[13:54:57] <kali> can you assume all the clients are in sync?
[13:55:06] <kali> as in ntp-synced
[13:56:13] <alexgg> They *should* be
[13:56:24] <alexgg> but if not, can't the DB handle the Id generation?
[13:57:33] <kali> alexgg: there is a cost. you need to "pull" an id from mongodb to put it in your document before inserting it: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
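
[editor's note: roughly the counters pattern from the linked tutorial; the tutorial pre-seeds the counter document, the upsert here is a shortcut, and the events collection is illustrative:]

    function getNextSequence(name) {
        var ret = db.counters.findAndModify({
            query: { _id: name },
            update: { $inc: { seq: 1 } },
            new: true,
            upsert: true                       // create the counter on first use
        });
        return ret.seq;
    }
    db.events.insert({ _id: getNextSequence("eventid"), payload: "..." });
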
[13:57:57] <alexgg> yeah I saw that, that would increase latency
[13:58:03] <alexgg> and force me to use Longs?
[13:58:57] <alexgg> is that read query very performant too?
[13:59:10] <arnser> what's the best way to see how much memory/CPU mongo is using during a stress test? I am sending significant load to my server at the moment but can't see mongo in top anywhere; it seems to be using almost no memory or CPU
[13:59:27] <kali> alexgg: which read query? it's not a read query, it's a findAndModify
[13:59:39] <kali> alexgg: so you'll have to perform twice as many write ops
[13:59:45] <alexgg> kali: ah, I had the second scenario in mind
[14:00:40] <alexgg> There's a problem with the findAndModify approach I think
[14:00:43] <kali> alexgg: ha, you mean optimistic insertion?
[14:01:09] <kali> alexgg: yeah, that one will be quite fast (just a lookup on the last item of an index, O(1))
[14:01:26] <alexgg> Writer A findAndModifies the counter to 3000, Writer B findAndModifies it to 3001, but Writer B then writes the event faster than Writer A
[14:02:22] <alexgg> If the "read last event id" operation is O(1) then that would be quite excellent.
[14:02:34] <kali> it is.
[14:02:42] <alexgg> Except the use of ever incrementing ids perhaps
[14:03:49] <kali> but I think you can get unlucky and find race conditions leading to local permutations in event order too
[14:04:20] <alexgg> even though the id would have a unique index?
[14:04:22] <kali> short of implementing a global lock, I think there are always RCs
[14:05:45] <kali> alexgg: yes. A comes first, performs the max lookup, finds 3000, tries 3000 but fails. B comes next, its lookup says 3001, B's insertion works. A tries again and gets 3002
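
[editor's note: a sketch of the optimistic insertion loop being discussed, assuming a unique index on seq (db.events.ensureIndex({seq: 1}, {unique: true})); the events collection is illustrative:]

    function insertEvent(doc) {
        while (true) {
            var last = db.events.find({}, { seq: 1 }).sort({ seq: -1 }).limit(1).toArray();
            doc.seq = (last.length ? last[0].seq : 0) + 1;      // O(1): last entry of the index
            var res = db.events.insert(doc);
            if (!res.hasWriteError()) return doc.seq;           // claimed the slot
            if (res.getWriteError().code !== 11000) throw res;  // only retry on duplicate key
        }
    }
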
[14:06:01] <alexgg> ah, that would be fine
[14:06:05] <alexgg> I don't mind gaps too much
[14:06:28] <kali> this is not a gap, B arrived after A but is inserted before
[14:06:52] <alexgg> but effectively, B is inserted before A and that will never change
[14:07:16] <alexgg> And B will have a lower counter than A
[14:08:57] <alexgg> A and B are http requests, if they are so close that one generates an event that is inserted slightly before the other, it's ok
[14:09:08] <alexgg> for my problem anyway
[14:11:26] <alexgg> is $natural stable if documents are immutable?
[14:13:32] <kali> mmmm... immutable AND never deleted
[14:13:50] <kali> I would not rely on that
[14:13:58] <alexgg> Yes, that sounds a bit dangerous :)
[14:14:14] <alexgg> compaction might influence it too
[14:16:43] <kali> compaction is not done in place, it's done sequentially, so it should not, but...
[14:21:05] <alexgg> yep, many assumptions
[14:21:15] <alexgg> My problem is exactly that one: http://stackoverflow.com/questions/20960815/range-query-for-mongodb-pagination
[14:21:38] <alexgg> except I can't afford (I think, at this time) to lose event reads once in a while
[14:21:55] <alexgg> And it's hard to reason about how often ObjectId will give me that issue
[14:22:37] <alexgg> (the comments on the accepted answers are interesting)
[14:25:34] <kali> well, there are two questions: 1/ is the order stable 2/ is the order actually the insertion order
[14:26:17] <alexgg> But I'm trying to do roughly the same thing as him
[14:26:18] <kali> _id being unique and immutable, whatever you'll be using will be a stable order
[14:26:59] <alexgg> Well, yes it's stable for a constant number of documents :)
[14:27:06] <kali> and if you're dealing with tweets, a discrepancy in the database order compared to actual insertion order does not matter too much
[14:27:13] <alexgg> but I wish to have it stable even as I add more documents
[14:27:32] <alexgg> I deal with business events used to synchronize a few systems together
[14:27:49] <alexgg> perhaps it's the wrong way to synchronize systems, I don't know :)
[14:27:55] <kali> mmmm yeah :)
[14:27:58] <alexgg> that's what Apache Kafka does
[14:28:12] <kali> have a look at capped collections in that case
[14:28:18] <kali> and use the $natural order
[14:28:38] <alexgg> yes that's really great, but it won't account for very old events
[14:28:50] <alexgg> in case a system messed up and wants to more or less start from scratch
[14:29:36] <kali> have a routine moving documents from the capped collection to a persistent collection?
[14:29:48] <kali> one single process
[14:31:23] <alexgg> yes could do that
[14:33:06] <alexgg> Maybe I have my solution :)
[14:33:40] <alexgg> I already needed a capped collection because the interface will watch changes in real time
[14:35:46] <alexgg> The only issue I can think of is back pressure
[14:36:22] <alexgg> http requests will happily return once the event is in the capped collection, but potentially the cold collection won't be able to keep up
[14:36:34] <alexgg> I don't think I will reach that level of activity though.
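
[editor's note: a sketch of the single-process mover kali suggests, assuming auto-generated ObjectIds; the events_capped and events_archive names are illustrative:]

    // copy anything newer than the last archived _id, in capped-collection order
    var last = db.events_archive.find({}, { _id: 1 }).sort({ _id: -1 }).limit(1).toArray();
    var query = last.length ? { _id: { $gt: last[0]._id } } : {};
    db.events_capped.find(query).sort({ $natural: 1 }).forEach(function (doc) {
        db.events_archive.insert(doc);
    });
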
[15:03:22] <alexi5> hello
[15:23:00] <jiffe> so I've added a replica set as a shard to a cluster and watching logs I see the balancer locking and unlocking something but I don't see anything moving
[15:37:14] <jiffe> and when I run sh.isBalancerRunning() it varies between true and false so it seems like it's turning on and shutting off
[15:53:49] <kali> jiffe: yeah, the balancer sleeps and wakes up once in a while, so this is expected
[15:54:15] <kali> jiffe: have you sharded a database and a collection? how big is the collection?
[15:59:24] <jiffe> kali: the db was sharded but not the collection, I called shardCollection() but it's taking a while for that call to complete, I don't see anything happening in the logs yet, the collection is 22869151710404 bytes in size so not sure how long it should take
[16:00:29] <kali> is that... 22TB?
[16:00:56] <jiffe> yeah
[16:01:18] <kali> ok. I have absolutely no idea how long it could take for such a beast
[16:01:40] <jiffe> what exactly does shardCollection() do before it returns to the shell?
[16:02:01] <kali> I'm not sure. I wonder if it defines the chunks at that point
[16:02:45] <kali> FYI I've seen this take minutes for ~20GB collections
[16:03:41] <jiffe> ok
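
[editor's note: for reference, the usual shell sequence for what jiffe describes; the database, collection, and key names are illustrative. Whether shardCollection() computes the initial chunks before returning is, as kali says above, unclear:]

    sh.addShard("rs1/host1:27017,host2:27017,host3:27017")  // add the replica set as a shard
    sh.enableSharding("mydb")                               // enable sharding on the database
    db.mycoll.ensureIndex({ myShardKey: 1 })                // the shard key needs an index
    sh.shardCollection("mydb.mycoll", { myShardKey: 1 })    // shard the collection itself
    sh.status()                                             // watch chunks appear and migrate
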
[16:25:50] <DesertRock> So in figuring out how to do something similar to a SQL join, I've been seeing lots of things suggesting I do a forEach, and then insert each modified record into a new collection. Is this really the suggested method of operation?
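
[editor's note: a sketch of the forEach pattern DesertRock describes, with illustrative collection and field names; whether it is the recommended approach goes unanswered in the log:]

    // denormalize a SQL-style join into a new collection, client side
    db.orders.find().forEach(function (order) {
        order.customer = db.customers.findOne({ _id: order.customerId });
        db.orders_joined.insert(order);
    });
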
[17:48:04] <alexi5> hi guys
[17:49:44] <alexi5> I have a document pasted at http://pastie.org/9778447. Is it possible to do a modify query on the document by incrementing votes and adding the string "121345" to the Numbers array of the subdocument that has keyword equal to christmas?
[17:50:37] <alexi5> can you give me an example of a modification document that can accomplish this?
[17:54:32] <kali> alexi5: update({"_id" : ObjectId("548c6e092a2d0e11806e04e7"), "Keywords": { $elemMatch: { keyword:"christmas" }}},{ $push : {"Keywords.$.Numbers": "121345" }})
[17:54:55] <kali> ha! and incrementing votes too
[17:55:25] <kali> alexi5: update({"_id" : ObjectId("548c6e092a2d0e11806e04e7"), "Keywords": { $elemMatch: { keyword:"christmas" }}},{ $push : {"Keywords.$.Numbers": "121345" }, $inc: {"Keywords.$.Votes": 1}})
[17:56:19] <kali> alexi5: update({"_id" : ObjectId("548c6e092a2d0e11806e04e7"), "Keywords.keyword":"christmas" },{ $push : {"Keywords.$.Numbers": "121345" }, $inc: {"Keywords.$.Votes": 1}}) should be enough actually
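
[editor's note: in the shell these run against a collection, e.g. db.<collection>.update(...); the positional $ resolves to the index of the first Keywords element matched by the query, which is why the query must reference Keywords.]
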
[17:57:41] <alexi5> oh, thanks kali
[20:00:24] <mnms_> Hi
[20:01:27] <mnms_> Can a single shard be replicated?
[20:01:47] <Derick> In the general setup, a shard is a replicaset.
[20:03:17] <mnms_> Derick: What does that mean? Cause each shard should be on an individual machine, and this machine should have e.g. its own slave with replicated data?
[20:03:33] <mnms_> Sorry, I'm new..
[20:04:23] <mnms_> I think I understand, so a shard is basically a replica set like you said
[20:04:46] <Derick> a replicaset is a collection (of usually 3) data-carrying servers, each on a different physical server
[20:04:53] <Derick> they replicate all data between them
[20:05:31] <Derick> a shard is a logical separation of a set of data, and in each "shard" the data is replicated in a replicaset environment
[20:05:53] <Derick> so in a 2-shard, 3-replica cluster, you have 6 data carrying nodes - and a mongos, and three config servers
[20:06:25] <mnms_> Ok...
[20:06:34] <mnms_> Why three config servers?
[20:09:04] <prettymuchbryce> What is the difference between apt-get mongodb, and apt-get mongodb-org?
[20:09:18] <mnms_> can there not be two? Cause then I need 3 machines, yes?
[20:11:57] <mnms_> Derick: what if one of the config servers fails?
[20:12:39] <Derick> prettymuchbryce: mongodb is the distribution's package - often outdated. mongodb-org is our own packages
[20:12:51] <Derick> mnms_: then you have a degraded cluster and you need to fix it
[20:13:09] <prettymuchbryce> Thanks Derick.
[20:13:48] <mnms_> Derick: but the data will still be stored?
[20:14:04] <Derick> mnms_: sure
[20:14:10] <Derick> mnms_: that's why you have three copies
[20:14:25] <Derick> mnms_: it does mean no data can be moved between shards (as shards autobalance normally)
[20:14:41] <mnms_> Ok I see, the test architecture is with one config server
[20:14:42] <Derick> in general, you don't really need sharding though...
[20:15:02] <Derick> just a 3 node replicaset gets you very far - and you don't need mongos or config servers
[20:17:06] <mnms_> But to be clear, if two of three config servers are down, data will not be moved between shards?
[20:24:52] <kali> with one config server down, the configuration database goes read only: shards stay where they are
[20:25:12] <mnms_> Derick: Ok, I'm a little confused, why should data be moved between shards? Data is stored in a particular shard based on the key which you choose?
[20:25:40] <mnms_> kali: If only one of the three config servers stays up, what happens then?
[20:26:02] <Derick> mnms_: yes, but if you start adding data, you want each shard to have about the same amount of data. And balancing moves data between them
[20:26:31] <Derick> mnms_: same thing with 1 of 3 up - but that's a situation you don't want to be in
[20:26:51] <Derick> if you only have *configured* 1 config server, then there are no "read only" situations
[20:26:53] <mnms_> Derick: but it can be, until I fix it
[20:26:56] <kali> mnms_: with two config servers down, it starts to stink. config database is still locked, of course, but you can no longer restart a mongos. as long as they stay up, the cluster "works"
[20:26:57] <Derick> mnms_: yes
[20:27:50] <mnms_> Derick: So balancing of data on shards will stop working when any of the config servers goes down? What is the rule?
[20:27:56] <kali> yep
[20:28:32] <Derick> rule?
[20:31:14] <mnms_> Derick: What is responsible for balancing data on shards? The config server?
[20:31:46] <Derick> mongos decides
[20:31:47] <kali> one of the mongod is the "balancer" one
[20:31:52] <kali> mongos, not mongod
[20:32:07] <Derick> kali: iirc, any one of them can decide - no?
[20:32:25] <kali> Derick: i think it's "sticky"
[20:32:25] <Derick> ie, there is no dedicated mongos for it
[20:32:33] <Derick> yeah, that might be the case
[20:32:43] <kali> Derick: i think it's always the same, until it crashes
[20:33:18] <kali> but I no longer have a big cluster to check that out :)
[20:33:35] <Derick> me neither
[20:35:04] <mnms_> Derick: So why can failure of one of the config servers have an impact on moving data between shards?
[20:35:33] <kali> because the config servers have to stay in sync
[20:35:49] <kali> if one of them is down, the balancing mongos can no longer update it
[20:36:36] <kali> in real life, it's usually fine: you don't want any server to stay down for very long, and you don't want balancing to happen too often either
[20:37:49] <mnms_> I don't understand the concept, cause I have three config servers to provide HA, and when one is down I lose some functionality
[20:38:58] <kali> your cluster will still work fine from the application point of view
[20:40:22] <kali> balancing is not a functionality, it's a sad necessity
[20:42:18] <mnms_> I'm reading about sharding right now cause I don't understand some things...
[20:45:05] <mnms_> I understand that the config servers don't need strong machines?
[20:45:21] <kali> nope
[20:45:34] <kali> I mean: that's right :)
[20:47:22] <mnms_> but in the docs it says: The balancer runs in all of the query routers in a cluster.
[20:49:05] <mnms_> so why does failure of one of the config servers have an impact on balancing data between shards?
[20:52:13] <kali> because to balance data, you need to write to the config database.
[20:52:40] <kali> and if one config server is down, you can no longer write to the two remaining ones or you'll break the sync
[20:53:19] <kali> where have you read "The balancer runs in all of the query routers in a cluster."?
[20:53:31] <kali> because it sounds ambiguous or even plainly wrong
[20:53:49] <Derick> kali: I think what that says is that on every mongos (query router) a balancer can run
[20:54:42] <kali> Derick: well, if that's what it means, the sentence should be fixed
[21:01:21] <mnms_> kali: http://docs.mongodb.org/manual/core/sharding-introduction/ Balancing chapter
[21:03:14] <Derick> http://docs.mongodb.org/manual/core/sharding-introduction/#balancing to be precise
[21:03:47] <mnms_> Derick: Yes, sorry
[21:07:55] <mnms_> Another case: If one of the config servers is down, then a mongos cannot be restarted?
[21:08:22] <kali> with one, it's fine
[21:08:50] <kali> one -> you lose balancing, two -> you lose the ability to start a mongos
[21:08:58] <mnms_> So if one config server is down, the only problem is with writing to the config servers, yes?
[21:09:18] <kali> yes
[21:09:24] <mnms_> kali: one or two of three, yes?
[21:09:30] <mnms_> in your scenario
[21:09:46] <kali> yes, of three. 1 config server is for testing and mad people only
[21:10:22] <mnms_> If two are down, I'm losing the ability to start mongos?
[21:10:39] <kali> yes
[21:10:41] <mnms_> so also restarting an existing mongos, yes?
[21:10:47] <kali> yes
[21:11:25] <mnms_> but if two of three are up then I can restart and start a mongos process?
[21:11:36] <kali> yes
[21:11:59] <mnms_> I don't understand why it is like that, cause mongos can still read from one config server
[21:13:26] <mnms_> Starting and restarting doesn't need to write anything to the config servers
[21:13:32] <mnms_> Unclear to me
[21:13:32] <kali> it might be a provision against some weird split-brain scenarios
[21:14:49] <mnms_> kali: is it somewhere in the docs that after failure of two of three I cannot restart mongos?
[21:15:12] <kali> it might be somewhere, but I know this from painful experience :)
[21:15:41] <mnms_> cause the docs say something else
[21:15:49] <mnms_> "If one or two config servers become unavailable, the cluster’s metadata becomes read only. You can still read and write data from the shards, but no chunk migrations or splits will occur until all three servers are available"
[21:16:36] <mnms_> So the docs say something else
[21:17:50] <kali> well, it says nothing about starting and stopping mongos explicitly
[21:18:06] <mnms_> It says somewhere else:
[21:18:57] <mnms_> MongoDB reads data from the config server data in the following cases: 1) A new mongos starts for the first time, or an existing mongos restarts 2) After a chunk migration, the mongos instances update themselves with the new cluster metadata
[21:25:14] <mnms_> Ok guys, thanks for the help
[21:25:37] <mnms_> I'm going to eat something, I will continue my research tomorrow :)
[22:19:16] <engirth> hey, in this example http://pastie.org/9778939 I am expecting to update all array elements that match a condition.
[22:19:58] <engirth> any guidance is helpful
[22:22:34] <blizzow> I have a collection that was created but is not showing up when I run show collections on my mongos. I tried to drop the collection and saw a complaint "ns not found" for the servers in two of my three replica sets. :( I try the drop again and it says the drop failed with the same ns not found message. I'm able to re-create the collection but it a) doesn't show up when I do show collections, and it complains that it's already sharded. How can I rem
[22:23:57] <Derick> It stopped at "sharded. How can I rem"
[22:24:03] <Derick> IRC has a 500 char limit per message
[22:29:03] <engirth> also, the example in the pastie updates only the first match
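
[editor's note: the positional $ operator updates only the first matching array element per document; one workaround from that era is to repeat the update until nothing is modified. The pastie is not reproduced here, so the items collection and field names are illustrative:]

    var res;
    do {
        res = db.items.update(
            { arr: { $elemMatch: { status: "old" } } },  // documents with a matching element
            { $set: { "arr.$.status": "new" } },         // $ fixes the first match only
            { multi: true }
        );
    } while (res.nModified > 0);                         // loop until every element is updated
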
[22:33:31] <blizzow> Derick: How can I remove this stubborn beast.
[22:44:02] <r01010010> hi
[22:44:06] <r01010010> anyone using mongoose?
[22:44:33] <r01010010> I don't know why, but the pre('save', ...) middleware runs many times
[23:13:01] <yzhao> how do you guys store files in MongoDB? esp. if filenames contain dots, which are forbidden in field names
[23:14:15] <Derick> yzhao: name: "filename.php"
[23:14:21] <Derick> don't use values as keys
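
[editor's note: a sketch of Derick's point; a dot is fine in a value but not in a field name, and the files collection here is illustrative:]

    // bad:  db.files.insert({ "filename.php": { size: 1024 } })  -- dotted key is rejected
    // good: store the name as a value and query on it
    db.files.insert({ name: "filename.php", size: 1024 })
    db.files.find({ name: "filename.php" })
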
[23:33:09] <yzhao> Is it normal to use GridFS as a temporary store for large files?
[23:33:11] <yzhao> or a permanent store*
[23:36:45] <yzhao> I.e. I reference a GridFS file in a document
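
[editor's note: a sketch of referencing a GridFS file from a regular document; fs.files is the default GridFS metadata collection, while the filename and attachments collection are illustrative:]

    var f = db.fs.files.findOne({ filename: "report.pdf" });    // uploaded e.g. with mongofiles
    db.attachments.insert({ owner: "yzhao", file_id: f._id });  // store a pointer to the file
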
[23:55:25] <modulus^> KurtKraut: are you a Naught Zee ?