#mongodb logs for Saturday the 13th of December, 2014

[02:52:15] <kataracha> kali: you weren't lying about it taking a long time to build
[03:35:27] <kataracha> does anyone know why I might be getting this: scons: *** [build/install/bin/bsondump] Source `src/mongo-tools/bsondump' not found, needed by target `build/install/bin/bsondump'.
[03:35:49] <kataracha> I built mongo and ran: scons install
[03:35:52] <kataracha> to get that error
[03:50:02] <cheeser> kataracha: you should ask the mongodb-dev list
[04:16:52] <blargypants> For some reason I thought it was possible to initiate a whole replica set as a shard from my mongos instance. Am I right that it's possible, or am I imagining things?
[06:34:40] <pyarun> I am new to mongodb and trying some queries. I have a collection named person, I want to get a dictionary having as key the last_name of the person and as value the rest of the information about the user. Here is the sample input/output: http://pastie.org/9777746#25
[08:43:34] <kali> pyarun: this is not the mongodb way. you may manage to format the answer this way, but at a very high price in terms of code complexity and performance (with map/reduce)
[08:43:58] <kali> pyarun: presenting results is the purpose of the application server, not the database
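
[editor's note: a minimal shell sketch of the client-side reshaping kali recommends, assuming last_name is unique per person; the byLastName variable is illustrative:]

    // build {last_name: rest-of-document} in the client, not in the database
    var byLastName = {};
    db.person.find().forEach(function (doc) {
        var key = doc.last_name;
        delete doc.last_name;
        byLastName[key] = doc;   // last write wins if last_name repeats
    });
    printjson(byLastName);
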
[12:00:07] <YaManicKill> Hey guys, I have a data structure for something which is similar to: {user: "sffwer", url: "http://mg.reddit.com"}, I want to get a list of these objects that match a certain regex for the url, but at most 1 per user.
[12:00:17] <YaManicKill> I got this far: db.reference.find( { url: {$regex: "mg.reddit.com"} } ).count()
[12:00:29] <YaManicKill> But obviously that doesn't do at most 1 per user. Any ideas?
[12:01:13] <YaManicKill> I thought I would just be able to stick on a ".group({key: user})", but that doesn't work...it seems to be a completely different function with a completely different way of doing things.
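
[editor's note: a sketch of one way to get "at most one per user" with the aggregation pipeline instead of group(); the collection and field names follow YaManicKill's example, the choice of $first is illustrative:]

    db.reference.aggregate([
        { $match: { url: { $regex: "mg.reddit.com" } } },      // same filter as the find()
        { $group: { _id: "$user", url: { $first: "$url" } } }  // one result per user
    ]);
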
[13:41:55] <alexgg> So, is there a good, robust way to have a stable "insertion order" sort?
[13:44:19] <jiffe> so when I add a shard, how can I tell if things are moving?
[13:53:03] <kali> alexgg: if you use auto generated ObjectId for _id, they will be "roughly" sorted by creation date, and it will be stable
[13:53:37] <kali> alexgg: it's rough as two documents created in the same second by different processes and/or servers will not necessarily be in the right order
[13:53:41] <alexgg> yes, roughly :) I'm implementing an event store and external systems will ask to be synchronized from event X
[13:53:57] <alexgg> so I'm looking for a sure way
[13:54:57] <kali> can you assume all the clients are in sync?
[13:55:06] <kali> as in ntp-synced
[13:56:13] <alexgg> They *should* be
[13:56:24] <alexgg> but if not, can't the DB handle the Id generation?
[13:57:33] <kali> alexgg: there is a cost. you need to "pull" an id from mongodb to put it in your document before inserting it: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
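
[editor's note: roughly the counters pattern from the linked tutorial; the tutorial pre-seeds the counter document, the upsert here is a shortcut, and the events collection is illustrative:]

    function getNextSequence(name) {
        var ret = db.counters.findAndModify({
            query: { _id: name },
            update: { $inc: { seq: 1 } },
            new: true,
            upsert: true                       // create the counter on first use
        });
        return ret.seq;
    }
    db.events.insert({ _id: getNextSequence("eventid"), payload: "..." });
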
[13:57:57] <alexgg> yeah I saw that, that would increase latency
[13:58:03] <alexgg> and force me to use Longs?
[13:58:57] <alexgg> is that read query very performant too?
[13:59:10] <arnser> what's the best way to see how much memory/CPU mongo is using during a stress test? I am sending significant load to my server at the moment but can't see mongo in top anywhere; it seems to be using almost no memory or CPU
[13:59:27] <kali> alexgg: which read query? it's not a read query, it's a findAndModify
[13:59:39] <kali> alexgg: so you'll have to perform twice as many write ops
[13:59:45] <alexgg> kali: ah, I had the second scenario in mind
[14:00:40] <alexgg> There's a problem with the findAndModify approach I think
[14:00:43] <kali> alexgg: ha, you mean optimistic insertion?
[14:01:09] <kali> alexgg: yeah, that one will be quite fast (just a lookup on the last item of an index, O(1))
[14:01:26] <alexgg> Writer A findAndModifies the counter to 3000, Writer B findAndModifies it to 3001, but Writer B then writes the event faster than Writer A
[14:02:22] <alexgg> If the "read last event id" operation is O(1) then that would be quite excellent.
[14:02:34] <kali> it is.
[14:02:42] <alexgg> Except the use of ever incrementing ids perhaps
[14:03:49] <kali> but I think you can get unlucky and find race conditions leading to local permutations in event order too
[14:04:20] <alexgg> even though the id would have a unique index?
[14:04:22] <kali> short of implementing a global lock, I think there are always RCs
[14:05:45] <kali> alexgg: yes. A comes first, performs the max lookup, finds 3000, tries 3000 but fails. B comes next, its lookup says 3001, B's insertion works. A tries again and gets 3002
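
[editor's note: a sketch of the optimistic insertion loop being discussed, assuming a unique index on seq (db.events.ensureIndex({seq: 1}, {unique: true})); the events collection is illustrative:]

    function insertEvent(doc) {
        while (true) {
            var last = db.events.find({}, { seq: 1 }).sort({ seq: -1 }).limit(1).toArray();
            doc.seq = (last.length ? last[0].seq : 0) + 1;      // O(1): last entry of the index
            var res = db.events.insert(doc);
            if (!res.hasWriteError()) return doc.seq;           // claimed the slot
            if (res.getWriteError().code !== 11000) throw res;  // only retry on duplicate key
        }
    }
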
[14:06:01] <alexgg> ah, that would be fine
[14:06:05] <alexgg> I don't mind gaps too much
[14:06:28] <kali> this is not a gap, B arrived after A but is inserted before
[14:06:52] <alexgg> but effectively, B is inserted before A and that will never change
[14:07:16] <alexgg> And B will have a lower counter than A
[14:08:57] <alexgg> A and B are http requests, if they are so close that one generates an event that is inserted slightly before the other, it's ok
[14:09:08] <alexgg> for my problem anyway
[14:11:26] <alexgg> is $natural stable if documents are immutable?
[14:13:32] <kali> mmmm... immutable AND never deleted
[14:13:50] <kali> I would not rely on that
[14:13:58] <alexgg> Yes, that sounds a bit dangerous :)
[14:14:14] <alexgg> compaction might influence it too
[14:16:43] <kali> compaction is not done in place, it's done sequentially, so it should not, but...
[14:21:05] <alexgg> yep, many assumptions
[14:21:15] <alexgg> My problem is exactly that one: http://stackoverflow.com/questions/20960815/range-query-for-mongodb-pagination
[14:21:38] <alexgg> except I can't afford (I think, at this time) to lose event reads once in a while
[14:21:55] <alexgg> And it's hard to reason about how often ObjectId will give me that issue
[14:22:37] <alexgg> (the comments on the accepted answers are interesting)
[14:25:34] <kali> well, there are two questions: 1/ is the order stable 2/ is the order actually the insertion order
[14:26:17] <alexgg> But I'm trying to do roughly the same thing as him
[14:26:18] <kali> _id being unique and immutable, whatever you'll be using will be a stable order
[14:26:59] <alexgg> Well, yes it's stable for a constant number of documents :)
[14:27:06] <kali> and if you're dealing with tweets, a discrepancy in the database order compared to actual insertion order does not matter too much
[14:27:13] <alexgg> but I wish to have it stable even as I add more documents
[14:27:32] <alexgg> I deal with business events used to synchronize a few systems together
[14:27:49] <alexgg> perhaps it's the wrong way to synchronize systems, I don't know :)
[14:27:55] <kali> mmmm yeah :)
[14:27:58] <alexgg> that's what Apache Kafka does
[14:28:12] <kali> have a look at capped collections in that case
[14:28:18] <kali> and use the $natural order
[14:28:38] <alexgg> yes that's really great, but it won't account for very old events
[14:28:50] <alexgg> in case a system messed up and wants to more or less start from scratch
[14:29:36] <kali> have a routine moving documents from the capped collection to a persistent collection?
[14:29:48] <kali> one single process
[14:31:23] <alexgg> yes could do that
[14:33:06] <alexgg> Maybe I have my solution :)
[14:33:40] <alexgg> I already needed a capped collection because the interface will watch changes in real time
[14:35:46] <alexgg> The only issue I can think of is back pressure
[14:36:22] <alexgg> http requests will happily return once the event is in the capped collection, but potentially the cold collection won't be able to keep up
[14:36:34] <alexgg> I don't think I will reach that level of activity though.
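
[editor's note: a sketch of the single-process mover kali suggests, assuming auto-generated ObjectIds; the events_capped and events_archive names are illustrative:]

    // copy anything newer than the last archived _id, in capped-collection order
    var last = db.events_archive.find({}, { _id: 1 }).sort({ _id: -1 }).limit(1).toArray();
    var query = last.length ? { _id: { $gt: last[0]._id } } : {};
    db.events_capped.find(query).sort({ $natural: 1 }).forEach(function (doc) {
        db.events_archive.insert(doc);
    });
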
[15:03:22] <alexi5> hello
[15:23:00] <jiffe> so I've added a replica set as a shard to a cluster and watching logs I see the balancer locking and unlocking something but I don't see anything moving
[15:37:14] <jiffe> and when I run sh.isBalancerRunning() it varies between true and false so it seems like it's turning on and shutting off
[15:53:49] <kali> jiffe: yeah, the balancer sleeps and wakes up once in a while, so this is expected
[15:54:15] <kali> jiffe: have you sharded a database and a collection? how big is the collection?
[15:59:24] <jiffe> kali: the db was sharded but not the collection, I called shardCollection() but it's taking a while for that call to complete, I don't see anything happening in the logs yet, the collection is 22869151710404 bytes in size so not sure how long it should take
[16:00:29] <kali> is that... 22TB?
[16:00:56] <jiffe> yeah
[16:01:18] <kali> ok. I have absolutely no idea how long it could take for such a beast
[16:01:40] <jiffe> what exactly does shardCollection() do before it returns to the shell?
[16:02:01] <kali> I'm not sure. I wonder if it defines the chunks at that point
[16:02:45] <kali> FYI I've seen this take minutes for ~20GB collections
[16:03:41] <jiffe> ok
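
[editor's note: for reference, the usual shell sequence for what jiffe describes; the database, collection, and key names are illustrative. Whether shardCollection() computes the initial chunks before returning is, as kali says above, unclear:]

    sh.addShard("rs1/host1:27017,host2:27017,host3:27017")  // add the replica set as a shard
    sh.enableSharding("mydb")                               // enable sharding on the database
    db.mycoll.ensureIndex({ myShardKey: 1 })                // the shard key needs an index
    sh.shardCollection("mydb.mycoll", { myShardKey: 1 })    // shard the collection itself
    sh.status()                                             // watch chunks appear and migrate
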
[16:25:50] <DesertRock> So in figuring out how to do something similar to a SQL join, I've been seeing lots of things suggesting I do a forEach, and then insert each modified record into a new collection. Is this really the suggested method of operation?
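
[editor's note: a sketch of the forEach pattern DesertRock describes, with illustrative collection and field names; whether it is the recommended approach goes unanswered in the log:]

    // denormalize a SQL-style join into a new collection, client side
    db.orders.find().forEach(function (order) {
        order.customer = db.customers.findOne({ _id: order.customerId });
        db.orders_joined.insert(order);
    });
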
[17:48:04] <alexi5> hi guys
[17:49:44] <alexi5> I have a document pasted at http://pastie.org/9778447. Is it possible to do a modify query on the document by incrementing votes and adding the string "121345" to the Numbers array of the subdocument that has keyword equal to christmas?
[17:50:37] <alexi5> can you give me an example of a modification document that can accomplish this?
[17:54:32] <kali> alexi5: update({"_id" : ObjectId("548c6e092a2d0e11806e04e7"), "Keywords": { $elemMatch: { keyword:"christmas" }}},{ $push : {"Keywords.$.Numbers": "121345" }})
[17:54:55] <kali> ha! and incrementing votes too
[17:55:25] <kali> alexi5: update({"_id" : ObjectId("548c6e092a2d0e11806e04e7"), "Keywords": { $elemMatch: { keyword:"christmas" }}},{ $push : {"Keywords.$.Numbers": "121345" }, $inc: {"Keywords.$.Votes": 1}})
[17:56:19] <kali> alexi5: update({"_id" : ObjectId("548c6e092a2d0e11806e04e7"), "Keywords.keyword":"christmas" },{ $push : {"Keywords.$.Numbers": "121345" }, $inc: {"Keywords.$.Votes": 1}}) should be enough actually
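
[editor's note: in the shell these run against a collection, e.g. db.<collection>.update(...); the positional $ resolves to the index of the first Keywords element matched by the query, which is why the query must reference Keywords.]
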
[17:57:41] <alexi5> oh, thanks kali
[20:00:24] <mnms_> Hi
[20:01:27] <mnms_> Can a single shard be replicated?
[20:01:47] <Derick> In the general setup, a shard is a replicaset.
[20:03:17] <mnms_> Derick: What does that mean? Cause each shard should be on an individual machine, and this machine should have e.g. its own slave with replicated data?
[20:03:33] <mnms_> Sorry, I'm new..
[20:04:23] <mnms_> I think I understand, so a shard is basically a replica set like you said
[20:04:46] <Derick> a replicaset is a collection (of usually 3) data-carrying servers, each on a different physical server
[20:04:53] <Derick> they replicate all data between them
[20:05:31] <Derick> a shard is a logical separation of a set of data, and in each "shard" the data is replicated in a replicaset environment
[20:05:53] <Derick> so in a 2-shard, 3-replica cluster, you have 6 data carrying nodes - and a mongos, and three config servers
[20:06:25] <mnms_> Ok...
[20:06:34] <mnms_> Why three config servers?
[20:09:04] <prettymuchbryce> What is the difference between apt-get mongodb, and apt-get mongodb-org?
[20:09:18] <mnms_> can there not be two? Cause then I need 3 machines, yes?
[20:11:57] <mnms_> Derick: what if one of the config servers fails?
[20:12:39] <Derick> prettymuchbryce: mongodb is the distribution's package - often outdated. mongodb-org is our own packages
[20:12:51] <Derick> mnms_: then you have a degraded cluster and you need to fix it
[20:13:09] <prettymuchbryce> Thanks Derick.
[20:13:48] <mnms_> Derick: but the data will still be stored?
[20:14:04] <Derick> mnms_: sure
[20:14:10] <Derick> mnms_: that's why you have three copies
[20:14:25] <Derick> mnms_: it does mean no data can be moved between shards (as shards autobalance normally)
[20:14:41] <mnms_> Ok I see, the test architecture is with one config server
[20:14:42] <Derick> in general, you don't really need sharding though...
[20:15:02] <Derick> just a 3 node replicaset gets you very far - and you don't need mongos or config servers
[20:17:06] <mnms_> But to be clear, if two of three config servers are down, data will not be moved between shards?
[20:24:52] <kali> with one config server down, the configuration database goes read only: shards stay where they are
[20:25:12] <mnms_> Derick: Ok, I'm a little confused, why should data be moved between shards? Data is stored in a particular shard based on the key which you choose?
[20:25:40] <mnms_> kali: If only one of the three config servers stays up, what happens then?
[20:26:02] <Derick> mnms_: yes, but if you start adding data, you want each shard to have about the same amount of data. And balancing moves data between them
[20:26:31] <Derick> mnms_: same thing with 1 of 3 up - but that's a situation you don't want to be in
[20:26:51] <Derick> if you only have *configured* 1 config server, then there are no "read only" situations
[20:26:53] <mnms_> Derick: but it can be, until I fix it
[20:26:56] <kali> mnms_: with two config servers down, it starts to stink. config database is still locked, of course, but you can no longer restart a mongos. as long as they stay up, the cluster "works"
[20:26:57] <Derick> mnms_: yes
[20:27:50] <mnms_> Derick: So balancing of data on shards will stop working when any of the config servers goes down? What is the rule?
[20:27:56] <kali> yep
[20:28:32] <Derick> rule?
[20:31:14] <mnms_> Derick: What is responsible for balancing data on shards? The config server?
[20:31:46] <Derick> mongos decides
[20:31:47] <kali> one of the mongod is the "balancer" one
[20:31:52] <kali> mongos, not mongod
[20:32:07] <Derick> kali: iirc, any one of them can decide - no?
[20:32:25] <kali> Derick: i think it's "sticky"
[20:32:25] <Derick> ie, there is no dedicated mongos for it
[20:32:33] <Derick> yeah, that might be the case
[20:32:43] <kali> Derick: i think it's always the same, until it crashes
[20:33:18] <kali> but I no longer have a big cluster to check that out :)
[20:33:35] <Derick> me neither
[20:35:04] <mnms_> Derick: So why can failure of one of the config servers have an impact on moving data between shards?
[20:35:33] <kali> because the config servers have to stay in sync
[20:35:49] <kali> if one of them is down, the balancing mongos can no longer update it
[20:36:36] <kali> in real life, it's usually fine: you don't want any server to stay down for very long, and you don't want balancing to happen too often either
[20:37:49] <mnms_> I don't understand the concept, cause I have three config servers to provide HA, and when one is down I lose some functionality
[20:38:58] <kali> your cluster will still work fine from the application point of view
[20:40:22] <kali> balancing is not a functionality, it's a sad necessity
[20:42:18] <mnms_> I'm reading about sharding right now cause I don't understand some things...
[20:45:05] <mnms_> I understand that the config servers don't need strong machines?
[20:45:21] <kali> nope
[20:45:34] <kali> I mean: that's right :)
[20:47:22] <mnms_> but in the docs it says: The balancer runs in all of the query routers in a cluster.
[20:49:05] <mnms_> so why does failure of one of the config servers have an impact on balancing data between shards?
[20:52:13] <kali> because to balance data, you need to write to the config database.
[20:52:40] <kali> and if one config server is down, you can no longer write to the two remaining ones or you'll break the sync
[20:53:19] <kali> where have you read "The balancer runs in all of the query routers in a cluster."?
[20:53:31] <kali> because it sounds ambiguous or even plainly wrong
[20:53:49] <Derick> kali: I think what that says is that on every mongos (query router) a balancer can run
[20:54:42] <kali> Derick: well, if that's what it means, the sentence should be fixed
[21:01:21] <mnms_> kali: http://docs.mongodb.org/manual/core/sharding-introduction/ Balancing chapter
[21:03:14] <Derick> http://docs.mongodb.org/manual/core/sharding-introduction/#balancing to be precise
[21:03:47] <mnms_> Derick: Yes, sorry
[21:07:55] <mnms_> Another case: If one of the config servers is down, then a mongos cannot be restarted?
[21:08:22] <kali> with one, it's fine
[21:08:50] <kali> one -> you lose balancing, two -> you lose the ability to start a mongos
[21:08:58] <mnms_> So if one config server is down, the only problem is with writing to the config servers, yes?
[21:09:18] <kali> yes
[21:09:24] <mnms_> kali: one or two of three, yes?
[21:09:30] <mnms_> in your scenario
[21:09:46] <kali> yes, of three. 1 config server is for testing and mad people only
[21:10:22] <mnms_> If two are down, I'm losing the ability to start mongos?
[21:10:39] <kali> yes
[21:10:41] <mnms_> so also restarting an existing mongos, yes?
[21:10:47] <kali> yes
[21:11:25] <mnms_> but if two of three are up then I can restart and start a mongos process?
[21:11:36] <kali> yes
[21:11:59] <mnms_> I don't understand why it is like that, cause mongos can still read from one config server
[21:13:26] <mnms_> Starting and restarting doesn't need to write anything to the config servers
[21:13:32] <mnms_> Unclear to me
[21:13:32] <kali> it might be a provision against some weird split-brain scenarios
[21:14:49] <mnms_> kali: is it somewhere in the docs that after failure of two of three I cannot restart mongos?
[21:15:12] <kali> it might be somewhere, but I know this from painful experience :)
[21:15:41] <mnms_> cause the docs say something else
[21:15:49] <mnms_> "If one or two config servers become unavailable, the cluster’s metadata becomes read only. You can still read and write data from the shards, but no chunk migrations or splits will occur until all three servers are available"
[21:16:36] <mnms_> So the docs say something else
[21:17:50] <kali> well, it says nothing about starting and stopping mongos explicitly
[21:18:06] <mnms_> It says somewhere else:
[21:18:57] <mnms_> MongoDB reads data from the config server data in the following cases: 1) A new mongos starts for the first time, or an existing mongos restarts 2) After a chunk migration, the mongos instances update themselves with the new cluster metadata
[21:25:14] <mnms_> Ok guys, thanks for the help
[21:25:37] <mnms_> I'm going to eat something, I will continue my research tomorrow :)
[22:19:16] <engirth> hey, in this example http://pastie.org/9778939 I am expecting to update all array elements that match a condition.
[22:19:58] <engirth> any guidance is helpful
[22:22:34] <blizzow> I have a collection that was created but is not showing up when I run show collections on my mongos. I tried to drop the collection and saw a complaint "ns not found" for the servers in two of my three replica sets. :( I try the drop again and it says the drop failed with the same ns not found message. I'm able to re-create the collection but it a) doesn't show up when I do show collections, and it complains that it's already sharded. How can I rem
[22:23:57] <Derick> It stopped at "sharded. How can I rem"
[22:24:03] <Derick> IRC has a 500 char limit per message
[22:29:03] <engirth> also, the example in the pastie updates only the first match
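
[editor's note: the positional $ operator updates only the first matching array element per document; one workaround from that era is to repeat the update until nothing is modified. The pastie is not reproduced here, so the items collection and field names are illustrative:]

    var res;
    do {
        res = db.items.update(
            { arr: { $elemMatch: { status: "old" } } },  // documents with a matching element
            { $set: { "arr.$.status": "new" } },         // $ fixes the first match only
            { multi: true }
        );
    } while (res.nModified > 0);                         // loop until every element is updated
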
[22:33:31] <blizzow> Derick: How can I remove this stubborn beast.
[22:44:02] <r01010010> hi
[22:44:06] <r01010010> anyone using mongoose?
[22:44:33] <r01010010> I don't know why, but the pre('save', ...) middleware runs many times
[23:13:01] <yzhao> how do you guys store files in MongoDB? esp. if filenames contain dots, which are forbidden in field names
[23:14:15] <Derick> yzhao: name: "filename.php"
[23:14:21] <Derick> don't use values as keys
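
[editor's note: a sketch of Derick's point; a dot is fine in a value but not in a field name, and the files collection here is illustrative:]

    // bad:  db.files.insert({ "filename.php": { size: 1024 } })  -- dotted key is rejected
    // good: store the name as a value and query on it
    db.files.insert({ name: "filename.php", size: 1024 })
    db.files.find({ name: "filename.php" })
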
[23:33:09] <yzhao> Is it normal to use GridFS as a temporary store for large files?
[23:33:11] <yzhao> or a permanent store*
[23:36:45] <yzhao> I.e. I reference a GridFS file in a document
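
[editor's note: a sketch of referencing a GridFS file from a regular document; fs.files is the default GridFS metadata collection, while the filename and attachments collection are illustrative:]

    var f = db.fs.files.findOne({ filename: "report.pdf" });    // uploaded e.g. with mongofiles
    db.attachments.insert({ owner: "yzhao", file_id: f._id });  // store a pointer to the file
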
[23:55:25] <modulus^> KurtKraut: are you a Naught Zee ?