PMXBOT Log file Viewer

#mongodb logs for Wednesday the 4th of March, 2015

[00:00:23] <keeger> so i have a fixed collection of Towns
[00:00:33] <keeger> 2 million in size
[00:00:47] <keeger> could I define a Capped Collection to hold them?
[00:01:14] <keeger> they are just updated in-place, none get deleted or added
[00:02:55] <GothAlice> Do they ever grow in document size?
[00:03:04] <GothAlice> I.e. append to a list, a string gets longer, an integer needs to go from short to long?
[00:03:19] <keeger> strings and integers grow
[00:03:40] <keeger> but no additional fields get added
[00:04:05] <GothAlice> Then capped collections won't really work out for you unless you account for that by adding padding to the records yourself, then removing that padding. There are some other caveats as well, but I do not recall where I read about them.
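A rough sketch of that padding trick (numbers, collection and field names are illustrative): the constraint is that a document in a capped collection may never grow past its original allocation, so the pad shrinks as the real fields grow.

    // create a capped collection sized for the fixed set of towns (sizes are illustrative)
    db.createCollection("towns", {capped: true, size: 2 * 1024 * 1024 * 1024, max: 2000000})

    // insert each town with a throw-away padding string so the record is allocated larger than needed
    db.towns.insert({_id: 1, name: "Springfield", population: 100, pad: "xxxxxxxxxxxxxxxxxxxxxxxx"})

    // when the real fields grow, shrink the padding by the same number of bytes so the
    // document never outgrows its original allocation (capped collections reject growth)
    db.towns.update({_id: 1}, {$set: {name: "Springfield Heights", population: 1000000, pad: "xxxxxxxx"}})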
[00:05:13] <keeger> so i saw that you save space by using single character names, and using a mapper to resolve it to more meaningful stuff
[00:05:22] <GothAlice> Aye.
[00:05:33] <keeger> does 3.0 help with this in any way?
[00:05:43] <GothAlice> Basically means I need to use a REPL shell from the language my app is written in, and not raw MongoDB shell, but.
[00:06:09] <GothAlice> It does; through native compression support I don't need to do this shortening any more. The keys will effectively be Huffman-coded throughout the collection.
[00:06:19] <GothAlice> Which means 100% more mongo shell love.
[00:06:41] <keeger> oh, you mean like de-duplication?
[00:06:52] <GothAlice> That's what modern lossless compression is. :)
[00:07:04] <GothAlice> (With various numbers of additional tricks applied.)
[00:07:06] <keeger> ah. i have a different background :)
[00:07:21] <keeger> but i remember backups doing data deduping to save huge amounts of space
[00:07:48] <keeger> is the compression WT or mmap? or both?
[00:09:23] <GothAlice> Consider "tar" and "gz" files. "tar" packs a directory structure together into a single file. The gzip tool then takes that file and allocates space for a "dictionary" of known terms. It then reads in the tar file, and as it sees patterns adds them to the dictionary and writes out the index into that dictionary. To decompress you then just need the dictionary and the index list which is your compressed data.
[00:09:28] <GothAlice> It's a WT feature.
[00:09:43] <keeger> gotcha
[00:09:44] <joeyjoeyjoe111> "Index keys that are of the BinData type are more efficiently stored in the index if: the binary subtype value is in the range of 0-7 or 128-135, and the length of the byte array is: 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 20, 24, or 32." <--- Can anyone provide a link as to why this is? particularly the size of the byte array?
[00:09:44] <GothAlice> WT does compression intelligently, taking into account its own internal structures.
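For reference, the WiredTiger compression being discussed is configured in the YAML config file; a minimal sketch (snappy is the default block compressor, zlib trades CPU for a higher ratio):

    storage:
      engine: wiredTiger
      wiredTiger:
        engineConfig:
          journalCompressor: snappy
        collectionConfig:
          blockCompressor: snappy    # or zlib for a higher compression ratio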
[00:10:20] <GothAlice> joeyjoeyjoe111: Choice of hashing algorithms.
[00:10:59] <joeyjoeyjoe111> GothAlice: Can you link me to source code or documentation that elaborates a bit more?
[00:11:01] <GothAlice> I'd dig through the mongodb source code to find the indexer code that checks those things.
[00:11:05] <GothAlice> But it'd be pretty deep.
[00:11:10] <joeyjoeyjoe111> :(
[00:11:12] <joeyjoeyjoe111> Okay.
[00:11:13] <joeyjoeyjoe111> Thanks.
[00:11:34] <keeger> that's what all the girls say....
[00:12:27] <GothAlice> And the next thing you know you're addicted to the Apple ][ debugger.
[00:13:08] <keeger> i'd laugh, but i'm afraid that went over my head :(
[00:13:26] <keeger> but dont let me distract you
[00:13:53] <GothAlice> (PEEK/POKE let you read and write arbitrary memory locations, it's how you do things like high graphics modes and get around copy protection on "borrowed" games. ;)
[00:14:09] <keeger> ah! lol
[00:16:25] <GothAlice> https://github.com/mongodb/mongo/blob/master/src/mongo/db/index/external_key_generator.cpp#L100 is where the magic starts.
[00:16:46] <joeyjoeyjoe111> Awwwwwww yeah.
[00:16:49] <joeyjoeyjoe111> Thanks!
[00:16:57] <GothAlice> No worries. :)
[00:20:50] <keeger> so goth, i'm working through some use cases
[00:21:09] <keeger> and i have a few that modify 2 documents, so i guess the 2 step commit will work for that
[00:21:09] <GothAlice> keeger: Hit me.
[00:21:19] <GothAlice> Two-phase, yes.
[00:21:32] <keeger> sorry, beer setting in :)
[00:21:45] <keeger> but i have a situation where X number of players can attack 1 town
[00:21:56] <keeger> when the battle is done, i need to update X + 1 documents
[00:22:20] <keeger> i like a good apple, but sadly all i have is Guinness atm
[00:22:52] <keeger> hah
[00:23:16] <keeger> so can i extend the 2 phase commit to X+1 docs?
[00:23:19] <GothAlice> Two-phase commit works for that; the situation of a "stuck" pseudo-transaction gets a bit trickier.
[00:23:49] <keeger> my concern is say a mongo node dies mid transaction
[00:24:10] <GothAlice> You effectively need to track progress so you know where the failure happened.
[00:24:37] <GothAlice> And use a write concern appropriate to your required level of durability.
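A rough sketch of that pattern for keeger's battle case, with progress tracked on a transaction document so a failed run can be found and resumed. Collection names, fields, and the score/defense changes are all illustrative, not from the conversation.

    // one battle touches N attackers plus 1 town
    var townId = 42, attackerIds = [7, 8, 9];
    var tx = {_id: ObjectId(), state: "pending", town: townId, attackers: attackerIds, applied: []};
    db.transactions.insert(tx);

    // apply each participant's change, recording progress so a crashed run can be resumed
    attackerIds.forEach(function (id) {
        db.players.update({_id: id, pendingTx: {$ne: tx._id}},
                          {$inc: {score: 10}, $push: {pendingTx: tx._id}});
        db.transactions.update({_id: tx._id}, {$addToSet: {applied: id}});
    });
    db.towns.update({_id: townId, pendingTx: {$ne: tx._id}},
                    {$inc: {defenses: -5}, $push: {pendingTx: tx._id}});

    // commit: clear the markers, then mark the transaction done
    db.players.update({pendingTx: tx._id}, {$pull: {pendingTx: tx._id}}, {multi: true});
    db.towns.update({pendingTx: tx._id}, {$pull: {pendingTx: tx._id}});
    db.transactions.update({_id: tx._id}, {$set: {state: "done"}});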
[00:24:39] <keeger> so i do this using the transaction collection
[00:24:53] <keeger> well i'm planning on using a replica set with write concern of 2
[00:25:04] <keeger> app will start with 3 servers
[00:25:34] <GothAlice> Very reliable—also very slow, as it requires multiple network round-trips.
[00:25:46] <keeger> when you say very slow, how slow is that
[00:25:48] <keeger> 10 ms?
[00:25:54] <keeger> or more like seconds
[00:25:58] <GothAlice> But really, how reliable? One secondary goes offline and your queries will never come back…
[00:26:08] <GothAlice> (In the hundreds of ms range, depending on network conditions.)
[00:26:37] <keeger> hmm
[00:26:58] <GothAlice> http://techidiocy.com/write-concern-mongodb-performance-comparison/ — you can work back from writes/sec to msec/write.
[00:27:35] <keeger> that's with jounaling on though right?
[00:27:59] <GothAlice> In various combinations. https://whyjava.wordpress.com/2011/12/08/how-mongodb-different-write-concern-values-affect-performance-on-a-single-node/ is another, but neither of these touch on replica-safe write concerns, which only get slower…
[00:28:01] <keeger> oh wow, i can't read a table, my bad
[00:28:27] <GothAlice> A ha! *This* is the one I was looking for: http://www.nonfunctionalarchitect.com/2014/06/mongodb-write-concern-performance/
[00:30:09] <GothAlice> Performance in 3.0.0 may differ, of course.
[00:31:20] <keeger> urg that last one doesn't tell me which color goes to which setting heh
[00:31:43] <GothAlice> FTA: "The slowest of these are FSYNCED, FSYNC_SAFE, JOURNALED and JOURNALED_SAFE (with JOURNALED_SAFE being the slowest)."
[00:31:54] <GothAlice> He gives hints. XD
[00:31:58] <keeger> lol
[00:32:08] <keeger> but i want to know which line is REPLICA_ACKNOWLEDGED
[00:32:15] <keeger> which is the one i was leaning towards
[00:32:27] <GothAlice> It's in the cluster averaging second from the top in the first graph.
[00:32:35] <GothAlice> So for him: ~3.2 seconds.
[00:32:49] <keeger> ouch
[00:33:31] <keeger> er, i think the 3rd graph is the one with replicas_acknowledged
[00:33:51] <GothAlice> Actually, I'm wrong. Yeah. Cluster 2. That first cluster is the fsync batch.
[00:33:57] <keeger> yeah
[00:34:16] <keeger> so it's like...between 500 ms and 600 ms, which is still not very good
[00:34:21] <keeger> that's measurable to the user
[00:34:27] <keeger> and this is on a small db too, he said 100 docs
[00:34:38] <GothAlice> Indeed. I would imagine such a task would be backgrounded.
[00:35:16] <keeger> seems 200 ms is the fastest it does in that test
[00:35:30] <GothAlice> Again, use those as relative measures.
[00:35:37] <GothAlice> His architecture is not your architecture. :)
[00:35:47] <GothAlice> (And when in doubt: test!) :D
[00:36:04] <keeger> heh
[00:36:16] <GothAlice> Drat, the Bachelor presentation we gave didn't cover how we lazily evaluate scores. :/
[00:36:40] <keeger> one thing i was thinking would be nice
[00:36:40] <BadHorsie> Say I have this doc: {"vlans":[{"name":"1","entries":[{"ip":"10.1.1.1","name":""},{"ip":"10.1.1.2","name":""}]}]} And I want to update the name of ip 10.1.1.1 (Without using vlans.0.entries.0.name, more like find and I guess maybe the $ operator)
[00:36:49] <keeger> would be a shared nothing, so each webserver has a copy of the db
[00:37:45] <keeger> but i dunno about how much time would be spent purely keeping the dbs in sync
[00:37:54] <agenteo> oh man… it turns out it is working for me too… the effect is so subtle that it’s hard to see!
[00:37:54] <GothAlice> BadHorsie: You want http://docs.mongodb.org/manual/reference/operator/query/elemMatch/ and http://docs.mongodb.org/manual/reference/operator/update/positional/#up._S_
[00:38:08] <agenteo> prob my color scheme
[00:38:17] <BadHorsie> Ah, something broke my screen lol
[00:38:20] <GothAlice> agenteo: Indeed. I'm using a "bright bolds" variant of the default Pro theme.
[00:38:22] <BadHorsie> Thanks GothAlice
[00:38:54] <GothAlice> BadHorsie: No worries. If after reading those and playing around a bit in a mongo shell you still have questions, we're all ears. :) (The examples are pretty comprehensive.)
[00:39:45] <Boomtime> BadHorsie: am i correct in saying you have an array nested in another array (2 deep) and want to update a single array element of the inner-most array?
[00:40:19] <Boomtime> if so, then sorry: https://jira.mongodb.org/browse/SERVER-831
[00:40:22] <GothAlice> Boomtime: Good catch! BadHorsie: Your data design will require you to load the document and issue a somewhat more involved update.
[00:40:36] <BadHorsie> Ah....
[00:40:38] <GothAlice> Boomtime: +1 on the $n syntax.
[00:41:13] <keeger> well time for the dinner. thx for the help everyone, most especially GothAlice. i'll be back to wrangle out how to setup my mongo dbs :)
[00:41:35] <GothAlice> keeger: Ping me when you get back and I'll gist you some lazy scoring code. :)
[00:41:43] <keeger> GothAlice, sounds great
[00:42:56] <BadHorsie> Should I be trying to iterate through the results with a cursor and do the match like that?
[00:44:21] <Boomtime> BadHorsie: cursors iterate through document matches, returning whole documents (or whatever is filtered) only
[00:44:22] <BadHorsie> $n sounds nice... reminds me of sed backrefs :)
[00:44:52] <Boomtime> you will probably need to do the inner array manipulation on the client, then update the whole inner array
[00:44:54] <BadHorsie> Boomtime: Yeah, so like iterating through the matching doc's subarray and checking for equality "manually"
[00:45:19] <Boomtime> right
[00:45:20] <BadHorsie> I guess that will be a bit slow/expensive...
[00:45:33] <Boomtime> the find can still narrow your search to positive matches only
[00:45:48] <GothAlice> BadHorsie: That is an approach. Be careful to catch the cursor timing out and re-trying from where you left off in case it takes a while, but that shouldn't be a problem if you're typically only updating a small set. Also, what Boomtime says. Note the race conditions and potential for this update to overwrite other updates more broadly than just the field you are updating. (Since you'll need to update the whole sub-array.)
[00:46:43] <BadHorsie> Perhaps I should consider a different structure..
[00:46:50] <GothAlice> It's something you may wish to refactor at some point; give the top-level "vlans" array its own collection, with back-references.
[00:47:05] <GothAlice> Then it's only a one-level deep nesting.
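A sketch of that refactor (the collection name and the device back-reference are assumptions): with one document per vlan, "entries" is a single level of nesting and the positional $ operator from the links above applies.

    db.vlans.insert({device: "switch-01", name: "1",
                     entries: [{ip: "10.1.1.1", name: ""}, {ip: "10.1.1.2", name: ""}]})

    db.vlans.update({name: "1", "entries.ip": "10.1.1.1"},
                    {$set: {"entries.$.name": "core-router"}})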
[01:00:50] <agenteo> @GothAlice I got it, so in iTerm there is an option called “Draw bold text in bright colors”, checked by default. If unchecked I get the same behaviour as you. yay
[01:02:05] <GothAlice> agenteo: Great success!
[01:21:56] <stiffler> hi lads
[01:25:37] <stiffler> I have a problem with queries. This is my schema: http://codepaste.net/93c7rf . It also shows the query I have written so far, but I get too many results. Statements like stops.timetable.hour: '6' don't seem to work. There are also different hours as well
[01:26:06] <stiffler> in query result I mean
[01:27:13] <stiffler> could anybody help? Have I explained my problem well enough?
[01:27:19] <GothAlice> Are you positive you mean that arrangement of type and hour check?
[01:28:20] <stiffler> if I have understood you right, yes
[01:28:21] <GothAlice> The $elemMatch is effectively doing nothing, there.
[01:28:34] <stiffler> I have tried with and without
[01:28:48] <panurge> strange problem.. I'm using mongo 2.6 but db does not have createUser method
[01:28:50] <stiffler> basically I only want to get the timetable
[01:29:00] <stiffler> I dont need any other fields
[01:29:07] <GothAlice> It's finding any document that contains _any_ timetable with type "12" (the string "12") that also contains _any_ timetable with hour=='6' (the string "6")
[01:29:41] <GothAlice> (That also has any stop that includes a lineNr of 817.)
[01:30:48] <stiffler> so how to get timetable with only type: 12 and only hour: 6?
[01:30:51] <GothAlice> Oh bugger me, it's another doubly nested list.
[01:31:12] <stiffler> sorry, those are my first steps with mongodb
[01:31:22] <stiffler> so it might be a bit messy
[01:33:27] <stiffler> basically it returns the whole document, but I only need the timetable depending on the parent fields
[01:33:34] <GothAlice> Double list nesting makes things exceedingly difficult to query, and most notably, update. It is generally recommended to split out whatever is double-nested ("stops", which is ironically inside a collection called "stops") so that a) at no level is there more than one level of list nesting, and b) if you have multiple sibling lists, you never need to query more than one at a time.
[01:33:43] <GothAlice> See: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[01:34:57] <GothAlice> Some nesting can be OK. For example, I nest replies to forum threads within the thread document.
[01:35:30] <stiffler> ok, but I have really nearly finished this project, and I would like to move forward with refactoring half of the code. is it impossible to make a correct query with those nests?
[01:35:45] <stiffler> *without refactoring
[01:39:52] <GothAlice> db.stops.find({'stops.lineNr': '817', 'stops.timetable': {'$elemMatch': {'type': '12', 'hour': '6'}}}) — does this describe what you are looking for? (I can not get a firm grasp of what you are looking for from the original query.)
[01:40:49] <stiffler> yes it does
[01:41:00] <stiffler> but it doesn't work either
[01:41:06] <stiffler> still too many timetables
[01:41:30] <stiffler> type works
[01:41:34] <stiffler> but hours not
[01:41:40] <stiffler> *hour
[01:42:16] <stiffler> oh sorry
[01:42:23] <stiffler> type doesn't work either
[01:42:34] <stiffler> it's just that there were no timetables with other types
[01:43:44] <GothAlice> We may be running into point three of: http://docs.mongodb.org/manual/reference/operator/projection/positional/#array-field-limitation
[01:45:49] <stiffler> The query document should only contain a single condition on the array field being projected. Multiple conditions may override each other internally and lead to undefined behavior.
[01:45:53] <stiffler> did you mean this?
[01:46:16] <GothAlice> Also some other parts of that.
[01:46:23] <Everhusk> what does adding a $ to a variable do in an aggregation call
[01:46:30] <Everhusk> http://docs.mongodb.org/manual/reference/operator/aggregation/group/#accumulator-operator
[01:46:35] <Everhusk> i.e. on $price
[01:46:36] <GothAlice> stiffler: It all applies in your current schema situation.
[01:47:29] <stiffler> hmm... so do I have to change the schema, or are there other ways to solve my problem?
[01:47:47] <GothAlice> Everhusk: Those describe replacing the text you have entered ($price) with the value of the field with the same name, less the $. If you project just "price", the value of the field will be "price".
[01:47:49] <stiffler> ex. two queries
[01:47:54] <stiffler> or javascript iteration?
[01:48:32] <Everhusk> ah ok makes sense
[01:48:33] <Everhusk> thanks
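A minimal illustration of that point for Everhusk (collection and field names are assumptions): the leading $ makes "$price" a field path, while a bare "price" would just be the literal string.

    // "$price" is replaced by each document's price value; "price" without the $ is only a string literal
    db.sales.aggregate([
        {$group: {_id: "$item", total: {$sum: "$price"}, count: {$sum: 1}}}
    ])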
[01:48:36] <GothAlice> stiffler: You'll have to do a query to find the document that matches and filter the first level, but you'll still get back more timetables than you expect. You'll have to filter those application-side.
[01:49:34] <stiffler> so hard javascript code will be my solution?
[01:49:55] <GothAlice> stiffler: It's a "why have a database" situation. It's certainly worth a little refactoring now…
[01:51:10] <stiffler> ok, it's nearly 3am in my timezone. I have to think about what I want to do with this.
[01:51:24] <stiffler> so this schema should be split into 3 documents
[01:51:34] <stiffler> or two
[01:51:45] <stiffler> like stops and timetable yes?
[01:52:24] <GothAlice> Well, you have a collection of documents called stops containing an array called stops which contains an array called timetable. Those last two should be in their own collection.
[01:52:33] <GothAlice> (You can keep those nested; one level is A-OK.)
[01:54:13] <stiffler> so {stops: [], timetable: []} and then I will have to add a field to timetable which will help me keep the relation between stops and timetable?
[01:56:25] <GothAlice> … is there some reason why you aren't creating collections (other schemas)? That's still using the embedding notation. (What you wrote would be a document with two lists, and you can only query one at a time.)
[01:58:15] <GothAlice> Or I may be misreading this JavaScript. *shakes a fist at Mongoose and all the darn curly braces*
[01:58:15] <stiffler> I thought that nesting is the biggest advantage of mongodb and that's how we keep relations
[01:59:02] <GothAlice> Yes and no. It requires restraint and understanding, like any tool. I gave the link that outlines the limitations. One has to work within those limitations.
[01:59:33] <stiffler> so basically the best solution is to make separate collections
[01:59:48] <GothAlice> Performing a second query to "pseudo-join" data between collections is not unusual. (In most cases I measured, it's still faster even with the second round-trip vs. MySQL.)
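A sketch of what that split plus "pseudo-join" could look like for stiffler's data: each element of the inner "stops" array becomes its own document with a back-reference, "timetable" stays one level deep, and a second findOne stands in for the join. Collection names and fields other than lineNr/timetable/type/hour are assumptions.

    var stop = db.stopEntries.findOne(
        {lineNr: "817", timetable: {$elemMatch: {type: "12", hour: "6"}}},
        {"timetable.$": 1, lineNr: 1, route: 1});
    var route = db.routes.findOne({_id: stop.route});   // the second round-trip replaces a join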
[01:59:49] <stiffler> and manage them in a similar way as in SQL databases
[02:00:02] <GothAlice> Again, yes and no.
[02:00:06] <stiffler> hehe
[02:00:08] <GothAlice> You can nest. Just be sane about it. ;)
[02:00:40] <stiffler> but if I don't nest, will I still be able to use all the features of mongodb?
[02:00:46] <GothAlice> Keep it to one list (you wish to query) per document, and don't nest more than one level deep to keep the widest number of options available in terms of querying and updating that data.
[02:00:52] <GothAlice> Yes, though you lose some storage efficiency.
[02:00:56] <GothAlice> (Depends on the use case.)
[02:01:33] <stiffler> ok, that was a great lesson
[02:01:58] <stiffler> thank you for all the very useful information and links.
[02:02:22] <stiffler> i will refactor it
[02:02:47] <stiffler> but tomorrow, I mean today morning (or afternoon :))
[02:02:54] <GothAlice> For a comparison of different storage methods (levels of nesting and typical queries for that type of data) you can see: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework — storage can range from ~50MB to more than 600MB.
[02:03:18] <stiffler> ok next one links went to bookmarks
[02:03:21] <GothAlice> :)
[02:03:35] <stiffler> thanks a lot once again
[02:03:40] <GothAlice> No worries!
[02:03:47] <stiffler> I gtg (3am came)
[02:03:56] <stiffler> have a good night/day
[02:04:01] <stiffler> bye
[02:04:05] <stiffler> ps ill be back :)
[02:24:23] <italomaia> hello folks
[02:24:51] <italomaia> I'm getting a NodeNotFound when I try to initiate my replicaset
[02:25:46] <italomaia> no idea what my sin could be. Deploying on google cloud
[03:10:21] <roadrunneratwast> how do i retrieve a subdocument from a parent document.
[03:10:34] <roadrunneratwast> I just created Resource saved! { __v: 0, place: 54f6760619d6f1ac82a72fd0, time: 54f6760619d6f1ac82a72fd1 }
[03:10:44] <roadrunneratwast> how do i get resource.place ?
[03:12:03] <joannac> that's a reference
[03:12:10] <joannac> which collection does it point to?
[03:12:35] <joannac> do a find on that collection
[03:13:54] <roadrunneratwast> I can't find that collection
[03:14:04] <roadrunneratwast> I created a subschema for Places
[03:14:36] <roadrunneratwast> which is used by Resources {place : {ref: PlaceSchema}
[03:14:54] <joannac> mongoose?
[03:15:03] <roadrunneratwast> yeah
[03:15:09] <roadrunneratwast> sorry i wasn't clear on that
[03:15:10] <joannac> go to #mongoosejs
[03:15:15] <joannac> or wait for someone else
[03:15:16] <roadrunneratwast> okay
[03:15:18] <roadrunneratwast> okay
[03:43:54] <keeger> GothAlice, long dinner :)
[03:44:26] <GothAlice> keeger: I do hope it was enjoyable. :) https://gist.github.com/amcgregor/dba54ae5de9cce0f9fb5 < the scoring code
[03:44:34] <GothAlice> With relevant model.
[03:45:57] <GothAlice> We had mixed scoring; energy which continually goes up but gets spent, with a maximum, refractory time, and rate of increase over that time. There was also per-game atomic increment scoring based on certain events or chains of events occurring.
[03:47:08] <keeger> interesting
[03:47:19] <GothAlice> For the first case all we need to know is the last time energy was spent, and how much energy there was at that time. Simply $set it whenever it gets spent. :)
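A minimal sketch of that lazy-evaluation idea (names, rates, and the collection are illustrative; GothAlice's actual code is in the gist linked above):

    var REGEN_PER_SEC = 0.1, MAX_ENERGY = 100;

    function currentEnergy(p) {
        var elapsed = (Date.now() - p.energyUpdated) / 1000;   // energyUpdated stored as a Date
        return Math.min(MAX_ENERGY, p.energy + elapsed * REGEN_PER_SEC);
    }

    function spendEnergy(playerId, amount) {
        var p = db.players.findOne({_id: playerId});
        db.players.update({_id: playerId},
            {$set: {energy: currentEnergy(p) - amount, energyUpdated: new Date()}});
    }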
[03:48:17] <keeger> i just hate dealing with server setup stuff
[03:48:32] <keeger> not an arena i know that well
[03:49:10] <GothAlice> ^_^ I like MMS.
[03:49:23] <GothAlice> https://mms.mongodb.com/
[03:49:44] <keeger> well i just need to figure out the pattern i need
[03:49:50] <keeger> replica set, sharding etc
[03:49:58] <keeger> then manage locking and transactions
[03:52:04] <GothAlice> For high availability you'll need replicas. This also covers data loss if appropriately spread amongst racks or DCs. Sharding is primarily useful when your "hot dataset" (the records you frequently query) grow beyond the size of available RAM, as this technique allows you to spread the physical records around. However in this situation, you would want to shard replica sets to ensure you maintain multiple copies of your data at all times.
[03:52:04] <GothAlice> (Only sharding reduces you back to single points of failure for your data.)
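A minimal sketch of that layout from a mongos shell, reusing the host-naming scheme GothAlice mentions later in the log (set names, database name, and shard key are illustrative): each shard added is itself a replica set, so every shard's data still has multiple copies.

    sh.addShard("rs01/r01.s01.db.example.com:27017,r02.s01.db.example.com:27017")
    sh.addShard("rs02/r01.s02.db.example.com:27017,r02.s02.db.example.com:27017")
    sh.enableSharding("game")
    sh.shardCollection("game.towns", {_id: "hashed"})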
[03:53:07] <keeger> i was hoping to start off with 2 servers, and then scale wider if demand grew
[03:53:28] <keeger> cheaper, but a replica + shard is at least 4 servers probably
[03:54:38] <GothAlice> That would give you data protection, but not high-availability. Arbiters sorta add availability back, but at the cost that when running degraded (to use a RAID term) your data is at risk.
[03:56:54] <keeger> so for shard + replica, i need a replica, which is at least 2 boxes, and the shard, which is also at least 2, correct? probably need one more for the replica for safety?
[03:58:26] <keeger> and the app would basically hit the shard servers, for doing read/write, and the replica is just for data backup
[05:21:45] <Freman> so... turns out my problem is... mongo can't keep up
[05:27:40] <keeger> stop using it so hard!
[05:27:42] <keeger> :P
[05:31:36] <Freman> I know the machine is absolutely hammered
[05:32:31] <dcuadrado> what about mongostat ?
[05:32:32] <keeger> wait till it sobers up!
[05:32:56] <Freman> I do a few "update - upserts"
[05:33:12] <dcuadrado> how many servers do you hve?
[05:34:03] <Freman> just the one
[05:34:25] <Freman> it is being used as the log dumping ground for our entire application stack across 90 odd servers
[05:34:31] <Freman> (I know right?)
[05:35:00] <dcuadrado> are you using a queue before writing to it?
[05:35:17] <dcuadrado> or just 90 vs 1 ?
[05:35:35] <Freman> I am, that's why my app is getting screwed over - others aren't
[05:35:44] <Freman> mine is 6 > 1 > mongo
[05:36:33] <dcuadrado> so you can kill the consumers to reduce the load of the server
[05:40:09] <dcuadrado> Freman: please tell us a little more about your infrastructure
[05:42:13] <Freman> so, there's one mongo server that acts as the log server for the entire operation, there's 90 odd servers with all sorts of logs pointed at it (from php application logs to syslog)
[05:43:25] <Freman> my app is a nodejs server that connects to 6 other nodes, these 6 nodes run the tasks as directed by the server, the output of those tasks is sent back to my server, which then forwards it to mongo
[05:43:39] <Freman> which is taking its sweet time to write it
[05:44:44] <Freman> http://pastebin.com/9636rBbz - a chunk from mongostat
[05:46:15] <GothAlice> Freman: Ensure your "logging" collections are "capped" collections.
[05:47:00] <Freman> capped? mine has a background task that prunes the entries to 100 executions per task, there's about 160 tasks
[05:47:14] <GothAlice> And/or reference: http://docs.mongodb.org/ecosystem/use-cases/storing-log-data/#limitations
[05:47:25] <GothAlice> waitwat
[05:47:33] <GothAlice> Define "prune".
[05:48:24] <dcuadrado> Freman: what version of mongodb are you using?
[05:51:23] <Freman> v3.0.0-rc11
[05:51:41] <dcuadrado> with WT I suppose
[05:52:26] <Freman> my prune is... http://pastebin.com/78U6TSpJ
[05:54:43] <dcuadrado> that's probably what's causing the bottleneck
[05:54:57] <Freman> that runs once every minute
[05:55:58] <GothAlice> Mother of god.
[05:56:03] <Freman> sorry, that runs once every 60 minutes
[05:56:35] <joannac> ...what
[05:56:48] <dcuadrado> lol
[05:57:26] <Freman> even without that code running it falls behind very quickly (just commented it out)
[05:57:47] <dcuadrado> Freman: are you using wiredtiger or mmap? how big are the docs you are inserting?
[06:01:05] <Freman> most of them tiny
[06:01:19] <Freman> but I'm only one of many :(
[06:01:22] <Freman> http://pastebin.com/yhGVNhKJ <- most of my code
[06:01:24] <GothAlice> Freman: In the link I provided, it mentions several ways of improving write performance, and the trade-offs in the different approaches.
[06:02:16] <Freman> I'm responsible for a grand total of 9032 documents :D
[06:02:55] <Freman> http://pastebin.com/LsNx1URj my average document
[06:03:33] <GothAlice> And what you are currently doing is kinda not cricket. Having a capped collection of a reasonable size (estimate out for X time or X records, or both) and have a process "tailing" it to catch interesting messages which can be saved in a real collection elsewhere is better. Adding TTL indexes onto that to enforce a X time clearing of that collection would be for bonus points. (No need for you to do what the DB can do for you!)
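A rough sketch of that pattern (sizes, collection names, and the "interesting message" test are all illustrative): a capped collection for the raw firehose, a tailing cursor to pull out what should be kept, and a TTL index so the server expires the kept entries instead of a cron-style prune.

    db.createCollection("log.raw", {capped: true, size: 1024 * 1024 * 1024})   // fixed size; oldest entries roll off

    // a tailable cursor follows the capped collection much like `tail -f`
    var cur = db.getCollection("log.raw").find()
                .addOption(DBQuery.Option.tailable)
                .addOption(DBQuery.Option.awaitData);
    while (cur.hasNext()) {
        var entry = cur.next();
        if (entry.level === "error") db.getCollection("log.errors").insert(entry);
    }

    // let the server expire old kept entries (ts assumed to be a Date field)
    db.getCollection("log.errors").createIndex({ts: 1}, {expireAfterSeconds: 60 * 60 * 24 * 30})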
[06:04:30] <Freman> problem is GothAlice we have things that run every second... and things that run once a month
[06:04:56] <GothAlice> Freman: I've benchmarked running 1.9 million things per second.
[06:05:11] <Freman> honestly, I don't think the db is suffering from my usage... 9032 documents...
[06:05:13] <GothAlice> So… it's clearly capable of reasonable performance, here. ¬_¬
[06:05:33] <Freman> I'm not saying its not
[06:05:59] <Freman> I'm saying that my additional load is evidently the proverbial straw
[06:06:40] <GothAlice> Capped collections are more performant for a number of reasons, and the article I linked describes sharding plans that increase performance if you do not wish to use capped collections. (They're a feature added for this purpose, though. See: http://docs.mongodb.org/manual/core/capped-collections/)
[06:06:51] <fhainb> perhaps you want to use a real database instead of a toy?
[06:07:47] <dcuadrado> Freman: can you remind me why do you say mongo can't keep up?
[06:08:05] <dcuadrado> what makes you think that
[06:09:32] <dcuadrado> (I'm thinking the bottleneck is in the client)
[06:09:40] <Freman> because I have a task that is outputting logs in real time, once every second, I can see them refresh, however the callback from mongo is ages behind
[06:10:05] <dcuadrado> what callback from mongo/
[06:10:06] <dcuadrado> ?
[06:10:10] <GothAlice> Freman: If you "see them refresh" by polling them, then you're doing it wrong.
[06:10:17] <dcuadrado> oh node
[06:10:38] <Freman> I'm not polling anything
[06:11:13] <Freman> client writes to my server, it displays them as it receives them, at the same time it fires off a mongo update, then the callback from that update fires when mongo returns
[06:11:22] <GothAlice> Freman: Your approach is backwards. Instead of logging everything somewhere slow, then deleting the things you don't want, typically by number of elements, in a rather slow way, use a type of collection that can naturally limit itself to a certain number of documents.
[06:11:55] <GothAlice> A type of collection that is also inherently faster for logging-type usage.
[06:12:23] <Freman> I'm not logging anything I don't want and as a measure of keeping my collection small I'm deleting older entries
[06:12:45] <Freman> for the most part my entries are "this process ran, it had no stderr"
[06:12:48] <GothAlice> Freman: Capped collections do that for you.
[06:13:44] <dcuadrado> Freman: wondering... would a memory only db work?
[06:13:46] <Freman> I still say it's not really my problem, but a problem with the other 3 trillion log entries
[06:13:54] <dcuadrado> I mean for you use case
[06:16:04] <GothAlice> Freman: Read the links I have provided, then field your questions. I currently have 5,574,197 log entries in one application's capped collection-based log. That's about one day, so… just under 4K messages per minute or 64 per second.
[06:16:49] <Freman> I did, but I'm not the architect of this mess, I'm just trying to get a task done, one that I'm sick of coming back to after having wasted the better part of this week trying to solve the issues with talking to mongo
[06:17:24] <Freman> what's the fastest way to find out a collection's size (either count, or bytes)?
[06:18:00] <GothAlice> Freman: db.stats()
[06:18:17] <GothAlice> For general storage vs. objects, and data vs. index stuff.
[06:18:40] <GothAlice> db.collection.stats() for the logical collection-specific stats
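In shell terms (the collection name is assumed):

    db.stats()                               // database level: objects, dataSize, storageSize, indexSize
    db.getCollection("exec_logs").stats()    // the same numbers for one collection
    db.getCollection("exec_logs").count()    // just the document count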
[06:21:06] <dcuadrado> Freman: if you just want to get the task job without changing the other parts of the system I would just put an nsq.io and have consumers writing to the db, that way you don't lose data and everything is eventually synced
[06:22:43] <Freman> http://pastebin.com/7XVsS1ar are the collections in this database... my collection is the very last one
[06:25:46] <dcuadrado> task done*
[06:26:01] <dcuadrado> I'm really sleepy atm
[06:26:28] <Freman> thanks dcuadrado, I'll look at that tonight when I get home
[06:27:03] <dcuadrado> it's definitely not your fault
[06:27:36] <Freman> as you can see, my collection is minuscule compared to the others on that poor machine - the only reason mine even cares is because I have a watchdog to restart stuck processes, a process is deemed stuck when its last run hasn't been updated for a long time, that gets updated at the end of the logging process
[06:28:04] <dcuadrado> you are probably losing data right?
[06:28:17] <GothAlice> I'm leaning towards correctable architectural efficiency with two potential clear answers, one of which eliminates the need for a maintenance process. (The other can expire by time using a TTL index, BTW.)
[06:28:24] <Freman> I'm not no, but the watchdog is going apw
[06:28:39] <GothAlice> s/efficiency/deficiency
[06:29:09] <dcuadrado> how are you not losing data? where is it stored if the process crashes or is restarted?
[06:29:41] <GothAlice> dcuadrado: With default write concern, in a buffer in the primary's RAM.
[06:29:41] <Freman> the watchdog politely asks the worker that's running the task to not run it again (if it is still running it) and then starts it again on another node
[06:30:07] <Freman> I'd like to build it in such a way that processes that run once a month have 6 months worth of logs and things that run once a second have a weeks worth of logs or something like that
[06:30:08] <dcuadrado> oh I see
[06:30:48] <Freman> dcuadrado: the whole thing is really nice, I promise, my mongo might not be up to par I confess but I seriously put some thought behind this task management thing
[06:30:59] <Freman> distributed cluster crontab :D
[06:32:11] <dcuadrado> I don't think it's mongo's fault either (although it's probably not the best tool for that job either)
[06:32:45] <dcuadrado> Freman: can you take a look at the log of mongodb?
[06:33:00] <Freman> No I don't believe that either, I just think I've reached the limit of what this install can handle with the way it is being used
[06:33:12] <dcuadrado> is there anything suspicious there?
[06:33:45] <Freman> probably, I'm afraid I don't have time right now - my carpool is waiting
[06:33:56] <Freman> which means I have to come back and sort this out tomorrow
[06:34:19] <GothAlice> https://gist.github.com/amcgregor/4207375 < a distributed task worker system, including scheduled tasks, using MongoDB. This was the 1.9 million calls per second thing I mentioned. (That's end-to-end on a single host, sharding on the task collection would improve performance somewhat.)
[06:35:28] <Freman> we've got cron style and persistent style
[06:35:42] <Freman> I'd love to talk more (and solve this) but I really have to split
[06:35:48] <GothAlice> No worries.
[06:35:57] <Freman> thanks for your patience I'm going to look at NSQ tonight when I get home
[06:36:45] <dcuadrado> mongodb is not very good for worker systems either
[06:37:12] <dcuadrado> Freman: i don't think nsq is gonna help
[06:38:13] <GothAlice> dcuadrado: Production systems would argue the mongodb point there.
[06:44:35] <dcuadrado> yeah I've used it on production as worker system several times and while it works it's not the best tool for the job
[06:45:17] <dcuadrado> I shouldn't have said it's not very good
[06:46:46] <GothAlice> 1.9 million RPC calls per second… on a single host before any attempt to scale via sharding, with just 2 task producers and 4 workers is nothing to sneeze at. I should re-benchmark on modern hardware (that stat is… 3.5 years old) and with different sharding strategies to compare. Hmm. :3
[07:50:15] <Freman> nsq probably won't (yay home) but it'll shuffle the problem further away from me :)
[08:29:49] <imjacky> Hi, I am using mongodb to do LBS query. I have met a problem with pagination. When I use skip and limit after find, some records with the same distance will show up in both the current page and the previous page, which is not what I want. So I wonder if there is a way to solve my problem. thx
[08:34:31] <imjacky> Hi, I am using mongodb to do LBS query. I have met a problem with pagination. When I use skip and limit after find, some records with the same distance will show up in both the current page and the previous page, which is not what I want. So I wonder if there is a way to solve my problem. thx
[08:40:54] <imjacky> uhm, anyone could help solve my problem?
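One common approach to imjacky's paging problem (not suggested in the conversation): compute the distance with $geoNear and sort on it together with a unique tie-breaker such as _id, so equal-distance documents cannot straddle page boundaries. Collection name, coordinates, and limits are illustrative, and a 2dsphere index is assumed.

    db.places.aggregate([
        {$geoNear: {near: {type: "Point", coordinates: [116.4, 39.9]},
                    distanceField: "dist", spherical: true, num: 500}},
        {$sort: {dist: 1, _id: 1}},    // _id breaks ties between equal distances
        {$skip: 20},
        {$limit: 10}
    ])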
[08:47:07] <hdon> hi all :) my mongo build is huge! it looks like every command built is statically linked with ~200MB of libs. how can i tell mongo to build dynamic binaries?
[08:48:09] <hdon> oh or... well it looks like the 'stripped' dir contains much smaller copies... maybe it's really all debug info
[09:14:45] <bo_ptz> Hi all, how can I get the last 10 documents from mongo?
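Assuming "last" means most recently inserted and _id is a default ObjectId (which embeds the insertion time), a minimal answer (collection name illustrative):

    db.mycollection.find().sort({_id: -1}).limit(10)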
[09:18:20] <mocx> i'm using mongodb 3.0 for debian which i installed using the package manager repos as described in the documentation
[09:18:44] <mocx> what setting in the mongod.conf file do i need for the new wiredTiger storage engine?
[09:19:00] <mocx> it seems to be an ini file
[09:35:55] <mocx> in this ini config file i've tried storageEngine, storageengine, and storage_engine
[09:35:59] <mocx> nothing is working
[11:20:09] <_QGuLL_> hi, i'm quite confused with the dropDatabase() : it doesn't seem to clean the files in dbdir : i still have files related to my dropped db, and still used by the daemon (except if i restart mongo) : is that normal ?
[11:20:33] <_QGuLL_> and if i recreate the dropped db, i've some auth problems
[11:21:30] <fhainb> because MongoDB is garbage
[11:22:58] <_QGuLL_> what an odd answer from the mongodb chat :o
[11:24:18] <_QGuLL_> did you mean it has a garbage (collector) instead?
[11:42:49] <fhainb> odd answer? you can't accept a critical answer? :)
[12:56:02] <quattr8> running mongodb 3.0 wiredtiger, went from ~180gb disk usage to 14gb, awesome :)
[12:57:07] <StephenLynx> noice
[13:14:20] <mskalick1> Hi, are mongodump and mongorestore supported as an upgrade mechanism from mongodb 2.4 to 2.6?
[13:19:34] <Sagar> Hello
[13:19:48] <Sagar> i have a collection let us say: msg_logs
[13:20:07] <Sagar> it has a field "uid" with each of them specifying the user it belongs to
[13:20:19] <Sagar> how can i get them all one by one using php and show them in a div
[13:20:58] <Sagar> i tried $logs = $Mongo->msg_logs->findOne(array("uid" => $uid));
[13:21:05] <Sagar> but it just returns the first entry
[13:21:13] <Folkol> "findOne"
[13:21:32] <Folkol> Please at least read what you copy from somewhere :P
[13:21:33] <Sagar> i tried it with find as well
[13:21:44] <Sagar> but it returned null
[13:21:49] <Folkol> findOne should only return one object
[13:22:15] <Sagar> but as i said, find didn't work either
[13:22:31] <Folkol> Have you read the documentation for find?
[13:22:39] <Sagar> yes
[13:22:44] <Sagar> returns all entries.
[13:22:48] <Folkol> Ok. Did you iterate the result set?
[13:23:26] <Sagar> no
[13:23:29] <Sagar> how to do that? :o
[13:23:54] <Folkol> http://php.net/manual/en/mongocollection.find.php
[13:24:05] <Folkol> There is an example of how to do that in php.
[13:24:16] <Folkol> (I just googled it, I have no experience with the PHP client.)
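In shell terms (the PHP driver's cursor behaves analogously), the difference Folkol is pointing at looks like this; the collection and field come from Sagar's question, the value is illustrative.

    var cur = db.msg_logs.find({uid: "someUser"});   // find() returns a cursor
    while (cur.hasNext()) {
        printjson(cur.next());                       // the PHP foreach over MongoCursor does the same
    }
    // findOne(), by contrast, returns just a single document (or null)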
[13:25:16] <Sagar> thanks :)
[13:45:13] <Cygn> Hey guys, i am using this statement to count elements of a subarray of my documents. The documents are selected by attributes of the subarray, afterwards the subarray is unwound and the elements are counted… but i want only the elements to be counted which fit the criteria i also use to select the documents… http://pastie.org/9998972 - how would you realize that?
[13:45:46] <Cygn> Just use elemMatch again?
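A sketch of the usual answer to Cygn's question: repeat the element-level criteria in a second $match after the $unwind so only matching elements are counted. Field names are assumptions; the real pipeline is in the pastie.

    db.docs.aggregate([
        {$match: {"items.status": "open"}},   // cheaply selects candidate documents first
        {$unwind: "$items"},
        {$match: {"items.status": "open"}},   // then keeps only the matching elements
        {$group: {_id: "$_id", openItems: {$sum: 1}}}
    ])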
[13:53:24] <StephenLynx> hey, I got a bunch of warning when starting mongo 3.0 on centOS 7
[13:53:46] <StephenLynx> I migrated by uninstalling 2.6 and installing 3.0 from mongo's package
[13:53:59] <StephenLynx> anyone had anything similar?
[13:59:10] <fhainb> don't use mongodb
[14:02:39] <Cygn> fhainb: Who are you talking to?
[14:02:47] <fhainb> to the wall
[14:03:21] <Cygn> fhainb: Why should the wall not use mongodb? I mean… poor thing that wall.
[14:03:40] <fhainb> at least the wall listens to me
[14:22:47] <krisfremen> that's what the wall wants you to think it's doing... but secretly it's not
[14:48:48] <stefan_17> I have 2 documents. Author and Post. Author has many posts. How I can query to get only authors with 1 or more posts?
[14:49:20] <fhainb> by using a real database and not the mongodb toys
[14:50:00] <StephenLynx> if its a subarray, you can query for the array length if Im not mistaken.
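Assuming the posts are embedded as an array on the author document (an assumption; stefan_17 may be using references instead), a couple of equivalent queries:

    db.authors.find({"posts.0": {$exists: true}})        // the array has at least one element
    db.authors.find({posts: {$exists: true, $ne: []}})   // equivalent: present and not empty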
[14:55:35] <pamp> hi guys, I need to make a query, and then forEach doc in result I need to make another query, and save the data into a "var"
[14:55:55] <pamp> for example
[14:55:56] <pamp> http://dpaste.com/2QS5392
[14:56:15] <d0x> Can someone explain to me why a map that is initialized with arr[key]=value gets broken when sending it through the scope of a map reduce job? This reproduces the problem: http://pastebin.com/7yfU0bGL
[14:56:50] <pamp> i created this method, but the result of var parent is not what i expected
[14:57:02] <pamp> it doesn't return the object
[14:57:08] <StephenLynx> pamp show me your code.
[14:57:10] <d0x> The broken output looks like this http://pastebin.com/rEC8GHBu (aMap1 is broken, aMap2 is right)
[14:57:32] <fhainb> should we guess what you expect?
[14:57:46] <pamp> StephenLynx: http://dpaste.com/2QS5392
[14:58:06] <StephenLynx> are you using the CLI?
[14:58:59] <pamp> robomongo
[14:59:09] <kali> d0x: your aMap1 initializer is broken
[14:59:25] <kali> your data is broken before you call map/reduce
[14:59:36] <StephenLynx> yeah, I have no idea how it works, sorry. I thought it was application code :v
[15:02:11] <pamp> I need to do a query, then for each result use a property of that result to make a new query and store the result in a var
[15:02:36] <pamp> what is the best procedure to do this
[15:02:52] <StephenLynx> how many results on the first query are you expecting?
[15:03:00] <StephenLynx> a couple hundreds? an assload?
[15:03:15] <pamp> thousands
[15:03:37] <d0x> kali: damn it, you are right
[15:03:41] <pamp> then for each of them, make another query to find parents
[15:03:45] <d0x> thx
[15:03:49] <pamp> and save data in a var
[15:03:52] <StephenLynx> hm
[15:03:54] <fhainb> mongodb is superfast according to the marketing, so you can do it!
[15:03:58] <fhainb> trust the marketing!
[15:04:00] <StephenLynx> can't you save the parent in the first place?
[15:04:10] <pamp> to then project a result with fields from the first and second query
[15:04:21] <StephenLynx> if you need to do that kind of operation, your database is very badly designed.
[15:04:41] <StephenLynx> that is a heavily relational operation.
[15:04:48] <pamp> first i need to find all the doc of type "a"
[15:04:52] <fhainb> that's why Postgres is much better
[15:05:00] <pamp> then find the parent of each of them
[15:05:36] <pamp> at the end i need to present the result with fields from the two queries
[15:05:38] <StephenLynx> you either use a relational db, you duplicate the data or you perform a query for each object and cripple your performance.
[15:06:05] <StephenLynx> thats your 3 options.
[15:07:09] <pamp> Can't I do something like this: http://dpaste.com/2QS5392 ?
[15:08:17] <StephenLynx> thats option 3
[15:08:28] <pamp> why does the variable parent not store the desired docs, and keep only one _id?
[15:08:32] <StephenLynx> it will cripple your performance if you expect thousands of objects in the first query.
[15:08:43] <StephenLynx> because you didn't design it that way.
[15:10:02] <StephenLynx> you could store an array inside the parent object with the child objects.
[15:15:08] <pamp> If I make a query outside of a forEach the result is that http://dpaste.com/2JXGKP9 , a normal document,
[15:15:57] <StephenLynx> I don't know your model nor your specific problem, I can't tell you much about your queries specifics
[15:16:04] <d0x> pamp: when u use robomongo you could vote on this issue :) https://github.com/paralect/robomongo/issues/657
[15:17:42] <pamp> but if i make it inside of a forEach the result is this: http://dpaste.com/12QAFAQ
[15:17:48] <pamp> I don't know why
[15:59:34] <SpartanWarrior> hello guys, I installed a brand new replica set, does anyone know how do I set the --replSet arg to mongod when using the upstart scripts @ ubuntu?
[16:03:38] <coalado> hi. Is there anything new with mongodb 3 and authentication?
[16:04:17] <coalado> I cannot auth for example with mongovue or the java drivers
[16:34:45] <Redcavalier> Hi, I got a quick question regarding mongodb clients and replicasets.
[16:37:36] <Redcavalier> Basically from what I tested, when you shut down a primary in a replicaset, the client automatically connects to the new primary. How is that done?
[16:37:51] <Redcavalier> Is the client constantly aware of all the servers in the cluster?
[16:45:48] <Redcavalier> replace cluster by replicaset, sorry
[16:46:00] <Redcavalier> also, where would that information be stored?
[16:50:03] <MacWinner> Hi, is there a quick way to see how often a specific index is being used? I noticed a bunch of indexes on one of my collections, and I feel like half of them probably arent being used
[16:51:26] <gregf_> https://gist.github.com/anonymous/68b85c24f9bd643f9c59 <== eric
[16:51:40] <gregf_> bah :/ wrong chat
[16:53:22] <quattr8> Redcavalier: as far as i know the client caches the replicaset members, but for the php driver for example it is recommended to provide a seed list with all members
[16:56:03] <Redcavalier> quattr8, I see, so if the primary fails, it will automatically go through its cache and try the next server. Out of curiosity, how often is this cache refreshed?
[17:00:43] <quattr8> Redcavalier: I think the php driver and most other drivers cache the information on the first connection
[17:02:23] <Redcavalier> quattr8, ok, so if we add a node to the mongodb replicaset, then it would require the client to be restarted to reinitialize its cache and add the new member?
[17:03:59] <quattr8> Redcavalier: I think i read somewhere once that the information is cached for 5 minutes but can’t find anything on it anymore.
[17:04:56] <quattr8> Redcavalier: Probably best to connect using all replicaset members so you won’t have to wait for cache to change or reinitialize
[17:12:35] <Redcavalier> quattr8, thanks, that helps to answer our questions
[17:41:16] <pamp> is it possible to make a projection inside a find().forEach?
[17:41:55] <pamp> something like, make a query (find) do some operations in each document inthe result set and then make a projection
[17:41:56] <pamp> ?
[17:43:11] <StephenLynx> no. you will need a projection.
[17:43:34] <StephenLynx> a projection is a function on itself, you can't use it in conjunction with another operations.
[17:43:50] <StephenLynx> oh wait
[17:43:53] <StephenLynx> I fucked up on that
[17:44:06] <StephenLynx> scrap that, I was thinking of aggregation :v
[17:44:11] <StephenLynx> yes, you can use projection on a find.
[17:44:23] <StephenLynx> the first parameter is the match block and the second one is the projection block.
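A minimal sketch of that (collection and field names are illustrative):

    // find()'s second argument is the projection; forEach then sees only those fields
    db.docs.find({type: "a"}, {parent: 1, name: 1, _id: 0}).forEach(function (doc) {
        printjson(doc);
    });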
[17:59:33] <jr3> where is the mongod.conf usually located
[18:00:28] <cheeser> on what OS?
[18:00:38] <cheeser> 1/
[18:00:50] <jr3> os x
[18:01:01] <cheeser> installed via homebrew?
[18:01:16] <jr3> I donnt remember, but I think so
[18:04:25] <cheeser> look in /usr/local/etc
[18:30:23] <fabiobatalha> Why doesn't starting MongoDB with /etc/init.d/mongod start work when using a YAML config file?
[18:34:27] <fhainb> what?
[18:34:43] <mocx> i'm using mongodb 3.0 that i installed from the debian package managers specified in the documentation, this install comes with a /etc/mongod.conf file that is an ini file
[18:34:58] <mocx> what is the ini option to change the storage engine to wiredTiger?
[18:36:52] <MacWinner> if 2 indexes have the same beginning part, is it true that you can delete the shorter index?
[18:37:21] <MacWinner> eg: note.hash_1_verb_1_note.slidecount_1 and note.hash_1_verb_1_note.slidecount_1_note.ext_1
[18:37:41] <fhainb> mocx: mongod --help
[18:38:10] <fhainb> --storageEngine wiredTiger
[18:38:19] <mocx> i want to use it in my config file though
[18:38:40] <fhainb> well, turn on your brain,...
[18:38:46] <mocx> ....
[18:39:03] <mocx> i've tried storageEngine in the config file
[18:39:05] <fhainb> do you need everything on the golden plate?
[18:39:12] <mocx> i've tried engine, and storage_engine
[18:39:17] <mocx> nothing works
[18:39:25] <mocx> that's why i'm asking
[18:39:26] <fhainb> because MongoDB was cra
[18:39:27] <fhainb> crao
[18:39:29] <fhainb> is crap
[18:39:32] <fhainb> and will remain crap
[18:39:56] <mocx> not so some dick in an irc chat can give me his personal thoughts
[18:40:29] <fhainb> better a dick than a tard
[18:40:37] <mocx> someone kick this moron
[18:40:49] <fhainb> kick mocx
[18:40:53] <fhainb> that's what you want?
[18:41:11] <mocx> i'll get down to your level
[18:41:13] <mocx> your mom
[18:42:40] <pamp> I've a collection like this : { _id:1234321 , P: [ { k : "a" , v : 123 } , { k : "b", v:321 } , { k:"c" , v:345}, { ... } , { ... } ] }, How can I get the field "v" querying for the field "k" ? I want know what is the value "v" for the k:"a". How can I do that?
[18:44:02] <fhainb> query for k:a and pick up the value for v from the result
[18:45:21] <StephenLynx> you want to output a value of a field as the value of another field?
[18:45:29] <StephenLynx> no, wait
[18:46:15] <StephenLynx> you can use dot notation on a query. P.k:a
[18:46:25] <StephenLynx> then project P.v
[18:46:41] <pamp> yes, I've an array "Props" with key value pairs in each position, is it possible to do that?
[18:47:03] <StephenLynx> what is props?
[18:47:56] <fhainb> why do companies assign idiots to database work like pamp ? Idiots that can not read documentation
[18:50:36] <Boomtime> fhainb: that isn't very helpful
[18:50:46] <Boomtime> pamp: http://docs.mongodb.org/manual/reference/operator/projection/positional/
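Using pamp's document shape with the positional projection from that link (the collection name is assumed):

    db.things.find({"P.k": "a"}, {"P.$": 1})
    // returns { "_id" : 1234321, "P" : [ { "k" : "a", "v" : 123 } ] }, i.e. only the matching element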
[18:50:53] <fhainb> and?
[18:51:41] <fhainb> <pamp> .!. fuck you asshole
[18:51:46] <fhainb> nice guy
[18:52:48] <fhainb> time to move the mongodb dirt to the trashcan
[18:52:56] <pbbunny0801> hello!
[18:53:11] <pbbunny0801> hope someone can help me...I'm trying to troubleshoot why mongodb isnt starting for me
[18:53:24] <pbbunny0801> i'm using it for katello, running on centos 6.5
[18:54:05] <fhainb> pbbunny0801: mongoldb only starts on computers where the owner has an IQ > 150
[18:54:15] <fhainb> your IQ is on the level of a pig, likely
[18:54:44] <mocx> pbbunny0801 ignore the douchebag, continue with your question
[18:54:47] <pbbunny0801> wow, that was unnecessary
[18:54:56] <fhainb> not it was necessary
[18:55:42] <StephenLynx> where did you install it from, how are you starting it and what is the error?
[18:56:07] <pbbunny0801> mocx, changed permission for the folder where the db is created, no go
[18:56:21] <pbbunny0801> StephenLynx, it was installed I believe as part of the Katello package
[18:56:32] <StephenLynx> no idea what katello is.
[18:57:02] <pbbunny0801> Katello is an extension for TheForeman, which uses modules like Puppet to control packages on the *Nix world usually
[18:57:03] <mocx> are you trying to start it as wiredtiger in a mmapv1 folder?
[18:57:18] <pbbunny0801> mocx, I'm simply just trying to start the db service
[18:57:25] <pbbunny0801> if I can get that to run, I'll be set with the others
[18:57:27] <mocx> anything in the error logs?
[18:58:26] <pbbunny0801> http://fpaste.org/193440/25494979/
[18:58:37] <StephenLynx> nah, never heard about any of these. except *nix, of course.
[18:59:33] <fhainb> mocx: especially for you : http://www.31337.pl
[18:59:40] <fhainb> pbbunny0801, and for you: http://www.31337.pl
[19:00:01] <GothAlice> And for him.
[19:00:10] <mocx> thank god
[19:00:23] <mocx> thank satan, whoever, i'm glad he's gone
[19:00:49] <pbbunny0801> I knew a higher power is watching over us :)
[19:01:06] <StephenLynx> http://crybit.com/mongodb-wont-start-error-child-process-failed-exited-with-error-number-100/ pbbunny0801 check this
[19:01:14] <StephenLynx> the error it fixes is the very same you have.
[19:01:32] <StephenLynx> and you gotta pm the OPs, they are not omniscient :v
[19:02:16] <StephenLynx> someone on stack overflow said that got it fixed by using repair.
[19:03:01] <mocx> hmmm
[19:03:04] <pbbunny0801> eh
[19:03:15] <pbbunny0801> I have plenty of space, so that wouldnt be the issue
[19:04:44] <mocx> is there a way to generate a default YAML config file?
[19:05:49] <GothAlice> mocx: Simply having an empty file would naturally populate the default values everywhere.
[19:06:08] <GothAlice> mocx: The config file technically just overrides the defaults.
[19:06:56] <MacWinner> GothAlice, do you know if it's redundant to have indexes like: pitchid_1 and pitchid_1_userid_1
[19:07:02] <pbbunny0801> StephenLynx, I tried the repair piece, I saw something strange, MongoDB starting : pid=16303 port=27017 dbpath=/data/db/ 64-bit host=mongo.my.net
[19:07:11] <GothAlice> MacWinner: It certainly is. The second includes the first.
[19:07:24] <pbbunny0801> i dont see the dbpath as data/db specified in the mongo config
[19:07:27] <MacWinner> so removing the first should have no harm?
[19:07:37] <GothAlice> (Index prefixes, i.e. the first field, or first and second in a set of three, etc. can be used as if they were standalone.)
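In shell terms, using MacWinner's index names (the collection name is assumed): given both pitchid_1 and pitchid_1_userid_1 exist, the single-field one is redundant and can usually be dropped.

    db.pitches.getIndexes()             // confirm what is there first
    db.pitches.dropIndex("pitchid_1")   // queries on pitchid alone can still use the prefix of pitchid_1_userid_1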
[19:10:04] <MacWinner> thanks!
[19:11:06] <pbbunny0801> omg, i found the issue
[19:11:29] <medmr> love those moments
[19:12:08] <pbbunny0801> but it's really strange, how can i find where mongod is getting the dbpath info from, outside of the mongodb.conf
[19:12:31] <pbbunny0801> it's getting the dbpath=/data/db/ somewhere, and it's what's causing the first issue
[19:12:34] <GothAlice> pbbunny0801: The command-line. Anything in a conf file can be on the command-line.
[19:12:58] <GothAlice> Also, AFIK, that's the default path. http://docs.mongodb.org/manual/reference/program/mongod/#cmdoption--dbpath
[19:13:25] <GothAlice> pbbunny0801: So basically you currently _aren't_ specifying a dbpath, and you probably should. ;)
[19:13:27] <pbbunny0801> GothAlice, strange, because then the mongodb.conf file has /var/lib/mongodb/ as the default path
[19:13:50] <GothAlice> pbbunny0801: Ah, but MongoDB doesn't use a config file unless one is specified on the command-line. http://docs.mongodb.org/manual/reference/program/mongod/#cmdoption--config
[19:14:04] <GothAlice> (Note the lack of a default value for that one.)
[19:15:03] <pbbunny0801> so what's usually the way to get mongod to run during startup, using the actual command and not services?
[19:15:06] <girb> please help …. I already have 1 replica set named rs0 { 1 primary with a secondary }
[19:15:06] <girb> now I want to create a shard cluster of another 3 replica sets { 3 primaries with 3 secondaries }
[19:15:06] <girb> so my confusion is: in my new 3 replica sets should I give the same replica set name rs0, or should I give rs1, rs2 and rs3 to each
[19:15:48] <GothAlice> pbbunny0801: Depends on platform. Usually some combination of mongod -f /etc/mongod.conf and a daemon management tool like start-stop-daemon, launchd, etc., etc.
[19:17:02] <GothAlice> girb: We use r01.s01.db.example.com server naming for our sharded replica sets. (Since replicas are contained within shards, replica number for the set is the first domain element. Second element counts the shards.)
[19:17:50] <GothAlice> Thus your first cluster would have r01.s00.db.example.com and r02.s00.db.example.com, and your second cluster would be r01 and r02.s01.example.com through s03.example.com.
[19:18:24] <mocx> GothAlice: is there an INI file config option for the storage engine?
[19:19:13] <GothAlice> mocx: Yes, but you can't really change that value with any existing data on that node. http://docs.mongodb.org/manual/reference/configuration-options/#storage.engine
[19:19:21] <GothAlice> s/that/an existing/
[19:20:09] <GothAlice> mocx: Oh, you want INI. INI has been deprecated since 2.6 was introduced…
[19:20:31] <GothAlice> (And no, no new options are being added to that configuration format.)
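For reference, a minimal YAML config of the kind being discussed (paths are illustrative); per the advice above, the storage engine is set under storage.engine in the YAML format rather than in the old INI-style file:

    storage:
      dbPath: /var/lib/mongodb
      engine: wiredTiger
    systemLog:
      destination: file
      path: /var/log/mongodb/mongod.log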
[19:20:39] <mocx> right...but i installed mongodb via the package managers in the documentation
[19:21:01] <GothAlice> mocx: Are some lines indented in the config? (If so, it's YAML, not INI.)
[19:21:02] <mocx> so when i run sudo service mongod restart i think it automatically looks for the ini file
[19:21:12] <mocx> no it's ini
[19:21:17] <GothAlice> Then you should migrate.
[19:21:23] <girb> GothAlice: so I can have a different replica set name within the same cluster?
[19:21:28] <mocx> migrate to what?
[19:21:36] <GothAlice> mocx: The current YAML config.
[19:21:39] <mocx> i did
[19:21:45] <mocx> and now it won't start
[19:21:55] <GothAlice> mocx: Version of MongoDB installed?
[19:22:00] <mocx> 3.0
[19:22:12] <GothAlice> Also, did you change the storage engine after having already started mongod once without the engine defined?
[19:22:13] <mocx> on debian
[19:22:17] <mocx> no
[19:22:29] <mocx> i'm simply trying to migrate to yaml first
[19:22:41] <mocx> and be able to use sudo service to stop/start/restart
[19:22:43] <GothAlice> mocx: Pastebin/gist your console output when running mongod and contents of the mongod log, if available.
[19:23:12] <GothAlice> (I.e. try to manually start the service, remember -f /etc/mongod.conf option, and capture the output. Once it runs manually, we can worry about the platform automation.)
[19:23:33] <mocx> okay just a minute
[19:24:53] <pbbunny0801> how do you fork that to the background though
[19:25:51] <GothAlice> For testing, one generally doesn't. (Forking makes tracking down output harder.) But in production, either the daemon management tool (start-stop-daemon) worries about forking, or http://docs.mongodb.org/manual/reference/program/mongos/#cmdoption--fork (or the config option version of that) are specified.
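The YAML equivalent of --fork, only wanted when the init tooling is not doing the daemonising itself (paths are illustrative); forking also needs a log destination:

    processManagement:
      fork: true
      pidFilePath: /var/run/mongodb/mongod.pid
    systemLog:
      destination: file
      path: /var/log/mongodb/mongod.log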
[19:28:29] <mocx> GothAlice: http://pastie.org/pastes/10000208/text?key=r1xyj1dob58jgpohnodd7g
[19:28:48] <mocx> do i need to sudo?
[19:29:14] <pbbunny0801> hmmmm.....might have to try this differently
[19:30:01] <GothAlice> mocx: ls -l /var/log/mongodb/mongodb.log (that's a lower-case L, not a 1 or I.) Who is the file owned by?
[19:30:05] <GothAlice> (Hopefully 'mongodb'!)
[19:30:17] <mocx> it is
[19:30:26] <GothAlice> sudo -u mongodb mongod …
[19:31:41] <mocx> GothAlice http://pastie.org/pastes/10000220/text
[19:32:14] <GothAlice> mocx: Generally a good sign, esp. if you have the fork option set in your config. What's the contents of the log file?
[19:32:25] <mocx> it looks like it's running
[19:33:20] <mocx> GothAlice http://pastie.org/pastes/10000223/text?key=1xrmcw1nl3eweaqdyyoidq
[19:35:10] <mocx> GothAlice I just killed the pid and ran "sudo -u mongodb mongod -f /etc/mongod.conf" and now it's hanging?
[19:35:37] <GothAlice> mocx: What's the log?
[19:35:54] <GothAlice> They'd likely tell you why it's doing that.
[19:37:04] <mocx> nvm, i didn't have the fork option set
[19:38:01] <mocx> okay so now how about platform automation
[19:38:21] <GothAlice> Step 1: find out what /etc/init.d/mongod (or mongodb, or just mongo, mmm, standards) does.
[19:38:36] <GothAlice> Does it use something like start-stop-daemon to do the forking for you?
[19:38:43] <GothAlice> If so, don't have the fork option in your config.
[19:41:44] <mocx> okay thanks
[19:43:49] <pbbunny0801> so I guess I can't use the service command for mongodb right now
[19:45:09] <GothAlice> pbbunny0801: Certainly should be able to.
[19:45:39] <pbbunny0801> i mean i can do service mongod start but might be a noob question how can i have it take settings/values
[19:46:20] <GothAlice> pbbunny0801: Same as above: examine the init.d script that "service" will be calling, identify the configuration file it uses (/etc/mongod.conf, /etc/conf.d/mongod, etc.) and go from there.
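(One quick, non-authoritative way to do that inspection from a shell; the script path /etc/init.d/mongod is an assumption, since the name varies by distro:)

    # see whether the init script forks via start-stop-daemon and which config file it passes
    grep -nE 'start-stop-daemon|mongod\.conf' /etc/init.d/mongod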
[19:49:03] <mocx> GothAlice should the log file say that it's started properly?
[19:49:09] <mocx> i have verbosity set to 5
[19:50:09] <GothAlice> mocx: Generally yes. I haven't adjusted the verbosity and see something similar to the following after successful startup: [initandlisten] waiting for connections on port 27017
[19:50:19] <mocx> okay
[19:58:19] <fabiobatalha> Why doesn't starting MongoDB with "/etc/init.d/mongod start" work when using a YAML config file?
[19:58:34] <fabiobatalha> the default config file is an ini file.
[19:58:58] <GothAlice> fabiobatalha: Pastebin/gist the config.
[20:00:44] <fabiobatalha> https://gist.github.com/fabiobatalha/c29aaa5359b9475c45de
[20:01:37] <fabiobatalha> it is a basic mongod.conf file.
[20:01:39] <GothAlice> fabiobatalha: The very first line of that indicates to me this is not the actual contents of the file, or if it is, it's missing its top half.
[20:02:07] <GothAlice> Additionally, disabling the journal is strongly discouraged.
[20:02:10] <fabiobatalha> now is updated.
[20:02:20] <mocx> thanks for your help GothAlice everything seems to be running smoothly
[20:03:05] <fabiobatalha> it is like: http://docs.mongodb.org/manual/reference/configuration-options/#config-file-format
[20:03:55] <GothAlice> fabiobatalha: The config looks fine except for the extraneous processManagement section (since you're not forking, that PID file path is ignored) — what's the output you get when attempting to start the service?
[20:04:06] <GothAlice> (Both console output when starting, and the contents of the mongodb log file.)
[20:04:40] <fabiobatalha> Starting mongod (via systemctl): Job for mongod.service failed. See 'systemctl status mongod.service' and 'journalctl -xn' for details.
[20:05:24] <GothAlice> fabiobatalha: Cool. Let's simplify for a moment. Run "ps aux | grep mongod" to see if it's already running.
[20:06:02] <fabiobatalha> it is not running.
[20:07:20] <GothAlice> fabiobatalha: Good. Now run: mongod -f /path/to/mongod.conf
[20:08:34] <fabiobatalha> not working
[20:08:38] <GothAlice> What's the output?
[20:08:53] <GothAlice> (This should be attempting to run mongod in the foreground.)
[20:09:41] <fabiobatalha> I think i have just figured out the problems.
[20:09:46] <fabiobatalha> just a second.
[20:09:49] <GothAlice> :)
[20:13:47] <fabiobatalha> it is a permission issue for the pid file.
[20:13:57] <fabiobatalha> thanks GothAlice!
[20:14:44] <GothAlice> Great that you've found it. :)
[20:15:35] <GothAlice> fabiobatalha: When diagnosing issues hidden by many levels of abstraction and automation, the trick is to always simplify down to the minimum complexity needed. Then, and only really then, do problems become obvious.
[21:00:29] <djam90> Hello everyone
[21:03:47] <hackel> Is anyone aware of an embedded MongoDB implementation for PHP integration testing? Looking for something like this Java solution: de.flapdoodle.embed.mongo
[21:04:01] <djam90> So I've just read the mongodb.org page, and I am wondering if my organisation would benefit from using some form of NoSQL
[21:06:36] <djam90> My organisation sells cars and vans (used and new). We store vehicles, and their associated data (linking to lots of tables for things like specification etc)
[21:17:54] <girb> http://pastebin.com/K8tUHhyi
[21:18:50] <girb> please look at the end: even though I enabled sharding by sh.enableSharding("test1.test_collection") … it still shows "sharded" : false
[21:19:10] <girb> any help on this ?
[21:21:39] <mordonez> Hi guys, how can I make a query to search for object ids with $in?
[21:21:42] <mordonez> for example
[21:21:44] <mordonez> db["tag"].find({"$oid":{"$in":["54f483fb010000960462597e", "54ea8ecc80a9b88df9c5e344"]}})
[21:21:56] <mordonez> this gives me "Can't canonicalize query: BadValue unknown top level operator: $oid"
[21:22:33] <Boomtime> girb: 'enableSharding' only enables the ability to shard collections in the named database, now you need to shard the collection
[21:22:38] <GothAlice> mordonez: Two important points: first, ObjectIds are not strings. Best case you're doubling the storage space, worst case you're mixing the two types and all sorts of badness will ensue.
[21:22:40] <Boomtime> girb: http://docs.mongodb.org/manual/reference/method/sh.shardCollection/
[21:22:55] <GothAlice> mordonez: Second, the field name in the query does not need $. $ is for operations within a query.
[21:23:26] <mordonez> if you look here
[21:23:27] <mordonez> http://docs.mongodb.org/manual/reference/mongodb-extended-json/
[21:23:42] <mordonez> { "$oid": "<id>" }
[21:23:50] <mordonez> that appears as strict mode
[21:24:03] <GothAlice> mordonez: A MongoDB query isn't JSON, it's BSON.
[21:24:04] <girb> Boomtime: isn't that automatically done based on chunk size?
[21:24:50] <mordonez> the library I am using converts bsons to json
[21:24:51] <girb> Boomtime: my DB data is 300GB, more than the chunk size, i.e. 64MB
[21:24:52] <mordonez> that way
[21:25:01] <Boomtime> girb: how can mongodb know how to partition your data?
[21:25:10] <Boomtime> girb: you need to tell it how
[21:25:26] <Boomtime> girb: shardCollection is the method you use to tell mongodb how to partition your data
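(Putting Boomtime's two steps together in the mongo shell; a hashed _id is just one possible shard key choice, shown here as an illustration:)

    // enable sharding for the database (not the collection)
    sh.enableSharding("test1")
    // then tell mongos how to partition the collection, e.g. by hashed _id
    sh.shardCollection("test1.test_collection", { _id: "hashed" })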
[21:25:37] <mordonez> also I have read it is possible to query that way
[21:25:54] <girb> Boomtime: ok .. will try out ..thx
[21:26:21] <Boomtime> mordonez: you need to specify the format exactly as it appears
[21:26:37] <mordonez> what you mean?
[21:26:39] <Boomtime> mordonez: $in: [ { $oid... ]
[21:26:45] <mordonez> I tried that
[21:26:49] <mordonez> but gives me another error
[21:26:50] <mordonez> hold on
[21:27:05] <Boomtime> in the shell, you should be using ObjectId anyway
[21:27:32] <Boomtime> mordonez: $in: [ ObjectId("..."), ObjectId(... ]
[21:30:07] <mordonez> I tried this
[21:30:10] <mordonez> db["tag"].find({"_id":{"$in":[{"$oid":"54f483fb010000960462597e"}]},"companyId":"54ea84cd80a9b825d6f6f309"})
[21:30:26] <mordonez> but it gives me "Can't canonicalize query: BadValue cannot nest $ under $in"
[21:32:31] <Boomtime> mordonez: congratulations, you have found a bug in the shell
[21:32:41] <Boomtime> you should use ObjectId instead
[21:33:19] <Boomtime> also, if you like, you should raise a server ticket quoting the query you tried and a pointer to the docs indicating it should permit the extended json
[21:33:34] <GothAlice> Boomtime: I never expected the JSON-encoded form to be directly accepted. :/
[21:34:09] <Boomtime> docs says it is, so there is a bug somewhere, if it's not in the shell, then it's in the docs
[21:34:33] <Boomtime> i will make a note to check up in a couple hours, if no tickets have been raised along these lines i will raise one
[21:36:31] <girb> Boomtime: http://pastebin.com/B9AXqn7J
[21:37:01] <Boomtime> mordonez: actually, the docs do cover this.. sort of: http://docs.mongodb.org/manual/reference/mongodb-extended-json/#input-in-strict-mode
[21:37:21] <girb> I did an ID-based hashing on test1.test_collection .. do chunks exist on shard0002 and shard000 ?
[21:37:41] <Boomtime> shell parses it but does not translate the type, so the server gets to see it, and handles it like an operator.. dies (correctly)
[21:38:17] <mordonez> thanks Boomtime
[21:38:46] <mordonez> I can't report it right now since I have to finish with some work, I will read more about it and report it if needed later
[21:38:51] <mordonez> thanks again Boomtime
[21:39:04] <Boomtime> mordonez: don't bother to report it, it works as designed
[21:39:12] <Boomtime> but the docs are not very clear
[21:39:12] <mordonez> excellent
[21:39:23] <Boomtime> use ObjectId instead of $oid
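(So mordonez's query, rewritten with ObjectId in the shell, would look roughly like this, reusing the ids from above:)

    db.tag.find({
        _id: { $in: [ ObjectId("54f483fb010000960462597e"),
                      ObjectId("54ea8ecc80a9b88df9c5e344") ] },
        companyId: "54ea84cd80a9b825d6f6f309"
    })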
[21:40:33] <girb> Boomtime: I got it thanks
[21:40:36] <Boomtime> girb: apparently your collection is not large enough to split; in the shell, please run: db.getSiblingDB("test1").test_collection.stats()
[21:40:53] <Boomtime> girb: ok, you got it sorted?
[21:41:13] <girb> Boomtime: I got it through db.test_collection.getShardDistribution() .. it is only 30MB
[21:45:20] <GothAlice> mordonez: json.loads(s, object_hook=bson.json_util.object_hook) — in Python, using PyMongo, one must decode the JSON first. There should be comparable approaches using other drivers.
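(Spelled out slightly, as a sketch in Python/PyMongo; the input string here is invented for illustration:)

    import json
    from bson import json_util

    raw = '{"_id": {"$oid": "54f483fb010000960462597e"}}'      # example extended-JSON input
    doc = json.loads(raw, object_hook=json_util.object_hook)   # "$oid" becomes a real ObjectId
    # doc["_id"] can now be passed straight into find()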
[21:45:46] <mordonez> yes, I finally get it, thanks GothAlice
[22:41:56] <erewh0n> I'm trying to set up a 3-node repl cluster (pri, sec, arb) and the sec node reports "replSet error loading set config (BADCONFIG)"
[22:50:11] <Boomtime> erewh0n: can you pastebin/gist your replica-set config
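(The usual shell helpers for grabbing that, nothing specific to erewh0n's setup:)

    rs.conf()    // the replica set configuration document
    rs.status()  // per-member state, the same call quoted below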
[22:55:46] <erewh0n> Boomtime: sure thing: http://pastebin.com/n17L0cpe
[22:58:10] <erewh0n> I confirmed that all 3 servers can reach each other. The secondary (the one with the issue) is showing a ton of connect/disconnect activity in the log from the primary (which might just be the primary doing a check a few times every second to see if the secondary is ready).
[23:00:03] <erewh0n> the rs.status() from the primary reports "still initializing" for the secondary.