PMXBOT Log file Viewer


#mongodb logs for Friday the 19th of July, 2013

[00:42:42] <titosantana> hey guys I'm struggling with map reduce can anyone help me out
[00:44:02] <skot> titosantana: best to just ask by posting what you have to gist/pastebin/etc.
[00:44:31] <titosantana> http://pastebin.com/NaNGCCa7
[00:44:51] <titosantana> so i pasted the map and reduce functions
[00:45:31] <titosantana> I'm trying to get the return to have a sub list of articles
[00:46:16] <skot> I think you might be better off using the aggregation framework.
[00:46:34] <titosantana> i tried
[00:46:50] <skot> Can you post a sample doc or two?
[00:46:53] <titosantana> but i have a whole bunch of logic to determine an article_rank
[00:47:21] <titosantana> sample doc in terms of results?
[00:47:45] <skot> no, source
[00:48:28] <skot> looks like your source docs are very flat, yes?
[00:49:04] <skot> also, part of your map function like count/score seem to be missing
[00:49:59] <titosantana> heres a source doc
[00:50:00] <titosantana> http://pastebin.com/NAMk1r7D
[00:50:23] <titosantana> yeah i didn't add it due to brevity
[00:50:53] <skot> and are you not able to get any result, or just not quite right results?
[00:51:07] <titosantana> not quite the right results
[00:51:37] <titosantana> id love to be able to get the same results via aggregation
[00:51:46] <skot> please post that too, along with what you really want.
[00:51:59] <skot> If you have complicated calculations then that might be a problem with aggregation
[00:52:36] <titosantana> using $regex and $cond in the grouping was getting really hard
[00:52:59] <titosantana> ok so lemme construct the optimal result
[00:58:11] <titosantana> http://pastebin.com/qxC2KFNe
[00:58:26] <titosantana> so thats what id like the return to be
[01:00:18] <skot> and what do you get now which is not right?
[01:00:52] <titosantana> i updated the map and reduce functions as well
[01:00:53] <titosantana> http://pastebin.com/D5sNqpxX
[01:01:02] <titosantana> to include the calculations
[01:02:27] <titosantana> http://pastebin.com/wZiAkrgi
[01:02:31] <titosantana> thats the results I'm getting
[01:03:04] <titosantana> so the articles aren't being set correctly
[01:03:16] <skot> your input doc doesn't seem to have players…http://pastebin.com/qxC2KFNe and your reduce function refers to things incorrectly, i think.
[01:03:34] <titosantana> sorry i grabbed a doc without one
[01:03:36] <titosantana> let me update
[01:04:02] <skot> okay
[01:04:09] <titosantana> or do u need it ?
[01:04:19] <titosantana> the player info is showing up
[01:04:52] <skot> yeah, just checking. The reduce needs two forEach loops I think.
[01:05:14] <titosantana> also i want to ensure that I'm not adding dupe articles
[01:05:19] <skot> one for values, and one for articles in the value
[01:05:20] <titosantana> but wasn't certain how to do that
[01:05:40] <skot> You will need to check that when you do the reduce and/or in finalize
[01:05:41] <titosantana> actually id like to remove the documents without more than one article as well
[01:06:01] <titosantana> so in finalize how do u remove ?
[01:06:07] <skot> yes
[01:06:09] <skot> You can do that in map, no? just don't emit them.
[01:06:33] <titosantana> from map I don't know how many are in a group
[01:07:07] <skot> ah, you don't mean in a single input doc, but in the grouped results.
[01:07:15] <titosantana> right
[01:07:44] <skot> you can use finalize to make them empty, but not to reject them.
[01:07:52] <skot> (unfortunately)
[01:08:28] <skot> I'd start with just trying to get the articles working by fixing up your reduce.
[01:08:44] <titosantana> how do you recommend I resolve that issue?
[01:08:45] <skot> then you can add finalize to clean up dups
[01:08:51] <titosantana> ok
[01:09:15] <skot> I think this is the problem line: ret.articles.push(V.article)
[01:09:30] <titosantana> so does this "value.forEach(function(V)" just refer to the grouped articles?
[01:09:45] <skot> yes
[01:09:58] <titosantana> er grouped documents sorry
[01:10:00] <skot> no, not the articles, but the value returned by reduce
[01:10:01] <skot> yes
[01:10:18] <skot> reduce returns the same object as comes in as the value (array)
[01:10:22] <titosantana> and so there could be either 1 or many V.articles?
[01:10:31] <skot> yes
[01:11:01] <skot> So just turn that into a forEach if V.articles exists
[01:11:02] <titosantana> so how do u combine them together?
[01:11:38] <titosantana> but won't V.articles always exist? since i set it in the map value
[01:12:10] <skot> if(V.articles) V.articles.forEach(function(a) {ret.articles.push(a)})
[01:12:37] <skot> I'm always in the habit of checking things before I use them.
[01:13:01] <titosantana> awesome that worked
[01:13:12] <skot> Your current line of V.article probably returned undefined
[01:13:30] <titosantana> i think it did
[01:13:39] <skot> so, you should be good now :)
[01:14:24] <titosantana> ok so that makes sense, it's iterating through the articles array and dumping it to the return object
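A minimal sketch of the reduce shape skot is describing, assuming each mapped value looks roughly like {articles: [...], num_articles: n} (the real functions live in the pastebins above and carry more fields):

    function reduce(key, values) {
        var ret = { articles: [], num_articles: 0 };
        values.forEach(function(V) {
            // V may itself be the output of an earlier reduce pass, so merge arrays defensively
            if (V.articles) {
                V.articles.forEach(function(a) { ret.articles.push(a); });
            }
        });
        ret.num_articles = ret.articles.length;
        return ret;
    }
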
[01:15:15] <titosantana> so is it best to do all the dupe checks in finalize
[01:16:57] <titosantana> so in the finalize method u said it wasn't possible to filter out results right?
[01:17:13] <titosantana> id basically need to do that after retrieving the results?
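A hedged sketch of the dedupe-in-finalize idea skot suggests; the field used as each article's identity is hypothetical, and, as noted above, finalize can empty or reshape a value but cannot drop the result document itself:

    function finalize(key, reduced) {
        var seen = {};
        var unique = [];
        reduced.articles.forEach(function(a) {
            var id = String(a._id !== undefined ? a._id : a);   // hypothetical identity field
            if (!seen[id]) { seen[id] = true; unique.push(a); }
        });
        reduced.articles = unique;
        reduced.num_articles = unique.length;
        return reduced;
    }

Results with one article or fewer can then be stripped afterwards on the output collection, e.g. db.mr_out.remove({"value.num_articles": {$lte: 1}}) (the output collection name is an example).
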
[02:13:34] <titosantana> when you save the results from map-reduce
[02:13:45] <titosantana> is there a way to remove the _id, value ?
[02:14:03] <titosantana> i can't seem to sort on the map-reduce collection
[03:30:47] <xxtjaxx> Could you make something like an event emitter using mongodb so if a document in a collection is changed an event with a ref to the collection / document could be emitted?
[04:23:22] <ossipov> Do you use transactions in MongoDB?
[05:17:25] <asdfqwer> mongodb ruby driver questions ot here?
[05:40:41] <asdfqwer> exit
[05:52:46] <xxtjaxx> ossipov: Was that the question for me?
[06:06:03] <ossipov> xxtjaxx: that question was for the community :) now we're in the process of choosing a database for our project. We wanted to have nodejs+express+mongodb... but a friend from a big corporation suggested to use oracle or mssql as a more stable database that has transactions...
[06:27:54] <ossipov> Is there any specific time when mongodb community is active and supportive? :)
[06:38:52] <artdaw> :)
[07:27:48] <kizzx2> hey guys
[07:28:01] <kizzx2> if a replica lag exceeds "log length start to end"
[07:28:08] <kizzx2> does that mean there is no choice but to do a full resync?
[07:30:01] <rspijker> if I understand you correctly, that means that your secondary has gone stale. That is, it's in a state where it can't 'catch up' to the primary anymore, because the oplog data doesn't cover the entire timespan between the state of the primary and secondary.
[07:30:06] <rspijker> Then yeah, you have to resync
[07:30:59] <kizzx2> bummer
[07:31:06] <kizzx2> but i can't see any errormsg in rs.status()
[07:34:33] <[AD]Turbo> hola
[07:43:43] <rspijker> kizzx2: if something is wrong you should see it there
[07:43:58] <rspijker> kizzx2: how do you know for sure that something is wrong?
[07:44:21] <kizzx2> rspijker: well i keep doing db.printSlaveReplicationInfo() on primary
[07:44:31] <kizzx2> rspijker: a node's lag keeps increasing
[07:44:52] <rspijker> and it is now larger than oplog coverage?
[07:44:57] <kizzx2> yeah
[07:45:25] <kizzx2> i have 4 replicas
[07:45:41] <kizzx2> the primary has priority 100, the others have priority 0
[07:46:24] <kizzx2> replica B was behind 1 hour, db.printReplicationInfo() says the oplog can only hold 0.1hrs
[07:46:29] <kizzx2> however, it caught up alright automatically
[07:46:38] <kizzx2> replica D is behind 1.5 hours now, but it doesn't seem to be catching up
[07:57:03] <rspijker> kizzx2: db.printReplicationInfo() has a bug
[07:57:17] <rspijker> you should be seeing dates in 1970 (which should indicate to you that something is amiss)
[07:58:52] <kizzx2> o
[07:58:56] <ossipov> Do you use transactions in MongoDB?
[07:59:34] <kizzx2> rspijker: i don't see a date in 1970… (yet?)
[08:00:09] <rspijker> kizzx2: ok… that's weird
[08:00:24] <rspijker> kizzx2: let me craft a command for you that shows just the time in hours between first and last
[08:00:43] <kizzx2> rspijker: much appreciated
[08:00:57] <rspijker> kizzx2: try (db.oplog.rs.find().sort({$natural:-1}).limit(1).next().ts.t - db.oplog.rs.find().sort({$natural:1}).limit(1).next().ts.t)/3600
[08:01:08] <rspijker> that should give you the oplog coverage in hours
[08:01:18] <kizzx2> on primary?
[08:01:21] <kizzx2> or on the lagging replica
[08:01:46] <Nodex> ossipov : MongoDB doesn't have transactions
[08:02:57] <rspijker> kizzx2: on the primary
[08:03:34] <kizzx2> rspijker: ok i ran it on all 4 anyway
[08:03:41] <kizzx2> i supposed i needed to do "use local" first
[08:03:44] <kizzx2> primary: 26.1788
[08:03:45] <kizzx2> A (24 secs lag): 98.9619
[08:03:45] <kizzx2> B (876 secs lag): 98.7344
[08:03:45] <kizzx2> C (4991 secs lag): 0.024167
[08:04:11] <rspijker> kizzx2: you supposed correctly
[08:04:23] <rspijker> your primary is the one that matters. It shows it has 26 hours of oplog
[08:04:58] <rspijker> which means any secondary can run that amount behind (be down for that amount of time) and still recover by replaying the primary oplog
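As a cross-check on db.printSlaveReplicationInfo(), per-member lag can also be estimated from rs.status(); a minimal sketch, run from any member of the set:

    var s = rs.status();
    var primary = s.members.filter(function(m) { return m.stateStr === "PRIMARY"; })[0];
    s.members.forEach(function(m) {
        // optimeDate is the last oplog entry each member has applied
        print(m.name + " is " + (primary.optimeDate - m.optimeDate) / 1000 + "s behind the primary");
    });
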
[08:05:08] <xxtjaxx> ossipov: Okay. Well, I personally don't like express since it's not opinionated enough about structure, which makes it easier to make a big mess without guidelines. MSSQL is kind of wrong if you are running in a *Nix env., and I highly doubt Sybase connectors work that well anymore (thinking 2012 SQLSrv.). Depending on the size of the project and predicted growth, mongodb is fine.
[08:05:18] <kizzx2> oh i see
[08:05:30] <kizzx2> rspijker: that's useful information
[08:06:34] <kizzx2> ....so now i'm looping db.printSlaveReplicationInfo() on primary, it shows that B and C's "secs ago" figure keeps increasing
[08:06:34] <xxtjaxx> ossipov: Transactions aren't necessarily what mongodb does. You have a cursor you talk to that inserts something similar to Javascript Objects or JSON data into the database. This means you have maps/objects nested as well in your documents inside the collection. Please read the docs on how that works.
[08:07:16] <kizzx2> does that mean that's the bug and i can discard it..?
[08:07:17] <xxtjaxx> rspijker: Correct me if I'm totally off the point. And if you know better.
[08:08:28] <kizzx2> rspijker: i'm seeing this log on replica D… https://gist.github.com/kizzx2/99bb3f728107f3098b26, does that look suspicious?
[08:08:33] <rspijker> kizzx2: those figures should increase, but at some point decrease again as well, to 0...
[08:09:29] <kizzx2> rspijker: well, i meant replica C (the most lagging replica), anyways
[08:11:24] <ossipov> xxtjaxx: do you know how stable is mongodb now compared to mongodb 1.8? AFAIK it crashed with data losses...
[08:14:16] <rspijker> ossipov: mongodb is very stable now. Using it in all kinds of situations (production as well) and haven't seen any problems so far
[08:15:51] <kizzx2> rspijker: oh does that mean if "all goes well", i may suddenly discover that db.printSlaveReplicationInfo() jumps from 10000 to 0?
[08:16:50] <rspijker> yes, it says how long ago it synced, so when it eventually syncs, it will go to 0
[08:17:00] <rspijker> the figures you are seeing are *very* high though...
[08:17:01] <kizzx2> ok, fingers crossed :P
[08:17:17] <rspijker> let me have an actual look at the log you posted :P
[08:17:26] <kizzx2> yea, well i launched replica C and added it to the set an hour ago
[08:18:07] <kizzx2> it's running on EC2, so there's about an hour or so of lagging, because i needed to snapshot the EBS, launch a new instance, wait for it to settle and then add it to the set
[08:18:53] <ossipov> rspijker: have you tried to make a shop or any kind of a billing system using mongo?
[08:18:57] <Nodex> ossipov : I have never lost data with any version of MongoDB. Obviously keeping a backup is the best way to ensure no data loss
[08:20:53] <rspijker> ossipov: a billing system, yes
[08:22:03] <ossipov> rspijker: did you have any problems while making it?
[08:22:24] <rspijker> ossipov: loads, but not necessarily due to mongo ;)
[08:22:38] <Nodex> ossipov : if you need rollbacks of any sort I suggest you implement that in a database that was made to handle those sorts of things
[08:23:40] <rspijker> kizzx2: the fact that you are seeing rollbacks is a bad sign… did the replica (C) use to be a primary?
[08:23:49] <kizzx2> rspijker: no
[08:23:50] <kizzx2> well yes
[08:23:58] <kizzx2> actually, when i launched it, it was primary
[08:24:11] <rspijker> ah, that's the problem
[08:24:17] <kizzx2> however, that's incorrect because i wanted the other one to be primary
[08:24:35] <kizzx2> so i did "use local; db.system.replset.remove()" on it
[08:24:37] <rspijker> ok, best option, drop the data from that secondary and have it resync
[08:24:37] <kizzx2> and then i restarted it
[08:25:22] <rspijker> what happened is: C was primary, things got written to it, they were not synced to the secondaries. Then C was stepped down and is now a secondary, but it has more information (the unsynced writes) than the current primary
[08:25:26] <rspijker> which is a problem...
[08:25:37] <rspijker> this is also what the rollback is used for
[08:25:39] <kizzx2> so the fact that i accidentally ran "rs.initiate()" on it is causing issues? (it's included in the startup script i used, for some reason)
[08:25:42] <kizzx2> ok
[08:25:51] <kizzx2> that's weird, because i dont think C was visible to any node
[08:25:52] <rspijker> but according to the logs, no common point can be found to roll back to
[08:26:14] <kizzx2> but who knows, maybe something in the startup script inserts just one or two random records
[08:26:14] <rspijker> kizzx2: could be some small bit of config data
[08:26:17] <kizzx2> yea
[08:26:25] <rspijker> mongo doesn't distinguish, it just sees unsynched writes
[08:26:43] <kizzx2> so it feels like it'll just stall forever in this state?
[08:26:55] <rspijker> right now, yeah
[08:26:58] <kizzx2> argh
[08:27:12] <kizzx2> possible to tell it to not rollback because whatever data was written in that period, i wouldn't need them?
[08:28:11] <rspijker> how much data is actually in that secondary?
[08:28:20] <kizzx2> about 500GB i think
[08:28:45] <kizzx2> i just snapshotted an up-to-date secondary's EBS, and then launched a new instance with it
[08:29:22] <kizzx2> so apparently something strange got written when it launched so it "diverged"
[08:29:29] <rspijker> then maybe do that again?
[08:29:38] <rspijker> only now without making it a primary :P
[08:29:39] <kizzx2> yeah probably, without doing rs.initiate() this time, i guess
[08:38:28] <double_p> you can trigger a sync against a certain member
[08:39:27] <double_p> http://docs.mongodb.org/manual/reference/method/rs.syncFrom/
[08:42:15] <rspijker> who is that aimed at double_p ?
[09:03:38] <xxtjaxx> ossipov: I haven't used it yet in such a high demand sphere. I plan on this myself.
[09:05:19] <xxtjaxx> ossipov: failing is a harsh word; given the ability to shard the server and replicate per shard, you should get some sort of safety. (MongoDB Devs? Please either (1) correct my false assumption or (2) make me warm and fuzzy for being right)
[09:06:15] <xxtjaxx> ossipov: I also don't know how far you want to scale at this point in time.
[09:08:45] <Nodex> you can replicate without sharding
[09:08:57] <Nodex> sharding simply allows separation and write scaling
[09:09:05] <Nodex> separation / isolation
[09:09:50] <kizzx2> rspijker: so i did the launching of the new replica member without `rs.initiate()` on it, i guess it is mandatory to do a full resync on launch
[09:10:00] <kizzx2> i'm getting "replSet initial sync clone all databases" from the log
[09:10:24] <kizzx2> s/full resync on "proper" launch?/
[09:11:14] <rspijker> hmmm, strange… I thought you could start with a dataset
[09:11:20] <rspijker> never really tried it though, tbh :)
[10:23:59] <cllamas> hi
[10:24:00] <cllamas> :)
[10:24:04] <cllamas> anybody here?
[10:27:20] <cllamas> I can't get my mongo db working with graylog.
[10:27:21] <cllamas> I have a user grayloguser and a database graylog2 but I can't get the user to log in directly to the database
[10:27:21] <cllamas> I can login like this.
[10:27:22] <cllamas> mongo --username grayloguser --host localhost --password 123
[10:27:24] <cllamas> But I can't like this
[10:27:26] <cllamas> mongo --username grayloguser --host localhost --password 123 graylog2
[10:27:28] <cllamas> and when I login like the first option and I do
[10:27:30] <cllamas> > use graylog2
[10:27:32] <cllamas> switched to db graylog2
[10:27:34] <cllamas> > db.auth("grayloguser","123");
[10:27:35] <Nodex> can
[10:27:36] <cllamas> I get this error.
[10:27:36] <Nodex> you
[10:27:37] <Nodex> stop
[10:27:38] <Nodex> using
[10:27:38] <cllamas> Error: 18 { code: 18, ok: 0.0, errmsg: "auth fails" }
[10:27:39] <Nodex> \n
[10:27:40] <cllamas> PLEASE HELP!
[10:27:40] <Nodex> as
[10:27:42] <Nodex> punctuation
[10:27:43] <Nodex> its
[10:27:45] <Nodex> annoying
[10:27:59] <Derick> cllamas: please use a pastebin for code examples etc.
[10:28:16] <cllamas> :S sorry
[10:32:43] <rspijker> did you follow the tutorial on adding users?
[10:33:00] <rspijker> as in, first adding an admin user, then using that to add specific users to dbs?
[10:34:10] <rspijker> cllamas: check this: http://docs.mongodb.org/manual/tutorial/enable-authentication/ and the following 2 pages
[10:40:28] <cllamas> but my mongo is started from /etc/init.d/mongodb
[10:42:55] <remonvv> I love graylog's homepage. "Manage your logs in the dark and have lasers going and make it look like you're from space." and then proceed to say "used in big production deployments" only to list companies I've never even heard of.
[11:07:32] <cllamas> https://gist.github.com/anonymous/0f3eefb0aa6ca25682dd
[11:08:04] <cllamas> my user has the userAdminAnyDatabase role, but i cannot access the database directly with mongo --username grayloguser --host localhost --password 123 graylog2
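One likely explanation, offered as a hedged sketch: a user created in the admin database (where userAdminAnyDatabase is normally granted) can only authenticate against admin, so logging straight into graylog2 needs a user defined on that database as well. With the 2.4-era shell syntax that would look roughly like:

    use admin
    db.auth("admin", "adminpassword")          // hypothetical admin credentials
    use graylog2
    db.addUser({ user: "grayloguser", pwd: "123", roles: ["readWrite"] })   // 2.6+ uses db.createUser instead

    // afterwards this should succeed:
    //   mongo --username grayloguser --password 123 localhost/graylog2
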
[12:27:03] <remonvv> Anyone know if "numInitialChunks" parameter is supported for non-hashed indexes? It seems to be ignored.
[12:31:32] <rspijker> remonvv: afaik, nope
[12:33:51] <remonvv> Ah, yeah, found it in the code.
[14:24:16] <newbie35> Hi . I am using the C API of mongodb and I want to run the command enableSharing("base_name") but it does not work with the : mongo_simple_str_command
[14:24:55] <newbie35> Can anyone show me the right way to execute this command on the server please ?
[14:30:08] <Routh> Hey, pretty new to MongoDB (fair warning :P ) - I'm looking at saving a new object with a many to many relationship in it. It's a business and the relationship is categories. I'm using NodeJS and Mongoose with MongoDB. I'm just trying to figure out how a save would work in this? Do I need to save to the category model or the business, or both tables?
[14:34:15] <remonvv> Well, a) Mongo is not relational so know that any sort of relationship is enforced by the application (or mongoose in this case). b) Given the fact that a) how you save many-to-many relationships (N:M) will be documented in mongoose docs most likely.
[14:34:59] <remonvv> Without mongoose you would have to pick one of the storage options based on how many items are typically on the left and right side of the relationship and so on.
[14:35:14] <remonvv> You could store IDs as an embedded array, make something similar to a ref table, etc.
[14:39:40] <newbie35> Hi . I am using the C API of mongodb and I want to run the command enableSharing("base_name") but it does not work with the : mongo_simple_str_command
[14:39:44] <newbie35> Can anyone show me the right way to execute this command on the server please ?
[14:40:50] <remonvv> what is enableSharing?
[14:40:52] <remonvv> enableSharding?
[14:41:22] <remonvv> http://api.mongodb.org/csharp/1.3.1/html/4fe0d8b2-161e-ff83-355d-7517b251d81a.htm
[14:41:25] <remonvv> oops
[14:41:30] <remonvv> well, find the same for C
[14:41:33] <remonvv> that's C#
[14:43:53] <newbie35> Any idea about the way to write this commands ?
[14:44:24] <Routh> remonvv: Thanks for the response. I'll have to dig deeper into the mongo docs. I'm thinking I only have to save the relationship to one side of the many-to-many - just not sure if it should be the object or category.
[14:44:33] <Routh> *mongoose docs
[14:44:44] <newbie35> the mongos writes back: command denied: { enableSharding: "base_name" }
[14:49:30] <remonvv> Routh, typically the relationship data is on the side where the relationship is heavy. Meaning if you have entities A and entities B and B tends to have a lot more relationships to A then add the id to B. If it's roughly equal AND the amount of relationships is high then use a dedicated collection that stores ID pairs
[14:51:01] <Routh> remonvv: So in my case, I would save the business to the category rather than the category to the business since a business will only have 1 - 3 cats while the categories could have hundreds of businesses?
[14:58:58] <remonvv> Right.
[14:59:20] <remonvv> In that case it's probably best (from Mongo perspective) to embed that small array of references in the document for business.
[15:00:03] <remonvv> Rather than have the ID pair collection which is probably not needed (and is a worse solution since it makes writes a two-step process or worse)
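A minimal sketch of that embedding, with hypothetical collection and field names (Mongoose would wrap this, but the stored shape is the same idea):

    // each business carries its 1-3 category references as an embedded array
    var plumbing = db.categories.findOne({ name: "Plumbing" });
    var heating  = db.categories.findOne({ name: "Heating" });
    db.businesses.insert({ name: "Acme Plumbing", category_ids: [plumbing._id, heating._id] });

    // "all businesses in a category" is then a single indexed query against that array
    db.businesses.ensureIndex({ category_ids: 1 });
    db.businesses.find({ category_ids: plumbing._id });
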
[15:36:12] <Routh> remonvv: So your approach would be a Boolean then?
[17:18:05] <hahuang65> anyone here use mongoid3 that's experiencing a shit ton of namespace queries?
[17:35:16] <titosantana> I'm trying to sort on a map-reduced collection but doesn't seem to be working
[17:35:26] <titosantana> i tried {"num_articles" : -1 }
[17:35:27] <awpti> Okay, I'm lost. How do I increment a score value inside a sub document.. and only a specific one? eg: http://pastie.org/8156258 Lets say, I only want to increment the number 35. The docs do not offer any clarity on this.
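The usual tool for this is the positional operator; a hedged sketch, since the pastie's document shape isn't visible here (assume scores is an array of numbers such as [10, 35, 50], and the collection name is hypothetical):

    // $ refers to the array element matched by the query,
    // so only the element equal to 35 gets incremented
    db.things.update({ _id: someId, scores: 35 }, { $inc: { "scores.$": 1 } });

If the array holds subdocuments instead, the same pattern applies with a dotted match, e.g. {"scores.value": 35} in the query and {"scores.$.value": 1} in the $inc.
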
[17:35:37] <titosantana> and {"value.num_articles" : -1 }
[17:36:37] <titosantana> http://pastebin.com/magz07CM
[17:36:49] <titosantana> this is what a document looks like in the collection
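Map-reduce output nests every computed field under value, so the sort key needs the dotted path, which titosantana has already tried; a hedged sketch of the usual pattern and of the most common reason it still fails (a large sort with no index hits the in-memory sort limit); the output collection name is an example:

    db.mr_out.find().sort({ "value.num_articles": -1 });
    // an index on the nested field lets the sort walk the index instead of sorting in memory
    db.mr_out.ensureIndex({ "value.num_articles": -1 });
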
[18:08:02] <tg2> anybody running mongodb inside compressed zfs? I imagine the compression rate would be substantial due to the fact keys are repeated?
[18:39:19] <titosantana> can you not sort on a map reduced collection?
[18:39:25] <titosantana> i can't seem to get it to work
[18:46:41] <t0th_-> hi
[18:47:02] <t0th_-> i am using mysql and sphinx, i am trying use mongodb, if i use mongo i don't need mysql ?
[19:43:22] <_tinman_`> So I have a question on migrating document structures in mongo.
[19:45:04] <_tinman_`> I know schema isn't enforced, but I need to embed some information from one collection into another collection's documents to avoid making two queries to the db.
[19:46:36] <_tinman_`> I was thinking of modifying each retrieved document as it was accessed. Is this a good plan and an idiomatic way of embedding related information within other documents after they are created?
[19:48:58] <_tinman_> Or should I just do the migrations up front like I would in SQL?
[19:50:51] <tg2> you mean like a foreign key in traditional relational db?
[19:51:42] <tg2> it would just be doing 2 queries anyway even if there were a way to reference an object in a foreign collection
[19:52:21] <_tinman_> yeah, so we have a set of forms and some formentries that previously were never displayed together. Now a customer wants the name of the form on the form entry csvs and reports.
[19:53:15] <_tinman_> Yes, i know. what i want to do is embed the form document within the formentry document now.
[19:53:52] <_tinman_> but i dont want to update them all at once if it will bring my db to its knees.
[19:54:55] <_tinman_> so i am asking which is the correct way to do it: run migrations on deploy or do them incrementally at document access?
[19:58:18] <_tinman_> maybe I am being unclear?
[20:00:00] <_tinman_> Existing Doc A relates to Doc B as a 1 to many. Requires a second query to get Doc A information when I find Doc A.
[20:01:13] <_tinman_> What I want to do DocB contains DocA. This will allow me to make one query and get all the information I need when I find Doc B.
[20:02:23] <t0th_-> i am using mysql and sphinx, i am trying use mongodb, if i use mongo i don't need mysql ?
[20:03:16] <_tinman_> What I want to know: is it better to go through the DocB collection, embedding the related DocA in each DocB up front, or should I do it on the first access of any DocB?
[20:07:35] <tg2> that is a question of your preference
[20:07:41] <tg2> it would ultimately be best to have all data records updated
[20:07:48] <tg2> but if you want to update-on-read
[20:07:49] <tg2> that could work
[20:07:57] <tg2> make sure if you are running in a cluster
[20:08:06] <tg2> that you are writing to the one you're reading from
[20:08:20] <kevino> is there any way you can mark a secondary as down without talking to the primary?
[20:08:35] <tg2> you can also write a script that does it with a time delay between records so it doesn't bring the db to its knees
[20:08:41] <tg2> ideally you want all your documents updated
[20:08:42] <_tinman_> Good question.
[20:08:46] <tg2> not just the ones being read
[20:08:54] <_tinman_> Right.
[20:10:00] <_tinman_> so i could do it via, say sleep 10 in a thread batching one form at a time limit 30,000 or something?
[20:12:54] <_tinman_> Think that will work.
[20:13:01] <_tinman_> Thanks tg2
[20:13:21] <tg2> having some records updated and some old
[20:13:26] <tg2> is sort of irk inducing
[20:13:42] <_tinman_> yeah.
[20:13:44] <tg2> datasets should be as uniform as possible
[20:13:49] <tg2> it's a simple batch job to write
[20:13:58] <tg2> and you can change the update rate to whatever works well
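A minimal sketch of that batch job in the shell, with hypothetical collection and field names; it embeds the related form into each formentry in limited batches so the load stays gentle:

    // run repeatedly (cron, or a loop with sleep(10000) between passes) until nothing matches
    db.formentries.find({ form: { $exists: false } }).limit(30000).forEach(function(entry) {
        var form = db.forms.findOne({ _id: entry.form_id });
        if (form) {
            db.formentries.update({ _id: entry._id }, { $set: { form: { name: form.name } } });
        }
    });
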
[20:17:58] <cpu> Hi
[20:20:23] <cpu> Anyone ever used more than 1 core with mongo? I've got a setup that should reach a million writes per minute (SSD/dozens of cores/200 byte per doc/1 secondary index), my single mongod service utilizes its 1 core to the end.
[20:30:49] <dusts66> having issues with bulk inserts into a large replica set.... anyone have experience with this?
[20:32:29] <tg2> you inserting with a write concern?
[20:33:08] <tg2> @cpu - I'm sure index updating etc is linear and cannot scale horizontally
[20:33:23] <tg2> solution could be to run more instances per node
[20:33:32] <tg2> in a distributed setup
[20:33:50] <tg2> on spinning disks this would be a concern with write i/o but on ssd it shouldn't be an issue
[20:39:35] <kali> cpu: there is an exclusive lock for write ops. so only one thread is writing at one given time
[20:49:52] <dusts66> inserting write concern currently...... trying to initially load data into a production database
[20:50:05] <tg2> if you remove write concern
[20:50:26] <tg2> would probably be better
[20:50:35] <tg2> if that is possible for your use case
[20:50:55] <tg2> sort of like disabling indexes on mysql before doing a bulk import
[20:51:22] <dusts66> Might be better for the first bulk import...... it can be turned back on, correct? This is a three member replica set btw
[20:52:59] <dusts66> disabling write concern on a replica set is not a issue for the replication process?
[20:53:09] <tg2> i mean
[20:53:18] <tg2> you insert with a write concern that says dont bother waiting to write this to disk before returning
[20:53:22] <tg2> you can also disable your indexes
[20:53:30] <tg2> or dont bother replicating to N nodes before returning
[20:53:31] <tg2> that will speed it up
[20:53:56] <dusts66> should I turn off the secondary and bulk import into the primary .... then spin up the secondaries and allow the replication to sync
[20:54:09] <tg2> no you leave the secondaries online
[20:54:15] <tg2> just make sure when you're inserting
[20:54:28] <tg2> you aren't specifying that it should be committing to N replica set members before returning
[20:54:31] <tg2> look up write concern
[20:54:43] <tg2> also disable indexes if you can do that as it should speed up ingestion
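A hedged sketch of that shape of bulk load; the write concern itself is usually set in the driver (e.g. unacknowledged w:0 versus waiting on replication with w:"majority"), and the collection and fields here are examples:

    // insert in batches rather than one document per round trip
    var batch = [];
    for (var i = 0; i < 1000; i++) {
        batch.push({ n: i, payload: "..." });
    }
    db.imported.insert(batch);                 // the shell sends the array as a single batch

    // build secondary indexes after the load instead of maintaining them per insert
    db.imported.ensureIndex({ payload: 1 });
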
[20:54:58] <dusts66> will do... thanks
[20:55:01] <tg2> I'm pretty sure index updates are non parallel
[20:55:11] <tg2> such a thing is, by nature, linear
[20:55:14] <dusts66> i believe they are blocking
[20:55:36] <tg2> without using some kind of index queue logic
[20:55:39] <tg2> which I don't think mongo has
[20:59:43] <leifw> yeah, index updates are not done in parallel
[21:00:02] <leifw> they'd need transactions for that, instead they have the write lock
[21:00:36] <tg2> you'd think with loose write commits they would have some kind of loose index pool also
[21:01:14] <leifw> not sure what you're suggesting by "loose index pool"
[21:01:15] <tg2> if you're doing away with ACID anyway, makes sense to leverage all potential gains
[21:01:25] <tg2> the index is being updated linearly as you enter dat
[21:01:27] <tg2> data *
[21:02:12] <tg2> you could create a "to do" pool for index updates
[21:02:21] <tg2> and as you insert records, append them to this pool
[21:02:39] <tg2> like a mysql insert delayed
[21:02:46] <tg2> where it could then be processed in parallel
[21:03:08] <leifw> like innodb insert buffer you mean?
[21:03:14] <tg2> like mongodb disk write buffer
[21:03:32] <leifw> they haven't gotten rid of ACID
[21:03:33] <tg2> I think innodb also has a non-parallel insert process
[21:03:47] <leifw> isolation's a little screwy but they still have C and D
[21:03:57] <leifw> which are provided by the write lock
[21:04:03] <leifw> (and journaling)
[21:04:07] <tg2> if I make a write to node a and to node b at the same time
[21:04:14] <tg2> there is no guarantee they will write in series, is there?
[21:04:31] <leifw> I don't think you can do what you're talking about unless you give up on secondary index consistency
[21:04:36] <leifw> which they definitely haven't yet given up on
[21:04:49] <leifw> (I shouldn't say "yet", I don't think they ever will)
[21:05:04] <tg2> index consistency should be maintained
[21:05:22] <tg2> there are some things that just can't be done in parallel
[21:05:43] <tg2> but they can be buffered and done in more efficient batches rather than per-write
[21:08:39] <leifw> not forever unfortunately
[21:08:51] <leifw> when your data gets larger the buffer gets less effective
[21:09:05] <leifw> it'll push that out a little bit but it's still going to get disk bound
[21:10:46] <Vile> Hi everyone! If i have a compound index, what is the best way to skip the first field?
[21:11:38] <tg2> there are other ways too, I would have to look into how mysql worked around it in the new version
[21:11:46] <tg2> or postgres
[21:11:50] <tg2> one of them has threaded writes now
[21:12:18] <tg2> was mysql 5.5
[21:12:32] <tg2> in innodb it has the background i/o threads
[21:13:10] <Vile> i.e. i have index like { "v": 1, "key": { "_id.a": 1, "_id.t": -1 }}
[21:13:57] <Vile> i need to do a query {"_id.t":{$gt:123456}}
[21:14:08] <Vile> what's the best way?
[21:17:53] <cpu> @tg2 or anyone else, where can I find a good ref on multiple instances of mongo on same machine (using nodejs native driver)
[21:18:10] <Vile> would something like { "_id.a":{$gt:0}, "_id.t":{$gt:123456} } work? (taking in account that all the "a"'s are > 0
[21:18:57] <tg2> @cpu you said you're using vm's right?
[21:19:21] <tg2> nvm you didn't
[21:20:17] <cpu> not a vm
[21:20:20] <cpu> it's an ssd
[21:20:23] <cpu> with many cores
[21:20:33] <cpu> nodejs
[21:21:29] <tg2> I imagine it would be as simple as using different data directories, pid directories and config directories when launching, I will look into it
[21:21:41] <tg2> each instance would have to be on a different port
[21:22:59] <Vile> Thanks guys! It actually works :)
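A hedged sketch of Vile's workaround, plus how to confirm it: padding the leading field with an open-ended range keeps the { "_id.a": 1, "_id.t": -1 } index usable, though a range on the leading field scans more of the index than an exact match would:

    db.coll.find({ "_id.a": { $gt: 0 }, "_id.t": { $gt: 123456 } }).explain();
    // explain() shows which index (if any) was used and how many index entries were scanned
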
[21:25:27] <tg2> @cpu
[21:26:34] <cpu> yeah
[21:27:01] <tg2> mongod --dbpath /var/lib/mongo_instance1 --config /etc/config_instance1.conf --port 11111 --fork --logpath /var/log/mongodb_instance1.log
[21:27:10] <cpu> I was reading a site where someone explains I should also configure something in the shell
[21:27:11] <tg2> change instance1 with instance2 and port to 11112
[21:27:33] <tg2> try it out let me know if it works
[21:27:52] <tg2> then you run them as if they're separate nodes
[21:28:25] <tg2> the other solution is to use vm's
[21:30:17] <cpu> Shoud I duplicate the config file?
[21:30:31] <tg2> i have to check with pid locks
[21:30:34] <tg2> if it's in the config or not
[21:30:45] <tg2> pidfilepath
[21:30:47] <tg2> is in the config
[21:30:50] <tg2> so you'd have to have multiple configs
[21:31:28] <cpu> yeah I got (ERROR: child process failed, exited with error number 100)
[21:31:34] <tg2> you might be able to just use separate config files with the appropriate dbpath, port and logpath in it
[21:32:26] <tg2> in your .conf
[21:33:09] <tg2> port = 27017, logpath=/var/log/mongodb/mongodb.log, dbpath=/var/lib/mongodb need to be changed
[21:33:13] <tg2> and you can just use --config then
[21:33:15] <tg2> without the other 2
[21:33:20] <tg2> 3 *
[21:33:25] <tg2> pidfilepath needs to be added also
[21:33:50] <tg2> you can add fork in there too
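A sketch of what the second instance's config might look like in the 2013-era ini-style format; the port and every path are examples and must differ from the first instance's:

    # /etc/config_instance2.conf
    port = 11112
    dbpath = /var/lib/mongo_instance2
    logpath = /var/log/mongodb_instance2.log
    pidfilepath = /var/run/mongodb_instance2.pid
    fork = true
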
[21:34:18] <cpu> I'm trying
[21:34:29] <cpu> tried a different config and changed all the params to be different but still got the error 100 (still on it)
[21:34:48] <tg2> might be the mongod.lock
[21:34:55] <cpu> log says db path doesn't exist :)
[21:35:29] <tg2> do you have mongod running at all right now?
[21:35:29] <cpu> yeah, the main instance which is in use
[21:37:05] <cpu> ok
[21:37:08] <cpu> managed to launch it
[21:37:21] <tg2> ;)
[21:37:21] <cpu> now assuming It's running OK, and I got an instance (which is the main) already running with a db
[21:37:27] <cpu> how can I "extend" it to the new instance
[21:37:37] <tg2> extend = ?
[21:37:50] <tg2> you have 1 instance running with 1 collection?
[21:37:54] <tg2> it is not in a distributed setup
[21:37:55] <cpu> extend as in.. sharding maybe?
[21:38:12] <cpu> Right now I have 1 running with 4 collections and the new one I just started with nothing
[21:38:26] <tg2> http://docs.mongodb.org/manual/sharding/
[21:38:51] <tg2> http://docs.mongodb.org/manual/tutorial/add-shards-to-shard-cluster/
[21:38:58] <cpu> There's no way to let the mongo instance we just started to use the same db.. right?
[21:39:09] <cpu> thanks for the tutorial link
[21:39:21] <tg2> no
[21:39:28] <tg2> that would cause conflicts
[21:39:38] <tg2> you basically set up a sharded configuration on your single server
[21:39:45] <tg2> so that each mongod can write to its own set
[21:40:09] <tg2> this /should/ help bypass the threaded limitations
[21:40:16] <tg2> and get you some more throughput
[21:40:23] <cpu> I'm going to read this tutorial and try this setup (need to start mongos)
[21:40:44] <cpu> I'm assuming it is compatible transparently with nodejs mongo driver
[21:43:25] <joshua> you talk to mongos just like mongod
[21:43:53] <joshua> You will need a config server as well
[21:44:14] <joshua> http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/
[21:44:45] <cpu> mongos --config gets a regular mongod config file?
[21:46:09] <joshua> Its similar
[21:46:17] <joshua> logpath, fork=true, port
[21:46:34] <joshua> but also configdb option to point to the config servers
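Once the config server, the mongos, and the per-instance mongods are up, the shard wiring happens from a shell connected to the mongos; a minimal sketch with example ports and names:

    sh.addShard("localhost:11111");
    sh.addShard("localhost:11112");
    sh.enableSharding("mydb");                          // allow the database to be sharded
    sh.shardCollection("mydb.events", { someKey: 1 });  // pick a shard key per collection
    sh.status();                                        // verify shards and chunk distribution
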
[21:51:02] <joshua> All the config file /w 7
[21:51:06] <joshua> oops
[21:51:14] <joshua> damn buffer
[21:51:21] <cpu> ok.. started a config server (moving on to starting mongos)
[21:56:41] <cpu> ok, started mongos
[22:00:50] <cpu> what happens if a shard goes down?
[22:07:22] <cpu> Interesting, I get "couldn't connect to new shard socket exception [CONNECT_ERROR]" but the instances are listening according to netstat
[22:22:46] <cpu> I think I'm mixing the terms
[22:23:09] <cpu> I started a shard as replica set rs0, and did rs.initiate
[22:23:20] <cpu> the shell looks now like rs0:primary>
[22:23:47] <cpu> but when I try to addShard("rs0/hisAddress") I get a message that he's not part of rs0
[22:24:12] <cpu> I do the addShard on mongos by the way
[22:40:57] <cpu> do all replica set members (/shards) need to know about all their "brother" shards.. why is there a "members" field in the replica set
[22:51:16] <joshua> A replica set is a group of servers with the same data. Its a slightly different configuration than a single mongod
[22:52:35] <joshua> Sharding is used to scale your data horizontally, so you split a collection up among more than one server/replica set
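On the earlier "not part of rs0" error: when a shard is a replica set, the host string handed to sh.addShard has to match a member name exactly as the set itself reports it; a hedged sketch:

    // on the shard's primary: note the exact "host" values
    rs.conf().members

    // on the mongos, reuse one of those host:port strings verbatim
    sh.addShard("rs0/myhost.example.com:27018")
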
[23:20:08] <cpu> joshua - can I msg you in private?
[23:38:48] <joshua> cpu: Sorry, I'm at work and a little busy.
[23:39:50] <cpu> Never meant to be a burden :)