[00:54:10] <Guest3641> is there a config option for mongo-hadoop 1.3.0 to ignore this error
[00:54:12] <Guest3641> 11000 E11000 duplicate key error index
[00:55:03] <Guest3641> doing a big batch write of things, so can't just ignore the error on a doc-by-doc basis
[00:55:31] <Guest3641> for the same dataset in pre-1.3.0 the error never came up; any ideas what changed?
[01:06:19] <Boomtime> "for the same dataset in pre-1.3.0 the error never came up; any ideas what changed?" <- your dataset changed, that error is server side
[01:07:38] <Guest3641> but my dataset didn't change :(
[01:07:58] <Guest3641> i can downgrade the jar and run the exact same workflow with no problems
[01:20:21] <Guest3641> clarity i guess. i feel like maybe i'm not doing a good job of communicating the scenario
[01:20:58] <Guest3641> our application is essentially doing `collectionOfDBObjects.save`; we don't really have too great a view into the internals after that
[01:21:32] <Boomtime> i am sorry, i am not familiar with that
[01:23:00] <Boomtime> you should research the hadoop connector to see if it has options for bulk inserts, in particular to ignore duplicate key errors (the affected documents would not be inserted)
[01:27:14] <Guest3641> thanks. i cloned the mongo-hadoop repo to start digging there, is there a better place to start the search?
[01:30:37] <Boomtime> not sure, i would have hoped there were docs like there are for most other drivers
[01:33:02] <Guest3641> good point, mongo java driver probably a better starting point, thanks
[01:53:16] <Guest3641> after chatting with somebody else on the team he suggested just upserting everything. any ideas on the performance slowdown of upsert (with MongoUpdateStorage checking the unique key) vs. MongoInsertStorage?
[01:54:06] <Guest3641> it's coming down to dev time vs. compute time, and we're leaning towards saving dev time :)
[01:54:53] <Guest3641> . . . unless it's some unholy slowdown
[01:55:04] <Boomtime> upsert and insert run roughly the same, but they have different behavior and you want to be sure you understand that change
[01:55:43] <Boomtime> both of them have to check if what is being inserted is unique; the only difference is that one halts if it isn't and the other overwrites
[01:57:30] <Guest3641> oh perfect. yeah the different dupes aren't doing anything useful, so overwriting is fine. upsert's definitely the way to go for us then!
[01:57:38] <Guest3641> thanks so much for all the advice!
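For reference, a minimal mongo-shell sketch of the insert-versus-upsert behaviour discussed above; the collection and field names are made up, and the real work here would go through mongo-hadoop's MongoUpdateStorage rather than the shell:

    // hypothetical collection with a unique index on eventId
    db.events.ensureIndex({eventId: 1}, {unique: true})

    db.events.insert({eventId: 42, payload: "a"})
    db.events.insert({eventId: 42, payload: "b"})   // fails: E11000 duplicate key error

    // the upsert overwrites the existing document instead of erroring
    db.events.update({eventId: 42}, {eventId: 42, payload: "b"}, {upsert: true})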
[04:04:02] <edrocks> Boomtime: is there something like $project that doesn't require you to explicitly show each field? I just want to add one field, not hide everything else just to add it
[08:05:02] <Boomtime> if you are using secondary reads then you should be prepared to cull out fictional results in a sharded cluster
[08:05:32] <Boomtime> or better: always use primary reads in a sharded cluster and results will be correct
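For what it's worth, pinning a single query to the primary from the shell looks roughly like this (collection name is made up); drivers expose the same read-preference setting at the connection or query level:

    // route this query to the primary; with secondary reads a sharded cluster
    // can return orphaned documents left over from chunk migrations
    db.orders.find({status: "open"}).readPref("primary")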
[09:26:06] <Lope> I've got a collection of country names. I'm thinking of using the country name as the _id instead of mongo's default ObjectID. I don't need a timestamp. How will performance be affected? examples of _id: 'tun', 'tur', 'tuv','twn' etc
[09:27:14] <Derick> they will take up less space than an objectid
[09:27:30] <Lope> Derick: yeah, I realized that, how about the effect on performance?
[09:28:52] <Lope> there will be a lot of similar index keys, and mongoDB's default ObjectID indexes might actually be faster. for example I'll have keys like "southafrica", "south africa", "southafric" etc, so when matching the index mongoDB might need to examine more bytes than by looking at its hexadecimal IDs?
[09:31:51] <Lope> the second answer here kind of answers my question, with the second point, discouraging me from doing it: http://stackoverflow.com/questions/12211138/creating-custom-object-id-in-mongodb
[09:33:12] <Lope> oh, but I just realized most of my queries won't involve the _id field.
[09:33:32] <Lope> most of my performance intensive queries are just lookups
[09:43:56] <jordz> My balancer has gone haywire and won't stop...
[09:45:00] <jordz> If I restart my mongos instances, will it cause ghost chunks or any inconsistency issues?
[09:45:40] <tirengarfio> when I run test_database.Category.find() I get test_database is not defined..why? https://gist.github.com/Ziiweb/21d1eac52e0754f88455
[09:47:49] <rspijker> jordz: no. the distribution etc is all stored in the config servers.
[09:50:49] <jordz> rspijker, okay cool. I've disabled the balancer by setBalancerState(false) and also run stopBalancer which threw an exception when it was waiting for the lock
[09:51:27] <jordz> As far as I can tell, my shards are still registering chunk moves in their logs
[09:52:14] <jordz> I'm going to wait a little longer, see if it idles
[09:57:34] <jordz> I'm looking at the logs, it's showing the moveChunk data transfer progress, which I'm assuming moves 64mb as a single transfer. It's still moving more..
[09:58:03] <jordz> When you stop the balancer, does it go through the rest of the locks in the config database?
[09:58:09] <jordz> to try to clear them before it stops?
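A few standard shell helpers for checking what the balancer is actually doing (run against a mongos; the config.locks query is how that era's helpers check the balancer lock):

    sh.getBalancerState()     // false once setBalancerState(false) has taken effect
    sh.isBalancerRunning()    // true while an in-flight migration round is still finishing

    // inspect the balancer lock directly in the config database
    use config
    db.locks.find({_id: "balancer"}).pretty()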
[10:05:31] <tirengarfio> I have this collection: https://gist.github.com/Ziiweb/8b868a9b7d7e265f0d71 how can I get only the collection "products"? When I run db.Category.find() I also get the information about Category, but I just want the info about the embedded collections
[10:09:27] <Nodex> there is no such thing as "Embedded collections"
[10:09:55] <rspijker> tirengarfio: db.Category.find({},{products:1}) will return a cursor with documents with only the products field populated
[10:10:08] <rspijker> if you want more complicated stuff, you will have to move to the aggregation framework
[10:18:47] <tirengarfio> rspijker, so there is no way to fetch all the embedded documents using find() ?
[10:19:42] <rspijker> tirengarfio: the first find I gave you does that, but every embedded document will still be wrapped inside of a document
[10:20:12] <rspijker> how would you tell them apart otherwise?
[10:20:43] <rspijker> and if you merge them, how do you do that? There are multiple options. Which is where the aggregation framework comes in as a generic wrapper to handle that kind of stuff
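A minimal sketch of the kind of aggregation being suggested, assuming the embedded array field is called "products" as in the projection above (the other field names are guesses):

    db.Category.aggregate([
        {$unwind: "$products"},                      // one output document per embedded product
        {$project: {_id: 0, product: "$products"}}   // drop the wrapping Category fields
    ])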
[10:24:50] <tirengarfio> rspijker, my issue is: I have a category with a one-to-one relation (the categories could be root categories or subcategories). I would like to fetch the embedded documents of the root categories, regardless of whether they are subcategories or products, is this possible?
[10:26:42] <rspijker> tirengarfio: I’m not following. What are you querying for and what is it exactly that you want to get back?
[10:28:27] <tirengarfio> rspijker, what didn't you understand? I'm querying the collection Category, to get Subcategories (Category collection) or Products.
[10:28:35] <rspijker> phycho_: secondaries will only start building indices after the primary is finished
[10:28:56] <Nodex> tirengarfio : they're called "sub documents" not sub collections
[10:29:34] <phycho_> we had a look and indexing does not appear to be taking place
[10:29:41] <rspijker> tirengarfio: I don’t know what your data looks like. You only showed us a tiny little example document. What might be obvious to you might not be obvious if you don’t know what the data model looks like
[10:33:37] <tirengarfio> rspijker, this is my data model in mongodb doctrine odm: https://gist.github.com/Ziiweb/a45e0541b0fc87020d88
[10:33:44] <tirengarfio> does this clarify anything?
[10:34:38] <Nodex> phycho_ : what did your log say?
[10:35:44] <Nodex> tirengarfio : the trouble with an ODM is that you don't get to learn the concepts of the database and how it works because it's all abstracted away from you, which is why you're probably having trouble here
[10:35:53] <phycho_> Nodex: looking through at the moment.
[10:35:55] <rspijker> tirengarfio: yeah, a bit. I would suggest just looking into the aggregation framework. You will most likely need it to do what you want.
[10:36:10] <Nodex> you don't need to look through, it will tell you if it's indexing
[10:36:21] <Nodex> there will be a percentage-done message appearing
[10:36:23] <phycho_> Nodex: my logs contain a lot of chatty connect/disconnect info etc
[11:20:29] <rspijker> it just won’t redistribute chunks between shards
[12:14:57] <Lope> I just realized my code might have race conditions. I'm simultaneously looking up names in the DB, and if they don't exist I'm inserting them. I don't want to have duplicate names. would an upsert be good for this use case?
[12:22:29] <Lope> how can I do an upsert so that if the search criteria didn't exist a {created: new Date()} field gets inserted, but if it updates, it only updates lastUpdated?
[12:23:34] <Lope> Oh, I see there is a $setOnInsert operator.
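A minimal upsert sketch along those lines ($setOnInsert needs MongoDB 2.4+); the collection and field names are hypothetical:

    db.names.update(
        {name: "south africa"},
        {
            $set:         {lastUpdated: new Date()},  // applied on every write
            $setOnInsert: {created: new Date()}       // applied only when the upsert inserts
        },
        {upsert: true}
    )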
[12:33:51] <_clx> I would use the ensureIndex function to remove duplicate entries in a collection, but I wonder how to set that up in a project? I mean, this function has to be called only once (while installing the app)
[12:35:55] <_clx> or is there a way to remove duplicate documents?
[12:38:42] <Lope> I think upsert with $setOnInsert is better.
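For completeness, the ensureIndex approach _clx is describing relied on the pre-3.0 dropDups option, which silently deletes all but one document per duplicate key, so it removes data; the field name here is hypothetical:

    db.mycoll.ensureIndex({name: 1}, {unique: true, dropDups: true})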
[12:43:21] <jordz> Sorry rspijker, what's happening is the collection count is going up and down sporadically (there are no inserts currently running against the cluster).
[12:43:30] <jordz> I assume this is to do with the balancer moving chunks
[12:43:44] <jordz> the balancer is no longer running
[12:43:54] <jordz> but the counts are still going up and down.
[12:45:58] <aliasc> im programming a coffee bar app with nodejs express and mongodb
[12:46:35] <aliasc> i've read that denormalization is perfectly fine with mongo
[12:47:25] <aliasc> but is it ok if you put everything in a single collection
[12:48:01] <jordz> depends if your data model fits and also what kind of performance you want. It's difficult to say without knowing your data structure
[12:49:24] <Nodex> aliasc : there is only a limit to a document's size in mongodb, which is currently 16MB
[12:49:31] <aliasc> well currently i have the following collections: users, products, orders
[12:54:53] <Nodex> ok so there is no reason that you need to "relate" any documents together. You can keep 3 collections and embed the needed information with the order
[12:55:06] <Nodex> non-related does not mean chuck everything in one collection...
[12:55:29] <aliasc> no, there is no need to relate things; in fact in the orders collection the orderlist array contains all the details about the product
[12:55:47] <aliasc> they are embedded without referencing by ids
[12:55:53] <Nodex> exactly so I don't see the problem. You've attacked it in the right way
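A rough sketch of the embedded order shape being described; the field names are only approximations of aliasc's model:

    // product details are copied into the order at purchase time,
    // so reading an order never needs a lookup against the products collection
    db.orders.insert({
        user: "alice",
        createdAt: new Date(),
        orderList: [
            {name: "espresso",  price: 2.50, qty: 2},
            {name: "croissant", price: 1.80, qty: 1}
        ]
    })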
[12:56:20] <aliasc> well it was hard for me to think this way in the first place since im from the rdbms world
[12:56:36] <aliasc> and i find mongodb to be perfect especially the ability to alter and create collections on the fly
[12:56:41] <Nodex> every time I think of a new thing to do I ask myself 2 questions. 1: what will I be doing more of? and 2: what's the path of least resistance?
[12:57:26] <Nodex> even many moons ago when I used relational db's I used them in a non relational way because for me performance is king and joins suck
[12:58:21] <aliasc> agreed. if someone tries to attack me about consistency i say: if you develop your software in a way that destroys your data
[12:58:27] <aliasc> it's not mongodb's fault, it's your fault
[13:00:19] <Nodex> correct. Consistency as far as inserts go is an app problem, consistency as far as making that data available on X nodes is a database problem
[13:00:29] <Nodex> at some point in the past those two got merged
[13:01:35] <aliasc> i have a problem though. my application will run as an executable on windows
[13:02:06] <aliasc> i made a c# program with an embedded chromium browser which will launch node first, then mongodb, then point to localhost:port
[13:02:26] <aliasc> it runs smoothly, but when the app needs to be closed, mongodb needs to be closed too
[13:02:56] <aliasc> and i can't find a perfect way to do this; instead i'm using a workaround: i created a js file which is passed to the mongo shell
[13:10:23] <aliasc> when node is terminated, will mongod.exe be closed safely?
[13:11:59] <Nodex> it will be the same as command + c on mongod I imagine
[13:13:09] <aliasc> sometimes when the app closes unexpectedly, say from power loss, mongodb leaves a lock file and the next time the app launches mongodb will not start
[13:13:20] <aliasc> im trying to find a good way to handle this
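The usual clean way to stop a mongod from a script, roughly what the mongo-shell workaround described above would contain (it needs appropriate privileges if auth is enabled):

    // shutdown.js -- run as: mongo --port 27017 shutdown.js
    db = db.getSiblingDB("admin")
    db.shutdownServer()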
[13:19:41] <Nodex> matheuslc : a table is more like a collection than a document
[13:29:30] <jonjon> if write + read is set to be on master only in a replica set I would think adding to the replica set does not actually help with scaling?
[13:29:53] <jonjon> does it somehow help with cpu overload on the master?
[13:32:14] <jonjon> seems more helpful in crash recovery
[13:41:37] <jordz> If a primary steps down during writes and a secondary takes over, will anything in the old primary's oplog (it is now a secondary) copy over to the new primary?
[14:40:35] <skot> There will be write errors on the client, and the secondary which takes over may or may not have all the writes recorded on the primary; a wait period is allowed, but depending on the secondary's progress it might not be long enough.
[14:41:14] <jordz> I saw the write errors, client side, which are retried. I'm more concerned about the oplog
[14:41:18] <skot> Generally a write concern of majority is suggested if you don't want a write to rollback.
[14:41:55] <jordz> writes just go back to the queue if they fail.
[14:44:16] <skot> That means the write will not be acknowledged until a majority of the nodes have it, and then when one of them becomes primary it will have it.
[14:45:14] <skot> Now, this model does allow for cases when the write is actually done, but the client gets an error like a network issue and the write has not failed nor has its success been communicated back to the client.
[14:45:41] <skot> As long as your client can deal with all error cases, that should be fine.
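A minimal sketch of the write concern being suggested, using the 2.6+ shell syntax (collection and fields are made up):

    // the insert is not acknowledged until a majority of replica set members have it,
    // so a later primary is guaranteed to carry the write and it cannot be rolled back
    db.jobs.insert(
        {orderId: 123, status: "processed"},
        {writeConcern: {w: "majority", wtimeout: 5000}}
    )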
[14:45:47] <jordz> If the oplog on a primary is, say, 1 minute behind and it steps down, will its writes still be propagated to the rest of the replica set?
[15:51:42] <vrkid> hi, I need help setting up mongodb authentication against an external LDAP server. I tried following the tutorials, but I keep failing to set up authentication. Can anyone here help?
[16:06:48] <uehtesham90> during aggregation, we do a group after a match... but can we do a match after a grouping?
[16:07:35] <Derick> uehtesham90: sure, it won't use an index then though
[16:08:19] <uehtesham90> i may need the index, but regardless... you're saying that in the same aggregation query i can first do the grouping and then do matching on it
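For reference, a group followed by a match looks like this (collection and field names are made up); the trailing $match filters the grouped output in memory, which is why it cannot use an index:

    db.scores.aggregate([
        {$group: {_id: "$studentId", total: {$sum: "$score"}}},
        {$match: {total: {$gte: 100}}}   // runs against the $group output, not the collection
    ])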
[20:10:29] <codezomb> would someone mind taking a look at this error? I'm trying to use mongodb with mongoid, and rails. https://gist.github.com/codezomb/1f1dcb0820f387dec2a3
[20:11:34] <codezomb> I've deleted and recreated the db and this happens every time. It was working fine last night, but I cannot seem to find anything that changed.
[20:12:43] <codezomb> it also works fine if I hook it up to an external mongodb host, like mongohq
[20:22:34] <skot> You have tried to push to a field named "", which is not valid: https://gist.github.com/codezomb/1f1dcb0820f387dec2a3#file-error-txt-L15
[20:23:02] <skot> You need to include a field name to push that embedded document into.
[20:24:48] <codezomb> skot: the interesting thing is this works if I switch to another mongodb installation (via mongohq)
[20:25:19] <skot> hard to say, but that update is malformed; do you depend on data which is different on the servers?
[20:25:42] <skot> Anyway, that update is not valid without a field name.
[20:25:57] <codezomb> nope, local installation is a fresh install. remote installation is a clean database.
[20:26:09] <codezomb> I seed them both with the same script :/
[20:26:33] <codezomb> well, thanks for taking a look
[20:26:34] <skot> Here is a shell command you can run to show the same error: db.a.update({}, {$push:{"":1}} )
[20:26:56] <skot> just make sure there is a doc in the collection to update, otherwise it won't do any work.
[20:27:15] <skot> or no, guess it parses that before the query.
[22:07:40] <atrigent> hey guys, I'm pretty sure this is a longshot, but is there anything else I can do to optimize this query: db.annotations.aggregate([{ $group: { _id: '$work_id', count: { $sum: 1 } } }])? I have an index on work_id, but the docs claim that it won't be used
[22:08:38] <atrigent> the annotations collection has a little over 8000 documents
[22:11:00] <atrigent> it sorta seems like an index would help with this query, but the docs say that $group doesn't use indexes
[22:14:34] <atrigent> hmm the server version is 2.4.10
[23:39:13] <Boomtime> @atrigent, the aggregation you posted is a complete collection scan, the server would need to load every single document
[23:40:22] <Boomtime> if the collection is larger than RAM then performance will be limited by hard-drive speed
[23:41:03] <atrigent> I realize that that would be the naive way to compute it, but couldn't an index help in this case? I realize that mongo might not actually implement that, but couldn't it theoretically help?
[23:47:13] <atrigent> the SQL statement to get the same result in the shitty old mysql database that this data is being migrated from is almost instantaneous, and it even involves a join :-/
[23:55:45] <Boomtime> MongoDB != SQL, really, they are not related in any way, if you bring your SQL knowledge to MongoDB you are going to experience a lot of pain
[23:57:28] <atrigent> it's not like I'm trying to do a join in mongo or anything
[23:59:24] <atrigent> I'm just trying to use the aggregation framework that mongo provides
[23:59:31] <Boomtime> you are trying to do all your processing after the fact, something that SQL was designed for - MongoDB is storage-oriented, it provides ways to obtain information already in the format you want, with the layout you want
[23:59:47] <Boomtime> sure, but what you're asking of it is naive