#mongodb logs for Friday the 26th of September, 2014

[00:54:10] <Guest3641> is there a config option for mongo-hadoop 1.3.0 to ignore this error
[00:54:12] <Guest3641> 11000 E11000 duplicate key error index
[00:55:03] <Guest3641> doing a big batch write of things, so can't just ignore the error on a doc-by-doc basis
[00:55:31] <Guest3641> for the same dataset in pre 1.3.0 the error never came up any ideas what changed?
[01:06:19] <Boomtime> "for the same dataset in pre 1.3.0 the error never came up any ideas what changed?" <- your dataset changed, that error is server side
[01:07:38] <Guest3641> but my dataset didn't change :(
[01:07:58] <Guest3641> i can downgrade the jar and run the exact same workflow with no problems
[01:08:15] <Guest3641> same input data and all
[01:08:18] <Boomtime> then there was a bug in 1.3.0 that was ignoring the error
[01:08:43] <Boomtime> the error is server-side, it absolutely is a problem, the documents affected were not inserted
[01:09:24] <Guest3641> i see, so for our usecase, that's totally okay. is there a way to just ignore that error?
[01:09:44] <Guest3641> either on the mongo server or the hadoop/mongo config?
[01:10:24] <Boomtime> the server cannot ignore the error, you are asking it to do something impossible, you can ignore the error on the client though
[01:10:51] <Boomtime> however, i do not explicitly know how to do that in hadoop
[01:12:24] <Guest3641> yeah, so spark is just doing a bulk store of the rdd and either all works or breaks with that. . .
[01:12:42] <Guest3641> hrm . . . so pretty much i'm just shit outta luck getting the pre 1.3.0 behavior back?
[01:14:38] <Boomtime> "either all works or breaks with that" <- that is unlikely to be the behavior
[01:15:13] <Boomtime> if you are using the bulk insert api then you can instruct it to plow on with doing the remaining inserts despite errors on some
[01:15:44] <Boomtime> if you are using the "old style" array insert that most drivers implemented, then it halts on the first error
[01:15:55] <Boomtime> but that means it stops midway
[01:16:17] <Boomtime> all the earlier inserts that succeeded will be there
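
The continue-on-errors behaviour Boomtime describes maps to the unordered bulk API in the mongo shell (2.6+); a minimal sketch, with a hypothetical collection name, rather than the hadoop connector's own config:

    var bulk = db.things.initializeUnorderedBulkOp();
    bulk.insert({_id: 1, val: "a"});
    bulk.insert({_id: 1, val: "dup"});   // duplicate key, this one is skipped
    bulk.insert({_id: 2, val: "b"});
    try {
        bulk.execute();                  // unordered: remaining inserts still run
    } catch (e) {
        printjson(e);                    // duplicate key errors are reported here
    }
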
[01:19:00] <Guest3641> would a stack trace be helpful?
[01:19:49] <Boomtime> helpful for what?
[01:20:21] <Guest3641> clarity i guess. i feel like maybe i'm not doing a good job of communicating the scenario
[01:20:58] <Guest3641> our application is essentially doing `collectionOfDBObjects.save` we don't really have too great a view into the internals after that
[01:21:18] <Guest3641> from spark if that matters
[01:21:23] <Guest3641> v 1.1
[01:21:32] <Boomtime> i am sorry, i am not familiar with that
[01:23:00] <Boomtime> you should research the hadoop connector to see if it has options for bulk inserts, in particular to ignore duplicate key errors (the affected documents would not be inserted)
[01:27:14] <Guest3641> gracias. i cloned the mongo-hadoop repo to start digging there, is there a better place to start the search?
[01:30:37] <Boomtime> not sure, i would have hoped there were docs like there are for most other drivers
[01:33:02] <Guest3641> good point, mongo java driver probably a better starting point, thanks
[01:53:16] <Guest3641> after chatting with somebody else on the team he suggested just upserting everything. any ideas on the performance slowdown of upsert (with MongoUpdateStorage checking the unique key) vs. MongoInsertStorage?
[01:54:06] <Guest3641> it's coming down to dev time vs. compute time, and we're leaning towards saving dev time :)
[01:54:53] <Guest3641> . . . unless it's some unholy slowdown
[01:55:04] <Boomtime> upsert and insert run roughly the same, but they have different behavior and you want to be sure you understand that change
[01:55:43] <Boomtime> both of them have to check if what is being inserted is unique, the only difference is that one halts if it isn't, the other overwrites
[01:57:30] <Guest3641> oh perfect. yeah the different dupes aren't doing anything useful, so overwriting is fine. upsert's definitely the way to go for us then!
[01:57:38] <Guest3641> thanks so much for all the advice!
[01:57:59] <Boomtime> cheers
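
A minimal shell illustration of the difference Boomtime describes (collection and field names hypothetical):

    // insert: fails with E11000 if a document with the same unique key already exists
    db.things.insert({_id: "abc", val: 1});
    db.things.insert({_id: "abc", val: 2});   // duplicate key error, nothing written

    // upsert: the existing document is updated instead of the write failing
    db.things.update({_id: "abc"}, {val: 2}, {upsert: true});
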
[02:57:50] <edrocks> is there any way to sort by an arrays length?
[03:35:49] <Boomtime> @edrocks: using aggregation, yes, you could sort by array length
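
One way to sort by array length with the aggregation framework, assuming a 2.6+ server for the $size operator (collection and field names hypothetical):

    db.items.aggregate([
        { $project: { name: 1, tagCount: { $size: "$tags" } } },
        { $sort: { tagCount: -1 } }
    ]);

On older servers the same effect needs an $unwind / $group / $sum pipeline instead.
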
[03:37:12] <edrocks> Boomtime: I got that working but the problem now is I still need to run a $geowithin query before I do that
[03:38:09] <edrocks> here is my query http://pastebin.com/ZXtLsD1K
[03:40:44] <edrocks> I think I might just add a counter field and +/- it when I update
[03:41:18] <edrocks> but I already have a bunch of data so if I do that I need to make something to set that field
[03:51:11] <edrocks> oh never mind, got it working
[03:57:39] <Boomtime> you're welcome
[04:04:02] <edrocks> Boomtime: is there something like $project that doesn't require you to explicitly show each field? I just want to add one field not hide the rest and add it
[04:08:57] <Boomtime> that sounds like $group
[07:55:58] <jordz> What do I do if I think / know the count is wrong on collections that are sharded?
[07:56:28] <Boomtime> you discover that count() is an approximation on sharded collections
[07:59:04] <Boomtime> @jordz: https://jira.mongodb.org/browse/SERVER-3645
[08:02:28] <jordz> Boomtime: Yeah I knew it was an approximation, but it's gone up by 1.5 million on 7 collections
[08:02:45] <Boomtime> @jordz: https://jira.mongodb.org/browse/SERVER-3645
[08:03:07] <jordz> thanks
[08:04:24] <Boomtime> also: https://jira.mongodb.org/browse/SERVER-5931
[08:05:02] <Boomtime> if you are using secondary reads then you should be prepared to cull out fictional results in a sharded cluster
[08:05:32] <Boomtime> or better: always use primary reads in a sharded cluster and results will be correct
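
In the shell, read preference can be pinned to the primary per connection or per query; a sketch (collection hypothetical):

    db.getMongo().setReadPref("primary");                     // all reads on this connection
    db.orders.find({ status: "open" }).readPref("primary");   // or per cursor
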
[09:26:06] <Lope> I've got a collection of country names. I'm thinking of using the country name as the _id instead of mongo's default ObjectID. I don't need a timestamp. How will performance be affected? examples of _id: 'tun', 'tur', 'tuv','twn' etc
[09:27:14] <Derick> they will take up less space than an objectid
[09:27:30] <Lope> Derick: yeah, I realized that, how about the effect on performance?
[09:28:52] <Lope> there will be a lot of similar index keys, and mongoDB's default ObjectID indexes might actually be faster. for example I'll have keys like "southafrica" "south africa" "southafric" etc so when matching the index mongoDB might need to examine more bytes than by looking at its hexadecimal IDs?
[09:31:51] <Lope> the second answer here kind of answers my question, with the second point, discouraging me from doing it: http://stackoverflow.com/questions/12211138/creating-custom-object-id-in-mongodb
[09:33:12] <Lope> oh, but I just realized most of my queries won't involve the _id field.
[09:33:32] <Lope> most of my performance intensive queries are just lookups
[09:33:42] <Lope> (and not the _id)
[09:35:10] <Lope> okay, I'm gonna do it.
[09:43:56] <jordz> My balancer has gone haywire and won't stop..
[09:45:00] <jordz> If I restart my mongos instances, will it cause ghost chunks or any inconsistency issues?
[09:45:40] <tirengarfio> when I run test_database.Category.find() I get test_database is not defined..why? https://gist.github.com/Ziiweb/21d1eac52e0754f88455
[09:46:11] <Nodex> db.Cateogry.find()
[09:46:30] <tirengarfio> Nodex, thanks
[09:46:32] <Nodex> ;)
[09:46:46] <Nodex> typo in my collection name ^^
[09:47:49] <rspijker> jordz: no. the distribution etc is all stored in the config servers.
[09:50:49] <jordz> rspijker, okay cool. I've disabled the balancer by setBalancerState(false) and also run stopBalancer which threw an exception when it was waiting for the lock
[09:51:27] <jordz> As far as I can tell, my shards are still registering chunk moves in their logs
[09:52:14] <jordz> I'm going to wait a little longer, see if it idles
[09:54:00] <rspijker> patience is usually good :)
[09:57:34] <jordz> I'm looking at the logs, it's showing the moveChunk data transfer progress, which I'm assuming moves 64mb as a single transfer. It's still moving more..
[09:58:03] <jordz> When you stop the balancer, does it go through the rest of the locks in the config database?
[09:58:09] <jordz> to try to clear them before it stops?
[10:05:31] <tirengarfio> I have this collection: https://gist.github.com/Ziiweb/8b868a9b7d7e265f0d71 how can I get only the "products" part? When I run db.Category.find() I also get the information about Category, but I just want the info about the embedded collections
[10:09:27] <Nodex> there is no such thing as "Embedded collections"
[10:09:55] <rspijker> tirengarfio: db.Category.find({},{products:1}) will return a cursor with documents with only the products field populated
[10:10:08] <rspijker> if you want more complicated stuff, you will have to move the aggregation framework
[10:10:22] <rspijker> *move=use
[10:18:47] <tirengarfio> rspijker, so there is no way to fetch all the embedded documents using find() ?
[10:19:42] <rspijker> tirengarfio: the first find I gave you does that, but every embedded document will still be wrapped inside of a document
[10:20:12] <rspijker> how would you tell them apart otherwise?
[10:20:43] <rspijker> and if you merge them, how do you do that? There are multiple options. Which is where the aggregation framework comes in as a generic wrapper to handle that kind of stuff
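
Building on rspijker's example, a sketch of both approaches (field names taken from the pasted document, otherwise hypothetical):

    // projection: each result still wraps the products array in a document
    db.Category.find({}, { products: 1, _id: 0 });

    // aggregation: one result per embedded product, unwrapped
    db.Category.aggregate([
        { $unwind: "$products" },
        { $project: { product: "$products", _id: 0 } }
    ]);
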
[10:24:50] <tirengarfio> rspijker, my issue is: I have a category with a one to one relation (the categories could be root categories or subcategories). I would like to fetch the embedded documents of the root categories, regardless of whether they are subcategories or products, is this possible?
[10:26:42] <rspijker> tirengarfio: I’m not following. What are you querying for and what exactly do you want to get back?
[10:26:56] <phycho_> lo all
[10:27:08] <rspijker> tirengarfio: perhaps a more verbose example would help
[10:27:10] <phycho_> the developers have just imported a large dataset (6GB) into our primary mongodb instance
[10:27:20] <phycho_> and have now added indexing
[10:27:49] <phycho_> seems to have very high I/O on our primary and next to no activity on the secondaries
[10:27:52] <phycho_> is this normal?
[10:28:27] <tirengarfio> rspijker, what didn't you understand? I'm querying the collection Category, to get Subcategories (Category collection) or Products.
[10:28:31] <Nodex> until it's indexed yes
[10:28:35] <rspijker> phycho_: secondaries will only start building indices after the primary is finished
[10:28:56] <Nodex> tirengarfio : they're called "sub documents" not sub collections
[10:29:34] <phycho_> we had a look and indexing does not appear to be taking place
[10:29:41] <rspijker> tirengarfio: I don’t know what your data looks like. You only showed us a tiny little example document. What might be obvious to you might not be obvious if you don’t know what the data model looks like
[10:29:52] <Nodex> tail your logs
[10:29:56] <rspijker> phycho_: looks can be deceiving
[10:30:00] <phycho_> haha
[10:30:02] <Nodex> my guess is that it's indexing
[10:30:09] <phycho_> is there any sane way to find out if it _really_ is indexing?
[10:30:12] <rspijker> also, if it is done, maybe you just have primaryPreferred on your ops?
[10:30:30] <Nodex> yes, tail your logs
[10:30:36] <phycho_> k
[10:30:46] <Nodex> tail -f /path/to/mongod.log
[10:31:15] <rspijker> it should also show up in db.currentOp(), I think...
[10:31:42] <phycho_> nothing in db.currentOp()
[10:33:37] <tirengarfio> rspijker, this is my data model in mongodb doctrine odm: https://gist.github.com/Ziiweb/a45e0541b0fc87020d88
[10:33:44] <tirengarfio> does this clarify anything?
[10:34:38] <Nodex> phycho_ : what did your log say?
[10:35:44] <Nodex> tirengarfio : the trouble with an ODM is that you don't get to learn the concepts of the database and how it works because it's all abstracted away from you, which is why you're probably having trouble here
[10:35:53] <phycho_> Nodex: looking through at the moment.
[10:35:55] <rspijker> tirengarfio: yeah, a bit. I would suggest just looking into the aggregation framework. You will most likely need it to do what you want.
[10:36:10] <Nodex> you don't need to look through, it will tell you if it's indexing
[10:36:21] <Nodex> there iwll be a percentage done message appearing
[10:36:23] <phycho_> Nodex: my logs contain a lot of chatty connect/disconnect info etc
[10:36:23] <rspijker> anyway, lunchtimes!
[10:36:25] <Nodex> will*
[10:36:33] <phycho_> which floods the logs
[10:36:47] <rspijker> I’m like 99% sure it should be in currentOp anyway
[10:36:56] <rspijker> are you not just doing mostly writes?
[10:37:03] <rspijker> or reads with primaryPreferred?
[10:37:43] <Nodex> [11:32:58] <Nodex> you don't need to look through, it will tell you if it's indexing
[10:37:43] <Nodex> [11:33:09] <Nodex> there iwll be a percentage done message appearing
[10:49:33] <phycho_> got it.
[10:49:46] <phycho_> seems that the reason it was performing poorly was our developers had not put indexes in to begin with..
[10:49:54] <phycho_> now that they are all in place its exactly how I would expect
[10:55:38] <Nodex> LOL, gotta love those devs
[10:58:47] <phycho_> its OK, I expect to hand in my notice today anyway ;)
[11:15:35] <jordz> When the balancer has stopped running, should the counts on collections still be fluctuating? rspijker
[11:17:52] <rspijker> ‘the counts’?
[11:18:09] <jordz> sorry yeah, when I run db.collection.count()
[11:18:24] <jordz> or even db.stats()
[11:20:02] <rspijker> disabling balancing doesn’t lock the db
[11:20:07] <rspijker> so you can still insert documents
[11:20:15] <rspijker> which affects the count
[11:20:29] <rspijker> it just won’t redistribute chunks between shards
[12:14:57] <Lope> I just realized my code might have race conditions. I'm simultaneously looking up names in the DB, and if they don't exist I'm inserting them. I don't want to have duplicate names. would an upsert be good for this use case?
[12:21:36] <Nodex> yes
[12:22:01] <Nodex> and or findAndModify
[12:22:29] <Lope> how can I do an upsert so that if the search criteria didn't exist a {created:new Date()} field get's inserted, but if it updates, it only updates lastUpdated?
[12:23:34] <Lope> Oh, I see thereis a $setOnInsert operator.
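
A minimal sketch of the pattern Lope is after, assuming MongoDB 2.4+ for $setOnInsert (collection and field names hypothetical):

    db.names.update(
        { name: "southafrica" },
        {
            $set:         { lastUpdated: new Date() },   // applied on every matching write
            $setOnInsert: { created: new Date() }        // applied only when the upsert inserts
        },
        { upsert: true }
    );
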
[12:32:55] <_clx> hi
[12:33:51] <_clx> I would like to use the ensureIndex function to remove duplicate entries in a collection but I wonder how to set that up in a project? I mean, this function has to be called only once (while installing the app)
[12:35:55] <_clx> or is there a way to remove duplicate documents?
[12:38:42] <Lope> I think upsert with $setOnInsert is better.
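
For reference, the one-off index _clx describes looked like this on 2.x servers; note that dropDups deletes arbitrary duplicates and the option was removed in MongoDB 3.0 (collection and field hypothetical):

    // keeps one document per email value and silently drops the rest
    db.users.ensureIndex({ email: 1 }, { unique: true, dropDups: true });

Since indexes persist, calling ensureIndex on every app start is harmless; it is a no-op when the index already exists.
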
[12:42:56] <aliasc> hello mongo
[12:43:21] <jordz> Sorry rspijker, what's happening is the collection count is going up and down sporadically (there are no inserts currently running against the cluster).
[12:43:30] <jordz> I assume this is to do with the balancer moving chunks
[12:43:44] <jordz> the balancer is no longer running
[12:43:54] <jordz> but the counts are still going up and down.
[12:43:58] <jordz> down currently
[12:44:17] <jordz> which would make sense because each collection count seems to have 1 million more documents in it than it should
[12:44:26] <jordz> if I do .explain().n I get the correct count
[12:44:41] <aliasc> is there a limit to document count in a collection ?
[12:44:52] <jordz> aliasc, no.
[12:44:55] <jordz> unless its capped
[12:45:58] <aliasc> im programming a coffee bar app with nodejs express and mongodb
[12:46:35] <aliasc> iv read that denormalization is perfectly fine with mongo
[12:47:25] <aliasc> but is it ok if you put everything in a single collection
[12:48:01] <jordz> depends if your data model fits and also what kind of performance you want. Its difficult to say without knowing your data structure
[12:49:24] <Nodex> aliasc : there is only a limit to a document's size in mongodb which is currently 16mb
[12:49:31] <aliasc> well currently i have following collections, users, products, orders
[12:50:00] <aliasc> products have name price tax
[12:50:09] <aliasc> users have username password name
[12:50:30] <aliasc> and orders have ordertotal taxtotal date and array of selected products
[12:51:13] <Nodex> aliasc : what is the question?
[12:51:21] <aliasc> im just trying to not think of relational model while modeling my data in mongodb
[12:51:28] <aliasc> and im asking if im doing it ok
[12:51:37] <Nodex> it's better to think of what you'll be doing the most
[12:51:39] <aliasc> with this approach or is there a better way how its done in mongo
[12:51:41] <whaley> aliasc: coffee bar app?
[12:51:44] <Nodex> reading or writing...
[12:52:00] <aliasc> restaurant coffee software
[12:52:52] <aliasc> mostly writing since the app is configured once in a while
[12:53:00] <aliasc> orders need to be placed quickly
[12:53:21] <aliasc> for example the admin will configure the app and products and then just place orders
[12:54:01] <Nodex> do things often change in the orders between order and shipping?
[12:54:01] <aliasc> im just asking if the data model is ok because i still think in relational model world
[12:54:23] <aliasc> no they dont except if an order or an item in the order is placed by mistake
[12:54:27] <aliasc> it can be removed
[12:54:53] <Nodex> ok so there is no reason that you need to "relate" any documents together. You can keep 3 collections and embed the needed information with the order
[12:55:06] <Nodex> non-related does not mean chuck everything in one collection...
[12:55:29] <aliasc> no there is no need to relate things in fact in the orders collection the orderlist array contains all details about product
[12:55:35] <aliasc> like name price tax
[12:55:47] <aliasc> they are embedded without referencing with ids
[12:55:53] <Nodex> exactly so I don't see the problem. You've attacked it in the right way
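
A sketch of what such an order document might look like, with product details embedded directly in the line items (all names and values hypothetical):

    db.orders.insert({
        date:       new Date(),
        orderTotal: 7.50,
        taxTotal:   0.60,
        items: [
            { name: "espresso", price: 2.50, tax: 0.20 },
            { name: "latte",    price: 5.00, tax: 0.40 }
        ]
    });
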
[12:56:20] <aliasc> well it was hard for me to think this way in the first place since im from the rdbms world
[12:56:36] <aliasc> and i find mongodb to be perfect especially the ability to alter and create collections on the fly
[12:56:41] <Nodex> everytime I think of a new thing to do I ask myself 2 questions. 1: What will I be doing more of. and 2: what's the path of least resistence
[12:57:26] <Nodex> even many moons ago when I used relational db's I used them in a non relational way because for me performance is king and joins suck
[12:58:21] <aliasc> agreed. if someone tries to attack me about consistency i say if you develop your software to destroy your data
[12:58:27] <aliasc> it's not mongodb's fault, it's your fault
[13:00:19] <Nodex> correct. Consistency as far as inserts go is an app problem, consistency as far as making that data available on X nodes is a database problem
[13:00:29] <Nodex> at some point in the past those two got merged
[13:01:35] <aliasc> i have a problem though. my application will work as executable on windows
[13:02:06] <aliasc> i made a c# program with an embedded chromium browser which will launch node first, then mongodb, then point to localhost:port
[13:02:26] <aliasc> it runs smoothly but when the app needs to be closed, mongodb needs to be closed too
[13:02:56] <aliasc> and i can't find a perfect way to do this; instead i'm using a workaround: i created a js file which is passed to the mongo shell
[13:02:58] <Nodex> winblows :/
[13:03:05] <aliasc> the js file contains the commands to shut down mongodb
[13:03:46] <aliasc> this may or may not work 100% of the time
[13:03:51] <Nodex> db.shutdownServer({timeoutSecs: 60}); ?
[13:03:55] <aliasc> yes
[13:04:06] <Nodex> why can't you just send it as a command?
[13:04:45] <aliasc> because the c# form is completely separated from the app, in fact it's just a form with a chromium browser embedded
[13:05:06] <aliasc> it launches node and mongodb with shell command and then points to localhost
[13:05:25] <Nodex> embedd the c# driver?
[13:05:33] <aliasc> there is no way to access mongodb from the c# form itself except if i include the c# driver
[13:05:45] <aliasc> which is not logical, instead i could just use C# for the whole app
[13:05:55] <aliasc> but i wanted to take advantage of html5 and css3 for styling the app
[13:06:26] <Nodex> does the browser not notify nodejs when it's being closed?
[13:06:36] <Nodex> or why don;'t you just start mongo from node?
[13:06:47] <aliasc> on form close event node is closed fine
[13:06:57] <aliasc> but i could not just kill mongodb
[13:07:06] <Nodex> start it from nodejs then
[13:07:13] <Nodex> then it will die when node does :)
[13:07:32] <aliasc> start the mongod.exe from node ?
[13:07:36] <Nodex> yes why not
[13:07:43] <aliasc> i hadn't thought of it
[13:07:46] <aliasc> i will try it out
[13:09:04] <Nodex> probably need some module but you can certainly run mongo in the foreground no problems
[13:09:11] <Nodex> (on linux at least)
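
A minimal node sketch of what Nodex suggests, using the built-in child_process module (paths are hypothetical, and on Windows signal handling is emulated, so a guaranteed clean shutdown may still need the shutdownServer command):

    var spawn = require('child_process').spawn;

    // start mongod as a child of the node process
    var mongod = spawn('C:\\mongodb\\bin\\mongod.exe',
                       ['--dbpath', 'C:\\data\\db'],
                       { stdio: 'inherit' });

    // when node exits, ask mongod to shut down as well
    process.on('exit', function () {
        mongod.kill('SIGINT');
    });
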
[13:10:23] <aliasc> when node is terminated, will mongod.exe be closed safely?
[13:11:59] <Nodex> it will be the same as command + c on mongod I imagine
[13:13:09] <aliasc> sometimes when the app closes unexpectedly, say power loss, mongodb leaves a lock file behind and the next time the app launches mongodb will not start
[13:13:20] <aliasc> im trying to find good way to handle this
[13:13:46] <aliasc> windows sucks
[13:14:06] <aliasc> for developers :D
[13:17:11] <Nodex> have your app check the existence of the lock at start up before it starts mongo
[13:17:16] <Nodex> if it's there then remove it :)
[13:17:28] <Derick> if it is there, *and* mongodb isn't running
[13:17:31] <Derick> you should check that
[13:17:58] <Nodex> "he lock at start up before it starts mongo"
[13:18:10] <Nodex> before it starts....
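
A sketch of the startup check Nodex and Derick describe, in node (the lock path is hypothetical and mongodIsRunning() is a placeholder for whatever running-process check the app uses):

    var fs = require('fs');
    var lockPath = 'C:\\data\\db\\mongod.lock';   // hypothetical dbpath

    // only remove a leftover lock if mongod is definitely not running
    if (fs.existsSync(lockPath) && !mongodIsRunning()) {
        fs.unlinkSync(lockPath);
    }
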
[13:18:19] <matheuslc> Guys, in a relational model I have a post table and a comments table; in mongodb, will this be a single document?
[13:18:43] <Derick> depends on how many comments really
[13:18:48] <Derick> and how often they're written
[13:19:41] <Nodex> matheuslc : a table is more like a collection than a document
[13:29:30] <jonjon> if write + read is set to be on master only in a replica set I would think adding to the replica set does not actually help with scaling?
[13:29:53] <jonjon> does it some how help with cpu overload on the master?
[13:32:14] <jonjon> seems more helpful in crash recovery
[13:41:37] <jordz> If a primary steps down during writes and a secondary takes over, will anything in the old primaries (which is now a secondary) oplog copy over to the new primary?
[14:40:35] <skot> There will be write errors on the client, and the secondary which takes over may or may not have all the writes recorded on the primary; a wait period is allowed but depending on the secondary's progress it might not be long enough.
[14:41:14] <jordz> I saw the write errors, client side, which are rewritten. I'm more concerned about the oplog
[14:41:18] <skot> Generally a write concern of majority is suggested if you don't want a write to rollback.
[14:41:55] <jordz> writes just go back to the queue if they fail.
[14:41:59] <jordz> but anything that's written
[14:42:16] <skot> I'm not sure what queue you are talking about. On the client side?
[14:42:45] <jordz> Yeah, there's a queue client side that writes data to the db, if the writes fail another worker will pick it up and try again
[14:42:50] <skot> As I said, it depends on what you consider a "successful write".
[14:43:28] <skot> If you want it to survive fail-overs and be consistent in the whole replica set you need a majority write concern.
[14:43:41] <jordz> Ahh
[14:43:43] <jordz> O see
[14:43:46] <jordz> I*
[14:44:16] <skot> That means the write will not be acknowledged until a majority of the nodes have it, and then when one of them becomes primary it will have it.
[14:45:14] <skot> Now, this model does allow for cases when the write is actually done, but the client gets an error like a network issue and the write has not failed nor has its success been communicated back to the client.
[14:45:41] <skot> As long as your client can deal with all error cases, that should be fine.
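
A shell-level sketch of the majority write concern skot recommends (2.6+ shell syntax, collection and fields hypothetical):

    db.events.insert(
        { type: "click", ts: new Date() },
        { writeConcern: { w: "majority", wtimeout: 5000 } }
    );
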
[14:45:47] <jordz> If the oplog on a primary is say 1 minute behind, and it steps down, will its writes still be propagated to the rest of the replica set?
[14:46:00] <skot> In short, it depends.
[14:46:12] <skot> Most likely, but not required.
[14:46:36] <skot> if that was a requirement then you couldn't have a primary for a possibly very long time, during transition.
[14:46:50] <jordz> I see
[14:47:16] <skot> (or indefinitely depending on the write load and transition behavior)
[14:47:39] <jordz> It was a graceful step down I guess.
[14:47:57] <skot> The docs cover this pretty well, and I'd suggest reading them and filing issues with any needed clarifications
[14:48:16] <skot> http://docs.mongodb.org/manual/core/write-concern/
[14:48:25] <skot> http://docs.mongodb.org/manual/core/replica-set-write-concern/
[14:48:52] <skot> http://docs.mongodb.org/manual/core/replica-set-write-concern/#modify-default-write-concern
[14:49:17] <skot> the "report a problem" link at the bottom.
[14:54:22] <jordz> thanks skot, appreciate it
[15:51:42] <vrkid> hi, I need help setting up mongodb, authentication to an external LDAP server. I tried following the tutorials, but I keep failing setting up authentication. Anyone here can help?
[16:06:48] <uehtesham90> during aggregation, we do group after a match....but can we do a match after a grouping?
[16:07:35] <Derick> uehtesham90: sure, it won't use an index then though
[16:07:44] <uehtesham90> really?
[16:07:59] <Derick> you can do 20 matches and 14 groups, 4 unwinds and 2 sorts
[16:08:04] <Derick> whatever, it doesn't matter
[16:08:19] <uehtesham90> i may need the index but regardless...ur saying in the same aggregation query i can first do the grouping and then do matching on it
[16:08:20] <uehtesham90> ?
[16:08:32] <Derick> uehtesham90: sure
[16:08:48] <_willermo_> how can i make a query to join these documents??
[16:08:52] <uehtesham90> ok ill give it a try :)
[16:33:18] <jordz> When creating an empty collection with a hashed shard key, shouldn't it spread some chunks across multiple shards by default?
[16:33:28] <jordz> that's how interpret the documentation
[16:33:33] <jordz> I interpreted*
[20:10:29] <codezomb> would someone mind taking a look at this error? I'm trying to use mongodb with mongoid, and rails. https://gist.github.com/codezomb/1f1dcb0820f387dec2a3
[20:11:34] <codezomb> I've deleted and recreated the db and this happens every time. It was working fine last night, but I cannot seem to find anything that changed.
[20:12:43] <codezomb> it also works fine if I hook it up to an external mongodb host, like mongohq
[20:22:34] <skot> You have tried to push to a field named "", which is not valid: https://gist.github.com/codezomb/1f1dcb0820f387dec2a3#file-error-txt-L15
[20:23:02] <skot> You need to include a field name to push that embedded document into.
[20:24:04] <skot> codezomb: ---^
[20:24:48] <codezomb> skot: the interesting thing is this works if I switch to another mongodb installation (via mongohq)
[20:25:19] <skot> hard to say, but that update is malformed, do you depend on data which is different on the servers?
[20:25:42] <skot> Anyway, that update is not valid without a field name.
[20:25:57] <codezomb> nope, local installation is a fresh install. remote installation is a clean database.
[20:26:09] <codezomb> I seed them both with the same script :/
[20:26:33] <codezomb> well, thanks for taking a look
[20:26:34] <skot> Here is a shell command you can run to show the same error: db.a.update({}, {$push:{"":1}} )
[20:26:56] <skot> just make sure there is a doc in the collection to update, otherwise it won't do any work.
[20:27:15] <skot> or no, guess it parses that before the query.
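
The fix on the application side is simply to give $push a real field name; skot's failing shell example would become something like this (field name hypothetical):

    db.a.update({}, { $push: { tags: 1 } });
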
[22:07:40] <atrigent> hey guys, I'm pretty sure this is a longshot, but is there anything else I can do to optimize this query: db.annotations.aggregate([{ $group: { _id: '$work_id', count: { $sum: 1 } } }])? I have an index on work_id, but the docs claim that it won't be used
[22:08:38] <atrigent> the annotations collection has a little over 8000 documents
[22:11:00] <atrigent> it sorta seems like an index would help with this query, but the docs say that $group doesn't use indexes
[22:14:34] <atrigent> hmm the server version is 2.4.10
[22:47:22] <atrigent> hmm this: https://jira.mongodb.org/browse/SERVER-11447 looks relevant
[22:48:42] <atrigent> I tried db.annotations.aggregate([{ $sort: { work_id: 1 } }, { $group: { _id: '$work_id', count: { $sum: 1 } } }]) but I'm not sure I really see a performance improvement
[22:48:55] <atrigent> perhaps I'm misunderstanding what this issue is saying
[23:34:33] <atrigent> anybody?
[23:39:13] <Boomtime> @atrigent, the aggregation you posted is a complete collection scan, the server would need to load every single document
[23:40:22] <Boomtime> if the collection is larger than RAM then performance will be limited by hard-drive speed
[23:41:03] <atrigent> I realize that that would be the naive way to compute it, but couldn't an index help in this case? I realize that mongo might not actually implement that, but couldn't it theoretically help?
[23:47:13] <atrigent> the SQL statement to get the same result in the shitty old mysql database that this data is being migrated from is almost instantaneous, and it even involves a join :-/
[23:55:45] <Boomtime> MongoDB != SQL, really, they are not related in any way, if you bring your SQL knowledge to MongoDB you are going to experience a lot of pain
[23:57:28] <atrigent> it's not like I'm trying to do a join in mongo or anything
[23:59:24] <atrigent> I'm just trying to use the aggregation framework that mongo provides
[23:59:31] <Boomtime> you are trying to do all your processing after the fact, something that SQL was designed for - MongoDB is storage-oriented, it provides ways to obtain information already in the format you want, with the layout you want
[23:59:47] <Boomtime> sure, but what you're asking of it is naive