[11:55:29] <zell> for instance i'd like to query the "name" property on the following: { id: 1, foo: 'bar', embeds: [ {id: 1, name: '1'}, {id: 2, name: '2'}]}
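In MongoDB this kind of lookup into an embedded array is done with dot notation, e.g. `db.coll.find({ "embeds.name": "1" })` in the shell. A minimal runnable sketch of that matching rule over an in-memory document (the helper `matchesDotted` is illustrative, not a driver API):

```javascript
// zell's example document, with 'bar' quoted so it is valid JS.
const doc = { id: 1, foo: "bar", embeds: [{ id: 1, name: "1" }, { id: 2, name: "2" }] };

// Mimics MongoDB's "embeds.name" dot-notation match: true if any
// element of the embedded array has the requested field value.
function matchesDotted(document, path, value) {
  const [arrayField, subField] = path.split(".");
  const arr = document[arrayField];
  return Array.isArray(arr) && arr.some(el => el[subField] === value);
}

console.log(matchesDotted(doc, "embeds.name", "2")); // true
console.log(matchesDotted(doc, "embeds.name", "3")); // false
```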
[16:30:45] <morphic> hello, I have a bson document with the fields 'value', 'place', 'date', and I want to store more information about the place (like x, y, street). what's the best way to do it, since I will have the same place across different documents?
[16:34:06] <Kaim> Mon Jan 14 16:45:33 [conn3352] warning: shard key mismatch for insert { _id: ObjectId('50f4281d6291df0380010839'), time: new Date(1358178268000), __broken_data: BinData }, expected values for { path: 1.0 }, reloading config data to ensure not stale
[16:34:22] <kali> morphic: can you be more specific ? have you imagined one or more schema that we could comment ?
[16:34:39] <Kaim> and my insert hash does include the path key :/
[16:35:14] <kali> Kaim: are you sure your data is correct ? because the __broken_data is frightening
[16:36:27] <morphic> kali: i'm a little lost with relationships in NoSQL. The idea in SQL is: table [Transactions, fk(place_id), value... ] 1 <- 1 [Places, x, y, pk(id) ]
[16:36:48] <Kaim> at the beginning of the insert all is ok, then after some minutes something goes wrong
[16:37:42] <kali> Kaim: i think you probably have one document that breaks your import
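The warning Kaim pasted says the collection is sharded on `{ path: 1 }` but a document arrived whose shard-key field could not be read (hence `__broken_data`). A defensive sketch, under the assumption that the fix is to validate documents before insert (the helper is illustrative, not a mongo API):

```javascript
// The collection is sharded on { path: 1 }, so every inserted document
// must carry a usable "path" field. This illustrative check would catch
// the broken document before it reaches mongos.
const SHARD_KEY = "path";

function validForShardedInsert(doc) {
  return Object.prototype.hasOwnProperty.call(doc, SHARD_KEY) &&
         doc[SHARD_KEY] !== null && doc[SHARD_KEY] !== undefined;
}

const good = { path: "/var/log/app.log", time: 1358178268000 };
const bad  = { time: 1358178268000 }; // missing the shard key entirely

console.log(validForShardedInsert(good)); // true
console.log(validForShardedInsert(bad));  // false
```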
[16:38:00] <morphic> maybe I can store the Places's ObjectId on Transaction document
[16:38:43] <kali> morphic: the important thing to think about is: how you will access this data
[16:39:28] <kali> morphic: if it rarely changes and is read constantly, then yes, it's a good idea to embed some or all data from Places in Transaction
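A sketch of the two options morphic is weighing, as kali frames them (field names and values are illustrative, not from the channel):

```javascript
// Option 1: reference -- the transaction stores only the place's _id,
// like a foreign key; reading the place data costs a second query.
const placeDoc = { _id: "place-42", x: 48.85, y: 2.35, street: "Rue de Rivoli" };
const txByRef = { value: 100, date: "2013-01-14", place_id: placeDoc._id };

// Option 2: embed -- copy the read-hot place fields into each transaction,
// trading redundancy for reads that need no second query. Good when the
// place data rarely changes and is read constantly, as kali says.
const txEmbedded = {
  value: 100,
  date: "2013-01-14",
  place: { x: placeDoc.x, y: placeDoc.y, street: placeDoc.street },
};

// With embedding, rendering a transaction needs no extra lookup:
console.log(txEmbedded.place.street); // "Rue de Rivoli"
console.log(txByRef.place_id);        // "place-42" -- would need a 2nd query
```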
[16:42:30] <kali> Kaim: what version of mongo are you using ?
[16:42:45] <foofoobar> Hi. I have not used nosql before and, because I think it's an interesting topic, I am looking to integrate it into a future project of mine. I am currently coding something in RoR which is similar to a job board. You can post job offers and people can view them. I already wrote half of this very simple app (most is UI stuff) with a relational sql db
[16:42:56] <foofoobar> I am now thinking what advantages could be there to use a nosql approach here
[17:03:13] <foofoobar> NodeX, I'm still unsure how I would "port" such a relational way to a nosql way.. Let's assume I have the following model: http://pastie.org/5683646
[17:05:34] <foofoobar> How would you do this in a nosql way?
[17:05:59] <bean> foofoobar: you'd probably just have a user, that contains details about its category and job
[17:06:33] <foofoobar> bean, a job is related to a category, not a user
[17:07:15] <foofoobar> also a user can offer a lot of jobs
[17:07:24] <NodeX> foofoobar : it's hard to answer without knowing your access patterns
[17:07:28] <foofoobar> and how would you show a list of all jobs? you would have to iterate over all users
[17:07:40] <foofoobar> NodeX, What do you mean with "access patterns"?
[17:07:48] <NodeX> err how you access your data lol
[17:07:55] <kali> foofoobar: what kind of request you'll need to do
[17:13:16] <kali> or if that's not good enough, you can store the jobs in the category documents (i don't think you'll need to do that)
[17:16:03] <foofoobar> so in relational sql I would have a category table with an id and a name. So when I now save the category in the job document, what would I save for the category? the name or an id ?
[17:16:56] <NodeX> tbh I wouldn't even have a relational table for category
[17:18:44] <foofoobar> So just save the name.. all right. And in case there are more than one field for the category? e.g. an icon-name? also put it in there?
[17:19:08] <kali> foofoobar: if you need to display it when you show the list of jobs, yeah
[17:19:13] <NodeX> you can embed anything should you need
[17:19:24] <foofoobar> So redundant data is not a problem?
[17:19:39] <kali> foofoobar: it's an issue, not a problem :)
[17:19:41] <NodeX> redundant data often means fast performance
[17:20:41] <kali> foofoobar: the idea in mongodb design is to optimize your documents for reads as they are 99% of your typical web app requests
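A sketch of the denormalized job-board schema the channel converges on: category data (name, icon) embedded redundantly in each job, a reference to the posting user, and "list all jobs" as one scan. Collection and field names are guesses, not anything foofoobar pasted:

```javascript
// Denormalized job documents: the category travels with the job, so the
// listing page needs no join and no per-user iteration. Names are
// illustrative guesses at foofoobar's model.
const jobs = [
  {
    _id: "job-1",
    title: "Rails developer",
    posted_by: "user-7",                              // ref to the posting user
    category: { name: "Engineering", icon: "gear" },  // embedded, redundant copy
  },
  {
    _id: "job-2",
    title: "Designer",
    posted_by: "user-7",                              // one user, many jobs
    category: { name: "Design", icon: "brush" },
  },
];

// "Show a list of all jobs" is a single read of the jobs collection.
const listing = jobs.map(j => `${j.title} [${j.category.name}]`);
console.log(listing); // [ 'Rails developer [Engineering]', 'Designer [Design]' ]
```

The redundancy kali and NodeX mention is the embedded `category`: renaming a category means updating many job documents, but every read gets everything in one fetch.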
[17:22:44] <foofoobar> Sounds good. I think I will give it a try and look how it fits
[17:23:17] <NodeX> not everything works with nosql, but I am yet to be stopped by something that doesn't work
[17:45:53] <tworkin> what is the point of the comment at the bottom of this code snippet? What class is vals/Vals a member of? http://www.mongodb.org/pages/viewpage.action?pageId=19562815
[17:56:46] <tworkin> bson-inl.h:840:13: error: no matching function for call to ‘mongo::BSONElement::Val(unsigned int&)’ - why is there no symmetric error for BSONArrayBuilder.append(unsigned int)?
[18:04:37] <Kaim> kali, my cluster is clean now, but I still have the problem
[18:05:46] <kali> Kaim: have you looked for the document that your logging app is pushing that triggers the problem ?
[18:06:28] <Kaim> In fact I'm using fluentd and the mongodb plugin to do insert
[18:06:52] <Kaim> I patched the module to show me the insert query
[18:12:19] <Kaim> kali, btw do you have advice about max number of chunks per shard and max database size per shard?
[18:22:35] <nopp> Mon Jan 14 16:18:09 [repl writer worker 3] ERROR: warning: log line attempted (59k) over max size(10k), printing beginning and end ...
[18:46:17] <kali> Kaim: nope, i'm not sharding my data
[18:47:31] <calvinow> Hello all. I'm looking for some insight into my schema design in Mongo. I'm dealing with a dataset where there are a potentially large number of 'nodes', each of which would be stored as a separate document. I need to store relationships between these nodes, potentially from each node to every other node, although that worst case would not occur often. Currently, I'm storing these relationships as subdocuments in each node, b
[18:56:15] <kali> calvinow: make shorter lines, we lost you at "subdocuments in each node, b"
[19:09:44] <HappyPsychoD> Good day, is it possible to use one result set as the input for a foreach which requires a second query per record?
[19:10:21] <kali> HappyPsychoD: in your application language, yes :)
[19:11:23] <HappyPsychoD> Damn... I was hoping I could do something like this
[19:11:23] <HappyPsychoD> var toQuery = db.user.find({'last_seen': { $gt : 1355198400}}).toArray();
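In the mongo shell this pattern does in fact work: iterate one cursor and issue a second query per document, e.g. `db.user.find({last_seen: {$gt: 1355198400}}).forEach(function (u) { db.events.find({user_id: u._id}); })` (the `events` collection is a made-up example). A runnable simulation of the same two-step pattern over in-memory arrays, with fabricated data:

```javascript
// Simulated collections; the data and the 'events' collection are made up.
const users = [
  { _id: 1, last_seen: 1355200000 },
  { _id: 2, last_seen: 1355100000 }, // filtered out by the $gt condition
  { _id: 3, last_seen: 1355300000 },
];
const events = [
  { user_id: 1, kind: "login" },
  { user_id: 3, kind: "login" },
  { user_id: 3, kind: "logout" },
];

// Step 1: the first query -- equivalent of find({last_seen: {$gt: ...}}).
const toQuery = users.filter(u => u.last_seen > 1355198400);

// Step 2: one second query per record in the result set.
const perUserEvents = {};
toQuery.forEach(u => {
  perUserEvents[u._id] = events.filter(e => e.user_id === u._id);
});

console.log(toQuery.length);             // 2
console.log(perUserEvents[3].length);    // 2
```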
[19:27:03] <calvinow> kali: but I'm concerned that may be sub-optimal, and I'd be better off storing these relationships as separate documents (the relationship consists of two integer counters). Thoughts?
[19:28:10] <kali> calvinow: the most performant way is to paginate the relationships
[19:28:30] <kali> calvinow: store them 1000 per document
[19:28:40] <kali> but it is also the most brain fucking one
[19:30:08] <calvinow> kali: My problem with that is that I use aggregate() to calculate a score based a subset (or all) of the relationships, which is complicated by not having them stored in the same document
[19:34:03] <calvinow> I mean, complicated from a computational standpoint, not design... I need to get N nodes with the highest 'score', and if I separate the relationships that seems not to be do-able using aggregation...
[19:35:46] <jaimef> so if you are dealing with a lot of slack/sparse data files, and want to compact a db, is it best to just nuke the data directory on a replica, then let it sync up, then promote to primary, and lather, rinse, repeat with the rest of the mongo servers?
[19:35:48] <kali> calvinow: if you have... say 1000 relations per page, you'll do 1000 times fewer random accesses than with a separate relationship collection
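A sketch of the pagination ("bucket") scheme kali is describing: instead of one tiny document per relationship, pack up to 1000 relationships into each page document. Field names and the page-id format are illustrative:

```javascript
// Bucket pattern sketch: pack up to PAGE_SIZE relationships per document.
const PAGE_SIZE = 1000;

// Distribute one node's relationships into page documents.
function paginate(nodeId, relationships) {
  const pages = [];
  for (let i = 0; i < relationships.length; i += PAGE_SIZE) {
    pages.push({
      _id: `${nodeId}:page:${pages.length}`, // illustrative page id
      node_id: nodeId,
      rels: relationships.slice(i, i + PAGE_SIZE),
    });
  }
  return pages;
}

// 2500 fake relationships (two integer counters each, per calvinow)
// become 3 page documents instead of 2500 tiny ones, so reading them
// all is 3 fetches instead of 2500 random reads.
const rels = Array.from({ length: 2500 }, (_, i) => ({ to: i, a: 0, b: 0 }));
const pages = paginate("node-1", rels);
console.log(pages.length);         // 3
console.log(pages[0].rels.length); // 1000
console.log(pages[2].rels.length); // 500
```

This is the "brain fucking" part kali admits to: updates must find the right page, and aggregation over relationships now spans several documents per node, which is exactly calvinow's objection below.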
[19:36:20] <kali> jaimef: i do resync in that kind of situation
[19:37:10] <jaimef> kali: ok. just getting back into it after over a year away, and I remember doing that previously
[19:38:18] <calvinow> kali: Yes. I guess my ultimate question is why not store them all in the same document? In this use case, that seems as though it would be faster to me...
[19:38:47] <calvinow> The score calculations are lightning fast right now, and I want them to stay that way.
[19:39:31] <kali> calvinow: the documents have a 16M limit
[19:40:04] <kali> calvinow: do you denormalize the score ?
[19:40:22] <jaimef> kali: you resync through mongo with empty dir? or you copy it over from other rs?
[19:40:35] <calvinow> kali: Yes, but that can be changed at compile time, and once this project I'm on gets off the ground I'd likely be maintaining my own source tree for the servers, so that's not much of a concern
[19:40:48] <kali> jaimef: i stop the instance, rm -rf in the dir, and start again
[19:41:26] <calvinow> kali: The score has to be calculated - it's based on a currently selected subset of nodes. Storing it would be n! complexity...
[19:42:53] <jaimef> kali: ok, thought I remembered it working that way. thanks
[19:43:18] <calvinow> kali: I mean, I'm just playing devils advocate here... I think for the purposes of that calculation, everything being in the same doc is a big win.
[19:43:36] <calvinow> I'm just not sure if that benefit is outweighed by the overhead of potentially large documents
[19:44:15] <kali> calvinow: well... i don't know the size of your dataset. if you're sure you're not likely to hit the 16M limit, then why not
[19:44:42] <kali> calvinow: also... updating the relations could be a pain if the documents are huge
[19:45:38] <calvinow> kali: I'm talking terabytes and terabytes here... a massive amount of data. The documents _could_ be larger than 16M, but it would be rare... I'd probably change that to 64M or suchlike just to be safe.
[19:46:20] <calvinow> I can fold all the updates for each document into a single update() query without a problem... so I think that works out okay
[19:46:21] <kali> calvinow: i'd rather plan for 10k or 100k pages, honestly
[19:46:58] <kali> calvinow: i assume most docs will fit in the first page, and it avoids having to patch
[19:50:29] <calvinow> kali: hmm, ok. Thanks a bunch for the insight.
[20:08:32] <nyxtom> Does anyone know how to solve this problem I'm having with large indexes and having nscanned be much larger than n? https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/P4Tb9bcx5OU
[21:06:42] <strnadj> ohh, you want the object_id of the objects which were changed by your update command?
[21:07:28] <macpablo> i know the id, but i want to know if it was an insert or an update
[21:08:40] <macpablo> http://php.net/manual/en/mongocollection.insert.php i want to get the upserted value of that array
[21:08:50] <macpablo> but i don't know where that is
[21:09:52] <strnadj> :( now, I am not sure if i can help you :/
[22:50:58] <dudebro> hey guys, cross-posting from #mongoengine since it isn't really specific to that driver - running into a problem with linking documents. say we have three types of objects that are related, for example customers, jobs and employees. jobs are associated with a customer and an employee so in this case it seems fine that a job contains a reference to each of those entities. for customers and employees however, the jobs associated list could/should grow
[22:51:23] <dudebro> it seems like the latter is the "mongo way", but there doesn't seem to be any reason to not have a join table as well apart from the atomicity of writes
[22:56:23] <jesta> dudebro: so whats the problem exactly? :)
[22:57:23] <dudebro> jesta: just trying to figure out if there's a common model for this type of interaction. i suppose even with references stored in the user/employee objects there's a lot i can store before running into document size limits
[22:58:04] <jesta> Lemme write out real quick how I'd envision it
[23:01:38] <jesta> dudebro: I would just make it so that the Jobs reference the Company and Employees involved, and don't store any refs on Employees and Customers - https://gist.github.com/4534362
[23:02:34] <jesta> Unless the Employees are employees of the customers, then you might want to ref them together
[23:02:40] <jesta> but you don't need to ref everything to everything
[23:02:47] <dudebro> jesta: so in that scenario, say i want to look up all jobs for a customer i just query the jobs collection itself to match on the customer ref?
[23:03:38] <jesta> you can use references to query
[23:04:12] <jesta> that would find all jobs where the customer reference is set to that customer
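A sketch of the lookup jesta describes for dudebro's layout (the gist at https://gist.github.com/4534362): jobs hold refs to a customer and some employees, customers/employees store no back-refs, and "all jobs for X" is a single query on the jobs collection. Names and data are illustrative:

```javascript
// Jobs reference a customer and the employees involved; nothing else
// stores refs. Names are illustrative guesses at jesta's gist.
const jobs = [
  { _id: "job-1", customer_id: "cust-A", employee_ids: ["emp-1", "emp-2"] },
  { _id: "job-2", customer_id: "cust-B", employee_ids: ["emp-2"] },
  { _id: "job-3", customer_id: "cust-A", employee_ids: ["emp-3"] },
];

// "All jobs for a customer" = one query matching the stored reference --
// in the shell, roughly db.jobs.find({ customer_id: "cust-A" }).
const forCustomerA = jobs.filter(j => j.customer_id === "cust-A");

// "All jobs for an employee" works the same way against the array field,
// like db.jobs.find({ employee_ids: "emp-2" }).
const forEmp2 = jobs.filter(j => j.employee_ids.includes("emp-2"));

console.log(forCustomerA.length); // 2
console.log(forEmp2.length);      // 2
```

The write-side benefit dudebro notes below falls out of this shape: creating a job touches exactly one document, while the customer and employee documents never grow.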
[23:07:33] <dudebro> thanks jesta, i can see that would be better since writes would only have to happen in one place, i guess i'll just wait until i'm super huge to worry about the relative performance. (this seems better than having a single user or employee grow huge though, so it's probably better all around)
[23:09:41] <jesta> dudebro: np np, and honestly the jobs shouldn't get that huge just due to the fact that they always only reference a company and maybe a few employees. Then ya just create a new job, which gets new references :) Much better than storing it on the employee or the customer, cause yeah, they'd grow to be huge.
[23:22:16] <solars> hey, is it possible to replace the 2.2 binaries with the dev version 2.3 currently for testing?