[00:31:45] <Petrochus> updating a document in-place, like via .update({_id: 1}, {$push: {"array": val}}) should be better/faster than doing a findOne and later save, right?
[00:34:26] <skot> yes, reduces network round-trips and data transmitted.
[00:36:40] <Petrochus> skot: well, let's assume the database will always be located on localhost
[00:38:42] <skot> Internally it has to do the find and then update the document in memory, and write it out. So in practice, it is really about the same.
[00:39:45] <skot> As for which is better, let me ask you this: what happens if two parties try to find+save at the same time with different data?
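A minimal sketch of the race skot is hinting at, in mongo shell syntax (collection and variable names are hypothetical):

    // atomic in-place update: two concurrent $push operations both survive,
    // because the server applies each modifier against the current document
    db.coll.update({_id: 1}, {$push: {array: val}});

    // find + save: a classic read-modify-write race; whoever saves last
    // overwrites the other party's change
    var doc = db.coll.findOne({_id: 1});
    doc.array.push(val);
    db.coll.save(doc);  // clobbers any write that landed in between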
[00:44:14] <epsas> Hello - is this a place to ask motor questions too?
[02:06:54] <minerale> skot: I have nested classes, and for some reason Morphia is trying to serialize the parent class. (Because the parent class is referenced under the default this$0 ) Is there a way to stop that?
[02:13:09] <cheeser> minerale: declare the inner class as static
[05:22:56] <dotpot`> question: how do I increase, let's say, "calculation_weight" in an aggregation when a document property doesn't exist, or is None, or empty?
[06:33:45] <dotpot> question: via aggregation, how do I increase, let's say, "weight" when a document property, let's say "address", doesn't exist, or is None, or empty?
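A sketch of one way to do this with $project plus $cond/$ifNull ("weight" and "address" are from the question; the collection name and the boost amount are hypothetical):

    db.coll.aggregate([
      {$project: {
        weight: {$cond: [
          // true when "address" is missing, null, or the empty string
          {$or: [
            {$eq: [{$ifNull: ["$address", null]}, null]},
            {$eq: ["$address", ""]}
          ]},
          {$add: ["$weight", 1]},   // boosted weight
          "$weight"                 // unchanged otherwise
        ]}
      }}
    ]);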
[06:37:27] <LoneSoldier728> http://pastebin.com/ipFL1UqB hey anyone know how to avoid 4 queries
[06:37:44] <LoneSoldier728> since I cannot have pull and addToSet in the same query
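For what it's worth, $pull and $addToSet can share one update as long as they target different fields; only modifiers on the same field path conflict. A sketch (all names hypothetical):

    // different fields: one round-trip is fine
    db.users.update(
      {_id: me},
      {$pull: {pendingRequests: friendId}, $addToSet: {friends: friendId}}
    );

    // same field: the server rejects the conflicting modifiers,
    // so it has to be two sequential updates
    db.users.update({_id: me}, {$pull: {list: oldVal}});
    db.users.update({_id: me}, {$addToSet: {list: newVal}});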
[07:26:18] <weblife> I just finished a three part tutorial on how to get going with Node.js / MongoDB on Ubuntu and then deploy it to cloud services. Could I get any interested parties to review it for fixes / ideas on how to improve it? https://github.com/TheMindCompany/mongonode-app/blob/master/tutorial.pdf
[12:09:40] <Industrial> How much of my database will mongodb keep in memory? I want to have 13 databases / collections and be able to efficiently grab time slices
[12:10:31] <Industrial> I guess the most recent history will always be most hit
[14:12:32] <remonvv> Are you asking which is more appropriate when?
[14:13:11] <DestinyAwaits> well yes, that's one, and the other is: when does it make sense to create a subdocument or a list?
[14:13:32] <remonvv> If so, that's a very hard question to answer. They're two rather different things. It's a similar problem to when you use a List or Set rather than a Map in code.
[14:14:46] <remonvv> When to embed what is an equally difficult question to answer in a few sentences.
[14:15:18] <remonvv> What language are you used to?
[14:15:30] <DestinyAwaits> not a problem I have time if you don't mind explaining.. :)
[14:15:54] <remonvv> I have to get into a conference call in a bit.
[14:16:38] <DestinyAwaits> well, for the one thing you mentioned above, when to use Lists, Sets, and Maps, there are certain criteria for that: Big-O complexity, key-value pairs, etc.
[14:17:26] <DestinyAwaits> I hope someone else might help when they get active.. :)
[14:17:40] <remonvv> Well I have a few more minutes. Maybe some more specific questions might help ;)
[14:18:05] <cheeser> perhaps you want a list of subdocuments.
[14:18:15] <cheeser> the question, without context, is next to meaningless.
[14:18:20] <DestinyAwaits> hehe.. I am starting with the language so I don't know much.. :)
[14:18:24] <remonvv> Are you asking what the practical difference is between [{key: <key>, value: <value>}, .., {key: <key>, value: <value>}] versus {<key>: <value>, .., <key>: <value>}?
[14:18:36] <cheeser> you're asking how to model your data when we know nothing about your data or how it will be used.
[14:19:27] <remonvv> Read up on MongoDB (and on Java if needed), especially best practice blogs/articles.
[14:19:44] <DestinyAwaits> cheeser: I might be wrong here, but I think it's a general question for me as a starter.. on what basis must one choose which data structure to use
[14:20:38] <remonvv> Get a good grasp of the problem you're trying to solve, determine what operations are common and which are rare, find persistence solutions that do what is common well, and then try it.
[14:20:47] <cheeser> a list is used to store scalar elements of like kind (usually)
[14:21:06] <cheeser> a subdocument groups related elements together in a single unit.
[14:21:47] <DestinyAwaits> cheeser: this answers part of my question
[14:22:18] <remonvv> Think of an embedded document as the MongoDB equivalent of a class field that refers to another instance rather than a scalar/primitive. e.g. class Foo { Bar bar; } would be a document in db.foos like {bar: {..}} in JSON/BSON/MongoDB
[14:22:48] <DestinyAwaits> cheeser: Something related to performance, when to prefer one over the other, or something like that.. I hope I explained the question.. :)
[14:23:06] <remonvv> Small footnote that arrays can contain objects as well as scalar elements, and usually do.
[14:23:42] <cheeser> right. arrays can contain subdocuments.
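To make the distinction concrete, a hypothetical document showing all three shapes:

    {
      tags: ["red", "blue"],                   // list of scalars of like kind
      address: {street: "1 Main", city: "NY"}, // subdocument: related fields as one unit
      comments: [                              // array of subdocuments
        {author: "a", text: "first"},
        {author: "b", text: "second"}
      ]
    }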
[14:25:18] <DestinyAwaits> Now the second part of my question: when to use which? Time complexities and all?
[14:25:33] <Nodex> depends on your data and end goal :)
[14:25:34] <remonvv> DestinyAwaits: As for performance, hard to say. Arrays tend to grow as the document ages in most use cases, which means MongoDB will have to move the document around every so often (or very often, depending). Embedded documents by their nature are fairly static in size.
[14:26:02] <remonvv> Another thing is that arrays are a bit tricky to index. Each array element is indexed separately, so for large arrays that may mean you can't really index them practically.
[14:26:12] <remonvv> Fields in subdocuments can be indexed relatively easily
[14:27:27] <remonvv> DestinyAwaits: I think the conclusion you have to reach here is that these sort of questions are very context sensitive. Try to make something, see what problems you run into and try and fix them.
[14:27:34] <DestinyAwaits> remonvv: so if we talk of normal complexities, documents are much easier to work with and maintain.. right?
[14:28:03] <remonvv> DestinyAwaits: It's as easy as you make your schema. And there are some pitfalls. Nested arrays are problematic for example.
[14:30:31] <remonvv> For embedding itself a very rough guideline could be: if the relationship with the sub-entity is 1:1, embed a document; if it's 1:N where N is small, use an array of objects; if it's 1:N with a big N, use a separate collection. There are a ton of exceptions to that, but it's the rough first guideline.
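The guideline as hypothetical schemas:

    // 1:1 -> embed a document
    {_id: 1, profile: {bio: "hi", born: 1985}}

    // 1:N with small N -> array of objects
    {_id: 1, comments: [{by: "a", text: "ok"}, {by: "b", text: "+1"}]}

    // 1:N with big N -> separate collection, each child references the parent
    // (in a hypothetical db.events collection)
    {_id: 501, ownerId: 1, payload: "clicked"}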
[15:14:26] <pwelch> hey everyone. if I have 3 mongod instances running on localhost that are part of a replica set, does that mean I can't have another host join that replica set? I got a weird error saying members can only be localhost
[15:15:16] <teeceepee> what's the difference between running as slave only, and slave with a source (the master's IP)?
[15:20:09] <Infin1ty> When recovering a new replica set member using another member's data files, do I need to do anything special, or just start the mongod instance as normal (same parameters as before)?
[15:20:22] <Infin1ty> no special switch parameter?
[15:46:36] <jordiOS> Hello, I wonder if I can use the utilities mongodump and mongoimport on their own (I mean, can I put the files on the Apache server, or do they have to live on the MongoDB server), thanks
[15:47:11] <LoneSoldier728> does anyone here know how to query and update
[15:47:21] <LoneSoldier728> part of an object that is nested in an array
[15:47:48] <teeceepee> I just got bitten by mongodb's 32-bit limit
[15:49:05] <remonvv> teeceepee: I can't decipher your question. Master/slave replication is deprecated. But if you mean primary/secondary then follow the appropriate manuals for setup
[15:49:21] <remonvv> LoneSoldier728: Post your object in a pastie along with what you need to do
[15:49:29] <teeceepee> remonvv point me to manual pls
[15:51:41] <remonvv> LoneSoldier728: Sorry, no energy to decipher code ;)
[15:51:58] <teeceepee> remonvv thanks, trying to prevent writes on a secondary… alright?
[15:52:14] <weblife> I finished a three part tutorial on how to get going with Node.js / MongoDB on Ubuntu and then deploy it to cloud services. Could I get any interested parties to review it for fixes / ideas on how to improve it? https://github.com/TheMindCompany/mongonode-app/blob/master/tutorial.pdf
[16:36:28] <remonvv> Replace with your exact paste
[16:51:12] <NaN> if I have a collection like this > http://pastie.org/private/sjbcr6yjqrfiaskxudhha < should I index the keys from the doc with more keys?
[16:52:57] <remonvv> I assume that's not an actual example. How selective is "a"?
[16:53:27] <remonvv> Meaning, for a typical value of "a" what percentage of documents would match?
[16:54:13] <remonvv> You can index on {a:1}, {a:1, c:1} or {a:1, c:1, e:1} but I suspect the first is the most useful.
[16:54:26] <NaN> it's just an example, not real data, let's suppose "a" is date
[16:54:38] <remonvv> How many documents per unique date?
[16:55:04] <remonvv> If you give a value for that date and you're already left with a handful of documents at most, there's usually no value in adding other fields to the index
[16:55:05] <NaN> remonvv: the thing is that I'm using the "no schema" support, so my documents are quite variable
[16:55:12] <NaN> no two documents have the same date
[16:56:59] <remonvv> You can't index everything, so you need to make decisions.
[16:57:25] <remonvv> In this case, if "a" is always a date, is always part of the query, and there's at most one document per date, you only need an index on "a"
[16:57:54] <NaN> what happens if I don't index?
[16:58:04] <remonvv> every query becomes a table scan
[16:58:20] <remonvv> That means it becomes somewhere between slower and unacceptably slow depending on the size of your data set.
[16:59:07] <remonvv> MongoDB would have to walk through the entire collection (assuming you didn't specify a limit) for every query.
[16:59:33] <remonvv> In all but a very few exceptions that isn't an option for production level software.
[17:00:05] <NaN> ok, let's say I index the date key because most of my queries will be about the date, but... what if my queries include more keys with $exists? Will the date index be enough?
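A sketch of how that plays out (2.4-era shell syntax; names hypothetical): the index narrows the candidates by date, and the remaining predicates are checked per document, which stays cheap when dates are near-unique.

    db.coll.ensureIndex({date: 1});

    // uses the {date: 1} index to find the few matching docs,
    // then filters those in memory on the $exists predicate
    db.coll.find({date: someDate, b: {$exists: true}});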
[17:30:27] <rud> i'm running mongod 2.4.3 for a backend app that only opens a few connections to mongod, but makes a lot of requests to it. problem is after a while, mongod stops serving these connections, and all new connection attempts (even via --shell) are failing. i cannot figure out if the issue is operating system related (it probably is, my mongod runs in a jailed freebsd-9.1-stable), or a 2.4.3 issue (i haven't seen anything related to this in the change log of 2.4.4, but
[17:30:27] <rud> could be wrong), or something … totally different ..
[17:30:43] <rud> i've gathered info on this pastebin, in case one of you good souls feels like helping :) http://pastebin.com/4dAt1VTu
[17:46:13] <fogus> hello. I have an existing 1.8 install and I want to set up master/slave replication to a new host. should I run 1.8 on the new host or something newer?
[17:47:04] <remonvv> Anything lower than 2.2 is end-of-life
[17:47:10] <remonvv> So is master/slave replication
[17:48:12] <remonvv> You'll want to upgrade to 2.4.x and switch to replica sets
[18:36:04] <lucasliendo> Got a simple question... I'm running a sharded cluster (I think it's configured correctly). I'm trying to restore a DB from scratch and wanted to test the hashed index, so after db.runCommand({enableSharding: "myNewDB"}) I run: sh.shardCollection("myNewDB.myNewCollection", {field : "hashed"})
[18:36:36] <lucasliendo> then I get : { "ok" : 0, "errmsg" : "shard keys must all be ascending" }
[18:37:00] <lucasliendo> which I think does not make any sense because the collection is empty...
[18:37:54] <lucasliendo> However if I run a getIndexes on the collection I see that the index was created and the type is hashed indeed
[18:51:22] <lucasliendo> you're right again, the message is not very descriptive...
[18:51:31] <lucasliendo> thanks a lot for your time !
[18:52:13] <remonvv> Well, it's checking whether the key value is positive (e.g. {key: 1}), so if that test fails it gives that error (assuming someone entered key: -1 or something).
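For reference: hashed shard keys only exist as of MongoDB 2.4, so presumably a pre-2.4 mongos, which only understands ascending keys, would reject the command with exactly this error. On a 2.4+ cluster the sequence from the question should work as written (names taken from the question):

    sh.enableSharding("myNewDB");
    sh.shardCollection("myNewDB.myNewCollection", {field: "hashed"});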
[19:14:03] <bjorn248> hey, I stopped by here the other day with a question, specifically: is it possible to use mongos as a dispatcher for a non-sharded replica set? I want to point my application at the mongos and not the primary of my replica set so in case of failover, mongos will handle the routing automatically. However, I only have a replica set currently (one primary, two secondaries). I don't have any configdbs for mongos to connect to. Last time I was here a fe
[19:16:40] <remonvv> bjorn248: If you can't have config servers you're better off with connecting directly. An appropriate read preference will deal with failover for you.
[19:17:32] <remonvv> bjorn248: Or more accurately, the replica set will handle failover. Certain read preferences will make that minimally noticeable on the client end of things.
[19:18:01] <bjorn248> remonvv: let's say I could have config servers, can I have config servers without sharding?
[19:18:46] <remonvv> You can have config servers and a single shard and not have sharded collections
[19:19:48] <remonvv> You set up a replica set as the only shard, and don't shard any collections.
[19:20:14] <remonvv> The nice thing about that setup is that if at any point you do decide to add shards and shard collections you can without downtime.
[19:20:50] <bjorn248> I see, so basically one shard that itself is a replica set
[19:21:08] <remonvv> Right. Most shards are in practice.
[19:21:19] <remonvv> And config servers can be small machines so that shouldn't add too much cost.
[19:21:30] <remonvv> Have 1 or 3, the latter being strongly preferred.
[19:21:31] <bjorn248> yeah I have machines to spare, so that's not a problem
[19:23:16] <bjorn248> yeah I mean that's what I have now...3 mongod, 1 replica set
[19:23:40] <remonvv> Yeah, so add 3 config servers, add mongos processes to your app servers and off you go.
[19:23:49] <remonvv> Setup is a bit different but well documented.
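A sketch of the single-shard setup just described (host and set names hypothetical): start the config servers and a mongos pointing at them, then register the existing replica set as the only shard.

    // run against the mongos; the whole replica set becomes one shard
    sh.addShard("rs0/host1:27017,host2:27017,host3:27017");

    // note: no sh.enableSharding()/sh.shardCollection() calls follow,
    // so nothing is actually sharded; mongos just fronts the replica set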
[19:24:32] <bjorn248> and so if I have 1 shard, nothing is actually sharded as long as I don't shard the collections? the shard is basically just a layer around the replica set that allows mongos to handle automatic failover?
[19:24:53] <remonvv> Think of it as sharding enabled.
[19:25:09] <remonvv> Sharding itself requires a few additional steps that are not relevant to you.
[19:25:22] <remonvv> But note that automatic failover happens without mongos as well.
[19:25:33] <remonvv> The difference there isn't huge.
[19:26:07] <remonvv> You'll still get errors (one per connection or many, depending on read pref) that you have to deal with during election phases.
[19:50:37] <ThePrimeMedian> Hi all.. I am looking at the mongodb docs, under http://docs.mongodb.org/manual/use-cases/storing-comments/ and what is this code? str_path = '.'.join('replies.%d' % part for part in path) ... am I supposed to loop through, or?
[19:51:30] <ThePrimeMedian> can someone change that line to regular javascript?
[19:52:56] <bjorn248> remonvv: well, automatic failover doesn't happen without mongos, because where do I point my application if I don't have a dispatcher between the database and the application? The application will try to connect to a primary that is down and just time out; how does it know about the new primary?
[19:53:43] <bjorn248> sure there is a new primary somewhere, but my application doesn't know about it
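This is the piece the drivers already solve: instead of pointing the application at one host, you hand it the whole replica set as a seed list, and after an election the driver re-discovers the new primary via any reachable member. A sketch (host and set names hypothetical):

    // a seed-list connection string: the driver monitors the set and,
    // after a failover, routes to whichever member is now primary
    var uri = "mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0";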
[19:58:20] <LoneSoldier728> if I want to set a second status in there do i do it like so? {$set: {'friendStatus.$.status': 3}, {second: 2}
[19:58:26] <LoneSoldier728> or right after 3 put a comma
[19:58:32] <LoneSoldier728> and add the field and value there
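Both fields belong inside the same $set document; a second object after the update document would be parsed as options, not as more fields to set. A sketch (selector and field names hypothetical):

    db.users.update(
      {_id: me, "friendStatus.friendId": friendId},  // match the array element
      {$set: {
        "friendStatus.$.status": 3,   // "$" refers to the matched element
        "friendStatus.$.second": 2
      }}
    );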
[20:08:01] <ThePrimeMedian> remonvv, i tried #javascript and i tried #python -- no answer in javascript and people in python are just rude and basically told me to f* off
[20:08:42] <ThePrimeMedian> all i want to know is how to convert this simple statement from python to regular javascript syntax, as I found it in mongodb's docs
[20:08:42] <ThePrimeMedian> str_path = '.'.join('replies.%d' % part for part in path)
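A direct JavaScript rendering of that Python one-liner (it's a generator expression, i.e. a loop, not a comment):

    // '.'.join('replies.%d' % part for part in path)
    var strPath = path.map(function (part) {
      return "replies." + part;
    }).join(".");
    // e.g. path = [2, 0, 1]  ->  "replies.2.replies.0.replies.1"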
[20:15:44] <cheeser> remonvv: there's the issue tracker...
[20:17:40] <remonvv> cheeser: Sounds comfy ;) I've been playing around with the idea to open source a sanitized version of our mongo stuff but I was curious if it might not be more interesting to contribute to morphia instead.
[20:17:51] <ThePrimeMedian> leifw: thanks.. I just didn't know if % part for part in path was a comment or a loop
[20:17:55] <remonvv> cheeser: Hence being curious about a roadmap, long term planning, etc.
[20:22:10] <LoneSoldier728> so i guess i should create a variable to store the answer, return it at the end of the loop, and render based on that
[20:26:43] <EmoSpice> So, I'm attempting to find some information on using ming (the python schema library) to retrieve only certain fields out of a document. if I do something like "File.m.find({'type': 'android'}, {'sha256': 1, '_id': 0})", I end up getting the full document filled out with dummy data. Is there a way to avoid this?
[20:27:01] <remonvv> leifw: Sorry, got distracted. I was curious if TokuMX follows the same consistency contract and so on compared to vanilla. And if not what are the differences. I read something about count() being different, any others?
[20:27:58] <leifw> remonvv: what do you mean by consistency, are you asking about replication and write concern?
[20:29:08] <remonvv> leifw: Well, there are some claims about very large speed improvements. Does that come at any cost? Atomicity of single writes, different yielding patterns for multi-writes, write->read consistency, and so on.
[20:29:38] <remonvv> I suppose the question is, performance at the cost of what?
[20:29:40] <leifw> the downsides of tokumx are: no 2d/2dsphere/geo indexes, no 2.4 support yet (but we have backported some things like hashed shard keys), and since count() needs to check MVCC information for each doc, it's slower, because the implementation is more like find().itcount() in that we have to check each document (but it's faster than vanilla's find().itcount() because our range queries are faster)
[20:30:00] <leifw> our atomicity and concurrency guarantees are stronger than in vanilla mongodb
[20:30:09] <leifw> we use MVCC snapshots and transactions under the hood
[20:30:18] <remonvv> ah so your indexing approach doesn't support the interval optimization?
[20:31:02] <leifw> so if you do db.foo.insert([{a:1}, {a:2}, {a:3}]), either they all go in or none of them go in. in vanilla, if it fails halfway through, then the ones in the front get committed and the ones at and after the failure don't make it in
[20:31:23] <leifw> yes, the interval optimization is invalid due to concurrent writes
[20:31:50] <leifw> for example, if you start a count() operation, then someone else does an insert before you're done, you don't want the count() to include that new document because it came after your operation started
[20:31:54] <remonvv> okay so behaviour is different (but arguably better)
[20:34:07] <remonvv> If you put on your objective hat, what are the downsides, other than trailing behind official mongodb releases?
[20:34:09] <leifw> write concern is slightly different ATM but we will probably fix it soon: currently, when getLastError returns and says it's on, say, 3 replicas, that means that the operation got copied into that many replicas' oplogs, but it doesn't mean that a subsequent read will necessarily see the result of the operation in the collections
[20:34:29] <leifw> i.e. there is a small race between when the data is safe on the secondary and when it is *queryable* on the secondary
[20:34:37] <remonvv> subsequent reads to secondaries?
[20:34:39] <leifw> we plan to fix this but we are considering UI options right now
[20:34:49] <remonvv> that's expected though, secondary reads are eventually consistent.
[20:35:10] <remonvv> as in, that's how vanilla behaves afaik
[20:35:13] <leifw> so if you do getLastError({w:'all'}) and then fire off a slaveOk read immediately when it returns, you *might* get old data when you wouldn't in vanilla
[20:35:27] <leifw> it's a tiny race though, and most applications are resilient to it anyway
[20:36:15] <leifw> when vanilla says "yep, that's on N secondaries" it means that you can immediately read that data from the collections on those secondaries
[20:36:16] <remonvv> That does seem like a bit of a detail.
[20:36:20] <leifw> for us it's just a guarantee that it's durable
[20:37:04] <remonvv> disk and mem usage similar/less/more?
[20:37:38] <leifw> <hat class="objective">if you need geo/2d/2dsphere/text indexes or the authorization stuff in 2.4, please come talk to us and we can see if it makes financial sense for us to backport those features.
[20:37:43] <leifw> otherwise, if I wanted the mongodb data model and I didn't work here, I would start with tokumx and not even consider mongodb (seriously, not just being a marketer here)
[20:38:12] <leifw> as far as future features, we plan to keep up with them, obviously no guarantees can be made, but we are pretty confident that we own the code now
[20:38:56] <leifw> disk usage is a lot smaller, because we compress aggressively (fractal trees have large blocks which is good for standard compressors like zlib), and we also don't fragment the way vanilla mongodb does, so you don't need compact() or reIndex()
[20:39:16] <leifw> compression is particularly good in the mongodb world (as compared to mysql), because of all those repeated field names
[20:39:37] <remonvv> true but it does eat cpu cycles.
[20:39:55] <leifw> <hat class="objective">also we haven't tested MMS and we're pretty sure MMS backup won't work</hat>
[22:38:17] <Ontological> I am having trouble $pull'ing the subdocument to which my query match belongs. Any suggestions? http://pastie.org/private/m5s6e2hwvdzzopuw1exmfa
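The usual shape for this, as a sketch against a hypothetical document {_id: 1, items: [{sku: "a"}, {sku: "b"}]}: $pull takes a query that is matched against each array element, and removes the elements that match.

    // removes every element of "items" whose sku is "a"
    db.coll.update({_id: 1}, {$pull: {items: {sku: "a"}}});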
[22:41:31] <minerale> skot: One more morphia question, how do I avoid having morphia insert a "className": "com.foo.bar" field when using toDBObject() ?
[23:37:17] <zachrab> how can I return a query with newest to oldest document sorting?
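A sketch, assuming either the default ObjectId _id (whose leading bytes encode a timestamp) or an explicit date field (field and collection names hypothetical):

    // newest first via the ObjectId's embedded timestamp
    db.coll.find().sort({_id: -1});

    // or, with an explicit creation-time field
    db.coll.find().sort({createdAt: -1});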