[00:37:01] <caitp> does anyone know if it's possible to share a mongodb js driver connection with mongoose? or would that be a bad idea even if it is possible
[00:38:24] <caitp> #mongoosejs is very inactive :(
[01:03:22] <kal1> does anyone know if objectid is stored outside the bson document or as a part of the bson document?
[01:17:48] <pasichnyk> hey guys, i'm brand new to mongo, and working on updating 30 or so values in a document at a time using the atomic $inc operator. My query to find the document to update is a point lookup (i.e. _id: "username-date") so that part is fast. The values i'm upserting/incrementing however are nested in a couple of arrays, so i can pre-aggregate this data at the day and hour levels. (i.e., value,
[01:17:48] <pasichnyk> days.value, days.{day}.hours.{hour}.value, country.{country}.value, days.{day}.country.{country}.value, days.{day}.hours.{hour}.country.{country}.value, etc) Do i need indexes on these searches that happen inside a document after it's been selected to keep things fast, or does the index only help to find the document in the first place (i.e., the query part of the operation)? Or is this a
[01:17:48] <pasichnyk> "it depends" type of question?
[01:18:55] <retran> you need to add an index for every field combination you query on
[01:19:24] <retran> you need indexing on the fields that have nested stuff in them
[01:19:31] <retran> if you want to avoid table scans
[01:19:51] <retran> (if you want to search on your fields with nested arrays, ever)
[01:21:13] <retran> personally, when i develop, i do a very blunt thing to force me to use indexes (but not create too many)
[01:21:27] <retran> i disable table scans in mongodb conf
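The option retran is describing is the real notablescan server parameter; it can be set in the config file, on the command line as --notablescan, or flipped at runtime. A minimal mongo-shell sketch:

    // flip the real `notablescan` server parameter so any query that would
    // require a full collection scan errors out instead of running
    db.adminCommand({setParameter: 1, notablescan: 1})
    // turn it back off
    db.adminCommand({setParameter: 1, notablescan: 0})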
[01:22:48] <pasichnyk> I won't be searching on the fields as part of the query, i'd be selecting the document strictly based on the _id, and then updating or returning part of the document based on the array values.
[01:23:13] <pasichnyk> so i guess... if i can get to the document without the index quickly, do i need it for intra-document stuff?
[01:23:21] <retran> so your client application will be doing the searches
[01:25:32] <retran> you have to have an index on all parts of the condition
[01:25:44] <pasichnyk> the query that goes into my update is like {_id: "username1-201307"}, but then the command that i actually run on that single document is a whole bunch of $inc operations
[01:26:26] <pasichnyk> (i.e., i store a single document per user per month, for this reporting purpose, so its easy to get to it with a point lookup based on the _id)
[01:27:26] <retran> $inc wont need more index, that's just a way of indicating what you want done to a field
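A minimal sketch of the operation pasichnyk describes, with hypothetical collection and field names: the _id index is only used by the query half; the $inc targets inside the matched document are resolved without any index.

    // point lookup on _id, then atomic increments at several
    // pre-aggregated levels (document layout is hypothetical)
    db.stats.update(
        {_id: "username1-201307"},
        {$inc: {
            "value": 1,
            "days.12.value": 1,
            "days.12.hours.07.value": 1,
            "country.US.value": 1,
            "days.12.country.US.value": 1
        }},
        {upsert: true}
    )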
[02:01:11] <johann> hey guys so im trying to set up my update query where it makes sure that a userId is not inside an object--i am aware of '$nin' and even got a pattern working when userid is the only entry in the object
[02:01:22] <johann> but it stops working when the object contains another entry beside userid
[02:02:28] <johann> how would i define a query that checks for the userid field of all entries? sorry if my language is bad im a noob still getting accustomed to the proper wordings
[02:06:52] <retran> johann, you mean like a field exists? $exists
[02:07:24] <johann> i want to do a check where a user can only vote once
[02:07:33] <johann> so a user can vote for one out of three things
[02:07:48] <johann> a vote has the userid and what they voted for
[02:08:14] <johann> so i wanted to do a check to only allow an update on the collection if the userid has not been in any of the voting objects
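One way to express johann's guard, sketched with hypothetical names: if votes is an array of {userId, item} subdocuments, a $ne on the dotted path matches only when no element carries that userId, so the $push happens at most once per user.

    // update succeeds only if no element of `votes` already has this userId
    db.polls.update(
        {_id: pollId, "votes.userId": {$ne: userId}},
        {$push: {votes: {userId: userId, item: "itemA"}}}
    )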
[02:08:50] <SrPx> Hello! I've asked this yesterday but nobody replied. If my app crashes between 2 updates, is my db left in an invalid state? Can I avoid this somehow?
[02:08:51] <retran> are you looking for field value or field existing
[02:09:26] <retran> SrPx, we wont know if your db is in an invalid state
[02:09:54] <SrPx> retran: suppose the 2 updates MUST happen together or else an invalid state will happen
[02:09:56] <retran> it absolutely depends on your data
[02:10:06] <retran> SrPx, are you familiar with transactions?
[02:10:19] <SrPx> retran: not a lot, that's where my doubt came from
[02:10:34] <retran> mongodb wont magically know if you consider your data invalid or not
[02:10:44] <retran> you'd have to have your app use transactions
[02:11:25] <SrPx> retran: but how can I tell my app to call 2 "sets" on the api together without a chance of a crash between them?
[02:11:55] <retran> basically, it means, you do a "start transaction" command, then do your updates, inserts, etc, then do a "commit transaction" command at the end
[02:12:10] <retran> that's the concept of transactions in its simplest form
[02:12:52] <retran> SrPx, what does "on the api together" mean
[02:13:21] <SrPx> retran: but mongodb supports that?
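At the time, MongoDB itself had no multi-document transactions; only single-document operations are atomic, so a "start/commit transaction" flow has to be built in the application. The documented workaround is a two-phase commit pattern; a compressed sketch with hypothetical account documents:

    // record the intent, so a crash between the two updates is recoverable
    var txId = ObjectId();
    db.transactions.insert({_id: txId, state: "initial", src: "A", dst: "B", amount: 10});
    db.transactions.update({_id: txId, state: "initial"}, {$set: {state: "pending"}});
    // apply both updates, tagging each doc so each step is idempotent
    db.accounts.update({_id: "A", pendingTx: {$ne: txId}},
                       {$inc: {balance: -10}, $push: {pendingTx: txId}});
    db.accounts.update({_id: "B", pendingTx: {$ne: txId}},
                       {$inc: {balance: 10}, $push: {pendingTx: txId}});
    // commit and clean up; a recovery job can resume any tx from its `state`
    db.transactions.update({_id: txId, state: "pending"}, {$set: {state: "committed"}});
    db.accounts.update({_id: "A"}, {$pull: {pendingTx: txId}});
    db.accounts.update({_id: "B"}, {$pull: {pendingTx: txId}});
    db.transactions.update({_id: txId, state: "committed"}, {$set: {state: "done"}});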
[03:24:29] <caitp> so, if I have a subdocument containing only an ObjectId, is there any way for me to use that in a query to refer to the document that the ObjectId actually points to?
[03:25:12] <caitp> actually, luckily in this case these subdocuments aren't subdocuments and are actually just objectids
[03:25:27] <caitp> but I still need to use the objects that they point to in queries
[03:35:03] <caitp> would I need to use mapReduce to solve this?
[03:43:43] <caitp> I need to filter a list of objects based on the contents of a linked document, to do that I need to query all of the linked documents first?
[03:47:13] <retran> (ignore that it says "for ruby on rails" for the moment)
[04:00:53] <retran> i wouldn't sweat too much you have to manually link
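Without joins, caitp's filter is usually done client-side in two steps, sketched here with hypothetical names: query the linked collection for the matching _ids first, then feed them into an $in on the referencing collection.

    // step 1: ids of linked documents that pass the filter
    var ids = db.linked.find({status: "active"}, {_id: 1})
                 .map(function (d) { return d._id; });
    // step 2: filter the referencing documents by those ids
    db.things.find({linkId: {$in: ids}})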
[04:01:07] <johann> so im trying to implement a logic where a user gets one vote for one item in a set of three
[04:01:33] <johann> and im trying to do a query that would make sure the collection doesnt get updated with a user voting for the same item twice or any of the other two items
[04:02:12] <retran> you have to control for it by updating transaction status
[04:02:22] <retran> did you read the link i gave earlier, johann
[04:02:47] <johann> i had zero idea that was directed at me, i apologize. i got lost in my own dream of solving it through intuition and kept on pounding the rock
[04:03:10] <johann> lemme scroll up and absorb some insight
[09:11:59] <Avish> I'm getting some kind of memory leak using MongoDB from Java. The thread dump shows multiple "MongoCleaner" threads in "com.mongodb.Mongo$CursorCleanerThread.run" -- is this normal? Am I forgetting to close something?
[12:23:52] <remonvv> then add 2 more with the proper hostnames
[12:23:53] <jpfarias> I trying googling but couldn't find any docs on it
[12:24:24] <jpfarias> remonvv: it won't let me add shards with hostnames different than localhost
[12:24:31] <remonvv> If that doesn't work (it might reject adding non-loopback hosts when there are already localhost machines in) you'll have to do one of two things.
[12:24:37] <jpfarias> on the shard that is already running
[12:26:56] <remonvv> The fast way is to take everything down but the config servers and 1 mongos, connect to that mongos, go to the config database and update the db.shards collection with the appropriate hostnames, then remove() the contents of db.lockpings and db.locks
[12:28:24] <remonvv> The proper and safe way is to make a database dump, recreate your cluster with proper hostnames and restore it.
[12:28:41] <remonvv> If it's anything close to a production environment do the latter.
[12:28:57] <jpfarias> ok, that's what I figured....
[12:29:30] <jpfarias> if I do the first will it damage the data if something goes wrong?
[12:30:10] <remonvv> No. You're not touching the data. As long as you stay away from db.chunks and don't rename your shards you should be fine.
[12:30:26] <remonvv> If you make a backup you can try the former and do the latter if it does something wonky
[12:46:10] <jpfarias> why do you think backing up the data will work instead of a dump?
[12:46:33] <jpfarias> backup the config data too I guess, right?
[12:47:04] <jpfarias> if it fails I can stop all the servers and copy all the data / config back and do a dump
[12:49:35] <remonvv> Right, you backup the data minus config, start the mongod process with that data, then execute addShard so it'll redo cluster metadata, and so on
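A sketch of the "fast way" remonvv outlines, with hypothetical shard ids and hosts; everything except the config servers and one mongos is down, and db.chunks is not touched:

    // connect to the lone mongos and edit cluster metadata directly
    var cfg = db.getSiblingDB("config");
    cfg.shards.find();  // note each shard's _id; do NOT rename it
    cfg.shards.update({_id: "shard0000"}, {$set: {host: "db1.example.net:27017"}});
    cfg.lockpings.remove();
    cfg.locks.remove();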
[12:55:36] <remonvv> Does anyone know what the reasoning is behind changing the removeShard command to not allow more than one shard to be removed at a time?
[12:56:44] <jpfarias> maybe no one ever thought of removing more than one shard at a time...
[13:54:50] <tadasZ> hi guys, I have a little problem, it's not a mongodb problem, but maybe some of you use php and get a "couldn't find file size" exception when trying to get a file's body with getBytes()? sorry if it's the wrong place to ask such a question
[14:19:15] <remonvv> I agree. And how does amount of data tie in to that trend you think?
[14:19:15] <jmar777> So, on principle, I'm just going to say that "Big Data Analytics" doesn't work, so I'm going to go with "Large Scale Data Processing" unless something better comes up
[14:19:17] <jmar777> remonvv: general purpose databases are like sporks... they're reasonably suitable for soup, salad, and sometimes they even have little blades on the edge for spreading butter...
[14:19:19] <jmar777> remonvv: but if you need to slice through a nice 32 oz sirloin, you really want a steak knife
[14:19:40] <jmar777> remonvv: it's a specialized tool that's incredibly effective at cutting through steak, but the cost is you can't really eat your soup and salad with it
[14:20:14] <jmar777> remonvv: in general I think that describes the NoSQL movement... it's a shift from general purpose databases that are sortta ok at everything, but not great at any one thing
[14:20:19] <tadasZ> Nodex: what do you mean whats from mongo? I think my mongo-php-driver throws this exception, only description i find is "14 couldn't find file size The length property is missing"
[14:21:18] <jmar777> Nodex: Peta scale does sound nice
[14:21:42] <remonvv> jmar777, I'm familiar with the trend ;) I just don't see the direct relationship with big data.
[14:22:04] <remonvv> I don't think the major driving force behind people going for domain specific solution is big data.
[14:22:50] <Nodex> tadasZ : that might be your framework spitting that out - this is what I am trying to determine
[14:22:51] <jmar777> remonvv: i agree with that in general, but in the case of scalability issues, specialization at the database level is a driving factor
[14:24:07] <jmar777> remonvv: take cassandra... the first use cases were around needing incredibly high volume transactions. cassandra is specialized for that, but in return you end up with crappy range operations
[14:24:57] <jmar777> remonvv: or mongodb. awesome for (un|semi)-structured data, but you give up joins.
[14:25:47] <jmar777> remonvv: or hadoop... high volume, ad hoc queries... but miserable response times.
[14:26:07] <Nodex> each tool is specific to certain jobs, none claim to be all-round replacements
[14:26:44] <jmar777> Nodex: exactly (hence the spork vs. steak knife analogy)
[14:28:54] <tadasZ> Nodex: thank you, I'll try to debug it further, maybe you're right
[14:29:20] <Nodex> I've not seen the PHP driver spit that out before
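For context, the exception tadasZ quotes is about GridFS metadata: every file document in fs.files is expected to carry a length field. A quick shell check (filename hypothetical):

    // a file document missing `length` would explain "couldn't find file size"
    db.fs.files.findOne({filename: "report.pdf"})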
[14:29:41] <remonvv> jmar777, Sure but that's still not necessarily big data related. It's more CAP theorem, cost, durability, being able to deal with nodes with low MTBFs and so on. That's what's driving people to move to solutions that fit their exact needs. Sure, if big data is a need it will affect storage solution choices.
[14:31:16] <jmar777> remonvv: still agree with you in general, there. the context of my talk though is high volume data, and in that context it is a driving factor. admittedly that doesn't make it universal though
[14:31:43] <remonvv> jmar777, ah okay. Wasn't aware the talk itself was mostly data volume oriented ;)
[14:32:10] <jmar777> remonvv: i suck at explaining myself. good thing i'm having a trial run here :)
[14:36:22] <jmar777> i think my reaction is partly based on how I know I respond when other people throw it around... but i'm just cynical like that sometimes
[14:38:33] <remonvv> Well I don't know. Most people should be able to determine what is meant in a specific context when someone says "Big Data". I'll agree it's overused and misused though.
[14:46:29] <statu> hi all! I have a User schema and it has a list of friends (_id of other users). I am developing the getFriends(arrayOfFriends) method and, in order to do so, I am going through each element of the array in the schema and retrieving the complete object from MongoDB. When I have all the users, I put them in an array and return it. Is there an easier way to do that?
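The loop statu describes can collapse into a single $in query; a minimal sketch assuming a users collection and a friends array of _ids (myId is hypothetical):

    // load the user, then fetch every friend document in one round trip
    var me = db.users.findOne({_id: myId});
    db.users.find({_id: {$in: me.friends}}).toArray()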
[15:11:09] <Nodex> pass - personally, in my opinion it encourages BAD schema design
[15:11:36] <Nodex> + bad performance design - it doesn't make people think enough about their data and getting the best out of it
[15:32:21] <caitp> people suck at data modelling, nothing new about that
[15:32:38] <caitp> I don't think automatic "joins" are really making it any worse
[15:37:37] <caitp> and then you pair that with the fact that relational database theory is like half a century old or more, and nosql/document-oriented is much more recent and less widely understood
[15:39:43] <caitp> they teach you to normalize that data and use joins, and people apply that to NoSQL too, because there's no real widely distributed body of knowledge and principle for NoSQL, but there is for the relational model
[15:40:40] <caitp> and so the only way you really acquire that knowledge in the NoSQL world, is through personal experience and vague poorly written blog posts
[15:41:14] <Nodex> caitp : making unnecessary joins makes it worse, and relying on your ORM to do it for you is even worse
[15:42:00] <caitp> for the most part it doesn't make that much of a difference -- for the cases where it does make a huge difference, where the data sets are gigantic and the audience is large, people find ways around it
[15:42:06] <Nodex> people cannot fully apply relational thinking to non-relational databases. Making tools / easy ways that erase the need to rethink these things is BAD
[15:42:38] <caitp> sure it's bad, but you can't just say "it's bad" and not offer something good
[15:43:21] <caitp> it's a relatively new technology, people don't understand the theory, their college professors didn't drill it into them for 3 years, there aren't very many books on it, etc
[15:43:23] <Nodex> it's not my job to offer something good
[15:43:30] <Nodex> I worked it out myself, why can't other people
[15:43:31] <caitp> so it's no surprise that it's not there
[15:43:46] <caitp> people do work it out by themselves
[15:43:54] <caitp> they might happen to be colleagues of yours while they are working it out for themselves
[15:43:59] <Nodex> the theory is simple, it's pure common sense
[15:44:02] <caitp> or you might see them ask on stackoverflow or something
[15:44:07] <caitp> no, nothing is ever pure common sense
[15:44:34] <caitp> if you ever think something is pure common sense, it's because you're too close to it
[15:48:30] <caitp> but the point is just to illustrate what happens when you're too close to the problem; it seems obvious when you're close to it, it seems like common sense
[15:48:40] <caitp> but once you distance yourself from the problem a bit, it becomes completely alien
[15:49:13] <caitp> the relational model is old and has been thus integrated more closely with people, so it seems more obvious (although not as obvious as riding a bike)
[15:49:21] <Nodex> what's the obvious answer to this. 3 queries to render a page or 1 ?
[15:49:53] <caitp> the obvious answer is simpler is better
[15:50:51] <Nodex> caitp : correct so how does distancing yourself from that question make it alien ?
[15:51:24] <caitp> because when something is right in front of you, it's very familiar and obvious -- when you're very involved with a problem, it becomes intuitive and natural
[15:51:29] <caitp> like playing an instrument or painting
[15:51:48] <Nodex> it's nothing like playing an instrument, it's a simple logical question. 1 or 3
[15:52:33] <Nodex> i can't play an instrument but I can choose 1 or 3 - my mistake - they're exactly the same
[15:52:55] <caitp> you're a vocalist, you intuitively understand how to shape the timbre of your voice, you have a feel for the upper and lower bounds of your pitch registers -- but now you're told to play the drums instead
[15:53:03] <caitp> and you have no idea how that process even works
[15:53:12] <caitp> the vocal model doesn't fit the drum model
[15:53:49] <caitp> in the same way, the relational model might be perfectly obvious to you, but once you're asked to play the object-relational or document models, things become less obvious
[15:53:57] <Nodex> but you know if you hit the drum once a second you will tap out a rhythm
[15:54:09] <Nodex> this is where you're wrong. Forget the MODEL
[15:59:04] <Nodex> I beg to differ, it doesn't get any faster
[16:11:21] <PaulECoyote> Morning. I've got a replica set of mongods up on ec2 on Ubuntu images, I can get to the web interface etc. But "status mongodb" seems to think it is stopped when it obviously is not
[16:11:57] <PaulECoyote> I think the start-stop-daemon could be wrong in mongo.conf because it doesn't use a pid file?
[16:12:35] <PaulECoyote> but I don't know, pretty new at all this. Happy it seems to be working, but would prefer the service status to be correct
[16:38:20] <dan__t> 'morning! let's try this in the right channel, eh?
[16:38:31] <dan__t> Still working on this one, where I want to set up a replica set on one node, when the two other nodes might not yet be available. Can I do this? I'm currently working with http://pastebin.com/PNGRdPth . When I run it against mongo, though, it doesn't seem to include other nodes in replication, presumably because they don't yet exist.
[16:38:37] <dan__t> Is there any way to override that and just have this instance of Mongo wait for other nodes to come up?
[16:42:35] <PaulECoyote> I dunno. I'm new to this also. Using a heavily modified cloud formation script so I can get it working on ubuntu
[16:42:56] <PaulECoyote> I have the set working, but it doesn't think the service is running even though it is
[16:43:27] <PaulECoyote> the way the scripts do it is to wait for the other servers to spin up first
[16:44:05] <PaulECoyote> then send that config last
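For what dan__t is after, the usual pattern is the reverse: initiate the set with just the members that exist and rs.add() the rest as they come up. A sketch with hypothetical hostnames:

    // initiate a one-member replica set now...
    rs.initiate({_id: "rs0", members: [{_id: 0, host: "node1.example.net:27017"}]})
    // ...and grow it once the other nodes are reachable
    rs.add("node2.example.net:27017")
    rs.add("node3.example.net:27017")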
[17:31:40] <lazrahl> So, text indexes in 2.4 are beta even in the final release?
[17:32:12] <lazrahl> I see a note about that in the tutorial on enabling the feature, but not in the main doc page.
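Text search in 2.4 is indeed flagged beta and is off by default; enabling it and building a text index looks like this (collection and field names hypothetical):

    // enable the beta feature at runtime, then index and query
    db.adminCommand({setParameter: 1, textSearchEnabled: true})
    db.articles.ensureIndex({body: "text"})
    db.articles.runCommand("text", {search: "mongodb"})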
[19:53:25] <sqler> How do I do something like the following in MongoDB? select count(*),t1.a,t1.b from t1, t2 where (t1.id = t2.t1_id) and (t2.t1_id in (select id from t1 order by id desc limit 10)) group by 2,3 order by 1 desc
[19:56:44] <retran> mongo doesn't have a structured query interface
[19:58:29] <sqler> I'm using the Java driver, and I've managed to do everything except the nested select. I have to run the outer query 10 times, once for each of the values generated by the nested select.
[19:59:11] <sqler> Doing this seems rather brute force and ignorance as it causes 10 scans
[19:59:51] <sqler> I'll have to add indexing to minimize the scan time
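With no joins or subselects, sqler's SQL usually splits into two operations: one query for the top-10 t1 ids, then one aggregation over t2. A sketch that carries the SQL names over, and assumes a and b have been denormalized onto t2 (mongodb can't reach back into t1 mid-pipeline):

    // the nested select: ten most recent t1 ids
    var ids = db.t1.find({}, {_id: 1}).sort({_id: -1}).limit(10)
                 .map(function (d) { return d._id; });
    // the outer query: count per (a, b), descending
    db.t2.aggregate(
        {$match: {t1_id: {$in: ids}}},
        {$group: {_id: {a: "$a", b: "$b"}, count: {$sum: 1}}},
        {$sort: {count: -1}}
    )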
[20:28:44] <sqler> I'm no MongoDB expert, LOL, but my MySQL experience tells me that full scans on lots of data is **BAD**
[20:28:46] <kali> retran: or on ops that actually need to process a huge part of the collection
[20:30:51] <retran> if you have an edge example where you need lots of indexes but not slow down your inserts one little bit, you're probably in a situation where you can give up consistency and have a process to merge the new inserts to a collection that is properly indexed
[20:31:10] <retran> or have some recurring process that merges data together, indexed as needed
[20:34:29] <kali> retran: i'll give you an example: your user documents, _id is the login, the email is a separate field. you have an accept_newsletter boolean, which is true for 95% of your user base
[20:35:24] <kali> retran: when you send your newsletter, you'll actually select the 95% of the documents that are "true". in that case, having an index on the boolean is probably counter productive.
[20:36:24] <kali> retran: because you'll access the whole collection anyway, and in the index order, instead of a scan in the natural order
[20:36:48] <kali> retran: plus the index cost on updates (and in memory)
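kali's example in query form: this touches ~95% of the collection, so a scan in natural order is already close to optimal, and an index on the boolean would only add random access order plus write and memory cost.

    // selects nearly the whole collection; an index on accept_newsletter
    // would likely make this slower, not faster
    db.users.find({accept_newsletter: true}, {email: 1})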
[20:47:57] <SparkySparkyBoom> in the mongodb environment the driver does the heavy lifting
[20:48:03] <retran> you can profile/benchmark it in mongo
[20:48:22] <retran> but things seem absurdly fast to me so far
[20:48:34] <retran> i've even linked across diff mongo servers without much issue
[20:48:43] <SparkySparkyBoom> ive never tried sharding before
[20:49:59] <retran> just an example, i have placed the entire IMDB database in mongo, and it runs on a 1GB digitalocean machine; created a webservice using it, gives me results in 0.01 sec
[20:50:29] <retran> i have all 'flat' document schema
[20:50:38] <SparkySparkyBoom> flat means non-nested?
[21:26:09] <titosantana> I'm confused how i can make the subdocument articles unique and create a score for each article based on a few things like if there is a : in the title
[21:42:15] <leifw> jpfarias: point queries (exact and $in queries) do, range queries don't
[21:42:52] <jpfarias> leifw: that's good, means I probably don't need a second index on that field in my application :)
[21:43:09] <jpfarias> I used a hashed index to shard my collection
[21:43:22] <jpfarias> and also have a normal index which seems I don't need
[21:43:37] <titosantana> i tried using addToSet but it didn't do what i expected
[21:43:39] <jpfarias> all my queries on that field are exact or $in
[21:43:46] <leifw> if you don't do range queries, a hashed index is an excellent choice of shard key
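What jpfarias and leifw are describing, sketched with hypothetical names: the hashed index doubles as the shard key, equality and $in queries on that field still route to specific shards, and a second plain index on the same field buys nothing for those queries.

    // shard on a hashed index; exact-match and $in queries are targeted,
    // range queries on userId scatter-gather instead
    db.records.ensureIndex({userId: "hashed"})
    sh.shardCollection("mydb.records", {userId: "hashed"})
    db.records.find({userId: {$in: ["u1", "u2"]}})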
[21:44:24] <SparkySparkyBoom> retran: im really liking futon