[00:18:23] <bjorn248> hey guys, new to Mongo here, but I didn't find anything clear in the documentation on this question... so I decided to come here. I'm trying to set up something very simple: Mongo replication with automatic failover. I have a node app using mongoose pointing at my database server, and the database is replicated on two other boxes (minimum replica set). No sharding at the moment. Is there a load balancer or something I have to set up between the node app and the database?
[00:19:24] <rafaelhbarros> bjorn248: let me know what you find out because I'm interested in that one too.
[00:20:35] <bjorn248> I mean, if I set up a sharded and replicated cluster, I understand that mongos can act as a router in between the application and the database, but in the case where I am not using sharding, I am not sure how that is handled.
[00:25:28] <cheeser> you can use mongos without sharding
[00:32:58] <bjorn248> cheeser: oh really? awesome, so I can just set up mongos on the application box then? neat
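For context: a mongos isn't strictly required here; the drivers can connect to a replica set directly and handle failover themselves. A minimal sketch of the driver-side approach in node/mongoose, assuming hypothetical hostnames and a replica set named "rs0":

    // list every member; the driver finds the primary and follows elections
    var mongoose = require('mongoose');
    mongoose.connect('mongodb://db1:27017,db2:27017,db3:27017/mydb?replicaSet=rs0');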
[01:36:01] <DanWilson> that's a piece of piss. It's just a label in the statement: db.tpayne.aggregate([{$group: {"_id": "$id", "id": {"$addToSet":"$list" }}}])
[02:23:00] <jackh> all, how do I run the C++ tests in the dbtests directory?
[02:36:02] <Petrochus> does anyone know of a nice python mongodb wrapper that can transform things like `mongo.db.users.save({"_id": x}, {"$push": {"array": "value"}})` into something like `users(id=x).array.append(value)`? pymongo, mongokit, and mongoengine don't really have such features
[02:36:56] <Petrochus> an easy OOP way to modify documents in place, essentially
[03:20:42] <mscook> Hi - can anyone see something wrong with this query:
[06:35:12] <ron> LoneSoldier728: never ask that on irc. if you have a question, just ask it and wait until someone responds.
[07:50:59] <trupheenix> anyone here know about the MongoLab DB hosting service? I wanted to know if I can create a db with my free shared plan. I logged in and tried running show dbs but it doesn't list any dbs.
[07:57:11] <cpu> My mongo server is entering a "state" I can't understand: 100% CPU, locked all the time. Can someone help me understand the results of db.serverStatus()?
[07:59:42] <Garo_> cpu: don't ask if you can ask, just ask your question (that includes the output of your db.serverStatus in this case)
[08:02:19] <cpu> in log I find: serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 1460, after globalLock: 1460, after indexCounters: 1460, after locks: 1460, after network: 1460, after opcounters: 1460, after opcountersRepl: 1460, after recordStats: 1580, at end: 1580 }
[08:02:40] <cpu> yesterday everything was running smoothly
[08:03:51] <cpu> how can I find the cause of the 100% lock time?
[08:07:52] <trupheenix> I'm confused what kind of DB architecture I should use for my application. I'm trying to use MongoLab where I am limited by the number of databases I can have on a single connection. My current code assumes that multiple DBs can be accessed on the same connection. So is it a good idea to shift all my collections into one db and then connect to it?
[08:09:09] <Nodex> cpu, what is the operation directly before that one?
[08:15:26] <trupheenix> Nodex you got any suggestions for me?
[08:16:11] <Nodex> why are you trying to use multiple databases in one app?
[08:17:11] <trupheenix> Nodex to separate the objects and functionality
[08:17:33] <trupheenix> Nodex, I have one data set for authentication, another for storing images, another for storing user data, and so on
[08:19:31] <Nodex> amazing but why are they not collections, why do they have to be databases?
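One database with a collection per concern looks like this in the mongo shell (a minimal sketch; the database and collection names are hypothetical):

    use myapp                                       // one database for the whole app
    db.auth.insert({user: "alice", hash: "..."})    // authentication data
    db.images.insert({user: "alice", url: "..."})   // image records
    db.profiles.insert({user: "alice", age: 30})    // user data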
[09:10:07] <puppeh> is it possible to do a range query that searches on keys rather than on values?
[09:11:47] <puppeh> for ex. I have documents like this: http://pastie.org/private/el4nquexdgla65cyttiqq what I want to do is to query for documents that are between "13_07_06" and "13_07_09" in their "d" fields
[09:12:00] <puppeh> is it possible? or is my schema not suited for this operation?
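Range operators in MongoDB compare field values, not keys, so the dates need to be stored as values for this to work. A minimal sketch, assuming (the pastie isn't shown) documents shaped like {"d": "13_07_07", ...}, with zero-padded date strings so lexicographic order matches chronological order:

    db.events.ensureIndex({d: 1})   // lets the range scan use an index
    db.events.find({d: {$gte: "13_07_06", $lte: "13_07_09"}})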
[09:38:11] <_Heisenberg_> just realized that I built nonsense yesterday. Adding the members of my replica set (1 master, 2 slaves) as shards to a sharded cluster makes no sense, right? A shard should be a replica set (represented by the actual master), so what I would need to build is two or more replica sets, each consisting of 2 or more nodes, and then add each replica set as a shard?!
[09:50:34] <_Heisenberg_> This would be the way to go right? https://www.lucidchart.com/documents/view/4295-3704-51f789dd-a4c3-65470a00507e (no config servers and multiple mongos modeled)
[09:54:44] <Nodex> really depends what you're trying to achieve
[09:57:18] <_Heisenberg_> Nodex: High Availability, Horizontal Scaling, Performance and Fault Tolerance are the main aspects. I'm building a prototype which makes use of polyglot persistence, with MongoDB taking the star role ;)
[09:58:02] <Nodex> I don't know what polyglot persistence is sorry
[09:58:43] <Nodex> is your app going to write to both masters?
[09:58:53] <_Heisenberg_> it just means you use several datastores in your application, for example a document store for a product catalog, a kv store for sessions, and an RDBMS for critical operations where you need transactions
[09:59:59] <_Heisenberg_> Nodex: If you use sharding I assume that queries get routed to both masters, yes
[10:00:16] <_Heisenberg_> depending where the record resides
[10:01:03] <_Heisenberg_> that's the point of horizontal scaling, isn't it? distributing your reads and writes across all available nodes...
[10:02:44] <Nodex> that's what your diagram suggests
[10:03:47] <_Heisenberg_> the master/slaves in a replicaset should have the same data, the shards will have different data depending on the sharding key
[10:04:31] <Nodex> in your diagram which are the shards?
[10:19:21] <_Heisenberg_> because if you have a network partition between your replica sets or a datacenter fails, half of the data would be unavailable
[10:21:21] <Nodex> I don't think you will ever have 100% network partition tolerance
[10:21:45] <Nodex> but the replica sets backing your shards should span separate data centers to get anywhere near it
[10:24:49] <_Heisenberg_> I don't get why, in this example (http://docs.mongodb.org/manual/core/replica-set-architecture-geographically-distributed/), the secondary in Data Center 2 should never become primary?
[10:29:11] <_Heisenberg_> yes, but it makes no sense to me. if data center 1 goes down, the secondary in data center 2 should become primary to ensure availability?!
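The linked architecture does this with a priority 0 member: the Data Center 2 secondary keeps replicating and voting, but can never be elected, so a lone minority-datacenter node can't take over on its own. A sketch in the mongo shell, assuming the DC2 member sits at index 2 of the config:

    cfg = rs.conf()
    cfg.members[2].priority = 0   // still replicates and votes, never becomes primary
    rs.reconfig(cfg)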
[10:46:54] <Andy80> hi I'm performing this aggregation on a collection that I've: db.campaigns.aggregate({'$group': {'_id': 'budget', 'daily_budget_sum': {'$sum': '$daily_budget'}}})
[10:47:59] <Andy80> where do I insert the "filter" in this aggregation?
[10:50:27] <Nodex> you need to add them to a $match iirc
[10:50:44] <Nodex> been a while since I wrote any aggregations, bear with me
[10:51:28] <Nodex> http://docs.mongodb.org/manual/tutorial/aggregation-examples/#states-with-populations-over-10-million <--- shows a $match
[10:52:00] <Andy80> Nodex, thank you so much! Going to read it now :)
[10:55:14] <neophy> my MongoDB documents look something like this: http://pastebin.com/xXR8DnUh . Most of my queries are based on timestamp, hostname and message. Now I want to build an index on timestamp and hostname for an existing collection.
[10:55:51] <neophy> will db.events.ensureIndex( { "timestamp": 1, "hostname": 1 } , {background: true, name: "events_time_host"} ) build the index on timestamp and hostname correctly?
[10:56:37] <neophy> I'm planning to build a compound index on timestamp and hostname.
[10:57:09] <Nodex> that will build a compound index for those 2 fields
[10:58:53] <neophy> on the existing collection, right? and do I have to do anything through the client interface for new inserts?
[10:59:40] <Nodex> you do it once from the shell or your driver
[10:59:56] <Nodex> the collection doesn't even have to exist
[11:01:09] <Andy80> Nodex, from what I'm reading, $match is more similar to the SQL HAVING clause. I would need something like WHERE instead.
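$match actually plays both roles depending on where it sits in the pipeline: before $group it filters input documents like WHERE; after $group it filters the grouped results like HAVING. A minimal sketch against the aggregation above, using a hypothetical "status" field as the filter:

    db.campaigns.aggregate([
        {$match: {status: "active"}},   // before $group: behaves like SQL WHERE
        {$group: {_id: "budget", daily_budget_sum: {$sum: "$daily_budget"}}}
    ])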
[11:12:03] <neophy> Nodex: I have a single-node MongoDB instance which holds the last two months of syslog. The actual syslog is 20GB as a flat file, but MongoDB takes around 600GB. There are lots of dbname.xxx files of 2GB each. Does that make sense?
[11:12:56] <neophy> I need some pointers to learn how MongoDB handles this internally...
[11:40:03] <_Heisenberg__> the docs are a little confusing about this, because if I called sh.addShard() for each member of my replica sets I would end up with 4 shards in my case (2 replica sets with two members each)
[11:43:12] <rspijker> Add each shard to the cluster using the sh.addShard() method, as shown in the examples below. Issue sh.addShard() separately for each shard. If the shard is a replica set, specify the name of the replica set and specify a member of the set. In production deployments, all shards should be replica sets.
[11:43:27] <rspijker> there is even an example of the exact command you need
[11:45:42] <rspijker> cpu: I'm guessing you are doing the find over a different connection than the insert?
[11:45:51] <_Heisenberg__> "If the shard is a replica set, specify the name of the replica set and specify _a_ member of the set" -> so, which one? don't I have to specify all members of my replica set if I want the whole replica set to be a shard?!
[11:46:17] <rspijker> _Heisenberg__: nope, any member should do. Remember, the members know what other members are in the RS
[11:46:53] <rspijker> back in the day (not that far back, actually), you had to specify all members of the RS
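The documented form names the set once plus any single seed member; the rest are discovered automatically. A sketch with a hypothetical set name and hostname:

    sh.addShard("rs0/mongodb0.example.net:27017")   // "setName/host:port", run once per shard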
[11:47:03] <braoru> hello .. I have a little problem
[11:48:08] <braoru> I have a collection of documents, and in each document a set of embedded objects like {"stat1":"1","stat2":"3"}. Currently I can use aggregation to count the number of occurrences of stat2, stat3 etc., but can I get the sum of all stat1, stat2, etc.?
[11:48:22] <braoru> or do I have to code around the query and go over each document one by one ?
[13:02:33] <hroark> seems a massive workaround for something so simple
[13:02:43] <braoru> let me clarify my last unclear question :) I have records like this: http://paste.fedoraproject.org/28967/37518917/ and I would like to know if it's possible with the aggregation API to get the sum of the "motivations" and end up with something like { "motivationXX":"44", "motivationTT":"66", ..} the words are in French but if needed I can just make it completely English and generic..
[13:04:16] <braoru> I can try to change the schema too .. I think it would be a good idea..
[13:24:36] <aandy> having used coffeescript (not anymore, just toyed with it), it's not as bad as people try to make it out to be. everything is context
[13:24:54] <aandy> if you expect it to be a language, it's not. it's sugar and shorthand, use it as such :p
[13:29:15] <aandy> true, but the lack of for loops makes you write some big workarounds sometimes
[13:29:24] <hroark> so does checking.for?.nested?.objects == true
[13:29:46] <aandy> anyway, i don't use it. but it did teach me a few things about a complicated `this` nest i had :)
[13:30:15] <aandy> right, but the same can be done with checking.for && checking.for.nested, but i agree when it becomes more nested than that it's ugly and confusing
[13:30:48] <aandy> and if you go so nested, it's usually because you're *trying* to structure random data :p
[13:33:06] <aandy> the only example i have is github's hubot. it's way more readable, and easier to follow the plugin/interface use than the js implementation would be
[13:33:37] <aandy> i didn't say not large, i said small. it can work for larger as well, but i've never tried it, so can't say
[13:33:43] <Nodex> do you guys use less / sass / compass etc ?
[13:45:04] <Nodex> an SPA cannot kill the actual latency between my app and the server. For example, a chat app between 2 people - one cannot update the other on the other side of the world any faster than actual network latency
[13:51:53] <Nodex> I don't agree that SPAs are the future. they have a place for those that want to use them, sure, but they are not the only way to accomplish the things they do, and vice versa with non-SPAs
[13:52:07] <chostrander> Hello all! Speaking of performance... I have a mongodb that consists of 525 million records... every day we must process ~40 million records that can either exist (update) or be new (insert).
[13:52:27] <chostrander> right now the performance is very slow... and we don't have shards...
[13:52:41] <Nodex> you should probably shard that asap
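The "exists (update) or new (insert)" pattern maps directly onto an upsert: update the matching document, or insert one when nothing matches. A minimal sketch in the mongo shell, with hypothetical field names:

    // one statement per incoming record; no separate existence check needed
    db.records.update(
        {_id: rec.id},                                          // match on the record's key
        {$set: {payload: rec.payload, updatedAt: new Date()}},
        {upsert: true}                                          // insert if no match
    )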
[13:59:29] <leifw> chostrander: I'm one of the developers of TokuMX, you can ask me questions or come to the #tokutek channel if you like
[14:00:39] <chostrander> Hi leifw! I understand your response... just wanted to hear people's experiences! :-)
[14:03:09] <leifw> chostrander: because of concurrent writers, we can't support a particular count() optimization that vanilla mongodb does, but for almost everything else, pretty much everyone we've heard from has seen big performance improvements
[14:04:16] <leifw> chostrander: these things are of course workload dependent, and I think most people that have tried us were already hitting the limits of mongodb's performance, so there's some selection bias
[14:15:14] <Andy80> is there a way to compare the _id against a list of ids? something like {'_id': ['id1', 'id2', 'id3']}
[14:15:48] <leifw> Andy80: I think you want an $in query: http://docs.mongodb.org/manual/reference/operator/in/#op._S_in
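Applied to the question above, $in matches any document whose _id appears in the given list. A minimal sketch with hypothetical ids:

    db.campaigns.find({_id: {$in: ["id1", "id2", "id3"]}})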
[14:42:58] <remonvv> leifw: Out of pure curiosity, can you explain a bit where the performance gains being claimed here come from?
[14:46:51] <Nodex> http://blog.mongodb.org/post/56876800071/the-most-popular-pub-names <--- I thought Derick would've written that tbh
[14:47:06] <aandy> Nodex: agreed. that's also why i don't try out that many of the new <fancy libraries>, as i just can't see the use case, and *worse*, it's a lot of work to do less. it's a bad trend :)
[14:47:45] <aandy> it's basically, FORGET EVERYTHING YOU THOUGHT YOU KNEW ABOUT THE WEB, then learn how they think the web should be, change all your habits, and then it's super easy to use
[14:47:47] <cheeser> Nodex: heh. i was just reading that, too.
[14:49:14] <Nodex> aandy : I agree, too much abstraction going on whilst waiting for the browser vendors to catch up
[14:56:29] <remonvv> leifw: You here? Where can I find technical details on TokuMX? I'm looking through the code but it's a little tricky to see what you've changed.
[14:56:59] <Nodex> remonvv : it's on their site iirc
[14:57:28] <remonvv> NodeX: Really? I've been looking at that and all I can find is something about fractal trees instead of b-trees and some benchmarks.
[14:57:45] <Nodex> yeh, what more do you need ? :P
[14:57:58] <Nodex> it's a new word that sounds cool so it must be fast
[14:58:03] <remonvv> I need to know if it breaks contract with vanilla MongoDB and why it's so much faster.
[14:58:17] <Nodex> fractal stemmed = fract which means smaller which must mean faster
[15:03:19] <Nodex> I wrote a hello world app once, it was awesome
[15:03:29] <Nodex> it said "Hello" then right after "world"
[15:04:02] <deepy> I learned about threads by writing my first "world!Hello " app
[15:04:30] <Nodex> hello world apps are kinda stupid because the only person seeing it is the developer and the developer is in no way the entire world
[15:04:42] <Nodex> it should say "Hi there person in front of me"
[15:05:02] <deepy> You're being too literal. the greeting is extended to the world, it's a wide audience. it doesn't mean that everyone has to read it, just that everyone is encouraged to :-)
[15:15:29] <remonvv> Hm. I'll read up on that stuff tonight.
[15:25:09] <Nodex> pretty good video tbh, makes a lot of sense
[15:26:54] <_jo_> I'm seeing this error come up on our staging servers: https://jira.mongodb.org/browse/SERVER-7768 If I pull a more recent Python driver, can that work around the problem?
[15:27:17] <_jo_> Let's assume I can't upgrade the MongoDB version because we're not hosting it in-house.
[15:43:38] <mrlami> does MongoDB have a PHP driver, and if so, what are the risks that in 6 months' time it'll be invalid and unusable due to changes?
[15:44:41] <cheeser> yes, it does. and it's a pretty safe bet. the php driver author sits two desks down from me. :)
[15:45:35] <remonvv> and he has a bottle of very expensive whiskey
[16:02:19] <leifw> remonvv: if you'll be in the boston area tomorrow you can go see my coworker Zardosht give a technical talk about how we do it
[16:03:05] <leifw> remonvv: but basically what we did was we took mongodb, ripped out the storage code (b-trees, file allocation, extents, all that) and replaced it with our storage library that we use in tokudb
[16:03:39] <leifw> remonvv: it's an implementation of a fractal tree, which is a write-optimized data structure
[16:04:37] <leifw> remonvv: http://youtu.be/c-n2LGPpQEw is a description of why they're better than b-trees, we have a bunch of other posts about it if you google around
[16:05:28] <leifw> here's the talk tomorrow in boston if you're interested: http://www.meetup.com/Boston-MongoDB-User-Group/events/127576442/
[16:05:51] <leifw> there's also this: http://www.tokutek.com/2013/07/tokumx-fractal-treer-indexes-what-are-they/
[16:07:32] <leifw> the compatibility issues are: no geo/2d/2dsphere yet, no mixed vanilla/tokumx clusters, otherwise apps shouldn't notice a difference
[16:07:53] <leifw> I'll stop flooding the channel now, come ask in #tokutek if you have specific questions or want a longer lecture
[16:10:22] <remonvv> leifw: Thanks ;) I'll read up on it tonight. The only Google hits on fractal trees point to Toku, so it's hard to find additional info that isn't directly from Toku ;)
[16:25:00] <flaper87> figured it out. It means there's more than one bound
[18:35:41] <braoru> hello, I have a collection of documents looking like this http://paste.fedoraproject.org/29073/37520921/ and I want to obtain a list of "motivations" containing the sum of each .. any idea how to proceed with aggregation or something else? (I can change the schema if needed)
[18:36:44] <ninepointsix> Hey, is there a way to force an insert to behave like an upsert? (or have the update command take an array?)
[18:54:07] <braoru> ok, I changed motivations to an array and I will chain $unwind ..
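With motivations as an array, $unwind emits one document per array element and $group can then sum per motivation. A sketch assuming hypothetical element fields "name" and "count" (which must be numeric for $sum, per DanWilson's point below):

    db.records.aggregate([
        {$unwind: "$motivations"},                       // one doc per array element
        {$group: {_id: "$motivations.name",              // group by motivation name
                  total: {$sum: "$motivations.count"}}}  // counts must be numbers
    ])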
[19:53:56] <braoru> DanWilson, can it come from the fact that value and id are strings?
[19:55:00] <tg2> on a server with 256GB of RAM, it seems a bit of a waste to limit batch inserts to 16MB total, seeing as the single-document size limit is 16MB
[19:55:23] <DanWilson> braoru: that's because you are trying to sum a string
[19:55:35] <DanWilson> you need to convert the data you want to sum into a number type
[19:55:41] <DanWilson> once you have that, you can use this query:
[20:11:16] <braoru> DanWilson, arf no I was wrong :(
[20:12:50] <tjmehta> How does mongo consider subdocuments distinct - will it compare values of keys or does distinct only work for fields that hold a literal value?
[20:13:09] <braoru> DanWilson, any idea how I can get the sum by id, instead of the global one?
[20:19:31] <cheeser> tjmehta: i believe it compares the entire document
[20:21:05] <tjmehta> cheeser: field by field, so what about indexing fields that are subdocuments -- is it sufficient to index the key of the document or should each sub key be indexed?
[20:21:11] <DanWilson> braoru: did you figure out how to reinsert the document with the right types?
[20:21:38] <braoru> DanWilson, and your request sent back a global sum :)
[20:21:46] <cheeser> tjmehta: you should test it but I would think if you're querying by subkeys you should index those explicitly.
[20:22:01] <cheeser> since there are no implicit indexes on subdocuments
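Explicit indexes on subdocument fields use dot notation. A minimal sketch with a hypothetical nested field:

    // queries filtering on {"address.city": ...} can then use this index
    db.users.ensureIndex({"address.city": 1})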
[20:22:35] <braoru> DanWilson, I tried to use _id : _id in the last group
[20:24:29] <tjmehta> cheeser: okay, so last question: if I do a distinct query for subdocuments and I have indexed the key for the subdocument (but not the children), is that enough indexing for the distinct query if it is doing a field-by-field comparison?
[20:24:58] <DanWilson> that one sends me back each id and a 0, because of the type difference
[20:25:55] <braoru> DanWilson, if I change the type I get numbers
[20:26:12] <tjmehta> cheeser: got to run, thanks in advance - still waiting on your last response though.
[20:27:31] <DanWilson> braoru: but you only get a single record returned?
[20:27:51] <braoru> no, I get multiple records but the data seems to be wrong .. I have to verify my inserts
[22:05:34] <Industrial> Say I have a SQLite database with 500MB of records in 1 table, with a date column holding a unix timestamp (number). I want to search through this data set by start/end time. Currently it takes 20-30 sec. How would mongodb do a better job at this?
[22:12:48] <Goopyo> Industrial: if it's in memory it should be fairly quick
[22:13:23] <Goopyo> MongoDB ObjectIds also have a built-in timestamp that's queryable
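The first four bytes of every ObjectId encode its creation time, so the default _id supports time-range queries out of the box. A sketch in the mongo shell; objectIdFromDate is a hypothetical helper, not a built-in:

    // build an ObjectId whose embedded timestamp equals the given date
    function objectIdFromDate(d) {
        return ObjectId(Math.floor(d.getTime() / 1000).toString(16) + "0000000000000000");
    }
    db.events.find({_id: {$gte: objectIdFromDate(new Date("2013-07-01")),
                          $lt:  objectIdFromDate(new Date("2013-07-10"))}})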