[00:44:26] <_m> I should mention that's a quad-core box with 4gigs of ram. Pretty lean, IMO
[00:45:15] <_m> _Tristan: Agreed. But that's the same reason I don't use puppet. wtf should I learn a not-quite-ruby DSL? (granted this is no longer the case for puppet)
[04:13:41] <niriven> hi, i have been struggling for some time on how to design a database. I come from standard relational databases so i have a tendency to want to do things that may be wrong.
[04:16:17] <niriven> I have two types of documents, users and events. user and event have a shared identifier, user_id. users is small (10,000 or so) and events is large (150 million) for a test set. an event may or may not have a valid user id (it can be null). The majority (95%) of events have a null user_id, and most of the queries i will be running require one for me to be interested in the result.
[04:17:46] <niriven> so do i go about 1) having a db for users and a db for events, and "join" in my code, 2) creating event documents which each replicate user data per event, or 3) having one big user document which holds each user's associated events (but given that the majority of the data has no user, i would have to assign it to a fake user, making a massive document, gigs in size). Anyone have any insight?
[04:19:21] <crudson> do you ever care about the events with no user?
[04:20:12] <crudson> (I think you did answer that, re-reading)
[04:23:24] <niriven> under other filter criteria, of course
[04:24:03] <niriven> eg. the user must match criteria 1, 2, 3, and relate to an event that has criteria a, b, c
[04:25:19] <crudson> I think 1) is the only reasonable solution. Replicating user data across events seems like too much work and redundancy. A gigs-in-size document rules that option out from the very start, both technically (mongodb's document size limit) and logically.
[04:25:25] <niriven> ideally one document with users / events if all these events had users :(
[04:26:40] <niriven> crudson: great, i came to that conclusion as well, though not an ideal situation, it should work
[04:26:45] <crudson> making 1 extra user query for a v large number of event queries is nothing
[04:26:57] <crudson> the only issue I would see is a matter of concurrency
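A minimal shell sketch of option 1 as crudson describes it, assuming collections named users and events and hypothetical filter fields; the "join" is just one extra user lookup before the event query:

    // look up the user by the shared identifier plus whatever user criteria apply
    var user = db.users.findOne({ user_id: 42, plan: "premium" });
    if (user !== null) {
        // then query events for that same user_id with the event criteria
        db.events.find({ user_id: user.user_id, type: "login" });
    }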
[04:28:01] <niriven> lastly, how much should i start to nitpick about redundancy in small documents? let's say i have 10 fields in a flat document (1 level, basic object), and 4 of those 10 fields are repeated often. should i start to break it apart? i'm assuming not
[04:29:01] <niriven> if i break it apart it creates problems for queries; if i don't, i may have increased size on disk
[04:31:25] <crudson> it's a matter of reads vs writes. The logical integrity of your data is the important thing. If one gets updated a lot, which would result in having to update a ton of other documents, I would consider that a warning.
[04:31:57] <crudson> e.g. having a zip code, and always adding geo coords or state, that is not a risk, as that (almost never) changes
[04:32:15] <crudson> but having lng/lat inline in that instance is a real benefit
[04:32:23] <niriven> most likely those fields are not updating often, its just insert and forget, and query later
[04:32:24] <crudson> even if you have that in your zipcode collection
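A sketch of the kind of denormalization crudson means, with a hypothetical places collection and field names: the rarely-changing zip/geo data is copied into each document at insert time, even though a zipcode collection also holds it:

    db.places.insert({
        name: "Example Cafe",
        zip: "94110",
        state: "CA",                       // duplicated from the zipcode collection
        loc: { lng: -122.41, lat: 37.75 }  // safe to embed: this almost never changes
    });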
[04:33:42] <niriven> this document can have any set of values for its 10 attributes, i'm just worried there might be replication across those attributes, nothing obvious that can be represented as its own document though
[04:34:07] <crudson> if you do have v large documents though, judiciously use 'fields' or 'slice' to restrict what portions of the documents you are returning across the wire.
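A sketch of the projections crudson suggests, with hypothetical collection and field names; the second argument to find() restricts which fields (or how much of an embedded array) come back over the wire:

    // return only two fields instead of the whole document
    db.events.find({ user_id: 42 }, { type: 1, created_at: 1 });
    // return only the last 10 entries of a large embedded array
    db.users.find({ user_id: 42 }, { recent_events: { $slice: -10 } });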
[04:56:54] <niriven> i'm also debating whether or not to create a compound index on a document. i read in the mongodb docs that if you index a,b,c it is only efficient to query on a, a,b or a,b,c, not a,c or c,b?
[05:00:35] <niriven> given i could have a query that filters on [a,b,c], [a,b], [a], [a,c], [b,c], is it best to create a compound index or individual indexes?
[05:02:40] <niriven> never mind, googling helped me find a good example :)
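For reference, a sketch of the prefix rule niriven is describing, with hypothetical fields: a compound index on a,b,c can serve queries on a, on a+b, and on a+b+c, but not on b or c alone; explain() shows whether the index was chosen:

    db.docs.ensureIndex({ a: 1, b: 1, c: 1 });
    db.docs.find({ a: 5 }).explain();          // uses the index (prefix a)
    db.docs.find({ a: 5, b: 7 }).explain();    // uses the index (prefix a,b)
    db.docs.find({ b: 7, c: 9 }).explain();    // no usable prefix without a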
[05:54:06] <VooDooNOFX> Need some advice on a fresh mongo installation (mongo-10gen-2.2.0-mongodb_1). Inserted 10,897,335 entries. Looking up the process in top shows the following: 29641 mongod 16 0 24.6G 4046M 4016M R 43.0 51.5 0:55.67 /usr/bin/mongod -f /etc/mongod.conf. That's 24.6G SWAP usage?
[06:23:19] <telmich> I would like to run an arbiter for a replica set externally of our 2 datacenters - now when I add it to the replica set, it tries to connect to the internal host names, which are not reachable from external
[06:23:57] <telmich> is there any way the arbiter gets contacted _from_ the other nodes instead of connecting to them?
[07:28:22] <wereHamster> VooDooNOFX: isn't it 24G virt?
[07:36:54] <Null_Route> Hi Guys! in http://docs.mongodb.org/manual/release-notes/2.0/ , "Upgrade all mongos instances first, in any order. Since config servers use two-phase commit, shard configuration metadata updates will halt until all are up and running."
[07:37:03] <Null_Route> do they mean "config" instances ?
[07:37:58] <Null_Route> The next line in the doc says "Upgrade mongos routers in any order."
[08:34:01] <NodeX> range queries also have to be the last part of an index iirc
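A sketch of what NodeX is describing, with hypothetical fields: equality-matched fields go first in the compound index and the range-queried field goes last:

    db.events.ensureIndex({ user_id: 1, type: 1, created_at: 1 });
    db.events.find({
        user_id: 42,
        type: "click",
        created_at: { $gte: ISODate("2012-09-01") }  // range condition on the trailing field
    });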
[08:40:06] <Null_Route> Anyway - Upgrading a sharded Replica Set - Mongos first, or config servers?
[08:40:25] <Null_Route> http://docs.mongodb.org/manual/release-notes/2.0/ is unclear
[08:47:19] <Lujeni> Null_Route, your drivers first
[08:47:24] <Lujeni> then upgrade all mongos instances, in any order.
[09:23:53] <Lujeni> Hello - I get this error ( assertion: 13111 field not found, expected type 2 ) when i try to restore. However, the source and destination are the same version.
[09:27:38] <arussel> when modifying a document with .update({}, {$set: {foo: "myvalue"}}), how can I set the value using a field from the document?
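In 2.x, update() can't reference another field of the same document inside $set; one common workaround, sketched here with a hypothetical collection and field names, is to iterate a cursor and update each document individually:

    db.coll.find({}).forEach(function (doc) {
        // copy the existing "bar" value into "foo" for each document
        db.coll.update({ _id: doc._id }, { $set: { foo: doc.bar } });
    });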
[11:46:30] <tonny> Hey, i have a little issue, i don't know if it is related to gridfs or the nginx-gridfs module. i have set up gridfs with more than 1000000 image files and serve the files with nginx using the gridfs module. every webserver i have uses a mongos application server. everything is fine, but once in a while a request on one webserver hangs and times out, and won't serve until i restart the mongos on that webserver. But the same file works on a different webserver
[11:46:59] <tonny> Maybe one of you has seen this problem and knows how to fix it?
[12:20:05] <gyre007> why am I getting the following exception? replSet exception loading our local replset configuration object : 13132 nonmatching repl set name in _id field
[13:12:32] <vegivamp> I seem to have a bit of a fight with mongodump (2.2) - I can't seem to perform a backup of anything but the admin DB using a read-only admin user. Is this expected behaviour? Write privileges for a read-only operation seem overbearing :-)
[13:16:15] <aboudreault> I don't see the use either
[13:16:22] <aboudreault> it just adds extra maintenance to your app.
[13:18:04] <Vinx> I have some production limitations.. they don't allow me to install any server on the production machine.. so i am trying to use some tricks.. for instance what do you all think about: http://stackoverflow.com/questions/6437226/embedded-mongodb-when-running-integration-tests
[13:19:24] <algernon> set up a mongodb instance for testing, use that from your app when running CI. problem solved.
[13:19:52] <aboudreault> Vinx, maybe you should just look at another DB, maybe a file-based DB?
[13:20:01] <tonny> hi all, i ran into some issues with gridfs, did you see my post a while back?
[13:21:38] <tonny> Hey, i have a little issue, i don't know if it is related to gridfs or the nginx-gridfs module. i have set up gridfs with more than 1000000 image files and serve the files with nginx using the gridfs module. every webserver i have uses a mongos application server. everything is fine, but once in a while a request on one webserver hangs and times out, and won't serve until i restart the mongos on that webserver. But the same file works on a different webserver
[13:24:47] <Vinx> it is a very interesting topic anyway...
[13:32:55] <vegivamp> So, anyone else ever tried mongodump with a read-only user?
[13:34:17] <Rhaven> hi all, i got this in the config server log
[13:34:18] <Rhaven> [DataFileSync] flushing mmaps took 60105ms for 3 files
[13:34:46] <Rhaven> and after this my cluster died
[13:36:26] <seltar> hey.. how would i go about finding all elements in a collection that are unique, without knowing what they are? so skip all the duplicates, basically
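One way to do what seltar asks, assuming the 2.2 aggregation framework and a hypothetical field "name": group on the field and keep only values seen once; if a deduplicated set of values is all that's needed, distinct() is simpler:

    // values of "name" that occur exactly once
    db.coll.aggregate([
        { $group: { _id: "$name", count: { $sum: 1 } } },
        { $match: { count: 1 } }
    ]);
    // or just the deduplicated set of values
    db.coll.distinct("name");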
[14:30:05] <vegivamp> According to O'Reilly's definitive guide, "mongodump uses the regular query mechanism" and "is no guarantee of consistent backups" - so it seems particularly silly that a read-only user would not be able to perform queries.
[14:30:32] <NodeX> I would think the consistent bit would be due to safe writes
[14:30:43] <NodeX> or writes still in memory waiting to be flushed
[14:33:45] <vegivamp> that's what I'm trying to do, yes
[14:34:00] <vegivamp> it looks more and more as if I'll just have to roll my own script with mongoexport
[14:34:37] <vegivamp> although apparently "Neither JSON nor TSV/CSV can represent all data types. Please be careful not to lose or change data (types) when using this. For full data fidelity, or backups, please use mongodump."...
[14:36:36] <NodeX> You can just take a snapshot if you use journals / lvm
[14:49:41] <vegivamp> Oh, I can get LVM if necessary, that's not the problem. I just fail to see how snapshot backups yield off-host differential backups
[14:50:05] <coopsh> tried it out, but mongos returns an exception: 'no master found on ...'. primaryPreferred is used and secondaries are available. Works as designed?
[14:53:26] <IceGuest_77_> hi, i have an application where i use a text as a kind of contract rules... actually a contract :)... it happens that in the contract we have some calculations like d=c+e*q, and q has been set previously .. i have some examples of the contract, i need some advice
[15:07:31] <NodeX> that's just a page of text, I am not sure what you're asking
[15:07:41] <IceGuest_77_> i can make changes to the "contract"; actually i have designed it this way because i think i can teach employees to use this syntax to write contracts
[15:11:33] <NodeX> there is no maybe, that's the only way to do it
[15:29:46] <coopsh> is anybody actually using primaryPreferred read preference?
[15:30:15] <coopsh> I've the feeling that I'm the first one testing that feature ;)
[15:31:05] <doxavore> Are there any tools to see why MongoDB is only using a fraction of available physical memory and instead running about 50-80% disk IO?
[15:32:12] <doxavore> (Ideally something that works with the 2.0.x branch)
[15:33:55] <coopsh> doxavore: how do you know that only a fraction of physical memory is used?
[15:35:41] <doxavore> coopsh: It's hovering at ~40% system memory (nothing is in swap). Is there something else I should be looking at?
[15:36:31] <coopsh> system memory? which OS? what's the cache size?
[15:36:46] <coopsh> free -m is your friend on linux
[15:37:18] <doxavore> ahh - it's there, in the cache, not actively used by mongodb. gotcha.
[15:37:50] <coopsh> right. see http://www.mongodb.org/display/DOCS/Caching
[17:53:51] <jrdn> what's the right way to check if an aggregation is using the index properly?
[18:03:51] <ron> I've officially lost trust in morphia.
[18:07:12] <jrdn> also, i've got a lot of data which has a created_at mongo date field…. how can I query something like, "hours 10-15 of the last 7 days"
[18:08:10] <jrdn> i know I can rework my data to have "date, year, month, day, hour, minute" fields… but just wondering if i can do it with one indexed field
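A sketch of the reworked-field approach jrdn mentions, assuming a single extra stored "hour" field (precomputed at insert time) rather than the full date/year/month/day breakdown; created_at stays a plain date and carries the 7-day range:

    db.events.ensureIndex({ hour: 1, created_at: 1 });
    db.events.find({
        hour: { $gte: 10, $lte: 15 },                                     // "hours 10-15"
        created_at: { $gte: new Date(Date.now() - 7 * 24 * 3600 * 1000) } // "last 7 days"
    });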
[18:42:28] <eherot> Is there a way to specify which server in a replica set acts as the data source when a new member is added to the set (assuming slaveOk = true)?
[18:52:40] <underh2o> Need help setting up mongodb with an Acquia Drupal Dev Stack configuration on a macbook. Can anyone help?
[18:53:22] <Derick> underh2o: you need to ask questions before help can be given
[18:53:36] <Forbeck> hi, can I set table scan per collection?
[18:56:10] <underh2o> Derick: I guess my question is Does anyone have experience with how you set up mongodb to work with Acquia Dev Stack? Would you mind sharing the process with me?
[18:57:31] <Derick> I doubt that counts as "Experience" though
[18:59:27] <underh2o> I have mongo working on a macbook. I have Acquia on the same macbook. I have not been able to get the mongo-php driver to install properly to work with the Acquia Dev Stack. I think they use their own PHP with Acquia. How did you get the driver/db to work with the stack?
[18:59:52] <Derick> I didn't use the acquia php stack
[19:12:18] <doxavore> When an RS secondary isn't being queried from directly, the only thing causing it to have high page faults are inserts/updates, right?
[20:28:09] <joe_p> after adding a new member to the replica set I'm getting assertions. [slaveTracking] User Assertion: 11000:E11000 duplicate key error index: local.slaves.$_id
[20:29:02] <joe_p> hoping not to have to stop the primary as many production applications run on that machine
[20:36:20] <lukebergen> quick question. I feel like this should be easy but I'm not quite sure how to do this
[20:36:49] <lukebergen> if I have an array name = ["alice", "joe", "bob"]
[20:37:12] <lukebergen> and a collection of documents that all have a "names" attribute (I guess in this weird example people can have multiple names)
[20:37:46] <lukebergen> how would I query mongo to get me all the people whose name array contains at least one of the elements from my name array variable?
[20:38:23] <lukebergen> Is that possible in a single query or would I just have to do something like People.find({names: name[0]}) + People.find({names: name[1]}) + ...
[20:39:38] <crudson> lukebergen: use http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24in
[20:40:16] <lukebergen> I saw that but wasn't sure if it'd work if the field in the document is also an array
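For the record, $in does match against array-valued fields: a document matches when any element of its stored array equals any value in the query list. A minimal sketch with a hypothetical people collection:

    var name = ["alice", "joe", "bob"];
    // returns documents whose "names" array shares at least one element with the query list
    db.people.find({ names: { $in: name } });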
[21:10:52] <rustyrazorblade1> newrelic is reporting a lot (hundreds) of calls to find_one, but we don't have that anywhere in our codebase. does pymongo use find_one internally as a next() operation on the resultset?
[21:25:30] <TTimo> interesting. didn't realize a replica set would need > 5GB of disk out of the gate, even if there is no data in the db yet
[21:57:52] <xaka> i'm trying to update document using $set and it does nothing: db.collection.update({_id:"505a396f8cdaf26525000061"}, {$set:{field:1}}). "field" is always 0
[22:04:45] <cmendes0101> xaka: have to ask. Can you pull up the record with db.collection.find({_id:"505a396f8cdaf26525000061"}) ?
[22:08:45] <xaka> cmendes0101: no, but what i've found is that it works when i wrap _id into ObjectId
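The detail behind xaka's fix: an _id stored as an ObjectId is a different BSON value from the hex string it prints as, so a plain string query matches nothing. A sketch of the working form:

    db.collection.update(
        { _id: ObjectId("505a396f8cdaf26525000061") },  // wrap the hex string, don't pass it raw
        { $set: { field: 1 } }
    );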
[22:13:17] <VooDooNOFX_> I'm seeing something like 332 ops/sec 1 MB/sec results when I run mongoperf according to the wiki page instructions. Any particular reason it's much lower than the example output? (core i3, 1tb sata hdd, 8gb ram).
[22:14:33] <VooDooNOFX_> while on my i7, 16gb ram, osx box i'm seeing 1101113 ops/sec 4318 MB/sec
[23:57:18] <LouisT> Hello, I seem to be having a memory leak with the node-mongodb-native module, I was just wondering if anyone here would be willing to help me out with it? I'm not even sure if it's considered a memory leak, but it uses over 2.5GB of memory when I run my project.