[00:40:36] <nkconnor> has anyone gotten sporadically missing data when inserting on a new Mongo DB
[01:11:05] <pomke> an FYI, I just got punched in the face by mongoDB's date calculations having TZ offsets, this is a thing to watch for ;p https://gist.github.com/pomke/9cb995701574420d6777
[03:10:58] <keeger> i am building basically a delayed message queue. so i was thinking of having a collection of Messages {} that contain a DueDate field and a Payload
[03:11:39] <keeger> is it good enough to just define an index on DueDate, so my queries are performant for Select messages where DueDate <= now ?
[03:12:06] <keeger> my terms are not mongo, i apologize. coming from sql server world
[03:12:17] <Boomtime> not enough information to know
[03:13:26] <daidoji> its heresy in this channel perhaps, but https://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Night.pdf
[03:13:35] <Boomtime> how big is the collection? what is the cardinality of DueDate across those docs? can you limit the number of items you get back? are the results sorted?
[03:14:04] <keeger> i did look at rabbitmq and celery etc, but wasn't satisfied with many of them
[03:14:17] <daidoji> however, http://docs.mongodb.org/manual/tutorial/use-capped-collections-for-fast-writes-and-reads/ is the typical way to design such things in Mongo
[03:17:25] <keeger> thats my starting number for now, i want it to be able to scale, hence why i'm looking at mongo :)
[03:18:10] <keeger> the use case i'm working on putting together is to get 10k jobs that are Due at the same time
[03:18:48] <keeger> if i can order the collection, then that is great, since i don't mind a slightly slower insert operation
[03:19:08] <Boomtime> an index is by definition an ordering system
[03:19:59] <keeger> i thought from daidoji's response that there was a mongo sort that was somehow diff from an index
[03:20:20] <Boomtime> ok, you ask if this will "be fast", but if your query is for "before today" and that matches every document in the database and you list them all then, no, that isn't going to be fast
[03:20:46] <Boomtime> this is why you need to supply more information about what you intend to do
[03:21:10] <keeger> Boomtime, i apologize for the lack of clarity
[03:21:28] <keeger> i want to pop the messages off a queue at a specified DueDate
[03:21:29] <Boomtime> if you query for "before today sorted by duedate limit of 5" then yes, that will be pretty damn quick
[03:22:07] <Boomtime> ok, then you don't really want to query for more than 1 at a time - in fact, you don't want to query precisely at all
[03:22:19] <keeger> no, i do need to query for more than 1
[03:22:33] <keeger> the jobs are at second precision, so i could have 10k due at the same second
[03:22:40] <keeger> i'd like to get those 10k back
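A minimal pymongo sketch of what keeger describes: an index on DueDate plus a "due now" query, sorted and capped. The collection, database, and connection names are assumptions; only the DueDate/Payload fields come from the conversation.

    from datetime import datetime, timezone
    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")  # assumed connection string
    messages = client["queue_db"]["messages"]          # assumed db/collection names

    # Single-field index so "DueDate <= now" queries don't scan the whole collection.
    messages.create_index([("DueDate", ASCENDING)])

    # Fetch everything that is due, oldest first; the limit keeps a huge backlog
    # from coming back as one enormous result set.
    now = datetime.now(timezone.utc)
    due = messages.find({"DueDate": {"$lte": now}}).sort("DueDate", ASCENDING).limit(10000)
    for msg in due:
        print(msg["Payload"])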
[03:32:51] <daidoji> Boomtime: I guess we'll just have to agree to disagree
[03:33:40] <Boomtime> fine, you implement your two queues, you re-implement a multi-threaded job distribution system that already exists because you don't know you've already got one
[03:34:07] <Boomtime> you are treating the database like a filesystem, it is better than that
[03:34:30] <Boomtime> you will also suffer from serious concurrency and recovery problems in your method
[03:34:49] <keeger> ur saying that mongo can notify me when DueDate is reached?
[03:35:14] <Boomtime> no, but there is nothing stopping you from sending workers to sleep when they run out of jobs
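A hedged sketch of the pattern Boomtime is pointing at: each worker atomically claims one due message at a time and sleeps when nothing is due. The DueDate field and collection name come from the discussion; the "Status" marker and handle() function are invented for illustration.

    import time
    from datetime import datetime, timezone
    from pymongo import MongoClient, ASCENDING

    messages = MongoClient()["queue_db"]["messages"]   # assumed names

    def handle(payload):
        print(payload)   # hypothetical job processing

    def work_loop():
        while True:
            now = datetime.now(timezone.utc)
            # find_one_and_update is atomic on a single document, so two workers
            # can never claim the same message.
            job = messages.find_one_and_update(
                {"DueDate": {"$lte": now}, "Status": {"$ne": "claimed"}},
                {"$set": {"Status": "claimed"}},
                sort=[("DueDate", ASCENDING)],
            )
            if job is None:
                time.sleep(1)   # nothing due: back off instead of hammering the db
                continue
            handle(job["Payload"])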
[03:35:20] <daidoji> well if we're going to go down this road I'll go back to my original point https://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Night.pdf
[04:11:57] <daidoji> yeah, I'm a ZeroMQ fan myself
[04:30:36] <applecornsyrup> i do an insert and get a document too large error...so i split the document and try again. then i get a duplicate id error.
[04:40:37] <Boomtime> you have two distinct problems, you should check what you are actually inserting, and if you get a duplicate _id then print out what it is and do some debugging
[05:32:47] <applecornsyrup> so the issue is that pymongo modifies the object i was inserting. it adds the "_id" key. so when i copied after the failure, i was copying the inserted id
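A small reproduction of the behaviour applecornsyrup hit: pymongo adds "_id" to the dict you pass in, so a copy made after the insert carries the old _id and collides on re-insert. Collection and field names here are made up.

    from pymongo import MongoClient
    from pymongo.errors import DuplicateKeyError

    coll = MongoClient()["test_db"]["docs"]

    doc = {"payload": "first half"}
    coll.insert_one(doc)
    print("_id" in doc)          # True: pymongo mutated the original dict

    copy = dict(doc)             # copy taken after the insert keeps the same _id
    try:
        coll.insert_one(copy)
    except DuplicateKeyError:
        copy.pop("_id")          # drop the stale _id and retry
        coll.insert_one(copy)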
[06:08:52] <raar> Hey guys. I have a document that looks like: { _id: { "type": "type-a", "timestamp": { "native" : NumberLong(1409961600) }, "feed": "someFeed"}, "values": { "hits": 20 } }. I'm trying to write an aggregation query to: get the sum of all hits, of all documents that have an "_id.feed" in a list (of 20 feed options), and have an "_id.timestamp.nativetime" between now and one week ago
[06:09:17] <raar> Could anyone give me any suggestions on how to do this?
[06:09:30] <raar> or if it's in fact possible to do with the aggregation framework?
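One possible pipeline for raar's question, written with pymongo. It assumes the timestamp lives at "_id.timestamp.native" (the sample document and the question spell the field slightly differently) and is stored as Unix seconds; the db/collection names and feed list are placeholders.

    import time
    from pymongo import MongoClient

    coll = MongoClient()["stats_db"]["hits"]   # assumed names
    feeds = ["someFeed", "otherFeed"]          # the list of ~20 feeds
    week_ago = int(time.time()) - 7 * 24 * 3600

    pipeline = [
        {"$match": {
            "_id.feed": {"$in": feeds},
            "_id.timestamp.native": {"$gte": week_ago},
        }},
        {"$group": {"_id": None, "total_hits": {"$sum": "$values.hits"}}},
    ]
    for row in coll.aggregate(pipeline):
        print(row["total_hits"])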
[06:21:29] <zamnuts> Is there any way to perform an atomic update on a single document within a sharded collection without the use of the shard key? Any tricks? I'm afraid I'll be stuck w/ 2 queries (non-atomic): collection.findOne and collection.(update|findAndModify)
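A sketch of the two-step fallback zamnuts mentions: look the document up first to learn its shard key value, then issue the targeted update including it. The "region" shard key, field names, and collection are assumptions, and the pair of operations is not atomic — only the final update is.

    from pymongo import MongoClient

    coll = MongoClient()["app_db"]["users"]    # assumed sharded collection

    doc = coll.find_one({"email": "a@example.com"}, {"_id": 1, "region": 1})
    if doc is not None:
        coll.update_one(
            {"_id": doc["_id"], "region": doc["region"]},  # include the shard key
            {"$set": {"active": True}},
        )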
[08:17:17] <LinkRage> how do you completely disable ALL mongod logging while using --fork/fork=true ? Any suggestions?
[08:17:57] <LinkRage> I don't want to do ugly things like nohup/dtach/screen/tmux etc.
[08:19:04] <LinkRage> when logpath=/dev/null mongod crashes when there are many requests/connections. This is not the case when logging to a regular file.
[10:15:20] <jordz> Anyone know where I can get MMS support?
[10:17:48] <Derick> jordz: it's part of our normal support offering IIRC
[10:20:47] <jordz> Hey Derick, yeah, there seems to be an issue with the graphs when I flick between the zoom options. The points are all wrong and backtrack
[11:56:57] <nyssa> hello everyone, a quick question - is there a way to sync an existing replicaset in DC1 to a new one in DC2 if they can't talk to each other directly?
[11:58:19] <nyssa> i have a machine that routes all traffic from DC1 to a node in DC2 but the data doesn't seem to get replicated, tried it the other way around and syncing from a target, but no luck - any ideas?
[12:49:12] <cheeser> how are they going to sync afterwards if they can't talk to each other?
[12:49:42] <nyssa> i just need to do the initial sync - we are moving from DC1 to DC2
[12:50:22] <nyssa> so once the move is done all incoming traffic will go straight to DC2 - all I want is the data that is right now on DC1 to be replicated to DC2 without significant downtime
[12:51:24] <cheeser> you still have the potential for dropped data for anything between the time you copy over and reroute traffic, though.
[12:52:23] <nyssa> yeah, the problem is - we don't use snapshots, and turning off the mongod instance would be okay but the database is big, a sync would take way too long to explain to customers and partners
[12:53:43] <nyssa> rerouting traffic is almost instant
[12:55:15] <cheeser> doing the dump, the copy, and bringing up the servers in DC2 takes nonzero time. data could be lost in that window unless you disable your app during it.
[12:55:57] <nyssa> so there is no way i could sync the second replicaset and bring it up to date with the live set in DC1 ?
[12:57:28] <cheeser> you can't do that iteratively, no.
[12:58:01] <cheeser> well, if your data is timestamped, i think dump can take a query. then you can just dump the newest stuff and restore only that.
[12:58:13] <cheeser> but that's a lot of moving parts and room for failure.
[12:58:30] <cheeser> you can't just open up DC2 to DC1 and sync normally?
[13:01:06] <nyssa> that would take some serious firewall changes in DC1 and we had terrible experience with them doing it, I was just wondering if there is a clever way to sync them using a middleman
[13:03:13] <cheeser> you could ssh tunnel if that'd work
[15:09:43] <ohwhoa> is it ok if only one instance in a replica set is available and it runs as secondary ?
[15:26:11] <giorni> Hi guys. I'm trying to do an incremental update to some subdocuments of my collection, but whenever the defined subdocument doesn't exist in the collection, I want to create it. First of all, is this even possible?
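A hedged sketch for giorni's question: $inc on a dotted path creates the intermediate subdocument if it is missing, and upsert=True creates the whole document when nothing matches. Collection and field names are invented.

    from pymongo import MongoClient

    coll = MongoClient()["app_db"]["counters"]   # assumed names

    coll.update_one(
        {"page": "/home"},
        {"$inc": {"stats.daily.visits": 1}},   # "stats.daily" is created on demand
        upsert=True,
    )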
[15:32:42] <EricL> Is there a way to force a sharded collection to rebalance?
[15:33:29] <EricL> I have 8k documents of < 1k each and that collection is read from very heavily. I want Mongo to balance that collection properly and don't want to take the downtime of reloading the collection.
[15:33:43] <EricL> Googling around basically just tells me nothing.
[16:03:08] <hydrajump> hi anyone running mongodb in a docker container? I'm trying to create my own from my own dockerfile but I'm having an issue with permissions even after following mongo docs and a docker doc on mongo
[16:03:58] <hydrajump> This is my dockerfile: https://gist.githubusercontent.com/hydrajump/f1907cc2789dbb136da7/raw/445da9757a09f35d2a5b4c80947545433c102997/gistfile1.txt
[16:05:26] <hydrajump> when I do docker run -p 27017:27017 hydrajump/mongodb all I get is the help documentation...
[16:10:05] <ttxtt> hi - I'm having trouble searching for documentation on this. Any pointers on best practices for storing DB config and user information in an app without having it hard-coded and checked into source control?
[16:16:52] <kali> ttxtt: configuration is usually something that is provided by an application framework
[16:21:41] <ttxtt> kali: thank you. assuming I have a way to store the information (e.g. environment variables), is it typical to use plaintext database user/password, and would I make a user specifically for a particular app deployment? Or is there some kind of public key strategy maybe?
[16:27:43] <kali> ttxtt: i can't help you much with that, i never used auth in mongo
[16:31:27] <ttxtt> kali: no problem. maybe I'm missing something though - is there a more common way to protect access to DBs, like having them just run on localhost?
[16:32:28] <kali> ttxtt: security at the ip level. firewalling basically.
[16:32:47] <kali> ttxtt: listening on localhost is similar in a single-machine deployment
[16:36:48] <ttxtt> kali: ok i see. I was playing around with heroku+mongolab, but they seem to use auth and public DNS names. would be nice if there was a simple way to spin up a little firewalled cluster
[16:37:58] <kali> ha, yeah, with remote databases, the firewalling approach is irrelevant
[16:38:45] <kali> unless you can use a vpn of some sort
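A minimal illustration of the environment-variable approach ttxtt raised earlier in this exchange: keep the connection string (including any username/password) out of source control and read it at startup. The variable name MONGODB_URI is just a common convention, not something the log prescribes.

    import os
    from pymongo import MongoClient

    uri = os.environ.get("MONGODB_URI", "mongodb://localhost:27017/test")
    client = MongoClient(uri)
    db = client.get_default_database()   # uses the db named in the URI, if any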
[16:54:48] <EricL> Is there a way to force a sharded collection to rebalance? I have 8k documents of < 1k each and that collection is read from very heavily. I want Mongo to balance that collection properly and don't want to take the downtime of reloading the collection. Googling isn't helpful.
[16:56:24] <kali> EricL: an 8MB collection will not shard itself unless you change the chunk size. it's 64MB by default, iirc
[16:57:01] <kali> if you have a read problem, replication might be a better option
[16:58:42] <EricL> I do have a read problem, there are already 2 additional secondaries.
[16:58:58] <EricL> I just want the balance to be better than 97%, 0%, 3%.
[16:59:14] <EricL> I have 3 shards and each shard has 1 primary (obviously) and 2 secondaries.
[16:59:25] <EricL> And I have a single VERY high volume read collection.
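If lowering the chunk size really is the route EricL wants (kali's point above is that an ~8MB collection never reaches the 64MB default, so it stays in one chunk), the documented knob is the "chunksize" document in the config database, written through a mongos; the value is in MB. Whether shrinking chunks is wise for such a small, hot collection is a separate question. Connection details below are assumptions.

    from pymongo import MongoClient

    mongos = MongoClient("mongodb://localhost:27017")   # assumed mongos address
    mongos.config.settings.update_one(
        {"_id": "chunksize"},
        {"$set": {"value": 1}},   # 1 MB chunks instead of the 64 MB default
        upsert=True,
    )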
[19:24:11] <prbc> How can I push a value to a Schema array only if it doesn't exist?
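The question goes unanswered in the log; the usual operator for this is $addToSet, which appends a value only when it is not already present in the array. Shown with pymongo; collection and field names are made up, and if prbc is using Mongoose the same operator applies through its update API.

    from pymongo import MongoClient

    coll = MongoClient()["app_db"]["users"]   # assumed names

    coll.update_one(
        {"_id": "user1"},
        {"$addToSet": {"tags": "mongodb"}},   # no-op if "tags" already contains "mongodb"
    )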
[20:18:19] <sellout> Can I do something like emit(null, this[x]) in mapreduce to have Mongo generate a random key, or do I need to create the key myself?
[20:25:16] <cheeser> you'd have to generate it, afaik
[21:58:18] <sellout> cheeser: ObjectId() did it for me.
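A sketch of what sellout reports doing: emitting ObjectId() as the map-reduce key so every emitted value gets its own effectively unique key. Written here through pymongo's (pre-4.0) map_reduce helper; the collection, field, and output names are assumptions.

    from bson.code import Code
    from pymongo import MongoClient

    coll = MongoClient()["test_db"]["events"]   # assumed names

    mapper = Code("function () { emit(ObjectId(), this.x); }")
    reducer = Code("function (key, values) { return values[0]; }")  # rarely called: one value per key

    coll.map_reduce(mapper, reducer, out="events_by_random_key")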