PMXBOT Log file Viewer


#mongodb logs for Wednesday the 17th of September, 2014

[00:40:12] <nkconnor> hi
[00:40:36] <nkconnor> has anyone gotten sporadically missing data when inserting on a new Mongo DB
[01:11:05] <pomke> an FYI, I just got punched in the face by mongoDB's date calculatings having TZ offsets, this is a thing to watch for ;p https://gist.github.com/pomke/9cb995701574420d6777
[01:11:14] <pomke> calculations*
[01:17:02] <pomke> TL;DR:
[01:17:10] <pomke> > new Date('09/17/2014')
[01:17:12] <pomke> ISODate("2014-09-16T14:00:00Z")
[01:17:25] <Boomtime> @pomke: that is nothing to do with mongodb: you are looking at javascript
[01:17:56] <Boomtime> use ISODate to say what you mean
[01:18:09] <Boomtime> otherwise you are at the whims of javascript to fill in the blanks
[01:18:33] <pomke> ayup
[01:18:40] <Boomtime> mongodb uses UTC only, any and ALL conversions are entirely a client problem
[01:19:14] <Boomtime> i agree it is a trap for beginners, but it's a trap unrelated to mongodb
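A minimal mongo shell illustration of the trap above (pomke's output implies a host timezone of UTC+10): a slash-formatted string passed to new Date() is parsed by JavaScript as local midnight, while an explicit ISODate string says exactly what is stored.

    > new Date('09/17/2014')          // parsed as *local* midnight, then stored in UTC
    ISODate("2014-09-16T14:00:00Z")
    > ISODate("2014-09-17T00:00:00Z") // unambiguous: this is exactly what gets stored
    ISODate("2014-09-17T00:00:00Z")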
[03:04:28] <keeger> evening
[03:04:55] <keeger> i was wondering about the global write lock in mongo
[03:05:08] <keeger> is that database level or mongodb process level?
[03:06:20] <daidoji> keeger: what do you mean?
[03:06:44] <keeger> I've read that when doing a write operation in mongo, it grabs a global write lock
[03:07:42] <Boomtime> there has not been a global write lock since 2.0, a very long time ago, so you read wrong
[03:07:57] <Boomtime> where did you read this?
[03:08:05] <Boomtime> btw: http://docs.mongodb.org/manual/faq/concurrency/#what-type-of-locking-does-mongodb-use
[03:08:09] <keeger> stack overflow i think
[03:08:52] <Boomtime> without a reference i'll assume it was in a dream, anyway it is not correct
[03:09:21] <keeger> lol
[03:09:32] <Boomtime> :)
[03:09:38] <keeger> i was researching it last week. so it appears that it's db lock level
[03:09:42] <keeger> that's good.
[03:09:51] <Boomtime> right, there are database level write locks
[03:10:30] <keeger> well lemme ask you a question
[03:10:58] <keeger> i am building basically a delayed message queue. so i was thinking of having a collection of Messages {} that contain a DueDate field and a Payload
[03:11:39] <keeger> is it good enough to just define an index on DueDate, so my queries are performant for Select messages where DueDate <= now ?
[03:12:06] <keeger> my terms are not mongo, i apologize. coming from sql server world
[03:12:17] <Boomtime> not enough information to know
[03:13:26] <daidoji> its heresy in this channel perhaps, but https://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Night.pdf
[03:13:35] <Boomtime> how big is the collection? what is the cardinality of DueDate across those docs? can you limit the number of items you get back? are the results sorted?
[03:14:04] <keeger> i did look at rabbitmq and celery etc, but wasn't satisfied with many of them
[03:14:16] <keeger> i can sort the collection?
[03:14:17] <daidoji> however, http://docs.mongodb.org/manual/tutorial/use-capped-collections-for-fast-writes-and-reads/ is the typical way to design such things in Mongo
[03:14:34] <daidoji> keeger: yeah
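For reference, the capped collections daidoji links to are created with an explicit size bound in the shell, along these lines (collection name and sizes here are arbitrary):

    // a fixed-size, insertion-ordered collection; the oldest documents get overwritten
    db.createCollection("messages_log", { capped: true, size: 100 * 1024 * 1024, max: 100000 })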
[03:15:22] <keeger> the number of items is not small, so i was thinking of a collection first
[03:15:43] <keeger> and see how it performed.
[03:15:45] <Boomtime> "not small" - example numbers?
[03:16:03] <keeger> 20k to start
[03:16:12] <Boomtime> that is nothing
[03:17:25] <keeger> that's my starting number for now, i want it to be able to scale, hence why i'm looking at mongo :)
[03:18:10] <keeger> the use case i'm working on putting together is to get 10k jobs that are Due at the same time
[03:18:48] <keeger> if i can order the collection, then that is great, since i don't mind a slightly slower insert operation
[03:19:08] <Boomtime> an index is by definition an ordering system
[03:19:59] <keeger> i thought from daidoji's response that there was a mongo sort that was somehow diff from an index
[03:20:20] <Boomtime> ok, you ask if this will "be fast", but if your query is for "before today" and that matches every document in the database and you list them all then, no, that isn't going to be fast
[03:20:46] <Boomtime> this is why you need to supply more information about what you intend to do
[03:21:10] <keeger> Boomtime, i apologize for the lack of clarity
[03:21:28] <keeger> i want to pop the messages off a queue at a specified DueDate
[03:21:29] <Boomtime> if you query for "before today sorted by duedate limit of 5" then yes, that will be pretty damn quick
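A sketch of the index and query shape Boomtime describes here, assuming keeger's Messages collection with a DueDate field as described above:

    db.messages.ensureIndex({ DueDate: 1 })   // createIndex() in 2.6+ shells
    // "due before now, sorted by DueDate, limited": this walks the index in order and stops early
    db.messages.find({ DueDate: { $lte: new Date() } }).sort({ DueDate: 1 }).limit(5)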
[03:22:07] <Boomtime> ok, then you don't really want to query for more than 1 at a time - in fact, you don't want to query precisely at all
[03:22:19] <keeger> no, i do need to query for more than 1
[03:22:33] <keeger> the jobs are at second precision, so i could have 10k due at the same second
[03:22:40] <keeger> i'd like to get those 10k back
[03:23:05] <Boomtime> so?
[03:23:30] <keeger> what is faster, pull back 10k in one call, or loop 10k times grabbing 1 at a time?
[03:23:34] <daidoji> keeger: no sorry, wasn't meaning to imply that
[03:23:48] <daidoji> keeger: pull as much back as you can hold in memory
[03:23:57] <keeger> daidoji, that's my thinking too
[03:24:08] <Boomtime> let's start with you defining what you mean by "grabbing one" - let's say you have this job, now what?
[03:24:13] <daidoji> https://www.youtube.com/watch?v=n5BgCnjVQII
[03:24:43] <Boomtime> @daidoji: congratulations you just re-implemented a database, why?
[03:24:44] <daidoji> keeger: the same principles as in this video apply
[03:25:31] <daidoji> Boomtime: because thats what's fastest in the context of his question
[03:25:41] <keeger> daidoji, interesting, am looking at it
[03:25:50] <Boomtime> assuming that he has to do nothing with those results, yes
[03:25:54] <daidoji> keeger: this is part of a series, its pretty good imo
[03:26:14] <Boomtime> i.e. you must assume that the result-set is being piped to /dev/null for that to be the fastest option
[03:26:34] <daidoji> I don't follow
[03:26:54] <Boomtime> you get your 10,000 results back... now what?
[03:27:10] <daidoji> Boomtime: do whatever I want with them?
[03:27:15] <keeger> i fire off a job pool that sends notifications out
[03:27:19] <Boomtime> you now have a single process that is holding all of your jobs to run
[03:27:39] <Boomtime> in other words, you just moved your database of 10,000 items to a new location
[03:27:48] <Boomtime> what did you achieve?
[03:28:01] <keeger> i have code that reads the payload and does an action on it
[03:28:12] <keeger> the important part is When to notify them with the payload
[03:28:24] <Boomtime> so.. you iterate over 10,000 items and farm them out?
[03:28:36] <keeger> yep
[03:28:40] <Boomtime> so you are single-threaded, really?
[03:28:57] <daidoji> Boomtime: I'm afraid I don't follow
[03:28:58] <keeger> no, it's multi threaded
[03:29:07] <daidoji> once those results are in memory I can thread as much as I want with them
[03:29:19] <Boomtime> why don't the available workers just get individual jobs directly from the database?
[03:29:20] <keeger> 1 thread that checks the queue and grabs work, that then farms it to other worker threads
[03:29:21] <daidoji> I can fork()
[03:29:27] <daidoji> I can shared memory those bitches
[03:29:42] <Boomtime> ok, so now you are bound to a single machine
[03:29:52] <daidoji> Boomtime: not particularly...
[03:30:24] <Boomtime> "1 thread that checks the queue and grabs work, that then farms it to other worker threads"
[03:30:33] <Boomtime> this is so perfect... where is "the queue"?
[03:30:42] <keeger> memory.
[03:30:43] <Boomtime> isn't the queue represented by the database?
[03:30:57] <Boomtime> why are you implementing two levels of storage?
[03:31:05] <daidoji> now I think you're just trolling me, so I'm gonna stop now
[03:31:10] <Boomtime> the database has your item queue, sorted and all
[03:31:18] <keeger> i need a queue in a db so if the server crashes, those jobs are not lost
[03:31:42] <Boomtime> why would you retrieve it in bulk then re-implement a queue locally?
[03:31:42] <keeger> the work is done by code, which will then do its thing.
[03:31:54] <keeger> daidoji, i think he's trolling me too lol
[03:32:11] <Boomtime> of course i am trolling you, you are implementing two queues and i can't see why
[03:32:29] <Boomtime> it is a rookie mistake
[03:32:51] <daidoji> Boomtime: I guess we'll just have to agree to disagree
[03:33:40] <Boomtime> fine, you implement your two queues, you re-implement a multi-threaded job distribution system that already exists because you don't know you've already got one
[03:34:07] <Boomtime> you are treating the database like a filesystem, it is better than that
[03:34:30] <Boomtime> you will also suffer from serious concurrency and recovery problems in your method
[03:34:34] <keeger> so wait
[03:34:49] <keeger> ur saying that mongo can notify me when DueDate is reached?
[03:35:14] <Boomtime> no, but there is nothing stopping you from sending workers to sleep when they run out of jobs
[03:35:20] <daidoji> well if we're going to go down this road I'll go back to my original point https://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Night.pdf
[03:35:26] <keeger> lol
[03:35:29] <Boomtime> only one worker needs to keep watch
[03:35:45] <Boomtime> which is exactly what you are planning to get your re-implemented queue to do
[03:36:03] <Boomtime> @daidoji: i agree with that doc
[03:36:07] <keeger> daidoji, i sent u a pm lol
[03:36:20] <Boomtime> this is not very good use of a database, but you may as well use all the features you have
[03:36:46] <Boomtime> there are lots of persistent queue technologies, if you want to research those
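A sketch of the pattern Boomtime is arguing for, where each worker atomically claims the next due job straight from the database instead of from a second, in-memory queue (the claimed/claimedAt fields are hypothetical):

    // atomically take the oldest unclaimed due message and mark it taken,
    // so concurrent workers on any machine never receive the same job
    var job = db.messages.findAndModify({
        query:  { DueDate: { $lte: new Date() }, claimed: { $ne: true } },
        sort:   { DueDate: 1 },
        update: { $set: { claimed: true, claimedAt: new Date() } }
    });
    // findAndModify returns null when nothing is due; the worker can sleep and poll again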
[03:49:37] <ketamisc> list
[04:03:50] <Boomtime> I knew I'd heard of RabbitMQ before somewhere
[04:03:52] <Boomtime> https://blog.serverdensity.com/replacing-rabbitmq-with-mongodb/
[04:11:57] <daidoji> yeah, I'm a ZeroMQ fan myself
[04:30:36] <applecornsyrup> i do an insert and get a document too large error...so i split the document and try again. then i get a duplicate id error.
[04:30:38] <applecornsyrup> http://codepad.org/cPf55KGs
[04:30:51] <applecornsyrup> it seems that the document is still being inserted on the document too large error
[04:31:15] <applecornsyrup> how can i know what happens on an insertion error...and if the document was added anyway?
[04:39:20] <Boomtime> @applecornsyrup: the DocumentTooLarge error was almost certainly raised locally without even talking to the server
[04:39:21] <Boomtime> https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/mongo_client.py#L1072
[04:40:37] <Boomtime> you have two distinct problems, you should check what you are actually inserting, and if you get a duplicate _id then print out what it is and do some debugging
[05:32:47] <applecornsyrup> so the issue is that pymongo modifies the object i was inserting. it adds the "_id" key. so when i copied after the failure, i was copying the inserted id
[06:08:52] <raar> Hey guys. I have a document that looks like: { "_id": { "type": "type-a", "timestamp": { "native": NumberLong(1409961600) }, "feed": "someFeed" }, "values": { "hits": 20 } }. I'm trying to write an aggregation query to: get the sum of all hits, of all documents that have an "_id.feed" in a list (of 20 feed options), and have an "_id.timestamp.native" between now and one week ago
[06:09:17] <raar> Could anyone give me any suggestions on how to do this?
[06:09:30] <raar> or if it's in fact possible to do with the aggregation framework?
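One possible shape for the aggregation raar asks about, assuming _id.timestamp.native holds epoch seconds as the NumberLong in the sample suggests (an untested sketch, with the feed list abbreviated):

    var now = Math.floor(Date.now() / 1000);
    var weekAgo = now - 7 * 24 * 3600;
    db.statistics.aggregate([
        { $match: { "_id.feed": { $in: ["2222", "1111", "2323", "3333" /* ...the ~20 feeds */] },
                    "_id.timestamp.native": { $gte: weekAgo, $lte: now } } },
        { $group: { _id: null, totalHits: { $sum: "$values.hits" } } }
    ])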
[06:21:29] <zamnuts> Is there any way to perform an atomic update on a single document within a sharded collection without the use of the shard key? Any tricks? I'm afraid I'll be stuck w/ 2 queries (non-atomic): collection.findOne and collection.(update|findAndModify)
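A sketch of the two-step fallback zamnuts describes: look up the document's shard key first, then target the atomic single-document update with it (collection and field names are hypothetical):

    // not atomic across the two calls, but the second call is a targeted, atomic update
    var doc = db.items.findOne({ externalId: "abc-123" }, { shardKeyField: 1 });
    db.items.findAndModify({
        query:  { _id: doc._id, shardKeyField: doc.shardKeyField },
        update: { $set: { status: "updated" } }
    });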
[06:33:09] <raar> db.statistics.group({"initial": { "sumvalues_clicks": 0 }, "reduce": function(obj, prev) { prev.sumvalues_clicks = prev.sumvalues_clicks + (obj["values"]["hits"] - 0); }, "cond": { "_id.feed": { "$in": ["2222", "1111", "2323", "1111", "3333"]} } });
[06:33:18] <raar> this doesn't work, it tells me [ { "sumvalues_clicks" : NaN } ]
[06:33:39] <raar> seems to be a problem with obj["values"]["hits"]
[06:33:44] <raar> but I'm not sure what it is
[06:37:38] <zamnuts> raar, you sure all of obj.values.hits is a Number and exists (not sparse)?
[06:38:17] <raar> zamnuts: I'm not sure they all exist.. I thought with - 0 it would be cast to a number in either case
[06:39:07] <raar> how do I make sure where it doesn't exist will be set to zero?
[06:39:12] <zamnuts> raar, NaN - 0 = NaN, so no, do: prev.sumvalues_clicks + (obj['values']['hits']||0)
[06:40:02] <zamnuts> raar, sorry, should be `prev.sumvalues_clicks + Number(obj['values']['hits'])||0`
[06:41:29] <zamnuts> raar, blah and sorry again, you need to wrap that Number(...)||0 in parens () due to operator precedence.
[06:41:46] <raar> zamnuts: got it, that worked like a charm! thank you so much
[06:41:55] <raar> I've been trying to fix this up for over an hour hehe
[06:42:00] <zamnuts> yw ;)
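Putting zamnuts' fix together, raar's group() call with missing or non-numeric hits coerced to zero (the time-window condition is omitted here, as in the original attempt):

    db.statistics.group({
        initial: { sumvalues_clicks: 0 },
        reduce: function(obj, prev) {
            // (Number(...) || 0) guards against documents where values.hits is absent or not a number
            prev.sumvalues_clicks = prev.sumvalues_clicks + (Number(obj["values"]["hits"]) || 0);
        },
        cond: { "_id.feed": { $in: ["2222", "1111", "2323", "1111", "3333"] } }
    });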
[08:17:17] <LinkRage> how do you completely disable ALL mongod logging while using --fork/fork=true ? Any suggestions?
[08:17:57] <LinkRage> I don't want to do ugly things like nohup/dtach/screen/tmux etc.
[08:19:04] <LinkRage> when logpath=/dev/null mongod crashes when there are many requests/connections. This is not the case when logging to a regular file.
[10:15:20] <jordz> Anyone know where I can get MMS support?
[10:17:48] <Derick> jordz: it's part of our normal support offering IIRC
[10:20:47] <jordz> Hey Derick, yeah, there seems to be an issue with the graphs when I flick between the zoom options. The points are all wrong and backtrack
[10:20:55] <jordz> Like a pretty drawing
[10:21:02] <jordz> But not helpful
[10:21:04] <Derick> oh - have a screenshot?
[10:21:22] <jordz> Yeah, 2 ticks. It's fine when you refresh (on initial load of the graphs)
[10:21:59] <jordz> https://twitter.com/jordanisonfire/status/512183841813823488/photo/1
[10:22:34] <jordz> The labelling at the bottom is an issue with D3 which I've seen before
[10:22:46] <Derick> oh, that looks odd. Let me poke the support people
[10:22:46] <jordz> It's the lines I'm more bothered about :P
[10:23:00] <jordz> Thanks :)
[10:30:10] <Derick> not getting any reply...
[10:31:33] <Derick> jordz: http://jira.mongodb.org/MMS should be open for filing tickets
[10:31:54] <Derick> http://jira.mongodb.org/browse/MMS
[10:31:55] <Derick> even
[11:04:10] <Derick> jordz: ping?
[11:34:04] <jordz> Ahh sorry, Derick, yeah I'm going to open a ticket now. :)
[11:34:10] <Derick> np :-)
[11:34:27] <jordz> Thanks for the help :)
[11:34:32] <Derick> np
[11:34:44] <Derick> (that seems to be my only vocabulary)
[11:37:17] <jordz> Haha, np ;)
[11:56:57] <nyssa> hello everyone, a quick question - is there a way to sync an existing replica set in DC1 to a new one in DC2 if they can't talk to each other directly?
[11:58:19] <nyssa> i have a machine that routes all traffic from DC1 to a node in DC2 but the data doesn't seem to get replicated, tried it the other way around and syncing from a target, but no luck - any ideas?
[12:49:12] <cheeser> how are they going to sync afterwards if they can't talk to each other?
[12:49:42] <nyssa> i just need to do the initial sync - we are moving from DC1 to DC2
[12:50:21] <cheeser> http://docs.mongodb.org/manual/tutorial/resync-replica-set-member/#replica-set-resync-by-copying
[12:50:22] <nyssa> so once the move is done all incoming traffic will go straight to DC2 - all I want is the data that is right now on DC1 to be replicated to DC2 without significant downtime
[12:51:24] <cheeser> you still have the potential for dropped data for anything between the time you copy over and reroute traffic, though.
[12:52:23] <nyssa> yeah, the problem is - we don't use snapshots, and turning off the mongod instance would be okay but the database is big, the sync would take way too long to explain to customers and partners
[12:53:43] <nyssa> rerouting traffic is almost instant
[12:55:15] <cheeser> doing the dump, the copy, and bringing up the servers in DC2 takes nonzero time. that window could potentially cover lost data unless you disable your app during this window.
[12:55:57] <nyssa> so there is no way i could sync the second replica set and bring it up to date on the live set in DC1 ?
[12:56:02] <nyssa> i have to use the dump ?
[12:57:28] <cheeser> you can't do that iteratively, no.
[12:58:01] <cheeser> well, if your data is timestamped, i think dump can take a query. then you can just dump the newest stuff and restore only that.
[12:58:13] <cheeser> but that's a lot of moving parts and room for failure.
[12:58:30] <cheeser> you can't just open up DC2 to DC1 and sync normally?
[13:01:06] <nyssa> that would take some serious firewall changes in DC1 and we had a terrible experience with them doing it, I was just wondering if there is a clever way to sync them using a middleman
[13:03:13] <cheeser> you could ssh tunnel if that'd work
[13:19:15] <nyssa> cheeser: thanks, Ill try that
[13:22:26] <Nodex> nyssa : to do initial syncs I just set my mongod as a slave, it syncs fine
[13:22:59] <cheeser> the wrinkle is that direct connections between machines are problematic
[13:23:31] <Nodex> sorry, I didn't see the whole conversation
[14:05:09] <jordz> Anyone run MongoDB on CentOS 7?
[14:12:42] <jordz> I'm getting mongod.service: control process exited, code=exited status=1
[14:37:27] <jordz> So weird...
[14:37:30] <jordz> reinstalled
[14:37:31] <jordz> worked fine
[15:09:43] <ohwhoa> is it ok if in a replica set only one instance is available and it runs as secondary ?
[15:26:11] <giorni> Hi guys. I'm trying to do an incremental update to some subdocuments of my collection, but whenever a defined subdocument doesn't exist in the collection, I want to create it. First of all, is this even possible?
[15:32:42] <EricL> Is there a way to force a sharded collection to rebalance?
[15:33:29] <EricL> I have 8k documents of < 1k each and that collection is read from very heavily. I want Mongo to balance that collection properly and don't want to take the downtime of reloading the collection.
[15:33:43] <EricL> Googling around basically just tells me nothing.
[16:03:08] <hydrajump> hi anyone running mongodb in a docker container? I'm trying to create my own from my own dockerfile but I'm having an issue with permissions even after following mongo docs and a docker doc on mongo
[16:03:58] <hydrajump> This is my dockerfile: https://gist.githubusercontent.com/hydrajump/f1907cc2789dbb136da7/raw/445da9757a09f35d2a5b4c80947545433c102997/gistfile1.txt
[16:05:26] <hydrajump> when I do docker run -p 27017:27017 hydrajump/mongodb all I get is the help documentation...
[16:05:30] <hydrajump> hmm maybe I know
[16:07:06] <hydrajump> nope not sure
[16:07:12] <hydrajump> I'll ask on #docker
[16:10:05] <ttxtt> hi - I'm having trouble searching for documentation on this. Any pointers on best practices for storing DB config and user information in an app without having it hard-coded and checked into source control?
[16:16:52] <kali> ttxtt: configuration is usually something that is provided by an application framework
[16:21:41] <ttxtt> kali: thank you. assuming I have a way to store the information (e.g. environment variables), is it typical to use plaintext database user/password, and would I make a user specifically for a particular app deployment? Or is there some kind of public key strategy maybe?
[16:27:43] <kali> ttxtt: i can't help you much with that, i never used auth in mongo
[16:31:27] <ttxtt> kali: no problem. maybe I'm missing something though - is there a more common way to protect access to DBs, like having them just run on localhost?
[16:32:28] <kali> ttxtt: security at the ip level. firewalling basically.
[16:32:47] <kali> ttxtt: listening on localhost is similar in a single-machine deployment
[16:36:48] <ttxtt> kali: ok i see. I was playing around with heroku+mongolab, but they seem to use auth and public DNS names. would be nice if there was a simple way to spin up a little firewalled cluster
[16:37:58] <kali> ha, yeah, with remote databases, the firewalling approach is irrelevant
[16:38:45] <kali> unless you can use a vpm of some sort
[16:38:48] <kali> vpn
[16:40:01] <cheeser> virtual private mongo
[16:40:41] <Nodex> haha coined it!
[16:54:48] <EricL> Is there a way to force a sharded collection to rebalance? I have 8k documents of < 1k each and that collection is read from very heavily. I want Mongo to balance that collection properly and don't want to take the downtime of reloading the collection. Googling isn't helpful.
[16:56:24] <kali> EricL: a 8MB collection will not shard itself unless you change the chunk size. it's 64MB by default, iirc
[16:57:01] <kali> if you have a read problem, replication might be a better option
[16:58:42] <EricL> I do have a read problem, there are already 2 additional secondaries.
[16:58:58] <EricL> I just want the balance to be better than 97%, 0%, 3%.
[16:59:14] <EricL> I have 3 shards and each shard has 1 primary (obviously) and 2 secondaries.
[16:59:25] <EricL> And I have a single VERY high volume read collection.
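A sketch of kali's point above: a collection of roughly 8MB fits inside a single default 64MB chunk, so the balancer has nothing to move. Lowering the chunk size (a setting stored in the config database, changed via a mongos) is what would allow more chunks to exist and be spread across the shards; whether that actually helps the read load is a separate question, since kali suggests replication is the better fit here.

    // run against a mongos
    var conf = db.getSiblingDB("config");
    conf.settings.save({ _id: "chunksize", value: 1 });   // chunk size in MB (default is 64)
    sh.status();   // shows how chunks end up distributed across the three shards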
[19:24:11] <prbc> How can I push a value to a Schema array only if it doesn't exist?
[19:27:52] <cheeser> $addToSet
[19:28:47] <prbc> cheeser: you're the best
[19:30:48] <prbc> cheeser: however, its not working with an array of ObjectId
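For reference, a minimal $addToSet sketch (collection and field names hypothetical). One common cause of the symptom prbc describes is pushing a string where the array holds real ObjectIds: $addToSet compares whole BSON values, so the string and the ObjectId never match and both end up in the array.

    // adds the id only if an equal ObjectId is not already present in the array
    db.users.update(
        { _id: ObjectId("5419a3c2e4b0c1d2e3f4a5b6") },
        { $addToSet: { followers: ObjectId("5419a3c2e4b0c1d2e3f4a5b7") } }
    );
    // by contrast, $addToSet: { followers: "5419a3c2e4b0c1d2e3f4a5b7" } would be a distinct (string) value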
[19:40:59] <nkconnor> hello
[19:41:13] <nkconnor> having nearly 68% write failure with a new install of Mongo
[19:41:20] <nkconnor> nothing in the logs
[19:41:33] <nkconnor> anybody have a clue
[19:46:36] <bogomips> are you sure your disk and ram are working fine ?
[19:46:44] <nkconnor> 100% sure
[20:18:19] <sellout> Can I do something like emit(null, this[x]) in mapreduce to have Mongo generate a random key, or do I need to create the key myself?
[20:25:16] <cheeser> you'd have to generate it, afaik
[21:58:18] <sellout> cheeser: ObjectId() did it for me.
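Roughly what sellout landed on: generating the key inside the map function, since emit(null, ...) would collapse everything under one null key (a sketch; collection and field names are hypothetical):

    db.things.mapReduce(
        function () { emit(ObjectId(), this.payload); },   // a fresh, effectively random key per document
        function (key, values) { return values[0]; },      // with unique keys the reduce is rarely invoked
        { out: { inline: 1 } }
    );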