PMXBOT Log file Viewer

#mongodb logs for Thursday the 7th of May, 2015

[00:09:56] <dewie01> Anyone have an idea what the error "Failed: restore error: error applying oplog: applyOps: EOF" means?
[00:54:11] <dunkel2> hello
[01:36:21] <dewie01> I'm trying to do a replayOplog but getting some strange results
[01:36:46] <dewie01> mongorestore -d easytoinspect_development -h localhost --oplogReplay --oplogLimit "1430920393:2" t
[01:37:13] <dewie01> it looks ok
[01:37:16] <dewie01> 2015-05-07T03:29:11.365+0200 building a list of collections to restore from t dir
[01:37:16] <dewie01> 2015-05-07T03:29:11.365+0200 don't know what to do with file "t/test", skipping...
[01:37:17] <dewie01> 2015-05-07T03:29:11.366+0200 restoring easytoinspect_development.oplog from file t/oplog.bson
[01:37:17] <dewie01> 2015-05-07T03:29:14.365+0200 [######..................] easytoinspect_development.oplog 745.6 MB/2.6 GB (28.3%)
[01:37:17] <dewie01> 2015-05-07T03:29:17.366+0200 [#############...........] easytoinspect_development.oplog 1.5 GB/2.6 GB (56.9%)
[01:37:19] <dewie01> 2015-05-07T03:29:20.366+0200 [################........] easytoinspect_development.oplog 1.8 GB/2.6 GB (69.6%)
[01:37:22] <dewie01> 2015-05-07T03:29:23.366+0200 [#####################...] easytoinspect_development.oplog 2.3 GB/2.6 GB (87.8%)
[01:37:25] <dewie01> 2015-05-07T03:29:25.182+0200 no indexes to restore
[01:38:01] <dewie01> the dbs = easytoinspect_development 5.951GB
[01:38:12] <dewie01> but i don't seem to get any collections ?
[01:39:41] <dewie01> Anyone have an idea why?
[01:40:11] <dewie01> i made a big mistake overwriting a production db (stupid hosts file)
[01:40:22] <dewie01> so i'm desperate :)
[01:40:27] <joannac> dewie01: pastebin next time
[01:40:34] <dewie01> sorry
[01:42:08] <joannac> you should have a easytoinspect_development.oplog collection, no?
[01:43:02] <joannac> I'm still not sure what you're trying to do
[01:46:25] <dewie01> one moment, will try to find the doc i'm following
[01:53:37] <dewie01> i'm trying this : http://stackoverflow.com/questions/15444920/modify-and-replay-mongodb-oplog
[01:55:12] <joannac> dewie01: okay, so you've already restored a backup?
[01:55:34] <joannac> what's the file t/test ?
[01:55:41] <joannac> why are you specifying a database?
[01:56:39] <dewie01> the file t is rubbish
[02:03:30] <dewie01> if i don't specify a database it does nothing :(
[02:04:18] <dewie01> http://pastebin.com/gfXiCmJx
[02:14:37] <joannac> erm, what
[02:14:54] <joannac> the oplog.bson covers that timestamp?
[02:15:35] <dewie01> will look again
[02:32:34] <laurentide> i am having trouble installing mongodb: http://pastebin.com/Ti0CSPbV
[02:33:07] <dewie01> timestamp is a strange format: 1430920393,"i":2
[02:33:14] <laurentide> this is after apt-get remove mongodb* --purge and trying again
[03:05:00] <dewie01> the oplog is totally complete
[03:05:44] <dewie01> but i'm unable to restore it to the correct database
[03:05:58] <dewie01> it restores in easytoinspect_development.oplog ?
[03:06:02] <dewie01> ahhhh
[03:31:44] <dewie01> anyone have an idea how to solve: error applying oplog: applyOps: EOF
[03:31:54] <dewie01> i tried everything ;(
[03:51:55] <joannac> dewie01: you could answer my earlier question.
[03:52:06] <joannac> dewie01: have you already restored a consistent backup?
[03:52:32] <dewie01> i only have a backup of the requested databases
[03:52:42] <dewie01> sorry the database i want to restore
[03:53:01] <dewie01> the log has all the actions on all the databases
[03:53:15] <dewie01> i tried creating the database by hand, but that doesn't work
[03:54:46] <dewie01> could i do the following
[03:54:57] <dewie01> set time in the past
[03:55:07] <dewie01> before the first transaction in the log
[03:55:13] <dewie01> create the databases
[03:55:25] <dewie01> stop mongo
[03:55:39] <dewie01> set time correct
[03:55:46] <dewie01> try to replay ?
[03:59:01] <joannac> dewie01: where did you restore the backup?
[04:00:45] <dewie01> on a temp standalone mongo server
[04:03:51] <joannac> and you're sure the oplog.bson contains the timestamp you want?
[04:04:43] <dewie01> one moment, will make a paste
[04:13:04] <dewie01> @joannac: http://pastebin.com/kiDqh2c3
[04:13:38] <dewie01> and to be honest haven't had any sleep in 24 hours, so i'm sure of nothing ;(
[04:14:47] <dewie01> loggin backtrace of mongodb server
[04:14:48] <dewie01> http://pastebin.com/gELFRmgZ
[04:21:10] <joannac> huh
[04:21:14] <joannac> looks like corruption?
[04:22:53] <dewie01> when i do a bsondump --pretty i get a lot of: unable to dump document 1887: error converting BSON to extended JSON: conversion of BSON type 'bson.Symbol' not supported premium_50
[04:26:11] <dewie01> the strange thing is:
[04:26:12] <dewie01> http://pastebin.com/MQZNsJJe
[04:26:50] <dewie01> it gets imported fine in oplogR.oplog
[04:40:18] <dewie01> @joannac
[04:40:20] <dewie01> any idea ?
[04:40:53] <dewie01> or do you know anyone with experience who i can hire
[04:41:04] <dewie01> i need this data back :(
[04:41:31] <dewie01> and it seems it is all there in the oplog file
[04:42:37] <joannac> dewie01: https://www.mongodb.com/products/production-support
[04:43:12] <jr3> return mongoose.exec() doesn't seem to like .catch
[04:43:26] <jr3> but if I wrap it in .try it does
[04:43:28] <jr3> o.0
[04:44:00] <dewie01> don't you know any freelancer ?
[04:49:51] <laurentide> does anyone have experience setting up mongodb on a dreamhost vps
[05:12:28] <dewie01> anyone experience in recovering a database with an oplog file ?
[05:12:38] <dewie01> i can't seem to get it imported
[07:38:35] <arussel> is there a way to allow a user to create a collection on a secondary?
[07:40:50] <joannac> arussel: um, no.
[07:41:06] <joannac> arussel: how would that work? what's the use case
[07:42:36] <arussel> we have outside consultants that need to have access to some of the data to do some analysis.
[07:43:11] <arussel> I gave them read access to a secondary, but I would like them to be able to $out from aggregate without impacting the primary
[07:43:26] <joannac> you can't $out on a secondary
[07:43:40] <arussel> basically, I need a sync db on which they can play
[07:43:50] <arussel> is there a way to do this ?
[07:43:58] <joannac> take a snapshot from production and put it on a staging server
[07:44:28] <arussel> was doing it until it reached 250GB, it does get a bit tiresome now
[07:44:47] <joannac> write your own oplog tailer?
[07:45:08] <arussel> joannac: can you elaborate ?
[07:45:09] <joannac> have a hidden secondary that you pull out and restart as standalone
[07:45:56] <joannac> and put it back every once in a while and let it catch up
[07:46:39] <arussel> there is no hack to get the oplog from the primary without 'really' being part of the set?
[07:47:14] <joannac> that's called "write your own oplog tailer"
[07:47:27] <arussel> how do you do that ?
[07:47:46] <joannac> just like replication works by reading the oplog and applying operations, you can write a script/app/whatever to do it yourself
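A minimal sketch of such a tailer in mongo shell JavaScript, assuming a replica set (whose oplog lives in local.oplog.rs) and a placeholder "target.coll" namespace; real code would also need to handle rollbacks and cursor death:

    // Start from the newest oplog entry, then tail forward and apply each op ourselves.
    var local = db.getSiblingDB("local");
    var last = local.oplog.rs.find().sort({$natural: -1}).limit(1).next().ts;
    while (true) {
        var cursor = local.oplog.rs.find({ts: {$gt: last}, ns: "target.coll"})
                                   .addOption(DBQuery.Option.tailable)
                                   .addOption(DBQuery.Option.awaitData);
        while (cursor.hasNext()) {
            var op = cursor.next();
            last = op.ts;
            printjson(op);   // here you would apply op.op ("i"/"u"/"d") to your own copy
        }
    }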
[07:48:31] <dewie01> @joannac Just doing that, ready for first test run :)
[07:48:52] <arussel> joannac: thanks, I'll give it a shot
[08:42:15] <Lujeni> Hello - Is there a critical threshold for the dirty field (mongostat)? Thx
[09:21:50] <zhaoyeming> i have some new fields in my app, is it a common practice to immediately save the doc when i load it from mongodb so that the new fields are there for further app logic?
[10:09:24] <pamp> hi
[10:09:48] <pamp> the values of db.collection.dataSize() is in bits or bytes?
[10:10:22] <pamp> bytes
[10:10:24] <pamp> hehe
[10:51:03] <dewie01> Stupid question how do i query a field in the database as "ts" : Timestamp(1427176819, 1)
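For what it's worth, the mongo shell exposes that type as a Timestamp(seconds, increment) constructor, so a field stored that way can be matched or range-queried directly (the collection name below is a placeholder):

    db.oplog.find({ts: Timestamp(1427176819, 1)})           // exact match
    db.oplog.find({ts: {$gte: Timestamp(1427176819, 1)}})   // everything from that point on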
[11:17:40] <pamp> page_faults : 2851652, is it normal to have a page faults value like this?
[11:17:50] <pamp> what does this mean?
[11:18:15] <pamp> do I not have enough memory?
[11:18:52] <pamp> I've a cluster with two shards, 64 GB RAM each
[11:23:47] <lost_and_unfound> Greetings, can i use a capped collection on gridfs?
[12:14:45] <StephenLynx> hey, is it possible to use map reduce to update documents?
[12:15:28] <StephenLynx> I need to perform an operation on a field that can't be updated with any operators; it's a complex update and I have to update this field on all documents.
[12:19:27] <deathanchor> guessing you are doing a different update on each doc?
[12:19:32] <StephenLynx> yes.
[12:19:42] <deathanchor> you need it to be atomic?
[12:19:46] <StephenLynx> this field is a recursive structure that I have to traverse
[12:19:50] <StephenLynx> nah
[12:20:01] <deathanchor> why not just run a script to do all the updates for you?
[12:20:16] <StephenLynx> I could, that is not the problem, but then I have to perform a query for each document.
[12:20:26] <StephenLynx> I am looking for a way to query only once.
[12:20:33] <deathanchor> findandmodify?
[12:20:53] <StephenLynx> wat
[12:21:07] <StephenLynx> how does that help the issue?
[12:21:10] <deathanchor> or you can do a find({}) and then do an update.
[12:21:21] <StephenLynx> I don't think you are paying attention.
[12:21:42] <StephenLynx> the field I have to update is like this
[12:21:57] <StephenLynx> comments: [{text: "blah", comments: [...]}]
[12:22:17] <StephenLynx> so I want to update the text of all them
[12:22:35] <StephenLynx> I would have to perform 1 query for each document
[12:22:49] <StephenLynx> I could read them all at once, no biggie, the problem is writing them back.
[12:23:32] <deathanchor> so you want to update the nested comment?
[12:23:35] <StephenLynx> yes.
[12:23:52] <westernpixel> I'm not a big mapreduce user, but couldn't you just output the result in the same collection?
[12:24:29] <StephenLynx> I don't know, that is what I am asking. is map reduce able to not only read but also write?
[12:24:50] <StephenLynx> because I know aggregate is read only.
[12:25:04] <westernpixel> yes, the third parameter has a "out" param
[12:25:07] <StephenLynx> thanks
[12:25:07] <deathanchor> into the same collection? I don't think so, but you could do a find({}), iterate over the cursor, traverse the nesting in each doc modifying your instance of it, then do a single update with the new doc
[12:25:08] <westernpixel> see here: http://docs.mongodb.org/manual/core/map-reduce/
[12:25:10] <StephenLynx> will RTFM
[12:25:31] <westernpixel> ah, well if it's not possible on the same collection I don't know
[12:25:35] <StephenLynx> deathanchor for the third time: still n queries.
[12:25:44] <StephenLynx> that is everything I don't want to do.
[12:26:06] <deathanchor> oh you don't want to do updates for each doc?
[12:26:26] <StephenLynx> exactly.
[12:27:02] <deathanchor> what do you think the mapreduce is going to do? magically update the entire collection in a single update?
[12:27:30] <StephenLynx> the dude just told me yes, that is exactly what map reduce might do.
[12:28:56] <StephenLynx> unless he got it wrong and I got it wrong.
[12:29:44] <StephenLynx> "can return the results of a map-reduce operation as a document, or may write the results to collections"
[12:31:25] <westernpixel> I think that what deathanchor is getting at is that even though you don't have to iterate over your results within your code (which is what you want to avoid if I understood you correctly), it will still have to be done within mongodb
[12:31:56] <StephenLynx> I know, but it would be done in a single query.
[12:32:02] <westernpixel> not sure it'd be a lot slower though, no?
[12:32:19] <StephenLynx> I don't mind for two reasons:
[12:32:24] <westernpixel> yup, got it. I'd be interested in knowing if that works if you try it
[12:32:34] <StephenLynx> 1- wouldn't hold up my application's thread
[12:32:56] <StephenLynx> 2- this operation is not meant to be done on user input, but rather it is on a timer
[12:33:41] <StephenLynx> and of course, it wouldn't go back and forth between my application and mongo.
[12:33:53] <StephenLynx> mongo can take its sweet, sweet time.
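A rough sketch of the mapReduce-with-out approach being discussed, assuming documents shaped like StephenLynx's nested-comments example (the "posts" collection name and the text transform are placeholders); note that map-reduce output documents always have the form {_id, value}, so writing back into the source collection changes the document shape:

    db.posts.mapReduce(
        function () {
            var rewrite = function (comments) {
                comments.forEach(function (c) {
                    c.text = c.text.toUpperCase();              // placeholder transform
                    if (c.comments) { rewrite(c.comments); }    // recurse into nested comments
                });
            };
            rewrite(this.comments);
            emit(this._id, this);                               // one emit per document
        },
        function (key, values) { return values[0]; },           // keys are unique, so pass through
        {out: {replace: "posts_rewritten"}}                     // or {merge: ...}, per the docs link above
    );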
[12:40:31] <salty-horse> hey, GothAlice. here?
[12:42:34] <StephenLynx> the api docs for the node driver are much better, imo
[12:42:39] <StephenLynx> on 2.0
[12:50:06] <salty-horse> can anyone help point me at the commit for https://jira.mongodb.org/browse/SERVER-12783 ??? None of the commit messages include "12783" and I couldn't find it in the commits between 2.6.0-rc1 and 2.6.0-rc2
[13:43:51] <greyTEO> GothAlice, I found this awesome job queue...have you heard of it? ;)
[13:43:52] <greyTEO> https://github.com/marrow/task
[13:44:02] <GothAlice> ;)
[13:44:15] <GothAlice> I've got some of my worker bees fixing that package up at the moment.
[13:44:47] <greyTEO> I am thinking of implementing it. PHP-AMQP library is.....slow. Especially not allowing connection pooling.
[13:45:06] <greyTEO> almost 50-100ms just to establish a connection and send a message..
[13:45:57] <GothAlice> The main reason I wrote the original (https://gist.github.com/amcgregor/4207375) was to avoid adding additional service dependencies to an app already using MongoDB. We didn't want the extra sysadmin and infrastructure overhead.
[13:46:27] <greyTEO> I have a question, does it process queues immediately or does it take a polling approach? I know there are delayed task but that is different.
[13:46:31] <GothAlice> (And benchmarked 4 years ago at 1.9 million DRPC calls per second… so, it should be performant.)
[13:46:40] <GothAlice> It's push.
[13:46:48] <GothAlice> I.e. the workers know the moment a task is added to the queue.
[13:47:10] <greyTEO> Yea I am seeing that now with other overhead.
[13:47:49] <greyTEO> And the task queue would be mongo?
[13:48:02] <GothAlice> The approach I took was two-fold, to better utilize sharding.
[13:48:41] <GothAlice> a) The task data (function to call, arguments, return value, exception, and state information) is preserved in a normal collection (sharding-capable), and the messages about those tasks are sent over a capped collection (ring buffer queue).
[13:48:49] <GothAlice> Well, that was both a) and b) there. XD
[13:49:35] <GothAlice> Capped collections can't be sharded, so information injected into the queue is kept extremely minimal.
[13:49:50] <GothAlice> (And can be fully rebuilt from the real collection if needed.)
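A rough shell sketch of that two-collection layout; the collection and field names here are illustrative, not the ones the gist actually uses:

    db.createCollection("task_queue", {capped: true, size: 16 * 1024 * 1024});  // ring-buffer message queue
    var taskId = ObjectId();
    db.tasks.insert({_id: taskId, fn: "mypackage.mymodule:somefunction",
                     args: [], state: "pending"});                              // full task record (shardable)
    db.task_queue.insert({task: taskId});                                       // minimal pointer pushed onto the queue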
[13:51:58] <greyTEO> Nice. Well thought out design
[13:52:31] <greyTEO> now the only problem is it's in python...... :|
[13:53:13] <greyTEO> I need it for very basic task right now...send emails, saving events, etc.
[13:53:19] <GothAlice> Check the gist I linked; the methodology is provided in effectively one file. (See the link at the bottom of the presentation slides for the code.) The slides summarize the data structures, the code illustrates what types of atomic updates you need for locking and friends.
[13:53:48] <greyTEO> I think this will be lighter and faster than rabbitmq overhead. and will free up a server from our cluster
[13:53:50] <GothAlice> (Ignore the scheduled task stuff, though. In the original, that was a sub-optimal solution.)
[13:54:19] <GothAlice> Replacing rabbitmq and zeromq (yes; the project was using both when I joined…) was what we did with this. :)
[13:55:05] <greyTEO> https://github.com/marrow/task/blob/develop/example/basic.py
[13:55:37] <greyTEO> I have been reviewing that file. Reminds me of ruby
[13:55:40] <jr3> does the mongoose.exec() promise structure differ from func(err,result)?
[13:55:41] <GothAlice> That's the "core" API in V2 of the runner.
[13:56:34] <greyTEO> Should I use the gist you linked instead?
[13:56:47] <GothAlice> greyTEO: Because functions are first-class, decorators (@something) are instructions to pass the function being defined to the @something function (@task in my case) and save the return value of the decorator as if it were the function being defined. It lets you write wrappers or whole replacements for functions quite easily.
[13:57:05] <GothAlice> greyTEO: I'd stick to the gist.
[13:57:31] <GothAlice> (The code in the gist at least works. marrow.task at the moment is non-functional. ;)
[13:59:13] <greyTEO> lol i see the build tags are down now.
[13:59:20] <GothAlice> Yup.
[13:59:21] <greyTEO> Ill stick to the gist.
[13:59:35] <GothAlice> (As I said, I've got some worker bees fixing up marrow.task at the moment.)
[13:59:52] <greyTEO> good stuff. Ill chceck back.
[14:00:07] <greyTEO> i am not familiar with pip, but does that pull from master?
[14:00:15] <greyTEO> that should be working still correct?
[14:00:44] <GothAlice> "pip" only installs officially released packages unless you explicitly give it a URL in the form scm+<scm-url> (i.e. git+https://github.com/marrow/task.git)
[14:01:11] <GothAlice> (Marrow Task has no official releases yet.)
[14:01:12] <JDiPierro> In a sharded cluster is there a way to see something like the rate at which chunks are being split?
[14:01:54] <greyTEO> ok got it. thanks
[14:03:08] <jmeister> hey, we're running on a sharded environment (shard key is hashed(_id), _id is bsonId) - what is the best way to read through all the documents in a certain collection without impairing the quality of service of the DB?
[14:03:10] <GothAlice> JDiPierro: There may be something in the documentation: http://docs.mongodb.org/manual/core/sharding-chunk-migration/ i.e. "serverStatus" might report chunk transfer information.
[14:03:30] <JDiPierro> Will peek there, thanks GothAlice
[14:19:04] <Zelest> because using a single client is too mainstream
[14:22:18] <greyTEO> GothAlice, would it be possible to just insert into the collection directly and bypass the worker?
[14:22:20] <greyTEO> https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L164
[14:23:09] <GothAlice> greyTEO: Indeed, yes. The "Worker" is just a concrete API that orchestrates everything.
[14:23:54] <GothAlice> greyTEO: https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L224-L244 < this "adds a job", and it's basically just an insert and a push to the queue, with some error handling to kill it if the queue couldn't be contacted.
[14:24:03] <greyTEO> Ok I didnt know if it needed to be called with the worker class as it is in the sample
[14:24:04] <greyTEO> https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L505
[14:24:31] <greyTEO> Ok perfect
[14:24:42] <GothAlice> The examples use the API; anything worth doing twice is worth automating, no? ;) (Worker is common between the consumer and producer examples.)
[14:24:52] <greyTEO> that will abstract the php -> python and should work for my case
[14:25:02] <GothAlice> Well.
[14:25:16] <jmeister> anyone have any idea? How to read an entire collection on a live production DB without harming (too much) the quality of service?
[14:25:57] <GothAlice> While PHP isn't very good for anything involving threading (i.e. reimplementing the "worker" in PHP would be a very interesting challenge!) the Python worker expects to be running Python code as distributed tasks.
[14:26:05] <greyTEO> meaning I can just write python and submit jobs with php. As long as I know the callable....i think
[14:26:23] <greyTEO> yea php would be a terrible idea
[14:27:05] <GothAlice> jmeister: Rate limit your consumption of the cursor. I.e. add a sleep(0.1) at the end of the loop. You'll probably need to also handle "retrying" (i.e. sort by _id, track the "last" _id seen, if the cursor dies re-try _id: {$gt: last_id})
[14:27:19] <greyTEO> php->mongo->python task (might be a better representation)
[14:27:32] <GothAlice> greyTEO: You are correct.
[14:27:59] <GothAlice> greyTEO: There is one big note about the difference between this gist's version 1 and marrow.task (version 2). Version 2 doesn't use Python pickle encoding anywhere; only MongoDB native BSON types.
[14:28:18] <greyTEO> I was looking at that.
[14:28:34] <greyTEO> is pickle used for reflection? I am not familiar with it
[14:29:00] <greyTEO> oh...serializer. nm
[14:29:26] <GothAlice> Replace https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L222 with: https://github.com/marrow/package#3-getting-object-references and https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L433 with https://github.com/marrow/package#41-resolving-import-references
[14:29:54] <GothAlice> This will let your PHP submit tasks with "callables" that look just like a fancy Python import path. I.e. "mypackage.mymodule:somefunction"
[14:30:33] <jmeister> GothAlice: and doing $gt / sort on a sharded environment won't cause problems? since it's non-inclusive I believe it will cause scatter gather queries, right?
[14:30:39] <jmeister> and thanks btw
[14:31:25] <GothAlice> jmeister: Alas, to support retrying (which is especially important a) if you are processing a _lot_ of records, and b) are making them process slower on purpose) you need to sort on something that will return the records in a consistent order. _id isn't the only option; any other index you have that is non-sparse should work.
[14:31:53] <GothAlice> Not just sort, but also track the last seen indexed value, to allow retrying.
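In shell terms, the rate-limited, resumable scan GothAlice describes might look roughly like this (the collection name and sleep interval are illustrative):

    var last = MinKey;                     // sorts before every _id value
    var done = false;
    while (!done) {
        try {
            var cursor = db.things.find({_id: {$gt: last}}).sort({_id: 1});
            while (cursor.hasNext()) {
                var doc = cursor.next();
                last = doc._id;
                // ... process doc ...
                sleep(100);                // throttle to roughly 10 docs/second
            }
            done = true;                   // reached the end without the cursor dying
        } catch (e) {
            print("cursor died, resuming after " + tojson(last));
        }
    }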
[14:53:26] <jmeister> GothAlice: Thanks :)
[14:54:11] <koren> Hello! Question: I recently switched my DB engine to WiredTiger, a query on indexes subdocument array of objects takes now seconds to complete (was fast before engine change). Does the engine needs some time to rebuild the indexes or something? I switched 24h ago and now its already a bit faster for the same query
[14:55:26] <koren> -indexes +indexed
[14:58:48] <GothAlice> koren: Is there some overriding reason why you are switching to WT?
[14:59:12] <GothAlice> koren: The reason I ask is that it has a number of outstanding severe issues including data loss and excessive memory consumption (to the point of invoking the oom-killer).
[15:00:41] <GothAlice> https://jira.mongodb.org/browse/SERVER-17424 https://jira.mongodb.org/browse/SERVER-17456 https://jira.mongodb.org/browse/SERVER-16311 (See also: https://jira.mongodb.org/browse/SERVER-17386)
[15:15:22] <greyTEO> GothAlice, at a quick glance, does this seem like it should work?
[15:15:23] <greyTEO> https://gist.github.com/dmregister/f5f301740060cc096cd1
[15:16:08] <GothAlice> greyTEO: Why the newline in the "callable"? Shouldn't that be: mailer:sendEmail?
[15:16:17] <GothAlice> (Import the "sendEmail" function from the "mailer" module.)
[15:16:49] <GothAlice> Also notably you aren't passing that function any positional or keyword arguments, so it probably wouldn't do much with that execution spec. ;)
[15:17:34] <greyTEO> yea i am trying to connect to pieces right now. Have hard coded values at the moment
[15:18:23] <GothAlice> BTW, if you're doing this initially to defer mail delivery, I can help with that on the Python side, too: https://github.com/marrow/marrow.mailer#3-basic-usage ;)
[15:18:33] <greyTEO> I guess I need to do the replacements you specified earlier for the syntax?
[15:18:45] <GothAlice> Sorry, I can't parse that question.
[15:18:53] <greyTEO> lawlz
[15:19:07] <greyTEO> Replace https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L222 with: https://github.com/marrow/package#3-getting-object-references and https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L433 with https://github.com/marrow/package#41-resolving-import-references
[15:19:18] <greyTEO> from earlier thread
[15:19:29] <greyTEO> This will let your PHP submit tasks with "callables" that look just like a fancy Python import path. I.e. "mypackage.mymodule:somefunction"
[15:19:29] <GothAlice> Ah! Yes.
[15:19:50] <GothAlice> marrow.package is what marrow.task is/will be using to save and load references to functions.
[15:20:04] <GothAlice> (That one is released and fully tested; pip install marrow.package)
[15:20:58] <GothAlice> greyTEO: However, I'd recommend moving Python-side assistance to ##python-friendly to keep this channel MongoDB-specific. :)
[15:21:13] <greyTEO> Will do
[15:58:54] <StephenLynx> what is a good way to estimate how many simultaneous connections I should have given the hardware I am using?
[15:59:46] <GothAlice> It's less hardware-dependent than client-dependent. MongoClient typically pools connections, so it then depends on how many threads your application uses, and how many instances of your application are running, + any server-to-server connections for replication or sharding.
[16:00:13] <GothAlice> For my own application deployments this typically means around 100 connections per application server.
[16:00:50] <StephenLynx> I believe I will have to change my designs, currently I open a connection per thread and have a thread per core.
[16:01:08] <StephenLynx> under heavy loads I can imagine that so few connections would cause a bottleneck.
[16:01:19] <GothAlice> It certainly would.
[16:01:46] <GothAlice> I run N threads per core. Depending on the size of the app that'll be 10 to 25 threads each.
[16:02:20] <GothAlice> (The threads spend most of their time waiting on IO, so there's little contention.)
[16:02:32] <StephenLynx> I will not change how many threads I use because of the runtime environment design, but I will change how each thread handles the db.
[16:02:47] <julianlam> hi all -- we have a large document that stores what is essentially a hash table (email to uid mapping). We're running into the mongodb document size limit, but we are not sure how to proceed with refactoring the data.
[16:03:01] <GothAlice> julianlam: Refactor the data.
[16:03:02] <GothAlice> :P
[16:03:13] <julianlam> :(
[16:03:27] <StephenLynx> use the email as a unique index
[16:03:40] <GothAlice> {"a@example.com": UUID(…), "b@example.com": UUID(…), …} ever growing is unmaintainable, unindexable, etc., etc.
[16:03:41] <StephenLynx> and store the uid in a field of the document.
[16:03:45] <GothAlice> StephenLynx: +1
[16:03:46] <julianlam> GothAlice, ironically, when I google stuff about max document size in Mongo, I get your name helping someone else!
[16:03:54] <GothAlice> julianlam: Really? Neat.
[16:04:35] <GothAlice> That was likely surrounding forums embedding replies to a topic in the topic, and using a "continuation" marker when the document becomes full to point at a new document to continue working on.
[16:04:47] <StephenLynx> I got an irc log that mentions you
[16:04:54] <julianlam> Yes, that was the one...
[16:05:07] <StephenLynx> http://www.corecompute.com/mongodb/mongodb_20150321.html
[16:05:11] <GothAlice> That's in the form: {forum: ObjectId(…), replies: [{author: ObjectId(…), message: "…"}, …]}
[16:06:01] <GothAlice> StephenLynx: Wow, that's a hideous IRC log. http://irclogger.com/.mongodb/2015-03-21 is much easier to read. ;)
[16:06:51] <GothAlice> Unfortunately the "ever growing dynamic field names" approach is, well, it's a trap. It seems like it might be a good idea for some things, but it's really not.
[16:07:19] <GothAlice> In your case (a simple mapping) you can use the e-mail address as the _id. {_id: "a@example.com", uid: UUID(…)}
[16:07:40] <GothAlice> Then it's automatically indexed, guaranteed to be unique, and easily queryable.
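That is, one small document per mapping instead of one giant document (the collection name below is illustrative):

    db.email_uid.insert({_id: "a@example.com", uid: UUID("0123456789abcdef0123456789abcdef")});
    db.email_uid.find({_id: "a@example.com"});   // point lookup on the built-in _id index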
[16:09:26] <StephenLynx> wouldn't he lose stuff like document expiration and other things that depend on how the default _id is set?
[16:10:46] <GothAlice> "Document expiration"?
[16:10:57] <GothAlice> And "other things"?
[16:12:04] <GothAlice> Time-To-Live (TTL) Indexes operate on date fields, not _id.
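For example, a TTL index that expires documents one hour after a "createdAt" date field (collection and field names are illustrative):

    db.sessions.createIndex({createdAt: 1}, {expireAfterSeconds: 3600});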
[16:12:37] <StephenLynx> ah
[16:15:27] <julianlam> ok, thanks GothAlice :)
[16:15:30] <julianlam> and StephenLynx
[16:15:51] <GothAlice> It never hurts to help.
[16:16:01] <julianlam> indexing on email is more difficult because of some third-party constraints
[16:16:12] <julianlam> but splitting the hash table into one document each would work
[16:16:34] <julianlam> so instead of 600k k/v in the document, we'd just have 600k document, which sounds pretty scary.
[16:16:40] <julianlam> documents*
[16:23:45] <GothAlice> julianlam: Potentially scary, but indexed (so fast to search) and easily dividable across different mongo servers using sharding.
[16:24:20] <GothAlice> (I.e. you can divide up the records in half by having two shards and using a hashed shard key on _id to instruct MongoDB to spread the data evenly across all shards.)
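In the shell that looks roughly like this (database and collection names are placeholders):

    sh.enableSharding("mydb");
    sh.shardCollection("mydb.email_uid", {_id: "hashed"});   // spread documents evenly by hashed _id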
[16:30:19] <StephenLynx> yeah, having many documents is not an issue at all.
[16:32:15] <julianlam> awesome, thanks guys!
[16:32:26] <julianlam> ... what if we end up having trillions of documents ;)
[16:36:32] <GothAlice> julianlam: Throw more shards at it.
[16:36:41] <GothAlice> Really, scaling "number of documents" can be that easy. ;)
[16:38:19] <StephenLynx> afaik mongo was pretty much designed for that kind of scenario
[16:38:26] <StephenLynx> having a fuckton of entries.
[16:38:43] <StephenLynx> instead of large and complex entries
[16:39:00] <julianlam> excellent...
[17:12:57] <shlant> can I start all replica members using a keyfile and then add users? or do I have to add users, stop instance, start with keyfile?
[17:13:47] <dewie01> @joannac just wanted to let you know I was able to restore the entire database through a handmade ruby script
[17:14:07] <dewie01> @joannac thanks for your help
[17:21:32] <GothAlice> shlant: You must enable authentication (provide a keyfile) and then use the localhost exception to populate the first admin (root role) user. After that, authentication is fully required.
[17:21:53] <GothAlice> shlant: Ref: http://docs.mongodb.org/manual/tutorial/enable-authentication/
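A minimal sketch of that first step, run over the localhost exception (username and password are placeholders):

    use admin
    db.createUser({user: "admin", pwd: "changeme",
                   roles: [{role: "root", db: "admin"}]});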
[17:37:51] <shlant> GothAlice: awesome, thanks
[17:56:54] <shlant> GothAlice: and do backup and restore roles have access to all db's?
[17:57:35] <GothAlice> shlant: Roles are typically tied to a database, i.e. the database you create the user in. Certain roles include "all" in their name; these give wider access.
[17:59:10] <shlant> GothAlice: and I assume I can't add a user that has access to a db before that db exists? even if I know the db name ahead of time?
[18:01:31] <StephenLynx> the db is created in that case.
[18:01:46] <StephenLynx> everything that doesn't exist when you try to use is created dynamically.
[18:01:50] <shlant> hmmm
[18:01:52] <shlant> alright
[18:01:57] <StephenLynx> that's why there isn't a "create" for dbs, collections
[18:03:01] <shlant> so if I want a user for MMS that applies to all db's would readWriteAnyDatabase be an appropriate role?
[18:03:22] <GothAlice> Yes, though MMS manages users, too.
[18:58:20] <renlo> how do you share a 'user' between two different databases?
[18:58:28] <renlo> ie the owner
[19:04:43] <GothAlice> renlo: Give the user the needed permissions on two DBs, then on the second when authenticating use --authenticationDatabase for CLI tools (or equivalent URI for client drivers).
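For instance, a single user granted roles on two databases and then authenticated against the database it was created in (all names here are placeholders):

    use admin
    db.createUser({user: "appuser", pwd: "changeme",
                   roles: [{role: "readWrite", db: "db_one"},
                           {role: "readWrite", db: "db_two"}]});
    // then from the CLI:
    //   mongo db_two -u appuser -p changeme --authenticationDatabase admin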
[19:05:27] <renlo> ah cool
[19:05:29] <renlo> thanks
[19:08:29] <rusty78> Hey guys
[19:08:44] <rusty78> I'm writing a messenger system with node and mongo, however when I'm trying to save collections
[19:08:59] <rusty78> mongo seems like it reaches a point where it stops saving my messages
[19:09:10] <rusty78> after about 100 or so, anyone know a reason this may be?
[19:10:37] <StephenLynx> that is weird.
[19:10:49] <rusty78> yeah it seems to save perfectly
[19:10:53] <StephenLynx> can I see the code?
[19:11:00] <rusty78> but then it slows down after awhile and then stops recording
[19:11:01] <rusty78> sure
[19:11:10] <rusty78> lemme plnkr something real fast
[19:11:32] <StephenLynx> are you checking for errors?
[19:11:40] <rusty78> yes
[19:12:29] <StephenLynx> and you don't get an error? it just doesn't save?
[19:12:34] <Gevox> does it stop after exactly the same number of records every time?
[19:13:44] <rusty78> I believe it's differing
[19:13:46] <StephenLynx> and btw, since you are using node, I suggest checking out io.js. Is a fork that has surpassed node.js in every single aspect, including performance.
[19:13:47] <rusty78> right now it seems about 132
[19:13:58] <rusty78> io.js is the open source version right?
[19:14:20] <rusty78> 132 documents*
[19:14:22] <rusty78> https://github.com/DaftMonk/mean-chat
[19:14:24] <cheeser> oh, god. not this again.
[19:14:32] <rusty78> http://plnkr.co/edit/ow5OJhIr2g5swp7MMVtu?p=catalogue
[19:15:00] <rusty78> Right now it's just a fork of mean-chat with chat.js replaced with my mongo file
[19:15:10] <rusty78> and the index page edited one sec I'll throw that up there as well
[19:15:20] <rusty78> should be only revisions so far, was just migrating it to mongo
[19:15:35] <StephenLynx> what is that weird stuff?
[19:15:54] <rusty78> Heh sorry that's
[19:16:03] <rusty78> where I'm saving the documents
[19:16:08] <rusty78> and where it works perfectly
[19:16:13] <rusty78> until I hit that point where it stops saving
[19:16:19] <StephenLynx> ah, there are files in the left
[19:16:48] <StephenLynx> at line 73
[19:16:55] <StephenLynx> you proceed regardless of errors.
[19:17:21] <StephenLynx> and where is the code that writes to mongo?
[19:17:27] <StephenLynx> ah
[19:17:28] <rusty78> It should at least output if I have any right?
[19:17:29] <StephenLynx> mongoose.
[19:17:31] <rusty78> mongoose
[19:17:31] <rusty78> yes
[19:17:33] <rusty78> sorry
[19:17:34] <rusty78> :/
[19:17:39] <rusty78> I should have made that clear earlier
[19:17:40] <StephenLynx> no, I am sorry for you.
[19:17:48] <StephenLynx> mongoose is very
[19:17:50] <StephenLynx> what is the work?
[19:17:58] <StephenLynx> notorious for giving issues.
[19:18:01] <rusty78> Should I just try redis then?
[19:18:02] <StephenLynx> word*
[19:18:08] <StephenLynx> no, just use the driver.
[19:18:18] <rusty78> driver? (sorry not familiar)
[19:18:20] <StephenLynx> it is officially supported by 10gen.
[19:18:28] <StephenLynx> mongodb module.
[19:18:41] <StephenLynx> https://www.npmjs.com/package/mongodb
[19:18:46] <StephenLynx> works perfectly.
[19:18:53] <StephenLynx> and is very well documented.
[19:19:02] <StephenLynx> 10/10 would endorse again.
[19:19:08] <rusty78> ok so mongodb driver instead of mongoose
[19:19:19] <StephenLynx> mongoose just adds an abstraction layer on top of the driver.
[19:19:28] <StephenLynx> and does really weird shit.
[19:19:39] <StephenLynx> and personally I wouldn't use express either.
[19:20:00] <rusty78> Alright so to get this straight
[19:20:08] <rusty78> use the driver from the npm module instead of using mongoose
[19:20:11] <StephenLynx> yes.
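A minimal sketch with the official Node.js driver (2.0-era API), no mongoose; the connection URL, collection, and fields are made up for illustration:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/chat', function (err, db) {
        if (err) { throw err; }
        db.collection('messages').insert(
            {from: 'someuser', text: 'hello', at: new Date()},
            function (err, result) {
                if (err) { console.error(err); }   // always check write errors
                db.close();
            });
    });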
[19:20:45] <rusty78> Ok great, and instead of express, I'm rather new with node but what would you recommend
[19:20:57] <StephenLynx> nothing.
[19:21:07] <StephenLynx> I avoid any web frameworks like the plague.
[19:21:11] <rusty78> Well I'd rather not build all the functions from scratch
[19:21:16] <StephenLynx> core modules.
[19:21:22] <StephenLynx> you don't build it from scratch.
[19:21:28] <StephenLynx> have a try.
[19:21:40] <StephenLynx> prejudice is your worst enemy.
[19:21:41] <rusty78> Alright I'll take your word and I'll experiment
[19:21:43] <rusty78> fair enough
[19:21:51] <rusty78> last question
[19:21:53] <rusty78> for messaging app
[19:22:03] <StephenLynx> ah, I wrote one of these once.
[19:22:07] <rusty78> should I use mongo or redis for storing these messages?
[19:22:08] <StephenLynx> used the websockets module
[19:22:15] <rusty78> I like my socket.io :p
[19:22:17] <StephenLynx> I don't like how socket.io does too much stuck.
[19:22:19] <StephenLynx> stuff*
[19:22:22] <StephenLynx> and fallbacks
[19:22:30] <StephenLynx> and requires specific fe code.
[19:22:41] <StephenLynx> I can't see why not using mongo.
[19:23:03] <rusty78> For speed I heard Redis is faster than mongo for applications like messaging
[19:23:06] <rusty78> but then again I have no idea
[19:23:14] <rusty78> my prejudice is strong
[19:23:29] <rusty78> save me from my prejudice StephenLynx
[19:23:43] <StephenLynx> I don't know how good redis is, but I know mongo focuses on performance and high data loads.
[19:23:51] <StephenLynx> I am pretty sure mongo will fulfill your needs.
[19:23:58] <StephenLynx> I do know redis is used on imgur.
[19:24:08] <StephenLynx> but they have to handle media, not text.
[19:24:21] <StephenLynx> so they have a much higher load than any messaging application ever.
[19:24:39] <StephenLynx> maybe redis is more suited for these REALLY gigantic loads.
[19:25:01] <StephenLynx> but I don't know how well redis works with documents.
[19:25:23] <StephenLynx> I suggest you take a look at it, but from what I know, I don't believe your problem fits redis better than mongo.
[19:25:42] <rusty78> Alright I'll take the advice thanks
[19:25:53] <StephenLynx> mongo has this strong point, you save documents, not raw data.
[19:26:10] <StephenLynx> so its really easy to index, project and match
[19:26:46] <StephenLynx> if you want to see what a project without a framework looks like
[19:26:50] <StephenLynx> let me link my main project
[19:26:57] <rusty78> awesome, thank you
[19:26:58] <StephenLynx> gitlab.com/mrseth/bck_lynxhub
[19:27:36] <StephenLynx> the spec alone has over a thousand lines and the interface has 46 pages, so it's a pretty robust project.
[19:27:45] <StephenLynx> all using just mongo driver.
[19:27:59] <StephenLynx> and a few other modules for specific operations, like cryptography and email
[19:28:27] <rusty78> wow, looking through it
[19:28:30] <rusty78> should be a great reference
[19:28:34] <rusty78> ^^ thanks so much
[19:31:19] <StephenLynx> np
[21:35:03] <Glitches> I'm new to mongodb and I have some questions regarding it if you don't mind me asking
[21:36:09] <Glitches> Let's say I want to make a database... I need to make a collection
[21:36:21] <Glitches> do I do that just by inserting JSON into it?
[21:36:35] <Glitches> and do I have to make my whole database by hand in JSON?
[21:37:04] <Glitches> that's all I'm seeing from tutorials
[21:37:16] <ehershey> more or less
[21:37:22] <Glitches> ehershey: oh reaally?
[21:37:35] <ehershey> but it's more that you create the data with json
[21:37:48] <ehershey> and not by hand but with application code building it and using a driver to send it into the db
[21:38:35] <Glitches> ehershey: seeing as I'm just making an independent database as-is for education purposes, does that mean I have to do it all by hand?
[21:38:54] <Gevox> Glitches: I'm a beginner like you; this is a very fast way to learn about mongodb and get your head transformed from regular mysql to mongodb: https://www.youtube.com/results?search_query=mongodb+for+dbas+
[21:39:24] <ehershey> you can do a lot by hand but you don't have to do almost any of it by hand
[21:39:28] <Gevox> Watch at least the week #1 videos and you will have a good vision of what this is all about; later you will be able to just google for what you are looking for
[21:39:47] <Glitches> Gevox: sweet, that's exactly what I needed
[21:39:56] <GothAlice> Glitches: There's also http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html which is a summary of things if you are coming from a more traditional database background.
[21:39:57] <Glitches> this is all really foreign to me atm...
[21:40:45] <GothAlice> Glitches: We're here to help. :)
[21:41:00] <Gevox> Glitches: Don't tire your head out with all these questions; watch these videos (it's a course provided by the mongo developers, and a very short one). If you are doing things in java I have already done a full app that does the CRUD; you can take its source and look up what you need as a reference too :)
[21:41:45] <Glitches> Gevox: I'm just designing an independent db, but thanks
[21:58:45] <shlant> anyone know why I get "Unauthorized not authorized on admin to execute command { replSetHeartbeat:" when using a keyfile? I am definitely using the same one for all members
[21:59:21] <GothAlice> shlant: A keyfile isn't enough. You need to add a user, too.
[22:00:57] <shlant> GothAlice: oh. I had thought that members used the keyfile for communication
[22:01:09] <StephenLynx> funny thing, when I enabled auth on a server, I didn't set up a keyfile or anything.
[22:01:22] <GothAlice> They do. But after enabling auth, you _must_ set up that first admin user.
[22:01:43] <GothAlice> StephenLynx: It's for enabling auth in a replica set.
[22:01:49] <StephenLynx> aah
[22:01:52] <GothAlice> Replicas use the key, not usernames/passwords, to communicate.
[22:01:53] <StephenLynx> k
[22:02:06] <shlant> ah ok
[22:17:56] <shlant> GothAlice: alright, so the only way to use keyfiles is to start one instance without auth, create a user, stop the instance, start all replica members?
[22:19:19] <shlant> cause if so, that throws a serious wrench in my automation plans...
[22:19:55] <GothAlice> https://gist.github.com/amcgregor/c33da0d76350f7018875#file-cluster-sh-L126-L148 < my own code which automates testing of things like this does the following: 1. generate a key. 2. Start enough mongod's for a replica set. (In this case, two replica sets.) 3. replSetInitiate the sets. 4. Add users.
[22:20:46] <GothAlice> You should certainly be able to add the user *after* initializing the replica set. If things are dying prior to this, extra-super-double-check your keyfiles are identical across the servers, and all are configured with the same replSet name.
[22:21:34] <shlant> alright, I will try that. Thanks again
[22:21:47] <GothAlice> (md5sum the keyfiles if you have to ;)
[22:22:16] <GothAlice> Oh, also check permissions. The keyfiles must be "mode 600".
[22:22:27] <GothAlice> (Only readable by the MongoDB process/user, no-one else.)
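As a rough outline of the sequence that script automates (paths, ports, and names below are illustrative, not the gist's actual values):

    openssl rand -base64 741 > mongo-keyfile && chmod 600 mongo-keyfile
    mongod --replSet rs0 --keyFile mongo-keyfile --port 27017 --dbpath /data/rs0-0 --fork --logpath /data/rs0-0.log
    mongod --replSet rs0 --keyFile mongo-keyfile --port 27018 --dbpath /data/rs0-1 --fork --logpath /data/rs0-1.log
    mongo --port 27017 --eval 'rs.initiate()'
    # wait for a primary to be elected, then use the localhost exception:
    mongo --port 27017 admin --eval 'db.createUser({user: "admin", pwd: "changeme", roles: ["root"]})'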
[23:28:28] <Glitches> How do I ensure an index on a field in a subdocument?
[23:28:55] <Glitches> db.students.ensureIndex({"studentID": 1})
[23:29:04] <Glitches> I did that
[23:29:31] <Glitches> but now I'm trying to index "courseID" which is inside the course document which is inside the student collection
[23:30:04] <Zelest> "course.courseID"
[23:30:17] <Glitches> Zelest: thanks
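That is, dotted-path notation in the same ensureIndex call (this works whether "course" is a single subdocument or an array of them):

    db.students.ensureIndex({"course.courseID": 1})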
[23:37:58] <Glitches> How do I limit my find statement to 2?
[23:40:18] <cheeser> db.coll.find().limit(2)
[23:40:25] <Glitches> cheeser: sweet!
[23:40:43] <cheeser> http://docs.mongodb.org/manual/reference/method/db.collection.find/
[23:41:06] <Glitches> cheeser: that helps, thanks
[23:41:10] <cheeser> np
[23:57:28] <Glitches> I'm trying to run aggregate on my collection
[23:57:38] <Glitches> but how do i make it group everything together?
[23:58:02] <Glitches> I'm trying to find the average age of every student
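For reference, a single $group stage over the whole collection does that, assuming each student document has a numeric "age" field (collection and field names are placeholders):

    db.students.aggregate([
        {$group: {_id: null, avgAge: {$avg: "$age"}}}   // _id: null groups every document together
    ]);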