[04:22:53] <dewie01> when i do a bsondump --pretty i get a lot of: unable to dump document 1887: error converting BSON to extended JSON: conversion of BSON type 'bson.Symbol' not supported premium_50
[07:47:46] <joannac> just like replication works by reading the oplog and applying operations, you can write a script/app/whatever to do it yourself
[07:48:31] <dewie01> @joannac Just doing that, ready for first test run :)
[07:48:52] <arussel> joannac: thanks, I'll give it a shot
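A minimal sketch of the script joannac describes, assuming pymongo against a replica set (the oplog only exists under replication); the host name and the apply step are placeholders:

```python
from pymongo import MongoClient
from pymongo.cursor import CursorType

source = MongoClient("mongodb://source-host")  # hypothetical source member
oplog = source.local["oplog.rs"]

# Start from the newest entry; a real script would persist the last `ts` seen.
last_ts = next(oplog.find().sort("$natural", -1).limit(1))["ts"]

for op in oplog.find({"ts": {"$gt": last_ts}},
                     cursor_type=CursorType.TAILABLE_AWAIT):
    # op["op"] is the operation type ("i" insert, "u" update, "d" delete),
    # op["ns"] the namespace, op["o"]/op["o2"] the document and criteria.
    apply_op(op)  # hypothetical: re-apply the operation against the target
```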
[08:42:15] <Lujeni> Hello - is there a critical threshold for the dirty field (mongostat)? Thx
[09:21:50] <zhaoyeming> i have some new fields in my app, is it a common practice to immediately save the doc when i load it from mongodb so that the new fields are there for further app logic?
[11:18:52] <pamp> I've a cluster with two shards, 64 gb ram each
[11:23:47] <lost_and_unfound> Greetings, can i use a capped collection on gridfs?
[12:14:45] <StephenLynx> hey, is it possible to use map reduce to update documents?
[12:15:28] <StephenLynx> I need to perform an operation on a field that can't be updated with any operators; it's a complex update and I have to update this field on all documents.
[12:19:27] <deathanchor> guessing you are doing a different update on each doc?
[12:25:07] <deathanchor> into the same collection? I don't think so, but you could do a find({}), iterate over the cursor, and for each doc traverse the nesting modifying your instance of the doc, then do a single update with the new doc
[12:25:08] <westernpixel> see here: http://docs.mongodb.org/manual/core/map-reduce/
[12:28:56] <StephenLynx> unless he got it wrong and I got it wrong.
[12:29:44] <StephenLynx> "can return the results of a map-reduce operation as a document, or may write the results to collections"
[12:31:25] <westernpixel> I think that what deathanchor is getting at is that even though you don't have to iterate over your results within your code (which is what you want to avoid, if I understood you correctly), it will still have to be done within mongodb
[12:31:56] <StephenLynx> I know, but it would be done in a single query.
[12:32:02] <westernpixel> not sure it'd be a lot slower though, no?
[12:32:19] <StephenLynx> I don't mind for two reasons:
[12:32:24] <westernpixel> yup, got it. I'd be interested in knowing if that works if you try it
[12:32:34] <StephenLynx> 1- wouldn't hold up my application's thread
[12:32:56] <StephenLynx> 2- this operation is not meant to be done on user input, but rather it is on a timer
[12:33:41] <StephenLynx> and of course, it wouldn't go back and forth between my application and mongo.
[12:33:53] <StephenLynx> mongo can take its sweet, sweet time.
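For reference, a rough sketch of the cursor-iteration approach deathanchor suggests, assuming pymongo; the collection, field, and transform are illustrative:

```python
from pymongo import MongoClient

coll = MongoClient().mydb.things  # illustrative names

for doc in coll.find({}):
    # No update operator can express the change, so rewrite the field in
    # application code and store the whole modified document back.
    doc["complex_field"] = transform(doc["complex_field"])  # hypothetical
    coll.replace_one({"_id": doc["_id"]}, doc)
```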
[12:50:06] <salty-horse> can anyone help point me at the commit for https://jira.mongodb.org/browse/SERVER-12783 ??? None of the commit messages include "12783" and I couldn't find it in the commits between 2.6.0-rc1 and 2.6.0-rc2
[13:43:51] <greyTEO> GothAlice, I found this awesome job queue...have you heard of it? ;)
[13:44:15] <GothAlice> I've got some of my worker bees fixing that package up at the moment.
[13:44:47] <greyTEO> I am thinking of implementing it. PHP-AMQP library is.....slow. Especially not allowing connection pooling.
[13:45:06] <greyTEO> almost 50-100ms just to establish a connection and send a message..
[13:45:57] <GothAlice> The main reason I wrote the original (https://gist.github.com/amcgregor/4207375) was to avoid adding additional service dependencies to an app already using MongoDB. We didn't want the extra sysadmin and infrastructure overhead.
[13:46:27] <greyTEO> I have a question: does it process queues immediately or does it take a polling approach? I know there are delayed tasks but that is different.
[13:46:31] <GothAlice> (And benchmarked 4 years ago at 1.9 million DRPC calls per second… so, it should be performant.)
[13:46:48] <GothAlice> I.e. the workers know the moment a task is added to the queue.
[13:47:10] <greyTEO> Yea I am seeing that now with other overhead.
[13:47:49] <greyTEO> And the task queue would be mongo?
[13:48:02] <GothAlice> The approach I took was two-fold, to better utilize sharding.
[13:48:41] <GothAlice> a) The task data (function to call, arguments, return value, exception, and state information) is preserved in a normal collection (sharding-capable), and the messages about those tasks are sent over a capped collection (ring buffer queue).
[13:48:49] <GothAlice> Well, that was both a) and b) there. XD
[13:49:35] <GothAlice> Capped collections can't be sharded, so information injected into the queue is kept extremely minimal.
[13:49:50] <GothAlice> (And can be fully rebuilt from the real collection if needed.)
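A sketch of the consumer side of that design, assuming pymongo: workers block on a tailable cursor over the capped queue and fetch the full record from the normal collection. Collection names and the dispatch function are illustrative.

```python
import time

from pymongo import MongoClient
from pymongo.cursor import CursorType

db = MongoClient().taskdb

# One-time setup: a small capped collection acting as the ring-buffer queue.
if "queue" not in db.list_collection_names():
    db.create_collection("queue", capped=True, size=1024 * 1024)

cursor = db.queue.find(cursor_type=CursorType.TAILABLE_AWAIT)
while cursor.alive:
    for message in cursor:
        # The queue message is minimal; the full task record lives in the
        # normal, sharding-capable "tasks" collection.
        task = db.tasks.find_one({"_id": message["task"]})
        run_task(task)  # hypothetical worker dispatch
    time.sleep(1)  # cursor temporarily exhausted; AWAIT blocks, then retry
```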
[13:51:58] <greyTEO> Nice. Well thought out design
[13:52:31] <greyTEO> now the only problem is it's in python...... :|
[13:53:13] <greyTEO> I need it for very basic tasks right now...sending emails, saving events, etc.
[13:53:19] <GothAlice> Check the gist I linked; the methodology is provided in effectively one file. (See the link at the bottom of the presentation slides for the code.) The slides summarize the data structures, the code illustrates what types of atomic updates you need to perform locking and friends.
[13:53:48] <greyTEO> I think this will be lighter and faster than rabbitmq's overhead, and will free up a server from our cluster
[13:53:50] <GothAlice> (Ignore the scheduled task stuff, though. In the original, that was a sub-optimal solution.)
[13:54:19] <GothAlice> Replacing rabbitmq and zeromq (yes; the project was using both when I joined…) was what we did with this. :)
[13:55:37] <greyTEO> I have been reviewing that file. Reminds me of ruby
[13:55:40] <jr3> does the mongoose .exec() promise structure differ from func(err, result)?
[13:55:41] <GothAlice> That's the "core" API in V2 of the runner.
[13:56:34] <greyTEO> Should I use the gist you linked instead?
[13:56:47] <GothAlice> greyTEO: Because functions are first-class, decorators (@something) are instructions to pass the function being defined to the @something function (@task in my case) and save the return value of the decorator as if it were the function being defined. It lets you write wrappers or whole replacements for functions quite easily.
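A tiny self-contained illustration of the decorator behaviour GothAlice describes (the names here are made up, not marrow.task's API):

```python
def task(fn):
    # Receives the function being defined; whatever we return is bound
    # to the function's name in its place.
    def wrapper(*args, **kwargs):
        print("deferring", fn.__name__)
        return fn(*args, **kwargs)
    return wrapper

@task
def send_email(to):
    print("sending to", to)

send_email("a@example.com")  # prints "deferring send_email", then runs it
```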
[13:57:05] <GothAlice> greyTEO: I'd stick to the gist.
[13:57:31] <GothAlice> (The code in the gist at least works. marrow.task at the moment is non-functional. ;)
[13:59:13] <greyTEO> lol i see the build tags are down now.
[14:00:07] <greyTEO> i am not familiar with pip, but does it pull from master?
[14:00:15] <greyTEO> that should be working still correct?
[14:00:44] <GothAlice> "pip" only installs officially released packages unless you explicitly give it a URL in the form scm+<scm-url> (i.e. git+https://github.com/marrow/task.git)
[14:01:11] <GothAlice> (Marrow Task has no official releases yet.)
[14:01:12] <JDiPierro> In a sharded cluster is there a way to see something like the rate at which chunks are being split?
[14:03:08] <jmeister> hey, we're running on a sharded environment (shard key is hashed(_id), _id is bsonId) - what is the best way to read through all the documents in a certain collection without impairing the quality of service of the DB?
[14:03:10] <GothAlice> JDiPierro: There may be something in the documentation: http://docs.mongodb.org/manual/core/sharding-chunk-migration/ i.e. "serverStatus" might report chunk transfer information.
[14:03:30] <JDiPierro> Will peek there, thanks GothAlice
[14:19:04] <Zelest> because using a single client is too mainstream
[14:22:18] <greyTEO> GothAlice, would it be possible to just insert into the collection directly and bypass the worker?
[14:23:09] <GothAlice> greyTEO: Indeed, yes. The "Worker" is just a concrete API that orchestrates everything.
[14:23:54] <GothAlice> greyTEO: https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L224-L244 < this "adds a job", and it's basically just an insert and a push to the queue, with some error handling to kill it if the queue couldn't be contacted.
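Roughly what that insert-plus-push looks like in pymongo; the schema below is illustrative, not the gist's exact field layout:

```python
from datetime import datetime

from bson import ObjectId
from pymongo import MongoClient

db = MongoClient().taskdb

task_id = ObjectId()
db.tasks.insert_one({
    "_id": task_id,
    "callable": "mypackage.mymodule:somefunction",
    "args": [], "kwargs": {},
    "state": "pending",
    "created": datetime.utcnow(),
})
try:
    db.queue.insert_one({"task": task_id})  # wake the workers
except Exception:
    # Mirroring the gist's error handling: if the queue can't be contacted,
    # kill the task record rather than leaving it stranded.
    db.tasks.delete_one({"_id": task_id})
    raise
```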
[14:24:03] <greyTEO> Ok, I didn't know if it needed to be called with the worker class as it is in the sample
[14:24:42] <GothAlice> The examples use the API; anything worth doing twice is worth automating, no? ;) (Worker is common between the consumer and producer examples.)
[14:24:52] <greyTEO> that will abstract the php -> python and should work for my case
[14:25:16] <jmeister> anyone has any idea? How to read an entire collection on a live production DB without harming (too much) the quality of service?
[14:25:57] <GothAlice> While PHP isn't very good for anything involving threading (i.e. reimplementing the "worker" in PHP would be a very interesting challenge!) the Python worker expects to be running Python code as distributed tasks.
[14:26:05] <greyTEO> meaning I can just write python and submit jobs with php. As long as I know the callable....i think
[14:26:23] <greyTEO> yea php would be a terrible idea
[14:27:05] <GothAlice> jmeister: Rate limit your consumption of the cursor. I.e. add a sleep(0.1) at the end of the loop. You'll probably need to also handle "retrying" (i.e. sort by _id, track the "last" _id seen, if the cursor dies re-try _id: {$gt: last_id})
[14:27:19] <greyTEO> php->mongo->python task (might be a better representation)
[14:27:59] <GothAlice> greyTEO: There is one big note about the difference between this gist's version 1 and marrow.task (version 2). Version 2 doesn't use Python pickle encoding anywhere; only MongoDB native BSON types.
[14:29:26] <GothAlice> Replace https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L222 with: https://github.com/marrow/package#3-getting-object-references and https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L433 with https://github.com/marrow/package#41-resolving-import-references
[14:29:54] <GothAlice> This will let your PHP submit tasks with "callables" that look just like a fancy Python import path. I.e. "mypackage.mymodule:somefunction"
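A minimal stand-in for that import-reference resolution, in the spirit of marrow.package but not its actual API; it assumes the "module:attribute" form:

```python
from importlib import import_module

def resolve(reference):
    # "mypackage.mymodule:somefunction" -> the callable it names.
    module, _, attr = reference.partition(":")
    obj = import_module(module)
    for part in attr.split("."):  # allow dotted attributes, e.g. "Class.method"
        obj = getattr(obj, part)
    return obj

fn = resolve("os.path:join")  # e.g. resolves os.path.join
```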
[14:30:33] <jmeister> GothAlice: and doing $gt / sort on a sharded environment won't cause problems? since it's non-inclusive I believe it will cause scatter gather queries, right?
[14:31:25] <GothAlice> jmeister: Alas, to support retrying (which is especially important a) if you are processing a _lot_ of records, and b) are making them process slower on purpose) you need to sort on something that will return the records in a consistent order. _id isn't the only option; any other index you have that is non-sparse should work.
[14:31:53] <GothAlice> Not just sort, but also track the last seen indexed value, to allow retrying.
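Putting that advice together, a sketch assuming pymongo; `process` and the collection names are placeholders:

```python
import time

from pymongo import MongoClient
from pymongo.errors import PyMongoError

coll = MongoClient().mydb.things  # illustrative
last_id = None

while True:
    query = {"_id": {"$gt": last_id}} if last_id is not None else {}
    try:
        for doc in coll.find(query).sort("_id", 1):
            process(doc)            # hypothetical per-document work
            last_id = doc["_id"]
            time.sleep(0.1)         # rate-limit consumption of the cursor
        break                       # cursor exhausted cleanly
    except PyMongoError:
        continue                    # cursor died; retry from the last _id seen
```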
[14:54:11] <koren> Hello! Question: I recently switched my DB engine to WiredTiger, and a query on an indexed subdocument array of objects now takes seconds to complete (it was fast before the engine change). Does the engine need some time to rebuild the indexes or something? I switched 24h ago and now it's already a bit faster for the same query
[14:58:48] <GothAlice> koren: Is there some overriding reason why you are switching to WT?
[14:59:12] <GothAlice> koren: The reason I ask is that it has a number of outstanding severe issues including data loss and excessive memory consumption (to the point of invoking the oom-killer).
[15:00:41] <GothAlice> https://jira.mongodb.org/browse/SERVER-17424 https://jira.mongodb.org/browse/SERVER-17456 https://jira.mongodb.org/browse/SERVER-16311 (See also: https://jira.mongodb.org/browse/SERVER-17386)
[15:15:22] <greyTEO> GothAlice, at a quick glance, does this seem like it should work?
[15:16:08] <GothAlice> greyTEO: Why the newline in the "callable"? Shouldn't that be: mailer:sendEmail?
[15:16:17] <GothAlice> (Import the "sendEmail" function from the "mailer" module.)
[15:16:49] <GothAlice> Also notably you aren't passing that function any positional or keyword arguments, so it probably wouldn't do much with that execution spec. ;)
[15:17:34] <greyTEO> yea i am trying to connect the pieces right now. Have hard-coded values at the moment
[15:18:23] <GothAlice> BTW, if you're doing this initially to defer mail delivery, I can help with that on the Python side, too: https://github.com/marrow/marrow.mailer#3-basic-usage ;)
[15:18:33] <greyTEO> I guess I need to do the replacements specified earlier for the syntax?
[15:18:45] <GothAlice> Sorry, I can't parse that question.
[15:19:07] <greyTEO> Replace https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L222 with: https://github.com/marrow/package#3-getting-object-references and https://gist.github.com/amcgregor/52a684854fb77b6e7395#file-worker-py-L433 with https://github.com/marrow/package#41-resolving-import-references
[15:19:29] <greyTEO> This will let your PHP submit tasks with "callables" that look just like a fancy Python import path. I.e. "mypackage.mymodule:somefunction"
[15:58:54] <StephenLynx> what is a good way to estimate how many simultaneous connections I should have given the hardware I am using?
[15:59:46] <GothAlice> It's less hardware-dependent than client-dependent. MongoClient typically pools connections, so it then depends on how many threads your application uses, and how many instances of your application are running, plus any server-to-server connections for replication or sharding.
[16:00:13] <GothAlice> For my own application deployments this typically means around 100 connections per application server.
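In pymongo terms, the per-instance ceiling is the pool size, so total connections scale with pool size times the number of application instances (plus cluster-internal connections):

```python
from pymongo import MongoClient

# One pooled client per process; threads share the pool. With maxPoolSize=100
# and one instance per application server, that is roughly the ~100
# connections per server mentioned above.
client = MongoClient("mongodb://localhost", maxPoolSize=100)
```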
[16:00:50] <StephenLynx> I believe I will have to change my designs, currently I open a connection per thread and have a thread per core.
[16:01:08] <StephenLynx> under heavy loads I can imagine that so few connections would cause a bottleneck.
[16:01:46] <GothAlice> I run N threads per core. Depending on the size of the app that'll be 10 to 25 threads each.
[16:02:20] <GothAlice> (The threads spend most of their time waiting on IO, so there's little contention.)
[16:02:32] <StephenLynx> I will not change how many threads I use because of the runtime environment design, but I will change how each thread handles the db.
[16:02:47] <julianlam> hi all -- we have a large document that stores what is essentially a hash table (email to uid mapping). We're running into the mongodb document size limit, but we are not sure how to proceed with refactoring the data.
[16:03:01] <GothAlice> julianlam: Refactor the data.
[16:04:35] <GothAlice> That was likely about forums embedding replies to a topic in the topic itself, using a "continuation" marker when the document becomes full to point at a new document to continue in.
[16:04:47] <StephenLynx> I found an IRC log that mentions you
[16:05:11] <GothAlice> That's in the form: {forum: ObjectId(…), replies: [{author: ObjectId(…), message: "…"}, …]}
[16:06:01] <GothAlice> StephenLynx: Wow, that's a hideous IRC log. http://irclogger.com/.mongodb/2015-03-21 is much easier to read. ;)
[16:06:51] <GothAlice> Unfortunately the "ever growing dynamic field names" approach is, well, it's a trap. It seems like it might be a good idea for some things, but it's really not.
[16:07:19] <GothAlice> In your case (a simple mapping) you can use the e-mail address as the _id. {_id: "a@example.com", uid: UUID(…)}
[16:07:40] <GothAlice> Then it's automatically indexed, guaranteed to be unique, and easily queryable.
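That mapping in practice, assuming pymongo; the collection name is illustrative:

```python
import uuid

from pymongo import MongoClient

# uuidRepresentation lets modern pymongo encode uuid.UUID values as BSON.
users = MongoClient(uuidRepresentation="standard").mydb.email_uid

users.insert_one({"_id": "a@example.com", "uid": uuid.uuid4()})
doc = users.find_one({"_id": "a@example.com"})  # point lookup on the _id index
```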
[16:09:26] <StephenLynx> wouldn't he lose stuff like document expiration and other things that depend on how the default _id is set?
[16:23:45] <GothAlice> julianlam: Potentially scary, but indexed (so fast to search) and easily dividable across different mongo servers using sharding.
[16:24:20] <GothAlice> (I.e. you can divide up the records in half by having two shards and using a hashed shard key on _id to instruct MongoDB to spread the data evenly across all shards.)
[16:30:19] <StephenLynx> yeah, having many documents is not an issue at all.
[17:12:57] <shlant> can I start all replica members using a keyfile and then add users? or do I have to add users, stop instance, start with keyfile?
[17:13:47] <dewie01> @joannac just wanted to let you know I was able to restore the entire database through a handmade ruby script
[17:14:07] <dewie01> @joannac thanks for your help
[17:21:32] <GothAlice> shlant: You must enable authentication (provide a keyfile) and then use the localhost exception to populate the first admin (root role) user. After that, authentication is fully required.
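A sketch of that bootstrap, assuming pymongo and the localhost exception; the credentials are placeholders:

```python
from pymongo import MongoClient

# Connect from the same host before any user exists (localhost exception),
# then create the first admin; afterwards, all access requires auth.
admin = MongoClient("mongodb://localhost").admin
admin.command("createUser", "admin",
              pwd="change-me",
              roles=[{"role": "root", "db": "admin"}])
```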
[17:56:54] <shlant> GothAlice: and do backup and restore roles have access to all db's?
[17:57:35] <GothAlice> shlant: Roles are typically tied to a database, i.e. the database you create the user in. Certain roles include "all" in their name; these give wider access.
[17:59:10] <shlant> GothAlice: and I assume I can't add a user that has access to a db before that db exists? even if I know the db name ahead of time?
[18:01:31] <StephenLynx> the db is created in that case.
[18:01:46] <StephenLynx> everything that doesn't exist when you try to use it is created dynamically.
[19:04:43] <GothAlice> renlo: Give the user the needed permissions on two DBs, then on the second when authenticating use --authenticationDatabase for CLI tools (or equivalent URI for client drivers).
[19:13:46] <StephenLynx> and btw, since you are using node, I suggest checking out io.js. It's a fork that has surpassed node.js in every single aspect, including performance.
[21:37:35] <ehershey> but it's more that you create the data with json
[21:37:48] <ehershey> and not by hand but with application code building it and using a driver to send it into the db
[21:38:35] <Glitches> ehershey: seeing as I'm just making an independent database for education purposes, does that mean I have to do it all by hand?
[21:38:54] <Gevox> Glitches: I'm a beginner like you; this is a very fast place to learn about mongodb and get your head transformed from regular mysql to mongodb https://www.youtube.com/results?search_query=mongodb+for+dbas+
[21:39:24] <ehershey> you can do a lot by hand but you don't have to do almost any of it by hand
[21:39:28] <Gevox> Watch at least the week #1 videos and you will have a good vision of what this is all about; later you will be able to just google what you're looking for
[21:39:47] <Glitches> Gevox: sweet, that's exactly what I needed
[21:39:56] <GothAlice> Glitches: There's also http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html which is a summary of things if you are coming from a more traditional database background.
[21:39:57] <Glitches> this is all really foreign to me atm...
[21:40:45] <GothAlice> Glitches: We're here to help. :)
[21:41:00] <Gevox> Glitches: Don't tire your head with all these questions; watch these videos (it's a course provided by the mongo developers, and a very short one). If you are doing things in java, I have already made a full app that does the CRUD manipulation; you can take its source and look for what you need as a reference too :)
[21:41:45] <Glitches> Gevox: I'm just designing an independent db, but thanks
[21:58:45] <shlant> anyone know why I get "Unauthorized not authorized on admin to execute command { replSetHeartbeat:" when using a keyfile? I am definitely using the same one for all members
[21:59:21] <GothAlice> shlant: A keyfile isn't enough. You need to add a user, too.
[22:00:57] <shlant> GothAlice: oh. I had thought that members used the keyfile for communication
[22:01:09] <StephenLynx> funny thing: when I enabled auth on a server, I didn't set up a keyfile or anything.
[22:01:22] <GothAlice> They do. But after enabling auth, you _must_ set up that first admin user.
[22:01:43] <GothAlice> StephenLynx: It's for enabling auth in a replica set.
[22:17:56] <shlant> GothAlice: alright, so the only way to use keyfiles is to start one instance without auth, create a user, stop the instance, start all replica members?
[22:19:19] <shlant> cause if so, that throws a serious wrench in my automation plans...
[22:19:55] <GothAlice> https://gist.github.com/amcgregor/c33da0d76350f7018875#file-cluster-sh-L126-L148 < my own code which automates testing of things like this does the following: 1. generate a key. 2. Start enough mongod's for a replica set. (In this case, two replica sets.) 3. replSetInitiate the sets. 4. Add users.
[22:20:46] <GothAlice> You should certainly be able to add the user *after* initializing the replica set. If things are dying prior to this, extra-super-double-check your keyfiles are identical across the servers, and all are configured with the same replSet name.
[22:21:34] <shlant> alright, I will try that. Thanks again
[22:21:47] <GothAlice> (md5sum the keyfiles if you have to ;)
[22:22:16] <GothAlice> Oh, also check permissions. The keyfiles must be "mode 600".
[22:22:27] <GothAlice> (Only readable by the MongoDB process/user, no-one else.)
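The same sequence sketched in Python rather than shell, assuming a recent pymongo; the key path, set name, hosts, and credentials are all placeholders:

```python
import os

from pymongo import MongoClient

os.chmod("/etc/mongodb.key", 0o600)  # keyfile must be mode 600 on every member

# Connect directly to one freshly started member and initiate the set.
client = MongoClient("mongodb://localhost:27017", directConnection=True)
client.admin.command("replSetInitiate", {
    "_id": "rs0",
    "members": [{"_id": 0, "host": "localhost:27017"}],
})

# Then, still via the localhost exception, create the first admin user.
client.admin.command("createUser", "admin",
                     pwd="change-me",
                     roles=[{"role": "root", "db": "admin"}])
```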
[23:28:28] <Glitches> How do I ensure an index on a subdocument field?
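Dot notation is the usual answer here; a minimal pymongo sketch with illustrative names (create_index is the modern spelling of the older ensureIndex):

```python
from pymongo import MongoClient

coll = MongoClient().mydb.things
coll.create_index("subdoc.field")  # dot notation reaches into the subdocument
```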