[00:11:16] <user23> How do I fix this startup error: Permission denied: "/sys/dev/block/9:2/queue/read_ahead_kb"? I've already given the file +rx permission
[00:18:45] <joannac> user23: did you check all the way up the tree?
[05:02:10] <NoOutlet> To explain the message "mix of inclusion and exclusion", when you project with {field: 0}, mongodb will return the documents without 'field', everything else included by default. When you use {field: 1}, mongodb will return documents with only 'field', everything else excluded by default (with the exception of _id).
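A quick shell illustration of the projection rules NoOutlet describes, using a hypothetical people collection:

```javascript
// exclusion mode: return every field except 'age'
db.people.find({}, { age: 0 })

// inclusion mode: return only 'age' (plus _id, unless excluded explicitly)
db.people.find({}, { age: 1 })
db.people.find({}, { age: 1, _id: 0 })

// mixing the two modes raises the "mix of inclusion and exclusion" error:
// db.people.find({}, { age: 1, name: 0 })
```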
[05:56:35] <NoOutlet> I'm a bit confused by some statements made here. Who is the instructor?
[05:57:57] <NoOutlet> Because it looks to me like you're just doing Homework 6.5 in m101 and I could be wrong but I would guess you're doing the Node.js course based on your other gists.
[05:59:44] <NoOutlet> I'm currently in that class (already took m101p, but was hoping to learn some Node.js and didn't realize how similar the class was going to be) and there isn't a .sh file. There are some boxes with the commands you have to enter in the shell.
[06:01:49] <joannac> morenoh149: you can ask for help with homework after the due date. Until then, do your own homework
[06:05:57] <NoOutlet> This isn't on topic of mongodb at all, but I'm just coming across these 14 Black Paintings by Goya. He painted these things on the walls of his own house and I'm having trouble looking at Saturn Devouring His Son. Cannot imagine that being on the wall.
[06:14:16] <morenoh149> nah. If someone offers to walk me through the parts I get stuck on I'll take the help
[06:26:43] <flusive> Hi, I have a big problem with the mongo config server and logs. I used the quiet option but my log file still grows :( Inside it is only a lot of writes like CMD fsync: sync:1 lock:0. It's a problem because this file takes a few GBs of space in 24 hours
[06:27:31] <joannac> what else is in the log file?
[07:16:43] <flusive> i have only one database on this shard
[07:17:09] <joannac> "The maxSize field in the shards collection in the config database sets the maximum size for a shard, **allowing you to control whether the balancer will migrate chunks to a shard.**"
[07:17:47] <flusive> joannac, are you suggesting I write to the shard without the balancer?
[07:18:01] <joannac> "...Also, the balancer will not move chunks off an overloaded shard. This must happen manually"
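For reference, a sketch of how maxSize is set, per the documentation joannac is quoting; the shard names are hypothetical and the value is in megabytes:

```javascript
// set maxSize when first adding the shard (run through mongos)...
db.getSiblingDB("admin").runCommand({ addShard: "rs0/shardhost:27017", maxSize: 1024 })

// ...or adjust it later directly in the config database
db.getSiblingDB("config").shards.update(
    { _id: "shard0000" },
    { $set: { maxSize: 1024 } }
)
```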
[12:31:47] <queretaro> Hi, are permissions also replicated from a replica set primary to the secondaries? That is, if I have authentication (plaintext user/password) enabled on the primary and add the secondaries to the replica set just using rs.add(), do I have to do anything else for the system to work?
[13:29:51] <izolate> I'm setting up a dedicated db server. do you guys strongly advise against a standalone server in favor of a replica set?
[13:30:44] <Sticky> depends on your use case, how reliable you want it to be, and how valuable the data is
[15:32:45] <NoOutlet> Would it be bad if it synced more often?
[15:34:23] <cheeser> maybe cron this daily? http://docs.mongodb.org/manual/reference/method/db.cloneCollection/
[15:37:59] <NoOutlet> I think it would be a lighter weight operation to do something like tailing the oplog more regularly than once a day.
[15:38:29] <cheeser> sure. but you'd have to process the oplog entries manually.
[15:39:26] <cheeser> or maybe this might help: http://docs.mongodb.org/manual/reference/program/mongooplog/
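Its basic invocation polls the source server's oplog and applies the entries to the destination (hostnames hypothetical):

```
mongooplog --from source.example.net:27017 --host dest.example.net:27017
```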
[15:41:46] <NoOutlet> That's not the understanding I have from this video: https://www.youtube.com/watch?v=lGcruDWeWA8#t=325
[15:42:45] <NoOutlet> I'm not sure of the exact operations that were used to create the tailable cursor Adam is talking about there, but I didn't get a sense of having to manually process the oplog.
[15:42:54] <cheeser> that's for feeding info in to solr. you'd still need to translate the entries to tell solr what to do with them.
[15:42:56] <NoOutlet> Maybe he was talking about the operation you linked to.
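A minimal mongo-shell sketch of the oplog-tailing approach under discussion; the namespace filter is hypothetical, and this is not the code from the video:

```javascript
// tail the replica set oplog in the 'local' database
var oplog = db.getSiblingDB("local").oplog.rs;
var cursor = oplog.find({ ns: "mydb.messages" })        // hypothetical namespace
                  .addOption(DBQuery.Option.tailable)
                  .addOption(DBQuery.Option.awaitData);
while (cursor.hasNext()) {
    var entry = cursor.next();
    // entry.op is 'i' (insert), 'u' (update), or 'd' (delete);
    // as cheeser notes, translating each entry is still up to you
    printjson(entry);
}
```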
[15:44:26] <guest9999> hello. how would i go about adding a new field to a subdocument (in an array) of every matching root document? e.g. http://pastebin.com/eBbMbBZG
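The pastebin contents are unknown, but assuming typical field names, the usual tool is the positional $ operator; note it only updates the first matching array element per document, so touching every element meant iterating client-side in this era:

```javascript
// add a field to the first matching subdocument of every matching root document
db.orders.update(
    { "items.sku": "abc123" },                  // hypothetical match condition
    { $set: { "items.$.discounted": true } },   // $ targets the matched element
    { multi: true }                             // apply to all matching root docs
)
```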
[17:08:12] <GothAlice> trial: First, there's nothing semantically wrong with your desired object. There are several points of concern, however. Indentation is confusingly inconsistent, having split address lines is rarely required (since your string can simply include a newline to separate as many lines as needed), and having inconsistent keys under items (length1/length2) will lead to madness. Also don't paste code into the channel!
[17:16:22] <GothAlice> saml: I don't understand the question.
[17:16:22] <saml> this thing comes up in every meeting
[18:38:44] <MacWinner> finally made the decision to migrate everything over to MongoDB + GridFS.. all data in mysql as well as all files stored in gluster
[18:39:01] <MacWinner> should I be waiting for 3.0 release?
[18:39:52] <MacWinner> or is the upgrade from 2.6.7 to 3.x supposed to be fairly smooth? Not a huge dataset as of now.. maybe a couple million documents
[18:39:57] <GothAlice> MacWinner: Do you absolutely require anything it would provide?
[18:40:23] <MacWinner> GothAlice, The storage savings look interesting since a lot of our documents have repetitive data in them
[18:40:49] <GothAlice> Migration with re-compression (basically switching storage engines) is not a minor thing.
[18:42:33] <GothAlice> I implemented compression of low-hanging fruit (bulk text data like HTML fragments) application-side two years ago, since I couldn't wait for compression to be added upstream. (Same for full text search: I implemented it prior to it getting added upstream, and mine better handles multilingual searches.) With an appropriate ORM/ODM, rolling it yourself isn't difficult at all.
[18:42:48] <GothAlice> (And would give you a broader selection of compression schemes which could be tailored on a per-field basis…)
[18:42:54] <MacWinner> k.. I'm leaning towards just going with 2.6.7 as the baseline now and worrying about those features later..
[18:43:09] <MacWinner> GothAlice, which language/ODM are you using?
[18:44:10] <GothAlice> https://github.com/marrow/cache and https://github.com/marrow/task are some high-level tools built on top of that ODM we've open-sourced from work. :) (Task is WIP pending a bug fix MongoDB-side.)
[18:44:33] <MacWinner> we are kinda stuck on PHP for the time being
[18:44:48] <MacWinner> but thanks for all the tips.
[18:45:39] <MacWinner> as a first step, I'm going to be converting my existing replica-set to a sharded cluster.. we have a 2 member replica set plus an arbiter
[18:45:59] <GothAlice> Choice of sharding key is, well, key. ;)
[18:46:49] <MacWinner> GothAlice, yeah! i think I got some good pointers on that
[18:47:59] <GothAlice> MacWinner: https://gist.github.com/amcgregor/c33da0d76350f7018875 is a handy little shell script I wrote to assist in testing sharding keys locally (for automated testing rigs). This script sets up a 3x2 sharded replica set with authentication on one physical host (i.e. your dev box). Variables at the top to configure. :)
[18:48:43] <GothAlice> My day is complete if I can save someone time. Anything worth doing once is worth automating! :D
[18:51:52] <foo2> Anyone here know how to do full text search using mongoose?
[18:52:14] <foo2> I am trying to understand this error:
[18:52:16] <foo2> "planner returned error: failed to use text index to satisfy $text query (if text index is compound, are equality predicates given for all prefix fields?)] name: 'MongoError' }"
[18:55:50] <GothAlice> foo2: Could you pastebin/gist the exact query you are attempting to run, an example document, and your index?
[19:00:24] <foo2> and I am just doing "Message.find({ $text : { $search : 'hello' } });"
[19:04:12] <GothAlice> foo2: Your error relates to http://docs.mongodb.org/manual/core/index-text/#text-index-and-sort (see bullet point two under "Compound Index")
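In other words (guessing at foo2's schema): if the text index was declared with a leading non-text field, every $text query must pin that field with an equality predicate.

```javascript
// a compound text index with a prefix field (hypothetical schema)
db.messages.ensureIndex({ room: 1, content: "text" })

// fails with the "are equality predicates given for all prefix fields?" error:
db.messages.find({ $text: { $search: "hello" } })

// works: the prefix field 'room' is pinned by an equality predicate
db.messages.find({ room: "lobby", $text: { $search: "hello" } })
```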
[19:05:28] <GothAlice> foo2: What is the result of db.Message.getIndexKeys()?
[19:05:48] <GothAlice> (In the mongo shell, where "Message" should be replaced with whatever the actual name of the collection is.)
[19:09:51] <foo2> GothAlice: I haven't mongo-ed in a long time so I am a bit rusty with the shell stuff
[19:10:14] <foo2> Basically, there is an existing app with that Message document and I am trying to add full text search to it
[19:11:46] <foo2> what's the best way to go about it?
[19:15:01] <GothAlice> Well, a full text index is pretty useful for that. But the issue I think we're seeing is that {} objects in JS might not be preserving the definition order. (So the $text index is actually coming later in the index's field list than it should be.)
[19:15:43] <GothAlice> To determine if this is the case I need the output of db.message.getIndexKeys() from the shell. (The shell isn't actually that scary… you just need to connect and run that one line a la: mongo mydb)
[19:25:46] <foo2> GothAlice: Yeah, i am just reading the commands now
[19:26:50] <MacWinner> GothAlice, pretty awesome script you created to test out sharding! does it self-contain all the data and configuration into the directories specified in BASE, RUN, and LOG variables?
[19:27:03] <GothAlice> MacWinner: It certainly does. :)
[19:27:30] <GothAlice> (And if you nuke the contents of BASE, it'll re-create the local cluster from scratch.)
[19:28:29] <GothAlice> It's also very useful for automated test suites.
[19:30:06] <MacWinner> GothAlice, at the end of the script, it says this: "Remember to run db.runCommand({enablesharding: 'somedb'}) against your db of choice." not sure why it's not automatic?
[19:30:26] <GothAlice> Because the script doesn't assume which database you would like to shard.
[19:32:02] <GothAlice> And sometimes you want to test slightly different load scenarios. I.e. shard, then bulk load, or bulk load, then shard and rebalance, to see the performance implications.
[19:35:45] <foo2> GothAlice: I am getting ERROR: dbpath (/data/db) does not exist.
[19:35:57] <sang> hi i'm looking for some help. when i create a document via postman, i'm returned an { ok: 1 }. Is there any way to get back the entire document I just created?
[19:36:08] <foo2> but the data has to be somewhere, cause the app is loading it fine
[19:37:01] <MacWinner> GothAlice, is the admin user/password only for the admin database?
[19:37:29] <foo2> Is there any way to find out what data folder it is using?
[19:46:11] <foo2> I am trying to add a search message history feature
[19:46:16] <GothAlice> So you can see in your mongo shell output the index on lines 13 through 19? That's the full text index you're creating.
[19:46:49] <GothAlice> (It is indexing room, posted, _id, and _fts = full text search.)
[19:48:14] <GothAlice> foo2: See how _fts (and _ftsx, together) are at the very end of that index? That isn't what you asked it to create; those need to be at the top. For whatever reason your JavaScript runtime isn't preserving the order of the object properties when you're creating that index spec. Is there an alternate syntax using array notation you could use to ensure the order is correct?
[19:49:23] <foo2> GothAlice: The output I pasted was after I reverted the code to remove any index on "text" attribute
[19:49:55] <GothAlice> Well, it's still there. It didn't delete the index.
[19:54:28] <foo2> GothAlice: I am not sure how to do that, but thanks for providing the direction
[19:54:35] <GothAlice> foo2: https://gist.github.com/amcgregor/d5a90090f6cf6754cf95 < creating the index in the mongo shell does preserve order correctly.
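The gist itself wasn't archived, but it is presumably a one-liner along these lines (field names guessed from the index discussed above), with the text component leading so no prefix predicates are required:

```javascript
// shell object literals preserve key order, so _fts/_ftsx end up first
db.messages.ensureIndex({ content: "text", room: 1, posted: -1 })
```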
[19:56:04] <foo2> GothAlice: I copy pasted that in the shell
[19:56:13] <foo2> but it didn't preserve the order
[20:13:02] <GothAlice> d0x: The typical approach with MongoDB is to "denormalize" your data so that you wouldn't need the effect of a JOIN. Sometimes this gives the impression of data duplication, but practicality beats purity. MongoDB encourages you to design your data structures around how you're actually _using_ that data.
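A tiny illustration of that denormalization, with hypothetical documents:

```javascript
// normalized instinct: users and comments joined at query time
//   users:    { _id: 7, name: "Ada" }
//   comments: { user_id: 7, text: "hi" }

// denormalized: embed what the read path actually needs
db.comments.insert({
    text: "hi",
    user: { _id: 7, name: "Ada" }   // duplicated on purpose; no JOIN needed
})
```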
[20:13:32] <fxmulder> if I am seeing this when migrating shard chunks: errmsg: "E11000 duplicate key error index: mail.message_data.$message_identifier_1_chunk_1 dup key: { : "41178dad47a9bfab05fd4fb3c339f3a6.9d7408770af8e2bd107f..."
[20:13:44] <GothAlice> foo2: Creating indexes can block access to the collection (if background==false) or take a really, really long time (background==true) to initially build.
[20:13:52] <foo2> GothAlice: I removed indexes and now ran my program again and now it works :/
[20:14:08] <fxmulder> I would assume there should be a document in the destination replica set that would have a message_identifier starting with 41178dad47a9bfab05fd4fb3c339f3a6.9d7408770af8e2bd107f right?
[20:14:24] <foo2> Now it searches stuff. But I found a weird thing
[20:14:37] <foo2> GothAlice: It doesn't search the query 'this'
[20:14:55] <foo2> even though I have text containing that term
[20:14:58] <GothAlice> foo2: Well, the order may be pseudo-random. In Python (the language I have the most experience with) dictionaries ({} objects) don't preserve insertion order; iteration order depends on the keys' hashes, and with string hash randomization that order can change on each execution of a script.
[20:14:59] <foo2> for all other term it seems to work fine
[20:18:36] <d0x> GothAlice: Yeah, seems like I can't get around "prejoining" the data. I had a look at Apache Spark SQL (this Hive replacement) but with it I need to "copy" the data into their system (if I understood it correctly)
[20:18:51] <d0x> And I don't want to set up a 2nd infrastructure
[20:18:58] <d0x> And I need response times of <200ms
[20:20:34] <GothAlice> d0x: Heh, part of the reason I use MongoDB instead of Redis/Membase. And MongoDB instead of ZeroMQ/RabbitMQ. And MongoDB instead of Syslog. And MongoDB instead of some external bulk file storage system like S3. And so forth. ;)
[20:21:28] <d0x> I need to find some arguments that Single Source of Truth isn't that important when it comes to our use case
[20:21:57] <d0x> our production stores data in a single collection but our dashboard needs some joins
[20:23:44] <GothAlice> On one project I replaced Membase (cache; MongoDB provides TTL indexes), ZeroMQ/RabbitMQ (one has persistence, one is "faster"; MongoDB has tailing cursors on capped collections), Postgres (actual data storage), Celery (distributed tasks; replaced with 180 lines of Python and capped collections), and Boto (S3; MongoDB offers GridFS).
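Two of the MongoDB primitives GothAlice is leaning on, sketched in the shell:

```javascript
// a TTL index turns a collection into an expiring cache (Membase stand-in)
db.cache.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// a capped collection plus a tailable cursor behaves like a message queue
db.createCollection("tasks", { capped: true, size: 16 * 1024 * 1024 })
```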
[20:24:38] <d0x> Junky :). I only replaced RabbitMQ by MongoDB
[20:26:50] <d0x> But this aggregation limitation and the single source of truth principle are really a problem for me. We had a SAP Hana salesman in our house and he mentioned that with Hana you don't have the maintenance of these ETL scripts and such.
[20:27:09] <d0x> But he didn't mention the prices for infrastructure either...
[20:27:13] <GothAlice> http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework may prove useful.
[20:27:51] <GothAlice> d0x: Heh, with one of my datasets using a "popular cloud hosted MongoDB service" would cost me a half million dollars per month.
[20:28:43] <d0x> yeah, their price per GB is strange. At the beginning I thought we had a design fault
[20:28:54] <d0x> because using their services would cost too much
[20:29:06] <d0x> and I thought we were using mongo for the wrong purpose :)
[20:29:18] <GothAlice> I have 26 TiB of data in my at-home DB. ¬_¬
[20:29:38] <d0x> lool :) I think even chuck norris can't do that!
[20:31:18] <GothAlice> I go to conferences and make dev teams for Dropbox, that "popular cloud hosted MongoDB service", and others uncomfortable at the parties. XD At one I even managed to kill bitbucket for 15 minutes by clicking the "Delete Repository" button. ¬_¬
[21:17:42] <liamkeily> if i have a collection of items, and for each item there is a reference to a user, what's the best way to get the user for each item from a query? Is this bad structure?
[21:18:20] <GothAlice> liamkeily: A better question is what data about that user do you need in relation to each of those records? A name for display purposes?
[21:19:46] <GothAlice> liamkeily: There are several approaches, in order of complexity. Simplest (and least efficient) is to query the database for each of those names at your display layer, as you iterate the initial result set.
[21:20:15] <GothAlice> Better is to perform the first query, get your results, then issue a single additional query to get a mapping of user IDs to names for the whole set in one go.
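That second approach, sketched in the shell with hypothetical collection and field names:

```javascript
// 1) run the primary query
var items = db.items.find().toArray();

// 2) one extra query resolves every referenced user name in one go
var ids = items.map(function (i) { return i.user; });
var names = {};
db.users.find({ _id: { $in: ids } }, { name: 1 }).forEach(function (u) {
    names[u._id.str] = u.name;   // .str is the ObjectId's hex string
});

items.forEach(function (i) {
    print(i.title + " by " + names[i.user.str]);
});
```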
[21:20:16] <liamkeily> i was thinking of querying for each one at a time. But then a cache layer would stop most queries.
[21:20:58] <GothAlice> I take the most complicated approach, which combines the first two approaches with caching.
[21:22:36] <GothAlice> My caching layer is MongoDB itself, but I'm caching more than just the user's name. I have a generic "render this key/value thing" function which, if the value is an ObjectId, will attempt to find a model with the same name as the key, load the record with that ID, then run str() across it, additionally wrapping it in an <a href=""> to the canonical URL for that record with an icon based on the type of record it is, then caches that.
[21:24:25] <GothAlice> A third approach, which I also use in some places (notably invoice documents) is to include the information you need in the records you have the "reference" in. I.e. {creator: {_id: ObjectId(…), name: "Bob Dole"}, …}
[21:25:48] <GothAlice> You just need to remember, if the user changes their name, to run a db.invoices.update({'creator._id': user}, {$set: {'creator.name': newname}}) to update the "cached" values.
[21:28:09] <liamkeily> GothAlice, thanks that helps a lot
[21:29:09] <GothAlice> The MongoEngine ODM has a schema type which can manage the synchronization for you: http://docs.mongoengine.org/apireference.html#mongoengine.fields.CachedReferenceField (FYI ;)
[21:29:42] <GothAlice> (And the syntax allows querying those cached values as if it were a join. Which is bad-ass.)
[23:13:01] <blizzow> I connected to my database/admin as the admin user. I did: use mynewdbname , then I did: db.createUser( { "user" : "auserwhoexistsonother", pwd: "myawesomepass", roles: [ "readWrite" ] } ) and got an error saying:
[23:13:02] <blizzow> Tue Feb 17 16:07:36.542 TypeError: Property 'createUser' of object metrics is not a function
[23:13:18] <blizzow> can someone explain to me what I'm doing wrong in creating a new user?
[23:15:34] <Boomtime> blizzow: can you type version()?
[23:17:39] <blizzow> ahhh, my mongoshell is out of date. Thanks Boomtime.
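For anyone hitting the same error: db.createUser() only exists in the 2.6+ shell (a 2.4 shell offers db.addUser() instead). With a current shell, blizzow's command works as intended:

```javascript
// run against the database the user should live in (2.6+ shell)
use mynewdbname
db.createUser({
    user: "auserwhoexistsonother",
    pwd: "myawesomepass",
    roles: [ { role: "readWrite", db: "mynewdbname" } ]
})
```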