[00:00:13] <GothAlice> Also, if your ODM layer has the concept of inheritance, model('Task').find({}) may actually be doing something like db.tasks.find({_type: 'Task'})
[00:00:33] <shoerain> This could very well be the case. There are a number of other collections (Campaigns, Users) that match up though
[00:01:31] <shoerain> the schema for Task is at 500 lines, so I guess there's a lot of logic going on there
[00:02:13] <GothAlice> shoerain: You can get mongod to output all queries by running db.setProfilingLevel(2) in the DB of choice, then run the .model('Task').find({}) line and see what it's actually doing. :)
[00:03:37] <GothAlice> http://docs.mongodb.org/manual/reference/method/db.setProfilingLevel/ is for when you're doing it from the 'mongo' interactive shell; http://docs.mongodb.org/manual/reference/command/profile/#dbcmd.profile is for when you need to issue it as a remote command.
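A minimal profiling session along those lines might look like the following in the shell; the database name 'mydb' is an assumption:

    // assuming the application's database is called 'mydb'
    var mydb = db.getSiblingDB('mydb');
    mydb.setProfilingLevel(2);                   // 2 = record every operation in system.profile
    // ... run model('Task').find({}) from the application ...
    mydb.system.profile.find().sort({ts: -1}).limit(5).pretty();   // most recent operations first
    mydb.setProfilingLevel(0);                   // turn profiling back off when done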
[00:09:14] <GothAlice> shoerain: As a minor note… a single document structure containing 500 lines of definition is probably a bad sign. MongoDB has some querying limitations (i.e. only query one child list at a time) and I don't even want to think about your indexing strategy… XD
[00:11:20] <GothAlice> (I have bizarre data models, but even then, 99% of my documents never exceed two dozen static fields.)
[00:15:43] <shoerain> GothAlice: cool, found the reason why. It was filtering by a property that didn't exist in my collection, but does on other people's.
[00:59:19] <GothAlice> (marrow.task is being written in the promise/future style)
[01:48:29] <drags> is there any special internal collection for mongod errors?
[01:52:15] <GothAlice> drags: AFAIK the server log is the canonical error log, but you may also be able to use http://docs.mongodb.org/manual/core/auditing/ to log errors.
[01:52:45] <GothAlice> The key being that if the server process experiences a critical fault, it should still be able to write to a log file or syslog daemon even if it can't write safely to its own data files.
[01:53:10] <EmmEight> Keep logs on a separate drive than mongo data IMO
[01:53:47] <GothAlice> drags: ^ - quite important if reliable error recovery is something you want
[02:05:58] <drags> GothAlice: mostly looking to alert based on errors, if I could write a client that hit the mongo process as a client and pulled errors from a collection that'd be a start
[02:06:14] <drags> the proper method would be to alert from a central syslog server, but we don't have the central logging setup yet
[02:06:41] <GothAlice> drags: Server log monitoring tools are plentiful, both in the syslog aggregation and tail -f | grep styles. ;)
[02:08:44] <EmmEight> Here is a syslog amalgamation repo I started, lols https://github.com/chris-childress/amalgamatron
[02:13:20] <GothAlice> I love seeing different wheel designs.
[02:13:39] <GothAlice> ("Re-inventing the wheel, every time." was my company's slogan for many years.)
[02:23:03] <tylerdmace> Hello! Anyone with MEAN stack experience and a few minutes to spare? I've got a really weird issue where I cannot seem to create a mongo document with a custom _id value taken from a web form.
[02:25:14] <tylerdmace> If I leave _id off of the schema and let Mongoose/Mongo create it itself, the rest of the form submits and the document gets created as normal; but simply adding in _id into the schema causes a 404 when trying to PUT
[02:27:40] <GothAlice> tylerdmace: Well, the 404 is your web framework masking the true error (i.e. a default "unrecoverable state" response), I suspect. What, exactly, is the value you are trying to assign to _id?
[02:27:56] <GothAlice> (Pastebin the insert statement and sample values for variables if possible, please.)
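For reference, a sketch of a Mongoose schema that accepts a caller-supplied _id; the string type and field names are assumptions, not tylerdmace's actual code:

    var mongoose = require('mongoose');

    // declaring _id in the schema disables the automatic ObjectId default
    var widgetSchema = new mongoose.Schema({
        _id:  String,
        name: String
    });
    var Widget = mongoose.model('Widget', widgetSchema);

    // the _id must then be supplied explicitly, e.g. from the web form value
    new Widget({_id: req.body.id, name: req.body.name}).save(function (err, doc) {
        if (err) return console.error(err);   // surface the real error instead of a bare 404
        console.log('created', doc._id);
    });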
[03:22:12] <GothAlice> kapil: I.e. after getting a document containing a "foreign" ObjectId field, you have to re-query to get the related data. And yes, this means you can't issue JOIN-like queries.
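In shell terms the two-step lookup looks roughly like this (collection and field names are illustrative):

    // no JOINs in MongoDB: follow the reference with a second query
    var task  = db.tasks.findOne({_id: someTaskId});       // first round trip
    var owner = db.users.findOne({_id: task.owner_id});    // second round trip for the related document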
[03:23:01] <kapil> thanks <GothAlice> , does this mean that i should store references of one object into another
[03:26:02] <GothAlice> kapil: It really means you should think very hard about how you structure your data. Sometimes what might appear as "duplication" of data is actually needed to efficiently query, etc.
[03:26:36] <GothAlice> kapil: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html — There are also some good documents in the MongoDB documentation itself on modelling data.
[03:26:54] <kapil> thanks .. let me go through those before i ask more questions..
[06:34:53] <GothAlice> Ouuuuuch. Yeah, that clearly indicates a problem. (You only have 213 documents, and it had to evaluate your query against each one separately… all to return a single document.) You mustn't have an index on 'key'. However, that query is fast—millis=0—it's not actually the query causing you problems. What's an .explain() on the correct query?
[06:37:43] <garbagegod> Interesting, I am pretty positive that is the correct query
[06:38:08] <garbagegod> I can run it without .explain() and it does in fact take an enormous amount of time from the CLI, if that's any indication
[06:38:56] <bazineta> I'd take a look at what the log has during that time for things hitting the 100ms slow trigger
[06:39:37] <GothAlice> It may also be a case of the interactive shell's repr trying to pull in the full resultset before printing anything. Is the line slow even if you assign it to a variable? a la: var foo = db.example.find(…)
[06:40:15] <bazineta> Given that it's not that particular query that's slow, even though it's unindexed, I'd guess that there are other things going on at the same time.
[06:41:37] <garbagegod> Odd, nothing in system.profile after running the query a few times, I suppose you're correct
[06:41:49] <bazineta> Then again, I'd just put an index on key, to eliminate the scan as a noise issue
[06:42:40] <garbagegod> To answer your previous question, foo.next() is very slow in that situation
[06:42:44] <garbagegod> Also, there is an index on key
[06:43:25] <bazineta> Odd that it's using a BasicCursor then. What does a stats() call on the collection return?
[06:44:47] <GothAlice> garbagegod: But you're not actually issuing a $text query… http://docs.mongodb.org/manual/reference/operator/query/text/#op._S_text
[06:44:59] <GothAlice> You're issuing a string comparison query.
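The difference, sketched in the shell (collection and field names assumed):

    // plain string comparison: matches documents whose 'body' equals the string exactly
    db.articles.find({body: "mongodb performance tips"});

    // actual full-text search: needs a text index and the $text operator
    db.articles.ensureIndex({body: "text"});                // createIndex() on newer shells
    db.articles.find({$text: {$search: "mongodb performance"}});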
[06:46:33] <Soothsayer> I have a Products collection which has two fields of concern - a) store (which store does this product belong to?) b) categories (array of category ids) — When a user visits a category I’m doing a find query on this collection on a particular store and list of categories. What should be the order of my compound index for this collection? store,categories OR categories,store ?
[06:46:36] <GothAlice> So what it might be doing when table scanning in this instance is hashing the large input text set prior to comparison… which would make it slow indeed. (No clue if this is how it's implemented, of course.)
[06:48:21] <garbagegod> Let me see about the performance of a $text query
[06:49:02] <GothAlice> Soothsayer: Filter the returned fields to eliminate network transmission of large documents as being the issue, too: .find({…}, {_id: 1})
[06:54:16] <Soothsayer> it instead says "cursor": "BtreeCursor _id_ reverse", <— isn’t this as good as no index? why doesn’t it say BasicCursor instead
[06:54:33] <Soothsayer> bazineta: since it's anyway doing a full collection scan.
[06:56:30] <bazineta> Soothsayer Show us the stats() on the collection, please
[06:59:08] <Soothsayer> bazineta: please ignore the index “"categoryOriginalIds_1_store_1" : 3859072” as I created it just now after pasting the explain.. so it appeared in the stats
[06:59:21] <bazineta> Soothsayer I would guess that there's such low cardinality on store for that store id that it figures it's pretty much going to do a scan anyway.
[07:00:13] <Soothsayer> bazineta: so that answers my question? creating an index categories, store instead of store, categories
[07:00:29] <GothAlice> Well, you haven't measured the impact of those queries yet.
[07:00:33] <bazineta> Soothsayer That is, back to databases 101, if you're going to select at least 10% of the total records, then a scan is typically as fast as anything else you could do. Is that particular store id very common in the data set?
[07:01:08] <Soothsayer> bazineta: yes it is, has the most number of products.
[07:01:30] <Soothsayer> right now, but it can get skewed as we add more stores
[07:01:46] <Soothsayer> GothAlice: I am creating both the indexes and running explains, few mins
[07:02:35] <bazineta> Soothsayer The optimizer could be then punting, basically, as the return set would be slowed by an index given the low cardinality. I think the alternate index and additional explains will shed more light.
[07:05:38] <bazineta> Soothsayer Ah, I might see it.
[07:05:59] <bazineta> Soothsayer See how in the reverse id plan, it doesn't need to do a scanAndOrder
[07:06:12] <bazineta> Soothsayer Whereas in the store plan, it does.
[07:06:31] <Soothsayer> bazineta: GothAlice explains on both the new indexes - category, store ( http://pastie.org/private/jsch6qpyogdl3tb7fliajw ) and store, category ( http://pastie.org/private/aorzqpdmh6engrlvvmmzg )
[07:07:07] <bazineta> Soothsayer Is there a sort you're doing as well, as part of that query?
[07:07:19] <Soothsayer> bazineta: understood… but doesn't the number of documents it needs to scan outweigh the cost of sorting?
[07:07:43] <Soothsayer> its sorting by _id, DESC..
[07:07:50] <GothAlice> Soothsayer: From the looks of that both suffer the scanAndOrder hiccup (since you're getting results back in 'natural' order), but store_1_categoryOriginalIds_1 is superior (on that query).
[07:09:02] <Soothsayer> even the query time has gone down dramatically, 4 millis
[07:09:14] <Soothsayer> scanAndOrder has become false now
[07:09:28] <GothAlice> Phew. No descending sort on _id for you! ;)
[07:09:29] <Soothsayer> bazineta: I understand your reasoning, that was insightful
[07:09:34] <bazineta> scanAndOrder false is always joy and happiness
[07:10:05] <GothAlice> indexOnly can be celebration-worthy.
[07:10:24] <Soothsayer> GothAlice: i wish.. but there would be too many fields to cover in an index
[07:10:54] <Soothsayer> plus, i have multiple multi-valued array fields
[07:11:12] <Soothsayer> bazineta: GothAlice I guess if I have an index on categories in a compound index, I can’t have another multi-valued field in it too?
[07:12:41] <GothAlice> (Having multiple arrays in a document suffers other limitations, too, notably on $elemMatch.)
[07:13:16] <Soothsayer> now coming back to my main dilemma..
[07:14:00] <Soothsayer> these two explains… http://pastie.org/private/floyn8xg9cqjlcfq0zbxza | http://pastie.org/private/liy10myzi7sfvhifsprwg
[07:14:12] <Soothsayer> both have near equal millis over multiple tries
[07:14:23] <GothAlice> Indeed. For that particular query, they are basically equivalent.
[07:15:16] <GothAlice> (They're within half a percentage point of each-other.)
[07:15:30] <bazineta> Same order of magnitude on both plans. I'd favor store first for simplicity, both being about equal, and assuming that you'll have more stores in the set later.
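The arrangement favored above, as shell commands; the collection name 'products' and the storeId/categoryIds variables are assumptions, the field names come from the pasted explains:

    // 'store' first, then the multikey categories array
    db.products.ensureIndex({store: 1, categoryOriginalIds: 1});
    db.products.find({store: storeId, categoryOriginalIds: {$in: categoryIds}}).explain();
    // look for scanAndOrder: false and a low nscanned in the output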
[07:23:56] <Soothsayer> GothAlice: bazineta my aggregate pipeline explain says - [planError] => InternalError No plan available to provide stats
[10:01:21] <kevc> does anyone know why mongodb would block on writes when write concern is set to 0 using the python client?
[10:13:00] <someotherdev> Hey all, have a question about the best way to implement a query. Imagine you have content which has a ref to a content type. Then you want to select 5 pieces of content per content type. Is there a way to do this without multiple queries?
[10:34:05] <someotherdev> repost: hi, have a question about the best way to implement a query. Imagine you have content which has a ref to a content type. Then you want to select 5 pieces of content per content type. Is there a way to do this without multiple queries?
[10:43:48] <kevc> oh god the pymongo client is blocking...
[10:49:56] <oipat> someotherdev: i'm a noob but afaik, that is something you need joins for and mongo doesn't have them
[10:54:12] <someotherdev> well, you don't need a join as you can use $in to filter on an array
[10:55:38] <someotherdev> I just want to know if there is a way to, say, select 15 or some fixed number of documents but ensure each content type only contributes 5 documents.
[10:56:16] <someotherdev> The simple way would be to do find(bycontentid).limit(5) - repeating for each entry in the array of content types - but that's really ugly
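A sketch of that "simple way" in the shell (collection and field names are made up), one find() per content type:

    // N small queries: one per content type, five documents each
    var perType = {};
    contentTypeIds.forEach(function (typeId) {
        perType[typeId] = db.content.find({contentType: typeId}).limit(5).toArray();
    });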
[11:30:02] <kevc> if I want a different primary for a database, can I set primary on a per database level, or do I need to create a new replica set for that?
[11:31:28] <kephu> I've got a bit of a noobish question: I wanted to store some functions in my DB, but I'd also like to assign some extra values to them (like, e.g. how effective they've been in the past). Should I use db.system.js, or can I create my own collection? Or is the whole idea stupid to begin with?
[11:48:05] <someotherdev> repost: hi, have a question about the best way to implement a query. Imagine you have content which has a ref to a content type. Then you want to select 5 pieces of content per content type. Is there a way to do this without multiple queries?
[11:52:05] <kephu> so... anyone here or is everyone idling? ;)
[11:54:29] <someotherdev> kephu, no idea how to solve your issue sorry. Though putting logic in the database is probably not a good idea
[12:18:11] <kephu> someotherdev: well tbh, it's not, strictly speaking, logic
[12:19:30] <someotherdev> why not keep them on your server and have an activity log of some sort to see how effective they are?
[12:41:52] <kephu> someotherdev: using a log for that seems magnificently counter-productive, out of the two evils "store the function name in mongo, store the function in a hashmap in the code" seems less hacky
[12:43:23] <Rogero> Quick question: are ObjectIds unique across the database or just the collection?
[12:45:48] <Rogero> question in other words: can I refer to an ObjectID without the need to refer to the holding collection or db, considering the ObjectID is unique across the database?
[12:51:09] <kephu> someotherdev: okay quick update: turns out it works without a hitch. What tripped me up is that mongodb.org's "try it out" interpreter can't handle that thang
[12:58:24] <someotherdev> but that doesn't change the fact that they are timestamps and that you can reference other collections. It will most likely be an index in that collection
[13:03:44] <Rogero> kali: one more question if you don't mind: if some collection exceeds the maximum allowed storage space, which is 16MB, is sharding turned on automatically?
[13:04:09] <Rogero> or what happens exactly in this case?
[13:44:45] <mongohacker> I am using a mongodb replica set with one primary and two secondaries and running nodejs on top of it. I am using the mongodb nodejs native driver (2.0.0). When the primary goes down, the driver doesn't update the replica set status. Can someone please help me
[13:49:39] <mongohacker> @Derick: Can you please help me here?
[13:55:16] <kali> mongohacker: have you checked that the replica set has elected a new primary ?
[13:56:27] <kali> mongohacker: then, it depends on the driver. it may require all of the already-open connections to fail once before new connections go to the new primary
[13:59:37] <mongohacker> kali: Yes. A new primary is coming up. I have checked rs.status()
[14:00:21] <mongohacker> kali: Thanks for the feedback!
[14:59:00] <old_black> hi - I have a mongomapper issue .. I'd like to suppress mongo cursor timeouts, and I switched over from MyObject.where() to MyObject.find_each().. but I am dealing with millions of records.. is there a way to pass a limit in or do I need to go down to the raw driver?
[15:31:29] <EricL_> Anyone know if there is code floating around to marshal a Mongo JSON dump in to Golang object with struct tags for JSON?
[15:55:26] <Mmike> why does rs.initiate() return a dictionary with a fancy 'ok' key, same as rs.add(), but rs.delete returns just some message?
[15:56:44] <GothAlice> Mmike: I don't actually have an rs.delete() shell command. :/
[15:58:50] <GothAlice> Mmike: If you type the names of the methods without parentheses in the shell, you can see the code for them. In most cases rs.remove() returns the result of replSetReconfig, just like rs.add(), but on certain error conditions it'll return a bare string.
[15:59:30] <Mmike> GothAlice: now, when I want to do rs maintenance stuff from, say, python... how would you go with it?
[16:00:20] <GothAlice> Mmike: By replicating the functionality of those mongo shell commands (remember: you can see the source). You'd need to fetch the current config, modify, then issue a http://docs.mongodb.org/manual/reference/command/replSetReconfig/ command.
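What the rs.remove() helper boils down to, spelled out in the shell (the hostname is illustrative); a driver would issue the same replSetReconfig command against the admin database:

    var cfg = rs.conf();                                  // fetch the current replica set config
    cfg.members = cfg.members.filter(function (m) {      // drop the member being removed
        return m.host !== "olddb.example.net:27017";
    });
    cfg.version += 1;                                     // reconfig requires a higher version number
    db.adminCommand({replSetReconfig: cfg});              // the same command, issued directly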
[16:04:54] <kexmex_> how do i prevent Mongo from writing out details of slow Queries/Updates to Logfile?
[16:05:49] <cheeser> turn the profiler off http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/
[16:06:31] <GothAlice> kexmex_: Note: slowms queries are usually a symptom of a correctable problem; bad indexes, etc.
[16:08:30] <GothAlice> https://github.com/mongolab/dex — slowms is also insanely useful for automated analysis tools, including this really nifty one that can suggest optimal indexing strategies to match your queries.
[16:08:33] <kexmex_> i don't think i've seen queries
[16:15:08] <GothAlice> kexmex_: http://docs.mongodb.org/manual/tutorial/create-a-unique-index/ — there is some control over dropping of duplicates and whatnot, too.
[16:15:30] <kexmex_> i was looking at reference :)
[16:15:33] <shoerain> hmm, can I use the mongo shell in javascript? i.e. an express.js app? i'd like to do a read-only dump of db.tasks.find({publishDate: {$gte: new Date()}}) and serve it.
[16:16:13] <GothAlice> shoerain: It would likely be best if you wrote your code against the standard Node.js driver rather than trying to piggyback on the shell.
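A minimal sketch with the standard Node.js driver, assuming a local mongod and a database called 'mydb':

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
        if (err) throw err;
        db.collection('tasks')
          .find({publishDate: {$gte: new Date()}})
          .toArray(function (err, docs) {
              if (err) throw err;
              console.log(docs);     // hand these to the Express response instead of logging
              db.close();
          });
    });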
[16:17:44] <kexmex_> so all my writes will fail until both real servers are online?
[16:18:17] <GothAlice> kexmex_: http://docs.mongodb.org/manual/core/replica-set-write-concern/#verify-write-operations-to-replica-sets — see the last paragraph of this section (just below the code sample).
[16:18:31] <GothAlice> (And yes, for any query where you have w=2.)
[16:19:11] <kexmex_> so arbiter is not counted as a server in this case?
[16:19:20] <GothAlice> Certainly not; it doesn't hold any data.
[16:19:26] <kexmex_> i am trying to guarantee that both servers are up to date
[16:19:36] <kexmex_> so if one goes down, the other one is a good copy
[16:21:25] <GothAlice> kexmex_: Generally you have to rely on replication to handle this for you. Using a w=1, j=true concern will make sure it gets to the primary, the replica will then take a few millis to stream the write from the primary. (You'd want to measure the replication lag in your setup, though.) This gives you a "window of failure".
[16:22:00] <GothAlice> kexmex_: In my own setups I use three real replicas and w=2 for critical inserts/updates. (No arbiter.)
[16:23:36] <kexmex_> GothAlice: ouch... cant afford a real replica now :)
[16:24:28] <GothAlice> kexmex_: Then alas, the highest durability you can safely achieve (if you want to be able to run when mongo is degraded) is w=1, j=true. :/
[16:25:28] <GothAlice> (j=true is several orders of magnitude slower than j=false, for obvious IO-bound reasons)
[16:25:30] <kexmex_> can j=true be set in server conf? or it's on command level?
[16:26:30] <GothAlice> kexmex_: You can specify it as a per-connection default if you wish, you can also set it for the whole replica set. See: http://docs.mongodb.org/manual/core/write-concern/#default-write-concern and http://docs.mongodb.org/manual/core/replica-set-write-concern/#modify-default-write-concern
[16:27:05] <GothAlice> (The per-connection default would be set in your MongoDB client app.)
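Sketches of both approaches from the shell; the collection name is an assumption:

    // per-operation: journaled acknowledgement from the primary
    db.orders.insert({total: 42.5}, {writeConcern: {w: 1, j: true}});

    // replica-set-wide default, via getLastErrorDefaults in the rs config
    var cfg = rs.conf();
    cfg.settings = cfg.settings || {};
    cfg.settings.getLastErrorDefaults = {w: 1, j: true};
    rs.reconfig(cfg);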
[16:28:14] <adamobr> Hi there, anyone know of a bug that makes mongos open too many connections to a replica set primary?
[16:29:11] <cheeser> the primary opens too many connections to its secondaries?
[16:29:28] <cheeser> or the primary refuses connections from your client because of too many open connections to the primary?
[16:29:56] <adamobr> my mongos are opening too many connections to each primary in a sharded environment
[16:30:30] <adamobr> on the secondaries the open connection count is low
[16:31:06] <adamobr> and on the primary of each replica set it rises to 20k and starts to refuse new connections
[16:31:50] <GothAlice> adamobr: Have you examined the output of netstat? (I'd be looking for something like lots of TCP connections in CLOSE_WAIT state…)
[16:32:15] <GothAlice> I.e. does the machine think those are _real_ connections, or _dead_ connections?
[16:32:48] <adamobr> insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time
[16:33:48] <GothAlice> (I've run into CLOSE_WAIT in my testing of high-performance HTTP servers after I noticed the connections weren't being freed up for re-use fast enough.)
[16:39:14] <GothAlice> adamobr: What's the output of the following command (in BASH) on the host running mongos: netstat -nt | awk '{print $6}' | sort | uniq -c | sort -n -k 1 -r
[16:42:52] <kexmex> the update (upsert) contains _id field
[16:43:26] <GothAlice> adamobr: So the initial mongostat output actually looks fine. (Mongostat only shows current activity, not historical.) The initial netstat output (and the summary of connection types) clearly indicates a problem, though, with ~4K connections to each of your replica set members.
[16:44:16] <GothAlice> kexmex: nModified: 0, upsert: 1 = it inserted a record.
[16:44:56] <adamobr> yes, it's strange because my secondaries only have 270 connections, GothAlice
[16:45:10] <adamobr> it only occurs in primary mongod
[16:46:51] <adamobr> and this is my mongos01 mongostat output: http://pastebin.com/A82SsYkS
[16:46:53] <GothAlice> kexmex: Well, no, nscanned indicates it had to run through 2,096,711,937 entries (either documents or index elements) to determine that while two of them were of interest—nscannedObjects—and one matched—nMatched—it still wasn't appropriate and a new record was inserted anyway.
[16:47:51] <GothAlice> kexmex: Sounds like a standard case of needing a better index to cover that query, or a better query to better reduce the scope of the records being evaluated.
[16:54:07] <GothAlice> adamobr: That's actually rather impressive. Also worthy of note, a few of those connections are actually draining from the pool.
[16:56:23] <adamobr> GothAlice: do you know if a ticket or an error message like this exists? I searched for information but didn't find anything that could explain it.
[16:58:07] <GothAlice> adamobr: How many shards, how many nodes in each shard (i.e. is it a sharding replica set), and what is the connection pool size of your client application?
[16:59:19] <adamobr> 1 shard with 3 replicasets x 3 hosts and head shard is 3 mongos with 3 mongoconfig
[16:59:22] <GothAlice> Also, which version of mongo?
[17:04:20] <GothAlice> http://cl.ly/Yb7L — apologies for using a link shortener, JIRA (the ticket system) isn't IRC friendly. This is a list of all networking, replication, and sharding bugs fixed in versions newer than your current version. Several tickets seem rather important.
[17:04:36] <adamobr> but, as we can see, my APIs (PHP) aren't using more than 300 connections http://pastebin.com/WmHPiayb
[17:04:38] <GothAlice> (You can drag the vertical separator between the list of tickets and ticket details to make more room for the list.)
[17:05:55] <adamobr> the last 3 are between mongod processes.
[17:06:00] <GothAlice> You should be seeing only ~1000 or so connections active on the mongos host. :/
[17:10:20] <GothAlice> Hmmmmmmmm. Do you have connection pooling or persistent connections enabled, PHP-side?
[17:11:02] <GothAlice> http://php.net/manual/en/mongo.connecting.pools.php and http://php.net/manual/en/mongo.connecting.persistent.php
[17:11:27] <adamobr> hum, but doing this wouldn't cause the connections to rise even more?
[17:11:37] <adamobr> I need to handle disconnections now?
[17:11:45] <adamobr> or just rely on tcp timeouts?
[17:11:50] <GothAlice> It *should* automatically handle that.
[17:14:51] <GothAlice> http://stackoverflow.com/questions/8968125/mongodb-connection-pooling is a related question in Java-land. (Use one MongoClient instance per invocation of your scripts being the translation, and if you use multiple, yes, you need to clean them up yourself.) http://us2.php.net/manual/en/mongoclient.close.php notes that "you should never have to do this under normal circumstances"
[17:15:38] <GothAlice> (I would suspect the call to .close() would be a shutdown function a la http://us2.php.net/manual/en/function.register-shutdown-function.php )
[17:20:03] <GothAlice> adamobr: PHP exists in a strange and unholy place where it wants to be request-transactional (i.e. only caring about reality during the processing of a request), making standard optimizations like connection pooling an incredible PITA and a source of obtuse bugs—the fault in your case may be client-side _or_ server-side. I'd strongly recommend opening a JIRA ticket describing your case (and including those pastes) to see if official support can weigh in.
[17:26:09] <adamobr> We will check our environment and create a ticket. Thank you for your help.
[17:27:14] <GothAlice> (My thought is that PHP may not be closing connections properly, or that mongos isn't picking up that the connections are closed and is leaving them hanging, creating bundles of new mongod connections each time PHP re-connects.)
[17:41:26] <rendar> GothAlice: everybody lies. especially 3rd party data
[17:41:31] <GothAlice> (The most well-known of the classic blunders being: "never get involved in a land war in Asia")
[17:41:51] <GothAlice> rendar: You'd almost think people couldn't screw up RSS. Title fields are kinda useful in that format. ¬_¬
[20:15:43] <shoerain> speaking of open connections, do I have to do anything to make sure a DB connection stays up, say after a network failure and so on?
[20:16:35] <GothAlice> shoerain: MongoDB client drivers have an auto-reconnect policy by default to handle things like replica set failover, etc.
[20:17:06] <GothAlice> shoerain: (I.e. every time there's an election in a rs, active connections will be dropped.)
[20:17:15] <GothAlice> But they'll come back, being the idea.
[20:25:25] <shoerain> cool, and that's something that would show on the error/1st variable of the callbacks in javascript, I'm guessing
[20:25:40] <shoerain> I'm wondering how to use this: http://docs.mongodb.org/v2.4/reference/connection-string/#uri.w
[20:26:10] <shoerain> would it be like 'mongodb://db1.example.net,db2.example.net:2500/?w=-1' for one way of preventing writes?
[20:26:27] <shoerain> In addition to having a user with readOnly permission
[20:33:14] <GothAlice> w=-1 means "throw the data at the socket and pray it sticks"
[20:33:52] <GothAlice> (I only use -1 on performance-critical logging operations where loss of the log entry is recoverable.)
[20:35:07] <GothAlice> There are two main ways to enforce read-only access: readOnly permission, or only ever connecting to a secondary. (The latter also has the side-effect of potentially returning ever-so-slightly out of date data.)
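If you go the readOnly-permission route, the 2.6-style shell incantation is roughly as follows; the user name, password, and database are placeholders:

    // a user that can read 'mydb' but never write to it
    db.getSiblingDB('mydb').createUser({
        user:  "reporter",
        pwd:   "change-me",
        roles: [{role: "read", db: "mydb"}]
    });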
[21:10:42] <asheinfeld> Is there a good place to see different schemas applied to different kind of apps? I want to compare my schema with others to understand what i'm doing right / wrong
[21:13:13] <GothAlice> asheinfeld: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html is the general summary of "how to model your data" I frequently reference, with http://docs.mongodb.org/manual/data-modeling/ as the core docs on that, and the tutorials providing deeper insight into certain patterns: http://docs.mongodb.org/manual/tutorial/#development-patterns
[21:13:50] <asheinfeld> thanks a lot! @GothAlice. Will def take a look a those
[21:22:20] <old_black> any handy rules of thumb for calculating a good limit when reading large data sets?
[21:57:25] <smsfail> anyone have a few minutes to discuss Mongo as an analytics persistence layer? Context: I used mongo heavily about 3 years ago and haven't been back since. Hearing it being used for things like analytics isn't a choice I would have made then, but want to become more enlightened via discussion.
[21:57:47] <smsfail> More specifically I'd love to talk about bitwise storage and performance in an analytics space
[21:58:00] <smsfail> so if anyone is available, id love to chat :)
[22:20:36] <GothAlice> smsfail: Once I get home from work, that sounds like an excellent discussion.
[22:22:13] <GothAlice> As a note: https://jira.mongodb.org/browse/SERVER-3518 — somewhat impacts bitwise use.
[22:26:19] <Foxandxss> Hello, I have a question (no mongo knowledge)
[22:27:22] <Foxandxss> I was trying to setup robomongo with a vagrant machine and then I discovered that since I forward the port 27017 which is the one that mongod uses, when I execute "mongo" in my machine, it "sees" that there is a mongod running in that port (which is actually running on vagrant) and it doesn't give an error
[22:27:35] <Foxandxss> can't access any document tho
[22:27:43] <Foxandxss> so I think that is just a side effect
[22:33:26] <GothAlice> Foxandxss: What's the exact message you get if you try to get a document? (I.e. what happens if you run "mongo test" then "db.foo.find()"?)
[22:33:40] <GothAlice> smsfail: I'm also home now. :)
[22:35:00] <GothAlice> Foxandxss: You've enabled authentication, but haven't authenticated. (Likely you've also not added a user; you can do this from within the vagrant instance only—using the localhost exception to authentication. Make sure this first user is an administrator!) Or just disable auth for local testing. ;^P
[22:36:06] <Foxandxss> so it asks for auth only when not working on localhost? (I had a demo from a book and it accessed it, no auth required)
[22:36:30] <GothAlice> Foxandxss: No. You've enabled authentication explicitly; all remote connections will need to be authenticated because of this.
[22:36:45] <Foxandxss> right, I commented out that and it works from outside
[22:36:59] <Foxandxss> I didn't touch the config; I guess the puppet config put it there
[22:37:06] <GothAlice> Foxandxss: Since many people enable authentication before actually adding users, MongoDB added a "special case" when accessing from localhost if no users are currently configured: let that first user be created without needing authentication.
[22:37:35] <GothAlice> Foxandxss: I have some coworkers who like vagrant (none who like puppet) for the purpose of service isolation, but then I throw things like https://gist.github.com/amcgregor/8ccb06ebc15c7fb4ddd4 (a BASH script to start a 3x3 sharded replica set with authentication, including initial user creation) at them and do a little dance… possibly with a clown hat. ;)
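Creating that first administrator from inside the box, while no users exist yet (the name and password are placeholders):

    // run from a 'mongo' shell on localhost, using the localhost exception
    db.getSiblingDB('admin').createUser({
        user:  "admin",
        pwd:   "change-me",
        roles: [{role: "userAdminAnyDatabase", db: "admin"}]
    });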
[22:37:36] <smsfail> alright GothAlice . . . I am ready. Wanna do in public or pvt?
[22:37:51] <Foxandxss> Oh, robomongo can now access without ssh tunnel, nice
[22:37:56] <GothAlice> smsfail: This is rather topical to MongoDB, so no reason not to have the mongodb-for-analytics discussion here. ;)
[22:38:13] <GothAlice> Foxandxss: Ah ha! You were using the localhost exception (via ssh tunnel) to get robomongo to even work in the first place. ;)
[22:38:27] <Foxandxss> no, there wasn't a first place ;(
[22:38:46] <Foxandxss> well, I think it connected but wasn't able to show any database (auth issues for sure)
[22:39:03] <GothAlice> The localhost exception is only meant to let you set up that first user; AFAIK it won't let you do general purpose queries in that state.
[22:39:28] <Foxandxss> it does (basic app from a book)
[22:39:39] <Foxandxss> I will research the auth; none of the examples on the internet use it
[22:40:16] <GothAlice> Foxandxss: It's… useful. I host multiple clients on one cluster, with dedicated clusters for a few others (that still use authentication to prevent a hacked app from being able to reconfigure my replica set ;)
[22:40:30] <drorh> Hi. I'm using mongoose. I have nested schemas using Schema.Types.ObjectId ref. Is there a way to insert/save the root model with object instances instead of true ids, and have mongodb create the nested objects and return the refs to the main query?
[22:41:17] <GothAlice> drorh: MongoDB has no concept of relational data, so no, not like that. What you really want are "embedded documents" (nested schemas, or however your driver names them) instead of ObjectId references.
[22:41:59] <Foxandxss> guess so, normally stupid questions are answered in an aggressive manner, sadly
[22:42:20] <GothAlice> drorh: However, it sounds like stepping back and having a good look at how you model your data may be advisable. http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html is a nice reference if you come from a relational background.
[22:42:29] <GothAlice> Foxandxss: There are no stupid questions. Only stupid answers. :)
[22:42:53] <drorh> Sorry if my question was stupid.
[22:43:23] <GothAlice> drorh: Not at all! It was… complex… which is usually the first sign to take a meditation coffee break. ;)
[22:43:47] <smsfail> so. Currently have a medium scale analytics persistence layer, implemented by storing 'rows' of bitmaps (with ~300k offsets per 'row') inside of Redis. Where I am at now is that I am about to bump the number of offsets up into the millions, store more things AND implement a clustered solution across 5 amazon aws regions. The last one is a big blocker with Redis and has me searching out a more capable layer, in regards to clustering while
[22:43:47] <smsfail> maintaining atomicity and the ability to do bitwise stuff just as fast as Redis. So, my first question is: what do you think about something like this for Mongo at a high level? cc: GothAlice
[22:45:01] <GothAlice> smsfail: MongoDB, while it can be used for bitwise stuff, is currently limited in terms of querying by lacking bitwise query operator support. See: https://jira.mongodb.org/browse/SERVER-3518 (linked just before I left the office)
[22:45:09] <drorh> GothAlice: :). my driver can take nested docs as long as they're wrapped in an array
[22:45:29] <GothAlice> drorh: … that doesn't sound quite right. {foo: {bar: "baz"}} — perfectly legit document in MongoDB.
[22:46:09] <smsfail> GothAlice: oh damn! looks like someone just commented on that ticket minutes ago too.
[22:46:11] <GothAlice> smsfail: However, using giant bitmasks seems to be an odd approach; what is the larger problem domain (not just the symptom, what the design choice was, but the cause of such a design choice will provide some insight)?
[22:47:05] <GothAlice> smsfail: (Giant bitmasks are why socket.select is hideous, performance-wise, and why alternatives like kqueue, epoll, etc. exist—they avoid bitmasks like the plague.)
[22:47:07] <drorh> GothAlice: Yeah I have np doing that. but having refs would allow me to avoid a lot of duplicate data. Question is, if i can easily/plausibly query out this way.
[22:47:45] <GothAlice> drorh: Give the article I linked you a thorough read. If you've read it, I have some ideas, but they all require some level of manual labour to implement.
[22:48:07] <GothAlice> (It explains roughly when to embed, when to reference.)
[22:50:01] <smsfail> GothAlice: so the problem is tracking things like web:login:today where the offset is the UUID of the user. also things like messages:GUID:delivered/read/deleted etc where simple booleans are written to quickly grok all the results and see what has happened. there are a lot of queries that run at large scale currently that work almost realtime for this sort of thing.
[22:50:18] <smsfail> those keys are pseudo redis btw
[22:50:26] <GothAlice> drorh: I take a hybrid approach to allow for easier querying (and fewer additional lookups) while reducing the amount of data duplication. Requires trapping updates to the "related" data in order to also update the "cache" next to the references. A la: invoice = {company: {_id: ObjectId(…), name: {en: "…", fr: "…"}}, items=[…], …}
[22:51:24] <GothAlice> drorh: As you can see, if I update company I also need to find all of these special "cached references" to update the saved values; and while this is happening, there's a lag similar to replication lag.
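A sketch of the fan-out that hybrid approach implies, using the invoice/company shape from above:

    // the source document changes...
    db.companies.update({_id: companyId}, {$set: {"name.en": "New Name"}});

    // ...then every cached copy of it has to be chased down and refreshed
    db.invoices.update(
        {"company._id": companyId},
        {$set: {"company.name.en": "New Name"}},
        {multi: true}                       // touch all invoices embedding that company
    );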
[22:52:05] <GothAlice> smsfail: Ah; so to simplify all of that down, you're tracking events.
[22:53:07] <GothAlice> Problem 1: Storing logged events the MongoDB way, ref: http://docs.mongodb.org/ecosystem/use-cases/storing-log-data/
[22:53:31] <GothAlice> Problem 2: Getting useful analytics out of the events in such a way as to permit extremely fast querying, ref: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework
[22:53:33] <smsfail> and the huge offsets come from the significant number of users we currently have.
[22:53:46] <GothAlice> smsfail: That's… a non-issue when pre-aggregating in MongoDB.
[22:55:10] <GothAlice> Now, in terms of analytics and data processing, most of my "events" at work are click stats, and while I wasn't tracking temperature data from ocean buoys (like that last article) it was insanely useful in helping me figure out how to structure the data, and the impact of different approaches on MongoDB. (Data storage, index size, query performance.)
[22:56:08] <smsfail> GothAlice: my concern with Solution/Problem 1 is that I could have done that with Redis as well. But choose bitmaps as the performant solution. Something about storing a more elaborate structure feels . . . slower? less compact? than bitmaps
[22:56:09] <GothAlice> smsfail: What you currently have as your "key" in Redis would be the fields you use as your unique index in MongoDB pre-aggregation.
[22:56:18] <smsfail> GothAlice: ok. let me read further
[22:57:27] <GothAlice> (I.e. you would store each of the colon-separated things as a different field, or even just as a sub-field of _id; e.g. use custom IDs like {_id: {class: "web", action: "login", period: Date(some period)}}.)
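A pre-aggregated counter keyed that way can be bumped with a single upsert; the collection name and the hourly bucket are assumptions:

    db.events.update(
        {_id: {class: "web", action: "login", period: ISODate("2014-10-01T00:00:00Z")}},
        {$inc: {total: 1, "hourly.14": 1}},     // bump the daily total and the 14:00 bucket
        {upsert: true}
    );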
[22:59:09] <GothAlice> drorh: Did the article help, and is the idea applicable to your use case? (I.e. only really try that hybrid approach if you can automate it at your ODM/driver layer!)
[22:59:33] <drorh> GothAlice: ive read it and now i lean towards the more data way.
[22:59:57] <smsfail> ok. makes sense. GothAlice I have to run to a few meetings. I will try and catch up with you later, but if I can't I will read up and sync up tomorrow or the next day. thanks for taking the time so far and I will leave this open until I am back.
[23:00:00] <GothAlice> drorh: :) What often appears as "data duplication" in MongoDB is often actually query optimization.
[23:00:16] <GothAlice> smsfail: No worries. :) It's an interesting topic!
[23:00:30] <drorh> GothAlice: yeah. i was thinking about the contrary, and realized its not going to work
[23:01:20] <GothAlice> smsfail: One last link for you, if you're still here.
[23:02:23] <GothAlice> Erm; gist.github isn't loading for me, so never mind. XD
[23:15:08] <garbagegod> Having a few issues with queries using text indexes being slow
[23:15:26] <garbagegod> First, a query utilizing a text index usually takes 6+ seconds
[23:15:43] <garbagegod> Second, when it finally completes, none of the subdocuments are fetched
[23:15:47] <GothAlice> garbagegod: Pastebin your exact query and its .explain()?
[23:16:18] <drorh> GothAlice: Good IDE/GUI for mongo?
[23:16:47] <GothAlice> drorh: I don't use IDEs, nor graphical admin tools other than MMS. (And even then, I prefer a good old interactive shell.)
[23:17:42] <drorh> GothAlice: do u have a nice article about complex queries?
[23:17:46] <GothAlice> drorh: It's useful to not use one for educational purposes, as well. The amount learned is inversely proportional to the amount of scaffolding beneath you. ;)
[23:19:30] <garbagegod> But actually outputting the result set takes a very long time, and unfortunately so does loading it into my ODM (I assume that's why it's so slow...)
[23:20:01] <GothAlice> garbagegod: Indeed; you are not filtering the returned list of fields, thus you're getting a (likely) immense amount of data back.
[23:20:20] <GothAlice> garbagegod: Is the query still monstrously slow if you add a second argument to .find of {}?
[23:23:16] <GothAlice> garbagegod: Aye; that should select only _id for returning over the wire.
[23:23:30] <garbagegod> GothAlice: well what the eff...
[23:23:36] <GothAlice> drorh: Okay; so what's the question you are trying to ask that data?
[23:24:07] <garbagegod> GothAlice: does that include subdocuments?
[23:24:34] <GothAlice> garbagegod: Different drivers/ODMs may offer field filtering in different ways. MongoEngine offers a .exclude, .only, and .scalar methods on querysets for field selection. And no, it wouldn't. Only _id should be returned if you specify an empty field list.
[23:25:17] <garbagegod> GothAlice: This is from mongo on CLI
[23:25:50] <GothAlice> What happens when you run this in the shell: db.things.find({}, {}).next() ?
[23:26:23] <GothAlice> (I.e. how long does it take, and does it return anything other than a document containing an _id?)
[23:26:54] <garbagegod> It executes quickly, it outputs all the fields of a single document within the collection, excluding subdocuments
[23:27:17] <Boomtime> a filter document needs to not be empty
[23:27:42] <Boomtime> if you want only the _id then you will need to specify {_id:1}
[23:27:46] <drorh> GothAlice: "please give me all of the trips (outward+return operator+vendor+segment_cnt+dates+locations+summed_price (if group has price sum it in as well)) grouped by group"
[23:30:27] <garbagegod> I know a single collection's size shouldn't exceed like 16MB
[23:30:41] <garbagegod> I didn't expect this POS to grow as rapidly as it did
[23:31:15] <garbagegod> well fak. What should I do
[23:31:16] <GothAlice> 70k is up there. (My own worst-case subdocument nesting is for some forums that embed replies to a thread within the thread… and those don't have much more than a few hundred to a thousand, tops.) Organic growth is a terrible thing. Is there an example record you could pastebin?
[23:34:02] <Boomtime> a single document may not exceed 16MB
[23:34:12] <Boomtime> a collection can be whatever size you want
[23:34:16] <drorh> GothAlice: I don't mean grouping technically. these are actual groups of legs
[23:34:37] <drorh> GothAlice: they are already normalized in the doc
[23:35:20] <drorh> GothAlice: and, no. I am not familiar with those :P
[23:37:58] <GothAlice> drorh: You have some reading ahead of you, but aggregate queries are one of my favourite features. http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/ will give you an introduction from a relational perspective, http://malagastockholm.wordpress.com/2013/02/01/aggregation-queries-in-mongodb/ is a simple real-world example, http://www.thefourtheye.in/2013/04/mongodb-aggregation-framework-basics.html a more detailed example.
[23:38:13] <GothAlice> Finally, http://pauldone.blogspot.ca/2014/03/mongoparallelaggregation.html are some interesting ideas on optimizing the queries.
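A toy pipeline, just to show the shape; the collection and field names are invented, not drorh's actual schema:

    db.trips.aggregate([
        {$match: {vendor: "ACME"}},                         // narrow the input first
        {$unwind: "$legs"},                                 // one document per leg
        {$group: {_id: "$group_id",
                  summed_price: {$sum: "$legs.price"},
                  legs:         {$push: "$legs"}}},
        {$sort: {summed_price: -1}}
    ]);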
[23:58:02] <GothAlice> https://soundcloud.com/gothalice/sets/apex-willow — what I've been listening to for the last week or so. There are some oddities in there. (The MJ remix isn't great, but the vocal cutting was very well done. Yves Simon is completely out of place, and Ai Vist Lo Lop is just weird. ;)
[23:58:34] <drorh> in the morning wanna die, in the evnin wanan die; IF I AINT DEAD ALREADY: GIRL u know the reason why.