[02:14:08] <GothAlice> 1-5? Embedding may be worth it. There are some limitations when embedding that become important when you have two lists of sub-documents.
[02:14:35] <bros> So, does the schema I posted look flawed?
[02:15:51] <GothAlice> (Well, more than one.) Specifically, you can only $elemMatch on one at a time, it complicates aggregate queries (lots of unwinding and re-grouping), and there was something else I'm forgetting at the moment.
[02:16:27] <GothAlice> Stores and users may be embeddable. It simplifies some things, as often you're checking a login and also need the account if successful anyway, right? :)
[02:23:55] <bros> I'll take your tiredness into consideration :P
[02:24:14] <bros> I was going crazy with the indices too. :P
[02:24:47] <GothAlice> You're trying to put all eggs in one basket. It wasn't going to be query-able at all like that.
[02:24:59] <bros> At first I had too many baskets.
[02:25:10] <bros> It didn't feel right. I thought moving over from relational would prevent that
[02:28:23] <GothAlice> Quick q, bros: within your nested model there, what did orders->status and batches->status represent? Why order_number and not order_id? Were you expecting orders embedded documents to have IDs?
[02:29:22] <GothAlice> Does one item have multiple barcodes? (UPC doesn't work that way, AFAIK.)
[02:29:31] <bros> GothAlice, one item could have multiple barcodes.
[02:29:40] <bros> For example: 1 gallon sprayer body, plus the sprayer.
[02:29:50] <bros> order->status: either open, in progress, closed, etc.
[03:16:23] <GothAlice> Important note: only top-level Schemas get an _id automatically.
[03:16:28] <bros> what's the protocol: store or stores. barcode or barcodes
[03:16:34] <bros> yeah, i didn't know that. how do you generate one down the road?
[03:16:53] <GothAlice> I use singular to describe the schema, plural to name the collection. (A "collection" has many things, a schema describes a single thing.)
[03:17:16] <bros> so it should be schema.account, yes?
[03:18:53] <GothAlice> When appending a new value to an embedded list like that, use mongoose.Types.ObjectId() to generate an ID for the "embedded record".
[03:18:53] <bros> what's the rule to embedding? you shouldn't embed if it is over 100 rows?
[03:18:53] <GothAlice> (Having IDs like this isn't required per se, but it can be invaluable if you need to update nested documents. I do this on my forums to embed replies to a thread within a thread, while still allowing liking/editing/removal.)
[03:18:53] <bros> what are the consequences to too many indices?
[03:19:04] <GothAlice> Slower inserts/updates as more trees may need balancing.
[03:19:49] <bros> when do i decide when not to embed?
[03:19:55] <GothAlice> It should be kept small, it shouldn't fluctuate in size too much (because that requires moving data around), and you should almost always only embed documents that only make sense in the context of the parent. I.e. if you delete the parent, you want the nested values deleted too. When querying for a nested value, you always _also_ want the parent record. Etc.
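As a rough illustration of those rules (all field names invented), an embedded list that fits the criteria versus one that should be referenced instead:

```javascript
// Good embed: comments only make sense inside their post, the list stays
// small, and deleting the post should delete them too.
const post = {
  _id: 'post1',
  title: 'Hello',
  comments: [{ author: 'bros', body: 'First!' }],
};

// Better referenced: orders are queried on their own, grow without bound,
// and outlive any one view of the account.
const order = { _id: 'order1', account_id: 'acct1', status: 'open' };
```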
[03:20:21] <bros> why not just break everything up into separate elements and search by IDs?
[03:20:23] <GothAlice> See also: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html <- an excellent short article which covers the subject from the perspective of migrating from SQL.
[03:20:47] <GothAlice> bros: Because it will require extra queries client-side, and you can introduce race conditions. MongoDB doesn't have joins.
[03:20:56] <GothAlice> Name added for emphasis on that. ;^P
[03:21:39] <GothAlice> http://mongoosejs.com/docs/guide.html was the only bit of Mongoose documentation I have ever read in my life, BTW. XP
[03:22:13] <GothAlice> Specifically the "sub docs" section. Haven't read the rest. ¬_¬
[03:22:44] <bros> what would happen if i kept my super embedded schema?
[03:24:22] <GothAlice> It wouldn't be query-able within MongoDB. This means you would pretty much need to load the entire record every time, to do anything. It would also require you to effectively re-save the entire record on every update. If it's possible to have two updates come in at the same time, in that scenario, good luck knowing how your data will end up.
[03:25:50] <GothAlice> I.e. you can't do a query inside one list of embedded documents and _another_ list at the same time.
[03:26:28] <bros> i don't really want to load all of the users for an account every time i need to load a user...
[03:26:35] <bros> what are the benefits to switching to mongo? i'm not currently seeing many
[03:29:40] <GothAlice> Include any other fields you want, including those you may need from the rest of the account information. Note, this $ operator is the thing you can only use once in a query, and why you can't just lump everything together.
[03:30:38] <bros> I think I might just be better off sticking to SQL...
[03:30:47] <GothAlice> The returned document from that would look like: {_id: ObjectId(…), users: [{email: "bobdole@whitehouse.gov", …}], …} — note, only one "user".
[03:30:54] <GothAlice> (Only the one that was queried for.)
[03:30:55] <bros> I don't see why 1/4th of my data should be embedded and the rest shouldn't be.
[03:31:47] <bros> what if I broke everything into models?
[03:31:59] <bros> https://gist.github.com/anonymous/3cfdf93c838c9f84effb like this
[03:34:18] <GothAlice> db.account.find({'user.email': /@gmail.com$/i}, {'stores.store_id': 1, 'stores.credentials': 1}) — give me every store ID and credential for each account with a user registered with a gmail e-mail address.
[03:34:47] <GothAlice> Pseudo-joins like that only work with embedded records.
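Spelled out, that pseudo-join is just two documents handed to `find` (same hypothetical `account` collection as above):

```javascript
// Filter: any embedded user registered with a gmail address.
const filter = { 'user.email': /@gmail\.com$/i };

// Projection: only the store fields we care about come back.
const projection = { 'stores.store_id': 1, 'stores.credentials': 1 };

// In the shell: db.account.find(filter, projection)
```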
[03:37:14] <GothAlice> You pointed out one thing that you were doing already that is perfect for MongoDB.
[03:37:19] <GothAlice> You were storing JSON data in a string.
[03:37:36] <GothAlice> In MongoDB, you don't need to do that. You just embed whatever structure you want, as needed.
[03:38:18] <GothAlice> You already had a type to differentiate, so you can know what fields to expect at any given time, if you want/need.
[03:38:46] <GothAlice> And then it'll be suddenly query-able, too.
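A sketch of the before/after (hypothetical documents): instead of a JSON string field, embed the structure directly and keep the type field to know which shape to expect:

```javascript
// Before (SQL habit): payload serialized into a string, opaque to queries.
const sqlStyle = {
  type: 'shipment',
  payload: '{"carrier":"UPS","tracking":"1Z999"}',
};

// After (MongoDB): embed the structure itself; now it's queryable,
// e.g. db.events.find({'payload.carrier': 'UPS'}).
const mongoStyle = {
  type: 'shipment',
  payload: { carrier: 'UPS', tracking: '1Z999' },
};
```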
[04:13:39] <MacWinner> can you have a subdocument with an objectid? eg, I want to have a main level document like this domain = { domainname: 'example.com', websites: [{name: 'site1', url: 'site1.com'}, {name: 'site2', url: 'site2.com'}] }
[04:14:05] <MacWinner> if I want to reference the 'websites' attribute in another subdocument, what would be the best way to do this?
[04:17:50] <Boomtime> MacWinner: do you mean you want to store an ObjectID as the value of some other field? if so, yes, this is quite common
[04:19:23] <MacWinner> Boomtime, cool.. just wanted to make sure I wasn't doing some weird design practices.. like i have ads and campaigns that are part of a domain.. i want the campaign to reference ads
[07:28:54] <lessthanzero> are there any known pitfalls when dealing with nested arrays in mapReduce (I'm having problems emitting deep into arrays)
[10:24:00] <andefined> does anyone know how to recover dbs after server restart?
[10:32:16] <kali> andefined: you're not supposed to have much to do. what are you seeing ?
[10:35:35] <gma> Are there any mongoid users here who know how to use `.and()` ? I'm struggling to make it produce the right query…
[10:41:00] <andefined> kali: i am seeing empty databases but also now i cant restart mongod
[10:42:16] <kali> andefined: if you can't restart mongod, you probably can't see your databases anyway. you should get an error message in the log when you try to start it
[10:44:18] <andefined> kali: that my address is already in use (27017)
[10:44:50] <kali> ok. so mongod is actually running already :)
[11:29:45] <ladessa-db> hi, pls...help me with this question https://stackoverflow.com/questions/28594558/mongodb-find-where-count-5/28595270?noredirect=1#comment45498328_28595270
[11:30:18] <ladessa-db> read the answer and my comments below
[11:39:26] <StephenLynx> hey, is there a download available for documentation? I got a pdf with a manual, but I'd like something for technical reference.
[14:50:44] <jaitaiwan> Depending on your implementation you can either iterate or there's probably a helper function to pull out all the results at once, although that could lead to some fun memory issues.
[14:54:13] <evangeline_> jaitaiwan, thank you; one more question; how could I easily add new attributes to objects - where the old values should be overwritten if already exist?
[15:07:24] <GothAlice> Using full-text indexing that way is a bit abusive of the system. FTI does a lot more than one needs to answer the proposed question.
[15:08:29] <jaitaiwan> GothAlice: definitely. In my benchmarks for production, we found that text was many times faster than regex. So that's what we resorted to.
[15:08:57] <GothAlice> jaitaiwan: Indeed, there is that. Excluding regexen in the form /^…/ – prefix searches should still be relatively speedy.
[15:09:36] <jaitaiwan> Yeh, unfortunately prefix regexes weren't in the question
[15:10:13] <jaitaiwan> It's a shame that you can't have super fast regex indexes without prefixes
[15:12:04] <GothAlice> Of course, if that type of querying is required it might even be worthwhile to adapt the data. Split e-mail addresses into two fields, recipient and server. Index server and your queries become fast indexed hash comparisons.
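A sketch of that adaptation (hypothetical helper and field names): split the address once at write time, index the server half, and the suffix regex becomes an equality match:

```javascript
// Split an address so the server half can be an indexed equality match
// instead of an unanchored regex scan.
function splitEmail(address) {
  const at = address.lastIndexOf('@');
  return { recipient: address.slice(0, at), server: address.slice(at + 1) };
}

// The query then becomes a fast indexed lookup, e.g.
// db.users.find({'email.server': 'gmail.com'})
```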
[15:12:10] <GothAlice> To take things to the extreme, a bit. ;)
[15:13:04] <jaitaiwan> Haha true that. Not fun for the application logic though I guess
[15:13:27] <GothAlice> With an appropriate ORM/ODM/DAL it would be seamless for the rest of the application.
[15:13:49] <GothAlice> (In Python I'd have an "email" @property which re-combines the two dependant fields, the app would just use that where needed.)
[15:15:56] <jaitaiwan> Man, that's one of the reasons I love python *sigh*.
[15:16:13] <GothAlice> Heh. That @property would also allow assignment. >:3
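GothAlice's example is a Python `@property`, but the same seam exists in most languages; a rough JavaScript analogue with a getter/setter pair (names invented):

```javascript
class User {
  constructor(recipient, server) {
    this.recipient = recipient;
    this.server = server;
  }
  // Recombine the two stored fields; the rest of the app just uses .email.
  get email() {
    return `${this.recipient}@${this.server}`;
  }
  // Assignment splits back into the two indexed fields.
  set email(address) {
    const at = address.lastIndexOf('@');
    this.recipient = address.slice(0, at);
    this.server = address.slice(at + 1);
  }
}
```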
[15:17:39] <jaitaiwan> I've only recently been getting back into python after a 5 year sabbatical. Hardest thing is transitioning from PHP 'cause I know it so well. But that's probably #offtopic for this room.
[15:23:36] <cheeser> i'm assuming you've seen the fractal of bad design blog post?
[15:23:44] <GothAlice> It's a reasonable template language. It's not a real general purpose language, however, and its fundamental design encourages anti-patterns. Also the fractal.
[15:26:48] <jaitaiwan> Sounds cool. I think PHP is making improvements, all thanks to the guys at facebook with their Hack lang and HHVM. In some benchmarks PhalconPHP for instance beats a lot of python mini web frameworks
[15:27:15] <jaitaiwan> I still can't get past meaningful whitespace of python though
[15:27:20] <GothAlice> Python (specifically the RPython runtime written by the Pypy folks) runs PHP faster than HHVM.
[15:27:48] <GothAlice> http://hippyvm.com < the runtime
[15:33:01] <GothAlice> Heh, cult. Casual dismissal of others opinions. Not the basis of a logical argument or constructive discussion, sadly. :(
[15:33:07] <jaitaiwan> Bit of a hipster cult behind node these days too :P
[15:33:24] <StephenLynx> thats true for anything web and new.
[15:33:44] <GothAlice> jaitaiwan: Clueless (https://gist.github.com/amcgregor/016098f96a687a6738a8 and https://gist.github.com/amcgregor/a816599dc9df860f75bd) may be amusing to you. :)
[15:34:01] <jaitaiwan> Go lang didn't really end up getting a cult following I don't think
[15:34:04] <StephenLynx> but check this out about python: "Van Rossum is Python's principal author, and his continuing central role in deciding the direction of Python is reflected in the title given to him by the Python community, benevolent dictator for life"
[15:34:17] <StephenLynx> I consider python, PHP and ruby to form the web trinity of shit.
[15:34:47] <StephenLynx> all of them have bad syntax, are slow as hell and really have no purpose to be used when there are better tools.
[15:35:00] <GothAlice> StephenLynx: BDFL is a typical title in long-running open source projects. Note the Python Enhancement Proposal process for a structured method to make changes to the language. Vs, I don't know, throwing things at the wall to see what sticks a la PHP and JS. (The full-page numbered list for evaluating automatic typecasting in JS is kinda nuts.)
[15:35:42] <StephenLynx> also, python is so badly designed, it doesn't even have backward compatibility.
[15:35:52] <GothAlice> StephenLynx: My Python code is faster than equivalent C code. My HTTP/1.1 server supports 10,000 concurrent requests per second and compiles to 171 Python opcodes. Attempting to use a C extension for header parsing actually slowed it down…
[15:36:54] <GothAlice> I'm guessing you haven't really investigated the Pypy (Python-on-Python) JIT compiler.
[15:37:17] <StephenLynx> I got my benchmarks from here http://benchmarksgame.alioth.debian.org/u64/compare.php?lang=gcc&lang2=v8
[15:38:05] <StephenLynx> I assume the cause of some python code being faster than some C code is very badly written C code rather than some superb python VM or JIT compiler.
[15:38:53] <StephenLynx> and speaking about web-servers, python does not support non-blocking IO, if I'm not mistaken.
[15:38:57] <GothAlice> http://morepypy.blogspot.ca/2011/02/pypy-faster-than-c-on-carefully-crafted.html and http://morepypy.blogspot.ca/2011/08/pypy-is-faster-than-c-again-string.html pardon the potentially borked images on those pages.
[15:39:06] <GothAlice> StephenLynx: It supports non-blocking IO in numerous ways.
[15:39:38] <GothAlice> GEvent, native coroutines, epoll/kqueue, etc., etc.
[15:40:12] <jaitaiwan> One of the key things to these sorts of arguments is that we developers are passionately romantic about our languages in some ways. Very hard to be objective, especially when benchmarks vary so widely haha.
[15:41:10] <StephenLynx> "Hence, PyPy 50% faster than C on this carefully crafted example. The reason is obvious - static compiler can't inline across file boundaries." Interpretation strength vs compilation weakness, nothing exclusive to python on the first example
[15:41:27] <StephenLynx> " This is clearly win for dynamic compilation over static - the sprintf function lives in libc and so cannot be specializing over the constant string, which has to be parsed every time it's executed." same case.
[15:41:52] <StephenLynx> so these two links mean nothing in favor of python, just in favor of interpreted tools.
[15:42:33] <GothAlice> Specifically, the Python implementation written in Python and its ability to deeply understand and optimize your code, including JIT compilation to machine code.
[15:42:48] <StephenLynx> which is true for any JIT.
[15:43:04] <StephenLynx> and from what I see, the V8 is way, way ahead python.
[15:43:05] <GothAlice> Not really. Pypy's JIT is somewhat unique.
[15:44:07] <StephenLynx> http://benchmarksgame.alioth.debian.org/u64/compare.php?lang=python3&lang2=v8 these benchmarks show the distance between V8 and python is greater than the distance between C and V8.
[15:44:23] <GothAlice> Python 3, which is slower than other versions.
[15:44:28] <jaitaiwan> GothAlice: got through some of your links. You have WAAAY too much time on your hands
[15:46:13] <GothAlice> https://gist.github.com/amcgregor/405354 compares operations between Python 2.7 and 3.2 (in most cases, a version two major versions behind).
[15:46:37] <GothAlice> There are two separate things at play here: the Python syntax, and implementation details for a particular runtime.
[15:46:58] <GothAlice> Python 2 is also supported. Both are offered to let you pick between advanced features or greater raw performance.
[15:47:37] <StephenLynx> exactly, they couldn't make it right so they fragmented it.
[15:47:40] <jaitaiwan> I have to say StephenLynx. I do like node but gee the callback hell gets me. Promises are good but they're a bit of a mind-screw sometimes
[15:47:47] <StephenLynx> python is the worst when it comes down to this.
[15:48:00] <GothAlice> Worse when it comes to callback hell?
[15:48:09] <StephenLynx> callback hell is just bad code.
[15:48:21] <StephenLynx> you don't have to have it if you write good code.
[15:49:34] <StephenLynx> and when you get benchmarks, node/io dwarfs python.
[15:49:50] <jaitaiwan> Mmmk... If I have to open 3 files to do one operation, I see callback hell happening (without promises, obviously).
[15:50:09] <GothAlice> Oh, Pypy also recently added automatic STM (software transactional memory) features. The biggest unique feature of Pypy is that the compiler acts as a filtering pipeline. Enabling STM simply adds a filter layer to the compiler. You have pluggable garbage collection schemes, also a filter, and the intermediate representation is a flow graph of your entire application with multiple different back-end compilers. I.e. compile to C, or
[15:50:10] <GothAlice> compile to .NET, or compile to JVM, or compile to JS. Yes, JS.
[15:50:13] <StephenLynx> unless you make it happen.
[15:50:58] <StephenLynx> compiling to other languages only adds to the argument that the language itself is useless.
[15:51:00] <jaitaiwan> What size js applications do you work on?
[15:51:22] <GothAlice> StephenLynx: So Java is useless because Google are compiling it to JS?
[15:52:51] <GothAlice> http://www.rfk.id.au/blog/entry/pypy-js-first-steps/ — see also: http://www.pypyjs.org/demo/ (have some Python in your web browser)
[15:53:23] <StephenLynx> jaitaiwan this is my biggest project so far https://gitlab.com/mrseth/bck_lynxhub
[15:53:25] <GothAlice> The importance of this work isn't that hey, you can now run Python in your browser, but rather the improvements this project brought to asmjs and Emscripten.
[15:55:19] <GothAlice> (https://www.rfk.id.au/blog/entry/pypy-js-faster-than-cpython/ is an amusing instance of a benchmark running faster under Pypy.js than CPython. Also links to some of those improvements to other packages I mentioned.)
[15:55:46] <jaitaiwan> StephenLynx: well structured
[15:56:03] <StephenLynx> told you, callback hell is just bad code.
[15:56:46] <jaitaiwan> It's the natural progression of the language, which has required an elegant solution
[15:56:46] <StephenLynx> I got my standards from https://github.com/felixge/node-style-guide
[15:57:41] <StephenLynx> not that I defend javascript as a good language. it is hilariously inconsistent and you need to use strict mode and a lint to keep on track.
[15:58:35] <GothAlice> A completely broken object model works for me. :)
[15:59:11] <StephenLynx> that line is just intentionally badly written code.
[16:02:09] <jaitaiwan> The prototype object system takes a fair bit to get used to when you come from a c background. Not that its technically a con for the language itself.
[16:02:27] <GothAlice> The "what not to do in JavaScript" conversation, however, never really ends. It's a minefield that requires a rather substantial head space to navigate. Core language features are inherently broken, requiring non-obvious workarounds and heavily defensive code. Or, even better, automated conversion tools (CoffeeScript) to gloss over the multitude of issues for you. (Which is not really any better than going from Python->JS.)
[16:05:31] <GothAlice> I run many servers with it. I linked to my HTTP/1.1 server already. I also run XMPP, a secure pub/sub proxy through to MongoDB's capped collections, and MUSHes (telnet-things), amongst others.
[16:05:58] <StephenLynx> you got a benchmark of pypy against V8?
[16:06:06] <GothAlice> Python (Pypy runtime again) is used for all of the management scripts, and even the entire package management system of the distro I use, Gentoo. This automation (also using MongoDB) runs a nearly 2000-node cluster.
[16:06:06] <ehershey> I have stumbled into #javascript
[16:08:12] <StephenLynx> pretty much like my relation with the java community.
[16:08:27] <ehershey> much easier to discuss it amongst the database community
[16:08:52] <GothAlice> StephenLynx: http://blog.kgriffs.com/2012/11/13/python-vs-node-vs-pypy-benchmarks.html compares them, sorta. It's benchmarking gibberish in most of the graphs. (wsgiref = unoptimized HTTP server for debugging) The Gevent vs. Node.js req/sec tests indicate that for 64 KiB responses Node.js and Gevent are at par, with Node.js having higher standard deviation.
[16:08:57] <StephenLynx> yeah, most language communities are echo chambers because most programmers don't step outside their comfort zone.
[16:09:16] <GothAlice> (It's also a really old article, not taking into account current Pypy optimizations.)
[16:09:23] <StephenLynx> I could easily use java for fucking everything. instead I went out of my way to learn from javascript to C++
[16:12:14] <GothAlice> Ignoring the performance bits on http://www.cdotson.com/2014/08/nodejs-vs-python-vs-pypy-a-simple-performance-comparison/ (since his method was flawed) the memory comparison remains valid, however. Node uses multiples of the amount of RAM that Python does on the same problem, until the problem scope grows to the maximum.
[16:12:53] <GothAlice> Pypy (due to needing to graph the whole problem at the start) starts off with ludicrous memory usage, but it flips at the mid-point of problem difficulty to being less memory-hungry than the others.
[16:14:21] <StephenLynx> but, it is the most we have. you could come up with your own benchmark to try and show python is not completely obsolete and redundant as technology when it comes down to web.
[16:16:29] <GothAlice> jaitaiwan: Of course. Hopefully each also factors in finding the right solution for a problem, too. (I've written stream filters in Brainfuck before… right tool for the right job. ;)
[16:17:13] <jaitaiwan> Yeh exactly. Anyway great discussion StephenLynx and GothAlice. Its about 2am in my timezone so its time for some much needed rest haha
[18:18:45] <shlant1> poll: anyone using sharding with Apache Mesos?
[18:19:01] <shlant1> I am trying to determine if it's even doable/worth looking into
[18:28:59] <MacWinner> if I have 3 files with different filenames, but the same md5 hash of the data (ie the same data), is there some best practice on storing them without duplication?
[18:29:33] <MacWinner> i feel like the chunks reference back to the files.. rather than files referencing the chunks
[18:30:06] <GothAlice> MacWinner: That's due to the way they're queried. More often you're looking for all of the chunks for a file and you already know the file ID.
[18:30:34] <MacWinner> seems like I might need a 3rd collection then?
[18:31:06] <GothAlice> MacWinner: I abstract GridFS with a metadata collection storing the actual "file metadata" and GridFS storing BLOB data without metadata (i.e. no file names). The metadata collection then references the ID of the GridFS BLOB to use, which means I can de-duplicate without issue.
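A sketch of that split, with invented names and in-memory stand-ins for the two collections: file metadata lives in one collection, each document pointing at a blob keyed by content hash, so identical data is stored only once:

```javascript
const blobs = new Map();  // md5 -> blob id (a GridFS file _id in practice)
const files = [];         // metadata documents (filenames live here only)

function storeFile(name, data, md5) {
  if (!blobs.has(md5)) {
    // First time we've seen this content: upload to GridFS here.
    blobs.set(md5, `blob-${blobs.size + 1}`);
  }
  // Metadata always gets its own document, referencing the shared blob.
  files.push({ filename: name, blob_id: blobs.get(md5) });
}

storeFile('a.bin', 'xyz', 'd41d8');
storeFile('b.bin', 'xyz', 'd41d8');  // same hash: no second blob stored
```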
[18:31:40] <MacWinner> GothAlice, cool.. sounds good. I was planning on something like that. just wanted to make sure I wasn't missing some built-in feature
[18:39:27] <MacWinner> GothAlice, do you typicall split out your gridfs database from the rest of your app data onto a different mongo cluster?
[18:40:27] <StephenLynx> I would save the file regularly then have a document to track it's aliases and md5.
[18:40:29] <GothAlice> Depends on the application. With my 26 TiB Exocortex project, yeah, the BLOB data is separate. It'd be flatly unqueryable if I didn't do that. For smaller projects where the dataset (even with BLOBs) still fits in RAM, nah.
[18:41:18] <GothAlice> StephenLynx: Most filesystems behave badly with large numbers of files in one inode (directory), which then adds the complication of hierarchical organization, usually by substring prefixes. This way lies madness.
[18:41:41] <GothAlice> MacWinner: Yeah, it's a transparent proxy that records every digital bit of information I touch, and has been doing so since 2001.
[18:42:13] <StephenLynx> GothAlice what purpose this project serves?
[18:42:30] <wayne> GothAlice that sounds interesting
[18:42:40] <GothAlice> StephenLynx: GNU/Linux isn't a filesystem. ext* filesystems behave _terribly_ (and you can run out of inodes! `df -i` to check), reiserfs is better (no limit), but directory listings can consume huge amounts of RAM when you have millions of files. Others get worse.
[18:42:46] <MacWinner> does it record when you touch the exocortex?
[18:43:28] <StephenLynx> and yeah, I misread when you said filesystem.
[18:45:44] <GothAlice> Exocortex serves several purposes: it's a great dataset for natural language processing and personal AI (computer vision, fact extraction, etc.) research. It's a development playground to try out new technologies, i.e. I added full-text indexing and compression to MongoDB 6 years ago. :)
[18:47:00] <GothAlice> It also serves an interesting purpose in predictive linking, i.e. I Google something and ignore the result page. Within 30 seconds I get two links pushed to my browser that are the "best" result, taking into consideration my active project, previous related searches, etc. (And since they were already fetched, no need to fetch them again.) Exocortex visited each result on the first page and performed its own ranking on them.
[18:47:58] <StephenLynx> and currently it holds 26 terabytes of data?
[19:01:44] <StephenLynx> going to look into that. storing 26tb for just 5 bucks per month got my attention
[19:01:58] <GothAlice> T'was a bit of a frankenstein monster to get MongoDB on-disk stripes to back up that way, but it works. (A point-in-time freezing filesystem like ZFS makes this much easier.)
[19:08:31] <MacWinner> i wonder how they compress down video files
[19:08:42] <GothAlice> They use bzip2 compression, FYI.
[19:09:12] <GothAlice> Doesn't help that 80% of my dataset is xz compressed…
[19:09:18] <GothAlice> Took three months to do the initial backup.
[19:11:01] <MacWinner> out of curiosity.. any recommended mongodb hosting providers?
[19:11:07] <MacWinner> if I don't want to have to create my own cluster
[19:11:18] <GothAlice> MacWinner: Create your own cluster and slap MMS on it.
[19:11:26] <GothAlice> "Cloud" providers will empty your wallet.
[19:11:29] <uizouzoitzt> GothAlice: you are an idiot
[19:12:09] <StephenLynx> alice, mind linking your github account?
[19:12:19] <MacWinner> GothAlice, i figure I can do that later once hosted mongo costs become an issue.. right now I depend on reliability and I don't quite trust myself on it
[19:12:22] <StephenLynx> I got 78 results for "clueless" and I'm lazy.
[19:13:48] <GothAlice> StephenLynx: Clueless is an unpublished WIP toy lang of mine. Most of its bits are in gists. https://gist.github.com/amcgregor/016098f96a687a6738a8 (docs) https://gist.github.com/amcgregor/a816599dc9df860f75bd (some sample code)
[19:14:08] <StephenLynx> I know, I just looked for it to try and find you
[19:14:33] <GothAlice> The unpublished part of that would certainly make that difficult. XP
[19:14:42] <wayne> StephenLynx: another riddle for you! this is a player: {games: [{scores: [8,9,13213213213], cheat: true}, {scores: [1,2,3], cheat: false}]}
[19:14:56] <wayne> a player has many games, games have many scores and may have been cheated or not
[19:15:13] <wayne> how do i sort players by highest scores that weren't cheated?
[19:15:16] <giowong> im getting a http://localhost:3000/[object%20Object] error but my json returns correctly, should i just ignore this?
[19:15:24] <StephenLynx> but you had a public repo, didn't you?
[19:15:32] <wayne> i know mongodb supports sort("a.b.c")
[19:15:42] <wayne> but the predicate of checking that cheat == false is boggling
[19:16:06] <GothAlice> StephenLynx: Not yet. I ran into a hiccup with the dynamic EBNF parser and flow grapher and Real Life™ side-tracked me.
[19:19:47] <GothAlice> wayne: Store a "top score" field on the player and have your updates (which add scores or games) only update that value if a) it's smaller than the maximum for that round/game and b) that round/game wasn't cheat.
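In sketch form (field names invented), that denormalized "top score" can lean on MongoDB's `$max` update operator, which only writes when the new value is higher:

```javascript
// Build the update for a finished round. The cheat check happens before
// issuing it, so cheated rounds never touch the denormalized field.
function topScoreUpdate(score, cheated) {
  if (cheated) return null;  // record nothing toward the sortable top score
  return {
    $push: { scores: score },       // keep the raw history
    $max: { top_score: score },     // only bumps top_score if score is higher
  };
}

// Players then sort cheaply: db.players.find().sort({top_score: -1})
```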
[19:28:50] <windsurf_> which operator do I use to find all event objects whose categories array (array of _id) contains at least one matching _id in an array I have to compare it against?
[19:29:07] <GothAlice> wayne: I've been dox'd by a gaming group in the past. Always fun to have "that talk" with employers.
[19:29:12] <StephenLynx> I wouldn't expect you to use the same image you use on github on linkedin
[19:29:21] <windsurf_> use case: user passes array of category ids he wishes to search within, need to find all events that are tagged with any of those categories
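windsurf_'s case is what `$in` is for: it matches any document whose array field shares at least one value with the given list. A sketch (collection and field names invented):

```javascript
// db.events.find(categoryFilter(ids)) returns every event whose
// categories array contains at least one of the requested ids.
function categoryFilter(wantedIds) {
  return { categories: { $in: wantedIds } };
}
```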
[19:56:37] <giowong> so i have a array of objects, each object has a unique name, but have similar categories attribute
[19:57:33] <giowong> and each object also has a count
[19:57:48] <giowong> i need to sum up the total counts for each category
[19:58:07] <giowong> should i do it on the back end or front end after i pass the whole json array
[20:15:02] <fewknow> giowong: not sure I follow..you have an array of sub-documents? each has a count and you want to sum them up ?
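If the summing happens server-side, an aggregation over the embedded array does it; the client-side equivalent is a simple reduce. Both sketched below with invented collection and field names:

```javascript
// Server-side: db.things.aggregate(pipeline) — unwind the array, then
// group by category and sum the per-item counts.
const pipeline = [
  { $unwind: '$items' },
  { $group: { _id: '$items.category', total: { $sum: '$items.count' } } },
];

// Client-side equivalent over an already-fetched array:
function totalsByCategory(items) {
  const totals = {};
  for (const { category, count } of items) {
    totals[category] = (totals[category] || 0) + count;
  }
  return totals;
}
```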
[20:17:48] <jake__> Hi. I'm trying to optimize a mapreduce job. Looking at db.currentOp() I see that "planSummary" : "COLLSCAN". Does this mean it isn't using an index for the query?
[20:28:11] <MacWinner> fewknow, i have one database that will have a bunch of configuration information that will rarely change.. but another set of data that is activity data related to the configuration information
[20:28:32] <MacWinner> the activity and configuration info is heavy read... teh activity is also heavy write
[20:29:16] <fewknow> k....locking doesn't prevent reads....it just slows down writes
[20:30:29] <kali> fewknow: a read does not lock another read, but a write will lock all reads
[20:30:30] <fewknow> the lock% you see when you look at mongostat is only the write lock
[20:30:38] <fewknow> it doesn't include the read lock
[20:32:16] <fewknow> kali: yes i agree with that..
[20:33:05] <fewknow> but a read won't fail on a lock....and unless you are hitting 2500 writes a second the read will be sub second most of the time
[20:33:26] <fewknow> i guess it depends on your scaling needs
[20:33:45] <fewknow> but I never worry about reads due to a lock...especially since the working set of data is in memory
[20:33:51] <fewknow> and you can cache on top of that
[20:33:59] <fewknow> reads should not be a huge concern
[20:34:09] <fewknow> slow writes should be the concern
[20:53:16] <cook01> When you’re dealing with large collections (each document is fairly small) and trying to efficiently page over the entire collection, then is the recommended way to page with $sort and $limit? http://docs.mongodb.org/manual/reference/operator/aggregation/sort/
[20:54:13] <kali> cook01: well, even with an index for $sort, $skip will have linear complexity
[20:54:27] <kali> linear on the number of documents you skip
[20:54:55] <kali> cook01: so it may be more efficient to use the sort key to pull the next page
[20:55:27] <GothAlice> cook01: I'm not sure of the technical details in MongoDB, however as kali touches on, skipping will still require ordering the result set and massaging the records you're skipping. (Even the most efficient method of doing this, sorted on _id, would be a "log n" situation as it walks the index b-tree.)
[20:56:57] <kali> yeah, log(number of doc in the database) but at least it means your 1000th page will have the same performance as the first
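kali's suggestion in sketch form (field names assumed): instead of `$skip`, remember the last sort key seen and range-query past it, so every page is an O(log n) index seek:

```javascript
// Page 1: db.coll.find().sort({_id: 1}).limit(pageSize)
// Page n+1: carry the last page's final _id forward; the b-tree seek
// costs log n, not time linear in the number of skipped documents.
function nextPageQuery(lastId) {
  return { _id: { $gt: lastId } };
}
```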
[22:11:44] <gansbrest> one other question, what's the best strategy to update beta replica set with production data? We have similar needs for solr and we ended up sending multiple writes to both though proxy library. Maybe there is something simpler for mongo?
[22:52:56] <dacuca> how can I read the oplog? I’d like to use it to index the data in a FTS engine
[23:34:28] <GothAlice> dacuca: db.oplog.rs.find() on the "local" database
[23:35:14] <GothAlice> https://github.com/cayasso/mongo-oplog is a handy tool to abstract it a bit
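A sketch of the raw approach (namespace invented): tail the capped `oplog.rs` collection on the `local` database, filtering to the collection you want to index and to timestamps after your last checkpoint:

```javascript
// use local; db.oplog.rs.find(oplogFilter('app.posts', lastTs))
// — issued with a tailable cursor in a driver, so new entries stream in.
function oplogFilter(namespace, lastTs) {
  return { ns: namespace, ts: { $gt: lastTs } };
}
```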
[23:37:51] <GothAlice> dacuca: As a side note, I hope you are aware of http://docs.mongodb.org/manual/reference/operator/meta/comment/ which allows your application to send extra information to the back-end you seem to be writing.