PMXBOT Log file Viewer

#mongodb logs for Thursday the 19th of February, 2015

[01:45:32] <bros> http://ideone.com/onOOKt I think my MySQL ways are hurting me. Am I doing this right?
[02:07:49] <GothAlice> bros: In this situation, why is account separate from user?
[02:08:07] <GothAlice> Looks like a user only ever has one. Instead, you might consider embedding the value.
[02:10:08] <bros> GothAlice, http://ideone.com/ai7Z26 like this?
[02:11:44] <GothAlice> Ah, multiple users on one account? I don't expect that. ;)
[02:12:13] <bros> What should I do then? No embedding?
[02:12:25] <GothAlice> If you have multiple users with one account, yes, embedding.
[02:12:39] <GothAlice> Wait.
[02:12:40] <bros> Is what I linked correct?
[02:12:48] <bros> 1 account, many users, many stores
[02:12:57] <GothAlice> How many "users" are there on a typical "account"?
[02:13:01] <bros> <5
[02:13:29] <bros> 1 account, 1-5 users, 1-5 stores, 300-500 barcodes, 1k+ orders/scans/stats/batches
[02:13:33] <bros> per account
[02:13:45] <GothAlice> Perfect, those ranges give the answers on embedding or not.
[02:13:50] <bros> do embed?
[02:14:08] <GothAlice> 1-5? Embedding may be worth it. There are some limitations when embedding that become important when you have two lists of sub-documents.
[02:14:35] <bros> So, does the schema I posted look flawed?
[02:15:51] <GothAlice> (Well, more than one.) Specifically, you can only $elemMatch on one at a time, it complicates aggregate queries (lots of unwinding and re-grouping), and there was something else I'm forgetting at the moment.
[02:16:25] <bros> So, denormalize it?
[02:16:27] <GothAlice> Stores and users may be embeddable. It simplifies some things, as often you're checking a login and also need the account if successful anyway, right? :)
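
A minimal Mongoose sketch of the embedding being suggested here; the field names are illustrative guesses, not taken from the paste:

    var mongoose = require('mongoose');

    // hypothetical sub-schemas for the embedded documents
    var userSchema = new mongoose.Schema({email: String, password_hash: String});
    var storeSchema = new mongoose.Schema({name: String, credentials: mongoose.Schema.Types.Mixed});

    var accountSchema = new mongoose.Schema({
        name: String,
        users: [userSchema],   // 1-5 per account, so embedding is reasonable
        stores: [storeSchema]  // likewise 1-5 per account
    });

    var Account = mongoose.model('Account', accountSchema);
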
[02:16:56] <bros> GothAlice, http://ideone.com/g1qejW
[02:17:21] <bros> This is all embedded within account.
[02:18:17] <GothAlice> On item_barcodes are you storing the item_id/last_update/quantity in addition to barcode_id as a caching mechanism?
[02:18:29] <GothAlice> Or would that represent your "joining table" to use SQL-talk?
[02:19:04] <GothAlice> (MongoDB makes it hard to tell the difference. ;)
[02:19:50] <bros> GothAlice, That'd be a parent-child relationship from what I can see. What could I do? Make barcodes have an array of item_ids?
[02:20:12] <bros> I'm going to want to be resolving item_ids to barcodes, a lot.
[02:20:46] <GothAlice> Without reference to a particular store? (IDs should be unique, so there would be no need?)
[02:21:45] <bros> I see what you are saying.
[02:21:52] <bros> I should make barcodes account wide, then item_barcodes account wide. No?
[02:23:01] <GothAlice> Uno momento.
[02:23:10] <bros> thank you so much, by the way.
[02:23:41] <GothAlice> You're overly nesting and overly complicating. I'm about to massacre your last paste. ;)
[02:23:47] <GothAlice> Also no worries. :)
[02:23:48] <bros> Please do!
[02:23:55] <bros> I'll take your tiredness into consideration :P
[02:24:14] <bros> I was going crazy with the indices too. :P
[02:24:47] <GothAlice> You're trying to put all eggs in one basket. It wasn't going to be query-able at all like that.
[02:24:59] <bros> At first I had too many baskets.
[02:25:10] <bros> It didn't feel right. I thought moving over from relational would prevent that
[02:28:23] <GothAlice> Quick q, bros: within your nested model there, what did orders->status and batches->status represent? Why order_number and not order_id? Were you expecting the embedded order documents to have IDs?
[02:29:22] <GothAlice> Does one item have multiple barcodes? (UPC doesn't work that way, AFAIK.)
[02:29:31] <bros> GothAlice, one item could have multiple barcodes.
[02:29:40] <bros> For example: 1 gallon sprayer body, plus the sprayer.
[02:29:50] <bros> order->status: either open, in progress, closed, etc.
[02:29:53] <bros> same with batch
[02:29:58] <GothAlice> Are those two related?
[02:30:16] <GothAlice> I.e. when the batch goes "closed", so do the order->status'es?
[02:30:23] <bros> Yes.
[02:30:28] <bros> Batch is made up of orders.
[02:30:34] <GothAlice> Cool. No need to duplicate it, then. :)
[02:30:39] <bros> order_number is a string. eBay has order numbers like '100-1000', as well as Amazon
[02:31:16] <bros> order_id was going to refer to the Order object of the model
[02:32:16] <GothAlice> Which is not represented here?
[02:32:25] <bros> no, it is
[02:32:27] <bros> account.stores.orders
[02:32:30] <GothAlice> Cool.
[02:32:42] <bros> anything ObjectId is represented here
[02:33:05] <bros> GothAlice, on a scale from 1/10, how complex is this schema? like, 4?
[02:33:19] <GothAlice> As you have it vs. how it needs to be, 8.
[02:33:22] <GothAlice> :P
[02:33:33] <bros> it needs to be more complex?!
[02:33:41] <GothAlice> No, it's too complex. XP
[02:33:46] <bros> Perfect! haha
[02:33:48] <bros> great news!
[02:33:55] <bros> I'm glad I'm learning this mistake early.
[02:34:14] <bros> After good schema and proper indices, what else is there to a properly good Mongo setup?
[02:35:00] <GothAlice> Query design.
[02:35:09] <GothAlice> What are the questions you want your data to answer?
[02:35:25] <bros> I think I might be in the clear here. I'm dealing with very few strings, and a lot of straight forward IDs.
[02:36:16] <GothAlice> It's still important to plan for those queries. How one models in MongoDB is heavily affected by those questions.
[02:36:36] <GothAlice> Doubly so for indexes.
[02:36:38] <bros> Could I see what you have so far? I'm really axious.
[02:36:42] <bros> anxious*
[02:37:44] <GothAlice> bros: https://gist.github.com/amcgregor/e7cd2c9995fdd57b82da
[02:37:56] <GothAlice> I'll continue editing, and I've added some notes. Start with line 51. ;)
[02:38:03] <bros> do you see line 19
[02:38:11] <bros> credentials is actually a JSON string, is that ok/smart?
[02:38:18] <bros> I didn't want to have 10 different models for 10 different kinds of integrations
[02:38:45] <bros> quantity is not associated with an order
[02:38:46] <GothAlice> Does Mongoose not have a way of saying "this is a dynamic embedded document"?
[02:38:59] <GothAlice> MongoDB itself doesn't enforce schema.
[02:39:16] <GothAlice> item_barcodes represents stock?
[02:39:43] <bros> item_barcodes represents the barcodes for the item. quantity is how many times they are required
[02:39:56] <bros> for example: item_id 123456 could be an item with a variant: "powder, 2 bottles"
[02:40:02] <bros> so you'd need the barcode, with a quantity of 2
[02:42:09] <GothAlice> Right. So we're back to the questions thing. What do you need this data to answer?
[02:42:19] <bros> "This item needs these barcodes"
[02:42:48] <GothAlice> That's not a question. :P
[02:42:58] <bros> What barcodes does item X need?
[02:43:00] <bros> so it should be an array
[02:43:08] <GothAlice> It's also way too generic. What about the barcodes do you need to know?
[02:43:16] <GothAlice> Everything all at once?
[02:43:26] <bros> everything the barcode model I showed represents
[02:44:33] <GothAlice> Cool. What other questions?
[02:44:46] <GothAlice> (For all of the model, not just this aspect.)
[02:44:57] <bros> What account has what stores
[02:45:06] <bros> Those stores are responsible for certain items
[02:46:12] <GothAlice> Do you ever need the answer to: "What are all of the items for all of the stores for account X?"
[02:46:54] <bros> Possibly.
[02:47:00] <bros> If so, what needs to be done?
[02:47:01] <bros> Index?
[02:47:02] <bros> Embed?
[02:48:40] <GothAlice> Just planning. Embedding isn't an all-or-nothing thing.
[02:49:46] <GothAlice> I may have already asked this, but how many barcodes per item?
[02:50:35] <bros> GothAlice, 1 or many.
[02:50:49] <bros> item_id = item_id or variant_id
[02:53:18] <GothAlice> Hmm. Barcodes shared between items? (For some reason I'm not seeing an "item"-specific schema here.)
[02:53:34] <bros> Barcodes shared between items, yes.
[02:55:30] <GothAlice> Rats; that makes some of the querying more painful. ^_^
[02:55:55] <bros> lmfao my co-workers says rats.
[02:56:52] <GothAlice> Oh, possibly lastly, how many orders per batch on average?
[02:56:57] <bros> 9
[02:57:16] <bros> account_schema.index({ 'users.email': 1, type: -1 }) could you explain what that does?
[02:57:35] <GothAlice> Creates a compound index.
[02:57:45] <bros> Compound?
[02:57:49] <bros> oh, fuck
[02:57:52] <GothAlice> First on users.email, then on type in descending order.
[02:58:33] <bros> how do I tell mongo: "be ready to sometimes sort users by email"?
[02:58:34] <bros> type doesn't exist
[02:58:36] <bros> ignore type
[02:59:53] <bros> ah, i figured it out
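
For reference, a sketch of the single-field version bros presumably ended up with once the nonexistent "type" field is dropped:

    // index the embedded field on its own; an ascending index also serves descending sorts
    account_schema.index({'users.email': 1});

    // in the shell, a sort on that path can then use the index:
    // db.accounts.find().sort({'users.email': 1})
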
[03:01:56] <GothAlice> Home stretch. :)
[03:04:59] <bros> excited
[03:05:02] <bros> my biggest problem is
[03:05:08] <bros> i've never queried in mongo before haha
[03:05:14] <bros> can i see where you're at?
[03:06:12] <GothAlice> https://gist.github.com/amcgregor/e7cd2c9995fdd57b82da
[03:07:12] <bros> oh no
[03:07:18] <bros> an order doesn't always have to be in a batch
[03:07:33] <GothAlice> … wat?
[03:07:38] <GothAlice> Even if it's a batch of 1?
[03:07:42] <bros> yup
[03:07:59] <bros> there'd be no batch then
[03:08:25] <bros> accounts have stores and batches. batches have orders. stores have barcodes, orders.
[03:09:07] <bros> https://gist.github.com/anonymous/07dd7bec6f60f9aa5e86
[03:12:25] <GothAlice> Who has item_barcodes? ;)
[03:12:53] <bros> Should be the store, really.
[03:13:13] <bros> item_id is store specific
[03:14:12] <GothAlice> https://gist.github.com/amcgregor/e7cd2c9995fdd57b82da updated
[03:15:30] <bros> so item_barcodes remains unchanged?
[03:16:09] <GothAlice> store_id added.
[03:16:23] <GothAlice> Important note: only top-level Schemas get an _id automatically.
[03:16:28] <bros> what's the protocol: store or stores. barcode or barcodes
[03:16:34] <bros> yeah, i didn't know that. how do you generate one down the road?
[03:16:53] <GothAlice> I use singular to describe the schema, plural to name the collection. (A "collection" has many things, a schema describes a single thing.)
[03:17:16] <bros> so it should be schema.account, yes?
[03:17:21] <bros> schema.item_barcode
[03:18:53] <GothAlice> When appending a new value to an embedded list like that, use mongoose.Types.ObjectId() to generate an ID for the "embedded record".
[03:18:53] <bros> what's the rule to embedding? you shouldn't embed if it is over 100 rows?
[03:18:53] <GothAlice> (Having IDs like this isn't required per se, but it can be invaluable if you need to update nested documents. I do this on my forums to embed replies to a thread within a thread, while still allowing liking/editing/removal.)
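
A small sketch of that pattern, assuming the Account model from the sketch further up and an existing account's _id in accountId:

    var mongoose = require('mongoose');

    // generate the _id up front so the embedded user can be addressed later
    var newUserId = mongoose.Types.ObjectId();

    Account.findByIdAndUpdate(accountId, {
        $push: {users: {_id: newUserId, email: 'new@example.com'}}
    }, function (err, account) {
        // newUserId can now be used to edit or remove that embedded user
    });
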
[03:18:53] <bros> what are the consequences to too many indices?
[03:19:04] <GothAlice> Slower inserts/updates as more trees may need balancing.
[03:19:18] <cheeser> and deletes
[03:19:49] <bros> when do i decide when not to embed?
[03:19:55] <GothAlice> It should be kept small, it shouldn't fluctuate in size too much (because that requires moving data around), and you should almost always only embed documents that only make sense in the context of the parent. I.e. if you delete the parent, you want the nested values deleted too. When querying for a nested value, you always _also_ want the parent record. Etc.
[03:20:21] <bros> why not just break everything up into separate elements and search by IDs?
[03:20:23] <GothAlice> See also: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html <- an excellent short article which covers the subject from the perspective of migrating from SQL.
[03:20:47] <GothAlice> bros: Because it will require extra queries client-side, and you can introduce race conditions. MongoDB doesn't have joins.
[03:20:56] <GothAlice> Name added for emphasis on that. ;^P
[03:21:39] <GothAlice> http://mongoosejs.com/docs/guide.html was the only bit of Mongoose documentation I have ever read in my life, BTW. XP
[03:22:13] <GothAlice> Specifically the "sub docs" section. Haven't read the rest. ¬_¬
[03:22:44] <bros> what would happen if i kept my super embedded schema?
[03:24:22] <GothAlice> It wouldn't be query-able within MongoDB. This means you would pretty much need to load the entire record every time, to do anything. It would also require you to effectively re-save the entire record on every update. If it's possible to have two updates come in at the same time, in that scenario, good luck knowing how your data will end up.
[03:25:50] <GothAlice> I.e. you can't do a query inside one list of embedded documents and _another_ list at the same time.
[03:26:28] <bros> i don't really want to load all of the users for an account every time i need to load a user...
[03:26:35] <bros> what are the benefits to switching to mongo? i'm not currently seeing many
[03:27:07] <GothAlice> Ah, but you can do that.
[03:27:23] <bros> by un-embedding?
[03:27:28] <GothAlice> Nope. Uno momento.
[03:29:07] <GothAlice> db.account.find({'users.email': 'bobdole@whitehouse.gov'}, {'users.$': 1})
[03:29:40] <GothAlice> Include any other fields you want, including those you may need from the rest of the account information. Note, this $ operator is the thing you can only use once in a query, and why you can't just lump everything together.
[03:30:38] <bros> I think I might just be better off sticking to SQL...
[03:30:47] <GothAlice> The returned document from that would look like: {_id: ObjectId(…), users: [{email: "bobdole@whitehouse.gov", …}], …} — note, only one "user".
[03:30:54] <GothAlice> (Only the one that was queried for.)
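
The positional operator works on the update side too; a sketch against the same embedded users array, with an illustrative field name:

    // modify only the matching embedded user, leaving the rest of the array untouched
    db.account.update(
        {'users.email': 'bobdole@whitehouse.gov'},
        {$set: {'users.$.password_hash': 'new-hash-goes-here'}}
    );
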
[03:30:55] <bros> I don't see why 1/4th of my data should be embedded and the rest shouldn't be.
[03:31:09] <GothAlice> To pre-join.
[03:31:47] <bros> what if I broke everything into models?
[03:31:59] <bros> https://gist.github.com/anonymous/3cfdf93c838c9f84effb like this
[03:34:18] <GothAlice> db.account.find({'users.email': /@gmail.com$/i}, {'stores.store_id': 1, 'stores.credentials': 1}) — give me every store ID and credential for each account with a user registered with a gmail e-mail address.
[03:34:47] <GothAlice> Pseudo-joins like that only work with embedded records.
[03:37:14] <GothAlice> You pointed out one thing that you were doing already that is perfect for MongoDB.
[03:37:19] <GothAlice> You were storing JSON data in a string.
[03:37:36] <GothAlice> In MongoDB, you don't need to do that. You just embed whatever structure you want, as needed.
[03:38:18] <GothAlice> You already had a type to differentiate, so you can know what fields to expect at any given time, if you want/need.
[03:38:46] <GothAlice> And then it'll be suddenly query-able, too.
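
A sketch of that in Mongoose, assuming a Mixed-typed credentials field (names are illustrative):

    var mongoose = require('mongoose');

    var storeSchema = new mongoose.Schema({
        type: String,                             // which integration these credentials belong to
        credentials: mongoose.Schema.Types.Mixed  // arbitrary per-type structure, stored as a real subdocument
    });

    // unlike a JSON string, the structure is now queryable, e.g.:
    // db.account.find({'stores.credentials.api_key': 'abc123'})
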
[04:13:39] <MacWinner> can you have a subdocument with an objectid? eg, I want to have a main level document like this domain = { domainname: 'example.com', websites: [{name: 'site1', url: 'site1.com'}, [{name: 'site2', url: 'site2.com'}] }
[04:14:05] <MacWinner> if I want to reference the 'websites' attribute in another subdocument, what would be the best way to do this?
[04:17:50] <Boomtime> MacWinner: do you mean you want to store an ObjectID as the value of some other field? if so, yes, this is quite common
[04:19:23] <MacWinner> Boomtime, cool.. just wanted to make sure I wasn't doing some weird design practices.. like i have ads and campaigns that are part of a domain.. i want the campaign to reference ads
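
A rough shell sketch of that arrangement, with hypothetical collection names — the embedded website gets its own ObjectId so other documents can point at it:

    var siteId = ObjectId();  // generate an id for the embedded website up front

    db.domains.insert({
        domainname: 'example.com',
        websites: [{_id: siteId, name: 'site1', url: 'site1.com'}]
    });

    // a campaign can then reference the embedded website by that id
    db.campaigns.insert({domain: 'example.com', website_id: siteId, ads: []});
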
[05:14:29] <et09> is _id indexed by default?
[05:15:50] <Boomtime> yes
[05:16:04] <et09> ok thanks
[07:28:54] <lessthanzero> are there any known pitfalls when dealing with nested arrays in mapReduce (I'm having problems emitting deep into arrays)
[10:24:00] <andefined> does anyone know how to recover dbs after server restart?
[10:32:16] <kali> andefined: you're not supposed to have much to do. what are you seeing ?
[10:35:35] <gma> Are there any mongoid users here who know how to use `.and()` ? I'm struggling to make it produce the right query…
[10:41:00] <andefined> kali: i am seeing empty databases but also now i cant restart mongod
[10:42:16] <kali> andefined: if you can't restart mongod, you probably can't see your databases anyway. you should get an error message in the log when you try to start it
[10:44:18] <andefined> kali: that my address is already in use (27017)
[10:44:50] <kali> ok. so mongod is actually running already :)
[11:29:45] <ladessa-db> hi, pls...help me with this question https://stackoverflow.com/questions/28594558/mongodb-find-where-count-5/28595270?noredirect=1#comment45498328_28595270
[11:30:18] <ladessa-db> read the answer and my comments below
[11:39:26] <StephenLynx> hey, is there a download available for documentation? I got a pdf with a manual, but I'd like something for technical reference.
[11:39:39] <StephenLynx> specially operators
[11:48:12] <HewloThere> Hi! I
[11:48:16] <HewloThere> Oh, wrong channel.
[14:46:06] <evangeline_> hi
[14:46:31] <evangeline_> I would like to get all _id unique IDs from the collection from pymongo; any ideas on how to do that?
[14:46:46] <cheeser> all _id values are unique
[14:47:09] <jaitaiwan> So you'd just use db.collection.find({}) in the mongo shell
[14:47:30] <cheeser> db.collection.find({}, {_id:1})
[14:49:13] <evangeline_> jaitaiwan, that works in the mongo shell; I guess I should ask how to do it with pymongo in #python
[14:49:30] <evangeline_> in python it returns this object:
[14:49:31] <evangeline_> <pymongo.cursor.Cursor object at 0x7ffec8a41590>
[14:49:49] <evangeline_> ah, the object is iteratable
[14:50:02] <jaitaiwan> Yeh
[14:50:44] <jaitaiwan> Depending on your implementation you can either iterate or there's probably a helper function to pull out all the results at once, although that could lead to some fun memory issues.
[14:54:13] <evangeline_> jaitaiwan, thank you; one more question; how could I easily add new attributes to objects - where the old values should be overwritten if already exist?
[14:54:27] <cheeser> just $set
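
In the shell that looks like the following (pymongo takes the same filter and update documents); collection and field names are illustrative:

    // $set adds missing fields and overwrites existing values in one update
    db.things.update(
        {_id: ObjectId('54e4ffd2e4b0a3b1c1d1e1f3')},  // illustrative _id
        {$set: {colour: 'blue', size: 'XL'}}
    );
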
[14:55:17] <bin> is
[14:55:31] <bin> db.user.find({email:{$regex:'/*@gmail.com/'}}) correct?
[14:55:51] <bin> because nothing is returned and I know there are records with gmail addresses
[14:56:18] <jaitaiwan> bin: regex doesn't look valid for what you're trying to achieve I think
[14:56:31] <bin> to get all users matching gmail addresses
[14:59:07] <jaitaiwan> bin: alternative depending on your data structure is to use a full text index and search $text: '@gmail.com'
[14:59:25] <bin> can't i match them via regex?
[14:59:36] <bin> and i do this only in the shell i don't need it in prod
[14:59:44] <jaitaiwan> bin: of course, just throwing out options
[14:59:54] <bin> can you give me how to do it with regex
[15:00:40] <jaitaiwan> What you had was fine I think, just replace in between the quotations with /@gmail.com$/i
[15:01:02] <StephenLynx> I remember I used regext, I just used 'text'
[15:01:17] <bin> you are a magician jaitaiwan thanks :d
[15:01:29] <jaitaiwan> No worries bin :)
[15:07:24] <GothAlice> Using full-text indexing that way is a bit abusive of the system. FTI does a lot more than one needs to answer the proposed question.
[15:07:46] <GothAlice> db.user.find({email: {$regex: '@gmail.com$', $options: 'i'}})
[15:08:29] <jaitaiwan> GothAlice: definitely. In my benchmarks for production, we found that text was many times faster than regex. So that's what we resorted to.
[15:08:43] <jaitaiwan> Pretty hacky
[15:08:57] <GothAlice> jaitaiwan: Indeed, there is that. Excluding regexen in the form /^…/ – prefix searches should still be relatively speedy.
[15:09:36] <jaitaiwan> Yeh, unfortunately prefix regexes weren't in the question
[15:10:13] <jaitaiwan> It's a shame that you can't have super fast regex indexes without prefixes
[15:12:04] <GothAlice> Of course, if that type of querying is required it might even be worthwhile to adapt the data. Split e-mail addresses into two fields, recipient and server. Index server and your queries become fast indexed hash comparisons.
[15:12:10] <GothAlice> To take things to the extreme, a bit. ;)
[15:13:04] <jaitaiwan> Haha true that. Not fun for the application logic though I guess
[15:13:27] <GothAlice> With an appropriate ORM/ODM/DAL it would be seamless for the rest of the application.
[15:13:49] <GothAlice> (In Python I'd have an "email" @property which re-combines the two dependant fields, the app would just use that where needed.)
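
A shell sketch of that split-field layout, with hypothetical field names; the application-level property GothAlice describes just glues the two halves back together:

    // store the address in two parts and index the domain half
    db.user.insert({email_local: 'bobdole', email_domain: 'whitehouse.gov'});
    db.user.createIndex({email_domain: 1});

    // "all gmail users" becomes an indexed equality match instead of a regex scan
    db.user.find({email_domain: 'gmail.com'});
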
[15:15:56] <jaitaiwan> Man, that's one of the reasons I love python *sigh*.
[15:16:13] <GothAlice> Heh. That @property would also allow assignment. >:3
[15:17:39] <jaitaiwan> I've only recently been getting back into python after a 5 year sabbatical. Hardest thing is transitioning from PHP cause I know it so well. But that's probably #offtopic for this room.
[15:18:59] <cheeser> php is a cancer :)
[15:20:00] <jaitaiwan> Now I'm curious to survey how many php haters we have in the room haha
[15:22:03] <StephenLynx> +1 on php haters group.
[15:23:36] <cheeser> i'm assuming you've seen the fractal of bad design blog post?
[15:23:44] <GothAlice> It's a reasonable template language. It's not a real general purpose language, however, and its fundamental design encourages anti-patterns. Also the fractal.
[15:24:00] <cheeser> http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/
[15:24:08] <GothAlice> parse_str = register globals that never went away, as an example of my favourite function in PHP ever.
[15:24:50] <GothAlice> cheeser: Have you seen the "do not use" MySQL post, similar to the PHP fractal?
[15:26:04] <GothAlice> http://grimoire.ca/mysql/choose-something-else
[15:26:48] <jaitaiwan> Sounds cool. I think PHP is making improvements, all thanks to the guys at facebook with their hack lang and HHVM. In some benchmarks PhalconPHP for instance beats a lot of python mini web frameworks
[15:27:15] <jaitaiwan> I still can't get past meaningful whitespace of python though
[15:27:20] <GothAlice> Python (specifically the RPython runtime written by the Pypy folks) runs PHP faster than HHVM.
[15:27:48] <GothAlice> http://hippyvm.com < the runtime
[15:27:49] <GothAlice> :)
[15:28:24] <GothAlice> jaitaiwan: That's a common myth.
[15:28:55] <GothAlice> https://github.com/marrow/tags/blob/develop/examples/widgets.py < indentation in this file is effectively 100% arbitrary.
[15:29:48] <jaitaiwan> GothAlice: I meant I actually really like the whitespace haha
[15:29:48] <GothAlice> :P
[15:29:48] <GothAlice> Ah, then you are unknowingly perpetuating that myth. XP
[15:30:07] <jaitaiwan> Lol fair enough. Love that hippyvm link btw
[15:30:31] <jaitaiwan> The question is does it run the zend thing in the bg
[15:30:32] <jaitaiwan> lol
[15:30:32] <GothAlice> There's also a better-than-Ruby Ruby interpreter called Topaz: http://www.topazruby.com
[15:30:36] <StephenLynx> python is bad too.
[15:30:48] <GothAlice> Nope, it's a pure reimplementation in Python of the PHP runtime.
[15:30:54] <StephenLynx> specially because of the blocks by identation thing.
[15:30:55] <jaitaiwan> StephenLynx: don't tell me your an erlang fan? haha
[15:31:00] <StephenLynx> nah
[15:31:10] <StephenLynx> I use io.js for servers.
[15:31:15] <StephenLynx> fast and minimalistic.
[15:31:22] <GothAlice> StephenLynx: Then you'll hate my own language, Clueless. Clueless only recognizes tabs for scope nesting.
[15:31:24] <GothAlice> ;P
[15:31:31] <StephenLynx> the thing is
[15:31:32] <jaitaiwan> haha
[15:31:36] <StephenLynx> people don't use it.
[15:31:50] <StephenLynx> it is just another esoteric language and you are not trying to promote it.
[15:31:54] <jaitaiwan> StephenLynx: how do you find io.js over node?
[15:32:02] <StephenLynx> theres a whole cult behind python
[15:32:09] <StephenLynx> how or what?
[15:32:23] <jaitaiwan> How
[15:32:53] <StephenLynx> hard to not find out.
[15:33:01] <GothAlice> Heh, cult. Casual dismissal of others' opinions. Not the basis of a logical argument or constructive discussion, sadly. :(
[15:33:07] <jaitaiwan> Bit of a hipster cult behind node these days too :P
[15:33:24] <StephenLynx> thats true for anything web and new.
[15:33:44] <GothAlice> jaitaiwan: Clueless (https://gist.github.com/amcgregor/016098f96a687a6738a8 and https://gist.github.com/amcgregor/a816599dc9df860f75bd) may be amusing to you. :)
[15:34:01] <jaitaiwan> Go lang didn't really end up getting a cult following I don't think
[15:34:04] <StephenLynx> but check this out about python: "Van Rossum is Python's principal author, and his continuing central role in deciding the direction of Python is reflected in the title given to him by the Python community, benevolent dictator for life"
[15:34:17] <StephenLynx> I consider python, PHP and ruby to form the web trinity of shit.
[15:34:47] <StephenLynx> all of them have bad syntax, are slow as hell and really have no purpose to be used when there are better tools.
[15:35:00] <GothAlice> StephenLynx: BDFL is a typical title in long-running open source projects. Note the Python Enhancement Proposal process for a structured method to make changes to the language. Vs, I don't know, throwing things at the wall to see what sticks a la PHP and JS. (The full-page numbered list for evaluating automatic typecasting in JS is kinda nuts.)
[15:35:42] <StephenLynx> also, python is so bad designed, it don't even have retro compatibility.
[15:35:49] <jaitaiwan> Imma get some popcorn. Brb
[15:35:52] <GothAlice> StephenLynx: My Python code is faster than equivalent C code. My HTTP/1.1 server supports 10,000 concurrent requests per second and compiles to 171 Python opcodes. Attempting to use a C extension for header parsing actually slowed it down…
[15:35:53] <GothAlice> jaitaiwan: :P
[15:36:14] <GothAlice> StephenLynx: So it seems your preconceptions about the language do not match reality.
[15:36:19] <StephenLynx> are you saying python is faster than C?
[15:36:31] <GothAlice> In certain cases, and with appropriate tuning, yes.
[15:36:53] <StephenLynx> lol
[15:36:54] <GothAlice> I'm guessing you haven't really investigated the Pypy (Python-on-Python) JIT compiler.
[15:37:17] <StephenLynx> I got my benchmarks from here http://benchmarksgame.alioth.debian.org/u64/compare.php?lang=gcc&lang2=v8
[15:38:05] <StephenLynx> I assume the cause of some python code being faster than some C code is very badly written C code rather than some superb python VM or JIT compiler.
[15:38:53] <StephenLynx> and speaking about web-servers, python does not support non-blocking IO, if I'm not mistaken.
[15:38:57] <GothAlice> http://morepypy.blogspot.ca/2011/02/pypy-faster-than-c-on-carefully-crafted.html and http://morepypy.blogspot.ca/2011/08/pypy-is-faster-than-c-again-string.html pardon the potentially borked images on those pages.
[15:39:06] <GothAlice> StephenLynx: It supports non-blocking IO in numerous ways.
[15:39:17] <StephenLynx> reading the links.
[15:39:38] <GothAlice> GEvent, native coroutines, epoll/kqueue, etc., etc.
[15:40:12] <jaitaiwan> One of the key things to these sorts of arguments is that we developers are passionately romantic about our languages in some ways. Very hard to be objective, especially when benchmarks vary so widely haha.
[15:41:10] <StephenLynx> "Hence, PyPy 50% faster than C on this carefully crafted example. The reason is obvious - static compiler can't inline across file boundaries." Interpretation strength vs compilation weakness, nothing exclusive to python on the first example
[15:41:27] <StephenLynx> " This is clearly win for dynamic compilation over static - the sprintf function lives in libc and so cannot be specializing over the constant string, which has to be parsed every time it's executed." same case.
[15:41:52] <StephenLynx> so these two links mean nothing in favor of python, just in favor of interpreted tools.
[15:42:33] <GothAlice> Specifically, the Python implementation written in Python and its ability to deeply understand and optimize your code, including JIT compilation to machine code.
[15:42:48] <StephenLynx> which is true for any JIT.
[15:43:04] <StephenLynx> and from what I see, the V8 is way, way ahead of python.
[15:43:05] <GothAlice> Not really. Pypy's JIT is somewhat unique.
[15:43:09] <StephenLynx> really
[15:43:10] <StephenLynx> how?
[15:44:07] <StephenLynx> http://benchmarksgame.alioth.debian.org/u64/compare.php?lang=python3&lang2=v8 these benchmarks show the distance between V8 and python is greater than the distance between C and V8.
[15:44:23] <GothAlice> Python 3, which is slower than other versions.
[15:44:28] <jaitaiwan> GothAlice: got through some of your links. You have WAAAY too much time on your hands
[15:44:29] <GothAlice> Also doesn't have a JIT.
[15:44:46] <GothAlice> So yeah, V8 will trounce Python 3.
[15:44:56] <StephenLynx> is python 3 the latest version of python?
[15:45:24] <GothAlice> It's the latest CPython implementation, yes.
[15:45:31] <StephenLynx> ok, so you are telling me
[15:45:33] <GothAlice> There are quite a few other implementations.
[15:45:35] <StephenLynx> python got slower?
[15:45:38] <GothAlice> Yup.
[15:45:42] <StephenLynx> lol
[15:45:47] <StephenLynx> do you even defend it?
[15:46:13] <GothAlice> https://gist.github.com/amcgregor/405354 compares operations between Python 2.7 and 3.2 (in most cases, a version two major versions behind).
[15:46:37] <GothAlice> There are two separate things at play here: the Python syntax, and implementation details for a particular runtime.
[15:46:58] <GothAlice> Python 2 is also supported. Both are offered to let you pick between advanced features or greater raw performance.
[15:47:37] <StephenLynx> exactly, they couldn't make it right so they fragmented it.
[15:47:40] <jaitaiwan> I have to say StephenLynx. I do like node but gee the callback hell gets me. Promises are good but they're a bit of a mind-screw sometimes
[15:47:47] <StephenLynx> python is the worst when it comes down to this.
[15:48:00] <GothAlice> Worse when it comes to callback hell?
[15:48:09] <StephenLynx> callback hell is just bad code.
[15:48:21] <StephenLynx> you don't have to have it if you write good code.
[15:49:34] <StephenLynx> and when you get benchmarks, node/io dwarfs python.
[15:49:50] <jaitaiwan> Mmmk... If I have to open 3 files, to do one operation, I'm seeing callback hell as gonna happen (Without promises obviously).
[15:50:06] <StephenLynx> it won't happen.
[15:50:09] <GothAlice> Oh, Pypy also recently added automatic STM (software transactional memory) features. The biggest unique feature of Pypy is that the compiler acts as a filtering pipeline. Enabling STM simply adds a filter layer to the compiler. You have pluggable garbage collection schemes, also a filter, and the intermediate representation is a flow graph of your entire application with multiple different back-end compilers. I.e. compile to C, or
[15:50:10] <GothAlice> compile to .NET, or compile to JVM, or compile to JS. Yes, JS.
[15:50:13] <StephenLynx> unless you make it happen.
[15:50:58] <StephenLynx> building to other languages only adds to the argument the language itself is useless.
[15:51:00] <jaitaiwan> What size js applications do you work on?
[15:51:22] <GothAlice> StephenLynx: So Java is useless because Google are compiling it to JS?
[15:51:30] <GothAlice> Weird perspective to take.
[15:51:40] <StephenLynx> google is doing one thing.
[15:52:00] <StephenLynx> you example, who added this implementation?
[15:52:04] <StephenLynx> in your example*
[15:52:51] <GothAlice> http://www.rfk.id.au/blog/entry/pypy-js-first-steps/ — see also: http://www.pypyjs.org/demo/ (have some Python in your web browser)
[15:53:23] <StephenLynx> jaitaiwan this is my biggest project so far https://gitlab.com/mrseth/bck_lynxhub
[15:53:25] <GothAlice> The importance of this work isn't that hey, you can now run Python in your browser, but rather the improvements this project brought to asmjs and Emscripten.
[15:53:36] <GothAlice> Benefitting everyone.
[15:55:19] <GothAlice> (https://www.rfk.id.au/blog/entry/pypy-js-faster-than-cpython/ is an amusing instance of a benchmark running faster under Pypy.js than CPython. Also links to some of those improvements to other packages I mentioned.)
[15:55:46] <jaitaiwan> StephenLynx: well structured
[15:56:03] <StephenLynx> told you, callback hell is just bad code.
[15:56:46] <jaitaiwan> Its natural progression of the language which has required an elegant solution
[15:56:46] <StephenLynx> I got my standards from https://github.com/felixge/node-style-guide
[15:56:53] <GothAlice> Array(8).join("wat" - 1) + " Batman!"
[15:56:54] <GothAlice> :P
[15:57:41] <StephenLynx> not that I defend javascript as a good language. it is hilariously inconsistent and you need to use strict mode and a lint to keep on track.
[15:57:53] <jaitaiwan> defs
[15:58:08] <StephenLynx> but nothing you find in it is bad than the abomination of using whitespaces as part of syntax.
[15:58:25] <GothAlice> typeof '' === 'string' — typeof new String() === 'object' — also the existence of ===
[15:58:29] <StephenLynx> as bad*
[15:58:35] <GothAlice> A completely broken object model works for me. :)
[15:59:11] <StephenLynx> that line is just intentionally bad written code.
[16:02:09] <jaitaiwan> The prototype object system takes a fair bit to get used to when you come from a c background. Not that its technically a con for the language itself.
[16:02:27] <GothAlice> The "what not to do in JavaScript" conversation, however, never really ends. It's a minefield that requires a rather substantial head space to navigate. Core language features are inherently broken, requiring non-obvious workarounds and heavily defensive code. Or, even better, automated conversion tools (CoffeeScript) to gloss over the multitude of issues for you. (Which is not really any better than going from Python->JS.)
[16:03:05] <StephenLynx> strict mode + lint. hm, yeah, really though.
[16:03:43] <StephenLynx> specially when you get back the best performance with interpreted languages.
[16:04:02] <GothAlice> [Citation that includes Pypy Needed]
[16:04:27] <StephenLynx> can you run a server using it?
[16:04:36] <StephenLynx> and how feasible is it?
[16:05:31] <GothAlice> I run many servers with it. I linked to my HTTP/1.1 server already. I also run XMPP, a secure pub/sub proxy through to MongoDB's capped collections, and MUSHes (telnet-things), amongst others.
[16:05:48] <StephenLynx> hm.
[16:05:58] <StephenLynx> you got a benchmark of pypy against V8?
[16:06:06] <GothAlice> Python (Pypy runtime again) is used for all of the management scripts, and even the entire package management system of the distro I use, Gentoo. This automation (also using MongoDB) runs a nearly 2000-node cluster.
[16:06:06] <ehershey> I have stumbled into #javascript
[16:06:10] <GothAlice> :P
[16:06:31] <StephenLynx> I avoid javascript community like the plague.
[16:07:17] <StephenLynx> the effort put into adding bloat instead of learning how to code is baffling.
[16:07:30] <jaitaiwan> haha just about ehershey
[16:08:12] <StephenLynx> pretty much like my relation with the java community.
[16:08:27] <ehershey> much easier to discuss it amongst the database community
[16:08:52] <GothAlice> StephenLynx: http://blog.kgriffs.com/2012/11/13/python-vs-node-vs-pypy-benchmarks.html compares them, sorta. It's benchmarking gibberish in most of the graphs. (wsgiref = unoptimized HTTP server for debugging) The Gevent vs. Node.js req/sec tests indicate that for 64 KiB responses Node.js and Gevent are at par, with Node.js having higher standard deviation.
[16:08:57] <StephenLynx> yeah, most language communities are echo chambers because most programmers don't step outside their comfort zone.
[16:09:16] <GothAlice> (It's also a really old article, not taking into account current Pypy optimizations.)
[16:09:23] <StephenLynx> I could easily use java for fucking everything. instead I went out of my way to learn from javascript to C++
[16:09:41] <StephenLynx> going to read it.
[16:09:50] <StephenLynx> GothAlice
[16:11:00] <StephenLynx> yeah, that example is kind of bad because it adds stuff like frameworks.
[16:12:06] <GothAlice> Yeah.
[16:12:14] <GothAlice> Ignoring the performance bits on http://www.cdotson.com/2014/08/nodejs-vs-python-vs-pypy-a-simple-performance-comparison/ (since his method was flawed) the memory comparison remains valid, however. Node uses multiples of the amount of RAM that Python does on the same problem, until the problem scope grows to the maximum.
[16:12:53] <GothAlice> Pypy (due to needing to graph the whole problem at the start) starts off with ludicrous memory usage, but it flips at the mid-point of problem difficulty to being less memory-hungry than the others.
[16:14:21] <StephenLynx> but, it is the most we have. you could come up with your own benchmark to try and show python is not completely obsolete and redundant as technoloty when it comes down to web.
[16:14:25] <StephenLynx> technology*
[16:14:40] <GothAlice> "Obsolete and redundant." lol
[16:14:43] <jaitaiwan> I'm out folks. I'll pick this up in the logs later.
[16:14:59] <GothAlice> jaitaiwan: I think I'm out too. Can't argue with sedimentary positions. ;)
[16:15:20] <StephenLynx> I'm not interested in arguing any further either, we got the data.
[16:15:44] <StephenLynx> I'm more than willing to change my position in the face of new evidence.
[16:15:50] <jaitaiwan> I called it. Always comes down to each to their own.
[16:15:59] <StephenLynx> and benchmarks.
[16:16:29] <GothAlice> jaitaiwan: Of course. Hopefully each also factors in finding the right solution for a problem, too. (I've written stream filters in Brainfuck before… right tool for the right job. ;)
[16:17:13] <jaitaiwan> Yeh exactly. Anyway great discussion StephenLynx and GothAlice. Its about 2am in my timezone so its time for some much needed rest haha
[16:17:21] <GothAlice> Have a great one!
[16:17:45] <jaitaiwan> StephenLynx, thanks for that styleguide link too.
[16:17:50] <StephenLynx> :v
[16:17:51] <jaitaiwan> GothAlice you too
[17:28:32] <wayne> each player has an array of scores. i want all scores over 100 to be reset to 0. what's an update query that i can use to do this?
[17:29:25] <StephenLynx> update({score:{$gte:99}},{$set:{score:0}})
[17:29:26] <wayne> db.players.update({scores: $elemMatch {$gt: 100}}, /* what goes here? */)
[17:29:38] <Penagwin> Hey I'm super new, why does this fail? http://hastebin.com/exofajipum.bash
[17:29:39] <wayne> StephenLynx: an array of scores
[17:29:44] <StephenLynx> hm
[17:30:19] <StephenLynx> {$set: {'scores.$': 0}}
[17:30:21] <StephenLynx> try this
[17:31:32] <StephenLynx> $ works as "the index of the element matched on query"
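
Putting those pieces together for wayne's case — a sketch, with the caveat (discussed just below) that $ only touches the first matching array element in each document:

    // reset the first over-100 score in every matching player document
    db.players.update(
        {scores: {$gt: 100}},
        {$set: {'scores.$': 0}},
        {multi: true}
    );
    // repeat until nothing matches, since $ only rewrites one element per pass
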
[17:31:55] <StephenLynx> wayne
[17:32:05] <StephenLynx> Penagwin where are you trying to run it?
[17:32:07] <wayne> thanks, do you know where to find docs for that?
[17:32:13] <wayne> it seems so perl-like haha
[17:32:37] <StephenLynx> http://docs.mongodb.org/manual/reference/operator/update/
[17:33:00] <StephenLynx> the problem is, I don't know if you will be able to update several scores at once.
[17:33:08] <wayne> ah
[17:33:13] <StephenLynx> $elemMatch only gives you the first element in some cases.
[17:33:24] <wayne> also, what if there's nesting
[17:33:29] <StephenLynx> and nesting.
[17:33:31] <wayne> i don't think $ scales that way
[17:33:33] <Penagwin> Does mongodb do something weird when looking for text?
[17:33:39] <wayne> i guess i might have to write a mapreduce
[17:33:50] <StephenLynx> in general, sub arrays have several limitations.
[17:33:59] <wayne> :(
[17:34:10] <StephenLynx> and in some cases you might be better using a separate collection.
[17:35:05] <wayne> StephenLynx: yeah that's what i have now, but i'm having huge performance issues due to application-level joins
[17:35:14] <wayne> so the application is hitting the database like a drunkard hits the bottle
[17:35:41] <wayne> nesting is my attempt to ameliorate this
[17:35:49] <StephenLynx> do you perform several queries for this update?
[17:35:54] <wayne> yes
[17:35:55] <StephenLynx> because you wouldn't have to.
[17:36:04] <StephenLynx> you can just update multiple documents
[17:36:05] <Penagwin> Could somebody help me with this odd(to me) issue?
[17:36:08] <StephenLynx> with a single query
[17:36:12] <wayne> these are across collections
[17:36:24] <wayne> so i'm nesting some collections into others
[17:36:27] <StephenLynx> hm, I will have to know your model better then.
[17:37:50] <wayne> well, i'm just sticking the many from a many-to-one relationship as an array in the one
[17:38:25] <wayne> but the many collection documents aren't too simple and have arrays themselves
[17:38:38] <wayne> so i need to figure out the query acrobatics to make my old queries faster
[17:43:10] <Penagwin> Hey guys, I need to search through objects and find those that don't have a specific string in a field
[17:43:26] <StephenLynx> hm
[17:43:36] <StephenLynx> I think I done that before. hold on
[17:44:40] <StephenLynx> http://pastebin.com/b6HkeRaj
[17:44:52] <StephenLynx> you need an exact match or just part of it?
[17:45:18] <Penagwin> Whole thing
[17:45:24] <Penagwin> And it has some characters in it
[17:45:27] <Penagwin> "Mozilla/5.0 (X11; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0"
[17:46:48] <StephenLynx> so you need all that aren't exactly that string?
[17:46:55] <Penagwin> Yup
[17:47:23] <Penagwin> Testing it on mongolab atm, just get an "unexpected error" with most of my tries.
[17:48:52] <giowong> hi is there a way for mongodb to return a response as a json object
[17:49:30] <giowong> currently its returning as a array, and i was pulling data directly from the json file itself
[17:49:50] <StephenLynx> {login:{$not:/loin/}} this worked for me.
[17:50:05] <StephenLynx> $not uses either a regex or a document
[17:50:22] <StephenLynx> so you can't just use a regular string
[17:51:11] <StephenLynx> giowong you can use findOne
[17:51:28] <Penagwin> StephenLynx: I would need to escape my string then right?
[17:51:40] <StephenLynx> I'm not familiar with regex.
[17:51:46] <StephenLynx> but probably
[17:52:18] <StephenLynx> Penagwin wait
[17:52:21] <StephenLynx> there is $ne
[17:52:36] <Penagwin> What does that one do?
[17:52:44] <StephenLynx> exactly what you need
[17:52:52] <StephenLynx> {login:{$ne:"logn"}}
[17:52:57] <StephenLynx> anything that is not the string
[17:53:37] <Penagwin> Thank you so much!
[17:53:39] <StephenLynx> np
[18:18:45] <shlant1> poll: anyone using sharding with Apache Mesos?
[18:19:01] <shlant1> I am trying to determine if it's even doable/worth looking into
[18:28:59] <MacWinner> if I have 3 files with different filenames, but the same md5 hash of the data (ie the same data), is there some best practice on storing them without duplication?
[18:29:33] <MacWinner> i feel like the chunks reference back to the files.. rather than files referencing the chunks
[18:30:06] <GothAlice> MacWinner: That's due to the way they're queried. More often you're looking for all of the chunks for a file and you already know the file ID.
[18:30:34] <MacWinner> seems like I might need a 3rd collection then?
[18:31:06] <GothAlice> MacWinner: I abstract GridFS with a metadata collection storing the actual "file metadata" and GridFS storing BLOB data without metadata (i.e. no file names). The metadata collection then references the ID of the GridFS BLOB to use, which means I can de-duplicate without issue.
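
A rough sketch of that layout with hypothetical collection and field names: GridFS holds only the blob, while a separate metadata collection carries filenames and points at the blob, so identical data is stored once:

    var incomingMd5 = 'd41d8cd98f00b204e9800998ecf8427e';  // md5 of the file being uploaded (illustrative)

    // fs.files already records an md5 per stored blob, so look for a match first
    var existing = db.fs.files.findOne({md5: incomingMd5});
    var blobId = existing ? existing._id : null;
    // if blobId is null, write the data through GridFS first and use the new file's _id

    db.file_meta.insert({
        filename: 'report-final.pdf',  // illustrative
        gridfs_id: blobId,             // many metadata documents may share one blob
        uploaded: new Date()
    });
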
[18:31:40] <MacWinner> GothAlice, cool.. sounds good. I was planning on something like that. just wanted to make sure I wasn't missing some built-in feature
[18:32:23] <MacWinner> thanks!
[18:39:27] <MacWinner> GothAlice, do you typicall split out your gridfs database from the rest of your app data onto a different mongo cluster?
[18:40:27] <StephenLynx> I would save the file regularly then have a document to track it's aliases and md5.
[18:40:29] <GothAlice> Depends on the application. With my 26 TiB Exocortex project, yeah, the BLOB data is separate. It'd be flatly unqueryable if I didn't do that. For smaller projects where the dataset (even with BLOBs) still fits in RAM, nah.
[18:41:00] <MacWinner> exocortex project?
[18:41:18] <GothAlice> StephenLynx: Most filesystems behave badly with large numbers of files in one inode (directory), which then adds the complication of hierarchical organization, usually by substring prefixes. This way lies madness.
[18:41:37] <StephenLynx> including GNU/linux?
[18:41:41] <GothAlice> MacWinner: Yeah, it's a transparent proxy that records every digital bit of information I touch, and has been doing so since 2001.
[18:42:13] <StephenLynx> GothAlice what purpose this project serves?
[18:42:30] <wayne> GothAlice that sounds interesting
[18:42:38] <wayne> are you logging me nooow
[18:42:40] <GothAlice> StephenLynx: GNU/Linux isn't a filesystem. ext* filesystems behave _terribly_ (and you can run out of inodes! `df -i` to check), reiserfs is better (no limit), but directory listings can consume huge amounts of RAM when you have millions of files. Others get worse.
[18:42:46] <MacWinner> does it record when you touch the exocortex?
[18:42:53] <GothAlice> MacWinner: Yes.
[18:43:00] <StephenLynx> I see.
[18:43:06] <GothAlice> wayne: Yes.
[18:43:28] <StephenLynx> and yeah, I misread when you said filesystem.
[18:45:44] <GothAlice> Exocortex serves several purposes: it's a great dataset for natural language processing and personal AI (computer vision, fact extraction, etc.) research. It's a development playground to try out new technologies, i.e. I added full-text indexing and compression to MongoDB 6 years ago. :)
[18:47:00] <GothAlice> It also serves an interesting purpose in predictive linking, i.e. I Google something and ignore the result page. Within 30 seconds I get two links pushed to my browser that are the "best" result, taking into consideration my active project, previous related searches, etc. (And since they were already fetched, no need to fetch them again.) Exocortex visited each result on the first page and performed its own ranking on them.
[18:47:58] <StephenLynx> and currently it holds 26 terabytes of data?
[18:48:02] <GothAlice> Aye.
[18:49:20] <GothAlice> That includes a complete copy of Wikipedia, minus history deltas and discussion pages.
[18:49:26] <wayne> GothAlice: does this mean you're more of a cyborg than the rest of us?
[18:49:38] <GothAlice> wayne: Yes. I'm an ACI. (Advanced Cybernetic Intelligence) ;)
[18:49:44] <wayne> cool i dig it
[18:49:50] <StephenLynx> how do you store this data?
[18:50:08] <GothAlice> StephenLynx: Three Drobo 8-something-i RAID arrays and three Dell 1U rack servers.
[18:50:37] <StephenLynx> do you store it at home or a remote server?
[18:50:43] <GothAlice> (I get unlimited electricity at my apartment complex, and have a spare room with racks.)
[18:52:48] <GothAlice> It is fully backed up off-site, though.
[18:53:25] <StephenLynx> and how much does this remote backup cost?
[18:53:30] <GothAlice> $5/mo.
[18:53:35] <GothAlice> Backblaze FTW.
[18:54:06] <GothAlice> It's this dataset that would cost me $500,000/mo. or so to host on compose.io. Silly cloud pricing.
[18:57:55] <StephenLynx> where do you host it?
[19:00:02] <MacWinner> have you had any issues with compose.io?
[19:00:38] <GothAlice> StephenLynx: At home.
[19:00:47] <GothAlice> MacWinner: I don't use compose.io, alas.
[19:00:57] <StephenLynx> so you don't remotely back it up?
[19:01:06] <GothAlice> StephenLynx: Oh, yeah, I do. Backblaze.
[19:01:18] <StephenLynx> oh
[19:01:44] <StephenLynx> going to look into that. storing 26tb for just 5 bucks per month got my attention
[19:01:58] <GothAlice> T'was a bit of a frankenstein monster to get MongoDB on-disk stripes to back up that way, but it works. (A point-in-time freezing filesystem like ZFS makes this much easier.)
[19:02:06] <GothAlice> StephenLynx: Yeah.
[19:02:48] <MacWinner> GothAlice, you can store that much data for only $5?
[19:02:57] <GothAlice> MacWinner: Yup, they're unlimited.
[19:03:16] <GothAlice> It's also deep-storage, so unlike Dropbox you don't have "live" access to your files. You have to queue up restores.
[19:03:48] <MacWinner> GothAlice, interesting.. i'll check it out
[19:04:08] <GothAlice> (Also fully encrypted if you so choose. That's the big reason I went with 'em.)
[19:04:37] <MacWinner> it seems too good to be true :)
[19:05:03] <MacWinner> will check it out.. thanks
[19:08:31] <MacWinner> i wonder how they compress down video files
[19:08:42] <GothAlice> They use bzip2 compression, FYI.
[19:09:12] <GothAlice> Doesn't help that 80% of my dataset is xz compressed…
[19:09:18] <GothAlice> Took three months to do the initial backup.
[19:11:01] <MacWinner> out of curiosity.. any recommended mongodb hosting providers?
[19:11:07] <MacWinner> if I don't want to have to create my own cluster
[19:11:18] <GothAlice> MacWinner: Create your own cluster and slap MMS on it.
[19:11:26] <GothAlice> "Cloud" providers will empty your wallet.
[19:12:09] <StephenLynx> alice, mind linking your github account?
[19:12:19] <MacWinner> GothAlice, i figure I can do that later once hosted mongo costs become an issue.. right now I depend on reliability and I don't quite trust myself on it
[19:12:22] <StephenLynx> I got 78 results for "clueless" and I'm lazy.
[19:12:38] <GothAlice> StephenLynx: https://github.com/amcgregor/ (personal) https://github.com/marrow/ (open source collective) https://github.com/illico/ (work)
[19:12:55] <StephenLynx> god damn it
[19:12:57] <StephenLynx> :v
[19:12:59] <GothAlice> :P
[19:13:03] <StephenLynx> how old are you?
[19:13:12] <GothAlice> I believe I'm 30.
[19:13:48] <GothAlice> StephenLynx: Clueless is an unpublished WIP toy lang of mine. Most of its bits are in gists. https://gist.github.com/amcgregor/016098f96a687a6738a8 (docs) https://gist.github.com/amcgregor/a816599dc9df860f75bd (some sample code)
[19:14:08] <StephenLynx> I know, I just looked for it to try and find you
[19:14:33] <GothAlice> The unpublished part of that would certainly make that difficult. XP
[19:14:42] <wayne> StephenLynx: another riddle for you! this is a player: {games: [{scores: [8,9,13213213213], cheat: true}, {scores: [1,2,3], cheat: false}]}
[19:14:56] <wayne> a player has many games, games have many scores and may have been cheated or not
[19:15:13] <wayne> how do i sort players by highest scores that weren't cheated?
[19:15:16] <giowong> im getting a http://localhost:3000/[object%20Object] error but my json returns correctly, should i just ignore this?
[19:15:24] <StephenLynx> but you had a public repo, didn't you?
[19:15:32] <wayne> i know mongodb supports sort("a.b.c")
[19:15:42] <wayne> but the predicate of checking that cheat == false is boggling
[19:16:06] <GothAlice> StephenLynx: Not yet. I ran into a hiccup with the dynamic EBNF parser and flow grapher and Real Life™ side-tracked me.
[19:19:47] <GothAlice> wayne: Store a "top score" field on the player and have your updates (which add scores or games) only update that value if a) it's smaller than the maximum for that round/game and b) that round/game wasn't cheat.
[19:21:03] <wayne> ugh was hoping to avoid that
[19:21:27] <wayne> i don't like duplicate values and annoying recalculation cycles
[19:21:41] <wayne> i'd prefer that the DB just have advanced enough querying
[19:21:44] <GothAlice> It's technically pre-aggregation, which deserves a slightly different philosophy than "duplication". ;)
[19:22:10] <wayne> it is duplicate information that could have consistency issues, nonetheless
[19:22:14] <GothAlice> It's a single update() query when otherwise recording the scores.
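
A sketch of that single update, assuming a hypothetical player document with a pre-aggregated top_score field and the player's _id in playerId:

    var game = {scores: [8, 9, 42], cheat: false};
    var update = {$push: {games: game}};

    if (!game.cheat) {
        // $max only ever raises top_score, never lowers it
        update.$max = {top_score: Math.max.apply(null, game.scores)};
    }

    db.players.update({_id: playerId}, update);
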
[19:24:18] <StephenLynx> you have an artist account on lastfm?
[19:24:28] <GothAlice> I did last time I checked. :)
[19:24:41] <StephenLynx> I don't know lastfm.
[19:24:44] <GothAlice> http://soundcloud.com/gothalice — I'm also on iTunes, etc., etc. and bitterness my own albums.
[19:24:47] <StephenLynx> you make remixes or something?
[19:24:56] <StephenLynx> yeah, I noticed the itunes account
[19:25:05] <GothAlice> *bittorrent
[19:25:18] <StephenLynx> also I noticed the facebook account.
[19:25:29] <StephenLynx> didn't looked into it because I deleted mine.
[19:25:56] <GothAlice> Many video game remixes, much random other stuff. My schtick is hybrid electronica with a small handful of natural instruments.
[19:26:23] <wayne> you guys are creepy
[19:26:38] <GothAlice> wayne: I'm an unusual human being. If it exists, at some point I've been paid to do it. ;P
[19:26:51] <wayne> well i more or less meant the googling of other people
[19:26:56] <GothAlice> (At least, if it involves IT at all.)
[19:27:17] <wayne> too close to doxxing to be comfortable
[19:27:24] <StephenLynx> matchFWD? weird, you link illihicodes at github
[19:27:41] <GothAlice> StephenLynx: Illico Hodes is the company that was behind matchFWD.
[19:27:46] <StephenLynx> oh
[19:28:50] <windsurf_> which operator do I use to find all event objects whose categories array (array of _id) contains at least one matching _id in an array I have to compare it against?
[19:29:07] <GothAlice> wayne: I've been dox'd by a gaming group in the past. Always fun to have "that talk" with employers.
[19:29:12] <StephenLynx> I wouldn't expect you to use the same image you use on github on linkedin
[19:29:21] <windsurf_> use case: user passes array of category ids he wishes to search within, need to find all events that are tagged with any of those categories
[19:29:33] <wayne> windsurf_: $in
[19:29:40] <GothAlice> windsurf_: No need for an operator. db.event.find({categories: ObjectId(…)})
[19:29:47] <StephenLynx> I reallly can't see anything compromising here.
[19:29:54] <GothAlice> Oh, sorry, missed the extra array part.
[19:29:55] <GothAlice> $in
[19:30:03] <StephenLynx> that would cause a talk with anyone.
[19:30:34] <wayne> windsurf_: you could also do something like
[19:31:26] <wayne> db.events.find({tags: {$elemMatch: {$in: ['trees', 'cars', 'chairs']}}}) // maybe?
[19:31:44] <wayne> assuming tags is an array
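
For windsurf_'s case the plain $in form is enough, since matching an array field against $in succeeds whenever any element matches; a sketch with illustrative ids:

    var wanted = [ObjectId('54e4ffd2e4b0a3b1c1d1e1f1'), ObjectId('54e4ffd2e4b0a3b1c1d1e1f2')];
    db.event.find({categories: {$in: wanted}});
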
[19:32:09] <windsurf_> thanks guys
[19:44:34] <windsurf_> wayne that worked, thanks
[19:45:58] <wayne> np
[19:48:55] <whaley> GothAlice: is it ok if I PM you, regarding your music (I don't want to be offtopic)
[19:49:33] <GothAlice> Feel free. :)
[19:51:34] <jclif> Is Automation enabled in MMS?
[19:54:52] <StephenLynx> I think I might start using gridfs for my chan.
[19:55:09] <StephenLynx> found out you can stream files in io using gridfs.
[19:55:24] <StephenLynx> if I pickup that project again, that is.
[19:55:45] <giowong> hey guys
[19:55:49] <giowong> im conflicted on what to do
[19:56:37] <giowong> so i have a array of objects, each object has a unique name, but have similar categories attribute
[19:57:33] <giowong> and each object also has a count
[19:57:48] <giowong> i need to sum up the total counts for each category
[19:58:07] <giowong> should i do it in the back end or front end after i pass the whole json array
[20:15:02] <fewknow> giowong: not sure I follow..you have an array of sub-documents? each has a count and you want to sum them up ?
[20:17:48] <jake__> Hi. I'm trying to optimize a mapreduce job. Looking at db.currentOp() I see that "planSummary" : "COLLSCAN". Does this mean it isn't using an index for the query?
[20:24:21] <giowong> yea
[20:25:05] <giowong> each object has a defaultCategory, which can be the same across sub-documents, and each one has a unique count
[20:25:22] <giowong> i want to sort by default category and sum the total for each category
[20:26:45] <MacWinner> mongo write locks are database level right? not collection level?
[20:26:47] <fewknow> you could use the aggregation frame work
[20:27:01] <fewknow> MacWinner: depends on the version of mongodb you are running
[20:27:03] <MacWinner> so if I have an activity table, I should probably place it in a separate db?
[20:27:06] <MacWinner> 2.6.7
[20:27:19] <fewknow> still db locking....but db locking isn't a bad thing
[20:27:30] <fewknow> and depending on your load it might not matter
[20:27:49] <fewknow> is it heavy read or write
[20:28:11] <MacWinner> fewknow, i have one database that will have a bunch of configuration information that will rarely change.. but another set of data that is activity data related to the configuration information
[20:28:32] <MacWinner> the activity and configuration info is heavy read... the activity is also heavy write
[20:29:16] <fewknow> k....locking doesn't prevent reads....it just slows down writes
[20:29:28] <kali> locking locks reads too
[20:29:42] <fewknow> so you could test...but unless you need more than 2k writes a second you shouldn't need to worry
[20:29:43] <kali> it's a one-writer or N-readers lock
[20:29:46] <MacWinner> yeah..i thought i saw it locks reads as well
[20:29:48] <fewknow> locking doesn't prevent reads
[20:29:54] <kali> fewknow: it does.
[20:30:09] <fewknow> no it doesn't...locking a read does nor prevent the read
[20:30:17] <fewknow> not*
[20:30:29] <kali> fewknow: a read does not lock another read, but a write will lock all reads
[20:30:30] <fewknow> the lock% you see when you look at mongostat is only the write lock
[20:30:38] <fewknow> it doesn't include the read lock
[20:32:16] <fewknow> kali: yes i agree with that..
[20:33:05] <fewknow> but a read won't fail on a lock....and unless you are hitting 2500 writes a second the read will be sub second most of the time
[20:33:26] <fewknow> i guess it depends on your scaling needs
[20:33:45] <fewknow> but I never worry about reads due to a lock...especially since the working set of data is in memory
[20:33:51] <fewknow> and you can cache on top of that
[20:33:59] <fewknow> reads should not be a huge concern
[20:34:09] <fewknow> slow writes should be the concern
[20:53:16] <cook01> When you’re dealing with large collections (each document is fairly small) and trying to efficiently page over the entire collection, then is the recommended way to page with $sort and $limit? http://docs.mongodb.org/manual/reference/operator/aggregation/sort/
[20:54:13] <kali> cook01: well, even with an index for $sort, $limit will have a linear complexity
[20:54:27] <kali> linear on the number of documents you skip
[20:54:55] <kali> cook01: so it may be more efficient to use the sort key to pull the next page
[20:55:12] <kali> mmm... i meant $skip is linear
[20:55:27] <GothAlice> cook01: I'm not sure of the technical details in MongoDB, however as kali touches on, skipping will still require ordering the result set and massaging the records you're skipping. (Even the most efficient method of doing this, sorted on _id, would be a "log n" situation as it walks the index b-tree.)
[20:56:57] <kali> yeah, log(number of doc in the database) but at least it means your 1000e page will have the same performance as the first
[20:57:19] <kali> 1000th :) 1000e is french
[20:59:42] <cook01> Thank kali and GothAlice
[21:00:58] <cook01> It sounds like skipping on an indexed field is one of the better ways to page over a large collection
[21:01:55] <GothAlice> Yes. Better is to "continue" instead of "page", as briefly mentioned above.
[21:02:08] <fewknow> cook01: why do you need to page over the collection?
[21:02:13] <GothAlice> I.e. get the first page, remember the last record's sorted value, re-query asking for $gt that value.
[21:02:33] <fewknow> $gt with $limit
[21:02:39] <fewknow> and make sure the query is index ONLY
[21:02:50] <fewknow> if you are just paging
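
A sketch of that continuation pattern — sort on an indexed field (here _id), remember the last value seen, and ask for everything greater than it on the next pass:

    var PAGE = 1000;
    var lastId = null;

    while (true) {
        var filter = lastId === null ? {} : {_id: {$gt: lastId}};
        var batch = db.things.find(filter).sort({_id: 1}).limit(PAGE).toArray();
        if (batch.length === 0) break;

        // ... compute stats over this batch ...

        lastId = batch[batch.length - 1]._id;  // continue from here next time
    }
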
[21:05:08] <cook01> fewknow: just trying to calculate a few stats that start with the objects in this collection
[21:05:35] <fewknow> cook01: like?
[21:05:35] <cook01> Thanks for the suggestions!
[21:05:40] <fewknow> what stats?
[21:07:02] <cook01> think of business stats/kpis that are derrived by using the documents in that collection
[21:09:32] <fewknow> I would use elastic search or hadoop to do that
[21:09:45] <fewknow> there are connectors from mongo to elastic and hadoop
[21:09:55] <fewknow> it will read from the oplog and keep the documents up to date
[21:10:01] <fewknow> you can run stats off of those
[21:10:07] <fewknow> i wouldn't page through mongo(slow)
[21:11:00] <cook01> fewknow: Yep, I completely agree. This is just a quick solution (v1) that’s subject to a lot of change down the road
[21:12:31] <gansbrest> hi. I need to move one database from one replica set (beta) to prod replica set ( new db needs to be created there )
[21:12:45] <gansbrest> what's the best way to do this?
[21:12:51] <gansbrest> can I just copy files maybe?
[21:13:20] <fewknow> gansbrest: mongodump
[21:13:25] <fewknow> mongorestore
[21:13:48] <gansbrest> oh ) thanks )
[21:25:40] <gansbrest> fewknow: I should restore on master I assume, and slaves will fetch?
[21:32:22] <fewknow> gansbrest: yes
[21:32:29] <fewknow> PRIMARY
[21:32:31] <fewknow> not master
[22:11:44] <gansbrest> one other question, what's the best strategy to update beta replica set with production data? We have similar needs for solr and we ended up sending multiple writes to both through a proxy library. Maybe there is something simpler for mongo?
[22:52:56] <dacuca> how can I read the oplog? I’d like to use it to index the data in a FTS engine
[23:34:28] <GothAlice> dacuca: db.oplog.rs.find() on the "local" database
[23:35:14] <GothAlice> https://github.com/cayasso/mongo-oplog is a handy tool to abstract it a bit
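
A small sketch of reading it from the shell; the namespace is illustrative:

    var local = db.getSiblingDB('local');

    // most recent operations against one collection, newest first
    local.oplog.rs.find({ns: 'mydb.mycollection'}).sort({$natural: -1}).limit(10);

    // a tailer would instead remember the last 'ts' it processed and resume with
    // local.oplog.rs.find({ts: {$gt: lastTs}})
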
[23:36:12] <dacuca> ok thank you GothAlice
[23:37:51] <GothAlice> dacuca: As a side note, I hope you are aware of http://docs.mongodb.org/manual/reference/operator/meta/comment/ which allows your application to send extra information to the back-end you seem to be writing.