[00:20:27] <stefandxm> GothAlice: what happened to better driver documentation for c++?
[00:20:59] <GothAlice> Quite handily that isn't one of my responsibilities. (For two reasons: first, I don't use C++ except for kernel development, and secondly I don't work for MongoDB. ;)
[00:21:26] <GothAlice> Despite appearances to the contrary on that latter point. >:P
[00:22:26] <stefandxm> which, yes, would prove you don't use the c++ driver
[00:22:37] <stefandxm> since it's something mongodb doesn't want to acknowledge they ever had lol
[00:24:32] <stefandxm> i love how "mongo calls me every 3rd day" went to "never" since i requested c++ help :) (that's all i ever said)
[00:26:01] <GothAlice> Admittedly, except for certain highly specialized data processing tasks, it's almost universally easier (in terms of ease, development time/cost, produced SLoC, testing cycle times, and development turnaround time) to use a scripting language for interaction with document-based storage systems.
[00:29:49] <stefandxm> sounds like a proper background story for a bot :)
[00:30:18] <GothAlice> (Discovered some interesting things about the host platform; the software floating-point lib in ROM required two long jumps, the first into a redirection table. It was faster to embed the needed routines in the app itself, and I still managed to squeeze it into 256 bytes. ;)
[00:31:39] <stefandxm> "software floating point lib in ROM "
[00:31:48] <GothAlice> That one was for a demo competition on PalmOS.
[00:41:37] <stefandxm> i wish all bots were like you
[00:42:05] <stefandxm> you should join us at #c++ and spread your wisdom :)
[00:42:11] <GothAlice> It was a motor unit with a button on it. Disassemble, replace button with two diode-protected relays in opposite polarities, connect to the parport. Bam, really crappy plotter control. :)
[00:59:41] <stefandxm> "well.. it was a good idea you know.. cats are interesting*
[00:59:53] <stefandxm> *and donkeys.. donkeys are the new stuff....
[00:59:56] <GothAlice> Actually, in the last two weeks there has been a substantial amount of driver discussion; mostly JS and some Python, though. A bit of Java.
[01:00:05] <stefandxm> can we make them mock @ transactions?
[01:00:12] <stefandxm> yeah .. maybe.. who cares? *"
[01:00:54] <stefandxm> i dont want transactions in mongodb :)
[01:01:57] <GothAlice> And I've learned from my kernel work, you can build almost any higher-level locking or transaction behaviour using update-if-not-modified (compare-and-swap).
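A minimal shell sketch of that compare-and-swap idiom (collection, field, and variable names are illustrative, not from the discussion): the update's query clause requires the document to be unmodified since it was read, so a zero-modified result means another writer won the race and the read/modify cycle should be retried.

    // Optimistic concurrency: apply the change only if `version` is unchanged.
    var doc = db.accounts.findOne({_id: accountId});
    var res = db.accounts.update(
        {_id: accountId, version: doc.version},   // matches only if untouched
        {$set: {balance: doc.balance - 50}, $inc: {version: 1}}
    );
    if (res.nModified === 0) {
        // Lost the race: someone else modified the document first. Re-read, retry.
    }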
[01:03:09] <stefandxm> if you come to sweden or norway i'd be happy to show you our data system
[01:08:08] <GothAlice> stefandxm: I'll add that to the tasks to perform in that country. Next time vacation comes up, I'll see if Sweden or Norway ping. :)
[01:08:51] <stefandxm> we have mongodb world in oslo soon i think :)
[01:12:25] <stefandxm> i just.. i had my own imap server
[01:12:31] <GothAlice> … and now I have server racks in a spare bedroom and cooling problems. ;)
[01:12:39] <stefandxm> and removing emails seemed a bit.. yesterday?
[01:14:54] <GothAlice> Well, mine also combines with a legally provable/verifiable cryptographic audit trail to detect tampering with the data. ¬_¬ Same with photos from my DSLR, too, amusingly.
[01:15:48] <stefandxm> are you questioning my imap servers security?
[01:17:40] <GothAlice> 99% of mail services out there are weak in comparison to crypto trail systems; it's a safe bet I can send you e-mail as Bob Dole or Bill Gates. ;P
[01:18:43] <GothAlice> (I used to give presentations on information security… the first five minutes of the presentation was designed to scare the crap out of the audience. I'd demonstrate hijacking the audience's social network accounts if any were so bold as to check during the presentation. ;)
[01:19:10] <GothAlice> But usually the first step was getting a volunteer, and sending them mail from famous e-mail addresses.
[01:19:50] <stefandxm> i gave you my email in privchat
[01:22:45] <GothAlice> stefandxm: Did you get the (really badly formatted) test mail I sent from Bill Gates?
[01:23:14] <GothAlice> stefandxm: http://cl.ly/image/2J282l223C3i — Doing this should not be possible. (The message should be instantly rejected as being from an unauthorized EHLO host.)
[01:23:33] <kba> Nowadays, most providers (like Gmail) will tell you that the server address doesn't fit the email
[01:29:43] <GothAlice> stefandxm: No, because it accepts a) clearly invalid messages, and b) accepts messages from hosts (for e-mail addresses) it should not.
[01:29:44] <stefandxm> can i paste the entire mail?
[02:02:36] <stefandxm> a bit of a warning kba and GothAlice are quite good fellahs. dont trust anyone of them without a vs b etc. they are quite sneaky :) if you want any more info email me! stefan@skogome.net
[02:02:56] <kba> what on earth are you talking about?
[02:03:31] <kba> yeah, I just had a seemingly friendly chat with him about shellshock, until he became incredibly defensive
[02:03:43] <kba> suggested I wasn't very old and ended up saying he'd put me on ignore
[02:03:44] <GothAlice> kba: Apparently expiring screenshots that involve e-mail addresses that are public knowledge (whois record stuff) warrant that kind of response.
[02:03:55] <stefandxm> feel free to paste any chats
[02:04:08] <kba> GothAlice: from a guy like that, I understand why he'd want to hide as much of his identity as possible
[02:04:14] <stefandxm> but for good manners; cc them to me :)
[02:16:11] <GothAlice> cheeser: I was attempting to help diagnose his mail server setup. I was attempting to explain why the various deficiencies I found were important. Then, bam, irate gibberish.
[02:16:39] <GothAlice> cheeser: My contribution has been a total of one insanely badly formatted message from Bill Gates. ;)
[02:16:45] <stefandxm> by helping we got new domains to block for spam.. i guess thats a good one? :)
[02:16:59] <stefandxm> GothAlice: that is bullshit.
[02:17:26] <Boomtime> you guys know this is #mongodb right? are you in the right channel?
[02:54:07] <user123321> In replication, when the master DB goes down, and until a slave is promoted to a master, what happens to the DB client requests during this transition period?
[02:56:39] <cheeser> probably rejected unless you have slaveOK set
[02:58:03] <Boomtime> user123321: note slaveOk only permits reads/queries; no writes can occur without a primary
[02:59:19] <user123321> I see. Is there a way to make sure that no data gets lost?
[02:59:41] <user123321> Or at least, I don't mind if the writes get delayed a little, like a very few ms.
[02:59:42] <joannac> have write concern of "majority"?
[03:02:15] <GothAlice> Majority is the safest (esp. if you also ask for journal commit), but also slowest.
[03:03:00] <GothAlice> user123321: http://docs.mongodb.org/manual/core/write-concern/ — For more information.
[03:07:07] <GothAlice> user123321: http://www.nonfunctionalarchitect.com/2014/06/mongodb-write-concern-performance/ — has many good charts, and he makes certain to point out these are mostly relative comparisons, different hardware will act differently.
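To make the trade-off concrete, a quick sketch in 2.6-era shell syntax (collection name invented): each write can carry its own write concern, and majority-plus-journal is the safe but slow end of the spectrum.

    // Fast but less safe: acknowledged by the primary alone.
    db.events.insert({msg: "hello"}, {writeConcern: {w: 1}});

    // Safe but slower: a majority of the replica set must acknowledge,
    // and the primary must commit the write to its journal first.
    db.events.insert({msg: "hello"}, {writeConcern: {w: "majority", j: true}});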
[03:07:28] <stefandxm> GothAlice: you owe me a beer.
[03:27:41] <user123321> if one goes down, at least the other has the written data.
[03:29:02] <GothAlice> user123321: You can't exactly have two primaries, but you can have a write concern that requires the data to be written to the primary and at least one secondary to be considered "written".
[03:29:21] <GothAlice> user123321: For details: http://docs.mongodb.org/manual/core/replica-set-write-concern/ — you can specify how many you want explicitly, or simply say "majority" (useful in larger sets).
[03:30:05] <user123321> GothAlice, Does it mean I have to change the application code as well?
[03:30:35] <GothAlice> user123321: To specify the write concern, yes. You can do this at the connection level if you wish, though. (Depending on driver.)
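A sketch of the replica-set variant in the shell (values illustrative): w: 2 means the primary plus at least one secondary, and a wtimeout bounds how long the client waits for acknowledgement (the write itself is not undone on timeout).

    // Require the primary plus at least one secondary before acknowledging,
    // but stop waiting after 5 seconds if the secondaries are lagging.
    db.orders.insert(
        {item: "widget", qty: 1},
        {writeConcern: {w: 2, wtimeout: 5000}}
    );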
[07:46:47] <joel_tux> hello, pulling my hair out here. I’m trying to fetch a deeply nested hash on a document in a collection with mongoose, and it just keeps abbreviating the nested parts to [Object]. Time to switch to a more basic driver like mongoskin, or what?
[09:20:13] <Hypfer> I want to upsert params in some but not all subobjects
[09:20:23] <Hypfer> but since they are in an array it doesn't quite work
[09:20:31] <Hypfer> I cannot change the data schema
[10:29:24] <ncls> hi all, I don't get it: I'm running mongoimport on a simple csv file with a header line like "lastname;firstname;age" and two lines like "foo;bar;26", but each line is recorded as: "lastname;firstname;age" : "foo;bar;26"
[10:29:48] <ncls> here is my command : mongoimport --db mongoimports --collection test --type csv --headerline --file test.csv
[10:29:58] <ncls> I'm on Windows 7, and I'm running Mongodb 2.6.5
[10:32:46] <ncls> oh okay ... it really needs a comma "," and not a ";"
[10:33:05] <ncls> but it's weird because I think I had the same issue with a tsv file ...
[12:03:02] <drp> hi, does anyone know if there is source code available for mongotop?
[12:17:46] <fatgeekuk> Folks, I have a question. (sorry to just jump into the channel and fire away): the mongodb documentation uses an object to define indexes and states that the order of the keys is significant; however, JS documentation states that objects do not maintain the order of their properties. what's goin on?
[12:58:36] <someotherdev> I have some specific questions about some MongoDB optimisations. Does anyone have a minute to spare?
[13:00:32] <joannac> you'd be better off just asking
[13:02:25] <someotherdev> So, I have a document collection blogs {id, author, section} which I need to query. After that, I need to populate the author and section, which I do via sub-queries. I then need to populate the number of votes and comments (which I am storing in another collection). This is taking 700ms to query and I was looking at ways to reduce this. Any ideas?
[13:03:02] <someotherdev> Using mongoose populate to do the sub-queries
[13:03:37] <someotherdev> I paginate, pulling in 20 records at a time
[13:03:47] <someotherdev> the blog collection has a few thousand documents
[13:05:04] <joannac> stop distributing your data across multiple collections?
[13:07:47] <someotherdev> well, blogs is the wrong word. It's articles, not sure why I used blogs. Anyway, given the traffic amount it's likely to rise by a thousand articles per month
[13:07:59] <someotherdev> So keeping it in the same collection would be a bad idea.
[13:15:07] <joannac> what the user is editing is not going to be statistics
[13:16:05] <joannac> and if they are (i.e. resetting their read count), then their edit should win anyway, and it doesn't matter that intermediate edits are lost
[13:18:11] <someotherdev> You have a good point. Well, I inherited this data structure - which is already live. Not sure migrating everything would be productive
[13:18:32] <someotherdev> and things such as comments, would you also store them in the same document?
[13:24:28] <someotherdev> okay, that's what I am basically trying to do. It just so happens that everything is fragmented e.g. author, section and records (by far the biggest data)
[13:26:48] <joannac> anything that needs to be seen together, should be in the same document if possible
[13:27:01] <joannac> are you going to hit the 16mb limit?
[13:27:08] <someotherdev> anything unique to the document, surely?
[13:27:14] <someotherdev> Let me check the current stats
[13:29:30] <someotherdev> okay, so the highest viewed article in one month has 5MB of stats. We are expecting lots of growth in the coming months due to funding etc
[13:30:04] <someotherdev> to be fair, it's logging so much stuff it doesn't need. Can probably reduce that by more than half
[13:30:12] <cheeser> what we did at my last gig was to store things separately (what constituted each bit is a different discussion) then on each request, we'd check couchbase for the cached version of the final page. if it wasn't there, we'd hit mongodb and assemble all the parts.
[13:30:23] <cheeser> then we'd just serve out of couchbase until the page expired.
[13:30:45] <cheeser> if something changed on that page (new comment, e.g.) we'd invalidate that cache entry and rebuild it on the next request.
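A rough JavaScript sketch of that cache-aside pattern; the plain object here merely stands in for Couchbase, and renderPage plus all collection names are invented for illustration.

    var cache = {};   // stand-in for Couchbase

    function getPage(articleId) {
        if (cache[articleId]) return cache[articleId];          // cached? serve it
        var article = db.articles.findOne({_id: articleId});    // otherwise assemble…
        var comments = db.comments.find({article: articleId}).toArray();
        var page = renderPage(article, comments);               // hypothetical renderer
        cache[articleId] = page;                                // …and remember it
        return page;
    }

    function onNewComment(articleId) {
        delete cache[articleId];   // invalidate; next request rebuilds the page
    }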
[13:32:28] <someotherdev> That's an awesome idea. I haven't included caching yet as it's my last resort. I am planning to do it however, I just want to optimise the queries first.
[13:33:27] <someotherdev> is there a particular feature of mongo that may assist this? e.g. map reduce
[13:35:26] <ncls> in an aggregation pipeline, in a $group operation, how can I perform the $concat operator for each field of my objects, without naming them one by one
[13:35:31] <someotherdev> Thanks, still new to Mongo so trying to figure out what's the best deal. Though I think for the stats I may serve this via the article document as suggested. Just need to be careful with the 16MB limit. However, 16MB should be enough
[13:40:30] <fatgeekuk> why not make your viewer stats someone else's problem? use custom dimensions in GA and use the GA API? unless of course you are displaying user read stats on the normal page views (not just reporting pages)
[13:40:56] <someotherdev> the users can see their stats
[13:41:04] <someotherdev> they have a page dedicated to the stats
[13:41:05] <fatgeekuk> in that case, ignore me. :-)
[13:41:37] <fatgeekuk> oh?! right, well if it is not part of the normal pages, but a stats/reporting page, why not use GA to make the stats information somebody else's problem?
[13:42:11] <someotherdev> well, it's in all pages to some degree e.g. number of votes and if you voted etc. But you can see a breakdown on the stats page
[13:44:12] <someotherdev> however, if we had 16MB of stats per story I would be over the moon. As a matter of fact, I am happy to revisit the problem then if that's the case.
[14:41:06] <grkblood13> how do i query for a range including the boundaries? for example, i want to include value1 and value2 in the search results of field: { $gt: value1, $lt: value2 }
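The inclusive counterparts are $gte and $lte, so the query being asked for would look like this (field and value names as in the question):

    // Inclusive on both ends: matches value1 and value2 themselves too.
    db.things.find({field: {$gte: value1, $lte: value2}});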
[15:44:11] <plamer> hello, need some help and can't find anything on the net all day :/ Tried to install Genghis but when i select the server i get: Database command 'dbStats' failed: (errmsg: 'exception: expected to be write locked for monitoring'; code: '16105'; ok: '0.0').
[15:44:53] <plamer> the same "expected to be write locked for monitoring" shows in the logs
[15:45:05] <plamer> any idea what is this and how to fix it?
[15:45:36] <plamer> never worked with mongo before and now my boss wants me to help a guy with that :/
[15:46:49] <plamer> the whole thing from the logs: assertion 16105 expected to be write locked for monitoring ns:monitoring.notifications query:{ $query: { notification_id: "401", date_created: /2014-11-14/ }, $orderby: { date_created: 1 } }
[16:28:29] <s2013> can i just do a dump of a specific collection
[16:28:34] <Dewsworld> Could you comment on my question on stackoverflow http://stackoverflow.com/questions/26934073/mongodb-update-guarantee-using-w-0
[16:50:49] <s2013> anyone here has worked with elasticsearch? i followed all the instructions (i had it working before) but it can't seem to import my mongodb collections into my ES instance
[20:20:17] <GothAlice> (Bundle the nick in the record to avoid an extra lookup.)
[20:20:38] <GothAlice> The timestamp of the message would be provided by the _id ObjectId's timestamp.
[20:21:18] <GothAlice> You could add a "read" boolean flag which can be atomically updated when the message gets seen, too.
[20:23:14] <GothAlice> Lonesoldier728: Then for the "push" aspect of messaging, these "messages" could be written to a collection, but also sent to a capped collection where other processes are waiting, listening.
[20:23:50] <GothAlice> Lonesoldier728: https://gist.github.com/amcgregor/4207375 are some presentation slides (with link to code and the rest of the presentation at the bottom) describing using MongoDB as a messaging system for RPC.
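A minimal sketch of that push mechanism in the legacy mongo shell (collection name and size invented): writers insert into a capped collection, and each listener holds a tailable cursor open against it.

    // One-time setup: a fixed-size collection that preserves insertion order.
    db.createCollection("msg_queue", {capped: true, size: 16 * 1024 * 1024});

    // Listener: tailable + awaitData keeps the cursor open and blocks briefly
    // for new documents instead of reporting exhaustion immediately.
    var cursor = db.msg_queue.find()
        .addOption(DBQuery.Option.tailable)
        .addOption(DBQuery.Option.awaitData);
    while (!cursor.isExhausted()) {
        if (cursor.hasNext()) {
            printjson(cursor.next());   // handle the "pushed" message
        }
    }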
[20:26:00] <Lonesoldier728> hi GothAlice http://pastebin.com/yBr8yLXk
[20:27:13] <GothAlice> But consider how you'll be using the data: most messaging apps, well, display the messages. To display a message in your structure you'll need to look up the user (to get their name, profile picture, etc., etc.)
[20:27:52] <GothAlice> If the display only ever shows a conversation between two users (no group chat), then this is OK. You'll do the lookup for the "other user"'s data once. But in a group chat, things get more complicated.
[20:27:57] <Lonesoldier728> I will have all the info already once the person clicks on a person then i will pull the messages and
[20:28:27] <Lonesoldier728> that match both users' ids
[20:28:33] <Lonesoldier728> kk perfect yeah no group
[20:28:40] <Lonesoldier728> so no sharding necessary right?
[20:28:58] <GothAlice> Lonesoldier728: You may also want a full-text index on the message, if you want to be able to search by keywords and stuff.
[20:29:20] <GothAlice> Sharding is about scaling your data… when you have lots and lots of data (more than fits into the RAM of one machine) then you'll want to add it.
[20:30:08] <GothAlice> (But this will depend on use case. If you have lots of historical data that is rarely accessed, the RAM thing can be ignored. You'll have to benchmark and see how much time you spend on disk I/O.)
[20:30:43] <GothAlice> But you'll *always* want your indexes to fit in RAM, otherwise madness will ensue.
[20:32:52] <GothAlice> (That's where, when you get to the point of needing to scale, having "archived" data separate from the main, "live data" collections can be useful.)
[20:34:49] <GothAlice> Finally, another approach (if you have "transactional" conversations, meaning one with a definitive start and end to a session) is to store both sides of the conversation in one record:
[20:41:25] <GothAlice> For your problem domain, though, a combination of *both* strategies could be quite useful. The first approach for the "push" messages (capped collection), the second approach for the "archived" messages (subsequent recall). Getting a whole conversation for one page is then just a fetchOne. :D
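A sketch of what one such conversation-per-document record could look like (all field names and placeholder values are illustrative):

    {
        _id: ObjectId("…"),
        participants: [ObjectId("…user A…"), ObjectId("…user B…")],
        active: true,            // flips to false when either side closes it
        messages: [
            {from: ObjectId("…user A…"), msg: "hey", at: ISODate("2014-11-14T20:30:00Z")},
            {from: ObjectId("…user B…"), msg: "hello!", at: ISODate("2014-11-14T20:30:07Z")}
        ]
    }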
[20:49:22] <Lonesoldier728> yeah that actually sounds like a better idea no?
[20:49:39] <Lonesoldier728> an array of the texts for one convo
[20:54:45] <GothAlice> Lonesoldier728: It has implications, which is why I use that for archival.
[20:55:23] <GothAlice> Lonesoldier728: Notably, if the document has data added to it beyond a certain amount (default padding), it will have to be moved on-disk to perform the $push. This will add unexpected semi-random slowdown to those operations.
[20:55:56] <Lonesoldier728> http://pastebin.com/d7Uiwwvzso like that
[20:56:10] <GothAlice> That paste has been deleted.
[20:56:52] <Lonesoldier728> oh i added the so at the end lol
[20:57:45] <Lonesoldier728> so that kind of setup should be fine?
[20:58:13] <Lonesoldier728> and when storing does it automatically store in descending order (what is most recent to be pulled out first)
[20:58:35] <GothAlice> Hmm; splitting the creator/other at the top level means two indexes and doubling of the queries if you want to look up by either party to the conversation. (Do you need to track the initiator of the conversation?)
[20:58:48] <GothAlice> $push adds to the end, luckily you can $slice with a -1 to get the last from the list.
[20:59:07] <GothAlice> (In theory you could $push anywhere in the list… but append-only is a simple and safe default approach.)
[20:59:27] <Lonesoldier728> and what do you mean, if I have the from I know which person in the convo sent it
[20:59:31] <GothAlice> Using an aggregate query, yes, you could re-sort the nested data MongoDB-side.
[20:59:52] <GothAlice> "creator" and "other" implies a relationship; "creator" started the conversation with "other".
[20:59:57] <Lonesoldier728> or should I put the from as an int of 1 and 2, where the creator is 1 and the other is 2, so then I can match it based on that
[21:00:08] <Lonesoldier728> well it can just be user1 and user2
[21:01:07] <GothAlice> Note: "participants" — you can index on this (rapidly search for conversations involving one of the two parties), and, to save space in the message list, you could store the "from" in my example as an integer index into that participants list. (0 or 1 instead of 1 or 2, as you gave.)
[21:01:32] <GothAlice> You can also rapidly search for all conversations between two specific parties.
[21:01:37] <Lonesoldier728> is it better to make that an array of participants?
[21:01:37] <GothAlice> Same problem; you're splitting the data.
[21:02:02] <GothAlice> How do you search for all conversations involving "Bob"? You'd have to effectively ask twice, {$or: [{user1: "bob"}, {user2: "bob"}]}, and that's kinda yucky.
[21:02:27] <GothAlice> {participants: "Bob"} would get the answer in a much more elegant way.
[21:04:19] <GothAlice> The process to display a conversation would be: fetchOne on the conversation by ID (preferably), load up data about the participants (users = db.users.find({_id: {$in: conversation.participants}})), then loop over the messages emitting HTML. (As an example.)
[21:05:04] <GothAlice> Then, after rendering the initial conversation history, wait for push notifications over a MongoDB capped collection in order to live add new messages. :)
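Stitched together, a sketch of that render path (variable and collection names assumed from the discussion; "from" here uses the integer-index-into-participants idea from above):

    // 1. The conversation itself (by _id if known, else by its participants).
    var convo = db.conversations.findOne({
        participants: {$all: [myId, friendId]},
        active: true
    });

    // 2. One query fetches every participant's name, picture, etc.
    var usersById = {};
    db.users.find({_id: {$in: convo.participants}}).forEach(function (u) {
        usersById[u._id.str] = u;
    });

    // 3. Emit the messages; m.from indexes into participants, so no
    //    further lookups are needed.
    convo.messages.forEach(function (m) {
        var sender = usersById[convo.participants[m.from].str];
        print(sender.username + ": " + m.msg);
    });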
[21:05:17] <Lonesoldier728> well I was thinking of fetching the messages stored on the users collection from the user array
[21:05:35] <Lonesoldier728> and then check each message in the message collection returned for which ones have the second participant's id
[21:06:01] <Lonesoldier728> the from I guess I might leave the objectId since it seems confusing to figure out if the user is 0 or 1
[21:06:22] <GothAlice> I do not understand that statement. "messages stored on the users collection from the user array" sounds dangerous—I wouldn't store pretty much anything to do with conversations or messages within the user record itself, except possibly a list of "active conversation" IDs.
[21:07:19] <Lonesoldier728> User Collection {_id: ObjectId, messages: [messageObjectIds], etc. etc.}
[21:08:22] <Lonesoldier728> to link the user to his messages no?
[21:08:25] <GothAlice> Nuuuuuuu… reverse that. Put the user's ObjectId on the message documents; looking up all messages for a user will be trivial then.
[21:08:58] <GothAlice> Were you planning on getting the user document, then db.messages.find({_id: {$in: user.messages}})? After a certain point, that'll just explode.
[21:09:00] <Lonesoldier728> oh so it is better not to have any of the messageIds on the users document
[21:09:37] <GothAlice> Conversation IDs (for the archived conversation data) is acceptable if severely limited to, for example, only the "current active conversations". (I.e. to track which conversation tabs people have open at any given point.)
[21:10:29] <GothAlice> {username: "GothAlice", active: [ObjectId('#mongodb'), ObjectId('##python-friendly'), …]} as an example from IRC, here.
[21:11:51] <GothAlice> One of the big reasons to not do what you were proposing is that the entire list of all message IDs will grow substantially over time. Documents are limited to 16 MB, but there are also limitations on queries. {$in: big_list_of_ids} will require first getting the big list (data transfer), then sending it *back* to MongoDB (more transfer).
[21:12:43] <GothAlice> date_created is already covered by _id.
[21:13:34] <Lonesoldier728> i figured it is better to query 1000 users (grab the 1 user and in his doc grab the 100 message ids) and then query the 100 messages, as opposed to 100,000 message ids for the user's id in the array of participants no?
[21:14:18] <GothAlice> The _id of the messages are indexed, and the creator ID of the messages will be indexed… this means you can answer the question "what are the IDs of every message posted by user X" with one query that need never touch the collection (only the indexes!) making it insanely fast.
[21:14:31] <GothAlice> This is not a question you should be asking, however, unless you have something really special in mind.
[21:15:46] <GothAlice> Hmm; that didn't sound right. The question "what are all the IDs" is, in general, the wrong question. Asking questions is good, asking the right questions is the hard part. ;)
[21:16:49] <GothAlice> So, my question: how exactly do you need to query your data? What are the use cases? Starting a conversation, sending a new message to a conversation, viewing an old conversation, and getting updates on current conversations?
[21:17:25] <GothAlice> Lonesoldier728: I avoid JS for server-side development like I avoid the plague. In the last two weeks I've seen more obtuse JS driver weirdness in #mongodb than I like.
[21:17:42] <GothAlice> (One ODM, for example, generates completely broken ObjectIds…)
[21:18:15] <GothAlice> (I use WebCore as a web framework and MongoEngine as my ODM of choice. Full disclosure: I'm the author of WebCore.)
[21:18:20] <GothAlice> (And contributor to MongoEngine.)
[21:19:19] <Lonesoldier728> ok so this is the way it should be queried --- query to grab all of the user's friends (which are stored in the User doc in the friends array as {fId, fpic, fname}) - clicking on a friend takes the friendId and would query all the messages between them
[21:19:52] <Lonesoldier728> between the friendId and myId, if both appear in participants grab it
[21:20:18] <Lonesoldier728> then upon adding a new message, append to the array based on finding the doc with the same two participants
[21:20:26] <Lonesoldier728> un-friending a person removes the whole doc
[21:21:16] <GothAlice> Lonesoldier728: Some terminology tips: on chat systems the list of friends is called a "roster". Also, I'd break conversations apart; like storing the message IDs in the user record, storing every message ever sent between two users in one document will not work well. So if a participant "closes" the conversation, that'd finish that record and start a new one the next time messages between those two friends are exchanged.
[21:22:32] <Lonesoldier728> Storing message ids in user record what do you mean
[21:22:54] <GothAlice> Your prior idea to have {username: "GothAlice", messages: [ObjectId(…), …]}
[21:22:56] <Lonesoldier728> what I said originally of on the user document to have an array of the message ids?
[21:24:55] <Lonesoldier728> have conversation ids?
[21:25:00] <GothAlice> However, people close windows and "leave" conversations. Conversations don't last forever. Any time a user "closes" a conversation, that conversation should be marked as closed, and if a new message is sent between those two users, a new conversation is created.
[21:26:03] <GothAlice> In terms of showing "previous chat history", this becomes quite easy. Simply findOne the "most recent" conversation between the two, and grab ($slice) the last five messages out of it. :)
[21:27:17] <GothAlice> (The size of the slice there can be user configurable; some users don't like seeing old chat history, others only want the last two messages, some want 25 messages…)
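A sketch of that history fetch; a $slice projection pulls just the tail of the embedded messages array (window sizes illustrative):

    // Most recent conversation between the pair, last five messages only.
    db.conversations.find(
        {participants: {$all: [myId, friendId]}},
        {messages: {$slice: -5}}        // negative = count from the end
    ).sort({_id: -1}).limit(1);

    // "Show more history": skip back 30 from the end, take 25.
    // projection: {messages: {$slice: [-30, 25]}}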
[21:27:42] <Lonesoldier728> well I can let them click on a button see more or when they scroll up have it grab more
[21:28:41] <GothAlice> (And if you only allow one active conversation between any two users at a time, a typical approach, you don't need to worry when adding messages about the exact ID of the conversation; simply $push the message into whichever conversation between those users is "active".)
[21:29:07] <GothAlice> Lonesoldier728: Exactly; "show more history" is likewise a simple findOne and another $slice.
[21:29:14] <Lonesoldier728> How do I determine on the server side that the conversation is active/when to close one
[21:29:53] <Lonesoldier728> I mean if the user moves off the conversation screen they can still come back to it a min later and to close it out might be too much no
[21:30:18] <GothAlice> That can become fun; if this is web-based, people can "disappear" from a conversation at any time, and you can't really detect it. (You'd have to use activity timers.) In a more controlled environment, you'd simply have your app (client) ping the server when the user closes the app/tab/etc.
[21:31:34] <GothAlice> Well, then you know when someone closes a conversation. The client app would ping your service to say, "yup, just closed conversation X". You could even notify the other user if you wanted.
[21:31:46] <Lonesoldier728> but even if a user closes a tab and a user comes back 2 min later, wouldn't it be annoying than there is a new convo
[21:32:28] <GothAlice> They've closed the tab. They've ended the convo. In that situation 99.9% of chat systems will show you a dimmed "last few messages from last chat" history, but it's a new convo.
[21:33:11] <GothAlice> If you don't give a visual indication that the historical messages are historical (i.e. like Messages on Mac or Facebook Messages on their website) the user will be none the wiser.
[21:33:21] <GothAlice> (Yes, lying to users about the underlying architecture is A-OK!)
[21:34:07] <Lonesoldier728> yeah I dont care about lying to them, just trying to figure out if 30 docs will be created on a convo that has taken place in 30 min or something
[21:34:22] <Lonesoldier728> and if that makes sense
[21:34:59] <Lonesoldier728> is it wiser to do a daily convo?
[21:35:11] <Lonesoldier728> where if it is the next day it will be a new convo
[21:35:15] <GothAlice> Daily could work… but I can generate 16MB of activity in a day…
[21:35:22] <Lonesoldier728> or if the message count
[21:35:32] <Lonesoldier728> well to take into account also the amount of messages saved
[21:35:42] <Lonesoldier728> so adding a messageCount on the document
[21:36:08] <Lonesoldier728> checking if messageCount is less than 50 then append
[21:36:12] <GothAlice> You'd create a new document each time either side of a conversation is closed. Most people I know leave their chat windows open for a long time. ;) This provides a natural separation. (People will likely want to see the history from their *last* conversation, but far more rarely decide to scroll up further to get even older convos.)
[21:36:14] <Lonesoldier728> if not then start new convo
[21:36:36] <GothAlice> In general you want to reduce the difficulty of adding messages… having to check something first adds a complete round-trip to that.
[21:37:50] <Lonesoldier728> and if closed then active of that convo id goes to false
[21:37:50] <GothAlice> Actually, if you turn that into an "upsert", new conversation documents will even be created completely automatically for you, when needed.
[21:38:27] <GothAlice> Correct; if either side closes the convo, db.conversations.update({active: true, participants: [ObjectId(…), ObjectId(…)]}, {$set: {active: false}}) — also trivial.
[21:38:52] <GothAlice> (Basically every operation in something as light-weight as a messaging app should be a single, atomic operation.)
[21:39:22] <GothAlice> (I.e. that way you can't accidentally deliver a message to an inactive conversation.)
[21:39:24] <Lonesoldier728> so every time a new convo is added then two queries, one to set the active to false, and one to start a new one
[21:39:37] <GothAlice> Very different and much simpler than that.
[21:40:04] <GothAlice> Every time you get a message, upsert. This will either a) add the message to an existing active conversation between those users, or b) automatically create a new conversation between those users.
[21:40:29] <GothAlice> Every time a user closes a chat tab / conversation, update whatever conversation they were in to be inactive.
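A sketch of that single upsert (2.6-era shell; names carried over from earlier examples, with the sender's participant index computed elsewhere): one atomic statement either appends to the active conversation or creates a fresh one.

    db.conversations.update(
        {participants: {$all: [myId, friendId]}, active: true},   // the active convo
        {
            $push: {messages: {from: senderIdx, msg: text, at: new Date()}},
            $setOnInsert: {participants: [myId, friendId]}        // only on creation
        },
        {upsert: true}
    );
    // On insert, the equality predicate active: true is copied into the new
    // document automatically; $all is an operator, hence the $setOnInsert.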
[21:40:31] <Lonesoldier728> and anytime close happens then just close
[21:46:30] <Lonesoldier728> I was thinking of doing something in the middle where after a month erasing the convos (so recent history is only a month old)
[21:47:18] <GothAlice> Auto-expunging of old data is good. Interesting fact: if you store the conversations server-side as described above, you can use a simple index on the conversations collection to have MongoDB automatically clean up old data. (TTL or "time to live" indexes.)
[21:48:46] <Lonesoldier728> if i just ensure a TTL index, can I do it on the ObjectID?
[21:49:21] <GothAlice> I'd only really bother to store the last 15 or so messages from each conversation locally… that'd greatly speed up rendering of new conversations that include a small amount of history, while keeping the full conversation archive server-side.
[21:49:52] <GothAlice> Lonesoldier728: No. My recommended approach to TTL indexes is to have an explicit "expires" field in your documents that specifies the exact date/time (plus or minus a minute) the record should expire.
[21:50:12] <GothAlice> (Alice's Law #29: Explicit is better than implicit.)
[21:51:28] <GothAlice> This way, each time a message comes in for a conversation, you can $set the expires time to be "now + 30 days" (or whatever the rule is). This will also allow that rule to be customized per-user, rather than having all conversations cleaned up at the same rate.
[21:52:33] <GothAlice> First, that's only an hour. Second, if I'm happily chatting away I'd be unhappy to encounter an error after that hour when the active conversation gets cleaned up. (Active conversations shouldn't be cleaned up this way.)
[21:52:59] <Lonesoldier728> Well was going to be 30 days
[21:53:13] <GothAlice> I.e. basing it on creation time is far less advantageous than basing it on a modification time. Even better, be explicit, and set a field ("expires") dedicated to tracking when the record will be deleted.
[21:54:22] <GothAlice> Lonesoldier728: I keep my chat history forever. Some users don't want more than a week. (Some want no history!) Having a separate "expires" field will allow you to accommodate "how long do you want your history" being a user choice.
[21:54:59] <Lonesoldier728> well the whole point is to get rid of spammers and people that have convos untouched for months
[21:55:45] <GothAlice> Lonesoldier728: Yes. So "never expire old conversations" might not be something you want to offer, but tuning between none, 24h, 7d, or 30d is reasonable.
[21:56:30] <GothAlice> You'd effectively $set the "expires" date/time at the same time you mark it as active: false.
[21:56:53] <GothAlice> (Simply using the creation or modification time as the TTL index means "active" conversations can be deleted while still in use.)
[21:57:29] <GothAlice> Actually, you could use the presence (or lack of presence) of an expiry time *as* the "active" flag.
[21:57:49] <GothAlice> expires: null — same as active: true
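Concretely, a sketch of both halves (durations illustrative): the TTL index is created once, and closing a tab just stamps an expiry onto whichever conversation was active. The TTL monitor ignores non-date values, so expires: null documents are never reaped.

    // Once: delete each document as soon as its own "expires" time passes.
    db.conversations.ensureIndex({expires: 1}, {expireAfterSeconds: 0});

    // On close: the conversation becomes inactive *and* scheduled for deletion.
    db.conversations.update(
        {participants: {$all: [myId, friendId]}, expires: null},
        {$set: {expires: new Date(Date.now() + 7 * 24 * 3600 * 1000)}}  // +7 days
    );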
[22:02:13] <GothAlice> (That's effectively what you'd run when someone closes a chat tab.)
[22:02:28] <GothAlice> (Instead of the previous $set: {active: false} example.)
[22:02:50] <GothAlice> datetime/timedelta being Python date handling classes.
[22:02:53] <Lonesoldier728> but how do I set TTL on expires?
[22:03:16] <GothAlice> I've already shown you. The time to live (after the date given by the value of "expires") is zero seconds—don't live after that date.
[22:03:48] <GothAlice> The date can then be different for absolutely every single record… and you don't need to write a cron script to do it. :)
[22:04:03] <Lonesoldier728> so setting it like this db.conversations.ensureIndex( { "expires": 1 }, { expireAfterSeconds: 0 } ) means that once expires is the same time as the current time
[22:04:08] <Lonesoldier728> then it will be effective
[22:04:16] <GothAlice> (Within a minute or so, yeah.)
[22:04:38] <GothAlice> MongoDB does "garbage collection" runs on TTL indexes once a minute.
[22:06:41] <Lonesoldier728> is datetime.utcnow()+timedelta(days=7) the formula for mongodb?
[22:06:53] <GothAlice> No, that's the magical incantation for Python.
[22:07:13] <GothAlice> You'd have to determine how to calculate the date/time using your own platform's tools.
[22:07:18] <Lonesoldier728> ah kk yeah because I never saw it
[22:08:02] <Lonesoldier728> the problem though can be with convos where one person wants 30 days vs another wanting 7 days
[22:08:15] <Lonesoldier728> so that is why letting them choose will be a bad idea no?
[22:08:32] <GothAlice> The safe (and secure! I always think about security on things like this) approach is to go with the minimum of the two participants' expiry rules.
[22:09:11] <GothAlice> So if one asks for 7 days, and another asks for 30, nuke it in 7. The one asking for 30 might be unhappy with that, but the other user asked explicitly for you to not keep data involving him longer than that.
[22:11:56] <GothAlice> One of the problems I see with chat systems is that there are already so goram many of them, and most of the chat apps are really terrible. If you're going to roll Yet Another Chat Application™, it'd be a marketing thing (let alone a technical quality and pride thing) to do a kick-ass job. Taking security rules into consideration is one of those things that goes the extra mile.
[22:12:04] <GothAlice> (And with MongoDB TTL, it's trivial to do!)
[22:12:30] <Lonesoldier728> yeah my focus is not on the chat aspect
[22:12:37] <Lonesoldier728> the chat aspect is a side social feature
[22:13:23] <Lonesoldier728> but yeah if it helps reduce future costs at the same time def worth it heh
[22:14:05] <GothAlice> (Most people who say "I need a chat system" I point squarely at XMPP. XMPP "solved" chat, and anything less is usually awful. Reference: Steam's chat is UDP messaging wrapped in TCP… reliable message delivery is an issue. At least Tabletop Simulator uses IRC for chat! ;)
[22:15:05] <Lonesoldier728> is there a way to detect a person sending millions of messages at once btw
[22:15:12] <GothAlice> That's called rate limiting.
[22:15:14] <Lonesoldier728> have you ever encountered something of that nature
[22:15:19] <GothAlice> And there's an entire sub-industry dedicated to it.
[22:15:46] <Lonesoldier728> is it something you think I need to worry about or not till the app takes off
[22:16:11] <GothAlice> You can rate limit at several points: you can program your app to not enable the "send" button more than once a second. You can program your front-end API load balancer to not accept certain messages more than X times in Y period, with a "pool size" of Z. (Nginx rate limiting.)
[22:16:33] <GothAlice> You could also do firewall rate limiting, and other things.
[22:17:14] <GothAlice> Initially I wouldn't worry. Simple firewall-level rate limits will make application requests fail in gruesome ways, but this may be acceptable, especially with a sufficiently large "pool size" in the rate limiter.
[22:17:24] <GothAlice> (It's also the easiest to set up.)
[22:18:50] <GothAlice> My SSH connections, for example, are rate limited as 2/5:5 — add two connections to the pool every five minutes, starting with and having a pool size of 5. This means I can try five connections, but then I'll have to wait five minutes… and will only get two more chances until five minutes after that, etc.
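That 2/5:5 scheme, expressed as a token-bucket sketch in JavaScript (purely illustrative; real firewalls implement this at the kernel level):

    // Start with 5 tokens; add 2 every 5 minutes, never exceeding 5.
    var bucket = {tokens: 5, max: 5, refill: 2, every: 5 * 60 * 1000, last: Date.now()};

    function allowConnection() {
        var intervals = Math.floor((Date.now() - bucket.last) / bucket.every);
        if (intervals > 0) {   // top the bucket back up for elapsed intervals
            bucket.tokens = Math.min(bucket.max, bucket.tokens + intervals * bucket.refill);
            bucket.last += intervals * bucket.every;
        }
        if (bucket.tokens > 0) { bucket.tokens--; return true; }   // admit
        return false;                                              // rate limited
    }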
[22:20:11] <Lonesoldier728> Yeah, are there any security problems I need to watch out for with message sending as well?
[22:20:26] <GothAlice> Injection attacks if you're using WebKit to render the chat interface…
[22:20:45] <Lonesoldier728> well it will just be mobile apps native (ios and android)
[22:20:50] <GothAlice> (The demo chat interface I showed you doesn't protect against anything at all… you saw how just sending an empty message screwed up the display. ;)
[22:21:01] <GothAlice> "Native" often also means HTML.
[22:25:21] <quuxman> they're both properties of records. A compound index won't solve that?
[22:25:34] <Lonesoldier728> so how would I go about that
[22:25:45] <GothAlice> quuxman: If that's the case, I'm not sure how you're formulating the query at all. Could you gist/pastebin your real query?
[22:26:07] <GothAlice> Lonesoldier728: Well, the link I gave you covers the Android side getting started with that.
[22:26:16] <Lonesoldier728> right but I mean security problems with it
[22:26:48] <GothAlice> Lonesoldier728: https://github.com/amcgregor/syntax-alpha is an example of my own (old) using WebKit to render syntax highlighting of source code.
[22:27:07] <quuxman> GothAlice: I'm using {"$where": "this.snapshot_time < this.updated"} along with a bunch of other conditions
[22:27:38] <GothAlice> Lonesoldier728: I'd Google around for "HTML injection attack", XSS, and other security related things. Escaping user input (not trusting anyone) is the starting point.
[22:27:52] <quuxman> but that's the only condition I could use to constrain the results
[22:27:55] <GothAlice> quuxman: Ah, $where is unoptimized and must be evaluated against each record in the intermediate result set. No indexes for you.
[22:28:11] <GothAlice> (It's also JavaScript, and requires spinning up V8 for the query.)
[22:28:14] <quuxman> That's what I figured, but can I reformulate this to not use $where and to use an index?
[22:28:38] <quuxman> I don't know how to compare one record property with another without using $where
[22:29:06] <GothAlice> Uum; you're comparing one record with itself.
[22:29:30] <Lonesoldier728> thanks GothAlice for all the help
[22:29:40] <GothAlice> This means you have an opportunity, in this example, of any time you update snapshot_time or updated to set a value that caches the result of that boolean expression.
[22:29:55] <GothAlice> Lonesoldier728: No worries! I hope I was actually helpful and not just confusing. ^_^
[22:30:16] <Lonesoldier728> hehe yeah you opened up a lot of things I did not take into account
[22:30:17] <GothAlice> quuxman: You could then index that pre-calculated boolean field and everything will be blazing fast again.
[22:30:21] <quuxman> Good point, then I could simply index the boolean
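A sketch of that refactor (collection and flag names invented; `doc` stands for the record being written): maintain the comparison at write time, index the flag, and the $where clause disappears.

    // Whenever either timestamp changes, recompute the cached flag.
    var now = new Date();
    db.records.update(
        {_id: doc._id},
        {$set: {updated: now, needs_snapshot: doc.snapshot_time < now}}
    );

    // One-time index; the query is then a plain, indexable match:
    db.records.ensureIndex({needs_snapshot: 1});
    db.records.find({needs_snapshot: true /* plus the other conditions */});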
[22:30:53] <GothAlice> Sometimes MongoDB requires solutions that are too obvious.
[22:31:59] <GothAlice> quuxman: Also remember that MongoDB can only make use of one index at a time: coming up with compound indexes in the right arrangement is critical to complex queries.
[22:32:26] <quuxman> simply adding this boolean will be fine. It will cut down the results to 10 or so
[22:32:29] <GothAlice> quuxman: I use Dex (https://github.com/mongolab/dex) to work out my indexes for me. :3
[22:45:13] <GothAlice> It never hurts to help. :) Enjoy the foods!
[23:23:39] <uptownhr> is it possible to create a document with duplicate field/key names?
[23:23:50] <uptownhr> i just ran into a scenario where this is happening
[23:24:15] <GothAlice> uptownhr: It's not possible to have the same key twice, but it is possible to have the value of the key be another compound type, like a list.
[23:29:24] <uptownhr> this should be stopped at the driver then
[23:34:02] <GothAlice> Python's dictionary keys are hashed, so yeah… not normally a problem in that language.
[23:36:08] <Boomtime> uptownhr: it is usually stopped at the driver
[23:36:52] <Boomtime> please note the Ruby driver does this too; what you are seeing is only possible if the defences are switched off (like by an evil ORM) or a bug permits it
[23:37:50] <GothAlice> uptownhr: If you were able to reproduce this using your ODM, I'd recommend submitting a bug with them. Such unexpected behaviour should not happen by chance.