[00:20:27] <stefandxm> GothAlice: what happened to better driver documentation for c++?
[00:20:59] <GothAlice> Quite handily that isn't one of my responsibilities. (For two reasons: first, I don't use C++ except for kernel development, and secondly I don't work for MongoDB. ;)
[00:21:26] <GothAlice> Despite appearances to the contrary on that latter point. >:P
[00:22:26] <stefandxm> which, yes, would prove you don't use the c++ driver
[00:22:37] <stefandxm> since it's something mongodb doesn't want to acknowledge they ever had lol
[00:24:32] <stefandxm> i love how "mongo calls me every 3rd day" went to "never" since i requested c++ help :) (that's all i ever said)
[00:26:01] <GothAlice> Admittedly, except for certain highly specialized data processing tasks, it's almost universally easier (in terms of ease, development time/cost, produced SLoC, testing cycle times, and development turnaround time) to use a scripting language for interaction with document-based storage systems.
[00:29:49] <stefandxm> sounds like a proper background story for a bot :)
[00:30:18] <GothAlice> (Discovered some interesting things about the host platform; the software floating-point lib in ROM required two long jumps, the first into a redirection table. It was faster to embed the needed routines in the app itself, and I still managed to squeeze it into 256 bytes. ;)
[00:31:39] <stefandxm> "software floating point lib in ROM "
[00:31:48] <GothAlice> That one was for a demo competition on PalmOS.
[00:41:37] <stefandxm> i wish all bots were like you
[00:42:05] <stefandxm> you should join us at #c++ and spread your wisdom :)
[00:42:11] <GothAlice> It was a motor unit with a button on it. Disassemble, replace button with two diode-protected relays in opposite polarities, connect to the parport. Bam, really crappy plotter control. :)
[00:59:41] <stefandxm> "well.. it was a good idea you know.. cats are interesting*
[00:59:53] <stefandxm> *and donkeys.. donkeys are the new stuff....
[00:59:56] <GothAlice> Actually, in the last two weeks there has been a substantial amount of driver discussion; mostly JS and some Python, though. A bit of Java.
[01:00:05] <stefandxm> can we make them mock @ transactions?
[01:00:12] <stefandxm> yeah .. maybe.. who cares? *"
[01:00:54] <stefandxm> i dont want transactions in mongodb :)
[01:01:57] <GothAlice> And I've learned from my kernel work, you can build almost any higher-level locking or transaction behaviour using update-if-not-modified (compare-and-swap).
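A minimal shell sketch of that compare-and-swap idiom (collection, field, and variable names are illustrative, not from the discussion): the update's query clause requires the document to be unmodified since it was read, so a zero-modified result means another writer won the race and the read/modify cycle should be retried.

    // Optimistic concurrency: apply the change only if `version` is unchanged.
    var doc = db.accounts.findOne({_id: accountId});
    var res = db.accounts.update(
        {_id: accountId, version: doc.version},   // matches only if untouched
        {$set: {balance: doc.balance - 50}, $inc: {version: 1}}
    );
    if (res.nModified === 0) {
        // Lost the race: someone else modified the document first. Re-read, retry.
    }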
[01:03:09] <stefandxm> if you come to sweden or norway i'd be happy to show you our data system
[01:08:08] <GothAlice> stefandxm: I'll add that to the tasks to perform in that country. Next time vacation comes up, I'll see if Sweden or Norway ping. :)
[01:08:51] <stefandxm> we have mongodb world in oslo soon i think :)
[01:12:25] <stefandxm> i just.. i had my own imap server
[01:12:31] <GothAlice> … and now I have server racks in a spare bedroom and cooling problems. ;)
[01:12:39] <stefandxm> and removing emails seemed a bit.. yesterday?
[01:14:54] <GothAlice> Well, mine also combines with a legally provable/verifiable cryptographic audit trail to detect tampering with the data. ¬_¬ Same with photos from my DSLR, too, amusingly.
[01:15:48] <stefandxm> are you questioning my imap servers security?
[01:17:40] <GothAlice> 99% of mail services out there are weak in comparison to crypto trail systems; it's a safe bet I can send you e-mail as Bob Dole or Bill Gates. ;P
[01:18:43] <GothAlice> (I used to give presentations on information security… the first five minutes of the presentation was designed to scare the crap out of the audience. I'd demonstrate hijacking the audience's social network accounts if any were so bold as to check during the presentation. ;)
[01:19:10] <GothAlice> But usually the first step was getting a volunteer, and sending them mail from famous e-mail addresses.
[01:19:50] <stefandxm> i gave you my email in privchat
[01:22:45] <GothAlice> stefandxm: Did you get the (really badly formatted) test mail I sent from Bill Gates?
[01:23:14] <GothAlice> stefandxm: http://cl.ly/image/2J282l223C3i — Doing this should not be possible. (The message should be instantly rejected as being from an unauthorized EHLO host.)
[01:23:33] <kba> Nowadays, most providers (like Gmail) will tell you that the server address doesn't fit the email
[01:29:43] <GothAlice> stefandxm: No, because it accepts a) clearly invalid messages, and b) accepts messages from hosts (for e-mail addresses) it should not.
[01:29:44] <stefandxm> can i paste the entire mail?
[02:02:36] <stefandxm> a bit of a warning kba and GothAlice are quite good fellahs. dont trust anyone of them without a vs b etc. they are quite sneaky :) if you want any more info email me! stefan@skogome.net
[02:02:56] <kba> what on earth are you talking about?
[02:03:31] <kba> yeah, I just had a seemingly friendly chat with him about shellshock, until he became incredibly defensive
[02:03:43] <kba> suggested I wasn't very old and ended up saying he'd put me on ignore
[02:03:44] <GothAlice> kba: Apparently expiring screenshots that involve e-mail addresses that are public knowledge (whois record stuff) warrant that kind of response.
[02:03:55] <stefandxm> feel free to paste any chats
[02:04:08] <kba> GothAlice: from a guy like that, I understand why he'd want to hide as much of his identity as possible
[02:04:14] <stefandxm> but for good manners; cc them to me :)
[02:16:11] <GothAlice> cheeser: I was attempting to help diagnose his mail server setup. I was attempting to explain why the various deficiencies I found were important. Then, bam, irate gibberish.
[02:16:39] <GothAlice> cheeser: My contribution has been a total of one insanely badly formatted message from Bill Gates. ;)
[02:16:45] <stefandxm> by helping we got new domains to block for spam.. i guess thats a good one? :)
[02:16:59] <stefandxm> GothAlice: that is bullshit.
[02:17:26] <Boomtime> you guys know this is #mongodb right? are you in the right channel?
[02:54:07] <user123321> In replication, when the master DB goes down, and until a slave is promoted to a master, what happens to the DB client requests during this transition period?
[02:56:39] <cheeser> probably rejected unless you have slaveOK set
[02:58:03] <Boomtime> user123321: note slaveOk only permits reads/queries; no writes can occur without a primary
[02:59:19] <user123321> I see. Is there a way to make sure that no data gets lost?
[02:59:41] <user123321> Or at least, I don't mind if the writes get delayed a little, like a very few ms.
[02:59:42] <joannac> have write concern of "majority"?
[03:02:15] <GothAlice> Majority is the safest (esp. if you also ask for journal commit), but also slowest.
[03:03:00] <GothAlice> user123321: http://docs.mongodb.org/manual/core/write-concern/ — For more information.
[03:07:07] <GothAlice> user123321: http://www.nonfunctionalarchitect.com/2014/06/mongodb-write-concern-performance/ — has many good charts, and he makes certain to point out these are mostly relative comparisons, different hardware will act differently.
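To make the trade-off concrete, a quick sketch in 2.6-era shell syntax (collection name invented): each write can carry its own write concern, and majority-plus-journal is the safe but slow end of the spectrum.

    // Fast but less safe: acknowledged by the primary alone.
    db.events.insert({msg: "hello"}, {writeConcern: {w: 1}});

    // Safe but slower: a majority of the replica set must acknowledge,
    // and the primary must commit the write to its journal first.
    db.events.insert({msg: "hello"}, {writeConcern: {w: "majority", j: true}});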
[03:07:28] <stefandxm> GothAlice: you owe me a beer.
[03:27:41] <user123321> if one goes down, at least the other has the written data.
[03:29:02] <GothAlice> user123321: You can't exactly have two primaries, but you can have a write concern that requires the data to be written to the primary and at least one secondary to be considered "written".
[03:29:21] <GothAlice> user123321: For details: http://docs.mongodb.org/manual/core/replica-set-write-concern/ — you can specify how many you want explicitly, or simply say "majority" (useful in larger sets).
[03:30:05] <user123321> GothAlice, Does it mean I have to change the application code as well?
[03:30:35] <GothAlice> user123321: To specify the write concern, yes. You can do this at the connection level if you wish, though. (Depending on driver.)
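A sketch of the replica-set variant in the shell (values illustrative): w: 2 means the primary plus at least one secondary, and a wtimeout bounds how long the client waits for acknowledgement (the write itself is not undone on timeout).

    // Require the primary plus at least one secondary before acknowledging,
    // but stop waiting after 5 seconds if the secondaries are lagging.
    db.orders.insert(
        {item: "widget", qty: 1},
        {writeConcern: {w: 2, wtimeout: 5000}}
    );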
[07:46:47] <joel_tux> hello, pulling my hair out here. I’m trying to fetch a deeply nested hash on a document in a collection with mongoose, and it just keeps abbreviating the nested parts to [Object]. Time to switch to a more basic driver like mongoskin, or what?
[09:20:13] <Hypfer> I want to upsert params in some but not all subobjects
[09:20:23] <Hypfer> but since they are in an array it doesn't quite work
[09:20:31] <Hypfer> I cannot change the data schema
[10:29:24] <ncls> hi all, I don't get it: I'm running mongoimport on a simple csv file with a header line like "lastname;firstname;age" and two lines like "foo;bar;26", but each line is recorded as: "lastname;firstname;age" : "foo;bar;26"
[10:29:48] <ncls> here is my command : mongoimport --db mongoimports --collection test --type csv --headerline --file test.csv
[10:29:58] <ncls> I'm on Windows 7, and I'm running Mongodb 2.6.5
[10:32:46] <ncls> oh okay ... it really needs a comma "," and not a ";"
[10:33:05] <ncls> but it's weird because I think I had the same issue with a tsv file ...
[12:03:02] <drp> hi, does anyone know if there is source code available for mongotop?
[12:17:46] <fatgeekuk> Folks, I have a question. (sorry to just jump into the channel and fire away): the mongodb documentation uses an object to define indexes and states that the order of the keys is significant; however, JS documentation states that objects do not maintain the order of their properties. what's goin on?
[12:58:36] <someotherdev> I have some specific questions about some MongoDB optimisations. Does anyone have a minute to spare?
[13:00:32] <joannac> you'd be better off just asking
[13:02:25] <someotherdev> So, I have a document collection blogs {id, author, section} which I need to query. After that, I need to populate the author and section, which I do via sub-queries. I then need to populate the number of votes and comments (which I am storing in another collection). This is taking 700ms to query and I was looking at ways to reduce this. Any ideas?
[13:03:02] <someotherdev> Using mongoose populate to do the sub-queries
[13:03:37] <someotherdev> I paginate, pulling in 20 records at a time
[13:03:47] <someotherdev> the blog collection has a few thousand documents
[13:05:04] <joannac> stop distributing your data across multiple collections?
[13:07:47] <someotherdev> well, blogs is the wrong word. It's articles, not sure why I used blogs. Anyway, given the traffic amount it's likely to rise by a thousand articles per month
[13:07:59] <someotherdev> So keeping it in the same collection would be a bad idea.
[13:15:07] <joannac> what the user is editing is not going to be statistics
[13:16:05] <joannac> and if they are (i.e. resetting their read count), then their edit should win anyway, and it doesn't matter that intermediate edits are lost
[13:18:11] <someotherdev> You have a good point. Well, I inherited this data structure - which is already live. Not sure migrating everything would be productive
[13:18:32] <someotherdev> and things such as comments, would you also store them in the same document?
[13:24:28] <someotherdev> okay, that's what I am basically trying to do. It just so happens that everything is fragmented e.g. author, section and records (by far the biggest data)
[13:26:48] <joannac> anything that needs to be seen together, should be in the same document if possible
[13:27:01] <joannac> are you going to hit the 16mb limit?
[13:27:08] <someotherdev> anything unique to the document, surely?
[13:27:14] <someotherdev> Let me check the current stats
[13:29:30] <someotherdev> okay, so the highest viewed article in one month has 5MB of stats. We are expecting lots of growth in the coming months due to funding etc
[13:30:04] <someotherdev> to be fair, it's logging so much stuff it doesn't need. Can probably reduce that by more than half
[13:30:12] <cheeser> what we did at my last gig was to store things separately (what constituted each bit is a different discussion) then on each request, we'd check couchbase for the cached version of the final page. if it wasn't there, we'd hit mongodb and assemble all the parts.
[13:30:23] <cheeser> then we'd just serve out of couchbase until the page expired.
[13:30:45] <cheeser> if something changed on that page (new comment, e.g.) we'd invalidate that cache entry and rebuild it on the next request.
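A rough JavaScript sketch of that cache-aside pattern; the plain object here merely stands in for Couchbase, and renderPage plus all collection names are invented for illustration.

    var cache = {};   // stand-in for Couchbase

    function getPage(articleId) {
        if (cache[articleId]) return cache[articleId];          // cached? serve it
        var article = db.articles.findOne({_id: articleId});    // otherwise assemble…
        var comments = db.comments.find({article: articleId}).toArray();
        var page = renderPage(article, comments);               // hypothetical renderer
        cache[articleId] = page;                                // …and remember it
        return page;
    }

    function onNewComment(articleId) {
        delete cache[articleId];   // invalidate; next request rebuilds the page
    }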
[13:32:28] <someotherdev> That's an awesome idea. I haven't included caching yet as it's my last resort. I am planning to do it however, I just want to optimise the queries first.
[13:33:27] <someotherdev> is there a particular feature of mongo that may assist this? e.g. map reduce
[13:35:26] <ncls> in an aggregation pipeline, in a $group operation, how can I perform the $concat operator for each field of my objects, without naming them one by one
[13:35:31] <someotherdev> Thanks, still new to Mongo so trying to figure out what's the best deal. Though I think for the stats I may serve this via the article document as suggested. Just need to be careful with the 16MB limit. However, 16MB should be enough
[13:40:30] <fatgeekuk> why not make your viewer stats someone else's problem? use custom dimensions in GA and use the GA API? unless of course you are displaying user read stats on the normal page views (not just reporting pages)
[13:40:56] <someotherdev> the users can see their stats
[13:41:04] <someotherdev> they have a page dedicated to the stats
[13:41:05] <fatgeekuk> in that case, ignore me. :-)
[13:41:37] <fatgeekuk> oh?! right, well if it is not part of the normal pages, but a stats/reporting page, why not use GA to make the stats information somebody else's problem?
[13:42:11] <someotherdev> well, it's in all pages to some degree e.g. number of votes and if you voted etc. But you can see a breakdown on the stats page
[13:44:12] <someotherdev> however, if we had 16MB of stats per story I would be over the moon. As a matter of fact, I am happy to revisit the problem then if that's the case.
[14:41:06] <grkblood13> how do i query for a range including the boundaries? for example, i want to include value1 and value2 in the search results of field: { $gt: value1, $lt: value2 }
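The inclusive counterparts are $gte and $lte, so the query being asked for would look like this (field and value names as in the question):

    // Inclusive on both ends: matches value1 and value2 themselves too.
    db.things.find({field: {$gte: value1, $lte: value2}});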
[15:44:11] <plamer> hello, need some help and can't find anything on the net all day :/ Tried to install Genghis but when i select the server i get: Database command 'dbStats' failed: (errmsg: 'exception: expected to be write locked for monitoring'; code: '16105'; ok: '0.0').
[15:44:53] <plamer> the same "expected to be write locked for monitoring" shows in the logs
[15:45:05] <plamer> any idea what is this and how to fix it?
[15:45:36] <plamer> never worked with mongo before and now my boss wants me to help a guy with that :/
[15:46:49] <plamer> the whole thing from the logs: assertion 16105 expected to be write locked for monitoring ns:monitoring.notifications query:{ $query: { notification_id: "401", date_created: /2014-11-14/ }, $orderby: { date_created: 1 } }
[16:28:29] <s2013> can i just do a dump of a specific collection
[16:28:34] <Dewsworld> Could you comment on my question on stackoverflow http://stackoverflow.com/questions/26934073/mongodb-update-guarantee-using-w-0
[16:50:49] <s2013> anyone here has worked with elasticsearch? i followed all the instructions (i had it working before) but it can't seem to import my mongodb collections into my ES instance
[20:20:17] <GothAlice> (Bundle the nick in the record to avoid an extra lookup.)
[20:20:38] <GothAlice> The timestamp of the message would be provided by the _id ObjectId's timestamp.
[20:21:18] <GothAlice> You could add a "read" boolean flag which can be atomically updated when the message gets seen, too.
[20:23:14] <GothAlice> Lonesoldier728: Then for the "push" aspect of messaging, these "messages" could be written to a collection, but also sent to a capped collection where other processes are waiting, listening.
[20:23:50] <GothAlice> Lonesoldier728: https://gist.github.com/amcgregor/4207375 are some presentation slides (with link to code and the rest of the presentation at the bottom) describing using MongoDB as a messaging system for RPC.
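A minimal sketch of that push mechanism in the legacy mongo shell (collection name and size invented): writers insert into a capped collection, and each listener holds a tailable cursor open against it.

    // One-time setup: a fixed-size collection that preserves insertion order.
    db.createCollection("msg_queue", {capped: true, size: 16 * 1024 * 1024});

    // Listener: tailable + awaitData keeps the cursor open and blocks briefly
    // for new documents instead of reporting exhaustion immediately.
    var cursor = db.msg_queue.find()
        .addOption(DBQuery.Option.tailable)
        .addOption(DBQuery.Option.awaitData);
    while (!cursor.isExhausted()) {
        if (cursor.hasNext()) {
            printjson(cursor.next());   // handle the "pushed" message
        }
    }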
[20:26:00] <Lonesoldier728> hi GothAlice http://pastebin.com/yBr8yLXk
[20:27:13] <GothAlice> But consider how you'll be using the data: most messaging apps, well, display the messages. To display a message in your structure you'll need to look up the user (to get their name, profile picture, etc., etc.)
[20:27:52] <GothAlice> If the display only ever shows a conversation between two users (no group chat), then this is OK. You'll do the lookup for the "other user"'s data once. But in a group chat, things get more complicated.
[20:27:57] <Lonesoldier728> I will have all the info already once the person clicks on a person then i will pull the messages and
[20:28:27] <Lonesoldier728> that match both users' ids
[20:28:33] <Lonesoldier728> kk perfect yeah no group
[20:28:40] <Lonesoldier728> so no sharding necessary right?
[20:28:58] <GothAlice> Lonesoldier728: You may also want a full-text index on the message, if you want to be able to search by keywords and stuff.
[20:29:20] <GothAlice> Sharding is about scaling your data… when you have lots and lots of data (more than fits into the RAM of one machine) then you'll want to add it.
[20:30:08] <GothAlice> (But this will depend on use case. If you have lots of historical data that is rarely accessed, the RAM thing can be ignored. You'll have to benchmark and see how much time you spend on disk I/O.)
[20:30:43] <GothAlice> But you'll *always* want your indexes to fit in RAM, otherwise madness will ensue.
[20:32:52] <GothAlice> (That's where, when you get to the point of needing to scale, having "archived" data separate from the main, "live data" collections can be useful.)
[20:34:49] <GothAlice> Finally, another approach (if you have "transactional" conversations, meaning one with a definitive start and end to a session) is to store both sides of the conversation in one record:
[20:41:25] <GothAlice> For your problem domain, though, a combination of *both* strategies could be quite useful. The first approach for the "push" messages (capped collection), the second approach for the "archived" messages (subsequent recall). Getting a whole conversation for one page is then just a fetchOne. :D
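A sketch of what one such conversation-per-document record could look like (all field names and placeholder values are illustrative):

    {
        _id: ObjectId("…"),
        participants: [ObjectId("…user A…"), ObjectId("…user B…")],
        active: true,            // flips to false when either side closes it
        messages: [
            {from: ObjectId("…user A…"), msg: "hey", at: ISODate("2014-11-14T20:30:00Z")},
            {from: ObjectId("…user B…"), msg: "hello!", at: ISODate("2014-11-14T20:30:07Z")}
        ]
    }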
[20:49:22] <Lonesoldier728> yeah that actually sounds like a better idea no?
[20:49:39] <Lonesoldier728> an array of the texts for one convo
[20:54:45] <GothAlice> Lonesoldier728: It has implications, which is why I use that for archival.
[20:55:23] <GothAlice> Lonesoldier728: Notably, if the document has data added to it beyond a certain amount (default padding), it will have to be moved on-disk to perform the $push. This will add unexpected semi-random slowdown to those operations.
[20:55:56] <Lonesoldier728> http://pastebin.com/d7Uiwwvzso like that
[20:56:10] <GothAlice> That paste has been deleted.
[20:56:52] <Lonesoldier728> oh i added the so at the end lol
[20:57:45] <Lonesoldier728> so that kind of setup should be fine?
[20:58:13] <Lonesoldier728> and when storing does it automatically store in descending order (what is most recent to be pulled out first)
[20:58:35] <GothAlice> Hmm; splitting the creator/other at the top level means two indexes and doubling of the queries if you want to look up by either party to the conversation. (Do you need to track the initiator of the conversation?)
[20:58:48] <GothAlice> $push adds to the end, luckily you can $slice with a -1 to get the last from the list.
[20:59:07] <GothAlice> (In theory you could $push anywhere in the list… but append-only is a simple and safe default approach.)
[20:59:27] <Lonesoldier728> and what do you mean, if I have the from I know which person in the convo sent it
[20:59:31] <GothAlice> Using an aggregate query, yes, you could re-sort the nested data MongoDB-side.
[20:59:52] <GothAlice> "creator" and "other" implies a relationship; "creator" started the conversation with "other".
[20:59:57] <Lonesoldier728> or should I put the from as an int of 1 and 2, where the creator is 1 and the other is 2, so then I can match it based on that
[21:00:08] <Lonesoldier728> well it can just be user1 and user2
[21:01:07] <GothAlice> Note: "participants" — you can index on this (rapidly search for conversations involving one of the two parties), and, to save space in the message list, you could store the "from" in my example as an integer index into that participants list. (0 or 1 instead of 1 or 2, as you gave.)
[21:01:32] <GothAlice> You can also rapidly search for all conversations between two specific parties.
[21:01:37] <Lonesoldier728> is it better to make that an array of participants?
[21:01:37] <GothAlice> Same problem; you're splitting the data.
[21:02:02] <GothAlice> How do you search for all conversations involving "Bob"? You'd have to effectively ask twice, {$or: [{user1: "bob"}, {user2: "bob"}]}, and that's kinda yucky.
[21:02:27] <GothAlice> {participants: "Bob"} would get the answer in a much more elegant way.
[21:04:19] <GothAlice> The process to display a conversation would be: fetchOne on the conversation by ID (preferably), load up data about the participants (users = db.users.find({_id: {$in: conversation.participants}})), then loop over the messages emitting HTML. (As an example.)
[21:05:04] <GothAlice> Then, after rendering the initial conversation history, wait for push notifications over a MongoDB capped collection in order to live add new messages. :)
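Stitched together, a sketch of that render path (variable and collection names assumed from the discussion; "from" here uses the integer-index-into-participants idea from above):

    // 1. The conversation itself (by _id if known, else by its participants).
    var convo = db.conversations.findOne({
        participants: {$all: [myId, friendId]},
        active: true
    });

    // 2. One query fetches every participant's name, picture, etc.
    var usersById = {};
    db.users.find({_id: {$in: convo.participants}}).forEach(function (u) {
        usersById[u._id.str] = u;
    });

    // 3. Emit the messages; m.from indexes into participants, so no
    //    further lookups are needed.
    convo.messages.forEach(function (m) {
        var sender = usersById[convo.participants[m.from].str];
        print(sender.username + ": " + m.msg);
    });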
[21:05:17] <Lonesoldier728> well I was thinking of fetching the messages stored on the users collection from the user array
[21:05:35] <Lonesoldier728> and then check each message in the message collection returned for which ones have the second participant's id
[21:06:01] <Lonesoldier728> the from I guess I might leave the objectId since it seems confusing to figure out if the user is 0 or 1
[21:06:22] <GothAlice> I do not understand that statement. "messages stored on the users collection from the user array" sounds dangerous—I wouldn't store pretty much anything to do with conversations or messages within the user record itself, except possibly a list of "active conversation" IDs.
[21:07:19] <Lonesoldier728> User Collection {_id: ObjectId, messages: [messageObjectIds], etc. etc.}
[21:08:22] <Lonesoldier728> to link the user to his messages no?
[21:08:25] <GothAlice> Nuuuuuuu… reverse that. Put the user's ObjectId on the message documents; looking up all messages for a user will be trivial then.
[21:08:58] <GothAlice> Were you planning on getting the user document, then db.messages.find({_id: {$in: user.messages}})? After a certain point, that'll just explode.
[21:09:00] <Lonesoldier728> oh so it is better not to have any of the messageIds on the users document
[21:09:37] <GothAlice> Conversation IDs (for the archived conversation data) is acceptable if severely limited to, for example, only the "current active conversations". (I.e. to track which conversation tabs people have open at any given point.)
[21:10:29] <GothAlice> {username: "GothAlice", active: [ObjectId('#mongodb'), ObjectId('##python-friendly'), …]} as an example from IRC, here.
[21:11:51] <GothAlice> One of the big reasons to not do what you were proposing is that the entire list of all message IDs will grow substantially over time. Documents are limited to 16 MB, but there are also limitations on queries. {$in: big_list_of_ids} will require first getting the big list (data transfer), then sending it *back* to MongoDB (more transfer).
[21:12:43] <GothAlice> date_created is already covered by _id.
[21:13:34] <Lonesoldier728> i figured it is better to query 1000 users (grab the 1 user and in his doc grab the 100 message ids) and then query the 100 messages, as opposed to 100,000 message ids for the user's id in the array of participants no?
[21:14:18] <GothAlice> The _id of the messages are indexed, and the creator ID of the messages will be indexed… this means you can answer the question "what are the IDs of every message posted by user X" with one query that need never touch the collection (only the indexes!) making it insanely fast.
[21:14:31] <GothAlice> This is not a question you should be asking, however, unless you have something really special in mind.
[21:15:46] <GothAlice> Hmm; that didn't sound right. The question "what are all the IDs" is, in general, the wrong question. Asking questions is good, asking the right questions is the hard part. ;)
[21:16:49] <GothAlice> So, my question: how exactly do you need to query your data? What are the use cases? Starting a conversation, sending a new message to a conversation, viewing an old conversation, and getting updates on current conversations?
[21:17:25] <GothAlice> Lonesoldier728: I avoid JS for server-side development like I avoid the plague. In the last two weeks I've seen more obtuse JS driver weirdness in #mongodb than I like.
[21:17:42] <GothAlice> (One ODM, for example, generates completely broken ObjectIds…)
[21:18:15] <GothAlice> (I use WebCore as a web framework and MongoEngine as my ODM of choice. Full disclosure: I'm the author of WebCore.)
[21:18:20] <GothAlice> (And contributor to MongoEngine.)
[21:19:19] <Lonesoldier728> ok so this is the way it should be queried --- query to grab all of the user's friends (which are stored in the User doc in the friends array as {fId, fpic, fname}) - clicking on a friend takes the friendId and would query all the messages between them
[21:19:52] <Lonesoldier728> between the friendId and myId, if both appear in participants grab it
[21:20:18] <Lonesoldier728> then upon adding a new message, append to the array based on finding the doc with the same two participants
[21:20:26] <Lonesoldier728> un-friending a person removes the whole doc
[21:21:16] <GothAlice> Lonesoldier728: Some terminology tips: on chat systems the list of friends is called a "roster". Also, I'd break conversations apart; like storing the message IDs in the user record, storing every message ever sent between two users in one document will not work well. So if a participant "closes" the conversation, that'd finish that record and start a new one the next time messages between those two friends are exchanged.
[21:22:32] <Lonesoldier728> Storing message ids in user record what do you mean
[21:22:54] <GothAlice> Your prior idea to have {username: "GothAlice", messages: [ObjectId(…), …]}
[21:22:56] <Lonesoldier728> what I said originally of on the user document to have an array of the message ids?
[21:24:55] <Lonesoldier728> have conversation ids?
[21:25:00] <GothAlice> However, people close windows and "leave" conversations. Conversations don't last forever. Any time a user "closes" a conversation, that conversation should be marked as closed, and if a new message is sent between those two users, a new conversation is created.
[21:26:03] <GothAlice> In terms of showing "previous chat history", this becomes quite easy. Simply findOne the "most recent" conversation between the two, and grab ($slice) the last five messages out of it. :)
[21:27:17] <GothAlice> (The size of the slice there can be user configurable; some users don't like seeing old chat history, others only want the last two messages, some want 25 messages…)
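A sketch of that history fetch; a $slice projection pulls just the tail of the embedded messages array (window sizes illustrative):

    // Most recent conversation between the pair, last five messages only.
    db.conversations.find(
        {participants: {$all: [myId, friendId]}},
        {messages: {$slice: -5}}        // negative = count from the end
    ).sort({_id: -1}).limit(1);

    // "Show more history": skip back 30 from the end, take 25.
    // projection: {messages: {$slice: [-30, 25]}}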
[21:27:42] <Lonesoldier728> well I can let them click on a button see more or when they scroll up have it grab more
[21:28:41] <GothAlice> (And if you only allow one active conversation between any two users at a time, a typical approach, you don't need to worry when adding messages about the exact ID of the conversation; simply $push the message into whichever conversation between those users is "active".)
[21:29:07] <GothAlice> Lonesoldier728: Exactly; "show more history" is likewise a simple findOne and another $slice.
[21:29:14] <Lonesoldier728> How do I determine on the server side that the conversation is active/when to close one
[21:29:53] <Lonesoldier728> I mean if the user moves off the conversation screen they can still come back to it a min later and to close it out might be too much no
[21:30:18] <GothAlice> That can become fun; if this is web-based, people can "disappear" from a conversation at any time, and you can't really detect it. (You'd have to use activity timers.) In a more controlled environment, you'd simply have your app (client) ping the server when the user closes the app/tab/etc.
[21:31:34] <GothAlice> Well, then you know when someone closes a conversation. The client app would ping your service to say, "yup, just closed conversation X". You could even notify the other user if you wanted.
[21:31:46] <Lonesoldier728> but even if a user closes a tab and a user comes back 2 min later, wouldn't it be annoying than there is a new convo
[21:32:28] <GothAlice> They've closed the tab. They've ended the convo. In that situation 99.9% of chat systems will show you a dimmed "last few messages from last chat" history, but it's a new convo.
[21:33:11] <GothAlice> If you don't give a visual indication that the historical messages are historical (i.e. like Messages on Mac or Facebook Messages on their website) the user will be none the wiser.
[21:33:21] <GothAlice> (Yes, lying to users about the underlying architecture is A-OK!)
[21:34:07] <Lonesoldier728> yeah I dont care about lying to them, just trying to figure out if 30 docs will be created on a convo that has taken place in 30 min or something
[21:34:22] <Lonesoldier728> and if that makes sense
[21:34:59] <Lonesoldier728> is it wiser to do a daily convo?
[21:35:11] <Lonesoldier728> where if it is the next day it will be a new convo
[21:35:15] <GothAlice> Daily could work… but I can generate 16MB of activity in a day…
[21:35:22] <Lonesoldier728> or if the message count
[21:35:32] <Lonesoldier728> well to take into account also the amount of messages saved
[21:35:42] <Lonesoldier728> so adding a messageCount on the document
[21:36:08] <Lonesoldier728> checking if messageCount is less than 50 then append
[21:36:12] <GothAlice> You'd create a new document each time either side of a conversation is closed. Most people I know leave their chat windows open for a long time. ;) This provides a natural separation. (People will likely want to see the history from their *last* conversation, but far more rarely decide to scroll up further to get even older convos.)
[21:36:14] <Lonesoldier728> if not then start new convo
[21:36:36] <GothAlice> In general you want to reduce the difficulty of adding messages… having to check something first adds a complete round-trip to that.
[21:37:50] <Lonesoldier728> and if closed then active of that convo id goes to false
[21:37:50] <GothAlice> Actually, if you turn that into an "upsert", new conversation documents will even be created completely automatically for you, when needed.
[21:38:27] <GothAlice> Correct; if either side closes the convo, db.conversations.update({active: true, participants: [ObjectId(…), ObjectId(…)]}, {$set: {active: false}}) — also trivial.
[21:38:52] <GothAlice> (Basically every operation in something as light-weight as a messaging app should be a single, atomic operation.)
[21:39:22] <GothAlice> (I.e. that way you can't accidentally deliver a message to an inactive conversation.)
[21:39:24] <Lonesoldier728> so every time a new convo is added then two queries, one to set the active to false, and one to start a new one
[21:39:37] <GothAlice> Very different and much simpler than that.
[21:40:04] <GothAlice> Every time you get a message, upsert. This will either a) add the message to an existing active conversation between those users, or b) automatically create a new conversation between those users.
[21:40:29] <GothAlice> Every time a user closes a chat tab / conversation, update whatever conversation they were in to be inactive.
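A sketch of that single upsert (2.6-era shell; names carried over from earlier examples, with the sender's participant index computed elsewhere): one atomic statement either appends to the active conversation or creates a fresh one.

    db.conversations.update(
        {participants: {$all: [myId, friendId]}, active: true},   // the active convo
        {
            $push: {messages: {from: senderIdx, msg: text, at: new Date()}},
            $setOnInsert: {participants: [myId, friendId]}        // only on creation
        },
        {upsert: true}
    );
    // On insert, the equality predicate active: true is copied into the new
    // document automatically; $all is an operator, hence the $setOnInsert.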
[21:40:31] <Lonesoldier728> and anytime close happens then just close
[21:46:30] <Lonesoldier728> I was thinking of doing something in the middle where after a month erasing the convos (so recent history is only a month old)
[21:47:18] <GothAlice> Auto-expunging of old data is good. Interesting fact: if you store the conversations server-side as described above, you can use a simple index on the conversations collection to have MongoDB automatically clean up old data. (TTL or "time to live" indexes.)
[21:48:46] <Lonesoldier728> if i just ensure a TTL index, can I do it on the ObjectID?
[21:49:21] <GothAlice> I'd only really bother to store the last 15 or so messages from each conversation locally… that'd greatly speed up rendering of new conversations that include a small amount of history, while keeping the full conversation archive server-side.
[21:49:52] <GothAlice> Lonesoldier728: No. My recommended approach to TTL indexes is to have an explicit "expires" field in your documents that specifies the exact date/time (plus or minus a minute) the record should expire.
[21:50:12] <GothAlice> (Alice's Law #29: Explicit is better than implicit.)
[21:51:28] <GothAlice> This way, each time a message comes in for a conversation, you can $set the expires time to be "now + 30 days" (or whatever the rule is). This will also allow that rule to be customized per-user, rather than having all conversations cleaned up at the same rate.
[21:52:33] <GothAlice> First, that's only an hour. Second, if I'm happily chatting away I'd be unhappy to encounter an error after that hour when the active conversation gets cleaned up. (Active conversations shouldn't be cleaned up this way.)
[21:52:59] <Lonesoldier728> Well was going to be 30 days
[21:53:13] <GothAlice> I.e. basing it on creation time is far less advantageous than basing it on a modification time. Even better, be explicit, and set a field ("expires") dedicated to tracking when the record will be deleted.
[21:54:22] <GothAlice> Lonesoldier728: I keep my chat history forever. Some users don't want more than a week. (Some want no history!) Having a separate "expires" field will allow you to accommodate "how long do you want your history" being a user choice.
[21:54:59] <Lonesoldier728> well the whole point is to get rid of spammers and people that have convos untouched for months
[21:55:45] <GothAlice> Lonesoldier728: Yes. So "never expire old conversations" might not be something you want to offer, but tuning between none, 24h, 7d, or 30d is reasonable.
[21:56:30] <GothAlice> You'd effectively $set the "expires" date/time at the same time you mark it as active: false.
[21:56:53] <GothAlice> (Simply using the creation or modification time as the TTL index means "active" conversations can be deleted while still in use.)
[21:57:29] <GothAlice> Actually, you could use the presence (or lack of presence) of an expiry time *as* the "active" flag.
[21:57:49] <GothAlice> expires: null — same as active: true
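Concretely, a sketch of both halves (durations illustrative): the TTL index is created once, and closing a tab just stamps an expiry onto whichever conversation was active. The TTL monitor ignores non-date values, so expires: null documents are never reaped.

    // Once: delete each document as soon as its own "expires" time passes.
    db.conversations.ensureIndex({expires: 1}, {expireAfterSeconds: 0});

    // On close: the conversation becomes inactive *and* scheduled for deletion.
    db.conversations.update(
        {participants: {$all: [myId, friendId]}, expires: null},
        {$set: {expires: new Date(Date.now() + 7 * 24 * 3600 * 1000)}}  // +7 days
    );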
[22:02:13] <GothAlice> (That's effectively what you'd run when someone closes a chat tab.)
[22:02:28] <GothAlice> (Instead of the previous $set: {active: false} example.)
[22:02:50] <GothAlice> datetime/timedelta being Python date handling classes.
[22:02:53] <Lonesoldier728> but how do I set TTL on expires?
[22:03:16] <GothAlice> I've already shown you. The time to live (after the date given by the value of "expires") is zero seconds—don't live after that date.
[22:03:48] <GothAlice> The date can then be different for absolutely every single record… and you don't need to write a cron script to do it. :)
[22:04:03] <Lonesoldier728> so setting it like this db.conversations.ensureIndex( { "expires": 1 }, { expireAfterSeconds: 0 } ) means that once expires is the same time as the current time
[22:04:08] <Lonesoldier728> then it will be effective
[22:04:16] <GothAlice> (Within a minute or so, yeah.)
[22:04:38] <GothAlice> MongoDB does "garbage collection" runs on TTL indexes once a minute.
[22:06:41] <Lonesoldier728> is datetime.utcnow()+timedelta(days=7) the formula for mongodb?
[22:06:53] <GothAlice> No, that's the magical incantation for Python.
[22:07:13] <GothAlice> You'd have to determine how to calculate the date/time using your own platform's tools.
[22:07:18] <Lonesoldier728> ah kk yeah because I never saw it
[22:08:02] <Lonesoldier728> the problem though can be with convos where one person wants 30 days vs another wanting 7 days
[22:08:15] <Lonesoldier728> so that is why letting them choose will be a bad idea no?
[22:08:32] <GothAlice> The safe (and secure! I always think about security on things like this) approach is to go with the minimum of the two participants' expiry rules.
[22:09:11] <GothAlice> So if one asks for 7 days, and another asks for 30, nuke it in 7. The one asking for 30 might be unhappy with that, but the other user asked explicitly for you to not keep data involving him longer than that.
[22:11:56] <GothAlice> One of the problems I see with chat systems is that there are already so goram many of them, and most of the chat apps are really terrible. If you're going to roll Yet Another Chat Application™, it'd be a marketing thing (let alone a technical quality and pride thing) to do a kick-ass job. Taking security rules into consideration is one of those things that goes the extra mile.
[22:12:04] <GothAlice> (And with MongoDB TTL, it's trivial to do!)
[22:12:30] <Lonesoldier728> yeah my focus is not on the chat aspect
[22:12:37] <Lonesoldier728> the chat aspect is a side social feature
[22:13:23] <Lonesoldier728> but yeah if it helps reduce future costs at the same time def worth it heh
[22:14:05] <GothAlice> (Most people who say "I need a chat system" I point squarely at XMPP. XMPP "solved" chat, and anything less is usually awful. Reference: Steam's chat is UDP messaging wrapped in TCP… reliable message delivery is an issue. At least Tabletop Simulator uses IRC for chat! ;)
[22:15:05] <Lonesoldier728> is there a way to detect a person sending millions of messages at once btw
[22:15:12] <GothAlice> That's called rate limiting.
[22:15:14] <Lonesoldier728> have you ever encountered something of that nature
[22:15:19] <GothAlice> And there's an entire sub-industry dedicated to it.
[22:15:46] <Lonesoldier728> is it something you think I need to worry about or not till the app takes off
[22:16:11] <GothAlice> You can rate limit at several points: you can program your app to not enable the "send" button more than once a second. You can program your front-end API load balancer to not accept certain messages more than X times in Y period, with a "pool size" of Z. (Nginx rate limiting.)
[22:16:33] <GothAlice> You could also do firewall rate limiting, and other things.
[22:17:14] <GothAlice> Initially I wouldn't worry. Simple firewall-level rate limits will make application requests fail in gruesome ways, but this may be acceptable, especially with a sufficiently large "pool size" in the rate limiter.
[22:17:24] <GothAlice> (It's also the easiest to set up.)
[22:18:50] <GothAlice> My SSH connections, for example, are rate limited as 2/5:5 — add two connections to the pool every five minutes, starting with and having a pool size of 5. This means I can try five connections, but then I'll have to wait five minutes… and will only get two more chances until five minutes after that, etc.
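That 2/5:5 scheme, expressed as a token-bucket sketch in JavaScript (purely illustrative; real firewalls implement this at the kernel level):

    // Start with 5 tokens; add 2 every 5 minutes, never exceeding 5.
    var bucket = {tokens: 5, max: 5, refill: 2, every: 5 * 60 * 1000, last: Date.now()};

    function allowConnection() {
        var intervals = Math.floor((Date.now() - bucket.last) / bucket.every);
        if (intervals > 0) {   // top the bucket back up for elapsed intervals
            bucket.tokens = Math.min(bucket.max, bucket.tokens + intervals * bucket.refill);
            bucket.last += intervals * bucket.every;
        }
        if (bucket.tokens > 0) { bucket.tokens--; return true; }   // admit
        return false;                                              // rate limited
    }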
[22:20:11] <Lonesoldier728> Yeah, are there any security problems I need to watch out for with message sending as well?
[22:20:26] <GothAlice> Injection attacks if you're using WebKit to render the chat interface…
[22:20:45] <Lonesoldier728> well it will just be mobile apps native (ios and android)
[22:20:50] <GothAlice> (The demo chat interface I showed you doesn't protect against anything at all… you saw how just sending an empty message screwed up the display. ;)
[22:21:01] <GothAlice> "Native" often also means HTML.
[22:25:21] <quuxman> they're both properties of records. A compound index won't solve that?
[22:25:34] <Lonesoldier728> so how would I go about that
[22:25:45] <GothAlice> quuxman: If that's the case, I'm not sure how you're formulating the query at all. Could you gist/pastebin your real query?
[22:26:07] <GothAlice> Lonesoldier728: Well, the link I gave you covers the Android side getting started with that.
[22:26:16] <Lonesoldier728> right but I mean security problems with it
[22:26:48] <GothAlice> Lonesoldier728: https://github.com/amcgregor/syntax-alpha is an example of my own (old) using WebKit to render syntax highlighting of source code.
[22:27:07] <quuxman> GothAlice: I'm using {"$where": "this.snapshot_time < this.updated"} along with a bunch of other conditions
[22:27:38] <GothAlice> Lonesoldier728: I'd Google around for "HTML injection attack", XSS, and other security related things. Escaping user input (not trusting anyone) is the starting point.
[22:27:52] <quuxman> but that's the only condition I could use to constrain the results
[22:27:55] <GothAlice> quuxman: Ah, $where is unoptimized and must be evaluated against each record in the intermediate result set. No indexes for you.
[22:28:11] <GothAlice> (It's also JavaScript, and requires spinning up V8 for the query.)
[22:28:14] <quuxman> That's what I figured, but can I reformulate this to not use $where and to use an index?
[22:28:38] <quuxman> I don't know how to compare one record property with another without using $where
[22:29:06] <GothAlice> Uum; you're comparing one record with itself.
[22:29:30] <Lonesoldier728> thanks GothAlice for all the help
[22:29:40] <GothAlice> This means you have an opportunity, in this example, of any time you update snapshot_time or updated to set a value that caches the result of that boolean expression.
[22:29:55] <GothAlice> Lonesoldier728: No worries! I hope I was actually helpful and not just confusing. ^_^
[22:30:16] <Lonesoldier728> hehe yeah you opened up a lot of things I did not take into account
[22:30:17] <GothAlice> quuxman: You could then index that pre-calculated boolean field and everything will be blazing fast again.
[22:30:21] <quuxman> Good point, then I could simply index the boolean
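A sketch of that refactor (collection and flag names invented; `doc` stands for the record being written): maintain the comparison at write time, index the flag, and the $where clause disappears.

    // Whenever either timestamp changes, recompute the cached flag.
    var now = new Date();
    db.records.update(
        {_id: doc._id},
        {$set: {updated: now, needs_snapshot: doc.snapshot_time < now}}
    );

    // One-time index; the query is then a plain, indexable match:
    db.records.ensureIndex({needs_snapshot: 1});
    db.records.find({needs_snapshot: true /* plus the other conditions */});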
[22:30:53] <GothAlice> Sometimes MongoDB requires solutions that are too obvious.
[22:31:59] <GothAlice> quuxman: Also remember that MongoDB can only make use of one index at a time: coming up with compound indexes in the right arrangement is critical to complex queries.
[22:32:26] <quuxman> simply adding this boolean will be fine. It will cut down the results to 10 or so
[22:32:29] <GothAlice> quuxman: I use Dex (https://github.com/mongolab/dex) to work out my indexes for me. :3
[22:45:13] <GothAlice> It never hurts to help. :) Enjoy the foods!
[23:23:39] <uptownhr> is it possible to create a document with duplicate field/key names?
[23:23:50] <uptownhr> i just ran into a scenario where this is happening
[23:24:15] <GothAlice> uptownhr: It's not possible to have the same key twice, but it is possible to have the value of the key be another compound type, like a list.
[23:29:24] <uptownhr> this should be stopped at the driver then
[23:34:02] <GothAlice> Python's dictionary keys are hashed, so yeah… not normally a problem in that language.
[23:36:08] <Boomtime> uptownhr: it is usually stopped at the driver
[23:36:52] <Boomtime> please note the Ruby driver does this too; what you are seeing is only possible if the defences are switched off (like by an evil ORM) or a bug permits it
[23:37:50] <GothAlice> uptownhr: If you were able to reproduce this using your ODM, I'd recommend submitting a bug with them. Such unexpected behaviour should not happen by chance.