#mongodb logs for Wednesday the 25th of February, 2015

[00:00:49] <ra21vi> and will I be benefited with sharded mongo environment for geo queries, or will it just slow a little bit
[00:19:37] <morenoh149> ra21vi: sounds perfect for mongo
[00:19:59] <morenoh149> we have 2dsphere indexes and mongo can be replicated, which increases read speed
[00:20:16] <morenoh149> replication doesn't increase write speed
[00:20:55] <ra21vi> oh nice ... i thought it would slow down. I was choosing postgres gis, but really want to stick with mongo since I do not need complex geospatial queries
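For reference, the 2dsphere setup morenoh149 mentions is only a couple of shell commands. A minimal sketch, with the `places` collection, field name, and coordinates made up:

    // index a GeoJSON point field
    db.places.createIndex({ loc: "2dsphere" })

    // nearest-first search within ~5 km of a point (distance in metres)
    db.places.find({
      loc: {
        $near: {
          $geometry: { type: "Point", coordinates: [ -73.57, 45.50 ] },
          $maxDistance: 5000
        }
      }
    })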
[00:21:26] <ra21vi> morenoh149: what if I shard the data, will that still increase read speed?
[00:22:46] <bros> geospatial?
[00:23:43] <morenoh149> read http://docs.mongodb.org/manual/core/sharding-introduction/
[00:24:11] <morenoh149> I'm not sure but sharding could increase read speed for certain operations. But it's heavily dependent on your read patterns
[00:24:25] <morenoh149> it could make certain reads slower due to the routing overhead
[00:25:03] <morenoh149> sharding is usually a necessity, not a performance consideration. You shard when your data is too large to fit on a single machine
[00:27:33] <ra21vi> ok.
[00:28:05] <morenoh149> I would rather do more replication in your case and forego the sharding if the data fits on a single machine
[00:28:58] <ra21vi> yes that seems possible.. my data wont be much.. right now in postgres its ~8GB.. in mongo, at most it will be around 12GB.. that I can have on single machine
[00:30:57] <bros> morenoh149, when do you usually shard?
[00:31:20] <morenoh149> bros: when the data is too big to fit in a machine
[00:31:35] <bros> morenoh149, storage wise? CPU wise? memory wise?
[00:31:48] <morenoh149> https://stackoverflow.com/questions/17810499/when-to-start-mongodb-sharding
[00:31:49] <cheeser> CPU doesn't really figure in
[00:31:51] <morenoh149> storage
[00:33:04] <joannac> um, what?
[00:33:15] <joannac> storage is not what I would pick
[00:34:22] <Boomtime> yeah, storage doesn't really matter except insofar as it affects working set
[00:34:25] <morenoh149> storage or ram rather
[00:34:37] <Boomtime> right, impact on ram usage
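For completeness, enabling sharding when it does become necessary is done per database and per collection from a mongos. A rough sketch, with the database, collection, and shard key all hypothetical:

    sh.enableSharding("mydb")
    use mydb
    db.events.createIndex({ userId: 1 })               // the shard key needs an index
    sh.shardCollection("mydb.events", { userId: 1 })
    sh.status()                                        // check chunk distribution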
[00:36:04] <morenoh149> for reference https://mongolab.com/plans/pricing/
[00:38:01] <morenoh149> I have an 8gb box at home. I'd rather run that for some of these prices. But there's no redundancy, replication, sharding or failover
[00:41:51] <menn0> can anyone help me out with a write performance question?
[00:42:39] <Boomtime> morenoh149: you could do cheaper by hiring an instance on AWS (or wherever) and managing it yourself
[00:43:12] <Boomtime> menn0: ask your question; please use gist/pastebin for examples
[00:43:37] <menn0> i'm adding a new aspect to our application that's going to be quite write heavy (logs)
[00:44:03] <menn0> i've done some initial benchmarking and there's a significant impact on the rest of the application (expected)
[00:44:16] <morenoh149> Boomtime: have you tried azure yet? I'm looking at it more and more recently
[00:44:17] <menn0> would moving the logs to a separate db be useful?
[00:44:31] <menn0> or even moving them to a separate mongod instance?
[00:45:01] <morenoh149> menn0: sure
[00:45:18] <menn0> both are feasible options within our architecture but i'm wondering how much that would help
[00:45:34] <menn0> morenoh149: which? separate db or separate mongod?
[00:46:04] <morenoh149> I'd say separate box. I may be wrong.
[00:46:23] <morenoh149> the issue is the increased io operations on the current db
[00:46:45] <menn0> morenoh149: separate box is less feasible :) i'm just wondering about getting the most out of a single machine
[00:47:42] <menn0> morenoh149: sure. i'm wondering if there's a win to be had (in terms of locking etc) by using a separate db/mongod instance and if so, which option might be the best to pursue.
[00:48:00] <morenoh149> ah that's tougher. What's the effect of running two mongo processes on a single machine? it's probably worse due to the overhead of having two processes instead of one.
[00:48:38] <menn0> morenoh149: i'm going to benchmark this anyway but am trying to save myself some time by asking here first
[00:49:42] <menn0> morenoh149: that makes some sense. i'll start with the separate db approach then and see what that buys me.
[00:49:57] <menn0> easier to set up that test too
[00:50:07] <menn0> morenoh149: thanks
[00:50:50] <morenoh149> I think it really depends on the io blocking. If the logging and app queries are constantly fighting over the lock then two processes on a single machine may help. Really will depend on what the OS does.
[00:51:32] <morenoh149> that being said it'd definitely help if you could just do the logging on a small hosted mongodb instance somewhere else
[00:54:31] <morenoh149> it's a really interesting question. I'd love for others to chime in. Or for you to share your results.
[00:55:26] <menn0> morenoh149: i can certainly share the high level results
[00:56:02] <menn0> morenoh149: logging to an external service isn't really an option for this product
[00:56:52] <morenoh149> menn0: the client really wants all the computation done on their hardware?
[00:57:43] <menn0> morenoh149: it's more about the content of the logs. they need to be kept private.
[00:59:01] <morenoh149> how private? the data in something like loggly or mongohq is yours. Unless your passwords are compromised
[01:00:44] <morenoh149> also what are you logging? It sounds like application level events. Because mongodb has facilities for managing its own logs.
[01:01:26] <menn0> morenoh149: application level logs.
[01:02:02] <menn0> morenoh149: some clients just won't like the logs leaving their control
[01:06:54] <bros> Where would you store 10k+ documents of 20kb+ base64-encoded data? In files, then store the locations in Mongo?
[01:10:05] <Boomtime> bros: what is the upper limit in size of each document?
[01:10:14] <bros> <32kb
[01:10:20] <bros> base64 encoded shipping label PDFs
[01:11:25] <Boomtime> so just stick them in mongodb
[01:11:26] <bloudermilk> Hi, all. I'm currently architecting a solution for the somewhat complex problem of recursively archiving data in MongoDB using NodeJS. I'm soliciting feedback on my architecture if anyone is interested in giving me their two cents. Thanks :) http://stackoverflow.com/q/28709172/29297
[01:11:26] <Boomtime> you don't even need to bother with base64 encoding them, mongodb can store bindata directly
[01:12:08] <bros> cool, thanks.
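A sketch of what Boomtime is suggesting, using the Node.js driver to store the raw PDF bytes as BSON binary data rather than base64 text (collection and field names are made up):

    var mongodb = require('mongodb');
    var fs = require('fs');

    mongodb.MongoClient.connect('mongodb://localhost/shipping', function (err, db) {
      if (err) throw err;
      var pdfBytes = fs.readFileSync('label-1234.pdf');          // raw Buffer, no base64 step
      db.collection('labels').insert(
        { orderId: 1234, label: new mongodb.Binary(pdfBytes) },
        function (err) {
          if (err) throw err;
          db.close();
        }
      );
    });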
[01:15:29] <Boomtime> bloudermilk: your question is too broad, you don't provide any metrics for how many books/chapters/pages you are expecting, you need to set some expectations on data load before you can architect anything
[01:15:29] <Boomtime> "100GB of data" is really not sufficient btw
[01:15:29] <Boomtime> also, this statement: "The data model was designed in a way very similar to what you would expect to see in a relational database" <- if you approach mongodb with a relational mindset you are going to fail hard
[01:15:29] <bloudermilk> @Boomtime thanks for the feedback. Most collections contain many millions of records. The reason I didn't specify is that the solution must be generic enough to work for several places in our model graph, which is ~20 models and would have detracted from the question. Well aware of the fact that the way the data is modeled is not suited to MongoDB–before my time, sadly
[01:19:31] <Boomtime> well, generic questions elicit generic answers - a relational model in a document store will be painful forever, especially given that you have a natural embedded document structure unnecessarily spread over multiple collections
[01:19:32] <bloudermilk> Boomtime: generic answers is all I'm looking for, but changing the data model at this point is out of scope. If you have any feedback on the problem at hand I'm all ears
[01:31:41] <Boomtime> bloudermilk: you should look into the aggregation framework, it can probably short-circuit some of your work
[03:33:50] <menn0> morenoh149: some initial findings. logging to a separate DB (not separate mongod instance) makes a massive improvement for our app.
[03:34:35] <cheeser> probably because of the write locks. 3.0 and WiredTiger should fix that.
[03:44:58] <menn0> cheeser: using a separate DB for logging is no problem for this application so this is already a big win for us.
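The "separate DB, same mongod" approach menn0 benchmarked is just a matter of addressing a second database name; a shell sketch (database and collection names are made up):

    // pre-3.0 MMAPv1 locks per database, so log writes in their own database
    // stop competing with application writes for the same lock
    var appDb = db.getSiblingDB("app");
    var logDb = db.getSiblingDB("app_logs");

    logDb.events.insert({ ts: new Date(), level: "info", msg: "something happened" });
    appDb.users.findOne();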
[08:09:36] <rawl79> hello
[08:35:47] <rawl79> hey guys
[08:35:51] <rawl79> got a question here.
[08:36:23] <rawl79> The repair (repairDatabase) command, can you specify a different path for storing temporary data?
[09:53:05] <Cygn> Hey everyone, i am new to mongo and need a hint in the right direction. I have a table "dataChunk" which contains data like the following: http://pastie.org/9980768, i need to get the "original_value" of the last entry (unit_identifier=10) when items with the same dataId and different unit_identifiers have certain values (f.e. with the unit_identifier 7 has a certain date). How would you solve this in general?
[10:06:04] <jaitaiwan> Cygn, not entirely clear what you're trying to achieve.
[10:07:07] <jaitaiwan> If I understand it correctly, you just want to sort by descending on the unit_identifier field?
[10:07:54] <bishwa> hi guys
[10:09:08] <Cygn> jaitaiwan: there are 10 entities in the example, all connected via the attribute dataId, i need to fetch the value of the original_value attribute, where the unit_identifier attribute is 10, but only when the (via the dataId) connected entity with the unit_identifier 6 has a certain value (f.e. a date older than today).
[10:10:35] <jaitaiwan> Cygn: that's a VERY relational request... not something mongodb would be well suited to without a large performance hit on possibly the application side.
[10:10:42] <jaitaiwan> hi bishwa
[10:11:16] <bishwa> what is the difference between running:
[10:11:32] <bishwa> mongo --eval "some command"
[10:11:37] <bishwa> and db.eval()
[10:11:48] <bishwa> do both of these take the write lock?
[10:14:05] <jaitaiwan> bishwa: I would think that they would both write lock and in fact probably do the same thing. One is the command line equivalent of the other.
[10:14:49] <Cygn> jaitaiwan: i could also implement the data in another way, since i am fetching it from another (relational) database. Goal was to get more speed by moving the data to mongodb. How would you structure the data to get the same information using a mongo request… having e.g. the data i have there right now in one row? (unit_identifier would be the column, original_value the value)
[10:16:35] <jaitaiwan> Cygn: I'd probably combine the data into the one document if that makes sense
[10:20:33] <jaitaiwan> Cygn: if I had a more functional understanding of the data, I could better inform its structure
[10:22:42] <Cygn> jaitaiwan: this is a sale of a product. Contains the market (first entry), the product number, (second entry), the name (third entry) etc. - you would now maybe like to ask why the hell i did it that way in the first place - there will be more data, thousands of entries with different attributes. For example some will have a price, others not (like this one) and i wanted to have a way to handle any amount of additional attributes.
[10:23:47] <Cygn> jaitaiwan: unit_identifier, identifies the type of attribute (1 for market, 2 for product number and so on), original value holds the value and dataId connects the sales attributes to each other… at least that was the plan ;)
[10:23:55] <jaitaiwan> Cygn: in mongodb, the structure isn't set, one document can have a field where another doesn't. You should have a products collection, with one document per product.
[10:24:23] <jaitaiwan> As far as the attribute identifier you could simply use the number as the index
[10:27:16] <Cygn> jaitaiwan: oh my god - this is so sexy i wanna have kids with it. Thank you i will try it out NOW.
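A sketch of the per-product document jaitaiwan is describing, with the attribute rows collapsed into ordinary fields (names are guesses based on the description above):

    // one document per sale; attributes that don't apply are simply left out
    db.sales.insert({
      market: "DE",
      productNumber: "A-1002",
      name: "Some product",
      price: 19.99        // optional -- other documents can omit it entirely
    })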
[10:28:15] <sabrehagen> hypothetical: i have an object o with three keys a, b, and c that i want to store in mongo. what happens if somebody decides to post data to my api route that saves the object to mongodb with a huge object full of many other keys? surely it gets saved. the only way i can see of stopping this is imposing a schema on the data, which is opposed to the schemaless paradigm
[10:30:06] <Derick> you should be validating input anyway?
[10:30:25] <sabrehagen> but that validation implies a schema
[10:30:38] <jaitaiwan> sabrehagen: Depends on the language you're using. MongoDB stores binary (BSON) objects, so it depends on the language and whether you're specifically turning the input into a typed object.
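One common way to act on Derick's point without a rigid schema is to whitelist the keys a route accepts before saving; a rough Node.js-style sketch (the helper and field names are made up):

    // keep only the fields this API route is supposed to accept
    function pickAllowed(input, allowed) {
      var out = {};
      allowed.forEach(function (key) {
        if (input.hasOwnProperty(key)) { out[key] = input[key]; }
      });
      return out;
    }

    var doc = pickAllowed(req.body, ['a', 'b', 'c']);   // extra keys are silently dropped
    collection.insert(doc, function (err) { /* handle err */ });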
[10:41:44] <Cygn> jaitaiwan: MongoDB just increased our execution Time from about 6 Minutes to 20 Seconds. - Thank you very much and keep on the good work !
[10:42:21] <Derick> Cygn: that's "decreased" :-)
[10:42:40] <Cygn> Derick: Right… that is the adrenaline :D
[10:46:28] <jaitaiwan> Cygn: curious to see what your structure is now :)
[10:50:37] <Cygn> jaitaiwan: i am just preparing a giant insert statement, i will give you a look afterwards
[10:51:53] <jaitaiwan> Awesome :)
[11:52:59] <Cygn> jaitaiwan: This is an example of the Result by now, in the future there will be datasets with more or less attributes :) http://pastie.org/9981002
[11:54:05] <jaitaiwan> Awesome. Another suggestion. Use ISODate's to store dates in the db :)
[11:57:51] <Cygn> jaitaiwan: Would be the next step, right now i am thrilled to see my first full request working, i am just adjusting the backend. Fine-Tuning Afterwards.
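In shell terms that suggestion just means writing ISODate values instead of strings, e.g. (document and field names made up):

    db.sales.update(
      { productNumber: "A-1002" },                             // hypothetical document
      { $set: { saleDate: ISODate("2015-02-25T00:00:00Z") } }
    )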
[11:58:16] <jaitaiwan> Awesome :D What's your backend built on Cygn
[11:58:17] <jaitaiwan> ?
[11:59:37] <Cygn> jaitaiwan: Laravel, you know it?
[11:59:43] <Cygn> A smooth php framework
[12:10:03] <Cygn> jaitaiwan: actually, there is a nice out of the box package for mongodb and laravel
[12:10:32] <jaitaiwan> Cygn: cool cool :)
[12:10:46] <jaitaiwan> I use #phalconphp haha
[12:20:00] <jaitaiwan> Cygn: I don't know how much you value speed but I know that using phalcon has allowed me to scale at a much lower cost
[12:49:49] <StephenLynx> what is phalcon?
[13:06:20] <iksik> hm, how can I iterate over a document object's fields? Object.keys doesn't seem to work
[13:12:54] <StephenLynx> afaik, you have to do it on your application code with the object you get from the query.
[13:16:23] <iksik> StephenLynx: no way to hack it somehow? i'm just playing around with data inside of robomongo
[13:16:55] <StephenLynx> that IS the hack.
[13:16:56] <StephenLynx> :v
[13:17:09] <iksik> uhm
[13:18:11] <jaitaiwan> StephenLynx: was it you and GothAlice we were discussing "best languages" about? Phalcon is a php "framework" which is a php c extension. Means it helps reduce a lot of the speed issues that most php frameworks face.
[13:18:27] <StephenLynx> ugh
[13:18:54] <StephenLynx> we were discussing, but I was bashing python and she was defending it.
[13:20:48] <jaitaiwan> StephenLynx: haha yeup.
[13:21:00] <iksik> ok, it seems, for in - 'works'
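i.e. a sketch of the for...in approach iksik landed on, assuming a hypothetical collection:

    var doc = db.someCollection.findOne();
    for (var key in doc) {
        print(key + " = " + tojson(doc[key]));
    }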
[13:41:56] <Cygn> jaitaiwan: thx for the phalcon hint !
[13:42:13] <jaitaiwan> no worries Cygn :D
[13:49:55] <StephenLynx> Cygn any particular reason you chose PHP?
[14:52:54] <jaitaiwan> Oh dear StephenLynx: converting the masses? haha.
[14:53:12] <StephenLynx> trying my best :v
[14:54:53] <cheeser> Someday we'll eradicate php from the world...
[14:56:06] <StephenLynx> If we couldn't eradicate the plague, I don't think we will be able to do it with PHP.
[14:56:21] <GothAlice> (Writing my C10K HTTP/1.1 server, I tried borrowing a C extension from a Ruby HTTP server for header parsing. Turns out the prototype 12 lines of pure Python I wrote were about 60x more performant than the C extension I tried to borrow. And all it was doing was upper-casing the keys, replacing hyphens with underscores, and prefixing the keys with "HTTP_". ¬_¬)
[14:56:31] <cheeser> Some battles are worth fighting regardless of the odds.
[14:57:11] <GothAlice> (And unwrapping the headers, if wrapped. Yay line length limits in protocols!)
[14:57:52] <StephenLynx> GothAlice it could be for a number of reasons. I heard that V8 has an overhead if you switch between JIT-compiled and native code.
[14:58:24] <GothAlice> A bit of a non-issue in CPython and CRuby.
[14:58:39] <GothAlice> (Both lacking JIT.)
[14:59:01] <StephenLynx> I don't know why you had this experience in particular, but I know that C is not slower in general than these two languages.
[15:00:22] <GothAlice> https://github.com/marrow/server.http/blob/develop/marrow/server/http/protocol.py#L139-L151 vs. https://github.com/marrow/server.http/blob/e6468ecfc01808ce66f90411dae836b14f6f773c/marrow/server/http/parser/http11_parser.c
[15:04:16] <StephenLynx> your point being?
[15:04:44] <GothAlice> The earlier comment about the PHP framework having a C extension. C extension just means part of it is written in C, not that that inherently makes it better or faster. ;)
[15:05:30] <StephenLynx> well, it is still running PHP, you can't polish a turd anyway.
[15:06:11] <GothAlice> HippyVM (alternate PHP implementation written in RPython) does a pretty good job on the polishing.
[15:13:53] <GothAlice> Instead of eradicating bad languages from the world, I want a common runtime for them all. Then we can have parts written in whatever language people are comfortable with and everyone, regardless of language, could benefit from the good bits. Thus my being such a fan of the RPython interpreter projects. There are already projects implementing Ruby, PHP, JavaScript, Lisp, Smalltalk, Scheme, and IO, with runtime back-ends for C, .NET, JVM,
[15:13:53] <GothAlice> and JS, plus recent transactional memory support.
[15:14:34] <cheeser> the JVM and the CLR are shaping up to be just that. and I'm all for it.
[15:14:55] <cheeser> (the JVM being much more prevalent for obvious reasons)
[15:15:09] <GothAlice> Indeed. I dislike Java the language, but the JVM itself is a work of art.
[15:15:50] <cheeser> i love java. but i'm about to port a project to kotlin because "Why Not?(tm)"
[15:16:21] <GothAlice> ^_^ I've written commercial code in Brainf*ck for the same reason.
[15:17:34] <cheeser> ha!
[15:18:07] <StephenLynx> I like java, but I can't see a world without native code.
[15:18:22] <GothAlice> (Compiled to ELF using a compiler written in BF, of course. For binary datastream processing, it does a pretty reasonable job!)
[15:19:09] <StephenLynx> and a single runtime will be nothing but a huge unmaintainable clusterfuck
[15:20:22] <StephenLynx> not to mention that productivity of people working on projects that use multiple languages would decline because of the context switching.
[15:21:27] <GothAlice> I don't really see a reason why that must be the case. Compilers and runtimes are different things, and as listed above, I've already got compilers for a variety of languages that can emit bytecode (or native code) for multiple extant runtimes. When you're consuming an API typically you have objects, methods, and bare functions. These are pretty common among languages, so as long as your documentation is reasonable, there'll be no real
[15:21:27] <GothAlice> need to dive into code you aren't familiar with.
[15:24:05] <GothAlice> (At work we use Java NLP libraries from our Python code running in the JVM. It's pretty neat to be able to do this.)
[15:26:10] <StephenLynx> sounds awful.
[15:26:21] <StephenLynx> a real hack job.
[15:26:34] <cheeser> yeah. writing production code in python? come on!
[15:27:43] <StephenLynx> lol I have the impression cheese is subtly making fun of me :V
[15:29:11] <cheeser> not a bit, actually.
[15:29:27] <cheeser> i like python, mostly. though i honestly despise PHP.
[15:30:49] <GothAlice> weka = jpype.JPackage('weka'); self.dataset = weka.core.Instances(java.io.FileReader(datasetFilename)) # Using Java from Python is actually this easy. ;)
[15:34:19] <StephenLynx> still a hack.
[15:34:35] <cheeser> a "hack" is just a solution i didn't write. ;)
[15:37:18] <ianp> same here. No wonder all the code you write is a huge hack, cheeser!
[15:37:49] <ianp> ;)
[15:38:00] <cheeser> in this case, it's because i outsource my work to my 8 year old...
[15:45:23] <NoOutlet> Where is this logging available?
[15:46:02] <NoOutlet> Sorry. I'm referring to the logged conversations which -pmxbot- alerts us to upon entering this room.
[15:46:13] <cheeser> check /topic
[15:46:24] <NoOutlet> Cool, thanks!
[16:26:29] <bros> I have a collection called orders. orders has a subdocument array called log. I push timestamps into log. How can I select all orders within a certain time range?
[16:27:15] <StephenLynx> you can use dot notation on the query block for the array field.
[16:27:22] <StephenLynx> like log.timestamp
[16:49:02] <valeri_ufo> hi guys! is anyone running mongodb configs with the puppetlabs provider? is anyone aware if the provider can handle auth enabled mongo?
[16:50:58] <elm-> hi, a quick question where I could only find 2 year old info online: Is there a way to speed up draining of a shard? I've been waiting 4 hours already for a shard of 20GB to drain; it does so at a rate of maybe 2-3 Mbit/s, which seems way too low to me
[16:56:52] <bros> StephenLynx, cheers. Quick question. In MongoDB, the fields are dates. Can I simply pass another date object in log.time $gte start_date?
[16:57:21] <StephenLynx> yeah, afaik mongo can perform checks on date objects.
[16:57:39] <StephenLynx> never did it though, but I know it is possible.
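A sketch of that query, assuming each entry in `log` has a `time` date field (names follow the discussion above):

    var start = new Date("2015-02-01");
    var end   = new Date("2015-02-25");

    // matches orders where some log entry is >= start and some entry is < end
    db.orders.find({ "log.time": { $gte: start, $lt: end } })

    // to require a single log entry to satisfy both bounds, use $elemMatch
    db.orders.find({ log: { $elemMatch: { time: { $gte: start, $lt: end } } } })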
[17:37:29] <freeone3000> I have a SECONDARY whose oplog is not catching up to PRIMARY. What could be causing this?
[17:54:58] <freeone3000> oplog isn't even changing.
[18:00:18] <GothAlice> Welp, I know the next marrow project for me to NIH. A MongoDB migration framework based on my five years of migration experience. What's everyone's experience with both early rapid development cycle changes and longer-running datasets?
[18:00:38] <GothAlice> Any existing tools that are good?
[18:02:11] <StephenLynx> I see marrow a lot in your stuff, what is it?
[18:02:39] <GothAlice> I've been rolling the occasional one-liner like User.update({'oid': {'$exists': True}}, {'$unset': {'oid': True}}, upsert=True, multi=True) or Source.objects({'se': {'$type': 3}}).update(set__settings=[]) when needed. :/
[18:02:51] <GothAlice> StephenLynx: https://github.com/marrow/ < my "open source collective" namespace
[18:04:36] <GothAlice> WebCore is special, its namespace is just "web". Yes, I'm namespace-greedy. ;)
[18:14:38] <freeone3000> How can I force mongo to ignore intervening oplog entries and simply get the current state of the database to primary, and update itself accordingly? I've tried getting the data from a snapshot, but taking the snapshot took longer than my oplog delay, so pre-seeding doesn't seem to have worked? Is my only option to delete the data and try again?
[18:16:28] <GothAlice> You could increase the size of your oplog and try pre-seeding again. You may even be able to bootstrap (and only have to reconfigure this way) any other existing and up-to-date secondary, if I recall correctly.
[18:18:50] <freeone3000> How can I convert from oplog time to oplog size? Current state is 10GB == 5 hours, so for 24 hours I'd need 50GB?
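The oplog is a rolling window, so size scales roughly linearly with the hours you want to retain at a given write volume; the shell reports the window the current oplog covers:

    // run on the primary: prints the configured oplog size and the time span it holds
    rs.printReplicationInfo()

    // back-of-envelope scaling: 10 GB / 5 h * 24 h = 48 GB, so ~50 GB plus headroom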
[18:19:40] <StephenLynx> GothAlice so its just a group of collaborative projects?
[18:20:11] <GothAlice> StephenLynx: Aye. They're "backed" by two companies in Canada here; much of the recent code is exports from commercial work.
[18:20:39] <GothAlice> Effectively getting paid to do FOSS is awesome.
[18:20:43] <StephenLynx> hm
[18:21:11] <GothAlice> (Illico Hodes here in Montréal, and Top Floor Computer Systems Ltd. over in BC.)
[18:24:43] <Nepoxx> Montreal has a pretty nice CS market
[18:25:13] <GothAlice> Indeed; it's also pretty much the Python capital of Canada. So many conferences hosted here…
[18:25:32] <Nepoxx> Yeah, but I can't see that lasting :P
[18:26:51] <cheeser> went to confoo last year there.
[18:27:08] <cheeser> Feb's a terrible time to visit montreal ;)
[18:27:13] <GothAlice> It really is. XD
[18:27:22] <GothAlice> Snowing at the moment…
[18:28:27] <Nepoxx> Haha, Montreal thinking they have real winters is funny :P
[18:28:38] <GothAlice> I'm living in Montréal because of PyCon Atlanta, and the CCP after-party. ¬_¬ Presenting at ConFoo for the two or three years I did it was easier once I lived here. XP
[18:37:34] <Nepoxx> GothAlice, are you Canadian? If not, how hard was it to immigrate?
[18:37:54] <StephenLynx> I got a friend who moved to there a couple of years ago.
[18:38:06] <GothAlice> I'm Canadian-Australian, so for me, no issues.
[18:38:07] <StephenLynx> it used to be easier, but it's more strict now, from what he said.
[18:38:41] <GothAlice> Indeed; in the last 10 years they changed some of the rules. My parents needed new papers.
[18:38:55] <StephenLynx> you will need a job offer there.
[18:39:14] <StephenLynx> there used to be something called workstudy that gave you a working visa, but it was discontinued.
[18:39:43] <StephenLynx> but once you get an offer, it is pretty fast, AFAIK. 5 years for full citizenship more or less.
[18:42:16] <StephenLynx> I was planning to move there.
[18:42:21] <StephenLynx> but now I want to move to new zealand
[18:42:35] <Nepoxx> I'm actually from Montreal, so it's not an issue for me. I was wondering if it was much easier to get in there than for me to get into the US
[18:42:51] <Nepoxx> I had a job offer in Seattle, I could have got a TN easily, but not so much for my spouse
[18:42:57] <StephenLynx> migrating to the US is a kick in the balls
[18:43:05] <Nepoxx> (we're not married, and common-law is not recognized in the US)
[18:43:07] <StephenLynx> and US is pretty crappy
[18:43:15] <Nepoxx> Seattle's amazing though
[18:43:29] <StephenLynx> you are still under all federal US bullshit.
[18:43:37] <StephenLynx> I'd rather stay in brazil than move to US.
[18:44:26] <Nepoxx> Well moving in for you guys is even harder, a TN is easy, H1-b not so much
[18:45:23] <StephenLynx> no idea what these mean :v
[18:45:30] <Nepoxx> Work visas
[18:45:43] <StephenLynx> and no interest in moving to US. doesn't matter which part.
[18:46:00] <StephenLynx> and I know it is pretty hard for anyone that isn't a north american.
[18:46:13] <jimmie_fulton> We have a Brazilian in our office here in San Francisco. He plans on staying...
[18:46:50] <StephenLynx> I can understand the appeal of making loads of money, though.
[18:47:12] <Nepoxx> I'd love to work in Cali, jsut for the tech community
[18:47:30] <StephenLynx> what would be an average yearly salary for IT in the US?
[18:47:31] <Nepoxx> I'm in Quebec City right now, I absolutely love it, but there's virtually no tech community whatsoever :(
[18:48:13] <Nepoxx> StephenLynx, it really depends where and what you do. I was a Software Engineer for a "big company" in Seattle and I got 125k fresh out of college
[18:48:37] <StephenLynx> ok, 125k per year. here in brazil it would be 350k reais.
[18:48:51] <StephenLynx> let me see how long I could live with 350k
[18:48:53] <Nepoxx> I'm making less than half that now, thanks to Canadian (and Quebec) economics :P
[18:49:35] <StephenLynx> 29
[18:49:36] <StephenLynx> YEARS
[18:50:02] <StephenLynx> i could literally live three decades with what you earned in a single year.
[18:50:04] <Nepoxx> You can't really compare it that way
[18:50:09] <StephenLynx> my point is
[18:50:25] <StephenLynx> how high the number one sees on their paycheck is.
[18:50:38] <StephenLynx> and how that motivates people to move to the US.
[18:52:40] <StephenLynx> did you know the USA is practically the only country without a law giving mothers maternity leave?
[18:52:55] <StephenLynx> since we are talking about jobs and USA
[18:53:04] <ralphholzmann> my database (with journaling enabled) did not shut down cleanly, now the logs say it can't open the database file
[18:53:17] <ralphholzmann> version 2.2.4
[18:53:33] <ralphholzmann> warning database /var/lib/mongodb htt could not be opened
[18:53:34] <jimmie_fulton> Yep. 125+. There are so many startups and software companies here, I can't even count: Just off the top of my head... Google, Twitter, Splunk, Zynga, Salesforce (on every single block), GitHub (around the corner from my house), Yelp, Square, IGN, Zendesk, Yammer, DropBox, Riverbed, FitBit, Ripple Labs, MuleSoft, Clustrix...
[18:54:00] <ralphholzmann> any ideas on where to start?
[18:54:13] <ralphholzmann> http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/
[18:54:22] <ralphholzmann> ^^ seems to apply only to non-journaled databases
[18:55:33] <Nepoxx> StephenLynx, I know that. I'm not going to comment, we most likely shouldn't discuss politics in the mongodb channel :)
[18:57:10] <ralphholzmann> can I just delete the "journal" folder in the data directory
[18:57:16] <ralphholzmann> and then run with the repair option?
[19:09:55] <doxavore> have a replica set where 1 node is doing a full resync. that crashes during index build (seems it runs out of physical memory without touching swap). short of taking my cluster down to get a snapshot, is there anything else we can do?
[19:10:56] <doxavore> (i've only had bad times trying to get mongo to apply rs config changes with only 2/3 servers up)
[19:11:42] <ralphholzmann> seems like rm -rf-ing the journal folder, then doing a repair, then starting up worked
[19:25:19] <freeone3000> How can I convert oplog to a non-capped collection?
[19:26:19] <doxavore> freeone3000: i'm not sure, but maybe follow the "resize oplog" instructions, but when creating it, don't provide the capped info, just create the collection?
[19:26:54] <doxavore> afaik you can only set the capped size on collection creation, so if it exists before then...
[19:27:18] <freeone3000> Alright, thanks. And that should allow infinite-length data recovery?
[19:28:43] <doxavore> in theory, yes. but when a new node comes online missing all of its data, it will start a sync from an existing node, and only needs the oplog to not run out of space between the time it STARTS recovery and the time it finishes and gets caught up.
[19:29:09] <doxavore> any oplog that existed before the new node starts its recovery doesn't matter
[19:32:52] <freeone3000> Okay. That's not happening to my nodes - they start recovery, get into STARTUP2, and sit there for days. If I pre-seed with a backup, they get to a point five hours after they started RECOVERY, then their oplogTime doesn't increase.
[19:34:17] <freeone3000> It's at the point where my next step is to launch a new, identical cluster to get the thing into the same state.
[19:34:31] <freeone3000> *a sane state. And then restore using mongodump / mongorestore.
[19:35:11] <doxavore> freeone3000: do you see anything in the logs? something about oplog being too old, etc? (i just ran into this the other day...sigh)
[19:35:24] <freeone3000> doxavore: No, no message like that in the logs, but it just... stops.
[19:35:33] <freeone3000> doxavore: It's more than 6 hours behind and I only have a 10GB oplog, though.
[19:35:52] <freeone3000> doxavore: But that doesn't explain why a force resync (which discards current db state and tries to start fresh) doesn't go past STARTUP2, either.
[19:36:01] <doxavore> does mongod itself stop?
[19:36:23] <freeone3000> Only on one server, and I think something else there is odd.
[19:37:31] <doxavore> i have a node now that can't resync because mongod doesn't handle an index that doesn't fit fully into physical (eg not swap) memory
[19:37:42] <doxavore> nothing in the logs, mongod just dies
[19:38:23] <freeone3000> These are r3.xlarges, so I can fit a full third of my dataset into memory.
[19:39:55] <doxavore> so the server you're restoring to syncs up to the point of where it started the sync, and then doesn't do anything but sit in STARTUP2? but mongod process keeps running on that server?
[19:44:05] <freeone3000> doxavore: Yes.
[19:45:01] <doxavore> freeone3000: sorry, perhaps someone else has an idea, but the only time i've seen that it is spitting out messages in the log that it's too old to catch up :-/
[19:46:44] <fastfinger> Hi. I've been trying to use $addToSet in a findAndModify query using the node.js api. But I'm receiving an error that the prefixed field $addToSet is not valid for storage. The query works using the mongo shell client, but just not with the api.
[19:48:25] <fastfinger> Basically: db.collection('users').findAndModify({'username': 'foo'}, [], {$addToSet: {roles: "somerole"}}, {upsert: true, new: true})
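For comparison, the shell form (which fastfinger says does work) is roughly:

    db.users.findAndModify({
      query:  { username: "foo" },
      update: { $addToSet: { roles: "somerole" } },
      upsert: true,
      new:    true
    })

    // the "not valid for storage" error usually means the $addToSet document ended up
    // being treated as a plain replacement/insert document rather than as the update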
[19:54:00] <edrocks> does anyone know why my db isn't showing up when I use "show dbs"? I'm using a replica set and I have 2 dbs both with data but one doesn't show up
[19:54:48] <edrocks> o nevermind it got dumped in test somehow
[20:14:08] <MLM> In my model, would it be better to only have a `dateCompleted` Date field or also add a `isCompleted` Boolean? I feel like the `dateCompleted` Date field by itself is better so I don't have any maintenance complexity
[20:20:37] <cheeser> i'd use the date
[20:20:46] <cheeser> more information
[20:21:55] <MLM> I would be using only Date or the combo of Date and Boolean.
[20:22:07] <MLM> Can I just set the Date to `null`?
[20:22:16] <MLM> meaning "non completed"
[20:24:44] <cheeser> find the completed items: {$exist : { dateCompleted : true } }
[20:25:07] <cheeser> find the order of completions: find().sort( {dateCompleted : 1 })
[20:26:39] <cheeser> https://twitter.com/mipsytipsy/status/570674246072057857
[20:28:47] <MLM> cheeser: Oh, awesome! Haven't used `$exists` yet. http://docs.mongodb.org/manual/faq/developers/#faq-developers-query-for-nulls
[20:30:54] <MLM> I assume you can also use it the way you showed but I can't find any example in the documentation. The use-case I can find is:
[20:30:59] <MLM> db.inventory.find( { qty: { $exists: true } } )
[20:32:36] <cheeser> i might've gotten it switched around. i always have to look it up. i don't often work in the shell. ;)
[20:32:59] <MLM> Same syntax with Mongoose though (from what I have used)
[20:33:12] <MLM> Items.find({ ... })
[20:43:35] <MLM> http://mongoosejs.com/docs/api.html#query_Query-exists
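For reference, the shell versions of cheeser's two queries, with the operator nested under the field as in MLM's example (the `items` collection name is assumed):

    // completed items: dateCompleted has been set
    db.items.find({ dateCompleted: { $exists: true } })

    // not yet completed
    db.items.find({ dateCompleted: { $exists: false } })

    // order of completion, oldest first
    db.items.find({ dateCompleted: { $exists: true } }).sort({ dateCompleted: 1 })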
[20:47:11] <MacWinner> if I have 3 documents which have a "tags" attribute which is just an array of tag words, and I want to find the documents with the most number of tags matching a search, is there a simple way?
[20:47:57] <MacWinner> and if there is a matching number of tags, return all the docs.. not sure if I explained that clearly
[20:48:18] <MacWinner> basically trying to select the best ad out of database given a set of tags..
[20:49:38] <MacWinner> i currently have a combinatorial type of set search happening
[20:57:19] <freeone3000> I'm getting "dirname: missing operand" when starting mongod, and starting mongo fails, with no logs. The IGNORECASE fix in AWK is already applied, as well as the --check command to daemon. How should I try to fix this next?
[21:03:57] <morenoh149> MacWinner: you can match by an exact set of tags easily http://docs.mongodb.org/manual/reference/operator/query/#array
[21:04:20] <morenoh149> the part where you don't have a match but want the closest match will be considerably more difficult
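One way to get at the "closest match" part, if exact-set matching isn't enough, is to rank by the size of the tag intersection with the aggregation framework (2.6+); a sketch, with the collection name and search tags made up:

    var searchTags = ["shoes", "red", "sale"];

    db.ads.aggregate([
      { $match: { tags: { $in: searchTags } } },      // only ads sharing at least one tag
      { $project: {
          tags: 1,
          matched: { $size: { $setIntersection: ["$tags", searchTags] } }
      } },
      { $sort: { matched: -1 } },
      { $limit: 5 }
    ])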
[21:24:38] <freeone3000> Okay, turned out it was a bad configuration parameter, reported poorly. Startup script should probably not redirect all output to /dev/null...
[22:05:25] <wavded> if a property is sometimes an array and sometimes a string, is there a way to update only the non-array properties?
[22:05:30] <wavded> values
[22:06:00] <GothAlice> wavded: Yes, though this type of mixed-type shared-key situation is strongly discouraged within a collection.
[22:06:38] <GothAlice> Use $type in your query to select for the documents whose given named field is of a certain type. http://docs.mongodb.org/manual/reference/operator/query/type/
[22:06:52] <wavded> GothAlice: i agree, they should be all arrays, I think something happened in the ODM layer
[22:07:09] <GothAlice> I use this during migrations, if I'm trying to find records to update and move over. :)
[22:08:21] <wavded> GothAlice: thanks! I guess I've never done this either but how do you query a field for a value and a type? is there like a $value operator?
[22:08:43] <GothAlice> http://docs.mongodb.org/manual/reference/operator/query/eq/
[22:08:44] <GothAlice> :)
[22:08:54] <wavded> thank you!!
[22:09:04] <GothAlice> Thus: { <field>: { $eq: <value>, $type: N } }
[22:09:29] <GothAlice> However, no array will match a string, AFAIK.
[22:09:41] <GothAlice> So generally you won't need to query both. Comparing value should do.
[22:12:59] <wavded> looks like string also matches array
[22:13:03] <GothAlice> :/
[22:13:04] <wavded> this seems to work though
[22:13:07] <wavded> db.sales.find({ 'acquisTM.value': { $eq: 'No Consessions' }, $where: "Array.isArray(this.acquisTM.value)"}, { 'acquisTM.value': 1 })
[22:13:08] <GothAlice> Source.update({'se': {'$type': 3}}, {'$set': {'settings': [...]}}) # Common migration pattern.
[22:13:21] <GothAlice> Eeeeew, $where.
[22:13:41] <wavded> http://docs.mongodb.org/manual/reference/operator/query/type/#querying-by-data-type
[22:13:50] <wavded> donno how else to get just arrays or non-arrays
[22:14:45] <GothAlice> Will you need to issue this query often?
[22:14:56] <wavded> no, i'm just cleaning up some data
[22:15:08] <GothAlice> Okay, yeah, $where is fine for maintenance.
[22:15:18] <wavded> it is slower for sure but is what i want
[22:15:24] <GothAlice> I've never in 8 years needed it, but if it works, great! :D
[22:16:35] <codetroth> I am having trouble figuring out how to store a subdocument. Basically I want to have a top level document which has the id of a conversation, and then the messages saved in sub documents under that
[22:17:10] <fewknow> codetroth: use bucketing to avoid that
[22:17:15] <fewknow> lol
[22:18:21] <GothAlice> fewknow: A 16 MiB BSON document can store between 13 and 15 million words. If a single "conversation" reaches that limit, there are other issues. ;)
[22:19:02] <GothAlice> codetroth: This is a pretty common pattern. My own forums store threads this way. Any particular issue or question?
[22:19:30] <wavded> GothAlice: btw, thanks for the help, looks like we are all good again :)
[22:19:37] <fewknow> GothAlice: yeah i agree, but if you want to run stats or anything on the messages in the conversations it is easier with bucketing
[22:19:48] <GothAlice> wavded: Phew! Mixed types are deadly.
[22:20:10] <fewknow> possibly
[22:20:17] <GothAlice> fewknow: That's why the gods invented pre-aggregation. ;)
[22:20:35] <fewknow> agree again...as that is what I do
[22:20:39] <fewknow> but lots of people don't
[22:21:02] <codetroth> yea I think I will go back to MySQL because none of that made any sense to me...
[22:21:33] <fewknow> codetroth: nah..wait
[22:21:42] <fewknow> how are you writing to mongo
[22:22:33] <codetroth> I am working with Node.js and using Mongoose.
[22:22:39] <GothAlice> MONGOOOSSE
[22:22:44] <fewknow> well then i would give up
[22:22:48] <fewknow> jk
[22:23:31] <fewknow> codetroth: so what is the issue? is your object not serializing?
[22:23:36] <codetroth> Basically I can do the initial write of a document and the first sub document.
[22:23:43] <codetroth> But I am not sure how I would add another sub document
[22:23:47] <GothAlice> $push
[22:23:51] <codetroth> Going through the Mongoose documents they give examples
[22:23:51] <fewknow> yep
[22:24:00] <codetroth> but not how to specify what parent document the child document will attach to
[22:24:21] <fewknow> its all the same document
[22:24:24] <fewknow> no parent and child
[22:24:34] <fewknow> just need the _id and the key for the sub document
[22:25:00] <fewknow> what is the json structure in mongo that you are getting after the first write?
[22:25:27] <GothAlice> So, in plain mongo shell syntax, you'd do this: db.conversations.update({_id: ObjectId(id of conversation)}, {messages: {$push: {… message sub-document …}}})
[22:26:21] <GothAlice> Er, flip that to "{$push: messages: …"
[22:26:27] <codetroth> http://pastebin.com/vjKWk1e7
[22:26:29] <GothAlice> (I'm too used to my ODM abstraction… ¬_¬)
[22:26:33] <codetroth> That is what it looks like after the first write
[22:26:54] <fewknow> GothAlice: that is really the best way to learn mongo
[22:26:57] <GothAlice> Uhm, children isn't an array there…
[22:27:01] <fewknow> then you can apply to all damn ORM's
[22:27:19] <GothAlice> fewknow: Aye, that's why I generally use mongo shell examples on IRC. :)
[22:27:36] <fewknow> codetroth: do you want an array of the messages?
[22:27:43] <fewknow> or each message is a sub document?
[22:27:59] <codetroth> Which do you think would work best?
[22:28:01] <fewknow> one is a $push the other is a $set
[22:28:08] <codetroth> This is my first foray into this type of databsae
[22:28:13] <GothAlice> … you'll have multiple messages in a conversation, right?
[22:28:18] <codetroth> Yes
[22:28:27] <GothAlice> Then you definitely want an array of sub-documents.
[22:28:28] <fewknow> do you need to access individual messages ever?
[22:28:46] <fewknow> or just all messages for convo?
[22:28:47] <GothAlice> fewknow: In an array, there's $elemMatch, as well as the various $ operator tricks.
[22:28:52] <GothAlice> Plus slicing.
[22:28:52] <codetroth> No when I access them it will be as a group
[22:29:02] <fewknow> array is best then
[22:29:21] <fewknow> GothAlice: yeah...but you can avoid if they are all sub docs by just referencing the "key", right?
[22:29:45] <fewknow> for messages it doesn't really matter...i think in terms of stats
[22:29:55] <GothAlice> fewknow: Aye, but then you lose any possibility of indexing the data. Having sub-documents with their own ObjectIds is a good approach, and lets you individually update them, etc., etc.
[22:29:57] <fewknow> like metrics during the day down to the minute
[22:31:32] <fewknow> GothAlice: how do you lose indexing? {_id : id, messages : { message_id : {}, message_id:{} }}
[22:31:35] <codetroth> Part of my confusion has been this http://mongoosejs.com/docs/subdocs.html
[22:31:39] <GothAlice> {settings: [{key: "google-analytics", value: "GA-373640-2"}, …]} — You can index settings.key and rapidly get documents which have a Google Analytics tracking key set on them. $exists, not so much.
[22:31:39] <fewknow> can't you index messages.message_id?
[22:31:44] <codetroth> When I look at adding a sub document
[22:32:06] <GothAlice> fewknow: No, those are technically field names.
[22:32:23] <GothAlice> Requiring you to use $exists to find the document which contains a message of the given ID. Not indexable.
[22:33:02] <fewknow> GothAlice: right...they are....good call...can't index a key
[22:33:23] <fewknow> i use that structure for stats...but the only query is for the entire document....then the application does the heavy lifting
[22:33:40] <GothAlice> This distinction has bitten us at work. One of our migrations is to convert a {objectid: {…settings…}} mapping to a [{target: objectid, … settings …}, …] array.
[22:34:14] <GothAlice> Luckily only 118,000 or so records to update on that one. ^_^
[22:35:33] <GothAlice> codetroth: Oh! Say you want to know the average number of back-and-forth messages in a conversation. Pre-aggregation would help this by having each top-level conversation store a value that is the number of messages. (No need to re-count each convo each time you get the average.) When you $push (append) a new message, $inc (increment) the count by one <- baby's first pre-aggregation.
[22:36:18] <fewknow> codetroth: what are you confused about....there is a .push method off the parent.children
[22:36:19] <codetroth> Under 100, this is going to be used for simple customer service support via chat and sms
[22:36:34] <codetroth> Maybe I am thinking about this wrong
[22:36:42] <GothAlice> db.conversations.update({_id: ObjectId(id of conversation)}, {$inc: {"stats.messages": 1}, $push: {messages: {… message sub-document …}}})
[22:36:44] <codetroth> but if I have conversation A happening and Conversation B happening
[22:37:01] <codetroth> how do I attach the message to the correct conversation
[22:37:10] <codetroth> Because there will be up to 25 conversations happening at once
[22:37:45] <GothAlice> You would need a collection of "sessions" pairing some form of identity between supportee and supporter. I.e. SMS number to agent ID.
[22:37:57] <GothAlice> That session would be tied to a conversation.
[22:38:27] <fewknow> codetroth: this is the issue with using Mongoose...it tracks the object for you
[22:38:28] <codetroth> I am tying each session to the inbound number as that is what will be unique to each conversation but the same for the length of one conversation
[22:38:53] <fewknow> Model.update(query, { $set: { name: 'jason borne' }}, options, callback)
[22:38:59] <fewknow> so
[22:39:02] <GothAlice> Never a support call from an individual more than once?
[22:39:21] <fewknow> Message.update(query, {$push : { message: 'adlfkjdsfa'}})
[22:39:24] <codetroth> At the end the agent will close that conversation which will change the status to false
[22:39:25] <fewknow> where query is
[22:39:32] <codetroth> any false status will be removed
[22:39:38] <fewknow> var query = { _id: id };
[22:39:54] <fewknow> codetroth: check this out...http://mongoosejs.com/docs/2.7.x/docs/updating-documents.html
[22:39:55] <codetroth> I think I get it now
[22:39:56] <GothAlice> codetroth: Uhm, there may be legal requirements as to auditing, so "removal" may be a too-simple approach.
[22:40:06] <GothAlice> Depends on your industry and clientele.
[22:40:30] <codetroth> By removed I just mean not included
[22:40:37] <codetroth> my query is phone, status
[22:40:46] <GothAlice> ^_^ Remember to make a compound index of that.
[22:40:49] <codetroth> so I am looking for a document where the phone number matches and the status = true
[22:41:02] <fewknow> codetroth: make that the _id
[22:41:07] <fewknow> the phone number
[22:41:19] <fewknow> it is unique
[22:41:41] <fewknow> but yes compound index {_id:1, status:1}
[22:42:04] <GothAlice> It's not, though. _id must be unique, which means you can't have a number with more than one record at any point in time.
[22:42:05] <codetroth> The entire reason I am using mongo is too document it lol. Because I could just accept the incoming connection and pass to the agent via socket.io and save nothing
[22:42:29] <fewknow> k
[22:42:38] <fewknow> GothAlice: yeah you would need timestamp too , my bad
[22:42:44] <codetroth> I appreciate the help all
[22:42:47] <fewknow> phonenumber timestamp
[22:43:25] <GothAlice> I'd keep _id as an ObjectId… seriously. Then no need for timestamps, you have a unique way to reference every record, and you can have multiple expired histories without issue.
[22:43:42] <GothAlice> Also, conversation creation time is built-in to the _id to boot.
[22:44:00] <GothAlice> (Letting you do date range queries, i.e. everything older than 90 days, on _id alone.)
[22:44:39] <GothAlice> (And sort by time… on _id.)
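Pulling the thread together in plain shell syntax (collection and field names follow the discussion; the phone number is made up):

    // one document per conversation; _id stays a normal ObjectId
    db.conversations.insert({ phone: "+15551234567", status: true,
                              messages: [], stats: { messages: 0 } })

    // append a message to the open conversation for that number,
    // pre-aggregating the message count as GothAlice describes
    db.conversations.update(
      { phone: "+15551234567", status: true },
      { $push: { messages: { from: "customer", body: "hi", ts: new Date() } },
        $inc:  { "stats.messages": 1 } }
    )

    // the agent closes the conversation
    db.conversations.update(
      { phone: "+15551234567", status: true },
      { $set: { status: false } }
    )

    // compound index supporting the phone + status lookup
    db.conversations.createIndex({ phone: 1, status: 1 })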
[22:46:48] <codetroth> I think part of my problem is using mongoose. I may drop it and use mongo directly
[22:48:34] <GothAlice> Mongo's syntax is expressive and powerful, something which I found surprising when first converting over from SQL-land.
[22:49:42] <codetroth> I had found just about the same answer I got from you guys when using mongo directly
[22:49:51] <codetroth> the problem is Mongoose doesn't seem to have the best documentation
[22:50:15] <GothAlice> Mongoose has a number of deficiencies. ;)
[22:50:17] <codetroth> var Parent = mongoose.model('Parent');
[22:50:17] <codetroth> var parent = new Parent;
[22:50:17] <codetroth> // create a comment
[22:50:17] <codetroth> parent.children.push({ name: 'Liesl' });
[22:50:17] <codetroth> var subdoc = parent.children[0];
[22:50:20] <GothAlice> Ack.
[22:50:23] <GothAlice> Don't do that.
[22:50:23] <codetroth> That is how you have to do a push
[22:50:57] <codetroth> lol sorry
[22:51:33] <codetroth> I had the same issue with the twilio library for node.js. It was easier just to query the REST API directly than try to use helper libraries
[22:51:54] <fewknow> GothAlice: you can do a raw mongo command with mongoose
[22:52:27] <fewknow> http://mongoosejs.com/docs/2.7.x/docs/updating-documents.html