[04:09:03] <kngslyr> joannac, thank you. Does MongoDB support this? I could not find any reference in the mongodb documentation. Is it safe to assume that all the config parameters listed on wiredTiger website are supported by MongoDB?
[04:19:59] <joannac> kngslyr: unsure. SERVER-16609 looks relevant. otherwise open a server/docs ticket?
[04:59:45] <Freman> soooo I have a bunch of data stored as a bunch of json files on disk (52977) and it seems somewhat ideal to store this in mongo... I was originally planning to index it with sphinx but... I wonder if mongo's text searching is good enough...
[08:07:59] <Kota> i'm having an issue with sort performance that I can't quite figure out. say I sort descending on a field (which is indexed ascending) the query takes .9 seconds. Now if i do an ascending sort it takes upwards of 4 seconds. what on earth is going on there
[08:08:30] <Kota> i've tried all manner of indexing desc and asc, and even both at the same time, but still the same result
[08:09:18] <Kota> which is quite frustrating because i need the data ascending, and it blocks everything up for that 4 seconds
[08:12:53] <ZorgHCS> Kota: Have you tried using explain() to make sure the index is being used both times?
[08:16:30] <Kota> the ascending query scans almost 4 times as many objects as the descending query
[08:18:06] <Kota> ZorgHCS: it appears to use the index both times, however the ascending sort scans 1.9 million objects, whereas descending scans 500 thousand
[08:18:49] <Kota> otherwise i am unsure of what i'm looking at in explain
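(For reference, the comparison ZorgHCS suggested looks roughly like this in the 3.0 shell; the "executionStats" verbosity reports totalKeysExamined and totalDocsExamined for each plan. `query` stands in for Kota's filter.)

```js
// Run the same query with each sort direction and compare the stats:
db.chat_pms.find(query).sort({ _id: 1 }).explain("executionStats")
db.chat_pms.find(query).sort({ _id: -1 }).explain("executionStats")
```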
[08:19:21] <ZorgHCS> Are you using a limit()? It sounds like one is selecting the first 1000 it finds, and the other is having to order the entire collection to get the last 1000 it finds.
[08:19:46] <ZorgHCS> I'm still new to MongoDB myself so I'm not entirely sure why it would do that :/
[08:22:45] <ZorgHCS> yeah I love MongoDB now I've started using it, it's quite a transition though when you're used to MySQL :D
[08:23:48] <ZorgHCS> My system is still in production, I'm not looking forward to the day I have a million Documents to deal with lol
[08:24:23] <Kota> db.chat_pms.find({ $or: [ { $and: [ { from: '535082e36324a96b1f9bab74' }, { to: '535082db6324a96b1f9ba0d8' } ] }, { $and: [ { from: '535082db6324a96b1f9ba0d8' }, { to: '535082e36324a96b1f9bab74' } ] } ] }).sort({ _id: 1 }).skip(64 - 30).limit(30) is where im at currently
[08:24:52] <Kota> obviously in the app we are running a count query beforehand but for the purposes of testing i insert the value manually
[08:25:20] <Kota> *basically* this does what i need lol
[08:25:37] <Kota> i'm unsure of how to refactor that, because if i limit the opposite way i get the wrong end of the collection
[08:27:00] <ZorgHCS> can you not sort: -1 then remove the skip entirely and limit 30... that would get you the same 30 rows in opposite order without having to skip over the whole collection?
[08:27:21] <ZorgHCS> you could always reverse them in the code after mongodb has sent them
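(A sketch of ZorgHCS's suggestion, assuming the query from Kota's paste above; `filter` stands in for the $or condition. Instead of an ascending sort plus a skip, take the newest 30 in descending order and reverse them client-side.)

```js
var page = db.chat_pms.find(filter).sort({ _id: -1 }).limit(30).toArray();
page.reverse();  // same 30 documents, now in ascending _id order, no skip needed
```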
[08:35:27] <ZorgHCS> yeah I have no idea about that one :/
[08:36:57] <Kota> joannac: how do i remove a duplicate index? http://puu.sh/gM2Ze/dab42029e0.png i've got two _id_ indexes somehow and they both sort _id ascending. dropping either does nothing
[08:39:03] <Kota> ZorgHCS: importing from mysql was a dream though... 50k users and converting all of the relational type data into mongo took around 11 seconds
[08:39:22] <Kota> the previous import we did from mysql to mysql took around 20 minutes
[08:40:06] <ZorgHCS> I started a new project in MongoDB; my old project still uses MySQL. I haven't given any thought to converting it yet
[08:40:28] <Kota> this application didn't exist in the previous database, so luckily we didn't have to go through the pain of moving 5 million rows around
[10:07:26] <alexi5> i am thinking of developing a central authentication database for my web applications and thinking of using mongodb. Is it worth doing this with mongodb or would it be better to stay with a directory server (LDAP)?
[10:18:07] <d0x> Is it possible to replace a whole collection? I have a read-only collection that gets constructed by a Hadoop job. Every time the Hadoop job finishes (which is around twice a day) the whole collection should be replaced (without any downtime while shipping the data). Is that possible?
[10:19:10] <d0x> I'd like to implement the lambda architecture from the Big Data book
[11:29:38] <Freman> well, I'll be damned, I got data into mongo
[13:02:07] <jigax> hi anyone around that could help me with a simple aggregate query
[13:31:29] <d0x> Is it possible to replace a whole collection? I have a read-only collection that gets constructed by a Hadoop job. Every time the Hadoop job finishes (which is around twice a day) the whole collection should be replaced (without any downtime while shipping the data). Is that possible?
[13:31:31] <d0x> I'd like to implement the lambda architecture from the Big Data book
[13:41:33] <cheeser> your job would write its data to a new collection A'. once that's done, a follow up job would rename A to A'', rename A' to A, delete A''
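(A minimal shell sketch of the swap cheeser describes, with made-up names: `reports` is the live collection, `reports_new` is what the Hadoop job writes. `renameCollection()` also takes a `dropTarget` flag that collapses the last two steps into one; note renames don't work on sharded collections.)

```js
db.reports.renameCollection("reports_old");   // A  -> A''
db.reports_new.renameCollection("reports");   // A' -> A
db.reports_old.drop();                        // delete A''

// or, in one step:
db.reports_new.renameCollection("reports", true);  // dropTarget = true
```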
[14:07:42] <jigax> Hi, anyone around to kindly help me with this question.
[14:48:42] <jigax> the only small fix to your second group was adding "_id" to the label to show as "$_id.label"
[15:37:39] <android6011> using Object.bsonsize(db.stuff.findOne()) for me returns 534. what is the max that should return to still be considered safe for performance?
[15:38:06] <android6011> (i know that only takes the size of 1 doc into account)
[15:44:22] <GothAlice> android6011: Considering projection is a thing (allowing you to select a subset of the original document), your question doesn't quite apply. There's an upper bound of 16MB, but other than that, not much else to consider in terms of data size. As usual, it all comes down to how you query your data.
[15:45:30] <android6011> GothAlice: sorry maybe I worded that wrong, I mainly meant as far as a limit. so is it impossible to go above 16MB?
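(GothAlice's point about projection, sketched in the shell; the field names are invented.)

```js
Object.bsonsize(db.stuff.findOne())           // full stored document size, in bytes
db.stuff.findOne({}, { name: 1, status: 1 })  // projection: only these fields
                                              // (plus _id) come back over the wire
```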
[16:24:29] <StephenLynx> so, yesterday I solved an issue I had with my pinned forum threads.
[16:24:55] <StephenLynx> the issue was sorting the threads by something that was not a value but a condition: whether a certain string was present in a sub-array
[16:25:31] <StephenLynx> I solved it by pre-aggregating the condition as a number in another field, 0 or 1. So now I can sort the threads by this field after I sort them by another value.
[16:26:07] <StephenLynx> my question is: is there a way to do this without either pre-aggregating it or using application code?
[16:26:31] <StephenLynx> I tried to project the condition, but I couldn't get it to evaluate in a $cond block.
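(For context, the pre-aggregation StephenLynx describes might look like this; the collection and field names here are guesses. The flag is mirrored into a sortable numeric field at write time, so reads need no aggregation.)

```js
// Whenever a thread is (un)pinned, keep the numeric mirror in sync:
db.threads.update(
  { settings: "pinned" },
  { $set: { pinnedFlag: 1 } },
  { multi: true }
);

// Reads can then sort with a plain query:
db.threads.find().sort({ pinnedFlag: -1, lastBump: -1 });
```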
[16:41:17] <shlant> morning all! is there a way to just install mongodump/restore on debian? a mongo client package or something?
[16:49:45] <StephenLynx> shlant without installing the db itself?
[16:49:53] <StephenLynx> yes, you can install just mongo tools.
[16:51:09] <bdiu> in an aggregate, how can I get a sum total of multiple fields? something like... {$group:{_id:null, field1:{$sum:"$field1"}, field2:{$sum:"$field2"}, field3:{$sum:{$add:["$field1","$field2"]}}}}
[16:51:30] <shlant> StephenLynx: what is the package called?
[16:51:59] <StephenLynx> from http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian/ mongodb-org-tools
[16:52:36] <StephenLynx> bdiu you just do that, I guess.
[16:59:42] <Derick> mdavid613: best to contact sales :)
[16:59:58] <Derick> (or post to stackoverflow, or our google group)
[17:00:18] <mdavid613> I posted on the google group just now… I posted to MongoDB JIRA last Monday
[17:01:16] <mdavid613> the strange part is the mongodb logfile doesn't show an authenticate command when I run mongorestore, but the cluster members show authenticate messages when they heartbeat each other
[17:06:26] <pamp> Hi, in MongoDB 3.0 with WT snappy compression, is the memory usage issue the same?
[17:07:23] <pamp> does the OS keep scaling memory use until it hits the limit?
[17:44:06] <d0x> Any idea why I get the following error when I try reading a BSON file and storing it with Pig and the MongoDB Hadoop Connector on EMR? ERROR 6000: <line 8, column 0> Output Location Validation Failed for: 's3://...bson More info to follow: Output directory not set.
[17:44:40] <d0x> The Pig script just loads and stores the data, no manipulation at all. And loading works great
[17:44:51] <GothAlice> pamp: There are many open tickets regarding WiredTiger and memory usage. https://jira.mongodb.org/browse/SERVER-17421 https://jira.mongodb.org/browse/SERVER-17424 https://jira.mongodb.org/browse/SERVER-17386 https://jira.mongodb.org/browse/SERVER-17456 https://jira.mongodb.org/browse/SERVER-17542 ancillary: https://jira.mongodb.org/browse/SERVER-16311 https://jira.mongodb.org/browse/DOCS-4844
[17:45:24] <StephenLynx> GothAlice between these memory issues and count being slow and some other stuff, would you say 3.0 is not production ready yet?
[17:45:28] <d0x> StephenLynx: Then it's about time: https://pig.apache.org/ :)
[17:45:48] <GothAlice> StephenLynx: Count being slow? Mine aren't. Do you mean .skip()/offset? (Which certainly did degrade vs. mmapv1.)
[17:46:07] <cheeser> 3.0 is production ready. but there's a reason WT isn't the default engine yet.
[17:46:08] <StephenLynx> I don't know, some dude ran count on something and had very slow performance.
[17:46:19] <StephenLynx> and he ran it from the terminal.
[17:46:32] <GothAlice> StephenLynx: And no, I can't recommend production use when my own production data causes the cluster to rotate segmentation faults. (Nothing stays primary for very long in testing…)
[17:46:35] <cheeser> "some dude" isn't a great reference ;)
[17:48:14] <GothAlice> In my own forums "pinned" is a flag, and you can index/sort on booleans, so it's a non-issue. :/
[17:48:19] <epx998> How would a new secondary be detected by a primary in a different datacenter?
[17:48:35] <cheeser> epx998: you'd add it to the replica set
[17:49:09] <GothAlice> epx998: Nodes in a cluster communicate topology changes to each other. Adding it to the replica set on the primary means all the secondaries rapidly find out about it. Then after that you can use any server to find out which nodes are available.
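(In shell terms, with a placeholder hostname:)

```js
// Run on the current primary; the heartbeats mentioned above spread the
// new topology to every other member, in either datacenter.
rs.add("dc2-mongo-01.example.com:27017")
rs.status()   // any member can now report the full member list
```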
[17:49:59] <StephenLynx> Which is working fine, I don't have to make any additional queries.
[17:50:26] <StephenLynx> but if I could shave that field from my model without any costs, I would like that better.
[17:51:24] <GothAlice> StephenLynx: My "extra data" properties typically are simple mappings, i.e. {topic: "Hello", flags: {sticky: True, …}} — vs. {topic: "Hello", flags: [{name: "sticky", value: True}, …]} — I technically do both, but depending on needs (i.e. querying on it easily), I'll pick the former vs. the latter.
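(The two shapes GothAlice contrasts, as they would be queried in the shell:)

```js
// Mapping form: the flag name is a key, so querying and sorting are direct.
db.threads.find({ "flags.sticky": true })

// Array-of-pairs form: $elemMatch is needed to tie name and value together.
db.threads.find({ flags: { $elemMatch: { name: "sticky", value: true } } })
```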
[17:51:53] <epx998> can a replica set option be added to a standalone via the shell, without stopping/restarting mongod?
[17:52:15] <GothAlice> epx998: AFAIK, no. Promotion to membership in a set requires a restart due to a required configuration/command-line change.
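(A sketch of that promotion: restart mongod with a replica-set name, then initiate from the shell; `rs0` is a placeholder.)

```js
// After restarting with `mongod --replSet rs0 ...` (or the equivalent
// replication.replSetName config option):
rs.initiate()   // the former standalone becomes the set's first member
rs.conf()       // inspect the resulting configuration
```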
[17:52:47] <StephenLynx> I just use plain strings instead of objects, so I can have any number of boolean settings without any additional fields.
[17:56:24] <StephenLynx> and I couldn't figure out how to use an operator to project based on the condition of the document having the string in the array of settings.
[17:57:14] <GothAlice> You might be able to do it with http://docs.mongodb.org/manual/meta/aggregation-quick-reference/#set-expressions — normal array manipulation in an aggregate, however, requires unrolling.
[17:58:03] <GothAlice> $setIntersection between the document value and ['pinned'] would give you a boolean-ish answer. $cond or $setEquals to turn it into a straight boolean, then you can sort on it.
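(A sketch of that route, assuming the flag is the literal string "pinned" inside a `settings` array and threads are otherwise ordered by a `lastBump` field; both names are guesses. `$setIsSubset` is a terser equivalent of the `$setIntersection`/`$setEquals` pair, and `$ifNull` guards documents with no settings array.)

```js
db.threads.aggregate([
  { $project: {
      subject: 1,
      lastBump: 1,
      pinned: {
        $cond: [
          { $setIsSubset: [ ["pinned"], { $ifNull: ["$settings", []] } ] },
          1, 0
        ]
      }
  } },
  { $sort: { pinned: -1, lastBump: -1 } }   // pinned threads first, then by bump
])
```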
[17:58:28] <StephenLynx> $unwind? then it would have a severe impact on performance, wouldn't it?
[17:58:36] <GothAlice> That's why I'm not suggesting doing that. ;)
[17:58:51] <d0x> I posted the MongoDB-Hadoop Connector Question to Stackoverflow: http://stackoverflow.com/questions/29217152/error-6000-output-location-validation-failed-using-pig-mongodb-hadoop-connect
[17:58:52] <StephenLynx> besides, it would shave off any threads without any settings.
[18:00:56] <GothAlice> So, the above aggregate pipeline operators should do you. Or, you could pivot the data structure a tiny bit (a la the {flag: Bool} storage method) and not have to muck with any of this, allowing you to sort using standard find queries instead of aggregates. ;)
[18:01:29] <GothAlice> (Initially it might only make sense to store this one flag in the alternate way, due to querying requirements.)
[18:02:30] <GothAlice> While I appreciate the purity of a set-of-strings approach, practicality beats purity. (Alice's law #36.)
[18:02:47] <StephenLynx> thats why I pre-aggregate it now.
[18:03:50] <GothAlice> Laws 30-31: Simple is better than complex, complex is better than complicated. For a single flag, an aggregate with $cond/set operations is "complex", pre-aggregating is "complicated". Storing in a queryable way to begin with is "simple".
[18:04:48] <StephenLynx> I am thinking about removing stuff I have to pre-aggregate from the settings array.
[18:05:03] <StephenLynx> I used the array precisely so I would have fewer fields, but now I have the field anyway.
[18:06:16] <GothAlice> Indeed. An array of flags is an elegant and minimal way to store it, very pure, but doesn't align with how you need to query it, unfortunately. (Sorting is the kicker.)
[18:07:22] <StephenLynx> not only sorting, but for anonymous threads too, since I have to project based on the condition of the array having the value.
[18:07:31] <StephenLynx> so now I pre-aggregate that too.
[18:09:14] <StephenLynx> or this weekend, since I just downloaded cities: skylines :v
[18:09:19] <GothAlice> https://github.com/bravecollective/forums/blob/develop/brave/forums/component/thread/model.py is my forums' thread model, with https://github.com/bravecollective/forums/blob/develop/brave/forums/model.py#L8-L14 being a "forum", "thread" common set of pre-aggregated statistics. Line 44 of the first file is the start of the flags.
[18:09:41] <StephenLynx> yeah, some stuff you can't help but to pre-aggregate.
[18:11:59] <StephenLynx> you don't have plain text documentation for the model?
[18:12:24] <GothAlice> If you're curious how any of that code is operating, it's using the MongoEngine ODM: http://docs.mongoengine.org/
[18:13:16] <GothAlice> (Python also has first-class "doc strings", so any time you see a function/method/class with a string as the first line of code after the definition, that's documentation that can be inspected during runtime, too.)
[18:13:26] <StephenLynx> like this https://gitlab.com/mrseth/bck_lynxhub/blob/master/doc/model.txt
[18:13:42] <GothAlice> Looks like a rather large duplication of effort, to me.
[18:14:17] <StephenLynx> it makes it unnecessary for the person to know the language or to run your code.
[18:14:52] <GothAlice> If I wanted to, I could add "doc strings" to the individual fields. If they're non-obvious enough to need it, I've failed, though. Like having an "authors.txt" file in a project, such things are silly and can be pulled from the code itself. Also, I'm not really associated with that particular project any more (year or so), so don't blame me for its current state. ;)
[18:15:37] <GothAlice> StephenLynx: Also considering I re-name all of the fields, such a "field name to description" mapping file would be pretty much useless anyway.
[18:16:01] <StephenLynx> so you update the documentation.
[18:16:18] <GothAlice> And now you have additional developer overhead, when the entire file could be auto-generated on demand.
[18:16:42] <StephenLynx> then you have to maintain more code.
[18:16:47] <StephenLynx> which is an even bigger overhead.
[18:17:04] <GothAlice> StephenLynx: Really, like, auto-generated documentation FTW. To hell with manual labour. (Alice's law #57: if it's worth doing twice, it's worth building a tool to do it.)
[18:32:34] <d0x> Is it worth posting my Mongo-Hadoop Connector question to the mongo-users mailing list?
[19:22:27] <GothAlice> I typically $project down to the bare minimum fields needed for a query as quickly as possible. Sometimes this takes several passes to clean up temporary vars.
[19:22:52] <StephenLynx> what about match and project?
[19:22:58] <StephenLynx> I match before I project.
[19:23:07] <StephenLynx> usually it's match, project, then anything else.
[19:23:48] <GothAlice> There are optimizations for having a $match/$project/$sort/$limit at the "top" of the aggregate pipeline. For example, $sort/$limit together will have a sort engine that only tracks the $limit number of values.
[19:32:24] <StephenLynx> yeah, I know there is already a sort>limit optimization.
[19:52:41] <GothAlice> StephenLynx: Well, it's important for $match, esp., as only the initial $match will utilize indexes.
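(The ordering the two of them converge on, sketched with made-up collection and field names:)

```js
db.events.aggregate([
  { $match: { type: "click" } },       // first stage: the only one that can use an index
  { $project: { userId: 1, ts: 1 } },  // shed unneeded fields early
  { $sort: { ts: -1 } },               // adjacent $sort + $limit let the sorter
  { $limit: 100 }                      //   keep only 100 documents in memory
])
```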
[20:06:59] <mdavid613> hey anyone…just a general consensus…how stable is 3.0 or 3.0.1?
[20:07:08] <mdavid613> is anyone running it in production yet?
[20:07:37] <GothAlice> mdavid613: Pretty darn good. Yeah, I've upgraded, but I kept the cluster on mmapv1 as a storage back-end. From what I can tell, there are some kinks to iron out in WiredTiger.
[20:08:03] <bbclover> hey guys, is mongodb free if I were to make a small web application?
[20:08:21] <GothAlice> bbclover: Indeed it is. http://mongodb.org for the open-source side of it, http://mongodb.com for the commercial side.
[20:08:22] <mdavid613> I'm just wondering if 3.0.1 will help me out with the x509 restore stuff that I'm having issues with…since mongorestore doesn't seem to correctly support x509 auth in 2.6.8
[20:08:36] <GothAlice> bbclover: While the software is free, you'll still need somewhere to run it.
[20:10:24] <GothAlice> mdavid613: There are many notes to read if upgrading is to be a thing: http://docs.mongodb.org/manual/release-notes/3.0-compatibility/
[20:24:00] <GothAlice> (And on several occasions, reading the DB source helped answer questions.) For the most part, yes.
[20:24:15] <StephenLynx> yeah, it shouldn't be too hard to understand the code.
[20:24:16] <GothAlice> Well structured, too. (At least consistent, if it disagrees with a personal style. ;)
[20:26:35] <epx998> someone was telling me i can tar up a database in the /data dir and scp it to a different mongo server, restart it, and it'll see it. is that true?
[22:42:04] <sellout> Does the Java driver 3.0 provide some way to do eval()? I can’t seem to find it, except on the old DB class, and I don’t see how to get a DB from a MongoDatabase, so … I’m a bit lost.
[22:45:53] <joannac> sellout: eval is deprecated in 3.0
[22:46:12] <joannac> i would expect the java driver to no longer expose it
[22:46:54] <sellout> Ah, ok. I don’t think we actually use it anyway. Happy to toss that :D