PMXBOT Log file Viewer


#mongodb logs for Tuesday the 2nd of February, 2016

[00:13:02] <Freman> now to futz with it
[00:18:00] <Freman> simple 1:1 proxy, parsing the queries, now to add index checks
[01:38:00] <Freman> error: { "$err" : "No index was found for your query" }
[01:38:03] <Freman> hazar! it works!
[01:39:04] <Freman> it's most simplistic... it just checks that the first field in the query is the first field in any index
[02:50:33] <Waheedi> I've noticed something while using 2.4 server and client trying to read from nearest node using read preference
[02:50:50] <Waheedi> It seems it's only sending reads to secondaries
[02:51:00] <Waheedi> nothing to primary?
[02:51:06] <Waheedi> while it's the nearest
[02:51:10] <Waheedi> and a primary?
[03:21:22] <Boomtime> @Waheedi: when you use 'nearest' it is the driver that decides where to send queries - what driver (and version) are you using?
[03:21:43] <Waheedi> 2.4.1 mongo-cxx
[03:21:49] <Waheedi> where to send
[03:21:54] <Waheedi> server is 2.4.1
[03:22:01] <Waheedi> server is 2.4.14
[03:22:33] <Waheedi> thanks Boomtime
[03:22:53] <Waheedi> do you think the behavior should be different?
[03:23:49] <Boomtime> not necessarily - how many queries have you tested?
[03:24:45] <Boomtime> note you are using a very old version, so i can't be certain that behavior was intended - it isn't today
[03:45:34] <Freman> program complete, enter when ready
[03:45:53] <cheeser> how about a nice game of chess?
[03:45:55] <Freman> my proxy performs an explain on a given query and if no indexes are used it rejects
[03:46:00] <Freman> error: { "$err" : "No index was found for your query" }
[03:46:27] <Freman> which is a shame, cos I couldn't find({happiness: {$in: ["bewbies"]}})
[03:46:50] <Freman> find({ids: {$in: ["bewbies"]}}) works, cos there's an index on ids
[03:47:17] <cheeser> this probably isn't the best forum for "bewbie" jokes
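For context, the index check Freman describes can be approximated from the shell: run explain() on the query and reject it when the winning plan bottoms out in a collection scan. A minimal sketch, assuming MongoDB 3.x explain output and made-up collection/query names; Freman's actual proxy code is not shown in the log.

```
// Sketch: reject a query whose plan falls back to a full collection scan.
// Collection name and query are placeholders, not the real proxy code.
var plan = db.mycoll.find({happiness: {$in: ["bewbies"]}}).explain();
var stage = plan.queryPlanner.winningPlan;
while (stage.inputStage) { stage = stage.inputStage; }   // walk to the leaf stage
if (stage.stage === "COLLSCAN") {
    throw { "$err": "No index was found for your query" };
}
```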
[05:01:38] <Waheedi> Boomtime: I've tested 5000 queries a second!
[05:02:04] <Waheedi> I know its old but I don't want to upgrade the code base at the moment
[05:02:12] <Waheedi> that would not hurt
[05:02:56] <Waheedi> almost nothing goes to primary, maybe 6 or 7 queries :)
[05:03:43] <Waheedi> If that's because it's an old version then I can fix it, but if that's the default behavior I think something should be considered
[05:16:46] <Boomtime> @Waheedi: can you test with the latest version? the specification suggests that where the ping times are close (within 15ms by default) 'nearest' would round-robin but that only applies to later versions
[05:18:05] <Waheedi> 15 ms, that's correct
[05:18:36] <Waheedi> primary is 1 ms ping
[05:19:02] <Waheedi> secondaries are either 35ms
[05:19:09] <Waheedi> or 2ms
[05:19:40] <Waheedi> from client
[05:20:32] <Waheedi> sometimes i see words and characters coming after my sentences but they are not intended, this one is clear though
[05:20:57] <Waheedi> i do this in real life
[05:21:02] <Waheedi> anyway
[05:21:23] <Waheedi> Boomtime:
[05:21:26] <Waheedi> :)
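For reference, read preference is chosen by the driver, as Boomtime says; the shell exposes the same knob. A sketch, assuming a replica-set connection and a made-up collection name; the 15 ms figure is the default latency window Boomtime mentions.

```
// Ask the driver to read from the 'nearest' member (by ping time):
db.getMongo().setReadPref("nearest");
db.mycoll.find({status: "active"});

// Or per cursor:
db.mycoll.find({status: "active"}).readPref("nearest");

// With a ~1 ms primary and a ~2 ms secondary, both fall inside the default
// 15 ms latency window, so reads should be spread across them on recent drivers.
```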
[13:15:28] <knuto> I'd like to add AUTH capabilities to an already running mongo database.. is this possible? (I'm getting "Unable to connect" etc.) is it possible in theory?
[13:57:37] <Lope> Apparently this is an unsupported projection? db.foo.find({_id:{$gt:ObjectId('56b0b138c16e3195036bb7f5')}});
[13:58:03] <Lope> I used to do this to find ObjectIds created after a certain date?
[13:59:21] <StephenLynx> that is not a projection
[13:59:54] <StephenLynx> and I could use that kind of query.
[13:59:57] <StephenLynx> exact same thing.
[14:00:00] <Lope> I don't know what a projection is
[14:00:18] <cheeser> a projection limits the fields returned
[14:00:26] <StephenLynx> or modify them
[14:00:30] <Lope> oh okay
[14:00:32] <cheeser> right
[14:00:54] <Lope> oops, my bad. okay. it's been a while since I've mongo'd
[14:01:01] <StephenLynx> what error are you getting with that?
[14:01:40] <Lope> What I wrote above was not my actual query. Sorry. My bad again. Maybe I should go to bed :)
[14:01:46] <StephenLynx> kek
[14:01:54] <Lope> it was more like db.foo.find({},{_id....
[14:02:07] <StephenLynx> yeah, that has a project.
[14:02:11] <StephenLynx> projection*
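To make the distinction concrete, a projection is the second argument to find(); a small sketch reusing Lope's ObjectId, with the projected field names invented.

```
db.foo.find(
    { _id: { $gt: ObjectId('56b0b138c16e3195036bb7f5') } },   // query: which documents
    { _id: 1, name: 1 }                                        // projection: which fields come back
);
```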
[14:11:26] <Lope> how can I convert a ObjectId to a timestamp?
[14:11:47] <nixstt> objectid->getTimestamp
[14:12:11] <nixstt> Lope: https://docs.mongodb.org/v3.0/reference/method/ObjectId.getTimestamp/
[14:12:18] <StephenLynx> that depends on the driver.
[14:12:48] <Lope> ObjectId().getTimestamp() seems to do it
[14:12:55] <Lope> Thanks guys
[14:15:01] <Lope> is there a way I can mutate results in the mongoDB shell, so that my ObjectIds are returned as timestamps?
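On the follow-up question, getTimestamp() works on any ObjectId, and the shell can reshape results client-side; a sketch using the collection name from the log and an invented output field.

```
// The creation time is embedded in the ObjectId itself:
ObjectId('56b0b138c16e3195036bb7f5').getTimestamp();   // ISODate("2016-02-02T...")

// Reshape results in the shell so each document carries that timestamp:
db.foo.find().map(function (doc) {
    doc.created = doc._id.getTimestamp();
    return doc;
});
```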
[14:17:35] <asuraphel> anyone using mongo with Jaspersoft?
[14:17:48] <asuraphel> I'm trying to create a report that involves two collections on mongodb 2.6
[14:17:59] <asuraphel> how did you do it?
[14:19:04] <yossarianuk> hi - looking to setup a mongodb replica server- can this be done only using 2 servers or do I need 3 ?
[14:19:59] <StephenLynx> asuraphel, you have to query the collections separately
[14:20:38] <StephenLynx> yossarianuk, I am not 100% sure, but I think the 3rd server is the arbiter that gets to pick one of the 2 other servers to step in when one fails
[14:21:10] <StephenLynx> so my GUESS is that while you don't need 3 servers, having 2 will only duplicate your data without adding resilience to your setup.
[14:21:29] <StephenLynx> but I am not experienced with replica sets so you shouldn't take just my word.
[14:21:51] <yossarianuk> StephenLynx: thank you so much for your input
[14:22:05] <yossarianuk> StephenLynx: what you are saying matches this guide/article -> StephenLynx
[14:22:12] <yossarianuk> whoops -> https://devops.profitbricks.com/tutorials/configure-mongodb-replica-set/
[14:24:06] <yossarianuk> however slightly confused now.... With MySQL for example we have MASTER -> MASTER replication between 2 data centres, if a DC ever went offline we would still have the data, if we need one primary server set to replicate to 2 secondary servers we won't have a failover for the primary?
[14:25:11] <StephenLynx> with replica sets a different server steps in as primary.
[14:25:26] <StephenLynx> this is why it isn't master/slaves anymore
[14:25:33] <StephenLynx> because any member can become the primary.
[14:29:42] <StephenLynx> your concern is that in that case the data will stop being duplicated?
[14:32:32] <yossarianuk> StephenLynx: yes, that is the concern
[14:33:07] <StephenLynx> then you
[14:33:11] <StephenLynx> A: have more servers
[14:33:21] <StephenLynx> B: fix the one that went offline, it will sync when it gets back
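A rough sketch of the two-data-members-plus-arbiter layout discussed above, with placeholder hostnames: the arbiter votes in elections but stores no data, which is why two data-bearing servers on their own give you a copy of the data but no safe automatic failover.

```
// On one of the data-bearing nodes (hostnames are made up):
rs.initiate({
    _id: "rs0",
    members: [
        { _id: 0, host: "db1.example.com:27017" },
        { _id: 1, host: "db2.example.com:27017" }
    ]
});

// Add a lightweight third member that only votes and holds no data:
rs.addArb("arbiter.example.com:27017");

rs.status();   // expect one PRIMARY, one SECONDARY, one ARBITER
```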
[14:43:28] <pchoo> Hello all, we've got two stacks in two different DCs, looking to get a replica set, consisting of one on either side, and then a third party. We're looking at using an amazon Micro Instance for the arbiter, would that suffice, or would you recommend something with a higher spec?
[14:47:41] <asuraphel> StephenLynx, what patterns are usually used to solve this?
[14:47:47] <asuraphel> I mean map-reduce etc
[14:47:55] <StephenLynx> none.
[14:47:58] <asuraphel> aha
[14:47:59] <asuraphel> ok
[14:48:09] <StephenLynx> you can't perform a single action on multiple collections.
[14:48:28] <asuraphel> my PM is convinced 100%
[14:48:43] <asuraphel> have you by any chance used Jasper
[14:48:45] <asuraphel> ?
[14:48:51] <StephenLynx> what is PM?
[14:48:53] <StephenLynx> what is jasper?
[14:48:57] <asuraphel> Project Manager
[14:49:03] <asuraphel> Jasper Reporting Engine
[14:49:04] <asuraphel> haha
[14:49:07] <StephenLynx> tell him I said he is a retard.
[14:49:18] <asuraphel> lol
[14:49:21] <StephenLynx> and jasper != mongo
[14:49:26] <StephenLynx> mongo can't do that.
[14:49:42] <StephenLynx> even if you slap some abstraction layer on top of mongo that does that behind the scenes
[14:50:05] <StephenLynx> it is just fetching data from two separate collections on two separate queries and doing some application-side operation to output a result
[14:50:54] <asuraphel> StephenLynx, cool
[14:51:06] <asuraphel> good to know I'm not missing something basic
[14:51:09] <asuraphel> tnx
[15:32:48] <the_drow> Is there a way to install only mongo tools on OSX?
[15:33:14] <the_drow> I mean without compiling it
[15:40:14] <GothAlice> the_drow: You can "cask pour" using homebrew to install a binary version instead of compiling from source.
[15:40:51] <GothAlice> Additionally, the downloads on the official site are an (effectively, since it's technically not possible on Mac) statically compiled version of the command-line tools you can just untar somewhere and run in-place.
[15:41:23] <GothAlice> https://www.mongodb.org/downloads#production
[15:43:24] <GothAlice> the_drow: brew search mongodb would indicate that the "cask to tap" (I love homebrew terminology!) Caskroom/cask/mongodb
[15:43:39] <GothAlice> *is
[15:50:29] <GothAlice> Uh, wow. https://docs.mongodb.org/manual/core/index-unique/#unique-index-and-missing-field and https://docs.mongodb.org/manual/core/index-sparse/#sparse-and-unique-properties would seem, on the surface, to contradict each other directly.
[15:50:57] <GothAlice> "An index that is both sparse and unique prevents collection from having documents with duplicate values for a field but allows multiple documents that omit the key." vs. "Because of the unique constraint, MongoDB will only permit one document that lacks the indexed field. If there is more than one document without a value for the indexed field or is missing the indexed field, the index build will fail with a duplicate key error."
[15:51:55] <GothAlice> Aaaah, no, the sparse flag just wasn't actually getting set on index creation from the ODM layer. >____<
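For the record, the two doc pages line up once sparse is actually applied; a shell sketch with invented names showing the difference.

```
// unique only: at most ONE document may omit "email"
db.users.createIndex({ email: 1 }, { unique: true });

// sparse + unique: documents without "email" are not indexed at all,
// so any number of them may omit the field
db.users.createIndex({ email: 1 }, { unique: true, sparse: true });
```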
[15:59:43] <the_drow> GothAlice: it's kind of a bummer to install mongodb itself if you only need the tools
[16:00:08] <GothAlice> the_drow: How so?
[16:00:14] <cheeser> is it? it's just one command to install either way.
[16:00:19] <GothAlice> I mean, unless you launchctl load it, it's hardly consuming system resources.
[16:00:28] <the_drow> Because it takes longer and bloats your computer with stuff you don't need
[16:00:36] <the_drow> I use docker to run mongo
[16:01:00] <GothAlice> the_drow: Well, that certainly bloats more than just un-tarring the tarball somewhere and running the resulting files.
[16:01:03] <GothAlice> But that's your choice.
[16:01:16] <cheeser> takes longer? i don't think so
[16:01:51] <StephenLynx> docker is ass
[16:02:20] <GothAlice> the_drow: By running on Docker, you're hardly actually running MongoDB on Mac. You're running it on Linux in a VM. Any overhead is… self-inflicted.
[16:54:03] <csd_> How do I search for all collections that have a subtree that matches a certain pattern? (e.g. i dont know what the outer key will be)
[16:54:34] <csd_> so {keyvaries: {_id: number}}
[16:59:37] <StephenLynx> you can't query more than one collection at a time
[16:59:54] <csd_> sorry i meant all documents
[17:00:02] <Tomasso> why could it be that when I calculate this average the result is always 0 ?
[17:00:04] <Tomasso> db.getCollection('Accuchecks').aggregate({$group: {_id: "$algorithm",accuracy: { $avg: "$difference.close"}}})
[17:01:46] <StephenLynx> i don't think that's possible
[17:01:49] <StephenLynx> csd_,
[17:03:03] <csd_> i cant use elemMatch to do it somehow?
[17:03:15] <Tomasso> haha me neither.. but the values from the field difference.close are almost never 0
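A hedged guess at Tomasso's zero average: $avg only considers numeric values, so a field stored as strings (or under a slightly different path) gives surprising results. A quick type check plus the same aggregation written as a pipeline array, using the names from the log.

```
// How many documents have difference.close as a double vs. a string?
db.getCollection('Accuchecks').count({ "difference.close": { $type: 1 } });   // 1 = double
db.getCollection('Accuchecks').count({ "difference.close": { $type: 2 } });   // 2 = string

// Same grouping, written as a pipeline array:
db.getCollection('Accuchecks').aggregate([
    { $match: { "difference.close": { $exists: true } } },
    { $group: { _id: "$algorithm", accuracy: { $avg: "$difference.close" } } }
]);
```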
[18:04:52] <uuanton> hi anyone can help find the documentation on how to rollback replica set to backup of amazon snapshot ?
[18:05:38] <uuanton> restore replica set using swapping elastic block storage drives ?
[19:11:03] <renlo> how do you install / (specifically update) the mongo shell without installing mongod ?
[19:13:04] <StephenLynx> there are separate packages
[19:13:08] <renlo> mongodb-client?
[19:13:14] <StephenLynx> i think so
[19:25:12] <idd2d> I've got some questions about modeling some data relationships in a new app with mongo, but it's likely kind of involved to give enough context to explain the relationships. What's the protocol for this?
[19:25:37] <idd2d> I guess I could just throw it out there, explanation and all, and hope someone bites? Just hate to blow the time if nobody is feeling up to helping out
[19:34:53] <GothAlice> Took a year, but https://jira.mongodb.org/browse/SERVER-15815 was fixed by https://jira.mongodb.org/browse/SERVER-15176
[19:40:41] <rkgarcia> :O
[19:44:13] <StephenLynx> so now you can get tailable cursors that never cose?
[19:44:16] <StephenLynx> close*
[19:44:33] <GothAlice> No, now I have fine-grained control over getMore timeouts across tailing cursors.
[19:45:19] <GothAlice> I.e. myAwesomeTask.result(timeout=2) — actually waits two seconds now instead of waiting for the cursor to die of old age and/or try, sleep a second, and repeat polling.
[19:45:53] <GothAlice> It's not that they don't close, it's that they close when I tell them to.
[19:47:07] <GothAlice> idd2d: And yeah, the general process is to throw ideas out to get feedback; "ask, don't ask to ask" being the general idea. IRC is asynchronous, and nobody can
[19:47:13] <GothAlice> answer a question not posed.
[19:47:43] <idd2d> thanks
[19:48:12] <GothAlice> "Modelling relationships" is sticky business when not using a relational database, though. I often link http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html as a brief primer to some of the concerns.
[19:48:20] <StephenLynx> welp. i guess im stuck to unix sockets.
[19:48:33] <GothAlice> StephenLynx: Memory mapped, efficient, secure. What's not to love about UNIX domain sockets?
[19:48:42] <StephenLynx> permissions.
[19:48:48] <GothAlice> That's the secure bit. ;P
[19:49:30] <StephenLynx> yeah, but I'd rather have it secured by the authentication method I am already using over mongo.
[19:49:41] <StephenLynx> instead of having to handle a second security method.
[19:49:57] <GothAlice> Well, you could just 666 the on-disk socket and rely on the upstream auth.
[19:50:03] <GothAlice> (666 - mode of the beast)
[19:50:28] <StephenLynx> ok, but when? when I start listening for the socket it won't automatically create the file
[19:50:44] <GothAlice> But the memory mapped thing is great, and the elimination of TCP overhead for local connections is awesome.
[19:50:47] <StephenLynx> no matter how I look at it, I will have to add more code to handle that.
[19:51:08] <StephenLynx> while if I had a tailable cursor that never closes, there would be no code overhead.
[19:51:24] <StephenLynx> and since this feature doesn't need to perform fast, the performance is not an issue
[19:51:44] <StephenLynx> so the benefits of the unix socket are moot for me.
[19:52:55] <GothAlice> Uhh, binding to a domain socket creates it if possible. It's then up to the application to chown/chmod it as needed.
[19:53:09] <GothAlice> http://beej.us/guide/bgipc/output/html/multipage/unixsock.html
[19:53:23] <StephenLynx> yeah, I will read more on it eventually.
[19:53:55] <StephenLynx> and my software can't change its permissions sanely, because I don't mandate permissions or ownerships.
[19:54:08] <StephenLynx> that is up to the sysadmin
[19:55:49] <GothAlice> Most daemons like MongoDB generally accept a "user/group to become" option and re-use that when constructing their on-disk sockets. Some, like the FastCGI on-disk socket adapter I use in Python, also lets you set a "umask" for the sockets; this sets default permissions configurably.
[19:56:55] <StephenLynx> if I do that, I increase complexity overhead.
[19:57:19] <StephenLynx> more features, more bugs, more problems.
[19:57:36] <StephenLynx> I just assume the person is using the intended system user to run the application.
[19:57:50] <uuanton> Hey Alice, by any chance do you know strategy how to restore replica set from aws snapshot ?
[19:57:50] <StephenLynx> and this user has the permissions required to operate.
[19:59:14] <GothAlice> uuanton: Alas, I switched away from AWS before we really started picking up MongoDB at work. :(
[20:00:07] <GothAlice> uuanton: A first set of steps might be to clone a new EBS volume from the snapshot, delete the journal files, then spin up a standalone mongod across it to see if you can get your data readable; it may require a --repair depending on your snapshot process.
[20:00:42] <GothAlice> You could then follow the standalone-to-replica-set tutorial to bring things back to production-worthy: https://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
[20:05:54] <uuanton> i've done the above steps; after i switch the volume on all 3 replica set members and reconnect the secondaries, they resync from scratch
[20:07:54] <GothAlice> Uhm.
[20:08:17] <GothAlice> So you're swapping out the disk storage for three configured nodes then spinning them back up?
[20:09:13] <GothAlice> Re-syncs happen when a secondary knows it's a member of a set and knows that its data is older than the current primary by longer than the replication oplog time. (It does this because it knows there is literally no way for it to actually "catch up" and still see everything that changed.)
[20:11:45] <uuanton> thanks a lot
[20:17:30] <GothAlice> uuanton: The "bring it back up as a new primary, then promote back into a replica set" approach will sync the new replicas, unless you take a new snapshot and clone it in under the oplog time, but will be potentially more reliable/controllable a process than swapping out the underlying storage of existing nodes and just restarting them.
[20:17:50] <dylush> hey anyone here use mongoose?
[20:18:09] <GothAlice> dylush: I've only looked at it long enough to Choose Something Else™. ¬_¬
[20:18:28] <GothAlice> Something about it storing _id ObjectIds as actual hex-encoded strings.
[20:18:37] <dylush> really? is there better out there?
[20:18:43] <GothAlice> The bare driver?
[20:19:01] <dylush> just the node package
[20:19:12] <GothAlice> MongoDB itself now has schema validation, so the need for an ODM layer to do that is becoming excessive.
[20:19:43] <dylush> oh
[20:20:02] <GothAlice> Ref: https://www.mongodb.com/blog/post/document-validation-part-1-adding-just-the-right-amount-of-control-over-your-documents which introduces the idea, and the documentation: https://docs.mongodb.org/manual/core/document-validation/
[20:20:43] <GothAlice> :)
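The document validation GothAlice links (new in MongoDB 3.2) is declared on the collection itself; a minimal sketch with an invented collection and rules.

```
// Reject inserts/updates that lack an email string or have an implausible age:
db.createCollection("contacts", {
    validator: {
        $and: [
            { email: { $exists: true, $type: 2 } },   // 2 = string
            { age: { $gte: 0, $lte: 150 } }
        ]
    },
    validationLevel: "strict",    // apply to all inserts and updates
    validationAction: "error"     // refuse the write rather than just warn
});
```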
[20:21:02] <GothAlice> I'm a Python developer, and even I'm seriously considering switching back to the plain MongoDB-provided pymongo driver.
[20:21:22] <dylush> oh ok
[20:21:46] <dylush> yeah it's funny you mention its _id storing
[20:22:20] <dylush> I have an issue where mongoose is throwing a cast error saying I have an undefined _id even though the update works
[20:26:15] <Spynxic> I'm using pymongo and I noticed that when I insert a document into a collection like so, db.collectionname.insert({ someid:{ someotherid:somevalue }, .. }) the someid document gets an _id but someotherid document doesn't -- does that mean it isn't a document?
[20:26:17] <GothAlice> dylush: Have you run into accidentally constructing a collection literally named "[object Object]" yet? How to fix that once it happens makes for a fun MongoDB challenge question.
[20:26:37] <GothAlice> Spynxic: Wait; what's "someid"?
[20:27:24] <Spynxic> It's a key value pair that should have a document as its value
[20:28:11] <Spynxic> accessible as db.collectionname.someid.someotherid
[20:28:12] <GothAlice> Spynxic: To answer the question, though, only the topmost document has an _id field added to it (by default with a newly generated ObjectId generated on your application-side prior to pymongo sending the request off to the server). Embedded documents are stored as-is, and ObjectId generation is skipped if you provide your own _id at the top level.
[20:28:27] <GothAlice> MongoDB is not intended to be used the way you are seeming to.
[20:29:29] <GothAlice> http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[20:30:33] <Spynxic> Thanks, I'll give it a read
[20:30:44] <GothAlice> Adding your own _id to the embedded documents is OK, but that field won't be automatically indexed. (So remember to create that index, if possible.) from bson import ObjectId – simply creating a new instance of ObjectId() will generate one according to the standard.
[20:31:18] <GothAlice> (My forums do exactly this to add an _id to each reply, embedded in the thread.)
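A sketch of the pattern GothAlice describes for her forum replies, with invented collection and field names; the embedded _id is not indexed automatically, hence the explicit createIndex.

```
var replyId = ObjectId();   // generated application-side, like any other ObjectId

db.threads.insert({
    title: "Welcome",
    replies: [
        { _id: replyId, author: "alice", text: "First!" },
        { _id: ObjectId(), author: "bob", text: "Second." }
    ]
});

// Embedded _id fields get no automatic index, so create one if you query by them:
db.threads.createIndex({ "replies._id": 1 });
db.threads.find({ "replies._id": replyId });
```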
[20:33:16] <dylush> @GothAlice Not that I am aware :S it's strange the update works perfectly but I get the recursive cast error in my console
[20:40:42] <Spynxic> One more question, unrelated to the last. Is there a method to iterate the fields of a document similar to iterating through a dictionary (in Python) like so, for key,value in dict.iteritems()
[20:42:39] <GothAlice> for key in somedict: …
[20:42:57] <GothAlice> (Documents just being dicts in Python.)
[20:45:05] <GothAlice> (_Actually_ iterating dictionaries, and not the result of calling one of their iteration methods, iterates keys.)
[20:45:25] <GothAlice> Makes even having .iterkeys() somewhat silly. ;P
[20:50:02] <Spynxic> If I'm iterating {"a":1, "b":2, "c":3} then I can get 1 2 and 3 from your example but I can't seem to find a way to get the a b or c from the value
[20:52:46] <Spynxic> One solution would be to put the key in the value like so, {'a': {'key':'a', 'value':1}, 'b':{... but that's my last resort
[20:53:07] <GothAlice> Sorry, I'm not following.
[20:53:54] <GothAlice> http://s.webcore.io/0a3l0K300S38 < works for me
[20:54:18] <GothAlice> Order isn't preserved, but that's a Python quirk of how dictionaries work (by hash, not value).
[20:57:02] <Spynxic> Oh thank you, list(dict) is exactly what I'm looking for. I'm using the key to access a value in a different document with the same key
[20:59:47] <GothAlice> Spynxic: If iteration is your goal, getting the list of the dict is generally not what you want. It needlessly constructs a list prior to iteration if used like for field in list(somedict).
[21:00:08] <GothAlice> Spynxic: Similarly, if you're checking for the presence of a field, this'll do: "somefield" in somedict
[21:02:37] <GothAlice> (One almost never needs to turn the dictionary into something else just to get the keys, or just to get the values, or the combination of the two as items.)
[21:28:49] <Freman> todays #1 fun job
[21:29:12] <Freman> figure out permissions so I can add a new role with the existing permissions and deprecate the existing role so it can only query not insert/delete
[21:46:26] <Freman> don't suppose mongo can list who's logged in
[21:49:42] <GothAlice> Hmm, Freman, db.serverStatus().connections only lists counts, not actual connections. But listing the current connections, if possible, should also list which identity they're authenticated as.
[21:52:19] <Freman> thanks GothAlice, it's not so important, the two services I know that should be writing have been moved to writer roles, just was curious to see if anything else was writing and should be updated before I dropped the writing role from the old user
[21:52:30] <Freman> someone will tell me if something's broken :D
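For Freman's question, two shell probes as a sketch; note that db.serverStatus().connections only holds counters, and how much per-connection detail (including the authenticated identity) db.currentOp() reports varies by server version.

```
// Counters only:
db.serverStatus().connections;    // current / available / totalCreated

// All operations, including idle connections; each entry carries the client address:
db.currentOp(true).inprog.forEach(function (op) {
    print((op.client || "?") + "  " + op.op + "  " + (op.ns || ""));
});
```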
[21:54:18] <Bajix> $group can only ever put a document into a single group?
[22:11:49] <shian48263> test
[22:24:02] <dddh> shian48263: time to sleep
[22:25:37] <Freman> hmm sleep
[22:29:39] <decl> hi. quick question and i know this isn't the right channel to ask... but do you happen to know if there's any kind of database that'd perform well on dropbox?
[22:29:43] <decl> like each record in a file?
[22:29:55] <decl> in a different file*
[22:30:57] <GothAlice> decl: None. The per-file latency is too high, and the gods know what happens if two hosts try to modify the same file. No atomic locking, …
[22:31:09] <decl> hmm
[22:31:26] <decl> i don't need a fast database
[22:31:42] <decl> there's the possibility of adding incremental diffs
[22:31:45] <cheeser> then couch. *badump*
[22:31:52] <cheeser> :D
[22:32:00] <decl> does couch use different files for each record?
[22:32:01] <GothAlice> It took Apple more than six years before they worked out most of the bugs of trying to use SQLite files over synced iCloud storage (formerly Mobile Me, formerly .Mac).
[22:32:18] <decl> wew 6 years?...
[22:32:49] <GothAlice> Yeah. It was so bad after the iCloud name change that most app developers, when the APIs were fresh and newly public, refused to use it. Thus my task manager has Things Cloud instead of iCloud sync.
[22:33:57] <decl> never heard of things cloud
[22:34:06] <GothAlice> Private synchronization service for the task manager, Things.
[22:34:17] <decl> oh, i see
[22:34:54] <GothAlice> If something as "simple" as a to-do list can't get distributed file-based data storage to work…
[22:35:16] <decl> ynab does it
[22:35:35] <decl> conflicts need to be handled manually
[22:35:37] <decl> though
[22:35:53] <decl> the system seems to be fairly simple
[22:36:38] <decl> i could write my own but was wondering if there was anything pre-made
[22:36:50] <uuanton> hey Alice I followed your advice to drop replica set settings and reconnect replica set. Secondaries start resyncing from scratch
[22:37:00] <GothAlice> uuanton: As I mentioned they would.
[22:37:39] <uuanton> i know that oplog collection is the key in that relationship
[22:38:09] <uuanton> but i need to remove rs.conf() settings as they're from production
[22:38:15] <GothAlice> https://docs.cloud.mongodb.com/tutorial/use-restore-to-seed-secondary/ < you'd need to follow something like this to make a new snapshot of the new primary to transfer to the new secondaries, as the snapshot the primary was constructed from is "too old" to meet the oplog requirement for the secondaries.
[22:39:05] <GothAlice> Er, sorry, that's MMS/Cloud docs.
[22:39:05] <uuanton> I'm so interested in seedSecondary.sh but does anyone have that file?
[22:41:34] <Freman> "$err" : "No index was found that could be used for your query try db.log_settings.getIndexes()"
[22:41:36] <uuanton> problem is that the snapshot is from production with different replset settings, but the restore is for staging and the replica set settings need to be modified
[22:41:40] <Freman> now I just need an error code to put with it
[22:42:12] <GothAlice> https://docs.mongodb.org/manual/tutorial/reconfigure-replica-set-with-unavailable-members/ should help with that last one, uuanton.
[22:42:40] <Freman> maybe I can borrow 17334 or 17357
[22:44:35] <uuanton> @GothAlice thanks
[22:44:53] <GothAlice> uuanton: Basically, restore a single member, convince it it's not a replica set, re set up the replica set, optionally with a fresh snapshot/backup to restore to the new secondaries. That's the process, and the last link there outlines how to force it if it won't cooperate. ;)
[22:48:57] <uuanton> the database is pretty big, snapshotting the primary again would take hours
[22:49:41] <GothAlice> uuanton: This is why on particularly large datasets I don't use filesystem snapshots or mongodump as a primary backup mechanism. I use _another_ secondary. ;)
[22:50:32] <Bajix> How is mapReduce query supposed to work?
[22:50:53] <GothAlice> Bajix: mapReduce in MongoDB is accomplished by sending JavaScript code to the server for execution.
[22:51:16] <uuanton> by another secondary what do you mean :)
[22:51:55] <GothAlice> uuanton: Say I have three replicas in the datacenter where my app lives. I also have a pair of secondaries (hidden ones) in the office. And one at home.
[22:52:05] <Bajix> I get that part - what's confusing me here is that my map/reduce functions work, but if I add query, then I end up with output that hasn't been reduced
[22:52:13] <GothAlice> Absolute worst-case, I can serve our app from anywhere, with data coming from our office. ;)
[22:53:03] <GothAlice> https://docs.mongodb.org/manual/core/map-reduce/ < ensure you are specifying the query correctly; mapReduce is not like find.
[22:53:11] <GothAlice> Bajix: ^
[22:53:34] <Bajix> ```{ createdAt:
[22:53:34] <Bajix> { '$gte': Tue Jan 26 2016 13:29:54 GMT-0800 (PST),
[22:53:34] <Bajix> '$lte': Tue Feb 02 2016 13:29:54 GMT-0800 (PST) },
[22:53:34] <Bajix> channel: 569fa0175ea610136aac28cb }```
[22:53:47] <Bajix> That's my serialized query
[22:54:12] <Bajix> I'm just using Mongoose query casting - which has always worked for me when using aggregation
[22:54:18] <GothAlice> Bajix: Please avoid pasting in-channel; use a paste service like Gist. Also, that's the query part… but what's really important here is how you're putting together the actual mapReduce call.
[22:54:20] <GothAlice> Oh. Mongoose.
[22:54:26] <GothAlice> ¬_¬
[22:55:05] <Bajix> Yea...
[22:55:07] <GothAlice> Can you try getting your map/reduce and query working in a plain Mongo shell?
[22:55:17] <Bajix> I'll give it a shot
[23:10:47] <idd2d> I'm trying to implement tags in a new app. I've got a collection of people, and each person can have multiple tags. I need to be able to search by tags. The situation is similar to this example: http://programmers.stackexchange.com/questions/190802/how-to-model-hashtags-with-nodejs-and-mongodb
[23:11:33] <idd2d> However, there's one critical difference: I can't embed the tags on the person document. This is for scoping concerns. Basically, every user of the app will be drawing from the same large pool of People, but they should not be allowed to make changes to these documents.
[23:12:07] <idd2d> So what's the best approach? Is it to create another tags collection, and associate each tag with a person via personID?
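One way to model idd2d's constraint along the lines asked, with invented names: tags live in their own collection, reference the shared person by _id, and carry the tagging user, so the common People pool never changes.

```
var someUserId = ObjectId();      // placeholder for the app user
var somePersonId = ObjectId();    // placeholder for a document in the shared People pool

// Per-user tags in their own collection; People documents stay untouched:
db.tags.insert({ userId: someUserId, personId: somePersonId, tag: "golfer" });

// Index so "people tagged X by this user" is cheap:
db.tags.createIndex({ userId: 1, tag: 1, personId: 1 });

// Resolve the tag to people in two queries:
var ids = db.tags.find({ userId: someUserId, tag: "golfer" }, { personId: 1 })
                 .map(function (t) { return t.personId; });
db.people.find({ _id: { $in: ids } });
```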
[23:16:27] <Bajix> GothAlice, same issue occurs in Mongo Shell
[23:17:22] <Bajix> http://pastebin.com/rpF56BK9
[23:18:51] <Bajix> I end up with documents that are unreduced if I include the query
[23:19:57] <GothAlice> Well, an initial problem which may be unrelated that I notice is that your reduce function can't eat its own output.
[23:20:34] <GothAlice> I.e. it emits a reduced group document in a different format than the records it's grouping. This can be problematic.
[23:23:02] <GothAlice> Actually, not eating its own output may, in fact, be the issue. Without a query there's no reason to issue multiple reduces, you just stream the values as they're mapped. With a query, though, it might sparsely run the index issuing reduces in batches.
[23:23:23] <GothAlice> If that happens, a final reduce will be run with the output of the prior mini-reduces, leading to failure.
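Put differently, the reduce function must accept values shaped exactly like what it returns (and like what map emits), because the server may re-reduce partial results. A generic counting sketch, not Bajix's actual functions; collection and field names are loosely based on the log.

```
// map emits the SAME shape reduce returns: { count: <number> }
var mapFn = function () {
    emit(this.channel, { count: 1 });
};

// reduce may later be fed its own previous output, so it must consume
// and produce that same { count: n } shape:
var reduceFn = function (key, values) {
    var total = 0;
    values.forEach(function (v) { total += v.count; });
    return { count: total };
};

db.tokens.mapReduce(mapFn, reduceFn, {
    query: { createdAt: { $gte: ISODate("2016-01-26"), $lte: ISODate("2016-02-02") } },
    out: { inline: 1 }
});
```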
[23:24:02] <Bajix> I see
[23:25:12] <Bajix> I'm a little confused then, wouldn't you need a memo to do that kind of reduce?
[23:25:39] <GothAlice> https://gist.github.com/amcgregor/1623352 is the example I typically use.
[23:25:45] <GothAlice> Generally experienced MongoDB users, including myself, will recommend using the Aggregate framework instead of map/reduce. Is there any particular reason holding you to this less efficient data processing model?
[23:26:29] <GothAlice> (Less efficient, and far more quirky.)
[23:27:29] <Bajix> This is my first time using MapReduce actually - I've only ever used aggregate, which I'm quite familiar with
[23:29:06] <Bajix> I'd like to use aggregate if possible, but for that to be viable, I would need to be able to count tokens grouped into all hour ranges between createdAt & lastActive
[23:29:21] <GothAlice> You can certainly do that.
[23:29:38] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017 < pardon the field name shortening
[23:30:10] <GothAlice> This example groups by day of the week, but you can see some of the operator possibilities. You can extract date/time segments easily, $match query, etc.
[23:31:14] <GothAlice> Ref: https://docs.mongodb.org/manual/meta/aggregation-quick-reference/#date-expressions
[23:32:11] <GothAlice> See also https://docs.mongodb.org/manual/reference/operator/aggregation/cond/#exp._S_cond to allow you to $sum conditionally. :)
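The sort of pipeline those references point at: pull a date part out with the aggregation date operators, group on it, and use $cond so a $sum only counts documents matching a condition. A generic sketch with assumed collection and field names.

```
db.tokens.aggregate([
    { $match: { createdAt: { $gte: ISODate("2016-01-26"), $lte: ISODate("2016-02-02") } } },
    { $group: {
        _id: { $hour: "$createdAt" },                                            // bucket by hour of day
        total:  { $sum: 1 },
        active: { $sum: { $cond: [ { $gt: [ "$lastActive", "$createdAt" ] }, 1, 0 ] } }
    } },
    { $sort: { _id: 1 } }
]);
```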
[23:32:35] <Bajix> Here let me show you my original aggregation
[23:33:29] <Bajix> http://pastebin.com/pgCCjXVK
[23:34:10] <GothAlice> Just so I have a head for what's going on here, what is the value of query after line 10?
[23:34:39] <Bajix> Just a date range & a filter against an ID
[23:34:58] <GothAlice> (Ignoring the exact date values being range-queried for; those obv. depend on the request time.) Ok, so nothing crazy coming from req.query. ;)
[23:35:01] <Freman> queryguard is live on our shards... I probably should have added statistics so I can see how many queries it rejects
[23:35:43] <GothAlice> Freman: Heh, without instrumentation how can you tell that it's not rejecting them all? ;)
[23:35:57] <Freman> easy, no-one's come in here and cried yet
[23:36:16] <Bajix> So, let's say for example the difference between createdAt & lastActive is 5 hours - that token would need to be counted in all 5 hours, which is where the date grouping falls flat
[23:36:25] <Freman> oh and lots of testing, 97% test suite coverage
[23:36:47] <GothAlice> Nice.
[23:37:00] <Bajix> i thought about using an earlier group to push each hour as a new token, and then to unwind it, but that seemed terrifying
[23:37:36] <Freman> tho I might add source ip to the request log cos I have no idea who's executing what
[23:37:40] <GothAlice> Though that's technically what you're trying to do, Bajix.
[23:38:03] <GothAlice> You have singular timed-duration events which you're trying to pivot to repeated per-hour events covering the durations.
[23:38:41] <Bajix> Exactly, but I'm not sure how to accomplish pivoting
[23:40:59] <GothAlice> Hmm; this is an interesting problem. I'm going to have to ponder.
[23:41:03] <Bajix> Is there any way to use $push + $let + $cond
[23:41:14] <GothAlice> $push only works within a $group operation.
[23:41:31] <Bajix> Sure, but if I did $group with _id: null
[23:41:38] <Bajix> and followed up with an $unwind
[23:42:02] <GothAlice> Well, no, that's the thing. With a $group on null there's only one record per group thus only one $push issued per record.
[23:42:16] <GothAlice> Er, other way around. All the values would be pushed together.
[23:42:23] <GothAlice> (Still only contributing one per document, though.)
[23:42:45] <Bajix> Yea, but that's the only step that could do $push
[23:43:12] <Bajix> Making one giant array didn't seem sane, which is why I went over to use mapReduce
[23:44:23] <Bajix> Maybe there's a way to use $project + $unwind to accomplish pivoting here, but I'm not seeing it
[23:46:13] <GothAlice> Hmm. Within the standard flow we can easily calculate the initial "reporting period" and number of periods covered.
[23:46:41] <GothAlice> (That's a little bit of $mod action for the former, division of that result for the latter.)
[23:49:46] <Bajix> What, using $mod to round hours?
[23:51:18] <GothAlice> Aye; transforming the rich date object into a UNIX timestamp integer, then subtract the value modulo the period size in seconds to snap it to the period size. Then, to determine how many periods are covered for that entry, subtract that snapped period from the end date, divide by the period size and ceil round.
[23:52:13] <GothAlice> Rather, subtract the original start date/time from the end. ¬_¬ Clearly sleep deprivation isn't working for me, here. ;)
[23:52:20] <Bajix> Ok, but say we do calculate the number of periods, what does that do for us?
[23:52:33] <GothAlice> The size of the array to build. :)
[23:53:15] <Bajix> Didn't know that was a thing
[23:53:17] <GothAlice> Actually, yeah, two passes is how I see this working.
[23:53:52] <GothAlice> :/
[23:54:15] <Bajix> Ok I'm missing something here... I don't see how size does anything
[23:54:31] <Bajix> Let alone how it would be useful in building an array
[23:54:43] <Freman> now to get permission to open source queryguard :D
[23:54:55] <GothAlice> It gives a value we can use to easily identify the exact time periods of each affected slice, for each record.
[23:55:38] <GothAlice> (The exact snapped time period for each affected slice.)
[23:57:20] <Bajix> But we couldn't iterate over it?
[23:57:33] <GothAlice> It's not an iterable thing.
[23:58:19] <Bajix> If it's not iterable, then how can it be used to identify time periods
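The arithmetic GothAlice sketches, spelled out in plain shell JavaScript for one record, assuming one-hour periods: snap the start down to a period boundary with a modulo, then divide the remaining span by the period size and round up to get how many hourly buckets the record should be counted in.

```
var periodMs   = 1000 * 60 * 60;                          // 1-hour slices (assumption)
var createdAt  = ISODate("2016-02-02T13:29:54Z");         // example values
var lastActive = ISODate("2016-02-02T18:05:00Z");

var startMs = createdAt.getTime();
var snapped = startMs - (startMs % periodMs);             // snap down to 13:00
var periods = Math.ceil((lastActive.getTime() - snapped) / periodMs);
// periods === 6: the token would be counted in the 13:00 through 18:00 buckets
```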
[23:58:37] <uuanton> Alice, mebbe you know how to drop replset settings without dropping oplog inside local database
[23:58:53] <GothAlice> uuanton: You want a new oplog if you're reconfiguring the set. :/
[23:59:21] <uuanton> with a new oplog the secondaries start from scratch
[23:59:34] <uuanton> I kind of hate the day when I installed mongodb