[03:56:32] <giowong> anyone with mongoose experience?
[04:04:06] <Nepoxx> Anyone might have a clue why I'm getting a "Can't canonicalize query: BadValue unknown top level operator: $set" with Mongoose when trying to do an update?
[04:05:35] <cheeser> not a mongoose kinda guy but if you pastebin your code maybe I can spot something
[04:10:00] <Nepoxx> http://pastebin.com/gV63yCAd I just figured out that the 3 parameter function works fine
[04:13:15] <Nepoxx> cheeser, good luck if you're not used to Mongoose :P (but thanks!). I think this is a bug though, I'll probably file an issue on it
[04:25:29] <Nepoxx> The other thing is that "myemotions.this.timesUsed" doesn't really make sense, myemotions is not going to be defined in your update function (or whatever you name it)
[04:25:51] <Nepoxx> it should be more like `this.myemotions.timesUsed += 1` instead
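A minimal sketch of the distinction Nepoxx ran into, with hypothetical model and field names: Mongoose's Model.update() takes the conditions and the update as separate arguments, and a $set that ends up in the conditions is what produces the "unknown top level operator: $set" error.

    // works: conditions first, update document second
    Emotion.update(
      { _id: emotionId },                               // query conditions
      { $set: { 'myemotions.timesUsed': 1 } },          // update document
      function (err, numAffected) {
        if (err) return console.error(err);
        console.log('updated', numAffected);
      }
    );

    // fails with "unknown top level operator: $set" -- the update was passed as the query:
    // Emotion.update({ $set: { 'myemotions.timesUsed': 1 } }, callback);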
[04:26:57] <Nepoxx> Unfortunately it's very late and I'm tired, good luck with your issue giowong__
[04:41:25] <MacWinner> anyone try tokumx here? any comment on the results?
[07:47:20] <iksik> hm, is it possible to sort whole collection by nested field?
[08:36:36] <coudenysj> Hi all, anyone that could help me solve a query problem?
[08:37:21] <kali> coudenysj: just state your question
[08:38:06] <coudenysj> I actually created a stack overflow question for it (http://stackoverflow.com/questions/28610445/how-do-i-get-a-list-of-mongodb-documents-that-are-referenced-inside-another-coll) so I can add data examples to it
[08:39:46] <kali> well, you're basically asking for JOIN
[08:46:39] <coudenysj> the data i'm storing in the business documents would be problematic in relational dbs
[08:48:43] <jaitaiwan> @coudenysj: from the youtube video re where the business vs users relation is stored: "why not have both?"
[08:50:57] <coudenysj> like I said, I think this will get me into trouble in the future when writes fail etc.., but I will start to use it, I think :)
[08:53:33] <jaitaiwan> Probably safer to store the relation separately if you're concerned about writes
[08:53:46] <jaitaiwan> and you can force confirmation too
[09:01:03] <coudenysj> I'll have a look at the "double relation" solution, thanks @kali & @jaitaiwan
[10:21:19] <cers> is there a way to control the format of Date objects in mongoexport? I need it to be a unix timestamp instead of something like 2014-12-04T13:23:55.000Z
[11:18:03] <ra21vi> how can I search for all documents containing exactly a given set of array items? For example, searching for A, B should return this doc { participant: [A, B] }, since its field has exactly both items
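One way to express ra21vi's "exactly these items" match, assuming a hypothetical conversations collection: $all requires both values to be present and $size rules out extras; a plain equality match also works if the stored order is known.

    db.conversations.find({ participant: { $all: ["A", "B"], $size: 2 } })
    // or, order-sensitive exact match:
    db.conversations.find({ participant: ["A", "B"] })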
[11:35:40] <arvydas_> hello, if i drop a collection, why isn't its sharding configuration removed as well?
[12:10:14] <arvydas_> how to drop sharded collection together with sharding? if i drop a collection and add collection with same name - sharding reappears
[12:43:06] <arvydas_> after dropping collection, db.collection.stats() returns: "sharded" : true....
[13:28:24] <no-thing_> how do I drop a collection together with its config information? (drop it totally, so no information about sharding remains)
[13:29:02] <flok420> i have this data: http://pastebin.com/XnNaQ75W structure. how do I retrieve the ts/value pairs from all documents with "data_source" == "meminfo" and ts > 123 ?
[13:31:12] <coudenysj> flok420: use the aggregation framework
[13:34:26] <flok420> coudenysj: I tried that but it gives an error I don't understand: http://pastebin.com/8iUR57AX
[13:35:20] <coudenysj> flok420: try to $unwind the param_data first
[13:38:11] <flok420> I added { $unwind : "$param_data" }, but it gives the same error ($gt requiring 2 parameters)
[14:18:48] <StephenLynx> just put _id:0 on the projection block.
[14:21:06] <no-thing_> thanks joannac mentioning config db, just cleaned from there and now everything works fine
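For reference, "cleaning from the config db" usually amounts to something like the following, run against a mongos; the namespace is a placeholder, and hand-editing sharding metadata is risky, so treat this purely as a sketch of what no-thing_ describes.

    var cfg = db.getSiblingDB("config");
    cfg.collections.remove({ _id: "mydb.mycoll" });    // drop the collection's sharding entry
    cfg.chunks.remove({ ns: "mydb.mycoll" });          // and its chunk metadata
    db.adminCommand({ flushRouterConfig: 1 });         // make mongos reload its cached metadata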
[14:28:47] <flok420> ok I've added { $project : { '_id' : 0, param_data : 1 } } and that indeed gives the output I wanted, i.e. only ts/value. unfortunately this doesn't solve the "spec must be an instance of dict" error I get from Python
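Putting the pieces of the flok420 thread together (collection and field names are guessed from context, since the pastebins aren't reproduced here), the shell pipeline looks roughly like this. Note that inside $match the comparison uses the query form { field: { $gt: n } }; the two-argument expression form { $gt: [a, b] } only belongs in stages like $project, and using it in the wrong place is the usual source of the "requires 2 parameters" error.

    db.measurements.aggregate([
      { $match:   { data_source: "meminfo" } },
      { $unwind:  "$param_data" },
      { $match:   { "param_data.ts": { $gt: 123 } } },
      { $project: { _id: 0, ts: "$param_data.ts", value: "$param_data.value" } }
    ])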
[14:38:06] <hashpuppy> can you have a replica set where one node is 64-bit and the other 32-bit?
[14:40:27] <mattyw> hey folks, quick question. If I set an Index with DropDups: true - I can't see anything that says when the duplicate is dropped, it looks like it will happen "at some point" so it's possible for duplicates to exist in the collection for a small amount of time?
[14:41:04] <cheeser> they'll be gone when the index is done building.
[14:43:23] <mattyw> cheeser, isn't the index only built once - so I can write the index to dropdups and then insert duplicates?
[14:43:44] <mattyw> cheeser, I'm just trying to work out the exact differences between dropdups and unique
[14:44:08] <mattyw> cheeser, I did read the docs - but it didn't seem to spell it out as clearly as I'd hoped
[14:47:00] <cheeser> insertions will fail with a constraint violation
[14:47:13] <cheeser> dropdups only makes sense at creation time.
[14:47:33] <cheeser> if you have existing data and now you want to make a field unique, what do you do with duplicates?
[14:48:01] <cheeser> one option, the default, is to fail the index build. the other option is to delete the dupes when you're building the index.
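In index-creation terms (field and collection names are placeholders; note that dropDups was removed in MongoDB 3.0, so this is 2.x-era syntax):

    // fails the build if duplicates already exist:
    db.users.ensureIndex({ email: 1 }, { unique: true })
    // deletes all but one of each set of duplicates while building the index:
    db.users.ensureIndex({ email: 1 }, { unique: true, dropDups: true })
    // either way, once the index exists, inserting a duplicate fails with an E11000 duplicate key error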
[15:05:21] <flok420> I converted this http://pastebin.com/scD481Xs into this http://pastebin.com/jmCQXxFx The original works fine in the mongo commandline client, but the latter returns zero records. What could be the cause?
[15:27:33] <GothAlice> So, marrow.task is back on the table. SERVER-15815 be damned. The wait timeouts just snap to the next largest multiple of ~2.5 seconds instead of being precise. Ah well. ^_^
[15:29:52] <freeone3000> I'm having a problem bringing secondaries in a repl set back up after a changeover. My error is "[repl prefetch worker] uh oh: 90". I can't seem to find documentation on this. Any suggestions?
[15:39:18] <saml> so sometimes magic is not 123987
[15:40:08] <GothAlice> Luckily, with a replica set, fixing corrupted on-disk structures is as simple as nuking the dead node and recreating it.
[15:40:15] <GothAlice> Let replication sort 'em out.
[15:41:28] <saml> freeone3000, what's changeover that you did?
[15:43:28] <freeone3000> saml: We had a server fail due to lack of disk, which since we deploy all the servers the same, caused it to replicate to every other server, filling up their disks, and causing them to fail due to lack of disk.
[15:43:58] <freeone3000> saml: So we've been migrating to much larger storage drive. (20TB instead of 1TB)
[15:44:28] <GothAlice> Uhm. That's bad news bears. Running out of disk == corrupt data, often.
[15:44:59] <GothAlice> Hopefully an index is the only casualty, but you'll need to run a --repair across it for sure.
[15:45:00] <nobody18188181> in mongodb 2.6, how do I get the lock percentage?
[15:45:43] <freeone3000> GothAlice: Yeah, did that. In the middle of that, I'm getting the "uh oh 90".
[15:48:45] <StephenLynx> lol backblaze does not support linux GothAlice
[15:48:51] <StephenLynx> how do you get around that?
[15:48:53] <GothAlice> There's usually no benefit to bashing against a failing replica secondary, just nuke it and have it re-replicate. (This is ill-advised for certain datasets, e.g. ones with massive write loads, where the node may never catch up, or where you have extremely large amounts of data.)
[15:49:05] <GothAlice> StephenLynx: Mount my zfs volume snapshot on my Mac. ;)
[15:49:12] <freeone3000> See above, 1TB data space exceeded.
[15:54:19] <GothAlice> Well, no, mostly it's due to a rental contract that was badly written on the part of the rental agency. ;)
[15:54:46] <GothAlice> Electricity included, but not added to cost of rent = I can exploit that. >:3
[15:55:11] <StephenLynx> I used to value data. then after getting pissed off about having to care about it I just said "fuck it" and I just backup my browser bookmarks and RSA key.
[15:55:12] <saml> probably you have one of the biggest mongodb installations in the world
[15:55:33] <saml> i thought my 95GB was big and hard to manage
[15:55:40] <StephenLynx> 26tb is a lot for a person, but is kind of small when you get to industrial levels.
[15:56:18] <GothAlice> (And remember, with xz/lzma compression, text like HTML pages compresses to < 10% of its original size… uncompressed my dataset is a *lot* larger!)
[15:57:11] <saml> so you have no index? just key-val where value is compressed html files?
[15:57:25] <saml> or did you publish some interesting reports, aggregation, analysis?
[15:59:10] <GothAlice> saml: Nah. There's the GridFS blob storage, compressed. There's the de-duplicated full text index, i.e. storing unique combinations of root words. There's the double and triple word association pairs. The tag neural network for predictive association and the tag synonym lists. NLP for tag prediction. And a metadata collection to store extracted data like EXIF, ID3, etc. It's non-hierarchical, so most searching is done by tag.
[15:59:55] <GothAlice> Plus processing pipelines to do content extraction from different media types, perform computer vision for facial recognition in photos/videos, etc., etc., etc.
[16:02:44] <saml> what's your mongo configuration like?
[16:02:45] <GothAlice> No, without realizing it several of the behaviours I was implementing (NLP for tag prediction being one) were already encumbered.
[16:02:52] <saml> just a replicaset? no sharding? mongos?
[16:03:24] <GothAlice> Two different process pools, one for blob storage, the other for indexed metadata querying. (Needed to isolate memory growth of the blob storage processes.)
[16:03:55] <StephenLynx> wait, so people patented algorithms and you can't implement the concept itself?
[16:04:09] <GothAlice> You can implement to your heart's content. You just can't share without paying up.
[16:04:20] <StephenLynx> that is straight up bullshit.
[16:05:55] <GothAlice> https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/search/model.py#L37-L112 is my original search ranking algorithm, BTW. (Okapi BM-25, the same that Yahoo! and Lucene use.) https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/asset/model/index.py is the index storage model.
[16:06:24] <GothAlice> https://en.wikipedia.org/wiki/Okapi_BM25 < describes the algorithm
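For reference, the BM25 ranking function described there scores a document D against a query Q = q_1 … q_n as follows, where f(q_i, D) is the term frequency, |D| the document length, avgdl the average document length in the collection, and k_1 and b free parameters (commonly k_1 in 1.2–2.0 and b = 0.75):

    \operatorname{score}(D,Q) = \sum_{i=1}^{n} \operatorname{IDF}(q_i)\,
      \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}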
[16:07:51] <GothAlice> Technically covered by https://encrypted.google.com/patents/US7925644
[16:08:41] <GothAlice> Owner name: MICROSOFT CORPORATION, WASHINGTON — damn you, Microsoft!
[16:09:04] <StephenLynx> I might try and release implementations out of spite
[16:15:07] <mrmccrac> i know mongo is supposed to use all the RAM it can on the server, but has anyone seen it invoke oom-killer after using up all possible memory?
[16:15:28] <mrmccrac> or rather oom-killer invoked killing it
[16:15:34] <GothAlice> mrmccrac: Generally, no. MongoDB uses memory mapped files, so the kernel is given free leeway to load and unload chunks as it wishes.
[16:15:46] <mrmccrac> GothAlice: this is with 3.0rc8 wiredtiger
[16:16:05] <GothAlice> However, the oom-killer under situations where _other_ processes are using too much RAM, might prioritize killing the process with the largest allocated pool.
[16:16:46] <GothAlice> Well, I can't really speak to wiredtiger behaviour. :/
[16:17:05] <mrmccrac> let me see if it has its own set of memory options..
[16:17:15] <mrmccrac> i was doing a very high amount of queries/writes
[16:17:46] <GothAlice> Chunks of memory should still only be ephemerally locked from swapping out, so I still can't really believe MongoDB itself would be triggering oom-killer.
[16:19:17] <mrmccrac> it's like it's keeping every doc i query about/write in memory and never releasing them
[16:19:41] <mrmccrac> ya it's definitely the shard processes eating gigs and gigs of mem
[16:20:09] <GothAlice> mrmccrac: Well, the best I can recommend is to grab a spindump/trace while it's churning away and filing that as a private ticket (along with a description of the problem) on JIRA. When in doubt: ask the developers. :)
[16:37:00] <mrmccrac> although by default it should only use half of physical ram
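The "half of physical RAM" figure refers to the WiredTiger cache default in the 3.0 release candidates; if needed, it can be capped explicitly (the value below is arbitrary):

    mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 4
    # or in the YAML config file: storage.wiredTiger.engineConfig.cacheSizeGB: 4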
[16:48:03] <jrbaldwin> can anyone help with this error i'm having with mongo/mongoose http://stackoverflow.com/questions/28621931/mongo-traverse-error-when-updating-object-inside-nested-array
[16:48:19] <jrbaldwin> everyone says the query is right but i'm still having that traversal error
[16:57:46] <mattd_> hey guys, quick ques if anyone has a sec: im just looking to maintain a relationship between 2 documents, so i have a ref object id in each, pointing to the other. there's no transaction support in mongo, so what would be the best way to ensure i can update each document with the reference to the other?
[16:59:17] <mattd_> following the suggestions here: http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ my best bet?
[17:04:56] <mattd_> GothAlice: im trying to link 2 root documents to each other; nesting them within each other isn't an option... looking to do a many to many
[17:05:40] <GothAlice> mattd_: Looks like two-phase is your solution, then.
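A bare-bones sketch of that pattern for mattd_'s case, with hypothetical collection and field names: the link is written in a recoverable "pending" state first, so a crash between the two updates leaves something a cleanup pass can find and either finish or roll back.

    // aId and bId are the _ids of the two already-inserted documents
    db.left.update(
      { _id: aId },
      { $set: { rightRef: bId, linkState: "pending" } }
    );
    db.right.update(
      { _id: bId },
      { $set: { leftRef: aId } }
    );
    db.left.update(
      { _id: aId, linkState: "pending" },
      { $set: { linkState: "done" } }
    );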
[17:06:07] <GothAlice> mattd_: As a note, if you're storing graphs, it's generally better to use a real graph database. (Right tool for the right job.)
[17:06:20] <mattd_> GothAlice: yea I hear you, might be headed that way
[19:20:47] <GothAlice> MacWinner: I'm working on a cMS at work. It's Python, though.
[19:21:18] <GothAlice> cMS having a lower c intentionally: it's a component management system. Acting like a CMS (content) is just one of the things it does.
[19:55:00] <kirby> so I'm trying to access server data from a javascript program - connected to the server fine - but none of the .js commands mongo has in their documentation are recognized as being methods in the db class
[20:51:18] <MacWinner> i see some solutions using capped collections and tailable cursors for pubsub type of functionality. Is there a way to do pubsub where only one subscriber gets a message at a time? If i have 4 servers which have subscribers listening, I only want 1 of them to process at a time.. am I going to need the subscriber to somehow lock the document?
[20:51:28] <MacWinner> any pointers to example implementation of this?
[20:52:27] <cheeser> use findAndModify to grab a document out of the collection and mark it as "in process" or something.
[20:52:39] <cheeser> each accessing thread would get a different document then
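A sketch of cheeser's suggestion, assuming a hypothetical jobs collection and a workerId variable; findAndModify claims one document atomically, so two workers can't both grab the same job.

    db.jobs.findAndModify({
      query:  { state: "new" },
      sort:   { _id: 1 },                                // oldest first
      update: { $set: { state: "processing", owner: workerId, claimedAt: new Date() } },
      new:    true                                       // return the claimed (modified) document
    })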
[20:57:35] <GothAlice> MacWinner: You could add a random integer field to each message in the capped collection. Each subscriber can then use a query on the tailing cursor which selects for that field modulo the number of workers == that worker's ID.
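A sketch of GothAlice's partitioning idea in shell terms, assuming each message in the capped collection carries a random integer field (here called routing) and each worker knows NUM_WORKERS and its own WORKER_ID:

    var cursor = db.task_messages
      .find({ routing: { $mod: [NUM_WORKERS, WORKER_ID] } })
      .addOption(DBQuery.Option.tailable)
      .addOption(DBQuery.Option.awaitData);

    while (cursor.hasNext()) {
      handleMessage(cursor.next());   // hypothetical handler
    }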
[20:58:48] <GothAlice> MacWinner: It actually sounds like you're working on something very similar to what I'm doing right now: https://gist.github.com/amcgregor/4207375
[20:59:40] <GothAlice> cheeser: findAndModify is dubious. The docs state you get the record prior to any modifications you define, and my driver warns me that it's deprecated. :/
[21:00:09] <GothAlice> MacWinner: I also use time-to-live indexes to have MongoDB automatically prune old data.
[21:00:28] <GothAlice> (So no need for manual clean-up, nor extra threads, etc.)
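A TTL index of the kind GothAlice mentions, on a normal (non-capped) collection with a hypothetical createdAt field; the server's TTL monitor removes documents roughly expireAfterSeconds after that timestamp, so nothing has to prune them by hand. (TTL indexes don't apply to capped collections, which discard old documents by size instead.)

    db.task_state.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 604800 })   // ~one week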
[21:00:41] <MacWinner> GothAlice, i see.. but if one of the workers is dead or busy processing, then the next thing won't get processed?
[21:00:43] <cheeser> GothAlice: it works. we use it here and i've used it past gigs.
[21:01:45] <GothAlice> MacWinner: My use case (and example link above) has each subscriber with a thread pool which queues up jobs to work on. Also, if you don't have process monitoring that can restart dead workers, you have more problems than having jobs skipped.
[21:02:08] <fewknow> MacWinner: why can't you use an actual Message service rather than hacking Mongo?
[21:02:57] <GothAlice> fewknow: "hacking Mongo" — the code I linked can process 1.9 million distributed RPC calls per second (benchmarked with two producers, five consumers). MongoDB uses capped collections internally to handle replication. So yeah. Less of a hack.
[21:03:09] <MacWinner> fewknow, i was trying to standardize on mongo as much as possible.. i have redis setup which has some nice pubsub stuff as well
[21:04:19] <GothAlice> MacWinner: If you include worker start/stop messages on that message bus, then they can even automatically reconfigure themselves during runtime.
[21:04:27] <fewknow> GothAlice: it is a hack...you are turning a database into a messaging service. Why reinvent the wheel? Could you not use a queue/message service to get the same results?
[21:04:30] <Chipper351> I am looking for someone I can talk with that has experience with setting up MongoDB over Multi-Sites and Multiple Locations (All Active). I am having trouble putting together a picture of how this would work and would really appreciate it if someone would be able to talk for a few minutes.
[21:04:40] <GothAlice> fewknow: It's already a messaging service.
[21:05:29] <GothAlice> fewknow: http://docs.mongodb.org/manual/tutorial/create-tailable-cursor/ for the API, http://docs.mongodb.org/manual/core/replica-set-oplog/ for MongoDB's own use of it.
[21:05:46] <medmr> thats a little like saying android is a web browser... i mean tools can be overkill/too generic for a job at hand
[21:06:03] <fewknow> a tailable cursor doesn't make a messaging service
[21:06:53] <GothAlice> That's pretty much the only capability you need to implement one, so I don't grok the distinction you are trying to make fewknow.
[21:07:50] <GothAlice> The fact that tailable cursors are standard find queries makes it even better.
[21:08:17] <fewknow> I would argue a full messaging service has management of failure, time-delayed messages, and other functionality within a complete service. Using a database with a tailable cursor is not a service.
[21:09:36] <fewknow> if all you need is to tail a file then you could use tail -f on a text file and just write to it
[21:10:13] <MacWinner> if all you need is tail -f, then just write to mongo and use tailable cursor :)
[21:10:21] <fewknow> I wasn't saying you can't use mongo for a PUB/SUB...i just think there are better built tools to use already
[21:10:26] <MacWinner> then you can tail -f across servers!
[21:10:39] <GothAlice> Management of failure (i.e. watchdog keeping workers alive) is outside the scope of the message bus itself. Time delay is pretty easily implementable. (The RPC system I linked allows scheduling tasks at arbitrary times.)
[21:11:32] <MacWinner> GothAlice, what was the issue with findAndModify you mentioned?
[21:11:38] <GothAlice> "Better built" is highly subjective. And just because something exists doesn't mean it's automatically a better solution. For light-weight messaging, one doesn't need third-party tools at all.
[21:11:53] <GothAlice> MacWinner: The docs themselves state that you get the original, unmodified document back. That limits the usefulness.
[21:13:03] <bybb> I have weird results with this query db.orders.find({}, {state: {$slice: -1 }})
[21:13:17] <GothAlice> https://github.com/marrow/task/blob/develop/marrow/task/message.py is an example of a message bus model, with message schemas that make sense for RPC, as an example.
[21:13:46] <bybb> Could you tell me what it is supposed to do? Because it doesn't do what I thought.
[21:19:56] <bybb> Shouldn't I get just the last array's item and nothing else?
[21:20:02] <MacWinner> cool.. thanks for all the tips! very appreciated
[21:21:31] <MacWinner> fewknow, stumbled on this article: https://blog.serverdensity.com/replacing-rabbitmq-with-mongodb/
[21:22:15] <bros> An item has one or many barcodes. How can I model this with Mongo? I have barcode_id and item_id.
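One straightforward shape for bros's one-to-many case, with hypothetical names: embed the barcodes as an array on the item and put a multikey index on it so lookups by any barcode stay fast.

    db.items.insert({ name: "widget", barcodes: ["0123456789012", "9876543210987"] })
    db.items.ensureIndex({ barcodes: 1 })            // multikey index over the array
    db.items.findOne({ barcodes: "0123456789012" })  // matches if any element equals the value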
[21:22:23] <bybb> Well apparently it's the normal behavior https://www.safaribooksonline.com/library/view/mongodb-the-definitive/9781449344795/ch04.html
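That is indeed the documented behaviour: $slice on its own keeps every other field in the result, and trimming the output means combining it with explicit inclusions (field names here are guesses at bybb's schema).

    db.orders.find({}, { state: { $slice: -1 } })                // last state entry, plus all other fields
    db.orders.find({}, { customer: 1, state: { $slice: -1 } })   // only _id, customer, and the last state entry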
[21:23:08] <GothAlice> MacWinner, fewknow: The presentation I gave on using MongoDB as dRPC, that code replaced RabbitMQ and ZeroMQ. Additionally, on that project, MongoDB replaced Memcache/Membase (using TTL indexes) and Postgres.
[21:23:23] <GothAlice> When I joined the project, yes, they really were using all of that—and more.
[21:23:48] <MacWinner> yeah.. i just need to simplify my life.. we have a small team and don't want to manage too many components
[21:24:28] <MacWinner> our load is not high and grows at a controlled pace
[21:24:31] <GothAlice> MacWinner: Just remember to give yourself room to grow (and have monitoring!) on the size of the capped collection.
[21:24:55] <GothAlice> For bachelor we pre-allocated an 8GiB capped collection to handle a theoretical one million simultaneous users playing the game.
[21:25:27] <GothAlice> (With a requisite one week history being preserved.)
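Pre-allocating a capped collection of that size looks like this (the name is a placeholder; size is in bytes, and the oldest documents are overwritten once the ceiling is reached):

    db.createCollection("task_messages", { capped: true, size: 8 * 1024 * 1024 * 1024 })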
[21:30:29] <GothAlice> MacWinner: The SD blog post's final point about scaling is avoided in marrow.task because I store the (locking) task data in a normal collection already, and use the capped collection purely for messaging.
[21:31:00] <MacWinner> GothAlice, i see.. cool.. thanks!
[21:54:44] <blizzow1> Anyone here have experience integrating solr/elasticsearch/sphinx with mongo? Everything I see seems to mention mongo-connector and all the articles about that seem to say it's replicating information. I don't really want to duplicate 1.3TB of mongoDB data just to search quickly. Am I misinterpreting the scope of mongo-connector's replication?
[22:53:33] <jrbaldwin> anyone know where to start looking to solve this issue? it feels like a write collision but i'm not sure: