[00:01:53] <meisth0th> i am using mariadb commonly, i want to give mongodb a try. i am planning to make http requests to some sites and storing results in a database, is mongodb right for this?
[01:53:39] <mocx> hi, i rename a collection using the following command, db.mycollection.renameCollection({renameCollection: 'mycollection', to: 'items'}), now i have a collection called [object Object]
[01:54:03] <mocx> i'm completely dumbfounded on what to do lol
[01:54:13] <GothAlice> mocx: That's because the argument to db.<collection>.renameCollection is the name of the collection, and you passed in an object. ;)
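A quick shell sketch of the fix, assuming the data really did end up under a collection literally named "[object Object]":

    // What the rename should have looked like — the argument is just the new name:
    db.mycollection.renameCollection('items')

    // Recovering from the accidental rename: the oddly-named collection can still
    // be addressed via getCollection(), then renamed to what was intended:
    db.getCollection('[object Object]').renameCollection('items')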
[03:06:24] <GothAlice> There are free usage tiers for several of the components, from provisioning (fewer than 8 servers) to backup (less than $5/mo is free?)
[03:07:38] <GothAlice> FiftyFivePlus: As a note, since MongoDB uses MongoDB internally for configuration, monitoring, and auditing, we rolled our own tools pretty easily. For various definitions of "pretty easily". ;)
[03:09:14] <FiftyFivePlus> sounds great for development and someone without hope of making it
[03:12:04] <GothAlice> FiftyFivePlus: Here's one tidbit we use in development, to help rapidly test different sharding strategies: https://gist.github.com/amcgregor/c33da0d76350f7018875 < spawns a 2x3 sharded replica set with authentication enabled on a single host and manages start/stop/rebuild of same. (Just an example. Automation doesn't have to be pretty. ;)
[03:12:30] <GothAlice> (And I'm a huge fan of light-weight command-line utilities instead of large webapps.)
[03:12:36] <FiftyFivePlus> to be clear, i also love the fact that there is a commercial fee-for-service arm to mongodb because many customers would not use it without such services being there ... however, it's a minefield trying to weave through all this stuff and to try to come up with purely free open source tools that one could weave into their knitting
[03:12:58] <GothAlice> Yeah. Part of the reason we rolled our own. :/
[03:13:50] <GothAlice> (Also, if you want to learn how a database works, there's nothing quite like diving in to automate it. I'm a "learn by tearing things apart" kinda gal.)
[03:15:02] <FiftyFivePlus> omg, you're a lady ... i'm sorry for sounding disrespectful ... thanks
[03:16:49] <GothAlice> And I happen to agree with you. There should probably be a better organized "marketplace" of tools, with a clear separation or way of filtering to free options, with reviews.
[03:16:55] <GothAlice> 'Cause, that'd be awesome and help a lot.
[03:18:53] <FiftyFivePlus> especially for us 55+ who started out on bubble cards ;)
[03:18:59] <GothAlice> cheeser: Please, just do a better job than 99% of paywalls (where right-clicking, inspecting, and deleting the popover DIV is sufficient to give you access… like most news sites…)
[03:19:19] <GothAlice> FiftyFivePlus: I had a TI calculator that took punch cards…
[03:20:12] <FiftyFivePlus> our stacks of bubble cards got collected once a week and taken for processing ;)
[03:20:29] <GothAlice> cheeser: Or Quora. ("You need to register to continue reading answers." No I don't, you silly website. ;)
[03:23:16] <FiftyFivePlus> where do i find some performance tuning info, just to get an idea of what the sweet spots might be in terms of memory configurations, etc?
[03:24:19] <GothAlice> FiftyFivePlus: There are various posts with specific use cases and different scenarios, such as http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework (which covers different ways to store data, including pre-aggregation, and the effect it has performance-wise and in terms of storage)
[03:25:06] <GothAlice> FiftyFivePlus: Because performance is so closely tied to how you use a system, and any optimization without measurement is by definition premature, generally it's advised to do your modelling in a way which is "most natural" up-front (most obvious, easiest to grok, etc.), benchmark, then optimize the pain points.
[03:29:33] <GothAlice> FiftyFivePlus: As a more concrete example, even using MongoDB as if it were a relational database (by faking the joins across collections by making extra queries) is a starting point. It's relatively easy to adapt from there to arrays of embedded documents for the cases that would most benefit from it.
[03:30:08] <GothAlice> (For example, replies to threads in a forum can naturally be expressed both ways. Embedding is more optimal for the majority of queries, but it has an edge case you must handle (a thread that "fills up" and hits the 16MB document size limit), so it's a trade-off in terms of complexity.)
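A rough shell sketch of the two shapes being weighed here; the collection and field names are invented for illustration:

    // Referenced, relational-style: one document per reply, "joined" by a second query.
    db.threads.insert({_id: 1, title: "Schema design"})
    db.replies.insert({thread: 1, author: "alice", body: "Embed it."})
    db.replies.find({thread: 1})          // extra query per thread

    // Embedded: replies live inside the thread document itself.
    db.threads.insert({_id: 2, title: "Schema design", replies: []})
    db.threads.update({_id: 2}, {$push: {replies: {author: "alice", body: "Embed it."}}})
    db.threads.find({_id: 2})             // one query, but bounded by the 16MB limit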
[03:33:57] <FiftyFivePlus> an aside ... one of the things that i need to figure out how to do (being new to web dev, just call me Mr. WEB 0.2 :) I need to figure out how to search documents with a regular search and a more constrained one and then identify the key/values/attributes that caused a given document to be excluded by the more constrained search
[03:34:38] <GothAlice> FiftyFivePlus: That sounds like a rather hairy aggregate query to me.
[03:35:12] <GothAlice> (Since you effectively need to include excluded results in the final result set, there will be a fair number of $if/$cond expressions. :/ )
[03:37:26] <FiftyFivePlus> i was looking at elasticsearch for a while and it seemed that i could construct complex searches for the more constrained one (their searches are json docs) the trick is then to evaluate which of the "pieces" of the search json caused exclusions
[03:39:25] <GothAlice> IDF is the typical "relevance" ranking, but you need EAV-based search. Hmm.
[03:40:42] <FiftyFivePlus> not sure, but it's not traditional ranking where docs are ranked against each other
[03:41:21] <FiftyFivePlus> IDF, EAV, yeah that's right ;)
[03:43:05] <dreese> That’s the answer I was looking for!
[03:43:08] <FiftyFivePlus> no alice, hershey is looking for 14 year olds ;)
[03:43:52] <FiftyFivePlus> what are IDF and EAV ;)
[03:44:22] <GothAlice> IDF - Inverse Document Frequency, relating to the statistical occurrence of terms in documents. Okapi BM-25 is one algorithm for this, used by Lucene, Sphinx, and Yahoo.
[03:45:11] <GothAlice> EAV - Entity Attribute Value, where you have discrete "types" of properties (attributes) and, in terms of search, the queries tend to be more boolean, but target attributes rather than the document's full-text in general.
[03:45:23] <GothAlice> EAV is a pretty typical model used in relational SQL databases to pretend to be MongoDB. ;)
[03:46:32] <GothAlice> (*Very* frequently used in the medical market.)
[03:46:39] <FiftyFivePlus> this is more like searching for music videos and getting a big list and then doing a more constrained search that shows under which constraints some did not make it ... as in "not interested in punk rock or those recorded before whatever or whatever"
[03:48:15] <GothAlice> Hmm, the way we did that type of search (with explanation) in our app at work was to pre-calculate the answers to the queries. (I.e. we pre-search every possible search and store the results, with explanations. It sounds terrible, but there's only a hundred or so possibilities per user.)
[03:48:24] <FiftyFivePlus> actually the user would get the short list and the list of exclusions and then they might look at specific ones to see why they got excluded
[03:48:40] <GothAlice> It did involve an unfortunate amount of slurping data into our Python app for further processing.
[03:48:44] <GothAlice> That makes it a bit easier.
[03:49:22] <GothAlice> In the initial pass, you don't need to worry about why. When the user asks why, you can then load the record, repeat the "search" within your application (rather than at the database level) and record the rationale for delivery back to the user.
[03:49:23] <FiftyFivePlus> so the most efficient way is to run the two searches and then to be able to on demand explain any one in the delta
[03:53:25] <FiftyFivePlus> exactly ... still there are two issues ... first, it would be nice if one could feed a document to a search to see if the given document would get returned by the search, without going to the database with it ... the trick is then in the second point, to be able to write javascript that would precisely identify why the db search excluded a document - this is where i was hoping for something being there in a built-in db search "explain"
[03:53:38] <GothAlice> (So, basically, run the initial query returning all results prior to exclusion, get the results, then perform the second-level exclusions within your app; that way you only need to write the "explanation" code once, and when merely displaying the results you just throw the rationale away.)
[03:54:58] <GothAlice> You need to load the records (at least, part of the records) anyway to display the exclusions, so bulk processing within your app for those exclusions should be nearly indistinguishable, performance-wise.
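A very rough shell-JavaScript sketch of that flow; the collection, fields, and criteria are all invented:

    // 1. Broad query: everything that matches the relaxed search.
    var broad = db.videos.find({artist: "Some Artist"}).toArray();

    // 2. Apply the stricter criteria in the app, recording why each document fails.
    var kept = [], excluded = [];
    broad.forEach(function (doc) {
        var reasons = [];
        if (doc.genre === "punk") reasons.push("genre: punk");
        if (doc.year < 1990) reasons.push("recorded before 1990");
        if (reasons.length) excluded.push({doc: doc, reasons: reasons});
        else kept.push(doc);
    });
    // Show `kept`; each entry of `excluded` already carries its own rationale.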
[03:56:18] <FiftyFivePlus> if you wrote the app it might even be faster, but me, with a cane in hand ;)
[03:57:09] <GothAlice> For example, here is a parallel Okapi BM-25 (IDF, so not useful for you alas) search ranking algorithm in Python, loading data from MongoDB. (It's fast enough to rank ~2000 multi-megabyte documents in about a second.)
[03:57:46] <FiftyFivePlus> so, finally, i gather that there is nothing built-in that does just that, that explains which key/values played a role in a [de]selection
[03:57:48] <GothAlice> Luckily I had to write this in the days before MongoDB included full-text search as a feature, so now I can do this in-db. ;)
[03:58:02] <GothAlice> Alas, no. You could certainly build that up within an aggregate query, though.
[03:59:15] <FiftyFivePlus> how does an aggregate query help me in identifying that year of publication was the cause of an exclusion
[04:00:08] <GothAlice> Because you can write your individual criteria as individual $cond expressions, building up a "rationale" list with tags (numbers, strings, whatever) explaining if the document matched or didn't. :)
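A minimal aggregation sketch of that idea; the fields and criteria are invented, but the shape is one $cond per criterion feeding a rationale list:

    db.videos.aggregate([
        {$match: {artist: "Some Artist"}},             // the broad search
        {$project: {
            title: 1,
            rationale: {$setDifference: [[             // keep only the failure tags
                {$cond: [{$eq: ["$genre", "punk"]}, "genre: punk", null]},
                {$cond: [{$lt: ["$year", 1990]}, "published before 1990", null]}
            ], [null]]}
        }}
        // documents whose rationale array is empty survived the stricter search
    ])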
[04:28:56] <srruby> Say a user has a manager. The user records hold lots of different info. If I include manager_id in the user document, then I don't get the manager's name without an additional query. Would it make sense to include manager_name as well as manager_id ?
[04:29:34] <GothAlice> srruby: It can. Such arrangements are pretty common. (I.e. store {manager: {ref: ObjectId(…), name: "Bob Dole"}})
[04:30:05] <GothAlice> You just need to remember to .update() those values when the referenced record is updated. (Certain ODMs, such as MongoEngine, can automate this task. CachingReferenceField FTW. ;)
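A small shell sketch of that pattern; the collection, field names, and the bobId variable are illustrative:

    // Store the reference and a cached copy of the display name side by side:
    db.users.insert({name: "Alice", manager: {ref: bobId, name: "Bob Dole"}})

    // When the manager's own record changes, refresh every cached copy:
    db.users.update({"manager.ref": bobId},
                    {$set: {"manager.name": "Robert Dole"}},
                    {multi: true})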
[04:31:05] <srruby> GothAlice: Awesome. That is what I thought. Is this called denormalization in the mongodb world?
[04:33:34] <GothAlice> Object Document Mapper, the MongoDB equivalent of an ORM (Object Relational Mapper).
[04:34:09] <GothAlice> Something that can enforce schemas and generally offers abstractions and client-side implementations of some common, but missing features.
[04:43:15] <srruby> GothAlice: I realize that the denormalization requires an update. I'm looking at a Clojure implementation, which doesn't seem to have as full-featured an ODM. Does FTW mean you recommend using an ODM or not ?
[04:44:03] <GothAlice> Generally yes, though an async version of the standard JS driver seems to trump most JS ODMs.
[04:46:27] <GothAlice> (It was more that the existence of a "reference field" that does the caching and can automatically register signal handlers to do the updates for you is pretty awesome.)
[04:47:04] <GothAlice> (Considering MongoDB doesn't have signal handlers / triggers.)
[05:26:22] <cheeser> it's why i write java code. :)
[05:26:49] <GothAlice> I'm abusing Python's "operator" module, so I can make some pretty simple mappings like: {"$or": {None: operator.__or__, list: any}, …}
[05:27:23] <GothAlice> {"$not": {None: operator.__not__}, …} Etc.
[05:27:46] <FiftyFivePlus> but it needs to go a step further - to zero in onto matching sub condition(s) within a complex condition ... ie - matched on this $and as well as this $or , etc
[05:28:28] <GothAlice> (A & B) | C, A & (B | C), etc.
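The mappings above are Python; as a very loose JavaScript analogue of the same idea (evaluating a query document inside the application and tagging which top-level sub-conditions matched), one could sketch something like the following, equality-only and purely illustrative:

    function explain(query, doc) {
        function allMatch(sub) {
            return explain(sub, doc).every(function (r) { return r.matched; });
        }
        return Object.keys(query).map(function (key) {
            var cond = query[key], hit;
            if (key === "$or")       hit = cond.some(allMatch);
            else if (key === "$and") hit = cond.every(allMatch);
            else                     hit = (doc[key] === cond);   // bare equality only
            return {condition: key, matched: hit};
        });
    }

    explain({$or: [{genre: "punk"}, {year: 1985}]}, {genre: "jazz", year: 1985});
    // → [{condition: "$or", matched: true}]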
[05:30:23] <FiftyFivePlus> have you seen any usable user interface that would enable end users to construct their own complex .find
[05:31:03] <GothAlice> FiftyFivePlus: Alas, I have. I say alas because we wrote a complete in-DB reporting engine for MongoDB. (Technically it's a GUI and method of storing aggregate queries.)
[05:31:10] <FiftyFivePlus> maybe a better question is whether there are any end user json doc editors
[05:31:32] <GothAlice> I didn't find many that stood out as any more useful than a programmers' editor.
[05:32:03] <GothAlice> A few tree-based things, but the GUIs weren't keyboard-friendly, and I loathe touching the mouse. (So inefficient. ;)
[10:46:00] <iksik> hi guys... i have two servers: server A (inside of private network where data is updated/inserted, with access to server B) and server B (in public network without access to server A - which serves data to local apps) - what would be the best way of syncing data between servers?
[15:15:49] <Quuik> Hello, can anybody tell me what the mongodb query format is called? I would like to be able to match javascript objects the same way
[16:01:26] <arunpyasi> hi guys, can I know how we can install rockmongo ?
[16:01:35] <arunpyasi> I got that the site is down ..
[16:01:44] <arunpyasi> will it be up again in future ?
[18:12:38] <roadrunneratwast> using mongoose, is there a way to ensure that the entire document is not a duplicate? it's okay if any of the fields are non-unique. but i want to safeguard against populating the db twice, etc
[18:14:24] <GothAlice> roadrunneratwast: The "_id" field is always unique.
[18:14:40] <GothAlice> Typically it is populated (by default) with generated ObjectId instances.
[18:15:33] <GothAlice> However, you can save whatever data you want there. A typical pattern is to store objects with multiple keys. I.e. {_id: {username: 'GothAlice', token: '…'}, …}
[18:16:13] <GothAlice> (But remember, when querying an object like that, order matters. {token: …, username: …} won't match the reverse, even if the values are the same.)
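A quick shell illustration of that ordering caveat; collection and values are invented:

    db.sessions.insert({_id: {username: "GothAlice", token: "abc"}})

    db.sessions.find({_id: {username: "GothAlice", token: "abc"}}).count()   // 1
    db.sessions.find({_id: {token: "abc", username: "GothAlice"}}).count()   // 0 — same values, different key order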
[18:16:35] <roadrunneratwast> but if you are inserting {"foo": "bar"} into your db, it doesn't have an id yet
[18:23:27] <omnidan> already asked in #doctrine but there don't seem to be many active people there: I'm using doctrine mongodb and after altering dbrefs in the db (migration) it doesn't work anymore and fails with a compile error, I tried clearing the cache and regenerating proxies, didn't work. When I clear the whole database and create new objects it works fine, but when I migrate the old ones it doesn't, although the only change I made in the old ones was moving some db references, could this confuse doctrine in a way and what can I do to debug and fix this?
[18:23:42] <roadrunneratwast> there must be a way to create a unique id based on the input data. you could successively hash the fields. but this is probably not an optimal solution.
[18:23:46] <roadrunneratwast> thanks for the tip, though
[18:23:57] <GothAlice> (And again, you can specify whatever "_id" you want, and it will be enforced as unique. This means you can hash however you wish. I typically store the binary SHA256 of the data encoded in BSON.)
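A rough Node-style sketch of that dedup-by-hash idea; it assumes the standalone `bson` package (v4+ API), a callback-style driver, and a `collection` handle obtained elsewhere, so names and API details are illustrative:

    var crypto = require('crypto');
    var bson = require('bson');

    var doc = {foo: 'bar'};
    // Hash the BSON encoding of the document and use the digest as its _id:
    doc._id = crypto.createHash('sha256').update(bson.serialize(doc)).digest();

    collection.insertOne(doc, function (err) {
        // Inserting identical content twice fails with a duplicate key error (E11000),
        // which is the safeguard against populating the database twice.
    });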
[18:33:17] <omnidan> GothAlice: so does it look fine to you?
[18:33:45] <GothAlice> sunoano: Not sure if you're actually here, but you may wish to set your IRC client to wait for nickserv identification before joining channels. (You have a mask, but you join before it is applied.)
[18:35:32] <GothAlice> omnidan: I'm playing with it in my own mongo shell. ^_^
[18:36:09] <omnidan> basically what I'm doing is moving the dbref from authorized_ids -> assigned_ids to assigned_ids -> authorized_ids
[18:36:15] <GothAlice> omnidan: Is authorized_ids self-referential?
[18:39:41] <omnidan> and { "_id" : ObjectId("5429428da32844b864b2d720")}
[18:39:44] <GothAlice> Are you accessing this data via a schema or model in your application layer? If you are, it needs to represent the "after" state, here.
[19:21:30] <hrrld> I'm on v2.6, and I'm using the mongodb node driver. I'm doing: db.collection("foo").find({bar:{$exists: false}}).limit(100).toArray(...) and the array I'm getting back has ~286 elements in it. Is this weird, or am I reasoning about `.limit()' wrong (I expect to get an array with 100 elements)?
[19:54:08] <dreese> hrrld: also, try limiting to a smaller number to see what you get out of toArray, like 2
[19:56:35] <hrrld> dreese: ...limit(100).count() gave the size of the entire query (>100,000)
[19:57:37] <hrrld> dreese: I just noticed that maybe .limit(...) takes a callback... But it also seems to return a cursor... Going to experiment with the callback,
[19:59:16] <dreese> hrrld: i’m not familiar with the node driver, just going by what i know from the mongo shell and a couple other drivers. limit() taking anything but a number is something i have not seen.
[19:59:56] <hrrld> dreese: understood. I think the node driver is sorta designed to resemble the shell, but there are many 'interesting' differences. ;)
[20:40:27] <hrrld> dreese and kexmex: thanks for your help, I think it has something to do with batch-sizes, but I'm getting good throughput right now, so I'm just going to let it run its course. Take care.
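For reference, a hedged sketch of how limit(), batchSize(), and count() interact in the Node driver; exact behavior can vary by driver version:

    var cursor = db.collection('foo').find({bar: {$exists: false}}).limit(100);

    // count() ignores skip/limit by default; pass true to have them applied:
    cursor.count(function (err, n) { /* n = all matching documents */ });
    cursor.count(true, function (err, n) { /* n = at most 100 */ });

    // batchSize() only controls how many documents arrive per network round trip,
    // not how many are returned in total:
    cursor.batchSize(50);

    cursor.toArray(function (err, docs) {
        // docs.length should be at most 100 here
    });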
[20:53:36] <tpayne> What's up guys. I'm having trouble grouping something (I don't even know if it's possible but)
[20:53:57] <tpayne> I wanna know how to write a mongo query that returns them grouped up, based on whether they're adjacent or not. The results would be an array of 5 arrays.
[20:54:19] <tpayne> I pasted an example here: http://pastebin.com/jZaAu8Yr
[21:06:34] <iksik> hi again... i have two servers: server A (inside of private network where data is updated/inserted, with access to server B) and server B (in public network without access to server A - which serves data to local apps) - what would be the best way of syncing data between servers?
[21:10:31] <iksik> currently i'm using mongodump/restore but it's just replacing instead of syncing collections - as far as i can see
[22:28:52] <ParisHolley> when using wiredtiger and compress, do I need to optimize the working set size in memory based on compressed or uncompressed size?