[01:29:46] <Freman> I've convinced the dev that caused this morning's issues to stop querying for unindexed fields on 217609296400 byte collections that don't even exist
[02:16:34] <dman777> HarryKuntz: I don't really see an advantage w/ mongoose over vanilla mongo.
[02:17:06] <dman777> HarryKuntz: I've spent some time with mongoose and I think I prefer vanilla mongo. but I haven't spent enough time yet with vanilla mongo to say for sure.
[03:26:26] <SnarfSnarf> Hey all! I'm trying to build a room rental app as a class project and am having trouble thinking through the schema. I have a room schema that contains the location, an id and the name of the room. I'm having trouble thinking about how to integrate the schedule part. Any suggestions?
[05:43:59] <dman777> SnarfSnarfSnarf: if it's in the same context of the room, keep it as a single embedded document
[05:44:36] <SnarfSnarfSnarf> dman777: Wouldn't it quickly get out of hand to have it include every time the room is reserved? Especially considering reservations up to a week in advance
[05:45:20] <dman777> SnarfSnarfSnarf: not if every room has its own schedule
[05:45:55] <SnarfSnarfSnarf> So I'd have name, location, (all the dates where it's taken?)
[05:46:02] <SnarfSnarfSnarf> and then append to that?
[05:46:04] <dman777> SnarfSnarfSnarf: but really, it sounds more like a relational database might be more appropriate in my opinion.
[05:46:43] <dman777> with small bits of fragmented data out of context.... rooms and schedules
[05:47:10] <dman777> but that's just in my view....others might have a better contrasting opinion
[05:47:29] <dman777> the whole point of mongo is to have as few joins as possible, so you want data in the same context in single documents
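A minimal sketch of the embedded approach dman777 is describing, one document per room with reservations inline (field names and the overlap check are illustrative assumptions, not from the discussion):

```javascript
// One document per room; reservations live inside the room itself,
// so checking availability is a single query with no joins.
const room = {
  _id: "room-101",
  name: "Conference Room A",
  location: "Building 2, Floor 3",
  // Each reservation is an embedded sub-document in an array.
  schedule: [
    { start: new Date("2016-03-01T09:00:00Z"), end: new Date("2016-03-01T10:00:00Z"), reservedBy: "alice" },
    { start: new Date("2016-03-02T13:00:00Z"), end: new Date("2016-03-02T14:30:00Z"), reservedBy: "bob" },
  ],
};

// Adding a booking would just be a $push onto the embedded array, e.g.:
//   db.rooms.updateOne({ _id: "room-101" }, { $push: { schedule: booking } })
function isFree(room, start, end) {
  // The room is free if no existing reservation overlaps [start, end).
  return room.schedule.every((r) => end <= r.start || start >= r.end);
}
```

With reservations capped at a week out (and pruned once they pass), the array stays small, which is what keeps this from "getting out of hand".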
[05:51:38] <SnarfSnarfSnarf> dman777: The idea behind using Mongo was to be able, in the future, to have general locations around say a city or state, with geo data. I figured Mongo's find would be really useful in that regard for, say, searching for rooms 8 km away that are open; I'm just really stumped on the open part
[05:52:48] <dman777> it sounds like a lot of relational data to me
[08:25:49] <alameda_> Hi, I'm having trouble with sorting. I have documents that looks like this: https://gist.github.com/daniel/ea24a0256e126f5b1def. But when I run .sort({order:1}) it sorts using the order field in the items array.
[08:52:21] <m3t4lukas> alameda_: you sure? that shouldn't be possible :/
[08:53:07] <alameda_> you're right, I just discovered it doesn't sort on the order field in the items array
[14:00:21] <litecandle> Is there a general rule of thumb when it comes to sorting an array of objects? e.g. is it better to do it server side instead of using aggregate to unwind and sort?
[14:01:30] <StephenLynx> that depends on your priorities.
[14:02:18] <StephenLynx> and how you intend to use the data you are reading
[14:02:46] <StephenLynx> if you want to run a db operation AFTER the sorting, doing that on application code won't cut.
[14:03:56] <litecandle> Hmm - well I have an array of user events that I'd like to sort by date. Thing is the user gets to add the events manually so it's unordered.
[14:04:29] <StephenLynx> you need to run db operations after the aggregation?
[14:04:41] <litecandle> Nope this is just for a view
[14:04:50] <litecandle> Well, an index page for the resource.
[14:05:18] <StephenLynx> IMO, sorting on application code is better.
[14:05:51] <litecandle> That's what I was leaning towards.
[14:07:54] <StephenLynx> it will be much faster than an unwind, sorting and grouping.
[16:34:40] <echelon> i think the use of ISODate() is what's causing the exception.. BSON representation of supplied JSON array is too large: code FailedToParse: FailedToParse: Bad characters in value
[16:35:09] <echelon> how do i make a field using ISODate json-friendly?
[16:40:57] <echelon> ok, i found this.. http://grokbase.com/t/gg/mongodb-user/1247c1x2q6/isodate-exception-shell-vs-mongoimport
[16:47:26] <echelon> GothAlice: thanks for your help yesterday, i think you unknowingly alluded to the solution yesterday
[17:00:36] <GothAlice> echelon: Well, it also didn't look like a JSON array: there were no preceding [ and trailing ] markers to indicate such. There were just objects ({}) that were comma separated.
[17:00:43] <GothAlice> I.e. the whole thing didn't look like JSON to me.
[20:34:40] <cheeser> GothAlice: oof. that must've been fun to write and debug
[20:34:53] <GothAlice> It explodes pseudo-randomly, complaining that the i.$ projection is mis-matched.
[20:35:27] <GothAlice> I suspect the company evaluation (also an array) is somehow sometimes being evaluated first and reserving $ projection.
[20:36:03] <GothAlice> cheeser: This is the result of a higher-level abstraction, so things like those $ors are generated. ;)
[20:36:24] <GothAlice> The $and block is just standardized publication/retraction date range filtering.
[20:36:25] <cheeser> i was wondering. there are some (micro)optimizations to be made that suggest generation
[20:36:56] <GothAlice> I'm willing to take suggestions for optimization, though. I'm having to unroll this out of the abstraction due to a regression or three.
[20:37:05] <alexi5> i decided to start doing the schema for my application. my previous relational schema that i designed for the prototype had 20 tables, which reduced to 5 collections when doing a composite document schema
[20:37:36] <cheeser> GothAlice: mostly just combining terms to simplify the query. optimization might be the wrong word.
[20:38:15] <cheeser> time to pick up the kid at the bus stop, though.
[20:38:29] <GothAlice> The 'k': 'use' and 'day' items, despite comparing the same value in this example, rarely actually are. Most are things like 7 days, 1 use.
[20:38:56] <GothAlice> (This is the query to identify the package to use with sufficient remaining applicable balance.)
[20:39:49] <GothAlice> Ah, but I see why my current failure is a failure. 'company' is actually just 'c'.
[20:40:26] <GothAlice> And boom: "OperationFailure: database error: Executor error: BadValue positional operator element mismatch" again.
[20:45:59] <GothAlice> And fixed. Doesn't use $ projection any more, it now explicitly requests the (badly structured) specific abstract field.
[20:59:18] <GothAlice> cheeser: I was surprised you didn't comment on my use of an ObjectId as a field name. ;P
[21:00:37] <GothAlice> shaboobal: MongoDB has no concept of a join, so relational models are entirely application-orchestrated, thus not particularly efficient. Additionally, multi-record, multi-table transactional updates aren't really a thing in MongoDB without a lot of extra work (simulating two-phase commits).
[21:01:32] <GothAlice> shaboobal: Lastly, if you have a graph model, i.e. you are storing social connections, for the love of all that is holy put that in a real graph database, don't simulate a graph on either MongoDB or a relational database.
[21:02:05] <shaboobal> GothAlice: so perhaps it's not suitable for storing the core of my data model? i have users, users have sites, sites have pages, etc.
[21:02:31] <GothAlice> I happen to use MongoDB for nearly everything, except the graph thing for which I use Neo4j.
[21:03:12] <GothAlice> A certain level of relational-ness is acceptable in MongoDB, but you need to be careful to model your data on how it's accessed, queried, and used, not in some "preferred perfect structure".
[21:11:59] <shaboobal> GothAlice: got it. do you prefer duplicated embedding or flattening out into collections?
[21:12:34] <GothAlice> The example I typically use are forums; replies to a thread are embedded in the thread, but the thread references the forum it's under.
[21:13:18] <GothAlice> (This makes displaying a thread two queries: get the forum details, then get the thread contents.) Arrays can be sliced during projection letting even the embedded replies be paginated.
[21:14:56] <GothAlice> http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html is a brief article I often link about this. :)
[21:17:22] <GothAlice> Embedding replies within the thread also saves on a number of other queries, though. No clean-up query to delete "related" data when deleting a thread, for example, as the replies are deleted with the thread they're contained in.
[21:28:16] <GothAlice> shaboobal: Ah, an additional rationale for embedding: it effectively simulates a join. My forums have "permalinks" (technically an ObjectId) on every reply, but when jumping to show a specific reply (like the Twitter tweet card view of a single tweet) I naturally want to get the details of the overall thread at the same time, things like title, permissions, etc.
[21:29:19] <GothAlice> db.thread.find({"reply.id": ObjectId(…)}, {"title": 1, "reply.$": 1}) — load out a specific reply, regardless of thread or forum, getting the thread title and just the single reply out of the thread.
[21:48:50] <sterns> in node, if I do myCollection.remove(query, function(err, result) {...} result.n = 1, from what I can tell 'n' is `documents scanned`. Can I determine how many records were actually removed?
[22:16:44] <GothAlice> Yup. Most particularly vocal denouncements don't RTFM, or ignore what they read. :/ On the "it's slow" front, I can show benchmarks showing ~4 million record operations per second (technically 1.9 million distributed RPC calls, but there's a round-trip in there to save the return value) on a single host five years ago. ;)
[22:17:18] <GothAlice> Most of the comments on that ycombinator thread are non-technical rants. :(
[22:17:18] <xissburg> the biggest problem I am reading about is inconsistency, and actual failures, not performance
[22:17:58] <GothAlice> During our evaluation of it at work we spun up 1000 nodes in a complex replica set + sharding setup, loaded the DB until we hit IO limits (random operations), then started kill -9'ing random whole VMs.
[22:18:29] <GothAlice> It took Igor (yeah, that's really that engineer's name) killing ~65% of the hosts randomly before errors started showing up on the loading scripts.
[22:18:37] <GothAlice> That satisfied our demands for reliability. ;)
[22:20:37] <GothAlice> With more than 30 terabytes of data in MongoDB, in use for the last five or six years, I have not encountered a single actual failure, unrecoverable state, or inconsistent operation. So… the majority of these articles tend to show me unrealistic failure scenarios ("well, if you toggle the router just *so* you can get some weird behaviour") or a lack of reading the manual.
[22:23:43] <GothAlice> Even more heh, that uninterrupted operation is in the face of some recently hilarious Rackspace maintenance windows. Until recently, my nodes had a greater than three year average uptime, but even when nodes were being cycled, the overall cluster barely noticed.
[23:00:51] <shaboobal> thanks for all the help GothAlice
[23:01:21] <shaboobal> one thing that's kind of a bummer is looking at managed services for postgres vs mongo - about 10 times more storage in postgres for the same price
[23:04:34] <GothAlice> Well, "cloud hosting" is typically a tax on those unwilling or unable to manage the services themselves.
[23:05:37] <GothAlice> Case in point: buying 24 HDDs and three 8-bay iSCSI RAID arrays pays for itself in three months vs. nearly any bulk file storage service. MongoHQ, for my dataset, would cost me half a million USD per month.
[23:05:43] <StephenLynx> managed services are for suckers
[23:07:37] <GothAlice> Yes and no. The current cloud deployment infrastructure I'm using is actually cheaper than if I continued using my own.
[23:10:27] <GothAlice> There are occasionally services out there who actually pass along economy of scale instead of simply trying to rip you off. ;P