[00:00:06] <callen> okay so if I cut through what you just said
[00:00:15] <dstorrs> and the reason I'm having a problem is because the data sample that you posted above is not valid JSON
[00:00:28] <callen> suspend your pedantry/disbelief and work with me
[00:00:31] <callen> what you just told me was, "use an objectid with a unique index"
[00:00:37] <callen> how do I use a unique index against an embedded list?
[00:01:29] <dstorrs> At this point, I'm done trying to help. RTFM. http://www.mongodb.org/display/DOCS/Indexes#Indexes-IndexingonEmbeddedFields%28%22DotNotation%22%29
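The docs link above is about "dot notation" on embedded fields. As a rough illustration only (the server does this resolution natively when you index an embedded path; collection and field names here are made up), this is how a path like `items.sku` reaches into embedded documents and arrays:

```javascript
// Sketch: how a dotted path resolves against a document.
// Against an array, the remaining path applies to each element.
function resolve(doc, path) {
  return path.split('.').reduce(function (val, key) {
    if (Array.isArray(val)) {
      return val.map(function (el) { return el[key]; });
    }
    return val == null ? undefined : val[key];
  }, doc);
}

// A unique index on an embedded field would then look like
// (hypothetical collection/field names):
// db.things.ensureIndex({ "items.sku": 1 }, { unique: true });
```

Note that a unique index over an array field constrains values across documents; it is not by itself a guard against duplicates inside a single document's array.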
[00:02:22] <callen> this is why database people aren't invited to parties :(
[03:11:55] <dstorrs> I've got a collection that was created incorrectly -- the _id column should have been a string (the username), and instead it's an ObjectId. Is there a straightforward way to fix this aside from rebuilding it from scratch?
[03:12:36] <dstorrs> I played around with 'update' but it's not working as expected. Not sure if I'm doing it wrong or if it Doesn't Work Like That.
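For what it's worth, `update` indeed Doesn't Work Like That here: `_id` is immutable, so the usual fix is to re-insert each document under the new key and drop the old copy. A sketch (collection and field names are assumptions):

```javascript
// Build a copy of a document with _id replaced by its username.
function rekeyed(doc) {
  var copy = {};
  Object.keys(doc).forEach(function (k) {
    if (k !== '_id') copy[k] = doc[k];
  });
  copy._id = doc.username; // the string that should have been the _id
  return copy;
}

// In the mongo shell, roughly (hypothetical collection names):
// db.users.find().forEach(function (doc) {
//   db.users_fixed.insert(rekeyed(doc));
// });
// ...then verify users_fixed and swap it in for the old collection.
```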
[03:32:35] <qawsedrf> i installed mongodb, /data/db is 3.1GB big ? wut?
[04:41:24] <dstorrs> qawsedrf1: for me, default collections are either empty or, if they have anything in them (indices, etc), they are 0.2...GB
[04:41:50] <dstorrs> so, somehow, you must have a lot of collections, or you populated data into something, etc.
[04:50:28] <dstorrs> I just did this: db.copyDatabase('foo', 'bar'). It claims it succeeded, but after it finishes 'show dbs' says that foo is ~6G and bar is ~4G. what gives?
[05:06:32] <dstorrs> qawsedrf1: ah. here you go: http://www.mongodb.org/display/DOCS/Journaling+Administration+Notes#JournalingAdministrationNotes-PreallocFiles%28e.g.journal%2Fprealloc.0%29
[05:14:21] <zhodge> anyone want to advise if using mongo is appropriate for my scenario?
[05:17:54] <dstorrs> zhodge: I'll give it my best shot.
[05:18:15] <zhodge> as a newcomer to mongo and nosql in general, it'll be general
[05:18:31] <zhodge> I was wondering if you see any merit to working in mongo for an e-commerce app
[05:18:47] <zhodge> I know that traditionally relational databases are used for this, but I'm wondering why
[05:19:15] <dstorrs> that's a little too vague for me to be really detailed, but here's what I can say.
[05:19:45] <dstorrs> (note that I'm fairly new to this myself. I have extensive book knowledge, but little practical experience. apply salt freely)
[05:20:13] <dstorrs> relational DBs are good for transactional behavior
[05:20:34] <dstorrs> things where you want to be 100% sure that a request either completely failed or completely succeeded.
[05:21:11] <dstorrs> they are good for situations where you have lots of disparate entities and you're going to be joining them together in lots of odd ways that may be determined at runtime.
[05:21:31] <dstorrs> relational DBs require that you know the structure of your data in advance, and that it doesn't change often.
[05:22:05] <dstorrs> NonRel DBs (e.g. Mongo) allow you to add fields at any time. they do not support joins, so you need to do that in app code.
[05:22:31] <waterymarmot> zhodge: You might find this relevant. Stripe is a startup/new payments processor. Very high need for reliability, and this article talks about their experiences running on Mongo - http://blog.mongodb.org/post/22280693621/mongodb-powering-the-magic-and-the-monsters-at-stripe
[05:22:37] <dstorrs> they tend to be faster, but the durability can be lower unless you are careful, because data doesn't actually hit the disk immediately.
[05:24:58] <dstorrs> zhodge: long story short, you can almost certainly do any sort of app in either Rel or Non-Rel DBs. It really depends more on other factors.
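The "do joins in app code" point above can be sketched in a few lines. Here two in-memory arrays stand in for two collections (all names and fields are made up); with a real driver you'd issue two queries and stitch the results the same way:

```javascript
// Two "collections" (fake data for illustration).
var users = [{ _id: 1, name: 'ann' }, { _id: 2, name: 'bo' }];
var orders = [
  { _id: 10, userId: 1, total: 5 },
  { _id: 11, userId: 1, total: 7 },
  { _id: 12, userId: 2, total: 3 }
];

// App-side "join": index one side by key, then map over the other.
function ordersWithUserNames(users, orders) {
  var byId = {};
  users.forEach(function (u) { byId[u._id] = u; });
  return orders.map(function (o) {
    return { total: o.total, userName: byId[o.userId].name };
  });
}
```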
[05:25:09] <dstorrs> waterymarmot: cool article, thanks for the link
[05:25:59] <zhodge> dstorrs: thanks for the detailed overview :)
[05:26:12] <zhodge> waterymarmot: and great link too
[05:26:19] <waterymarmot> Also, everything dstorrs said, plus RDBMS have been around for a very long time. There is industry confidence, non-techies understand these are the accepted platforms, etc. What we are seeing right now is uptake by the 'early adopters' with nosql. And a lot of that is going to be startups who aren't overly worried about "what's been done before/ what we've always done"
[05:27:04] <dstorrs> the implication of that can also be "it's easier to find knowledgable people for RDBMS solutions"
[05:27:27] <waterymarmot> np, 10gen has a lot of good videos from their conferences up on the site. There is a link to one by Stripe at the top of that article, and you can find the rest from there. Craigslist has done one for the past few years about their experiences migrating portions of their architecture off of mysql
[05:27:28] <dstorrs> or, at least, that's my experience. YMMV
[05:27:58] <zhodge> I've watched a general mongo overview on 10gen before, it was quite helpful
[05:28:14] <zhodge> the thing is that when it comes right down to it, I'm really implementing more of a website than even an app
[05:28:22] <waterymarmot> dstorrs, yes that too. not a lot of IT pros out there with experience in the nosql world. It's changing, but most of the ones who do have experience are, I suspect, using it somewhere they won't want to leave. (i.e. cool startups)
[05:28:39] <zhodge> just a small store with a few products; not sure what the best stack would be
[05:29:26] <waterymarmot> general advice - go with what you will be productive using. if you know language X really really well and like coding in it, use that
[05:29:27] <dstorrs> well, if you do decide to go relational, use Postgres, not MySQL
[05:30:41] <dstorrs> waterymarmot: you want to take this one?
[05:30:54] <waterymarmot> mysql vs postgresql is one of those really long-running rivalries in the IT world ... like perl vs python, guinness vs lighter beers, etc
[05:31:22] <waterymarmot> mysql got a lot of market share early, and is fairly widely adopted
[05:32:08] <waterymarmot> but has long lacked many features considered important for reliability, transactions, etc ... that's changed incrementally with each release, but they are working up to a place that many consider postgres to have been for a long time.
[05:32:37] <waterymarmot> i'm a bit out of date to go into detail on specific features, ACID compliance, etc ... dstorrs, if you want to take over :)
[05:33:35] <waterymarmot> (sorry about the length of my posts, i've been mostly lurking, but when i delurk, it's a deluge) :)
[05:34:47] <zhodge> waterymarmot: not a problem, this discussion has been nothing but helpful
[05:34:53] <dstorrs> MySQL is full of landmines where things don't behave the way you (well, I) expect...
[05:35:31] <dstorrs> for example "ok, I changed the param that controls how big a single data update can be...oh wait, there's ANOTHER param which overrides that"
[05:37:14] <dstorrs> then there's weird little issues like: "show full processlist\G" (which is documented to list all the queries running, without truncating the query) does not work, but "SHOW FULL PROCESSLIST\G" does.
[05:38:18] <dstorrs> strings are not, by default, case sensitive, so searching for "JoeBloggs" will get you the record that has "joebloggs"
[05:38:39] <dstorrs> but you can have both capitalizations in your table
[05:38:49] <dstorrs> even though the column is unique.
[05:41:38] <dstorrs> if you are going to have any sort of significant dataset, relational DBs are problematic
[05:41:55] <waterymarmot> zhodge: i'm just beginning to experiment with mongo, but it looks like *handled properly* it can be a very good database to work with. you have to be aware of factors that fall on you as developer though
[05:42:11] <dstorrs> because RDBMSes were not originally designed for partitioning. Mongo was.
[05:42:42] <zhodge> dstorrs: significant meaning what exactly? are you referring to size mostly? or number of nodes
[05:42:47] <dstorrs> Both MySQL and Pg have solutions (clustering / partitioning), but they require some expertise. Mongo has sharding more or less for free
[05:43:23] <waterymarmot> i.e. zhodge: things like turning on safe mode
[05:43:26] <dstorrs> it's hard to scale an RDBMS *out*. You need to scale it up.
[05:44:12] <dstorrs> Oh, one more thing about MySQL -- an UPDATE command does not update the data in place. It marks it deleted and writes a new record, so your table size always grows.
[05:45:00] <dstorrs> There is no way to get rid of this dead data except to do an ALTER TABLE which will copy the live data into a new table, then rename it back to the original.
[05:45:06] <waterymarmot> dstorrs: the craigslist people said they initially had some serious bottlenecks importing data into their shards because the shard set kept rebalancing as the import entered the first server. so they managed it in data, broke up the data and loaded each into a separate shard ...
[05:45:39] <dstorrs> waterymarmot: their shard key must have been set up so that new data always hit the rightmost shard.
[05:48:07] <waterymarmot> not a bug, per se, with sharding, just ... recognizing that the way it works correctly is sometimes going to be highly inconvenient for what you are trying to do. lol
[05:48:22] <dstorrs> ...ok, I'm enjoying this conversation but it's almost 11pm for me, I'm still at the office, and I'm bushed. I need to head out.
[09:33:09] <jenner> guys is `mongorestore --dbpath /some/path --directoryperdb` slower than `mongorestore --directoryperdb`? i.e. if the db is restored directly on disk rather than via a mongod connection
[09:33:32] <jenner> I have a 47GB dump that's being restored for ~20 hours already
[11:27:18] <omid8bimo> i have two questions, where can i see the oplog file? (if there is a physical file somewhere) and how can i see how much of my oplog file is currently used during offtime of my slave server?
[13:25:36] <qawsedrf> if anyone codes node.js+mongodb, i have a question -> node-mongodb-native or mongoose?
[13:40:08] <deoxxa> qawsedrf: mongoose uses mongodb-native. personally, i use mongoskin that provides a much lighter wrapper around mongodb-native, as i don't need all the functionality provided by mongoose (validation, schemas, etc).
[13:47:34] <qawsedrf> deoxxa: ok, is this the one - https://github.com/kissjs/node-mongoskin ?
[14:07:58] <dkode> I have some confusion around $push. I want to $push a document to an array, only if that document isn't already in the array. If my update query does not find the document in the array, it doesn't push it at all. If the document is already in the array, it appends a new document
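What dkode describes ("push only if not already present") is exactly what the `$addToSet` update operator does: it adds an element to an array only when no equal element is already there. A plain-JS model of the semantics (using a crude structural-equality check just for the sketch; collection and field names below are hypothetical):

```javascript
// Model of $addToSet: push doc into arr only if an equal element
// isn't already present. JSON.stringify is a rough stand-in for
// mongo's element equality (it is sensitive to key order).
function addToSet(arr, doc) {
  var key = JSON.stringify(doc);
  var present = arr.some(function (el) {
    return JSON.stringify(el) === key;
  });
  if (!present) arr.push(doc);
  return arr;
}

// The real thing, in the shell (hypothetical names):
// db.coll.update({ _id: someId }, { $addToSet: { items: { sku: 'a1' } } });
```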
[14:39:34] <abrkn> having a collection with items like: { user: 'bob', car: 'bmw', vote: 200 }, { user: 'bob', car: 'bmw', vote: 20 }, { user: 'alice', car: 'bmw', vote: 100 }, { user: 'frank', car: 'audi', vote: 10 } how can i get the average vote for bmw taking only each user's highest vote? (in the example, bob's vote of 200 would count)
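The logic abrkn wants (per-user maximum vote for one car, then the average of those maxima) is easiest to see in plain JS first; server-side you would express the same thing with group/mapReduce. A sketch using the field names from the example:

```javascript
// For one car: take each user's highest vote, then average the maxima.
function avgOfTopVotes(docs, car) {
  var best = {};
  docs.forEach(function (d) {
    if (d.car !== car) return;
    if (!(d.user in best) || d.vote > best[d.user]) best[d.user] = d.vote;
  });
  var tops = Object.keys(best).map(function (u) { return best[u]; });
  var sum = tops.reduce(function (a, b) { return a + b; }, 0);
  return tops.length ? sum / tops.length : 0;
}
```

With the sample data, bob's 200 beats his 20, so 'bmw' averages (200 + 100) / 2 = 150.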
[15:04:38] <ro_st> so i have a collection, enterprises, and within it, an array of user ObjectID refs (the master list) and an array of groups, which themselves are lists of user ObjectIDs and some metadata.
[15:05:20] <ro_st> i want to assign these nested groups ObjectIDs of their own so that i can refer to them from elsewhere in the enterprise document; so, groups are embedded documents. can i do this? i'm using Monger, the clojure library
[15:05:38] <ro_st> my question: is this a) possible? b) idiomatic mongo?
[15:06:11] <ro_st> not seeing anything about it on the Schema Design page on the website
[15:06:31] <dstorrs> ro_st: yes, you can do it, but it may end up not being the best design for your program.
[15:06:59] <dstorrs> Mind answering a few questions?
[15:12:14] <dstorrs> but, more importantly, embedded docs are simply harder to work with. often, it's better to put them into a separate collection and do the join in app code.
[15:12:27] <ro_st> well, the groups are merely subsets of the enterprises' user list
[15:12:46] <ro_st> so E has users a b c d e. groups: (a, b) (c, d, e)
[15:12:51] <dstorrs> embedded docs are good for definitely limited things like "a collection of invoices, each with an embedded list of line items"
[15:13:07] <ro_st> ok. i think it makes sense to split em out
[15:13:36] <dstorrs> particularly if there is a temporal component ("all of 2011's inductees"), embedded data can be troublesome
[15:13:42] <roe> I am starting to design a datastore with highly linked data, before I jump into the project I wanted to take a moment to consider a 'nosql' solution. Here is my first draft of the data model: http://pastebin.com/BWbmEeC6
[15:13:59] <ro_st> enterprise will contain an ObjId list of all the users that can be in the referred-to groups, and each of those groups themselves have ObjId lists to their own users
[15:14:06] <roe> Do you think mongodb would be useful or should I stick with a SQL variant
[15:15:14] <dstorrs> roe: assuming I'm understanding your model, I'd say it's begging to be NoSQL.
[15:17:58] <dstorrs> ro_st: also, one other thing about your IDs -- ObjectIDs are bright and shiny and have some nice advantages (e.g., they natively encode the creation time), but you probably want to see if there is a more natural ID key available, such as company name
[15:18:08] <dstorrs> it makes doing updates easier
[15:20:04] <ro_st> i've decided to make groups their own collection
[15:20:18] <ro_st> i had it nested before because a group would never belong to multiple enterprises
[15:20:44] <ro_st> but given the querying behaviours i know will occur, it makes sense to scan groups directly (eg fetch me all the groups i belong to)
[15:20:53] <ro_st> without having to trawl through the enterprise
[15:29:55] <dstorrs> I tend to agree, but it's hard to say without knowing the goals. Overall, you're writing a SQL schema in a NoSQL db. that's going to be frustrating, because you're working against the system.
[15:30:04] <roe> I am trying to build an object-based wiki graphically represented through the linking of objects
[15:30:34] <dstorrs> ro_st: is this an inventory tracking system for computers? an ecommerce site? a network topology manager? ...?
[15:33:11] <ro_st> roe: you said something, but i didn't learn anything new from it
[15:33:50] <roe> so for my particular application, servers, switches, routers, firewalls, printers, are objects but so are services like SSH, SMTP, OSPF, NAT, DNS, etc... and I guess the data overlap between the different objects represent the 'links'
[15:51:36] <SkramX> Q: i have a map reduce that goes through all objects and extracts the event's location and counts all of them and puts the _id and count (as value) into a new collection. Would I use a finalizer to drop all records where value is less than 5?
[16:16:45] <dstorrs> SkramX: better would be to not emit them in the first place
[16:17:55] <SkramX> I mean.. I need some sort of logic to see how many times it was emitted in the last map/reduce/the current collection its about to replace
[16:18:49] <dstorrs> what are you trying to achieve?
[16:19:05] <SkramX> so i have a collection of tweets. 16 million of them
[16:19:15] <SkramX> i want to do a count of the most popular locations.
[16:19:54] <SkramX> like 90% of them are bogus like "** my house **" and stupid crap like that
[16:20:24] <SkramX> i want to only keep the locations in the mapreduce collection with 100 or more tweets with that location
[16:20:45] <dstorrs> so, maybe something like this:
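The snippet dstorrs was apparently about to paste never made it into the log. A sketch under stated assumptions (the field is named `location`, threshold 100): in mongo you would pass `mapFn`/`reduceFn` to `db.tweets.mapReduce(..., { out: 'locations' })` and then remove output documents with `value` below the cutoff; the tiny harness below only exists to illustrate the semantics in-memory.

```javascript
// Map/reduce to count tweet locations ("location" field is an assumption).
function mapFn() { emit(this.location, 1); }
function reduceFn(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// In-memory harness standing in for db.tweets.mapReduce, for illustration.
function runMapReduce(docs, map, reduce) {
  var buckets = {};
  global.emit = function (k, v) {
    (buckets[k] = buckets[k] || []).push(v);
  };
  docs.forEach(function (d) { map.call(d); });
  var out = {};
  Object.keys(buckets).forEach(function (k) {
    out[k] = reduce(k, buckets[k]);
  });
  return out;
}

// Post-filter: keep only locations seen at least `min` times
// (server-side: db.locations.remove({ value: { $lt: 100 } })).
function popularOnly(counts, min) {
  var kept = {};
  Object.keys(counts).forEach(function (k) {
    if (counts[k] >= min) kept[k] = counts[k];
  });
  return kept;
}
```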
[16:34:01] <Quantumplation> if I have a collection A, and the general layout of the documents is like {_id:ObjectId(), array: [ { _id:ObjectId(), otherfields:1 } ] }, I know how to select all the documents which have an element in array that matches some criteria, but I don't know how to JUST return that element.
[16:34:26] <Quantumplation> ie, if array has 100 documents, it's wasteful to do the $elemMatch query, because it ends up returning the whole array
[16:34:46] <Quantumplation> I could do it with skip/limit, but i don't know the index of it in the array either
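If I recall correctly, newer server versions can project just the matching array element with the positional `$` in a projection, e.g. `db.A.find({ 'array.otherfields': 1 }, { 'array.$': 1 })`; check your version's docs before relying on it. Until then, the fallback is to trim the array client-side after the `$elemMatch` query:

```javascript
// Client-side fallback: pull only the first array element matching a
// predicate out of a fetched document.
function firstMatch(doc, pred) {
  for (var i = 0; i < doc.array.length; i++) {
    if (pred(doc.array[i])) return doc.array[i];
  }
  return null;
}
```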
[20:32:28] <Vooloo> does mongodb require optimization like mysql when you start to get higher load? adjusting settings and stuff depending on datasize etc.
[20:36:11] <Derick> Vooloo: no, just make sure you have your indexes in order
[21:02:10] <thewanderer1> hi. why doesn't a $all:[] match ['cat', 'dog']?
[21:02:48] <thewanderer1> logic tells me that all of []'s elements are present in ['cat', 'dog'], therefore it should match
[21:13:12] <thewanderer1> now, no matter what's in animals collection, not a single object will be returned
[21:16:14] <thewanderer1> (okay, maybe it's not a good test case because $all's semantics is largely undocumented)
[21:16:47] <thewanderer1> however, the first issue still persists
[21:17:41] <dstorrs> thewanderer1: what are you trying to achieve?
[21:18:35] <thewanderer1> dstorrs: I have some entries which are tagged using keywords. I want to select all entries that contain all keywords specified in the search.
[21:18:57] <thewanderer1> but, if the search specifies no keywords, the only logical thing is to return all entries, not none!
[21:19:46] <jY> all probably doesn't cause it's empty
[21:19:54] <jY> and all has to match everything in []
[21:26:19] <dstorrs> thewanderer1: from the docs "An array can have more elements than those specified by the $all criteria. $all specifies a minimum set of elements that must be matched."
[21:26:24] <thewanderer1> kp55: it's a home deployment, so I can't say much
[21:51:11] <skot> thewanderer1: $all:[] will never return anything since it requires the things in the array to match, and an empty array can never provide a match
[21:51:31] <skot> if you don't want to search for something, leave the criteria out
[21:51:54] <thewanderer1> skot: $all is a binary predicate, how can it not return anything? O.o
[21:52:32] <thewanderer1> that is what I'm doing in the code (dynamic criteria), but it defies common sense I think
[21:52:37] <skot> and if the array is empty, it is always false
[21:53:04] <skot> You can ask for a change but that is the current behavior
[21:53:22] <thewanderer1> yeah well... I interpret it mathematically, which is: http://latex.codecogs.com/gif.latex?(\forall%20x%20\in%20A%20=%3E%20x%20\in%20B)
[21:53:49] <skot> fair enough, but that is not how it works, just like $not
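The two readings in this thread, side by side: mathematically, "every element of [] is in B" is vacuously true, but the server treats an empty `$all` as matching nothing. A plain-JS sketch of both predicates:

```javascript
// thewanderer1's set-theoretic reading: criteria must be a subset of arr.
// Vacuously true for empty criteria.
function subsetAll(criteria, arr) {
  return criteria.every(function (c) { return arr.indexOf(c) !== -1; });
}

// Observed $all behavior: an empty criteria array matches nothing.
function mongoAll(criteria, arr) {
  if (criteria.length === 0) return false;
  return subsetAll(criteria, arr);
}
```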