#mongodb logs for Saturday the 2nd of June, 2012

[00:00:06] <callen> okay so if I cut through what you just said
[00:00:15] <dstorrs> and the reason I'm having a problem is because the data sample that you posted above is not valid JSON
[00:00:28] <callen> suspend your pedantry/disbelief and work with me
[00:00:31] <callen> what you just told me was, "use an objectid with a unique index"
[00:00:37] <callen> how do I use a unique index against an embedded list?
[00:01:29] <dstorrs> At this point, I'm done trying to help. RTFM. http://www.mongodb.org/display/DOCS/Indexes#Indexes-IndexingonEmbeddedFields%28%22DotNotation%22%29
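For reference, the dot-notation indexing that link describes boils down to one call. A minimal sketch, assuming a hypothetical 'things' collection whose documents embed an array under 'items':

```javascript
// Dot notation reaches into each array element; 'unique' then applies
// across documents. (Hypothetical collection and field names.)
db.things.ensureIndex({ "items.sku": 1 }, { unique: true });

// Caveat on callen's actual question: a unique index does not stop a
// single document from repeating a value inside its own array (index
// keys are de-duplicated per document); it only blocks other documents
// from reusing it.
```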
[00:01:53] <callen> wheeeeeeee
[00:02:22] <callen> this is why database people aren't invited to parties :(
[03:11:55] <dstorrs> I've got a collection that was created incorrectly -- the _id column should have been a string (the username), and instead it's an ObjectId. Is there a straightforward way to fix this aside from rebuilding it from scratch?
[03:12:36] <dstorrs> I played around with 'update' but it's not working as expected. Not sure if I'm doing it wrong or if it Doesn't Work Like That.
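It indeed Doesn't Work Like That: _id is immutable, so an update can't change it. The usual workaround is to re-insert under the new key. A minimal sketch, assuming a hypothetical 'users' collection keyed by 'username':

```javascript
// Copy each document into a staging collection with the username as
// its string _id, then swap the collections once the copy checks out.
db.users.find().forEach(function (doc) {
    doc._id = doc.username;
    db.users_fixed.insert(doc);
});
db.users.renameCollection("users_old");    // keep the original around
db.users_fixed.renameCollection("users");
```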
[03:32:35] <qawsedrf> i installed mongodb, /data/db is 3.1GB big ? wut?
[04:02:30] <qawsedrf> so..?
[04:41:24] <dstorrs> qawsedrf1: for me, default collections are either empty or, if they have anything in them (indices, etc), they are 0.2...GB
[04:41:50] <dstorrs> so, somehow, you must have a lot of collections, or you populated data into something, etc.
[04:50:28] <dstorrs> I just did this: db.copyDatabase('foo', 'bar'). It claims it succeeded, but after it finishes 'show dbs' says that foo is ~6G and bar is ~4G. what gives?
[05:04:31] <qawsedrf1> dstorrs: http://pastie.org/private/mtjwbg38tsuyttatsvkkw
[05:06:32] <dstorrs> qawsedrf1: ah. here you go: http://www.mongodb.org/display/DOCS/Journaling+Administration+Notes#JournalingAdministrationNotes-PreallocFiles%28e.g.journal%2Fprealloc.0%29
[05:14:21] <zhodge> anyone want to advise if using mongo is appropriate for my scenario?
[05:17:54] <dstorrs> zhodge: I'll give it my best shot.
[05:18:15] <zhodge> as a newcomer to mongo and nosql in general, it'll be general
[05:18:31] <zhodge> I was wondering if you see any merit to working in mongo for an e-commerce app
[05:18:47] <zhodge> I know that traditionally relational databases are used for this, but I'm wondering why
[05:19:15] <dstorrs> that's a little too vague for me to be really detailed, but here's what I can say.
[05:19:45] <dstorrs> (note that I'm fairly new to this myself. I have extensive book knowledge, but little practical experience. apply salt freely)
[05:20:13] <dstorrs> relational DBs are good for transactional behavior
[05:20:34] <dstorrs> things where you want to be 100% sure that a request either completely failed or completely succeeded.
[05:21:11] <dstorrs> they are good for situations where you have lots of disparate entities and you're going to be joining them together in lots of odd ways that may be determined at runtime.
[05:21:31] <dstorrs> relational DBs require that you know the structure of your data in advance, and that it doesn't change often.
[05:22:05] <dstorrs> NonRel DBs (e.g. Mongo) allow you to add fields at any time. they do not support joins, so you need to do that in app code.
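Where a join is needed, it becomes two queries plus a lookup in application code. A minimal mongo-shell sketch with hypothetical 'orders' and 'users' collections:

```javascript
// Fetch one side, then resolve the reference by hand -- this is the
// "join in app code" being described. (someOrderId is a placeholder.)
var order = db.orders.findOne({ _id: someOrderId });
var buyer = db.users.findOne({ _id: order.userId });
```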
[05:22:31] <waterymarmot> zhodge: You might find this relevant. Stripe is a startup/new payments processor. Very high need for reliability, and this article talks about their experiences running on Mongo - http://blog.mongodb.org/post/22280693621/mongodb-powering-the-magic-and-the-monsters-at-stripe
[05:22:37] <dstorrs> they tend to be faster, but the durability can be lower unless you are careful, because data doesn't actually hit the disk immediately.
[05:24:58] <dstorrs> zhodge: long story short, you can almost certainly do any sort of app in either Rel or Non-Rel DBs. It really depends more on other factors.
[05:25:09] <dstorrs> waterymarmot: cool article, thanks for the link
[05:25:59] <zhodge> dstorrs: thanks for the detailed overview :)
[05:26:12] <zhodge> waterymarmot: and great link too
[05:26:19] <waterymarmot> Also, everything dstorrs said, plus RDBMS have been around for a very long time. There is industry confidence, non-techies understand these are the accepted platforms, etc. What we are seeing right now is uptake by the 'early adopters' with nosql. And a lot of that is going to be startups who aren't overly worried about "what's been done before/ what we've always done"
[05:27:04] <dstorrs> the implication of that can also be "it's easier to find knowledgeable people for RDBMS solutions"
[05:27:27] <waterymarmot> np, 10gen has a lot of good videos from their conferences up on the site. There is a link to one by Stripe at the top of that article, and you can find the rest from there. Craigslist has done one for the past few years about their experiences migrating portions of their architecture off of mysql
[05:27:28] <dstorrs> or, at least, that's my experience. YMMV
[05:27:58] <zhodge> I've watched a general mongo overview on 10gen before, it was quite helpful
[05:28:14] <zhodge> the thing is that when it comes right down to it, I'm really implementing more of a website than even an app
[05:28:22] <waterymarmot> dstorrs, yes that too. not a lot of IT pros out there with experience in the nosql world. It's changing, but most of the ones who have experience, I suspect, are using it somewhere that they won't want to leave. (i.e. cool startups)
[05:28:39] <zhodge> just a small store with a few products; not sure what the best stack would be
[05:29:26] <waterymarmot> general advice - go with what you will be productive using. if you know language X really really well and like coding in it, use that
[05:29:27] <dstorrs> well, if you do decide to go relational, use Postgres, not MySQL
[05:29:38] <waterymarmot> optimize later
[05:29:39] <dstorrs> worst decision I made with our stack was starting us out on MySQL
[05:29:45] <dstorrs> friggin' nightmare
[05:29:51] <zhodge> dstorrs: why is that? (just curious)
[05:29:55] <waterymarmot> ^^^^^ what dstorrs said
[05:30:41] <dstorrs> waterymarmot: you want to take this one?
[05:30:54] <waterymarmot> mysql vs postgresql is one of those really long-running rivalries in the IT world ... like perl vs python, guinness vs lighter beers, etc
[05:31:04] <waterymarmot> ok ...
[05:31:22] <waterymarmot> mysql got a lot of market share early, and is fairly widely adopted
[05:32:08] <waterymarmot> but has long lacked many features considered important for reliability, transactions, etc ... that's changed incrementally with each release, but they are working up to a place that many consider postgres to have been for a long time.
[05:32:37] <waterymarmot> i'm a bit out of date to go into detail on specific features, ACID compliance, etc ... dstorrs, if you want to take over :)
[05:33:35] <waterymarmot> (sorry about the length of my posts, i've been mostly lurking, but when i delurk, it's a deluge) :)
[05:34:47] <zhodge> waterymarmot: not a problem, this discussion has been nothing but helpful
[05:34:53] <dstorrs> MySQL is full of landmines where things don't behave the way you (well, I) expect...
[05:35:31] <dstorrs> for example "ok, I changed the param that controls how big a single data update can be...oh wait, there's ANOTHER param which overrides that"
[05:37:14] <dstorrs> then there's weird little issues like: "show full processlist\G" (which is documented to list all the queries running, without truncating the query) does not work, but "SHOW FULL PROCESSLIST\G" does.
[05:38:18] <dstorrs> strings are not, by default, case sensitive, so searching for "JoeBloggs" will get you the record that has "joebloggs"
[05:38:39] <dstorrs> but you can have both capitalizations in your table
[05:38:49] <dstorrs> even though the column is unique.
[05:38:50] <dstorrs> and on and on.
[05:39:07] <zhodge> sounds like a good time haha
[05:39:38] <dstorrs> there is a large bald spot in the center of my head labeled "MySQL"
[05:40:16] <zhodge> I may just be getting caught up in the hipsterness, but it seems as though mongo can be quite clean
[05:40:26] <zhodge> relatively painless, though I wouldn't know from experience
[05:40:46] <waterymarmot> dstorrs: lol
[05:40:56] <dstorrs> I've been pleasantly surprised by Mongo so far.
[05:41:12] <dstorrs> The one thing I'm nervous about is how it's going to perform once I turn on sharding.
[05:41:18] <dstorrs> oh, that's another thing --
[05:41:38] <dstorrs> if you are going to have any sort of significant dataset, relational DBs are problematic
[05:41:55] <waterymarmot> zhodge: i'm just beginning to experiment with mongo, but it looks like *handled properly* it can be a very good database to work with. you have to be aware of factors that fall on you as developer though
[05:42:11] <dstorrs> because RDBMSes were not originally designed for partitioning. Mongo was.
[05:42:42] <zhodge> dstorrs: significant meaning what exactly? are you referring to size mostly? or number of nodes
[05:42:47] <dstorrs> Both MySQL and Pg have solutions (clustering / partitioning), but they require some expertise. Mongo has sharding more or less for free
[05:43:10] <dstorrs> data size.
[05:43:23] <waterymarmot> i.e. zhodge: things like turning on safe mode
[05:43:26] <dstorrs> it's hard to scale an RDBMS *out*. You need to scale it up.
[05:44:12] <dstorrs> Oh, thing about MySQL -- an UPDATE command does not update the data in place. It marks it deleted and writes a new record, so your table size always grows.
[05:45:00] <dstorrs> There is no way to get rid of this dead space except to do an ALTER TABLE, which copies the live data into a new table and then renames it back to the original.
[05:45:06] <waterymarmot> dstorrs: the craigslist people said they initially had some serious bottlenecks importing data into their shards because the shard set kept rebalancing as the import entered the first server. so they managed it themselves: broke up the data and loaded each piece into a separate shard ...
[05:45:39] <dstorrs> waterymarmot: their shard key must have been set up so that new data always hit the rightmost shard.
[05:45:49] <dstorrs> (e.g. a time-based key)
[05:46:00] <waterymarmot> yep, i think that was how they handled it
[05:46:12] <dstorrs> had they used (e.g.) an md5 key, it would have randomized across the cluster and they wouldn't have needed to do that.
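The randomized-key idea sketched there, with hypothetical names (hex_md5() is built into the mongo shell):

```javascript
// Shard on a hash of a natural key so inserts scatter across chunks
// instead of all landing on the newest (rightmost) one.
sh.enableSharding("mydb");
sh.shardCollection("mydb.events", { keyHash: 1 });

// At insert time, derive the shard key from the natural key:
db.events.insert({ city: "sfo", postedAt: new Date(),
                   keyHash: hex_md5("sfo:" + Date.now()) });
```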
[05:46:13] <waterymarmot> or it might have been by city, etc
[05:46:47] <waterymarmot> well the initial load was going into one server, then that server was sending data out to the shards
[05:47:09] <waterymarmot> so they just cut the middleman out, and sent the segmented data out to shards directly
[05:47:10] <waterymarmot> .
[05:47:26] <dstorrs> Oh, I see. So, they avoided the mongos instance.
[05:47:30] <dstorrs> huh.
[05:47:43] <dstorrs> interesting.
[05:48:07] <waterymarmot> not a bug, per se, with sharding, just ... recognizing how it working right is going to be highly inconvenient for what you are trying to do sometimes. lol
[05:48:22] <dstorrs> ...ok, I'm enjoying this conversation but it's almost 11pm for me, I'm still at the office, and I'm bushed. I need to head out.
[05:48:28] <dstorrs> great talking to you guys.
[05:48:30] <waterymarmot> its in one of the videos up on 10gen that they have from the conventions
[05:48:43] <waterymarmot> zoiks
[05:48:52] <zhodge> dstorrs: yeah same here, been a pleasure
[05:48:55] <zhodge> thanks for all the help!
[05:48:56] <waterymarmot> on a friday? i salute your dedication
[05:48:56] <dstorrs> oh, one other thing -- if you have money, talk to the guys at http://mongohq.com
[05:49:22] <dstorrs> They make a living managing your Mongo servers, optimizing your queries, doing your backups, etc.
[05:49:35] <dstorrs> I'm not affiliated with them, although I did talk with them.
[05:49:49] <dstorrs> anyway, g'night.
[05:49:56] <waterymarmot> 'nite
[06:43:09] <zhodge> waterymarmot: you still there?
[06:51:58] <waterymarmot> barely .. thinking about logging off soon.
[06:52:18] <waterymarmot> sleepy. got a question?
[08:22:29] <ro_st> is there a mode for mongodb shell where everything is colorized?
[08:22:49] <ro_st> i'm pretty sure everyone knows that syntax highlighting is far easier on the eyes than plain white on black is
[08:23:22] <ro_st> or should i cave in and install one of the mongo admin webapps
[08:24:20] <NodeX> no and up to you
[08:25:05] <ro_st> what do you use to query mongo, NodeX?
[08:34:04] <NodeX> the shell mostly, when debugging
[09:33:09] <jenner> guys is `mongorestore --dbpath /some/path --directoryperdb` slower than `mongorestore --directoryperdb`? i.e. if the db is restored directly on disk rather than via mongod connection
[09:33:32] <jenner> I have a 47GB dump that's being restored for ~20 hours already
[11:27:18] <omid8bimo> i have two questions, where can i see the oplog file? (if there is a physical file somewhere) and how can i see how much of my oplog file is currently used during offtime of my slave server?
[11:41:26] <omid8bimo> anyone?
[11:42:10] <Derick> it's a database, like any other
[11:42:12] <Derick> "use oplog"
[11:43:10] <Derick> and for monitoring, read the "Replication Lag" part of http://docs.mongodb.org/manual/administration/replica-sets/
[12:10:29] <omid8bimo> how can i bind mongo on 2 ip addresses? i have 3 ip on my server, 1 valid and 2 invalid. can i just use bind_ip ?
[12:13:12] <omid8bimo> Derick: thanks for the info, but i dont see any oplog database in my primary server nor secondary
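For the record, the oplog lives in the 'local' database rather than one named 'oplog': 'local.oplog.rs' on replica sets, 'local.oplog.$main' under master/slave. A quick look from the shell:

```javascript
var local = db.getSiblingDB("local");                    // the oplog lives here
local.oplog.rs.find().sort({ $natural: -1 }).limit(1);   // newest entry
db.printReplicationInfo();   // oplog size and the time window it covers
```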
[12:25:27] <omid8bimo> guys, is it possible? i cant find any info regarding it on documents
[12:38:12] <omid8bimo> how can i bind mongo on 2 ip addresses? i have 3 ip on my server, 1 valid and 2 invalid. can i just use bind_ip ?
[12:59:31] <NodeX> can't you just use your firewall?
[13:00:28] <NodeX> it seems it's all or nothing
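It isn't actually all or nothing: bind_ip accepts a comma-separated list of addresses. A sketch of the relevant mongod.conf line, with hypothetical addresses:

```
bind_ip = 127.0.0.1,10.0.0.5
```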
[13:25:36] <qawsedrf> if anyone codes node.js+mongodb, i have a question -> node-mongodb-native or mongoose?
[13:40:08] <deoxxa> qawsedrf: mongoose uses mongodb-native. personally, i use mongoskin, which provides a much lighter wrapper around mongodb-native, as i don't need all the functionality provided by mongoose (validation, schemas, etc).
[13:47:34] <qawsedrf> deoxxa: ok, is this the one - https://github.com/kissjs/node-mongoskin ?
[13:48:14] <deoxxa> yep
[14:07:58] <dkode> I have some confusion around $push. I want to $push a document to an array, only if that document isn't already in the array. If my update query does not find the document in the array it doesn't push it at all. If the document is already in the array it appends a new document
[14:17:14] <skylamer`> upsert may be
[14:24:16] <dkode> actually i think what i need is $addToSet
[14:24:24] <dkode> but addtoset looks like it matches the whole document
[14:24:37] <dkode> and there will be a count property on my document that wont be in the addToSet syntax
[14:24:39] <dkode> blah
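The usual push-if-absent pattern sidesteps both problems: match on the element being absent, so $push only fires when no array entry has that key yet. A sketch with hypothetical names:

```javascript
// The $ne guard makes the update match only documents whose array has
// no entry for this user, so differing fields like 'count' don't matter
// the way they would with $addToSet's whole-subdocument comparison.
db.posts.update(
    { _id: postId, "comments.userId": { $ne: userId } },
    { $push: { comments: { userId: userId, count: 1 } } }
);
```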
[14:39:34] <abrkn> having a collection with items like: { user: 'bob', car: 'bmw', vote: 200 }, { user: 'bob', car: 'bmw', vote: 20 }, { user: 'alice', car: 'bmw', vote: 100 }, { user: 'frank', car: 'audi', vote: 10 } how can i get the average vote for bmw taking only each user's highest vote? (in the example, bob's vote of 200 would count)
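One way to express that with the aggregation framework (new in 2.2 at the time, so a sketch rather than tested advice; 'votes' is a hypothetical collection name):

```javascript
// First collapse each user's bmw votes to their maximum, then average
// those maxima. For the example data this averages 200 (bob) and
// 100 (alice) to 150.
db.votes.aggregate(
    { $match: { car: "bmw" } },
    { $group: { _id: "$user", best: { $max: "$vote" } } },
    { $group: { _id: "bmw", avgVote: { $avg: "$best" } } }
);
```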
[15:04:38] <ro_st> so i have a collection, enterprises, and within it, an array of user ObjectID refs (the master list) and an array of groups, which themselves are lists of user ObjectIDs and some metadata.
[15:05:20] <ro_st> i want to assign these nested groups ObjectIDs of their own so that i can refer to them from elsewhere in the enterprise document; so, groups are embedded documents. can i do this? i'm using Monger, the clojure library
[15:05:38] <ro_st> my question: is this a) possible? b) idiomatic mongo?
[15:06:11] <ro_st> not seeing anything about it on the Schema Design page on the website
[15:06:31] <dstorrs> ro_st: yes, you can do it, but it may end up not being the best design for your program.
[15:06:59] <dstorrs> Mind answering a few questions?
[15:07:04] <ro_st> not at all
[15:07:24] <dstorrs> ok, so 'enterprises' refers to...what? companies?
[15:07:30] <ro_st> yes
[15:07:53] <dstorrs> and groups are what, exactly?
[15:08:03] <dstorrs> groups of enterprises?
[15:08:16] <dstorrs> per-user groups?
[15:08:17] <ro_st> groups of people within those companies. for example, there might be 2011's group of inductees, and 2012's
[15:08:31] <dstorrs> ok.
[15:08:45] <ro_st> all inductees are in the ent, but they only care about their own induction group's data
[15:09:34] <dstorrs> the problem with embedded documents is that often the numbers can grow without bound.
[15:10:20] <dstorrs> So, for example, if you have an enterprise who hires about 1000 people a year, what happens in 5-10 years?
[15:10:43] <ro_st> right, there's a 4mb limit per document, right?
[15:10:46] <dstorrs> what happens when you have 100,000 enterprise users, each hiring huge numbers of groups?
[15:10:51] <dstorrs> 16M now, but yes
[15:10:59] <dstorrs> (it used to be 4M)
[15:12:14] <dstorrs> but, more importantly, embedded docs are simply harder to work with. often, it's better to put them into a separate collection and do the join in app code.
[15:12:27] <ro_st> well, the groups are merely subsets of the enterprises' user list
[15:12:46] <ro_st> so E has users a b c d e. groups: (a, b) (c, d, e)
[15:12:51] <dstorrs> embedded docs are good for definitely limited things like "a collection of invoices, each with an embedded list of line items"
[15:12:58] <ro_st> gotcha
[15:13:07] <ro_st> ok. i think it makes sense to split em out
[15:13:36] <dstorrs> particularly if there is a temporal component ("all of 2011's inductees"), embedded data can be troublesome
[15:13:42] <roe> I am starting to design a datastore with highly linked data, before I jump into the project I wanted to take a moment to consider a 'nosql' solution. Here is my first draft of the data model: http://pastebin.com/BWbmEeC6
[15:13:59] <ro_st> enterprise will contain an ObjId list of all the users that can be in the referred-to groups, and each of those groups themselves have ObjId lists to their own users
[15:14:06] <roe> Do you think mongodb would be useful or should I stick with a SQL variant
[15:15:14] <dstorrs> roe: assuming I'm understanding your model, I'd say it's begging to be NoSQL.
[15:15:25] <ro_st> agreed
[15:15:33] <dstorrs> it will be trivial to add new types, attribs, etc
[15:15:34] <ro_st> looks like 3 collections
[15:15:47] <ro_st> links are collections on the object documents
[15:16:08] <ro_st> thanks dstorrs
[15:16:22] <dstorrs> roe: that said, it's a little hard to be sure, because you haven't given us real names
[15:16:31] <roe> I'm filling in an example
[15:17:58] <dstorrs> ro_st: also, one other thing about your IDs -- ObjectIDs are bright and shiny and have some nice advantages (e.g., they natively encode the creation time), but you probably want to see if there is a more natural ID key available, such as company name
[15:18:08] <dstorrs> it makes doing updates easier
[15:20:04] <ro_st> i've decided to make groups their own collection
[15:20:18] <ro_st> i had it nested before because a group would never belong to multiple enterprises
[15:20:44] <ro_st> but given the querying behaviours i know will occur, it makes sense to scan groups directly (eg fetch me all the groups i belong to)
[15:20:53] <ro_st> without having to trawl through the enterprise
[15:21:02] <dstorrs> yep.
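The query that motivates the split, as a sketch with hypothetical field names:

```javascript
// With groups in their own collection holding member ObjectIDs,
// "all the groups I belong to" is a single find:
db.groups.find({ members: userObjectId });
```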
[15:21:19] <ro_st> the enterprise document is there more for managing groups, rather than being a data model for the end-user to interact with
[15:21:29] <ro_st> it's a group 'nursery'
[15:22:02] <ro_st> this is a refactoring from a sql database, where the enterprise data and the group data are mashed together into a single table
[15:22:25] <ro_st> been using mongo for about a week. really enjoying it
[15:23:21] <ro_st> just wish the console wasn't so basic
[15:23:55] <ro_st> coming from a clojure repl, it's a bit like consoling with my eyes closed and only my nose to type with
[15:24:07] <dstorrs> heh
[15:25:56] <roe> well I think I need to work on my data design a bit, but here is the idea: http://pastebin.com/BWbmEeC6
[15:26:08] <roe> the link table doesn't make much sense yet
[15:26:30] <roe> I am going to need to define a attribute to link through, somehow (in this case it would be attribute 8)
[15:26:51] <dstorrs> roe: this is good detail, but instead of this, tell me what you're trying to achieve.
[15:26:57] <dstorrs> What's the goal of the project?
[15:27:05] <ro_st> roe, i think attributes are an unnecessary normalisation
[15:28:01] <ro_st> start with a flatter document design and pull out commonalities if they need to be shared across documents/collections
[15:28:18] <ro_st> my 2c
[15:29:55] <dstorrs> I tend to agree, but it's hard to say without knowing the goals. Overall, you're writing a SQL schema in a NoSQL db. that's going to be frustrating, because you're working against the system.
[15:30:04] <roe> I am trying to build an object-based wiki graphically represented through the linking of objects
[15:30:34] <dstorrs> ro_st: is this an inventory tracking system for computers? an ecommerce site? a network topology manager? ...?
[15:32:47] <ro_st> great question
[15:33:11] <ro_st> roe: you said something, but i didn't learn anything new from it
[15:33:50] <roe> so for my particular application, servers, switches, routers, firewalls, printers, are objects but so are services like SSH, SMTP, OSPF, NAT, DNS, etc... and I guess the data overlap between the different objects represent the 'links'
[15:34:26] <dstorrs> hm.
[15:34:33] <ro_st> will each of these things have a page?
[15:34:44] <ro_st> and they link to more of these things?
[15:34:53] <ro_st> (assuming pages because of 'wiki')
[15:38:15] <ro_st> times up!
[15:40:05] <roe> nofxx, I guess I shouldn't have used the term 'wiki'
[15:40:29] <roe> it is more of an information repository, where each object will be an... object on a single page
[15:42:19] <dstorrs> do service links ("SSH"...) actually contain information, or just the name of the service?
[15:43:36] <dstorrs> oh, wait, there it is.
[15:51:36] <SkramX> Q: i have a map reduce that goes through all objects and extracts the event's location and counts all of them and puts the _id and count (as value) into a new collection. Would I use a finalizer to drop all records where value is less than 5?
[16:16:45] <dstorrs> SkramX: better would be to not emit them in the first place
[16:17:00] <SkramX> yeah
[16:17:14] <dstorrs> map can emit 0 or more times per document
[16:17:20] <SkramX> how could I go about that?
[16:17:55] <SkramX> I mean.. I need some sort of logic to see how many times it was emitted in the last map/reduce/the current collection its about to replace
[16:18:49] <dstorrs> what are you trying to achieve?
[16:19:05] <SkramX> so i have a collection of tweets. 16 million of them
[16:19:15] <SkramX> i want to do a count of the most popular locations.
[16:19:54] <SkramX> like 90% of them are bogus like "** my house **" and stupid crap like that
[16:20:24] <SkramX> i want to only keep the locations in the mapreduce collection with 100 or more tweets with that location
[16:20:45] <dstorrs> so, maybe something like this:
[16:21:22] <dstorrs> mapf = function() { emit( this.loc, { count : 1 } ) }
[16:23:16] <SkramX> yeap my map is very much like that
[16:23:45] <dstorrs> hm. yeah, I see the trouble you're bumping into.
[16:24:37] <dstorrs> finalizer sounds like a reasonable option
[16:24:47] <SkramX> right
[16:24:50] <SkramX> the emit time isnt whats killing me
[16:24:58] <SkramX> it's traversing the HUGE collection with a bunch of values of 1
[16:25:03] <dstorrs> it doesn't feel quite right, but I can't think of a better option offhand
[16:25:10] <SkramX> could you help me with that finalizer function?
[16:25:27] <SkramX> I might try upgrading to v2.2 and use the group aggregate function...
[16:26:48] <dstorrs> group aggregation won't work on sharded collections
[16:26:56] <dstorrs> and you're likely to want to shard this eventually
[16:26:59] <SkramX> this isnt actually sharded (yet)
[16:27:03] <SkramX> but okay
[16:27:06] <SkramX> true true true
[16:27:19] <dstorrs> finalize would be something like this (approximately):
[16:28:27] <dstorrs> function finalize(k, v) { if (v.count > 100) { return v } }
[16:28:32] <dstorrs> I think that will work.
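The pieces assembled as a sketch (hypothetical collection names), under the assumption that finalize stores whatever it returns, null included, so a cleanup pass sweeps the below-threshold keys afterwards:

```javascript
var mapf = function () { emit(this.loc, { count: 1 }); };
var reducef = function (key, values) {
    var total = 0;
    values.forEach(function (v) { total += v.count; });
    return { count: total };
};
// finalize runs once per key after reduce; returning null for keys
// under the threshold marks them for removal afterwards.
var finalizef = function (key, v) {
    return (v.count >= 100) ? v : null;
};
db.tweets.mapReduce(mapf, reducef,
    { out: "popular_locations", finalize: finalizef });
db.popular_locations.remove({ value: null });  // drop the small ones
```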
[16:28:52] <SkramX> ok
[16:28:54] <dstorrs> I may be wrong, but I don't think there is an obligation to return anything from the finalize
[16:28:55] <SkramX> ill try that out
[16:28:58] <SkramX> i THOUGHT it was that easy
[16:29:05] <SkramX> thanks man!
[16:29:10] <dstorrs> no problem
[16:29:16] <dstorrs> keep in mind, I may be wrong
[16:29:18] <SkramX> gotta go grocery shopping
[16:29:22] <SkramX> yeah - gotta trst it this PM
[16:29:23] <dstorrs> let me know if it works
[16:29:24] <SkramX> peace
[16:29:28] <SkramX> def
[16:34:01] <Quantumplation> if I have a collection A, and the general layout of the documents is like {_id:ObjectId(), array: [ { _id:ObjectId(), otherfields:1 } ] }, I know how to select all the documents which have an element in array that matches some criteria, but I don't know how to JUST return that element.
[16:34:26] <Quantumplation> ie, if array has 100 documents, it's wasteful to do the $elemMatch query, because it ends up returning the whole array
[16:34:46] <Quantumplation> I could do it with skip/limit, but i don't know the index of it in the array either
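As of 2.2 the positional $ projection can return just the matching element; on earlier servers the whole array comes back and the trimming has to happen in app code. A sketch against the layout described:

```javascript
// The query condition on the array is what the $ in the projection
// refers back to. (someId is a placeholder.)
db.A.find(
    { "array._id": someId },
    { "array.$": 1 }
);
```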
[20:32:28] <Vooloo> does mongodb require optimization like mysql when you start to get higher load? adjusting settings and stuff depending on datasize etc.
[20:36:11] <Derick> Vooloo: no, just make sure you have your indexes in order
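A sketch of what "indexes in order" means in practice, with hypothetical names:

```javascript
db.users.ensureIndex({ email: 1 });
// explain() should report the index (a BtreeCursor in shells of this
// era) rather than a full scan (BasicCursor).
db.users.find({ email: "a@example.com" }).explain();
```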
[21:02:10] <thewanderer1> hi. why doesn't a $all:[] match ['cat', 'dog']?
[21:02:48] <thewanderer1> logic tells me that all of []'s elements are present in ['cat', 'dog'], therefore it should match
[21:05:04] <thewanderer1> what am I doing wrong?
[21:12:42] <thewanderer1> I've come up with a simpler test case, too:
[21:12:58] <thewanderer1> db.animals.find({$all:[]});
[21:13:12] <thewanderer1> now, no matter what's in animals collection, not a single object will be returned
[21:16:14] <thewanderer1> (okay, maybe it's not a good test case because $all's semantics is largely undocumented)
[21:16:47] <thewanderer1> however, the first issue still persists
[21:17:41] <dstorrs> thewanderer1: what are you trying to achieve?
[21:18:35] <thewanderer1> dstorrs: I have some entries which are tagged using keywords. I want to select all entries that contain all keywords specified in the search.
[21:18:57] <thewanderer1> but, if the search specifies no keywords, the only logical thing is to return all entries, not none!
[21:19:46] <jY> $all probably doesn't match because it's empty
[21:19:54] <jY> and all has to match everything in []
[21:19:59] <thewanderer1> yes...
[21:20:01] <jY> so it makes sense how it is
[21:20:06] <thewanderer1> doesn't it match everything?
[21:20:09] <jY> no
[21:20:20] <thewanderer1> what doesn't it match?
[21:20:56] <jY> it might only match an empty array
[21:21:55] <dstorrs> db.animals.find({$all:[]}); isn't how 'all' works. you need to specify the field name
[21:22:03] <dstorrs> see here: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all
[21:22:53] <dstorrs> so: db.animals.find( { keyword : { $all : [ 'foo', 'bar', 'baz' ] } })
[21:22:55] <thewanderer1> okay, I'll paste my test case here.
[21:22:56] <kp55> anyone run mongodb on vps'? (plan to deploy on a xen)
[21:23:51] <thewanderer1> dstorrs, jY: assume that Flow is a kind of animal which also has tags...
[21:24:06] <thewanderer1> db.Flow.find({tags:{$all:[]}}); finds nothing
[21:24:13] <thewanderer1> db.Flow.find({tags:{$all:['kot']}}); finds two
[21:24:24] <kp55> heard people have had problems on ovz containers, but xen/kvm anyone?
[21:24:52] <thewanderer1> kp55: I run Mongo on KVM, 256MB, it's ok - not recommended with grsecurity, though. messes with sysfs access.
[21:25:23] <kp55> hm
[21:25:28] <dstorrs> thewanderer1: that doesn't sound surprising to me.
[21:25:33] <thewanderer1> dstorrs: why?
[21:25:56] <kp55> thanks, got a 2gb xen
[21:26:13] <kp55> any other recommendations?
[21:26:19] <dstorrs> thewanderer1: from the docs "An array can have more elements than those specified by the $all criteria. $all specifies a minimum set of elements that must be matched."
[21:26:24] <thewanderer1> kp55: it's a home deployment, so I can't say much
[21:26:30] <thewanderer1> dstorrs: yes.
[21:26:39] <dstorrs> it's ambiguous what happens if no tags are spec'd
[21:26:43] <thewanderer1> dstorrs: in my mind, this translates to: (\forall x \in A => x \in B)
[21:26:52] <kp55> indices not big, just an id and a mysql varchar(255)
[21:29:10] <dstorrs> thewanderer1: not really sure what to tell you, sorry
[21:29:41] <thewanderer1> well, it's either me or Mongo which fails at understanding 1st order logic
[21:30:06] <thewanderer1> I'll probably file a bug...
[21:48:41] <Derick> meghan: have a safe trip back!
[21:49:46] <meghan> thanks Derick!
[21:51:11] <skot> thewanderer1: $all:[] will never return anything since it requires the things in the array to match, and an empty array provides nothing to match. If you don't want to search for something, leave the criteria out
[21:51:54] <thewanderer1> skot: $all is a binary predicate, how can it not return anything? O.o
[21:52:07] <thewanderer1> it's true or false
[21:52:32] <thewanderer1> that is what I'm doing in the code (dynamic criteria), but it defies common sense I think
[21:52:37] <skot> and if the array is empty, it is always false
[21:53:04] <skot> You can ask for an change but that is the current behavior
[21:53:22] <thewanderer1> yeah well... I interpret it mathematically, which is: http://latex.codecogs.com/gif.latex?(\forall%20x%20\in%20A%20=%3E%20x%20\in%20B)
[21:53:49] <skot> fair enough, but that is not how it works, just like $not
[21:54:44] <thewanderer1> I bet there's an explicit if() in the code which checks for empties, for sake of "optimization", heh :P
[21:55:08] <skot> You can take a look, but I think not.