[00:39:31] <vagueBrother> hey yall, still trying to get an answer to my question about storing sub documents inside documents in a collection. are there any best practices regarding how to do this? it seems simple enough but i’m having trouble figuring out how to query an episode.id in this example http://plnkr.co/edit/6E7vOFTfcvPYWC73h77g
[00:46:43] <vagueBrother> can i query just on the id of the episode object without knowing its parent documents?
[00:51:39] <mordof> vagueBrother: wouldn't you have to do a query for the show, then query the seasons, then episodes? i'm fairly new to mongo
[00:51:54] <mordof> so i don't know if there's another/better way to do it
[00:51:57] <vagueBrother> that’s what i was hoping against
[00:52:20] <vagueBrother> that sort of structure makes more sense to me conceptually, rather than having an episodes collection and a shows collection
[01:02:51] <vagueBrother> Boomtime: is there a best practice for this sort of thing? would it be better or faster somehow if i had a discrete episodes collection and then a separate shows collection?
[01:03:07] <vagueBrother> i’m sure the answer is usually “it depends”
[01:03:28] <vagueBrother> but i’m trying to get a feel for how best to deal with this sort of data from people who are more experienced with mongo
[01:05:03] <Boomtime> vagueBrother: firstly, we generally recommend against using multiply nested arrays where you want to query on the items inside the deepest level
[01:06:23] <Boomtime> the source of this recommendation is a limitation in the array positional operator ($) which can only reference the first-level array position in a successful match
[01:06:25] <vagueBrother> is that for performance reasons?
[01:06:35] <vagueBrother> or just because it’s a pain in the ass?
[01:06:37] <Boomtime> so not for performance reasons
[01:06:53] <Boomtime> it can become quite difficult to work with such documents
[01:08:25] <Boomtime> you can still construct a season if you need to, but you're more likely interested in the individual episode or the show generally anyway
[01:08:28] <vagueBrother> so i’d just have an array of episodes from ALL seasons
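A hedged sketch of what that flattened shape could look like in the shell; the collection name, field names, and values here are illustrative, not taken from the plunker:

    // one document per show, a single "episodes" array with a season tag on each element
    db.shows.insert({
        title: "Example Show",
        episodes: [
            { id: "s01e01", season: 1, title: "Pilot" },
            { id: "s02e03", season: 2, title: "Another One" }
        ]
    })

    // find an episode by id without knowing the parent show,
    // projecting just the matched element with the positional operator
    db.shows.find(
        { "episodes.id": "s02e03" },
        { title: 1, "episodes.$": 1 }
    )

    // reconstruct a whole season when needed
    db.shows.aggregate([
        { $match: { title: "Example Show" } },
        { $unwind: "$episodes" },
        { $match: { "episodes.season": 2 } }
    ])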
[01:19:32] <mordof> I've got a 'legacy' 2d index on location: [x, y] fields in my collection. i tried doing a location greater than [3, 3] but it only matches the x portion for the greater than. any way i can get both to match?
[01:20:09] <mordof> or would i need to separate the location into two individual keys and index each individually?
[01:20:10] <Boomtime> what is the query you tried?
[01:20:39] <Boomtime> also, can you provide the index definition? (use db.COLLECTION.getIndexes())
[01:20:43] <mordof> hmm.. i'm using mongoid, so i'm not sure of the exact equivalent :/
[01:21:31] <Boomtime> always try to know the equivalent in shell, it is an invaluable diagnostic aid
[01:21:56] <mordof> mhmm - i'm learning that gradually. i started out using mongoid so that knowledge is a bit further ahead
[01:22:28] <mordof> http://pastie.org/9815362 my indexes
[01:22:55] <mordof> i'll try to find the equiv query
[01:23:43] <Boomtime> documents have a "location" field specified as [longitude,latitude] ? (or [x,y] if you just want dimensionless euclidean distance)
[01:23:57] <mordof> if it's a 2dsphere index it's assumed to be long,lat
[01:24:15] <Boomtime> the index you have is 2d, not 2dsphere
[01:25:47] <Boomtime> just because it's easy to source
[01:27:10] <Boomtime> if you want to cheat to get the query that mongoid is running you can increase logging on the server to print it out to the logs, which will emit in shell format
[01:27:34] <Boomtime> but that might be more trouble than just learning to convert whatever mongoid does
[01:29:22] <charform> is there anyway to apply an index to a subset of documents based on the value of a field rather than the existence of a field? sparse index on documents where field == x?
[01:30:25] <charform> hmm how might I achieve the following then..
[01:31:10] <charform> i have a users collection that contains users that signed up with their email and users that login via social media... I want a unique index on type: 'localUser', email: 1 but for that to not apply to social users..
[01:31:51] <charform> social users may have an email field that is the same as other social users, ie if they login with google then facebook where the accounts are both attached to the same email
[01:32:44] <charform> I guess I could create a separate collection for localUsers and social login users, but other than this difference they have mostly the same fields and it would mean checking for user types before querying for users client side etc
[01:36:12] <Boomtime> mordof: you are querying the field as though it were a regular array, you should look up the spatial query operators
[01:37:17] <Boomtime> charform: if you create a unique sparse index where one of the fields is only present when the other two fields are combinatorially unique, that should work for your case
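One way to realize that suggestion in the shell (a sketch, not necessarily what Boomtime had in mind): give only local users a hypothetical localEmail field and put the unique sparse index on it, so social users never enter the index at all (a compound sparse index can still pick up documents that have only some of its keys, which is why this sketch uses a single dedicated field):

    // only local (email/password) users get a "localEmail" field
    db.users.ensureIndex({ localEmail: 1 }, { unique: true, sparse: true })

    // local signup: uniqueness is enforced by the index
    db.users.insert({ userType: "localUser", localEmail: "a@example.com", email: "a@example.com" })

    // social logins never set localEmail, so a shared email is fine
    db.users.insert({ userType: "social", provider: "google",   email: "a@example.com" })
    db.users.insert({ userType: "social", provider: "facebook", email: "a@example.com" })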
[01:39:01] <charform> does mongo short circuit on such indexes? ie if after checking userType and email they are unique, will it immediately go ahead with the write operation?
[01:39:07] <mordof> Boomtime: so this: db.stars.find({ location: { $geoWithin: { $box: [ [ 3, 3 ], [ 7, 8 ] ] } } }) is for sure using the index?
[01:40:51] <mordof> indexBounds, then a whole bunch of numbers in arrays, lol
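For reference, a shell sketch of the legacy 2d index plus the $box query mordof pasted; the collection name comes from his query, the rest is illustrative:

    // legacy 2d index on the [x, y] "location" field
    db.stars.ensureIndex({ location: "2d" })

    // everything inside the rectangle with corners [3, 3] (bottom-left) and [7, 8] (top-right)
    db.stars.find({ location: { $geoWithin: { $box: [ [3, 3], [7, 8] ] } } })

    // confirm the 2d index is actually being used
    db.stars.find({ location: { $geoWithin: { $box: [ [3, 3], [7, 8] ] } } }).explain()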
[01:41:01] <Boomtime> charform: the index will short-circuit as fast as it can, it doesn't need to check any specifics - the index is effectively a binary search path
[01:41:20] <charform> or what about concatenating email + socialId and using that as _id? for local users it would just be their email, social users would break ties with their socialId, and index on a single field
[01:42:29] <charform> although emails may be quite long I suppose and I would have to store the email again in a separate field
[01:43:13] <Boomtime> charform: that's an interesting idea, it certainly sounds plausible, you have options to test by the sounds of it
[01:43:42] <Boomtime> regarding the data duplication, consider that an index effectively replicates the data from the document
[01:44:05] <Boomtime> you may find that storage wise, some options are equivalent
[01:44:59] <newmongouser> Hi, does anyone know where ListField is defined? I'm getting error: "NameError: name 'ListField' is not defined"
[01:46:47] <Boomtime> you will need to provide more information, where are you seeing this error? what are you trying to do when it happens? what libraries/environment/versions etc
[01:49:21] <newmongouser> It's python 2.7.6, I'm actually trying to work with Flask, and I'm defining my database schema. I have class Org(db.EmbeddedDocument) and class Category(db.Document), in class Category() I've defined orgs=ListField(db.EmbeddedDocumentField('Charity')) and this is where I get the error
[01:50:24] <Boomtime> you are expecting ListField to be defined somewhere then?
[01:50:34] <Boomtime> is this some sample code you got from somewhere?
[01:53:12] <charform> Boomtime: would the standard ObjectId be more efficient for subsequent lookups though (as opposed to the concatenated email + socialId string)? I also have many documents in different collections that reference these user _id fields
[01:54:27] <Boomtime> objectid is assured unique inside of 12 bytes - that's pretty fast binary divergence compared to virtually any data you're likely to have on hand
[01:54:52] <Boomtime> but if that field is not being used consistently by you then why have it at all?
[01:55:09] <Boomtime> 'fast' only matters if you use it
[01:55:18] <charform> they appear to be mostly a wash when running them now, mongoId slightly faster, but I have to create the object each time client side from the string my mongoclient returns to the app
[01:56:21] <charform> which field isnt being used consistently? _id?
[01:56:58] <charform> after insert, that is the field I use always to retrieve a user
[01:57:07] <Boomtime> ok, then you use it consistently
[01:57:12] <charform> I just need this index initially for user creation
[01:57:32] <Boomtime> i still doubt the small difference in speed will make any tangible difference in reality
[01:58:28] <charform> yah I suspect I will just go with the sparse 3-field index unless there is a solution that is an order of magnitude better; the concat seems mostly identical if not slightly worse
[02:00:07] <charform> a separate collection would mean two smaller indexes I guess, smaller b-trees to traverse, but I doubt that would be of any consequence unless the world's population was using this service
[02:00:34] <Boomtime> sparse index is small, it contains only those documents which have the fields
[02:01:54] <Boomtime> you may consider that your sparse index only protects against the localuser duplicates, it does not prevent duplicates of combinations that should logically be unique (email:facebook for example)
[02:06:46] <charform> true, I was just thinking of the entire index, ie, every user is guaranteed to have at least one of the three fields
[02:07:12] <charform> therefore the unique check runs on the entire collection rather than only the subset it needs to
[02:07:34] <charform> although if I put the socialId field first in the index it would short circuit like you say
[02:07:44] <charform> therefore effectively being as efficient as two collections
[02:14:24] <newmongouser> I'm using the Tumblelog tutorial on Mongo's website, and I've verified everything up to the point where it asks for a ListField, but I'm still getting a name error. Is there a known reason for this or should I just reinstall all my extensions?
[03:29:01] <chetandhembre> can anyone help me with this?
[03:29:52] <newmongouser> on the mongo site comparing sql to mongo, it says these statements are equivalent: "SELECT user_id, status FROM users" is equal to "db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )", what's the purpose of the empty document in the db.users.find(), it's in some of their queries and not in others
[03:30:28] <cheeser> newmongouser: that's the empty where clause
[03:32:00] <newmongouser> Oh OK, is it necessary? If I ran the query without it, would the result be the same?
[03:32:49] <chetandhembre> My query look like this http://pastebin.com/UiMBDWRb
[03:35:38] <newmongouser> Is the empty where clause necessary when i only want to return objects with those fields present?
[03:51:40] <cheeser> newmongouser: you have to specify a query document even if it's empty.
[03:56:56] <newmongouser> In Mongo's docs it reads: "Not specifying a query document to the find() is equivalent to specifying an empty query document." Do you think including it is just being explicit? Even from the page (http://docs.mongodb.org/manual/tutorial/query-documents/) it's not entirely clear
[03:57:35] <newmongouser> But in the snippet above - "db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )", isn't the second document the query document?
[04:19:17] <rh1n0> Having trouble. I have a user that can connect to the mongodb but i keep getting an authorization error on a specific query - can you grant privileges to queries?!?
[04:19:39] <rh1n0> oh i bet its a stats privilege - nevermind ;)
[04:34:23] <newmongouser> Do you need an empty where clause on a query or does mongo supply it itself? in the docs it says find() will supply it if not given, but in all of their examples it is present
[04:37:11] <Boomtime> newmongouser: the docs use the shell find() method as the example, which can be called with no parameters and it will assume an empty document
[04:38:00] <Boomtime> note that if you need to pass a value for one of the other parameters to that method (the projection, for example) then you will have to supply an empty document in the first parameter because, well, javascript
[04:38:46] <Boomtime> in short, it depends on the language and driver you are using - what language and driver are you using?
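In the shell, the two forms being compared look like this; the empty first argument only matters once a second (projection) argument is needed:

    // no filter at all - the shell assumes an empty query document
    db.users.find()

    // same match (every document), but a projection is wanted,
    // so an explicit {} has to fill the first slot
    db.users.find( { }, { user_id: 1, status: 1, _id: 0 } )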
[04:42:17] <Boomtime> newmongouser: did you see my reply?
[04:42:39] <newmongouser> Boomtime: is it a good habit to use the first blank document in case you invoke other methods later down the road (other than the find() method)?
[04:42:46] <newmongouser> yes, sorry i was disconnected for a minute
[04:44:43] <Boomtime> newmongouser: your question also comes with the assumption that you'll be doing a lot of "find all" queries, which is kind of defeating half the reason of having a database - why would you always do this?
[04:46:56] <newmongouser> I won't. I was just going through Mongo's site in order to get more comfortable with best practices.
[04:47:35] <Boomtime> in this regard, i would say do whatever you are comfortable with
[04:52:19] <newmongouser> For my particular use I have roughly 40 'categories' with organizations listed in their respective category, so I'm running something similar to find( {category: 1}) or find({category: 3}) (category indexed)... There won't ever be more than 1500 organizations, so is the find syntax something I shouldn't bother getting more familiar with?
[04:53:03] <cheeser> you're going to need to know the "native" query language regardless of what driver your app uses.
[04:53:10] <cheeser> how else are you ever going to debug?
[04:53:45] <newmongouser> So it does become relevant later?
[07:06:06] <jimbeam> Is it ok to declare two different schemas in two files for the same object? I only want to access a subset of the properties in each file
[07:08:30] <Boomtime> jimbeam: it sounds like your question is regarding a library you are using, mongodb is schemaless - what library are you using?
[08:00:46] <charform> if a field is part of a composite index, does that mean it is also indexed individually, or should I create a separate index on it if I plan to query based on this field regularly?
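The usual answer, sketched with hypothetical field names: a compound index covers queries on its prefixes, so the leading field gets indexed lookups for free, but a trailing field on its own does not:

    db.things.ensureIndex({ a: 1, b: 1 })

    db.things.find({ a: 5 })          // can use the { a: 1, b: 1 } index (prefix)
    db.things.find({ a: 5, b: 7 })    // can use it too
    db.things.find({ b: 7 })          // cannot use it efficiently - would need its own { b: 1 } index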
[10:33:57] <Abluvo> hi, I am trying to install mongodb with a YAML config file which includes the storage.nsSize property but I keep getting parsing errors. Is there anything wrong with this YAML file http://pastebin.com/LAa5uapg?
[10:34:26] <Abluvo> The error I keep getting is: Error parsing INI config file: unrecognized line in 'storage:'
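For comparison, a minimal YAML config of the kind documented for 2.6-era mongod (paths illustrative; in later releases nsSize lives under storage.mmapv1). YAML needs spaces for indentation, never tabs, and the "INI config file" wording in the error suggests mongod is not parsing the file as YAML at all, so the mongod version is worth checking too:

    storage:
      dbPath: /var/lib/mongodb
      nsSize: 16
    systemLog:
      destination: file
      path: /var/log/mongodb/mongod.log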
[10:35:22] <arussel> I have foo: [[XXX]], how do I match documents for which XXX === "a" ?
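A hedged sketch for that shape (collection name hypothetical): a plain { foo: "a" } only looks one array level deep, so the inner level needs its own treatment, either with nested $elemMatch or by unwinding twice in the aggregation pipeline:

    // sample document: foo is an array of arrays
    db.coll.insert({ foo: [ ["a", "b"], ["c"] ] })

    // match documents where any inner array contains "a"
    db.coll.find({ foo: { $elemMatch: { $elemMatch: { $in: ["a"] } } } })

    // equivalent check via aggregation: unwind both levels, then match
    db.coll.aggregate([
        { $unwind: "$foo" },
        { $unwind: "$foo" },
        { $match: { foo: "a" } }
    ])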
[12:44:20] <theRoUS> Derick: i'm mining a ticketing system, which means daily fetches of about 100K documents of at least 2K each from the ticketing system, and putting them into a mongodb.
[12:45:11] <theRoUS> i only want to do insert-new and update-if; is there a recommended best practice for checking to see if a document in mongo is different from one in memory?
[12:45:58] <Derick> I don't know... I guess you could calculate a clever hash - that you also store in the documents in the collection? I think that would be fastest
[12:46:08] <theRoUS> atm, i'm working along the lines of including a shasum field in the db documents, and just doing a find() of the key field and the shasum
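A sketch of that approach in the shell; the field names and hash value are hypothetical, and the shasum would be computed by the fetch script over the ticket content, not by mongod:

    // incoming ticket, with a hash computed over its content by the fetch script
    var incoming = { ticketId: 12345, subject: "printer on fire", shasum: "ab12cd34" };

    // cheap check: fetch only the key field and the stored shasum
    var existing = db.tickets.findOne(
        { ticketId: incoming.ticketId },
        { _id: 0, ticketId: 1, shasum: 1 }
    );

    if (existing === null) {
        db.tickets.insert(incoming);                         // insert-new
    } else if (existing.shasum !== incoming.shasum) {
        db.tickets.update({ ticketId: incoming.ticketId },   // update only if changed
                          { $set: incoming });
    }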
[13:35:40] <Naeblis> Hi. How can I count the *total* of all nested arrays for my collection items which match a certain query? Eg: { $match: { foo: bar }} and get a total count of all the result.myArray items
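One way to get that total with the aggregation pipeline (field names taken from the question, values illustrative; $size needs 2.6+ and errors if myArray is missing, in which case the $unwind variant is safer):

    // sum the sizes of myArray across all matching documents
    db.items.aggregate([
        { $match: { foo: "bar" } },
        { $group: { _id: null, total: { $sum: { $size: "$myArray" } } } }
    ])

    // alternative: unwind and count the elements themselves
    db.items.aggregate([
        { $match: { foo: "bar" } },
        { $unwind: "$myArray" },
        { $group: { _id: null, total: { $sum: 1 } } }
    ])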
[13:42:48] <leotr|2> hi! I have a problem: I've run out of disk space
[13:43:30] <leotr> i tried to create an nfs share, move some of the files to that share, and then created symlinks
[13:43:58] <leotr> but it didn't help, mongodb shows me error messages because it can't lock the files
[20:29:46] <mng7> why doesn't that work to set processed to false on a single document where processed doesn't exist?
[20:30:00] <mng7> (i know i should specify multi to make it work for all of them, but let's stick to this for now..)
[20:44:02] <hrrld> Hello, I'm using the node.js driver and I'm calling .each() on a cursor for the first time. How do I signal that I've accumulated as many documents as I want, and that I don't want any more calls to the callback I passed to .each() ?
[20:53:11] <hrrld> Hm. There may not be a way: https://github.com/mongodb/node-mongodb-native/blob/2.0/lib/cursor.js#L420
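If the cut-off is known up front, limiting the cursor before iterating sidesteps the problem entirely; a sketch against the 2.0 node driver with hypothetical connection string and collection name:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/test', function(err, db) {
        if (err) throw err;
        // ask the server for at most 50 documents; each() then simply runs out
        var cursor = db.collection('tickets').find({}).limit(50);
        cursor.each(function(err, doc) {
            if (err) throw err;
            if (doc === null) {   // null signals the end of the cursor
                db.close();
                return;
            }
            console.log(doc._id);
        });
    });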
[22:25:59] <harttho> Hey, I've got a db/collections on shard 1 (and not on shard 2)
[22:26:19] <harttho> one mongos connection allows me to use the database and show collections
[22:26:30] <harttho> and another mongos connection doesn't show the collections
[22:27:39] <harttho> sh.status() shows the primary for the DB is on shard 2, where the data is actually on shard 1
[22:31:36] <harttho> (for both mongos connections on the sh.status())
[22:35:02] <harttho> Closest issue I could find is https://jira.mongodb.org/browse/SERVER-12268
[22:37:18] <Boomtime> harttho: did you issue a movePrimary or have you dropped the database and re-created it any point in the past?
[22:47:39] <harttho> running mongostat on 1 mongos shows reads/writes/etc
[22:47:52] <harttho> running mongostat on the other mongos (with missing collections) shows no reads/writes/etc
[22:48:02] <harttho> (Note, the 2nd shard isn't being used yet with proper sharding)
[22:58:07] <harttho> scratch those last 3 statements (irrelevant)
[23:17:02] <harttho> Boomtime: any thoughts? It appears mongos2 is routing to the incorrect shard, and mongos1 is reading from the correct, but empty, shard
[23:17:16] <harttho> sh.status has same result on both
[23:17:57] <Boomtime> "harttho: have dropped and recreated"
[23:19:01] <Boomtime> it's subtle and that ticket will look baffling at first
[23:19:17] <Boomtime> may be easier to read this one first: https://jira.mongodb.org/browse/SERVER-15213
[23:20:04] <Boomtime> the good news is that you have detected it and can fix your situation very easily now; restart the mongos
[23:20:45] <harttho> Similar to the conclusion we came up with
[23:21:05] <Boomtime> you can avoid a restart of the mongos by issuing a flushRouterConfig if you prefer
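For reference, flushRouterConfig is an admin command run against the stale mongos itself:

    // connect the shell to the affected mongos, then:
    db.adminCommand({ flushRouterConfig: 1 })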
[23:21:08] <harttho> Question is, when restarting the mongos, will the data from shard 1 be moved? or will it just start to write on the correct shard
[23:21:20] <Boomtime> it will start to write to the correct shard
[23:21:40] <Boomtime> if you have data now residing on shard 1 then you have a problem
[23:21:44] <harttho> We were going to move data from shard 1 to 2 (new primary), flushRouterConfig, enjoy writing data to existing collections on correct shard