PMXBOT Log file Viewer

#mongodb logs for Thursday the 12th of February, 2015

[00:39:47] <jayjo> Why is it that my mongo daemon is now starting on port 43980, when I've run service mongod stop and restarted?
[00:39:55] <jayjo> I am using Ubuntu 14.04 in this instance
[00:48:58] <cheeser> check your conf file
[02:38:36] <hackel> Why does the positional $ operator only return one result? How can I filter the contents of an array and include *all* the matching elements?
[02:40:38] <joannac> hackel: that's the way it's implemented
[02:41:03] <joannac> if you need multiple results from a single array, your array elements are better suited to be top level documents
[02:41:17] <hackel> Yes, I get that, but why? What is the use case? It just seems so arbitrary.
[02:42:40] <hackel> joannac: I am attempting to filter out subdocuments in an array when deleted_at is not null. I do want them to remain subdocuments, though.
[02:44:21] <joannac> hackel: why do they need to be subdocs?
[02:45:08] <hackel> joannac: Because I'm using Mongo, that's the whole point.
[02:45:52] <hackel> Yes, I could make everything a separate collection, but in that case I might as well use SQL and not have to deal with all this headache to accomplish simple tasks.
[02:55:48] <haole> where do I get a list of the supported events in MongoDB's Node.js driver? like 'fullsetup' and 'close'
[03:20:55] <morenoh149> haole: 1.4 docs http://mongodb.github.io/node-mongodb-native/1.4/
[03:21:31] <morenoh149> make sure you use the right docs. I got really pissed off when I started out using mongodb from the node driver. I was looking at the 2.0 docs the whole time -.-
[04:40:03] <benson> is it possible to install just mongodump? I want to periodically back up a remote database but don't want to install mongodb on the server doing the backing up
[04:43:33] <Boomtime> you could just copy the binary from your existing install..
[04:44:17] <Boomtime> but i don't think there is a way to install a single particular tool from the set
[04:44:37] <benson> there's no issue with paths/dependencies doing that?
[04:45:41] <Boomtime> there might be, that would depend on the status of your destination machine - if you want to be sure you can do the fake install trick and determine if it would pull in other packages
[04:47:35] <benson> you mean the simulate apt-get install?
[04:48:00] <Boomtime> yeah, i can't remember the option, but you know the one
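
The apt-get flag Boomtime can't remember here is --simulate (also -s or --dry-run). A sketch, assuming the mongodb-org repositories of the era, where mongodb-org-tools is the package that bundles mongodump with the other tools:

    # Print what apt-get would install without actually doing it:
    apt-get install --simulate mongodb-org-tools
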
[08:01:14] <queretaro> Hi, does anyone of you use MongoDB as a backend for Adobe AEM?
[10:29:51] <brokenpipe> Hello from Beijing
[10:30:57] <brokenpipe> I just installed mongoDb and configured the /etc/mongo.config file with correct permissions, and when I start it the app does not read this file
[10:32:59] <brokenpipe> the error every time is that dbpath=/srv does not work
[10:33:25] <brokenpipe> I created it with permissions and changed to another path, and it's the same error
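
For context on brokenpipe's problem: mongod only reads a config file that is passed to it explicitly with -f/--config (the packaged init scripts point at /etc/mongod.conf, not /etc/mongo.config), and the dbpath directory must exist and be writable by the user mongod runs as. A sketch, assuming the stock Ubuntu packaging and brokenpipe's paths:

    mongod --config /etc/mongo.config
    # and, assuming the packaged "mongodb" service user and dbpath=/srv:
    chown -R mongodb:mongodb /srv
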
[10:50:17] <Audac3> hey everyone
[10:50:28] <Audac3> how are things going?
[11:14:04] <guest9999> hello. just wondering: why doesn't mongodb have different yum repositories for previous releases? e.g. 2.4, so when i create and bootstrap a new server it doesn't automatically go to 2.6?
[11:14:18] <guest9999> or different package names
[11:14:57] <guest9999> e.g. i can install different php versions like: `yum install -y php53u` with php, php53u, php54, php55u, php56u
[12:29:39] <hemangpatel> Hi
[12:37:12] <elshaa> hi
[12:37:38] <elshaa> Is there a variable I can check from mongo client to know about the logging and verbosity options ?
[12:45:23] <drager> Hello, how much does mongodb need to start?
[12:45:25] <drager> (disk)
[12:46:54] <haole> morenoh149: couldn't find the events - only "events" in the sense of people getting together to talk about mongodb lol
[12:46:56] <haole> where do I get a list of the supported events in MongoDB's Node.js driver? like 'fullsetup' and 'close'
[12:57:09] <flyingkiwi> drager, the first start needs about 3089872 bytes
[12:57:23] <flyingkiwi> +kilo
[12:57:34] <elshaa> ok, I got the "getParameter" thing
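
The "getParameter thing" elshaa means, run from the mongo shell against the admin database:

    db.adminCommand({ getParameter: 1, logLevel: 1 })   // current log verbosity
    db.adminCommand({ getParameter: "*" })              // dump every runtime parameter
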
[13:01:17] <drager> flyingkiwi: Alright, I can't start it…
[13:01:18] <drager> /dev/vda1 20G 18G 915M 96% /
[13:02:55] <flyingkiwi> mongod is preallocating disk space (the 3GB mentioned above) - so 915MB free disk space is insufficient
[13:36:21] <drager> Alright, I removed my documents in the db
[13:36:25] <drager> and now it started fine
[13:36:32] <drager> when I have some more space; /dev/vda1 20G 7.8G 11G 42% /
[13:38:35] <kali> you can use the smallfiles option to reduce the preallocation quantum
[13:38:47] <kali> it's not necessarily a good idea for production
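
kali's smallfiles option, as it would look in a 2.6-era INI-style config file; it caps MMAPv1 data files at 512MB instead of 2GB and shrinks the journal, at the cost of more files for large data sets:

    # /etc/mongod.conf
    smallfiles = true
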
[13:53:49] <Sticky> re mongo preallocation, can mongo give any indication of how full its preallocated files are and when it is likely/close to growing?
[13:55:39] <cheeser> maybe with db.stats()
[13:56:00] <kali> mmmm yeah... maybe when storageSize grows near fileSize ?
[13:56:09] <cheeser> that'd be my guess
[13:57:13] <Sticky> ok, thanks
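
A minimal sketch of the heuristic cheeser and kali describe, from the mongo shell; the 0.9 threshold is an arbitrary illustration, not a MongoDB constant:

    var s = db.stats();
    // On MMAPv1, fileSize is the preallocated total and storageSize is what
    // the collections actually occupy; growth is near when they converge.
    if (s.storageSize / s.fileSize > 0.9) {
        print("close to allocating the next data file");
    }
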
[13:57:39] <Sticky> mongos ib
[13:57:41] <Sticky> oops
[13:58:39] <Sticky> mongo's inability to defrag when you remove records, without doing a full resync, is quite annoying. A few times when we ran out of disk space on servers and then cleaned the db, it was an issue
[14:53:46] <valera> hello, how much overhead does initial sync have compared to pre-seeded copying?
[15:24:06] <jiffe> http://www.securityweek.com/thousands-mongodb-databases-found-exposed-internet
[15:24:35] <jiffe> this is why authentication needs to be turned on by default
[15:26:16] <cheeser> no, this is why people need to think when putting software into production.
[15:27:39] <Zelest> putting? i write my software in production :D
[15:27:51] <jiffe> cheeser: that is never going to happen and the software ultimately gets blamed for it
[15:28:20] <cheeser> consider it digital darwinism.
[15:29:57] <jiffe> cheeser: I do, but not in the same context you are I'm guessing
[15:31:35] <StephenLynx> the default configurations make it secure so only local connections are accepted.
[15:31:50] <StephenLynx> and they didn't even test all those databases to see if they could actually access them.
[15:32:09] <StephenLynx> they were just "oh, it is open, let's assume it is open for anyone to access"
[15:33:50] <jiffe> the only barrier to stop that access is auth and I'm willing to bet most of them don't run auth
[15:34:38] <StephenLynx> that and not taking external connections. which is turned on by default on install.
[15:35:28] <jiffe> but all those 40000 were accepting those connections; I'm sure they tried to access the db from another machine, and when they couldn't, they listened on all interfaces, and then everything was working, so no need to go further
[15:35:58] <StephenLynx> I don't know, I'm yet to run the test they ran on a fresh install of mongo.
[15:36:19] <StephenLynx> did their test just check for an open port or did it actually try to connect?
[15:36:24] <StephenLynx> I lost the link to the pdf.
[15:39:21] <Derick> I do think we should only bind to 127.0.0.1 by default though
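
What Derick's suggested default looks like in a 2.6-era config file (the Debian/Ubuntu packages already shipped this line at the time):

    # /etc/mongod.conf: listen on the loopback interface only
    bind_ip = 127.0.0.1
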
[15:42:18] <Tausen> Hey! I'm a bit puzzled with some timing and hope someone can help me shed some light. I'm using aggregate through pymongo and filtering with $match to find documents where a field is in a list of strings. If I for example have 3 entries in the list and a lot of data in the collection, but no data matching those 3 entries, it is *much* faster to do three separate requests with only one element in the list than doing one request with all three in the list. Can mongodb not use the index I have on the field as efficiently in this case or something?
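
For reference, the two shapes Tausen is comparing, written in mongo shell syntax rather than pymongo, with made-up collection and field names; explain() on the equivalent find() shows whether the index on the field is chosen:

    db.data.aggregate([ { $match: { field: { $in: ["a", "b", "c"] } } } ])  // one request
    db.data.aggregate([ { $match: { field: "a" } } ])                       // one of three per-value requests
    db.data.find({ field: { $in: ["a", "b", "c"] } }).explain()             // check index use
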
[15:43:23] <Sticky> having a sane default that non-localhost connections require auth would probably have prevented a huge number of those misconfigurations. And it should not be terribly difficult to implement
[15:45:18] <valera> what triggers switch of replica from stale to fatal state ?
[15:49:25] <jiffe> Sticky: agreed
[15:54:13] <Sticky> the argument that it is safe since it does not bind the public port is a bit off; people are used to existing DBs that are secured by default (at least the DBs I have used). Similarly, web servers asked to bind a public port will not expose their admin consoles unauthed to the world
[15:55:46] <Sticky> there is an expectation of safe by default by more than just binding the right ip, breaking that expectation is exposing your users to risk
[16:01:54] <StephenLynx> that is, indeed, a valid argument.
[16:02:26] <StephenLynx> doesn't change the fact that the user has to screw up to make their db insecure, though.
[16:05:22] <Sticky> yeah, but for very little extra effort and inconvenience you could protect the majority of users. If you want to add an enableUnauthedPublicAccess config param, fine; then the few people who do want it have to explicitly request it, for little effort
[16:07:40] <StephenLynx> again, a good argument, and I'm sure they had a reason not to do that. You could try and open a ticket on jira about it.
[16:09:33] <Sticky> tbh the fact that mongo does not make it easy to obtain an ssl'ed mongo is almost as bad as this issue as well
[16:09:49] <cheeser> that's been fixed in the latest releases, iirc
[16:09:57] <valera> what would be the correct way to re-sync replica in FATAL state ?
[16:10:12] <Sticky> cheeser: are they shipping an ssl'ed mongo now?
[16:10:25] <cheeser> i believe so. at least on the nonwindows builds.
[16:10:28] <StephenLynx> yeah, it was an issue with cross compiling or something.
[16:10:30] <Sticky> hmm
[16:10:32] <StephenLynx> not intentional.
[16:10:44] <cheeser> yeah. a linkage issue, iirc
[16:11:19] <Sticky> oh, they are accidentally shipping one?
[16:11:29] <StephenLynx> accidentally?
[16:12:16] <Sticky> I thought your "not intentional." meant they are not intentionally shipping ssl'ed mongo
[16:12:27] <StephenLynx> oh, not
[16:12:31] <StephenLynx> making it difficult
[16:12:35] <StephenLynx> that was not intentional.
[16:13:05] <Sticky> I thought it was a support contract upsell
[16:13:19] <cheeser> kind of.
[16:13:22] <cheeser> but not really
[16:13:38] <Sticky> where are the ssl binaries available from?
[16:13:40] <cheeser> the extra work to deliver an ssl build was worth the extra money
[16:13:51] <cheeser> but i believe that glitch has been fixed.
[16:15:25] <queretaro> can I install the MMS automation agent on SuSE Linux?
[16:24:39] <AnnaGrey> I'm trying out mongodb for the first time and I'm wondering if I'm in the right spot with this schema http://pastie.org/9942231
[16:30:53] <StephenLynx> you could start learning without using stuff like mongoose.
[16:31:04] <StephenLynx> and only start using it if you really need to.
[16:31:23] <StephenLynx> about the model
[16:31:25] <AnnaGrey> StephenLynx: well i need it
[16:31:28] <StephenLynx> I would make user more flat.
[16:31:32] <StephenLynx> for what?
[16:31:37] <AnnaGrey> for node js
[16:31:40] <StephenLynx> that profile object does not need to exist
[16:31:42] <StephenLynx> no you dont
[16:31:50] <StephenLynx> I use node and now io and never used mongo
[16:31:55] <StephenLynx> just the official node driver.
[16:32:30] <StephenLynx> aside from that, seems fine.
[16:33:16] <AnnaGrey> All right thanks StephenLynx
[16:33:30] <StephenLynx> never used mongoose*
[16:33:59] <AnnaGrey> StephenLynx: I thought from what I read it's a good ORM
[16:34:13] <StephenLynx> personally I find it useless and bloated.
[16:35:17] <AnnaGrey> Will try everything out without mongoose
[16:36:56] <StephenLynx> do that, it is best to learn with the very minimum necessary.
[16:55:14] <hmsimha> I've been finding the documentation on TTL indexes a bit incomplete. If a TTL is set, say, for 3600 seconds (1 hour) on a 'lastUpdated' field that may get updated with some frequency, does that TTL countdown reset every time the field is updated?
[16:58:24] <cheeser> there isn't a countdown.
[16:58:51] <cheeser> there's a thread that runs that deletes docs where lastUpdated < $cutoff
[17:01:00] <hmsimha> ok thanks cheeser
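
What cheeser describes, as an index definition (collection name hypothetical): the TTL monitor runs roughly once a minute and deletes documents whose lastUpdated is older than the cutoff, so re-touching the field does postpone deletion.

    db.sessions.ensureIndex({ lastUpdated: 1 }, { expireAfterSeconds: 3600 })
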
[17:34:08] <hmsimha> If I'm getting documents from a collection in chunks of 1000 (`.limit(1000)`) and in between performing `collection.find().limit(1000)` and `collection.find().limit(1000).skip(1000)` one of the documents returned in the first query is deleted from the db, how can I prevent skipping over a document?
[17:35:33] <jiffe> Sticky: did you ticket default auth?
[17:48:12] <StephenLynx> deleted? how come?
[17:51:52] <jiffe> hmsimha: instead of skipping you could sort by an ascending field and add a $gt filter
[17:54:02] <appledash> Hello... Is there any way to have a "standalone" MongoDB server? Perhaps I am using the wrong term, but what I want to do is have an application that uses MongoDB as a database, but it spins up its own internal copy of MongoDB to use just for itself, with the data being stored in a subfolder of the dir the application is in. Is this possible?
[17:55:29] <hmsimha> jiffe: thanks, but let's say I have 10000 documents numbered 1-10000 and I request `collection.find({someField: {$gte: 0}}).limit(1000)` to start off with, I guess on the server I need to store the last value of someField somewhere?
[17:55:31] <jiffe> the term you're looking for is embedded
[17:55:36] <appledash> If it helps, my application is Python
[17:55:46] <appledash> And yes, yes it is. Thanks
[17:56:11] <cheeser> mostly, yes. but it wouldn't be embedded in the sense of running in process with your app
[17:57:19] <jiffe> hmsimha: yes, each lookup will have to know the last value, similar to how with skip you would have to skip(1000 * n)
[17:57:21] <appledash> That doesn't matter too much
[17:57:38] <hmsimha> great, thanks Jiffe!
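
A minimal sketch of jiffe's suggestion in shell JS (collection name made up): sort on an indexed field, remember the last value from the previous batch, and resume after it, so a deletion between batches cannot shift the window the way skip() does:

    var last = null;
    while (true) {
        var query = last === null ? {} : { _id: { $gt: last } };
        var batch = db.docs.find(query).sort({ _id: 1 }).limit(1000).toArray();
        if (batch.length === 0) break;
        batch.forEach(function (doc) { last = doc._id; /* process doc here */ });
    }
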
[18:15:54] <hmsimha> the docs list an example for index creation of a compound index: `db.products.ensureIndex( { item: 1, quantity: -1 } )`. How does index creation preserve the order of the fields if they're passed as object keys?
[18:19:35] <StephenLynx> I suppose it just uses the sequence of the keys.
[18:19:42] <StephenLynx> objects still have an order for their keys
[18:20:47] <StephenLynx> if you get to print an index name, you will see it is something like field_1_anotherfield_1 or something. I don't remember too well. I know it would be in the error.err or something when you try to insert something that violates the compound index.
[18:23:50] <hmsimha> StephenLynx: http://stackoverflow.com/questions/5525795/does-javascript-guarantee-object-property-order
[18:26:05] <hmsimha> ah, found the answer: http://stackoverflow.com/questions/18514188/how-can-you-specify-the-order-of-properties-in-a-javascript-object-for-a-mongodb
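
The behavior StephenLynx describes is checkable from the shell: the shell preserves the key order of the object literal, and the generated index name records it.

    db.products.ensureIndex({ item: 1, quantity: -1 })
    db.products.getIndexes()
    // one entry will have name: "item_1_quantity_-1" and
    // key: { item: 1, quantity: -1 }, in the order given
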
[18:48:16] <ezakimak> I just did db.coll1.copyTo(db.coll2) and now when i show collections there's an entry "[object Object]", and the coll2 is still empty
[18:48:49] <ezakimak> how can i fix this?
[18:49:38] <cheeser> use this instead: http://docs.mongodb.org/manual/reference/method/db.cloneCollection/#db.cloneCollection
[18:50:28] <ezakimak> how do i get rid of the funky collection entry?
[18:50:39] <fewknow> db.collection.drop()
[18:50:51] <ezakimak> but how do i name it?
[18:50:54] <fewknow> ezakimak: are you trying to clone the collection or just rename it?
[18:51:07] <fewknow> show collections
[18:51:12] <fewknow> show will give you the name
[18:51:14] <cheeser> you can use db.getCollectionNames() to get the name of the errant collection
[18:51:42] <ezakimak> ah, got it. db['[object Object]'].drop()
[18:51:47] <cheeser> it's probably, literally, "[object Object]"
[18:51:49] <ezakimak> it got the name from toString
[18:51:50] <cheeser> yeah
[18:52:16] <ezakimak> so what's the diff between db.collection.copyTo() and db.cloneCollection() ?
[18:52:36] <fewknow> probably nothing
[18:52:40] <fewknow> just a wrapper
[18:52:55] <fewknow> ezakimak: are you just trying to rename the collection or actually copy it?
[18:53:06] <ezakimak> move the data
[18:53:16] <ezakimak> it was in the wrong place
[18:53:23] <fewknow> that is just renaming the collection
[18:53:27] <fewknow> you don't have to move anything
[18:53:30] <ezakimak> no, i need both
[18:53:35] <fewknow> so copy
[18:53:39] <fewknow> or clone
[18:53:45] <ezakimak> i guess i could rename then recreate the one
[18:53:50] <fewknow> yep
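
The root cause of the "[object Object]" entry above: copyTo() takes the target collection's name as a string, and a collection object passed instead gets stringified. And since ezakimak really wants to move the data, renameCollection() does it in one step (the target must not already exist):

    db.coll1.copyTo(db.coll2)          // wrong: target name becomes "[object Object]"
    db.coll1.copyTo("coll2")           // copies documents into coll2
    db.coll1.renameCollection("coll2") // or move + rename in one step
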
[18:57:03] <ezakimak> ok, so schema question, i have one Person collection, and each person can have 0 or more roles. I implement the roles as subdocuments. would it possibly be better to split the roles out into their own collection? (classic collection vs subdocument question)
[18:57:36] <StephenLynx> can you put it on text and show it?
[18:57:38] <ezakimak> most will only have one role, many zero, a few multiple
[18:58:06] <StephenLynx> I have something similar in two projects of mine, I use a subarray with strings for the roles.
[18:59:07] <ezakimak> in mine the roles are specific, eg Person: { _id: 4, name: "joe", admin: { admin role stuff } }
[18:59:36] <StephenLynx> what exactly is admin role stuff?
[18:59:38] <ezakimak> some of the roles are very large
[18:59:55] <ezakimak> well admin is small, it might just have perms for admin stuff
[19:00:08] <ezakimak> employee role might have a manager_id or something
[19:00:31] <ezakimak> user role has authentication tokens and access permissions
[19:01:11] <StephenLynx> I still have no idea how they are structured.
[19:01:24] <ezakimak> each role in mine has its own structure, role-specific
[19:01:42] <ezakimak> they are just optional subdocuments in my Person collection
[19:01:59] <StephenLynx> and there aren't any rules for how they are structured?
[19:02:15] <ezakimak> sure there are, all admin roles are the same
[19:02:49] <ezakimak> but, i have one role, student, which is very large and the central focus of the application; that subdocument is complex
[19:03:08] <StephenLynx> that is bad design, IMO.
[19:03:24] <StephenLynx> you have this field that may have one structure or another
[19:03:29] <ezakimak> i didn't want two mechanisms for representing roles, but didn't want to make a new collection for each role (because eventually i want to allow custom roles)
[19:03:34] <ezakimak> no i don't
[19:03:44] <ezakimak> each role is its own field.
[19:04:08] <StephenLynx> so there is a field called admin
[19:04:11] <ezakimak> yes
[19:04:13] <StephenLynx> and another field called student?
[19:04:16] <ezakimak> if the person has the role
[19:04:18] <ezakimak> yes.
[19:04:20] <ezakimak> if he's a student.
[19:05:22] <ezakimak> well, i think i just answered my own question. I can't do searches easily if they are split into different collections
[19:06:03] <StephenLynx> fields, you mean?
[19:06:06] <ezakimak> i do not want to be doing joins
[19:06:32] <StephenLynx> you could have a single field called role that would have a string
[19:06:39] <StephenLynx> and then another field to hold another data.
[19:06:48] <StephenLynx> that could be or not empty.
[19:08:00] <ezakimak> why would i do that?
[19:15:12] <fewknow> ezakimak: use an array or subdocuments for each role?
[19:15:28] <ezakimak> i'm using subdocuments now
[19:15:45] <ezakimak> just thinking through if it would be better to split them out, decided against it
[19:16:03] <fewknow> you don't keep the meta data of the role in the subdocument
[19:16:08] <fewknow> just what role for the user
[19:16:15] <fewknow> the meta data can be in a different collection
[19:16:36] <ezakimak> meta data?
[19:16:48] <fewknow> what is role?
[19:16:54] <fewknow> just the word "admin"
[19:16:54] <ezakimak> actual data
[19:16:54] <fewknow> ?
[19:17:05] <fewknow> like?
[19:17:29] <ezakimak> a teacher will have data like state id, qualifications, etc. a student role will have all the student record data
[19:17:48] <fewknow> that isn't a role
[19:17:50] <fewknow> that is a profile
[19:18:12] <ezakimak> whatever. i call it role
[19:18:24] <ezakimak> doesn't change what it *is*
[19:18:30] <fewknow> actually it does
[19:18:34] <fewknow> profile is what you want
[19:18:54] <fewknow> a profile doesn't control access
[19:18:56] <fewknow> a role does
[19:19:09] <fewknow> profiles can have roles
[19:19:19] <ezakimak> a *person* can have roles
[19:19:24] <fewknow> yes
[19:19:30] <fewknow> and a person can have a profile
[19:19:30] <ezakimak> one role is "user", which controls access
[19:19:50] <fewknow> sure...but user doesn't have state id, or qualifications
[19:19:54] <fewknow> as a role
[19:20:20] <ezakimak> no, they don't
[19:20:48] <fewknow> okay you said a role = https://dash.metamarkets.com/magnetic_audience/explore_audience_searches#e=2015-01-19&p=custom&s=2015-01-12&zz=3
[19:20:50] <fewknow> sorry
[19:21:00] <fewknow> role = a teacher will have data like state id, qualifications, etc. a student role will have all the student record data
[19:21:09] <StephenLynx> you could show us the schema
[19:21:18] <ezakimak> i did
[19:21:18] <fewknow> yeah that would work
[19:21:31] <StephenLynx> where?
[19:21:35] <fewknow> gist?
[19:21:46] <ezakimak> Person: { _id: 2, name: "John", teacher: { .... }, user: { ... } }
[19:21:59] <ezakimak> if they have the role, it has the subdocument, otherwise it doesn't even have the key
[19:22:09] <fewknow> those aren't roles
[19:22:16] <StephenLynx> you are still not showing the whole schema
[19:22:18] <fewknow> those are profile properties of a person
[19:22:35] <StephenLynx> what fields do teacher and user contain?
[19:22:41] <StephenLynx> what are the other collections?
[19:22:42] <ezakimak> stuff specific to those roles
[19:22:50] <StephenLynx> well, stuff is important.
[19:22:51] <fewknow> that would be meta data
[19:22:56] <fewknow> for each role
[19:22:57] <ezakimak> no, it's *data*
[19:23:10] <fewknow> meta data is data
[19:23:10] <ezakimak> metadata is data about data, this is actual data
[19:23:20] <fewknow> this is data about the role
[19:23:26] <ezakimak> metadata is stuff like last_access_time, modified_by, etc.
[19:23:27] <fewknow> teacher meta data
[19:23:35] <ezakimak> no, it *is* the teacher record
[19:23:55] <StephenLynx> again, we can't really help you on schema design if you don't show us the schema.
[19:24:02] <ezakimak> if there was a collection named Teacher, that's what would be in it.
[19:24:05] <StephenLynx> as he said, metadata is just data.
[19:24:07] <ezakimak> I just *did*
[19:24:22] <fewknow> this is not a schema
[19:24:22] <fewknow> Person: { _id: 2, name: "John", teacher: { .... }, user: { ... } }
[19:24:23] <StephenLynx> you showed us half of what a collection looks like.
[19:24:29] <fewknow> what is in teacher?
[19:24:31] <fewknow> or user?
[19:24:37] <StephenLynx> and what are the other collections?
[19:25:04] <ezakimak> the other collections are irrelevant
[19:25:19] <fewknow> schema should probably be similar to this
[19:25:35] <StephenLynx> :^)
[19:25:36] <fewknow> Person: { _id: 2, name: "John", role : [teacher,user] }
[19:25:51] <fewknow> Person: { _id: 2, name: "John", role : ['teacher','user'] }
[19:25:56] <StephenLynx> no data is irrelevant.
[19:26:44] <ezakimak> ok, then how would you search for a teacher named "john" ?
[19:27:12] <ezakimak> you'd use '$in' { 'role': 'teacher' } right?
[19:27:14] <fewknow> db.collection.find({ name: "John", role: "teacher" })
[19:27:20] <fewknow> you don't need in
[19:27:28] <StephenLynx> role:{$in:["teacher"]}
[19:27:28] <ezakimak> but then you'd have to do a second query to get the rest of the teacher data, right?
[19:27:35] <StephenLynx> oh, nvm
[19:27:37] <StephenLynx> that is for projection
[19:27:39] <ezakimak> and if you want multiple results, now you have to join
[19:27:41] <StephenLynx> I always confuse myself with it
[19:27:44] <fewknow> why do you need teacher data?
[19:27:52] <fewknow> multiple results?
[19:28:04] <fewknow> depends on the use case
[19:28:08] <ezakimak> if i want a list of teachers that teach 7th grade english
[19:28:18] <fewknow> if you need the meta data about the role then yes...you need 2 queries
[19:28:25] <ezakimak> not w/my schema i don't
[19:28:26] <fewknow> then you put that on the profile
[19:28:36] <ezakimak> which i call role.
[19:28:44] <fewknow> Person: { _id: 2, name: "John", role : ['teacher','user'] , grades: [7,10]}
[19:28:49] <fewknow> that isn't a role
[19:28:49] <ezakimak> no.
[19:28:53] <ezakimak> because not all people have grades.
[19:28:56] <StephenLynx> we have no idea what your schema is.
[19:29:00] <fewknow> that is MONGO
[19:29:01] <ezakimak> i showed you.
[19:29:05] <fewknow> they don't all need the grades
[19:29:14] <StephenLynx> again, you showed us half of one collection
[19:29:17] <fewknow> this isn't a relational database
[19:29:20] <StephenLynx> not the whole schema
[19:29:20] <ezakimak> exactly.
[19:29:38] <fewknow> right so a PROFILE...can have one to none of the properties
[19:29:43] <fewknow> the documents don't ever have to match
[19:29:46] <fewknow> that is the point
[19:29:51] <fewknow> you want a profile of a PERSON
[19:29:59] <fewknow> every property they could have
[19:30:04] <fewknow> then make it searchable
[19:30:06] <ezakimak> Person: { _id: 2, name: "John", teacher: { grades: [7,10], classes: [4,22,97], ... }, user: { passphrase_sha1: 'xyz', ... } }
[19:30:25] <ezakimak> the whole thing is a profile
[19:30:43] <ezakimak> the *roles* contain profile data specific to that role within the profile
[19:30:43] <fewknow> why do you need teacher?
[19:30:52] <fewknow> if they have grades you can assume they ARE a teacher
[19:31:05] <fewknow> that isn't waht a role is
[19:31:08] <ezakimak> no, because there could be other roles that also have grades and for different reasons
[19:31:23] <fewknow> okay
[19:31:27] <fewknow> then just nest all of it
[19:31:32] <fewknow> will get bad query performance
[19:31:36] <ezakimak> i am using the terms from my ubiquitous language from the business model in my design
[19:31:56] <fewknow> then make a profile with properties
[19:32:18] <ezakimak> that is my original question: how much worse or better is it to leave these as subdocuments, vs splitting them out into their own collection at the expense of now having to do joins and complicating searches?
[19:32:22] <fewknow> Person: { _id: 2, name: "John", role : ['teacher','user'] , teacher_grades: [7,10]}
[19:32:33] <ezakimak> no no no. i will not flatten my names
[19:32:54] <fewknow> then make a roles collection
[19:32:59] <fewknow> and bucket them by _id
[19:33:22] <ezakimak> isn't that what I already did?
[19:33:29] <fewknow> every role in the collection, no matter what role, will have the corresponding _id of the PERSON document
[19:33:37] <fewknow> no, you nested everything
[19:33:48] <fewknow> the other way is a separate document for every role
[19:34:01] <ezakimak> exactly, and that's what i am considering the pros and cons of
[19:34:07] <fewknow> k..
[19:34:20] <fewknow> the only way to know ... is to know the access patterns you need
[19:34:33] <fewknow> if you have to query based on the roles then having them nested will be slow
[19:34:37] <fewknow> depending on the query
[19:34:45] <ezakimak> my "role" is a subset of the person's profile, with data specific to that role, if they have it, it's *still* part of the profile
[19:34:53] <fewknow> sure
[19:34:56] <fewknow> so 2 collections
[19:34:59] <fewknow> person and role
[19:35:07] <ezakimak> and what goes in your suggested role collection?
[19:35:17] <fewknow> the teacher subdocument
[19:35:21] <fewknow> the user subdocument
[19:35:25] <fewknow> with a key back to person
[19:35:29] <ezakimak> both in the same "role" collection?
[19:35:33] <fewknow> yes
[19:35:37] <ezakimak> how does that help???
[19:35:38] <fewknow> they are all roles
[19:35:45] <fewknow> it buckets all of them
[19:35:53] <fewknow> so if you need to access all roles for one person
[19:35:54] <ezakimak> now i still have to look in two places
[19:35:56] <fewknow> it is one query
[19:36:03] <fewknow> or if you need to find one person with all those roles
[19:36:08] <fewknow> one query
[19:36:14] <ezakimak> hrm.
[19:36:25] <fewknow> if you need the person info then yes 2 queries
[19:36:30] <ezakimak> now i have three things to consider... :)
[19:36:37] <fewknow> but the first query for the role will be MUCH faster than the nested document
[19:36:48] <fewknow> and you can index it better
[19:37:11] <ezakimak> a) keep it as one collection with subdocuments, b) split out each role into its own collection, c) your idea of lumping all roles into a 2nd "roles" collection
[19:37:34] <ezakimak> i think (c) just adds complication w/o making anything better than (a)
[19:37:35] <fewknow> I have built something very similar at scale
[19:37:40] <fewknow> with over a billion profiles
[19:37:50] <MacWinner> i'm planning on moving away from using gluster for scalable storage to GridFS
[19:37:50] <fewknow> c is faster
[19:38:11] <ezakimak> last big project I was on they abandoned gluster and went to moosefs
[19:38:14] <MacWinner> i was wondering, how do you handle multiple filenames for the same blob of data with the same md5 hash?
[19:38:24] <ezakimak> they couldn't get deterministic behavior from gluster in failure scenarios
[19:38:56] <MacWinner> in gluster I'm appending the filename to the blob of data, and doing md5 hash on that..
[19:39:00] <ezakimak> MacWinner, maybe get some ideas from git?
[19:39:13] <ezakimak> ch 9 i think is the internals chapter
[19:39:36] <MacWinner> k.. i'll look there, thanks
[19:40:20] <ezakimak> they use sha1, but md5 ought to work similarly
[19:41:13] <ezakimak> fewknow, so in your idea (c), you'd have to encode the role type into each document
[19:41:59] <ezakimak> use the same _ids from the Person collection, then index the type field as well
[19:42:21] <ezakimak> oh, wait, you'd need a different _id for it to keep them unique
[19:42:28] <ezakimak> bleh.
[19:44:02] <ezakimak> I think i'll leave things as they are for now, maybe later i'll revisit this if i move to an ES architecture
[19:45:04] <ezakimak> thanks for the discussion though, it's helped
[20:02:29] <fewknow> ezakimak: you don't need a different _id...you bucket on them
[20:02:38] <fewknow> I will be back in a bit if you want to talk about it
[20:02:45] <ezakimak> doesn't _id have to be unique in a collection?
[20:06:01] <cheeser> yes, it does
[20:08:51] <ezakimak> so "bucketing" on them simply means lumping them all together in the same document
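
A sketch of the bucket pattern fewknow is describing, with hypothetical field names (personId, type): each role is a separate document that carries the owning person's _id value in an ordinary indexed field, so the roles collection's own _id values stay unique:

    db.roles.insert({ personId: 2, type: "teacher", grades: [7, 10], classes: [4, 22, 97] })
    db.roles.insert({ personId: 2, type: "user", passphrase_sha1: "xyz" })
    db.roles.ensureIndex({ personId: 1, type: 1 })

    db.roles.find({ personId: 2 })                  // all roles for one person, one query
    db.roles.find({ type: "teacher", grades: 7 })   // all 7th-grade teachers, one query
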
[20:25:11] <mrmccrac> anyone know if mongos is multithreaded?
[20:56:15] <cheeser> mrmccrac: hrm?
[20:56:46] <mrmccrac> can a mongos instance use multiple cores
[20:58:45] <cheeser> oh, i would expect so...
[21:08:06] <ezakimak> mine is only using one process, no threads
[21:09:47] <ezakimak> maybe replica sets use threads for different modes
[22:07:46] <timmytool> Hi Everyone.
[22:07:58] <timmytool> Can someone here help me with a performance question about mongo?
[22:12:02] <timmytool> We recently upgraded our mongo servers from 2.4.10 to 2.6.7. We have been running fine for almost a year on the older version. After the upgrade we are having some serious performance problems. Based on what I can tell by looking at the currentOp statistics, we have some queries that are taking a long time to run. The queries are fully indexed; I have verified this by doing an explain. Looking at the ops, they have up to an hour in timeLockedMicros. The num yields is also very high for these queries. Is there any way I can get more information about what is going on? Thanks
[22:13:49] <fewknow> timmytool: do you use MMS? There are a lot of ways to see what is going on.
[22:13:58] <timmytool> Yes. I have looked at it.
[22:14:00] <fewknow> How do you know your queries are fully indexed? did you check the logs?
[22:14:13] <fewknow> unless you are hinting all queries you can't be certain
[22:14:48] <fewknow> you can use explain to see if the index is being used correctly
[22:14:52] <timmytool> fewknow: thanks. I've done an explain on the query and it is using an index. It is the same query that is taking a long time. There are many instances of it running with different parameters.
[22:15:15] <fewknow> k
[22:15:42] <timmytool> Is there some way for me to tell why mongo is yielding the locks on these queries? I'm assuming that that is the problem.
[22:15:59] <fewknow> they only yield to writes
[22:16:08] <fewknow> you can run mongostat --discover
[22:16:14] <fewknow> and see how the replica set is performing
[22:17:27] <fewknow> are you sure it wasn't some code that was released that is inserting something that is causing the lock?
[22:17:39] <timmytool> fewknow: Thanks again. How do I determine replica set performance from the output of mongostat?
[22:17:53] <timmytool> Yes, we did not release any code 1 week before and after the upgrade
[22:18:44] <timmytool> We do have a high number of inserts in our database, but we always have.
[22:19:15] <timmytool> I’m thinking that something in the newer version of mongo is yielding those locks more easily than the old versions.
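
One way to get more detail on the slow operations timmytool mentions, from the shell; the 60-second threshold is an arbitrary illustration. On 2.6, each reported operation includes numYields and lockStats (with timeLockedMicros):

    db.currentOp({ active: true, secs_running: { $gt: 60 } })
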
[22:52:09] <AnnaGrey> Hey guys do you find this schema correct? http://pastie.org/9943076
[23:05:39] <jiffe> if you run a replicated or sharded setup on separate machines then you need to configure mongod to bind on an interface other than localhost, right?
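
Right: each member has to listen on an address the other members can reach. In a 2.6-era config file that is the bind_ip line, which takes a comma-separated list (the address below is a placeholder):

    # /etc/mongod.conf on each member
    bind_ip = 192.0.2.10,127.0.0.1
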