[00:06:39] <buzzalderaan> this may be an easy question, but i haven't found a clear answer online. i have a document that has a list of _id references to other documents, and now i need to be able to filter that list based on some supplied criteria. i was hoping to take advantage of the mongoose populate function and return only the _id field in the results, and i could do any necessary post-processing after, but i can't seem to find an easy way to return only the _id
[00:07:40] <GothAlice> Normally this would be the second argument to .find(). A la (in the shell) db.example.find({}, {others: 1})
[00:10:24] <buzzalderaan> would you still have to disable all the other fields in order to return just the _id field though?
[00:11:06] <GothAlice> Nope, if you specify the second argument it specifies explicitly which fields to return, with _id selected by default. (add "_id: 0" to disable that if you wish)
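A minimal shell sketch of the projection being described, using a hypothetical parents collection whose others field holds the referenced _ids (the mongoose line assumes populate's optional field-selection argument and a Parent model):

    // second argument to find() is the projection: list the fields to return
    db.parents.find({}, { others: 1 })          // _id comes back by default, plus others
    db.parents.find({}, { others: 1, _id: 0 })  // add _id: 0 to suppress _id

    // rough mongoose equivalent: populate others, but select only _id on the referenced docs
    Parent.find().populate('others', '_id').exec(function (err, docs) { /* ... */ });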
[00:12:02] <buzzalderaan> hm.. okay, the documentation was a bit confusing about that
[00:12:20] <buzzalderaan> and the documentation for mongoose certainly isn't any clearer
[00:12:35] <Dinq> good evening: quick question from new mongodb user - should I be worried if my Master and my Replica have different numbers of files in the /var/lib/mongodb directory? If so, what can I do to fix it, or what might have caused it?
[00:12:50] <GothAlice> buzzalderaan: Whenever I'm confused about exact behaviour, I drop into the MongoDB shell and play. :)
[00:13:32] <GothAlice> Dinq: Do you have --smallFiles enabled on one but not the other? (Or otherwise have different stripe sizes?)
[00:13:40] <buzzalderaan> not a slam against mongo, but i feel i've been spoiled using the MSSQL tools for the past 3 years, so to me working with the mongo shell isn't the easiest
[00:14:19] <Dinq> GothAlice: thanks for the input. I suspect both servers are identical in those settings, but one moment...
[00:15:14] <buzzalderaan> i'm also not sure how i'd model this query in mongoshell
[00:15:19] <GothAlice> Dinq: Also, likely the secondary doesn't need as large an oplog as the primary.
[00:15:45] <Dinq> GothAlice: I don't see --smallfiles enabled, no.
[00:16:23] <Dinq> printReplicationSetInfo() shows replication is occurring with only about 30s delay; file timestamps and sizes look identical (close enough on timestamp)
[00:16:26] <GothAlice> Dinq: What's the actual difference between the folders?
[00:16:49] <Dinq> but Master (used to be slave) has about 4000 more files (62000 vs 58000)
[00:17:06] <Dinq> file size is very close but not bit-for-bit (1.4 TB vs 1.5 TB)
[00:17:27] <Dinq> with that many files, i haven't quite gotten to the point to know exactly which files are different.
[00:17:38] <Dinq> also, they do seem to be "rotating" at the 2GB per file limit
[00:17:51] <GothAlice> (ls -b | sort > f1; repeat that on the other with "f2" instead; diff f1 f2)
[00:18:26] <Dinq> reason I'm asking: we're about to fail over to do compact and repair and we don't want to fail over to a server that has missing data :/
[00:19:01] <Dinq> also: i inherited this server/task at my new job today ;)
[00:19:10] <Dinq> hence my n00b questions and lack of 100% information
[00:22:19] <Boomtime> let me clarify: what is it you think repair does that you need?
[00:23:46] <Boomtime> "repair" is what you run if you suspect corruption from a faulty disk, power-loss without a journal, etc.. AND you aren't running a replica-set
[00:38:52] <Dinq> that's what we were headed toward today before we found that the Replica has 4000 fewer files than Master, yet claims to be only 30 seconds behind on delay
[00:38:59] <GothAlice> Just make sure you have an arbiter to ensure the secondary will actually take over duties.
[00:39:03] <Dinq> so we stopped to do a lil more research :)
[00:39:11] <Dinq> Goth: there is indeed an arbiter in this setup, yes, thank you.
[00:42:25] <cheeser> some use that as a hedge against an errant write: if caught inside that window you can restore the old document from the secondary to the primary
[00:42:35] <Dinq> what i can say is that I see files on both are being updated and written to at the same time(ish)
[00:42:38] <cheeser> but that's ... a creative way to manage disasters.
[00:43:11] <GothAlice> My own replica set at work doesn't exceed 25ms behind. (Currently 0ms behind.) :/
[00:43:22] <Dinq> so i def need to diff the file directories to find which are actually different, confirm (fix) the time clocks, and then go from there maybe
[00:44:14] <Dinq> Goth:Cheeser: in that case I also need to confirm I did not misread "30ms" or "3.0ms" or "3ms" as 30s
[00:48:17] <GothAlice> I'm just digging into my production env at work: the primary has fewer journal files, and two fewer stripe files for one of the databases.
[00:49:17] <GothAlice> Or rather, each host has completely divergent journal files.
[00:49:32] <Dinq> If I may---what quantity of files are you talking about, generally? In the 10's of 1000's like mine? Uptime on the servers is 192 and 208 days.
[00:49:55] <Dinq> ok. that gives me another thing to research to be double sure. I need to make sure I'm only talking about the stripe files differing in quantity.
[00:50:00] <GothAlice> The project at work currently has only a small amount of data. There were ~30 files in /var/lib/mongodb.
[00:50:02] <Dinq> again, step 1 I need to diff f1 f2
[00:50:16] <GothAlice> (And ignore differences in the journal file names.)
[00:50:41] <bazineta> Dinq If you have a general feeling of unease about the setup, and you have the opportunity, I'd add a new replica, wait for the sync, and nuke the old one. Were it me, I'd go that route; at least I'd be certain of my starting point then.
[00:51:09] <GothAlice> Yeah, that'd be even safer. :D
[00:52:22] <Dinq> I would totally do that, and we might - my unease with that though is that the current replica is the one with more files.
[00:52:36] <Dinq> (again, forgive my mongo n00bness - it might "just work" that way on purpose) :)
[00:53:47] <bazineta> Dinq with unknown provenance, it could be that the node with more files was originally a single-node db, prior to a replica set being created, or something of that nature, I suppose.
[00:54:04] <bazineta> Dinq however, I am to Alice as the ant is to the soaring eagle, so listen to her ;)
[00:54:28] <GothAlice> Bah; I've been wrong about things before. >:P
[00:55:21] <Dinq> unknown provenance...wow...yes that certainly applies here :)
[00:55:37] <Boomtime> more files means next to nothing, it might just be more fragmentation on one host than another, this can happen
[00:56:03] <Boomtime> suggestion to sync a new replica member is a good one, it will produce a naturally compact copy
[00:56:35] <Boomtime> it will happen to also perform the equivalent of a validate, so you get everything you've dreamed of
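A hedged sketch of that approach in the shell, run against the primary, with placeholder hostnames:

    rs.add("newhost.example.com:27017")     // new member does an initial sync (a naturally compact copy)
    rs.status()                             // wait until the new member reports SECONDARY
    rs.remove("oldhost.example.com:27017")  // then retire the suspect member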
[00:59:13] <Dinq> tomorrow morning will be interesting. our mongo data itself comes from some other job within our web app that only runs large parts on a schedule - tomorrow's first test will either show up (YAY!) or it won't (BOO, which is what got us into this today in the first place).
[02:45:48] <sparc> Boomtime: If it doesn't do either, do you think it will still connect and work, to the single host?
[02:47:30] <bazineta> GothAlice one does not work with IBM...one survives IBM...
[02:48:15] <sparc> Boomtime: it was in the doc you linked, n/m :) thanks again
[02:48:33] <sparc> (and yes it will create a standalone connection)
[02:55:32] <sparc> Is it necessary to specify the replset name, as a command-line argument to mongod?
[02:55:40] <sparc> It seems like you can place it in the config file also
[02:56:50] <Boomtime> sparc: a single connection is not good for you, if the host you connect to becomes secondary at any point you will no longer be able to write anything
[02:57:13] <blagblag> Hey everyone, i'm new to mongo and hoping to get some help on why a "find" is returning no results. Here is a pastie with what my query looks like and a little bit about the collection: http://pastie.org/9726666
[02:57:31] <Boomtime> if you want a general purpose connection you need to specify your intention to use a replica-set connection, the driver will then find the primary for you
[02:57:53] <sparc> Boomtime: I agree, yeah. It just lets me create the set, and then send in a pull-request to my group to update the connection string, a minute later.
[02:57:59] <sparc> instead of having to synchronize the change
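A sketch of the difference with the Node.js driver, using placeholder hosts, a database named mydb, and a replica set named rs0:

    var MongoClient = require('mongodb').MongoClient;

    // single-host connection: if this host steps down to secondary, writes start failing
    MongoClient.connect('mongodb://host1:27017/mydb', function (err, db) { /* ... */ });

    // replica-set connection: the driver discovers the members and follows the primary
    MongoClient.connect('mongodb://host1:27017,host2:27017/mydb?replicaSet=rs0',
                        function (err, db) { /* ... */ });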
[02:58:32] <blagblag> i'm using robomongo to test out these queries if that matters at all.
[03:00:26] <Boomtime> sparc: yep, that's a good use case
[03:01:11] <Boomtime> blagblag: are there any documents which match your query?
[03:01:51] <blagblag> Boomtime: Yes, I believe so; that particular query I pulled straight from the first record shown by robomongo
[03:02:24] <Boomtime> you do that query in the shell?
[03:02:33] <blagblag> Boomtime: Could it be that mongo does not handle special characters like '!'
[03:02:46] <Boomtime> what database do you have selected in the shell?
[03:03:03] <Boomtime> no, it's a string match, mongodb will handle unicode
[03:03:22] <blagblag> Boomtime: I am running the query in robomongo. I can test it out in the mongo shell if you think it might help
[03:03:53] <Boomtime> robomongo is a 3rd-party app, if you can reproduce the issue in the shell then your issue is real
[03:04:23] <blagblag> Boomtime: ok back in a sec once I try it
[03:04:26] <Boomtime> also, can you provide the document you think is a match
[03:10:53] <blagblag> Boomtime: Looks like each string value I imported into a collection has a '\n' at the very end of the string. It doesn't show up in the robomongo view. It may be failing to find it because I wasn't including the newline character
[03:11:04] <blagblag> let me test that out before wasting anymore of your time
[03:13:03] <blagblag> Yup, comes right up... dang, this is problematic. Anyone ever deal with newline characters in a string field? I wonder if there is a way to get the find to ignore newlines
[03:18:36] <Boomtime> blagblag: regex, though ensure it is left-anchored or it will be much less efficient
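A sketch of that suggestion, assuming a hypothetical items collection and name field whose imported values carry a trailing '\n':

    // left-anchored regex: the ^ prefix lets MongoDB use an index on name for the prefix,
    // and the pattern doesn't care about the trailing newline
    db.items.find({ name: { $regex: '^Hello world!' } })

    // or match the stored value exactly, newline included
    db.items.find({ name: 'Hello world!\n' })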
[03:22:46] <blagblag> perfect yup thanks for the help
[03:28:39] <quuxman> In my app, every request has a session identifier that is used to fetch a session record, which has a user id, which is then fetched immediately afterwards. Is there a reasonable way to grab both the session and referenced user record at the same time to reduce latency?
[03:29:59] <Boomtime> quuxman: yes, you put them in the same document
[03:30:23] <quuxman> What if I want separate collections, because some sessions don't have users?
[03:31:03] <Boomtime> then you need to develop a cohesive schema that achieves what you want
[03:32:08] <quuxman> I suppose I could query both the user collection and session collection at the same time, where anonymous sessions are stored in the session collection, and authenticated ones in the user collection
[03:34:50] <quuxman> or I could create an anonymous user type. Neither option seems very nice
[03:40:25] <quuxman> in the docs, multikey indexes only mention arrays. Can they also index the keys of a dict / subobject?
[03:45:56] <quuxman> Would I want a user record like { _id: 'u1', session: { 's1': { created: ... }, 's2': { created: ... } } }, or like { _id: 'u1', sessions: [ { _id: 's1', created: ... }, { _id: 's2', created: ... } ] } ?
[03:47:42] <quuxman> the docs clearly say you can index the latter case as 'session._id', but are not so clear with the former case
[04:14:52] <bazineta> quuxman In general, a user can have multiple sessions, and you want sessions to auto-expire if they're not touched in some time, so a separate TTL collection is often a good choice.
[04:16:12] <bazineta> quuxman that way you can have mongo deal with scorching expired sessions automagically for you via just not updating the date TTL field
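A sketch of that separate TTL collection, with assumed names (sessions, lastSeen) and a 30-minute window:

    // documents disappear roughly 30 minutes after lastSeen stops being updated
    db.sessions.ensureIndex({ lastSeen: 1 }, { expireAfterSeconds: 1800 })

    // touching the session on each request pushes its expiry forward
    db.sessions.update({ _id: sessionId }, { $set: { lastSeen: new Date() } })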
[08:31:51] <almx> a small question about mongodb write concurrency: is possible for different parts of my application to do concurrent writes to different collections in the same database
[13:03:04] <spgle> hello, I have a question about w:majority on a 2.6 mongodb cluster made of replica sets. If you have 4 replicas and 1 arbiter, what's the majority (3?)? What will happen to write requests if one replica goes down and the cluster is left with 3 hosts and one arbiter - does 'majority' still consider 3 as the required value, or is it computed again with the new topology of the cluster?
[13:08:30] <skot> It is based on the configured members, not the currently available ones, as that would lead to bad things.
[13:08:56] <skot> so a 5 member replica set has a majority of 3
[13:09:04] <skot> The same is true for a 4 member set.
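For illustration, a write asking for majority acknowledgement in the 2.6 shell (collection name assumed); as skot says, the required count comes from the configured membership, not from whichever members happen to be up:

    db.events.insert(
        { ts: new Date(), msg: "example" },
        { writeConcern: { w: "majority", wtimeout: 5000 } }
    )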
[13:14:42] <talbott> "errmsg" : "found language override field in document with non-string type",
[13:15:00] <talbott> whether i do, or do not, specify a language_override
[13:15:01] <skot> cek: but I don't think you would hit the bug I'm thinking about if you are using "use database" auth stuff: https://jira.mongodb.org/browse/SERVER-15441
[13:15:14] <skot> cek: can you post your full session to gist or something?
[13:15:41] <cek> i can't, i deleted those already
[13:15:46] <cek> I just can't understand the concept.
[13:16:08] <cek> in mysql, there are global users to which you can allocate db.table privs
[13:22:52] <cek> "Create the user in the database to which the user will belong. " from tutorial. Can't I create a GLOBAL user that will have access to several databases?
[13:41:39] <talbott> one doc example from the collection
[13:41:48] <cheeser> yes. "language" is not a string.
[13:42:15] <talbott> i can override it though..right?
[13:42:21] <talbott> to use interaction.language.tag instead?
[13:42:27] <cheeser> change your index definition: http://docs.mongodb.org/manual/tutorial/specify-language-for-text-index/#use-any-field-to-specify-the-language-for-a-document
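A hedged sketch of that index option, assuming a hypothetical top-level string field lang holds the language; the error above comes from the default language field being a non-string (an object) in these documents:

    // tell the text index to read the language from "lang" instead of "language"
    db.interactions.ensureIndex(
        { content: "text" },
        { language_override: "lang" }
    )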
[14:11:38] <bazineta> according to the docs, fg produces a more compact index at the cost of DOSing yourself. Always seemed like a very poor trade to me...
[14:13:43] <cheeser> that's right. i remember reading that a while back.
[14:14:52] <kali> +1, bad choice of default value on that one, but it's sooooooo old :)
[14:16:35] <bazineta> Ran that by mistake on a collection of 150 million documents once. Once.
[14:18:53] <bazineta> Yep. Still boggles my mind though that at some point, there must have been a meeting, and in that meeting there was a discussion, and at the end of it, everyone said, yeah, that's a great idea, do it that way.
[14:19:16] <kali> bazineta: well, it's not that simple. in the beginning, there was no background option
[14:20:00] <kali> bazineta: so when it was added, the choice was between changing the api semantics and picking a better default
[14:20:50] <bazineta> kali Sure, and for a time in our history we banged the rocks together to make sparks, but it's weird you'd leave this landmine hanging around for so long at this point.
[14:21:10] <winem_> sorry guys, just following your conversation and I'm wondering what you mean by bg. :)
[14:23:17] <bazineta> The first paragraph of that document is my /facepalm.
[14:23:22] <winem_> yes, seems to be really interesting. thanks. only read two thirds of the book so far
[14:24:39] <winem_> thanks for the input. will follow your advice on the bg ;)
[14:25:17] <kali> bazineta: I agree, it would have been nice to leave this behind when the default for write safety was switched.
[14:26:05] <cheeser> to leave which bit behind? the override field name?
[14:26:47] <bazineta> cheeser the bit where ensureIndex() by default takes an exclusive lock on the database unless you explicitly tell it not to.
[14:27:24] <kali> cheeser: to make bg=true the default
[14:27:34] <cheeser> i actually like it this way, tbh
[14:28:05] <cheeser> long lived database operations should be explicit in their nature
[14:28:20] <winem_> ok, one more question regarding the bg. is this just a question of performance or does it also lock the collection like relational databases do?
[14:29:22] <bazineta> Ok, so when's the last time you didn't use the bg flag?
[14:29:26] <kali> winem_: without it, the database is locked.
[14:29:45] <winem_> ok. now I understand the real impact
[14:30:17] <kali> winem_: if you forget it on a real production system, you go down.
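The flag under discussion, sketched on a hypothetical collection:

    // default: foreground build; the database is locked until the build finishes
    db.events.ensureIndex({ userId: 1 })

    // background build: slower and the index ends up slightly larger,
    // but reads and writes keep flowing while it runs
    db.events.ensureIndex({ userId: 1 }, { background: true })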
[14:30:39] <winem_> working with mongodb for about two months now and I'm really impressed, but I've only read two thirds of the book and still have some trouble letting go of the relational dbms thinking.. :)
[14:32:01] <kali> Derick: cheeser: well "what you don't know should not harm (too much)"... I fear bg=false by default fails this one
[14:32:30] <bazineta> winem_ The beginning of wisdom is to repeat to yourself 'it'll only ever use ONE index' until it sticks.
[14:33:26] <winem_> yes, that's why they implemented compound indexes ;)
[14:35:44] <winem_> we'll define the json format by EOB today. could someone review our preferred shard key tomorrow? I think it's almost perfect - but I just read the official docs and books.. so I have no experience with mongodb outside our dev environment
[14:50:21] <Qualcuno> good morning. quick question: what is the recommended design to store in MongoDB a list of custom (user-defined) fields together with their label?
[14:55:21] <Qualcuno> Thanks Derick. Another question: I need a full-text index on “normalized” versions of all values (all lowercase, diacritics removed, etc). I was thinking of creating a new field in each document named “searchable”, putting all the normalized values inside that, and then creating a full-text index on that?
[14:55:37] <Qualcuno> “searchable” would be another object, with all k and v
[15:07:13] <Qualcuno> Derick: ok, thanks! one last question. The docs say that MongoDB can have just 1 full-text index per collection, and that’s fine. to put multiple fields in the index you can do something like http://docs.mongodb.org/manual/tutorial/create-text-index-on-multiple-fields/ . How can I tell MongoDB to select all the v’s inside “searchable”?
[16:03:30] <omid8bimo> hey guys, i need help. im adding a new member to my replicaSet; after staying in startup2 state for 24 hours (huge data to replicate), the mongod goes to recovering state and then i see the error "replSet not trying to sync from kookoja-db1:27017, it is vetoed for 381 more seconds" and replication stops
[16:15:34] <bazineta> that error means that the secondary wasn't able to reach the primary, so it's going to try again later...look for some mismatch in the configuration that might be confusing it if there's nothing else in the logs
[16:16:20] <bazineta> typically though the logs on one or both servers will have more detail
[16:17:55] <bazineta> for example, 'too stale to catch up' would be something not unexpected if the data is ginormous
[16:18:38] <jar3k> How do I remove duplicated records (with the same columns except only 1 column) in a collection?
[16:21:33] <bazineta> omid8bimo grep for stale in your mongo log...if you see that, then read http://docs.mongodb.org/manual/tutorial/resync-replica-set-member/
[17:48:26] <ajn> i'm attempting to deploy a replica set on compute engine and keep getting an error stating the mongodb server does not have 2 members, any ideas?
[20:05:24] <shoerain> suggest any web admin panels for mongodb? there are quite a few listed here: http://docs.mongodb.org/ecosystem/tools/administration-interfaces/. I use robomongo on my computer, so something akin to that would be sweet -- this is mainly for folks who want to check up on stats on certain collections
[20:28:23] <naquad> i've got a super n00b question: how do i limit the number of updated items? i need something like update({query}, {update}, limit=100)
[20:29:35] <hahuang61> it's not possible to have, for example, a replicated db across 2 data centers geographically separated but have 1 set of masters on each side, right
[20:32:51] <cheeser> hahuang61: there can only be one primary
[20:34:13] <hahuang61> cheeser: yeah that's what I thought.
[20:34:25] <hahuang61> cheeser: appreciate it. There's no good active/active setup to do
[20:40:44] <GothAlice> A beautiful data architecture can be completely ruined by one creative user. XP
[20:42:15] <shoshy> hey, i have a process that worked fine, now i get "MongoError: Resulting document after update is larger than 16777216"
[20:42:54] <shoshy> i tried googling for the reason, but couldn't really find a reference to this, i did see there's a limit of 16MB for querying.. but
[20:51:40] <GothAlice> shoshy: "Resulting document after update" — sounds like you're running an update query, and whatever you are doing to a document in that query is resulting in a document greater than 16MB in size. This is a no-no (not just query size is limited, actual per-record size is limited, too.)
[20:53:25] <GothAlice> Now, 16MB is a lot. As an example, I was able to store complete forums (1.3K threads w/ 14K replies, roughly 1.1M words or so) in a single record.
[20:53:52] <GothAlice> shoshy: But I'd need to see an example record and the query that is failing to really assist.
[20:55:32] <shoshy> GothAlice: thanks a lot for the help, that's 1st... regarding the update, i got the error on an insert. But i'm looking into it , i don't have any 16MB document..
[20:56:24] <GothAlice> Then you were attempting to insert a 16MB document.
[20:56:34] <GothAlice> Were you by chance attempting to insert multiple records with one insert?
[20:56:54] <shoshy> GothAlice: nope i haven't, it's a small document in size.
[21:02:37] <shoshy> it runs for the error there, yep
[21:02:58] <shoshy> ok, so it's not the [bag] object being inserted, as it's small
[21:03:11] <GothAlice> shoshy: Do you have monitoring a la MMS? (I.e. what's your average document size in that collection?) By not using upserts, there, growth of the "feed" list would appear to be unbounded.
[21:03:34] <GothAlice> (And thus your >16MB problem… if you try to append to that list and it needs to grow, but can't… *boom*.)
[21:04:41] <shoshy> GothAlice: i don't have it for that server, i know i should link it
[21:04:55] <GothAlice> Eek; and flipping that list (so you append, rather than prepending and thus needing to *move* the existing entries) can't hurt.
[21:06:00] <GothAlice> (You'd also simplify the $push down… no need for $each on a list with one element.)
[21:06:20] <shoshy> GothAlice... right! how would you change it to avoid overhead?
[21:11:13] <flyingkiwi> naaah, the bigger question is: shoshy, are you also pop'ing values?
[21:12:20] <shoshy> GothAlice: ok, so same query without the $position , great, i'll try it out (restarting the process will take time)
[21:12:41] <shoshy> flyingkiwi: so, as it seems, i query but with {feed: 0} , so without that field
[21:13:34] <shoshy> the idea behind pushing it to the beginning was to take only the latest items
[21:13:34] <GothAlice> The error you were getting indicates that some document being updated by that statement (I'd pop into a shell and findOne on that document!) is growing to a size larger than the acceptable maximum as a result of the $push. (Prepend or append won't matter for the purpose of this error. That's mostly an efficiency/simplicity thing.)
[21:13:41] <flyingkiwi> just because mongodb isn't freeing the space in arrays between entries after deleting them
[21:13:45] <GothAlice> shoshy: You can $slice with negative indexes to get the "end" of the list. ;)
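A sketch of that projection, using the groups collection and feed array from the discussion (someGroupId stands in for the real _id):

    // return only the last 10 feed entries, so appending (no $position needed)
    // still lets the application read "the latest items"
    db.groups.find({ _id: someGroupId }, { feed: { $slice: -10 } })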
[21:15:42] <shoshy> GothAlice: Thanks a lot, i'm checking it RIGHT NOW.. i
[21:17:05] <shoshy> so the document to be updated is fine
[21:17:28] <shoshy> i think.. how can i check specifically its size..
[21:17:41] <shoshy> (i actually loaded it in robomongo)
[21:17:45] <GothAlice> shoshy: That's a very good question. If you could get the raw BSON for that document and get its size…
[21:18:07] <GothAlice> (This is one of the big reasons I stick to interactive shell usage. I can get raw BSON out of pymongo quite easily.)
[21:27:05] <GothAlice> Sounds like your journey through the forests of scaling has led you to the overpass of record splitting. Remember I mentioned upserts earlier? :)
[21:27:56] <shoshy> so basically, how do you handle it? like thats the procedure?
[21:27:59] <GothAlice> Using upserts you can tell MongoDB to "$push this value to list X if the query is matched, otherwise create a new record to $push into", allowing you to group your records by day/month/year or any other arbitrary grouping.
[21:28:12] <shoshy> GothAlice: but upserts modify/ adds a new document if not found
[21:30:46] <GothAlice> http://irclogger.com/.mongodb/2014-11-14#1416000865 the conversation circa this time period discussed a chat application, with conversations grouped by day, as an example.
[21:31:09] <shoshy> upsert won't in any way create a copy of that whole parent object, with a "new array" with that element alone in it?
[21:31:32] <GothAlice> You would have to specify defaults that would be written into the new record.
[21:31:50] <GothAlice> And yes, $push to a non-existent field would create it as a list containing one element. AFAIK. ¬_¬
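A minimal sketch of the upsert pattern being described, with assumed names (groups, feed, and a period field as the grouping key):

    // if a document for this group/period exists, append to its feed;
    // otherwise a new document is created and feed becomes a one-element array
    db.groups.update(
        { group_id: someGroupId, period: "2014-11" },
        {
            $push: { feed: newItem },
            $setOnInsert: { created: new Date() }   // defaults written only on insert
        },
        { upsert: true }
    )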
[21:33:57] <shoshy> GothAlice: Thanks so much... that kinda bummed me and yet made me happy at the same time :) i hope it's an easy "fix" or way to go from here
[21:34:29] <GothAlice> Boomtime: That's why generally right after I say something, I test it. ;) http://cl.ly/image/291h2J1j3D30 My intuition was correct.
[21:38:46] <shoshy> GothAlice: before you leave, i saw your TTL index comment there in the convo
[21:38:55] <shoshy> is there a TTL for field level? :))
[21:39:34] <shoshy> or... a mechanism to delete IF size is above threshold?
[21:40:08] <shoshy> (not in the application layer that is)
[21:43:12] <shoshy> i have no problem emptying that array once every X days
[21:43:54] <GothAlice> shoshy: The point of using upserts (esp. as described in the prior IRC chat log) is that you can have MongoDB automatically balance already. TTL is for an entirely different purpose than limiting document size. And it can't empty arrays, only delete whole documents, which is effectively what using upserts here would do anyway. (Except without needing to delete anything.)
[21:44:46] <shoshy> GothAlice: yea... thought there might be an option for field level, cool, i'm still reading it :)
[21:48:41] <bazineta> Our primary usage of mongoose is actually just the basics that the base driver itself provides. We do make extensive use of population. Other features, not really, not that we couldn't easily handle in other ways.
[21:49:27] <Boomtime> any reason you don't just use the Node.js driver directly?
[21:49:43] <GothAlice> bazineta: There's also light-weight (well, lighter than Mongoose!) solutions like https://github.com/masylum/mongolia
[21:51:12] <bazineta> aren't really exercising the ORM features much, perhaps the base driver is the real answer.
[21:52:20] <GothAlice> (The main Mongolia model.js is < 500 lines, the object mapper is < 100, and validation another 200 or so. Colour me impressed, this lib is tiny.)
[22:01:20] <shoshy> GothAlice: ok, i've read the conversation, so basically its just an upsert, but you add some search-properties / flags. My case is a group is unique, only one exists. Can't have multiple. After this whole thing, i thought of adding a "created_date" field, and from now on the .find / .update (with upsert) will be based on {_id: ObjectId('...') , created_date: {$gt: (new Date())-1000*60*60*24*10}} (lets say last 10 days)
[22:02:08] <GothAlice> There are a few ways to record the "period" the document applies to.
[22:03:08] <GothAlice> Using creation dates and a "timeout" can work. You can also convert to a UNIX timestamp and subtract the modulo of your division (i.e. snap it to the nearest 10-day period, or in my case, hour), you can also use real dates and just replace date elements (also for the hourly example; set the minutes/seconds/microseconds to zero.)
[22:05:04] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-sample-py is an example of one of my own records using this style of upsert. (And yes, the update statements on these are kinda nuts.)
[22:07:52] <shoshy> GothAlice: i think i'd go with the UNIX version (new Date.getTime() - ... ) what do you mean by "modulo of your division"? like taking the subtraction and modulo by 1000*60*60*24 ?
[22:08:10] <shoshy> GothAlice: and what i wrote was correct, as in the idea of what you meant?
[22:10:15] <GothAlice> var now = +(new Date()), week = 1000 * 60 * 60 * 24 * 7, snapped = now - (now % week); — "snapped" is now the date snapped to the nearest (previous) week. Won't match reality's version of a week, but it'll do the trick. (It'll be offset based on the UNIX epoch.)
[22:11:27] <GothAlice> +(new Date()) is a nifty trick (and probably hideously bad idea) to quickly get millisecond timestamps. ^_^
[22:12:38] <shoshy> why not new Date().getTime() ?
[22:13:11] <GothAlice> No reason other than my muscle memory told me to do it the other way. ;)
[22:13:37] <shoshy> hahaha ok ... yea so basically you're asking for the last week's unix time ... got it..
[22:14:09] <shoshy> why do you have another ObjectId there?
[22:14:25] <shoshy> are you storing the original object id (the one from the document that "got away" past 16MB)?
[22:14:38] <GothAlice> The POSIX standard means when working in UNIX timestamps you don't really have to worry about leap seconds and such; you save a lot of headaches doing calendar math. _id is the record's ID; all of the others are references to other collections
[22:15:03] <GothAlice> Nope; each unique combination of hour, company, job, supplier, and invoice will have a new _id.
[22:15:20] <shoshy> ahh ok, thought "c": ObjectId(""), # Pseudonymous Company might be a reference to the original
[22:15:30] <GothAlice> Yeah, that's just a reference to a company.
[22:17:00] <GothAlice> In that case it'll be a fake ObjectId (i.e. not really referencing any document), but it'll always be the same fake ID for the same company in the dataset. (To anonymize the data and make it easier/safer to export for analysis.)
[22:19:28] <shoshy> GothAlice: i see.. ok.. thank you. So basically i only change the update to have upsert: true and in the find object section i do {_id: ObjectId('...'), created: {$gt: (new Date().getTime() - new Date().getTime() % (1000*60*60*24*7)) }}
[22:19:41] <shoshy> if we're talking about keeping it for a week
[22:20:29] <shoshy> every time it'll search for a document with that id in the previous week, it'll come to the same record
[22:20:38] <shoshy> when it won't it'll create a new one
[22:23:16] <GothAlice> shoshy: https://gist.github.com/amcgregor/94427eb1a292ff94f35d — here's an example update statement from my code.
[22:23:30] <GothAlice> (Roughly; the $inc data is calculated on each hit from browser data.)
[22:23:57] <GothAlice> And yes, that's basically how it works. If the query matches something, cool, if not, create it.
[22:25:07] <shoshy> Great, thank you so much again, i'm off making the changes, and adding the creation date to the previous documents so they'll align
[22:25:12] <shoshy> but that's a very elegant solution
[22:25:38] <GothAlice> If you can get away without needing periodic maintenance routines, it's probably a good solution. ;)
[22:26:04] <GothAlice> (And avoiding extra queries, a la "does this exist? no? create it." is good, too.)
[22:26:40] <shoshy> GothAlice: I can! for the oldies, i don't mind them staying there for a rainy day as long as i can use one for a long time... only concern is with size
[22:27:26] <shoshy> but now i know, that if it'll happen again i could just reduce the number of days
[22:27:49] <GothAlice> This is similar to a fairly typical logging issue. Do you rotate your log files when they exceed a certain size (variable time), or when after a period of time (variable size)? :)
[22:28:13] <GothAlice> (I do the period of time approach; again, it makes querying much easier.)
[22:28:37] <shoshy> well the use case here is different but yea..
[22:28:48] <shoshy> you don't need your logs in memory ;)
[22:51:32] <GothAlice> Weird; I can see that thread on grokbase, but Groups causes my browser to crash. O_o
[22:55:52] <talamantez> Hi all - I have a neurosky MindWave headset that is writing to a mongoDB collection 1/sec. I would like to take a moving average of the last ten readings. Here is the project on github
[22:56:22] <talamantez> Any idea what the best way to approach this would be? thanks
[22:57:49] <GothAlice> talamantez: Insert those into a capped collection, have another process using a tailing cursor to continually receive new results and calculate the moving average.
[22:58:44] <GothAlice> Or, you could have your inserting process keep track of the moving average and publish it on each record inserted. (Thus each record's "average" would be of it and the prior nine.)
[23:02:22] <talamantez> @GothAlice - Thanks for the suggestions - I'll look into documentation about a tailing cursor.
[23:02:36] <GothAlice> talamantez: You could even simplify things, if your capped collection only allows 10 readings, you can always just get the average across the collection. :)
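A sketch of that simplest variant, with a hypothetical readings collection capped at 10 documents and a value field:

    // capped collection: once 10 documents exist, the oldest are discarded automatically
    db.createCollection("readings", { capped: true, size: 4096, max: 10 })

    // the moving average of the last ten readings is then just the average of the collection
    db.readings.aggregate([
        { $group: { _id: null, movingAvg: { $avg: "$value" } } }
    ])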
[23:03:48] <GothAlice> talamantez: https://gist.github.com/amcgregor/4207375 are some slides from a presentation I gave; covers an important caveat of capped collections and includes some sample code.
[23:03:56] <GothAlice> (And link to complete implementations and the rest of the presentation.)
[23:04:38] <GothAlice> (You'd need to re-implement something like 3-queue-runner.py to make use of tailing cursors.)
[23:10:48] <arisalexis> hi, i want to search for documents with a text index that are in a circle (so a geospatial index). i read you cannot have a compound index with these two types. is there any other way? like searching with geospatial and the aggregate pipeline? i'm not very proficient in this, is it doable?
[23:12:25] <GothAlice> arisalexis: Unfortunately you can't compound on those, and MongoDB can only make use of one index per query. You'd have to determine which (geo or text) narrows down the result set the most (on average) and hint that index, letting MongoDB do the more expensive scanning per-document on the remainder. This would be true using normal .find() and .aggregate().
[23:13:25] <GothAlice> (One of the cases where aggregate queries won't save you.)
[23:14:19] <arisalexis> so I can do it, it's just going to be slower. i first do the geospatial for example (for sure that's reducing more the number of my documents) and then act on this dataset? i don't understand the per document comment?
[23:16:32] <GothAlice> MongoDB can rapidly scan an index to determine which documents match, but if there are additional fields being queried (not "covered" by the index), MongoDB will have to actually load up each document and perform the comparison "per document".
[23:16:35] <GothAlice> See also: http://docs.mongodb.org/manual/tutorial/analyze-query-plan/#analyze-compare-performance
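A rough sketch of the "narrow with one index, scan the remainder per document" idea; since $text insists on using the text index, this substitutes a left-anchored regex for the text condition, and the collection, fields, and coordinates are made up:

    db.places.find({
        loc: { $geoWithin: { $centerSphere: [ [ -73.57, 45.50 ], 5 / 6378.1 ] } },  // ~5 km radius
        name: { $regex: '^coffee' }
    }).hint({ loc: "2dsphere" })   // force the geo index; .explain() shows how many docs were scanned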
[23:17:09] <arisalexis> ok thanks a lot. do you think i should be doing this with elastic search or another db? any suggestions?
[23:17:41] <GothAlice> I built my own full-text indexer on top of MongoDB prior to their adding of full text indexes, so my own queries could use compound indexes on the data.
[23:18:08] <GothAlice> (To date I haven't actually used MongoDB's full-text support.)
[23:21:38] <shoshy> GothAlice: i did changes and now i get: MongoError: insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.groups.$_id_ dup key: { : ObjectId('53a04b827fac2aa602a2dbbb') }
[23:23:35] <GothAlice> So basically replace "_id" with "group" or some-such everywhere it's being used against that collection.
[23:24:49] <shoshy> hmm... i see... so i'll be fetching from now on by group_id and not _id
[23:25:22] <GothAlice> Effectively, yes. _id will still be useful to reference a specific period for a given group, but otherwise less useful.
[23:25:40] <GothAlice> (You also still don't need $each if you are always appending…)
[23:26:47] <shoshy> can i add an index on the group_id ?
[23:27:21] <GothAlice> Definitely. Remember, though, that MongoDB can only use one index. (So likely what you want is a compound index that includes group_id…)
[23:28:08] <GothAlice> Ooh, there's also a reeeeally neat trick you can use when you're doing exactly what you are doing (using date-based "timeouts" to create new records). The _id itself can be used for this purpose, since ObjectIds contain their creation time. You'll have to see if your driver offers something like: http://api.mongodb.org/python/1.7/api/pymongo/objectid.html#pymongo.objectid.ObjectId.from_datetime
[23:28:51] <GothAlice> If it does, you can say {group_id: ObjectId(…), _id: {$gt: ObjectId.from_datetime(…)}} in the search criteria part of the query.
[23:29:15] <GothAlice> (And would want a compound index on (group_id, _id)
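A hedged Node.js sketch of that trick (collection and groupId are assumed to already exist); the driver's ObjectID.createFromTime builds an id from a timestamp in seconds:

    var ObjectID = require('mongodb').ObjectID;

    // compound index so group_id + _id range queries stay on the index
    collection.ensureIndex({ group_id: 1, _id: 1 }, function (err) { /* ... */ });

    // an ObjectId embeds its creation time, so a synthetic one made from "7 days ago"
    // works as the lower bound of the range
    var since = ObjectID.createFromTime(Math.floor((Date.now() - 7 * 24 * 3600 * 1000) / 1000));
    collection.find({ group_id: groupId, _id: { $gt: since } }).toArray(function (err, docs) { /* ... */ });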
[23:31:50] <shoshy> GothAlice: that's really good idea...
[23:31:57] <shoshy> i came across: http://stackoverflow.com/questions/8749971/can-i-query-mongodb-objectid-by-date , its old though
[23:32:09] <shoshy> i guess node.js driver doesn't have it built in, but still checking
[23:32:36] <GothAlice> The code in that answer, however, would do it. :)
[23:32:45] <GothAlice> The BSON spec for ObjectIds hasn't changed in forever, AFAIK.
[23:33:35] <shoshy> the last answer there, says there's a built in option in node.js driver: 'createFromTime'
[23:33:47] <shoshy> var objectId = ObjectID.createFromTime(Date.now() / 1000);
[23:35:13] <shoshy> now i'm trying to update all the groups to have a group_id being the group's _id, which seems to be a bit ugly. db.collection.find( your_query, { field2: 1 } ).forEach(function(doc) { db.collection.update({ _id: doc._id },{ $set: { field1: doc.field2.length } } ); });
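Adapted to what's being described, as a sketch (assuming the collection is groups):

    // copy each group's _id into a group_id field once, so future period documents can share it
    db.groups.find({}, { _id: 1 }).forEach(function (doc) {
        db.groups.update({ _id: doc._id }, { $set: { group_id: doc._id } });
    });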
[23:35:14] <GothAlice> Yeah; basically any time someone has a "created" date/time field, I point them at ObjectId's inherent awesomeness. Doesn't fit every situation, though! (Especially if you need to be able to *use* the date inside an aggregate query, for example.)
[23:38:39] <shoshy> thanks so much once again... i'm off now, but i'll def. do all the changes, super smart and helpful advice
[23:54:01] <shoerain> if mongoshell >>db.tasks.find({}).count() returns 73k, then I should get a similar number with mongoose.model('Task').find({}), right? I get 3... and I'm wondering if it's some middleware interfering with this.
[23:58:53] <GothAlice> shoerain: Are you sure .model('Task') and db.tasks are the same collection?
[23:59:21] <GothAlice> Many ODM layers mangle the model names to produce the collection names, others leave it alone, or you may have explicitly specified. (I can't tell. ;)
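One quick way to rule out name mangling in mongoose is to pin the collection name with the third argument to model() (a sketch; schema details assumed):

    var mongoose = require('mongoose');
    var taskSchema = new mongoose.Schema({}, { strict: false });

    // the third argument pins the collection to "tasks" instead of whatever
    // mongoose derives from the model name
    var Task = mongoose.model('Task', taskSchema, 'tasks');
    Task.count({}, function (err, n) { console.log(n); });  // compare against db.tasks.count() in the shell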