#mongodb logs for Tuesday the 18th of November, 2014

[00:06:39] <buzzalderaan> this may be an easy question, but i haven't found a clear answer online. i have a document that has a list of _id references to other documents and now i need to be able to filter that list based on some supplied criteria. i was hoping to take advantage of the mongoose populate function and return only the _id field in the results and i could do any necessary post processing after, but i can't seem to find an easy way to return only the _id field.
[00:07:40] <GothAlice> Normally this would be the second argument to .find(). A la (in the shell) db.example.find({}, {others: 1})
[00:10:24] <buzzalderaan> would you still have to disable all the other fields in order to return just the _id field though?
[00:11:06] <GothAlice> Nope, if you specify the second argument it specifies explicitly which fields to return, with _id selected by default. (add "_id: 0" to disable that if you wish)
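For illustration, a minimal shell sketch of the projection GothAlice describes (collection and field names are placeholders):

    // the second argument is the projection; _id comes back by default
    db.example.find({}, { _id: 1 })            // return only _id
    db.example.find({}, { others: 1, _id: 0 }) // return "others" and suppress _id
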
[00:12:02] <buzzalderaan> hm.. okay, the documentation was a bit confusing about that
[00:12:20] <buzzalderaan> and the documentation for mongoose certainly isn't any clearer
[00:12:35] <Dinq> good evening: quick question from new mongodb user - should I be worried if my Master and my Replica have different numbers of files in the /var/lib/mongodb directory? If so, what can I do to fix it, or what might have caused it?
[00:12:50] <GothAlice> buzzalderaan: Whenever I'm confused about exact behaviour, I drop into the MongoDB shell and play. :)
[00:13:32] <GothAlice> Dinq: Do you have --smallFiles enabled on one but not the other? (Or otherwise have different stripe sizes?)
[00:13:40] <buzzalderaan> not a slam against mongo, but i feel i've been spoiled using the MSSQL tools for the past 3 years, so to me working with the mongo shell isn't the easiest
[00:14:19] <Dinq> GothAlice: thanks for the input. I suspect both servers are identical in those settings, but one moment...
[00:15:14] <buzzalderaan> i'm also not sure how i'd model this query in mongoshell
[00:15:19] <GothAlice> Dinq: Also, likely the secondary doesn't need as large an oplog as the primary.
[00:15:45] <Dinq> GothAlice: I don't see --smallfiles enabled, no.
[00:16:23] <Dinq> printReplicationSetInfo() shows replication is occurring with only about 30s delay, file timestamps and sizes look identical(close enough on timestamp)
[00:16:26] <GothAlice> Dinq: What's the actual difference between the folders?
[00:16:49] <Dinq> but Master (used to be slave) has about 4000 more files (62000 vs 58000)
[00:17:06] <Dinq> file size is very close but not bit-for-bit (1.4 TB vs 1.5 TB)
[00:17:27] <Dinq> with that many files, i haven't quite gotten to the point to know exactly which files are different.
[00:17:38] <Dinq> also, they do seem to be "rotating" at the 2GB per file limit
[00:17:51] <GothAlice> (ls -b | sort > f1; repeat that on the other with "f2" instead; diff f1 f2)
[00:18:26] <Dinq> reason I'm asking: we're about to fail over to do compact and repair and we don't want to fail over to a server that has missing data :/
[00:19:01] <Dinq> also: i inherited this server/task at my new job today ;)
[00:19:10] <Dinq> hence my n00b questions and lack of 100% information
[00:21:02] <Boomtime> why are you doing repair?
[00:22:19] <Boomtime> let me clarify: what is it you think repair does that you need?
[00:23:46] <Boomtime> "repair" is what you run if you suspect corruption from a faulty disk, power-loss without a journal, etc.. AND you aren't running a replica-set
[00:34:59] <Dinq> Boomtime: sorry, stepped away...one second...
[00:36:50] <Dinq> reclaiming disk space from empty records via repairDatabase was my intention.
[00:37:08] <GothAlice> Is compact() insufficient?
[00:37:35] <Dinq> from what i'm reading (again, remember I'm new) compact() only removes fragmentation, not return actual disk space
[00:38:09] <GothAlice> Dinq: Easier would be to retire the replica, nuke it, then start it back up and let it re-sync. (It'll be perfectly compact.)
[00:38:11] <Dinq> however, to Boom's point, the rolling node method is likely what I would use wouldn't i?
[00:38:17] <Dinq> aye. that.
[00:38:52] <Dinq> that's what we were headed toward today before we found that the Replica has 4000 fewer files than Master, yet claims to be only 30 seconds behind on delay
[00:38:59] <GothAlice> Just make sure you have an arbiter to ensure the secondary will actually take over duties.
[00:39:03] <Dinq> so we stopped to do a lil more research :)
[00:39:11] <Dinq> Goth: there is indeed an arbiter in this setup, yes, thank you.
[00:39:41] <cheeser> 30s is a long time...
[00:40:21] <Dinq> cheeser: interesting...is that configurable? (my predecessors may have set it that way on purpose, I have no clue)
[00:40:39] <cheeser> they purposely put in a 30s delay?
[00:40:55] <Dinq> i have no clue - that's what i'm asking - would there be any reason (or even ability within mongo) to do so?
[00:41:25] <Dinq> actually, if timestamps are the key there, i'm pretty sure the servers are not using ntp and are about 30s askew
[00:41:53] <cheeser> you can add a slaveDelay to a replica set. i have no idea why you'd *want* that.
[00:42:07] <Dinq> and thinking out loud to myself, if it's not timestamps, then what else? so yeah, I think it's likely server/os related on the clocks
[00:42:12] <Dinq> but i'll need to confirm that
[00:42:25] <cheeser> some use that as a hedge against an errant write: if caught inside that window you can restore the old document from the secondary to the primary
[00:42:35] <Dinq> what i can say is that I see files on both are being updated and written to at the same time(ish)
[00:42:38] <cheeser> but that's ... a creative way to manage disasters.
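For reference, a delayed member is configured roughly like this in the 2.x shell (the member index and delay are illustrative); a delayed member must also be priority 0:

    cfg = rs.conf()
    cfg.members[2].priority = 0      // delayed members can't be elected primary
    cfg.members[2].hidden = true     // usually hidden from clients as well
    cfg.members[2].slaveDelay = 1800 // stay 30 minutes behind the primary
    rs.reconfig(cfg)
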
[00:43:11] <GothAlice> My own replica set at work doesn't exceed 25ms behind. (Currently 0ms behind.) :/
[00:43:22] <Dinq> so i def need to diff the file directories to find which are actually different, confirm(fix) the time clocks, and then go from there maybe
[00:44:14] <Dinq> Goth:Cheeser: in that case I also need to confirm I did not misread "30ms" or "3.0ms" or "3ms" as 30s
[00:44:24] <Dinq> ms != s
[00:44:38] <Dinq> thank you both for some wonderful input and ideas
[00:44:57] <Dinq> in a general sense, back to the original question, in most cases with master/replica, the db directories should be identical?
[00:47:25] <GothAlice> Dinq: No, they don't need to be the same.
[00:47:39] <Dinq> gotcha
[00:48:17] <GothAlice> I'm just digging into my production env at work: the primary has fewer journal files, and two fewer stripe files for one of the databases.
[00:49:17] <GothAlice> Or rather, each host has completely divergent journal files.
[00:49:32] <Dinq> If I may---what quantity of files are you talking about, generally? In the 10's of 1000's like mine? Uptime on the servers are 192 and 208 days.
[00:49:55] <Dinq> ok. that gives me another thing to research to be double sure. I need to make sure I'm only talking about the stripe files differing in quantity.
[00:50:00] <GothAlice> The project at work currently has only a small amount of data. There were ~30 files in /var/lib/mongodb.
[00:50:02] <Dinq> again, step 1 I need to diff f1 f2
[00:50:16] <GothAlice> (And ignore differences in the journal file names.)
[00:50:25] <Dinq> excellent. thank you.
[00:50:41] <bazineta> Dinq If you have a general feeling of unease about the setup, and you have the opportunity, I'd add a new replica, wait for the sync, and nuke the old one. Were it me, I'd go that route; at least I'd be certain of my starting point then.
[00:51:09] <GothAlice> Yeah, that'd be even safer. :D
[00:52:22] <Dinq> I would totally do that, and we might - my unease with that though is that the current replica is the one with more files.
[00:52:36] <Dinq> (again, forgive my mongo n00bness - it might "just work" that way on purpose) :)
[00:53:47] <bazineta> Dinq with unknown provenance, it could be that the node with more files was originally a single-node db, prior to a replica set being created, or something of that nature, I suppose.
[00:54:04] <bazineta> Dinq however, I am to Alice as the ant is to the soaring eagle, so listen to her ;)
[00:54:28] <GothAlice> Bah; I've been wrong about things before. >:P
[00:55:21] <Dinq> unknown provenance...wow...yes that certainly applies here :)
[00:55:37] <Boomtime> more files means next to nothing, it might just be more fragmentation on one host than another, this can happen
[00:56:03] <Boomtime> suggestion to sync a new replica member is a good one, it will produce a naturally compact copy
[00:56:35] <Boomtime> it will happen to also perform the equivalent of a validate, so you get everything you've dreamed of
[00:59:13] <Dinq> tomorrow morning will be interesting. our mongo data itself comes from some other job within our web app that only runs large parts on a schedule - tomorrow's first test will either show up (YAY!) or it won't (BOO, which is what got us into this today in the first place).
[01:00:34] <Dinq> ETL is still a thing, right? :)
[01:00:38] <GothAlice> I keep getting flashbacks of sending CSV files to an AS/400 over FTP… with only upper-case letters.
[01:01:26] <Dinq> aye. and old school items like job titles of "Operator" and phrases like "Machine Room"
[01:02:17] <GothAlice> Apparently 10gen worked with IBM to bring MongoDB to DB2… *shudders*
[01:22:58] <wsmoak> I’ll see your DB2 and raise you an IBM UniData ;)
[02:23:52] <sparc> Hiyas. If I convert my standalone mongo to a ReplSet, do I need to reconfigure my clients?
[02:24:07] <sparc> or will they connect to the same host and be okay
[02:24:29] <sparc> I don't see configuration options in PyMongo, that stand out
[02:24:37] <sparc> maybe the client is smart and tries to figure it out
[02:41:45] <Boomtime> sparc: your connection string must name the replica-set or have more than one host in the seed list
[02:42:48] <Boomtime> sparc: http://docs.mongodb.org/manual/reference/connection-string/
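For example, a replica-set-aware connection string looks roughly like this (hosts, database, and set name are illustrative):

    mongodb://db1.example.net:27017,db2.example.net:27017/mydb?replicaSet=rs0
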
[02:44:59] <sparc> Boomtime: thanks much
[02:45:48] <sparc> Boomtime: If it doesn't do either, do you think it will still connect and work, to the single host?
[02:47:30] <bazineta> GothAlice one does not work with IBM...one survives IBM...
[02:48:15] <sparc> Boomtime: it was in the doc you linked, n/m :) thanks again
[02:48:33] <sparc> (and yes it will create a standalone connection)
[02:55:32] <sparc> Is it necessary to specify the replset name as a command-line argument to mongod?
[02:55:40] <sparc> It seems like you can place it in the config file also
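Both forms work; a sketch of each (set name illustrative, 2.x-era options):

    # command line
    mongod --replSet rs0

    # legacy config-file style
    replSet = rs0

    # YAML config style introduced in 2.6
    replication:
      replSetName: rs0
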
[02:56:50] <Boomtime> sparc: a single connection is not good for you, if the host you connect to becomes secondary at any point you will no longer be able to write anything
[02:57:13] <blagblag> Hey everyone, i'm new to mongo and hoping to get some help on why a "find" is returning no results. Here is a pastie with what my query looks like and a little bit about the collection: http://pastie.org/9726666
[02:57:31] <Boomtime> if you want a general purpose connection you need to specify your intention to use a replica-set connection, the driver will then find the primary for you
[02:57:53] <sparc> Boomtime: I agree, yeah. It just lets me create the set, and then send in a pull-request to my group to update the connection string, a minute later.
[02:57:59] <sparc> instead of having to synchronize the change
[02:58:32] <blagblag> i'm using robomongo to test out these queries if that matters at all.
[03:00:26] <Boomtime> sparc: yep, that's a good use case
[03:01:11] <Boomtime> blagblag: are there any documents which match your query?
[03:01:51] <blagblag> Boomtime: Yes, I believe so; that particular query I pulled straight from the first record shown by robomongo
[03:02:24] <Boomtime> you do that query in the shell?
[03:02:33] <blagblag> Boomtime: Could it be that mongo does not handle special characters like '!'
[03:02:46] <Boomtime> what database do you have selected in the shell?
[03:03:03] <Boomtime> no, it's a string match, mongodb will handle unicode
[03:03:22] <blagblag> Boomtime: I am running the query in robomongo. I can test it out in the mongo shell if you think it might help
[03:03:53] <Boomtime> robomongo is a 3rd-party app, if you can reproduce the issue in the shell then your issue is real
[03:04:23] <blagblag> Boomtime: ok back in a sec once I try it
[03:04:26] <Boomtime> also, can you provide the document you think is a match
[03:04:34] <Boomtime> pastie again or whatever
[03:10:53] <blagblag> Boomtime: Looks like each string value I imported into a collection has a '\n' at the very end of the string. It doesn't show up in the robomongo view. It may be failing to find it because I wasn't accounting for the newline character
[03:11:04] <blagblag> let me test that out before wasting any more of your time
[03:11:42] <Boomtime> blagblag: no problem :D
[03:13:03] <blagblag> Yup comes right up....dang this is problematic. Anyone ever deal with newline characters in a string field? I wonder if there is a way to get the find to ignore newlines
[03:18:36] <Boomtime> blagblag: regex, though ensure you use left-anchored or it will be much less efficient
[03:20:22] <Boomtime> so, i guess you want: /^!!/
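A sketch of the kind of query Boomtime means (the field name is a guess, since the pastie isn't reproduced here):

    // a left-anchored regex can walk the index; an unanchored /!!/ would scan every key
    db.items.find({ title: /^!!/ })
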
[03:22:46] <blagblag> perfect yup thanks for the help
[03:28:39] <quuxman> In my app, every request has a session identifier, that is used to fetch a session record, which has a user id, which is then fetched immediately afterwards. Is there a reasonable way to grab both the session and referenced user record at the same time to reduce latency?
[03:29:59] <Boomtime> quuxman: yes, you put them in the same document
[03:30:23] <quuxman> What if I want separate collections, because some sessions don't have users?
[03:31:03] <Boomtime> then you need to develop a cohesive schema that achieves what you want
[03:32:08] <quuxman> I suppose I could query both the user collection and session collection at the same time, where anonymous sessions are stored in the session collection, and authenticated ones in the user collection
[03:34:50] <quuxman> or I could create an anonymous user type. Neither option seems very nice
[03:40:25] <quuxman> in the docs, multikey indexes only mention arrays. Can they also index the keys of a dict / subobject?
[03:45:56] <quuxman> Would I want a user record like { _id: 'u1', session: { 's1': { created: ... }, 's2': { created: ... } } }, or like { _id: 'u1', sessions: [ { _id: 's1', created: ... }, { _id: 's2', created: ... } ] } ?
[03:47:42] <quuxman> the docs clearly say you can index the latter case as 'session._id', but are not so clear with the former case
[03:47:53] <quuxman> er, sessions._id
[04:14:52] <bazineta> quuxman In general, a user can have multiple sessions, and you want sessions to auto-expire if they're not touched in some time, so a separate TTL collection is often a good choice.
[04:16:12] <bazineta> quuxman that way you can have mongo deal with scorching expired sessions automagically for you via just not updating the date TTL field
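A minimal sketch of the TTL approach bazineta describes (collection, field name, and timeout are illustrative):

    // sessions whose lastUsed date is older than 30 minutes are removed by the
    // TTL monitor; keep a session alive by updating lastUsed on each request
    db.sessions.ensureIndex({ lastUsed: 1 }, { expireAfterSeconds: 1800 })
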
[05:04:34] <JB6> How to check mongodb logs
[05:04:54] <LouisT> um
[05:05:00] <LouisT> you could tail the log file? o.O
[05:19:06] <JB6> how to track mongodb query errors
[08:30:18] <almx> Hi all
[08:31:51] <almx> a small question about mongodb write concurrency: is it possible for different parts of my application to do concurrent writes to different collections in the same database
[08:31:56] <almx> ?
[08:32:52] <almx> or should I serialize all of my write operations to a single worker process that takes care of all the required writes?
[08:33:06] <almx> What is the best approach here?
[12:46:30] <jar3k> Hi, will MongoDB handle ~2 000 000 000 records?
[13:01:00] <cek> why is "db" field required if I'm required to do "use database" anyway?
[13:01:14] <cek> use accounts
[13:01:14] <cek> db.grantRolesToUser(
[13:02:03] <JB6> How to find mongodb warnings
[13:03:04] <spgle> hello, I have a question about w:majority on 2.6 mongodb cluster made of replica sets. If you have 4 replicas and 1 arbiter, what's the majority (3 ?) ? What will happen to write requests if one replica goes down and cluster turns to have 3 hosts and one arbiter, does 'majority' still consider 3 as required value or is computed again with the new topology of the cluster ?
[13:08:30] <skot> It is based on the configured members, not the currently avail ones as that would lead to bad things.
[13:08:56] <skot> so a 5 member replica set has a majority of 3
[13:09:04] <skot> The same is true for a 4 member set.
[13:10:11] <talbott> Hello mongoers
[13:10:15] <skot> cek: db is just the reference to the "current" database which is set by "use <database>"
[13:10:22] <talbott> does anyone have any experience with the text index on v2.6
[13:10:28] <talbott> I'm struggling to create one
[13:10:33] <cek> 2014-11-18T16:09:34.649+0300 Error: Missing expected field "db" at src/mongo/shell/db.js:1237
[13:10:34] <skot> cek: You can also set it using db = db.getSiblingDB("foo")
[13:10:43] <cek> so it's a required field
[13:11:07] <skot> cek, unclear what you are doing so hard to say
[13:11:09] <cek> and when I'm under another database, it somehow creates the user under that database and not under the db: "" i specify
[13:11:21] <skot> ah, I think you are referencing a bug in the shell
[13:11:22] <cek> I'm granting a role to user.
[13:11:37] <cek> 2014-11-18T16:09:22.384+0300 Error: Could not find user widev@wi at src/mongo/shell/db.js:1237
[13:11:44] <cek> "great"
[13:12:11] <cek> oh this JS language... why not make normal shell
[13:14:41] <talbott> with this error
[13:14:42] <talbott> "errmsg" : "found language override field in document with non-string type",
[13:15:00] <talbott> whether i do, or do not, specify an language_override
[13:15:01] <skot> cek: but I don't think you would hit the bug I'm thinking about if you are using "use database" auth stuff: https://jira.mongodb.org/browse/SERVER-15441
[13:15:14] <skot> cek: can you post your full session to gist or something?
[13:15:41] <cek> i can't, i deleted those already
[13:15:46] <cek> I just can't understand the concept.
[13:16:08] <cek> in mysql, there are global users to which you can allocate db.table privs
[13:22:52] <cek> "Create the user in the database to which the user will belong. " from tutorial. Can't I create a GLOBAL user that will have access to several databases?
[13:25:30] <krion> hi
[13:26:34] <krion> how can i prevent error : caused by :: 9001 socket exception [SEND_ERROR] when one member of a replicaset go out (secondary)
[13:29:33] <cheeser> cek: check out these roles: http://docs.mongodb.org/manual/reference/built-in-roles/#all-database-roles
[13:34:49] <talbott> if anyone can help with my textindex issue I'll be grateful for like...ever
[13:35:40] <cheeser> do you know which document?
[13:36:20] <talbott> so my mongo is full of tweets
[13:36:39] <cheeser> is there a field called "language" on your documents?
[13:36:43] <talbott> and the text i would like to index is at interaction.twitter.text
[13:36:47] <talbott> yes there is
[13:37:20] <cheeser> is it a string? sounds like some value isn't a string.
[13:37:21] <talbott> it looks like
[13:37:33] <talbott> language.confidence : Int
[13:37:41] <talbott> language.tag : en
[13:37:47] <cheeser> pastebin a document
[13:37:56] <talbott> so i tried to do a language override, to use the tag
[13:37:57] <talbott> e.g.
[13:41:29] <talbott> cheeser: http://pastebin.com/8b1r91se
[13:41:39] <talbott> one doc example from the collection
[13:41:48] <cheeser> yes. "language" is not a string.
[13:42:15] <talbott> i can override it though..right?
[13:42:21] <talbott> to use interaction.language.tag instead?
[13:42:27] <cheeser> change your index definition: http://docs.mongodb.org/manual/tutorial/specify-language-for-text-index/#use-any-field-to-specify-the-language-for-a-document
[13:44:44] <talbott> yeh, the weird thing is
[13:44:48] <talbott> when i try to override it
[13:44:50] <talbott> i get this
[13:44:55] <talbott> "errmsg" : "exception: language_override is not valid",
[13:45:06] <cheeser> pastebin your command
[13:45:20] <talbott> (Thanks for your help btw)
[13:46:06] <cheeser> and the full error
[13:46:14] <talbott> http://pastebin.com/y4tBb6Ce
[13:46:16] <cheeser> (no problem btw :) )
[13:50:52] <cheeser> what's with the nested "interaction" documents?
[13:52:25] <cheeser> looks like that field should be "interaction.interaction.language.tag"
[13:54:04] <talbott> oh
[13:54:15] <talbott> yes, unfortunately the docs are pushed in like that by a 3rd party
[13:54:21] <talbott> so the doc structure is not in my control
[13:54:52] <cheeser> that's fine. just try it like i suggested.
[13:54:55] <cheeser> see if that works.
[13:54:57] <talbott> will try
[13:55:08] <cheeser> it's odd but workable
[13:55:36] <talbott> no, same error
[13:56:05] <talbott> and you'll see in the doc, that language is also specified here
[13:56:08] <talbott> interaction.twitter.lang
[13:56:16] <talbott> using that as the override, causes the same err too
[13:56:35] <talbott> maybe i'm making some completely numpty mistake?!
[13:56:59] <talbott> happens a lot
[13:57:11] <cheeser> ah. found it in the source.
[13:57:22] <cheeser> the override field name can't have any '.' in it
[13:57:28] <talbott> ah HA
[13:57:33] <talbott> well, bastard
[13:57:42] <cheeser> i thought so but i had to track down where in the code that check is made
[13:58:06] <talbott> is there a way to just say...all docs are english, and ignore the language field?
[13:58:22] <cheeser> specify a nonexistant override field
[13:58:27] <talbott> ah
[13:58:38] <talbott> ah ha!
[13:58:43] <cheeser> if the field is missing in the document, it'll use the default_language defined on the index
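Putting cheeser's suggestion together, the index creation would look roughly like this (the field path is guessed from the pasted document and may need adjusting):

    db.tweets.ensureIndex(
        { "interaction.interaction.twitter.text": "text" },
        {
            default_language: "english",
            // point the override at a field that never exists, so every
            // document falls back to default_language
            language_override: "no_such_field"
        }
    )
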
[13:58:44] <talbott> looks like it's doing something
[13:58:54] <talbott> i can create this in the bg, right?
[13:59:09] <cheeser> yes
[13:59:16] <talbott> thanks cheeser!
[13:59:20] <talbott> you're a diamond
[13:59:29] <cheeser> though typically you wouldn't want to do that, so that you know when it's done, e.g.
[13:59:36] <talbott> i wonder if a future release will support the '.'
[13:59:41] <cheeser> any time. remember me when you come in to your billions. ;)
[13:59:54] <talbott> will do!
[13:59:59] <cheeser> i'll bet it's a speed issue when traversing the document for indexing.
[14:00:08] <talbott> right
[14:00:26] <talbott> would be worried about creating the index if i dont BG
[14:00:35] <talbott> because it'll lock the db / collection wont it?
[14:02:14] <bazineta> yes, it'll lock it
[14:02:44] <bazineta> that is a Bad Thing™
[14:04:31] <bazineta> It's one of those headscratchers where you're always mystified as to why it's the default
[14:06:15] <bazineta> typically about 3 nanoseconds after your hand comes off the enter key
[14:07:48] <cheeser> it'll lock either way, afaik.
[14:08:00] <cheeser> it's just a question of when the ensureIndex() returns control
[14:08:56] <bazineta> nope. bg doesn't exclusive lock until the index is built; without bg, it's locked for the duration
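The difference in shell terms, as a sketch (collection and key are illustrative):

    // the default (foreground) build holds the database lock for the whole duration
    db.tweets.ensureIndex({ "interaction.twitter.created_at": 1 })

    // a background build yields the lock periodically while it works
    db.tweets.ensureIndex({ "interaction.twitter.created_at": 1 }, { background: true })
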
[14:10:17] <cheeser> fair enough
[14:11:38] <bazineta> according to the docs, fg produces a more compact index at the cost of DOSing yourself. Always seemed like a very poor trade to me...
[14:13:43] <cheeser> that's right. i remember reading that a while back.
[14:14:52] <kali> +1, bad choice of default value on that one, but it's sooooooo old :)
[14:16:35] <bazineta> Ran that by mistake on a collection of 150 million documents once. Once.
[14:17:45] <kali> good thing is, you can kill it
[14:18:53] <bazineta> Yep. Still boggles my mind though that at some point, there must have been a meeting, and in that meeting there was a discussion, and at the end of it, everyone said, yeah, that's a great idea, do it that way.
[14:19:16] <kali> bazineta: well, it's not that simple. in the beginning, there was no background option
[14:20:00] <kali> bazineta: so when it was added, the choice was between changing the api semantics and picking a better default
[14:20:50] <bazineta> kali Sure, and for a time in our history we banged the rocks together to make sparks, but it's weird you'd leave this landmine hanging around for so long at this point.
[14:21:10] <winem_> sorry guys, just following your conversation and I'm wondering what you mean by bg. :)
[14:21:18] <cheeser> background
[14:21:39] <winem_> ok, this was too obvious -.- thanks
[14:22:43] <bazineta> Sorry winem_, reference: http://docs.mongodb.org/manual/tutorial/build-indexes-in-the-background/
[14:23:17] <bazineta> The first paragraph of that document is my /facepalm.
[14:23:22] <winem_> yes, seems to be really interesting. thanks. only read two thirds of the book yet
[14:24:39] <winem_> thanks for the input. will follow your information from the bg ;)
[14:25:17] <kali> bazineta: I agree, it would have been nice to leave this behind when the default for write safety was switched.
[14:26:05] <cheeser> to leave which bit behind? the override field name?
[14:26:47] <bazineta> cheeser the bit where ensureIndex() by default exclusive locks the database unless you explicitly tell it not to.
[14:27:24] <kali> cheeser: to make bg=true the default
[14:27:34] <cheeser> i actually like it this way, tbh
[14:28:05] <cheeser> long lived database operations should be explicity in their nature
[14:28:20] <winem_> ok, one more question regarding the bg. is this just a question of performance or does it also lock the collection like relational databases do?
[14:28:23] <Derick> I'm with cheeser here.
[14:29:22] <bazineta> Ok, so when's the last time you didn't use the bg flag?
[14:29:26] <kali> winem_: without it, the database is locked.
[14:29:45] <winem_> ok. now I understand the real impact
[14:30:17] <kali> winem_: if you forget it on a real production system, you go down.
[14:30:39] <winem_> working with mongodb for about two months now and I'm really impressed, but I've only read two thirds of the book and still have some issues letting go of the relational dbms thinking.. :)
[14:32:01] <kali> Derick: cheeser: well "what you don't know should not harm (too much)"... I fear bg=false by default fails this one
[14:32:30] <bazineta> winem_ The beginning of wisdom is to repeat to yourself 'it'll only ever use ONE index' until it sticks.
[14:33:26] <winem_> yes, that's why they implemented compound indexes ;)
[14:35:44] <winem_> we'll define the json format by EOB today. could someone review our preferred shard key tomorrow? I think it's almost perfect - but I've only read the official docs and books.. so I have no experience with mongodb outside our dev environment
[14:50:21] <Qualcuno> good morning. quick question: what is the recommended design to store in MongoDB a list of custom (user-defined) fields together with their label?
[14:53:25] <Derick> I'd say: [ { k: 'name', v: 'value' } , { k: 'name2', v: 'value2' } ]
[14:55:21] <Qualcuno> Thanks Derick. Another question: I need all values to have a full-text index on a “normalized” value (all lowercase, diacritics removed, etc). I was thinking of creating a new field in the document named “searchable” and putting all the normalized values inside that, and then creating a full-text index on that?
[14:55:37] <Qualcuno> “searchable” would be another object, with all k and v
[15:05:16] <Derick> Qualcuno: makes sense
[15:07:13] <Qualcuno> Derick: ok, thanks! one last question. The docs say that MongoDB can have just 1 full-text index per collection, and that’s fine. to put multiple fields in the index you can do something like http://docs.mongodb.org/manual/tutorial/create-text-index-on-multiple-fields/ . How can I tell MongoDB to select all the v’s inside “searchable”?
[15:10:49] <noqqe> hey - quick question
[15:11:00] <noqqe> i have a hidden member in my replica set
[15:11:46] <noqqe> do i have to add this host also to the list of hosts when executing "sh.addShard()"
[15:12:04] <noqqe> or should i leave it out?
[15:12:49] <Derick> Qualcuno: { 'searchable.v' : 'text' } should do
[15:12:55] <Qualcuno> Derick: thanks
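So, as a sketch, a stored document and its index might look like this (field names and values are illustrative):

    // "fields" keeps the user-defined labels/values as entered;
    // "searchable" holds the normalized copies the text index will see
    {
        fields:     [ { k: "City", v: "São Paulo" } ],
        searchable: [ { k: "city", v: "sao paulo" } ]
    }

    db.items.ensureIndex({ "searchable.v": "text" })
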
[16:03:30] <omid8bimo> hey guys, i need help. im adding a new member to my replicaSet; after staying in startup2 state for 24 hours (huge data to replicate), the mongo goes to a recovering state and then i see the error "replSet not trying to sync from kookoja-db1:27017, it is vetoed for 381 more seconds" and replication stops
[16:03:34] <omid8bimo> why is this happening?
[16:12:11] <bazineta> omid8bimo what version of mongo?
[16:12:31] <omid8bimo> bazineta: 2.6.5
[16:13:52] <bazineta> do the host fields in rs.conf() match the name fields in rs.status()
[16:15:30] <omid8bimo> bazineta: yes
[16:15:34] <bazineta> that error means that the secondary wasn't able to reach the primary, so it's going to try again later...look for some mismatch in the configuration that might be confusing it if there's nothing else in the logs
[16:16:20] <bazineta> typically though the logs on one or both servers will have more detail
[16:17:55] <bazineta> for example, 'too stale to catch up' would be something not unexpected if the data is ginormous
[16:18:38] <jar3k> How do I remove duplicated records (identical in every column except one) from a collection?
[16:19:22] <jar3k> Is MapReduce good for that?
[16:21:33] <bazineta> omid8bimo grep for stale in your mongo log...if you see that, then read http://docs.mongodb.org/manual/tutorial/resync-replica-set-member/
[17:48:26] <ajn> i'm attempting to deploy a replica set on compute engine and keep getting an error stating the mongodb server does not have 2 members, any ideas?
[19:30:01] <sammy_> hello
[19:46:24] <saml> hi samed
[19:46:26] <saml> sammy_,
[20:05:24] <shoerain> suggest any web admin panels for mongodb? there are quite a few listed here: http://docs.mongodb.org/ecosystem/tools/administration-interfaces/. I use robomongo on my computer, so something akin to that would be sweet -- this is mainly for folks who want to check up on stats on certain collections
[20:17:14] <bazineta> MMS
[20:19:18] <naquad> can i somehow ask MongoDB for unique ID w/o inserting anything to collection?
[20:19:51] <cheeser> new ObjectId()
[20:20:49] <naquad> ermm what about pymongo? :)
[20:20:55] <naquad> looks like it doesn't have ObjectId
[20:21:00] <mike_edmr> naquad: the driver should have facilities for that
[20:21:14] <mike_edmr> a function in the pymongo api to grab a new ID
[20:21:35] <mike_edmr> http://api.mongodb.org/python/1.7/api/pymongo/objectid.html
[20:23:08] <naquad> mike_edmr, thanks. not applicable in my case though, got pymongo 2.7.2
[20:24:07] <naquad> found it! bson.objectid.ObjectId
[20:24:08] <naquad> thanks
[20:24:26] <mike_edmr> beat me to it :)
[20:24:52] <GothAlice> ObjectId can be imported from the top-level bson module, I suspect.
[20:27:54] <naquad> yup, you're right :) thanks
[20:28:23] <naquad> i've got a super n00b question: how do i limit the number of updated items? i need something like update({query}, {update}, limit=100)
[20:29:35] <hahuang61> it's not possible to have, for example, a replicated db across 2 geographically separated data centers but have 1 set of masters on each side, right?
[20:32:51] <cheeser> hahuang61: there can only be one primary
[20:34:13] <hahuang61> cheeser: yeah that's what I thought.
[20:34:25] <hahuang61> cheeser: appreciate it. There's no good active/active setup to do
[20:40:44] <GothAlice> A beautiful data architecture can be completely ruined by one creative user. XP
[20:42:15] <shoshy> hey, i have a process that worked fine, now i get "MongoError: Resulting document after update is larger than 16777216"
[20:42:54] <shoshy> i tried googling for the reason, but couldn't really find a reference to this. i did see there's a limit of 16MB for querying.. but
[20:43:03] <shoshy> here it's different
[20:51:40] <GothAlice> shoshy: "Resulting document after update" — sounds like you're running an update query, and whatever you are doing to a document in that query is resulting in a document greater than 16MB in size. This is a no-no (not just query size is limited, actual per-record size is limited, too.)
[20:53:25] <GothAlice> Now, 16MB is a lot. As an example, I was able to store complete forums (1.3K threads w/ 14K replies, roughly 1.1M words or so) in a single record.
[20:53:52] <GothAlice> shoshy: But I'd need to see an example record and the query that is failing to really assist.
[20:55:32] <shoshy> GothAlice: thanks a lot for the help, that's 1st... regarding the update, i got the error on an insert. But i'm looking into it , i don't have any 16MB document..
[20:56:24] <GothAlice> Then you were attempting to insert a 16MB document.
[20:56:34] <GothAlice> Were you by chance attempting to insert multiple records with one insert?
[20:56:54] <shoshy> GothAlice: nope i haven't, it's a small document in size.
[20:57:00] <shoshy> I'm heavily working async
[20:57:10] <GothAlice> shoshy: Could you pastebin your insert query, then?
[20:57:48] <shoshy> GothAlice: yea, i know... me too. It's a big big big codebase, i can't, but maybe the query i could... checking that out
[20:58:26] <shoshy> and it worked ALL The time for many months now
[20:59:03] <shoshy> when i say all the time, i mean... in a real-time kind of way.. non-stop..
[20:59:19] <shoshy> brb, checking if i missed something..
[21:00:09] <shoshy> ok, found an update..
[21:00:14] <shoshy> checking some more
[21:02:00] <shoshy> GothAlice: so this is the update: http://pastie.org/9728664
[21:02:08] <shoshy> weird thing is, i print inside the callback
[21:02:31] <shoshy> ahh nope.. its not weird
[21:02:37] <shoshy> it runs for the error there, yep
[21:02:58] <shoshy> ok, so it's not the [bag] object being inserted, as it's small
[21:03:11] <GothAlice> shoshy: Do you have monitoring a la MMS? (I.e. what's your average document size in that collection?) By not using upserts, there, growth of the "feed" list would appear to be unbounded.
[21:03:34] <GothAlice> (And thus your >16MB problem… if you try to append to that list and it needs to grow, but can't… *boom*.)
[21:04:41] <shoshy> GothAlice: i don't have for that server, i know i should link it
[21:04:55] <GothAlice> Eek; and flipping that list (so you append, rather than prepending and thus needing to *move* the existing entries) can't hurt.
[21:06:00] <GothAlice> (You'd also simplify the $push down… no need for $each on a list with one element.)
[21:06:20] <shoshy> GothAlice... right! , how would you change it to avoid overhead?
[21:06:25] <shoshy> like, the syntax itself
[21:06:33] <shoshy> i'll try it right away
[21:07:53] <shoshy> meaning would you use $addToSet ?
[21:10:13] <GothAlice> Nooo.
[21:10:22] <GothAlice> {$push: {feed: bag}}
[21:10:55] <GothAlice> If you only ever append (prepending is heinous for a variety of reasons, though occasionally useful) things become simpler.
[21:11:07] <GothAlice> shoshy: ^
[21:11:13] <flyingkiwi> naaah, the bigger question is: shoshy, are you also pop'ing values?
[21:12:20] <shoshy> GothAlice: ok, so same query without the $position , great, i'll try it out (restarting the process will take time)
[21:12:41] <shoshy> flyingkiwi: so, as it seems, i query but with {feed: 0} , so without that field
[21:13:34] <shoshy> the idea behind pushing it to the beginning was to take only the latest items
[21:13:34] <GothAlice> The error you were getting indicates that some document being updated by that statement (I'd pop into a shell and findOne on that document!) is growing to a size larger than the acceptable maximum as a result of the $push. (Prepend or append won't matter for the purpose of this error. That's mostly an efficiency/simplicity thing.)
[21:13:41] <flyingkiwi> just because mongodb isn't freeing the space in arrays between entries after deleting them
[21:13:45] <GothAlice> shoshy: You can $slice with negative indexes to get the "end" of the list. ;)
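A sketch combining the two suggestions, appending and capping the embedded list in one statement (groupId and the cap of 1000 are illustrative):

    // $slice inside $push requires $each; a negative value keeps only the
    // most recent N entries, so the array can't grow without bound
    db.groups.update(
        { _id: groupId },
        { $push: { feed: { $each: [ bag ], $slice: -1000 } } }
    )
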
[21:15:42] <shoshy> GothAlice: Thanks a lot, i'm checking it RIGHT NOW.. i
[21:17:05] <shoshy> so the document to be updated is fine
[21:17:28] <shoshy> i think.. how can i check its size specifically..
[21:17:41] <shoshy> (i actually loaded it in robomongo)
[21:17:45] <GothAlice> shoshy: That's a very good question. If you could get the raw BSON for that document and get its size…
[21:18:07] <GothAlice> (This is one of the big reasons I stick to interactive shell usage. I can get raw BSON out of pymongo quite easily.)
[21:18:24] <shoshy> http://stackoverflow.com/questions/22008822/mongo-get-size-of-single-document
[21:18:57] <shoshy> lol i got "88"
[21:18:58] <shoshy> :)
[21:19:16] <GothAlice> shoshy: Run that across all of the records in the collection. >:P
[21:19:36] <shoshy> wait you're asking on the collection size ?
[21:19:39] <GothAlice> No.
[21:19:42] <shoshy> i thought only the specific document..
[21:19:47] <shoshy> ahhh ok
[21:19:48] <GothAlice> I'm looking for suspiciously large records in the collection.
[21:19:58] <GothAlice> If you got the 16MB limit error, you did actually hit the limit. Somehow.
[21:20:04] <GothAlice> ;)
[21:20:14] <shoshy> yep... you're right
[21:20:23] <shoshy> but Object.bsonsize(db.groups.find({})) doesn't help much
[21:20:28] <GothAlice> Uh, no.
[21:21:11] <joannac> shoshy: the error should've told you which document was the one that was too large
[21:21:55] <GothAlice> db.groups.find().forEach(function(rec){ print(rec._id, Object.bsonsize(rec)); });
[21:24:54] <shoshy> ok... that command worked well
[21:24:59] <shoshy> very cool :)
[21:25:07] <shoshy> 16776070
[21:25:31] <joannac> yes, that's basically 16MB
[21:25:57] <GothAlice> Wau.
[21:25:57] <shoshy> and got some others getting close: 15847930, 13805999, 13717184
[21:26:05] <GothAlice> Those are all "insanely large" documents.
[21:26:26] <shoshy> :)
[21:27:05] <GothAlice> Sounds like your journey through the forests of scaling has led you to the overpass of record splitting. Remember I mentioned upserts earlier? :)
[21:27:56] <shoshy> so basically, how do you handle it? like thats the procedure?
[21:27:59] <GothAlice> Using upserts you can tell MongoDB to "$push this value to list X if the query is matched, otherwise create a new record to $push into", allowing you to group your records by day/month/year or any other arbitrary grouping.
[21:28:12] <shoshy> GothAlice: but upserts modify/ adds a new document if not found
[21:28:17] <shoshy> or i'm missing something...
[21:28:37] <GothAlice> That's the idea; to create a new record when some criteria specifies that the old one may be getting "full".
[21:28:46] <GothAlice> (I break on time to make querying the data easier.)
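A minimal sketch of that bucketing upsert, grouping by day (collection and field names are illustrative):

    var day = new Date();
    day.setUTCHours(0, 0, 0, 0); // snap to the start of the current day

    // matches the current day's bucket if it exists, otherwise creates it;
    // on insert the query fields (group, period) are copied into the new document
    db.feed_buckets.update(
        { group: groupId, period: day },
        { $push: { entries: bag } },
        { upsert: true }
    )
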
[21:29:07] <shoshy> GothAlice: Interesting ..
[21:30:37] <shoshy> stupid question... i'm inserting an element into an array of a record, now that element is 100% non-existent
[21:30:40] <shoshy> in that array...
[21:30:46] <GothAlice> http://irclogger.com/.mongodb/2014-11-14#1416000865 the conversation circa this time period discussed a chat application, with conversations grouped by day, as an example.
[21:31:09] <shoshy> upserting won't in any way create a copy of that whole parent object, with a "new array" with that element alone in it?
[21:31:32] <GothAlice> You would have to specify defaults that would be written into the new record.
[21:31:50] <GothAlice> And yes, $push to a non-existent field would create it as a list containing one element. AFAIK. ¬_¬
[21:33:57] <shoshy> GothAlice: Thanks so much... that kinda bummed me and yet made me happy at the same time :) i hope it's an easy "fix" or way to go from here
[21:34:02] <shoshy> but... very interesting
[21:34:29] <GothAlice> Boomtime: That's why generally right after I say something, I test it. ;) http://cl.ly/image/291h2J1j3D30 My intuition was correct.
[21:38:46] <shoshy> GothAlice: before you leave, i saw your TTL index comment there in the convo
[21:38:55] <shoshy> is there's a TTL for field level? :))
[21:39:34] <shoshy> or... a mechanism to delete IF size is above threshold?
[21:40:08] <shoshy> (not in the application layer that is)
[21:42:22] <GothAlice> … no.
[21:42:45] <GothAlice> TTL means "time to live" and has nothing to do with any criteria other than time.
[21:42:59] <shoshy> yep, i know that's an option
[21:43:12] <shoshy> i have no problem emptying that array once every X days
[21:43:54] <GothAlice> shoshy: The point of using upserts (esp. as described in the prior IRC chat log) is that you can have MongoDB automatically balance already. TTL is for an entirely different purpose than limiting document size. And it can't empty arrays, only delete whole documents, which is effectively what using upserts here would do anyway. (Except without needing to delete anything.)
[21:44:46] <shoshy> GothAlice: yea... thought there might be an option for field level, cool, i'm still reading it :)
[21:44:52] <shoshy> thank you
[21:45:24] <bazineta> So, been looking at this bug for 3 months now: https://github.com/LearnBoost/mongoose/issues/2285#issuecomment-63517357
[21:45:50] <bazineta> And it's causing me to inquire if there's a better choice of library.
[21:46:14] <GothAlice> There are always better choices, for various definitions of "better".
[21:47:17] <GothAlice> Well, what, exactly, are you looking for in your database layer?
[21:47:30] <GothAlice> Referential integrity enforcement with reverse deletion rules?
[21:47:35] <GothAlice> :)
[21:48:41] <bazineta> Our primary usage of mongoose is actually just the basics that the base driver itself provides. We do make extensive use of population. Other features, not really, not that we couldn't easily handle in other ways.
[21:49:27] <Boomtime> any reason you don't just use the Node.js driver directly?
[21:49:43] <GothAlice> bazineta: There's also light-weight (well, lighter than Mongoose!) solutions like https://github.com/masylum/mongolia
[21:50:53] <bazineta> Good question, Boomtime. We aren't really exercising the ORM features much; perhaps the base driver is the real answer.
[21:52:20] <GothAlice> (The main Mongolia model.js is < 500 lines, the object mapper is < 100, and validation another 200 or so. Colour me impressed, this lib is tiny.)
[22:01:20] <shoshy> GothAlice: ok, i've read the conversation, so basically its just an upsert, but you add some search-properties / flags. My case is a group is unique, only one exists. Can't have multiple. After this whole thing, i thought of adding a "created_date" field, and from now on the .find / .update (with upsert) will be based on {_id: ObjectId('...') , created_date: {$gt: (new Date())-1000*60*60*24*10}} (lets say last 10 days)
[22:02:08] <GothAlice> There are a few ways to record the "period" the document applies to.
[22:03:08] <GothAlice> Using creation dates and a "timeout" can work. You can also convert to a UNIX timestamp and subtract the modulo of your division (i.e. snap it to the nearest 10-day period, or in my case, hour), you can also use real dates and just replace date elements (also for the hourly example; set the minutes/seconds/microseconds to zero.)
[22:05:04] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-sample-py is an example of one of my own records using this style of upsert. (And yes, the update statements on these are kinda nuts.)
[22:07:52] <shoshy> GothAlice: i think i'd go with the UNIX version (new Date.getTime() - ... ) what do you mean by "modulo of your division"? like taking the subtraction and modulo by 1000*60*60*24 ?
[22:08:10] <shoshy> GothAlice: and what i wrote was correct, as in the idea of what you meant?
[22:10:15] <GothAlice> var now = +(new Date()), week = 1000 * 60 * 60 * 24 * 7, snapped = now - (now % week); — "snapped" is now the date snapped to the nearest (previous) week. Won't match reality's version of a week, but it'll do the trick. (It'll be offset based on the UNIX epoch.)
[22:11:27] <GothAlice> +(new Date()) is a nifty trick (and probably hideously bad idea) to quickly get microsecond timestamps. ^_^
[22:12:38] <shoshy> why not new Date().getTime() ?
[22:13:11] <GothAlice> No reason other than my muscle memory told me to do it the other way. ;)
[22:13:37] <shoshy> hahaha ok ... yea so basically you're asking for the last's week unix time ... got it..
[22:14:09] <shoshy> why do you have another ObjectId there?
[22:14:25] <shoshy> are you storing the original object id (of the 16MB document that "got away")?
[22:14:31] <shoshy> (in the gist)
[22:14:38] <GothAlice> The POSIX standard means when working in UNIX timestamps you don't really have to worry about leap seconds and such; you save a lot of headaches doing calendar math. _id is the record's ID, all of the others are references to other collections
[22:15:03] <GothAlice> Nope; each unique combination of hour, company, job, supplier, and invoice will have a new _id.
[22:15:20] <shoshy> ahh ok thought c" : ObjectId(""), # Pseudonymous Company this might be a reference to the original
[22:15:22] <shoshy> i see..
[22:15:30] <GothAlice> Yeah, that's just a reference to a company.
[22:17:00] <GothAlice> In that case it'll be a fake ObjectId (i.e. not really referencing any document), but it'll always be the same fake ID for the same company in the dataset. (To anonymize the data and make it easier/safer to export for analysis.)
[22:19:28] <shoshy> GothAlice: i see.. ok.. thank you. So basically i only change the update to have upsert: true and in the find object section i do {_id: ObjectId('...'), created: {$gt: (new Date().getTime() - (new Date().getTime()) % 1000*60*60*24*7) }}
[22:19:41] <shoshy> if we're talking about keeping it for a week
[22:20:29] <shoshy> every time it'll search for a document with that id in the previous week, it'll come to the same record
[22:20:38] <shoshy> when it won't it'll create a new one
[22:23:16] <GothAlice> shoshy: https://gist.github.com/amcgregor/94427eb1a292ff94f35d — here's an example update statement from my code.
[22:23:30] <GothAlice> (Roughly; the $inc data is calculated on each hit from browser data.)
[22:23:57] <GothAlice> And yes, that's basically how it works. If the query matches something, cool, if not, create it.
[22:25:07] <shoshy> Great, thank you so much again, i'm off making the changes, and adding the creation date to the previous documents so they'll align
[22:25:12] <shoshy> but that's a very elegant solution
[22:25:38] <GothAlice> If you can get away without needing periodic maintenance routines, it's probably a good solution. ;)
[22:26:04] <GothAlice> (And avoiding extra queries, a la "does this exist? no? create it." is good, too.)
[22:26:40] <shoshy> GothAlice: I can! for the oldies, i don't mind them staying there for a rainy day as long as i can use one for a long time... only concern is with size
[22:27:26] <shoshy> but now i know, that if it'll happen again i could just reduce the number of days
[22:27:32] <shoshy> till upserting it
[22:27:49] <GothAlice> This is similar to a fairly typical logging issue. Do you rotate your log files when they exceed a certain size (variable time), or when after a period of time (variable size)? :)
[22:28:13] <GothAlice> (I do the period of time approach; again, it makes querying much easier.)
[22:28:37] <shoshy> well the use case here is different but yea..
[22:28:48] <shoshy> you don't need your logs in memory ;)
[22:29:12] <shoshy> and i like them by time
[22:39:09] <EricL_> I've googled around and I don't see any solution for this error? too many retries of stale version info
[22:39:19] <EricL_> Any thoughts on how to deal with this on Mongo 2.4.10?
[22:40:56] <EricL_> My production database is down. I could use some help if anyone knows.
[22:47:38] <cheeser> EricL_: does this help? https://groups.google.com/forum/#!topic/mongodb-user/0DXIs6vlo1g
[22:48:08] <GothAlice> cheeser: Beat me to it! My Google Groups is refusing to load… T_T
[22:48:16] <cheeser> :)
[22:51:32] <GothAlice> Weird; I can see that thread on grokbase, but Groups causes my browser to crash. O_o
[22:55:52] <talamantez> Hi all - I have a neurosky MindWave headset that is writing to a mongoDB collection 1/sec. I would like to take a moving average of the last ten readings. Here is the project on github
[22:55:53] <talamantez> https://github.com/Talamantez/eeg-data-recorder/blob/master/app.js
[22:56:22] <talamantez> Any idea what the best way to approach this would be? thanks
[22:57:49] <GothAlice> talamantez: Insert those into a capped collection, have another process using a tailing cursor to continually receive new results and calculate the moving average.
[22:58:44] <GothAlice> Or, you could have your inserting process keep track of the moving average and publish it on each record inserted. (Thus each record's "average" would be of it and the prior nine.)
[23:02:22] <talamantez> @GothAlice - Thanks for the suggestions - I'll look into documentation about tailing cursors.
[23:02:36] <GothAlice> talamantez: You could even simplify things, if your capped collection only allows 10 readings, you can always just get the average across the collection. :)
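A sketch of that simplification (collection name, size, and value field are illustrative):

    // a capped collection that holds at most the ten most recent readings
    db.createCollection("readings", { capped: true, size: 4096, max: 10 })

    // the "moving average" is then just the average over the whole collection
    db.readings.aggregate([ { $group: { _id: null, avg: { $avg: "$value" } } } ])
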
[23:03:48] <GothAlice> talamantez: https://gist.github.com/amcgregor/4207375 are some slides from a presentation I gave; covers an important caveat of capped collections and includes some sample code.
[23:03:56] <GothAlice> (And link to complete implementations and the rest of the presentation.)
[23:04:38] <GothAlice> (You'd need to re-implement something like 3-queue-runner.py to make use of tailing cursors.)
[23:08:12] <talamantez> GothAlice - thanks!
[23:10:48] <arisalexis> hi, i want to search for documents with a text index that are in a circle (so a geospatial index). i read you cannot have a compound index with these two types. is there any other way? like searching with geospatial and the aggregation pipeline? im not very proficient in this, is it doable?
[23:12:25] <GothAlice> arisalexis: Unfortunately you can't compound on those, and MongoDB can only make use of one index per query. You'd have to determine which (geo or text) narrows down the result set the most (on average) and hint that index, letting MongoDB do the more expensive scanning per-document on the remainder. This would be true using normal .find() and .aggregate().
[23:13:25] <GothAlice> (One of the cases where aggregate queries won't save you.)
[23:14:19] <arisalexis> so I can do it, it's just going to be slower. i first do the geospatial part for example (for sure that reduces the number of my documents more) and then act on this dataset? i don't understand the per-document comment?
[23:16:32] <GothAlice> MongoDB can rapidly scan an index to determine which documents match, but if there are additional fields being queried (not "covered" by the index), MongoDB will have to actually load up each document and perform the comparison "per document".
[23:16:35] <GothAlice> See also: http://docs.mongodb.org/manual/tutorial/analyze-query-plan/#analyze-compare-performance
[23:17:09] <arisalexis> ok thanks a lot. do you think i should be doing this with elastic search or another db? any suggestions?
[23:17:41] <GothAlice> I built my own full-text indexer on top of MongoDB prior to their adding of full text indexes, so my own queries could use compound indexes on the data.
[23:18:08] <GothAlice> (To date I haven't actually used MongoDB's full-text support.)
[23:21:38] <shoshy> GothAlice: i did changes and now i get: MongoError: insertDocument :: caused by :: 11000 E11000 duplicate key error index: test.groups.$_id_ dup key: { : ObjectId('53a04b827fac2aa602a2dbbb') }
[23:21:59] <shoshy> my query now is...
[23:22:01] <GothAlice> shoshy: I mentioned that the _id of the "group" will have to be unique for each period.
[23:22:43] <GothAlice> (Which means that if you have an actual, canonical "group" ID, it can't be _id, you'll have to store it in a different field.)
[23:22:58] <shoshy> http://pastie.org/9728929
[23:23:11] <shoshy> hmmm...
[23:23:35] <GothAlice> So basically replace "_id" with "group" or some-such everywhere it's being used against that collection.
[23:24:49] <shoshy> hmm... i see... so i'll be fetching from now on by group_id and not _id
[23:25:22] <GothAlice> Effectively, yes. _id will still be useful to reference a specific period for a given group, but otherwise less useful.
[23:25:40] <GothAlice> (You also still don't need $each if you are always appending…)
[23:26:47] <shoshy> can i add an index on the group_id ?
[23:27:21] <GothAlice> Definitely. Remember, though, that MongoDB can only use one index. (So likely what you want is a compound index that includes group_id…)
[23:28:08] <GothAlice> Ooh, there's also a reeeeally neat trick you can use when you're doing exactly what you are doing (using date-based "timeouts" to create new records). The _id itself can be used for this purpose, since ObjectIds contain their creation time. You'll have to see if your driver offers something like: http://api.mongodb.org/python/1.7/api/pymongo/objectid.html#pymongo.objectid.ObjectId.from_datetime
[23:28:51] <GothAlice> If it does, you can say {group_id: ObjectId(…), _id: {$gt: ObjectId.from_datetime(…)}} in the search criteria part of the query.
[23:29:15] <GothAlice> (And would want a compound index on (group_id, _id)
[23:31:50] <shoshy> GothAlice: that's really good idea...
[23:31:57] <shoshy> i came across: http://stackoverflow.com/questions/8749971/can-i-query-mongodb-objectid-by-date , its old though
[23:32:09] <shoshy> i guess node.js driver doesn't have it built in, but still checking
[23:32:36] <GothAlice> The code in that answer, however, would do it. :)
[23:32:45] <GothAlice> The BSON spec for ObjectIds hasn't changed in forever, AFAIK.
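The trick from that answer, roughly (helper name is illustrative): build an ObjectId whose leading four bytes encode the cutoff time and pad the rest with zeroes:

    function objectIdFromDate(d) {
        // seconds-since-epoch as 8 hex chars, then 16 zero chars for the remaining 8 bytes
        return ObjectId(Math.floor(d.getTime() / 1000).toString(16) + "0000000000000000");
    }

    // e.g. everything created in the last 7 days
    db.groups.find({ _id: { $gt: objectIdFromDate(new Date(Date.now() - 7 * 24 * 3600 * 1000)) } })
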
[23:33:35] <shoshy> the last answer there, says there's a built in option in node.js driver: 'createFromTime'
[23:33:47] <shoshy> var objectId = ObjectID.createFromTime(Date.now() / 1000);
[23:33:50] <GothAlice> Even better. :D
[23:34:26] <shoshy> ok, this is really cool
[23:35:13] <shoshy> now i'm trying to update all the groups to have a group_id set to the group's _id, which seems to be a bit ugly. db.collection.find( your_query, { field2: 1 } ).forEach(function(doc) { db.collection.update({ _id: doc._id }, { $set: { field1: doc.field2.length } } ); });
[23:35:14] <GothAlice> Yeah; basically any time someone has a "created" date/time field, I point them at ObjectId's inherent awesomeness. Doesn't fit every situation, though! (Especially if you need to be able to *use* the date inside an aggregate query, for example.)
[23:38:39] <shoshy> thanks so much once again... i'm off now, but i'll def. do all the changes, super smart and helpful advices
[23:38:41] <shoshy> thanks
[23:54:01] <shoerain> if mongoshell >>db.tasks.find({}).count() returns 73k, then I should get a similar number with mongoose.model('Task').find({}), right? I get 3... and I'm wondering if it's some middleware interfering with this.
[23:58:53] <GothAlice> shoerain: Are you sure .model('Task') and db.tasks are the same collection?
[23:59:21] <GothAlice> Many ODM layers mangle the model names to produce the collection names, others leave it alone, or you may have explicitly specified. (I can't tell. ;)