[00:35:34] <olso> hey, guys, how to speed up this? mongo.collection.find({baseModel:'CK752E001'}) ... it takes around 100ms, there are 48k docs in collection
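The first thing to try here is a single-field index on `baseModel`; without one, an equality match scans all 48k documents. A minimal mongo-shell sketch (collection name taken from olso's snippet; `ensureIndex` was the usual spelling in the 2.x-era shells this log dates from):

```javascript
// Build an index so the equality match can use an index scan
// instead of scanning the whole collection.
db.collection.ensureIndex({ baseModel: 1 });

// Verify it is used: in 2.x explain output, look for
// "BtreeCursor baseModel_1" rather than "BasicCursor".
db.collection.find({ baseModel: "CK752E001" }).explain();
```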
[00:53:59] <unholycrab> when i add a new replica set secondary member, it takes 3-5 days to sync up with the rest of the replica set. there any way i can monitor the progress of the index building phase ?
[00:54:05] <unholycrab> as the index building phase seems to take the longest
[00:56:12] <unholycrab> im referring of course to parts of the STARTUP2 state
[00:57:57] <unholycrab> tailing the logs, i get something like this:
[00:58:00] <unholycrab> Tue Jan 27 00:55:09.006 [rsSync] Index: (2/3) BTree Bottom Up Progress: 842230600/946547617 88%
[00:58:27] <unholycrab> 88% of some index it's building
[00:58:40] <unholycrab> this doesn't help me determine overall progress
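One way to watch this from a shell connected to the syncing secondary is `db.currentOp()`, which reports a per-operation `msg` plus a `progress` document. It is still per-index, but combined with a count of indexes on the source (`db.collection.getIndexes().length`) it gives a rough overall picture. A sketch:

```javascript
// List in-progress operations that look like index builds.
db.currentOp().inprog.forEach(function (op) {
    if (op.msg && /[Ii]ndex/.test(op.msg)) {
        print(op.msg);          // e.g. "Index: (2/3) BTree Bottom Up ..."
        printjson(op.progress); // { done: ..., total: ... } for the current index only
    }
});
```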
[01:12:37] <harph> I have an old master-slave setup running that I can't shut down yet. I want to spin up a new instance with the same data that syncs from the master, but I need it to be a replSet to be able to use mongo-connector with elasticsearch. Is there a way to do this? I'm getting the following error when I pass that flag: SEVERE: Failed global initialization: BadValue replication.replSet is not allowed when slave is specified
[14:20:05] <g-hennux> I use text indexes with (a rather old version, 2.4.9) of mongodb and my results are quite counterintuitive when the indexed text contains special punctuation characters
[14:21:06] <g-hennux> so even though I have a document with a field "Set „Easy 8“" (and that field has twice the weight of the other fields in the text index), that document will get a really low score
[14:21:22] <g-hennux> in fact, even when i search for "easy 8", it is last in the list
[14:21:37] <g-hennux> now I am wondering if the punctuation does something weird to the indexing
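On 2.4 the way to inspect the computed score directly is the `text` command (the `$text` query operator only arrived in 2.6), which makes it easy to check whether the curly quotes are affecting the match. A sketch — the collection name is hypothetical, and it assumes a text index exists and `textSearchEnabled` is on, as 2.4 required:

```javascript
// MongoDB 2.4: run a text search and print per-document scores.
var res = db.sets.runCommand("text", { search: "easy 8" });
res.results.forEach(function (r) {
    print(r.score + "  " + tojson(r.obj));
});
```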
[15:20:26] <RaceCondition> when should I be using _id: ObjectId("...") vs _id: "..."?
[15:20:43] <RaceCondition> I'm seeing inconsistent behavior: with one collection (a non-capped one), I can use both, with another one (capped), I can only use the former
[15:20:53] <RaceCondition> same with both find() and update()
[15:23:10] <RaceCondition> this is what's happening: http://pastebin.com/wtHvfAri
[15:23:24] <RaceCondition> with db.jobs, I can use both versions; with db.queue, I can only use the ObjectId(...) one
[15:25:50] <StephenLynx> you can't say "thread id = 6546546546" when its the first thread created in a forum, for example
[15:25:52] <RaceCondition> when do I need to wrap the BSON IDs in ObjectId and when not
[15:25:53] <Derick> RaceCondition: yes, use your own ID value if you can - if not, use ObjectID.
[15:26:14] <Derick> RaceCondition: you should always wrap real Object IDs in an ObjectID
[15:26:22] <Derick> but you can pick your own non-object-id values for _id
[15:26:47] <RaceCondition> so what you're saying is that Mongo remembers whether I saved an object with _id:"foo" vs _id:ObjectId("foo"), and I have to use the same value later?
[15:26:58] <StephenLynx> and since mongo doesn't have joins, you either make two queries to get a user-friendly value or you display the ugly-ass _id
[15:27:00] <RaceCondition> i.e. if I save {_id: "foo"}, I can't later query it with {_id: ObjectId("foo")}?
[15:27:46] <StephenLynx> and then there are the rules on projecting
[15:27:47] <Derick> RaceCondition: yes, you need to use the same value later
[15:27:56] <Derick> "foo" and ObjectId("foo") are absolutely not the same
[15:28:12] <StephenLynx> the main problem with _id is that it does not behave as a regular field.
[15:28:26] <StephenLynx> and has all these little rules
[15:28:37] <Derick> you can store ObjectIDs in other fields too...
[15:31:45] <RaceCondition> also, the document that gets returned either contains _id:"foo" or _id:ObjectId("foo") depending on which one I used to query
[15:33:56] <RaceCondition> ok, so it just stores those 96 bits, instead of the 24char string; got it
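The whole exchange can be reproduced in a few shell lines. The point is that a 24-char hex string and an ObjectId are different BSON types, so a query must use the type that was stored (collection name is hypothetical):

```javascript
db.demo.insert({ _id: "54c903d3a25f4be03d171f9e" });                 // _id stored as a string
db.demo.find({ _id: "54c903d3a25f4be03d171f9e" }).count();           // 1 -- same type, matches
db.demo.find({ _id: ObjectId("54c903d3a25f4be03d171f9e") }).count(); // 0 -- ObjectId != string
```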
[15:34:19] <xcesariox> Derick: 1 minute, let me paste it into a gist.
[15:34:19] <Derick> an ObjectID consists of 4 bytes unix timestamp, 3 bytes hostname part, 2 bytes process Id and 4 bytes counter (to prevent collisions within each second)
[15:34:38] <Derick> hmm, I got one of those numbers wrong
[15:34:50] <Derick> but yes, it stores binary data, and not that 24 char string
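The commonly documented classic layout is 4-byte unix timestamp, 3-byte machine id, 2-byte process id, 3-byte counter (the counter width is the number Derick corrected himself on). Since the shell prints the 12 bytes as 24 hex characters, plain JavaScript is enough to take one apart:

```javascript
// Decode the classic ObjectId layout from its 24-char hex form:
// 4-byte timestamp | 3-byte machine id | 2-byte pid | 3-byte counter.
function decodeObjectId(hex) {
    return {
        timestamp: parseInt(hex.slice(0, 8), 16),   // seconds since the epoch
        machine:   hex.slice(8, 14),                // left as hex
        pid:       parseInt(hex.slice(14, 18), 16),
        counter:   parseInt(hex.slice(18, 24), 16)
    };
}

var parts = decodeObjectId("54c903d3a25f4be03d171f9e");
console.log(new Date(parts.timestamp * 1000).toISOString());
```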
[15:35:03] <RaceCondition> now the question is, if I want to use ObjectId but I'm using JSON internally to represent these documents, why can't I use $oid to represent an ObjectId as JSON?
[15:35:22] <Derick> RaceCondition: I don't quite get that?
[15:35:46] <RaceCondition> I mean, is there a way to encode an ObjectId in the JSON representation of a BSON document
[15:36:36] <Derick> we do something like "Extended JSON", but that's not really a standard
[15:36:48] <Derick> and not all drivers support it
[15:37:03] <RaceCondition> even on the mongo REPL, I cannot use {"$oid": "foo"}
[15:37:15] <Derick> yes, you can not use field names starting with a $
[15:37:27] <xcesariox> Derick: where do i put this "results = Book.collection.map_reduce(map, reduce, out: "vr")" command syntax into? into rails console or mongodb console directly? https://gist.github.com/shaunstanislaus/0f8d87939c0ab01ce5d6
[15:37:29] <RaceCondition> so can I represent an ObjectId with this "Extended JSON" somehow?
[15:37:57] <Derick> RaceCondition: you have already seen it, it's this $oid stuff that the shell shows :-/
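In Extended JSON an ObjectId is spelled as a plain one-key object, `{"$oid": "<24 hex chars>"}`, so it survives an ordinary JSON round trip; it is then up to a driver that understands the convention to turn it back into a real ObjectId (and, as noted above, not all drivers do):

```javascript
// "$oid" is just an ordinary key as far as JSON is concerned.
var doc = { _id: { "$oid": "54c903d3a25f4be03d171f9e" }, name: "example" };
var text = JSON.stringify(doc);   // valid JSON
var back = JSON.parse(text);      // back._id is a plain object, not an ObjectId
```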
[15:38:20] <xcesariox> Derick: i am actually following a tutorial book but this syntax doesn't seem to work for me. it says unexpected token when i type it into the mongo console.
[15:38:42] <Derick> xcesariox: It looks like Ruby - it's not valid javascript syntax
[15:38:58] <xcesariox> Derick: so what should i do
[15:39:15] <xcesariox> Derick: but this is not ruby results = Book.collection.mapReduce(map, reduce, out: "Tom")
[15:39:18] <RaceCondition> Derick: when does the shell use $oid? I haven't seen it; it only seems to use ObjectID
[16:12:29] <xcesariox> where do i put this "results = Book.collection.map_reduce(map, reduce, out: "vr")" command syntax into? into rails console or mongodb console directly? https://gist.github.com/shaunstanislaus/0f8d87939c0ab01ce5d6
[16:12:51] <rodfersou> in one collection I save the friendly url as the 'slug' attribute of my document, and ensure an index on this field... my question is: knowing that I'll use the _id as my shard key when I move to a cluster, will my database get slower when I look up documents by the 'slug' attribute?
[16:12:59] <cheeser> well, that's not valid in the shell, for sure.
[18:25:45] <klmlfl> Hi room , I have a replica set running in staging environment, can I get some pointers on how to test it?
[18:27:52] <klmlfl> I am thinking about testing beyond just cycling the service on and off
[18:37:35] <Tyler__> So if I do a findOne and the document doesn't exist, does the code in the callback function still execute?
[18:40:09] <Tyler__> User.findOne({code:req.body.code},function(err, doc){ Does this part still execute if it can't find one? });
[18:49:48] <rodfersou> Tyler__: for me this doesn't make sense.. but try to make a simple test database, with test data, and see what happens
[18:50:02] <rodfersou> Tyler__: sorry.. I'm still learning too :)
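For the record: with the Node driver (and Mongoose-style models like the `User` in Tyler__'s snippet), the callback always runs; finding nothing is not an error, it just passes a `null` doc. A sketch, with names taken from his example and not otherwise verified:

```javascript
User.findOne({ code: req.body.code }, function (err, doc) {
    if (err)  { return; }  // a real failure: connection problem, malformed query, ...
    if (!doc) { return; }  // the callback ran, but no document matched
    // doc is the matching document
});
```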
[19:54:01] <theRoUS> Derick: you ever work with rails and mongoid? or know of anyone?
[19:57:03] <joep_> General question: say I have a tree of documents and I need to get from the bottom-most node to the top node, what is the most efficient way to do that in MongoDB?
[19:57:32] <joep_> (presently I have back-references on each document and run a query on each node to get its parent; so, for N depth there are N queries)
[20:04:34] <joep_> (and by "back-reference" I mean a reference to the parent node)
[20:05:08] <neo_44> joep_: there are a lot of considerations for that question.....but if you just want to go from bottom to top...query for a node with no parent.
[20:05:29] <neo_44> but if you want to take the path from the child to the top....that is different
[20:14:19] <neo_44> what is more important....speed or complexity?
[20:14:32] <neo_44> or shall i say.....if it is complex and fast is that okay?
[20:14:51] <joep_> to give a little context, the system has many collections where each collection of documents gets curated, modified, and associated in varying order
[20:15:41] <joep_> simplicity is probably more important since I have a team I will have to educate if it is complex
[20:15:47] <neo_44> is there a service or data access layer? or is the client writing directly to the databses?
[20:17:00] <joep_> we are using Doctrine ODM, an ORM written in PHP; but, we are realizing some limitations, specifically this problem: given X, give me its top most parent Z in an efficient manner
[20:17:31] <neo_44> yeah...kinda need another abstraction layer on top of that
[20:18:36] <neo_44> i use mongoengine(python) and then have another abstraction on top of it for repositories(database access), and then the entity layer...so I can switch out the underlying data store or have more than one at a time
[20:18:46] <neo_44> there isn't an easy fix for your issue....
[20:18:57] <neo_44> i would most likely have a meta store just for the relationships
[20:19:11] <neo_44> but you would have to constantly keep it updated...very complex
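The walk joep_ describes is just a loop over parent pointers: one lookup per level until a node has no parent. Sketched below with a stub lookup over an in-memory map standing in for the per-node database query:

```javascript
// Climb parent pointers until the root. findById stands in for a
// per-node query; here it is a stub over an in-memory map.
function findRoot(findById, startId) {
    var node = findById(startId);
    while (node.parent !== null) {   // one "query" per level: N lookups for depth N
        node = findById(node.parent);
    }
    return node;
}

// Toy tree: c -> b -> a (a is the root).
var docs = {
    a: { _id: "a", parent: null },
    b: { _id: "b", parent: "a" },
    c: { _id: "c", parent: "b" }
};
console.log(findRoot(function (id) { return docs[id]; }, "c")._id); // "a"
```

Storing the root's _id (or a materialized path of all ancestor ids) on every document collapses this to a single indexed query, which is essentially neo_44's "meta store" idea, with the same caveat: it must be kept up to date whenever a node moves.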
[20:30:30] <Rodrive> Hi, i have a problem with mongod starting when i reboot on centos 7. I'm getting this log: "ERROR: Cannot write pid file to /var/run/mongodb/mongod.pid: No such file or directory". According to github there was a fix in 2.6.5; i am on 2.6.7 and still having the problem.
[20:31:25] <hephaestus_rg> hey i'm trying to connect to my mongolab provided mongodb with the shell command, and it seems to be timing out. any ideas
[20:31:31] <cheeser> does that path exist, Rodrive ?
[20:32:10] <Rodrive> No, not on reboot, that's all the problem. See : https://github.com/mongodb/mongo/commit/50ca596ace0b1390482408f1b19ffb1f9170cab6
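The usual workaround on CentOS 7: /var/run is a tmpfs, so the pid directory has to be recreated at every boot, which is exactly what systemd-tmpfiles is for. A sketch (the `mongod` user/group is the package default; adjust if yours differs):

```shell
# Recreate /var/run/mongodb at every boot.
echo 'd /var/run/mongodb 0755 mongod mongod -' | sudo tee /etc/tmpfiles.d/mongodb.conf

# Apply it immediately, without rebooting.
sudo systemd-tmpfiles --create /etc/tmpfiles.d/mongodb.conf
```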
[20:33:34] <unholycrab> is there any way to determine the overall progress of the BTree Bottom Up indexing step, during the initial syncing a new Secondary member?
[20:33:40] <unholycrab> log output: Tue Jan 27 20:04:45.006 [rsSync] Index: (2/3) BTree Bottom Up Progress: 782212700/946547617 82%
[20:34:06] <unholycrab> the 82% doesn't tell me anything about the overall progress at all
[20:35:05] <unholycrab> it's like saying "82% of an unknown piece of unknown size out of unknown number of pieces whose sizes are unknown"
[20:55:08] <bdiu> Are there any particular performance implications against using a "composite" object as the _id in a document? e.g., _id:{date:"1/2/3",user:ObjectId(...),type:"daily"}
[21:15:27] <ddod> I was going to use the skip method if I had to do one at a time
[21:15:41] <GothAlice> It'd be easier if we could have direct access to the _id index, then we'd just need to know the count of buckets and bucket size, then randrange several times over it as required by the .limit().
[21:17:05] <ddod> ok, so I guess async + one at a time is the way to do it
[21:17:31] <GothAlice> There should totally be a .sort_by('$bogo') random sort… there's always the statistical chance of bogosort sorting correctly in a single try, but most of the time you'll get back things in a random order… ;)
[21:20:46] <unholycrab> i posted on stack overflow: http://stackoverflow.com/questions/28180167/how-to-determine-overall-progress-of-the-btree-bottom-up-step
[21:21:08] <GothAlice> https://jira.mongodb.org/browse/SERVER-533 < relating to random sort
[21:36:40] <neo_44> ddod: is that for testing? or something you need to do in prod?
[21:37:55] <neo_44> skip will pull the document into memory...just FYI...doesn't actually skip it.
[21:38:17] <neo_44> I would use 2 queries...first one that projected on only _id...build array...then randomly pull documents with those _ids from the database
[21:40:54] <ddod> as in, get a list of all IDs first and then do a $in to grab all the chosen docs?
[21:45:43] <neo_44> yeah...and you can pick random entries from the array of IDs you have.
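The client-side half of that two-query plan is ordinary sampling without replacement; the database half is a single `$in`. A sketch (the `things` collection name is hypothetical):

```javascript
// Pick k distinct ids at random from an array of ids.
function sampleIds(ids, k) {
    var pool = ids.slice();                  // copy: don't mutate the caller's array
    var picked = [];
    while (picked.length < k && pool.length > 0) {
        var i = Math.floor(Math.random() * pool.length);
        picked.push(pool.splice(i, 1)[0]);   // remove so no id is picked twice
    }
    return picked;
}

// Step 1 (shell): fetch only the ids.
//   var ids = db.things.find({}, { _id: 1 }).toArray().map(function (d) { return d._id; });
// Step 2: fetch the chosen documents in one query.
//   db.things.find({ _id: { $in: sampleIds(ids, 5) } });
```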
[21:56:02] <Synt4x`> how do I remove a single element from a document? (I added it using this: b.team_stats.update({'id':t1_stats['id'], 'game__id':c}, {'$set': {'poss' : t1_poss, 'pace':t1_pace, 'dRTG':t1_dRTG, 'dEFG':t1_dEFG, 'dTOV':t1_dTOV, 'dREB':t1_dREB, 'dFT':t1_dFT} }) )
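`$unset` is the mirror of the `$set` used to add the fields, with the same selector. (Synt4x`'s snippet looks like pymongo; shown here in shell spelling, with `t1_stats["id"]` and `c` standing in for his Python variables.)

```javascript
db.team_stats.update(
    { id: t1_stats["id"], game__id: c },
    { $unset: { dRTG: "" } }   // the value given to $unset is ignored; the field is removed
);
```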
[22:30:38] <dsirijus> ah, lol. actually got to read up the docs on mongo's site today (and have been using it for a year or so already)...
[22:31:46] <dsirijus> dudes, there should be a required preface on everything that says anything about mongo: "it's not a relational database. atomic means a single-document operation. minimize atomic op count. there are no schemas."
[22:32:16] <dsirijus> i have built basically a fully relational database with a lot of schemas with it :D
[22:33:24] <dsirijus> and, best bit is - i didn't even need relations, basically everything is "contains" relation :D
[22:33:49] <dsirijus> "some" refactoring is due, i believe
[22:44:47] <GothAlice> dsirijus: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html is a good read. :)
[22:46:51] <dsirijus> you know what's another good read? mongodb docs! :D
[22:55:45] <dsirijus> it's actually pretty sweet db from this point of new understanding
[22:55:45] <GothAlice> I'm currently working on a MongoDB-backed cMS (component management system). https://gist.github.com/amcgregor/4cefefa4a12c9c76a970#file-contentment-model-py-L421-L490 is an example site layout and first page element (from the static content at https://rita.illicohodes.com/). Scroll up for the "schema". ;^)
[22:56:18] <dsirijus> i still think i'll keep schemas
[22:56:28] <dsirijus> from development perspective, they're pretty useful
[22:56:55] <GothAlice> Indeed. https://gist.github.com/amcgregor/4cefefa4a12c9c76a970#file-contentment-model-py-L285-L315 < the "Asset" schema from the cMS, as an example.
[22:56:56] <dsirijus> it's just that it's good to be aware that it's an artificial limitation/abstraction
[22:57:41] <GothAlice> Indeed. My cMS example explicitly allows an arbitrary (schema-free) set of "properties" on each Asset to make more flexible use of the underlying DB.
[22:58:11] <dsirijus> aaah, a nice little "trick"!
[22:58:59] <dsirijus> good good, i'll implement it like zo too
[23:01:01] <GothAlice> For me the biggest advantage of a declarative schema (this style of schema) is that I can mix and match behaviours. I.e. "Taxonomy" knows how to store and maintain a tree, as well as how to manipulate and traverse that tree. That means my main model doesn't need to worry about those details at all. Same with "Indexed", my own full-text index implementation using Okapi BM-25 ranking. (I need to deal with multilingual setups and dynamic content, so MongoDB's indexing won't pass muster.)
[23:05:10] <dsirijus> good, good. all great advice. thanks, GothAlice! :)
[23:05:25] <GothAlice> Then I can test the Taxonomy code in isolation without the complexity of the rest of the system. :D
[23:06:02] <dsirijus> oh, this is it. we're not talking anymore. you have actual tabs in your source code
[23:06:10] <dsirijus> i don't speak with such heathens
[23:07:17] <GothAlice> Back from when we had a techblog: https://web.archive.org/web/20130813163534/http://tech.matchfwd.com/your-code-style-guide-is-crap-but-still-better-than-nothing/
[23:07:31] <GothAlice> I was master of the inflammatory blog title. ;)
[23:10:28] <GothAlice> Relevant segment from the "Indentation" section of that article: "Do you use spaces in a word processor to line up bullet points? If you do you’ll be first against the wall when the revolution comes!" ;)
[23:12:46] <dsirijus> ok, ok. i'll give you the benefit of the doubt.
[23:13:31] <GothAlice> Also, on Github, you can add "?ts=4" (or other replacement size) to render tabs to that size. Sadly this doesn't work on Gist, only main Github file views.
[23:14:08] <dsirijus> ok. will you stop with all that useful information already!?
[23:17:06] <hephaestus_rg> hi i have a question about indexes
[23:17:24] <hephaestus_rg> if i query against a certain field frequently, then it makes sense to index it right?
[23:18:00] <hephaestus_rg> in my case, i have an ISODate field called 'published_at' that i query very often. is it worth indexing it then? (i have 400,000 docs in that collection)
[23:28:25] <GothAlice> hephaestus_rg: It's important to know how MongoDB uses indexes, however. MongoDB currently only uses one index to "cover" a query. Creating "compound" indexes with the fields in an appropriate order can make a world of difference. See: http://docs.mongodb.org/manual/core/index-compound/
[23:29:47] <GothAlice> Compound indexes can also save you from creating extra indexes, since MongoDB can use a compound index's prefix (first field, or first and second, or first through third, etc.) to satisfy queries using those earlier fields without needing additional indexes. But only if they're in the right order… it works from the left to right.
[23:31:15] <neo_44> hephaestus_rg: that is the rule for indexes....to be honest you don't want to query data without an index if possible
[23:31:45] <GothAlice> hephaestus_rg, neo_44: Some queries, such as string regular expressions, however, can't use indexes. It's something to be aware of. :)
[23:33:21] <neo_44> GothAlice: depends on the regex
[23:33:59] <GothAlice> Indeed. For details, see: http://docs.mongodb.org/manual/reference/operator/query/regex/#index-use
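Putting GothAlice's points together for hephaestus_rg's case: a compound index whose first field is `published_at` serves both the date query alone and date-plus-something queries, working left to right (the second field name here is illustrative):

```javascript
db.articles.ensureIndex({ published_at: 1, status: 1 });

db.articles.find({ published_at: { $gte: ISODate("2015-01-01") } }); // uses the index prefix
db.articles.find({ status: "live" });                                 // cannot use this index
```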