[00:19:06] <jtmarmon_> the issue is that I need to retrieve the documents when they are updated :p hence the findAndModify
[00:20:00] <Boomtime> so do a find afterward, if your indexes are appropriate it won't matter performance wise
[00:24:59] <jtmarmon_> the situation i'm in is a bit complicated. basically we needed to rotate data so that the item in a result set that was last retrieved is returned first, creating a 'round-robin' like system. in order to do that, I have a field on the document called 'lastRetrieved', and that field is updated by findAndModify. if I do a find afterwards, I won't know which items in the result set were updated. my current workaround is multiple queries
[00:24:59] <jtmarmon_> although I suppose a more performant workaround is generating an 'updateId' and doing an update, then a find. still, a multi-document findAndModify would be more convenient. is there a reason there are no plans to support this?
[00:29:15] <jtmarmon_> in fact, come to think of it, the updateId approach wouldn't even be more performant since you'd need to perform a second update after the find to remove the updateId from the list of pending queries for that document >.>
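A single-document round-robin of the kind jtmarmon_ describes might look like the following in the mongo shell; the "items" collection and the query on "resultSet" are made-up stand-ins:

    // atomically pick the least-recently-retrieved matching document,
    // stamp it, and return the post-update version
    db.items.findAndModify({
        query: { resultSet: "abc" },
        sort: { lastRetrieved: 1 },                       // oldest-retrieved first
        update: { $set: { lastRetrieved: new Date() } },
        new: true                                         // return the updated doc
    })

findAndModify only ever touches one document per call, which is why rotating a whole result set takes multiple calls (or an update followed by a find).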
[01:58:27] <seiyria> has anyone here used the Bulk API yet?
[02:20:42] <Boomtime> seiyria: what do you want to know?
[02:21:02] <seiyria> Boomtime, is it bad to hold a reference to a bulk permanently, and just call execute on it occasionally?
[02:21:31] <Boomtime> er.. you should only be able to call execute once
[02:21:56] <seiyria> no, I was perusing the API and looking into it
[02:22:05] <Boomtime> the object exists only on the client
[02:22:22] <Boomtime> you are not communicating with the server until you call execute or such equivalent in whatever driver
[02:22:25] <seiyria> I saw that bson was bottlenecking my app, so I figured if I bulk execute statements instead of executing 1-300 within a second, it might give a bit better performance
[02:25:41] <Boomtime> actually, it doesn't query at all, it's updates only, deletes, inserts etc
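For context, a 2.6-era bulk operation in the mongo shell looks roughly like this; the "players" collection and the particular ops are hypothetical:

    // ops accumulate client-side; nothing reaches the server until execute()
    var bulk = db.players.initializeUnorderedBulkOp();
    bulk.find({ name: "playerOne" }).updateOne({ $set: { hp: 52 } });
    bulk.insert({ name: "playerTwo", hp: 100 });
    bulk.execute();   // one round trip; the bulk object is spent afterwards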
[02:26:21] <seiyria> I don't honestly know what it would do; naively I figured it just ran the queries all at once. but here's how my game works: basically, players take turns every 10 seconds, and after their turn is over, they write themselves to the DB
[02:26:40] <seiyria> I have 100+ players on at any given time, but I get spikes up to 300-500 and possibly more depending on whether I advertise on reddit
[02:26:55] <seiyria> so I figured writing so much so fast was probably bad, and that if I wrapped it into one query, it might perform better
[02:27:34] <Boomtime> you said you saw lots of CPU being taken by BSON serialization. putting aside how this compares with using Node.js in the first place, how will you improve this by doing the same set of operations via the bulk API versus one at a time?
[02:28:16] <Boomtime> BSON serialization is a necessary step into and out of the server - the amount of work you are doing remains the same
[02:28:38] <seiyria> quite honestly, I wasn't sure but that's partly why I'm here asking
[02:28:50] <joannac> the only thing you would be saving would be the delay in getting a response from the server due to network latency
[02:29:04] <seiyria> well, the game and mongo are on the same server
[02:29:17] <seiyria> I should probably eventually change that
[02:29:23] <joannac> right. so even that is minimal right now
[02:30:17] <seiyria> ok. well, good to know that it would probably not be effective.
[02:31:06] <seiyria> I'll run an extended profile and see if I can't work something else out. thanks for the advice.
[02:31:15] <seiyria> and by extended, I mean for more than 10 minutes, haha.
[03:47:34] <newmongouser> I have a question about mongodb schema design. I have a list of organizations that are categorized. do I gain/lose performance having the organizations as embedded documents per category, or would I be better off with separate documents for categories and organizations?
[03:50:21] <cheeser> well, one thing to consider: can an organization be in more than one category?
[03:54:43] <Boomtime> also consider how much data per organisation - a hundred bytes? hundreds of KB?
[03:56:00] <cheeser> yeah, that's my big concern, really.
[03:57:20] <newmongouser> There is not a lot of data initially, but I would like the freedom to add more fields in the future. assuming it would be OK to embed them, do you think for the sake of scale down the road it makes sense to keep them separate from the beginning?
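The two designs being weighed, as minimal shell sketches with made-up names:

    // embedded: one document per category, organizations inside it
    db.categories.insert({ _id: "catering", organizations: [ { name: "Acme Co" } ] })

    // referenced: organizations in their own collection, pointing at a category
    db.categories.insert({ _id: "catering" })
    db.organizations.insert({ name: "Acme Co", category: "catering" })
    db.organizations.ensureIndex({ category: 1 })   // keeps per-category lookups cheap

The embedded form reads a category and all its organizations in one fetch but is bounded by the 16MB document cap and pays for document growth; the referenced form scales per-organization at the cost of a second query.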
[04:03:52] <newmongouser> Thanks for your input Boomtime and cheeser
[05:10:29] <speaker123> hi i'm having some trouble doing a sort after a geospatial $within query.
[05:10:32] <speaker123> i read: http://blog.mongolab.com/2012/06/cardinal-ins/
[05:10:58] <speaker123> and it seems like i'd need my sort index to appear before the geospatial index, but this is not possible right? the geospatial index is required to be first in any compound index.
[05:11:09] <speaker123> is there some way of handling this?
[05:11:38] <speaker123> I can't do a scanAndOrder: true....not enough memory.
[05:16:38] <Boomtime> speaker123: AFAIK a 2d index must be first, but a 2dsphere index does not
[08:36:49] <speaker123> am I wrong in expecting that explain() should not be erroring out, since it should use the compound index {created_at: 1, coordinates: "2dsphere"}?
[09:03:26] <speaker123> with the hint() spec, millis is 12
[09:08:34] <speaker123> so my fragile understanding of indexes was that the compound index should have been selected in this case
[09:09:25] <speaker123> it has scanAndOrder: false, which seems like it should have been significantly faster with more documents.
[09:12:58] <ranman> speaker123: out of curiosity how many docs in the collection?
[09:13:14] <speaker123> for that demo version, really small...10k
[09:13:40] <speaker123> but i'm trying to do this with a $geoWithin and a sort afterwards....on 500 million documents....
[09:13:56] <speaker123> the compound index is (created_at, coordinates)....
[09:14:07] <speaker123> it doesn't pick it up unless i give the hint
[09:14:22] <speaker123> so it overflows on the sort
[09:15:00] <sachi> need help understanding c++ autoreconnect, and first time in IRC :)
[09:15:39] <sachi> so sorry if im not following the proper etiquette
[09:17:01] <speaker123> it's something like: c = tweets_in_region(collection, region); c.sort({'created_at': 1}).explain()
[09:17:38] <speaker123> explain() fails with a sort overflow, even though i have a compound index {created_at: 1, coordinates: "2dsphere"}
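Reconstructed in shell form, roughly what speaker123 is running; "tweets" is a stand-in collection name and region would be a GeoJSON polygon:

    db.tweets.ensureIndex({ created_at: 1, coordinates: "2dsphere" })
    db.tweets.find({
        coordinates: { $geoWithin: { $geometry: region } }
    }).sort({ created_at: 1 }).hint({ created_at: 1, coordinates: "2dsphere" })
    // per the discussion, the planner only picks this index when hinted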
[09:18:31] <ranman> sachi: no, that's fine etiquette. you just might be waiting a while for someone to answer; feel free to post your question again every hour or two if you haven't gotten a response
[09:19:16] <ranman> speaker123: and what version of mongo?
[09:22:54] <ranman> not that you want to re-index 500 million docs...
[09:23:18] <speaker123> ranman, so i had that initially and it was at least using the 2dsphere index, but it would use scanAndOrder: true for the sort
[09:23:29] <speaker123> after reading: http://blog.mongolab.com/2012/06/cardinal-ins/
[09:24:11] <speaker123> it seemed that the reason for that is b/c you need sort fields to come before range fields
[09:24:18] <speaker123> if you want to sort after a range query.
[09:31:53] <ranman> speaker123: I'm thinking it might have something to do with this line in the docs which doesn't make any sense to me: For a compound index that includes a 2dsphere index key along with keys of other types, only the 2dsphere index field determines whether the index references a document.
[09:37:54] <speaker123> i think i'm going to have to do this differently anyway
[09:38:39] <ranman> speaker123: you might consider the agg framework for working with something that large
[09:38:51] <speaker123> even if {created_at: 1, coordinates: "2dsphere"} was working as i expected for a coll.find(...geoWithin...).sort({created_at:1})
[09:39:15] <speaker123> i can't sort the whole collection each time i do a geo query.
[09:50:32] <speaker123> ok, so should i file this as a bug?
[10:15:46] <speaker123> kali, huh? can you explain?
[10:16:06] <speaker123> oh sorry, i thought you were responding to my question
[10:16:38] <kali> speaker123: nope sorry, had a look, any distraction is welcome today, but i have no idea what is going on
[10:43:08] <sachi> Hi, how could I change the timeout for connections with auto_reconnect, or make those connections non-blocking? Thx :)
[10:44:27] <sachi> And how do I sort of register my username for this channel (or IRC)? and is there a way to receive updates via email if someone tags me? sorry, this is my first time using IRC :/
[11:27:03] <sachi> Repost - Is there a way to change the C++ connection timeout? if not what's the workaround so that the program doesn't wait 5s to reconnect to the database?
[11:45:03] <reese> I have problem with shard migration: https://groups.google.com/forum/#!topic/mongodb-user/cdVY-zazPi0 could someone help me?
[12:47:35] <greenmang0> folks, i am trying to create an index on secondary, `db.collection.ensureIndex({foo:1})` , but I get message - ` "err": "not master", "code": 10058 `
[12:48:22] <greenmang0> is `stop secondary, run standalone, create index, start secondary, rejoin replicaset` the only way to create an index on a secondary?
[12:48:40] <greenmang0> can't i create index on live secondary?
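The rolling procedure greenmang0 alludes to, sketched; the port and dbpath are examples only:

    // 1. shut the secondary down and restart it standalone (without --replSet),
    //    e.g.: mongod --port 47017 --dbpath /data/db
    // 2. connect to the standalone and build the index:
    db.collection.ensureIndex({ foo: 1 })
    // 3. restart it with its usual --replSet options so it rejoins and catches
    //    up from the oplog; repeat per secondary, doing the primary last after
    //    stepping it down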
[13:08:29] <phretor> any best practice/recommendation on how to create a newline-based text-file iterator from a GridOut object?
[16:50:45] <cheeser> i don't like rivers. dam those things.
[16:51:06] <dman777_alter> anyone use mongodb river plugin? I made my first index... the curl command shows "created":true but the elastic search log shows errors: http://dpaste.com/2C7XFYJ
[16:54:02] <jumpman> if one of these objects was added to a collection every second
[16:54:06] <jumpman> would my performance be horrible?
[16:55:39] <edrocks> jumpman: just write a benchmark and test it
[16:56:15] <jumpman> mongo doesn't have official benchmarks... the devs don't think benchmarking represents real-world usage
[16:56:18] <edrocks> but your keys seem a bit long. why not use userId instead of idOfUserPerformingAction?
[16:56:33] <jumpman> because all three of those IDs could be users
[16:56:50] <jumpman> really, the name doesn't bother me. i can change it to whatever and be fine
[17:01:57] <edrocks> is there any way to set a connection timeout? mongodb is leaking connections if a host loses its network
[17:42:05] <sachi> Repost - Is there a way to change the C++ connection timeout? if not what's the workaround so that the program doesn't wait 5s to reconnect to the database?
[18:48:12] <Tyler_> that's what I thought! Idk why people were recommending arrays instead of documents
[18:48:16] <edrocks> but you may need a time stamp to store other time info
[18:48:26] <Sidhartha> Select Coke as opposed to Sprite.
[18:48:35] <Sidhartha> How do I store these 'options', per se?
[18:48:50] <edrocks> Tyler_: in most cases it saves memory, but some things require being able to update the things those ids point to, which would break any history
[18:49:30] <Sidhartha> Let's say a combo is made out of A or B, C or D and E or F or G.
[18:49:35] <edrocks> Sidhartha: make your items in the array have a type either item or combo
[18:49:51] <edrocks> if it's a combo, store another array on each combo with the items you would have put in the original array
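A hypothetical order document following that shape, with type tags distinguishing plain items from combos:

    db.orders.insert({
        items: [
            { type: "item",  name: "Coke" },
            { type: "combo", name: "Lunch Special",
              items: [ { name: "A" }, { name: "C" }, { name: "E" } ] }
        ]
    })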
[18:50:40] <Tyler_> Sort of a n00b question, but if I'm just going with an array of userids... Do I need to do anything special to handle the creation of the document?
[18:51:40] <edrocks> Tyler_: just make sure you have access to it
[18:51:51] <edrocks> ie in golang you need to use bson.NewObjectId()
[18:52:02] <edrocks> since you must export your own id field
[18:52:07] <Tyler_> so I do an if statement for whether or not the document already exists?
[18:52:19] <Tyler_> and I do a findOne to get the document
[18:52:31] <Tyler_> and if it doesn't exist yet, I do a save instead of update?
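The find-then-save dance Tyler_ describes can usually be collapsed into a single upsert; the collection and variable names here are made up:

    // update the document if it exists, create it otherwise
    db.things.update(
        { _id: someId },
        { $addToSet: { userIds: newUserId } },   // grow the array of userids
        { upsert: true }
    )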
[18:52:57] <speaker123> can someone explain that? if i provide a hint, then the explain() finishes quickly. without the hint, it takes forever (which means its not using the index)
[18:53:24] <speaker123> it seems that the index to use is obvious, so i don't understand why mongo is not picking it up
[19:00:54] <harttho> What's the best way to manually copy a database from 1 shard to the other?
[20:14:08] <Mkop> I'm trying to figure out an authentication issue. I can connect to user:pass@dbserver/foo and show databases, and it shows that bar exists. but I can't connect to user:pass@dbserver/bar
[20:14:33] <joannac> because you're now trying to auth against database "bar"
[20:14:46] <joannac> your user is defined in database "foo"
[20:16:30] <Mkop> so within foo my user has the createDatabase permission but the user still only exist in bar?
[20:16:36] <Mkop> so within foo my user has the createDatabase permission but the user still only exist in foo*?
[20:18:50] <Mkop> what I'd like ideally is for my user to be able to connect (with a single password) to user:pass@dbserver/anything. is there a way to do that?
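One way to get what Mkop is after, assuming 2.6-style user management: create the user once in the admin database with an "AnyDatabase" role, then authenticate every connection against admin:

    db.getSiblingDB("admin").createUser({
        user: "myuser",
        pwd: "password",
        roles: [ { role: "readWriteAnyDatabase", db: "admin" } ]
    })
    // then connect with: mongodb://myuser:password@dbserver/anything?authSource=admin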
[20:22:35] <boutell> why is it that db.createUser({ user: 'username', pwd: 'password', roles: [ ] }); defaults to storing the user in an "admin" database (uh, I guess), but authSource doesn't default to that database?
[20:22:47] <boutell> why, in other words, do I have to specify authSource=admin to get any love from a user created that way
[20:36:15] <Mkop> { [MongoError: not authorized on process to execute command { update: "sessions", writeConcern: { w: 1 }, ordered: true, updates: [ { q: { expires: { $exists: 0 } }, u: { $set: { expires: new Date(1421958734685) } }, multi: true, upsert: false } ] }]
[20:36:29] <dman777_alter> hey...where is full text search at in Mongo these days? Has it improved?
[20:36:55] <boutell> dman777_alter: in 2.6 it is much better than in 2.4. You can combine it with other queries in a more natural manner, which for me was the point where it became worthwhile
[20:37:30] <boutell> dman777_alter: but, it’s not going to distinguish a document that has the words “A B C” in that order from a document that doesn't.
[20:37:39] <dman777_alter> boutell: would the 2.6 full text be as fast as elastic search?
[20:37:48] <boutell> I haven’t tried to measure that.
[20:38:21] <boutell> the advantage of elastic search would be that it’s dedicated to that job. The disadvantage would be in any situation where you need to combine regular database criteria with text search.
[20:38:34] <dman777_alter> cool...well..... I guess it doesn't matter. The production has 2.4 and it's not going to get upgraded
[20:38:36] <boutell> some people mirror everything into elastic search, including the more database-y criteria.
[20:38:57] <dman777_alter> boutell: what is considered database-y criteria?
[20:39:41] <boutell> dman777_alter: let’s say your objects have “age” and “size” properties, and you want everybody who’s a medium and between 22 and 39 who *also* matches a text search for the word “hippie”
[20:40:28] <boutell> dman777_alter: you can mirror all those “hard” properties into elastic search which has its own ways of doing queries for exact matches and ranges; it’s denormalization but mongo is big on that after all
[20:41:15] <boutell> dman777_alter: but if you don’t, if elastic search is just your text index, you’d have to start by searching for “hippie” let’s say… and you get back 500,000 hippies before you can even start narrowing by shirt size and age… it’s inefficient
[20:41:28] <boutell> with mongo text search you just write one query that covers all of the above
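In 2.6 that one query might look like this; the "people" collection, the field names, and the text index on "bio" are illustrative:

    db.people.ensureIndex({ bio: "text" })   // $text requires a text index
    db.people.find({
        $text: { $search: "hippie" },
        size: "M",
        age: { $gte: 22, $lte: 39 }
    })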
[20:44:58] <dman777_alter> boutell: oh...I see. on this project I am using Mongo 2.4. it seems in this case an Elastic search river from the db to index a catalog would be best...what do you think?
[20:45:34] <boutell> I don’t know anything about your use case, but you want to mirror everything you might want to include in the same query in elastic search, probably
[23:23:48] <fruitNvegetables> my db documents all have the field "foodCategory". what query can I execute to get all the foodCategories ALONE, without any copies?
[23:27:57] <joannac> (may perform badly if not indexed)
[23:32:09] <fruitNvegetables> joannac: getting a syntax error from that
[23:32:32] <fruitNvegetables> joannac: forget what i just said
[23:33:20] <fruitNvegetables> joannac: it works. What exactly do you mean by 'perform badly if not indexed'?
[23:33:39] <joannac> well, with an index, that can just look through the index
[23:33:55] <joannac> without an index, it has to page in every document to look at the foodCategory field
[23:39:02] <fruitNvegetables> joannac: is there a way, instead of returning an array of the categories, to return all the documents with their foodCategory field WITHOUT the other fields?
[23:39:49] <joannac> that was the first query I gave you?
[23:42:39] <fruitNvegetables> joannac: it returned an array of the food categories: ["Fruit", "Vegetables", "Beans"]. I'm wondering if I could get back {foodCategory: "Fruit"}, {foodCategory: "Vegetables"}, ... ??
[23:51:38] <fruitNvegetables> joannac: ? Did you see my message? :)
[23:52:22] <joannac> no, the field name is implied
[23:52:55] <joannac> you can modify it in application code, or you can use the aggregation framework (which would be overkill)
[23:54:02] <fruitNvegetables> joannac: are you saying there is no way to query the data as such?
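For the record, the three shapes discussed above, against a made-up "foods" collection:

    // distinct: bare values, e.g. ["Fruit", "Vegetables", "Beans"]
    db.foods.distinct("foodCategory")

    // projection: one result per stored document, foodCategory only
    db.foods.find({}, { foodCategory: 1, _id: 0 })

    // aggregation (the "overkill" option): deduplicated documents
    db.foods.aggregate([
        { $group: { _id: "$foodCategory" } },
        { $project: { foodCategory: "$_id", _id: 0 } }
    ])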