#mongodb logs for Friday the 20th of July, 2012

[00:56:04] <hadees> I'm writing a Facebook app where I want to cache the user's feed. I was thinking about doing this in the User model. Have you heard of anyone doing this? or does it really make more sense to create an associated document.
[01:27:04] <sinisa> anyone tried ttl in 2.2 ?
[03:31:31] <roycehaynes> I have a question.
[03:33:20] <roycehaynes> I have a user class that subclass a mongoengine.document
[03:34:29] <roycehaynes> The document is a User document. I have several types of users that i've created as separate documents, but i want to specify them as an attribute in the User Document.
[03:34:44] <roycehaynes> How would I go about doing that?
[03:44:24] <cocacola> hi guys does anyone here use the "monk" driver for node.js + mongodb?
[05:40:25] <trupheenix> i'm using pymongo and i'm trying to do the following
[05:40:26] <trupheenix> >>> from pymongo import objectid
[05:40:26] <trupheenix> >>> x = objectid.ObjectId("a407dae09ef4b786b889e8fe761887b8d510959f7649802aa8c22e00")
[05:40:38] <trupheenix> i get an error that the above string is not an object id
[05:40:44] <trupheenix> how do i make my own objectid them?
[05:40:48] <trupheenix> ^then?
[06:08:58] <wereHamster> trupheenix: what do you mean by your own objectid?
[06:09:29] <wereHamster> a mongodb objectid is a 12 byte binary value, with a specific format.
[06:09:42] <wereHamster> http://www.mongodb.org/display/DOCS/Object+IDs
[06:10:37] <wereHamster> if you want to use your own IDs, then use a string, or UUIDs, or a binary value, or an Int, or whatever else suitable in your situation
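For reference, a minimal pymongo sketch of the two options wereHamster describes: the string above is 56 hex characters, while an ObjectId is 12 bytes and its hex form is exactly 24 characters; a custom _id can be any value you choose. Database/collection names here are placeholders, and older pymongo releases exposed ObjectId via pymongo.objectid rather than bson.

    from bson.objectid import ObjectId
    from pymongo import MongoClient

    oid = ObjectId()                              # freshly generated 12-byte id
    oid2 = ObjectId("4f8a3c2b1d41c81a3c000001")   # parsing needs exactly 24 hex characters

    coll = MongoClient()["test"]["things"]        # hypothetical localhost db/collection
    coll.insert_one({"_id": "my-custom-key-001", "payload": 1})   # or just use your own _id value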
[06:34:05] <kaikaikai> when using addToSet and updating/inserting a few values, is there a way to choose which value is being checked for existence when appending? ex: keyword + createdBy
[06:34:42] <kaikaikai> if my insert is the keyword with createdBy value, it will always consider the insert as if it does not already exist, when really i just want it to check the keyword
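No answer was given in the log; the usual workaround (stated here as a general pattern, with placeholder names) is that $addToSet compares the whole array element, so to deduplicate on just the keyword you guard a $push with a $ne condition on that one field.

    from pymongo import MongoClient

    coll = MongoClient()["test"]["posts"]          # hypothetical names
    post_id, keyword, user = "post-1", "mongodb", "kaikaikai"

    # $addToSet would compare the whole {keyword, createdBy} element; guarding a
    # $push with $ne on just the keyword dedupes on that one field instead
    coll.update_one(
        {"_id": post_id, "keywords.keyword": {"$ne": keyword}},
        {"$push": {"keywords": {"keyword": keyword, "createdBy": user}}},
    )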
[06:59:23] <alexisabril> Morning. I'm writing a simple rest api(ruby), but I'm curious on the right way to return a json formatted list of documents with corresponding id's. e.g.: [{_id: "somekey", foo: "bar"}…]
[06:59:54] <alexisabril> I'm using the following code atm: coll.find().to_a().to_json, however this returns the bson formatted oid
[07:00:22] <alexisabril> eg: {_id: {objectid: "somekey"}, foo: "bar"}
[07:01:15] <wereHamster> then stringify the _id field...
[07:01:43] <alexisabril> Right, I was curious if there was something more elegant than looping through each item in the array to reformat/build up a custom response
[07:02:16] <crudson> alexisabril: exclude the _id from the query with {:fields => {:_id=>false}} if you don't want it
[07:02:31] <wereHamster> a 'presenter' pattern sounds like a good idea in any case.
[07:03:29] <alexisabril> @crudson I'd like the id to be present, but it's a bit more intuitive on the client side not to have to worry about typing _id.objectid
[07:04:00] <wereHamster> and _id is quite mongodb specific, you may want to rename the field to just 'id'.
[07:05:52] <crudson> alexisabril: ah ok. so you just want a string? docs.each { |d| d['_id'] = d['_id'].to_s } perhaps
[07:08:10] <crudson> as far as munging the _id to a string in the native mongo query, I don't think so, without passing it through .aggregate()
[07:08:52] <alexisabril> hmmm, gotcha
[07:09:14] <crudson> with a $project
[07:11:28] <crudson> if you're passing a reasonable result size in an api response, doing a .each in ruby shouldn't be terrible
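The same idea as the Ruby one-liner above, sketched in Python for illustration (placeholder names; assumes the remaining fields are already JSON-serializable).

    import json
    from pymongo import MongoClient

    coll = MongoClient()["test"]["widgets"]        # hypothetical names

    # one pass to turn each ObjectId into its 24-char hex string before serializing
    docs = [dict(d, _id=str(d["_id"])) for d in coll.find()]
    body = json.dumps(docs)                        # e.g. [{"_id": "50e...", "foo": "bar"}, ...]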
[07:31:42] <[AD]Turbo> hello
[08:51:15] <magegu> hi guys, is there any way to read all database names through a connection via the c-driver?
[08:55:58] <jwilliams> if i want to partitionly read data from mongodb, is there any efficient way other than skip?
[08:57:59] <NodeX> partitionly?
[08:59:46] <jwilliams> i have several processes that will read the collection. for example, there are 3 processes which will read 10 docs.
[09:00:10] <jwilliams> so first process read 1-3, 2nd 4-6, 3rd 7-10
[09:00:31] <jwilliams> with skip and limit, the performance will be badly affected
[09:00:34] <jwilliams> or slow
[09:01:00] <jwilliams> but i do not have any idea how to do it efficiently without skip and limit.
[09:01:15] <jwilliams> processes need to read the whole collections.
[09:01:23] <jwilliams> *collection*
[09:01:24] <NodeX> I dont understand why you would want to do that
[09:03:05] <jwilliams> just want to learn if there is a better way to do it.
[09:05:13] <NodeX> I understand but I am not sure why 3 clients are reading 3 docs each
[09:05:30] <NodeX> either way there is no other option than to skip and limit
[09:08:02] <jwilliams> thanks. btw, is there any way i can estimate probably how long the skip may take ?
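A common alternative to skip/limit for splitting a full scan across processes (a general pattern, not something suggested in the log; names and boundary ids are placeholders): give each worker a disjoint _id range so every read is an indexed range query rather than an O(n) skip.

    from bson.objectid import ObjectId
    from pymongo import ASCENDING, MongoClient

    coll = MongoClient()["test"]["events"]         # hypothetical names

    def scan_range(lower_id, upper_id):
        """Stream documents with lower_id <= _id < upper_id. Every batch is an
        indexed range query on _id, so no worker ever pays the cost of skip()."""
        return coll.find({"_id": {"$gte": lower_id, "$lt": upper_id}}).sort("_id", ASCENDING)

    # each process is handed a disjoint slice of the _id space; boundaries are
    # picked up front (e.g. from the collection's min/max _id), placeholders here
    lo, hi = ObjectId("4f8a3c2b0000000000000000"), ObjectId("4f8a3c2bffffffffffffffff")
    for doc in scan_range(lo, hi):
        pass  # each worker handles only its own slice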
[09:42:12] <McNulty_> In the PHP driver, what's the best way of handling replica sets? It seems like if I list 3 nodes and the first two are down, it takes timeout * 2 ms to connect as it waits for the first couple to fail.
[09:47:11] <Derick> yeah, set a lower timeout
[09:47:17] <Derick> there is not much we can do here
[09:47:25] <McNulty_> Damn
[09:47:46] <McNulty_> We're thinking about somehow maintaining the server list in our application, but that seems like the wrong approach
[09:47:50] <Derick> then again, the first two nodes shouldn't really be down ;-)
[09:48:12] <McNulty_> Well I know that but isn't the point of the replica set that the set lives on...
[09:48:23] <Derick> hm?
[09:48:44] <McNulty_> I mean it's designed for resilience right
[09:49:13] <McNulty_> I suppose I need to have a short list of servers I'm confident will be up?
[09:49:15] <Derick> yes, and it will still work. Once the driver has found it, it will not use a down node of course
[09:49:38] <Derick> McNulty_: yes, that's best
[09:51:03] <McNulty_> OK. I think I was expecting it to be better to have more nodes.
[09:51:27] <McNulty_> Thanks for the advice!
[09:51:34] <Derick> yes, it is good to have more nodes, but for replicasets you generally expect all those nodes to be up
[09:51:54] <McNulty_> Heh, do you?
[09:52:41] <Derick> yes, nodes in a replicaset should really be up
[09:53:07] <McNulty_> But one of the reasons (not the only one) to run a replica set is automatic failover right
[09:53:21] <Derick> yes
[09:53:29] <McNulty_> I mean I know our sysadmin is going to try and bring the node back ;-)
[09:53:40] <NinjaPenguin> surely the entire POINT of a replica set is that you are anticipating the scenario when one (or multiple) nodes are 'down'
[09:53:51] <Derick> in the meanwhile, the driver will connect to the failed node once a minute
[09:53:59] <Derick> (or that is the theory at least)
[09:54:02] <NinjaPenguin> and the replicaSet provides resiliency to this scenario
[09:54:13] <Derick> right
[09:55:13] <McNulty_> but in the failed over mode I'm going to lose $timeout per connection
[09:55:47] <Derick> yes, but a connection is reused, so not really a problem as it will only try to reconnect to that failed host once every 30-60 seconds
[09:56:07] <McNulty_> Can you explain that last bit to me?
[09:56:12] <McNulty_> the 30-60 seconds mechanism
[09:56:34] <Derick> oh, the driver just checks every 60 seconds whether a failed connection starts working again
[09:56:49] <McNulty_> so the driver is keeping an internal list of failed nodes and skipping them?
[09:56:53] <Derick> yup
[09:57:06] <McNulty_> oh. I'm not sure I saw that happening... let me recheck
[09:57:16] <Derick> the current code might not, the new one will
[09:58:16] <McNulty_> AAAH
[09:58:31] <McNulty_> So when's the new one? ;-)
[09:58:50] <Derick> working on it
[09:58:52] <Derick> soon™
[09:58:58] <McNulty_> so in a typical web app, I set the timeout to 2s and once a minute, one request takes 2s longer than it should?
[09:59:01] <McNulty_> that's fine
[09:59:13] <Derick> you don't want a 2s timeout
[09:59:21] <Derick> it should be a lot lower than that
[09:59:32] <McNulty_> yeah actually the ping time is 50ms so maybe we'll go 500ms and see how we go
[09:59:57] <McNulty_> one secondary is in a different data centre to the web servers so I probably can't drop it too low
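The discussion above is about the PHP driver; purely for comparison, the analogous knobs in pymongo (a sketch with placeholder hostnames and set name) look like this: a seed list plus a short connect timeout, so a dead seed costs little while discovery finds the live members.

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://db1.example.com,db2.example.com,db3.example.com/?replicaSet=rs0",
        connectTimeoutMS=500,
    )
    coll = client["test"]["docs"]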
[10:20:06] <schlitzer|freihe> hey there
[10:30:18] <schlitzer|freihe> with a sharded setup, where will mongos put a non-sharded collection, only on the primary shard?
[10:30:25] <Derick> yes
[10:30:42] <Bofu2U> mornin
[10:30:45] <Derick> but you can change the shard on which they live as well schlitzer|freihe
[10:31:14] <schlitzer|freihe> so there is a way to put a non-sharded collection on a non-primary shard?
[10:31:33] <Derick> you can change the primary shard for a collection if that's what you're asking
[10:31:48] <schlitzer|freihe> no, i do not want to change the primary
[10:32:11] <McNulty_> the primary is per-collection so you can change it for one collection if that's what you want
[10:32:24] <schlitzer|freihe> ahhh
[10:32:31] <schlitzer|freihe> okay, thats cool
[10:32:31] <Derick> schlitzer|freihe: http://docs.mongodb.org/manual/administration/sharding-architectures/#sharded-and-non-sharded-data
[10:32:54] <Derick> moveprimary allows you to change the "primary shard" for a non sharded collection
[10:35:08] <schlitzer|freihe> thx
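For reference, and hedging slightly on the wording above: movePrimary is an admin command that acts on a database, and a database's unsharded collections live on its primary shard, so moving the primary moves them. A pymongo sketch with placeholder names:

    from pymongo import MongoClient

    client = MongoClient("mongos.example.com", 27017)   # connect to a mongos; placeholder host

    # move the primary shard of database "mydb" (and with it all of that
    # database's unsharded collections) to the shard named "shard0001"
    client.admin.command("movePrimary", "mydb", to="shard0001")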
[10:40:43] <newcomer_> hi ... need to setup mongodb with geospatial data
[10:41:53] <newcomer_> any ideas ?
[10:42:16] <Derick> plenty
[10:42:26] <Derick> you need to be a bit more specific though...
[10:43:24] <newcomer_> basically I'm new to the NoSQL world / just managed to setup mongo & inserted some random rows
[10:44:13] <newcomer_> any pointers where to start ?
[10:44:43] <newcomer_> I tried but got this err > { "errmsg" : "can't find ns", "ok" : 0 }
[10:45:04] <Derick> doing what?
[10:46:22] <newcomer_> exactly this http://stackoverflow.com/questions/10224364/calculate-distance-in-java-using-mongodb
[10:46:43] <newcomer_> the result turns out to be the err msg
[10:47:16] <Derick> newcomer_: you need to show exactly what you're doing, use a pastebin for your code
[10:47:43] <newcomer_> k
[10:51:30] <newcomer_> http://pastebin.com/S130ZwY4
[10:51:56] <Derick> myCmd.append("geoNear", "data");
[10:52:05] <Derick> is data the name of your collection that has the rows?
[10:52:58] <newcomer_> I suppose that should be my collection, but since I got an err I added my own above
[10:53:11] <Derick> yes, it should be your collection
[10:55:22] <newcomer_> any idea then what step I have skipped?
[11:01:33] <newcomer_> it even prints out the JSON cmd
[11:08:42] <newcomer_> any ideas ?
[11:13:25] <newcomer_> @Derick: any pointers
[11:14:06] <Derick> your code showed the wrong collection
[11:15:12] <newcomer_> I commented out the geoCollection line still the same err
[11:16:04] <Derick> yes, you still need to use your collection of course
[11:16:25] <Derick> myCmd.append("geoNear", "data");
[11:16:28] <Derick> is your problem
[11:16:35] <Derick> "data" should be "nameofyourcollection"
[11:22:32] <newcomer_> hey cool, I moved past that msg .. it's now "no geo index"
[11:24:31] <newcomer_> created the index .. it worked
[11:25:32] <newcomer_> @Derick: you know any tutorial I can follow for basics of geo in mongodb ?
[11:26:31] <Derick> no, sorry; especially not with Java
[11:27:25] <newcomer_> any on the mongodb console too would help me
[11:27:36] <newcomer_> step-by-step types
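Putting the fix together as a sketch (pymongo rather than the Java in the pastebin; names and coordinates are placeholders, and geoNear is the 2.x-era command used here, since removed from newer servers): the command value must be the collection name, and a geo index must exist first.

    from pymongo import GEO2D, MongoClient

    db = MongoClient()["geodb"]                    # hypothetical names
    places = db["places"]

    places.create_index([("loc", GEO2D)])          # geoNear refuses to run without a geo index
    places.insert_one({"name": "somewhere", "loc": [77.59, 12.97]})

    result = db.command("geoNear", "places", near=[77.6, 13.0], num=5)
    for hit in result["results"]:
        print(hit["dis"], hit["obj"]["name"])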
[11:35:12] <jwilliams> if i use mongo shell to find a set of doc, how can i directly point to the last doc with javascript function?
[11:35:40] <jwilliams> or docs.next() (iterating) is the only way to get the last doc?
[11:39:52] <mids> jwilliams: you could do something like x.toArray()[x.length()-1]
[11:40:29] <mids> but might be better to do a sort combined with findOne
[11:46:21] <jwilliams> mids: thanks.
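mids' sort-plus-findOne suggestion, sketched in pymongo with placeholder names:

    from pymongo import DESCENDING, MongoClient

    coll = MongoClient()["test"]["docs"]           # hypothetical names

    # ObjectId _ids sort roughly by insertion time, so sorting descending and
    # taking one document gives the "last" doc without walking the whole cursor
    last_doc = coll.find_one(sort=[("_id", DESCENDING)])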
[11:55:05] <newcomer_> thanks @Derick
[12:52:50] <JamesHarrison> So I've just taken a 2.0 single server (my development server), shut it down, replaced binaries with 2.2, and started up
[12:53:01] <JamesHarrison> (rc0)
[12:53:48] <JamesHarrison> A quick count() shows my data is still there as expected, but none of my queries are returning any data any more. Is this a common issue or have I got some kind of odd bug here?
[13:01:42] <newcomer_> @Derick: still in ?
[13:03:17] <newcomer_> http://pastebin.com/ZSAwVCxE ... finally managed it with ur help
[13:19:06] <remonvv> Hi all. Question; can anyone think of a practical way to wait for unsafe (w=0) writes to have finished without db.fsyncLock()?
[13:19:51] <remonvv> Running into issues where a distinct() doesn't pick up the w=0 writes until a little (random) time later
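No answer appears in the log; the usual approach (a general pattern, not fsyncLock) is simply to make the writes acknowledged, e.g. in pymongo with placeholder names:

    from pymongo import MongoClient

    coll = MongoClient()["test"]["items"]          # hypothetical names

    # with the default acknowledged write concern (w=1) each insert returns only
    # after the server has applied it, so a following distinct() sees the data;
    # w=0 trades away exactly that guarantee
    for i in range(1000):
        coll.insert_one({"n": i})

    values = coll.distinct("n")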
[13:34:42] <jwilliams> i read this in the mongo router log
[13:34:44] <jwilliams> SocketException handling request, closing client connection:
[13:34:56] <jwilliams> what might be the root cause?
[14:06:31] <spillere> hi, i installed mongodb, and was building one website using it
[14:06:46] <spillere> the problem is that for connecting to the db, i use no username and password
[14:06:56] <spillere> isnt this quite a big security issue?
[14:08:26] <JamesHarrison> spillere: Only if your machine is compromised or you allow unfettered network access to the machine
[14:08:38] <remonvv> No, nobody uses MongoDB auth. Just make sure your database server is only accessible by your app server
[14:09:03] <jmar777> spillere: what they said - just use network security (i.e., a firewall)
[14:09:06] <JamesHarrison> if you have a firewall that blocks the outside world (and iirc mongo listens on localhost by default only anyway) and the machine itself is not compromised (and if it is, wouldn't matter if there were a password) it's secure
[14:09:29] <spillere> yes, thats what i have to do
[14:09:46] <spillere> because i used that mongo app for mac, and i connected to my db with no auth
[14:10:13] <spillere> so, should i block some port with iptables?
[14:17:15] <JamesHarrison> you should block all the ports
[14:17:25] <JamesHarrison> and only open what you specifically need the outside world to have access to
[14:17:40] <NodeX> open all ports and pastebin all passwords
[14:17:49] <JamesHarrison> best plan
[14:17:55] <spillere> shure haha
[14:18:09] <NodeX> it will get r00ted in about 10 mins and you wont have to worry about it
[14:18:12] <spillere> JamesHarrison because I got a dedicated server
[14:18:24] <spillere> lol
[14:18:36] <spillere> which have all ports open by default i think
[14:21:23] <NodeX> it takes like 2 minutes to block all ports and open the ones you want
[14:21:38] <spillere> know
[14:21:41] <spillere> i know
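A sketch of the advice above for a 2.x-era Linux install, in config/iptables form rather than driver code; every address here is a placeholder.

    # 1) in /etc/mongodb.conf (2.x-era syntax), listen only where the app can reach:
    #      bind_ip = 127.0.0.1,10.0.0.5
    # 2) default-deny with iptables, then open only what is needed:
    iptables -P INPUT DROP
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -j ACCEPT                     # ssh
    iptables -A INPUT -p tcp -s 10.0.0.6 --dport 27017 -j ACCEPT      # app server only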
[14:52:05] <Almindor> hello
[14:53:08] <Almindor> I have user objects with this field in then "date": { "$date": 1300935091000.000000 }, how can I best query for all since 01.01.2011 ?
[14:53:12] <Almindor> *them
[14:59:12] <NodeX> date : { $gt : foo }
[14:59:20] <NodeX> where foo = epoch
[15:06:08] <Almindor> NodeX so you're telling me I have to manually convert say '01.01.2011' into a long number so I can get a date query done right?
[15:06:25] <Almindor> and is this GMT epoch?
[15:16:03] <JamesHarrison> Almindor: Unix epoch, it's a pretty standard form of datetime
[15:18:07] <NodeX> dude if you saved your data like that then yes
[15:18:19] <NodeX> check how your data is represented and then query onit
[15:18:21] <NodeX> on it *
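Since {"$date": ...} is just the JSON rendering of a BSON date (UTC milliseconds since the Unix epoch), drivers accept a native datetime for the comparison instead of a hand-converted number; a pymongo sketch with placeholder names:

    import datetime
    from pymongo import MongoClient

    users = MongoClient()["test"]["users"]         # hypothetical names

    # BSON dates are UTC milliseconds since the Unix epoch; the driver converts
    # datetime objects for you, so no manual epoch arithmetic is needed
    since = datetime.datetime(2011, 1, 1)
    recent = users.find({"date": {"$gte": since}})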
[16:52:22] <rintoul> Are there quotes missing from the JSON describing the usage of "$exists"? http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24exists
[16:52:47] <rintoul> Or are there other ways to use the JSON given to "find"?
[16:59:55] <bjartek> rintoul: no exists should be used like that to check if a field exists. do you have an example of the finding you would like to do?
[17:01:29] <rintoul> I've got it working, but I had to do something like "find( {"a":{"$exists":"true"}})" with the quotes around "a" ...
[17:01:40] <rintoul> Thanks for responding...
[17:02:03] <rintoul> I just wondered if I *had* to put quotations around the "a" and "$exists", etc.
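In the shell the quotes are optional because the query is an ordinary JavaScript object literal; in a driver such as pymongo the keys are plain strings and the value is a real boolean. A small sketch with placeholder names:

    from pymongo import MongoClient

    coll = MongoClient()["test"]["stuff"]          # hypothetical names

    # keys are plain Python strings and the operator value is a real boolean
    with_a    = coll.find({"a": {"$exists": True}})
    without_a = coll.find({"a": {"$exists": False}})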
[17:02:41] <hdm> hi folks, I have been fighting a db corruption issue all week, just noticed that mongod has an --objcheck option, which is likely the solution to random memory corruption by clients. As a sanity check, does that mean mongo implicitly trusts the incoming bson data to be valid by default?
[17:26:40] <kchodorow> hdm: yes
[17:30:29] <hdm> kchodorow: thanks - that does explain things, the thing that kills me about it is the size field (once you get one broken record, the entire db is hosed)
[17:30:52] <hdm> fwiw, this is ruby driver/bson_ext/linux64, seeing corruption happen once in a while under really high upsert loads
[17:37:13] <vsmatck> Hm. Doesn't mongo have to deserialize the bson doc to look to see if _id is present? If it's already doing that can't it see when there's a decode error?
[17:37:36] <vsmatck> The question is roughly, why isn't --objcheck on all the time?
[17:41:52] <kchodorow> hdm: that isn't really true, a single messed-up bson doc can't hose the db. other corruption can, though
[17:42:08] <kchodorow> hdm: what version are you running?
[17:42:57] <vsmatck> That puts my mind at ease.
[17:43:49] <scoates> kchodorow: do you know if there's a way to `blockdev --setra` via sysctl or similar? I had a box reboot during cloudstorm, and I see just now that the RA is the old value. I'd like to puppet this.
[17:43:53] <kchodorow> vsmatck: _id is the only thing it checks
[17:44:22] <kchodorow> scoates: there is a way to set it permanently, don't remember what it is though
[17:44:57] <scoates> thanks. I'll keep looking.
[17:45:56] <hdm> kchodorow: tried 2.0.5, 2.0.6, 2.1.x versions, two different sets of SSDs, a NAS, local DAS, and did a full RAM verify
[17:46:01] <scoates> maybe in the mount options…
[17:46:01] <vsmatck> kchodorow: Ah. That seems reasonable. I guess when you're decoding you can come across something like a nested doc and you can easily skip it instead of spending time decoding it.
[17:46:17] <hdm> 12-48 hours to load the data set and i start getting invalid bson size errors during any table scan query
[17:46:33] <hdm> driven me crazy for the last week, trying objcheck now
[17:47:18] <hdm> (it takes ~5-6 weeks of 12 cores to generate the dataset from the raw files, so im bootstrapping with the last known good backup)
[17:48:02] <hdm> the mongorestore works, no issues querying, then i start loading the last week's worth of queued data and within 20-30m records get corruption
[17:48:10] <hdm> all loads are via upsert
[17:48:35] <kchodorow> hdm: can you run db.collName.validate() on the collection before you start upserting?
[17:48:52] <kchodorow> (replace collName with the actual name of the collection)
[17:49:03] <hdm> let me give that a shot, 70% into a new mongorestore now
[17:50:03] <hdm> reading up on validate now, thanks
[17:57:09] <kchodorow> hdm: pass in true (validate(true)) if you have the time, does a more thorough job
[17:57:47] <hdm> ok, should, its back on a fast ssd array
[17:57:54] <kchodorow> great
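The same check run from a driver, for reference (placeholder names; pymongo wraps the validate command, and full=True corresponds to validate(true) in the shell):

    from pymongo import MongoClient

    db = MongoClient()["mydb"]                     # hypothetical database name

    # full=True is the slower but more thorough structural check of data files
    # and indexes; raises CollectionInvalid if problems are found
    result = db.validate_collection("mycollection", full=True)
    print(result)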
[17:58:06] <hdm> if it finds something, will --repair be able to solve it in most cases?
[17:58:24] <hdm> (and any recommendation on the version to use? pegged at 2.0.6 just in case it was a dev branch issue)
[17:58:34] <hdm> would prefer 2.1.x for aggregation support
[18:03:48] <kchodorow> if you're having issue with corruption, stay with 2.0.6
[18:08:13] <hdm> thanks - i can always export and load into 2.1.x for analysis, its just the new data load that seems to trigger it
[18:08:34] <hdm> ~15k upserts/s
[18:09:13] <hdm> sort of related, can you add a replicate set member with a different version? or is that madness
[18:09:38] <hdm> (would be nice to use 2.1.x on r/o slaves
[18:11:57] <kchodorow> mixing versions should work fine, but i wouldn't guarantee it
[19:22:18] <Jesmith17> Is there any performance difference in mongo between insert() and save() for new documents? using Java API
[19:22:26] <Jesmith17> wondering if one is better about the lock management
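No answer appears in the log; in the drivers of this era save() inspects _id, issuing a plain insert when it is absent and an upsert-style update when it is present, so for brand-new documents the two are essentially equivalent and neither does anything special about locking. The pymongo analogue, for illustration (placeholder names):

    from pymongo import MongoClient

    coll = MongoClient()["test"]["docs"]           # hypothetical names

    doc = {"name": "example"}
    coll.insert_one(doc)                           # plain insert; the driver fills in _id

    # the save()-with-_id path is morally an upsert keyed on _id
    doc["name"] = "example-updated"
    coll.replace_one({"_id": doc["_id"]}, doc, upsert=True)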
[19:29:05] <callen> so I have a MongoDB database in staging, for which I can query a GeoPointField (MongoEngine way of doing geospatial) from my local computer and it will work fine, but return nothing if the same code runs on an actual staging server
[19:29:15] <callen> has anybody here encountered this wacky behavior?
[19:29:38] <callen> also if I fall-back to pymongo, why does it not work unless I immediately follow the index creation with the query and then the processing of the cursor?
[19:29:43] <callen> does the cursor expire or something?
[19:50:34] <hdm> kchodorow: verify came back solid, --objcheck set and starting to reload new data, crossing fingers
[20:29:32] <ra21vi> hi, where should I put questions around pymongo (Python Driver for MongoDB).
[20:30:15] <jY> you can try to ask them here
[20:34:36] <ra21vi> ok. I was benchmarking MongoDB at my company to introduce it, and wrote some benchmarks (as I was asked), but they really showed poor performance. The server is on x64 Ubuntu while the benchmarking clients are being run on some Windows 7 workstations. I then wrote the same thing in Java, and there was quite a clear distinction. The benchmark in pymongo could insert 100000 docs in 90-100 secs avg,
[20:34:36] <ra21vi> max being 170 secs, while the same benchmark in Java inserted 100000 docs in 6.6 secs average. Then I realized it may be because pymongo is running as the pure Python version. Since I installed it using pip in a virtualenv, how can I find out if the compiled C extensions were installed too?
[20:36:35] <ra21vi> Do i have to install the MS VS Express to get those C extensions compiled when I install pymongo
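pymongo ships helpers for exactly this check; on Windows the C extensions need a compiler (or a prebuilt installer/wheel) at install time, otherwise pip quietly falls back to pure Python.

    import bson
    import pymongo

    print(pymongo.version)    # driver version string
    print(pymongo.has_c())    # True only if pymongo's C extensions are installed
    print(bson.has_c())       # True only if the C BSON encoder/decoder is in use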
[21:23:58] <verrochio> hello guys
[21:24:19] <verrochio> i have the following question
[21:25:03] <verrochio> I have the following schema
[21:25:57] <verrochio> Company collection that has a shops List of embedded shops
[21:26:10] <verrochio> each shop has a location
[21:28:58] <verrochio> can i do a geosearch that will result in the nearest shop?
[21:30:33] <hdm> geosearch should work on subdocuments
[21:32:35] <verrochio> i can't get it to work
[21:34:47] <Venom_X> verrochio: db.Company.find( { "shops.location" : { $near : [x,y] } } );
[21:37:20] <verrochio> 10x venom
[21:38:00] <verrochio> doesn't that return a Company object?
[21:39:20] <hdm> yup
[21:39:55] <hdm> might need to break shops out into its own document if you want the list of shops, not the list of companies that are "near"
[21:40:09] <verrochio> yes
[21:40:17] <verrochio> i was afraid you would say that
[21:40:33] <verrochio> thank you guys so much
[21:40:37] <hdm> you can use a m/r to pivot the collection into a new one that is mapped shop -> company
[21:41:11] <hdm> then a near search would return the shop and list of companies etc
[21:41:17] <hdm> (or just one depending)
[21:41:30] <verrochio> i get that , but map reduce is not a solution
[21:41:40] <hdm> without a m/r, you can probably handle that via an aggregate query, with its limitations on group
[21:41:54] <hdm> well, its a one-time m/r, regenerate them every hour or so
[21:42:10] <verrochio> i'll break the shops into their own collection
[21:42:14] <hdm> then just query that cached pivot table, probably less work than redesigning your entire db
[21:42:41] <verrochio> the whole app is based on that function now
[21:42:48] <verrochio> i will redesign
[21:43:10] <verrochio> thanks guys
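Both options from this thread as a pymongo sketch (placeholder names and coordinates): querying the embedded field returns whole Company documents, while a separate shops collection lets $near hand back individual shops.

    from pymongo import GEO2D, MongoClient

    db = MongoClient()["test"]                     # hypothetical database name

    # option 1: index the embedded field; matches come back as whole Company docs
    db.companies.create_index([("shops.location", GEO2D)])
    near_companies = db.companies.find({"shops.location": {"$near": [2.35, 48.85]}})

    # option 2: one document per shop (denormalized with a company reference),
    # so a $near query returns the nearest shops themselves
    db.shops.create_index([("location", GEO2D)])
    nearest_shop = db.shops.find_one({"location": {"$near": [2.35, 48.85]}})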
[22:09:45] <jhaddad> we've got a 500GB db in a 4 shard configuration. aws just announced their high i/o servers, which look like they might be better for us (since EBS performance was killing us). i'd like to move our setup to a replica set, but our indexes are bigger than available memory. does anyone have any tips/advice on working with mongo indexes that are bigger than ram but sitting on SSDs?
[22:10:39] <jhaddad> i guess what i'm asking is: does mongo intelligently bring portions of the index into memory, or does it just become a complete mess
[22:31:09] <linsys> jhaddad: mongodb uses mem mapped files, so as indexes (portions of indexes) are leveraged they are called into memory
[22:31:18] <vsmatck> Mongo uses a btree for indexes on _id which is typically a increasing key so it's right leaning. It relies upon the OS to figure out what pages to send to memory.
[22:32:31] <jhaddad> linsys: so the portions of the indexes that aren't part of the working set don't really matter, in that they won't be brought into memory and waste my available ram
[22:32:39] <linsys> yes
[22:32:44] <jhaddad> linsys: thank you
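Since mongod memory-maps its files and leaves paging to the OS, the practical question is how much of each index the working set actually touches; collection stats give the raw sizes to weigh against RAM. A pymongo sketch with placeholder names:

    from pymongo import MongoClient

    db = MongoClient()["mydb"]                     # hypothetical database name

    stats = db.command("collstats", "mycollection")
    print(stats["totalIndexSize"])                 # bytes across all indexes
    print(stats["indexSizes"])                     # per-index breakdown, also in bytes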