PMXBOT Log file Viewer


#mongodb logs for Saturday the 13th of July, 2013

[02:29:05] <caitp> I'd like to start and stop a dummy database and populate it with dummy data during execution of unit tests, is there any facility for doing something like this?
[02:29:26] <caitp> eg, even if the production database is currently running
[02:45:00] <crudson> caitp: just run mongod --port <some other port>
[02:45:30] <caitp> and I can prevent it from persisting once the daemon shuts down?
[02:47:11] <crudson> set --dbpath and rm -rf <dbpath> after? or --directoryperdb and delete just the test db directory
[02:47:37] <crudson> or db.drop_database through driver after tests run
[02:49:09] <caitp> alright, sounds good
[02:49:14] <caitp> I know it's a very RTFM question :)
[02:49:36] <caitp> so thank you for answering, right on
[02:50:08] <crudson> there are a number of ways - it's really up to you
[02:50:41] <caitp> well my preferred approach would be something that I can easily set up entirely within node.js vows or mocha or something
[02:52:15] <crudson> just have a test db that gets dropped at the end of the tests
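The throwaway test database crudson describes might look like the following shell sketch (the port, paths, and the --fork/--logpath options are illustrative choices, not from the conversation; --fork and --shutdown are Unix-only):

```shell
# Start a disposable mongod on a non-default port with its own dbpath
mkdir -p /tmp/mongo-test
mongod --port 27018 --dbpath /tmp/mongo-test \
       --fork --logpath /tmp/mongo-test/mongod.log

# ... run the test suite against localhost:27018 ...

# Tear down: stop the daemon, then delete all of its data
mongod --shutdown --dbpath /tmp/mongo-test
rm -rf /tmp/mongo-test
```

Nothing touches the production instance on the default port, and deleting the dbpath removes every trace of the test data.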
[03:09:43] <anuvrat> is there something I can do to make mongodb use more processing power and be a little faster?
[03:47:09] <crudson> anuvrat: assuming sensible queries and indexes, have enough ram for indexes and an SSD. It's a very general question though. "use more processing power" doesn't make a lot of sense unless your cpu is being taxed, in which case you can lower the "nice" priority of mongo.
[03:51:35] <caitp> it sounds like they are talking more about requesting more time from the kernel scheduler
[03:52:35] <caitp> maybe nice/renice mongod in linux
[03:56:36] <crudson> yeah, see the last bit of my comment
[03:57:56] <caitp> yeah, didn't notice :) but it sounds like they feel they are not getting enough processing time for some reason :p
[03:58:11] <caitp> versus too much
[04:10:03] <anuvrat> crudson, lower priority? ... I want it to hog the system if it has to but be a little faster, why would I lower the priority?
[04:10:36] <crudson> lower nice != lower priority. I think in linux terms. Adjust for your OS.
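For the record, the renice being discussed might look like this on Linux (the -5 value is an arbitrary example; negative nice values require root, and as crudson notes this only matters when the CPU is actually contended):

```shell
# Raise mongod's CPU scheduling priority (a lower nice value means
# higher priority). Assumes exactly one mongod process is running.
sudo renice -n -5 -p "$(pgrep -x mongod)"
```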
[04:11:47] <caitp> it might not actually improve speed much, or worse it might have the opposite effect in some cases
[04:12:28] <anuvrat> okay crudson caitp I will lookup how to do that ... thanks
[04:13:23] <anuvrat> sometimes it so happens that I need to restart the db to perform properly ...
[04:13:38] <anuvrat> are there any common reasons why it might feel to be stuck?
[04:13:46] <crudson> anuvrat: where is your performance lacking? reads, writes, mapreduce? process priority is unlikely to be an issue, as implied in my first bit
[04:13:49] <crudson> are you 32bit?
[04:14:05] <anuvrat> PS I am using mongoengine .. and querying via python scripts
[04:14:09] <anuvrat> crudson, nopes 64
[04:14:10] <caitp> I've heard horror stories about mongodbs randomly failing to replicate or even respond after a period of time
[04:14:30] <anuvrat> caitp, that indeed is scary ...
[04:14:49] <caitp> it keeps sysadmins employed :) or helps get them fired, take your pick :D
[04:15:47] <anuvrat> so take this incident that has just happened ... I ran a script which seemed to be stuck ... tried killing and running the script multiple times to no avail ... . restarted mongodb and ran the script again, it processed > 10k records in less than 4 minutes
[04:16:16] <anuvrat> caitp, there are no sysadmins ... its a startup ... I am the developer, I am the sysadmin ... :(
[04:16:30] <caitp> 18 hour work days? :D
[04:16:40] <anuvrat> caitp, sometimes more
[04:16:49] <caitp> yeah, I miss it
[04:17:02] <anuvrat> caitp, coming to office on monday ... leaving on thursday
[04:17:05] <anuvrat> evening
[04:17:39] <caitp> well I hope you're bringing a few changes of clothes and some soap/sponges :s
[04:19:58] <crudson> where does it get stuck?
[04:20:21] <anuvrat> caitp, :P
[04:20:33] <anuvrat> crudson, well I haven't tried dissecting the query
[04:21:07] <crudson> Read about the tools available to you: http://docs.mongodb.org/manual/administration/monitoring/
[04:21:32] <anuvrat> caitp, but I am taking a two week break ... which starts right after I get this current task done ...
[04:21:38] <anuvrat> crudson, thanks
[04:21:44] <crudson> "give it more CPU time" is probably not the question you are looking for
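The monitoring page crudson links covers, among other things, two tools bundled with MongoDB that would be a sensible first step for the "stuck script" symptom (ports shown are the defaults):

```shell
# One line per second of server-wide counters: inserts, queries,
# page faults, lock percentage -- a quick view of what mongod is doing
mongostat --port 27017

# Per-collection read/write time, sampled every 5 seconds -- shows
# which namespace a slow script is actually spending its time in
mongotop --port 27017 5
```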
[05:06:43] <mrapple> i have a pretty massive collection, 100 mil documents or so, and most of the queries on that collection are timestamp $gte
[05:06:55] <mrapple> if i'm picking a shard key, should it be a hashed key of the timestamp field?
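For reference, the two shard-key shapes mrapple is weighing might look like this in the mongo shell (database, collection, and field names are illustrative, and the two calls are alternatives, not a sequence). The trade-off: a hashed key spreads writes evenly but forces a timestamp $gte range query to scatter-gather across every shard, while a range-based key keeps such queries targeted at the cost of hot chunks on a monotonically increasing timestamp:

```javascript
// Hashed shard key (2.4+): even write distribution, but range
// queries on timestamp must be broadcast to all shards
sh.shardCollection("mydb.events", { timestamp: "hashed" })

// Range-based compound key: {timestamp: {$gte: ...}} queries stay
// targeted, but a monotonic timestamp alone makes one shard "hot",
// so a distributing prefix field is usually put first
sh.shardCollection("mydb.events", { source: 1, timestamp: 1 })
```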
[05:41:03] <jgiorgi> has something changed recently (ie last year or so) with how mongodb handles memory? i see significant improvements in memory usage
[07:52:39] <crodas> where is the right place to ask about the nodejs driver?
[10:52:00] <n3m8tz> Hi
[14:36:41] <CCD__> exit
[15:23:02] <ron_frow_> I understand mongodb has nothing in the sense of gis, I dont need anything nuts, but it would be nice to say geocode an address and then be able to say does this exist in this county
[15:23:06] <ron_frow_> or something along those lines
[15:23:57] <ron_frow_> I guess I could look up a city in address and look that up for its county
[15:29:48] <Derick> ron_frow_: mongodb has geospatial support
[15:30:51] <Derick> it definitely can do that if you have stored the polygons for counties in MongoDB: http://maps.derickrethans.nl/?l=timezone&lat=40.20&lon=-81.24&zoom=6
[15:31:00] <ron_frow_> whoa whoa whoa
[15:31:09] <Derick> (looks up a point to match with timezone areas)
[15:31:12] <ron_frow_> last I saw you couldnt store polygons in it and do a query like that
[15:31:19] <Derick> new in 2.4 :-)
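The point-in-polygon lookup behind Derick's demo might be sketched like this in the 2.4-era mongo shell (collection and field names are illustrative; note GeoJSON puts longitude before latitude):

```javascript
// 2dsphere index over the stored county polygons
db.counties.ensureIndex({ geometry: "2dsphere" })

// Which stored polygon contains this point?
db.counties.findOne({
  geometry: {
    $geoIntersects: {
      $geometry: { type: "Point", coordinates: [-81.24, 40.20] }
    }
  }
})
```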
[15:31:25] <ron_frow_> fuckign NICE!
[15:32:19] <ron_frow_> got an reference links in particular?
[15:32:29] <ron_frow_> btw, is that open street maps?
[15:32:48] <Derick> yes, OSM
[15:32:49] <Derick> http://drck.me/whattime-a8d
[15:33:06] <Derick> (the shapes don't come from there (yet) though)
[15:33:22] <ron_frow_> well in my gis experience that was pretty typical
[15:33:30] <ron_frow_> I mean for roads and stuff that makes sense
[15:35:34] <ron_frow_> holy crapper mongo has come a long way in last few vers
[15:38:51] <ron_frow_> I need info on loading gis data
[15:39:04] <ron_frow_> its easy enough to get county boundary information in any number of formats
[15:39:11] <ron_frow_> I just see they recommend geojson
[15:39:34] <Derick> yeah, in my case I used OSM generated from a shapefile
[15:39:45] <Derick> but you just need to get to geojson really
[15:40:06] <Derick> 2.4.4/2.4.5 don't support MultiPolygon and MultiPoint yet, but that should come in 2.4.6 (or perhaps just 2.6)
[15:40:11] <Derick> it's in the nightlies at least
[15:40:41] <ron_frow_> what would multipolygon do... be able to define a region that contains multiple shapes to say define a county
[15:40:52] <ron_frow_> (a state with off shore islands or whatever)
[15:42:07] <Derick> yes
[15:42:21] <Derick> or a school with multiple buildings
[15:42:33] <Nodex> or a foo with multiple bar's :P
[15:42:38] <Nodex> bars *
[15:42:41] <ron_frow_> I get it, I get it
[15:42:59] <Nodex> Derick : does Paz McDonald still work in your office?
[15:43:07] <Derick> i think there is also GeometryCollection which is just an array of other geojson types
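To make the shapes under discussion concrete: a GeoJSON MultiPolygon nests as polygons, then rings, then [lon, lat] positions, with each ring explicitly closed. A minimal sketch with made-up coordinates:

```python
import json

# A MultiPolygon groups several polygons (each a list of rings, each
# ring a closed list of [lon, lat] pairs) into one logical region --
# e.g. a county plus its offshore islands. Coordinates are invented.
county = {
    "type": "MultiPolygon",
    "coordinates": [
        # mainland: one outer ring, no holes
        [[[-81.5, 40.0], [-81.0, 40.0], [-81.0, 40.5],
          [-81.5, 40.5], [-81.5, 40.0]]],
        # offshore island
        [[[-80.9, 40.1], [-80.8, 40.1], [-80.8, 40.2],
          [-80.9, 40.2], [-80.9, 40.1]]],
    ],
}

# GeoJSON requires every ring to repeat its first position at the end
for polygon in county["coordinates"]:
    for ring in polygon:
        assert ring[0] == ring[-1], "ring must be closed"

print(county["type"])
```

A GeometryCollection, by contrast, is just `{"type": "GeometryCollection", "geometries": [...]}` holding a list of arbitrary GeoJSON geometries.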
[15:43:17] <Derick> Nodex: she moved to Amsterdam, so no longer in the London office
[15:43:30] <Nodex> dang, would her email address still be the same one?
[15:43:33] <Derick> yes
[15:43:46] <Nodex> I mailed her last month and had no reply and normaly she is quick to reply
[15:43:55] <Derick> mail again :-)
[15:43:59] <Nodex> DId you trade houses ? :P
[15:44:03] <Derick> hehe, no
[15:44:08] <Derick> i haven't lived in .nl for 9 years
[15:44:11] <Nodex> we let you in and your gov't let her in hahah
[15:44:48] <Nodex> why you working on a Saturday and not enjoying this (once in a lifetime) UK weather
[15:45:02] <Nodex> + ?
[15:45:25] <Derick> I am sitting outside!
[15:45:33] <Derick> with a pint
[15:45:37] <Nodex> on a mobile?
[15:45:49] <Derick> no, I have a decent laptop :P
[15:45:54] <Nodex> and I thought you only drank whiskey :P
[15:46:02] <Derick> cider and whisky
[15:46:20] <Nodex> £1100 and can't use it outside :/
[15:46:47] <Derick> hehe
[15:46:54] <Derick> thinkpads++
[15:46:58] <Nodex> Cider + Whiskey = nasty hangover
[15:47:04] <Nodex> -e ?
[15:47:04] <Derick> Nodex: no "e" in whisky
[15:47:16] <Nodex> I always spell it wrong LOL
[15:47:18] <Derick> whisky is the scottish variant, whiskey all the other crap :P
[15:47:21] <ron_frow_> bitchin got my geojson for my stuff I need
[15:47:23] <Nodex> ;lus I am a vodka man
[15:47:30] <Nodex> plus *
[15:47:47] <Derick> ron_frow_: whoop!
[15:47:54] <Derick> ron_frow_: what geojson did you find?
[15:48:01] <ron_frow_> I didnt
[15:48:08] <Derick> wondering whether I should throw that in my demo too
[15:48:08] <ron_frow_> I converted my shapefiles over to geojson
[15:48:09] <Derick> oh
[15:48:13] <Derick> ah, how?
[15:48:16] <Derick> (just curious)
[15:48:22] <ron_frow_> http://converter.mygeodata.eu/vector
[15:48:29] <ron_frow_> it was as painless as it could be
[15:48:33] <ron_frow_> just upload the zip
[15:48:38] <ron_frow_> specify the projections etc
[15:48:41] <ron_frow_> hit download
[15:48:42] <Nodex> nice tool :D
[15:48:45] <Derick> oh right
[15:48:59] <Derick> I don't think it likes me uploading the 26GB OSM file though ;-)
[15:49:06] <ron_frow_> yeah
[15:49:15] <ron_frow_> the shapefiles for US counties are a bit simpler
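A command-line alternative to the web converter (my suggestion, not what ron_frow_ used): GDAL's ogr2ogr performs the same shapefile-to-GeoJSON conversion, and the output can be split into one document per line for mongoimport (file, db, and collection names are illustrative):

```shell
# Convert a shapefile to GeoJSON, reprojecting to WGS84 lon/lat
# (the coordinate system MongoDB's 2dsphere index expects)
ogr2ogr -f GeoJSON -t_srs EPSG:4326 counties.json counties.shp

# mongoimport wants one JSON document per line; jq can split the
# FeatureCollection into individual features on stdin
jq -c '.features[]' counties.json \
  | mongoimport --db gis --collection counties
```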
[15:50:35] <ron_frow_> derick you said 2.6+ for multiplepolygon
[15:50:45] <ron_frow_> I dont even see 2.6 stuff in the nightlies
[15:50:47] <ron_frow_> just 2.4
[15:50:51] <Derick> nightlies are 2.5
[15:50:54] <Derick> which is the dev for 2.6
[15:51:08] <Derick> so it should be in the 2.5 nightlies
[15:51:25] <ron_frow_> hrmm ok
[15:52:14] <ron_frow_> anyone know how capable the text indexing is?
[15:52:23] <ron_frow_> is it on par with say lucene or something?
[15:53:06] <Derick> no, it's not on par
[15:53:11] <ron_frow_> better?
[15:53:14] <Derick> but should be good for 80% of the cases
[15:53:17] <ron_frow_> ok
[15:53:22] <ron_frow_> well Id ont mind using lucene
[15:53:23] <Derick> no, solr/elastic search are a lot more powerful
[15:53:32] <Derick> (but then again, not everybody needs that power)
[15:53:32] <ron_frow_> elastic search?
[15:53:49] <Derick> it's like solr on seroids
[15:53:53] <Derick> steroids
[15:54:32] <ron_frow_> crappy java
[15:54:33] <ron_frow_> bah
[15:54:38] <ron_frow_> I'll stick to lucene =)
[15:54:47] <Derick> uh
[15:54:59] <Derick> it actually works really well
[15:55:06] <ron_frow_> I dont doubt it
[15:55:09] <Derick> for once it's a java app that doesn't suck :P
[15:55:14] <ron_frow_> I use java on a regular basis
[15:55:15] <Derick> it's a lot better than just lucene
[15:56:03] <ron_frow_> hmm
[15:56:08] <ron_frow_> solr does faceted search
[15:56:11] <ron_frow_> does this do that as well
[15:56:17] <ron_frow_> or just plain ol full text indexing
[15:56:18] <Derick> elastic search?
[15:56:20] <Derick> or mongo?
[15:56:24] <ron_frow_> elastic
[15:56:26] <Derick> yes
[15:56:40] <ron_frow_> oh it sits on top of lucene as well
[15:56:42] <Derick> solr is sort of "old" now, compared to Elastic Search
[15:56:43] <Derick> yup
[15:56:54] <ron_frow_> well shit
[15:56:56] <ron> they're both under active development.
[15:56:56] <ron_frow_> one more thing to learn
[15:57:00] <Nodex> Derick : are ObjectId's indexed all the time without specifying you wish to create one or just "_id" ?
[15:57:28] <ron> in a way, solr and lucene are the same project now. their releases are together.
[15:57:35] <ron_frow_> I'll give it the benefit of the doubt
[15:58:02] <Nodex> + imho, now solr has better JSON mapping - Elastic search has lost a lot of ground, 1-1 mappings are almost 98% efficient between solr and mongo now
[15:58:08] <Derick> Nodex: just _id
[15:58:13] <Nodex> ok ta
[15:58:27] <Derick> Nodex: oh, good to know that about solr/es
[15:58:40] <Nodex> + solr is A lot easier to setup and use
[15:59:12] <Nodex> + a lot more active / committers etc, just my POV from a year or 2 working with SOLR
[15:59:40] <Nodex> you can now fire a JSON doc at solr and it will (as long as you send a json header) parse it
[16:00:07] <Derick> i need to look at that again then
[16:00:16] <Nodex> it all changed since 4.2 iirc
[16:00:37] <Nodex> it used to require the setup of a "json" handler, now it just negotiates on the header
[16:32:13] <ron_frow_> elasticsearch is cool
[16:32:22] <ron_frow_> but almost looks like a replacement for mongo
[16:32:30] <Derick> it's for searching
[16:32:43] <Derick> made for text, it doesn't have all the fancy update operators really
[16:34:40] <ron_frow_> so shall I just pass a subset of my mongo docs over and refer back to mongodb data via id?
[16:36:10] <Derick> i think that's what most people do
[16:36:19] <ron> what we did in the previous workplace is store very common fields in solr, but once we needed to pull whole documents, we'd get the id from solr and query mongo. it's a bit of a trial and error to find the proper balance for your application, and it can change with time depending on change in your use cases.
[16:36:29] <ron> there's no single solution to fit all.
[16:36:48] <ron_frow_> yeah well my spatial shit is going to have to fit into it... which is going to make things interesting
[16:36:48] <Derick> ron: why didn't you opt for storing everything in solr too?
[16:36:58] <ron> the combination of mongo with a FTS engine can be _very_ powerful.
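The "index a subset, join back by id" pattern being described can be sketched store-agnostically; here plain dicts stand in for MongoDB and the search engine, and all names and data are invented for illustration:

```python
# Plain dicts standing in for the two stores: "mongo" holds the full
# documents, the search index holds only searchable fields plus an id.
mongo = {
    "a1": {"_id": "a1", "title": "GeoJSON primer", "body": "long text ..."},
    "a2": {"_id": "a2", "title": "Shard keys", "body": "long text ..."},
}
search_index = [
    {"id": "a1", "title": "GeoJSON primer"},
    {"id": "a2", "title": "Shard keys"},
]

def search(term):
    # The search engine answers with matching ids; the full documents
    # are then fetched from the primary store in a second step.
    hits = [d["id"] for d in search_index
            if term.lower() in d["title"].lower()]
    return [mongo[h] for h in hits]

print(search("shard")[0]["title"])  # -> Shard keys
```

The balance ron mentions is exactly which fields get copied into `search_index`: more fields means fewer round trips to the primary store but a heavier index to keep in sync.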
[16:38:10] <ron_frow_> hmm I dont see the whole facet stuff in it
[16:38:16] <ron_frow_> I mean I see you can search fields etc
[16:38:35] <ron> Derick: that's a good question. When we started using Solr, it was in its version 3.x where it wasn't marketed as a full database (since 4.x they like calling it a nosql db as well). Obviously, the more you store, the 'heavier' it's going to be and you want the index to stay lean and fast. there's also a matter of keeping the schema up to date.
[16:38:43] <ron_frow_> I see they have a support channel... I'll move that direction
[16:38:55] <ron> in fact, in some cases, we stored data differently than we did in mongo, for internal uses.
[16:39:24] <ron> ron_frow_: keep in mind there's a difference between storing fields and indexing fields.
[16:39:32] <ron_frow_> yeah
[16:39:40] <ron_frow_> I am familliar with concepts in lucene
[16:39:59] <ron_frow_> what I am getting at is with solr you can build shit like amazons little product filter stuff
[16:40:02] <ron_frow_> really easy
[16:40:15] <ron_frow_> I think you can even fetch the list of facets to search on out of solr
[16:40:20] <ron> right. forgot. solr is basically a wrapper for lucene that you can run standalone instead of embedded in your application. plus, it has clustering capabilities.
[16:40:34] <ron_frow_> eg, product price between 100-200
[16:40:45] <ron_frow_> I guess I could build that in ui really rather easily
[16:41:08] <ron> yeah, though that's not necessarily related to faceting.
[16:41:21] <ron_frow_> well they call it "faceted search"
[16:41:22] <ron_frow_> haha
[16:43:05] <ron_frow_> it basically almost does an aggregated group so you can see how many items are by this mfg, how many products are in this price range etc
[16:43:43] <ron> yup
[16:44:06] <ron> we used it to do things like you see in LinkedIn's search, where you see on the side different groupings and amounts.
[16:44:58] <ron_frow_> I guess I am just trying to see if elastic search supports this
[16:45:19] <ron> they are very similar, from what I know.
[16:45:20] <ron_frow_> I mean dont get me wrong, the hassle saved by having the already distributed
[16:45:46] <ron_frow_> is going to be huge... last time I looked at lucene that was kinda one of those... you figure it out kinda things
[16:45:53] <ron_frow_> I appreciate all the input ron / Derick
[16:46:53] <ron> sure thing. I really think it's a good combination. mongodb is awesome and has very strong capabilities, but a FTS pretty much completes it.
[16:47:13] <ron> one of the possible problems though is keeping things in sync between the two.
[16:47:55] <ron_frow_> well I can handle that in business logic layers
[16:48:01] <ron_frow_> not that big of a deal for waht I want to build
[16:49:28] <ron> right, but you need to keep in mind that if you don't manage transactions in any way, you could end up updating the database and not solr or the other way around. edge cases.
[16:50:49] <DestinyAwaits> Hello ron
[16:50:58] <ron> umm, hello?
[16:51:25] <DestinyAwaits> ron I know we don't know each other yet
[16:51:31] <DestinyAwaits> but I need some help
[16:51:50] <ron_frow_> haha
[16:51:55] <ron_frow_> ron you made the mistake of talking
[16:52:02] <ron> Derick: I admit that in some cases, I was thinking of indexing everything in solr and use a datastore (even a k/v one) just to store the json object. but like I said, it depends on the use cases and so on.
[16:52:12] <ron> DestinyAwaits: well, address the channel, not a person.
[16:52:21] <Derick> ron: hmh
[16:52:23] <ron> basic irc etiquette.
[16:52:47] <DestinyAwaits> ron: it's something personal anyways thanks
[16:52:58] <ron> doubtful.
[16:52:58] <ron_frow_> herpes
[16:53:33] <DestinyAwaits> ron: Well I have a project to submit and I just wanted someone to review it before I submit
[16:53:43] <DestinyAwaits> there I badly need help
[16:54:03] <ron> how is that personal?
[16:54:17] <ron> as long as you don't pay me, it's not personal. you can ask anyone ;)
[16:54:38] <ron_frow_> people are strange on irc.
[16:54:58] <DestinyAwaits> hmm.. well it's personal that I can't discuss on the channel as it's getting logged
[16:55:57] <DestinyAwaits> ron_frow_: maybe you are right well I have to submit something which can land me a new job so it's personal and I can't openly discuss.. I hope you understand.. :)
[16:56:47] <ron_frow_> just seems a bit odd someone would pop on irc, pick a random person and ask them to review a project
[16:56:55] <Derick> a bit?
[16:57:09] <DestinyAwaits> ok my bad
[16:57:13] <DestinyAwaits> thanks anyways
[16:59:57] <lungaro> lol
[23:28:15] <euskode> Hey there, I am using the native MongoDB driver (version 1.3.11) against two different MongoDB databases, one running 2.4.5-rc0 and the other one on 2.4.5. I am seeing different behavior when running a simply query (col.find('_id':{'$in': arrayOfObjectIDs})) against each of the databases and toArray-ing on the cursor that returns; basically, 2.4.5-rc0 behaves as expected, whereas 2.4.5 does not. I want to make sure that this is inde
[23:28:48] <euskode> ing, do you guys know if anything related to find-ing or interacting with cursors has changed significantly enough between 2.4.5-rc0 and 2.4.5 to break this?
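(As an aside, the query as pasted is missing its outer braces; with the node.js native driver of that era it would presumably read something like the following, with `col` and `arrayOfObjectIDs` as in the original message:)

```javascript
// Corrected form of the pasted query: find() takes a query document,
// and toArray() delivers the cursor's results to a callback
col.find({ _id: { $in: arrayOfObjectIDs } }).toArray(function (err, docs) {
  if (err) throw err;
  console.log(docs.length);
});
```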