PMXBOT Log file Viewer

#mongodb logs for Sunday the 26th of October, 2014

[01:30:35] <Streemo> is the only way to sort a mongo collection to negatively project it onto the sort axis?
[01:31:25] <GothAlice> Streemo: db.collection.find().sort({field: -1})
[01:32:26] <Streemo> this
[01:32:35] <Streemo> wait o-o lemme find
[01:32:51] <Streemo> db.posts.find({}, {sort: {submitted: -1}})
[01:33:06] <Streemo> does that return a projection? just on the negative axis?
[01:33:13] <Streemo> or does it return the whole object
[01:33:14] <GothAlice> That isn't a thing.
[01:33:17] <Streemo> (all field)?
[01:33:25] <Streemo> whaaaaaa
[01:33:28] <Streemo> its not a thing o-o?
[01:33:29] <GothAlice> -1 in a projection means "don't include this field"
[01:33:41] <GothAlice> It's only a sensible choice for _id, which is included by default unless excluded.
[01:33:48] <Streemo> thought that was Zero o_O?
[01:33:49] <GothAlice> Projection does not sort.
[01:33:55] <GothAlice> Oh, you're right.
[01:33:57] <GothAlice> Zero.
[01:34:09] <Streemo> so minus one.... isnt a thing :<
[01:34:22] <GothAlice> Not in that context, AFAIK.
[01:34:32] <GothAlice> Sorting is a separate thing from projection.
[01:34:38] <Streemo> k, must be something provided by the api
[01:35:05] <GothAlice> http://docs.mongodb.org/manual/reference/method/db.collection.find/ vs. http://docs.mongodb.org/manual/reference/method/cursor.sort/#cursor.sort
[01:35:23] <Streemo> it's weird to me because they pass in an object as the second param, making me think it's some projection magic
[01:36:00] <GothAlice> Well, you pass an object as your query to find(), an optional object as the projection on find(), and an object defining sort criteria on the call to .sort()
[01:36:24] <Streemo> oh.....
[01:36:42] <Streemo> so you can potentially have collection.find(query, proj, sort)
[01:36:48] <GothAlice> No.
[01:36:52] <Streemo> instead of calling .sort() after?
[01:36:54] <GothAlice> collection.find(query, proj).sort(sort)
[01:37:05] <GothAlice> It's a separate operation on the resulting cursor.
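The split described here (query and projection go to find(), sort criteria go to a separate call on the cursor) can be sketched in Python. This is only an illustration under assumptions: the `collection` argument is assumed to behave like a PyMongo collection, and the `title` field is made up for the example; only `submitted` comes from the conversation.

```python
# Query, projection and sort are three separate documents.
query = {}                                            # match every document
projection = {"title": 1, "submitted": 1, "_id": 0}   # 1 = include, 0 = exclude
sort_spec = [("submitted", -1)]                       # -1 = descending; only meaningful when sorting


def newest_first(collection):
    """Return a cursor over the matching documents, newest first.

    `collection` is assumed to be PyMongo-like: find(filter, projection)
    returns a cursor, and the cursor has its own sort() method.
    """
    return collection.find(query, projection).sort(sort_spec)
```

Note that the projection only ever contains 0 or 1, while -1 appears only in the sort specification.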
[01:37:11] <Streemo> hmmm then i wonder why they wrote that...
[01:37:24] <GothAlice> It may depend on the client driver, of course.
[01:37:35] <Streemo> yeh
[01:37:36] <GothAlice> But shell behaviour is the "common ground" for explaining behaviour.
[01:37:39] <Streemo> node js
[01:37:46] <Streemo> yeah
[01:39:48] <Streemo> eh -.- its confusing when they use js and dont use the same mongo syntax..
[01:39:55] <GothAlice> Yes.
[01:40:35] <GothAlice> Even I drop out of my high-level abstractions and do raw queries using the standard. Some things are just easier without the indirection.
[01:40:53] <Streemo> yeah, nice to just normalize everything..
[01:44:09] <Streemo> Thanks, GothAlice.
[01:44:16] <GothAlice> No worries. :)
[01:47:44] <Streemo> I noticed youre always in here helping people.
[01:49:16] <GothAlice> I feel the purpose of life is to improve the situation of as many people as possible, to do otherwise is unworthy. Helping people is what I do.
[01:54:43] <Streemo> Don't let that go unbounded :P
[01:56:14] <Streemo> Helping people too much without helping yourself, that is.
[01:56:33] <GothAlice> Heh; my questions tend to get crickets.
[01:56:35] <GothAlice> ^_^;
[01:56:37] <Streemo> What is your job? Student? Work?
[01:57:16] <GothAlice> I do many things. Mostly software engineering and sysop.
[01:57:23] <GothAlice> (Also musician, photographer, …)
[01:58:00] <Streemo> and helper for nerds around the world, too, dont forget that
[01:58:12] <Streemo> xdddddd
[01:58:18] <GothAlice> I solve problems; the domain is unimportant to me. ;)
[01:58:51] <Streemo> the domain is everything.
[01:58:59] <Streemo> because your time is finite
[01:59:04] <Streemo> o.O
[01:59:14] <Streemo> ._.
[01:59:30] <GothAlice> I sleep on Sunday. It's when the markets are closed.
[01:59:32] <GothAlice> ¬_¬
[01:59:53] <Streemo> O_O...
[02:00:02] <Streemo> and only sunday
[02:00:13] <GothAlice> This last week? Pretty much.
[02:00:29] <Streemo> any personal projects you're working on?
[02:00:39] <Streemo> app/website/etc.?
[02:00:41] <GothAlice> https://github.com/marrow/ < all of these
[02:00:53] <GothAlice> https://github.com/amcgregor/ < and these
[02:01:19] <GothAlice> https://github.com/brave/core and https://github.com/brave/forums < were mine, but I split from the group about 10 months ago.
[02:02:28] <Streemo> so many. you seem pretty into python. any particular reason?
[02:02:49] <GothAlice> Ease of development. And there's all sorts of really nifty things you can do with it—it tends to produce elegant code.
[02:03:15] <GothAlice> for chunk in iter(partial(fileOrSocket.read, 4096), ''): do_something(chunk) # iterate any file-like object 4KB at a time
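A runnable version of that one-liner, as a minimal sketch. It assumes Python 3 and a binary stream, where end-of-file is signalled by `b""` rather than `''`; the `read_chunks` helper name and the in-memory test stream are made up for the example.

```python
import io
from functools import partial


def read_chunks(stream, size=4096):
    """Return an iterator over blocks of at most `size` bytes from a binary stream."""
    # iter(callable, sentinel) keeps calling stream.read(size) until the
    # call returns the sentinel; binary streams return b"" at end-of-file.
    return iter(partial(stream.read, size), b"")


data = io.BytesIO(b"x" * 10000)          # stand-in for a file or socket file object
sizes = [len(chunk) for chunk in read_chunks(data)]
print(sizes)  # [4096, 4096, 1808]
```

For a file opened in text mode the sentinel would be `""` instead, since text reads return an empty string at end-of-file.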
[02:03:59] <Streemo> that's easy to do in node too
[02:04:36] <GothAlice> Streemo: As an example, https://gist.github.com/amcgregor/707936 — my HTTP/1.1 server in pure Python that compiles to 171 bytecodes.
[02:04:49] <Streemo> file.on('data', function(chunk){ /* do something with chunk */ })
[02:04:50] <GothAlice> (marrow.server.http, that link is some benchmark results)
[02:04:58] <GothAlice> Streemo: What size are those chunks?
[02:07:46] <Streemo> im not sure the default, but you can change it
[02:07:57] <GothAlice> From the Zen of Python: Explicit is better than implicit.
[02:07:58] <GothAlice> ;)
[02:08:20] <Streemo> well you can specify in node too, most likely.
[02:08:43] <Streemo> what did you run the server on
[02:08:47] <Streemo> in the benchmark
[02:09:13] <GothAlice> HTTP/1.1 in 171 opcodes that can handle C10K is pretty badass, though. Rackspace 2GB RAM VM, basic, not high-IO.
[02:09:36] <GothAlice> Had to tune the kernel to hell and back to get it to do C10K on one boxen, though.
[02:10:16] <GothAlice> Notably TIME_WAIT state sockets after connection close would cause the machine to literally run out of outbound port numbers. ^_^
[02:10:45] <Streemo> i dont think django can do something like that
[02:11:32] <GothAlice> No, it really can't.
[02:11:52] <Streemo> ;/
[02:12:21] <GothAlice> https://gist.github.com/amcgregor/3119134 is a comparison between WebCore (my web framework) and Pyramid. Pyramid is substantially lighter-weight than Django. :)
[02:12:36] <GothAlice> (First-class MongoDB support, too.)
[02:14:36] <Streemo> hey i thought ads werent allowed here
[02:15:00] <GothAlice> Streemo: You asked what I do. ;) Efficient pragmatic code is one of those things. ;)
[02:16:06] <Streemo> clearly xD
[02:16:20] <Streemo> so how many people use your framework
[02:16:31] <Streemo> what are the biggest complaints you get
[02:17:14] <Streemo> im curious how one could improve something that clearly works better...
[02:17:34] <GothAlice> One of the reasons WebCore hasn't seen a commit in a while… it's pretty rock solid.
[02:17:51] <Streemo> so you're not planning on adding more modules
[02:17:57] <Streemo> like user-auth or something
[02:18:05] <GothAlice> Not really, no. I've got a work-in-progress rewrite, though.
[06:02:11] <tejas-manohar> i feel like this mongoose schema is organized but is it faster or preferable conventionally for me to remove each of those items out to top-level individual items like profile, credit, auth ?
[09:38:52] <jim87> hello! I'd like to find objects in a collection where a field is set to false or does not exist... db.mycol.find({myvar: [null, false]}) doesn't work. Any idea? Thanks :)
[09:39:23] <jim87> (db.mycol.find({myvar: null}) works though, but not for false values)
[09:41:22] <Di> $or maybe?
[09:41:34] <Di> or $in
[09:43:18] <jim87> db.users.find({banned: {$exists: false, $or: false}})?
[09:44:20] <jim87> $ne: true works, and maybe it's better (one condition)
[09:44:41] <jim87> dunno how conditionals are being programmed in mongodb, what's faster?
[09:45:53] <Di> db.test.find({v: {$in: [false, null]}})
[09:47:38] <jim87> is it faster $in or $ne, if I want to take all but the ones set to 'whatever'?
[09:49:36] <Di> or if you need exists: db.test.find({$or: [{v: false}, {v: {$exists: false}}]})
[09:52:32] <Di> you can use .explain() to resolve performance issues and check indexes
[09:54:13] <Di> if indexes are used, the kind of conditional doesn't matter much imho
[09:55:06] <jim87> thanks Di :)
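The three filters discussed above, written as plain Python dicts for comparison. This is a sketch: the helper names are made up, and the dicts are meant to be passed as the query argument of a driver's find() call (not shown here). The comments describe MongoDB's matching rules as I understand them, so checking with .explain() and a small test collection is still worthwhile.

```python
def false_or_missing_in(field):
    """$in form: matches false, explicit null, and documents lacking the field."""
    # Matching against null also matches documents where the field is absent.
    return {field: {"$in": [False, None]}}


def false_or_missing_or(field):
    """$or form: matches false or an absent field, but not an explicit null."""
    return {"$or": [{field: False}, {field: {"$exists": False}}]}


def not_true(field):
    """$ne form: matches everything except true, including unrelated values."""
    return {field: {"$ne": True}}


print(false_or_missing_in("banned"))
print(false_or_missing_or("banned"))
print(not_true("banned"))
```

The forms are not interchangeable: `$ne: true` also returns documents where the field holds some other value such as a string, which matters if the field is not strictly boolean.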
[14:33:54] <asturel> i run mongodb on 32bit, is there a way to check how much 'storage' i have left?
[18:07:46] <adsf> i've made a rather silly mistake by inserting a date format that i can't filter by :(
[18:07:57] <adsf> what process do you think i can go through to fix it up?
[19:25:11] <Streemo> im silly.
[19:25:25] <Streemo> does anyone know how to update something by adding new fields and NOT overwriting the existing key/values?
[19:25:55] <Streemo> Users.update(query, {addThisKey: "blah"}, options) I dont want to override other things
[19:27:48] <Streemo> ah would i use $set?
[19:30:24] <Streemo> yes
[19:30:25] <Streemo> i would
[19:30:28] <Streemo> thanks
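A small sketch of the difference Streemo found. The `add_fields` helper is hypothetical; it only builds the update document. The commented-out call assumes a PyMongo-style collection named `users` and a placeholder `some_id`, neither of which comes from the log.

```python
def add_fields(**fields):
    """Build an update document that sets only the given fields."""
    # Without an operator such as $set, a classic update() treats the
    # second argument as a replacement for the whole matched document.
    return {"$set": fields}


update = add_fields(addThisKey="blah")
print(update)  # {'$set': {'addThisKey': 'blah'}}

# Assumed usage with a PyMongo-style collection (hypothetical names):
# users.update_one({"_id": some_id}, update)
```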
[20:01:26] <lpghatguy> o/
[20:39:46] <LouisT> anyone know if there are any plans for MMS monitoring agent for FreeBSD?
[21:40:52] <lpghatguy> Does setting a built-in Mongoose SchemaType property like "min" work alongside an existing validator, or can only one validator exist?
[22:46:28] <Forest> Hello, i am inserting a huge load of data into MongoDB, but i forgot to create indexes. So i guess i need to run the program again, but can you tell me when is the best time to create indexes? When the DB is empty or after all the data has been inserted? Also can you tell me if inserting 20 million records in 25 mins is a good time?
[22:50:37] <dimon222_> depends on how you insert it, and what size of data you insert
[22:50:44] <dimon222_> / timing
[22:51:21] <Forest> i am creating 15 MB arrays which i batch insert
[22:51:56] <Forest> the data are 120 MB PBF format which is 3,5 GB encoded xml format data
[22:52:40] <Boomtime> there are 20 million documents each with a 15mb array? ~280 terabytes?
[22:53:26] <Forest> no,20 million is total
[22:53:49] <Boomtime> total what? documents?
[22:53:59] <Forest> This is my code https://dpaste.de/4FYv Openstreetmap data
[22:54:11] <Forest> Node,way,relations
[22:54:20] <Forest> 15974041 2413922 17423
[22:54:52] <Forest> {"id": item.id, "loc": [item.lon, item.lat], "tags": item.tags} documents like this
[22:55:14] <Forest> where tag is dictionary of key:value,can have arbitrary number of keys
[22:55:38] <Boomtime> ok, the documents are tiny then
[22:56:05] <Forest> When it got inserted,the MongoDb data take 5 GB on my disk
[22:56:17] <Boomtime> or are the number of "relations" indicative of key/value pairs in the array?
[22:57:32] <Forest> Boomtime: no, there are up to 50 tags,way and relation have members ID of nodes but that also isnt very high number
[22:58:01] <Boomtime> ok, in a mongo shell type db.stats()
[22:58:27] <Boomtime> that is better than looking at how much disk space is used, since disk space includes free/pre-allocated space as well
[23:01:00] <Forest> funny,he says there is 10 more objects
[23:01:07] <Forest> than i counted
[23:01:19] <Boomtime> then you counted wrong :-)
[23:02:04] <Forest> i counted 15974041 2413922 17423 he says 18405396
[23:02:28] <Boomtime> what do your numbers mean?
[23:02:39] <Boomtime> objects in db.stats() mean documents
[23:02:43] <Forest> number of nodes,ways,relations
[23:03:04] <Forest> even when i see through mongo management studio i get my numbers
[23:04:48] <Boomtime> I understand
[23:05:08] <Boomtime> db.stats() is a database level report, in your GUI you are looking at individual collection reports
[23:05:35] <Boomtime> use db.COLLECTION.stats()
[23:05:36] <Forest> aha so maybe those 10 records are from admin and local defaults :)
[23:05:48] <Boomtime> replace COLLECTION with each collection name in turn to see collection level stats
[23:05:55] <Boomtime> probably system.indexes
[23:06:53] <Boomtime> (I mean your extra documents are probably those from system.indexes)
[23:07:16] <Forest> ok it says https://dpaste.de/dwjO
[23:08:24] <Forest> so 4 GB files 800 Mb indices
[23:09:00] <Forest> i guess he had created default indices,cause i forgot to specify
[23:09:07] <Boomtime> yeah, that ratio is not great - often it's a sign of very small documents
[23:09:24] <Boomtime> your documents average 130 bytes
[23:09:48] <Boomtime> this tiny size means that you quickly reach the same price per document in indexes
[23:10:10] <Forest> is there something i can do about it?
[23:10:37] <Boomtime> does it matter to you? if your performance is fine then who cares right?
[23:11:15] <Forest> 25 minutes is quite long,because i will need more indices than the default ones :)
[23:11:24] <Forest> so that will prolong the time, right?
[23:11:41] <Boomtime> the import rate you got was ~12,000 documents per second
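The numbers quoted in this exchange can be checked with a few lines of arithmetic. All figures are taken from the log; the 25 minutes is Forest's own rough estimate, so the rate is approximate.

```python
nodes, ways, relations = 15_974_041, 2_413_922, 17_423

counted = nodes + ways + relations   # documents Forest counted
reported = 18_405_396                # "objects" reported by db.stats()
extra = reported - counted           # documents outside the three collections

elapsed_seconds = 25 * 60
rate = reported / elapsed_seconds    # documents per second

print(counted, extra, round(rate))  # 18405386 10 12270
```

So the database-level total exceeds the three collection counts by exactly 10 documents, and the import rate works out to roughly 12,000 documents per second.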
[23:12:41] <Boomtime> the kind of data you are storing might cause you to spend a lot on indexes anyway
[23:13:19] <Boomtime> you aren't storing much "data" at all, in fact, all your data is in the relationships (the connections) and those are specifically indexed
[23:13:54] <Forest> okay so i just add the indices for geoqueries and be happy :)
[23:14:08] <Boomtime> pretty much, it's doing what you asked of it
[23:14:27] <Forest> okay,i was afraid i did something wrong
[23:14:43] <Boomtime> nope, it's just your use case
[23:15:12] <Forest> you know i didn't filter the data cause i need all of them, it's for my bachelor thesis :) i want to create routing for user-defined criteria, e.g. 3 bars along the route :)
[23:15:27] <Forest> so i need all the tags
[23:16:17] <Boomtime> memory wise, it isn't much data, you will get the best performance if all your indexes fit into ram, with a little extra headroom for document churn
[23:17:19] <Forest> i am confused, in my code when do i need to create those indexes? after i insert all the data or before inserting the first one?
[23:17:28] <Boomtime> either way
[23:17:39] <Boomtime> you will see some slight differences in the result though
[23:18:05] <Boomtime> if you create the indexes beforehand then you can query using those indexes as you insert
[23:18:08] <Forest> hmm, so which one is the more correct way?
[23:18:37] <Boomtime> but the resulting index will use more memory - it will behave the same performance-wise (except that it consumes more memory)
[23:18:51] <Boomtime> there is no "correct" way, either way works
[23:19:12] <Boomtime> most databases are not static, they are constantly being populated, and updated
[23:19:26] <Boomtime> so creating the index "afterwards" is not an option for those databases
[23:19:40] <Boomtime> your case is special, you have static data, that is quite unusual
[23:19:55] <Boomtime> you would probably be better off creating the indexes after data insert
[23:20:16] <Boomtime> the resulting index will be more compact in memory, though it will behave the same in all other regards
[23:21:35] <Forest> okay, i just need to see the Node.js documentation for how to add an index :P
[23:23:35] <Boomtime> :D
[23:27:23] <Forest> hmm do i need to call the ensureIndex function several times if i want more than one column to be indexed?
[23:27:43] <Forest> i have read that if i pass them in one call i will create a compound index which i may not want
[23:28:13] <Forest> i will need ID to find neighbors fast and Loc to find points fast by GPS coordinates
[23:28:42] <Boomtime> _id index is present by default
[23:29:13] <Boomtime> it sounds like you only need a location index so far
[23:29:15] <Forest> but he doesn't use the id of my node, he pseudo-generates his own
[23:29:28] <Boomtime> is it stored in _id or some other field?
[23:29:54] <Forest> The field name is created ID
[23:30:08] <Boomtime> erk.. is it unique in that document?
[23:30:12] <Forest> OSM data doesnt guarantee uniqueness of ID
[23:30:18] <Boomtime> ok
[23:30:27] <Boomtime> that's a shame, oh well, you'll need an index on it
[23:31:16] <Forest> i am not sure
[23:31:20] <Forest> maybe it is unique
[23:31:53] <Forest> in case its unique i just inserted in my object as _id to override the default one?
[23:32:13] <Boomtime> yes, and use that name everywhere you previously used "ID"
[23:32:33] <Forest> Osm_id is unique only within object type
[23:32:50] <Forest> hmm so is it enough its unique in one collection?
[23:33:03] <Boomtime> yes
[23:33:15] <Boomtime> _id needs to be unique in the collection only
[23:33:47] <Forest> okay,so i can use the _id instead of creating new id field as i did
[23:33:56] <Boomtime> correct
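What the agreed change might look like for a node document, as a sketch. The document shape follows the one Forest pasted earlier, with the OSM id stored in `_id` instead of a separate `id` field; the `node_to_doc` helper and the sample values are made up for the example.

```python
def node_to_doc(osm_id, lon, lat, tags):
    """Build the document for one OSM node, keyed by its OSM id."""
    return {
        "_id": osm_id,        # must be unique within this collection only
        "loc": [lon, lat],    # longitude first, then latitude
        "tags": dict(tags),   # arbitrary key/value pairs
    }


doc = node_to_doc(1, 14.42, 50.09, {"amenity": "bar"})
print(doc)
```

Because `_id` is always indexed, lookups by OSM id then need no additional index on that collection.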
[23:34:09] <Forest> so ID 1 is different in the Nodes and in the Relation table, just to clarify, i am a bit tired :P
[23:34:54] <Forest> well, so you are right, i just need the loc index
[23:35:16] <Forest> so basically just 2 new indexes,one on nodes and one on ways
[23:35:49] <Forest> On the internet i have found this example collection.ensureIndex({loc: "2d"}, {min: -500, max: 500, w:1})
[23:36:00] <Forest> is it necessary to provide the min and max values?
[23:36:09] <Forest> cause i see nothing for the sphere index there http://docs.mongodb.org/manual/tutorial/build-a-2dsphere-index/
[23:36:35] <Boomtime> yeah, i don't know what those do.. I suspect that whole block of options is ignored
[23:37:52] <Boomtime> http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/#options-for-2d-indexes
[23:37:58] <Boomtime> whatddya know those are real options
[23:38:54] <Forest> i am using the sphere index so it doesn't apply, it's something different than i thought
[23:39:04] <Forest> i thought it's the distance in meters from the point
[23:39:09] <Boomtime> 2dsphere?
[23:39:21] <Boomtime> yes, min/max don't make any sense there
[23:40:08] <Forest> i think 2d sphere is more accurate
[23:40:35] <Forest> when the user clicks on the map, 2dsphere should give a result closer to the actually defined point in my graph nearest to the clicked point
[23:40:56] <Boomtime> for your use case, 2dsphere is the correct choice
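A sketch of creating the location index from Python rather than the shell. It assumes a PyMongo-like collection whose create_index() accepts a list of (field, type) pairs; the `NODE_INDEXES` and `create_indexes` names are made up. In the mongo shell of this era the equivalent call would be ensureIndex on the collection with {loc: "2dsphere"}.

```python
NODE_INDEXES = [
    [("loc", "2dsphere")],   # geospatial index used by $near queries
]


def create_indexes(collection, index_specs=NODE_INDEXES):
    """Create each index with its own call so that none become compound."""
    # One create_index call per spec: a single spec listing several
    # (field, type) pairs would build one compound index instead.
    return [collection.create_index(spec) for spec in index_specs]
```

Running this once after the bulk insert has finished follows Boomtime's suggestion for static data; it does not require re-running the import.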
[23:41:24] <Forest> btw,when would you use just 2d?
[23:41:53] <Boomtime> to be honest, i can only think of pathological use cases, i have never seen a real world / believable use case
[23:42:10] <Boomtime> 2d is euclidean, which is a rare requirement
[23:42:53] <Forest> one last question, can i create the indexes from the console so i don't need to run the whole program again? :)
[23:42:55] <Boomtime> however, it can be much faster, so if your queries are for quite small areas you might be better off with it
[23:43:08] <Boomtime> yes
[23:43:41] <Boomtime> Node.js calls are usually very similar to the equivalent shell commands
[23:43:57] <Boomtime> here: http://docs.mongodb.org/manual/reference/method/db.collection.ensureIndex/
[23:50:54] <Forest> Boomtime: thank you for your help :)
[23:52:24] <Boomtime> cheers :)
[23:55:59] <Forest> Boomtime: i see,the radius is provided in the search :) http://docs.mongodb.org/manual/tutorial/query-a-2dsphere-index/
[23:57:04] <Boomtime> when you use a type:"Point" query, then you specify $maxDistance, yes
[23:57:47] <Boomtime> ^ $near
[23:58:13] <Forest> db.<collection>.find( { <location field> : { $near : { $geometry : { type : "Point" , coordinates : [ <longitude> , <latitude> ] } , $maxDistance : <distance in meters> } } } )
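The query template above, filled in by a small Python helper. The `near_query` name and the sample coordinates are made up; the structure of the returned dict mirrors the template exactly, and it is meant to be passed as the filter to a driver's find() call on a collection with a 2dsphere index.

```python
def near_query(field, lon, lat, max_distance_m):
    """Build a $near filter for a GeoJSON point, limited to a radius in meters."""
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_distance_m,
            }
        }
    }


query = near_query("loc", 14.42, 50.09, 200)
print(query)
```

Coordinates go in longitude, latitude order, matching how the `loc` field was stored.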