PMXBOT Log file Viewer


#mongodb logs for Tuesday the 23rd of April, 2013

[00:03:48] <IUseOTR> I want to use MongoDB with Django. I've installed both and the pymongo driver. I think the next step is to install django-mongodb-engine, but my pip command fails. The Mongo Engine website is down too. Does anybody have any advice? I'm also unclear about the difference between MongoEngine and the package explicitly for Django.
[06:21:41] <lessthanzero> I have 2 collections, "Places" and "Dirty Places". Dirty Places has the results of various tests being run on each "Place" document. I want to create a queue for the "Dirty Places", but I've only really replicated Place.id in DirtyPlaces.id - so I can link the two data sets.
[06:21:58] <lessthanzero> My question is, how can I "properly" pull in the referenced documents when querying DirtyPlaces?
[06:24:01] <lessthanzero> aka, how do I "join" properly? http://docs.mongodb.org/manual/reference/database-references/ I imagine? I had someone tell me to do it manually, which irked me.
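A minimal sketch of the manual "join" being discussed, in pymongo terms; the database, collection, and field names here are assumptions, not taken from the log:

    from pymongo import MongoClient

    client = MongoClient()                     # assumes a local mongod
    db = client["example_db"]                  # hypothetical database name

    # There is no server-side join here: resolve each reference with a second query.
    for dirty in db.dirty_places.find():
        place = db.places.find_one({"_id": dirty["place_id"]})  # "place_id" is an assumed link field
        # ... combine `dirty` and `place` in application code ...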
[07:39:11] <[AD]Turbo> hi there
[08:25:22] <Snebjorn> Has anyone done some research on which is faster/better: using mongo's aggregation framework and projection to limit the output to exactly what you need, or doing that in your code? (In this case C#)
[08:26:12] <Snebjorn> Cause I noticed a drastic change in query speed when using aggregation
[08:31:48] <kali> Snebjorn: you're aware there is projection in find ?
[08:32:17] <Snebjorn> yep
[08:32:36] <Snebjorn> but that isn't enough in this case
[08:32:57] <Snebjorn> as I need to filter a bunch of subdocuments
[08:33:17] <Snebjorn> and $elemMatch only returns the first
[08:34:13] <Snebjorn> so I'm "forced" to do an $unwind $match
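A rough pymongo sketch of the $unwind/$match approach Snebjorn mentions, with hypothetical collection and field names:

    # Filter subdocuments server-side: $unwind fans each array element out into
    # its own document, then $match keeps only the elements of interest.
    pipeline = [
        {"$match": {"customer": "acme"}},          # narrow documents first (assumed field)
        {"$unwind": "$items"},
        {"$match": {"items.flagged": True}},       # keep matching subdocuments only
        {"$project": {"_id": 0, "items": 1}},
    ]
    results = db.orders.aggregate(pipeline)        # pymongo 3.x returns a cursor here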
[09:22:12] <Nodex> Ughhh, just had to go fix a friend's website (MySQL based) - forgot how slow it is to work with schemas
[09:23:30] <kali> i'll never forget. alter table still haunts my worst nightmares
[09:28:04] <woodentree> I have been working with replica sets but I'm having an issue which I think relates to eventual consistency. Basically it looks like the read is going to the slave before replication from the master has occurred. I know reads can be turned off on slaves, but does this ring true with you guys?
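One hedged way to rule out stale secondary reads while debugging this is to pin reads to the primary; a pymongo 3.x style sketch with made-up host and collection names:

    from pymongo import MongoClient
    from pymongo.read_preferences import ReadPreference

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")   # hypothetical hosts
    db = client["mydb"]

    # Reads from the primary cannot observe the replication-lag window described above.
    places = db.get_collection("places", read_preference=ReadPreference.PRIMARY)
    doc = places.find_one()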
[09:28:44] <Nodex> Kali ++ .. I was wondering why my query would not execute - I had totally forgot I had to "ALTER TABLE" first LOL
[09:40:38] <Nodex> http://data-informed.com/fast-database-emerges-from-mit-class-gpus-and-students-invention/
[09:40:46] <Nodex> MongoDB mentioned :)
[09:43:15] <Nodex> “The speed ups over PostGIS … I’m not an expert, and I’m sure I could have been more efficient in setting up the system in the first place, but it was 700,000 times faster,” Mostak said. “Something that would take 40 days was done in less than a second.”
[09:43:44] <deepy> I somehow don't believe results like that
[09:45:12] <Nodex> it's already proven lol
[09:46:00] <Nodex> you can get a 45 (equivalent) GHz board the size of a credit card now for $99
[09:50:08] <deepy> And it's not that the first approach was bad?
[10:09:36] <Nodex> I don't understand what that means sorry
[10:54:38] <richthegeek> hi - is there any way to do the following with the aggregation framework?: "Pick longest string", "pick shortest string", "get range of integers in the group", and "get modal/median average"?
[11:05:15] <Zagdul> hello?
[11:05:52] <Nodex> lol
[11:06:07] <Nodex> lol
[11:06:22] <Zagdul> ;-)
[11:06:26] <Zagdul> that was strange
[11:39:46] <alcuadradoatwork> how can I use the php driver to find one document with a query and a sort?
[11:39:56] <alcuadradoatwork> It doesn't seem to have an option for that
[11:40:38] <Nodex> you have to use limit(1)
[11:41:07] <alcuadradoatwork> but there's something I don't get
[11:41:13] <alcuadradoatwork> I use get, and it returns a cursor
[11:41:19] <alcuadradoatwork> then I can sort and limit with that
[11:41:26] <alcuadradoatwork> but when is the query actually run?
[11:44:28] <Nodex> findOne() is really just a shortcut to find().limit(1).sort({$natural:-1});
[11:45:16] <kali> + pretty print in the js shell
[11:46:03] <Nodex> yer + that
[11:46:12] <Nodex> find().limit(1).sort({$natural:-1}).pretty()
[11:49:49] <alcuadradoatwork> is it ok to put the limit before the sort?
[11:56:02] <Nodex> you can put it wherever you like :)
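In pymongo terms (the PHP driver's cursor has analogous methods), "find one with a query and a sort" is just a sorted cursor capped at one document; the field names below are made up:

    # The cursor is lazy: the query only runs when the cursor is iterated.
    cursor = db.things.find({"status": "open"}).sort("created_at", -1).limit(1)
    doc = next(cursor, None)

    # Recent pymongo versions also accept a sort directly on find_one:
    doc = db.things.find_one({"status": "open"}, sort=[("created_at", -1)])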
[12:10:17] <salty-horse> does pymongo support remove() with justOne? I can't find any hint in the code.
[13:07:56] <richthegeek> does upsert work with $set ?
[13:08:47] <richthegeek> collection.update({foo: 'bar'}, {$set: {... some object ...}}, {upsert: true}) for example
[13:08:59] <richthegeek> if there is no row matching {foo: 'bar'}, would it insert from $set?
[13:09:15] <Lujeni> salty-horse, you should use findandmodify with remove:True option
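A hedged pymongo sketch of the single-document remove Lujeni describes; which helper exists depends on the driver version, and the collection and field names are invented:

    # Older pymongo (2.x): findAndModify with remove=True deletes and returns one match.
    doc = db.items.find_and_modify(query={"status": "stale"}, remove=True)

    # Newer pymongo (3.x+): the dedicated helpers.
    doc = db.items.find_one_and_delete({"status": "stale"})
    # or, if the deleted document isn't needed back:
    db.items.delete_one({"status": "stale"})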
[13:09:51] <BadDesign> I want to bulk insert 527239 records into a database; the records are read from a file, one record per line; I store the records in a vector, and when the number of elements in the vector % 25000 == 0 I insert all the elements of the vector into the database and clear the vector, and continue this process until there are no more lines in the file from which I read the records. However, using this approach (i.e. bulk_data.size() % 25000 == 0) it will
[13:09:51] <BadDesign> insert records in 25k bulks, and if my total record count is 527239 it will only insert records up to 525000 and the other 2239 will not get inserted. Any suggestions for a better condition to use in order to insert all the records?
[13:10:52] <richthegeek> add a check at the end
[13:10:54] <Nodex> richthegeek : yes it works with $set
[13:11:00] <richthegeek> Nodex: great, thanks
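A small pymongo sketch of the upsert Nodex confirms: when nothing matches the query, the new document is built from the query fields merged with the $set fields (collection and field names are hypothetical):

    db.widgets.update_one(
        {"foo": "bar"},                                  # query; becomes part of the inserted doc
        {"$set": {"label": "example", "count": 1}},      # fields applied on update or insert
        upsert=True,
    )
    # pymongo 2.x spelling: db.widgets.update({"foo": "bar"}, {"$set": {...}}, upsert=True)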
[13:11:03] <salty-horse> Lujeni: thanks
[13:11:22] <richthegeek> BadDesign: at the end "if bulk_data.size() > 0 then insert(bulk_data)"
[13:12:09] <richthegeek> heck, you probably can get away with just "insert(bulk_data)", because if it's empty then no foul
[13:15:24] <BadDesign> richthegeek: ah good point, let me try that, I was thinking of other solutions and the most obvious one evaded me
[13:16:00] <richthegeek> BadDesign: no worries, I wrote some similar code a few weeks ago so it was still fresh
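The same fix as a pymongo sketch: flush full batches inside the loop, then flush whatever remains once the file is exhausted (the file name and document shape are made up):

    BATCH_SIZE = 25000
    batch = []

    with open("records.txt") as fh:
        for line in fh:
            batch.append({"raw": line.rstrip("\n")})
            if len(batch) == BATCH_SIZE:
                db.records.insert_many(batch)
                batch = []

    # Final flush: picks up the trailing partial batch (e.g. the 2239 leftover records).
    if batch:
        db.records.insert_many(batch)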
[13:35:30] <Snebjorn> where can I find some documentation on the C# method MongoCollection.Aggregate()? It doesn't seem to be on their site
[13:36:20] <Snebjorn> I'm trying to figure out how to count the result documents
[14:08:44] <balboah> trying to understand the aggregation framework. I want to do the traditional "group by" count and have the count as a new field. But how do I sort by this field? Works fine with only $group, but adding $sort removes the counter: http://bpaste.net/show/rQQxuKsT3nU6W4Hjeo7t/
[14:09:53] <balboah> as I understood it, the first result should be piped to the next. Doesn't seem to maintain the output format though
[14:31:05] <Nodex> perhaps your ORM is doing something to it?
[14:31:26] <richthegeek> Snebjorn: http://docs.mongodb.org/manual/reference/aggregation/
[14:33:55] <balboah> that paste is from the mongo console client
[14:34:02] <balboah> never mind I will do it client side
[14:34:35] <Nodex> with DBRefs in it? Are you expecting it to gather stats from the collection it references?
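For reference on balboah's question: a plain group-by count followed by a sort on the computed field normally keeps the counter, since each pipeline stage receives exactly what the previous stage emitted. A pymongo sketch with an assumed grouping field:

    pipeline = [
        {"$group": {"_id": "$category", "count": {"$sum": 1}}},   # "category" is assumed
        {"$sort": {"count": -1}},                                  # sort on the computed field
    ]
    for row in db.events.aggregate(pipeline):                      # pymongo 3.x returns a cursor
        print(row["_id"], row["count"])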
[15:09:50] <meekohi> Hey all. If I have a bunch of records with dates, how would I get the average difference in time between two adjacent records?
[15:09:57] <meekohi> i.e. "How often is something happening"?
[15:10:56] <meekohi> Get the entries sorted then just forEach them?
[15:14:47] <Nodex> you can probably do it in Aggregation framework
[15:27:43] <xjunior> Hey, I think I found a bug, not sure if it's mongo or mongoid. I have a 2d index on a field, but if I try to insert an array with 2 strings it seems to succeed, but doesn't.
[15:33:48] <MatheusOl> meekohi: can you store this information into the collection?
[15:34:14] <MatheusOl> e.g. on each document you insert the difference between this one and the last one
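A client-side sketch of the sorted walk meekohi suggests (later in the log he does essentially this after pulling everything with toArray); the collection and timestamp field names are assumptions:

    from datetime import timedelta

    prev = None
    total = timedelta()
    gaps = 0

    # Walk the records in time order and accumulate the gaps between adjacent ones.
    for doc in db.events.find({}, {"created_at": 1}).sort("created_at", 1):
        if prev is not None:
            total += doc["created_at"] - prev
            gaps += 1
        prev = doc["created_at"]

    average_gap = total / gaps if gaps else None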
[15:34:44] <pringlescan> I'm not sure if you guys can help me. I'm working on a civic hackathon project that's due to present in a few hours. It helps route children around crimes on their walk to school. It uses OSM data and the Open Source Routing Machine. I've been designing it to make it easy to deploy in cities around the globe (anyone that has crime data).
[15:36:06] <pringlescan> It uses Node.Js for the backend and ETL … and that's where I'm running into problems. I couldn't hold all the data in memory without Node.JS freaking out. I was doing all the geospatial stuff by hand and it was FAST. Now I'm using MongoDB and I'm running into problems getting it to do what I need it to do. The GeoJSON stuff in 2.4 is great, but I can't do intersects on line strings, and things are taking too long. (Anything longer t
[15:37:51] <pringlescan> I have 600k crimes, 250k unique points where crime occurs, 80k nodes in the road network, and 15k roads/ways (for Philadelphia, Pennsylvania)… for each unique point, I have what crimes occurred there, the nearest existing road node, and the closest point to the crime on the road (like a GPS would start you off, the lat/lng that is closest, not necessarily an existing road node).
[15:38:28] <pringlescan> What I have to do is take each of those 250k points, and insert them where they go into the roads, as traffic signals, so that the routing engine routes around them.
[15:39:08] <pringlescan> I have an algorithm to do that when given the point to insert, and the existing points, but matching the unique points to what way they should be in has proven difficult as you cannot compare coordinates and the routing engine uses a different level of precision than OSM.
[15:47:18] <Nodex> can you not assign a bounding box around a crime and see if that intersects?
[15:47:27] <Nodex> you will -probably- get better performance
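A rough pymongo sketch of the bounding-box idea, assuming the roads are stored as GeoJSON LineStrings under a 2dsphere index; the collection name and coordinates are invented:

    db.roads.create_index([("geometry", "2dsphere")])

    # A small GeoJSON Polygon around one crime point (the ring closes on its first vertex).
    box = {
        "type": "Polygon",
        "coordinates": [[
            [-75.165, 39.952], [-75.164, 39.952],
            [-75.164, 39.953], [-75.165, 39.953],
            [-75.165, 39.952],
        ]],
    }

    nearby_roads = db.roads.find({"geometry": {"$geoIntersects": {"$geometry": box}}})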
[15:48:36] <pringlescan> Okay, so that makes 250,000 bounding box queries on 15,000 roads.
[15:49:37] <Nodex> do you have to do all at once ?
[15:49:45] <Nodex> or can it be queued and pushed back out?
[15:50:10] <pringlescan> The road network is calculated ahead of time from an .OSM file, so it all has to be done at once, no changes can be pushed out without recalculating the entire road network
[15:50:29] <pringlescan> and a changeset for an OSM file includes the entire way, you can't just add a node for the crime that intersects with a road, you have to add a new node to the road
[15:50:55] <pringlescan> if 10 crimes occur at one point, I have to add nodes that are next to each other, spaced the right distance apart so that the precision rounding the routing engine does won't drop them as duplicate points
[15:51:52] <Nodex> perhaps you can aggregate it down a little
[15:52:10] <Nodex> i.e. you don't need to store 10 crimes if they're all within 10 meters of each other - one will suffice
[15:52:21] <Nodex> you only need to know to "avoid" that poi
[15:55:41] <pringlescan> Nodex, so one crime counts as much as 10 crimes at that point
[15:56:10] <pringlescan> by using multiple nodes at the points on the road closest to where the crime occurs, walking down a bad part of a street is different than a good part of the same street
[15:56:16] <pringlescan> whereas if you assign an aggregate it's not very accurate
[15:56:24] <Nodex> a point is a point - you only need to know to avoid it, not what happened
[15:56:42] <pringlescan> yes, but there are 620k crimes, and only 250k points where they happened
[15:57:11] <pringlescan> each crime_point (250k) is placed as many times as there were crimes there, spaced far enough apart that OSRM doesn't drop it as a dupe node
[15:57:45] <Nodex> that's a little inefficient, no?
[15:58:47] <Nodex> surely the objective is to get the streets (or parts of streets) to avoid, rather than accuracy down to the meter
[15:59:03] <Nodex> which will result in fewer queries and thus better performance
[16:01:33] <pringlescan> there are no queries to the DB
[16:01:35] <pringlescan> it's just used for ETL
[16:01:37] <theRoUS> here's a fun one: http://pastie.org/7703583
[16:01:53] <theRoUS> dealing with a replset blows up with an undefined symbol
[16:02:08] <pringlescan> OSRM pre-calculates all of the routes in a few minutes and can do cross-continental routes in a millisecond or two
[16:02:33] <Nodex> err, were you not asking for performance enhancements for 2.4 geo intersecting?
[16:02:37] <pringlescan> how can I use a haystack index in conjunction with a bounding box?
[16:02:56] <pringlescan> each way has a list of all neighborhoods and census blocks it intersects with
[16:03:03] <alcuadradoatwork> If I have an index on field f which is always present, a find({f: val}).count() should be relatively fast in 300k docs, right?
[16:04:59] <Nodex> alcuadradoatwork : count() is really still not that fast
[16:05:05] <Nodex> even with indexes :/
[16:05:35] <Nodex> it's only fast with no query params because mongo is smart enough to check internal counters instead of the count
[16:23:52] <alcuadradoatwork> Nodex, I thought there might be the same kind of counters in the index
[16:26:35] <scoates> alcuadradoatwork: db.collection.find(q).explain().n (-: (not actually sure this is a good idea (-; )
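A pymongo illustration of the two counts being compared, with a made-up field name; on the 2.4-era servers discussed here, explain() exposes the cursor used and "n", the number of matching documents:

    # Counted find on an indexed field.
    count = db.docs.find({"f": "some_value"}).count()

    # scoates' half-joking alternative: read n straight off the query plan.
    plan = db.docs.find({"f": "some_value"}).explain()
    print(plan.get("cursor"), plan.get("n"))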
[16:27:33] <mansoor> Hello friends
[16:27:43] <Nodex> alcuadradoatwork : no, but I believe they're constantly working to get better performance from count()
[16:28:05] <alcuadradoatwork> good to know
[16:28:07] <alcuadradoatwork> thanks guys
[16:28:11] <alcuadradoatwork> I gotta leave now
[16:28:12] <alcuadradoatwork> see ya
[16:28:14] <Nodex> there is a ticket for it somewhere
[16:28:19] <mansoor> Is there a tool to help with restructuring existing data in the DB? I want to take child documents out of the documents in a collection and give them their own collection
[16:28:38] <mansoor> I should write a script to do this?
[16:28:42] <mansoor> or is there a tool for it
[16:32:40] <meekohi> MatheusOl: It'd be tricky to store the difference since different servers are all pushing data asynchronously.
[16:33:09] <meekohi> I ended up pushing everything toArray() and doing it in Javascript instead
[16:41:26] <Nodex> mansoor : you'll have to write your own script
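A one-off migration script along the lines Nodex suggests; every name here ("parents", "children", "parent_id") is hypothetical:

    # Pull embedded child documents out into their own collection,
    # keeping a back-reference to the original parent.
    for parent in db.parents.find({"children": {"$exists": True}}):
        for child in parent.get("children", []):
            child["parent_id"] = parent["_id"]
            db.children.insert_one(child)
        db.parents.update_one({"_id": parent["_id"]}, {"$unset": {"children": ""}})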
[18:23:29] <vince_prignano> Is it wrong to store every user in a collection if the users have lots of data?
[18:27:48] <vince_prignano> is anyone here?
[18:37:38] <kaoD> hi
[18:38:00] <kaoD> why am I getting dates as strings in Python's driver? I want the timestamp
[18:39:21] <kaoD> pymongo
[18:46:33] <vince_prignano> when you get a Date attribute from MongoDB, you will get a Date object; it will be a datetime.datetime
[18:46:51] <vince_prignano> Use datetime.datetime to print the timestamp
[18:47:10] <vince_prignano> or if you want the string use str(dateobj)
[18:47:18] <vince_prignano> kaoD: :)
[18:48:12] <kaoD> vince_prignano: then I might be dealing with an actual string because type(date) == 'unicode'
[18:48:32] <vince_prignano> kaoD: how did you save that date in mongo?
[18:48:40] <kaoD> I'm getting it from Twitter's API
[18:48:50] <vince_prignano> from python?
[18:49:11] <kaoD> yep
[18:49:43] <vince_prignano> can you give me an example with print?
[18:50:15] <kaoD> Mon Apr 15 22:43:12 +0000 2013
[18:50:29] <kaoD> print type: <type 'unicode'>
[18:51:33] <vince_prignano> So convert it to a datetime object using datetime.strptime
[18:51:40] <vince_prignano> example here http://blogs.law.harvard.edu/rprasad/2011/09/21/python-string-to-a-datetime-object/
[18:51:59] <kaoD> thanks!
[18:52:51] <vince_prignano> you're welcome
[18:53:04] <vince_prignano> kaoD: let me know if that solves your problem
[18:56:04] <kaoD> vince_prignano: apparently not
[18:56:05] <kaoD> ValueError: 'z' is a bad directive in format '%a %b %d %H:%M:%S %z %Y'
[18:57:07] <vince_prignano> kaoD: well, I found the right thing for you: http://stackoverflow.com/questions/7703865/going-from-twitter-date-to-python-datetime-date
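In line with the linked answer (and the fact that Twitter's offset is always +0000, as noted below), the offset can be matched literally instead of with the %z directive that strptime rejects here:

    from datetime import datetime

    created_at = "Mon Apr 15 22:43:12 +0000 2013"
    ts = datetime.strptime(created_at, "%a %b %d %H:%M:%S +0000 %Y")
    # ts is a naive datetime that is effectively UTC.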
[18:59:23] <theRoUS> any idea what causes this and how to correct/work around it? the database in question is essentially empty http://pastie.org/7703583
[19:04:39] <kaoD> vince_prignano: oh that's awesome! thanks a lot!
[19:04:44] <vince_prignano> kaoD: :)
[19:05:20] <kaoD> I see it just ignores the timezone
[19:07:21] <kaoD> and it's a bug apparently
[19:08:18] <vince_prignano> kaoD: I see that "For a naive object, the %z and %Z format codes are replaced by empty strings."
[19:08:32] <kaoD> http://bugs.python.org/issue6641
[19:08:35] <kaoD> it's a documentation bug
[19:10:39] <vince_prignano> kaoD: At this time you can parse the timezone only with python-dateutil
[19:10:55] <vince_prignano> kaoD: from dateutil import parser
[19:10:58] <kaoD> I'll just ignore it, apparently Twitter dates are always UTC
[19:11:13] <vince_prignano> kaoD: In that case, less work to do!
[19:21:50] <kaoD> vince_prignano: omg it's really slow! (though it worked :D)
[19:22:05] <vince_prignano> kaoD: slow?
[19:22:16] <kaoD> yup, I'm doing that on massive amounts of data
[19:22:29] <kaoD> it's taking ages compared to not parsing the string
[19:22:58] <vince_prignano> kaoD: In that case ^^
[19:23:38] <vince_prignano> kaoD: are you working with lots of users? If so, how did you manage the users? By collection or by documents?
[19:24:13] <kaoD> nope, this is a single user, I'm just analyzing a bunch of Tweets
[19:24:49] <vince_prignano> kaoD: I see. I'm stuck with that problem :/
[19:25:58] <scoates> Posted this to mongodb-user with no responses yet. Hoping this might get some additional eyeballs: http://stackoverflow.com/questions/16177478/why-is-mongodb-not-using-the-right-index
[19:50:46] <appel> Hi. I love MongoDB. I'm building a web app using angularjs and I was looking at either running mongodb on my own server or using mongolab.com, which one do you think is better? And also... I'm having trouble deciding on whether the web app client should connect directly to mongodb or whether it should connect via a backend service, e.g. a node rest api as a proxy. Any suggestions?
[19:51:15] <Derick> scoates: trying to figure it out now
[19:51:24] <scoates> thanks Derick
[19:52:15] <Derick> scoates: so far the best suggestion is: they can make it much better if they never store from_backup unless true. Then the $in goes away, and just becomes a null equality check.
[19:53:20] <scoates> Derick: how would I query it in that case?
[19:53:36] <Derick> scoates: from_backup: null
[19:53:36] <scoates> (And that doesn't explain the index/hint problem.)
[19:53:44] <Derick> no, I know
[19:54:06] <scoates> Derick: ok. makes sense. I think I might be able to do that (but I also can't query a collection this large to be sure. /-: )
[19:56:02] <scoates> Derick: seems to actually work: PRIMARY> db.assets.find({'owner': id, from_backup: null}).sort({date: -1}).limit(50).explain().cursor
[19:56:02] <scoates> BtreeCursor owner_1_from_backup_1_date_-1
[19:56:16] <Derick> yup
[19:56:21] <Derick> as it's now an equality test
[19:56:47] <Derick> that also doesn't have the "scanAndOrder"
[19:56:55] <Derick> which is bad to have...
[19:56:57] <scoates> let me see if those numbers match for our largest accounts
[19:57:27] <Derick> it shouldn't as ": false" isn't matched now
[19:58:05] <scoates> it isn't? false is in the $in
[19:58:22] <Derick> I meant with your new query
[19:58:30] <Derick> 20:53 <scoates> Derick: seems to actually work: PRIMARY> db.assets.find({'owner': id, from_backup: null}).sort({date: -1}).limit(50).explain().cursor
[19:58:46] <scoates> oh, yeah. I just meant I wanted to be sure we're not storing false. and we are. /-:
[19:59:05] <scoates> http://paste.roguecoders.com/p/ee1c60dcc8646d91e70ddab642e8cc80.txt )-:
[19:59:50] <scoates> app changes + back[un]filling. and I can't just unfill without an index on this many records…
[20:00:19] <Derick> you can do it 10 every minute :-)
[20:04:45] <Derick> scoates: btw, I disagree that the "hint" shows it is better, as it also adds a "scanAndOrder", which is an in-memory sort after the find operation, with memory limitations. It says it is *not* using the index for sorting with that.
[20:05:15] <scoates> `millis` is consistently lower, and nscanned is lower.
[20:05:51] <scoates> I hadn't considered the scanAndOrder bit, but that should make millis higher not lower, no?
[20:08:27] <Derick> scoates: it should - but different resources
[20:08:40] <Derick> maybe you just found a case where you'd need hint
[20:09:03] <scoates> maybe. /-:
[20:14:04] <vince_prignano> Can someone help me for this problem: https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/bKrMKaBSNkQ ?
[20:30:03] <jiffe98> why would mongos not show a database that exists?
[20:30:31] <jiffe98> I can use it and show collections shows the collections in it, but show dbs only lists config
[20:56:52] <Pete_> is it possible to override the chunkSize when using mongofiles?
[21:42:18] <clh_> can anyone recommend a nice GUI client?
[21:43:28] <kali> clh_: mongohub if you're using a mac
[21:44:30] <kali> clh_: https://github.com/fotonauts/MongoHub-Mac this is the alive fork
[21:44:40] <clh_> kali : thanks I am on a mac.
[21:46:11] <clh_> kali: tx
[22:55:34] <hallas> I have an array of objects, how can I match them against a collection's subdocuments?
[22:57:03] <hallas> I probably have to split up the values of the objects into arrays?
[22:58:38] <VooDooNOFX> Can you give a more concrete example?
[22:59:15] <hallas> Yes
[22:59:50] <hallas> I have a collection with subdocuments like so: { subdocuments: [ { a: String, b: String } ] }
[23:00:22] <hallas> Then I have an array of objects [ { a: "ValueA", b: "ValueB" }, { … } ]
[23:01:28] <hallas> I'd like to be able to find those documents that have at least one subdocument matching an entire object in the array
[23:02:39] <VooDooNOFX> hallas, have you seen $elemMatch?
[23:02:39] <zamnuts> hallas, why not use an $in?
[23:03:15] <hallas> VooDooNOFX yes, but I have to match on all values in the array against the subdocuments
[23:03:34] <hallas> I'd have to split the array of objects into an array for every value and use $in right?
[23:03:51] <hallas> The problem is I'm comparing many against many
[23:06:23] <VooDooNOFX> And why won't elemMatch work for you? $elemMatch returns all documents in collection where the array satisfies all of the conditions in the expression
[23:09:29] <hallas> so? db.coll.find({ $elemMatch: { subdocuments: arrayOfObjects } });
[23:09:32] <hallas> VooDooNOFX ^
[23:11:24] <VooDooNOFX> Oh, I see. you have many potential matches in your condition_array. Are you using a straight find, or the aggregation framework?
[23:11:36] <hallas> straight find
[23:12:48] <VooDooNOFX> I'd probably use an $or with $in. Not an expert though.
[23:13:12] <hallas> $and would be better no? since it has to match on both
[23:14:05] <VooDooNOFX> It will match on both by default. The '$or' is a delimiter for each element in your conditions_array
[23:14:27] <hallas> yeah but then it matches if either value is true
[23:14:38] <VooDooNOFX> {'$or': [{ a: "ValueA", b: "ValueB" }, { … }]}
[23:14:40] <hallas> i need both a and b to match
[23:15:34] <hallas> { $and: [ { 'subdoc.a': { $in: aArray } }, { 'subdoc.b': { $in: bArray } }] } ?
[23:16:53] <VooDooNOFX> That's a different result than your original problem statement of needing it to match at least 1 subdocument. This will require it to have a subdocument that matches all of the items in aArray AND bArray
[23:17:50] <hallas> hmm my original problem was it has to match just one object in the array, but at the same time both values
[23:17:55] <hallas> which mine doesn't fix, I think
[23:18:49] <VooDooNOFX> ok, then use $in.
[23:19:27] <VooDooNOFX> { 'subdoc': { '$in': [aArray, bArray]}}
[23:20:13] <VooDooNOFX> where aArray is [{ 'foo': 'bar'}, {'baz': 'foo'}]
[23:21:45] <hallas> actually this fixes it
[23:21:50] <hallas> { passports: { $elemMatch: { type: { $in: types }, id: { $in: ids } } } }
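A hedged variant combining the two suggestions in this exchange ($or and $elemMatch): one $elemMatch clause per object makes a document match only when a single subdocument equals one of the objects on both fields at once, which avoids the cross-matching that two independent $in lists allow. The names and values below are invented:

    array_of_objects = [{"type": "passport", "id": "A123"},
                        {"type": "visa", "id": "B456"}]

    query = {"$or": [{"passports": {"$elemMatch": obj}} for obj in array_of_objects]}
    matches = db.people.find(query)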