[00:03:48] <IUseOTR> I want to use MongoDB with Django. I've installed both and the pymongo driver. I think the next step is to install django-mongodb-engine, but my pip command fails. The Mongo Engine website is down too. Does anybody have any advice? I'm also unclear about the difference between MongoEngine and the package explicitly for Django.
[06:21:41] <lessthanzero> I have 2 collections: "Places" and "Dirty Places". Dirty Places has the results of various tests being run on each "Place" document. I want to create a queue for the "Dirty Places", but I've only really replicated the Place.id in DirtyPlaces.id - so I can link the two data sets.
[06:21:58] <lessthanzero> My question is, how can I "properly" pull in the referenced documents when querying DirtyPlaces.
[06:24:01] <lessthanzero> aka, how do I "join" properly? http://docs.mongodb.org/manual/reference/database-references/ I imagine? I had someone tell me to do it manually, which irked me.
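A minimal pymongo sketch of the manual approach, assuming the referencing field is a plain copy of the Place _id in a hypothetical place_id field (rather than a full DBRef); either way, the second lookup happens client-side:

```python
from pymongo import MongoClient

db = MongoClient()["mydb"]  # hypothetical database name

# MongoDB has no server-side join here; DBRefs are also resolved by the
# driver or by hand, so fetch the DirtyPlaces first, then the referenced
# Places in one $in query instead of one query per document.
dirty = list(db.dirty_places.find({"needs_review": True}))   # hypothetical filter
place_ids = [d["place_id"] for d in dirty]                   # hypothetical field name
places = {p["_id"]: p for p in db.places.find({"_id": {"$in": place_ids}})}

for d in dirty:
    d["place"] = places.get(d["place_id"])
```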
[08:25:22] <Snebjorn> Has anyone done any research on which is faster/better: using mongo's aggregation framework and projection to limit the output to exactly what you need, or doing that in your code? (In this case C#)
[08:26:12] <Snebjorn> Cause I noticed a drastic change in query speed when using aggregation
[08:31:48] <kali> Snebjorn: you're aware there is projection in find ?
[08:32:36] <Snebjorn> but that isn't enough in this case
[08:32:57] <Snebjorn> as I need to filter a bunch of subdocuments
[08:33:17] <Snebjorn> and $elemMatch only returns the first
[08:34:13] <Snebjorn> so I'm "forced" to do an $unwind $match
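A sketch of the pipeline shape being described, assuming `db` is a pymongo Database handle and hypothetical field names (items, items.state); $unwind turns each subdocument into its own document, so the second $match can keep every matching subdocument rather than just the first:

```python
pipeline = [
    {"$match": {"owner": "someone"}},        # narrow to the parent documents first
    {"$unwind": "$items"},                   # one output document per subdocument
    {"$match": {"items.state": "active"}},   # keep every matching subdocument
    {"$project": {"items": 1}},              # return only what the caller needs
]
results = db.orders.aggregate(pipeline)      # pymongo 3+ returns a cursor; 2.x returns a dict
```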
[09:22:12] <Nodex> Ughhh, just had to go fix a friend's website (MySQL based) - forgot how slow it is to work with schemas
[09:23:30] <kali> i'll never forget. alter table still haunts my worst nightmares
[09:28:04] <woodentree> have been working with replica sets but having an issue which I think relates to eventual consistency. Basically it looks like the read is going to the slave before replication from the master has occurred. I know reads can be turned off on slaves, but does this ring true with you guys?
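If the application (or ODM) has been configured to read from secondaries, pinning the affected reads back to the primary closes the stale-read window; a hedged pymongo sketch, with a hypothetical connection string and query:

```python
from pymongo import MongoClient

# Reads default to the primary; setting it explicitly in the URI ensures a
# secondary-preferred setting elsewhere doesn't route this read to a
# secondary that hasn't replicated the write yet.
client = MongoClient("mongodb://host1,host2/mydb?replicaSet=rs0&readPreference=primary")
doc = client["mydb"]["profiles"].find_one({"email": "user@example.com"})
```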
[09:28:44] <Nodex> Kali ++ .. I was wondering why my query would not execute - I had totally forgotten I had to "ALTER TABLE" first LOL
[09:43:15] <Nodex> “The speed ups over PostGIS … I’m not an expert, and I’m sure I could have been more efficient in setting up the system in the first place, but it was 700,000 times faster,” Mostak said. “Something that would take 40 days was done in less than a second.”
[09:43:44] <deepy> I somehow don't believe results like that
[09:46:00] <Nodex> you can get a 45 (equivalent) GHz board the size of a credit card now for $99
[09:50:08] <deepy> And it's not that the first approach was bad?
[10:09:36] <Nodex> I don't understand what that means sorry
[10:54:38] <richthegeek> hi - is there any way to do the following with the aggregation framework?: "Pick longest string", "pick shortest string", "get range of integers in the group", and "get modal/median average"?
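Of those, the integer range is straightforward with $min/$max inside $group; longest/shortest string and the median have no direct operator in the 2.4-era framework and usually end up client-side. A sketch with hypothetical fields grp and n:

```python
pipeline = [
    {"$group": {
        "_id": "$grp",                 # hypothetical grouping key
        "maxN": {"$max": "$n"},        # hypothetical integer field
        "minN": {"$min": "$n"},
    }},
    {"$project": {"range": {"$subtract": ["$maxN", "$minN"]}}},
]
```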
[11:49:49] <alcuadradoatwork> is it ok to put the limit before the sort?
[11:56:02] <Nodex> you can put it wherever you like :)
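Assuming the question is about chaining cursor methods on find(): the server collects the modifiers and applies the sort before the limit either way, so the chained order shouldn't change the result set; in an aggregation pipeline, by contrast, stage order does matter. A pymongo sketch with a hypothetical collection and field:

```python
# Both cursors should return the same top-5 set: the sort is applied
# before the limit server-side, regardless of the chaining order.
top5_a = db.scores.find().sort("score", -1).limit(5)
top5_b = db.scores.find().limit(5).sort("score", -1)
```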
[12:10:17] <salty-horse> does pymongo support remove() with justOne? I can't find any hint in the code.
[13:07:56] <richthegeek> does upsert work with $set ?
[13:08:47] <richthegeek> collection.update({foo: 'bar'}, {$set: {... some object ...}}, {upsert: true}) for example
[13:08:59] <richthegeek> if there is no row matching {foo: 'bar'}, would it insert from $set?
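Yes: on an upsert miss, the inserted document is built from the query's equality fields plus the $set fields. A pymongo sketch (update_one and the upsert kwarg are the pymongo 3+ spelling; the payload is hypothetical):

```python
# If nothing matches {"foo": "bar"}, this inserts {"foo": "bar", "status": "new"};
# if a match exists, only the $set fields are applied to it.
db.things.update_one({"foo": "bar"},
                     {"$set": {"status": "new"}},
                     upsert=True)
```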
[13:09:15] <Lujeni> salty-horse, you should use findandmodify with remove:True option
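A sketch of that suggestion in pymongo: find_and_modify(..., remove=True) is the older spelling, and pymongo 3+ exposes the same command as find_one_and_delete (or delete_one, if the removed document isn't needed back). The filter here is hypothetical:

```python
# Deletes at most one matching document and returns it.
removed = db.tasks.find_one_and_delete({"status": "stale"})
```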
[13:09:51] <BadDesign> I want to bulk insert 527239 records into a database. The records are read from a file, one record per line. I store the records in a vector, and when the number of elements in the vector % 25000 == 0 I insert all the elements of the vector into the database, clear the vector, and continue until there are no more lines in the file. However, using this approach (i.e. bulk_data.size() % 25000 == 0) it will
[13:09:51] <BadDesign> insert records in 25k bulks, and since my total record count is 527239 it will only insert records up to 525000 and the other 2239 will not get inserted. Any suggestions for a better condition to use in order to insert all the records?
[13:11:22] <richthegeek> BadDesign: at the end "if bulk_data.size() > 0 then insert(bulk_data)"
[13:12:09] <richthegeek> heck, you probably can get away with just "insert(bulk_data)", because if it's empty then no foul
[13:15:24] <BadDesign> richthegeek: ah good point, let me try that, I was thinking of other solutions and the most obvious one evaded me
[13:16:00] <richthegeek> BadDesign: no worries, I wrote some similar code a few weeks ago so it was still fresh
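For reference, the same pattern in pymongo, with a final flush so the leftover 2,239 documents also get written (the file name and document shape are hypothetical):

```python
from pymongo import MongoClient

db = MongoClient()["mydb"]
BATCH_SIZE = 25000
batch = []

with open("records.txt") as fh:
    for line in fh:
        batch.append({"record": line.rstrip("\n")})
        if len(batch) >= BATCH_SIZE:
            db.records.insert_many(batch)   # pymongo 3+; older drivers use insert()
            batch = []

if batch:                                   # flush the final partial batch
    db.records.insert_many(batch)
```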
[13:35:30] <Snebjorn> where can I find some documentation on the C# method MongoCollection.Aggregate()? It doesn't seem to be on their site
[13:36:20] <Snebjorn> I'm trying to figure out how to count the result documents
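One driver-agnostic option is to let the server do the counting by appending a $group stage to the pipeline; the shape is the same whichever driver builds it (the earlier stage and field names here are hypothetical):

```python
pipeline = [
    {"$match": {"status": "active"}},                  # the existing stages go here
    {"$group": {"_id": None, "count": {"$sum": 1}}},   # collapse the output to one count document
]
result = list(db.items.aggregate(pipeline))            # pymongo 3+ returns a cursor
total = result[0]["count"] if result else 0
```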
[14:08:44] <balboah> trying to understand the aggregation framework. I want to do the traditional "group by" count and have the count as a new field. But how do I sort by this field? Works fine with only $group, but by adding $sort it removes the counter: http://bpaste.net/show/rQQxuKsT3nU6W4Hjeo7t/
[14:09:53] <balboah> as I understood it, the first result should be piped to the next. Doesn't seem to maintain the output format though
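The paste isn't reproduced here, but $sort only reorders documents, it doesn't reshape them, so the counter normally survives the stage; the usual pitfall is sorting on a name that doesn't match the field the $group stage emitted. A sketch with a hypothetical grouping key:

```python
pipeline = [
    {"$group": {"_id": "$category", "count": {"$sum": 1}}},  # hypothetical grouping field
    {"$sort": {"count": -1}},   # must reference the exact name emitted by $group
]
```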
[14:31:05] <Nodex> perhaps your ORM is doing something to it?
[14:33:55] <balboah> that paste is from the mongo console client
[14:34:02] <balboah> never mind I will do it client side
[14:34:35] <Nodex> with DBRefs in it? .. are you expecting it to gather stats from the collection it references?
[15:09:50] <meekohi> Hey all. If I have a bunch of records with dates, how would I get the average difference in time between two adjacent records?
[15:09:57] <meekohi> i.e. "How often is something happening"?
[15:10:56] <meekohi> Get the entries sorted then just forEach them?
[15:14:47] <Nodex> you can probably do it in Aggregation framework
[15:27:43] <xjunior> Hey, I think I found a bug, not sure if in mongo or mongoid. I have a 2d index on a field, but if I try to insert an array with 2 strings it seems to succeed, but doesn't.
[15:33:48] <MatheusOl> meekohi: can you store this information into the collection?
[15:34:14] <MatheusOl> e.g. on each document you insert, store the difference between it and the previous one
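Another option: the mean of the gaps between adjacent records telescopes to (newest - oldest) / (n - 1), so a full sorted scan isn't strictly needed. A pymongo sketch assuming `db` is a Database handle and a hypothetical created_at datetime field:

```python
n = db.events.count_documents({})                      # pymongo 3.7+; older drivers use count()
first = db.events.find_one(sort=[("created_at", 1)])   # oldest record
last = db.events.find_one(sort=[("created_at", -1)])   # newest record

if n > 1:
    avg_gap = (last["created_at"] - first["created_at"]) / (n - 1)   # a datetime.timedelta
```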
[15:34:44] <pringlescan> I'm not sure if you guys can help me, but I'm working on a civic hackathon project that's due to present in a few hours. It helps route children around crimes on their walk to school. It uses OSM data and the Open Source Routing Machine. I've been designing it to make it easy to deploy in cities around the globe (anyone that has crime data).
[15:36:06] <pringlescan> It uses Node.js for the backend and ETL … and that's where I'm running into problems. I couldn't hold all the data in memory without Node.js freaking out. I was doing all the geospatial stuff by hand and it was FAST. Now I'm using MongoDB and I'm running into problems getting it to do what I need it to do. The GeoJSON stuff in 2.4 is great, but I can't do intersects on line strings, and things are taking too long. (Anything longer t
[15:37:51] <pringlescan> I have 600k crimes, 250k unique points where crime occurs, 80k nodes in the road network, and 15k roads/ways (for Philadelphia, Pennsylvania)… for each unique point, I have what crimes occurred there, the nearest existing road node, and the closest point to the crime on the road (like a GPS would start you off, the lat/lng that is closest, not necessarily an existing road node).
[15:38:28] <pringlescan> What I have to do is take each of those 250k points, and insert them where they go into the roads, as traffic signals, so that the routing engine routes around them.
[15:39:08] <pringlescan> I have an algorithm to do that when given the point to insert, and the existing points, but matching the unique points to what way they should be in has proven difficult as you cannot compare coordinates and the routing engine uses a different level of precision than OSM.
[15:47:18] <Nodex> can you not assign a bounding box around a crime and see if that intersects?
[15:47:27] <Nodex> you will -probably- get better performance
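A hedged sketch of that idea, assuming the roads are stored as GeoJSON LineStrings under a 2dsphere index and a small box is built around each crime point (collection and field names, box size, and coordinates are hypothetical):

```python
import pymongo

db = pymongo.MongoClient()["routing"]
db.roads.create_index([("geometry", pymongo.GEOSPHERE)])

lng, lat, d = -75.1652, 39.9526, 0.0005    # example point and half-width of the box
box = {
    "type": "Polygon",
    "coordinates": [[
        [lng - d, lat - d], [lng + d, lat - d],
        [lng + d, lat + d], [lng - d, lat + d],
        [lng - d, lat - d],                # ring must be closed
    ]],
}
nearby_roads = db.roads.find({"geometry": {"$geoIntersects": {"$geometry": box}}})
```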
[15:48:36] <pringlescan> Okay, so that makes 250,000 bounding box queries on 15,000 roads.
[15:49:37] <Nodex> do you have to do all at once ?
[15:49:45] <Nodex> or can it be queued and pushed back out?
[15:50:10] <pringlescan> The road network is calculated ahead of time from an .OSM file, so it all has to be done at once, no changes can be pushed out without recalculating the entire road network
[15:50:29] <pringlescan> and a changeset for an OSM file includes the entire way, you can't just add a node for the crime that intersects with a road, you have to add a new node to the road
[15:50:55] <pringlescan> if 10 crimes occur at one point, I have to add nodes next to each other, spaced the right distance apart so that the precision rounding the routing engine does doesn't drop them as duplicate points
[15:51:52] <Nodex> perhaps you can aggregate it down a little
[15:52:10] <Nodex> i.e. you don't need to store 10 crimes if they're all within 10 meters of each other - one will suffice
[15:52:21] <Nodex> you only need to know to "avoid" that poi
[15:55:41] <pringlescan> Nodex, so one crime counts as much as 10 crimes at that point
[15:56:10] <pringlescan> by using multiple nodes at the points on the road closest to where the crime occurs, walking down a bad part of a street is different than a good part of the same street
[15:56:16] <pringlescan> whereas if you assign an aggregate it's not very accurate
[15:56:24] <Nodex> a point is a point - you only need to know to avoid it, not what happened
[15:56:42] <pringlescan> yes, but there are 620k crimes, and only 250k points where they happened
[15:57:11] <pringlescan> each crime_point (250k) is placed as many times as there were crimes there, spaced far enough apart that OSRM doesn't drop it as a dupe node
[15:57:45] <Nodex> that's a little inefficient, no?
[15:58:47] <Nodex> surely the objective is to get the streets (or parts of the streets) to avoid, rather than accuracy down to the meter
[15:59:03] <Nodex> which will mean fewer queries, resulting in better performance
[16:01:33] <pringlescan> there are no queries to the DB
[16:02:56] <pringlescan> each way has a list of all neighborhoods and census blocks it intersects with
[16:03:03] <alcuadradoatwork> If I have an index on field f, which is always present, a find({f: val}).count() should be relatively fast on 300k docs, right?
[16:28:14] <Nodex> there is a ticket for it somewhere
[16:28:19] <mansoor> Is there a tool to help with restructuring existing data in the DB? I want to take child documents out of the documents in a collection and give them their own collection
[16:28:38] <mansoor> I should write a script to do this?
[18:56:05] <kaoD> ValueError: 'z' is a bad directive in format '%a %b %d %H:%M:%S %z %Y'
[18:57:07] <vince_prignano> kaoD: well, I found the right thing for you: http://stackoverflow.com/questions/7703865/going-from-twitter-date-to-python-datetime-date
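The linked answer boils down to something like the following (a hedged sketch, not a verbatim copy): Python 2's strptime does not support the %z directive, so email.utils is used to parse the UTC offset in Twitter's created_at format instead:

```python
from datetime import datetime
from email.utils import mktime_tz, parsedate_tz

created_at = "Sat Apr 20 14:34:02 +0000 2013"   # hypothetical tweet timestamp
dt = datetime.utcfromtimestamp(mktime_tz(parsedate_tz(created_at)))
```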
[18:59:23] <theRoUS> any idea what causes this and how to correct/work around it? the database in question is essentially empty http://pastie.org/7703583
[19:04:39] <kaoD> vince_prignano: oh that's awesome! thanks a lot!
[19:23:38] <vince_prignano> kaoD: are you working with lots of users? If so, how did you manage the users? By collection or by documents?
[19:24:13] <kaoD> nope, this is a single user, I'm just analyzing a bunch of Tweets
[19:24:49] <vince_prignano> kaoD: I see. I'm stuck with that problem :/
[19:25:58] <scoates> Posted this to mongodb-user with no responses yet. Hoping this might get some additional eyeballs: http://stackoverflow.com/questions/16177478/why-is-mongodb-not-using-the-right-index
[19:50:46] <appel> Hi. I love MongoDB. I'm building a web app using AngularJS and I was looking at either running MongoDB on my own server or using mongolab.com, which one do you think is better? And also... I'm having trouble deciding on whether the web app client should connect directly to MongoDB or whether it should connect via a backend service, e.g. a Node REST API as a proxy. Any suggestions?
[19:51:15] <Derick> scoates: trying to figure it out now
[19:52:15] <Derick> scoates: so far the best suggestion is: it can be made much better by never storing from_backup unless it's true. Then the $in goes away, and it becomes just a null equality check.
[19:53:20] <scoates> Derick: how would I query it in that case?
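If from_backup is only ever written when it's true, the "not from backup" side becomes a single equality check, since {field: None} matches documents where the field is null or missing; a pymongo sketch with a hypothetical collection and extra filter:

```python
not_from_backup = db.messages.find({"from_backup": None, "owner": owner_id})
from_backup = db.messages.find({"from_backup": True, "owner": owner_id})
```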
[19:59:50] <scoates> app changes + back[un]filling. and I can't just unfill without an index on this many records…
[20:00:19] <Derick> you can do it 10 every minute :-)
[20:04:45] <Derick> scoates: btw, I disagree that the "hint" shows it is better, as it also adds "scanAndOrder", which is an in-memory post-find sort with memory limitations. It says it is *not* using the index for sorting with that.
[20:05:15] <scoates> `millis` is consistently lower, and nscanned is lower.
[20:05:51] <scoates> I hadn't considered the scanAndOrder bit, but that should make millis higher, not lower, no?
[20:08:27] <Derick> scoates: it should - but different resources
[20:08:40] <Derick> maybe you just found a case where you'd need hint
[23:03:15] <hallas> VooDooNOFX yes, but I have to match on all values in the array against the subdocuments
[23:03:34] <hallas> I'd have to split the array of objects into an array for every value and use $in right?
[23:03:51] <hallas> The problem is I'm comparing many against many
[23:06:23] <VooDooNOFX> And why won't elemMatch work for you? $elemMatch returns all documents in the collection where the array satisfies all of the conditions in the expression
[23:11:24] <VooDooNOFX> Oh, I see. you have many potential matches in your condition_array. Are you using a straight find, or the aggregation framework?
[23:16:53] <VooDooNOFX> That's a different result than your original problem statement, which was needing it to match at least 1 subdocument. This will require it to have a subdocument that matches all of the items in aArray AND bArray
[23:17:50] <hallas> hmm, my original problem was that it has to match just one object in the array, but that one object has to match both values at the same time
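For reference, the distinction being discussed: with $elemMatch a single array element has to satisfy every condition at once, while dotted conditions can be satisfied by different elements (collection and field names are hypothetical):

```python
# One element of "items" must have both a == 1 and b == 2:
db.coll.find({"items": {"$elemMatch": {"a": 1, "b": 2}}})

# Here a == 1 and b == 2 may come from two different elements:
db.coll.find({"items.a": 1, "items.b": 2})
```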