[00:01:18] <Oddman> Followers collection, with the follower id and the followee id
[00:01:48] <jrxiii> hadees: queries like users.find({followers : 'jrxiii'}) would get crazy slow over time
[00:01:54] <Oddman> I mean if you're talking hundreds of thousands of followers, and you're worrying about document size and/or limits, then just separating out to a collection would work
[00:02:29] <Oddman> but really, I wouldn't worry too much until you hit those snags
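A minimal sketch of the separate-collection approach Oddman describes, in the mongo shell; the collection and field names here are only illustrative:

    // one document per follow relationship
    db.followers.insert({ follower_id: "jrxiii", followee_id: "hadees" });

    // index both directions so "who follows X" and "who does X follow" stay cheap
    db.followers.ensureIndex({ followee_id: 1 });
    db.followers.ensureIndex({ follower_id: 1 });

    // everyone following hadees, without touching a growing array on the user doc
    db.followers.find({ followee_id: "hadees" });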
[00:04:06] <hadees> is there a good way to monitor that? like an alert if my document sizes get too big?
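There is no built-in size alert, but a rough sketch of a periodic check you could run in the mongo shell (Object.bsonsize is a shell helper; the 12MB threshold is just an example below the 16MB cap):

    // print any user document that is getting close to the 16MB limit
    db.users.find().forEach(function (doc) {
        var size = Object.bsonsize(doc);
        if (size > 12 * 1024 * 1024) {
            print("large doc " + doc._id + ": " + size + " bytes");
        }
    });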
[00:51:39] <planas> I have not been able to get php to work with Ubuntu 12.04 after editing the php.ini file
[01:08:19] <hdm> somewhat random performance question; given 4 spinning disks, would it be more or less efficient to raid-0 them or run 4 shards each with one disk, if the machine has godly amounts of ram and processor cores? (256G ram, 16 cores)
[01:09:07] <hdm> guessing disk overhead is higher with 4 shards, but processing and i/o would be 4 x faster with shards
[02:05:29] <niriven> hi, if i find myself making very large documents (well, small at first in mongo, then they get bigger, and bigger, etc) should i be breaking them apart?
[02:06:16] <niriven> eg, users -> events. events might get massive. should i make a users collection and an events collection and relate them in code instead of users containing events (which might get large, eg. a large document)
[02:08:54] <mrpro> depends if you grab all user events every time
[02:14:17] <mrpro> or maybe…a doc for each month's worth of events? i dont know niriven
[02:15:29] <niriven> mrpro yeah i have one doc per event, and one doc per user, but if i store all the user's events in the user doc, it'd probably be bigger than the 16mb limit or whatever mongo allows.
[02:26:19] <Oddman> how many users are we talking per event?
[02:26:32] <Oddman> cos it might be better to have an events collection, with a subset of the user data embedded on the events collection
[02:26:32] <niriven> Oddman: sorry, each event has one or no user, wrong statement...
[02:26:34] <Oddman> rather than the other way around
[02:27:05] <Oddman> imho, and again this is pretty shallow considering I don't know the extent of your requirements - but it sounds like events should be its own collection, with a user_id
[02:27:27] <niriven> Oddman: event and user have an id that can relate (i know, bad word here!). so, my plan was to have a users and an events collection, find the users im interested in, then find the events im interested in from there
[02:34:25] <niriven> details tho are: 1000 users with some information in each user doc, 151 million events, and 6 million of those relate to a valid user. right now i have a users collection, plus events_assigned and events_unassigned, since all the queries target events that have valid users, and unassigned events might move into the assigned collection later if i find a user. from there i'll find all users that fit some profile, then look at their events.
[02:39:48] <Oddman> bah, don't understand the context. haha
[02:40:50] <niriven> 151 million events captured. each event might or might not relate to one user in my db (there are 1000 of them), and there are 6 million with a user id, so thats 6 million / 1000 :)
[02:44:17] <niriven> so hmm maybe its better to have two collections: users (each with on average 6000 events), and events for the ones that don't tie to users
[02:46:26] <niriven> but if my document in mongo gets bigger than 16mb, thats not a prob? i know i can't send a doc over 16mb, which means i can't just save a full user doc if it exceeds 16mb
[02:47:23] <Oddman> I think what you'd best do here - is get all the events into an event collection
[02:48:26] <niriven> tried that, didnt work. had to at least split out events that are assigned to a user from ones that are not, since i don't care about complex queries for the majority of the events (those that are not assigned to users). so indexing events i'm not interested in is a waste :)
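A minimal sketch of the split niriven ends up with, in the mongo shell; the profile filter and field names are made up for illustration:

    // users and assigned events live in separate collections, related by user_id
    db.users.insert({ _id: 42, name: "someone", plan: "pro" });
    db.events_assigned.insert({ user_id: 42, type: "login", at: new Date() });

    // only assigned events get the index, since only they are queried by user
    db.events_assigned.ensureIndex({ user_id: 1 });

    // find the interesting users first, then pull their events
    var ids = db.users.find({ plan: "pro" }, { _id: 1 }).map(function (u) { return u._id; });
    db.events_assigned.find({ user_id: { $in: ids } });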
[07:45:40] <Gargoyle> Is it possible to have an index changed without creating a new one if the ensureIndex doesn't match? by specifying an index name explicitly?
[07:47:03] <Gargoyle> NodeX: Assume I have ensureIndex('col':1)
[07:47:40] <Gargoyle> Then I want ensureIndex('col':1, 'another':1), but I no longer want to keep the other index.
[07:48:41] <Gargoyle> However, If possible, I don't want to have a separate script - I just want the ensureIndex() statement as part of my db access code for that collection.
[07:54:33] <Gargoyle> And if you really want a friday morning brain teaser, any thoughts on how I can make this faster? http://pastebin.com/RZjh2mh5
[07:55:46] <oxman> it's only to give you an answer :)
[07:56:59] <Gargoyle> NodeX: Kind of. Rather than having a separate "database maintenance script", if the overhead of ensureIndex() is minimal when the index already exists, I was thinking about just having ensureIndex calls inside the class that queries that collection.
[07:57:41] <NodeX> the overhead depends on the size of the collection
[07:58:30] <Gargoyle> So if you have a large collection, calling ensureIndex() unnecessarily would be a bad idea?
[08:01:11] <Gargoyle> Is there a fast way to check for an index? (Trying to make code that sets up the db automatically, without requiring an "install" as such)
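One rough way to do that check in the mongo shell is getIndexes(), which just reads the collection's index definitions; the collection and index names below are hypothetical:

    // only build the index if one with this name is missing
    var exists = db.mycol.getIndexes().some(function (ix) {
        return ix.name === "col_1_another_1";
    });
    if (!exists) {
        db.mycol.ensureIndex({ col: 1, another: 1 });
    }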
[08:14:26] <Gargoyle> Ok. removing the exists helped a bit.
[08:17:17] <Gargoyle> If I am checking type, does that column still need to be in the index?
[08:22:02] <Gargoyle> Final one on indexes for now. If the results are being sorted, does the sort column need to be in the same index as the search params?
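The usual guideline is that a sort can only use an index if the sort field sits in the same compound index, after the equality fields; a small sketch with made-up field names:

    // for find({ type: "news" }).sort({ created: -1 }), this index lets mongod
    // walk the entries in order instead of sorting the results in memory
    db.items.ensureIndex({ type: 1, created: -1 });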
[09:56:51] <Gargoyle> Any PHP users with a free 10 mins, I would appreciate any feedback you might have on: https://github.com/gargoyle/MongoSession/blob/master/MongoSession.php
[10:16:07] <NodeX> mongo is a fire and forget approach
[10:16:19] <NodeX> first in the queue wins then the next
[10:16:29] <Gargoyle> PHP normally does it using flock on the filesystem
[11:37:32] <noordung> I have a question on Mongoose... If a document is new, I cannot call populate, but I need the populatable-field to be here. How can I do this?
[11:38:42] <noordung> If the document is new, I'd generally have an ObjectId, instead of a full document...
[11:39:02] <noordung> So, there is no way I can fetch that object without knowing what is in ref
[11:43:32] <noordung> Ah, but I can inspect the schema! :D
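A rough sketch of that schema-inspection idea in Mongoose (the model and field names are hypothetical): the schema path records which model a ref points to, so the referenced document can be fetched by id even though the new document was never populated:

    var mongoose = require('mongoose');
    var Schema = mongoose.Schema;

    var User = mongoose.model('User', new Schema({ name: String }));
    var Post = mongoose.model('Post', new Schema({
        title: String,
        author: { type: Schema.Types.ObjectId, ref: 'User' }
    }));

    // a brand new document only holds the ObjectId, not the full author
    var post = new Post({ title: 'hi', author: new mongoose.Types.ObjectId() });

    // the schema itself says which model that id refers to
    var refName = Post.schema.path('author').options.ref;   // 'User'
    mongoose.model(refName).findById(post.author, function (err, author) {
        // author is the full referenced document, no populate() needed
    });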
[12:43:12] <Seidr> Heya, I'm doing a mapreduce on a dataset which results in an object (containing data like count, deviation values and various other bits)..is it possible to filter on the values within this object when doing a find on the resulting collection? I seem to be getting no results back - would I need to run a second mapreduce run on this data? Cheers for any advice =)
[13:00:58] <Seidr> Ahh, figured it out! :) I had to use dot-notation to reach into the inner document. Huzzah!
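For reference, mapReduce stores each reduced result under a "value" field, so the filter has to reach into it with dot notation; a sketch with made-up field names:

    // output docs look like { _id: ..., value: { count: 12, deviation: 0.4 } }
    db.mr_results.find({ "value.count": { $gt: 10 } });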
[13:37:20] <Gizmo_x> hi, i need an example of pagination in mongodb with php. for example, when i go to a page in the middle of the pages (say i have pages from 1 to 1000), using skip()->limit() will be slow as i hear. how can i implement fast paging?
[13:52:20] <noordung> Gizmo_x, you can employ 'fake' pagination, by chunking... create a document with an array of objectids which will hold a specified amount...
[13:54:07] <Gizmo_x> noordung: do you have any exaple that i can see?
[13:54:31] <noordung> Gizmo_x, no, not really... you can implement it yourself pretty easily
[13:54:56] <Gizmo_x> noordung: what i will do with that array of ids?
[13:56:35] <noordung> Well, lets assume that a chunk is a document with an array of object ids, and holds exactly 100 objectids. You can even call this chunk a 'page'. Assign an index property to it which will identify the page number. Then findOne by this index, and load all of the documents referenced by the ObjectIds in the array...
[13:57:01] <noordung> And voila, you have pagination...
[13:57:26] <noordung> (I think a similar method is described somewhere in the docs...)
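A rough sketch of that chunking scheme in the mongo shell, with hypothetical collection names and a chunk size of 100:

    // one "page" document per 100 ids, found directly by page number
    db.pages.insert({ page: 1, ids: db.items.find().sort({ _id: 1 }).limit(100).map(function (d) { return d._id; }) });
    db.pages.ensureIndex({ page: 1 });

    // jump straight to a page, then load the documents it references
    var chunk = db.pages.findOne({ page: 1 });
    db.items.find({ _id: { $in: chunk.ids } });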
[13:58:58] <NodeX> that's more expensive than paging normally
[13:59:30] <Gizmo_x> noordung: so for every pageSize i will have a new document in the collection, and if i have 500 pages i have to create 50 of those documents with the id arrays? because the page size will be 10
[14:02:44] <Gizmo_x> lets say 100 000 document or 1 000 000
[14:03:15] <NodeX> why would you ever want to go past page 100 is the question you want to answer
[14:04:03] <Gizmo_x> NodeX: i think that too, but lets say im an idiot and i want to find out if the app has bugs, or if i can make the app crash or spam it
[14:04:06] <NodeX> personally if you're going past page 5 then something is wrong with the query because the result(s) should've been found by then
[14:04:07] <noordung> NodeX, maybe implement an index property on every document, and then find on a lowerRange < index < upperRange?
[14:04:53] <noordung> NodeX, conventional logic tells me it would be faster, especially if you employ indexing on the index property...
[14:04:59] <NodeX> Gizmo_x : an expensive part of the page is the count() so firstly you need to cache that
[14:05:46] <Gizmo_x> NodeX: hmm but if i cache it i will never have the real count in real time right?
[14:06:02] <noordung> NodeX, where index is auto-incremented...
[14:06:04] <NodeX> it depends how write heavy your data is
[14:06:17] <NodeX> noordung : the index is not the problem
[14:07:46] <Gizmo_x> NodeX: you suggest to use only prev and next links for pagination? that way i won't need the last page from count(), and i can use ranges if that is faster than skip()?
[14:08:43] <NodeX> the problem lies with where your documents live on disk/ram.. the first few pages will more than likely live next to each other, the rest could require massive disk seeks
[14:09:14] <NodeX> if you need pages all the way to the end then you should really throw a cache in the middle
[14:10:07] <NodeX> you should certainly cache the count because that is expensive
[14:10:18] <NodeX> invalidate it when a write happens that changes results
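A sketch of the range-based prev/next idea, which avoids skip() entirely by remembering the last _id shown (any indexed, ordered field works the same way):

    // first page
    var page = db.items.find().sort({ _id: 1 }).limit(10).toArray();
    var last = page[page.length - 1]._id;

    // next page: start after the last _id we showed; no skip(), no scan
    db.items.find({ _id: { $gt: last } }).sort({ _id: 1 }).limit(10);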
[14:13:13] <Gizmo_x> NodeX: this noSQL stuff is more complicated when you get in deep. a little offtopic: which db is faster in this situation using offset and limit, sql or nosql - mongodb vs mysql?
[14:14:28] <NodeX> it's been so long since I used MySQL I couldn't say tbh
[14:14:46] <NodeX> I dont have paging problems but I also dont try and page a million documents
[14:15:40] <Gizmo_x> NodeX: im trying to work out whether i'll need that option in the future to list a lot of documents, that's why im calculating now how to optimise my db for it
[14:18:04] <NodeX> if you need to page from 1 to 10,000 you're going to have to accept that it's an expensive operation
[14:19:13] <Alir3z4> What web-framework is out there that works very well and is integrated with mongodb?
[14:25:59] <NodeX> right, they are LANGUAGE frameworks
[14:26:14] <NodeX> and you should avoid them like the plague because they all carry bloat
[14:26:15] <Quantumplation> Are there any known issues with using both sort and limit on a mongo query? On my local machine, the code behaves perfectly (selects the items in descending date, so the newest items, then limits the results to 50), however once I deploy (same code, same database), it's selecting some number of items, limiting them to 50, then sorting those items by descending date.
[14:26:27] <Alir3z4> Gizmo_x: What about python or Java ?
[14:26:41] <Gizmo_x> Alir3z4: no idea about them, im php dev
[14:27:11] <NodeX> http://www.mongodb.org/display/DOCS/Java+Tutorial <---- 2 seconds in google
[14:28:16] <NodeX> Quantumplation : can you pastebin the query?
[14:29:41] <Alir3z4> NodeX: If you look at my first question you will see i'm looking for a web-framework that is really well integrated with mongoDB. A lot of these web-frameworks around will lose some of their main functionality when going to work with MongoDB and other NoSQL data[base|store]s
[14:30:07] <Alir3z4> NodeX: That's because they're created to work with SQL based databases, and yeah!
[14:30:53] <NodeX> If they're created to work with SQL databases then dont use them
[14:31:27] <NodeX> most of the discussion with any sort of framework in this chan revolves around the drivers themselves
[14:33:50] <Quantumplation> On my local machine, and/or the deployed machine?
[14:33:52] <Alir3z4> NodeX: You have to do that if you want to use that framework, but designing the database model for both nosql and sql will be damn confusing
[14:34:23] <Alir3z4> Gizmo_x: Seems it works with the Play framework also, let me check it out and try it for a couple of hours
[14:34:34] <NodeX> why would you design a model for both?
[14:36:49] <NodeX> sorry, I don't do these trendy buzz words
[14:36:52] <Quantumplation> Sure, getting it from the deployed server is going to take me a minute though, hold on. In the mean time, here's from my local machine: http://pastebin.com/wFWTipQp
[14:37:11] <NodeX> Quantumplation: I mean the query
[14:43:42] <NodeX> I go on the idea that the quickest path is that of least resistance, so why would anyone ever want to put frameworks, ORMs etc in the way and cause friction
[14:44:29] <Gizmo_x> NodeX: it's easier to manage the db? because there are standards and team integration in enterprise projects
[14:44:58] <Gizmo_x> NodeX: it's a performance issue but team work is more important in big projects
[14:45:07] <TTimo> hello. I am using mongoengine/pymongo .. are there tools to log/time the mongodb traffic?
[14:45:11] <Alir3z4> NodeX: The same reason why folks use GUI libraries to create desktop applications
[14:45:33] <TTimo> e.g. similar to logging ORM queries in Django when working against a traditional RDBMS
[14:45:35] <NodeX> Gizmo_x : I dont work in big teams so that's probably why
[14:46:03] <Gizmo_x> NodeX: ok, just discussing, don't take it personally please
[14:51:31] <Alir3z4> No i'm not saying forget about performance
[14:51:50] <NodeX> getting the user in and out of your stack in the fastest time is key
[14:51:58] <Quantumplation> NodeX: I can't seem to distill out exactly what the C# drivers are querying, but if I were to type it manually it'd be http://pastebin.com/9P1JY4Rc. Without knowing exactly what query C# is running though, that's not much help.
[14:51:58] <Alir3z4> I'm saying that i shouldn't sacrifice everything else for performance
[14:55:32] <Quantumplation> NodeX: I'm not sure. All I know is that the results returned when i run it on my local machine are correct, but the results returned when it's run from my deployed environment are incorrect.
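For what it's worth, when both are given on a cursor the server applies the sort before the limit regardless of call order, so a query of this shape (the field name is illustrative) should return the 50 newest documents on either machine:

    db.items.find().sort({ date: -1 }).limit(50);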
[14:56:07] <Alir3z4> And of course desktop/mobile apps
[14:56:18] <Alir3z4> But mainly i like to work with mongo for the web
[14:56:27] <ppetermann> and when you end up serving enough concurrent requests, you will realize that performance is an integral part of whats called scalability
[15:15:14] <airportyh> I saw this page titled Aggregation http://www.mongodb.org/display/DOCS/Aggregation and I assumed it was the page for aggregation framework
[16:26:51] <diegoviola> i'm trying to build an ajax/websocket thing that will retrieve changes in a collection every time there's a change in that collection; any ideas how i can accomplish this from the mongodb side?
[16:27:00] <diegoviola> how do i know when there's a change in a collection, etc
[16:40:47] <NodeX> diegoviola : you'll have to sort that out in your app layer
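One common app-layer approach (mongodb has no triggers) is to write a marker into a capped collection on each change and follow it with a tailable cursor; a sketch in the mongo shell, with hypothetical names:

    // tailable cursors require a capped collection, which preserves insertion order
    db.createCollection("changes", { capped: true, size: 1048576 });

    // the app layer records each interesting write alongside the real one
    db.changes.insert({ coll: "posts", op: "save", at: new Date() });

    // behaves like `tail -f`: blocks and yields new documents as they arrive
    var cur = db.changes.find()
        .addOption(DBQuery.Option.tailable)
        .addOption(DBQuery.Option.awaitData);
    while (cur.hasNext()) {
        printjson(cur.next());   // push this out over the websocket/ajax channel
    }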
[17:22:18] <EatAtJoes> I installed this sonata admin bundle distro, and the demo page has no css applied. Is that normal?
[18:39:38] <alyman> Is there any way to provide a $hint for the findAndModify command?
[20:14:02] <diegoviola> i'm working with some code that saves data to a mongodb database, and i want to show notifications with websockets or ajax when data is saved in a collection... but i don't think mongodb supports triggers or anything like that, what do you guys recommend?
[21:12:08] <crudson> let me gitify it and put on github
[21:21:31] <crudson> diegoviola: I have to pull together a few repos and blog posts to make it nice and pretty. pm your email address and I will send you a msg in next couple days.