[00:08:48] <_m> LouisT: Have you narrowed this down to the mongo module itself? While I'm not a node user, every time I've experienced something run-away like this it has been programmer error (i.e. pulling in too much of a collection, forgetting to remove a testing loop, etc.)
[00:09:35] <_m> That said, I have no experience with that module but would be willing to check your Mongo-specific bits.
[00:09:52] <LouisT> _m: the only time i've ever seen the issue is when i use collection.update()
[00:11:21] <_m> Are you invoking the update with a large number of records? In safe mode? Are you firing updates in a batch or one-by-one?
[00:13:26] <LouisT> _m: each update looks like this: { '$set': { id: '123', name: 'foo', ally: '321', villages: '123', points: '123456', rank: '1' } } -- and i'm using safe mode and it's one by one, i'm not sure how to do it with multiple at this time
[00:17:19] <_m> Perhaps this is helpful: https://github.com/mongodb/node-mongodb-native/issues/526
[00:17:28] <_m> Unsure if that addresses your specific case though.
[00:19:22] <_m> You might also make sure your GC is operating as expected: node --trace-gc my_app.js
[00:19:47] <_m> And a blog post related to the GC: http://blog.caustik.com/2012/04/11/escape-the-1-4gb-v8-heap-limit-in-node-js/
[00:25:34] <LouisT> _m: ok, thanks, i'll look into it
[00:30:20] <LouisT> _m: yea there are clear differences, http://thepb.in/505a62bb88b0853b23000005 without collection.update() and with http://thepb.in/505a631c88b0853b23000006
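Not from the chat, but relevant to LouisT's one-by-one safe-mode updates: if every update is fired immediately, thousands of pending callbacks and their closures can pile up in memory. A minimal sketch of capping the number of in-flight updates, where `update(doc, cb)` is a hypothetical stand-in for `collection.update(selector, change, { safe: true }, cb)`:

```javascript
// Sketch (an assumption, not LouisT's code): run at most `limit` updates at once.
// `update(doc, cb)` stands in for collection.update(selector, change, {safe: true}, cb).
function updateAll(docs, update, limit, done) {
  if (docs.length === 0) return done(null, 0);
  var next = 0, active = 0, finished = 0;
  function launch() {
    // Top up the in-flight pool without ever exceeding the limit.
    while (active < limit && next < docs.length) {
      active += 1;
      update(docs[next++], function () {
        active -= 1;
        finished += 1;
        if (finished === docs.length) return done(null, finished);
        launch(); // a slot freed up; start the next update
      });
    }
  }
  launch();
}
```

With a limit of, say, 5-10, memory stays bounded no matter how many documents are queued, and safe mode still confirms each write.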
[01:07:25] <idiotb> guys I have a design doubt: Let's say I have Companies and Keywords, with a many-to-many relationship between them. There are around 1 lakh (100,000) companies (adding ~200 daily) and 300 keywords (adding ~30 weekly). Now at any point I want to select two keywords and get a list of companies. What is the best way to design this database? Appreciate any help, thanks!
[01:08:24] <idiotb> ***Now at any point if I select two keywords I want the list of common companies. What is the best way to design this database? Appreciate any help, thanks!
[01:15:22] <Oddman> idiotb, have a nested document or an array on the companies table that specifies the keyword ids
[01:16:48] <Oddman> because then you can do a query like: db.companies.find({"keywords.id" : 517})
[01:17:00] <Oddman> obviously need to index that, as well
[01:17:16] <idiotb> Oddman: could you explain a little bit? I am new to DB's
[01:29:47] <VooDooNOFX_> One option is to store the keywords with the company.
[01:30:06] <VooDooNOFX_> This can get large if you have thousands of keywords for each company. Indexes will grow unbounded.
[01:31:25] <idiotb> I don't think that is such a good approach. Scaling issues
[01:31:53] <idiotb> Brute Force way requires me to query it 300! times. So I don't want to go that way
[01:32:29] <VooDooNOFX_> idiotb, that's not entirely true. Show me all companies that have a certain keyword is quite fast. (if you know the keywords ahead of time).
[01:33:21] <idiotb> Keywords are added by users. So knowing them ahead of time is not an option
[01:33:24] <VooDooNOFX_> discovering keywords can be done in a cron, say, daily or whatever with a distinct on company.keyword. Alternatively, you can store them in another table as you add them (which is the 2 table (many <-> many) approach)
[01:34:17] <VooDooNOFX_> That's essentially what Oddman recommended, storing the keyword in an array or nested document
[01:34:54] <idiotb> yup and then creating another table with company_id | keyword_id and unique records of them. How about that?
[01:36:06] <idiotb> VooDooNOFX_: its not right approach?
[01:36:15] <VooDooNOFX_> A third table isn't required here since we can store an array of values in mongo, not just a single value like in an rdbms.
[01:36:59] <VooDooNOFX_> each company will have a property ('keywords') which contains a list of keyword ids they have. This will be a direct lookup to the keywords table.
[01:37:38] <VooDooNOFX_> alternatively, you can just store the keyword with the company, and the lookup on read isn't necessary.
[01:38:22] <VooDooNOFX_> This approach will require you to store a distinct list of keywords in another table, and essentially store the keyword twice but has the distinct advantage of a single lookup per document (fast)
[01:38:46] <idiotb> I think lookup makes sense when Keyword in itself have some more properties.
[01:38:46] <VooDooNOFX_> that's what i'd do. denormalize the keyword straight into the company, and store the list of unique keywords in another table.
[01:39:04] <VooDooNOFX_> idiotb, sure. Depends on how much extra data you're storing.
[01:39:09] <VooDooNOFX_> You can do nested documents.
[01:39:27] <VooDooNOFX_> Outer document is Company which has an array of documents (keywords), each with several properties, etc
[01:40:05] <idiotb> Now, when I select two keywords, How do I get common companies?
[01:46:58] <idiotb> Oddman: I have one final doubt.
[01:48:09] <idiotb> I got your design. Now, I know how I can get keywords for companies. But, I want to know, how do I get common companies for selected keywords? QUERY?
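An answer idiotb's last question never got in the log: with keyword ids stored in an array on each company (Oddman's design), the shell query for companies matching *all* selected keywords is `$all`. A sketch with hypothetical collection and field names, showing the same matching semantics in plain JavaScript:

```javascript
// Shell form (hypothetical names): db.companies.find({ keywords: { $all: [2, 7] } })
// $all matches documents whose array field contains every listed value.
var companies = [
  { name: 'Acme',    keywords: [2, 5, 7] },
  { name: 'Globex',  keywords: [2, 3] },
  { name: 'Initech', keywords: [7] }
];
var selected = [2, 7];
var common = companies.filter(function (c) {
  return selected.every(function (k) { return c.keywords.indexOf(k) !== -1; });
});
// Only Acme carries both selected keywords here.
```

An index on `keywords` makes this a multikey-index lookup rather than a full collection scan.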
[01:56:48] <IAD> and manual on the mongo site is good
[02:29:22] <jtomasrl> does an object with an array of many items inside affect performance even if i filter out that key from the result?
[03:27:29] <ohhi> would mongoDB be a candidate for storing CRM style data and answering complex queries where we would stack more and more predicates together to form a query?
[03:28:24] <ohhi> problem is the "CRM style data" is unique to each potential user of the system, so that sounds like the "schema-less" nature of mongo
[03:32:53] <Oddman> CRMs generally have a ton of related data, which document based systems aren't as good at - it can be done, but they're better with flatter systems
[05:36:50] <jgornick> Hey guys, when using the aggregation framework, is it possible to unwind an embedded collection without keeping the field containing the embedded collection? For example, I have a field on my document called "channels", and channels is an array of embedded documents. What I want to do is perform a match on those channels without having to prefix each condition with "channels.*".
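One possible approach (an assumption, not confirmed in the log): `$unwind` the array, then `$project` the subfields to the top level so the `$match` can use bare names. The shell pipeline is sketched in the comment, with the equivalent unwind/project/match semantics shown in plain JavaScript over sample data:

```javascript
// Shell sketch (hypothetical field names):
//   db.docs.aggregate([
//     { $unwind: "$channels" },
//     { $project: { type: "$channels.type", name: "$channels.name" } },
//     { $match: { type: "email" } }
//   ])
// The same unwind -> project -> match semantics in plain JS:
var docs = [
  { _id: 1, channels: [{ type: 'email', name: 'a' }, { type: 'sms', name: 'b' }] },
  { _id: 2, channels: [{ type: 'sms', name: 'c' }] }
];
var unwound = [];
docs.forEach(function (d) {
  d.channels.forEach(function (ch) {
    // $project step: lift the subfields to the top level
    unwound.push({ _id: d._id, type: ch.type, name: ch.name });
  });
});
var matched = unwound.filter(function (r) { return r.type === 'email'; });
```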
[09:35:33] <NodeX> anyone use graphing db's here - NOT neo4j
[12:00:50] <ezakimak> is there any way to specify the name of the _id field in a collection?
[12:04:47] <Mmike> Hello, lads. Can I upgrade mongodb from 2.0.0 to 2.2.0 without dump/import? I'm running debian, is apt-get install mongodb-10gen enough?
[12:07:45] <tonny> Mmike: don't think it will give any problems upgrading, unless you're using auth
[12:08:22] <Mmike> tonny, I'm not. But I do have replSet on three boxes. So, i just do apt-get install, secondaries first, primary last?
[12:08:46] <tonny> upgrade applications (mongos first) then config then your sets
[12:09:01] <tonny> also did this a week ago from 2.0.6 to 2.2.0
[12:09:17] <ezakimak> what's better, to define a Person with a nested roles { x: <x record>, y: <y record> } or a collection for x records and collection for y records, with x and y records having the person id as fkey?
[12:10:19] <tonny> mongos, config, secondaries and primary last
[12:10:23] <ezakimak> all roles being optional, and any combination possible, most only having none or one role
[12:11:42] <ezakimak> most of the activity will be against one of the roles, so seems inconvenient to have to always dip into a subobject
[12:11:57] <Mmike> tonny, I don't use mongos, just php, i'm planning to introduce mongos, but haven't had time for it yet :/
[13:07:07] <andatche> we have a replica set across 3 nodes running 2.0.3 - we're seeing two of the nodes regularly reporting the other (currently a secondary) as "down" like so - [rsHealthPoll] replSet info xxx.xxx.xxx.xxx:27017 is down (or slow to respond): DBClientBase::findN: transport error: xxx.xxx.xxx.xxx:27017 query: { replSetHeartbeat: "blah", v: 4, pv: 1, checkEmpty: false, from: "xxx.xxx.xxx.xxx:27017" }
[13:08:04] <andatche> the node in question is under no load, hasn't suffered any network issues and isn't short on IOPS, mongod seems responsive at the time the other nodes complain and it isn't logging anything to suggest an issue, a quick google suggests lots of other people seeing this with no real suggestions for a fix other than "upgrade" - can anyone offer any advice?
[13:08:21] <andatche> this bug seems to be sourced frequently - https://jira.mongodb.org/browse/SERVER-5313
[13:08:38] <andatche> however, the node in question isn't experiencing any network connectivity issues
[13:09:23] <andatche> bit stumped, the replica set in question has been fine for months, this behaviour began yesterday
[13:21:08] <matubaum> Hello, we have a document like this http://pastebin.com/qNdJdbD5 we are executing this query {"matias.a": {"$lt": 3}} having no results. But we do if we execute {"matias.0.a": {"$lt": 3}}.
[13:21:35] <matubaum> Is there anyway we can query something ignoring the position of the array.....?
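The pastebin isn't visible here, but assuming `matias` is an array of subdocuments, `$elemMatch` is the usual way to match an element regardless of its position. A sketch of the semantics on sample data:

```javascript
// Shell form, assuming matias is an array of subdocuments:
//   db.coll.find({ matias: { $elemMatch: { a: { $lt: 3 } } } })
// (For a flat array of subdocuments the dotted form "matias.a" should also
// match; if matias is instead an array *of arrays*, neither form reaches the
// inner level without naming a position, which would match the symptom above.)
var docs = [
  { _id: 1, matias: [{ a: 1 }, { a: 9 }] },
  { _id: 2, matias: [{ a: 5 }] }
];
var hit = docs.filter(function (d) {
  return d.matias.some(function (m) { return m.a < 3; });
});
```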
[14:19:37] <Null_Route> I guess I'll try this next, but anyone know offhand if ALL config data is synchronized between config servers between updates, or only incremental changes?
[14:20:01] <Null_Route> ...that is, can I start a config server with an empty config?
[14:26:51] <jiffe98> I have 2 machines setup in a replicated/sharded manner, I've got one database called mail and the mail.* files in the data directory amount to 129GB of data but there are also local.* files that amount to 695GB of data, what might be in those?
[15:03:16] <girasquid> If I have a collection with a field named 'outcome', and I would like to migrate the contents of that field into a field called 'outcomes' that's an array - is there a query I can run in mongo to do this, or will I need to write migration code?
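There's no single server-side query for this in 2.x; a shell `forEach` migration is the usual route. A sketch with a hypothetical collection name, plus the per-document transformation shown (and testable) on plain objects:

```javascript
// Shell migration sketch (hypothetical collection name "events"):
//   db.events.find({ outcome: { $exists: true } }).forEach(function (doc) {
//     db.events.update({ _id: doc._id },
//                      { $set: { outcomes: [doc.outcome] }, $unset: { outcome: 1 } });
//   });
// The per-document transformation, as applied above:
function migrate(doc) {
  var copy = {};
  for (var k in doc) {
    if (k !== 'outcome') copy[k] = doc[k]; // keep everything else as-is
  }
  copy.outcomes = [doc.outcome]; // wrap the old scalar in an array
  return copy;
}
```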
[15:06:18] <zastern> Is there a way to link separate mongodb clusters/sets of shards? E.g. I'd like to have three shards at one physical location, and three shards at another physical location
[15:11:08] <TTimo> zastern: just make sure they can reach each other then? I suspect sharding across far-away locations will lead to increased/unpredictable latencies?
[15:11:26] <TTimo> maybe what you really want, is the replica set for each shard spread over multiple locations
[15:24:54] <zastern> TTimo: I'm basically trying to have an apache cluster at one physical datacenter, backed by mongo, and an apache cluster at another physical data center, also backed by mongo, but the databases need to share the same datasets. theyll be serving the same sites, and user uploads to gridfs, etc, need to be synced
[15:26:33] <TTimo> zastern: afaik sharding doesn't mean you'll be able to write 'locally' at both locations. you will talk to a mongos, which will decide to which shard the write will go to, and that's based on the hashing/sharding rules, not on the proximity of the server
[15:28:46] <zastern> TTimo: yeah I figured that, I just don't know how to deal with the mongos being at only one of the two physical locations
[15:29:59] <TTimo> the best practice is to have the mongos with your business logic
[15:30:43] <TTimo> e.g. on your apache system .. if you're not using VPN, then it'll just route the calls over the regular internet ?
[15:32:04] <zastern> TTimo: you mean to have the actual web app handle distribution
[15:40:07] <TTimo> doesn't store anything etc. .. like NodeX said, a router
[15:40:36] <TTimo> the recommended setup is to have it running on the machine where you are running apache
[15:50:06] <Alir3z4> Wouldn't using S3 for uploading/holding files be a better option than GridFS, or am I misunderstanding the concept of this feature?
[16:00:16] <NodeX> facebook has to be the most retarded thing on the planet. They would not let us change (rebrand) our company page name if we had more than 200 likes, so we had to log in and delete over 1000 likes so we could, which I have no doubt caused far more load on their servers than a simple name change. Talk about idiotic
[16:06:17] <zastern> TTimo: right, but in this case I have *many* apaches
[16:06:38] <zastern> i have three apache instances now, and i expect to have 10-20 shortly
[16:07:46] <zastern> Alir3z4: we use gridfs because it's a lot cheaper than S3, and gives us similar functionality. but i guess you have to manage it yourself, and worry about capacity and load, etc
[16:14:50] <NodeX> we already changed the page name
[16:15:02] <NodeX> page name and username are 2 different things (which we didnt know)
[16:16:50] <Alir3z4> zastern: I mean if i want to use EBS for my data storage, i have to setup/manage many of them. And i guess EBS isn't cheaper than S3
[16:17:30] <NodeX> zastern : why so many apaches ?
[16:17:49] <zastern> NodeX: working on deploying an applicaiton that we expect to have as many as 500,000 concurrent users
[16:17:58] <zastern> its my first time doing something at this scale
[16:18:04] <zastern> so i might be dead wrong about what we need
[16:18:08] <zastern> just trying to figure things out
[16:26:52] <Gargoyle> I set apache to recycle after 65K requests, and it seems to have stopped segfaulting. But I ran a cli script earlier that segfaulted
[16:27:11] <Gargoyle> (It didn't on my machine with exact same data)
[16:27:14] <NodeX> ron : you're pulling my leg or I dreamt it
[16:27:29] <NodeX> Gargoyle : Apache is weird, it does stupid things all the time
[16:27:29] <ron> NodeX: must be some weird-ass dream
[16:27:36] <Gargoyle> Was hoping Derick would be around to tell me that valgrind stuff in case it's useful for him.
[16:27:54] <NodeX> Hopefully Derick is working on adding unix sockets back into the php driver
[16:28:12] <NodeX> he broke 8 of my production sites by "not knowing it was a feature"
[18:12:59] <Vile> NodeX: Maybe all of those guys will not "like" your new name? %)
[18:25:02] <chrslws> Hello all - question about sharding and breaking up a key into smaller ranges
[18:25:56] <chrslws> first of all, if i understand correctly, a shard key of { x: 1 }, where the application may assign values 0-9 to x, will have at most 10 ranges
[18:49:47] <rainerfrey> a question about "sharding": does mongodb also distribute a number of full non-sharded collections within a shared database (in the sense of putting some collections on one, some on other shard)?
[18:53:27] <rainerfrey> use case is: getting write concurrency without having to resort to scatter/gather in single queries on small(er) collections
[18:55:09] <rainerfrey> any pointer to relevant doc / presentation or anything would be helpful
[18:56:40] <rainerfrey> *sharded* database in my original question
[19:13:19] <Gargoyle> Did I imagine it, or do I remember reading somewhere that you are not supposed to do this type fo thing:- db.php_sessions.update({'session_id':'e0c91utg6fj7ogk209micohsbntdbo2m', lock:1},{$set: {lock:0}});
[19:14:13] <Gargoyle> Ie. changing a field with $set that is in the query?
[19:14:59] <_m> I'm not certain you require lock in that query. Your session_id fields are unique, correct?
[19:16:20] <Gargoyle> _m: But if the lock is not 1, then someone else has the lock and I should not modify the session.
[19:16:40] <Gargoyle> (Actually, it's the other way round, but I pasted the wrong line!) ;)
[19:17:47] <_m> Shared sessions sound more scary than your query ;)
[19:18:31] <Gargoyle> it's an attempt to prevent race conditions with ajax requests.
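For what it's worth, putting the field you `$set` inside the query is the standard compare-and-set idiom, not something to avoid: only the caller that matches the unlocked state gets to flip it. Shell form (direction as Gargoyle corrected), with the same semantics shown on a plain object:

```javascript
// Shell form (field names from the chat):
//   db.php_sessions.update({ session_id: sid, lock: 0 }, { $set: { lock: 1 } })
// Because the query includes lock: 0, concurrent callers race on one atomic
// document update and only one of them matches. The same logic on an object:
function tryAcquire(session) {
  if (session.lock !== 0) return false; // someone else holds the lock
  session.lock = 1;                     // matched: flip the lock atomically
  return true;
}
var s = { session_id: 'abc', lock: 0 };
var first = tryAcquire(s);  // acquires the lock
var second = tryAcquire(s); // loses the race
```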
[19:19:20] <whitaker> MacOSX users: I'm trying to upgrade my mongodb from v2.0.5 to v2.2.0 per today's announcement (I attended today's 10gen webinar on it) using both homebrew & ports, but it seems they're only at v2.0.6 for latest stable distro; anyone else here attempted that upgrade today?
[19:19:42] <Gargoyle> whitaker: did you "brew update" ?
[19:19:53] <Gargoyle> I installed 2.2 ages ago with homebrew
[19:21:14] <rainerfrey> same here, got 2.2 quite some time ago
[19:21:35] <rainerfrey> did you ever edit the mongodb formula locally?
[19:22:58] <rainerfrey> maybe do a git status in /usr/local (or wherever your homebrew installation is)
[19:23:01] <whitaker> Gargoyle & rainerfrey: I misspoke: I installed but didn't update; running "brew update" after install reveals an "updated formula" for mongodb; lemme see...
[19:31:36] <whitaker> Argh: "Warning: Your Xcode (4.1) is outdated
[19:32:07] <rainerfrey> does anyone have any experience building multi-tenant applications on top of mongodb?
[19:33:18] <rainerfrey> if so, how do you discriminate the data?
[19:33:35] <rainerfrey> a tenant key in every document in every collection?
[19:33:49] <rainerfrey> or per-tenant collections?
[19:34:08] <ag4ve> how do i do a find in multiple fields? i've found this, but i don't really get what he's saying: http://stackoverflow.com/questions/8238181/mongodb-how-to-find-string-in-multiple-fields
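The Stack Overflow answer ag4ve links boils down to `$or` with one clause per field. A sketch with hypothetical field names, with the same matching logic in plain JavaScript:

```javascript
// Shell form (hypothetical fields):
//   db.people.find({ $or: [ { first: /joe/i }, { last: /joe/i } ] })
var people = [
  { first: 'Joe', last: 'Smith' },
  { first: 'Ann', last: 'Joell' },
  { first: 'Bob', last: 'Ray' }
];
var re = /joe/i;
var found = people.filter(function (p) {
  return re.test(p.first) || re.test(p.last); // match in either field
});
```

Note that unanchored case-insensitive regexes can't use an index, so this scans every document; anchored forms like `/^joe/` can use one.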
[19:35:28] <rainerfrey> if the latter, (how) can mongodb distribute the data dynamically in a (sharded) cluster?
[19:36:22] <rainerfrey> @whitaker: if you only need Xcode for Homebrew, the command line tools are probably sufficient
[19:37:01] <whitaker> committed to the download. but no problem.
[19:40:13] <modcure> does the global lock mean locking at the database level or the collection level?
[19:40:45] <EatAtJoes> If I have an entry, with say, an array of "tags"... does mongo have the ability to search by entries by the this array of tags?
[19:41:07] <EatAtJoes> I mean, to search by a tag value?
[19:45:51] <rainerfrey> EatAtJoes: yes, see http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-ValueinanArray
[19:46:49] <rainerfrey> it will return the enclosing document though, with the full array of tags, not just the matched tag itself
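To make rainerfrey's pointer concrete: querying a scalar value against an array field matches any element, with no special operator needed. A sketch with hypothetical names:

```javascript
// Shell form (hypothetical names): db.entries.find({ tags: 'mongodb' })
// A scalar query against an array field matches if ANY element equals it.
var entries = [
  { title: 'intro', tags: ['mongodb', 'nosql'] },
  { title: 'other', tags: ['redis'] }
];
var tagged = entries.filter(function (e) {
  return e.tags.indexOf('mongodb') !== -1;
});
```

As rainerfrey says, the whole enclosing document comes back, full tags array included.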
[20:02:53] <Vile> Can somebody here help me with the Mongo Monitoring Service?
[20:04:10] <Vile> I have two agents in it and two hosts. One host is visible from one agent, another from the other. Is there a way to assign host to agent?
[20:06:41] <whitaker> rainerfrey: FYI upgrading Xcode to v4.5 did the trick: now running mongo v2.2.0; thanks
[20:43:56] <sk8ball> i'm just starting out playing around with an asp.net/c# mvc 4 web api - i'm creating the model classes now - will the [BsonId] data annotation on the Id property of the model generate a unique id every time automatically?
[20:44:27] <sk8ball> using the mongodb c# driver ver 1.0
[21:38:05] <crudson> you can only "insert if missing" a single doc
[21:38:14] <crudson> so the multiple will only apply to updating
[21:38:30] <LouisT> blah, that means this is as fast as it'll go =/
[21:41:58] <LouisT> Ok, well, I have to read 256 files with all different file sizes line by line and convert them into objects then upsert each object.. so that takes about 10 minutes just to do a single file when i can only do one at a time.. is there a better method? =/
[21:44:36] <LouisT> i guess i could make two functions.. one that would just insert everything for the very first import, then use the other to update..
[22:54:22] <KyleG> So, I have profiling enabled on all collections via command line (level 2), and I keep getting odd stuff in my profile collection which is not valid bson. Here is an example: http://pastebin.com/PVmKiqk1 Has anyone seen this before? Any idea how to prevent it? Because of this, I am unable to use Dex to analyze the profile collection and suggest indexes, and in general I find it odd.
[22:55:09] <KyleG> I've tried disabling profiling, dropping the profile collections entirely, and then re-enabling profiling but the odd data reappears
[23:14:05] <jrxiii> hey all: wondering how to take results from a query and move them all into a new collection… anybody got a good way?
[23:28:06] <crudson> jrxiii: there is no "select into" equivalent with mongo. You're best off doing a .find().forEach(function(d) { db.other.save(d) })
[23:29:27] <jrxiii> crudson: I began by doing that, but the cursor's number of results (for some reason) is inconsistent with the cursor.count() value.
[23:29:47] <jrxiii> Example: my find.count() => 150000
[23:35:12] <crudson> jrxiii: if your collection is being modified, even existing documents shuffled on disk (due to changing in size for example), the cursor can return a difference. Use $snapshot queries in this case. That's a pretty big difference though, can you make paste of the full command you are running? Also what is the sharding environment?
[23:37:53] <crudson> jrxiii: if you are doing some aggregate query then that's the right way. Simply copying from one collection to another may have other solutions.
[23:38:06] <crudson> use pastie.org or something to paste multiple lines of code.
[23:38:23] <jrxiii> yea, it's not really an aggregate query
[23:38:47] <jrxiii> it's more that I want to do a bbox query in my 2d indexed collection and put the results into another collection
[23:41:56] <jrxiii> anyhow, that's what I ended up doing but it feels awful
[23:45:42] <crudson> yeah that's not really a pattern that will please the gods. It may not even work forever; referencing db in finalize() for example was removed from one version to another.
[23:46:25] <crudson> there are plenty of requests for this: https://jira.mongodb.org/browse/SERVER-979 http://jira.mongodb.org/browse/SERVER-610 http://jira.mongodb.org/browse/SERVER-775 http://jira.mongodb.org/browse/SERVER-1307 - pick the non-duplicate one and vote for it :)
[23:46:44] <jrxiii> this link sort of outlines my problem
[23:48:07] <jrxiii> Is there something that would cause the huge disparity between cursor.count() and cursor.forEach(function(){count +=1})
[23:56:14] <hadees> I'm looking at doing twitter-like following in mongodb and I was wondering what the best way to do it was. I know this question has been asked before, but I was thinking an array on the user of who he is following would be enough. Then if I wanted to know who was following someone I could just search in the following array.
[23:57:05] <hadees> is there some reason I shouldn't do that? I've seen having an array for both to be pretty popular but I figured that could limit followers.
[23:57:19] <hadees> I'd rather limit how many people someone can follow
[23:57:43] <hadees> although I wonder realistically how many followers you would need before it became a problem
[23:58:01] <jrxiii> I would caution against indexing that at scale
[23:58:44] <Oddman> jrxiii, does mongodb make assumptions about the count?
[23:58:54] <Oddman> aka, using an algorithm to guess rather than know
[23:59:09] <Oddman> whereas a loop with count++ would be 100% accurate
[23:59:50] <hadees> jrxiii: so the index is the problem? hmm so whats the best way to do this? 2 arrays?
[23:59:55] <jrxiii> I can guarantee that the count() value (in this particular case) is accurate
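hadees's follow model above can be sketched as follows (hypothetical ids): each user stores only who *they* follow, and followers are found by searching that array; in the shell, `db.users.find({ following: userId })` with a multikey index on `following`:

```javascript
// Each user document lists who THEY follow (hypothetical ids):
var users = [
  { _id: 'a', following: ['b', 'c'] },
  { _id: 'b', following: ['c'] },
  { _id: 'c', following: [] }
];
// Followers of `id` = users whose following array contains id.
// Shell form: db.users.find({ following: id })
function followersOf(id) {
  return users.filter(function (u) { return u.following.indexOf(id) !== -1; })
              .map(function (u) { return u._id; });
}
```

This keeps the array size bounded by how many people one user follows (the limit hadees prefers), while the reverse lookup leans on the multikey index, which is the scaling concern jrxiii raises.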