[00:08:48] <_m> LouisT: Have you narrowed this down to the mongo module itself? While I'm not a node user, every time I've experienced something run-away like this it has been programmer error (i.e. pulling in too much of a collection, forgetting to remove a testing loop, etc.)
[00:09:35] <_m> That said, I have no experience with that module but would be willing to check your Mongo-specific bits.
[00:09:52] <LouisT> _m: the only time i've ever seen the issue is when i use collection.update()
[00:11:21] <_m> Are you invoking the update with a large number of records? In safe mode? Are you firing updates in a batch or one-by-one?
[00:13:26] <LouisT> _m: each update looks like this: { '$set': { id: '123', name: 'foo', ally: '321', villages: '123', points: '123456', rank: '1' } } -- and i'm using safe mode and it's one by one, i'm not sure how to do it with multiple at this time
[00:17:19] <_m> Perhaps this is helpful: https://github.com/mongodb/node-mongodb-native/issues/526
[00:17:28] <_m> Unsure if that addresses your specific case though.
[00:19:22] <_m> You might also make sure your GC is operating as expected: node --trace-gc my_app.js
[00:19:47] <_m> And a blog post related to the GC: http://blog.caustik.com/2012/04/11/escape-the-1-4gb-v8-heap-limit-in-node-js/
[00:25:34] <LouisT> _m: ok, thanks, i'll look into it
[00:30:20] <LouisT> _m: yea there are clear differences, http://thepb.in/505a62bb88b0853b23000005 without collection.update() and with http://thepb.in/505a631c88b0853b23000006
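Not from the chat, but relevant to LouisT's one-by-one safe-mode updates: if every update is fired immediately, thousands of pending callbacks and their closures can pile up in memory. A minimal sketch of capping the number of in-flight updates, where `update(doc, cb)` is a hypothetical stand-in for `collection.update(selector, change, { safe: true }, cb)`:

```javascript
// Sketch (an assumption, not LouisT's code): run at most `limit` updates at once.
// `update(doc, cb)` stands in for collection.update(selector, change, {safe: true}, cb).
function updateAll(docs, update, limit, done) {
  if (docs.length === 0) return done(null, 0);
  var next = 0, active = 0, finished = 0;
  function launch() {
    // Top up the in-flight pool without ever exceeding the limit.
    while (active < limit && next < docs.length) {
      active += 1;
      update(docs[next++], function () {
        active -= 1;
        finished += 1;
        if (finished === docs.length) return done(null, finished);
        launch(); // a slot freed up; start the next update
      });
    }
  }
  launch();
}
```

With a limit of, say, 5-10, memory stays bounded no matter how many documents are queued, and safe mode still confirms each write.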
[01:07:25] <idiotb> guys I have a design doubt: Let's say I have Companies and Keywords, with a many-to-many relationship between them. There are around 1 lakh (100,000) companies (adding ~200 daily) and 300 keywords (adding ~30 weekly). Now at any point I want to select two keywords and get a list of companies. What is the best way to design this database? Appreciate any help, thanks!
[01:08:24] <idiotb> ***Now at any point if I select two keywords I want the list of common companies. What is the best way to design this database? Appreciate any help, thanks!
[01:15:22] <Oddman> idiotb, have a nested document or an array on the companies table that specifies the keyword ids
[01:16:48] <Oddman> because then you can do a query like: db.companies.find({"keywords.id" : 517})
[01:17:00] <Oddman> obviously need to index that, as well
[01:17:16] <idiotb> Oddman: could you explain a little bit? I am new to DB's
[01:29:47] <VooDooNOFX_> One option is to store the keywords with the company.
[01:30:06] <VooDooNOFX_> This can get large if you have thousands of keywords for each company. Indexes will grow unbounded.
[01:31:25] <idiotb> I don't think that is such a good approach. Scaling issues
[01:31:53] <idiotb> Brute Force way requires me to query it 300! times. So I don't want to go that way
[01:32:29] <VooDooNOFX_> idiotb, that's not entirely true. Show me all companies that have a certain keyword is quite fast. (if you know the keywords ahead of time).
[01:33:21] <idiotb> Keywords are added by users. So knowing them ahead of time is not an option
[01:33:24] <VooDooNOFX_> discovering keywords can be done in a cron, say, daily or whatever with a distinct on company.keyword. Alternatively, you can store them in another table as you add them (which is the 2 table (many <-> many) approach)
[01:34:17] <VooDooNOFX_> That's essentially what Oddman recommended, storing the keyword in an array or nested document
[01:34:54] <idiotb> yup and then creating another table with company_id | keyword_id and unique records of them. How about that?
[01:36:06] <idiotb> VooDooNOFX_: its not right approach?
[01:36:15] <VooDooNOFX_> A third table isn't required here since we can store an array of values in mongo, not just a single value like in an rdbms.
[01:36:59] <VooDooNOFX_> each company will have a property ('keywords') which contains a list of keyword ids they have. This will be a direct lookup to the keywords table.
[01:37:38] <VooDooNOFX_> alternatively, you can just store the keyword with the company, and the lookup on read isn't necessary.
[01:38:22] <VooDooNOFX_> This approach will require you to store a distinct list of keywords in another table, and essentially store the keyword twice but has the distinct advantage of a single lookup per document (fast)
[01:38:46] <idiotb> I think lookup makes sense when Keyword in itself have some more properties.
[01:38:46] <VooDooNOFX_> that's what i'd do. denormalize the keyword straight into the company, and store the list of unique keywords in another table.
[01:39:04] <VooDooNOFX_> idiotb, sure. Depends on how much extra data you're storing.
[01:39:09] <VooDooNOFX_> You can do nested documents.
[01:39:27] <VooDooNOFX_> Outer document is Company which has an array of documents (keywords), each with several properties, etc
[01:40:05] <idiotb> Now, when I select two keywords, How do I get common companies?
[01:46:58] <idiotb> Oddman: I have one final doubt.
[01:48:09] <idiotb> I got your design. Now, I know how I can get keywords for companies. But, I want to know, how do I get common companies for selected keywords? QUERY?
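An answer idiotb's last question never got in the log: with keyword ids stored in an array on each company (Oddman's design), the shell query for companies matching *all* selected keywords is `$all`. A sketch with hypothetical collection and field names, showing the same matching semantics in plain JavaScript:

```javascript
// Shell form (hypothetical names): db.companies.find({ keywords: { $all: [2, 7] } })
// $all matches documents whose array field contains every listed value.
var companies = [
  { name: 'Acme',    keywords: [2, 5, 7] },
  { name: 'Globex',  keywords: [2, 3] },
  { name: 'Initech', keywords: [7] }
];
var selected = [2, 7];
var common = companies.filter(function (c) {
  return selected.every(function (k) { return c.keywords.indexOf(k) !== -1; });
});
// Only Acme carries both selected keywords here.
```

An index on `keywords` makes this a multikey-index lookup rather than a full collection scan.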
[01:56:48] <IAD> and manual on the mongo site is good
[02:29:22] <jtomasrl> does an object with an array of many items inside affect performance even if i filter out that key from the result?
[03:27:29] <ohhi> would mongoDB be a candidate for storing CRM style data and answering complex queries where we would stack more and more predicates together to form a query?
[03:28:24] <ohhi> problem is the "CRM style data" is unique to each potential user of the system, so that sounds like the "schema-less" nature of mongo
[03:32:53] <Oddman> CRMs generally have a ton of related data, which document based systems aren't as good at - it can be done, but they're better with flatter systems
[05:36:50] <jgornick> Hey guys, when using the aggregation framework, is it possible to unwind an embedded collection without keeping the field containing the embedded collection? For example, I have a field on my document called "channels", and channels is an array of embedded documents. What I want to do is perform a match on those channels without having to prefix each condition with "channels.*".
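One possible approach (an assumption, not confirmed in the log): `$unwind` the array, then `$project` the subfields to the top level so the `$match` can use bare names. The shell pipeline is sketched in the comment, with the equivalent unwind/project/match semantics shown in plain JavaScript over sample data:

```javascript
// Shell sketch (hypothetical field names):
//   db.docs.aggregate([
//     { $unwind: "$channels" },
//     { $project: { type: "$channels.type", name: "$channels.name" } },
//     { $match: { type: "email" } }
//   ])
// The same unwind -> project -> match semantics in plain JS:
var docs = [
  { _id: 1, channels: [{ type: 'email', name: 'a' }, { type: 'sms', name: 'b' }] },
  { _id: 2, channels: [{ type: 'sms', name: 'c' }] }
];
var unwound = [];
docs.forEach(function (d) {
  d.channels.forEach(function (ch) {
    // $project step: lift the subfields to the top level
    unwound.push({ _id: d._id, type: ch.type, name: ch.name });
  });
});
var matched = unwound.filter(function (r) { return r.type === 'email'; });
```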
[09:35:33] <NodeX> anyone use graphing db's here - NOT neo4j
[12:00:50] <ezakimak> is there any way to specify the name of the _id field in a collection?
[12:04:47] <Mmike> Hello, lads. Can I upgrade mongodb from 2.0.0 to 2.2.0 without dump/import? I'm running debian, is apt-get install mongodb-10gen enough?
[12:07:45] <tonny> Mmike: don't think it will give any problems upgrading, unless you're using auth
[12:08:22] <Mmike> tonny, I'm not. But I do have replSet on three boxes. So, i just do apt-get install, secondaries first, primary last?
[12:08:46] <tonny> upgrade applications (mongos first) then config then your sets
[12:09:01] <tonny> also did this a week ago from 2.0.6 to 2.2.0
[12:09:17] <ezakimak> what's better, to define a Person with a nested roles { x: <x record>, y: <y record> } or a collection for x records and collection for y records, with x and y records having the person id as fkey?
[12:10:19] <tonny> mongos, config, secondaries and primary last
[12:10:23] <ezakimak> all roles being optional, and any combination possible, most only having none or one role
[12:11:42] <ezakimak> most of the activity will be against one of the roles, so seems inconvenient to have to always dip into a subobject
[12:11:57] <Mmike> tonny, I don't use mongos, just php, i'm planning to introduce mongos, but haven't had time for it yet :/
[13:07:07] <andatche> we have a replica set across 3 nodes running 2.0.3 - we're seeing two of the nodes regularly reporting the other (currently a secondary) as "down" like so - [rsHealthPoll] replSet info xxx.xxx.xxx.xxx:27017 is down (or slow to respond): DBClientBase::findN: transport error: xxx.xxx.xxx.xxx:27017 query: { replSetHeartbeat: "blah", v: 4, pv: 1, checkEmpty: false, from: "xxx.xxx.xxx.xxx:27017" }
[13:08:04] <andatche> the node in question is under no load, hasn't suffered any network issues and isn't short on IOPS, mongod seems responsive at the time the other nodes complain and it isn't logging anything to suggest an issue, a quick google suggests lots of other people seeing this with no real suggestions for a fix other than "upgrade" - can anyone offer any advice?
[13:08:21] <andatche> this bug seems to be sourced frequently - https://jira.mongodb.org/browse/SERVER-5313
[13:08:38] <andatche> however, the node in question isn't experiencing any network connectivity issues
[13:09:23] <andatche> bit stumped, the replica set in question has been fine for months, this behaviour began yesterday
[13:21:08] <matubaum> Hello, we have a document like this http://pastebin.com/qNdJdbD5 we are executing this query {"matias.a": {"$lt": 3}} having no results. But we do if we execute {"matias.0.a": {"$lt": 3}}.
[13:21:35] <matubaum> Is there anyway we can query something ignoring the position of the array.....?
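The pastebin isn't visible here, but assuming `matias` is an array of subdocuments, `$elemMatch` is the usual way to match an element regardless of its position. A sketch of the semantics on sample data:

```javascript
// Shell form, assuming matias is an array of subdocuments:
//   db.coll.find({ matias: { $elemMatch: { a: { $lt: 3 } } } })
// (For a flat array of subdocuments the dotted form "matias.a" should also
// match; if matias is instead an array *of arrays*, neither form reaches the
// inner level without naming a position, which would match the symptom above.)
var docs = [
  { _id: 1, matias: [{ a: 1 }, { a: 9 }] },
  { _id: 2, matias: [{ a: 5 }] }
];
var hit = docs.filter(function (d) {
  return d.matias.some(function (m) { return m.a < 3; });
});
```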
[14:19:37] <Null_Route> I guess I'll try this next, but anyone know offhand if ALL config data is synchronized between config servers between updates, or only incremental changes?
[14:20:01] <Null_Route> ...that is, can I start a config server with an empty config?
[14:26:51] <jiffe98> I have 2 machines setup in a replicated/sharded manner, I've got one database called mail and the mail.* files in the data directory amount to 129GB of data but there are also local.* files that amount to 695GB of data, what might be in those?
[15:03:16] <girasquid> If I have a collection with a field named 'outcome', and I would like to migrate the contents of that field into a field called 'outcomes' that's an array - is there a query I can run in mongo to do this, or will I need to write migration code?
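There's no single server-side query for this in 2.x; a shell `forEach` migration is the usual route. A sketch with a hypothetical collection name, plus the per-document transformation shown (and testable) on plain objects:

```javascript
// Shell migration sketch (hypothetical collection name "events"):
//   db.events.find({ outcome: { $exists: true } }).forEach(function (doc) {
//     db.events.update({ _id: doc._id },
//                      { $set: { outcomes: [doc.outcome] }, $unset: { outcome: 1 } });
//   });
// The per-document transformation, as applied above:
function migrate(doc) {
  var copy = {};
  for (var k in doc) {
    if (k !== 'outcome') copy[k] = doc[k]; // keep everything else as-is
  }
  copy.outcomes = [doc.outcome]; // wrap the old scalar in an array
  return copy;
}
```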
[15:06:18] <zastern> Is there a way to link separate mongodb clusters/sets of shards? E.g. I'd like to have three shards at one physical location, and three shards at another physical location
[15:11:08] <TTimo> zastern: just make sure they can reach each other then? I suspect sharding across far-away locations will lead to increased/unpredictable latencies?
[15:11:26] <TTimo> maybe what you really want, is the replica set for each shard spread over multiple locations
[15:24:54] <zastern> TTimo: I'm basically trying to have an apache cluster at one physical datacenter, backed by mongo, and an apache cluster at another physical data center, also backed by mongo, but the databases need to share the same datasets. theyll be serving the same sites, and user uploads to gridfs, etc, need to be synced
[15:26:33] <TTimo> zastern: afaik sharding doesn't mean you'll be able to write 'locally' at both locations. you will talk to a mongos, which will decide to which shard the write will go to, and that's based on the hashing/sharding rules, not on the proximity of the server
[15:28:46] <zastern> TTimo: yeah I figured that, I just don't know how to deal with the mongos being at only one of the two physical locations
[15:29:59] <TTimo> the best practice is to have the mongos with your business logic
[15:30:43] <TTimo> e.g. on your apache system .. if you're not using VPN, then it'll just route the calls over the regular internet ?
[15:32:04] <zastern> TTimo: you mean to have the actual web app handle distribution
[15:40:07] <TTimo> doesn't store anything etc. .. like NodeX said, a router
[15:40:36] <TTimo> the recommended setup is to have it running on the machine where you are running apache
[15:50:06] <Alir3z4> Wouldn't using S3 for uploading/holding files be a better option than GridFS, or am I misunderstanding the concept of this feature?
[16:00:16] <NodeX> facebook has to be the most retarded thing on the planet. They would not let us change (rebrand) our company page name if we had more than 200 likes, so we had to log in and delete over 1000 likes so we could, which I have no doubt caused far more load on their servers than a simple name change. Talk about idiotic
[16:06:17] <zastern> TTimo: right, but in this case I have *many* apaches
[16:06:38] <zastern> i have three apache instances now, and i expect to have 10-20 shortly
[16:07:46] <zastern> Alir3z4: we use gridfs because it's a lot cheaper than S3, and gives us similar functionality. but i guess you have to manage it yourself, and worry about capacity and load, etc
[16:14:50] <NodeX> we already changed the page name
[16:15:02] <NodeX> page name and username are 2 different things (which we didnt know)
[16:16:50] <Alir3z4> zastern: I mean if i want to use EBS for my data storage, i have to setup/manage many of them. And i guess EBS isn't cheaper than S3
[16:17:30] <NodeX> zastern : why so many apaches ?
[16:17:49] <zastern> NodeX: working on deploying an applicaiton that we expect to have as many as 500,000 concurrent users
[16:17:58] <zastern> its my first time doing something at this scale
[16:18:04] <zastern> so i might be dead wrong about what we need
[16:18:08] <zastern> just trying to figure things out
[16:26:52] <Gargoyle> I set apache to recycle after 65K requests, and it seems to have stopped segfaulting. But I ran a cli script earlier that segfaulted
[16:27:11] <Gargoyle> (It didn't on my machine with exact same data)
[16:27:14] <NodeX> ron : you're pulling my leg or I dreamt it
[16:27:29] <NodeX> Gargoyle : Apache is weird, it does stupid things all the time
[16:27:29] <ron> NodeX: must be some weird-ass dream
[16:27:36] <Gargoyle> Was hoping Derick would be around to tell me that valgrind stuff in case it's useful for him.
[16:27:54] <NodeX> Hopefully Derick is working on adding unix sockets back into the php driver
[16:28:12] <NodeX> he broke 8 of my production sites by "not knowing it was a feature"
[18:12:59] <Vile> NodeX: Maybe all of those guys will not "like" your new name? %)
[18:25:02] <chrslws> Hello all - question about sharding and breaking up a key into smaller ranges
[18:25:56] <chrslws> first of all, if i understand correctly, a shard key of { x: 1 }, where the application may assign values 0-9 to x, will have at most 10 ranges
[18:49:47] <rainerfrey> a question about "sharding": does mongodb also distribute a number of full non-sharded collections within a shared database (in the sense of putting some collections on one, some on other shard)?
[18:53:27] <rainerfrey> use case is: getting write concurrency without having to resort to scatter/gather in single queries on small(er) collections
[18:55:09] <rainerfrey> any pointer to relevant doc / presentation or anything would be helpful
[18:56:40] <rainerfrey> *sharded* database in my original question
[19:13:19] <Gargoyle> Did I imagine it, or do I remember reading somewhere that you are not supposed to do this type fo thing:- db.php_sessions.update({'session_id':'e0c91utg6fj7ogk209micohsbntdbo2m', lock:1},{$set: {lock:0}});
[19:14:13] <Gargoyle> Ie. changing a field with $set that is in the query?
[19:14:59] <_m> I'm not certain you require lock in that query. Your session_id fields are unique, correct?
[19:16:20] <Gargoyle> _m: But if the lock is not 1, then someone else has the lock and I should not modify the session.
[19:16:40] <Gargoyle> (Actually, it's the other way round, but I pasted the wrong line!) ;)
[19:17:47] <_m> Shared sessions sound more scary than your query ;)
[19:18:31] <Gargoyle> it's an attempt to prevent race conditions with ajax requests.
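For what it's worth, putting the field you `$set` inside the query is the standard compare-and-set idiom, not something to avoid: only the caller that matches the unlocked state gets to flip it. Shell form (direction as Gargoyle corrected), with the same semantics shown on a plain object:

```javascript
// Shell form (field names from the chat):
//   db.php_sessions.update({ session_id: sid, lock: 0 }, { $set: { lock: 1 } })
// Because the query includes lock: 0, concurrent callers race on one atomic
// document update and only one of them matches. The same logic on an object:
function tryAcquire(session) {
  if (session.lock !== 0) return false; // someone else holds the lock
  session.lock = 1;                     // matched: flip the lock atomically
  return true;
}
var s = { session_id: 'abc', lock: 0 };
var first = tryAcquire(s);  // acquires the lock
var second = tryAcquire(s); // loses the race
```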
[19:19:20] <whitaker> MacOSX users: I'm trying to upgrade my mongodb from v2.0.5 to v2.2.0 per today's announcement (I attended today's 10gen webinar on it) using both homebrew & ports, but it seems they're only at v2.0.6 for latest stable distro; anyone else here attempted that upgrade today?
[19:19:42] <Gargoyle> whitaker: did you "brew update" ?
[19:19:53] <Gargoyle> I installed 2.2 ages ago with homebrew
[19:21:14] <rainerfrey> same here, got 2.2 quite some time ago
[19:21:35] <rainerfrey> did you ever edit the mongodb formula locally?
[19:22:58] <rainerfrey> maybe do a git status in /usr/local (or wherever your homebrew installation is)
[19:23:01] <whitaker> Gargoyle & rainerfrey: I misspoke: I installed but didn't update; running "brew update" after install reveals an "updated formula" for mongodb; lemme see...
[19:31:36] <whitaker> Argh: "Warning: Your Xcode (4.1) is outdated
[19:32:07] <rainerfrey> does anyone have any experience building multi-tenant applications on top of mongodb?
[19:33:18] <rainerfrey> if so, how do you discriminate the data?
[19:33:35] <rainerfrey> a tenant key in every document in every collection?
[19:33:49] <rainerfrey> or per-tenant collections?
[19:34:08] <ag4ve> how do i do a find in multiple fields? i've found this, but i don't really get what he's saying: http://stackoverflow.com/questions/8238181/mongodb-how-to-find-string-in-multiple-fields
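The Stack Overflow answer ag4ve links boils down to `$or` with one clause per field. A sketch with hypothetical field names, with the same matching logic in plain JavaScript:

```javascript
// Shell form (hypothetical fields):
//   db.people.find({ $or: [ { first: /joe/i }, { last: /joe/i } ] })
var people = [
  { first: 'Joe', last: 'Smith' },
  { first: 'Ann', last: 'Joell' },
  { first: 'Bob', last: 'Ray' }
];
var re = /joe/i;
var found = people.filter(function (p) {
  return re.test(p.first) || re.test(p.last); // match in either field
});
```

Note that unanchored case-insensitive regexes can't use an index, so this scans every document; anchored forms like `/^joe/` can use one.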
[19:35:28] <rainerfrey> if the latter, (how) can mongodb distribute the data dynamically in a (sharded) cluster?
[19:36:22] <rainerfrey> @whitaker: if you only need Xcode for Homebrew, the command line tools are probably sufficient
[19:37:01] <whitaker> committed to the download. but no problem.
[19:40:13] <modcure> does the global lock mean locking at the database level or the collection level?
[19:40:45] <EatAtJoes> If I have an entry, with say, an array of "tags"... does mongo have the ability to search by entries by the this array of tags?
[19:41:07] <EatAtJoes> I mean, to search by a tag value?
[19:45:51] <rainerfrey> EatAtJoes: yes, see http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-ValueinanArray
[19:46:49] <rainerfrey> it will return the enclosing document though, with the full array of tags, not just the matched tag itself
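To make rainerfrey's pointer concrete: querying a scalar value against an array field matches any element, with no special operator needed. A sketch with hypothetical names:

```javascript
// Shell form (hypothetical names): db.entries.find({ tags: 'mongodb' })
// A scalar query against an array field matches if ANY element equals it.
var entries = [
  { title: 'intro', tags: ['mongodb', 'nosql'] },
  { title: 'other', tags: ['redis'] }
];
var tagged = entries.filter(function (e) {
  return e.tags.indexOf('mongodb') !== -1;
});
```

As rainerfrey says, the whole enclosing document comes back, full tags array included.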
[20:02:53] <Vile> Can somebody here help me with the Mongo Monitoring Service?
[20:04:10] <Vile> I have two agents in it and two hosts. One host is visible from one agent, another from the other. Is there a way to assign host to agent?
[20:06:41] <whitaker> rainerfrey: FYI upgrading Xcode to v4.5 did the trick: now running mongo v2.2.0; thanks
[20:43:56] <sk8ball> i'm just starting out playing around with an asp.net/c# mvc 4 web api - i'm creating the model classes now - will the [BsonId] data annotation on the Id property of the model generate a unique id every time automatically?
[20:44:27] <sk8ball> using the mongodb c# driver ver 1.0
[21:38:05] <crudson> you can only "insert if missing" a single doc
[21:38:14] <crudson> so the multiple will only apply to updating
[21:38:30] <LouisT> blah, that means this is as fast as it'll go =/
[21:41:58] <LouisT> Ok, well, I have to read 256 files with all different file sizes line by line and convert them into objects then upsert each object.. so that takes about 10 minutes just to do a single file when i can only do one at a time.. is there a better method? =/
[21:44:36] <LouisT> i guess i could make two functions.. one that would just insert everything for the very first import, then use the other to update..
[22:54:22] <KyleG> So, I have profiling enabled on all collections via command line (level 2), and I keep getting odd stuff in my profile collection which is not valid bson. Here is an example: http://pastebin.com/PVmKiqk1 Has anyone seen this before? Any idea how to prevent it? Because of this, I am unable to use Dex to analyze the profile collection and suggest indexes, and in general I find it odd.
[22:55:09] <KyleG> I've tried disabling profiling, dropping the profile collections entirely, and then re-enabling profiling but the odd data reappears
[23:14:05] <jrxiii> hey all: wondering how to take results from a query and move them all into a new collection… anybody got a good way?
[23:28:06] <crudson> jrxiii: there is no "select into" equivalent with mongo. You're best off doing a .find().forEach(function(d) { db.other.save(d) })
[23:29:27] <jrxiii> crudson: I began by doing that, but the cursor's number of results (for some reason) is inconsistent with the cursor.count() value.
[23:29:47] <jrxiii> Example: my find.count() => 150000
[23:35:12] <crudson> jrxiii: if your collection is being modified, even existing documents shuffled on disk (due to changing in size for example), the cursor can return a difference. Use $snapshot queries in this case. That's a pretty big difference though, can you make paste of the full command you are running? Also what is the sharding environment?
[23:37:53] <crudson> jrxiii: if you are doing some aggregate query then that's the right way. Simply copying from one collection to another may have other solutions.
[23:38:06] <crudson> use pastie.org or something to paste multiple lines of code.
[23:38:23] <jrxiii> yea, it's not really an aggregate query
[23:38:47] <jrxiii> it's more that I want to do a bbox query in my 2d indexed collection and put the results into another collection
[23:41:56] <jrxiii> anyhow, that's what I ended up doing but it feels awful
[23:45:42] <crudson> yeah that's not really a pattern that will please the gods. It may not even work forever; referencing db in finalize() for example was removed from one version to another.
[23:46:25] <crudson> there are plenty of requests for this: https://jira.mongodb.org/browse/SERVER-979 http://jira.mongodb.org/browse/SERVER-610 http://jira.mongodb.org/browse/SERVER-775 http://jira.mongodb.org/browse/SERVER-1307 - pick the non-duplicate one and vote for it :)
[23:46:44] <jrxiii> this link sort of outlines my problem
[23:48:07] <jrxiii> Is there something that would cause the huge disparity between cursor.count() and cursor.forEach(function(){count +=1})
[23:56:14] <hadees> I'm looking at doing twitter-like following in mongodb and I was wondering what the best way to do it was. I know this question has been asked before, but I was thinking an array on the user of who he is following would be enough. Then if I wanted to know who was following someone I could just search in the following array.
[23:57:05] <hadees> is there some reason I shouldn't do that? I've seen having an array for both to be pretty popular but I figured that could limit followers.
[23:57:19] <hadees> I'd rather limit how many people someone can follow
[23:57:43] <hadees> although I wonder realistically how many followers you would need before it became a problem
[23:58:01] <jrxiii> I would caution against indexing that at scale
[23:58:44] <Oddman> jrxiii, does mongodb make assumptions about the count?
[23:58:54] <Oddman> aka, using an algorithm to guess rather than know
[23:59:09] <Oddman> whereas a loop with count++ would be 100% accurate
[23:59:50] <hadees> jrxiii: so the index is the problem? hmm so whats the best way to do this? 2 arrays?
[23:59:55] <jrxiii> I can guarantee that the count() value (in this particular case) is accurate
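hadees's follow model above can be sketched as follows (hypothetical ids): each user stores only who *they* follow, and followers are found by searching that array; in the shell, `db.users.find({ following: userId })` with a multikey index on `following`:

```javascript
// Each user document lists who THEY follow (hypothetical ids):
var users = [
  { _id: 'a', following: ['b', 'c'] },
  { _id: 'b', following: ['c'] },
  { _id: 'c', following: [] }
];
// Followers of `id` = users whose following array contains id.
// Shell form: db.users.find({ following: id })
function followersOf(id) {
  return users.filter(function (u) { return u.following.indexOf(id) !== -1; })
              .map(function (u) { return u._id; });
}
```

This keeps the array size bounded by how many people one user follows (the limit hadees prefers), while the reverse lookup leans on the multikey index, which is the scaling concern jrxiii raises.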