[01:38:44] <tystr> ^^ with this index $pushAll into attributes takes 4-5 seconds
[04:54:25] <heoa> Are the instructions here [1] outdated? What is the deb command there? [1] http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian-or-ubuntu-linux/
[04:54:41] <heoa> (in other words, is the apt-get version good enough?)
[05:15:50] <Kane`> i'm looking into combining mongodb + postgresql to store my log data. any thoughts on my schema? http://codepad.org/c1zxZqfL
[05:34:41] <algernon> Kane`: looks like a weird combo.
[06:40:58] <ranman> maik_: good morning, where exactly are you at with that?
[07:57:32] <algernon> Kane`: why mongodb AND postgre? For logs, mongodb itself is enough, and not having to use two dbs makes things a whole lot simpler.
[08:39:04] <Kane`> algernon, i'm playing around with using just mongodb now. will see how it works out :}
[08:58:48] <timoxley> how do people deal with things like "when canonical data X changes, reflect changes in caches by updating A, B and C"
[08:59:42] <timoxley> my first idea was 'triggers', but this isn't yet implemented https://jira.mongodb.org/browse/SERVER-124
[08:59:52] <timoxley> so perhaps my modeling of the problem is wrong
[09:00:47] <Derick> timoxley: will a trick like this work: http://drck.me/mongosolr-9fs ?
[09:04:07] <timoxley> Derick possibly, I had hoped to avoid tailing oplog, but perhaps it's the only way
[09:04:25] <timoxley> it sorta feels like sorting through someone's garbage to see what they've been eating
[09:05:20] <timoxley> though that is how the replication works so it's probably fine.
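(a minimal sketch, for anyone following along, of what tailing the oplog looks like from the mongo shell; it assumes a replica set, so the oplog is local.oplog.rs, and the namespace and the Solr/cache hand-off are placeholders, not code from Derick's post)

    // open a tailable cursor on the oplog, filtered to one namespace
    var local = db.getSiblingDB("local");
    var cur = local.oplog.rs.find({ ns: "mydb.posts", op: { $in: ["i", "u"] } })
                  .addOption(DBQuery.Option.tailable)
                  .addOption(DBQuery.Option.awaitData);
    while (cur.hasNext()) {
        var entry = cur.next();
        // entry.o is the inserted doc or the update spec; push only the fields
        // you care about to Solr / your caches A, B and C here
        printjson(entry.o);
    }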
[09:06:59] <NodeX> timoxley : don't you abstract your data updates / additions in your app ?
[09:09:06] <NodeX> tailing cursors and adding an entire document to SOLR is a very bloated approach... if one doesn't need to search every field in the document then it should not be indexed
[09:09:49] <timoxley> NodeX yeah, but we're using this mongoose thing… with which you can easily sidestep the pre/post save hooks by doing bulk operations… or using update syntax directly, or using the driver directly. I feel like there's little point having post save hooks if it's easy to forget about them and have them not fire
[09:10:34] <timoxley> I've actually been struggling a lot with mongo recently as a result of not being able to know when particular events happen
[09:10:36] <NodeX> for me I just added a quick function to my update -(upsert) which took the fields I wanted and posted them to solr
[09:10:58] <NodeX> but .. I don't update a million documents a minute so it won't work for everyone
[09:11:21] <NodeX> a very good tip I learnt with solr is if you have a large amount of docs to add o
[09:11:22] <timoxley> NodeX you're using YOUR abstraction to interface with mongo though right?
[09:39:20] <Derick> it's actually a lot easier than it sounds :-)
[09:43:56] <ro_st> performance-wise, is it better to load a whole document into memory and query that data in-memory, or better to query just the bits that i need each time i need to check something?
[09:44:38] <ro_st> i ask because i've got to check access control and status flags on a large document prior to passing back large parts of it
[09:45:43] <Derick> ro_st: the only difference is that if you request fewer fields, less data traffic happens
[09:45:45] <ro_st> the difficulty is that the checks need to be used for a wide variety of actions, some which don't need the large data blob and some which do. so i don't want to fetch the blob into memory just to perform the checks and then dump it
[09:45:52] <Derick> mongodb always "loads" the whole doc in memory
[09:46:42] <Derick> Sorry, I don't get that question
[09:47:43] <ro_st> so if i tell it only to return some fields, that filtering happens in the mongo process, or in the driver outside the mongo process?
[09:47:53] <ro_st> i guess you already answered that with your data traffic question
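(to illustrate Derick's point: the field selection is applied inside mongod, so only the requested fields travel back to the driver; collection and field names below are made up)

    // fetch just the access-control and status fields of the big document
    db.docs.findOne({ _id: someId }, { acl: 1, status: 1 });
    // fetch the large payload only for the actions that actually need it
    db.docs.findOne({ _id: someId }, { blob: 1 });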
[10:44:09] <timoxley> Derick a friend said "The biggest problem with reading the oplog to perform hooks is that by the time you read the oplog the database won't be in the same state as it was in when the update was run"
[10:55:42] <Derick> i don't know how long your "extensive query" takes though
[10:56:19] <timoxley> sorry, i don't get what you're saying. What's going to take 15 seconds?
[10:56:52] <Derick> You want your latest X posts added to the object. But you don't want to do that query to find out the X latest ones when you do a post. Right?
[10:56:55] <Tobsn> timoxley, what mongodb version?
[10:58:20] <Tobsn> if it were 1.6 i would've said that there was a major flaw in the replication.. on a single server i had basically 0-2ms and on a two-server replica i got 50-120sec
[10:59:44] <Derick> timoxley: that's the difference between primary and secondary though
[10:59:50] <Derick> not what's in the oplog compared to the primary
[10:59:52] <Goopyo> do you guys turn on safe mode by default? Or is it not needed when there's journaling?
[11:00:07] <Tobsn> like i said, a normal server replied it within no time.. it was a bug
[11:00:14] <Derick> Goopyo: two totally different things. You probably want safe mode on, journalling on, but fsync off.
[11:00:43] <Tobsn> but that was end of 2009 i think
[11:00:45] <Derick> with safe mode being off, you will never find out about f.e. duplicate key errors, or data not being stored... etc.
[11:00:53] <Tobsn> just thought you might have got stuck on an old version/deb
[11:01:10] <Goopyo> I thought safe mode protects against losing a write when the server dies in the middle of the write operation, and thought that's what journaling does too
[11:02:15] <Goopyo> Derick: besides parsing errors, which are caught even when safe mode is off, what problems can be caught with safe mode?
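(a small shell sketch of the kind of error Derick means; with safe mode off a driver fires the write and never asks the server whether it worked, with safe mode on it issues a getLastError after the write and surfaces the failure)

    db.users.ensureIndex({ email: 1 }, { unique: true });
    db.users.insert({ email: "a@example.com" });
    db.users.insert({ email: "a@example.com" });   // violates the unique index
    printjson(db.getLastErrorObj());               // the E11000 duplicate key error shows up here
    // journaling is a different guarantee: it protects already-accepted writes
    // against a mongod crash, it does not report per-write errors to the client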
[11:29:42] <statusfailed> Goopyo: haven't quite figured it out, but I will keep looking- thanks for your help!
[11:30:12] <Goopyo> statusfailed: did I misunderstand your question?
[11:30:44] <Goopyo> what I am suggesting you do is apply the function to the data, append it to the array, and once you've done all of them, do a batch insert of the array; that way it's one trip and one operation
[11:33:30] <statusfailed> Sorry, I just meant i'm having trouble figuring out how to get the "upsert" to happen
[11:33:41] <statusfailed> I'm hunting for docs on the flags to the function
[11:34:18] <Goopyo> are you updating the data or saving it? if you're saving it you don't need an upsert; if you are updating and want to insert if it's not there, you upsert
[11:34:47] <statusfailed> I'm reading in a list, modifying each item, and then saving the list back
[11:41:00] <Goopyo> it depends on your situation. Say you want to take all documents that have 'john' in their name and you want to set name_john to true in all of them, you can simply do db.collection.update({'name': 'john'}, {'$set': {'name_john': True}}, multi=True)
[11:43:00] <Goopyo> statusfailed: another option is server sided code: http://www.mongodb.org/display/DOCS/Server-side+Code+Execution#Server-sideCodeExecution-Using%7B%7Bdb.eval%28%29%7D%7D
[11:43:40] <statusfailed> Oh I see- i'm trying to hide the "mongo" layer from other C# code, so the actual function doing the modification is in C#
[11:43:47] <statusfailed> so that has to occur in C# rather than mongo
[11:44:03] <statusfailed> But I can still do a batch save of the updated collection, right?
[11:45:17] <Goopyo> statusfailed: what do you mean by "But I can still do a batch save of the updated collection, right?"
[11:47:25] <statusfailed> I mean saving back the records I originally read out
[11:49:03] <Goopyo> yeah either using a multi-update/bulk insert/atomic operation
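(statusfailed is on the C# driver, so this is only a mongo shell sketch of the three options Goopyo lists; collection and field names are illustrative)

    // multi-update: one round trip, the server changes every matching doc
    db.items.update({ name: "john" }, { $set: { name_john: true } }, false, true);
    // upsert: update if it exists, insert if it doesn't
    db.items.update({ _id: someId }, { $set: { value: 42 } }, true);
    // batch insert: build the docs client-side, send them in one call
    db.items.insert([{ _id: 1, v: 1 }, { _id: 2, v: 2 }]);
    // for docs that were read out, modified in the app and written back,
    // save() upserts by _id one document at a time
    docs.forEach(function (d) { db.items.save(d); });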
[11:57:14] <pilgo> Hi All. I have a collection as such: {name: String, users: [user_id, ...]} and I want to find all the documents that have a specific user_id. I tried this: Lists.find({ users: uid }) but it doesn't work
[11:59:15] <Goopyo> pilgo: are there multiple userids per user?
[11:59:41] <Goopyo> i.e. is it {name: string, user: {userid: string, other_info: blah}} ?
[12:00:04] <pilgo> Goopyo: Not in that collection. I'm just storing the uid
[12:01:18] <Goopyo> ah but users is a list of userids right?
[12:06:06] <NodeX> [12:54:49] <pilgo> Hi All. I have a collection as such: {name: String, users: [user_id, ...]} and I want to find all the documents that have a specific user_id. I tried this: Lists.find({ users: uid }) but it doesn't work
[12:06:18] <NodeX> what I wrote is how to achieve what pilgo wants
[12:06:30] <pilgo> NodeX: 'user_id' is just a string
[12:06:40] <Goopyo> NodeX: lol relax man, I am asking for your professional second opinion not giving you blame
[12:07:30] <Goopyo> pilgo: what I am saying is that you're better off with a many-to-many design
[12:07:44] <Goopyo> i.e. if you continue with this, mongodb is gonna have a LOT of disk reads
[12:07:59] <NodeX> no I was asking wth you're on about
[12:08:00] <pilgo> Goopyo: Ok, cool. How do I fix it?
[12:09:02] <NodeX> the general principle is go with the least amount of disk reads .. if you query the overall document more than the embedded arrays then scope them elsewhere, otherwise don't
[12:09:05] <pilgo> NodeX: my doc will look like this: {name: "my awesome list", users: ['dsfdssdfds', 'dsfdasdsfd','dsfdasfd']} . I want to find all the lists that have 'dsfdasfd' in the users.
[12:10:13] <NodeX> doesn't matter what your doc looks like
[12:10:20] <NodeX> it matters how you query your doc
[12:10:28] <Goopyo> since you are querying from the list
[12:11:13] <NodeX> again it doesn't matter.. it's about how you query your data
[12:11:25] <Goopyo> you are better off creating documents like {'userid' : <SINGLE USER ID>, 'falls_in_list' : <name of list they fall in>} that way you can query both ways equally fast
[12:12:56] <NodeX> he hasn't explained HOW he queries his data, or which is queried more, so no assumptions can be made
[12:13:17] <Goopyo> he did: {name: "my awesome list", users: ['dsfdssdfds', 'dsfdasdsfd','dsfdasfd']} . I want to find all the lists that have 'dsfdasfd' in the users.
[12:13:31] <Goopyo> he's querying by strings in a list in the document
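(for reference, that query shape is the normal one: matching a plain value against an array field succeeds if the value appears anywhere in the array, and ensureIndex on the array field builds a multikey index; the collection name is just an example)

    db.lists.insert({ name: "my awesome list", users: ["dsfdssdfds", "dsfdasdsfd", "dsfdasfd"] });
    db.lists.find({ users: "dsfdasfd" });   // matches the doc above
    db.lists.ensureIndex({ users: 1 });     // multikey index, one entry per array element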
[12:13:31] <pilgo> Ok, first, the query works fine. I was just messing up my insert :)
[12:17:02] <Goopyo> NodeX: you're right that direction and frequency matter, but many-to-many is the best way to go, especially considering he does query by users in the list sometimes
[12:19:13] <Goopyo> pilgo my vote is for: Lists - {name: 'Happy', users: user1}, {name: 'Happy', users: user2}, repeat for as many as you need, index by list name
[12:19:17] <pilgo> NodeX: Nope. They can be shared.
[12:19:41] <NodeX> Goopyo : that's a relational way
[12:19:59] <NodeX> what you gain with less seeks you lose with a second query
[12:23:45] <Goopyo> sometimes he queries by user in the list collection
[12:23:59] <Goopyo> to find the list the user is in, and then to get the task and run it
[12:24:07] <NodeX> so a list of task names for user XYZ
[12:24:39] <NodeX> if so then a relational list is best
[12:25:04] <Goopyo> I was saying he should make list a many-to-many kind of design like: {'list_name' : name, 'user' : single_user } and make multiple entries for each list
[12:25:24] <Goopyo> so with an index it queries faster
[12:26:16] <NodeX> and even though you're on ignore ...[13:18:44] <NodeX> more queries = slower performance <---- is fully correct
[12:27:11] <Derick> In order to know which is faster, you need to benchmark it
[12:30:13] <NodeX> as I said earlier .. if you just want names of lists for a given user then a relational collection {user:foo, list_name:bar} will probably work best
[12:30:40] <NodeX> index on user, and if in future you need to get all users for list_name:bar then index on list_name too
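(a sketch of the membership-collection layout NodeX is describing; names are illustrative)

    // one small doc per (user, list) pair instead of one array per list
    db.list_members.insert({ user: "foo", list_name: "bar" });
    db.list_members.ensureIndex({ user: 1 });        // "which lists is user foo in?"
    db.list_members.find({ user: "foo" });
    db.list_members.ensureIndex({ list_name: 1 });   // add later if you also need "who is in list bar?"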
[12:47:37] <Goopyo> NodeX: That statement is absolutely wrong. It depends on the query types, indexing, etc
[19:03:49] <shadfc> what are people using for offline processing of large amounts of mongo data? it looks like the mongo-hadoop connector queries mongo every time... which i'd like to avoid
[19:09:39] <Killerguy> is it a good idea to do db.fsyncLock() on a primary host in a replicaset?
[19:10:44] <kali> Killerguy: this is kind of the point of it :)
[19:10:58] <spillere> I have a save I wanna do, where I will save a userID and, on its user, a bunch of checkins, like in foursquare. How do I make subobjects?
[19:46:53] <Derick> http://bsonspec.org/#/specification explains how it works
[19:53:29] <shadfc> anyone know if mongo-hadoop can read from BSON instead of querying mongo directly? current docs seem to say no, but i've seen some forum posts which seem to indicate it is possible
[20:15:23] <zirpu> anyone have a good rule of thumb about how much new data a replicaset can take? i keep overflowing the oplog and want to know if there's a way to avoid that.
[20:16:05] <zirpu> it's just log data (access and gc logs) that i'm throwing in. apparently too quickly. so if there's a way to back off and let the replicas catch up that's what i'd like.
[20:16:40] <zirpu> it's a 3 node replicaset. i have w=2, mostly to try to slow down the clients. w/o it i nuked the replicas w/in a few hours.
[20:17:11] <zirpu> i.e. i get into the RS102 "recovering" state that isn't actually recovering.
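(two knobs that matter here, sketched with illustrative values: the oplog size is fixed when mongod starts, e.g. mongod --replSet rs0 --oplogSize 10240 (in MB, so roughly 10 GB), and a write concern of w:2 makes each write wait for a secondary, which is the throttling zirpu is already using)

    // insert a log line, then block until it has replicated to at least one secondary
    db.logs.insert({ msg: "GET /index.html", ts: new Date() });
    printjson(db.runCommand({ getLastError: 1, w: 2, wtimeout: 10000 }));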
[20:22:28] <UForgotten> so I disabled authentication on my mongo instances and now my mms monitor agent is failing that it can't authenticate - how can I remove the credentials from the agent? I tried resubmitting the form empty but that does not appear to fix it.
[20:58:04] <ragnok> hi, i installed mongo and it was working fine with my node.js app. then i had to install the lamp stack on the same server. now i tried to run my node app, and i get some error from mongo. is there a possible conflict between mysql and mongodb on ubuntu 10.04?
[21:38:30] <Killerguy> kali, while doing backup do I need to save local.* files?
[21:43:43] <ribo> so, I'm working on a disaster recovery system; my production cluster has 6 nodes (3 replicas of 2 shards). If my disaster recovery server is just a single node, is doing a mongodump from a mongos gateway and then a mongorestore on that single backup node a good enough way to achieve this?
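(for reference, the two steps ribo describes would look roughly like this; hosts and paths are placeholders, and dumping through mongos gives one logical copy of the whole sharded data set)

    mongodump --host mongos.example.com:27017 --out /backups/prod
    mongorestore --host dr.example.com:27017 /backups/prod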
[21:47:04] <millun> i am wondering about a little design thing. i have a table of Records that can be of 2 types. lostRecords extends the records class, and so does foundRecords (@Entity), so i have 2 tables. i was wondering if it would be better to mark both of my extended classes this way: @Entity("records"), so i would have 1 table (collection)
[21:52:41] <dstorrs> when I do findAndModify -- does it lock the entire collection throughout the request, or can it constrain the lock + scan to the indexes such that only records that might be matches get locked?
[22:13:51] <kenneth> hey, in the c library for mongo:
[22:40:22] <heoa> Do there exist some examples of using mongodb from HTML markup?
[22:41:34] <dstorrs> does anyone know -- is there a notable speed diff between findAndModify()'s update ability and a simple db.coll.remove({ ... })
[22:42:04] <kenneth> dstorrs: remove is super slow
[22:42:16] <kenneth> not sure about find and modify
[22:42:24] <kenneth> heoa: not sure what your question is
[22:42:48] <kenneth> btw mongo people, i just sent y'all a pull request to fix some small issues in the c driver: https://github.com/mongodb/mongo-c-driver/pull/45
[22:44:07] <heoa> kenneth: I want to open an HTML page and when it loads I want to execute "db.scores.save({a : 99})", for example.
[22:44:20] <dstorrs> I'm using Mongo as a job queue. Once I'm done with a job, I could set it to say "state: finished" or I could simply delete it
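(a minimal sketch of the two options dstorrs mentions, built around findAndModify to claim a job atomically; collection and field names are made up)

    // atomically claim the oldest pending job
    var job = db.jobs.findAndModify({
        query:  { state: "pending" },
        sort:   { created: 1 },
        update: { $set: { state: "running", started: new Date() } }
    });
    // ... do the work ...
    db.jobs.update({ _id: job._id }, { $set: { state: "finished" } });  // option 1: mark it done
    // db.jobs.remove({ _id: job._id });                                // option 2: delete it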
[22:46:12] <heoa> (hopefully not too old to test things)
[22:46:34] <heoa> (I am running things in localhost...so not needing to worry about bad guys)
[22:47:35] <heoa> http://docs.mongodb.org/manual/reference/javascript/ <-- this one?
[22:47:37] <kenneth> dstorrs: that sounds like a bad idea, using mongodb as a job queue. we went that route and regretted it, it doesn't scale. removing jobs is too expensive
[22:47:57] <heoa> or this one http://www.mongodb.org/display/DOCS/Http+Interface ?
[22:48:45] <kenneth> i'd recommend using redis or a dedicated in-memory job queuing system (beanstalk, gearman, resque)
[22:58:54] <kenneth> but it's a major pain to deploy
[23:04:14] <heoa> https://github.com/christkv/node-mongodb-native.git <--- I think I need the mongodb driver with node.js to do what I want ?!
[23:05:45] <algernon> heoa: --rest is an option for mongod, not mongo
[23:05:56] <algernon> heoa: mongo is the shell, you want to pass --rest to the server.
[23:12:51] <heoa> algernon: Thank you, now I have to do some demos.
[23:15:57] <heoa> (ok I need to restart the db somehow...investigating)
[23:18:27] <heoa> http://ajay555.wordpress.com/2011/06/08/mongodb-on-ubuntu-rest-is-not-enabled-use-rest-to-turn-on-error/ found a solution from 2011 but it's not working anymore, or the conf file has changed...
[23:19:24] <heoa> Is REST still configured by /etc/mongodb.conf?
[23:22:06] <heoa> (I got it running by adding line "rest = true" to the /etc/mongodb.conf, just guessed)
[23:22:15] <heoa> and then running sudo restart mongodb
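(for anyone else hitting this: with rest enabled, mongod's HTTP interface listens on its port + 1000, i.e. 28017 by default, and the simple REST interface is read-only; the db/collection names below are placeholders)

    # /etc/mongodb.conf
    rest = true
    # after restarting mongod, query it over HTTP, e.g.:
    curl 'http://localhost:28017/mydb/scores/'
    curl 'http://localhost:28017/mydb/scores/?filter_a=99'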
[23:55:43] <heoa> Is there some command to dump everything in the dbs as plain text?