PMXBOT Log file Viewer


#mongodb logs for Wednesday the 13th of June, 2012

[00:08:20] <grafman> Thanks again mediocretes, that was exactly what the doctor(manager) ordered!
[00:08:31] <mediocretes> good, glad it worked
[00:08:35] <grafman> Hope I can return the favor one of these days
[01:02:00] <tystr> what's the fastest way to add something to an array in a document?
[01:02:24] <tystr> db.coll.update({ "_id": ObjectId("4fd7de77ab3c45e565000021") }, { "$pushAll": { "attributes": [ { "k": "favorite_foods", "v": "beer" } ] } });
[01:02:58] <tystr> is quite slow with ~5000 key-value pairs inside attributes
[01:08:39] <tystr> ah, it's the indexes
[01:35:41] <tystr> hmm
[01:38:32] <tystr> db.collection.ensureIndex({ "attributes.k" : 1, "attributes.v" : 1 });
[01:38:44] <tystr> ^^ with this index $pushAll into attributes takes 4-5 seconds
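(For reference, a minimal shell sketch of the effect tystr is seeing: with a compound multikey index on attributes.k/attributes.v, every $pushAll adds index entries, and if the growing document has to be moved on disk all existing index entries for the array may need rewriting. The ObjectId and field values are illustrative.)

    // time a single $pushAll with the multikey index in place
    db.coll.ensureIndex({ "attributes.k": 1, "attributes.v": 1 });
    var t0 = new Date();
    db.coll.update(
        { "_id": ObjectId("4fd7de77ab3c45e565000021") },
        { "$pushAll": { "attributes": [ { "k": "favorite_foods", "v": "beer" } ] } }
    );
    db.getLastError();                         // wait for the write to be acknowledged
    print((new Date() - t0) + " ms");          // tends to grow as the array gets larger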
[04:54:25] <heoa> Are the instructions here [1] outdated? What is the deb -cmd there? [1] http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian-or-ubuntu-linux/
[04:54:41] <heoa> (in other words, is the apt-get -version good enough?)
[05:15:50] <Kane`> i'm looking into combining mongodb + postgresql to store my log data. any thoughts on my schema? http://codepad.org/c1zxZqfL
[05:34:41] <algernon> Kane`: looks like a weird combo.
[06:33:45] <maik_> good morning from germany
[06:37:02] <Kane`> algernon, how so?
[06:40:58] <ranman> maik_: good morning, where exactly are you over there?
[07:57:32] <algernon> Kane`: why mongodb AND postgre? For logs, mongodb itself is enough, and not having to use two dbs makes things a whole lot simpler.
[08:33:44] <Touns> Hello
[08:35:37] <NodeX> hi
[08:39:04] <Kane`> algernon, i'm playing around with using just mongodb now. will see how it works out :}
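(For reference, if the logs end up in MongoDB alone as algernon suggests, a capped collection is the usual starting point; a minimal sketch with an illustrative collection name and size:)

    // fixed-size, insertion-order collection; the oldest entries are aged out automatically
    db.createCollection("logs", { capped: true, size: 1024 * 1024 * 1024 });   // ~1 GB
    db.logs.insert({ ts: new Date(), level: "info", msg: "example entry" });
    db.logs.find().sort({ $natural: -1 }).limit(10);                           // most recent entries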
[08:58:48] <timoxley> how do people deal with things like "when canonical data X changes, reflect changes in caches by updating A, B and C"
[08:59:42] <timoxley> my first idea was 'triggers', but this isn't yet implemented https://jira.mongodb.org/browse/SERVER-124
[08:59:52] <timoxley> so perhaps my modeling of the problem is wrong
[09:00:47] <Derick> timoxley: will a trick like this work: http://drck.me/mongosolr-9fs ?
[09:04:07] <timoxley> Derick possibly, I had hoped to avoid tailing oplog, but perhaps it's the only way
[09:04:25] <timoxley> it sorta feels like sorting through someone's garbage to see what they've been eating
[09:05:20] <timoxley> though that is how the replication works so it's probably fine.
[09:06:59] <NodeX> timoxley : don't you abstract your data updates / additions in your app ?
[09:09:06] <NodeX> tailing cursors and adding an entire document to SOLR is a very bloated approach... if one doesn't need to search every field in the document then it should not be indexed
[09:09:12] <NodeX> (in solr)
[09:09:49] <timoxley> NodeX yeah, but we're using this mongoose thing… with which you can easily sidestep the pre/post save hooks by doing bulk operations… or using update syntax directly, or using the driver directly. I feel like there's little point having post save hooks if it's easy to forget about them and have them not fire
[09:10:34] <timoxley> I've actually been struggling a lot with mongo recently as a result of not being able to know when particular events happen
[09:10:36] <NodeX> for me I just added a quick function to my update -(upsert) which took the fields I wanted and posted them to solr
[09:10:58] <NodeX> but .. I don't update a million documents a minute so it won't work for everyone
[09:11:21] <NodeX> a very good tip I learnt with solr is if you have a large amount of docs to add o
[09:11:22] <timoxley> NodeX you're using YOUR abstraction to interface with mongo though right?
[09:11:28] <NodeX> ** to it ... use csv imports
[09:11:35] <timoxley> i'm not even using solr
[09:11:47] <NodeX> sorry I assumed you were
[09:11:51] <timoxley> but good tip…
[09:12:00] <Derick> timoxley: not the point. you can use the same thing to update your caches of course
[09:12:01] <timoxley> do people typically use solr + mongo?
[09:12:19] <timoxley> Derick yep that's what I figured you were trying to suggest
[09:12:30] <timoxley> s/trying to suggest/suggesting
[09:12:31] <Derick> NodeX: And of course, you don't have to index every field... your script should handle that ;-)
[09:13:01] <NodeX> it's still only good if you "cap" your collections
[09:13:12] <NodeX> much better to abstract it in your classes
[09:13:43] <NodeX> timoxley : a lot of people use elasticsearch + mongo too - they have similarities
[09:37:39] <timoxley> Derick tailing oplog wasn't so scary after I looked at it… I think it's the way forward for me. thanks for the help
[09:39:13] <Derick> np
[09:39:20] <Derick> it's actually a lot easier than it sounds :-)
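(For reference, a minimal shell sketch of tailing the oplog on a replica-set member, in the spirit of the trick Derick linked; the "mydb.posts" namespace and the processing step are illustrative, and a real tailer would keep re-opening/looping on the cursor:)

    // the oplog is a capped collection, so a tailable cursor keeps returning new entries
    var oplog  = db.getSiblingDB("local").oplog.rs;
    var cursor = oplog.find({ ns: "mydb.posts" })
                      .addOption(DBQuery.Option.tailable)
                      .addOption(DBQuery.Option.awaitData);
    while (cursor.hasNext()) {
        var entry = cursor.next();    // entry.op is "i"/"u"/"d"; entry.o holds the document or update
        printjson(entry);             // update caches / push to Solr here
    }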
[09:43:56] <ro_st> performance-wise, is it better to load a whole document into memory and query that data in-memory, or better to query just the bits that i need each time i need to check something?
[09:44:38] <ro_st> i ask because i've got to check access control and status flags on a large document prior to passing back large parts of it
[09:45:43] <Derick> ro_st: the only difference is that if you request fewer fields, less data traffic happens
[09:45:45] <ro_st> the difficulty is that the checks need to be used for a wide variety of actions, some which don't need the large data blob and some which do. so i don't want to fetch the blob into memory just to perform the checks and then dump it
[09:45:52] <Derick> mongodb always "loads" the whole doc in memory
[09:46:09] <ro_st> on the driver side?
[09:46:42] <Derick> Sorry, I don't get that question
[09:47:43] <ro_st> so if i tell it only to return some fields, that filtering happens in the mongo process, or in the driver outside the mongo process?
[09:47:53] <ro_st> i guess you already answered that with your data traffic question
[09:47:58] <Derick> in the mongo process
[09:48:09] <ro_st> cool. then i need to use multiple queries
[09:48:33] <Derick> IMO, you should try both and see what works better in your case
[09:49:32] <ro_st> yeah
[09:49:34] <ro_st> ok, thanks
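(For reference, the field filtering Derick describes happens server-side in the mongod process; a minimal sketch of a projection, with illustrative field names and someId standing in for the document's _id:)

    // fetch only the access-control and status fields, not the large blob
    db.docs.findOne({ _id: someId }, { acl: 1, status: 1 });
    // or fetch everything except the blob
    db.docs.findOne({ _id: someId }, { blob: 0 });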
[10:44:09] <timoxley> Derick a friend said "The biggest problem with reading the oplog to perform hooks is that by the time you read the oplog the database wont be in the same state as it was in when the update was run"
[10:44:22] <timoxley> is that a valid concern
[10:44:25] <timoxley> ?
[10:44:38] <Derick> depends on what your hooks do
[10:47:35] <timoxley> For example. I want to take N most recent comments for a "Post" and embed them into the Post
[10:48:06] <timoxley> Derick whenever a new comment is added, I want to run the hook that adds the item to the post
[10:48:30] <timoxley> how out of date is the oplog expected to be?
[10:49:22] <Derick> timoxley: on the same machine, probably not at all
[10:49:45] <Derick> but if you want to do things like "10 latest posts" I suggest you do that in your app
[10:49:47] <timoxley> and if I only run my oplog checks on the master node…
[10:49:59] <Derick> yeah, that should be ok
[10:50:55] <timoxley> Derick again, problem is I can't definitively "know" what the 10 latest posts are without making an expensive query
[10:51:19] <Derick> but you end up doing that expensive query anyway
[10:51:28] <Derick> is it okay if that cache lags a bit?
[10:51:42] <Derick> with a bit being at max 15 secs or so
[10:54:37] <timoxley> 15 seconds is pretty extreme, the whole application could have changed by then…
[10:55:08] <timoxley> Derick replication latency isn't typically in the 15 second range is it?
[10:55:31] <Derick> no, it's not
[10:55:42] <Derick> i don't know how long your "extensive query" takes though
[10:56:19] <timoxley> sorry, i don't get what you're saying. What's going to take 15 seconds?
[10:56:52] <Derick> You want your latest X posts added to the object. But you don't want to do that query to find out the X latest ones when you do a post. Right?
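(For reference, a minimal sketch of doing this in the application at comment-insert time, as Derick suggests, instead of reconstructing it from the oplog; collection and field names are illustrative, and on newer server versions a $push with $each/$slice can keep the embedded array trimmed to N in a single update:)

    // insert the comment, then embed it in its post in the same code path
    db.comments.insert({ post_id: postId, text: "nice post", ts: new Date() });
    db.posts.update(
        { _id: postId },
        { $push: { recentComments: { text: "nice post", ts: new Date() } } }
    );
    // on 2.0.x the embedded array has to be trimmed to the last N elements separately,
    // e.g. by re-setting recentComments once it grows past N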
[10:56:55] <Tobsn> timoxley, what mongodb version?
[10:57:17] <timoxley> Tobsn v2.0.6
[10:57:20] <Tobsn> hmm
[10:57:22] <Tobsn> nevermind then
[10:57:24] <timoxley> is that old
[10:57:28] <Tobsn> no
[10:57:30] <Tobsn> thats fine
[10:57:32] <Derick> no, it's latest stable
[10:58:20] <Tobsn> if it were 1.6 i would've said that there was a major flaw in the replication.. i had basically 0-2ms on a single server and 50-120sec on a two-server replica
[10:58:27] <Tobsn> but thats a _while_ ago
[10:58:36] <Tobsn> or even 1.5? cant remember
[10:59:13] <timoxley> 50 seconds for replication sounds pretty average
[10:59:20] <timoxley> as in
[10:59:30] <timoxley> was that under heavy load?
[10:59:42] <timoxley> Tobsn
[10:59:44] <Derick> timoxley: that's the difference between primary and secondary though
[10:59:50] <Derick> not what's in the oplog compared to the primary
[10:59:52] <Goopyo> do you guys turn on safemode by default? Or is it not needed when theres journaling?
[11:00:07] <Tobsn> like i said, a normal server replicated it within no time.. it was a bug
[11:00:14] <Derick> Goopyo: two totally different things. You probably want safe mode on, journalling on, but fsync off.
[11:00:43] <Tobsn> but that was end of 2009 i think
[11:00:45] <Derick> with safe mode being off, you will never find out about f.e. duplicate key errors, or data not being stored... etc.
[11:00:53] <Tobsn> just thought you might got stuck on an old version/deb
[11:01:10] <Goopyo> I thought safemode protects from a write operation loss during a server death in the middle of the write operation, and thought thats what journaling does too
[11:02:15] <Goopyo> Derick: besides parsing errors which are caught even when safemode is off, what problems can be caught with safemode?
[11:02:23] <Goopyo> in a basic save operation
[11:02:26] <Derick> Goopyo: I think you confused "fsync" with "safemode" here
[11:02:51] <Derick> Goopyo: duplicate keys for unique indexes, a primary disappearing or reverting to a secondary
[11:03:10] <Derick> Goopyo: the client will *never* check error conditions in data updates/inserts/deletes with safemode off
[11:03:32] <Goopyo> wow. Thats crazy for it to be off by default in most drivers
[11:04:14] <Derick> I don't disagree...
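(For reference, "safe mode" here means issuing getLastError after a write so errors are actually reported back; it is what the drivers' safe/w options turn on per write. A minimal shell sketch, with an illustrative collection and unique index:)

    db.users.ensureIndex({ email: 1 }, { unique: true });
    db.users.insert({ email: "a@example.com" });
    db.users.insert({ email: "a@example.com" });     // violates the unique index
    printjson(db.getLastErrorObj());                 // reports the duplicate-key error
    db.runCommand({ getlasterror: 1, j: true });     // also wait for the journal commit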
[11:17:36] <statusfailed> I'm using C#, is it possible to do a query, apply a function to each result, and save all the results?
[11:17:41] <spillere> how can i add a for loop to create a sub object? idk if its the right way to say it, but can check here: http://pastie.org/4079215
[11:17:48] <statusfailed> Basically I want to update a whole bunch of things at once, and i'm not sure what to use
[11:18:50] <statusfailed> At the moment i'm using foreach, AsQueryable, and doing many writes to save results
[11:21:29] <Goopyo> statusfailed: never used C# but check if your driver has insert
[11:21:52] <Goopyo> insert with upsert true is what your looking for I think
[11:22:05] <statusfailed> insert for each element?
[11:22:08] <statusfailed> or InsertBatch?
[11:22:18] <Goopyo> batch insert
[11:29:42] <statusfailed> Goopyo: haven't quite figured it out, but I will keep looking- thanks for your help!
[11:30:12] <Goopyo> statusfailed: did I misunderstand your question?
[11:30:44] <Goopyo> what I am suggesting you do is apply the function to the data, append it to the array and once you've done all of them, do a batch insert of the array that way its one trip and one operation
[11:33:30] <statusfailed> Sorry, I just meant i'm having trouble figuring out how to get the "upsert" to happen
[11:33:41] <statusfailed> I'm hunting for docs on the flags to the function
[11:34:18] <Goopyo> are you updating the data or saving it? if you're saving it you dont need a upsert, if you are updating and want to insert if its not there, you upsert
[11:34:47] <statusfailed> I'm reading in a list, modifying each item, and then saving the list back
[11:34:53] <statusfailed> so I just use Save?
[11:35:34] <Goopyo> "I'm reading in a list" < is that from mongodb?
[11:35:59] <statusfailed> Ok, so there's a collection of items in mongodb, I want to modify each item in that collection
[11:36:14] <statusfailed> Currently, i'm reading them all from mongo, modifying them all, and then saving back
[11:36:19] <statusfailed> i'm just not sure which method to save back with
[11:36:39] <Goopyo> are you changing a value in each one? or completely modifying them?
[11:37:53] <statusfailed> Goopyo: Changing a value? you mean like a single field?
[11:37:59] <Goopyo> yeah
[11:38:06] <statusfailed> nope, could be fairly arbitrary
[11:38:11] <statusfailed> potentially everything
[11:38:57] <Goopyo> then you can either do them individually or if you can categorize them do a update with multi flag set to true
[11:39:10] <statusfailed> sorry, categorize?
[11:41:00] <Goopyo> it depends on your situation. Say you want to take all documents that have 'john' in their name and you want to set name_john to true in all of them you can simply do a db.collection.update({'name' : 'john'}, {'$set' : {'name_john' : true}}, {multi : true})
[11:43:00] <Goopyo> statusfailed: another option is server sided code: http://www.mongodb.org/display/DOCS/Server-side+Code+Execution#Server-sideCodeExecution-Using%7B%7Bdb.eval%28%29%7D%7D
[11:43:40] <statusfailed> Oh I see- i'm trying to hide the "mongo" layer from other C# code, so the actual function doing the modification is in C#
[11:43:47] <statusfailed> so that has to occur in C# rather than mongo
[11:44:03] <statusfailed> But I can still do a batch save of the updated collection, right?
[11:45:17] <Goopyo> statusfailed: what do you mean by "But I can still do a batch save of the updated collection, right?"
[11:47:25] <statusfailed> I mean saving back the records I originally read out
[11:49:03] <Goopyo> yeah either using a multi-update/bulk insert/atomic operation
[11:49:03] <Goopyo> a
[11:49:04] <Goopyo> ga
[11:49:12] <Goopyo> again that depends on your usecase
[11:50:52] <coldwind> anyone using morphia?
[11:57:14] <pilgo> Hi All. I have a collection as such: {name: String, users: [user_id, ...]} and I want to find all the documents that have a specific user_id. I tried this: Lists.find({ users: uid }) but it doesn't work
[11:59:15] <Goopyo> pilgo: are there multiple userids per user?
[11:59:41] <Goopyo> i.e is it {name : string, user: { userid : string, other info : blah } ?
[12:00:04] <pilgo> Goopyo: Not in that collection. I'm just storing the uid
[12:01:18] <Goopyo> ah but users is a list of userids right?
[12:04:43] <NodeX> pilgo ... db.foo.find({"users.user_id":"1234"});
[12:05:09] <Goopyo> NodeX: isn't that bad db design?
[12:05:27] <pilgo> Goopyo: yeah
[12:05:56] <NodeX> why is it bad db design ?
[12:06:05] <NodeX> and I didn't design it ....
[12:06:06] <NodeX> [12:54:49] <pilgo> Hi All. I have a collection as such: {name: String, users: [user_id, ...]} and I want to find all the documents that have a specific user_id. I tried this: Lists.find({ users: uid }) but it doesn't work
[12:06:18] <NodeX> what I wrote is how to achieve what pilgo wants
[12:06:30] <pilgo> NodeX: 'user_id' is just a string
[12:06:40] <Goopyo> NodeX: lol relax man, I am asking for your professional second opinion not giving you blame
[12:07:03] <NodeX> relax ? .. who's mad ?
[12:07:10] <NodeX> pilgo ??????
[12:07:17] <pilgo> Not me!
[12:07:18] <pilgo> :p
[12:07:30] <Goopyo> pilgo: what I am saying is that you're better off with a many-to-many design
[12:07:44] <Goopyo> i.e if you continue with this, mongodb is gonna have A LOT of disk reads
[12:07:59] <NodeX> no I was asking wth you're on about
[12:08:00] <pilgo> Goopyo: Ok, cool. How do I fix it?
[12:09:02] <NodeX> the general principle is to go with the least amount of disk reads .. if you query the overall document more than embedded arrays then scope them elsewhere, else don't
[12:09:05] <pilgo> NodeX: my doc will look like this: {name: "my awesome list", users: ['dsfdssdfds', 'dsfdasdsfd','dsfdasfd']} . I want to find all the lists that have 'dsfdasfd' in the users.
[12:10:13] <NodeX> doesn't matter what your doc looks like
[12:10:20] <NodeX> it matters how you query your doc
[12:10:24] <Goopyo> pilgo: exactly
[12:10:28] <Goopyo> since you are querying from the list
[12:11:13] <NodeX> again it doesnt matter.. it's about how you query your data
[12:11:25] <Goopyo> you are better off creating documents like {'userid' : <SINGLE USER ID> , falls_in_list = Name of list they fall in } that way you can query both ways equally fast
[12:11:44] <NodeX> that's bad advice
[12:11:58] <NodeX> no offence
[12:12:13] <Goopyo> none taken, why I asked for the second opinion
[12:12:16] <Goopyo> please elaborate though
[12:12:56] <NodeX> he hasn't explained HOW he queries his data, or which is queried more so no assumptions can be made
[12:13:17] <Goopyo> he did: {name: "my awesome list", users: ['dsfdssdfds', 'dsfdasdsfd','dsfdasfd']} . I want to find all the lists that have 'dsfdasfd' in the users.
[12:13:31] <Goopyo> he's querying by strings in a list in the document
[12:13:31] <pilgo> Ok, first, that the query works fine. I was just messing up my insert :)
[12:13:39] <NodeX> that's not HOW he queries it
[12:13:48] <NodeX> that's the schema
[12:14:01] <Goopyo> "find all the lists that have 'dsfdasfd' in the users." is how he queries it
[12:14:15] <NodeX> that's one query
[12:14:26] <NodeX> doesn't mean it's how it's always queried
[12:14:36] <pilgo> Ok, so for how I query, I need to find lists that belong to users, tasks that belong in a list
[12:14:48] <NodeX> only those 2 ways ?
[12:14:56] <NodeX> which one of those 2 gets queried more ?
[12:15:28] <pilgo> Tasks in list
[12:15:50] <NodeX> I don't see tasks in your schema
[12:15:50] <Goopyo> like way more?
[12:15:51] <Goopyo> or
[12:16:16] <pilgo> Ok, sorry. I have another collection. Let me lay out all three...
[12:16:21] <NodeX> s/schema/document
[12:16:42] <pilgo> Lists -- {name: String, users: [user_id, ...]}
[12:16:50] <pilgo> Users -- {name: String}
[12:17:02] <Goopyo> NodeX: your right that direction and frequency matters, But many to many is the best way to go, especially considering it does do query by users in the list sometimes
[12:17:09] <pilgo> {text: String, completed: Boolean, assignee: String, list_id: String}
[12:17:19] <pilgo> How do I fix this abomination?
[12:17:19] <NodeX> if you're just asking for all documents users.foo = bar then embedded is probably best
[12:18:40] <pilgo> Ok, actually, there's no reason to not have tasks embedded in lists, right?
[12:18:55] <NodeX> do the tasks fall under a sepcific user ?
[12:18:59] <NodeX> specific*
[12:19:13] <Goopyo> pilgo my vote is for: Lists - {name: 'Happy' , users: user1}, {name: 'Happy' , users: user2} repeat for as many as you need, index by list name
[12:19:17] <pilgo> NodeX: Nope. They can be shared.
[12:19:41] <NodeX> Goopyo : that's a relational way
[12:19:59] <NodeX> what you gain with less seeks you lose with a second query
[12:20:18] <Goopyo> nope
[12:20:22] <NodeX> k dude
[12:20:35] <NodeX> pilgo : does a user always have to exist for a task ?
[12:20:44] <Goopyo> because if its indexed by list name the seeks are gonna be very small
[12:21:00] <NodeX> seeks / query /
[12:21:09] <NodeX> more queries = slower performance
[12:21:17] <Goopyo> so wrong
[12:21:22] <NodeX> cool story
[12:21:25] <Goopyo> more bad queries = slower performance
[12:21:36] <pilgo> NodeX: users exist for a list not a task, but by association, yes
[12:21:45] <Goopyo> and bad queries are reading every list of every document looking for something
[12:22:05] <Goopyo> Derick: can we get your opinion here?
[12:22:06] <NodeX> depending on data numbers I would save the tasks inside each user
[12:22:24] <NodeX> ignored lol
[12:22:40] <Derick> Goopyo: on? wasn't paying attention :)
[12:22:48] <pilgo> NodeX: tasks inside each user?!!
[12:22:56] <NodeX> pilgo : what do you ultimately want back from your query ?
[12:23:06] <Goopyo> Derick: he has three collections as so:
[12:23:08] <NodeX> the user itself ?
[12:23:12] <pilgo> NodeX: tasks in a list for the current user.
[12:23:18] <Goopyo> Lists -- {name: String, users: [user_id, ...]}
[12:23:19] <Goopyo> Users -- {name: String} and
[12:23:32] <Goopyo> Tasks - {text: String, completed: Boolean, assignee: String, list_id: String}
[12:23:45] <Goopyo> sometimes he queries by user in the list collection
[12:23:59] <Goopyo> to find the list the user is in, and then to get the task and run it
[12:24:07] <NodeX> so a list of task names for user XYZ
[12:24:39] <NodeX> if so then a relational list is best
[12:25:04] <Goopyo> I was saying he should make list a many-to-many kind of design like: {'list_name' : name, 'user' : single_user } and make multiple entries for each list
[12:25:24] <Goopyo> so with an index it queries faster
[12:26:16] <NodeX> and even though you're on ignore ...[13:18:44] <NodeX> more queries = slower performance <---- is fully correct
[12:27:11] <Derick> In order to know which is faster, you need to benchmark it
[12:27:27] <Derick> just don't assume things
[12:27:32] <NodeX> ^^ 15 minutes ago I said this
[12:28:09] <pilgo> Derick: Good idea. After I get a few 100k users :)
[12:28:12] <NodeX> apparently lists / many to many are one-size-fits-all though
[12:28:16] <pilgo> Derick: How would I do that?
[12:28:28] <Derick> pilgo: implement both and test it?
[12:28:50] <pilgo> Derick: Ok, that's cool
[12:28:54] <pilgo> Thanks for your help everyone
[12:29:03] <NodeX> pilgo : what works for my data will not necessarily work for your
[12:29:04] <NodeX> s
[12:29:07] <NodeX> and VV
[12:29:19] <pilgo> Right, right.
[12:30:13] <NodeX> as I said earlier .. if you just want names of lists for a given user then a relational collection {user:foo, list_name:bar} will probably work best
[12:30:40] <NodeX> index on user, and if in future you need to get all users for list_name:bar then index on list_name too
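(For reference, a minimal sketch of the membership collection NodeX describes, with the two indexes; collection and field values are illustrative:)

    // one document per (user, list) pair
    db.list_members.insert({ user: "foo", list_name: "bar" });
    db.list_members.ensureIndex({ user: 1 });        // lists for a given user
    db.list_members.ensureIndex({ list_name: 1 });   // users for a given list, if needed later
    db.list_members.find({ user: "foo" });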
[12:47:37] <Goopyo> NodeX: That statement is wrong absolutely. It depends on the query types, indexing, etc
[12:48:04] <NodeX> fail
[12:48:36] <NodeX> I dont think you understand what the user is trying to accomplish Goopyo
[12:49:00] <Goopyo> meh I'm over it now.
[12:49:21] <NodeX> :)
[13:08:29] <remonvv> NodeX, who does really.
[13:08:40] <NodeX> 42
[13:08:49] <NodeX> wait, what ?
[13:08:50] <NodeX> lol
[13:09:09] <remonvv> "I want to do X", "But kind sir, this program only does Y", "Well that's just stupid!"
[13:59:30] <stevie-bash> Hello, I see a lot of queries/updates (db.currentOp) which have no namespace "ns" set. How can this happen?
[14:06:39] <W0rmDrink> Hi
[14:06:56] <W0rmDrink> are there any regressions left in 2.0.6 wrt 1.8.3 (or 5)?
[14:07:15] <W0rmDrink> I need to upgrade - but its either to 1.8.5 or 2.0.6
[14:10:48] <ro_st> is there a way to return a document based on whether or not an array within it has a length > 0?
[14:11:10] <ro_st> nm. just found $size
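(A caveat worth noting here: $size only matches an exact array length, so it cannot express "length > 0" directly; a common workaround is to test that the first element exists. Collection and field names are illustrative:)

    db.docs.find({ items: { $size: 3 } });            // exactly 3 elements
    db.docs.find({ "items.0": { $exists: true } });   // at least 1 element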
[14:40:05] <remonvv> stevie-bash, pastie
[14:40:36] <remonvv> W0rmDrink, 2.0.6. I'm not aware of any regression issues and if there were any they'd be top priority fixes I would assume.
[14:45:18] <W0rmDrink> ehrm
[14:45:29] <W0rmDrink> is this something I have to worry about https://jira.mongodb.org/browse/SERVER-5988 ?
[14:56:57] <stevie-bash> remonvv: http://pastie.org/private/azqkbvwgaaebe1s2psrfrw
[15:04:30] <remonvv> stevie-bash, not sure, replication op perhaps?
[15:05:37] <stevie-bash> possible, since we use replicasets
[16:40:39] <multiHYP> hi
[17:18:10] <diversario> I have a question about nodejs driver. Specifically, I want to stream results of multiple queries, one by one, somehow.
[17:25:40] <grizlo42> what output strategy does map reduce use for legacy versions?
[17:26:01] <grizlo42> my assumption would be replace, but is there any way to get a merge functionality?
[17:37:31] <spillere> one thing i find really annoying in the mongo shell is that when I press the up key, it doesnt go to the last command
[17:39:32] <algernon> rlwrap to the rescue
[17:45:41] <spillere> algernon: ?
[17:46:21] <algernon> spillere: rlwrap mongo. it will do magic, and make the shell have history and up/down keys working. usually, anyway.
[17:46:33] <spillere> cool
[17:46:42] <spillere> that's what im looking for :)
[17:46:45] <spillere> how do I do it?
[17:46:56] <algernon> you install rlwrap, and type "rlwrap mongo"
[17:47:07] <algernon> depending on your os, installing rlwrap may vary
[17:47:07] <spillere> apt-get?
[17:47:15] <algernon> apt-get install rlwrap
[17:47:51] <spillere> thanks!
[17:49:39] <spillere> algernon: how would that work then?
[17:58:32] <diversario> Does node-mongodb-native support streaming from multiple cursors at once? From same collection over 1 connection.
[18:04:52] <spillere> algernon this looks awesome!!! http://mongohub.todayclose.com/screenshots
[18:12:22] <mkmkmk> is there a way to specify which shard i want to be the primary for a database (or collection)
[18:26:24] <edvorkin> how to import date using mongoimport utility?
[18:26:25] <edvorkin> I am trying to use following json format in file:
[18:26:27] <edvorkin> { "uid" : "1234567", "activity" : "search", "value" : "test article2", "date" : { "$date" : "2011-05-03T17:09:55.359Z" } }
[18:26:28] <edvorkin> then
[18:26:30] <edvorkin> mongoimport -c activity activity.json
[18:26:31] <edvorkin> and got error message
[18:26:33] <edvorkin> Wed Jun 13 14:21:16 Assertion: 10338:Invalid use of reserved field name
[18:26:34] <edvorkin> This is because of my usage of "$date". What am I doing wrong converting into a MongoDB date?
[18:26:36] <edvorkin> Thanks
[18:26:53] <kchodoro_> mkmkmk: no, but you can use the moveprimary command after the fact to change it
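(For reference, a minimal sketch of the command kchodoro_ mentions, run through mongos; the database and shard names are illustrative:)

    // move the primary shard for an existing database after the fact
    db.adminCommand({ movePrimary: "mydb", to: "shard0001" });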
[18:28:35] <rick446> $date should be milliseconds since Jan 1, 1970
[18:30:11] <mkmkmk> kchodoro_: thanks
[18:31:48] <edvorkin> rick446: Is that the only way to move date field from file into mongo?
[18:34:22] <rick446> unless you'd care to write a handcrafted .bson file, I believe it is.
[18:42:53] <edvorkin> rick446: thanks
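(For reference, edvorkin's line in the epoch-milliseconds form rick446 describes; 1304442595359 corresponds to 2011-05-03T17:09:55.359Z:)

    { "uid" : "1234567", "activity" : "search", "value" : "test article2", "date" : { "$date" : 1304442595359 } }

followed by the same import command:

    mongoimport -c activity activity.json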
[19:03:49] <shadfc> what are people using for offline processing of large amounts of mongo data? it looks like the mongo-hadoop connector queries mongo every time... which i'd like to avoid
[19:09:39] <Killerguy> is that a good idea to do db.fsyncLock() on a primary host on replicaset?
[19:09:53] <Killerguy> it seems it freezes it
[19:09:57] <kali> nope :)
[19:09:59] <kali> bad idea
[19:10:02] <Killerguy> héhé
[19:10:07] <Killerguy> can you tell me why?
[19:10:15] <kali> because it freezes it ? :)
[19:10:25] <Killerguy> and why it freezes it? :p
[19:10:44] <kali> Killerguy: this is kind of the point of it :)
[19:10:58] <spillere> I have a save I wanna do, where I will save a userID, and on it's user, a bunch of checkins, like in foursquare, how do I make subobjects?
[19:11:01] <spillere> something like
[19:11:03] <spillere> db.users.save({'name':'daniel', 'checkin': { 'name1': 'name1', 'geo1':'geo1' }, { 'name1': 'name2', 'geo1':'geo2'}})
[19:11:53] <kali> spillere: add [] around your checkin array value, i guess
[19:11:55] <shadfc> spillere: make checkins an array of objects
[19:12:28] <spillere> db.users.save({'name':'daniel', 'checkin': [{ 'name1': 'name1', 'geo1':'geo1' }, { 'name1': 'name2', 'geo1':'geo2'}]})
[19:12:31] <spillere> like this?
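(That shape, checkin as an array of objects, is what kali and shadfc are describing; later check-ins can then be appended with $push, roughly like this, with illustrative field names:)

    db.users.save({ name: "daniel", checkin: [ { name: "name1", geo: "geo1" } ] });
    db.users.update(
        { name: "daniel" },
        { $push: { checkin: { name: "name2", geo: "geo2" } } }
    );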
[19:12:39] <Killerguy> hum I thought flock on primary would prevent all shards from doing writes, and then I could do backup
[19:14:23] <kali> Killerguy: you can backup, the low-level way, copying the files or making a FS-level snapshot
[19:14:44] <kali> Killerguy: this is exactly what the lock() is made for
[19:14:58] <kali> Killerguy: but it's best to do it on a secondary
[19:15:11] <Killerguy> I think I will add a hidden one in my shard
[19:15:19] <Killerguy> offline it, rsync, then restart it
[19:15:22] <kali> Killerguy: sounds good
[19:15:38] <Killerguy> cool :)
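(For reference, a minimal sketch of the fsync-lock approach kali describes, run on the secondary chosen for backups rather than the primary; the copy step in the middle is whatever file copy or snapshot mechanism you use:)

    db.fsyncLock();      // flush writes to disk and block further writes on this member
    // ... copy the dbpath files or take a filesystem snapshot here ...
    db.fsyncUnlock();    // resume normal operation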
[19:16:20] <kali> Killerguy: by the way, where are you ? "héhé" sound french, but the whois hints farther north
[19:16:39] <Killerguy> fuck I'm discover :D
[19:18:17] <Killerguy> got a question for config server backup, is that really good to offline one of the three?
[19:18:27] <Killerguy> and copy db
[19:18:59] <Killerguy> or better mongodump config db from mongos
[19:28:39] <Killerguy> kali? :=)
[19:30:18] <kali> Killerguy: when we have to do this, we usually stop one, copy the files to the new server and start it :)
[19:31:19] <kali> Killerguy: at least, with one stopped, you know the state will not evolve, so you don't have to worry about consistency
[19:37:48] <Killerguy> okay kali thx :)
[19:46:04] <multiHYP> how does bson look like? lets say bson of: {"name": "multiHYP", "occupation":"programmer"}
[19:46:11] <multiHYP> is it byte form?
[19:46:41] <Derick> yes
[19:46:53] <Derick> http://bsonspec.org/#/specification explains how it works
[19:53:29] <shadfc> anyone know if mongo-hadoop can read from BSON instead of querying mongo directly? current docs seem to say no, but i've seen some forum posts which seem to indicate it is possible
[20:15:23] <zirpu> anyone have a good rule of thumb about how much new data a replicaset can take? i keep overflowing the oplog and want to know if there's a way to avoid that.
[20:16:05] <zirpu> it's just log data (access and gc logs) that i'm throwing in. apparently too quickly. so if there's a way to back off and let the replicas catch up that's what i'd like.
[20:16:40] <zirpu> it's a 3 node replicaset. i have w=2, mostly to try to slow down the clients. w/o it i nuked the replicas w/in a few hours.
[20:17:11] <zirpu> i.e. i get into the RS102 "recovering" state that isn't actually recovering.
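(For reference, two shell helpers that show how much oplog headroom a replica set has, which is usually the first thing to check when secondaries keep dropping into RS102; the oplog itself is sized with the --oplogSize mongod option:)

    db.printReplicationInfo();        // on the primary: oplog size and the time window it covers
    db.printSlaveReplicationInfo();   // how far each secondary currently lags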
[20:22:28] <UForgotten> so I disabled authentication on my mongo instances and now my mms monitor agent is failing that it can't authenticate - how can I remove the credentials from the agent? I tried resubmitting the form empty but that does not appear to fix it.
[20:58:04] <ragnok> hi, i installed mongo and it was working fine with my node.js app. then i had to install the lamp stack in the same server. now i tried to run my node app, and i have some error from mongo. is there a possible conflict between mysql and mongodb in a ubuntu 10.04 ?
[21:38:30] <Killerguy> kali, while doing backup do I need to save local.* files?
[21:43:43] <ribo> so, I'm working on a disaster recovery system, my production cluster has 6 nodes, (3 repl of 2 shards). If my disaster recovery server is just a single node, is doing a mongodump from a mongos gateway then a mongorestore on that single node backup server a good enough way to achieve this?
[21:44:23] <millun> hi guys
[21:45:59] <multiHYP> hi multi_io
[21:46:09] <multiHYP> hi millun
[21:46:14] <multiHYP> multi_io: !
[21:46:40] <multiHYP> how is multi_io helpful here?
[21:46:42] <multiHYP> multi_io: ?
[21:46:47] <multiHYP> multi_io: help
[21:46:51] <multiHYP> !help
[21:46:51] <pmxbot> !8ball (8) !acronym (ac) !anchorman !annoy (a, bother) !bender (bend) !bitchingisuseless (qbiu) !blame !bless !boo !bottom10 (bottom) !calc !chain !cheer (c) !c
[21:46:51] <pmxbot> ompliment (surreal) !ctlaltdel (cad, controlaltdelete, controlaltdelete, quit, restart) !curse !dance (d) !deal !define (def) !demotivate (dm) !disembowel (dis,
[21:46:52] <pmxbot> eviscerate) !duck (ducky) !embowel (reembowel) !excuse (e ) !featurecreep (fc) !fight !flip !fml !gettowork (gtw) !golfclap (clap) !google (g) !googlecalc (gc)
[21:46:52] <pmxbot> !grail !hal (2001) !hangover !help (h) !hire !imotivate (im, ironicmotivate) !insult !job (card) !karma (k) !keelhaul (kh) !klingon (klingonism) !logo !lunch (
[21:46:52] <pmxbot> lunchpick, lunchpicker) !meaculpa (apologize, apology) !motivate (appreciate, m, thank, thanks) !murphy (law) !nailedit (n, nail) !nastygram (bcc, nerf, passive
[21:46:53] <pmxbot> ) !notify !oregontrail (otrail) !panic (pc) !password (passwd, pw) !paste !pick (p, p:, pick:) !progress !quote (q) !r (r) !resolv !roll !rubberstamp (approve)
[21:46:53] <pmxbot> !saysomething !simpsons (simp) !stab (shank, shiv) !storytime (story) !strategy !strike !tgif !therethere (comfort, poor) !ticker (t) !time !tinytear (cry, tear
[21:46:53] <pmxbot> , tt) !top10 (top) !troutslap (slap, ts) !urbandict (ud, urb, urbandef, urbandefine, urbandictionary, urbdef) !weather (w) !where (last, lastseen, seen) !zinger
[21:46:54] <pmxbot> (zing) !zoidberg (zoid)
[21:47:02] <multiHYP> oh wow sorry
[21:47:04] <millun> i am wondering about a little design thing. i have a table of Records. they can be of 2 types. now, lostRecords extend records class. so do foundRecords (@Entity). so i have 2 tables. i was wondering if it would be better if i marked both extended classes of mine this way: @Entity ("records"). so i would have 1 table (collection)
[21:47:24] <millun> do i make sense?
[21:47:34] <multiHYP> yes
[21:47:37] <multiHYP> optimize it
[21:47:51] <multiHYP> collection, document or row in mongodb land
[21:48:12] <millun> so, you think i am on right track with @Entity(records) on both collections?
[21:48:20] <millun> like, it will be one collection
[21:48:40] <millun> i am a kind of n00b :)
[21:49:01] <millun> i'm on it
[21:52:41] <dstorrs> when I do findAndModify -- does it lock the entire collection throughout the request, or can it constrain the lock + scan to indices such that only records that might be matches get locked?
[22:13:51] <kenneth> hey, in the c library for mongo:
[22:13:53] <kenneth> return (bson_iterator*)malloc(sizeof(bson_iterator*));
[22:13:57] <kenneth> shouldn't this be
[22:13:59] <kenneth> return (bson_iterator*)malloc(sizeof(bson_iterator));
[22:40:22] <heoa> Does there exist some examples with HTML -markup with mongodb?
[22:41:34] <dstorrs> does anyone know -- is there a notable speed diff between findAndModify()'s update ability and a simple db.coll.remove({ ... })
[22:42:04] <kenneth> dstorrs: remove is super slow
[22:42:16] <kenneth> not sure about find and modify
[22:42:24] <kenneth> heoa: not sure what your question is
[22:42:48] <kenneth> btw mongo people, i just sent y'all a pull request to fix some small issues in the c driver: https://github.com/mongodb/mongo-c-driver/pull/45
[22:44:07] <heoa> kenneth: I want to open an HTML page and when it loads I want to execute "db.scores.save({a : 99})", for example.
[22:44:20] <dstorrs> I'm using Mongo as a job queue. Once I'm done with a job, I could set it to say "state: finished" or I could simply delete it
[22:44:28] <dstorrs> kenneth: ^
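(For reference, the usual shape of the findAndModify-based queue dstorrs is describing: claim a job atomically, do the work, then either mark it finished or remove it. Field names and states are illustrative:)

    // atomically claim the oldest pending job
    var job = db.jobs.findAndModify({
        query:  { state: "pending" },
        sort:   { created: 1 },
        update: { $set: { state: "running", started: new Date() } }
    });
    // ... do the work ...
    db.jobs.update({ _id: job._id }, { $set: { state: "finished" } });   // or db.jobs.remove({ _id: job._id })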
[22:44:56] <kenneth> heoa: presumably you have a backend you're using for this?
[22:45:03] <kenneth> exposing your db directly to the world is a bad idea though
[22:45:38] <heoa> kenneth: Yes, I have. I installed things from apt-get.
[22:45:48] <heoa> with mongodb.*
[22:46:12] <heoa> (hopefully not too old to test things)
[22:46:34] <heoa> (I am running things in localhost...so not needing to worry about bad guys)
[22:47:35] <heoa> http://docs.mongodb.org/manual/reference/javascript/ <-- this one?
[22:47:37] <kenneth> dstorrs: that sounds like a bad idea, using mongodb as a job queue. we went that route and regretted it, it doesn't scale. removing jobs is too expensive
[22:47:57] <heoa> or this one http://www.mongodb.org/display/DOCS/Http+Interface ?
[22:48:45] <kenneth> i'd recommend using redis or a dedicated in memory job queuing system (beanstalk, gear man, resque)
[22:49:52] <dstorrs> kenneth: well crap.
[22:50:00] <dstorrs> I originally went with Gearman, but it was a disaster.
[22:50:54] <kenneth> dstorrs: resque is pretty neat if you're only using php/ruby or don't mind implementing their relatively simple protocol
[22:50:56] <dstorrs> It would crash with an uninformative error in the driver, there's no support available, hardly any docs, etc
[22:51:11] <kenneth> dstorrs: we use zeromq here
[22:51:38] <kenneth> but that's more of a communication library and you have to hold the queued jobs somewhere if you want to persist them
[22:51:40] <heoa> $ mongo --rest
[22:51:41] <heoa> ERROR: unknown option rest
[22:51:47] <dstorrs> I'm using Perl, and I've got a tradeshow in about 2 weeks.
[22:52:00] <heoa> http://www.mongodb.org/display/DOCS/Http+Interface <--- is this tutorial correct or am I running wrong cmd?
[22:52:20] <kenneth> dstorrs: does it need to scale? if your traffic is small you'll be fine on mongo
[22:52:28] <heoa> "This interface is disabled by default. Use --rest on the command line to enable."
[22:52:37] <kenneth> but if you're hammering it with hundreds of jobs per second like we were, it won't work
[22:52:50] <kenneth> heoa: can't help you there, got no experience with that
[22:52:57] <dstorrs> heoa: ditto
[22:53:18] <dstorrs> kenneth: we've got 30k YouTube channels that we need to harvest each hour
[22:54:14] <kenneth> dstorrs: try redis
[22:54:17] <dstorrs> for each channel I process 1 job per page of the feed. most feeds are 1-4 pages, but a few are up to a 1000
[22:54:35] <zirpu> yeah, redis is better for serving as a queue.
[22:56:05] <dstorrs> ok...are you saying Mongo isn't going to support being our database at all, or just that it can't be the job queue?
[22:56:22] <dstorrs> we're harvesting about 10M new records per day.
[22:56:42] <zirpu> no it can, but it's more a db than a queue.
[22:57:02] <zirpu> redis is just a better memcache in lots of ways. more than just key=value.
[22:57:10] <dstorrs> ok
[22:57:18] <dstorrs> crap, crap, crap.
[22:57:29] <dstorrs> I really did not want to have to introduce more infrastructure at this late date.
[22:57:38] <dstorrs> I'll go read up. Thanks for the pointer
[22:58:45] <kenneth> if you want to take the really hard road you can try storm
[22:58:46] <kenneth> :p
[22:58:54] <kenneth> but it's a major pain to deploy
[23:04:14] <heoa> https://github.com/christkv/node-mongodb-native.git <--- I think I need mongodb driver with node.js to do what I want ?!
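(For reference, a minimal node-mongodb-native sketch of the save heoa wants, done from a backend rather than from the page itself; the connection string, database/collection names, and the MongoClient entry point assume a driver version that provides it:)

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
        if (err) throw err;
        db.collection('scores').insert({ a: 99 }, function (err, result) {
            if (err) throw err;
            console.log('saved');
            db.close();
        });
    });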
[23:05:45] <algernon> heoa: --rest is an option for mongod, not mongo
[23:05:56] <algernon> heoa: mongo is the shell, you want to pass --rest to the server.
[23:12:51] <heoa> algernon: Thank you, now I have to do some demos.
[23:15:57] <heoa> (ok I need to restart the db somehow...investigating)
[23:18:27] <heoa> http://ajay555.wordpress.com/2011/06/08/mongodb-on-ubuntu-rest-is-not-enabled-use-rest-to-turn-on-error/ found solution in 2011 but not working anymore or conf -file changed...
[23:19:24] <heoa> Is REST still configured by /etc/mongodb.conf?
[23:22:06] <heoa> (I got it running by adding line "rest = true" to the /etc/mongodb.conf, just guessed)
[23:22:15] <heoa> and then running sudo restart mongodb
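(For reference, the two equivalent ways to enable the interface, as heoa found; the REST pages are then served on the HTTP status port, which is the mongod port plus 1000, i.e. 28017 by default:)

    # /etc/mongodb.conf
    rest = true

    # or on the command line
    mongod --rest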
[23:55:43] <heoa> Is there some command to dump everything in the dbs as plain text?