PMXBOT Log file Viewer


#mongodb logs for Wednesday the 11th of July, 2012

[02:30:16] <kuhrt> How can I use $unset to remove a field from a nested document?
[02:30:44] <kuhrt> for instance I want to remove "Release date"
[02:30:46] <kuhrt> db.films.update({title: "Montreal Stories 1944"},{$unset:{details:{"Release date":1}}})
[02:31:20] <kuhrt> I realize ^ is wrong because it removes the 'details' document, which I do not want, I just want to remove a field from it
[02:47:29] <mids> kuhrt: almost
[02:47:30] <mids> kuhrt: db.films.update({title: "Montreal Stories 1944"},{$unset:{"details.Release date":1}})
[02:48:04] <kuhrt> mids: ah thank you
[02:48:29] <kuhrt> does the value "1" connote something special or is it convention?
[02:48:43] <kuhrt> I used "1" based on mongo docs
[02:49:30] <mids> kuhrt: convention, can be any truthy value I guess
[02:49:55] <kuhrt> so 0 would not work, because it is not truthy?
[02:49:58] <kuhrt> or false
[02:50:14] <kuhrt> effectively making $unset do nothing?
[02:50:40] <mids> actually 0 also works
[02:51:01] <mids> it ignores the value
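A quick sketch of the exchange above in the mongo shell (the "runtime" field and the date value are made up for illustration): the dotted path targets just the nested field, and the value handed to $unset is ignored.

```javascript
// Illustrative document; only "Release date" should go away, details must survive.
db.films.insert({ title: "Montreal Stories 1944",
                  details: { "Release date": "1944-06-01", runtime: 90 } });

// The dotted path reaches inside the embedded document; the value ("" here, 1 above) is ignored.
db.films.update({ title: "Montreal Stories 1944" },
                { $unset: { "details.Release date": "" } });

db.films.findOne({ title: "Montreal Stories 1944" });
// -> { _id: ..., title: "Montreal Stories 1944", details: { runtime: 90 } }
```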
[03:15:09] <aws_2012> I apologize if this has been asked before/recently, but has anyone had difficulty getting and keeping your secondary node sync'ed?
[03:22:19] <aws_2012> repeat question: I'm having trouble getting and keeping my secondary in sync with my primary, despite turning off writes to the primary. Can anyone advise what I might be doing wrong?
[03:50:37] <ashepard> trying again, hoping i might find a nugget of wisdom: having trouble getting and keeping a secondary's optime up-to-date, despite turning off writes. anyone have any ideas to try?
[05:02:49] <ranman> how do I make ruby run faster
[05:02:52] <ranman> woah
[05:02:55] <ranman> wrong channel sorry
[07:15:18] <Josony> msg NickServ VERIFY REGISTER josony oqyqjtbypxyy
[07:17:26] <Josony> ?
[07:18:36] <Josony> quit
[07:18:55] <mids> oops
[07:21:13] <DigitalKiwi> for a channel of >300 names there sure isn't a lot of activity here, ever >.>
[07:21:35] <DigitalKiwi> don't say nobody needs help cause i've seen lots of (unanswered) questions ;p
[07:23:03] <mids> DigitalKiwi: yeah, though I also see that in some other established opensource irc channels
[07:23:17] <mids> guess all the kool kids use facebook for their help need
[07:28:03] <akaIDIOT> just joined #cassandra, #mongodb and #hbase yesterday, story's the same for all :)
[07:28:12] <akaIDIOT> people on irc are champs at idling :)
[07:34:04] <ranman> DigitalKiwi, unfortunately a ton of IRC questions go unanswered
[07:34:12] <ranman> it's just cause the channel/people are so transient
[07:34:39] <ranman> sometimes support takes time
[07:34:55] <ranman> if you need mongodb support use the mailing list, it's great, or JIRA, or stackoverflow
[07:35:04] <ranman> IRC might require you to wait a while
[07:35:14] <ranman> best chances are during EST business hours IMO
[07:35:35] <DigitalKiwi> good to know when to ask
[07:35:42] <[AD]Turbo> hello
[07:36:15] <mids> although just describing the problem out loud often helps... rubber duck debugging
[07:41:21] <NodeX> the general rule of asking is ASK!
[07:41:26] <NodeX> else ASK again
[07:49:33] <dstorrs> when using the '$or' operator for querying, is clause evaluation order guaranteed to be left-to-right?
[07:53:26] <NodeX> it's not really relevant because either of the clauses can satisfy the query
[07:56:00] <DigitalKiwi> how can I tell what indexes I should add to make a query work better? just by experience or trying different ones or is there some tool that could help?
[07:57:42] <NodeX> watch your logs, it will log slow queries
[07:57:54] <NodeX> and explain() also helps by telling you what it's doing
[07:58:22] <NodeX> general rule of thumb is this : if you have large data and are querying on a key then it should be indexed
[08:00:17] <DigitalKiwi> I get it for single indexes but if I needed a multi index I think I would be very lost :(
[08:04:11] <DigitalKiwi> also, I know having indexes slows down inserts, but how *much* does it slow them down? Like could it be enough that it's faster to drop the indexes and recreate them? all the datasets i've had only take a few seconds to add an index, but getting the dataset updated takes a lot longer, and i don't know if that's just how it is or the indexes...
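A sketch of what NodeX describes, with made-up collection and field names; the explain() output shown in the comments follows the 2012-era format.

```javascript
// Without a usable index, explain() reports a full collection scan.
db.events.find({ user: "kiwi", type: "click" }).explain();
// { "cursor" : "BasicCursor", "nscanned" : <every document>, ... }

// A compound index covers queries on its leading fields (user alone, or user + type).
db.events.ensureIndex({ user: 1, type: 1 });

db.events.find({ user: "kiwi", type: "click" }).explain();
// { "cursor" : "BtreeCursor user_1_type_1", "nscanned" : <only the matches>, ... }
```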
[08:16:26] <dstorrs> DigitalKiwi: what are you trying to achieve?
[08:16:50] <dstorrs> and how are you doing the update? via update() or findAndModify()? (the latter is ~100x slower than the former)
[08:17:05] <DigitalKiwi> good to know
[08:17:40] <DigitalKiwi> well, update was the wrong word, i meant "updating the database to have data"; the data is actually new
[08:17:52] <dstorrs> oh. you mean insert.
[08:17:59] <dstorrs> try using batch_insert
[08:18:33] <dstorrs> db.coll.batchInsert([ ...lots and lots of records... ])
[08:19:59] <dstorrs> er...except, it isn't supported by the shell, just (most) drivers. so I braino'd the syntax above
[08:20:47] <dstorrs> e.g., in Perl it's $db->coll->batch_insert(). in Python ISTR the standard insert() can take multiple docs.
[08:20:54] <dstorrs> I forget what Ruby does
[08:21:21] <dstorrs> heh. can't help you there.
[08:22:43] <dstorrs> actually, I lie: I can help. http://stackoverflow.com/questions/6153614/mongo-save-save-more-than-1-records-at-ones
[08:23:09] <dstorrs> (for the record, not my question so I take no blame for the grammar)
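A rough sketch of the idea, since the shell of this era has no batchInsert(): build the documents first and let a driver (pymongo's insert() with a list, the Perl driver's batch_insert) send them as one batch; the shell fallback below just loops. Collection name, fields, and counts are invented.

```javascript
var docs = [];
for (var i = 0; i < 10000; i++) {
    docs.push({ n: i, payload: "record " + i });
}
// Shell fallback: one insert per document. A driver-level batch insert sends
// the whole array in far fewer round trips, which is where the speedup comes from.
docs.forEach(function (d) { db.mycoll.insert(d); });
```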
[08:23:19] <NodeX> DigitalKiwi : it's never faster to drop the indexes and start again
[08:30:17] <DigitalKiwi> thanks :)
[11:54:12] <Bartzy> What is the size of the buffer of documents sent with each next() call to a cursor ?
[12:01:38] <Bartzy> Anyone?
[12:02:11] <Derick> Bartzy: "it depends"
[12:02:30] <Bartzy> Derick: On? My configuration of the client API ?
[12:02:35] <Derick> The PHP driver does it like this: http://derickrethans.nl/cursors-in-mongodb.html
[12:02:43] <Derick> the driver, and options
[12:06:00] <Bartzy> "The driver stores all 101 documents locally" - Locally in RAM, that is ?
[12:06:12] <Derick> yes
[12:10:09] <Bartzy> OK
[12:10:14] <Bartzy> And how does the cursor work on the server?
[12:10:21] <Bartzy> Is the whole result set in RAM?
[12:10:43] <Bartzy> so when the client asks for more documents, they are served as fast as possible?
[12:11:04] <Bartzy> Is it possible that after the query is 'done' (first next() happened), the next next()'s would be slow?
[12:12:15] <Derick> for the server, everything is in memory mapped files
[12:12:26] <Bartzy> Yeah, but how do the cursors work?
[12:12:26] <Derick> it all depends on the OS on what is in memory or just on disk
[12:12:54] <Derick> I don't understand the question Bartzy
[12:13:11] <Bartzy> but how does the server know what documents to provide next, if the query happened a few seconds ago?
[12:13:12] <Bartzy> for example:
[12:13:29] <Derick> that information is part of the cursor
[12:13:38] <Derick> (it's basically a list of pointers to documents)
[12:13:42] <Bartzy> a query happens and the first next() is issued for: db.sample.find()
[12:14:00] <Bartzy> Ah, so that cursor is saved in memory, and is a list of pointers ?
[12:14:38] <Bartzy> But the pointers are to the actual location on disk (or RAM, if the OS decides it should be there) of the documents, and not some pointer to the _id or something, right ?
[12:15:30] <Derick> for the mongo server RAM=disk, there is no distinction. And yes, they're pointers to the location of documents
[12:17:20] <Derick> Bartzy: http://www.mongodb.org/display/DOCS/How+to+do+Snapshotted+Queries+in+the+Mongo+Database has a good explanation
[12:17:35] <Bartzy> Derick, thank you very much.
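A small sketch of the behaviour described in that post: the first server reply is buffered locally (up to 101 documents by default, as Derick's article says), later batches arrive via getmore, and batchSize() tunes how many come back per round trip.

```javascript
var cur = db.sample.find().batchSize(500);   // ask for bigger batches after the first reply
while (cur.hasNext()) {
    var doc = cur.next();   // usually served from the locally buffered batch;
                            // when the buffer runs dry the driver issues a getmore
}
```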
[12:17:54] <Bartzy> I have one more question about .explain(). When executing explain on heavy queries, it doesn't really finish, and you need to wait a long time
[12:18:12] <Bartzy> This defeats the purpose of seeing what the db is doing in order to optimize it..
[12:18:28] <Derick> it's just reading all the results so it can provide statistics
[12:18:42] <Derick> I got to go now, train to catch.
[12:23:51] <Bartzy> But then what's the purpose of it for long-running queries? Is it possible just to get the decisions it's going to make like index use and stuff like that - like MySQL's EXPLAIN ?
[12:26:19] <deoxxa> yes
[12:26:22] <deoxxa> .explain()
[12:32:23] <multiHYP> hi
[12:32:38] <multiHYP> how can I get the password in the form of a readable String in my app?
[12:33:20] <multiHYP> the password field in system.users collection.
[12:33:42] <multiHYP> I believe it's some kind of hash of the password saved in the db.
[12:34:42] <multiHYP> please, someone answer that. I asked the other day and still haven't found how to do it, although I have been doing some other things ever since too.
[12:35:17] <deoxxa> multiHYP: you don't
[12:35:23] <deoxxa> multiHYP: that's the whole point of hashing it
[12:35:44] <multiHYP> so how can I compare the login form's password against it?
[12:36:05] <remonvv> Anyone have a rough estimate of how long it typically takes a replica set to vote in a new primary after it detects issues?
[12:36:10] <multiHYP> how do I hash the input field's password to be able to compare it against the hashed one in the db?
[12:36:58] <deoxxa> i would suggest starting at https://www.google.com/search?q=mongodb+password+hash
[12:37:17] <multiHYP> ok thanks deoxxa :)
[12:38:53] <multiHYP> is bcrypt available as a .jar lib for scala?
[12:39:03] <multiHYP> see, the issue is not that straightforward.
[12:39:17] <multiHYP> I want to reuse the db user/pass for my app layer login info.
[12:41:41] <multiHYP> yes, that pair in system.users doesn't seem to be usable.
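For what it's worth, a sketch assuming the 2012-era authentication scheme (MONGODB-CR): the pwd field in system.users is the MD5 hex digest of "<user>:mongo:<password>", so the login form's input can be hashed the same way and compared, rather than trying to recover the plain text. User name and password here are invented.

```javascript
var user = "appuser", entered = "password-from-the-login-form";   // illustrative values
var candidate = hex_md5(user + ":mongo:" + entered);              // hex_md5 is built into the mongo shell
var stored = db.system.users.findOne({ user: user });
print(stored && stored.pwd === candidate ? "password matches" : "no match");
```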
[13:20:54] <unclezoot> hi - i see it's possible to see what points are within a polygon, but can mongo tell you which poly contains a specific lat/lon?
[13:21:15] <unclezoot> i.e. you know the point but now want to know the containing polygon
[13:21:56] <NodeX> the polygon can be exponentially large so not really
[13:22:03] <NodeX> who's to say how big it is ?
[13:22:55] <unclezoot> memory limits?
[13:23:07] <NodeX> the polygon
[13:23:23] <unclezoot> well, all i'm trying to do is work out what district a clicked point is within
[13:24:09] <unclezoot> i can find the nearest 3 districts using the geo index and a centre point within each of the districts to approximate the centers
[13:24:16] <unclezoot> that will whittle down the resultset
[13:24:40] <unclezoot> but then how to determine which district the point is lying in?
[13:25:43] <NodeX> geohash?
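A client-side sketch of the approach unclezoot outlines, assuming each district document looks like { name, center: [lng, lat], polygon: [[lng, lat], ...] } with a 2d index on center: narrow to the nearest few districts, then run an ordinary ray-casting point-in-polygon test in the application.

```javascript
db.districts.ensureIndex({ center: "2d" });

// Standard ray-casting test: toggle "inside" each time a ray from the point crosses an edge.
function pointInPolygon(pt, poly) {
    var inside = false;
    for (var i = 0, j = poly.length - 1; i < poly.length; j = i++) {
        var xi = poly[i][0], yi = poly[i][1];
        var xj = poly[j][0], yj = poly[j][1];
        var crosses = ((yi > pt[1]) !== (yj > pt[1])) &&
                      (pt[0] < (xj - xi) * (pt[1] - yi) / (yj - yi) + xi);
        if (crosses) inside = !inside;
    }
    return inside;
}

var clicked = [-0.1278, 51.5074];   // made-up click location
var district = db.districts.find({ center: { $near: clicked } }).limit(3).toArray()
                 .filter(function (d) { return pointInPolygon(clicked, d.polygon); })[0];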
[13:36:49] <fredix> Hi
[13:37:16] <mids> hi fredix
[13:37:43] <fredix> I have two persistent connections to mongodb, but when I insert data from one, sometimes I can't get the data from the other connection
[13:37:57] <fredix> any idea ?
[13:38:39] <NodeX> are you doing safe writes?
[13:38:48] <NodeX> and are the writes disk bound?
[13:39:04] <fredix> safe writes ? I don't know
[13:39:29] <fredix> anyway if I use only one connection it works
[13:39:52] <mids> fredix: http://www.mongodb.org/display/DOCS/getLastError+Command#getLastErrorCommand-WhentoUse
[13:41:39] <fredix> mids: thanks !
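A minimal sketch of the "safe write" mids points to: call getLastError right after the insert so the write is acknowledged before the second connection tries to read it (collection and fields are made up).

```javascript
db.mycoll.insert({ ts: new Date(), note: "example" });
printjson(db.runCommand({ getLastError: 1, w: 1 }));   // wait for this server to acknowledge the write
```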
[13:42:32] <mw44118> How do I say "give me the documents that have a description field that has at least one letter in it" ?
[13:44:06] <NodeX> mw44118 : you can't
[13:44:17] <NodeX> unless you store the char count in a separate field
[13:44:36] <NodeX> you can use $exists to make sure the field exists
[13:44:58] <NodeX> or possibly (though not efficient) a regex
[13:49:24] <mw44118> NodeX: checking if the field exists doesn't seem to do what I mean; the field can exist, but point to a NULL value (or whatever you guys use for NULL).
[13:49:51] <NodeX> again, you cannot do it
[13:49:55] <NodeX> [14:44:46] <NodeX> unless you store the char count in a separate field
[13:50:18] <mw44118> NodeX: why do you treat me like this
[13:50:40] <mw44118> NodeX: OK, thanks so much for verifying this for me.
[13:50:47] <Derick> mw44118: He's just giving you the correct answer
[13:50:58] <mw44118> NodeX: I wasn't sure if I just wasn't aware of how to do it vs it could not be done
[13:51:04] <Derick> If you want to check for character count, you need to store that in the document
[13:51:07] <mw44118> Derick: sorry, was being silly
[13:51:12] <mw44118> Derick: yeah, that makes sense
[13:51:45] <NodeX> why do I treat you like what?
[13:52:20] <mw44118> NodeX: sorry, I was just venting
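A sketch of the two options raised above: the (unindexed, full-scan) regex, or storing a character count next to the description and querying that instead; descLen is an invented field name.

```javascript
// Option 1: regex, matches documents whose description contains at least one letter (full scan).
db.mycoll.find({ description: /[A-Za-z]/ });

// Option 2: maintain a length field at write time and index it.
db.mycoll.insert({ description: "hello", descLen: "hello".length });
db.mycoll.ensureIndex({ descLen: 1 });
db.mycoll.find({ descLen: { $gt: 0 } });
```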
[14:01:15] <Bartzy> Hey
[14:01:21] <Bartzy> Why are capped collections good for logging?
[14:01:30] <Bartzy> If I need to process those logs, it's not good
[14:02:54] <algernon> they're great to store the most recent logs, and automatically age them out
[14:03:14] <algernon> if you want post-processing, or long availability of logs, then indeed, capped collections are not good
[14:03:32] <algernon> but for storing the most recent logs, they're godly.
[14:10:16] <jiffe98> can I setup sharding over part of an index ?
[14:10:32] <jiffe98> I have an index field1, field2, field3 and I want to shard over field1, field2
[14:10:49] <jiffe98> trying to do so I get the error "errmsg" : "please create an index over the sharding key before sharding."
[14:11:19] <Derick> so make an index spanning just field1, field2
[14:13:46] <jiffe98> seems like a waste of space, no?
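Derick's suggestion, sketched with the field names from the question (database and collection names are placeholders, and sharding is assumed to already be enabled on the database):

```javascript
db.mycoll.ensureIndex({ field1: 1, field2: 1 });   // an index spanning exactly the shard key
db.adminCommand({ shardCollection: "mydb.mycoll", key: { field1: 1, field2: 1 } });
```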
[14:15:06] <fredix> mongodb crashed again with getLastError at 1
[14:15:38] <fredix> the runCommand return : { n: 0, connectionId: 257, err: null, ok: 1.0 }
[14:29:22] <Bartzy> algernon: Why not just use regular ol' files ? If you're not adding indexes on the capped collection, it's pretty much the same
[14:29:52] <Bartzy> algernon: And that's only for application logs? errors and stuff? I didn't understand why many examples are for using capped collections for user interactions. Something needs to process those user interactions.
[14:30:34] <algernon> Bartzy: because I want structure. I can do that with files, but with mongodb, it's a lot easier.
[14:30:58] <algernon> Bartzy: nothing prevents you from sending logs to a capped collection AND a non-capped one.
[14:31:19] <Bartzy> algernon: And then what's the capped one giving me, exactly?
[14:31:35] <Bartzy> I just don't understand what its purpose is other than a fixed-size bucket for logs
[14:31:40] <algernon> Bartzy: very fast access to the last couple of thousand logs.
[14:31:43] <Bartzy> which is ... nice, but not so useful
[14:32:31] <algernon> plus, you can put a erm... what's it called.. well, a cursor on the capped collection that will follow the data coming in.
[14:32:59] <Bartzy> That's useful... But I guess some message queue is better at that
[14:33:02] <algernon> so you push logs into a capped collection, and you have your processing application listen via the cursor, do whatever processing you need, and send the result off wherever you want.
[14:33:28] <Bartzy> What happens if the processing application dies? The cursor dies with it, and some logs are lost (to processing, at least) ?
[14:33:35] <algernon> a message queue might be better, yes. but it also might be overkill.
[14:33:50] <algernon> Bartzy: yup. That may or may not be a problem for your use case
[14:34:04] <algernon> for me, it isn't.
[14:34:13] <Bartzy> For what use case is it not important?
[14:34:39] <algernon> when I only collect broad statistics
[14:34:41] <Bartzy> algernon: Also, what does structure give you in logs?
[14:34:53] <algernon> losing 0.00001% of stuff won't make a dent.
[14:34:54] <Bartzy> statistics for what, for example? :)
[14:35:09] <algernon> like how much time users spent logged in
[14:35:18] <algernon> missing two seconds? I don't care.
[14:35:38] <algernon> as for structure...
[14:36:06] <NodeX> [15:33:01] <algernon> plus, you can put a erm... what's it called <-- tailable cursor ?
[14:36:06] <Bartzy> and this data - is it in its own capped collection, "spent_logged_in", or some general logging capped collection?
[14:39:51] <algernon> NodeX: yes
[14:40:00] <NodeX> ;)
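A sketch of the setup algernon describes, with invented names and sizes: a capped collection for recent logs, and a tailable cursor that a processing client can keep reading from.

```javascript
db.createCollection("recent_logs", { capped: true, size: 10 * 1024 * 1024 });
db.recent_logs.insert({ ts: new Date(), msg: "user logged in" });

var cur = db.recent_logs.find()
            .addOption(DBQuery.Option.tailable)
            .addOption(DBQuery.Option.awaitData);
while (cur.hasNext()) {
    printjson(cur.next());   // drains what's there; with awaitData the server waits
                             // briefly for new documents before returning an empty batch
}
```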
[14:40:36] <algernon> Bartzy: at the moment, I store that stuff in a non-capped collection
[14:41:03] <Bartzy> algernon: And it's a collection per log type ?
[14:41:16] <Bartzy> meaning clicks are in a different location than time on site?
[14:41:34] <Bartzy> and: <algernon> as for structure... .. ?
[14:41:44] <algernon> yes, sorry, $work is interfering :P
[14:42:06] <algernon> http://mojology.madhouse-project.org/log/4d28cc53f310ef4f00000020 <- I like to do this kind of stuff.
[14:42:24] <algernon> makes it very easy later on to find stuff
[14:42:54] <algernon> combine it with an external indexer (solr, lucene, ES, whatever) and it's wonderful.
[14:43:21] <Bartzy> Why save SSH authentications in mongo? :)
[14:43:25] <Bartzy> But I get what you mean
[14:43:38] <algernon> because I save *every* log in mongodb
[14:44:26] <Bartzy> is it on different collections, or just one 'logs' collection ?
[14:45:00] <algernon> one big logs collection, a few stats collections and a capped recent collection
[14:45:26] <algernon> I'm not entirely happy with the way I deal with stats, but had no time to come up with anything better yet.
[15:00:05] <timing> Hi! I have a collection with a few basic fields and one field contains a large multi-dimensional array. I don't intend to query on the data in the array. So can I just put the array in the document as well? Or is it (performance-wise) better to just serialize the array myself as a string before insertion, so it just becomes a large string instead of a 'real' array?
[15:01:41] <timing> So the data can be something like this: {field: 'value', bigArray: '[[1,2,3,4],[603,123,2039]]'}
[15:01:44] <timing> or this:
[15:01:53] <timing> {field: 'value', bigArray: [[1,2,3,4],[603,123,2039]]}
[15:01:57] <timing> note the quotes
[15:47:01] <Bartzy> why does Mongo save numbers as double by default? Isn't that very wasteful in space?
[15:52:11] <mediocretes> nope.
[15:52:28] <algernon> it does that? o.O
[15:53:42] <mediocretes> also nope.
[15:54:25] <mediocretes> but, a double is the same size as a (64-bit) int
[16:28:57] <Lujeni> Hi - I need to move many documents between two servers. Should I use mongodb commands like mongodump / mongorestore? Or store all documents in a variable and use "forEach" to insert?
[16:29:34] <SLNP> Hi All
[16:30:39] <SLNP> We are seeing a strange issue at the moment, we are hitting a connection limit of 1000 on our mongo rep set and both mongo instances are crashing
[16:30:46] <SLNP> the arbiter stays up
[16:32:02] <SLNP> when that happens we get a segfault
[17:29:41] <timing> Hi!
[17:30:51] <timing> How do I insert a big array in a collection?
[17:31:15] <timing> is it recommended to serialize it first, or shall I just 'put it in' the collection?
[17:31:39] <timing> both work, but I wonder what the best way is
[17:32:10] <timoxley> timing if you don't need to query on that data, then perhaps serialising it is a good option
[17:33:41] <timing> yes, there is no need to do so
[17:34:09] <timing> timoxley: is it a performance reason? or size?
[17:35:09] <timoxley> timing don't really know, just seems to make sense for you to put things into the db in the most primitive form you need them as
[17:36:28] <timing> yes, me too
[17:51:38] <redalpha> hello
[18:02:33] <redalpha> If I have 2 servers set up with replication - and both servers have an application connected to their own mongodb - and both applications doing writes and reads, will that go well?
[18:10:17] <armandop> hi everyone
[18:10:20] <armandop> had a quick question.
[18:10:46] <armandop> kinda a n00b question too. what is the best practice to create a collection?
[18:11:20] <armandop> let's say I'm creating a mobile app and the data we collect is just for a specific user (comments), and other users do not see these comments, it's just the user
[18:11:59] <armandop> i want to create a single collection called users which contains a comments property which holds an array of Comment objects.
[18:15:28] <mollerstrand> armandop: a collection is created as soon as you insert a document into it.
[18:15:37] <armandop> yes i know
[18:16:03] <armandop> let me rephrase it. should I, given the conditions stated, create two collections?
[18:16:26] <armandop> a users collection and a comments collection? and use a DBRef to link both. I think this is totally the wrong approach to take
[18:16:44] <armandop> since the comments are not shared and can easily be embedded into the users collection.
[18:16:53] <armandop> i just want to know if my logic is valid.
[18:18:03] <redalpha> i would make 2 collections
[18:18:13] <mollerstrand> to me it sounds valid to have two collections, one holding user documents and one holding comments documents
[18:18:55] <redalpha> i don't like to nest a lot of different entities into 1 collection - i tend to create collections per entity type
[18:19:31] <armandop> http://www.mongodb.org/display/DOCS/Schema+Design
[18:22:14] <armandop> that's what I used to formulate my assumption on the architecture behind my reasoning.
[18:29:20] <redalpha> it's just my preference to do manual referencing rather than embedding and DBRefs
[18:30:30] <redalpha> maybe if you do a lot of reads - where the referencing would mean doubling the number of queries you execute
[18:31:41] <redalpha> http://docs.mongodb.org/manual/applications/database-references/
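The two layouts being weighed, sketched side by side (all names illustrative): comments embedded in the user document versus a separate comments collection with a manual reference.

```javascript
// Embedded: the comments live inside the user document and come back in one query.
db.users.insert({ _id: "alex", comments: [{ text: "first!", ts: new Date() }] });
db.users.findOne({ _id: "alex" });

// Referenced: comments get their own collection and carry the user's _id manually,
// so fetching a user's comments costs a second query.
db.comments.insert({ user_id: "alex", text: "first!", ts: new Date() });
db.comments.find({ user_id: "alex" });
```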
[20:36:01] <timing> Hi!
[20:37:02] <timing> when I have a comment attached to a post i can give the comment a post_id. what can I best use as the id? the MongoId object or just the numeric $id value?
[20:44:20] <bluejade> where is the best place to ask beginner / intermediate questions related to structuring and querying mongodb documents? here? mongodb-user google group?
[21:22:30] <mrfloyd> hi everyone
[21:23:25] <mrfloyd> Any idea how I can fetch a nested object from a document, and only this object, not the whole document?
[21:34:07] <armandop> @mrfloyd - have you tried db.collection.find({your where clause}, {property_you_want:1})
[21:34:45] <mrfloyd> the property_I_want is nested
[21:35:03] <mrfloyd> x.foo.bar
[21:35:12] <mrfloyd> foo is an array and bar is the item i want to grab
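A sketch using the names from the question: a dotted-path projection trims each array element down to just bar instead of returning the whole document.

```javascript
db.x.find({ /* your where clause */ }, { "foo.bar": 1, _id: 0 });
// -> { foo: [ { bar: ... }, { bar: ... } ] }   (other fields of each element are dropped)
```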
[21:35:26] <domo> is it good or bad to use Mongo's IDs to reference docs cross collection?
[21:35:37] <domo> (in my app)
[21:35:43] <domo> (internally)
[21:35:45] <domo> :)
[21:36:19] <armandop> domo - that depends
[21:36:43] <wereHamster> it's good to use unique IDs
[21:36:45] <armandop> have you read this? http://www.mongodb.org/display/DOCS/Schema+Design
[21:42:52] <domo> so I should generate my own IDs?
[21:43:07] <domo> that link is showing an example of 'alex' being an id referencing two collections
[21:43:48] <armandop> alex is the primary key (if you're looking at this from a Relational DB mindset); using that key you associate a post with alex
[21:44:18] <armandop> you could use the internal object id that is created for each document but it gets dicey when fetching info using code.
[21:44:23] <armandop> just keep that in mind.
[21:44:33] <domo> why does it get dicey?
[21:45:48] <armandop> {_id: 697698779886hhy76 } <-- starting object example. when you call it from code, if you convert the object id to a string it will not translate back to the object id. object id != (string)object_id
[21:46:02] <armandop> let me find the article
[21:46:05] <armandop> one sec
[21:46:39] <domo> ah i see, but my driver in the app allows me to create MongoIds as objects from strings
[21:49:05] <domo> and what about sharding - do ObjectIds get screwy across multiple boxes?
[21:49:40] <armandop> that I'm not too sure about, never looked that closely into it.
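The string-vs-ObjectId pitfall armandop mentions, sketched in the shell (collection name is made up):

```javascript
var doc = db.posts.findOne();
var idStr = doc._id.str;                       // the ObjectId's hex string form
db.posts.findOne({ _id: idStr });              // no match: a string never equals an ObjectId
db.posts.findOne({ _id: ObjectId(idStr) });    // matches once it's converted back
```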
[22:00:07] <hadees> is there any way to see what queries are being run? I have a mongodb query that works but when I converted it to php it doesn't. So now I want to see what is actually being run.
[22:02:29] <linsys> hadees: http://www.mongodb.org/display/DOCS/Database+Profiler
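A short sketch of using the profiler linsys links to: switch it on, run the PHP code, then read back what the server actually received.

```javascript
db.setProfilingLevel(2);   // 2 = record every operation in system.profile
// ... run the PHP query here ...
db.system.profile.find().sort({ ts: -1 }).limit(5).forEach(printjson);
db.setProfilingLevel(0);   // switch profiling back off
```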