PMXBOT Log file Viewer


#mongodb logs for Wednesday the 11th of July, 2012

[02:30:16] <kuhrt> How can I use $unset to remove a field from a nested document?
[02:30:44] <kuhrt> for instance I want to remove "Release date"
[02:30:46] <kuhrt> db.films.update({title: "Montreal Stories 1944"},{$unset:{details:{"Release date":1}}})
[02:31:20] <kuhrt> I realize ^ is wrong because it removes the 'details' document, which I do not want, I just want to remove a field from it
[02:47:29] <mids> kuhrt: almost
[02:47:30] <mids> kuhrt: db.films.update({title: "Montreal Stories 1944"},{$unset:{"details.Release date":1}})
[02:48:04] <kuhrt> mids: ah thank you
[02:48:29] <kuhrt> does the value "1" connote something special or is it convention?
[02:48:43] <kuhrt> I used "1" based on mongo docs
[02:49:30] <mids> kuhrt: convention, can be any truthy value I guess
[02:49:55] <kuhrt> so 0 would not work, because it is not truthy?
[02:49:58] <kuhrt> or false
[02:50:14] <kuhrt> effectively making $unset do nothing?
[02:50:40] <mids> actually 0 also works
[02:51:01] <mids> it ignores the value
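A quick sketch of the exchange above in the mongo shell (the "runtime" field and the date value are made up for illustration): the dotted path targets just the nested field, and the value handed to $unset is ignored.

```javascript
// Illustrative document; only "Release date" should go away, details must survive.
db.films.insert({ title: "Montreal Stories 1944",
                  details: { "Release date": "1944-06-01", runtime: 90 } });

// The dotted path reaches inside the embedded document; the value ("" here, 1 above) is ignored.
db.films.update({ title: "Montreal Stories 1944" },
                { $unset: { "details.Release date": "" } });

db.films.findOne({ title: "Montreal Stories 1944" });
// -> { _id: ..., title: "Montreal Stories 1944", details: { runtime: 90 } }
```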
[03:15:09] <aws_2012> I apologize if this has been asked before/recently, but has anyone had difficulty getting and keeping your secondary node sync'ed?
[03:22:19] <aws_2012> repeat question: I'm having trouble getting and keeping my secondary in sync with my primary, despite turning off writes to the primary. Can anyone advise what I might be doing wrong?
[03:50:37] <ashepard> trying again, hoping i might find a nugget of wisdom: having trouble getting and keeping a secondary's optime up-to-date, despite turning off writes. anyone have any ideas to try?
[05:02:49] <ranman> how do I make ruby run faster
[05:02:52] <ranman> woah
[05:02:55] <ranman> wrong channel sorry
[07:15:18] <Josony> msg NickServ VERIFY REGISTER josony oqyqjtbypxyy
[07:17:26] <Josony> ?
[07:18:36] <Josony> quit
[07:18:55] <mids> oops
[07:21:13] <DigitalKiwi> for a channel of >300 names there sure isn't a lot of activity here, ever >.>
[07:21:35] <DigitalKiwi> don't say nobody needs help cause i've seen lots of (unanswered) questions ;p
[07:23:03] <mids> DigitalKiwi: yeah, though I also see that in some other established opensource irc channels
[07:23:17] <mids> guess all the kool kids use facebook for their help need
[07:28:03] <akaIDIOT> just joined #cassandra, #mongodb and #hbase yesterday, story's the same for all :)
[07:28:12] <akaIDIOT> people on irc are champs at idling :)
[07:34:04] <ranman> DigitalKiwi, unfortunately a ton of IRC questions go unanswered
[07:34:12] <ranman> it's just cause the channel/people are so transient
[07:34:39] <ranman> sometimes support takes time
[07:34:55] <ranman> if you need mongodb support use the mailing list, it's great, or JIRA, or stackoverflow
[07:35:04] <ranman> IRC might require you to wait a while
[07:35:14] <ranman> best chances are during EST business hours IMO
[07:35:35] <DigitalKiwi> good to know when to ask
[07:35:42] <[AD]Turbo> hello
[07:36:15] <mids> although just describing the problem out loud often helps... rubber duck debugging
[07:41:21] <NodeX> the general rule of asking is ASK!
[07:41:26] <NodeX> else ASK again
[07:49:33] <dstorrs> when using the '$or' operator for querying, is clause evaluation order guaranteed to be left-to-right?
[07:53:26] <NodeX> it's not really relevant because either of the clauses can satisfy the query
[07:56:00] <DigitalKiwi> how can I tell what indexes I should add to make a query work better? just by experience or trying different ones or is there some tool that could help?
[07:57:42] <NodeX> watch your logs, it will log slow queries
[07:57:54] <NodeX> and explain() also helps by telling you what it's doing
[07:58:22] <NodeX> general rule of thumb is this : if you have large data and are querying on a key then it should be indexed
[08:00:17] <DigitalKiwi> I get it for single indexes but if I needed a multi index I think I would be very lost :(
[08:04:11] <DigitalKiwi> also, I know having indexes slows down inserts, but how *much* does it slow them down? Like could it be enough that it's faster to drop the indexes and recreate them? all the datasets i've had only take a few seconds to add an index, but getting the dataset updated takes a lot longer, and i don't know if that's just how it is or the indexes...
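A sketch of what NodeX describes, with made-up collection and field names; the explain() output shown in the comments follows the 2012-era format.

```javascript
// Without a usable index, explain() reports a full collection scan.
db.events.find({ user: "kiwi", type: "click" }).explain();
// { "cursor" : "BasicCursor", "nscanned" : <every document>, ... }

// A compound index covers queries on its leading fields (user alone, or user + type).
db.events.ensureIndex({ user: 1, type: 1 });

db.events.find({ user: "kiwi", type: "click" }).explain();
// { "cursor" : "BtreeCursor user_1_type_1", "nscanned" : <only the matches>, ... }
```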
[08:16:26] <dstorrs> DigitalKiwi: what are you trying to achieve?
[08:16:50] <dstorrs> and how are you doing the update? via update() or findAndModify()? (the latter is ~100x slower than the former)
[08:17:05] <DigitalKiwi> good to know
[08:17:40] <DigitalKiwi> well, update was the wrong word, i meant "updating the database to have data"; the data is actually new
[08:17:52] <dstorrs> oh. you mean insert.
[08:17:59] <dstorrs> try using batch_insert
[08:18:33] <dstorrs> db.coll.batchInsert([ ...lots and lots of records... ])
[08:19:59] <dstorrs> er...except, it isn't supported by the shell, just (most) drivers. so I braino'd the syntax above
[08:20:47] <dstorrs> e.g., in Perl it's $db->coll->batch_insert(). in Python ISTR the standard insert() can take multiple docs.
[08:20:54] <dstorrs> I forget what Ruby does
[08:21:21] <dstorrs> heh. can't help you there.
[08:22:43] <dstorrs> actually, I lie: I can help. http://stackoverflow.com/questions/6153614/mongo-save-save-more-than-1-records-at-ones
[08:23:09] <dstorrs> (for the record, not my question so I take no blame for the grammar)
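A rough sketch of the idea, since the shell of this era has no batchInsert(): build the documents first and let a driver (pymongo's insert() with a list, the Perl driver's batch_insert) send them as one batch; the shell fallback below just loops. Collection name, fields, and counts are invented.

```javascript
var docs = [];
for (var i = 0; i < 10000; i++) {
    docs.push({ n: i, payload: "record " + i });
}
// Shell fallback: one insert per document. A driver-level batch insert sends
// the whole array in far fewer round trips, which is where the speedup comes from.
docs.forEach(function (d) { db.mycoll.insert(d); });
```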
[08:23:19] <NodeX> DigitalKiwi : it's never faster to drop the indexes and start again
[08:30:17] <DigitalKiwi> thanks :)
[11:54:12] <Bartzy> What is the size of the buffer of documents sent with each next() call to a cursor ?
[12:01:38] <Bartzy> Anyone?
[12:02:11] <Derick> Bartzy: "it depends"
[12:02:30] <Bartzy> Derick: On? My configuration of the client API ?
[12:02:35] <Derick> The PHP driver does it like this: http://derickrethans.nl/cursors-in-mongodb.html
[12:02:43] <Derick> the driver, and options
[12:06:00] <Bartzy> "The driver stores all 101 documents locally" - Locally in RAM, that is ?
[12:06:12] <Derick> yes
[12:10:09] <Bartzy> OK
[12:10:14] <Bartzy> And how does the cursor work on the server?
[12:10:21] <Bartzy> Is the whole result set in RAM?
[12:10:43] <Bartzy> so when the client asks for more documents, they are served as fast as possible?
[12:11:04] <Bartzy> Is it possible that after the query is 'done' (first next() happened), the next next()'s would be slow?
[12:12:15] <Derick> for the server, everything is in memory mapped files
[12:12:26] <Bartzy> Yeah, but how do the cursors work?
[12:12:26] <Derick> it all depends on the OS on what is in memory or just on disk
[12:12:54] <Derick> I don't understand the question Bartzy
[12:13:11] <Bartzy> but how does the server know what documents to provide next, if the query happened a few seconds ago?
[12:13:12] <Bartzy> for example:
[12:13:29] <Derick> that information is part of the cursor
[12:13:38] <Derick> (it's basically a list of pointers to documents)
[12:13:42] <Bartzy> a query happens and the first next() is issued for: db.sample.find()
[12:14:00] <Bartzy> Ah, so that cursor is saved in memory, and is a list of pointers ?
[12:14:38] <Bartzy> But the pointers are to the actual location on disk (or RAM, if the OS decides it should be there) of the documents, and not some pointer to the _id or something, right ?
[12:15:30] <Derick> for the mongo server RAM=disk, there is no distinction. And yes, they're pointers to the location of documents
[12:17:20] <Derick> Bartzy: http://www.mongodb.org/display/DOCS/How+to+do+Snapshotted+Queries+in+the+Mongo+Database has a good explanation
[12:17:35] <Bartzy> Derick, thank you very much.
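A small sketch of the behaviour described in that post: the first server reply is buffered locally (up to 101 documents by default, as Derick's article says), later batches arrive via getmore, and batchSize() tunes how many come back per round trip.

```javascript
var cur = db.sample.find().batchSize(500);   // ask for bigger batches after the first reply
while (cur.hasNext()) {
    var doc = cur.next();   // usually served from the locally buffered batch;
                            // when the buffer runs dry the driver issues a getmore
}
```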
[12:17:54] <Bartzy> I have one more question about .explain(). When executing explain on heavy queries, it doesn't really finish, and you need to wait a long time
[12:18:12] <Bartzy> This defeats the purpose of seeing what the db is doing in order to optimize it..
[12:18:28] <Derick> it's just reading all the results so it can provide statistics
[12:18:42] <Derick> I got to go now, train to catch.
[12:23:51] <Bartzy> But then what's the purpose of it for long-running queries? Is it possible just to get the decisions it's going to make like index use and stuff like that - like MySQL's EXPLAIN ?
[12:26:19] <deoxxa> yes
[12:26:22] <deoxxa> .explain()
[12:32:23] <multiHYP> hi
[12:32:38] <multiHYP> how can I get the password in the form of a readable String in my app?
[12:33:20] <multiHYP> the password field in system.users collection.
[12:33:42] <multiHYP> I believe it's some kind of hash of the password saved in the db.
[12:34:42] <multiHYP> please, someone answer that. I asked the other day and still haven't found how to do it, although I have been doing some other things ever since too.
[12:35:17] <deoxxa> multiHYP: you don't
[12:35:23] <deoxxa> multiHYP: that's the whole point of hashing it
[12:35:44] <multiHYP> so how can I compare the login form's password against it?
[12:36:05] <remonvv> Anyone have a rough estimate of how long it typically takes a replica set to vote in a new primary after it detects issues?
[12:36:10] <multiHYP> how do I hash the input field's password to be able to compare it against the hashed one in the db?
[12:36:58] <deoxxa> i would suggest starting at https://www.google.com/search?q=mongodb+password+hash
[12:37:17] <multiHYP> ok thanks deoxxa :)
[12:38:53] <multiHYP> is bcrypt available as a .jar lib for scala?
[12:39:03] <multiHYP> see, the issue is not that straightforward.
[12:39:17] <multiHYP> I want to reuse the db user/pass for my app layer login info.
[12:41:41] <multiHYP> yes, that pair in system.users doesn't seem to be usable.
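For what it's worth, a sketch assuming the 2012-era authentication scheme (MONGODB-CR): the pwd field in system.users is the MD5 hex digest of "<user>:mongo:<password>", so the login form's input can be hashed the same way and compared, rather than trying to recover the plain text. User name and password here are invented.

```javascript
var user = "appuser", entered = "password-from-the-login-form";   // illustrative values
var candidate = hex_md5(user + ":mongo:" + entered);              // hex_md5 is built into the mongo shell
var stored = db.system.users.findOne({ user: user });
print(stored && stored.pwd === candidate ? "password matches" : "no match");
```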
[13:20:54] <unclezoot> hi - i see it's possible to see what points are within a polygon, but can mongo tell you which poly contains a specific lat/lon?
[13:21:15] <unclezoot> i.e. you know the point but now want to know the containing polygon
[13:21:56] <NodeX> the polygon can be exponentially large so not really
[13:22:03] <NodeX> who's to say how big it is ?
[13:22:55] <unclezoot> memory limits?
[13:23:07] <NodeX> the polygon
[13:23:23] <unclezoot> well, all i'm trying to do is work out what district a clicked point is within
[13:24:09] <unclezoot> i can find the nearest 3 districts using the geo index and a centre point within each of the districts to approximate the centers
[13:24:16] <unclezoot> that will whittle down the resultset
[13:24:40] <unclezoot> but then how to determine which district the point is lying in?
[13:25:43] <NodeX> geohash?
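A client-side sketch of the approach unclezoot outlines, assuming each district document looks like { name, center: [lng, lat], polygon: [[lng, lat], ...] } with a 2d index on center: narrow to the nearest few districts, then run an ordinary ray-casting point-in-polygon test in the application.

```javascript
db.districts.ensureIndex({ center: "2d" });

// Standard ray-casting test: toggle "inside" each time a ray from the point crosses an edge.
function pointInPolygon(pt, poly) {
    var inside = false;
    for (var i = 0, j = poly.length - 1; i < poly.length; j = i++) {
        var xi = poly[i][0], yi = poly[i][1];
        var xj = poly[j][0], yj = poly[j][1];
        var crosses = ((yi > pt[1]) !== (yj > pt[1])) &&
                      (pt[0] < (xj - xi) * (pt[1] - yi) / (yj - yi) + xi);
        if (crosses) inside = !inside;
    }
    return inside;
}

var clicked = [-0.1278, 51.5074];   // made-up click location
var district = db.districts.find({ center: { $near: clicked } }).limit(3).toArray()
                 .filter(function (d) { return pointInPolygon(clicked, d.polygon); })[0];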
[13:36:49] <fredix> Hi
[13:37:16] <mids> hi fredix
[13:37:43] <fredix> I have two persistent connections to mongodb, but when I insert data from one, sometimes I can't get the data from the other connection
[13:37:57] <fredix> any idea ?
[13:38:39] <NodeX> are you doing safe writes?
[13:38:48] <NodeX> and are the writes disk bound?
[13:39:04] <fredix> safe writes ? I don't know
[13:39:29] <fredix> anyway if I use only one connection it works
[13:39:52] <mids> fredix: http://www.mongodb.org/display/DOCS/getLastError+Command#getLastErrorCommand-WhentoUse
[13:41:39] <fredix> mids: thanks !
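A minimal sketch of the "safe write" mids points to: call getLastError right after the insert so the write is acknowledged before the second connection tries to read it (collection and fields are made up).

```javascript
db.mycoll.insert({ ts: new Date(), note: "example" });
printjson(db.runCommand({ getLastError: 1, w: 1 }));   // wait for this server to acknowledge the write
```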
[13:42:32] <mw44118> How do I say "give me the documents that have a description field that has at least one letter in it" ?
[13:44:06] <NodeX> mw44118 : you can't
[13:44:17] <NodeX> unless you store the char count in a separate field
[13:44:36] <NodeX> you can use $exists to make sure the field exists
[13:44:58] <NodeX> or possibly (though not efficient) a regex
[13:49:24] <mw44118> NodeX: checking if the field exists doesn't seem to do what I mean; the field can exist, but point to a NULL value (or whatever you guys use for NULL).
[13:49:51] <NodeX> again, you cannot do it
[13:49:55] <NodeX> [14:44:46] <NodeX> unless you store the char count in a separate field
[13:50:18] <mw44118> NodeX: why do you treat me like this
[13:50:40] <mw44118> NodeX: OK, thanks so much for verifying this for me.
[13:50:47] <Derick> mw44118: He's just giving you the correct answer
[13:50:58] <mw44118> NodeX: I wasn't sure if I just wasn't aware of how to do it vs it could not be done
[13:51:04] <Derick> If you want to check for character count, you need to store that in the document
[13:51:07] <mw44118> Derick: sorry, was being silly
[13:51:12] <mw44118> Derick: yeah, that makes sense
[13:51:45] <NodeX> why do I treat you like what?
[13:52:20] <mw44118> NodeX: sorry, I was just venting
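A sketch of the two options raised above: the (unindexed, full-scan) regex, or storing a character count next to the description and querying that instead; descLen is an invented field name.

```javascript
// Option 1: regex, matches documents whose description contains at least one letter (full scan).
db.mycoll.find({ description: /[A-Za-z]/ });

// Option 2: maintain a length field at write time and index it.
db.mycoll.insert({ description: "hello", descLen: "hello".length });
db.mycoll.ensureIndex({ descLen: 1 });
db.mycoll.find({ descLen: { $gt: 0 } });
```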
[14:01:15] <Bartzy> Hey
[14:01:21] <Bartzy> Why are capped collections good for logging?
[14:01:30] <Bartzy> If I need to process those logs, it's not good
[14:02:54] <algernon> they're great to store the most recent logs, and automatically age them out
[14:03:14] <algernon> if you want post-processing, or long availability of logs, then indeed, capped collections are not good
[14:03:32] <algernon> but for storing the most recent logs, they're godly.
[14:10:16] <jiffe98> can I setup sharding over part of an index ?
[14:10:32] <jiffe98> I have an index field1, field2, field3 and I want to shard over field1, field2
[14:10:49] <jiffe98> trying to do so I get the error "errmsg" : "please create an index over the sharding key before sharding."
[14:11:19] <Derick> so make an index spanning just field1, field2
[14:13:46] <jiffe98> seems like a waste of space, no?
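Derick's suggestion, sketched with the field names from the question (database and collection names are placeholders, and sharding is assumed to already be enabled on the database):

```javascript
db.mycoll.ensureIndex({ field1: 1, field2: 1 });   // an index spanning exactly the shard key
db.adminCommand({ shardCollection: "mydb.mycoll", key: { field1: 1, field2: 1 } });
```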
[14:15:06] <fredix> mongodb crashed again with getLastError at 1
[14:15:38] <fredix> the runCommand return : { n: 0, connectionId: 257, err: null, ok: 1.0 }
[14:29:22] <Bartzy> algernon: Why not just use regular ol' files ? If you're not adding indexes on the capped collection, it's pretty much the same
[14:29:52] <Bartzy> algernon: And that's only for application logs? errors and stuff? I didn't understand why many examples are for using capped collections for user interactions. Something needs to process those user interactions.
[14:30:34] <algernon> Bartzy: because I want structure. I can do that with files, but with mongodb, it's a lot easier.
[14:30:58] <algernon> Bartzy: nothing prevents you from sending logs to a capped collection AND a non-capped one.
[14:31:19] <Bartzy> algernon: And then what's the capped one giving me, exactly?
[14:31:35] <Bartzy> I just don't understand what its purpose is other than a fixed-size bucket for logs
[14:31:40] <algernon> Bartzy: very fast access to the last couple of thousand logs.
[14:31:43] <Bartzy> which is ... nice, but not so useful
[14:32:31] <algernon> plus, you can put a erm... what's it called.. well, a cursor on the capped collection that will follow the data coming in.
[14:32:59] <Bartzy> That's useful... But I guess some message queue is better at that
[14:33:02] <algernon> so you push logs into a capped collection, and you have your processing application listen via the cursor, do whatever processing you need, and send the result off wherever you want.
[14:33:28] <Bartzy> What happens if the processing application dies? The cursor dies with it, and some logs are lost (to processing, at least) ?
[14:33:35] <algernon> a message queue might be better, yes. but it also might be overkill.
[14:33:50] <algernon> Bartzy: yup. That may or may not be a problem for your use case
[14:34:04] <algernon> for me, it isn't.
[14:34:13] <Bartzy> For what use case is it not important?
[14:34:39] <algernon> when I only collect broad statistics
[14:34:41] <Bartzy> algernon: Also, what does structure give you in logs?
[14:34:53] <algernon> losing 0.00001% of stuff won't make a dent.
[14:34:54] <Bartzy> statistics for what, for example? :)
[14:35:09] <algernon> like how much time users spent logged in
[14:35:18] <algernon> missing two seconds? I don't care.
[14:35:38] <algernon> as for structure...
[14:36:06] <NodeX> [15:33:01] <algernon> plus, you can put a erm... what's it called <-- tailable cursor ?
[14:36:06] <Bartzy> and this data - is it in its own capped collection, "spent_logged_in", or some general logging capped collection?
[14:39:51] <algernon> NodeX: yes
[14:40:00] <NodeX> ;)
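A sketch of the setup algernon describes, with invented names and sizes: a capped collection for recent logs, and a tailable cursor that a processing client can keep reading from.

```javascript
db.createCollection("recent_logs", { capped: true, size: 10 * 1024 * 1024 });
db.recent_logs.insert({ ts: new Date(), msg: "user logged in" });

var cur = db.recent_logs.find()
            .addOption(DBQuery.Option.tailable)
            .addOption(DBQuery.Option.awaitData);
while (cur.hasNext()) {
    printjson(cur.next());   // drains what's there; with awaitData the server waits
                             // briefly for new documents before returning an empty batch
}
```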
[14:40:36] <algernon> Bartzy: at the moment, I store that stuff in a non-capped collection
[14:41:03] <Bartzy> algernon: And it's a collection per log type ?
[14:41:16] <Bartzy> meaning clicks are in a different location than time on site?
[14:41:34] <Bartzy> and: <algernon> as for structure... .. ?
[14:41:44] <algernon> yes, sorry, $work is interfering :P
[14:42:06] <algernon> http://mojology.madhouse-project.org/log/4d28cc53f310ef4f00000020 <- I like to do this kind of stuff.
[14:42:24] <algernon> makes it very easy later on to find stuff
[14:42:54] <algernon> combine it with an external indexer (solr, lucene, ES, whatever) and it's wonderful.
[14:43:21] <Bartzy> Why save SSH authentications in mongo? :)
[14:43:25] <Bartzy> But I get what you mean
[14:43:38] <algernon> because I save *every* log in mongodb
[14:44:26] <Bartzy> is it on different collections, or just one 'logs' collection ?
[14:45:00] <algernon> one big logs collection, a few stats collections and a capped recent collection
[14:45:26] <algernon> I'm not entirely happy with the way I deal with stats, but had no time to come up with anything better yet.
[15:00:05] <timing> Hi! I have a collection with a few basic fields and one field contains a large multi-dimensional array. I don't intend to query on the data in the array. So can I just put the array in the document as well? Or is it (performance-wise) better to just serialize the array myself as a string before insertion, so it just becomes a large string instead of a 'real' array?
[15:01:41] <timing> So the data can be something like this: {field: 'value', bigArray: '[[1,2,3,4],[603,123,2039]]'}
[15:01:44] <timing> or this:
[15:01:53] <timing> {field: 'value', bigArray: [[1,2,3,4],[603,123,2039]]}
[15:01:57] <timing> note the quotes
[15:47:01] <Bartzy> why does Mongo save numbers as double by default? Isn't that very wasteful in space?
[15:52:11] <mediocretes> nope.
[15:52:28] <algernon> it does that? o.O
[15:53:42] <mediocretes> also nope.
[15:54:25] <mediocretes> but, a double is the same size as a (64-bit) int
[16:28:57] <Lujeni> Hi - I need to move many documents between two servers. Should I use mongodb commands like mongodump / mongorestore? Or store all documents in a variable and use "forEach" to insert?
[16:29:34] <SLNP> Hi All
[16:30:39] <SLNP> We are seeing a strange issue at the moment, we are hitting a connection limit of 1000 on our mongo rep set and both mongo instances are crashing
[16:30:46] <SLNP> the arbiter stays up
[16:32:02] <SLNP> when that happens we get a segfault
[17:29:41] <timing> Hi!
[17:30:51] <timing> How do I insert a big array in a collection?
[17:31:15] <timing> is it recommended to serialize it first, or shall I just 'put it in' the collection?
[17:31:39] <timing> both work, but I wonder what the best way is
[17:32:10] <timoxley> timing if you don't need to query on that data, then perhaps serialising it is a good option
[17:33:41] <timing> yes, there is no need to do so
[17:34:09] <timing> timoxley: is it a performance reason? or size?
[17:35:09] <timoxley> timing don't really know, just seems to make sense for you to put things into the db in the most primitive form you need them as
[17:36:28] <timing> yes, me too
[17:51:38] <redalpha> hello
[18:02:33] <redalpha> If I have 2 servers set up with replication - and both servers have an application connected to their own mongodb - and both applications doing writes and reads, will that go well?
[18:10:17] <armandop> hi everyone
[18:10:20] <armandop> had a quick question.
[18:10:46] <armandop> kinda a n00b question too. what is the best practice to create a collection?
[18:11:20] <armandop> let's say I'm creating a mobile app and the data we collect is just for a specific user (comments), and other users do not see these comments, it's just the user
[18:11:59] <armandop> i want to create a single collection called users which contains a comments property which holds an array of Comment objects.
[18:15:28] <mollerstrand> armandop: a collection is created as soon as you insert a document into it.
[18:15:37] <armandop> yes i know
[18:16:03] <armandop> let me rephrase it. should I, given the conditions stated, create two collections?
[18:16:26] <armandop> a users collection and a comments collection? and use a DBRef to link both. I think this is totally the wrong approach to take
[18:16:44] <armandop> since the comments are not shared and can easily be embedded into the users collection.
[18:16:53] <armandop> i just want to know if my logic is valid.
[18:18:03] <redalpha> i would make 2 collections
[18:18:13] <mollerstrand> to me it sounds valid to have two collections, one holding user documents and one holding comments documents
[18:18:55] <redalpha> i don't like to nest a lot of different entities into 1 collection - i tend to create collections per entity type
[18:19:31] <armandop> http://www.mongodb.org/display/DOCS/Schema+Design
[18:22:14] <armandop> that's what I used to formulate my assumption on the architecture behind my reasoning.
[18:29:20] <redalpha> it's just my preference to do manual referencing rather than embedding and DBRefs
[18:30:30] <redalpha> maybe if you do a lot of reads - where the referencing would mean doubling the number of queries you execute
[18:31:41] <redalpha> http://docs.mongodb.org/manual/applications/database-references/
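The two layouts being weighed, sketched side by side (all names illustrative): comments embedded in the user document versus a separate comments collection with a manual reference.

```javascript
// Embedded: the comments live inside the user document and come back in one query.
db.users.insert({ _id: "alex", comments: [{ text: "first!", ts: new Date() }] });
db.users.findOne({ _id: "alex" });

// Referenced: comments get their own collection and carry the user's _id manually,
// so fetching a user's comments costs a second query.
db.comments.insert({ user_id: "alex", text: "first!", ts: new Date() });
db.comments.find({ user_id: "alex" });
```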
[20:36:01] <timing> Hi!
[20:37:02] <timing> when I have a comment attached to a post i can give the comment a post_id. what can I best use as the id? the MongoId object or just the numeric $id value?
[20:44:20] <bluejade> where is the best place to ask beginner / intermediate questions related to structuring and querying mongodb documents? here? mongodb-user google group?
[21:22:30] <mrfloyd> hi everyone
[21:23:25] <mrfloyd> Any idea how I can fetch a nested object from a document, and only this object, not the whole document?
[21:34:07] <armandop> @mrfloyd - have you tried db.collection.find({your where clause}, {property_you_want:1})
[21:34:45] <mrfloyd> the property_I_want is nested
[21:35:03] <mrfloyd> x.foo.bar
[21:35:12] <mrfloyd> foo is an array and bar is the item i want to grab
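A sketch using the names from the question: a dotted-path projection trims each array element down to just bar instead of returning the whole document.

```javascript
db.x.find({ /* your where clause */ }, { "foo.bar": 1, _id: 0 });
// -> { foo: [ { bar: ... }, { bar: ... } ] }   (other fields of each element are dropped)
```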
[21:35:26] <domo> is it good or bad to use Mongo's IDs to reference docs cross collection?
[21:35:37] <domo> (in my app)
[21:35:43] <domo> (internally)
[21:35:45] <domo> :)
[21:36:19] <armandop> domo - that depends
[21:36:43] <wereHamster> it's good to use unique IDs
[21:36:45] <armandop> have you read this? http://www.mongodb.org/display/DOCS/Schema+Design
[21:42:52] <domo> so I should generate my own IDs?
[21:43:07] <domo> that link is showing an example of 'alex' being an id referencing two collections
[21:43:48] <armandop> alex is the primary key (if you're looking at this from a Relational DB mindset); using that key you associate a post with alex
[21:44:18] <armandop> you could use the internal object id that is created for each document but it gets dicey when fetching info using code.
[21:44:23] <armandop> just keep that in mind.
[21:44:33] <domo> why does it get dicey?
[21:45:48] <armandop> {_id: 697698779886hhy76 } <-- starting object example. when you call it from code, if you convert the object id to a string it will not translate back to the object id. object id != (string)object_id
[21:46:02] <armandop> let me find the article
[21:46:05] <armandop> one sec
[21:46:39] <domo> ah i see, but my driver in the app allows me to create MongoIds as objects from strings
[21:49:05] <domo> and what about sharding - do ObjectIds get screwy across multiple boxes?
[21:49:40] <armandop> that I'm not too sure about, never looked that closely into it.
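The string-vs-ObjectId pitfall armandop mentions, sketched in the shell (collection name is made up):

```javascript
var doc = db.posts.findOne();
var idStr = doc._id.str;                       // the ObjectId's hex string form
db.posts.findOne({ _id: idStr });              // no match: a string never equals an ObjectId
db.posts.findOne({ _id: ObjectId(idStr) });    // matches once it's converted back
```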
[22:00:07] <hadees> is there any way to see what queries are being run? I have a mongodb query that works but when I converted it to php it doesn't. So now I want to see what is actually being run.
[22:02:29] <linsys> hadees: http://www.mongodb.org/display/DOCS/Database+Profiler
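A short sketch of using the profiler linsys links to: switch it on, run the PHP code, then read back what the server actually received.

```javascript
db.setProfilingLevel(2);   // 2 = record every operation in system.profile
// ... run the PHP query here ...
db.system.profile.find().sort({ ts: -1 }).limit(5).forEach(printjson);
db.setProfilingLevel(0);   // switch profiling back off
```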