#mongodb logs for Friday the 1st of November, 2013

[07:40:48] <joannac> meep
[10:53:48] <_mkrull> hi. i would like to dump a larger mongodb. does the dump have a similar size to the database on the filesystem, or do the two differ?
[10:54:33] <Derick> it should be similar
[10:54:39] <Derick> probably a tiny bit less
[10:55:07] <_mkrull> ah ok. as long as it is not more I am ok with it
[10:55:24] <Derick> make sure you do have some extra space though!
[10:56:33] <kali> it should be less, as the indexes are not dumped, and the dump is not fragmented like the storage
[10:58:55] <Derick> kali: and not padded to 2GB
[10:59:07] <kali> that too
[11:01:08] <_mkrull> i have about 33% of the filesystem used.. well i will monitor. another question: i will have to do the dump on the master of a replset as the slaves do not have enough space left. that will impact performance of course.. but does that have other implications as well?
[11:06:19] <_mkrull> and one more thing: does mongodb free space if i delete large parts of the database automatically? if i have a 3TB database and remove 1TB of data will i get that back on the filesystem without running --repair?
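
A quick way to see the numbers behind this from the mongo shell: db.stats() reports the logical data size next to the on-disk sizes. A minimal sketch (the fields are standard; the interpretation assumes the MMAP-era storage engine of this log's time frame, where deletes do not shrink files without a repair/compact):

    var s = db.stats();
    printjson({
        dataSize:    s.dataSize,    // bytes of actual documents, roughly what mongodump writes
        storageSize: s.storageSize, // data extents, including fragmentation and padding
        indexSize:   s.indexSize,   // indexes are rebuilt on restore, not dumped
        fileSize:    s.fileSize     // what the filesystem sees; stays put after deletes
    });
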
[11:55:46] <bartzy> joannac: Still here? :D
[11:56:00] <bartzy> About mongodump and mongoexport, thanks. Can both export index metadata?
[12:44:51] <NodeX> bartzy : read the docs, it's all in there
[13:30:18] <spacepluk> hi, I'm trying to find out if mongodb suits my needs for a new application. I need to be able to perform queries in the style of: "documents not in some-list", where some-list can grow significantly. What do you think?
[13:32:35] <MANCHUCK> spacepluk, yes you can check out http://docs.mongodb.org/manual/reference/sql-comparison/
[13:34:33] <spacepluk> I've been reading the documentation for a while
[13:35:00] <MANCHUCK> it also depends on how you design your schema
[13:35:39] <NodeX> spacepluk : how large is the list?
[13:36:51] <spacepluk> I need to track which documents a user has seen and be able to filter them
[13:37:11] <spacepluk> so, it has some physical limits but is kind of indefinite
[13:38:43] <spacepluk> the list will grow over time
[13:39:04] <spacepluk> even if I use references I'd need to put the list in the query, right?
[13:41:45] <NodeX> references are a bad idea
[13:44:23] <spacepluk> I'm thinking about putting documents in the style of: { type: 'viewed', user: 'u_id', document: 'd_id' }
[13:44:35] <spacepluk> in the same collection as the documents I'm tracking
[13:44:40] <spacepluk> but it feels kind of messy
[13:45:36] <NodeX> for me I would store the documents I've seen in a redis set, nice and easy
[13:47:02] <spacepluk> and how do I query an "unseen" document?
[13:48:51] <spacepluk> I think the redis set doesn't solve the problem of passing a huge list on each query
[13:49:18] <spacepluk> or am I missing something?
[13:49:19] <NodeX> no, but you can do the diff on the client, which is easier
[13:49:39] <NodeX> I don't know what your end game is so it's hard to advise
[13:49:57] <spacepluk> it's actually a trivia game
[13:50:11] <spacepluk> I don't want the player to get the same question twice
[13:50:53] <NodeX> right ok, in that case you might store a list of uids with the question and query uids $nin [uid]
[13:51:21] <spacepluk> but then I'll reach the document size limit easily
[13:51:22] <NodeX> that will only scale up to 16mb though
[13:51:24] <spacepluk> I hope hehehe
[13:51:37] <NodeX> 24 bytes for an object id.. that's a lot of users
[13:52:21] <NodeX> whether you store it on the user or on the question you will hit a limit eventually
[13:52:24] <MANCHUCK> I have a similar set up with my project
[13:52:27] <MANCHUCK> i have users and accounts
[13:52:33] <MANCHUCK> users are allowed to access accounts
[13:52:45] <MANCHUCK> inside the accounts document i have a list of ids to the users
[13:52:58] <MANCHUCK> I have 5000 users
[13:53:07] <MANCHUCK> and i have not reached the document limit yet
[13:53:27] <NodeX> 5000 users is nothing, that's probably less than 50kb
[13:54:07] <MANCHUCK> i can find out the largest document
[13:54:23] <MANCHUCK> I'm sure it's not more than 50kb
[13:54:25] <spacepluk> I guess that could work for a while
[13:54:26] <NodeX> it's nowhere near 16mb is my point
[13:54:38] <MANCHUCK> yea, that's the point I'm making
[13:55:05] <spacepluk> that was very helpful, thank you
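
A minimal sketch of the uids-on-the-question approach NodeX suggests above, in the mongo shell (collection and field names are hypothetical):

    var userId     = ObjectId("5273c51f1a2f8bce12345678");  // hypothetical ids
    var questionId = ObjectId("5273c51f1a2f8bce87654321");

    // record that this user has seen the question
    db.questions.update({ _id: questionId }, { $addToSet: { seenBy: userId } });

    // fetch only questions this user has not seen yet
    db.questions.find({ seenBy: { $ne: userId } });
    // equivalent form: db.questions.find({ seenBy: { $nin: [userId] } })
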
[13:55:17] <NodeX> spacepluk : at some point the document size won't scale with your app so I would say a graphdb might be more appropriate for the questions
[13:55:37] <spacepluk> any recommendation for graphdbs?
[13:55:53] <NodeX> neo4j is about the most popular I would say
[13:56:03] <ron> though its license may be a bit evil.
[13:56:22] <NodeX> orientdb can do some things like this too
[13:56:56] <ron> and its license is much more permissive
[13:57:02] <NodeX> another option would be a pool of questions per user, refreshed when it drops near a certain level
[13:57:08] <bartzy> NodeX: docs of mongodump or mongoexport? I didn't find it there
[13:57:19] <NodeX> (of course do the computation in a queue behind the scenes)
[13:57:25] <NodeX> bartzy : both
[13:58:44] <bartzy> NodeX: Only in mongorestore, which says it will recreate indexes recorded by mongodump
[13:58:59] <bartzy> But there's nowhere to tell mongodump NOT to record indexes?
[13:59:10] <NodeX> remove the index dump ;)
[13:59:49] <bartzy> :D
[14:00:32] <bartzy> Thanks, and I just saw that mongoimport/export doesn't reliably determine data types (I guess dates and such), so I get why one should use mongodump/restore for backups.
[14:01:02] <NodeX> yeh, you can convert them but it's a pain
[14:01:38] <bartzy> how ?
[14:01:41] <bartzy> just interesting
[14:03:08] <NodeX> just write a third-party script to parse the exported json
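
As an illustration of that suggestion, a node.js script can revive extended-JSON dates with a JSON.parse reviver (file name hypothetical; mongoexport actually writes one document per line, so a real script would split on newlines first):

    // convert-dates.js -- illustrative sketch only
    var fs = require('fs');

    var doc = JSON.parse(fs.readFileSync('export.json', 'utf8'), function (key, value) {
        // mongoexport represents dates as { "$date": <millis> }
        if (value && typeof value === 'object' && '$date' in value) {
            return new Date(value.$date);
        }
        return value;
    });
    console.log(doc);
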
[14:20:00] <BlakeRG> I am adding a document to my collection and tagging it with an added date. The data shows up like this: "added": { "$date": 1383315474000.000000 }
[14:20:38] <BlakeRG> is this suitable for me to be able to query/group by day if i wanted to pull all documents added for a particular day?
[14:23:05] <NodeX> you can do a $gt / $lt to capture the date
[14:23:43] <BlakeRG> are human readable timestamps query-able as well?
[14:24:16] <NodeX> not in $gt afaik
[14:24:44] <NodeX> I could be wrong though, not something I've looked much into
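
A sketch of that range query in the mongo shell; ISODate() gives a human-readable way to write real BSON dates as the bounds (collection name hypothetical):

    // all documents added on 2013-11-01, UTC
    db.mycollection.find({
        added: {
            $gte: ISODate("2013-11-01T00:00:00Z"),
            $lt:  ISODate("2013-11-02T00:00:00Z")
        }
    });
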
[14:44:16] <eldub> is it normal to have a 'local' database in my replicaset that's 22gb?
[14:45:50] <eldub> Is there a command that I can run outside of a mongo terminal to discover if a node is a primary or not?
[14:49:21] <kali> eldub: mongo --quiet --eval 'printjson(rs.isMaster().ismaster)' ?
[14:50:52] <eldub> kali That's pretty close to what I want. Is there a way I can query the replicaset as a whole? or maybe add --host x.x.x.x --host y.y.y.y etc etc?
[14:51:30] <kali> eldub: mongo --quiet --eval 'printjson(rs.isMaster())' ?
[14:51:39] <kali> mmmmm no.
[14:52:10] <kali> try rs.status()
[14:52:19] <eldub> kali I can't NOT specify a host because in my mongodb.conf I have it listening on a certain IP. So when I put that command in, it says it can't connect
[14:52:47] <kali> ha. just add the ip to the command line
[14:53:22] <eldub> yea then I'm only returning a single value
[14:53:28] <eldub> well
[14:53:35] <eldub> I can script a loop to do each host IP
[14:53:43] <eldub> then I will get results on all 3 hosts
[14:53:45] <eldub> that will probably work
[15:55:05] <eldub> kali so the commands you gave me earlier worked great -- 1 more question. Is there a way to have it output the hostnames along with the "true / false" print?
[15:55:27] <eldub> kali I took what you gave me and put --host `hostname` in there so now in my script it comes back saying "true or false" but no hostname.
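
One way to get the hostname next to the role in a single call, hedged as a sketch: rs.status(), run against any reachable member (for example via the mongo --quiet --eval wrapper kali used above), already lists every member with its name and state:

    // prints each member with its role, e.g. "x.x.x.x:27017 : PRIMARY"
    rs.status().members.forEach(function (m) {
        print(m.name + " : " + m.stateStr);
    });
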
[16:00:40] <eldub> looking for assistance on configuring a host as a hidden member
[16:01:14] <NodeX> close your eyes - problem solved haha
[16:01:29] <eldub> lol
[16:01:35] <eldub> if only it were that easy, eh
[16:01:52] <NodeX> nothing is ever easy on a friday
[16:03:05] <eldub> NodeX I can set 2 nodes in my replica set at the same priority, right?
[16:03:13] <NodeX> yup
[16:04:49] <eldub> ok so if I set 2 of them to say... priority 1 -- then the 3rd to priority 0 that will prevent node 3 from becoming primary
[16:05:01] <eldub> but still have a copy of the data. Am I seeing this correctly?
[16:06:24] <NodeX> I think so, replica sets are not my strong suit - I don't use them enough
[16:07:11] <cheeser> yes, you'd still have copies on all three replicas.
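
A sketch of that setup from the mongo shell, run on the primary; it also covers eldub's hidden-member question, since a hidden member must have priority 0 (member indexes hypothetical):

    var cfg = rs.conf();
    cfg.members[0].priority = 1;
    cfg.members[1].priority = 1;
    cfg.members[2].priority = 0;    // can never become primary, still holds data
    cfg.members[2].hidden   = true; // optional: also invisible to clients
    rs.reconfig(cfg);
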
[16:13:15] <eldub> gimmic for what reasoning
[16:41:00] <zlatko> Hi
[16:41:17] <zlatko> Can somebody tell me what's the correct pattern for reindexing a collection?
[16:41:30] <Derick> zlatko: why would you need to do that?
[16:41:34] <zlatko> I have a collection, and I have text-search (with mongoose-text-search).
[16:41:51] <zlatko> But I need to expose the text-search index "weights" to the users.
[16:41:59] <zlatko> So that they can play with search settings.
[16:42:10] <zlatko> So it's getting weird.
[16:42:17] <Derick> zlatko: drop the index, and recreate it
[16:42:48] <zlatko> Yeah, I am trying. How long should that take? Just a few thousand records, with 4-5 fields being indexed.
[16:43:14] <zlatko> Because I had to turn my model into a callback version. I load the itemSchema from a settings collection.
[16:43:31] <zlatko> Then in the callback, I construct the new itemSchema and ItemModel.
[16:43:43] <zlatko> Then return that back.
[16:44:02] <zlatko> But when I try to reindex, my mongo (or node) gets stuck for a while.
[16:44:18] <zlatko> And I didn't yet figure out where or why.
[16:44:35] <Derick> a few thousand? 5 seconds or so
[16:44:37] <Derick> try it :)
[16:44:40] <zlatko> And I have to reindex, because if I just add the new index, then the damn thing complains.
[16:44:57] <zlatko> Well it takes more than that, even on my test collection with only two documents.
[16:45:01] <zlatko> So, something is probably off.
[16:45:05] <zlatko> Don't know what.
[16:45:57] <zlatko> Let me think out loud, write down my sequence of callbacks...
[16:46:16] <zlatko> 1. Load from the settings the property weights.
[16:46:24] <ron> think out loud, but use your inner voice ;)
[16:46:25] <ron> j/k
[16:46:54] <zlatko> 2. in a callback, get to the collection, and drop its index. (Can I drop just a single index by name? How?)
[16:47:01] <zlatko> 3. in a callback, create a new index.
[16:47:27] <zlatko> 4. when it's done, return (call the original callback param.)
[16:47:35] <zlatko> That should be correct, right?
[16:47:55] <zlatko> But I don't get to that original callback, somehow.
[16:48:20] <Derick> you can drop a single index
[16:48:31] <Derick> not sure which language you use, but you can do that
[16:48:45] <zlatko> JS (node.js)
[16:48:53] <cheeser> i figured with all this callback talk
[16:48:57] <Derick> sounds ok then
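
A hedged sketch of the drop-and-recreate cycle for a 2.4-era text index in the mongo shell; field names, weights, and the index name are hypothetical, and the node.js driver exposes the same calls as collection.dropIndex() and collection.ensureIndex():

    // find the text index's name
    db.items.getIndexes();

    // drop just that one index, by name
    db.items.dropIndex("my_text_index");

    // recreate it with the user-supplied weights
    db.items.ensureIndex(
        { title: "text", body: "text" },
        { weights: { title: 10, body: 2 }, name: "my_text_index" }
    );
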
[16:51:47] <zlatko> Hmm. Thanks.
[16:51:54] <zlatko> Will poke into this some more.
[17:07:23] <zlatko> Derick, thanks for the conversation. I seem to have fixed it. Now just to fix the angular frontend part and I'm shipping :)
[17:07:29] <Derick> hehe
[17:07:31] <Derick> good luck!
[17:46:25] <eyda|mon> Derick: I've got many clients and I'd like for each one to have their own database. I'll also have a central database to manage users that are common between them. Is there any worry about having a db per client with the knowledge I may end up with thousands?
[17:46:54] <Derick> eyda|mon: no need to ask me directly :-)
[17:47:03] <Derick> eyda|mon: you can have hundreds of dbs
[17:47:14] <Derick> but, you need to realize what the storage requirements are
[17:50:08] <flyankur_> Derick: Do you know of any good resource for understanding/learning what different kinds of schema design I can use for complex use cases?
[17:51:24] <eyda|mon> Derick: i know I don't need to, but you seemed friendly and helpful so I chose to :)
[17:52:49] <eyda|mon> Derick: would thousands cause an issue? The other options I have are making collections with prefixes to keep the data separate, or having everything in one database and using client_ids to get the data out. My concern with the last one is I won't know how much space a client consumes.
[17:53:15] <flyankur_> Derick: sorry for asking directly :P
[17:53:27] <Derick> eyda|mon: files require file pointers, of which you have a limited amount
[17:53:53] <Derick> ulimit -a will tell you how many
[17:54:04] <Derick> on my dev laptop it's only 1024 (although you can easily change it)
[17:54:54] <eyda|mon> oh ok, so each database requires a new filepointer.
[17:55:54] <eyda|mon> Out of the three options, which would you have chosen? 1. database per client. 2. prefix-collection per client. 3. just one database, shared collection with client_id to get the data.
[17:56:38] <Derick> each *file* does
[17:56:43] <Derick> and each DB is at least two files
[17:56:52] <Derick> option 1
[17:57:04] <Derick> also think about the overhead of file sizes
[17:57:12] <Derick> even a one document DB takes up quite a bit of space
[17:58:19] <eyda|mon> ok, so option 1 despite the concern about filepointers.
[17:58:22] <eyda|mon> Thank you very much
[17:58:36] <Derick> test it though!
[17:58:37] <eyda|mon> I had asked it on the google groups and here before, but got no response
[17:58:47] <eyda|mon> ok
[17:59:11] <Derick> and remember that connections are file pointers too
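
A small sketch of option 1 in the mongo shell: each client gets its own database, selected by name, which also makes per-client disk usage trivial to read off (naming scheme and collection hypothetical):

    var clientId = "acme";  // hypothetical
    var clientDb = db.getSiblingDB("client_" + clientId);
    clientDb.orders.insert({ createdAt: new Date() });

    // per-client space accounting, one of the concerns raised above
    print(clientDb.stats().fileSize);
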
[18:05:43] <zlatko> Derick not around?
[18:05:56] <zlatko> Ah, you're an op.
[18:06:14] <zlatko> I wanted to bother you or anybody else for some extra help.
[18:06:30] <zlatko> I'm trying to reindex mongoose-text-search text index.
[18:06:36] <zlatko> How can I do that?
[18:06:40] <Derick> zlatko: just ask your questions - people will answer if they can
[18:06:50] <zlatko> Yup, thanks.
[18:09:10] <zlatko> So basically, I am trying to re-index a mongoose schema.
[18:09:16] <zlatko> And I'm failing to do that.
[18:10:00] <zlatko> err, mongoose schema.index thing.
[18:16:09] <Aartsie> why can't you get a document by the ObjectID in javascript?
[18:41:44] <zlatko> Aartsie what do you mean?
[18:44:21] <Aartsie> I already found out :) there is a function for it in the driver
[19:07:00] <slava-work> where can I read about the possible values that rs.status() can return and the int->str mapping?
[19:13:38] <slava-work> also, I think 'syncingTo' is a terrible name for something that should be 'syncingFrom'
[19:18:05] <slava-work> DOCS-2182 submitted
[19:48:20] <a13x212> if i want to create a member of a replica set, i can just copy the datafiles from an existing member?
[19:49:26] <cheeser> or just let the system sync itself
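
The self-sync route is a single command on the current primary, once the new mongod is running with the same --replSet name (hostname hypothetical):

    // on the primary: add the empty new member; it then performs an initial sync
    rs.add("newhost.example.com:27017");
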
[19:50:57] <Tiller> Hi there!
[19:51:28] <Tiller> Stupid question: When I store a byte[] in the DB using java driver, I've got a BinData() into the db
[19:51:50] <Tiller> Is it a "standard" representation of a byte array in mongo, or is it the serialisation of the byte array in java?
[19:54:45] <cheeser> it's how the java driver serializes it.
[19:55:18] <cheeser> it's actually more space efficient than storing as a native array and is more often compatible with actual use cases.
[19:55:37] <cheeser> so it's faster to (de)serialize for most use cases.
[19:55:46] <Tiller> ok, but the fact is I have to store the same kind of values from a shell script
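
For that, the mongo shell's BinData constructor takes a subtype and a base64 payload; subtype 0 is generic binary, which should line up with what the Java driver writes for a byte[] (collection and field names hypothetical; verify the subtype against your own data):

    // "SGVsbG8gd29ybGQ=" is base64 for "Hello world"
    db.blobs.insert({ payload: BinData(0, "SGVsbG8gd29ybGQ=") });
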
[20:13:55] <spacepluk> is there any projection to get the value of a field in the result, instead of field:value?
[20:14:31] <cheeser> "in the result?"
[20:15:50] <spacepluk> for example, i have documents that have the question field. I'd like to get a list of the questions that meet some criteria but I don't want the result to be a list of { question:'value' }
[20:16:04] <spacepluk> but a list of 'value'
[20:20:37] <spacepluk> is that possible?
[20:20:47] <spacepluk> I'm new to this :P
[20:22:23] <cheeser> i don't think it is... maybe with the aggregation framework.
[20:24:40] <spacepluk> ok, I'll take a look
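
On a 2013-era server the reshaping is easiest client-side; in the mongo shell, a projection plus cursor.map() yields a bare list of values (collection and criteria hypothetical):

    // returns [ "value1", "value2", ... ] instead of { question: ... } documents
    db.questions.find(
        { category: "science" },   // some criteria
        { question: 1, _id: 0 }    // project only the wanted field
    ).map(function (doc) { return doc.question; });
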
[20:25:05] <spacepluk> is there any limitation to the size of the object I can use to query the database?
[20:25:22] <ron> 16mb.
[20:25:35] <spacepluk> ok, cool
[20:46:31] <cirwin> Is there a ruby client that provides access to sh.getBalancerState() etc.?
[20:46:45] <cirwin> I can obviously re-implement them myself, but it feels a bit like someone has already done this :p
[20:48:09] <cheeser> cirwin: checking
[20:50:04] <cheeser> here's what I got back cirwin :
[20:50:17] <cheeser> if you just print the source for a shell shortcut, e.g.,
[20:50:23] <cheeser> > sh.getBalancerState
[20:50:23] <cheeser> function () {
[20:50:24] <cheeser> var x = db.getSisterDB( "config" ).settings.findOne({ _id: "balancer" } )
[20:50:24] <cheeser> if ( x == null )
[20:50:24] <cheeser> return true;
[20:50:26] <cheeser> return ! x.stopped;
[20:50:26] <cirwin> yeah
[20:50:28] <cheeser> }
[20:50:33] <cheeser> so in this case it's a findOne
[20:50:35] <cirwin> I'm translating that to ruby now
[20:50:41] <cirwin> just wondered if it had already been done
[20:50:41] <cheeser> use the ruby API for that
[20:50:43] <cirwin> cool
[20:50:46] <cirwin> thanks for confirming
[20:50:46] <cheeser> doesn't seem that way
[20:50:49] <cheeser> no problem
[20:54:16] <umi> Can I force mongo to generate a replication log without putting it into a replication set?
[20:54:38] <umi> I'd like to use the replication log to create a real time backup
[21:00:19] <umi> Is there any way to get replication log data without putting MongoDB into a replica set?
[21:01:01] <umi> Maybe I should instead call it "oplog" not replication log
[21:35:31] <umi> Can I get the oplog without being in a replication set?
[21:36:22] <umi> Or make a replicaset of one?
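
A single-member replica set does work and is the usual way to get an oplog; a sketch, assuming mongod was started with --replSet rs0:

    // run once in the shell of the lone member
    rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "localhost:27017" }] });

    // the oplog is then an ordinary capped collection you can tail
    db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(1);
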
[22:26:43] <spellett> Is there any way to run mongoexport from the mongo shell? I'm guessing not, but I thought I might check anyway.
[22:27:09] <Derick> spellett: nope
[22:27:19] <spellett> thanks
[23:54:42] <rafaelhbarros> any idea on how to do hot backups on a large db?
[23:57:08] <cheeser> use the BRS service that MongoDB offers
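
A common self-hosted alternative, hedged as a sketch: flush and lock writes, take a filesystem-level snapshot of the dbpath, then unlock:

    // block writes and flush dirty pages to disk
    db.fsyncLock();

    // ... take an LVM/EBS/filesystem snapshot of the dbpath here ...

    // release the lock once the snapshot has been cut
    db.fsyncUnlock();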