#mongodb logs for Friday the 1st of November, 2013

[07:40:48] <joannac> meep
[10:53:48] <_mkrull> hi. i would like to dump a larger mongodb. does the dump have a similar size to the database on the filesystem, or do the two differ?
[10:54:33] <Derick> it should be similar
[10:54:39] <Derick> probably a tiny bit less
[10:55:07] <_mkrull> ah ok. as long as it is not more I am ok with it
[10:55:24] <Derick> make sure you do have some extra space though!
[10:56:33] <kali> it should be less, as the indexes are not dumped, and the dump is not fragmented like the storage
[10:58:55] <Derick> kali: and not padded to 2GB
[10:59:07] <kali> that too
[11:01:08] <_mkrull> i have about 33% of the filesystem used.. well i will monitor. another question: i will have to do the dump on the master of a replset as the slaves do not have enough space left. that will impact performance of course.. but does that have other implications as well?
[11:06:19] <_mkrull> and one more thing: does mongodb free space if i delete large parts of the database automatically? if i have a 3TB database and remove 1TB of data will i get that back on the filesystem without running --repair?
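
A quick way to see the numbers behind this from the mongo shell: db.stats() reports the logical data size next to the on-disk sizes. A minimal sketch (the fields are standard; the interpretation assumes the MMAP-era storage engine of this log's time frame, where deletes do not shrink files without a repair/compact):

    var s = db.stats();
    printjson({
        dataSize:    s.dataSize,    // bytes of actual documents, roughly what mongodump writes
        storageSize: s.storageSize, // data extents, including fragmentation and padding
        indexSize:   s.indexSize,   // indexes are rebuilt on restore, not dumped
        fileSize:    s.fileSize     // what the filesystem sees; stays put after deletes
    });
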
[11:55:46] <bartzy> joannac: Still here? :D
[11:56:00] <bartzy> About mongodump and mongoexport, thanks. Can both export index metadata?
[12:44:51] <NodeX> bartzy : read the docs, it's all in there
[13:30:18] <spacepluk> hi, I'm trying to find out if mongodb suits my needs for a new application. I need to be able to perform queries in the style of: "documents not in some-list", where some-list can grow significantly. What do you think?
[13:32:35] <MANCHUCK> spacepluk, yes you can check out http://docs.mongodb.org/manual/reference/sql-comparison/
[13:34:33] <spacepluk> I've been reading the documentation for a while
[13:35:00] <MANCHUCK> it also depends on how you design your schema
[13:35:39] <NodeX> spacepluk : how large is the list?
[13:36:51] <spacepluk> I need to track which documents a user has seen and be able to filter them
[13:37:11] <spacepluk> so, it has some physical limits but is kind of indefinite
[13:38:43] <spacepluk> the list will grow over time
[13:39:04] <spacepluk> even if I use references I'd need to put the list in the query, right?
[13:41:45] <NodeX> references are a bad idea
[13:44:23] <spacepluk> I'm thinking about putting documents in the style of: { type: 'viewed', user: 'u_id', document: 'd_id' }
[13:44:35] <spacepluk> in the same collection as the documents I'm tracking
[13:44:40] <spacepluk> but it feels kind of messy
[13:45:36] <NodeX> for me I would store the documents I've seen in a redis set, nice and easy
[13:47:02] <spacepluk> and how do I query an "unseen" document?
[13:48:51] <spacepluk> I think the redis set doesn't solve the problem of passing a huge list on each query
[13:49:18] <spacepluk> or am I missing something?
[13:49:19] <NodeX> no, but you can do the diff on the client, which is easier
[13:49:39] <NodeX> I don't know what your end game is so it's hard to advise
[13:49:57] <spacepluk> it's actually a trivia game
[13:50:11] <spacepluk> I don't want the player to get the same question twice
[13:50:53] <NodeX> right ok, in that case you might store a list of uids with the question and query uids $nin [uid]
[13:51:21] <spacepluk> but then I'll reach the document size limit easily
[13:51:22] <NodeX> that will only scale up to 16mb though
[13:51:24] <spacepluk> I hope hehehe
[13:51:37] <NodeX> 24 bytes for an object id.. that's a lot of users
[13:52:21] <NodeX> whether you store it on the user or on the question you will hit a limit eventually
[13:52:24] <MANCHUCK> I have a similar set up with my project
[13:52:27] <MANCHUCK> i have users and accounts
[13:52:33] <MANCHUCK> users are allowed to access accounts
[13:52:45] <MANCHUCK> inside the accounts document i have a list of ids to the users
[13:52:58] <MANCHUCK> I have 5000 users
[13:53:07] <MANCHUCK> and i have not reached the document limit yet
[13:53:27] <NodeX> 5000 users is nothing, that's probably less than 50kb
[13:54:07] <MANCHUCK> i can find out the largest document
[13:54:23] <MANCHUCK> I'm sure it's not more than 50kb
[13:54:25] <spacepluk> I guess that could work for a while
[13:54:26] <NodeX> it's nowhere near 16mb is my point
[13:54:38] <MANCHUCK> yea, that's the point I'm making
[13:55:05] <spacepluk> that was very helpful, thank you
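
A minimal sketch of the uids-on-the-question approach NodeX suggests above, in the mongo shell (collection and field names are hypothetical):

    var userId     = ObjectId("5273c51f1a2f8bce12345678");  // hypothetical ids
    var questionId = ObjectId("5273c51f1a2f8bce87654321");

    // record that this user has seen the question
    db.questions.update({ _id: questionId }, { $addToSet: { seenBy: userId } });

    // fetch only questions this user has not seen yet
    db.questions.find({ seenBy: { $ne: userId } });
    // equivalent form: db.questions.find({ seenBy: { $nin: [userId] } })
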
[13:55:17] <NodeX> spacepluk : at some point the document size won't scale with your app so I would say a graphdb might be more appropriate for the questions
[13:55:37] <spacepluk> any recommendation for graphdbs?
[13:55:53] <NodeX> neo4j is about the most popular I would say
[13:56:03] <ron> though its license may be a bit evil.
[13:56:22] <NodeX> orientdb can do some things like this too
[13:56:56] <ron> and its license is much more permissive
[13:57:02] <NodeX> another option would be a pool of questions per user, refreshed when it drops near a certain level
[13:57:08] <bartzy> NodeX: docs of mongodump or mongoexport? I didn't find it there
[13:57:19] <NodeX> (of course do the computation in a queue behind the scenes)
[13:57:25] <NodeX> bartzy : both
[13:58:44] <bartzy> NodeX: Only in mongorestore, which says it will recreate indexes recorded by mongodump
[13:58:59] <bartzy> But there's nowhere to tell mongodump NOT to record indexes?
[13:59:10] <NodeX> remove the index dump ;)
[13:59:49] <bartzy> :D
[14:00:32] <bartzy> Thanks, and I just saw that mongoimport/export doesn't reliably determine data types (I guess dates and such), so I get why one should use mongodump/restore for backups.
[14:01:02] <NodeX> yeh, you can convert them but it's a pain
[14:01:38] <bartzy> how ?
[14:01:41] <bartzy> just interesting
[14:03:08] <NodeX> just write a third-party script to parse the exported json
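
As an illustration of that suggestion, a node.js script can revive extended-JSON dates with a JSON.parse reviver (file name hypothetical; mongoexport actually writes one document per line, so a real script would split on newlines first):

    // convert-dates.js -- illustrative sketch only
    var fs = require('fs');

    var doc = JSON.parse(fs.readFileSync('export.json', 'utf8'), function (key, value) {
        // mongoexport represents dates as { "$date": <millis> }
        if (value && typeof value === 'object' && '$date' in value) {
            return new Date(value.$date);
        }
        return value;
    });
    console.log(doc);
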
[14:20:00] <BlakeRG> I am adding a document to my collection and tagging it with an added date. The data shows up like this: "added": { "$date": 1383315474000.000000 }
[14:20:38] <BlakeRG> is this suitable for me to be able to query/group by day if i wanted to pull all documents added for a particular day?
[14:23:05] <NodeX> you can do a $gt / $lt to capture the date
[14:23:43] <BlakeRG> are human readable timestamps query-able as well?
[14:24:16] <NodeX> not in $gt afaik
[14:24:44] <NodeX> I could be wrong though, not something I've looked much into
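
A sketch of that range query in the mongo shell; ISODate() gives a human-readable way to write real BSON dates as the bounds (collection name hypothetical):

    // all documents added on 2013-11-01, UTC
    db.mycollection.find({
        added: {
            $gte: ISODate("2013-11-01T00:00:00Z"),
            $lt:  ISODate("2013-11-02T00:00:00Z")
        }
    });
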
[14:44:16] <eldub> is it normal to have a 'local' database in my replicaset that's 22gb?
[14:45:50] <eldub> Is there a command that I can run outside of a mongo terminal to discover if a node is a primary or not?
[14:49:21] <kali> eldub: mongo --quiet --eval 'printjson(rs.isMaster().ismaster)' ?
[14:50:52] <eldub> kali That's pretty close to what I want. Is there a way I can query the replicaset as a whole? or maybe add --host x.x.x.x --host y.y.y.y etc etc?
[14:51:30] <kali> eldub: mongo --quiet --eval 'printjson(rs.isMaster())' ?
[14:51:39] <kali> mmmmm no.
[14:52:10] <kali> try rs.status()
[14:52:19] <eldub> kali I can't NOT specify a host because in my mongodb.conf I have it listening on a certain IP. So when I put that command in, it says it can't connect
[14:52:47] <kali> ha. just add the ip to the command line
[14:53:22] <eldub> yea then I'm only returning a single value
[14:53:28] <eldub> well
[14:53:35] <eldub> I can script a loop to do each host IP
[14:53:43] <eldub> then I will get results on all 3 hosts
[14:53:45] <eldub> that will probably work
[15:55:05] <eldub> kali so the commands you gave me earlier worked great -- 1 more question. Is there a way to have it output the hostnames along with the "true / false" print?
[15:55:27] <eldub> kali I took what you gave me and put --host `hostname` in there so now in my script it comes back saying "true or false" but no hostname.
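
One way to get the hostname next to the role in a single call, hedged as a sketch: rs.status(), run against any reachable member (for example via the mongo --quiet --eval wrapper kali used above), already lists every member with its name and state:

    // prints each member with its role, e.g. "x.x.x.x:27017 : PRIMARY"
    rs.status().members.forEach(function (m) {
        print(m.name + " : " + m.stateStr);
    });
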
[16:00:40] <eldub> looking for assistance on configuring a host as a hidden member
[16:01:14] <NodeX> close your eyes - problem solved haha
[16:01:29] <eldub> lol
[16:01:35] <eldub> if only it were that easy, eh
[16:01:52] <NodeX> nothing is ever easy on a friday
[16:03:05] <eldub> NodeX I can set 2 nodes in my replica set at the same priority, right?
[16:03:13] <NodeX> yup
[16:04:49] <eldub> ok so if I set 2 of them to say... priority 1 -- then the 3rd to priority 0 that will prevent node 3 from becoming primary
[16:05:01] <eldub> but still have a copy of the data. Am I seeing this correctly?
[16:06:24] <NodeX> I think so, replica sets are not my strong suit - I don't use them enough
[16:07:11] <cheeser> yes, you'd still have copies on all three replicas.
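
A sketch of that setup from the mongo shell, run on the primary; it also covers eldub's hidden-member question, since a hidden member must have priority 0 (member indexes hypothetical):

    var cfg = rs.conf();
    cfg.members[0].priority = 1;
    cfg.members[1].priority = 1;
    cfg.members[2].priority = 0;    // can never become primary, still holds data
    cfg.members[2].hidden   = true; // optional: also invisible to clients
    rs.reconfig(cfg);
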
[16:13:15] <eldub> gimmic for what reasoning
[16:41:00] <zlatko> Hi
[16:41:17] <zlatko> Can somebody tell me what's the correct pattern for reindexing a collection?
[16:41:30] <Derick> zlatko: why would you need to do that?
[16:41:34] <zlatko> I have a collection, and I have text-search (with mongoose-text-search).
[16:41:51] <zlatko> But I need to expose the text-search index "weights" to the users.
[16:41:59] <zlatko> So that they can play with search settings.
[16:42:10] <zlatko> So it's getting weird.
[16:42:17] <Derick> zlatko: drop the index, and recreate it
[16:42:48] <zlatko> Yeah, I am trying. How long should that take? Just a few thousand records, with 4-5 fields being indexed.
[16:43:14] <zlatko> Because I had to turn my model into a callback version. I load the itemSchema from a settings collection.
[16:43:31] <zlatko> Then in the callback, I construct the new itemSchema and ItemModel.
[16:43:43] <zlatko> Then return that back.
[16:44:02] <zlatko> But when I try to reindex, my mongo (or node) gets stuck for a while.
[16:44:18] <zlatko> And I didn't yet figure out where or why.
[16:44:35] <Derick> a few thousand? 5 seconds or so
[16:44:37] <Derick> try it :)
[16:44:40] <zlatko> And I have to reindex, because if I just add the new index, then the damn thing complains.
[16:44:57] <zlatko> Well it takes more than that, even on my test collection with only two documents.
[16:45:01] <zlatko> So, something is probably off.
[16:45:05] <zlatko> Don't know what.
[16:45:57] <zlatko> Let me think out loud, write down my sequence of callbacks...
[16:46:16] <zlatko> 1. Load from the settings the property weights.
[16:46:24] <ron> think out loud, but use your inner voice ;)
[16:46:25] <ron> j/k
[16:46:54] <zlatko> 2. in a callback, get to the collection, and drop its index. (Can I drop just a single index by name? How?)
[16:47:01] <zlatko> 3. in a callback, create a new index.
[16:47:27] <zlatko> 4. when it's done, return (call the original callback param.)
[16:47:35] <zlatko> That should be correct, right?
[16:47:55] <zlatko> But I don't get to that original callback, somehow.
[16:48:20] <Derick> you can drop a single index
[16:48:31] <Derick> not sure which language you use, but you can do that
[16:48:45] <zlatko> JS (node.js)
[16:48:53] <cheeser> i figured with all this callback talk
[16:48:57] <Derick> sounds ok then
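
A hedged sketch of the drop-and-recreate cycle for a 2.4-era text index in the mongo shell; field names, weights, and the index name are hypothetical, and the node.js driver exposes the same calls as collection.dropIndex() and collection.ensureIndex():

    // find the text index's name
    db.items.getIndexes();

    // drop just that one index, by name
    db.items.dropIndex("my_text_index");

    // recreate it with the user-supplied weights
    db.items.ensureIndex(
        { title: "text", body: "text" },
        { weights: { title: 10, body: 2 }, name: "my_text_index" }
    );
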
[16:51:47] <zlatko> Hmm. Thanks.
[16:51:54] <zlatko> Will poke into this some more.
[17:07:23] <zlatko> Derick, thanks for the conversation. I seem to have fixed it. Now just to fix the angular frontend part and I'm shipping :)
[17:07:29] <Derick> hehe
[17:07:31] <Derick> good luck!
[17:46:25] <eyda|mon> Derick: I've got many clients and I'd like for each one to have their own database. I'll also have a central database to manage users that are common between them. Is there any worry about having a db per client with the knowledge I may end up with thousands?
[17:46:54] <Derick> eyda|mon: no need to ask me directly :-)
[17:47:03] <Derick> eyda|mon: you can have hundreds of dbs
[17:47:14] <Derick> but, you need to realize what the storage requirements are
[17:50:08] <flyankur_> Derick: Do you know of any good resource for understanding/learning what different kinds of schema design I can use for complex use cases?
[17:51:24] <eyda|mon> Derick: i know I don't need to, but you seemed friendly and helpful so I chose to :)
[17:52:49] <eyda|mon> Derick: would thousands cause an issue? The other options I have are making collections with prefixes to keep the data separate, or having everything in one database and using client_ids to get the data out. My concern with the last one is I won't know how much space a client consumes.
[17:53:15] <flyankur_> Derick: sorry for asking directly :P
[17:53:27] <Derick> eyda|mon: files require file pointers, of which you have a limited amount
[17:53:53] <Derick> ulimit -a will tell you how many
[17:54:04] <Derick> on my dev laptop it's only 1024 (although you can easily change it)
[17:54:54] <eyda|mon> oh ok, so each database requires a new filepointer.
[17:55:54] <eyda|mon> Out of the three options, which would you have chosen? 1. database per client. 2. prefix-collection per client. 3. just one database, shared collection with client_id to get the data.
[17:56:38] <Derick> each *file* does
[17:56:43] <Derick> and each DB is at least two files
[17:56:52] <Derick> option 1
[17:57:04] <Derick> also think about the overhead of file sizes
[17:57:12] <Derick> even a one document DB takes up quite a bit of space
[17:58:19] <eyda|mon> ok, so option 1 despite the concern about filepointers.
[17:58:22] <eyda|mon> Thank you very much
[17:58:36] <Derick> test it though!
[17:58:37] <eyda|mon> I had asked it on the google groups and here before, but got no response
[17:58:47] <eyda|mon> ok
[17:59:11] <Derick> and remember that connections are file pointers too
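
A small sketch of option 1 in the mongo shell: each client gets its own database, selected by name, which also makes per-client disk usage trivial to read off (naming scheme and collection hypothetical):

    var clientId = "acme";  // hypothetical
    var clientDb = db.getSiblingDB("client_" + clientId);
    clientDb.orders.insert({ createdAt: new Date() });

    // per-client space accounting, one of the concerns raised above
    print(clientDb.stats().fileSize);
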
[18:05:43] <zlatko> Derick not around?
[18:05:56] <zlatko> Ah, you're an op.
[18:06:14] <zlatko> I wanted to bother you or anybody else for some extra help.
[18:06:30] <zlatko> I'm trying to reindex mongoose-text-search text index.
[18:06:36] <zlatko> How can I do that?
[18:06:40] <Derick> zlatko: just ask your questions - people will answer if they can
[18:06:50] <zlatko> Yup, thanks.
[18:09:10] <zlatko> So basically, I am trying to re-index a mongoose schema.
[18:09:16] <zlatko> And I'm failing to do that.
[18:10:00] <zlatko> err, mongoose schema.index thing.
[18:16:09] <Aartsie> why can't you get a document by the ObjectID in javascript?
[18:41:44] <zlatko> Aartsie what do you mean?
[18:44:21] <Aartsie> I already found out :) there is a function for it in the driver
[19:07:00] <slava-work> where can I read about the possible values that rs.status() can return and the int->str mapping?
[19:13:38] <slava-work> also, I think 'syncingTo' is a terrible name for something that should be 'syncingFrom'
[19:18:05] <slava-work> DOCS-2182 submitted
[19:48:20] <a13x212> if i want to create a member of a replica set, i can just copy the datafiles from an existing member?
[19:49:26] <cheeser> or just let the system sync itself
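
The self-sync route is a single command on the current primary, once the new mongod is running with the same --replSet name (hostname hypothetical):

    // on the primary: add the empty new member; it then performs an initial sync
    rs.add("newhost.example.com:27017");
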
[19:50:57] <Tiller> Hi there!
[19:51:28] <Tiller> Stupid question: When I store a byte[] in the DB using java driver, I've got a BinData() into the db
[19:51:50] <Tiller> Is it a "standard" representation of a byte array in mongo, or is it the serialisation of the byte array in java?
[19:54:45] <cheeser> it's how the java driver serializes it.
[19:55:18] <cheeser> it's actually more space efficient than storing as a native array and is more often compatible with actual use cases.
[19:55:37] <cheeser> so it's faster to (de)serialize for most use cases.
[19:55:46] <Tiller> ok, but the fact is I have to store the same kind of values from a shell script
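
For that, the mongo shell's BinData constructor takes a subtype and a base64 payload; subtype 0 is generic binary, which should line up with what the Java driver writes for a byte[] (collection and field names hypothetical; verify the subtype against your own data):

    // "SGVsbG8gd29ybGQ=" is base64 for "Hello world"
    db.blobs.insert({ payload: BinData(0, "SGVsbG8gd29ybGQ=") });
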
[20:13:55] <spacepluk> is there any projection to get the value of a field in the result, instead of field:value?
[20:14:31] <cheeser> "in the result?"
[20:15:50] <spacepluk> for example, i have documents that have the question field. I'd like to get a list of the questions that meet some criteria but I don't want the result to be a list of { question:'value' }
[20:16:04] <spacepluk> but a list of 'value'
[20:20:37] <spacepluk> is that possible?
[20:20:47] <spacepluk> I'm new to this :P
[20:22:23] <cheeser> i don't think it is... maybe with the aggregation framework.
[20:24:40] <spacepluk> ok, I'll take a look
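
On a 2013-era server the reshaping is easiest client-side; in the mongo shell, a projection plus cursor.map() yields a bare list of values (collection and criteria hypothetical):

    // returns [ "value1", "value2", ... ] instead of { question: ... } documents
    db.questions.find(
        { category: "science" },   // some criteria
        { question: 1, _id: 0 }    // project only the wanted field
    ).map(function (doc) { return doc.question; });
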
[20:25:05] <spacepluk> is there any limitation to the size of the object I can use to query the database?
[20:25:22] <ron> 16mb.
[20:25:35] <spacepluk> ok, cool
[20:46:31] <cirwin> Is there a ruby client that provides access to sh.getBalancerState() etc.?
[20:46:45] <cirwin> I can obviously re-implement them myself, but it feels a bit like someone has already done this :p
[20:48:09] <cheeser> cirwin: checking
[20:50:04] <cheeser> here's what I got back cirwin :
[20:50:17] <cheeser> if you just print the source for a shell shortcut, e.g.,
[20:50:23] <cheeser> > sh.getBalancerState
[20:50:23] <cheeser> function () {
[20:50:24] <cheeser> var x = db.getSisterDB( "config" ).settings.findOne({ _id: "balancer" } )
[20:50:24] <cheeser> if ( x == null )
[20:50:24] <cheeser> return true;
[20:50:26] <cheeser> return ! x.stopped;
[20:50:26] <cirwin> yeah
[20:50:28] <cheeser> }
[20:50:33] <cheeser> so in this case it's a findOne
[20:50:35] <cirwin> I'm translating that to ruby now
[20:50:41] <cirwin> just wondered if it had already been done
[20:50:41] <cheeser> use the ruby API for that
[20:50:43] <cirwin> cool
[20:50:46] <cirwin> thanks for confirming
[20:50:46] <cheeser> doesn't seem that way
[20:50:49] <cheeser> no problem
[20:54:16] <umi> Can I force mongo to generate a replication log without putting it into a replication set?
[20:54:38] <umi> I'd like to use the replication log to create a real time backup
[21:00:19] <umi> Is there any way to get replication log data without putting MongoDB into a replica set?
[21:01:01] <umi> Maybe I should instead call it "oplog" not replication log
[21:35:31] <umi> Can I get the oplog without being in a replication set?
[21:36:22] <umi> Or make a replicaset of one?
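
A single-member replica set does work and is the usual way to get an oplog; a sketch, assuming mongod was started with --replSet rs0:

    // run once in the shell of the lone member
    rs.initiate({ _id: "rs0", members: [{ _id: 0, host: "localhost:27017" }] });

    // the oplog is then an ordinary capped collection you can tail
    db.getSiblingDB("local").oplog.rs.find().sort({ $natural: -1 }).limit(1);
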
[22:26:43] <spellett> Is there any way to run mongoexport from the mongo shell? I'm guessing not, but I thought I might check anyway.
[22:27:09] <Derick> spellett: nope
[22:27:19] <spellett> thanks
[23:54:42] <rafaelhbarros> any idea on how to do hot backups on a large db?
[23:57:08] <cheeser> use the BRS service that MongoDB offers
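
A common self-hosted alternative, hedged as a sketch: flush and lock writes, take a filesystem-level snapshot of the dbpath, then unlock:

    // block writes and flush dirty pages to disk
    db.fsyncLock();

    // ... take an LVM/EBS/filesystem snapshot of the dbpath here ...

    // release the lock once the snapshot has been cut
    db.fsyncUnlock();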