[01:46:44] <GoClick> Would MongoDB be a reasonable choice for an application that needs to store 2-7m entries, each with 10-20 fields which change all the darn time, but not exceeding 1kB each, and for which writes are done very frequently and reads maybe only once a week, but reads would hit almost all entries?
[01:48:38] <GoClick> Maybe the best analogy I could give would be a web server logging headers sent by clients which you'd want to run reports on later…
[03:36:38] <circlicious> another user going away from mongo - http://metabroadcast.com/blog/looking-with-cassandra-into-the-future-of-atlas :/
[03:57:37] <unomi> Hi - within an update() is there a reference to the current object?
[03:58:36] <unomi> I see there is someArray.$.arrayitemAttribute
[04:08:34] <unomi> I currently have: db.users.update({_id:ObjectId(user_id)}, {$set:{"stats.deckCount": db.decks.count({user:ObjectId(user_id)})}},false,true);
[04:09:56] <unomi> but I would like to be able to rewrite it for 'true' multi-update where the user._id of the currently matched user can be referenced
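update() can't read values from the document it matched, so the multi-update unomi wants is usually done with a shell-side loop instead. A minimal sketch, assuming the users/decks collections from the snippet above:

```javascript
// update() can't reference the matched document, so loop over users in the
// shell and issue one targeted update each. Sketch only.
db.users.find({}, {_id: 1}).forEach(function (u) {
    var deckCount = db.decks.count({user: u._id});  // this user's deck count
    db.users.update({_id: u._id}, {$set: {"stats.deckCount": deckCount}});
});
```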
[04:32:55] <tiripamwe> hi guys, is the entire javascript language available within mongo or is only a subset available?
[04:33:14] <tiripamwe> if it's only a subset where can i find details about what that subset is?
[05:08:04] <tiripamwe> ranman: i'm using MapReduce... if i want to define a function is there somewhere other than the map or reduce function to define it?
[05:08:22] <tiripamwe> ranman: it seems wrong to put them there...
[05:08:47] <ranman> var someVariable = function(args) { };
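Beyond ranman's one-liner, map/reduce helpers are commonly injected through the mapReduce `scope` option (or stored server-side in db.system.js). A hedged sketch; the collection, field, and helper names here are invented:

```javascript
// Pass helper functions into the map/reduce environment via "scope".
// "normalize", "events", and "tag_counts" are made up for illustration.
var helpers = {
    normalize: function (s) { return s.toLowerCase(); }
};
db.events.mapReduce(
    function () { emit(normalize(this.tag), 1); },           // map
    function (key, values) { return Array.sum(values); },    // reduce
    { out: "tag_counts", scope: helpers }
);
```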
[07:46:58] <exi> i just have a small question. if i accidentally started removing the wrong shard, is there any way to stop the draining / revoke the removeshard command?
[07:49:03] <NodeX> I think you have to let it run, then rebuild the shards / index again
[07:50:01] <exi> hm, that's bad. i hoped that i could maybe alter the config database to mark it as non-draining again
[07:51:46] <exi> i'll try to remove the draining flag in the config database, maybe that is enough
[08:01:56] <circlicious> is there some mongo client that i can use for ease? like phpmyadmin for mysql. would be nice if the client also allowed executing commands, like you can execute sql commands/queries in PMA.
[08:03:17] <NodeX> it's called phpmongoadmin or something like that
[08:44:57] <Derick> Nicolas_: what does phpinfo() show for "extension_dir"? that needs to match '/usr/lib/php/extensions/no-debug-non-zts-20090626/'
[08:46:19] <Nicolas_> it matches '/usr/lib/php/extensions/no-debug-non-zts-20090626/'
[08:46:19] <falu> With pymongo I'm "losing" connections. I have a for-loop with about 10-15 db-calls and after some iterations there are no more connections available. Why does this happen?
[08:55:13] <circlicious> so if i specify $set modifier option in .update() its going to replace the entire document or just change values?
[08:55:37] <circlicious> i read somewhere mongodb replaces entire documents, so one should not do too many updates or something
[08:59:58] <falu> circlicious: { $set : { field : value } } only changes "field", not the whole document
[09:05:42] <circlicious> ok. also how do you update "all" documents? like i just dont want to pass in any criteria
[09:06:00] <circlicious> or maybe some critera that holds true for all documents ?:/
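For the record, an empty criteria document matches everything; what circlicious needs is the multi flag, since update() touches only the first match by default. Sketch with a made-up collection and field:

```javascript
// {} matches every document; the 4th positional argument (multi) makes the
// update apply to all of them instead of just the first match.
db.items.update({}, {$set: {reviewed: false}}, false, true);
```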
[09:58:58] <diegok> DigitalKiwi: as I understand it, count will scan everything returned by the index.
[10:01:33] <NodeX> it's a very tricky thing to optimise for
[10:02:34] <NodeX> personally I would cache the query and its count in memcache/redis and check that first
[10:03:10] <DigitalKiwi> oh this made me think about something
[10:03:50] <diegok> NodeX: yes, that's the way... but it's very uncomfortable in some situations
[10:04:09] <DigitalKiwi> so say you have a website, and people are browsing on a paginated list, they go to a new page, data gets added, then they go to a new page, how do you make it so that the data they see is not repeated (or skipped)
[10:04:10] <diegok> you can also cache total count on mongo :)
[10:04:12] <NodeX> I wouldn't allow wide queries on the system
[10:04:45] <NodeX> DigitalKiwi I said cache the query not it's results
[10:04:58] <diegok> but it gets tricky when multi-key indexes are used.
[10:05:08] <NodeX> use a hash of the query parameters and the count() of that as the key/value
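A sketch of that suggestion, using a mongo collection as the cache instead of memcache/redis; hex_md5 and tojson are shell built-ins, the collection names are made up, and invalidation is left out entirely:

```javascript
// Cache expensive count() results keyed by a hash of the query parameters.
function cachedCount(query) {
    var key = hex_md5(tojson(query));               // hash of the query
    var hit = db.query_counts.findOne({_id: key});
    if (hit) return hit.n;                          // cache hit
    var n = db.items.count(query);                  // slow path: real count
    db.query_counts.save({_id: key, n: n});         // memoize it
    return n;
}
```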
[10:05:14] <DigitalKiwi> NodeX: i meant this discussion not the cache part
[10:05:32] <NodeX> then I don't understand the question
[10:05:54] <NodeX> why would they see the data repeated?
[10:07:01] <diegok> in my case I have to filter by 3 fields, and each field has lots of values. That's a lot of cached counts... but yes, that's the way :-/
[10:08:18] <NodeX> some people internally shard on large data so the counts are always small
[10:08:30] <NodeX> then if page > 100 go get the count from archive
[10:10:34] <DigitalKiwi> skip(10).limit(10) ; add 10 to the database; skip(20).limit(10); shows the same results as the first query, right?
[10:11:26] <diegok> NodeX: ok, that should make it better as the scanning is spread out.
[10:11:48] <NodeX> DigitalKiwi : no because you added 10 records
[10:12:07] <NodeX> well yes and no - depends on your Sort I suppose
[10:12:23] <NodeX> but it will shift the last ten records up the chain 10 places
[10:12:57] <DigitalKiwi> right, and now you're looking 10 further (but since they were shifted it's the same 10 you just saw)
[10:14:03] <NodeX> because it's not page 2 anymore it's page 3
[10:18:44] <DigitalKiwi> hrm, maybe. say you're sorting by date added, so _id works: keep track of the lowest _id they've seen, then do $lt on the _id and limit to however many you want, and update the _id they're on?
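That is range-based (anchor) pagination, and it does avoid the shifting-window problem skip() has. A sketch, where lastSeenId is whatever the client saw at the bottom of the previous page:

```javascript
// Page by anchoring on the last _id seen rather than skip(); newly inserted
// documents (with higher _ids) no longer shift the window.
var page = db.posts.find({_id: {$lt: lastSeenId}})
                   .sort({_id: -1})
                   .limit(10)
                   .toArray();
var nextAnchor = page.length ? page[page.length - 1]._id : null;
```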
[10:20:44] <NodeX> surely that's a lot of hassle when the user can just click back a page?
[10:21:26] <diegok> DigitalKiwi: how you solve that with other backends?
[10:21:45] <diegok> DigitalKiwi: thats not a db issu I think...
[10:23:10] <NodeX> If I was going to tackle the problem I would save the document ids in an array with the page number in session data, check whether they had been there before, and then show a message saying "these results may be stale"
[10:23:42] <DigitalKiwi> NodeX: my problem case also involves jquery.infinitescroll.js so there's no page to click back on really, and it would duplicate IDs in html, but maybe there are ways to avoid that...
[10:25:02] <DigitalKiwi> maybe I should just not use infinitescroll.js and use some api :(
[10:27:03] <NodeX> can you not add a callback to the scroller ?
[10:27:54] <NodeX> I do something similar on a site where articles (documents) are updated and I push the results into the browser with replaceWith();
[10:29:29] <DigitalKiwi> it's possible this even handles the case of duplicates and I'm all concerned over nothing :(
[10:29:54] <circlicious> oh man this is mad, i am not using type casting much and now fetching resultsets in PHP is becoming mad
[10:30:03] <NodeX> I doubt it but you could adapt the code to handle dupes
[10:30:18] <NodeX> circlicious : I did warn you yesterday!
[10:31:42] <circlicious> yes, but now i understand what you said yesterday :D
[10:32:25] <circlicious> sweet dude :D app works great :D
[10:53:49] <txm> Hey - my first look at mongo DB - it comes free with meteor.com. I've a new collection. each document(?)/row has a name and I can see an _id. is there an auto-inc id available to me? (I understand this may mean I don't understand NOSQL ... but a UUID isn't convenient for users to speak)
[11:03:58] <ron> txm: consider the need for an actual sequence though. think on how you want to query the data and not what 'feels' natural to you.
[11:08:01] <txm> sincerely appreciated ron; in practice it won't be used much, or at all. the anchor can still link to the object_id - I just know users will feel more comfortable with a sequence_id column when speaking to each other, even if it's of no technical merit.
[11:11:59] <ron> txm: well, I'd still use the ObjectId as the _id, you can use an additional id for external usage. we do something similar (though the external id is also objectid).
[11:13:11] <diegok> txm: but after a while users stop liking id's like 9887747774201 :-p
[11:13:31] <txm> ron: I won't touch _id, <a href>s will still reference it so internally the app will rely on it - I just need a short tag for users
[11:17:21] <ron> yeah, but keep in mind that it's not like a relational database where you have sequences you can use in an insert operation. it will be at least two separate operations.
[11:20:12] <diegok> well, inserts are rare so doing two ops is probably ok... :-/
[11:24:06] <txm> if you have 200 clients and some with similar names ... oh never mind - I'm over it now - I'm bound to get tripped up on something else.
[11:25:41] <txm> the real point is I should be using a RDBMS for this data - but mongo is what meteor comes with. And I'm sometimes happy to learn something new :)
[11:40:05] <DigitalKiwi> tell them they can't have both >.>
[11:41:50] <txm> Wouldn't they just point to the moon and say something along the lines of "yada yada we can get to the moon .. .. lorem ipsum .. .. but I can't have meteor and an auto inc ID next to each row?"
[11:41:58] <txm> I think they can have both, and the moon on a stick.
[11:42:50] <DigitalKiwi> tell them they aren't rows they're documents
[11:43:24] <txm> I think we're all being silly now, no?
[11:46:13] <DigitalKiwi> just tell them the truth, that meteor doesn't use a relational database, there is no concept of incremental IDs, and that implementing them is just not something that can be done safely
[11:46:32] <NodeX> incremental id's are easy in mongo
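NodeX is presumably alluding to the counters-collection pattern: an atomic $inc via findAndModify. A sketch; the collection and sequence names are invented, and note it costs an extra round trip per insert:

```javascript
// Atomically bump a named counter and return the new value.
function nextSeq(name) {
    var doc = db.counters.findAndModify({
        query:  {_id: name},
        update: {$inc: {seq: 1}},
        new:    true,     // return the post-increment document
        upsert: true      // create the counter on first use
    });
    return doc.seq;
}
db.clients.insert({seq: nextSeq("clients"), name: "Acme"});
```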
[11:50:16] <txm> so ... new question. My 8 clients request jobs. Usually I'd relate jobs to clients. I could then count jobs per client. Is the NoSQL method to count jobs per client, or to keep a count of jobs in the client document?
[11:50:31] <txm> bear in mind I can still order post-its ...
[11:51:23] <DigitalKiwi> i'm feeling stupid now :(
[11:52:01] <NodeX> txm : it's whatever is most efficient to query
[11:52:15] <NodeX> i.e. do you query job counts more than clients?
[11:53:55] <txm> NodeX - in reality this is a tiny app so i guess I can do either. I'm going to display a table (DigitalKiwi that's an HTML table not some css3 fart with fading transitions and rounded corners) and display a client on each row, along with a count of jobs.
[11:54:18] <txm> I'll do it on the fly, and refactor when client ID 9 joins us :)
[11:57:05] <NodeX> then I really wouldn't worry about how you factor your schema
[11:57:24] <NodeX> whatever works for you - it's a small app so it won't make a bit of difference to performance
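The two options from the discussion, sketched side by side with made-up names; the $inc counter trades an extra write on insert for a cheap read later:

```javascript
// Option 1: count on the fly each time the table is rendered.
db.jobs.count({client: clientId});

// Option 2: denormalize - keep a counter on the client document and bump it
// whenever a job is inserted.
db.jobs.insert({client: clientId, title: "fix the thing"});
db.clients.update({_id: clientId}, {$inc: {jobCount: 1}});
```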
[12:58:06] <UnSleep> mmm is it a must to duplicate the database into another collection to manage info with map reduce?
[13:00:38] <UnSleep> i want to create an "infinite" number of id associations/relationships and i don't see the logic in creating a new collection for each user with the full database...
[13:02:56] <UnSleep> it's for a search engine that shows different results for each user
[13:12:33] <UnSleep> the idea is to put an array with all the ids in and out, but... the ids haven't got all the data, and duplicating data didn't look like a very good idea (until now)
[13:13:27] <UnSleep> the problem is that the info can go into an infinite loop
[13:14:18] <UnSleep> i think i will need to find another schema
[13:15:13] <UnSleep> but i dont see another way to go with this...
[13:15:48] <UnSleep> it's an object with a lot of info that can be associated to other objects in the same collection (with all their info)
[13:17:05] <UnSleep> person -> like -> thing -> extradata
[13:17:25] <UnSleep> and those things are "objects" in the same collection
[13:18:22] <UnSleep> "person" can "like" "like" itself too
[13:28:12] <Bartzy> I have a few more questions, more relevant to administering mongo
[13:28:20] <Bartzy> I can find out the total size of indexes.
[13:28:23] <UnSleep> is it crazy to do a new query for each result?? for example, "object a has 2000000000... objects associated in an array" - can i choose a part of that array (0-10, 10-20) and then do a findOne for each?
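Not crazy; the $slice projection does exactly that kind of array paging, and a findOne() per id fetches the full objects. Sketch, with invented names:

```javascript
// Pull back only elements 10-19 of the (huge) embedded array...
var doc = db.things.findOne({_id: objectAId},
                            {associated: {$slice: [10, 10]}});
// ...then look up each referenced object individually.
doc.associated.forEach(function (id) {
    printjson(db.things.findOne({_id: id}));
});
```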
[13:28:39] <Bartzy> However - if some indexes are rarely used (but needed when they are used) - they don't need to be in memory, right?
[13:29:21] <NodeX> "MongoDB uses memory mapped files for managing and interacting with all data. MongoDB memory maps data files to memory as it accesses documents. Data that isn’t accessed is not mapped to memory."
[13:29:39] <Bartzy> Also, if I have 6GB of RAM, and 5GB of indexes, that doesn't mean that my indexes will always be in RAM - because mongo mmap's data (actual documents) too, so the indexes will page out and in all the time ?
[13:29:57] <Bartzy> NodeX: OK, thanks. That answers the rarely used index question
[13:31:12] <NodeX> Look at the working set statement at the bottom
[13:31:35] <NodeX> UnSleep : you probably want a graph database for your query
[13:32:13] <UnSleep> yep but there isn't any cheap solution for that :(
[13:32:19] <Bartzy> NodeX: Yeah, saw that. But for reasonable performance, if you can't keep most of your working set in RAM - you need to at least keep your indexes in RAM - right ?
[13:32:58] <NodeX> Bartzy : that's where sharding comes in
[13:33:10] <Bartzy> NodeX: So you think - my RAM is 6GB, my indexes are 3GB - great! I have room to spare.. but that is not correct since if you have a big working set, your indexes will get swapped out from memory because of that data, even though they are more important in RAM than that data.
[13:33:11] <UnSleep> i will take a look at this http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
[13:33:55] <NodeX> I would imagine the indexes have a higher place in memory, plus they work on an LRU basis
[13:56:38] <houms> good day all. we are trying to set up 3 servers with mongodb and use them in a test env. we would like to use repl. sets and sharding. our plan was to run mongod, mongos and config on all three. is this feasible?
[13:59:11] <tncardoso> houms: you should check this link http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture
[14:03:20] <houms> thanks tncardoso. reading it now. and I assume the config params can all be specified within mongod.conf, correct?
[14:03:53] <tncardoso> houms: yes, you can set every command-line parameter in the config file. The -f flag specifies which config file to use.
[14:04:33] <mediocretes> so, this is a thing about which people might have opinions: http://nosql.mypopescu.com/post/27131117723/from-mongodb-to-cassandra-why-atlas-platform-is
[14:04:55] <houms> tncardoso is it true that if you plan on sharding you will need 3x the number of servers per shard, since each shard will have to have its own repl. set?
[14:05:31] <houms> in the link you posted it shows 3 shards and 9 hosts
[14:06:00] <tncardoso> houms: it's a good thing to have 3 machines in the replica set. You'll thank yourself when doing maintenance
[14:06:26] <houms> but if we plan on using only 3 servers in production then is it best not to use sharding?
[14:06:45] <tncardoso> houms: usually you should first scale vertically
[14:07:06] <tncardoso> houms: only consider sharding if one very good machine is not capable of handling your load
[14:07:32] <houms> so really using 3 good machines with repl set may be enough depending on our load?
[14:20:59] <houms> is there a way to initiate a repl set from the conf file, as opposed to logging into the db to initiate it?
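There isn't a conf-file option for this: the set is initiated once, from the shell (or a script piped into mongo), by passing rs.initiate() a config document. Sketch with placeholder hostnames:

```javascript
// Run once against one of the members; _id must match the --replSet name.
rs.initiate({
    _id: "rs0",
    members: [
        {_id: 0, host: "server1:27017"},
        {_id: 1, host: "server2:27017"},
        {_id: 2, host: "server3:27017"}
    ]
});
```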
[14:40:10] <_simmons_> Hi everybody.. I'm trying to use pymongo for a nagios monitoring script.. but I can't / don't know how to send commands to mongo after the connection is created..
[14:40:49] <_simmons_> There is no method to do a "dbisMaster()" for example
[14:40:56] <algernon> have you consulted the tutorial? http://api.mongodb.org/python/current/tutorial.html
[15:37:48] <mw44118> Is there a way to do a bulk delete in mongo? I have a list of ObjectIDs and I want to delete them all. Right now, I'm looping through each and deleting them one-by-one.
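A single remove with $in replaces the one-by-one loop. Sketch, assuming "ids" is the list of ObjectIds and the collection name is made up:

```javascript
// One round trip instead of one remove per document.
db.things.remove({_id: {$in: ids}});
```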
[16:23:23] <Aram> hi all. is there a way to start a single mongod with --replSet and force it to be primary? this is only for testing/debugging. right now I'm using the usual rs.initiate(), but it takes 30 seconds to complete. Our tests run several mongod instances and take 200 seconds to complete, while without --replSet they take only 2 seconds.
[16:34:07] <Aram> or is there some other way to speed up the process?
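One thing that may help, though timings vary by version so treat it as an experiment rather than a fix: give rs.initiate() an explicit single-member config instead of letting the node work one out for itself. The host/port below are placeholders for whatever the test harness starts:

```javascript
// Explicit one-member config for a throwaway test set.
rs.initiate({_id: "testset", members: [{_id: 0, host: "localhost:31000"}]});
```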
[16:34:45] <Bartzy> Why can't you easily kill an index creation job?
[16:44:15] <BurtyB> I'm looking at sharding an existing collection - I see it's "limited" to 256GB the easy way - I assume that doesn't include indexes?
[16:55:06] <Bartzy> Why can't you easily kill an index creation job?
[16:55:32] <Bartzy> also, when using mongodump to dump the system.indexes collection - does that not back up the indexes? can mongodump not back up indexes?
[16:56:01] <NodeX> the indexes are backed up when you dump
[17:01:08] <Bartzy> NodeX : But not when you back up a single collection?
[17:01:20] <Bartzy> How are they backed up, if mongorestore just inserts the data when restoring?
[17:01:28] <Bartzy> So are indexes created on the fly, or restored from backup?
[17:04:32] <BurtyB> Bartzy, they're created after all of the data has been restored iirc
[17:10:55] <hdm> Any thoughts on how to recover from this? Tue Jul 17 12:10:21 uncaught exception: count failed: { "errmsg" : "10320 BSONElement: bad type 71", "ok" : 0 }
[17:14:04] <hdm> argh, points to corruption, no fun
[17:17:01] <txm> if each document has a "deleted" flag, is there any efficiency gain in using 1/0 over Yes/No or Y/N?
[17:27:31] <NodeX> backs up = yes, it does back up indexes ... AFTER it has dumped the file
[17:28:44] <Bartzy> NodeX: So indexes are not created on the fly when restoring?
[17:29:03] <chubz> How come I can't use any other port than 27017 with mongo? I started a mongod process with the port 27018 but I can't "mongo localhost:27018". i even tried turning off iptables, any ideas?
[17:50:34] <NodeX> [18:05:16] <NodeX> data goes in, then index is run
[17:51:48] <Bartzy> NodeX: I'm sorry, but you gave 2 answers that are not the same
[17:52:11] <Bartzy> If during mongorestore the data goes into mongodb and then the index is built (meaning ensureIndex is run?), then mongodump is NOT backing up the indexes.
[17:53:25] <NodeX> mongo backs up the data and indexes with mongodump
[17:53:35] <NodeX> then it restores the DATA then the INDEX with mongorestore
[17:54:58] <progolferyo> does anyone have any ideas about this one? I have a replica set primary that is getting super overloaded - 'ar' and 'aw' in mongostat are through the roof. this replica set is one of the shards in my cluster. there are only writes going on and the disk is just getting hammered, but the weird thing is on the read side: if i do iostat, reads are super high
[17:55:22] <progolferyo> so it feels like there is something that is causing reads to go through the roof on this primary, even though no queries are hitting the box
[17:56:54] <Bartzy> NodeX: Got it. So restoring won't take a long time because of index creation, because indexes are backed up too, if I understand correctly ?
[17:57:08] <Bartzy> NodeX: Also, when backing up a single collection - indexes for this collection only will be backed up by mongodump ?
[18:02:19] <TheEmpath> I have discovered that it is easier to justify the usages of dbrefs than it is to learn how to use doctrine to autopopulate manual references
[19:47:23] <unomi> what kind of magic must I invoke to get this functionality?
[19:54:47] <dstorrs> hey all. Got some weird behavior here, any ideas?: I am connected to the right machine, on the right port, talking to mongos as expected. I do a 'save' with 'safe' set. I get an ID back. No object hits the DB.
[19:55:10] <dstorrs> I have verified that I wrote to & looked in the correct collection each time.
[19:55:21] <dstorrs> there's nothing in the log at all from the last 7 hours.
[19:56:27] <dstorrs> any thoughts? this is a mission-critical issue right now, and I'm a bit stressy over it.
[19:57:05] <dstorrs> btw, when I said "I am connected to...", I meant "my app code is connected to..."
[19:57:54] <hdm> dstorrs: check the other shards? maybe its getting stored in the wrong db
[19:58:01] <hdm> like the config db for some reason
[19:58:59] <dstorrs> hdm: I'll check. But I dumped my database handle and it says it is connected to the "cm_prod" DB (correct) on the 27017 port on the correct IP.
[19:59:38] <kchodorow> dstorrs: you're using perl, right?
[20:18:59] <dstorrs> Ok, I don't know if this is remotely relevant, but we are currently running a dangerous configuration -- our replica sets are one primary box, two arbiters
[20:37:53] <dstorrs> ok, where would I RTFM for how to fix this?
[20:38:05] <dstorrs> "$err" : "dbclient error communicating with server: dbs1d1:27018",
[20:38:42] <dstorrs> ah. actually, that might be the issue -- the hostname changed. I don't know why it's using dbs1d1, since it was never sharded with that name.
[20:41:48] <mrfloyd> i have a collection where i store stories; each story has photos from different users, and each photo has comments from different users
[20:48:05] <dstorrs> mrfloyd: what's your question?
[20:49:04] <mrfloyd> i am storing the username inside each subobject (photos, comments, etc.) so i can use it whenever i need to display it without referring to the users collection
[20:49:40] <mrfloyd> i need to know how i should design my stories collection to be able to update this nested field easily
[20:51:29] <dstorrs> first of all, why would you want to? if user X made comment #778383, why would you want to change it to say user Y ? are users really allowed to change their nicks in your app? that's going to be more trouble than it's worth.
[21:13:11] <lahwran> if I delete everything with a "modification_date" of less than now, then will documents without that field be deleted too?
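They won't be: range operators like $lt only match documents where the field exists. A sketch of both removes, with an invented collection name:

```javascript
// Only documents that HAVE modification_date (and compare less-than) match.
db.cache.remove({modification_date: {$lt: new Date()}});
// Documents missing the field survive; sweep them separately if needed:
db.cache.remove({modification_date: {$exists: false}});
```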
[22:03:34] <dstorrs> I have a 2-shard system, 3 configs. Each shard is a 1-member repl set (I know; bad). The hostname of both DB machines changed. If I use the "change hostname within a repl set" procedure from the docs, will that propagate to the config servers?