PMXBOT Log file Viewer


#mongodb logs for Tuesday the 24th of July, 2012

[00:37:26] <Arelius> Is it possible to use a mapreduce to update all records in a database?
[01:39:15] <patforg> Hi I'm stuck and I need help
[01:39:49] <patforg> I get this error whenever I try to do something: "Can't take a write lock while out of disk space", even when deleting to free up space
[01:40:51] <patforg> I thought my sharding was working... turns out gridfs doesn't shard by default
[01:41:08] <patforg> even though I enabled sharding on my DB
[01:46:26] <patforg> any ideas on what one should do?
[03:30:08] <vrim> is there some metric collection in the mongo java driver - something like application success/error count?
[04:08:12] <therealkasey> anyone know if it's possible to stop an index building operation?
[04:08:37] <therealkasey> I set up a unique index on a field in a collection that already had duplicates before I started the operation
[04:31:07] <dstorrs> anyone about?
[04:31:26] <ron> why not just ask the question? :)
[04:31:50] <dstorrs> because usually when I ask questions after 5pm I get Warnocked for hours, and it gets frustrating. :>
[04:32:00] <dstorrs> so, question --
[04:32:14] <dstorrs> I've built a job manager using Mongo as the datastore.
[04:32:26] <dstorrs> it's for a harvesting system that writes data about YouTube videos.
[04:32:41] <ron> well, just because people are around, doesn't mean they're going to answer ;)
[04:32:46] <dstorrs> I'm thinking about breaking the "job control" collections out into a separate mongod from the data
[04:33:15] <dstorrs> but I'm not sure that's a good idea. This post (http://stackoverflow.com/questions/9203418/mongodb-sharding-on-one-machine) has Derick saying that mongod's will fight for all RAM.
[04:33:35] <dstorrs> however, the jobs collections are a very small set (a few dozen megs at most), so
[04:33:52] <dstorrs> I'm wondering if that's really "all ram on box" or "all ram needed for dataset"
[04:33:58] <ron> why use a different mongod for it?
[04:34:19] <dstorrs> because we are currently bottlenecking on the write lock.
[04:34:41] <dstorrs> this is based on extensive profiling / experimentation.
[04:34:49] <dstorrs> we know it's the write lock.
[04:35:34] <ron> okay, fair enough. once you move to mongo 2.2, it won't be much of an issue though, but for now it's understandable.
[04:35:49] <dstorrs> yeah, I just spent most of the day reading up on 2.2.
[04:36:05] <dstorrs> I'm seriously considering trying 2.1rc1 as our production system.
[04:36:13] <dstorrs> but thought I'd look at this first.
[04:36:43] <dstorrs> the idea of separating "transient control structures" from "persistent data store" seems like a good one anyway
[04:36:46] <ron> I imagine you can't pull off a small server for the additional mongod?
[04:37:10] <dstorrs> I could, but would prefer not to pay the extra costs if not needed
[04:37:23] <dstorrs> startup == pennypinching by requirement
[04:37:28] <ron> well, for transient data you have other solutions (that is, other than mongo)
[04:37:39] <ron> oh, I work at a startup too, no need to explain ;)
[04:37:42] <dstorrs> :>
[04:38:05] <dstorrs> true. I could throw it all into memcache or such. but that requires changing the code
[04:38:20] <ron> yes, only not memcache.
[04:38:20] <dstorrs> which I don't want to do, because what we have works really really well.
[04:38:55] <ron> well, it doesn't work really really well, you have a global write lock issue ;)
[04:39:23] <dstorrs> it works brilliantly for the case it was actually designed for -- harvesting feeds from YouTube in semi-real time
[04:39:34] <dstorrs> we're pulling about 10M videos / hour
[04:39:51] <dstorrs> the problem we're having is when we try to retro-harvest our old feeds.
[04:40:11] <ron> that's so.. retro.
[04:40:15] <dstorrs> :>
[04:40:40] <dstorrs> these are feeds that we collected but have never had the disk space to use until now.
[04:41:46] <dstorrs> I reused the harvester architecture to pull them out of CloudFiles, untar / split / queue up harvest jobs / process them. It works so well that it ends up with a bazillion harvest jobs and it chokes out.
[04:42:13] <dstorrs> I've added a usleep(125,000) and that helps a lot, but that's insane.
[04:42:58] <ron> yup, I can understand the issue.
[04:46:43] <acidjazz> hey all
[04:47:03] <acidjazz> so i set up a 3 node replica set and the beefier server wasn't elected master.. any way to specify i want that one to be master?
[04:47:09] <acidjazz> of course unless something happens to it..
[04:56:57] <bemu> acidjazz: yes
[04:57:56] <bemu> acidjazz: http://docs.mongodb.org/manual/core/replication/#replica-set-node-priority
[04:59:31] <acidjazz> oo it just decided to become primary
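
(For reference, the priority mechanism bemu links to looks roughly like this in the shell; a sketch assuming member 0 is the beefier server, which may not match acidjazz's actual config:)

    cfg = rs.conf()                  // fetch the current replica set configuration
    cfg.members[0].priority = 2      // raise the preferred member above the default priority of 1
    rs.reconfig(cfg)                 // apply it; higher-priority members win elections while healthy
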
[06:27:44] <tknz> Hey guys. How would I group by year of a date field?
[06:29:50] <tknz> or select distinct years ...
[06:30:47] <tknz> finding it a struggle to do any complex sql like operations
[06:32:41] <_johnny> tknz: use keyf: http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-UsingGroupfromVariousLanguages
[06:33:30] <_johnny> example which should fit your question: http://stackoverflow.com/a/9388189 just change the dateKey to date.getFullYear()
[06:34:01] <tknz> Ah I see
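
(A sketch of the keyf approach _johnny points to, assuming a collection called events with a date field named date; both names are illustrative:)

    // group by the year portion of a date field using a key function
    db.events.group({
        keyf: function (doc) { return { year: doc.date.getFullYear() }; },
        initial: { count: 0 },
        reduce: function (doc, out) { out.count += 1; }
    });
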
[06:47:40] <edmund> hi. i have a question about upsert with modifiers.. what i would really like to do is 'set a path through the matched object' .. e.g. if i do something like upsert {id:1} {$set: {a: {$set: {b: 5}}}}, then {id:1, a:{c:7}} becomes {id:1, a:{b:5,c:7}}
[06:47:54] <edmund> from poking around a bit it seems like this isn't possible. or am i doing something horribly wrong?
[06:48:30] <edmund> it's okay i just went to freenode $mongodb
[06:48:33] <edmund> whups
[06:49:56] <edmund> anyone here? :)
[06:51:16] <crudson> {$set:{'a.b':5}}
[06:52:20] <edmund> oh!
[06:52:22] <edmund> let me try that
[06:54:59] <edmund> hey, that worked. thanks, crudson
[06:55:08] <crudson> edmund: pleasure
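
(crudson's dot-notation answer spelled out against edmund's example; the collection name things is made up:)

    // $set with dot notation touches only the nested field, leaving siblings intact
    db.things.update({ id: 1 }, { $set: { "a.b": 5 } }, true);   // third argument = upsert
    // e.g. { id: 1, a: { c: 7 } } becomes { id: 1, a: { b: 5, c: 7 } }
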
[06:59:06] <niram> hi, db.stats() shows 4 collections, but show collections only shows 3..
[06:59:30] <niram> any idea what causes this?
[07:05:09] <crudson> niram: db.system.namespaces doesn't show in 'show collections'
[07:06:31] <crudson> http://www.mongodb.org/display/DOCS/Mongo+Metadata
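
(The metadata collection crudson refers to can be listed directly, which accounts for the extra collection counted by db.stats():)

    db.system.namespaces.find()   // lists every namespace, including ones 'show collections' hides
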
[07:10:40] <circlicious> how good is mongodb at updating, if i wanted to update a document many times. like 100 times a minute
[07:11:10] <Mortah> 100 times a minute is nothing
[07:11:21] <circlicious> really?
[07:11:23] <Mortah> unless you're running on a really bad system
[07:11:23] <Mortah> :D
[07:11:30] <circlicious> basically i want to make a realtime app.
[07:11:34] <circlicious> 100 times a minute for 1 collection
[07:11:40] <circlicious> total considering all users could be thousands
[07:11:48] <Mortah> ah
[07:11:48] <circlicious> but for different collections of course
[07:11:53] <circlicious> sorry i mean document not collection
[07:12:10] <circlicious> the same collection but different documents
[07:12:10] <Mortah> so, 100 updates a minute multiplied by thousands of users?
[07:12:44] <circlicious> that much load is quite hard to reach, but 100 updates a minute multiplied by 100s of users is feasible
[07:13:08] <Mortah> if its all in memory, then that is doable
[07:13:09] <circlicious> i mean that load is easy to get, so how well would mongodb perform there
[07:13:17] <circlicious> see i noticed inserts are very fast
[07:13:29] <circlicious> but update is only about as ok as mysql, and i read it's not very good/performant at updates
[07:13:45] <circlicious> in memory hmm
[07:13:50] <Mortah> try it
[07:13:53] <circlicious> you mean without safe writes?
[07:14:00] <Mortah> even with
[07:14:10] <Mortah> make a little script that simulates your usage and see how Mongo copes
[07:14:37] <circlicious> hm
[07:14:46] <Mortah> I'd imagine it depends on the capacity of your box
[07:14:56] <circlicious> i can buy bigger box from linode :D
[07:15:11] <circlicious> the thing is, i can make a script and test for 100 updates a minute, but how do i simulate for 100 users?
[07:15:19] <circlicious> something like apache benchmarl ?"
[07:15:24] <circlicious> *benchmark
[07:15:41] <Mortah> I mean, simulate the queries your users will cause
[07:16:01] <circlicious> ok
[07:16:04] <Mortah> something like... infinite loop: each time, pick a random user and do the update/queries they need... repeat
[07:16:30] <Mortah> maybe add some bias to the probability for each user to get 'heavy usage' / 'low usage'
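
(A rough sketch of the simulation loop Mortah describes, in mongo shell JavaScript; the collection, field names, and user count are placeholders, not circlicious's actual schema:)

    // crude load generator: each iteration picks a random user and issues the
    // same update the real application would perform
    var users = 1000, start = new Date();
    for (var i = 0; i < 100000; i++) {
        var userId = Math.floor(Math.random() * users);
        db.activity.update({ user_id: userId }, { $inc: { events: 1 } }, true);
    }
    print("elapsed ms: " + (new Date() - start));
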
[07:16:35] <circlicious> ok i will try it out, i hope it works well, else my heart will break :P already made most of the transition of this part of the app from mysql to mongodb
[07:17:26] <Mortah> check out zeromq as well
[07:17:32] <Mortah> sounds like you're building a messaging system
[07:33:10] <circlicious> Mortah: do you know about etherpad?
[07:36:50] <[AD]Turbo> hola
[07:49:09] <tknz> can someone help me with this. I'm honestly not sure what I'm doing but I'm trying to group by a keyfunction (the year part of the date field). http://pastebin.com/TegGyTDS
[07:49:12] <tknz> C# driver.
[08:02:41] <Derick> dstorrs: it's all ram for the dataset
[08:03:22] <dstorrs> Derick: meaning each mongod will try to grab 100% of the RAM on the box?
[08:03:34] <dstorrs> even if it has a tiny dataset?
[08:03:47] <Derick> no, I meant for the dataset
[08:04:55] <dstorrs> ...so, say I've got a 32G machine. one mongod has 20M of data, the other has 300+G
[08:04:58] <dstorrs> what happens?
[08:06:06] <Derick> that should work
[08:06:21] <Derick> but, it's... something I wouldn't advise doing anyway. Why can't you have one mongod?
[08:06:55] <dstorrs> write lock contention
[08:07:20] <dstorrs> I'm using Mongo as the coordination point and datastore for a job management system.
[08:07:53] <dstorrs> the workers spend enough time twiddling jobs around that it cuts into their ability to do work.
[08:08:05] <Derick> dstorrs: shouldn't be a prob with 2.2 though
[08:08:12] <dstorrs> yep.
[08:08:24] <dstorrs> I spent most of today reading up on 2.2
[08:08:26] <Derick> got to go now, otherwise I'll be late at work ;-)
[08:08:32] <dstorrs> sadly, that's not an option yet.
[08:08:37] <dstorrs> g'night
[08:08:45] <Derick> hm? it's 9am here!
[08:09:50] <dstorrs> oh.
[08:09:57] <dstorrs> well, g'morning then. :>
[08:09:59] <dstorrs> it's 1am here
[08:10:16] <dstorrs> when is 2.2 likely to be released?
[08:51:59] <tknz> can you do a keyfunction on a distinct?
[08:54:28] <tknz> MongoDB only looks pretty in demos I swear. You try to do anything that's not a basic select or update and it's a bloody nightmare.
[08:56:49] <NodeX> what's a "nightmare" about it?
[09:37:15] <woozly> guys, what to do? MongoDB works very slowly, when stored data grows :/
[09:37:27] <woozly> really really slow :(
[09:37:36] <woozly> but I've ensured indexes and everything..
[09:54:58] <IAD> woozly: be sure the indexes are correct for your find operations, and use the profiler http://www.mongodb.org/display/DOCS/Database+Profiler#DatabaseProfiler-Viewslowoperationsonly
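
(Turning the profiler on from the shell looks like this; the 100 ms threshold is just an example value:)

    db.setProfilingLevel(1, 100)                              // log operations slower than 100 ms
    db.system.profile.find().sort({ millis: -1 }).limit(5)    // inspect the slowest recorded ops
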
[09:58:30] <tomoyuki28jp> Does MongoDB have a function like mysql's enum type?
[10:01:02] <woozly> I see while querying, mongod CPU usage is 99-100%
[10:01:12] <woozly> what can it be? data stored = 5gb
[10:01:20] <woozly> CentOS on virtual machine
[10:02:01] <kali> woozly: look for long running queries in db.currentOp()
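
(kali's suggestion, spelled out; the 5-second threshold is arbitrary:)

    // print operations that have been running for more than 5 seconds
    db.currentOp().inprog.forEach(function (op) {
        if (op.secs_running && op.secs_running > 5) printjson(op);
    });
    // a runaway operation can then be stopped with db.killOp(<opid>)
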
[10:03:31] <circlicious> how do you manage things like indexing ? do you just store all indexing queries in a txt file and then simply execute each of them in mongo shell of server(production)?
[10:07:59] <NodeX> that's one way
[10:09:03] <circlicious> whats the other cool way?
[10:10:30] <NodeX> I store mine in an installer file and execute it - similar to the above way
[10:13:31] <woozly> dammit, can't understand what's wrong... I have 5 000 000 objects, = 5.35gb
[10:13:45] <circlicious> NodeX: how is it done can you tell me ? how can i execute a file with mongodb commands ?
[10:13:45] <woozly> I have a field "sent" which has its own index
[10:14:08] <woozly> when I try: db.collection.find({sent: 1}).count() ... it took toooooo long...
[10:14:42] <NodeX> count() with criteria is an expensive operation on large datasets
[10:14:59] <NodeX> circlicious : my file is a php file - I point my browser at it
[10:15:16] <circlicious> oh ok
[10:15:22] <circlicious> some way to do via command line ?
[10:16:07] <NodeX> http://www.mongodb.org/display/DOCS/Scripting+the+shell
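
(One way to do what NodeX describes without PHP: keep the index definitions in a .js file and run it through the shell; the file, database, and collection names below are illustrative:)

    // indexes.js -- index definitions kept under version control
    db.users.ensureIndex({ email: 1 }, { unique: true });
    db.events.ensureIndex({ user_id: 1, created_at: -1 });

    // run against the production database from the command line:
    //   mongo myproductiondb indexes.js
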
[10:38:11] <woozly> NodeX: but which alternative for count() ? :/
[10:39:05] <NodeX> there isn't one
[10:39:22] <NodeX> how many documents are in your index, what are your server specs?
[10:43:03] <woozly> NodeX: how to check it ? :/
[10:43:16] <woozly> I have 5 000 000 documents in collection
[10:43:24] <woozly> ~ 5gb
[11:11:01] <newcomer> hi
[11:11:20] <newcomer> I have a mongodb that does not get dropped, any ideas
[11:11:25] <NodeX> what are your server specs woozly
[11:11:39] <NodeX> newcomer : how are you executing the comment
[11:11:41] <NodeX> command *
[11:12:16] <newcomer> after logging into that db ....... db.dropDatabase();
[11:12:26] <newcomer> works for others except this 1
[11:13:04] <NodeX> is it building an index?
[11:13:16] <newcomer> nope ... no operations
[11:13:27] <NodeX> try restarting your mongo, then issue it again
[11:14:39] <newcomer> what's the trick? it worked ...
[11:15:10] <NodeX> it would sever any connections to the db
[11:15:19] <NodeX> you might have had one floating in the pool
[11:15:34] <newcomer> conn leaks ? ur scaring me ...
[11:15:42] <newcomer> anyways thanks.
[11:15:42] <NodeX> UNCLOSED
[11:59:45] <fredix> hi
[12:00:10] <fredix> when I try to use mongo::fromjson, compilation fails with this error: /home/fred/Dropbox/code/nodecast/ncw/externals/mongodb/db/mongomutex.h:235: error: call of overloaded 'msgasserted(int, std::basic_string<char>)' is ambiguous
[12:53:40] <fredix> https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/LvA9Fx1RLlk
[13:17:35] <jpadilla> hey guys, quick question. I have a map loading a couple of markers, I also have that data on Mongo with the geospatial index. I'm querying using within box, but the results I'm getting are not "complete". In the map I'm seeing at least 5, in the mongo query I'm being returned one.
[13:28:04] <NodeX> check the distance
[13:32:39] <jpadilla> NodeX: I'm using the boundaries of the whole map. These are the markers I'm seeing on the map, and only one result of the query on the side https://img.skitch.com/20120724-twq64b3k9i34s5qqc51f3qtmwd.jpg
[13:33:27] <NodeX> can you pastebin the query?
[13:39:04] <jpadilla> NodeX: https://gist.github.com/3169944
[13:49:16] <NodeX> did the pastebin put the space in after the "-" ?
[13:50:03] <jpadilla> I just removed it thinking it was that, but nothing different
[13:50:08] <jpadilla> must have been jsbeautifier
[13:50:56] <NodeX> my only suggestion is thta the coords are slightly out
[13:51:01] <NodeX> that*
[13:53:36] <jpadilla> NodeX: its supposed to be [southwest_coords, northeast_coords] right?
[13:55:02] <NodeX> To query for all points within a rectangle, you must specify the lower-left and upper-right corners:
[13:55:13] <NodeX> so yes
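
(The box query being discussed takes the lower-left corner first; a minimal sketch with placeholder coordinates and collection name:)

    db.places.ensureIndex({ loc: "2d" });                 // $box queries need a 2d index
    var box = [[-10, -10], [10, 10]];                     // [lower-left, upper-right]
    db.places.find({ loc: { $within: { $box: box } } });
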
[13:56:37] <jpadilla> I'll have to try to draw a rectangle and get the boundaries of that to test more
[13:56:38] <jpadilla> thanks
[13:56:53] <NodeX> try with less decimal points
[13:56:55] <NodeX> or not
[14:49:52] <kylefaris> I've created a collection in my node.js application but it's not showing up in the mongodb console. Is this normal?
[14:50:46] <kylefaris> do they have different scopes or something?
[14:51:39] <NodeX> did the node thing execute it?
[14:52:05] <kylefaris> I'm not sure I'm following
[14:53:15] <kylefaris> I did something like this: var mongo = require('mongojs').connect('localhost', ['clients']); mongo.clients.save({foo:'bar'});
[14:54:19] <kylefaris> and it works great. I can even restart nodejs and mongodb and the data is persistent (as expected). But, if I open the mongodb console in the command line, the 'clients' collection isn't even listed as an option to use.
[14:59:24] <kchodorow_> kylefaris: are you switching to the correct db in the shell
[14:59:25] <kchodorow_> ?
[15:01:46] <kylefaris> yeah, definitely
[15:01:52] <kylefaris> well, I mean, I think I am
[15:02:11] <kylefaris> when I do `show collections` it doesn't show up
[15:02:27] <kylefaris> just system.indexes
[15:03:30] <kylefaris> or are collections not the same as databases?
[15:04:57] <kylefaris> hmmm... it would seem so... I just did `show dbs` and different stuff shows up
[15:05:08] <kylefaris> well, this is enlightenin
[15:05:09] <kylefaris> g
[15:07:50] <kchodorow_> kylefaris: have you used relational dbs?
[15:12:08] <NodeX> use "database" .. minus the quotes
[15:12:25] <NodeX> then you can do your db.collection.find({....
[15:15:56] <kylefaris> yeah, I'm very familiar with MySQL/MSSQL
[15:16:27] <kchodorow_> kylefaris: db=db, table=collection
[15:16:35] <kylefaris> yeah, I think I'm figuring that out now
[15:16:40] <kchodorow_> :)
[15:17:13] <kylefaris> I've been playing around in the console and it's becoming more clear now
[15:17:19] <kylefaris> thanks for the help!
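
(The shell-side steps being discussed, as an interactive session; the database name mydb is a stand-in for whatever the node app actually wrote to:)

    // list databases, switch to the one the app wrote to, then look for the collection
    show dbs
    use mydb
    show collections
    db.clients.find()
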
[15:17:55] <kylefaris> now to clean up all the random dbs I have... ugh...
[15:18:20] <kylefaris> are there any required dbs? like... say, `system` looks important
[15:18:46] <kylefaris> says it's empty, though
[15:19:46] <kchodorow_> system collections are used by mongo
[15:19:59] <kchodorow_> admin, local, and config dbs are important
[15:20:35] <kchodorow_> so don't drop any of those
[15:34:33] <kylefaris> kchodorow: thanks! But, let's just say I did delete the `local` database... what would be the consequences? would I need to re-install mongodb or just recreate the database?
[15:35:04] <kylefaris> I don't have anything I'm worried about losing at this moment
[15:35:51] <kali> kylefaris: as long as you don't use Replica Set, you can afford to ditch local
[15:37:50] <kylefaris> kali: Thanks so much
[17:35:23] <Almindor> hey, what can cause connection timeout resets from mongodb? We had a big import script (reads data from M$ SQL and puts them in mongoDB) and sometimes after a random amount of time we got connection resets
[17:35:46] <Almindor> usually when they started it was like a "buffer" was filled and they kept creeping up after a few 100 documents
[17:41:13] <Almindor> I get this with pymongo: AutoReconnect: could not connect to our.server:port timed out
[18:18:02] <addisonj> anyone have any guides on doing a migration of a production mongodb instance to a new server? I am not using replica sets on the old instance but plan on doing it on the new cluster, would adding the old host to the cluster make sense? and then just promote one of the new boxes to master?
[20:47:05] <chubz> how can i restrict the output of db.isMaster() to just return a certain field?
[20:54:04] <kali> chubz: db.isMaster().ismaster ?
[20:54:12] <kali> chubz: what are you trying to do ?
[21:31:11] <dstorrs> http://pastie.org/4326428 describes a data structure and a query. What index should I set to make the query efficient? I'm not clear on how to set an index into an array
[21:32:28] <dstorrs> is it just db.ensureIndex({ 'pages.locked_until' : -1, 'pages.action' : 1, 'pages.owner' : 1 })
[22:01:17] <kchodorow_> dstorrs: that's how you index fields in an array, but i'm not sure how efficient you can really make that query
[22:01:39] <kchodorow_> $ne and $exists can't use an index
[22:01:58] <kchodorow_> and i think you mean pages.owner_host, not pages.owner, right?
[22:03:35] <dstorrs> kchodorow_: no, I meant owner.
[22:03:46] <dstorrs> owner_host means "any worker on this machine"
[22:04:01] <dstorrs> owner means "this process on this machine"
[22:04:12] <kchodorow_> you generally don't need to index fields you're not querying on
[22:04:32] <dstorrs> ok...but isn't an update effectively a query?
[22:04:51] <dstorrs> "find an element that matches X, and change it"
[22:05:19] <dstorrs> or is update always a table scan?
[22:06:03] <kchodorow_> the first doc is a query
[22:06:08] <kchodorow_> the second doc is a modifier
[22:06:14] <kchodorow_> that operates on whatever was found
[22:06:35] <kchodorow_> build indexes to optimize the query
[22:07:01] <dstorrs> right. that's what I'm asking about.
[22:08:51] <kchodorow_> the query part can use an index to find the document(s) to change
[22:08:57] <dstorrs> as to the $ne issue...ok, if the index can get me to a subset of documents that match up until we consider their 'owner_host', that would be a win
[22:09:00] <kchodorow_> but the update always has to touch the document itself+indexes
[22:09:07] <dstorrs> sure, makes sense.
[22:09:22] <dstorrs> I'm trying to find an index that will let me locate potential matches more quickly
[22:09:49] <kchodorow_> how big is the collection at the moment?
[22:10:08] <dstorrs> so, if I did this: db.ensureIndex({ 'pages.locked_until' : -1, 'pages.action' : 1 }) I would expect a speedup
[22:10:32] <dstorrs> something like 40,000 documents. each of which has anywhere from 1 to 1,000 pages
[22:10:52] <dstorrs> the average seems to be low -- around 20 pages per document
[22:10:52] <kchodorow_> how many have np>= 0?
[22:12:10] <kchodorow_> (expected ratio)
[22:12:14] <dstorrs> that's...complicated. there are 50 workers hitting the DB simultaneously. each one locks one page on one job (decrementing np in the process), processes it to completion, and then removes it
[22:12:38] <kchodorow_> so, ~50?
[22:12:47] <dstorrs> well, I guess an all-up run has ~250,000 pages and only 50 of them can be locked at a time.
[22:13:03] <dstorrs> in theory, those could all be in the same job.
[22:13:14] <dstorrs> or they could be spread over 50 jobs.
[22:13:23] <dstorrs> ("job" == "document")
[22:13:32] <kchodorow_> what is np? locked/unlocked?
[22:13:42] <dstorrs> np is "number of unlocked pages"
[22:13:53] <kchodorow_> ah, i see
[22:15:47] <kchodorow_> is pages.locked_until<now or np>=0 going to return more documents? (i know you're not running them separately, but if you did)
[22:16:47] <dstorrs> ...they should actually return exactly the same set.
[22:17:04] <dstorrs> if a page is locked, its 'pages.locked_until' has been set to an epoch time in the future.
[22:17:18] <kchodorow_> okay
[22:17:20] <dstorrs> np was a hack so that we didn't have to reach down into the array.
[22:18:13] <kchodorow_> then i'd try {"pages.locked_until":1,"pages.owner_host":1}, but i'm not positive it'll be able to use the owner_host field
[22:18:23] <kchodorow_> run an explain on it to check
[22:18:31] <kchodorow_> if it can't, just use {"pages.locked_until":1} as the index
[22:19:18] <dstorrs> ok, thanks
[22:20:57] <dstorrs> errr, with the explain, you mean this? > db.jobs_harvest_Profile.find({"pages.locked_until":1,"pages.owner_host":1}).explain()
[22:21:07] <dstorrs> or should that be the query params?
[22:21:22] <dstorrs> I think the latter, but I'm not familiar with explain yet.
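
(To the last question: the query criteria go inside find(), and explain() is chained on with no arguments. A sketch with illustrative criteria, since the real query is only in the pastie:)

    var nowEpoch = Math.floor(new Date().getTime() / 1000);   // assuming locked_until stores epoch seconds
    db.jobs_harvest_Profile.ensureIndex({ "pages.locked_until": 1, "pages.owner_host": 1 })
    db.jobs_harvest_Profile.find({
        "pages.locked_until": { $lt: nowEpoch },
        "pages.owner_host": { $ne: "worker-01" }               // host name is hypothetical
    }).explain()                                               // "cursor" shows BtreeCursor vs BasicCursor
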
[23:11:40] <ccasey> hello all, i'm new to mongo and i'm having some issues with mongoexport. i'm trying to export a collection and i want the date to be formatted in the iso string format. from what i've read on stack overflow it's not possible…thought i'd check here too to see if anyone knows of any other options/workarounds
[23:31:53] <krz> anyway to install mongodb 2.2 in homebrew?