[00:24:56] <dstorrs> in the Perl driver, is 'ensure_index' a no-op if the index already exists? how would I find out?
[01:58:35] <dstorrs> from the Perl driver, what's the correct syntax for the 'scope' param on a mapreduce?
[01:58:50] <dstorrs> should the value be a string, a hashref, or what?
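(The scope question goes unanswered in the log. As a hedged illustration, in the mongo shell 'scope' is a plain document whose keys become global variables inside the map and reduce functions, so from the Perl driver it would normally be passed as a hashref. A minimal shell sketch with an assumed collection 'items' and fields 'category'/'value':)

    // 'multiplier' comes from the scope document and is visible
    // inside both the map and reduce functions.
    var map = function () { emit(this.category, this.value * multiplier); };
    var reduce = function (key, values) {
        var total = 0;
        values.forEach(function (v) { total += v; });
        return total;
    };
    db.items.mapReduce(map, reduce, {
        out: { inline: 1 },
        scope: { multiplier: 10 }
    });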
[02:19:34] <Max-P> Hi, just a quick request for advice: I have to store daily data for a user. Which is better: having a separate collection for each independent action, or storing all of that in one big "day" collection?
[02:44:23] <sirpengi> Max-P: probably better for one collection
[02:44:49] <sirpengi> what do you mean by "each actions independant"?
[05:50:21] <Max-P> sirpengi: I went with one collection, sorry, I went back to work ;) "each independent action" meant that a user can add multiple activities to their calendar, but they can edit them individually, move them from one day to another, and reorder them. I still treat it as a whole day, since managing them separately would have been too much trouble. Thanks anyway, it confirms that I went the right way ^^
[05:58:03] <jgornick> hey guys, would it be possible to find records where the date lands on a Tuesday?
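(Nobody picks this up in the log. A hedged sketch, assuming an 'events' collection with a BSON Date field called 'date': a $where clause can test the JavaScript day of the week, though it cannot use an index, so a precomputed field is the faster route.)

    // getDay() returns 0 for Sunday, so 2 means Tuesday.
    db.events.find({ $where: "this.date.getDay() === 2" });

    // If this query is frequent, store and index a precomputed day-of-week field.
    db.events.ensureIndex({ dayOfWeek: 1 });
    db.events.find({ dayOfWeek: 2 });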
[06:01:17] <henrykim> I would like to ask about a situation I'm in. I am testing write operations on blog articles; we currently have about 1 billion blog articles.
[06:01:59] <henrykim> I started inserting (writing) articles one by one. The shard key is the url, which is about 70 bytes long.
[06:02:21] <henrykim> I am operating 3 shards and 5 routers.
[06:03:00] <henrykim> at the beginning, performance is over 3000 TPS, but 1 or 2 hours later it drops to about half of that.
[06:03:31] <henrykim> and then finally the average write throughput is about 400 ~ 500 TPS.
[06:04:03] <henrykim> I think the performance drop is caused by random writes.
[06:04:38] <henrykim> I cannot assume any ordering of the writes; they happen randomly.
[06:05:52] <henrykim> Internally, it's because of paging: to write an article with url 'a', mongo needs to load 1 chunk; to write an article with url 'b', mongo needs to load 10 chunks... and so on.
[06:07:03] <henrykim> so my question is: how can I solve this situation? Any ideas, guys?
[06:23:14] <Max-P> henrykim: I'm new here and don't know much about Mongo, but you might need to change the way your data is linked so it doesn't need to do that much reading/writing...
[06:23:53] <henrykim> Max-P: sorry, what do you mean?
[06:24:48] <Max-P> henrykim: sorry, it's 2:30 AM here, I will try again: why do you need to load that many chunks when inserting an article? (if I understood your problem)
[06:27:39] <henrykim> Max-P: different time zone ;) it's a quite sunny day where I live. Anyway, I am testing mongodb for use in our business.
[06:27:59] <henrykim> the blog collection we have is quite busy and large.
[06:28:33] <henrykim> every day I get 3,000,000 upserts and 7,000,000 selects from users.
[06:29:00] <henrykim> yep, I need to keep all the data, with the url as a unique key.
[06:30:16] <henrykim> the problem I already described is that the write operations happen randomly. Random, random.
[06:32:13] <Max-P> If you properly indexed all of this, then I'm sorry I can't help. I thought you had some linking between your blog thing that slowed down the process, so I thought you might have had to change the way you index and traverse the collection, but I'm no use there, sorry
[08:13:55] <stefancrs> I basically want to store "heartbeats" from different sources in a collection. My first naive idea was to just store one document per source name and update its timestamp, but that wouldn't give me any logging
[08:14:30] <stefancrs> so my next idea was to just insert a new document for each heartbeat received (name + timestamp per document, that's all I need to be able to retrieve)
[08:15:23] <stefancrs> at the same time, I want to keep it "name" agnostic. so how would I, with a simple query, retrieve the latest heartbeats from every source?
[08:16:17] <stefancrs> (if there are sources "s1", "s2" and "s3" for example. they send heartbeats whenever they "feel" like it, but I want to get all the latest ones...)
[08:20:29] <dstorrs> well, first off, this might be useful : http://api.mongodb.org/wiki/current/Optimizing%20Object%20IDs.html#OptimizingObjectIDs-Extractinsertiontimesfromidratherthanhavingaseparatetimestampfield.
[08:21:18] <dstorrs> second, you could use a capped collection. that guarantees that insertion order is preserved, so you can trivially retrieve most recent using findOne()
[08:22:16] <dstorrs> so, put the two together and you've got a capped collection of docs like this: { _id : ObjectID(...), name : 's1' }
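(A rough sketch of the approach dstorrs describes, with assumed names: a capped collection preserves insertion order, the insertion time can be recovered from the ObjectId, and a reverse natural-order scan returns the newest heartbeat for a source.)

    // Create a capped collection sized for the expected heartbeat volume.
    db.createCollection("heartbeats", { capped: true, size: 10 * 1024 * 1024 });

    // Each heartbeat only needs the source name; the time comes from _id.
    db.heartbeats.insert({ name: "s1" });

    // Latest heartbeat for one source, scanning backwards in insertion order.
    db.heartbeats.find({ name: "s1" }).sort({ $natural: -1 }).limit(1);

    // Insertion time extracted from the ObjectId.
    db.heartbeats.findOne({ name: "s1" })._id.getTimestamp();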
[08:27:21] <dstorrs> kollapse: 'typeof' operator in shell
[08:27:41] <dstorrs> or in a function that you eval / map-reduce, etc
[08:27:59] <kollapse> Hmm, any way to do this with the PHP driver ?
[08:29:22] <dstorrs> use the appropriate driver command to say "run map-reduce / run function / etc"
[08:30:16] <stefancrs> kollapse: uhm, you want to check the type of data after you've retrieved it?
[08:30:31] <kollapse> Basically I have a collection of objects that have a field called 'field1'. 'field1' can be of different types. I want to retrieve the most predominant type.
[08:30:43] <stefancrs> is it REALLY called field1?
[08:31:08] <kollapse> For example if 10 objects have `field1` as strings and only 5 as integer, I want to know that string is the most predominant type.
[08:31:25] <stefancrs> that'd have to be done in a map/reduce I guess
[08:31:46] <stefancrs> or by fetching the entire dataset and doing the evaluation in php
[08:32:02] <stefancrs> if it's never a lot of data, and you want to be able to retrieve it whenever, I'd go for the latter
[08:32:26] <kollapse> It's possible there will be a LOT of data, so a native method is preferred.
[08:32:31] <kollapse> Map-reduce is my only choice, it seems?
[08:34:09] <stefancrs> I'd think so yes, especially if you want to be able to deal with a lot of data
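(A hedged map-reduce sketch for kollapse's question, with an assumed collection name 'things': emit the typeof of field1 for every document and count per type; the type with the largest count is the predominant one.)

    var map = function () { emit(typeof this.field1, 1); };
    var reduce = function (key, values) {
        var count = 0;
        values.forEach(function (v) { count += v; });
        return count;
    };

    // Results come back as { _id: "<type name>", value: <count> } documents.
    db.things.mapReduce(map, reduce, { out: { inline: 1 } });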
[08:34:55] <stefancrs> dstorrs: I think going with two collections makes the most sense for my problem :)
[08:44:13] <stefancrs> the client will be calmer when I tell them "and we log the uptime of all the nodes in the system, as long as the main node is alive. if it isn't, we'll know that too."
[12:32:55] <natim> I started my first mongodb today
[12:33:04] <natim> And I have a question for you : http://stackoverflow.com/questions/10960926/distinct-group-by-using-mongodb-with-pymongo
[12:33:41] <natim> I would like to do the equivalent of this mysql query in mongodb: SELECT DISTINCT pin, value FROM mesh_captors WHERE arduino = 203 GROUP BY pin ORDER BY date DESC
[12:34:42] <natim> For now I have this : db.mesh_captors.group(key=['pin', 'value'], condition={'arduino': int(arduino_id)})
[12:36:26] <natim> But now I want to keep only the last value, ordered by {date: -1}
[12:36:42] <natim> I guess I need to use the reduce function to do that
[12:40:18] <natim> What about something like this : https://www.friendpaste.com/2r2D79DYoaV3CuSOUmGdLr
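(A hedged mongo-shell sketch of what natim seems to be after: group() keeping, per pin, the value from the most recent date; the field names are taken from the SQL query above.)

    db.mesh_captors.group({
        key: { pin: true },
        cond: { arduino: 203 },
        initial: { date: null, value: null },
        reduce: function (doc, out) {
            // Keep the value from the newest document seen so far.
            if (out.date === null || doc.date > out.date) {
                out.date = doc.date;
                out.value = doc.value;
            }
        }
    });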
[12:51:36] <omid8bimo> hey i have a question. i have a replicaset, im trying to dump my db in secondary server but i get this error: assertion: 13106 nextSafe(): { $err: "not master and slaveok=false", code: 13435 }
[13:07:02] <trax> I inserted "corrupted" bson into my mongodb database and now I am getting this http://pastebin.com/k6HJiAkK
[13:07:20] <trax> Is it normal that the mongodb server accepts such bson?
[13:16:30] <omid8bimo> hey i have a question. i have a replicaset, im trying to dump my db in secondary server but i get this error: assertion: 13106 nextSafe(): { $err: "not master and slaveok=false", code: 13435 }
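(The error means the read went to a secondary without slaveOk set. In the mongo shell the fix is to mark the connection as slave-ok before querying; whether the mongodump version omid8bimo was using honours this is unclear -- his later message points to the upgrade fixing it. A small shell example with an assumed collection name:)

    // Connected to a secondary: allow reads from it, then query normally.
    rs.slaveOk();
    db.mycollection.find().count();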
[13:42:07] <natim> trax, Actually it looks like a bug
[13:56:47] <omid8bimo> just upgraded from 2.0.0 to 2.0.5 and it worked!!
[13:57:51] <trax> natim: ok, so if someone is on the same network as my db, they can just crash it \o/ . Yes, there is authentication (over a non-secure stream):
[14:23:33] <Goopyo> Q: If you have hundreds of users making trades, each trade has a P/L, and you want to query by three things (user, the item traded, and the period of the trade), how would you best structure the data?
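(Nobody answers this in the log. One hedged way to model it is a document per trade with the three query fields at the top level, plus a compound index over them; all names below are illustrative only.)

    // One document per trade.
    db.trades.insert({
        user: "alice",
        item: "AAPL",
        openedAt: ISODate("2012-06-01T09:30:00Z"),
        closedAt: ISODate("2012-06-08T16:00:00Z"),
        pnl: 142.50
    });

    // Compound index covering the three query dimensions.
    db.trades.ensureIndex({ user: 1, item: 1, openedAt: 1 });

    // e.g. all of alice's AAPL trades opened in June 2012:
    db.trades.find({
        user: "alice",
        item: "AAPL",
        openedAt: { $gte: ISODate("2012-06-01"), $lt: ISODate("2012-07-01") }
    });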
[14:59:46] <ub|k> shouldn't the order of the elements in a bson document be irrelevant?
[15:00:14] <ub|k> i found a query expression which behaves differently depending on the order of the elements inside a JS object
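(One known case where order does matter: matching on a whole embedded document is an exact, order-sensitive comparison, so the field order in the query has to match the stored order. Dot notation avoids this. A small illustration with an assumed 'points' collection, in case ub|k's query looks like this:)

    db.points.insert({ loc: { x: 1, y: 2 } });

    db.points.find({ loc: { x: 1, y: 2 } });   // matches
    db.points.find({ loc: { y: 2, x: 1 } });   // no match: field order differs

    // Dot notation matches regardless of the stored field order.
    db.points.find({ "loc.x": 1, "loc.y": 2 });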
[15:18:04] <pranavk> installing mongodb is throwing so many selinux warnings, man, what is this ?
[15:18:21] <pranavk> i've never encountered software throwing up so many selinux warnings
[15:19:51] <dstorrs> pranavk: do you have a question we can help with, or do you just need to vent? if the former, I can try to help.
[15:20:55] <pranavk> dstorrs: yes, i want to know whether this is a usual problem with mongodb or whether it's just my system that's weird
[15:23:58] <dstorrs> pranavk: I haven't used selinux myself, so I don't know. a quick Google turns up only a small number of hits for this search: +mongo +selinux
[17:28:31] <dstorrs> and then in another window, you type 'mongo'
[17:28:45] <Electron> mongod --help for help and startup options
[17:28:45] <Electron> Sat Jun 9 13:26:07 [initandlisten] MongoDB starting : pid=1015 port=27017 dbpath=/data/db/ 64-bit host=Zakerias-MacBook-Pro-2.local
[17:28:45] <Electron> Sat Jun 9 13:26:07 [initandlisten] db version v2.0.6, pdfile version 4.5
[17:28:45] <Electron> Sat Jun 9 13:26:07 [initandlisten] git version: nogitversion
[17:28:45] <Electron> Sat Jun 9 13:26:07 [initandlisten] build info: Darwin gamma.local 11.3.0 Darwin Kernel Version 11.3.0: Thu Jan 12 18:48:32 PST 2012; root:xnu-1699.24.23~1/RELEASE_I386 i386 BOOST_LIB_VERSION=1_49
[17:28:45] <Electron> Sat Jun 9 13:26:07 [initandlisten] options: {}
[17:28:45] <Electron> Sat Jun 9 13:26:07 [initandlisten] exception in initAndListen: 10296 dbpath (/data/db/) does not exist, terminating
[17:36:41] <dstorrs> Worrying about changing config options for hygiene and platform-appropriateness is relevant for a production system, not for testing
[17:36:43] <Chillance> hopefully there will not be a permission issue :)
[17:37:29] <Chillance> btw, does there really ALWAYS have to be an _id?
[17:37:45] <dstorrs> multi_io: I don't understand the question. you basically want to have only one process that's able to execute writes?
[17:37:48] <Chillance> what if that is not enough for uniqueness?
[17:38:19] <dstorrs> Chillance: there is always an _id, but it can be things other than an ObjectId. e.g., an object
[17:38:50] <dstorrs> plus, you can set unique indices that aren't related to _id
[17:39:38] <Chillance> dstorrs, hmm, well, the thing is that the key I will be using is larger than that (I think), so if _id is always there, it may not be unique eventually
[17:40:07] <dstorrs> Chillance: can you spell out the actual situation?
[17:40:55] <Chillance> dstorrs, say I want to use a 512 bit hash as the key, isn't that larger than the _id?
[17:41:25] <dstorrs> ok...so just have that be the value of your _id column
[17:50:26] <Chillance> dstorrs, I thought I could use some other name as a unique key and not use _id, but now I get it: _id is the one to use as the unique key, and I set its value. cool!
[17:50:28] <Electron> ok now im getting an error about "no journal files present"
[17:51:18] <dstorrs> Chillance: you /can/ have other keys be unique. they will enforce uniqueness on those fields... but you can't get away from the _id
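(A short sketch of both options being discussed, with an assumed 'files' collection: either make the hash itself the _id, or keep the default ObjectId _id and add a separate unique index on the hash field. The hash values below are placeholders.)

    // Option 1: use the hash (e.g. a hex string) directly as _id.
    db.files.insert({ _id: "9f86d081884c7d65", size: 1024 });

    // Option 2: keep the ObjectId _id and enforce uniqueness on another field.
    db.files.ensureIndex({ hash: 1 }, { unique: true });
    db.files.insert({ hash: "9f86d081884c7d65", size: 1024 });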
[17:52:14] <at133> Hi, I have a document with two counters that I would like to increment independently. When I use {$inc: {x:1, y:0}} and {$inc: {x:0, y:1}} it creates new documents. Is there a way to increment them independently in the same document?
[17:54:37] <dstorrs> Electron: when you write to Mongo, it stages the writes in RAM for speed. periodically it flushes them to the actual data files
[17:55:37] <dstorrs> this is bad, but you can't just say "write everything immediately" -- that kills performance, as inserting into a BTree can be slow
[17:56:17] <dstorrs> so, changes get appended to the journal files almost immediately after being issued. then, when the next flush happens, they are written to the real data files.
[17:56:47] <dstorrs> since this is your first startup, you have no journal files yet. for some reason, it thinks that things are not valid so it's trying to do a recovery
[17:57:04] <dstorrs> that means "hit the journals and replay them into data files to catch up"...but there are no journals
[17:57:39] <dstorrs> I don't know what your specific issue cause is; I'm doing some googling. but that's the general problem
[18:07:57] <dstorrs> at133: ok... that looks like you just updated one field. Is that what you mean by "updating fields independently"? if so, look to the '$inc' operator on update()
[18:09:21] <at133> dstorrs: Yes, but when I do that with {$inc: {view_count:0, click_count:1}} it creates a new document for every increment. So incrementing one works fine, but if I try to increment the other I get two documents.
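(A hedged guess at at133's issue, with assumed names: a $inc with a 0 increment is legal and modifies the matched document in place, so new documents usually mean the update runs with upsert enabled and a query that no longer matches the existing document -- for example because the query includes the counter values themselves.)

    // Increments click_count only; view_count is left untouched.
    db.stats.update({ page: "home" }, { $inc: { click_count: 1 } });

    // With upsert on and a query that stops matching after the first
    // increment, every call inserts a fresh document instead.
    db.stats.update({ page: "home", click_count: 0 },
                    { $inc: { click_count: 1 } },
                    true);   // upsert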
[20:43:30] <spaam> yoyo. is there any package with 2.1 for debian testing ?
[20:44:30] <spaam> the repo on http://downloads-distro.mongodb.org/repo/debian-sysvinit is kinda broken.
[20:44:50] <spaam> W: Failed to fetch gzip:/var/lib/apt/lists/partial/downloads-distro.mongodb.org_repo_debian-sysvinit_dists_dist_10gen_binary-amd64_Packages: Hash Sum mismatch
[21:38:20] <gigo1980> hi, is it possible to change the chunk size of a sharded cluster at runtime?
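(It is -- the default chunk size is stored in the config database and can be changed from a mongos shell; the new value only applies to chunks split after the change. A sketch setting it to 64 MB:)

    // Run from a mongo shell connected to a mongos.
    use config
    db.settings.save({ _id: "chunksize", value: 64 })   // size in MB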