#mongodb logs for Friday the 24th of August, 2012

[00:18:01] <Init--WithStyle-> How much overhead do documents in a collection each have?
[01:53:51] <doxavore> there's not a way to have a new server pull from a secondary for its initial sync, is there?
[03:20:56] <fwheel> hello world, can anyone tell me the sort order for an indexed array field?
[03:22:03] <fwheel> Finding it hard to find a clear answer. A Stack Overflow post implies the min/max is used depending on sort direction.
[03:55:03] <circlicious> cannot connect to mongo on a different server :S
[04:07:22] <circlicious> (Could not connect to a primary node for replica set
[04:14:03] <circlicious> now if someone could help
[04:56:38] <circlicious> 310 people, but no help :(
[04:56:59] <MikeFair> hehe yeah I noticed that same thing
[04:57:18] <circlicious> can you help please?
[04:57:34] <MikeFair> I don't think I'd be any help at all
[04:57:43] <MikeFair> I haven't even gotten MongoDB installed
[04:58:01] <MikeFair> I'm suspecting the other 310 ppl are mostly ppl like us
[04:58:01] <circlicious> lol, i can help you with that if you want
[04:58:06] <circlicious> possible
[04:58:18] <circlicious> the only helpful guy here is NodeX but he is available for less time
[04:58:55] <circlicious> i haven't been able to get any help on SO either
[04:59:28] <MikeFair> I'd love to talk about mongoDB and getting it installed and set up however It's actually time to put my kids to bed atm
[04:59:49] <MikeFair> circlicious: I saw your SO comment
[04:59:59] <crudson> I missed the description of the problem. What is the SO link?
[05:00:04] <MikeFair> I suspect there just aren't enough ppl who understand how that part works
[05:00:39] <MikeFair> Have you used tethereal, or wireshark as it's called now, to troubleshoot and watch the network part happening
[05:01:00] <MikeFair> circlicious: I suspect firewall or other network issue based solely on your to connect issue
[05:01:14] <MikeFair> err "failure to connect" description
[05:01:32] <circlicious> crudson: http://stackoverflow.com/questions/12103241/mongoid-cannot-connect-to-mongodb-on-different-server
[05:02:21] <circlicious> MikeFair: hm
[05:02:29] <circlicious> not sure with that :S
[05:02:46] <MikeFair> netstat -an would at least show whether or not the network connections are happening
[05:02:56] <crudson> circlicious: can you connect using mongo first from that machine?
[05:03:18] <circlicious> yes on Server2 i have a mongodb setup thats being used
[05:03:34] <MikeFair> bbl
[05:03:42] <circlicious> and theres a web app there that quite a few people are using
[05:03:54] <circlicious> that uses mongo
[05:04:09] <circlicious> MikeFair: ok
[05:04:50] <crudson> circlicious: you get refused if trying to connect using mongo js shell from server1?
[05:05:33] <circlicious> ok, i did not try that, let me try please, 1 second
[05:06:28] <crudson> first eliminate the rails part of it. the simple tests first
[05:07:24] <circlicious> connect failed, crudson
[05:07:30] <crudson> (I also have to leave in ~3 minutes)
[05:08:04] <crudson> ok so if you can connect to it when locally on server2, then look at the firewall rules on server2
[05:08:35] <circlicious> well, just a guess
[05:08:52] <circlicious> in netstat i see 2 mongod instances running on 27017 and 28017
[05:09:13] <circlicious> could it be related to not being able to resolve which one to connect to or something, just wondering. although i do specify the port in the connection string
[05:09:38] <circlicious> although ps ax | grep mongo shows only 1 mongod process, not sure :/
[05:09:58] <crudson> and check --bind_ip not set to something different in mongod command line
[05:10:43] <crudson> for instance mongod --bind_ip 127.0.0.1 will not allow you to connect from server1
[05:11:31] <circlicious> crudson: in the config file there is, bind_ip = 127.0.0.1
[05:12:08] <circlicious> should i comment that?
[05:12:38] <crudson> circlicious: ok, but note that someone put it there for a reason. Check with them to see what your best choice is, as it's probably in place to prevent exactly what you're trying to do for security purposes.
[05:12:52] <crudson> I have to go now, but good luck. I leave this logged in.
[05:14:32] <circlicious> hm, bind_ip should just specify the ip the mongod will listen on, why would it affect connections from the other server, confused. not sure what to do :?
[05:32:40] <circlicious> ok i understand that now. hm, i think i need to first learn to work with firewalls properly. fine.
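For anyone hitting the same wall: the usual split is between "mongod only listens on 127.0.0.1" (the bind_ip setting above) and "the port is blocked by a firewall". A quick way to tell them apart from the client machine, sketched in Python with just the standard library (the hostname is a placeholder for server2):

```python
# Probe the mongod port from server1. With bind_ip = 127.0.0.1 on server2,
# or with a firewall rule in the way, this fails before any driver code
# is involved.
import socket

def can_reach(host, port=27017, timeout=3):
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except (socket.timeout, socket.error):
        return False

print(can_reach("server2.example.com"))  # hypothetical hostname for server2
```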
[07:31:32] <gigo1980> hi, i have a mapreduce that emits 200000 times, and in the reduce function i only do a simple count. why does it make 200 reduce calls and count only 1001 each time?
[07:35:18] <[AD]Turbo> hola
[07:55:12] <kali> gigo1980: that's the way the map reduce is implemented, the spec says that reducers can be stacked on top of each other. this is why their input and output have the same form
[07:55:46] <kali> gigo1980: calling a JS function with an array of 200000 is impractical
[07:56:36] <kali> gigo1980: and the stackable reduce allows performing a first stage of reduction on the shard nodes, before doing the final reduction on one single node
[07:58:08] <gigo1980> kali: so there is no way to lift this limitation?
[07:58:33] <kali> gigo1980: it's not a limitation, it's a feature
[07:59:23] <gigo1980> do you have an example where something is done ?
[08:01:59] <gigo1980> or a link where this way of using map reduce is described?
[08:02:26] <kali> well... http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-ReduceFunction
[08:02:41] <kali> but i think you've already seen that
[08:02:58] <kali> gigo1980: why don't you paste your m/r job somewhere so we can help you get it right ?
[08:03:50] <gigo1980> yes but what happens if there are more than 1001 emits for this … then this does not work anymore
[08:04:27] <kali> gigo1980: show me what you're trying to do...
[08:04:40] <kali> gigo1980: one doc, the mapper, the reducer
[08:05:07] <kali> gigo1980: it does work, but you need to do it right
[08:09:28] <gigo1980> http://pastebin.com/UYUsXyGx
[08:09:54] <gigo1980> thats are my map reduce functions
[08:09:57] <quuxman> where could I get feedback on a tiny library on top of pymongo?
[08:12:15] <kali> gigo1980: your reducer returns {}... the reducer must return something in the same form as the items in the "values" array
[08:12:32] <NodeX> else da twix is no da worky
[08:12:55] <kali> gigo1980: you must not work with the "reduced" global variable, this is not right
[08:13:38] <kali> gigo1980: the job of the reducer is to transform the array of values it receives into an aggregated version and return it
[08:14:26] <kali> gigo1980: http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-ReduceFunction
[08:16:31] <gigo1980> but that is what we doing right now
[08:17:24] <kali> no. you're returning {}
[08:18:53] <Init--WithStyle-> Doing a "find" query with a limit over ~13,000 crashes my node.js server :/
[08:18:57] <Init--WithStyle-> i'm using mongolian as a driver
[08:19:09] <Init--WithStyle-> Any ideas how to do non-crashing large "find" queries?
[08:20:55] <gigo1980> kali: sorry that was a mistake, i return the reduced object that i created
[08:21:56] <Lujeni> Init--WithStyle-, limit of your cursor ? try to limit batch size in your query
[08:22:20] <Init--WithStyle-> So I should try putting a limit on my cursor and limit the batch size of query?
[08:22:27] <Init--WithStyle-> what does limiting the batch size of query mean exactly?
[08:23:15] <Init--WithStyle-> Limiting works, but I need to get the whole thing..
[08:23:27] <Init--WithStyle-> I guess just divide it into 3 parts and then put the parts together afterwards?
[08:23:41] <Lujeni> Init--WithStyle-, find( {spec}, {fields} ). Try to reduce the fields you return.
[08:23:54] <Init--WithStyle-> I need everything in the collection though..
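The "limit the batch size" suggestion means letting the driver pull the cursor in chunks instead of materialising the whole result set at once. A rough pymongo sketch of the idea (the node.js drivers expose the same cursor/batch-size concept); the collection and projection fields are made up:

```python
# Stream a large result set instead of loading it into one big array.
# batch_size() controls how many documents each getMore fetches, and the
# projection keeps the per-document size down.
cursor = db.items.find({}, {"name": 1, "value": 1}).batch_size(1000)
total = 0
for doc in cursor:
    total += 1          # process each document as it arrives
print("processed", total, "documents")
```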
[08:24:54] <NodeX> push it to a cache
[08:25:04] <kali> gigo1980: you can't return "reduced". you need to return something that strictly matches the item in the values array
[08:25:16] <kali> gigo1980: some of your code in the reducer needs to be migrated to the mapper
[08:25:23] <kali> gigo1980: and some in the finalizer
[08:25:34] <gigo1980> ok ...
[08:26:12] <Init--WithStyle-> er...
[08:26:37] <NodeX> Init--WithStyle- : push it into a cache and trickle it to your app
[08:27:05] <Init--WithStyle-> on a node.js server how might that be performed? can you give me a related example?
[08:27:26] <kali> gigo1980: a typical reducer takes (a, [ { sum: 5, max: 3, min: 1}, { sum: 7, min: 3, max: 4 } ]) and returns { sum:12, max: 4, min 1 }
[08:27:46] <kali> gigo1980: strictly reducing aggregates values
[08:28:32] <gigo1980> ok … so it only aggregates data on this. no problem i can do this ...
[08:28:42] <NodeX> put it into redis or memcached as chunks and trickle it out
[08:28:52] <kali> gigo1980: so the mapper emitted values must also strictly match the reducable values: in this case, emit("a", { sum: 5, max: 3, min: 1}) somewhere, emit("a", { sum:12, max: 4, min 1 }) somewhere else
[08:28:58] <gigo1980> i'll try this and give you a response
[08:29:46] <Init--WithStyle-> hmmm..
[08:44:30] <zeus> with a 9-machine mongo cluster, and collection size around 300gb, is it normal that inserts are very, very, very slow ?
[08:44:51] <zeus> ~300 ins/sec
[08:45:45] <zeus> collection is sharded, one index on 4 values
[08:46:00] <kali> zeus: look for faults in mongostat
[08:46:06] <kali> zeus: and the index size
[08:46:15] <kali> zeus: compare the index size to your ram
[08:46:24] <zeus> how to check index size?
[08:46:33] <kali> db.collections.stats()
[08:47:24] <zeus> got lots of faults - what can i do with this ?
[08:48:07] <kali> zeus: it's probably the consequence of the indexes having outgrown the RAM
[08:48:13] <zeus> 68 inserts = 63 faults
[08:49:58] <zeus> should i run this on shard/admin ?
[08:49:59] <zeus> db.collections.stats()
[08:56:10] <NodeX> make sure any upserts that are index bound are indexed properly
[08:56:51] <NodeX> I had a similar problem a year ago with a history collection and couldn't work out why my inserts crawled to a halt, until I remembered I needed an index on a user id as it was index bound
[08:59:13] <zeus> for 65 gb data my index is 14gb - is this possible ?
[09:01:12] <IAD> if RAM>14gb then ok
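The numbers zeus is quoting come from collection stats. A small pymongo sketch of how to read them (the collection name is a placeholder), which is the driver-side equivalent of the stats() call mentioned above:

```python
# collStats reports data size and per-index sizes in bytes; the rule of
# thumb in this discussion is that the indexes you actually hit should fit
# in RAM, otherwise the page faults show up in mongostat.
stats = db.command("collstats", "mycollection")
print("data size:", stats["size"])
print("total index size:", stats["totalIndexSize"])
for name, size in stats.get("indexSizes", {}).items():
    print("index", name, "=", size, "bytes")
```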
[09:09:22] <zeus> can I drop _id index ?
[09:32:21] <NodeX> zeus : is that your only index?
[09:34:07] <zeus> no, i have 2 indexes
[09:34:31] <zeus> NodeX, but when I try to drop it - "may not delete _id index""
[09:34:45] <NodeX> you can't that's why
[09:34:56] <NodeX> what's the other index?
[09:36:04] <zeus> NodeX, http://pastie.org/4579534
[09:38:03] <NodeX> countryId_1_userId_1_categoryId_1_timeRange_1 <--- prolly that one!
[09:38:27] <NodeX> gonzo.xxx lol
[09:38:54] <_johnny> now i understand the size
[09:38:59] <_johnny> ;)
[09:39:42] <zeus> NodeX, i can, im using only this index
[09:40:18] <zeus> *can't
[09:40:28] <NodeX> when you say "cluster" do you mean replica sets or shards?
[09:41:05] <NodeX> and 9 machines for 300gb of data is an awful lot
[09:41:46] <zeus> NodeX, i got 9 shards, shard contain replica set
[09:42:04] <zeus> replica set contains 2 identical machines plus arbiter
[09:42:39] <NodeX> how are you sharding, is it an equal key ?
[09:43:45] <zeus> gonzo.xxx chunks: r1 223 r2 223 r8 223 r3 223 r5 223 r9 1 r7 223 r4 223 r6 223
[09:44:02] <zeus> so i'm assuming that's a yes
[09:44:11] <NodeX> and is it across all shards you get bad performance or just one?
[09:44:55] <zeus> writes were distributed equally, 40-50 writes per second on each shard
[09:45:24] <zeus> i got ssd's there - that's just too slow
[09:45:54] <NodeX> how long is each write taking?
[09:46:07] <zeus> how to check that ?
[09:46:19] <NodeX> does your app not log it ?
[09:46:36] <zeus> i got only ins/sec
[09:47:02] <NodeX> how much ram does each machine have?
[09:47:14] <zeus> 4
[09:47:38] <zeus> i've managed to reduce the index to 4.1 gb
[09:47:58] <zeus> now im moving to 8gb machines - i hope it helps
[09:49:18] <NodeX> I would think it's the page faults and the swapping
[09:49:51] <zeus> but i was hoping for a little more than 50 ins/sec from a 2-cpu 4gb-ram machine, especially with an ssd drive
[09:50:46] <_johnny> zeus: how did you reduce from 14 to 4?
[09:51:57] <zeus> i got 3 indexes so i dropped the biggest one
[09:52:10] <zeus> removing the _id index would be sufficient
[09:52:12] <NodeX> you should drop every one you dont need
[09:52:16] <_johnny> ah, okay, i thought you had one (two with _id)
[09:52:44] <zeus> i can't. i don't use the _id index, but dropping it is prohibited
[09:53:06] <NodeX> you never lookup anything on _id?
[09:54:38] <_johnny> NodeX: after i rebuild a (large) collection recently, i dropped a lot of indexes that i thought i needed (based on a sql layout, where you usually put indexes on what you WHERE key ...), and my lookups seem equally fast as before. does mongo look for the indexed field first, then the other conditions? that would be awesome - and it seems to be what it does
[09:55:32] <NodeX> it (tries to) optimise(s) your query else you can hint
[09:55:56] <NodeX> for example, if you query for foo:123, bar:123 you need a compound index on both fields
[09:56:19] <NodeX> you also need to tell the index to be ascending or descending for sorting
[09:56:44] <_johnny> hmm, see that's interesting, because that's exactly what i do in one query, two fields, but only index on one, and it seems (just as) fast
[09:56:48] <NodeX> it will only look on indexed fields if an index exists if that makes sense
[09:57:21] <_johnny> right, but i was thinking in orders. so it finds the index fields first, and only need to run the remaining conditions on those
[09:57:25] <NodeX> if there are not a lot of documents then it will be fast
[09:57:51] <NodeX> use explain() on the end of a query and it will tell you what its doing
[09:58:11] <_johnny> ah, thanks
[09:58:15] <NodeX> nscanned is how many documents it had to scan to find the docs returned
[09:58:30] <NodeX> you want that number as low as possibl
[09:58:33] <NodeX> +e
[09:59:06] <NodeX> http://www.mongodb.org/display/DOCS/Indexes#Indexes-CompoundKeys <----
[09:59:57] <NodeX> you can use compound keys efficiently and plan the queries to use part of the key
[10:00:28] <_johnny> ah, very cool. thanks a lot :)
[10:01:02] <_johnny> it seems to actually do what i hoped. across 2.5 million docs, it looks through only 7. and 7 is the number that contain the indexed field value
[10:01:43] <NodeX> kewl
[10:01:48] <_johnny> so while the docs on indexed field are low, it'll work fine. which i realize now is what you said :)
[10:02:00] <NodeX> you can always hint() if you think it's not using an index and it should be
[10:03:07] <NodeX> you can profile too, so you can check the profiler and see which queries are running inefficiently
[10:03:24] <NodeX> http://www.mongodb.org/display/DOCS/Database+Profiler
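A short pymongo sketch of the points above: a compound index covering both queried fields, explain() to check how much was scanned, and hint() to force a particular index. Collection and field names are illustrative.

```python
import pymongo

# Compound index on both fields used in the query (order and direction
# matter if you also sort on these fields).
db.foo.create_index([("foo", pymongo.ASCENDING), ("bar", pymongo.ASCENDING)])

# explain() shows the chosen plan; on servers of this era compare nscanned
# (documents examined) with n (documents returned) - you want them close.
plan = db.foo.find({"foo": 123, "bar": 123}).explain()
print(plan)

# hint() forces a specific index if the optimizer is not picking the one
# you expect.
docs = db.foo.find({"foo": 123, "bar": 123}).hint(
    [("foo", pymongo.ASCENDING), ("bar", pymongo.ASCENDING)])
```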
[10:05:21] <gigo1980> kali: i modified it … http://pastebin.com/vNmYyj8m
[10:05:22] <_johnny> yup. i'll do that over the course of an internal "test" i'll hold soon
[10:05:50] <gigo1980> but i have the same problem, a maximum of 1000 emits for each reduce..
[10:47:14] <kali> gigo1980: durationcount is not right yet, you need to add the .durationcount from the values array
[10:48:59] <kali> gigo1980: and yes, it will call the reduce for a limited number of emit(), but this will no longer be a problem, because you've made it stackable
[10:52:27] <gigo1980> ok i ll try it
[10:54:41] <gigo1980> kali: can you please give me a snippet of what my reduce method should look like
[10:56:55] <kali> gigo1980: just replace reduced.durationcount++; with reduced.durationcount += values[i].durationcount
[10:58:04] <kali> gigo1980: the input value of a reducer is either the output of map (in that case durationcount is 1) or the result of a previous reduce (in that case durationcount will be > 1)
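Putting kali's advice together: the reducer has to return the same shape it receives, so its output can be fed back into another reduce pass. A rough sketch using the map_reduce helper available in older pymongo versions; the grouping key and the duration field are assumptions, since the actual paste is not shown here.

```python
from bson.code import Code

# map: emit one value per document, already in the reduced shape
mapper = Code("""
function () {
    emit(this.category, { durationcount: 1, duration: this.duration });
}""")

# reduce: fold the values array into one object of the same shape, so
# MongoDB can re-reduce partial results (the "stackable" property)
reducer = Code("""
function (key, values) {
    var out = { durationcount: 0, duration: 0 };
    values.forEach(function (v) {
        out.durationcount += v.durationcount;
        out.duration += v.duration;
    });
    return out;
}""")

result = db.events.map_reduce(mapper, reducer, "event_stats")
```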
[12:03:58] <siert> hey guys; any idea why rpm's on downloads-distro.mongodb.org are not signed?
[12:07:43] <Cubud> Hi all :)
[12:30:08] <NodeX> howdee
[12:43:00] <svenstaro> omg guys you need a systemd file
[13:26:18] <woozly> guys, why is mongodb using 100% CPU
[13:26:25] <woozly> jumps from 50-98-100%
[13:26:58] <woozly> but no operations... I have deleted a big collection (6,000,000 entries)
[13:56:51] <jmar777> woozly: anything in the logs?
[14:18:35] <augustl> how do I add an "unique" index for a specific key in an array? My documents have authTokens: [{key: "123abc", ...}, {key: "456def", ...}, ...]
[14:18:44] <augustl> I want to index the "key"
[14:19:06] <augustl> uh, in my app it's actually called "token", not "key", in case that looked confusing :)
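For augustl's question, indexing a field inside array elements uses dot notation, and adding unique makes it a unique multikey index (which, as far as I recall, enforces uniqueness across documents rather than within a single document's array). A pymongo sketch with assumed collection and field names:

```python
# Index the "token" field of each element in the authTokens array.
db.users.create_index("authTokens.token", unique=True)

# Matching a document by one of its tokens uses the same dot notation.
user = db.users.find_one({"authTokens.token": "123abc"})
```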
[14:23:56] <doxavore> Can you not change to hostnames from IPs in a replica set config? https://gist.github.com/af32c5e10b95cbdf16e5
[14:31:07] <seba89> hello, i'm having a problem connecting to mongo from php. it throws a MongoConnection exception with the message "transport endpoint is not connected".
[14:31:17] <seba89> i can connect normally through the mongo console
[14:31:48] <NodeX> pastebin your connection string
[14:32:05] <remonvv> Hi guys, we keep having issues with movePrimary on a sharded database resulting in the data disappearing until mongos is restarted.
[14:32:11] <remonvv> Anyone any idea what's happening?
[14:34:11] <seba89> NodeX: I'm using the default: "mongodb://localhost:27017"
[14:44:37] <dcbiker1029> how do I modify my upstart file in init.d to use a different dbpath? it is ignoring /etc/mongodb modifications
[14:51:51] <xico> Bonjour my international partners in development.
[14:53:08] <DiogoTozzi> Bonjour
[14:53:10] <DiogoTozzi> ça va?
[14:55:39] <xico> Oui DiogoTozzi.
[14:55:44] <xico> I'm not french :P I'm portuguese
[14:55:48] <xico> how do you feel about that?
[14:56:27] <xico> Anyone care to help me with a question about many database connections VS one single connection with many many requests
[14:56:41] <DiogoTozzi> Xico, I'm Brazilian hehe
[14:57:01] <xico> ahahah From the name, I figured right away you couldn't be far off :)
[14:57:08] <DiogoTozzi> :)
[14:57:13] <xico> You've got some good programmers over there ;)
[14:57:36] <DiogoTozzi> Good? Brazilians are laid-back about programming. Europeans are much better
[14:58:38] <DiogoTozzi> Xico, where in Portugal are you from?
[14:58:44] <DiogoTozzi> xico, Lisboa?
[14:59:56] <xico> DiogoTozzi: Coimbra
[15:00:10] <DiogoTozzi> xico, Nice :)
[15:00:34] <xico> But there are a lot of you, and you have good forums
[15:01:01] <xico> I'm not sure we can talk in "not english"
[15:01:01] <xico> :P
[15:01:20] <DiogoTozzi> xico, hehe lets speak in english
[15:01:54] <DiogoTozzi> xico, next week we're going to the MongoDB 2.2 conference at Microsoft here in São Paulo
[15:02:16] <DiogoTozzi> xico, do you have these conferences near you?
[15:02:37] <DiogoTozzi> I gotta lunch, brb
[15:03:14] <xico> I don't, unfortunately :\
[15:03:25] <xico> We don't.
[15:03:43] <svenstaro> any mongo devs here?
[15:03:44] <xico> I think SP and NY are the only ones for now, no?
[15:04:27] <xico> Well, there's one in spain.
[15:04:32] <xico> Not sure if worth it though :\
[15:06:26] <svenstaro> I got patches I need a dev :{
[15:12:15] <tsally> for a query to go directory from mongos to the mongod that owns the shard, the query must contain a complete shard key?
[15:12:28] <tsally> *go directly from mongos to mongod
[15:25:14] <dcbiker1029> can someone help me change my dbpath it seems to be ignoring my mongodb.conf file
[15:39:00] <NodeX> dcbiker1029 : can you paste the output of ps ax | grep mongo
[15:40:00] <dcbiker1029> nodex: http://pastie.org/4581117
[15:41:23] <Cubud_> Hi all
[15:45:08] <dcbiker1029> Nodex: does that help?
[15:52:53] <NodeX> it should be reading it
[15:59:10] <dcbiker1029> this guy http://doubleclix.wordpress.com/2012/05/04/notes-on-mongo-at-aws/ has a similar problem on the same setup I am on
[15:59:18] <dcbiker1029> he says he fixed it but was unclear in how
[15:59:24] <dcbiker1029> nodex: see above
[16:08:42] <Cubud_> Does anyone here use MongoDB with C#?
[16:10:41] <Cubud_> I need to know how in C# I would update an individual element in all documents which meet a criteria
[16:29:25] <NodeX> on the shell it's like this Cubud: db.foo.update({foo:'bar'},{$set : {field:'value'}},false,true);
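The shell call above maps onto the drivers' update-with-multi flag. A pymongo sketch of the same operation, for reference (the C# driver exposes the same multi-update concept; in newer pymongo versions this is spelled update_many):

```python
# Set the field on every document matching the criteria, not just the first.
db.foo.update({"foo": "bar"}, {"$set": {"field": "value"}}, multi=True)
```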
[16:30:32] <marek_> hi; I want to reference two user IDs in a Message object. Should I have a field [id1,id2] or two id fields which I query with an $or?
[16:31:09] <marek_> the Message isn't directional so the user IDs are interchangeable
[16:47:03] <int> In mongodb I have a field saved as u'timestamp': datetime.datetime(2012, 8, 23, 19, 55, 13, 830000)
[16:47:33] <int> using pymongo I am doing a query: {'timestamp': {'$lt': u'2012-08-24 16:43:54.017102'}}
[16:47:43] <int> why am I getting no data?
[16:47:52] <int> tried $gt too
[16:49:20] <int> I am saving timestamp as datetime.datetime.utcnow()
[16:49:35] <int> that's what I am using to fill in $lt as well
[16:49:36] <NodeX> what's the extra "'" for?
[16:49:49] <NodeX> : u'201
[16:50:45] <int> NodeX: I am just doing {'$lt':datetime.datetime.utcnow()}
[16:51:18] <int> which extra ' are you talking about?
[16:55:40] <_johnny> the u. it's just a way python prints a dictionary (object) as a string
[16:56:06] <_johnny> stands for unicode
[16:56:46] <int> _johnny: so any idea why my query isn't working?
[16:57:22] <_johnny> well, i'm not sure how the <, > operators will work on a string
[16:58:10] <_johnny> do an empty search, like find_one({}), and see what type comes back for your timestamp -- to see if it's a string
[16:58:14] <int> _johnny: http://api.mongodb.org/python/2.0/tutorial.html
[16:58:21] <int> check Range Queries there
[16:58:31] <int> I am doing the exact same thing.
[16:58:59] <_johnny> ok, but are you? in the tutorial the datetime object is passed, not a string of a datetime
[16:59:14] <int> _johnny: I have a different script which prints out my data and it gives me:
[16:59:16] <int> u'timestamp': datetime.datetime(2012, 8, 23, 19, 55, 13, 830000)
[16:59:46] <int> I am doing {'$lt':datetime.datetime.utcnow()}
[16:59:52] <int> why is it being passed as a string object?
[17:00:30] <_johnny> oh, ok. from your paste above i thought you were passing u'2012-08-24 16:43:54.017102' in your query
[17:00:49] <_johnny> try expanding, doing a lookup from epoch or something
[17:01:54] <_johnny> .find({"timestamp": { "$lt": datetime.datetime(1970,1,1,1) } })
[17:02:25] <_johnny> i mean $gt obviously :p
[17:02:29] <int> _johnny: I think it's because I am using all this in an api and the filter is being passed as a query string
[17:03:18] <_johnny> you can make a datetime object from the string. but check with something similar to what i wrote first - to see if that's the problem
[17:05:11] <int> ok thanks, checking
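The root cause here is comparing a BSON date against a plain string. A small pymongo sketch: build the bounds as datetime objects, and if the bound arrives as a query-string parameter, parse it back into a datetime first (the format string is an assumption matching the value shown above):

```python
import datetime

# Range query with real datetime objects, as in the pymongo tutorial.
now = datetime.datetime.utcnow()
week_ago = now - datetime.timedelta(days=7)
recent = db.events.find({"timestamp": {"$gt": week_ago, "$lt": now}})

# If the bound came in over an HTTP API as text, convert it before querying.
raw = "2012-08-24 16:43:54.017102"
bound = datetime.datetime.strptime(raw, "%Y-%m-%d %H:%M:%S.%f")
older = db.events.find({"timestamp": {"$lt": bound}})
```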
[17:18:05] <Cubud_> NodeX: Yes, I can do that in the console but how would I structure it as a runCommand?
[17:18:16] <Cubud_> That's how the C# API seems to work
[17:36:24] <gigo1980> hi what was the command to move an database to an other shard in my cluster
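The command gigo1980 is thinking of is movePrimary (the one remonvv mentions above); it is issued against the admin database through mongos. A pymongo sketch, with the database and shard names as placeholders:

```python
from pymongo import MongoClient

# Connect to a mongos, then ask for the unsharded data of "mydatabase"
# to be moved to the shard named "shard0001".
client = MongoClient("mongos-host", 27017)
client.admin.command("movePrimary", "mydatabase", to="shard0001")
```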
[18:04:46] <edmundsalvacion> Hi there, I have been noticing that mongo has been skipping records and was wondering if anyone had a bit more insight as to why this may be happening
[18:08:03] <kali> skipping records ?
[18:08:06] <kali> what do you mean ?
[18:11:35] <tsally> can someone help me understand the relationship (or lack thereof) between indexes and routed queries? does an index just improve performance for certain queries once the query hits the mongod process?
[18:27:37] <edmundsalvacion> kali, what i have noticed is while iterating over, say one million, records in a collection which generally has frequent additions to it, when it reaches the end, it appears as if some records were not returned at all
[18:28:30] <edmundsalvacion> even on a smaller scale, i've noticed that while iterating over 8000 records, only 7000 or so were successfully returned
[18:29:43] <tsally> edmundsalvacion: are you reading as inserting is going on and are you reading from primary or secondary?
[18:30:16] <edmundsalvacion> tsally: yes i am reading as inserting is going on, and from the primary
[18:32:51] <edmundsalvacion> would it be better to be reading from the secondary in this case?
[18:33:55] <kali> edmundsalvacion: you're reading in natural order ? without sort ?
[18:34:13] <edmundsalvacion> without sort
[18:34:16] <kali> edmundsalvacion: and the whole collection ?
[18:34:21] <edmundsalvacion> and the whole collection
[18:34:37] <kali> ok, i've seen similar issues
[18:35:04] <edmundsalvacion> its essentially collection.find({foo:1},{bar:2})
[18:35:06] <edmundsalvacion> nothing fancy
[18:35:13] <edmundsalvacion> with an index on foo
[18:35:18] <kali> i think it's because mongo just parses the whole collection in disk order. if some records are inserted before where your cursor is, you don't see them
[18:35:33] <kali> edmundsalvacion: it's what i think is happening, i'm not 100% sure
[18:37:02] <kali> edmundsalvacion: in your case, the cursor is on the index on foo
[18:37:07] <edmundsalvacion> yes
[18:37:24] <edmundsalvacion> it's actually a unique compound index on foo and bar
[18:38:12] <tsally> kali: you are getting at this? "MongoDB cursors do not provide a snapshot: if other write operations occur during the life of your cursor, it is unspecified if your application will see the results of those operations or not." http://www.mongodb.org/display/DOCS/Queries+and+Cursors
[18:38:46] <kali> tsally: ha ? i did not know it was explicit in the doc, but that was mostly my gut feeling
[18:39:07] <tsally> kali: seems like good intuition ^^
[18:39:20] <edmundsalvacion> ha
[18:39:59] <kali> edmundsalvacion: i have seen cases where i parse an entire collection top to bottom, adding some information to the documents. I know in that case i can "see" some records twice (because they no longer fit where they are and are bumped to the end of the collection)
[18:40:58] <kali> edmundsalvacion: if that's an issue, maybe the snapshot can help: http://www.mongodb.org/display/DOCS/How+to+do+Snapshotted+Queries+in+the+Mongo+Database
[18:41:09] <kali> edmundsalvacion: it must come at a price, I guess
[18:42:07] <kali> it looks like it only fixes "my" issue, not yours...
[18:52:26] <edmundsalvacion> kali: i have looked into snapshot mode, but it seems to not be able to use a secondary index
[18:55:33] <edmundsalvacion> these records are never updated, so i don't see how they could be bumped to the end of the collection
[18:56:28] <edmundsalvacion> while annoying, I'm not overly concerned about dupes as much as i am about records that aren't showing up
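One way around both the duplicates and the missed documents, when snapshot mode can't use the index you need, is to walk the collection in _id order and remember the last _id processed; assuming _id is an ObjectId (roughly insertion-ordered), new inserts land after the point you have already passed. A rough pymongo sketch:

```python
import pymongo

def scan_all(collection, spec, handle, batch=1000):
    """Walk the collection in _id order so concurrent inserts and document
    moves cannot make the scan repeat or silently drop documents."""
    last_id = None
    while True:
        query = dict(spec)
        if last_id is not None:
            query["_id"] = {"$gt": last_id}
        docs = list(collection.find(query)
                              .sort("_id", pymongo.ASCENDING)
                              .limit(batch))
        if not docs:
            return
        for doc in docs:
            handle(doc)          # caller-supplied per-document callback
        last_id = docs[-1]["_id"]
```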
[19:16:31] <lorsch> Hi, i'm new to mongo db and i can't find an answer to the following question about mapreduce:
[19:16:31] <lorsch> The key emitted by the map()-function is the _id in the collection which will be created, and it will also act as the shard key.
[19:16:31] <lorsch> I can't create an index on the created collection, because if i do so, the mapreduce won't run again...
[19:16:31] <lorsch> Is there no way to have my own index on the created collection?
[19:18:06] <lorsch> i want to do incremental map reduce with "out: {reduce: "collection"}", but without some indexes on the created collection, it doesn't make sense at all
[19:32:42] <quuxman> Anybody here use pymongo? (I asked this yesterday, and was surprised to get no responses)
[19:39:30] <dgottlieb> I've used pymongo, mostly just the basic operations
[19:41:43] <quuxman> dgottlieb: the reason I keep asking is I wrote a very small library that's a helper for writing queries with pymongo, and I want some feedback on it
[19:42:51] <dgottlieb> Ah. I can take a look. Link?
[19:43:14] <quuxman> dgottlieb: thanks! http://bpaste.net/show/41740/
[19:43:33] <quuxman> dgottlieb: a couple examples of its use: db.Pages.search( mq().all('tags_index', 'food', 'art') )
[19:43:52] <quuxman> db.Feed.search( m().bt('created', now() - 86400 * 7, now()).is1('class_name', 'Broadcast', 'Star') )
[19:44:04] <trupheenix> i am trying to query a list of items stored on mongo db and display them on a web front end. i would like to display the first 20 items. then have a button to display the next 20 items. how do i do this using pymongo? not looking at code, just want a conceptual idea. thanks
[19:45:42] <quuxman> dgottlieb: which produce {'tags_index': {'$all': ('food', 'art')}}, and {'class_name': {'$in': ('Broadcast', 'Star')}, 'created': {'$gt': 1345232669.437062, '$lt': 1345837469.437065}} respectively
[19:46:27] <quuxman> dgottlieb: my main concerns are 1) I have the strong feeling I'm recreating something that already exists, though I can't find it, and 2) is there a better approach?
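For anyone curious what such a helper looks like, a minimal sketch of the idea quuxman is describing: a dict subclass whose chainable methods fill in the operator syntax, so the result can be passed straight to find(). The method names here are illustrative, not his actual API.

```python
class Q(dict):
    """Tiny chainable builder for pymongo query documents."""

    def all_(self, field, *values):
        self[field] = {"$all": list(values)}
        return self

    def in_(self, field, *values):
        self[field] = {"$in": list(values)}
        return self

    def between(self, field, lo, hi):
        self[field] = {"$gt": lo, "$lt": hi}
        return self

# e.g. db.pages.find(Q().all_("tags_index", "food", "art"))
```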
[19:48:02] <dgottlieb> trupheenix: mongodb has skip and limit functions you can put on queries. So you can do a limit(20) on all queries and a skip on (page_number * 20). I think with pymongo skip and limit are parameters to the find() function
[19:49:04] <quuxman> trupheenix: keep in mind that MongoDB does not have an efficient implementation of skip. It actually iterates through every previous result in your query
[19:49:32] <trupheenix> quuxman, dgottlieb thanks. any other alternatives you can suggest?
[19:49:41] <quuxman> trupheenix: the only reasonable way to implement this is simply creating an attribute that has your desired order, and using a comparison
[19:50:09] <trupheenix> quuxman, hmmmm then it becomes all the more complex!
[19:50:27] <quuxman> trupheenix: this results in making pagination for any non-trivial (dynamically determined) ordering a humungous pain in the ass
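The two pagination styles under discussion, sketched with pymongo (collection and field names assumed): skip/limit is simple but the server still walks past every skipped document, while range ("keyset") pagination remembers the last value of the sort field and continues from there using the index.

```python
import pymongo

PAGE_SIZE = 20
page_number = 0

# Offset pagination: fine for early pages, increasingly expensive later on.
page = list(db.posts.find()
                    .sort("created", pymongo.DESCENDING)
                    .skip(page_number * PAGE_SIZE)
                    .limit(PAGE_SIZE))

# Keyset pagination: carry the last sort value forward (e.g. in the "next
# page" link) and query past it instead of skipping.
if page:
    last_created = page[-1]["created"]
    next_page = list(db.posts.find({"created": {"$lt": last_created}})
                             .sort("created", pymongo.DESCENDING)
                             .limit(PAGE_SIZE))
```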
[19:50:35] <trupheenix> quuxman, i have a dynamically changing list
[19:51:12] <quuxman> in my opinion, this is mongo's largest weakness in comparison to mysql and postgres
[19:51:17] <dgottlieb> quuxman: I think it's a cute simple wrapper to simplify those "advanced" operators. I'm sure others have been made for each of the languages.
[19:51:56] <quuxman> dgottlieb: I just get tired of typing brackets, quotes and '$'s
[19:52:20] <quuxman> especially when writing on-the-fly queries for analytics
[19:52:24] <dgottlieb> quuxman: As for better approaches, are you talking about trying to find a single API that everyone would enjoy using and adopt?
[19:52:44] <quuxman> dgottlieb: ideally I guess... or just a more concise sensical way of doing it myself
[19:53:05] <dgottlieb> quuxman: heh, if you think using map/list literals is bad, try turning your above query into Java :)
[19:53:32] <quuxman> dgottlieb: there is a reason I've never coded Java
[19:54:21] <quuxman> at least not for work... I did make a little paint / reaction diffusion exploration program once, just for the experience
[19:54:59] <dgottlieb> quuxman: I think if this is mostly for yourself, if it's good for you, it's good for me. I think it can be difficult to have everyone be happy with a lightweight API such as this.
[19:55:40] <dgottlieb> quuxman: I find that there are a few clear ways to do something like that and none are really any more right or wrong, just a matter of taste
[19:55:51] <quuxman> dgottlieb: mainly what I'm puzzled over is why something equivalent isn't included in pymongo. I understand the desire to keep the API as similar as possible to the underlying mongo API, but that doesn't mean you have to make things a pain in the ass
[19:55:59] <trupheenix> i got an idea? how about i cache the cursor while it iterates?
[19:56:23] <trupheenix> so i see the first 20 results. serve it out. store the cursor.
[19:56:25] <quuxman> trupheenix: if the pages are always requested in sequence, that would be _ok_
[19:56:41] <trupheenix> next call i retrieve the cursor and start off from last position?
[19:56:47] <quuxman> the thing is, there's no guarantee for that, and it makes your app non "restful"
[19:57:15] <dgottlieb> quuxman: Sorry if my feedback isn't very specific, I'm a little ambivalent to lightweight wrappers. I tend to talk with drivers directly
[19:57:38] <trupheenix> quuxman, dgottlieb i think i will have to do with skip and limit then
[19:57:49] <quuxman> dgottlieb: so far I haven't used it in application code (though I may start if I settle on this approach), just interactive use
[19:58:06] <quuxman> trupheenix: it's really not a big deal unless you have huge data sets and a significant amount of traffic
[19:58:24] <quuxman> don't prematurely optimize, it's just good to be aware of that weakness
[19:58:29] <trupheenix> quuxman, could have huge data sets and significant amount of traffic. trying to design a message boards.
[19:58:41] <trupheenix> ^board
[19:58:42] <quuxman> don't consider "could"
[19:58:50] <quuxman> design based on your current traffic
[19:58:57] <quuxman> it'll save you a lot of time
[19:59:18] <quuxman> and money, if you have a team, and you're buying hardware
[19:59:55] <dgottlieb> quuxman: well, i think something like that isn't included merely because there's a few consistent ways to implement that and the official drivers would rather not pick for the community and just leave it up to individuals to do what they wish. Also if one driver were to officially support an alternative syntax for advanced queries, the other drivers would probably have to try to conform which isn't always feasible for the officially supported languages
[20:00:42] <quuxman> dgottlieb: What're the other ways of doing it?
[20:02:17] <dgottlieb> quuxman: I think this was one of the original query builders that was adopted into a driver: http://api.mongodb.org/java/2.2/com/mongodb/QueryBuilder.html
[20:03:03] <quuxman> looks pretty similar in concept
[20:03:19] <quuxman> basically exactly the same
[20:03:22] <dgottlieb> quuxman: I believe the conclusion from that experiment was to not make a habit of adopting these in general. But I'm really not an authoritative source on this at all
[20:09:36] <quuxman> dgottlieb: thanks for the feedback :)
[20:10:27] <dgottlieb> quuxman: heh, np. Sorry I couldn't be more helpful.
[20:10:48] <quuxman> although I'm not sure what, I feel like I learned something
[20:11:03] <dgottlieb> that's the most important part really!
[20:13:03] <dgottlieb> Taking a second look, I can say some people (ok, maybe just me) are not a big fan of using argument lists (specifically on the `in` and `all` method)
[20:13:36] <dgottlieb> I personally would prefer just doing the .in('key', ['value1', 'value2'])
[20:14:00] <quuxman> the main point of it is to get rid of brackets. I support both
[20:14:13] <dgottlieb> which I see is still possible, but I would make that mandatory and I understand your impetus to allow the other way :)
[20:14:27] <quuxman> yeah, I thought it was a little weird
[20:14:47] <quuxman> it does make it impossible to search for a nested list in a multikey
[20:15:24] <quuxman> unless you throw a dummy value in the beginning
[20:16:21] <quuxman> in general if a feature makes some theoretical behavior impossible to achieve in your library, it's a sign of bad design, no matter how remote the possibility of actually using that behavior
[20:16:46] <quuxman> in short, I agree
[21:12:23] <geoffeg> Is there some way, maybe with 2.2's aggregation stuff, to query based on a conditional of two fields in the same document? like db.foo.find({'create-date' : { $gt : 'edit-date' }})?
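Two ways to express geoffeg's query, sketched with pymongo: a $where JavaScript expression (works on any version, but runs per document and cannot use an index), or a 2.2-era aggregation command that projects the comparison and matches on it.

```python
# $where: compare two fields of the same document in JavaScript.
stale = db.foo.find({"$where": "this['create-date'] > this['edit-date']"})

# Aggregation (MongoDB 2.2 form of the command): project a boolean
# comparison of the two fields, then filter on it. Only _id and the
# projected flag come back unless more fields are included.
result = db.command("aggregate", "foo", pipeline=[
    {"$project": {"newer": {"$gt": ["$create-date", "$edit-date"]}}},
    {"$match": {"newer": True}},
])
docs = result["result"]
```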
[22:04:58] <sneezewort> if I create a user in a db does that data get stored in that db?
[22:05:04] <sneezewort> for example if I do...
[22:05:11] <sneezewort> use mydb
[22:05:31] <sneezewort> db.addUser("username", "password")
[22:05:44] <sneezewort> and then I dump and restore mydb to a machine will that username and password go with it?
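To sneezewort's question: with the pre-2.6 auth model, addUser() writes a document into that database's own system.users collection, so it is part of the database's data; whether it travels with a dump/restore depends on that collection being included, which is worth verifying with your mongodump version. A pymongo sketch of where to look (add_user is the older driver helper):

```python
# Create the user, then inspect the credential document it stored.
db.add_user("username", "password")
print(list(db.system.users.find()))
```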
[22:35:04] <Bartzy> Hi
[22:35:10] <Bartzy> Does MongoDB have connections, like MySQL ?
[22:35:26] <Bartzy> And by extension - connection timeout
[22:35:27] <Bartzy> ?
[22:36:57] <Bartzy> For example, I have a worker that runs for very long time - it instantiates a MongoCollection object on start (PHP driver) and when it gets a job it needs to use update to update some document. If the worker doesn't get a job for a while - the connection will drop ? How do I catch that and "reconnect" ?
[22:51:08] <Bartzy> anyone ?
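Drivers generally open connections lazily, and the usual pattern for a long-lived worker is to catch the connection error on the operation itself and retry, rather than trying to keep an idle connection warm. Bartzy's case is PHP (catching MongoConnectionException around the update would be the analogue), but here is the same pattern sketched with pymongo:

```python
import time
from pymongo.errors import AutoReconnect

def update_with_retry(collection, spec, change, attempts=5):
    """Retry an update across dropped or idle-timed-out connections."""
    for i in range(attempts):
        try:
            # older pymongo API; newer drivers spell this update_one/update_many
            return collection.update(spec, change)
        except AutoReconnect:
            if i == attempts - 1:
                raise                 # give up after the last attempt
            time.sleep(2 ** i)        # back off before retrying
```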
[23:38:31] <jrdn> i'm trying to create a messaging system on top of mongo
[23:38:49] <jrdn> normally i'd remove something that was processed but found out that i can't remove anything from a capped collection
[23:38:55] <jrdn> so my question is, how can i update it?
[23:39:01] <jrdn> basically i never want to process something more than twice
[23:39:09] <jrdn> any known way of solidly doing this?
[23:39:48] <Neptu> jrdn: It's on my checklist to use mongo as a queue manager; I read you could use it with some MQ systems
[23:39:58] <Neptu> but have not tried it yet
[23:40:05] <jrdn> i've got it working right now
[23:40:24] <Neptu> cool, what did u use as a queue manager?
[23:40:28] <jrdn> but i'm stuck on: if the subscriber process dies, how do i restart it and pick back up where it left off
[23:40:33] <jrdn> Neptu, 100% mongo
[23:40:39] <Neptu> aha
[23:40:49] <jrdn> tailable capped collection
[23:41:04] <Neptu> aha
[23:41:07] <jrdn> subscriber basically listens and processes messages
[23:41:08] <jrdn> aha?
[23:41:23] <Neptu> means ok
[23:41:43] <Neptu> still need to learn quite a lot about mongo myself
[23:42:03] <Neptu> but I have a question regarding geospatial indexing
[23:43:02] <Neptu> I have like 200GB of records with 2D positions and I was wondering if I can use and abuse the collection system to have organized smaller subsets of data to speed up search
[23:43:17] <jrdn> http://blog.attachments.me/post/9712307785/a-fast-simple-queue-built-on-mongodb similar to that
[23:43:22] <jrdn> and similar problem to what we have too
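A rough sketch of the pattern jrdn describes, in pymongo's older API (find() with the tailable/await_data flags): messages carry a processed flag that is flipped with $set, since capped collections allow in-place updates but not removes or growing documents, and the worker remembers the last _id it handled so it can resume after a crash. Collection and field names are illustrative.

```python
import time

def run_worker(queue, handle, last_id=None):
    """Tail a capped collection and process each message once."""
    while True:
        spec = {"processed": False}
        if last_id is not None:
            spec["_id"] = {"$gt": last_id}   # resume past what was already done
        cursor = queue.find(spec, tailable=True, await_data=True)
        for msg in cursor:
            handle(msg)
            # Flip the flag in place; the field must exist at insert time so
            # the document does not grow (capped collections forbid growth).
            queue.update({"_id": msg["_id"]}, {"$set": {"processed": True}})
            last_id = msg["_id"]
        time.sleep(1)   # cursor died (e.g. nothing matched yet); re-open it
```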
[23:47:08] <Neptu> jrdn: reading
[23:52:58] <Neptu> but this is based on karait??