PMXBOT Log file Viewer


#mongodb logs for Tuesday the 22nd of January, 2013

[00:05:18] <mgallardo> does somebody know which shell version should be used with mongodb version 2.2?
[00:27:52] <redir> mgallardo the shell ships with the db so it should be the same version, no?
[03:00:30] <resting> are there any settings for database size in mongodb?
[03:02:45] <mr_smith> you can make a capped collection. i don't think you can make a capped db.
[03:04:56] <resting> mr_smith: i see…so if there's no cap, it'll grow as large as the hdd can hold?
[03:05:17] <mr_smith> i think that's true.
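
A minimal sketch of the capped collection mr_smith mentions, for anyone reading along; the collection name and the size/max values are only illustrative:

    // create a collection capped at roughly 1 GB and at most 1 million documents
    db.createCollection("events", { capped: true, size: 1024 * 1024 * 1024, max: 1000000 })
    // once the cap is reached, new inserts overwrite the oldest documents,
    // so the collection never grows past "size"
    db.events.insert({ ts: new Date(), msg: "hello" })
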
[03:06:25] <deimos> I currently have about ~720gb of data in a single mysql db, ~128gb of indexes, and I see about 30% of the ops are writes, 60% reads, and the rest system stuff. I'm trying to find a long term solution as the growth rate has really started to pick up, and I wanted to look into using mongo. I assume I'd have to use sharding due to the index sizes and the memory constraints … anyone have similar data sizes who has had a lot of success using mongo?
[03:09:17] <resting> deimos: something i found while browsing http://blog.wordnik.com/12-months-with-mongodb
[03:09:28] <resting> mr_smith: i see...cool..
[03:11:18] <mr_smith> deimos: your indexes are 18% the size of your data? that seems rather high.
[03:11:19] <deimos> resting: thanks, reading now; trying to decide on a long term solution as the data approaches the tb range has been rocky
[03:12:53] <deimos> mr_smith: yeah, and in reality the indexes in mongo would probably get a once over. we're using mysql like a doc store as it is, and that's the data we want to move out and into something long term that can scale. we can keep a reference in mysql and in mongo cut down the number of indexes to 1-2 to help reduce those sizes
[03:13:54] <mr_smith> deimos: what are the pain points? read, write or query performance?
[03:15:12] <deimos> in the initial tests we did, writes were a bottleneck for us. we set up 3 servers and dumped about half the data on them, and then hammered the hell out of it trying to push the other half, and it was having a hard time. granted i think we made some mistakes with the shard key and there were some configuration issues we missed
[03:15:59] <mr_smith> deimos: this is on the mysql side, right?
[03:16:00] <deimos> there was a lot of disk contention, i think because we had to keep the indexes in memory due to the random nature of our access patterns
[03:16:30] <deimos> this was the mongo test we had setup, dumping the data from mysql into mongo
[03:17:13] <mr_smith> deimos: well, mongodb is going to keep indexes in memory. that's why it's fast.
[03:17:49] <mr_smith> however, 128gb of indexes may not be ideal. just keeping those indexes up to date in a bulk load probably buried it.
[03:18:26] <mr_smith> even with a TB of data, mysql should be pretty fast.
[03:18:32] <deimos> yeah, correct me if i'm wrong, the shards themselves will only need to load the index for the data they contain into memory. it seems pretty self evident that would be the case but i want to be sure heh
[03:19:05] <deimos> well, i can't say the data is structured very well, we're fighting a political battle on application architecture too :/
[03:19:15] <mr_smith> deimos: yeah, but it depends on your shard key. choose a bad one and you're going to have a bad time.
[03:21:15] <deimos> yeah i think we need to take some time and figure out the best key to use for sure
[03:22:07] <mr_smith> what were you using for it?
[03:24:21] <deimos> we grab data from a gaming company, most of it is like games/matches fought between 2 teams of folks, and then we have a web based analytics/scoring tool to surface the data/trends/patterns in fun interesting ways, etc..
[03:24:28] <deimos> so we were using the player's ids
[03:25:46] <deimos> most of the data is accessed when someone goes to that player's profile, they see those games, and who they played against, etc… but then we also link to those players and there's a number of aggregations we do, and for building trends we aggregate historical data for like who uses what items the most, or whatever
[03:26:20] <deimos> sorry, typing in the dark, forgive my typos
[03:27:39] <deimos> we figured that way, when new games were added, the idea would be single node queries, since hopefully the games would exist where that player was inserted initially, realizing though that could create hot spots
[03:31:38] <mr_smith> deimos: using IDs is questionable. index locality is good, but query isolation sucks and you're vulnerable to reliability problems. something better would be like player and time, just to spitball.
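
A rough sketch of the compound shard key mr_smith is spitballing (player plus time); the database, collection, and field names here are assumptions, not taken from the log:

    use gamestats
    sh.enableSharding("gamestats")
    // the shard key needs a supporting index
    db.matches.ensureIndex({ playerId: 1, playedAt: 1 })
    sh.shardCollection("gamestats.matches", { playerId: 1, playedAt: 1 })
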
[03:34:02] <deimos> question, when you have say, 3 shards, and you need to add another one, is the idea that when you add it, they start rebalancing right away? and is that something that is throttled well based on the number of reads/writes coming in?
[03:34:26] <deimos> i think that's one thing we ran into: when we added a shard it was going to take days to rebalance due to the writes
[03:55:13] <mr_smith> deimos: i'd have to dig into that one, but you can manually split and move chunks between shards.
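
A hedged sketch of the manual splitting and chunk moving mr_smith refers to, run from a mongos; the namespace, key value, and shard name are made up for illustration:

    // split the chunk containing playerId 5000 at that value
    sh.splitAt("gamestats.matches", { playerId: 5000 })
    // move the chunk containing that key onto another shard
    sh.moveChunk("gamestats.matches", { playerId: 5000 }, "shard0002")
    sh.status()    // inspect the resulting chunk distribution
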
[04:25:11] <jwilliams_> if i already have a sharded mongo cluster, how can i add a replica set to an existing shard server?
[04:25:31] <jwilliams_> i came across this doc https://groups.google.com/forum/#!msg/mongodb-user/F3_XA9RPHVM/I_kyN7HlVLcJ
[04:25:38] <jwilliams_> but that seems to be different from my case.
[04:26:48] <jwilliams_> and other examples, e.g. http://docs.mongodb.org/manual/tutorial/deploy-replica-set/, seem to be done with a fresh one.
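
For reference, a rough outline of just the replica-set initiation part, assuming the existing shard's mongod has been restarted with --replSet rs0 and that the new member hosts exist; updating the cluster's shard configuration afterwards is the part the linked thread discusses and is not covered here:

    rs.initiate()
    rs.add("member2.example.com:27017")
    rs.add("member3.example.com:27017")
    rs.status()    // confirm the new members come up as SECONDARY
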
[06:12:47] <ubersci> Hi #mongodb
[07:03:04] <ragsagar> I am using the aggregation framework to group and count based on date. I achieved this http://dpaste.com/886426/ . But I want to group by dayOfWeek and month as well, not just year. How can I achieve that?
[07:17:25] <ragsagar> got it :)
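
Presumably something along these lines is what ragsagar arrived at: grouping on $year, $month, and $dayOfWeek together. The collection and date field names are assumptions, since the paste is not reproduced here:

    db.events.aggregate(
        { $project: { y: { $year: "$date" }, m: { $month: "$date" }, dow: { $dayOfWeek: "$date" } } },
        { $group: { _id: { year: "$y", month: "$m", dayOfWeek: "$dow" }, count: { $sum: 1 } } }
    )
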
[08:42:07] <oskie> some time after a PRIMARY change, the secondaries start to lag a bit (~10 min), exactly the same lag. Yet there is no high CPU or high I/O on the slaves. What's wrong?
[08:43:39] <oskie> what's usually wrong when the two secondaries lag with exactly the same optimeDate?
[08:45:50] <chrisq> how come mongodb still uses huge amounts of memory even when i drop the only collection i had there?
[08:46:53] <chrisq> also, /var/lib/mongodb/, retains <collection>.<int> files that are GBs in size
[08:48:18] <chrisq> sorry <database>.<int>
[08:49:32] <NodeX> mongo doesn't clean data files and the memory is mmapped
[08:49:46] <NodeX> ergo the Operating system will remove it from the LRU cache
[08:49:50] <chrisq> dropping the database solved it, i'm still curious why the database would use so much memory after the data in the db is actually removed
[08:50:15] <chrisq> ah, right, so how long would i normally wait for that to happen?
[08:50:37] <[AD]Turbo> hola
[08:52:10] <NodeX> depends how many queries you get on your db
[08:52:18] <NodeX> LRU = least recently used
[08:54:24] <chrisq> NodeX: thanks again
[08:54:46] <NodeX> if you want to force it then restart your mongo
[08:58:10] <oskie> if a single secondary is overloaded and lagging, could that cause other secondaries in the same replica set (not overloaded) to be lagging as well?
[08:58:23] <NodeX> it shouldn't cause it
[08:59:03] <oskie> ok, but if the primary is slow, that could cause secondaries to lag (in unison), right?
[09:02:40] <NodeX> yes
[09:15:16] <oskie> But I don't get it. The primary has very little load, little I/O. Still, the secondaries keep lagging more and more. There are no connectivity issues.
[09:21:49] <yarco> what's the opposite operation to $addToSet ?
[09:22:00] <yarco> no $removeFromSet ..?
[09:41:46] <oskie> heh. I stopped one of the secondaries - then the other secondary was able to recover much faster
[09:41:57] <NodeX> yarco : $pull
[09:43:11] <yarco> NodeX, thanks. got it
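
A minimal illustration of the pair; the collection and field names are made up:

    db.users.update({ _id: 1 }, { $addToSet: { tags: "mongodb" } })   // adds the value only if it is not already in the array
    db.users.update({ _id: 1 }, { $pull: { tags: "mongodb" } })       // removes all matching values from the array
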
[10:00:14] <simenbrekken> I've got a bunch of documents with different statuses (pending, approved, rejected) which I need to count each time I search the collection. I've made an index on the 'status' field but it's still pretty slow. Are there other ways of optimizing count?
[10:12:46] <NodeX> no
[10:33:51] <simenbrekken> NodeX: Alright, Guess I have to implement a caching strategy somewhere.
[10:34:22] <NodeX> if you only ever count based on 1 key then you should just keep a count and inc/decr it
[10:34:27] <NodeX> every time you insert/update
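
A rough sketch of the pre-aggregated counter approach NodeX describes, keeping one count per status instead of counting at query time; the "counters" collection and its document layout are assumptions:

    // when a document enters the "pending" status
    db.counters.update({ _id: "status:pending" }, { $inc: { n: 1 } }, { upsert: true })
    // when it moves from "pending" to "approved"
    db.counters.update({ _id: "status:pending" }, { $inc: { n: -1 } })
    db.counters.update({ _id: "status:approved" }, { $inc: { n: 1 } }, { upsert: true })
    // reading the counts is then a point lookup rather than a count() over the collection
    db.counters.find()
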
[10:56:53] <chrisq> having an array of sub-documents, how do i create an index on one of those?
[10:57:02] <chrisq> one of the fields that is
[10:57:17] <NodeX> dot notation
[10:57:52] <NodeX> foo : [{bar:1},{bar:2}] ..... db.bleh.ensureIndex({"foo.bar":1});
[10:58:10] <chrisq> NodeX: i've seen that when used on sub-documents in the docs, but they were all dicts in dicts, not dicts in arrays. would i just ignore the fact that they are in an array?
[10:59:19] <chrisq> {'title':'blah', 'artist':'blu', tracks: [{id:1},{id:2},{id:3},{id:4}] }
[10:59:29] <chrisq> where id is what i want to index
[11:07:51] <NodeX> "tracks.id" : 1
[11:08:23] <chrisq> NodeX: yeah tried that and it seems to be doing something, thanks
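
Putting the pieces together for chrisq's documents; the collection name "albums" is assumed:

    db.albums.insert({ title: "blah", artist: "blu", tracks: [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }] })
    db.albums.ensureIndex({ "tracks.id": 1 })   // multikey index over the sub-documents in the array
    db.albums.find({ "tracks.id": 3 })          // can use the index
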
[11:11:48] <SidSharma> #mongodb
[11:39:30] <NodeX> random
[11:42:19] <mollerstrand> he's not wrong
[12:15:20] <jer`> how to set slaveOk() permanently?
[12:23:29] <MatheusOl> jer`: No way
[12:23:40] <MatheusOl> jer`: Some drivers have this configuration though
[12:24:09] <MatheusOl> jer`: On shell, I think you can use ~/.mongorc.js
[12:24:52] <jer`> No way to allow reading on secondary nodes!
[12:24:57] <jer`> strange
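
A minimal sketch of MatheusOl's shell suggestion: putting the call in ~/.mongorc.js means every new mongo shell session allows reads from secondaries, which is as close to "permanent" as the shell gets; drivers have their own read-preference settings:

    // ~/.mongorc.js -- executed automatically when the mongo shell starts
    rs.slaveOk()
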
[12:43:35] <tom0815> hello, is it possible to configure the timeouts for the replica set heartbeat?
[13:12:54] <jeremy-> Is there any way to create a collection without any data and ensureIndex it (awaiting data)? As i understand it, collections are created on the first data input. I'm just working out the best way to ensureIndex as part of the initial data input process on new collections that are created dynamically (daily date based collections).
[13:13:21] <Derick> ensureIndex will also create the collection
[13:13:32] <jeremy-> Awesome
[13:13:36] <jeremy-> thats just what i need
[13:13:46] <jeremy-> why didn't i try that :D sorry for the stupid question
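
A small illustration of Derick's point, applied to the daily-collection case jeremy- describes; the collection and field names are assumptions:

    // the collection does not exist yet; ensureIndex creates it, empty, with the index in place
    db.getCollection("events_20130122").ensureIndex({ ts: 1 })
    db.getCollection("events_20130122").count()   // 0, but the index is ready before the first insert arrives
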
[13:14:41] <chrisq> what is the recommended technology for text search? sphinx? lucene?
[13:14:54] <NodeX> http://www.v3.co.uk/v3-uk/news/2238059/amazon-web-services-unveils-memoryintensive-cloud-service
[13:15:38] <Derick> chrisq: elastic search/solr
[13:16:21] <chrisq> Derick: isn't solr part of lucene?
[13:16:55] <Derick> no
[13:17:01] <Derick> or actually, it might be again
[13:17:10] <Derick> elastic search > solr then
[13:17:28] <Derick> projects had split up (solr and lucene) but they might have gotten together again
[13:17:33] <chrisq> Derick: thanks, I'll be looking at it then :)
[13:19:37] <NodeX> lucene is the library / way of indexing things
[13:19:47] <NodeX> solr is the software that queries it
[13:20:19] <kali> technically, solr and elastic search are both using lucene
[13:20:19] <chrisq> Elasticsearch is a distributed, scalable nosql search server built on lucene.
[13:20:34] <NodeX> correctamundo
[13:20:48] <kali> lucene is just a library, not a server (or an executable of any kind)
[13:21:07] <NodeX> ^^
[13:21:16] <NodeX> [13:18:21] <NodeX> lucene is the library / way of indexing things
[13:21:57] <kali> ok, ok :)
[13:22:04] <kali> use the aggregation framework and shut up :)
[13:22:08] <NodeX> for me Elasticsearch is a little green but a nice project and fairly close to Mongodb in terms of structure
[13:22:49] <kali> yeah, the fact that high availability and scalability have been built in since the beginning
[13:23:09] <NodeX> solr 4.0 cloud kinda deals with all the scaling problems it had
[13:23:26] <chrisq> i also read someone suggesting to split the text field into an array of lower case words and then index that array
[13:23:31] <kali> but i have to restart instances of ES a bit too much for my taste
[13:23:40] <chrisq> in mongodb, any experiences with such a solution?
[13:23:54] <NodeX> chrisq : it really depends how deep you need your search
[13:23:58] <kali> chrisq: it will work.. to a point.
[13:24:17] <chrisq> basically it's just album names, artist names and track names
[13:24:25] <kali> chrisq: you can even stem the tokens and have a list of stop words if you want
[13:24:28] <chrisq> however there are quite a few of them
[13:24:54] <kali> no fuzzy search either...
[13:24:55] <NodeX> let me re-phrase
[13:25:05] <NodeX> it depends what you need from your searching
[13:25:35] <jeremy-> E.g. what if you wanted spelling-mistake match suggestions in search as well
[13:25:41] <kali> if the search is a critical path in your app usage, i recommend you design with an external search engine from the beginning
[13:26:03] <kali> if the search is just a "nice to have" feature, you can try DIY with mongodb
[13:26:05] <NodeX> you can build a spelling engine in anything
[13:26:16] <NodeX> don't forget in mongo 2.4 FTS is arriving
[13:26:31] <jeremy-> I haven't been watching the roadmap, sounds interesting
[13:26:35] <NodeX> it will be somewhere in the middle of a full blown indexer and what's currently available in mongo
[13:26:59] <jeremy-> I was just thinking, if you were going to use a python or whatever natural language toolkit to get typos then search mongodb, i thought one of the out of the box search engines might do it more efficiently
[13:27:07] <jeremy-> than sending like 20-50 possible searches per query
[13:27:26] <jeremy-> wow, that sounds amazing
[13:27:30] <jeremy-> i gotta read on that nodeX
[13:27:34] <NodeX> ;)
[13:27:46] <chrisq> i dont need fuzzy searches
[13:28:04] <chrisq> i demand that the users can at least spell a word correctly
[13:28:15] <jeremy-> well you could probably use autocomplete and guide them too
[13:28:24] <NodeX> I would put a list of search terms in an array in the document, normalize them and use that
[13:28:26] <chrisq> jeremy-: true, good idea
[13:28:38] <jeremy-> yep, exactly
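
A rough sketch of the normalized keyword-array approach NodeX suggests; the collection and field names are made up, and this only gives exact-word matching, not fuzzy or stemmed search:

    db.albums.insert({
        title: "Blah Blah",
        artist: "Blu",
        keywords: ["blah", "blu"]              // lower-cased, de-duplicated search terms
    })
    db.albums.ensureIndex({ keywords: 1 })     // multikey index on the array
    db.albums.find({ keywords: "blah" })
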
[13:29:16] <jeremy-> I think mongo (or even a sql based database) would be a pretty nice solution for what you are saying personally, but i'm pretty inexperienced. I just can't see a strong case for a full text search engine
[13:30:28] <jeremy-> I have a mongodb with like 220 million search terms and metrics like volume/cpc, and mongodb returns results in a tiny fraction of a second. haven't measured it but it's like 0.1s or less i assume
[13:31:00] <kali> 0.1 is not that tiny...
[13:31:01] <jeremy-> you should just bench test a bunch of stuff and see what you like, have a field day / practice day
[13:31:23] <jeremy-> well that's not used by end users, it's used to populate a 100k or whatever list
[13:31:30] <jeremy-> I'm assuming it's much less
[13:31:55] <NodeX> new relic ftw
[13:32:01] <NodeX> measure that performance :D
[13:32:38] <jeremy-> I just use it to attach metadata to dynamic lists that are much smaller, but i can run a batch job to update like 100k records in minutes which is querying from my 220million records
[13:34:09] <jeremy-> while i'm talking to people who are obviously much better/more experienced than me
[13:36:15] <jeremy-> I've been trying to benchtest some python to determine potential improvements for my application. Is there any known issue where, if you hammer a mongo server with like 100k writes in a short period, the speed at which the writes occur actually slows down over time (just slightly)?
[13:36:46] <jeremy-> I've checked all sorts of things using Python's debugging framework and i can't see a memory based reason for the slowdown. I was thinking it might just be a small bottleneck as mongo gets a thrashing
[13:38:40] <jeremy-> Or i guess, i wonder if i can just log the speed at which mongo gets requests and processes them on the database end so i can measure it there, (for some reason only thought of that now..doh)
[13:39:03] <NodeX> lol
[13:43:19] <jeremy-> false alarm, there are a plethora of options at the database level
[13:43:50] <jeremy-> Just need to decide if i can get the info from the database profiler or if i should go third party
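
A minimal sketch of the built-in profiler, one of the database-level options jeremy- is weighing; the 100 ms threshold is arbitrary:

    db.setProfilingLevel(1, 100)                           // profile operations slower than 100 ms
    db.system.profile.find().sort({ ts: -1 }).limit(5)     // the most recent slow operations
    db.setProfilingLevel(0)                                // turn profiling off again
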
[13:48:45] <firefux> is mongo better than couchbase?
[13:51:59] <kali> define "better"
[13:52:28] <chrisq> i would assume both are better for different tasks
[13:55:04] <NodeX> tea is better than coffee for drinking when you want tea
[13:55:17] <chrisq> from what i heard couch is good if you have huge amounts of servers that you want to sync or want it embedded on mobile devices
[13:55:53] <kali> chrisq: couchbase or couchdb ? :)
[13:57:09] <chrisq> kali: oops yeah, that would be couchdb i think :)
[13:59:02] <firefux> is mongo a good choice for something like a billing application?
[13:59:26] <kali> probably not
[13:59:37] <kali> but couch* is probably not either
[13:59:44] <chrisq> firefux: i'm not very experienced with mongo, but i think i'd use a traditional sql server for that
[13:59:52] <kali> lots of transactions -> pick a transaction engine
[13:59:57] <kali> that would mean SQL
[14:00:13] <firefux> yeah, and reporting
[14:11:37] <krawek> firefux: the new aggregation framework works really well for reporting
[14:12:54] <kali> krawek: the AF has limitations that could be problematic for generating big reports
[14:13:10] <kali> or even small reports on a big dataset, if you're unlucky
[14:17:05] <balboah> anyone running mongodb in amazon ec2 production? Is the recommendation about raid10 ebs still valid when they also have an enhanced IOPS option?
[14:20:07] <oskie> balboah: sure do.. for us, provisioned IOPS gives less performance
[14:20:25] <oskie> balboah: unless you go >= 1200 or maybe 1500 provisioned IOPS
[14:21:15] <oskie> it's a different charge though
[14:21:39] <oskie> balboah: and we use RAID0 right now (2 drives - I wouldn't go beyond that)
[14:21:56] <oskie> maybe RAID10 would work fine, but I wouldn't know
[14:22:20] <balboah> oskie: alright, so you leave the iops option out altogether?
[14:22:50] <oskie> balboah: yeah... but I guess it also depends on your type of load...our mongodb load is very much limited by I/O