
#mongodb logs for Monday the 25th of August, 2014

[02:28:13] <culthero_> is it common that a standalone instance on 40m records with a fulltext index takes 20-30 minutes to query on a very beefy machine?
[02:30:00] <Boomtime> what is the query?
[02:30:25] <Boomtime> fyi, that sounds like it isn't using the index
[02:30:33] <culthero> db.collection.find({$text: {$search: "obama"}});
[02:30:54] <culthero> well some queries return relatively quickly
[02:31:04] <culthero> db.tweets.find({ $text: { $search: '"Jeter"' }}).explain();
[02:31:13] <culthero> "nscannedObjects" : 1625,
[02:31:20] <culthero> these are tweets it's searching over.
[02:31:29] <Boomtime> what is "n" for that explain?
[02:31:32] <Boomtime> same?
[02:31:40] <culthero> yes
[02:31:42] <Boomtime> good
[02:31:58] <Boomtime> what is it for the "obama" case?
[02:31:59] <culthero> looking for a common word (let's say "the F word")
[02:32:16] <culthero> just a minute, waiting on the results
[02:32:46] <Boomtime> erg :(
[02:33:10] <culthero> yeah, I had waited 10 / 20 minutes for the f word to come back over tweets
[02:33:16] <culthero> it did not return anything before I cancelled the op
[02:34:05] <culthero> my databases vsize is 177gb
[02:34:22] <culthero> and locked% is currently over 100%
[02:34:29] <Boomtime> hmm.. is it so common that the index threw it away?
[02:34:31] <culthero> (looking at mongostat)
[02:34:39] <culthero> Trying with "Bieber"
[02:35:25] <culthero> I am somewhat at a loss, as this worked very well originally, and was hoping not to have to deal with sharding of data per 5-6 days of twitter data :S
[02:35:39] <culthero> several thousand faults too
[02:35:41] <culthero> hm
[02:35:56] <culthero> db.tweets.find({ $text: { $search: '"Bieber"' }}).explain();
[02:36:04] <culthero> "cursor" : "TextCursor",
[02:36:08] <culthero> "n" : 104943,
[02:36:16] <culthero> "nscanned" : 105046,
[02:36:28] <culthero> "millis" : 120662,
[02:36:55] <Boomtime> hmm.. i would have expected better
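
The explain output quoted above is the 2.6-era format. As a rough sketch of the kind of check Boomtime is asking for (the tweet body field name here is hypothetical), a phrase search against a text index and its plan look like:

    // a text index must exist for $text to work at all (field name hypothetical)
    db.tweets.ensureIndex({ text: "text" });

    // phrase search, then inspect the plan
    db.tweets.find({ $text: { $search: '"Bieber"' } }).explain();

    // in the 2.6 explain output:
    //   "cursor"   should read "TextCursor" when the text index is used
    //   "n"        documents returned; "nscanned" documents examined
    //   "millis"   total query time in milliseconds
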
[02:37:17] <culthero> yeah. I have just recently cut the TTL in half, from 15 days to .. 7 days
[02:37:18] <Boomtime> is this db idle?
[02:37:24] <Boomtime> oh
[02:37:28] <culthero> no it is getting data inserted constantly
[02:37:39] <Boomtime> and it's deleting constantly too?
[02:37:41] <culthero> 30-40 inserts a second
[02:37:49] <culthero> yes
[02:38:00] <Boomtime> what are the mongostat numbers?
[02:38:06] <Boomtime> one full line please
[02:40:12] <culthero> http://pastebin.com/BdJN06di
[02:41:04] <Boomtime> and that is standalone?
[02:41:29] <Boomtime> i.e this is not a replica-set
[02:41:49] <culthero> yessir
[02:41:54] <culthero> on a linode 160gb
[02:41:55] <Boomtime> there is no way that lock% is from those inserts
[02:42:07] <Boomtime> i think you're still deleting from the ttl change
[02:42:17] <Boomtime> how many documents would that have deleted potentially?
[02:42:19] <culthero> so should I stop the inserts, delete?
[02:42:23] <culthero> 18 million
[02:42:24] <culthero> roughly
[02:42:32] <Boomtime> nup, the inserts should be doing little
[02:42:38] <Boomtime> what are the disks?
[02:42:41] <culthero> SSD
[02:42:49] <Boomtime> wowsa..
[02:42:50] <culthero> Linode $160/mo plan
[02:43:02] <Boomtime> oh, sorry, didn't recognise the reference
[02:43:14] <culthero> 2.6.3 is the version of mongo
[02:43:19] <culthero> the database at one point had 80m tweets in it
[02:43:28] <culthero> which was 77 or so gb
[02:48:01] <culthero> lock% is still that high
[02:48:04] <culthero> even after stopping streaming
[02:48:06] <Boomtime> @culthero, can you run db.tweets.count() in a shell a few times, a couple seconds apart
[02:48:22] <Boomtime> here is a thing: TTL deletes do not show up in mongostat
[02:49:06] <Boomtime> they show up in replication but unless you have a secondary to check with then the effect of TTL maintenance is only seen in the resulting effects such as lock%
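
For reference, a shell-level way to watch the effect Boomtime describes (a sketch; the metrics.ttl counters are cumulative server-wide totals):

    // TTL deletions do not appear in mongostat's delete column, but serverStatus
    // keeps cumulative counters for the TTL monitor:
    db.serverStatus().metrics.ttl
    // -> { "deletedDocuments" : NumberLong(...), "passes" : NumberLong(...) }

    // polling the collection count a few seconds apart shows the net effect
    // of inserts minus TTL deletes:
    db.tweets.count()
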
[02:49:24] <culthero> http://pastebin.com/w3kj0ssk
[02:49:41] <culthero> I feel like something else is going on but I don't know what
[02:50:45] <Boomtime> i think you have a fixed IO allocation, say, 1000 or so IOPS by the looks of it (perhaps 2000) and you are hitting it
[02:50:54] <culthero> on the user?
[02:50:58] <Boomtime> your TTL is still deleting docs
[02:51:03] <culthero> I am positive I don't
[02:51:04] <Boomtime> hard-disk
[02:51:16] <culthero> I have had 150k-200k iops
[02:51:21] <culthero> shown on the linode dashboard
[02:52:36] <Boomtime> http://serverbear.com/13-linode-2gb-linode
[02:52:48] <Boomtime> that's the 2GB linode so smaller than yours
[02:52:58] <Boomtime> IOPS can easily be as high as you say for cached
[02:53:14] <Boomtime> but mongodb needs to flush to disk or it can't trust the journal
[02:53:24] <Boomtime> so, cached doesn't matter
[02:53:40] <Boomtime> sequential write on the 2GB was 1079
[02:53:51] <culthero> http://i.imgur.com/ChgSHkH.png
[02:54:58] <culthero> but, 30 inserts of 500b a second.. roughly
[02:55:03] <culthero> 500 bytes*
[02:55:16] <Boomtime> i'm not sure what you are showing me
[02:55:18] <culthero> the memory never gets too crazy too, stays under 2gb
[02:55:23] <culthero> thats htop output of mongod
[02:56:32] <Boomtime> yeah...
[02:57:51] <culthero> here is the iops of the last 24h
[02:57:51] <culthero> right now the database is 20gb, here
[02:57:52] <Boomtime> also, what do you mean the memory stays under 2GB, that output shows resident at 9...
[02:58:01] <culthero> http://i.imgur.com/dw6f076.png
[02:58:51] <culthero> http://i.imgur.com/YaPiMi8.png
[02:59:48] <culthero> I just don't know what is going on, I feel like mongo let me go up to a 40-50gb database without too many problems, then all of a sudden everything stopped, and nothing I'm doing makes sense of what is slowing everything down
[03:00:16] <culthero> thanks for your insight, I am just curious if I have to wait until the TTL expiration is completed
[03:00:23] <culthero> in order to see if the size of this database is still appropriate
[03:01:27] <Boomtime> i really think that graph is a bit optimistic/omitting the truth
[03:01:29] <Boomtime> http://serverbear.com/2427-linode-16gb-ssd-linode#benchmarks
[03:01:55] <Boomtime> that's a benchmark of your system, showing again those similar big peaks for cached IO, but not for actual
[03:02:33] <Boomtime> and your count() checks showed more than a few thousand documents being deleted per check
[03:02:52] <Boomtime> each one of those could be 2 writes, and 1 read
[03:04:24] <Boomtime> what does iotop say?
[03:05:34] <culthero> sec
[03:06:27] <culthero> http://i.imgur.com/ObXD8IY.png
[03:07:40] <culthero> So what I guess I am asking, is that if I need to somehow store lets say.. a rolling log of 30 days of tweets, with tweets coming in between 30 and 80 tweets per second, expiring the same amount, how do I factor what my hardware costs will be (based on one fulltext index on the collection)
[03:08:21] <culthero> or decide if it's even feasible on a standalone instance? like I didn't imagine I was going to be I/O bound
[03:08:24] <Boomtime> an empirical test like what you are doing is the best way, but right now you are in steady state
[03:08:32] <Boomtime> your system is still adjusting to the change you made
[03:08:48] <Boomtime> you are NOT in steady state
[03:08:52] <Boomtime> damn, bad typo
[03:09:20] <culthero> ah
[03:09:24] <Boomtime> i realize that mongostat is effectively lying to you about that, the delete column does not show deletes
[03:09:29] <culthero> nod
[03:09:34] <culthero> would I speed up the process by deleting close to what is expired?
[03:09:42] <Boomtime> i.e. it shows USER deletes, not TTL deletes
[03:10:32] <Boomtime> the slowness is in the document delete itself, not the fact that the TTL is doing it (though that is slightly more expensive)
[03:10:43] <Boomtime> if you really must speed it up... drop the collection
[03:10:54] <Boomtime> but that will delete everything of course :p
[03:10:57] <culthero> yeah
[03:11:09] <culthero> and it takes .. what 7 days to rebuild it again
[03:11:18] <culthero> currently the TTL is set to 7 days, down from 30 days and 15 days
[03:11:39] <Boomtime> right, so it is merrily deleting half the documents in your database based on a cursor.find() result
[03:12:20] <Boomtime> it will take awhile, you can probably work it out based on how many documents you had, or by running a find().count() on what should be kept
[03:12:44] <culthero> Will a db.repairDatabase() also fix the situation?
[03:12:47] <Boomtime> at this point, it will almost certainly be better to just wait for steady-state to be reached
[03:13:11] <Boomtime> no, db.repairDatabase() right now would make it much much worse
[03:14:34] <culthero> alright
[03:14:54] <culthero> I don't understand why, but I am curious how long I should have to wait for the stable state
[03:14:55] <culthero> 7 days?
[03:15:26] <Boomtime> you can probably work it out.. is your insert rate stable?
[03:16:39] <culthero> insert rate (according to the other table recording it) varies from 75,000 an hour to 300k an hour
[03:17:33] <culthero> at max 83/s
[03:18:02] <culthero> well, I guess I also do an update on another table at 83 times a second
[03:18:13] <culthero> and I will also do a second query on a words table against the words in the tweet.
[03:18:52] <Boomtime> ok, so... max 5 million docs per week?
[03:19:30] <Boomtime> anyway, whatever you insert in 7 days is your target .count()... running .count() in the shell shows you how quickly you are approaching your target
[03:21:08] <Boomtime> if those numbers are right, and you currently have ~37 million documents, your TTL has a long way to go to catch up after the change
[03:21:57] <Boomtime> back-of-the-envelope calcs suggest you are looking at around 5 hours before steady-state is achieved :(
[03:22:16] <culthero> i think I will be going down to 20m documents
[03:22:16] <culthero> not 5
[03:23:25] <Boomtime> ok, then it is much closer
[03:24:10] <culthero> like 3.5 - 6m documents a day
[03:24:24] <culthero> my disk swap is also full
[03:24:25] <Boomtime> it depends a lot on the deletion-rate too, which i can't really tell, but you could measure
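
A rough sketch of the measurement Boomtime suggests, using hypothetical numbers for the sampling interval and the target document count:

    // sample the count twice, a known interval apart
    var first = db.tweets.count();
    sleep(60 * 1000);                            // mongo shell helper: pause 60 seconds
    var second = db.tweets.count();

    // net shrink rate (TTL deletes minus concurrent inserts)
    var shrinkPerSec = (first - second) / 60;

    var target = 20 * 1000 * 1000;               // desired steady-state document count
    print("rough hours to steady state: " + (second - target) / shrinkPerSec / 3600);
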
[03:30:25] <culthero> in general, if I sharded this over 4 20-dollar servers vs one 160-dollar one, am I potentially going to be able to keep a longer window of data?
[03:37:40] <Boomtime> @culthero, unless the sum of the hardware in those 4 servers is greater than the single server you had, then no, it will not improve anything
[03:40:42] <culthero> Don't you mean the i/o?
[03:40:56] <culthero> aren't you basically saying I'm i/o bound based on the number of inserts / deletes I would be doing
[04:00:34] <culthero> @Boomtime, thank you for your input, I'll check back in later
[04:14:19] <huleo> we have several types of content that share most of their fields, but not all of them - several are unique
[04:14:23] <huleo> one collection or many?
[04:14:33] <huleo> ("type" field vs. separate collection)
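
huleo's question gets no answer in this log; for illustration, the single-collection variant with a "type" discriminator field (collection and field names hypothetical) would look something like:

    // shared fields at the top level, type-specific fields simply varying per document
    db.content.insert({ type: "article", title: "A", body: "...", author: "bob" });
    db.content.insert({ type: "video",   title: "B", url: "...",  duration: 120 });

    // queries can span every type or filter to one
    db.content.find({ title: "A" });
    db.content.find({ type: "video", duration: { $gt: 60 } });
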
[07:18:43] <Zelest> if I have a replicaset of 3 nodes.. and shut down a secondary member.. will that affect the other nodes in any way? or will operations continue as before?
[07:18:52] <Zelest> (shutting it down for upgrade reasons)
[07:19:30] <joannac> Zelest: should be fine, you'll see more logging as the other 2 nodes report they can no longer see the one you shut down
[07:19:42] <Zelest> ah yeah
[07:19:56] <Zelest> thanks
[07:20:15] <joannac> what's your write concern?
[07:20:35] <Zelest> whatever the default is in the PHP client I think
[07:21:39] <joannac> w:1
[07:21:44] <joannac> yeah, you should be fine
[07:21:57] <joannac> are you specifying a read concern?
[07:22:23] <Zelest> yeah, at some places in the app
[07:22:45] <joannac> of what?
[07:23:00] <joannac> also, you have a primary and secondary still up, right?
[07:23:00] <Zelest> mostly it's default.. where it's consistency dependent, it reads from the primary.. on others where performance is preferred, secondaries.
[07:23:04] <Zelest> yeah
[07:23:11] <joannac> okay
[07:23:17] <joannac> so you might see extra read load
[07:23:36] <joannac> since the secondary reads previously went evenly to 2 secondaries, but are now going to the only secondary left
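
A minimal sketch of the sequence being discussed, assuming a shell connected to the secondary that is being taken down for the upgrade:

    // on the secondary being upgraded
    db.getSiblingDB("admin").shutdownServer();

    // from either remaining member, confirm a primary is still present
    rs.status().members.forEach(function (m) { print(m.name + "  " + m.stateStr); });
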
[07:23:43] <Zelest> yeah
[07:25:16] <Zelest> some tiny bits of the company still uses MySQL...
[07:25:20] <Zelest> give me 6 months and they're gone ;)
[07:26:59] <Zelest> joannac, also, if I run a mongodb node on a server with 1 core vs a server with 2 cores.. will mongodb benefit from an extra core?
[07:43:44] <Zelest> aw, the MMS agent isn't available for FreeBSD? :(
[09:16:09] <sgo11> hi, if I use mongodb with nodebb, how fast will the full-text search be? thanks.
[09:30:24] <mccajm> Are there going to be any issues with running a replica set between two datacentres 80ms apart? Ideally the secondary datacentre would be a priority 1 slave.
[11:18:06] <EXetoC> can raw commands be sent using the C driver?
[11:18:39] <EXetoC> I might've stumbled upon something like that before, but it's not exactly documented, is it?
[11:26:18] <EXetoC> mongoc_client_command perhaps
[11:26:57] <kali> EXetoC: commands are implemented as a find on a "magic" collection named "$cmd"
[11:37:01] <EXetoC> there it is. I had to grep like this: \\\$cmd
[11:37:20] <EXetoC> kali: thanks. hopefully it'll simplify wrapper development, but I might be wrong
[11:44:17] <EXetoC> actually, the body of mongoc_collection_aggregate for example is fairly large, but others are simple
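
kali's point can also be seen from the shell; a small sketch (db.runCommand is the normal spelling, the "$cmd" query is the wire-level detail that drivers such as the C driver's mongoc_client_command build on):

    // the usual way to issue a raw command
    db.runCommand({ ping: 1 });

    // roughly what happens underneath: a single-document query on the
    // "$cmd" pseudo-collection
    db.getCollection("$cmd").findOne({ ping: 1 });
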
[12:00:42] <dragoonis> What are the differences between doing map/reduce in mongodb or doing it in hadoop ? The mongodb says you can send mongodb data to hadoop for map/reduce if you like.
[12:01:52] <kali> dragoonis: mongodb is designed mostly for small, fast queries. hadoop map/reduce is oriented towards hour-long batch processing
[12:02:08] <dragoonis> Roger that. Mine (atm) is quick, fast queries.
[12:02:18] <kali> dragoonis: map/reduce relevance in mongodb is aguable, imho
[12:02:27] <kali> arguable
[12:02:33] <dragoonis> I'm trying to pipe lots of data into mongodb and then use $.aggregate() to pull out data based on things like "date range" and "area_id"
[12:02:49] <dragoonis> using MySQL for this is no longer feasible.. too damn slow
[12:03:38] <kali> dragoonis: aggregation is faster than m/r so anything that can be done that way should be that way
[12:04:12] <dragoonis> kali, what do you think about this? https://gist.github.com/dragoonis/3a6a4b2b9bcc1543698b
[12:05:35] <kali> dragoonis: well, it all depends on what are your performance and scalability expectations on this use case
[12:06:19] <kali> dragoonis: heavy usage of aggregation (or worse, m/r) on a large chunk of data is expensive
[12:06:32] <dragoonis> kali, expensive with what measurement ?
[12:06:39] <dragoonis> CPU? ram?
[12:06:48] <dragoonis> query time ?
[12:06:50] <kali> dragoonis: cpu time, latency
[12:07:14] <kali> dragoonis: i/o too, if your dataset does not fit in RAM
[12:07:21] <dragoonis> noted.
[12:08:41] <kali> dragoonis: so when you hit this wall, you may have no choice but to redesign and refactor so that everything you need fast (because there is a user waiting for a page to load) is pre-computed
[12:10:36] <dragoonis> kali, we pre-compute everything on the "date range". the problem we want to solve is letting users choose an arbitrary date range on a date picker, which means we lose the opportunity to pre-process.
[12:10:38] <dragoonis> any thoughts on that ?
[12:12:09] <kali> not much. not an easy use case
[12:12:35] <kali> dragoonis: throwing more hardware at it may be the answer :)
[12:14:36] <dragoonis> kali, true.
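
The gist itself is not reproduced in this log, but the kind of pipeline dragoonis describes (filter on a date range and an area_id, then roll up) might look like the following; collection and field names are hypothetical:

    db.events.aggregate([
        { $match: {
            area_id:    42,
            created_at: { $gte: ISODate("2014-08-01"), $lt: ISODate("2014-08-25") }
        }},
        { $group: { _id: "$area_id", total: { $sum: 1 } } }
    ]);

    // an index covering the $match keys keeps the pipeline from scanning the whole collection
    db.events.ensureIndex({ area_id: 1, created_at: 1 });
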
[13:43:20] <flok420> I got this back from mongo: { result: [ { _id: null, total: 90 } ], ok: 1.0 } how do I get the value for "total" using the c++ api from this mongo::BSONObj instance?
[14:35:55] <flok420> I got this back from mongo: { result: [ { _id: null, total: 90 } ], ok: 1.0 } how do I get the value for "total" using the c++ api from this mongo::BSONObj instance?
[15:25:47] <feathersanddown> Hi, if I add an object to an array, is this an update or save operation?
[15:27:51] <ejb> Can anyone recommend a way to store "open hours" for a business? I need to be able to query currently open businesses.
[16:11:32] <saml> ejb, Date?
[16:11:37] <saml> oh
[16:11:59] <saml> yah normalize Date so that it'll always be 0000-01-01T
[16:12:50] <ejb> I also found a method that stores open and close times in minutes
[16:12:58] <ejb> then use gte lte
[16:13:27] <ejb> hours: { mon: { open: xxx, close: xxx }, tue: { ... }, ... }
[16:14:02] <saml> yah that's good
[16:14:15] <saml> no maybe not good
[16:14:32] <saml> let's say client's today is monday . but in UTC, it's sunday
[16:15:56] <saml> nevermind everything can be in UTC in mongodb
[16:16:01] <saml> and you convert
[16:16:59] <cheeser> i'm not sure UTC even matters here.
[16:17:11] <cheeser> you're talking hour of the day
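
A sketch of the minutes-since-midnight layout ejb settles on, with the "currently open" query; names and numbers are hypothetical, and the weekday/timezone handling is left to the application as cheeser notes:

    // open/close stored as minutes since local midnight, per weekday
    db.businesses.insert({
        name:  "Example Cafe",
        hours: { mon: { open: 480, close: 1020 } }    // 08:00 - 17:00
    });

    // "open right now", assuming the app has computed the local weekday and minute
    var nowMinutes = 600;                             // 10:00
    db.businesses.find({
        "hours.mon.open":  { $lte: nowMinutes },
        "hours.mon.close": { $gte: nowMinutes }
    });
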
[18:55:31] <culthero> Howdy, will sharding a standalone to replicasets allow me to scale an application that has high locking percentages when going over a specific size, given a uniform distribution of data across shards?
[18:57:40] <culthero> with non growing document sizes
[20:02:25] <testerbit> is there an issue when comparing strings ie: var states = doc.State ?
[20:59:09] <bob_11> hey guys, quick question: I am trying bulk updates of existing documents in a collection using UnorderedBulkOperation. it was my understanding this was the fast way to do bulk updates, and while the wrapper function returned quickly, the database is still handling the bulk update (~40k updates) a half hour later
[20:59:11] <bob_11> my question is: is it normal for UnorderedBulkOperation to continue in the database for a "long" time?
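
bob_11's question is not answered in this log; for reference, the 2.6 shell form of an unordered bulk update looks roughly like this (filters and update documents hypothetical). In the shell, execute() blocks until the server acknowledges the batch, so behaviour in a driver wrapper may differ:

    var bulk = db.things.initializeUnorderedBulkOp();
    bulk.find({ status: "pending" }).update({ $set: { status: "done" } });
    bulk.find({ _id: 123 }).updateOne({ $inc: { counter: 1 } });
    var result = bulk.execute();     // returns per-batch counts once acknowledged
    printjson(result);
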
[21:12:11] <ss-> clear
[21:13:09] <ss-> Can anyone help me? I am unable to launch mongod on ec2
[21:13:31] <ss-> it cant bind to my elastic i
[21:13:35] <ss-> ip*
[21:18:12] <mango_> anyone know why ensureIndex() can kill a standalone shard?
[21:24:58] <mango_> ?
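
No answer appears in the log; one plausible factor in 2.x is that a default (foreground) ensureIndex holds a database-wide lock for the whole build, whereas a background build does not, at the cost of a slower build:

    // foreground build: blocks other operations on the database until done
    db.bigcollection.ensureIndex({ field: 1 });

    // background build: slower overall, but the database stays available
    db.bigcollection.ensureIndex({ field: 1 }, { background: true });
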
[22:57:38] <melvinrram> I'm using mongoid. If a collection has a field called 'emails' of type Array, how would do a search for all documents that contain value 'blah@blah.com' in the email field? I'm trying to create a custom validator that ensures uniqueness of values that are going into the 'emails' field.
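
In the shell, an equality match on an array field already does what melvinrram describes; a sketch (collection name and someId are hypothetical), and a Mongoid where() clause should translate to the same query document:

    // matches any document whose emails array contains this exact value
    db.people.find({ emails: "blah@blah.com" });

    // uniqueness check inside a custom validator: exclude the document being saved
    db.people.count({ emails: "blah@blah.com", _id: { $ne: someId } });
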
[23:23:58] <feathersanddown> How to write a 256kb fixed size binary file in mongodb ??? as a normal class attribute ???
[23:24:30] <cheeser> gridfs
[23:24:57] <feathersanddown> but GridFS reserves 16MB anyway, right?
[23:26:00] <feathersanddown> file: { name: "my_file.hs", data: <binary_data>, owner: "an user" } <--- is something like this possible? or is there something else to take into consideration?
[23:28:45] <feathersanddown> because BSON saves the whole "file" object as binary, I think, is that correct??
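
For context: the BSON document limit is 16 MB, so a 256 KB blob fits comfortably inline as BinData, roughly as feathersanddown sketches; GridFS does not reserve 16 MB per file (it splits files into chunks of roughly 255-256 KB) and only becomes necessary for data that may exceed the document limit. A shell sketch with hypothetical names:

    // inline storage: the document, binary included, must stay under 16 MB
    db.files.insert({
        name:  "my_file.hs",
        owner: "an user",
        data:  BinData(0, "aGVsbG8gd29ybGQ=")    // base64-encoded bytes
    });

    db.files.findOne({ name: "my_file.hs" });
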