[02:28:13] <culthero_> is it common that a standalone instance on 40m records with a fulltext index takes 20-30 minutes to query on a very beefy machine?
[02:35:25] <culthero> I am somewhat at a loss, as this worked very well originally, and I was hoping not to have to shard the data for every 5-6 days of twitter data :S
[02:48:04] <culthero> even after stopping streaming
[02:48:06] <Boomtime> @culthero, can you run db.tweets.count() in a shell a few times, a couple seconds apart
[02:48:22] <Boomtime> here is a thing: TTL deletes do not show up in mongostat
[02:49:06] <Boomtime> they show up in the replication stream, but unless you have a secondary to check against, the effect of TTL maintenance is only visible indirectly, e.g. as lock%
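Since TTL deletions only show up indirectly, a quick way to watch them from the shell is to sample the collection count over time. A minimal sketch, assuming the tweets collection mentioned above:

    // print the document count a few times, a couple of seconds apart;
    // a steadily falling count means the TTL monitor is still catching up
    for (var i = 0; i < 5; i++) {
        print(new Date() + "  count: " + db.tweets.count());
        sleep(2000);   // milliseconds
    }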
[02:59:48] <culthero> I just don't know what is going on. I feel like mongo let me get up to a 40-50gb database without too many problems, then all of a sudden everything stopped, and nothing I'm doing tells me what is slowing everything down
[03:00:16] <culthero> thanks for your insight, I am just curious if I have to wait until the TTL expiration is completed
[03:00:23] <culthero> in order to see if the size of this database is still appropriate
[03:01:27] <Boomtime> i really think that graph is a bit optimistic/omitting the truth
[03:07:40] <culthero> So what I guess I am asking is: if I need to store, let's say, a rolling log of 30 days of tweets, with tweets coming in at 30 to 80 per second and expiring at the same rate, how do I estimate what my hardware costs will be (based on one fulltext index on the collection)
[03:08:21] <culthero> or decide if it's even feasible on a standalone instance? like I didn't imagine I was going to be I/O bound
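For context, the setup being described (a rolling window of tweets plus one full-text index) would normally look something like the sketch below; created_at and text are assumed field names, not taken from the actual schema:

    // expire documents 30 days (2592000 s) after their created_at value
    db.tweets.createIndex({ created_at: 1 }, { expireAfterSeconds: 2592000 })
    // single full-text index on the tweet body
    db.tweets.createIndex({ text: "text" })
    // example full-text query, capped to avoid scanning the whole result set
    db.tweets.find({ $text: { $search: "mongodb" } }).limit(100)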
[03:08:24] <Boomtime> an empirical test like what you are doing is the best way, but right now you are not in steady state
[03:08:32] <Boomtime> your system is still adjusting to the change you made
[03:11:09] <culthero> and it takes .. what 7 days to rebuild it again
[03:11:18] <culthero> currently the TTL is set to 7 days, down from 30 days and then 15 days
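Lowering expireAfterSeconds on an existing TTL index is normally done with collMod rather than dropping and rebuilding the index; a sketch, again assuming the TTL index is on a created_at field of the tweets collection:

    // 7 days = 604800 seconds
    db.runCommand({
        collMod: "tweets",
        index: { keyPattern: { created_at: 1 }, expireAfterSeconds: 604800 }
    })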
[03:11:39] <Boomtime> right, so it is merrily deleting half the documents in your database based on a cursor.find() result
[03:12:20] <Boomtime> it will take a while, you can probably work it out based on how many documents you had, or by running a find().count() on what should be kept
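One way to work that out from the shell, assuming documents are expired on a created_at field and the new TTL is 7 days:

    var keep  = db.tweets.count({ created_at: { $gte: new Date(Date.now() - 7 * 24 * 3600 * 1000) } });
    var total = db.tweets.count();
    // everything older than 7 days is still waiting for the TTL monitor
    print((total - keep) + " documents left to expire, " + keep + " inside the 7-day window");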
[03:12:44] <culthero> Will a db.repairDatabase() also fix the situation?
[03:12:47] <Boomtime> at this point, it will almost certainly be better to just wait for steady-state to be reached
[03:13:11] <Boomtime> no, db.repairDatabase() right now would make it much much worse
[03:18:02] <culthero> well, I guess I also do an update on another table 83 times a second
[03:18:13] <culthero> and I will also do a second query on a words table against the words in the tweet.
[03:18:52] <Boomtime> ok, so... max 5 million docs per week?
[03:19:30] <Boomtime> anyway, whatever you insert in 7 days is your target .count()... running .count() in the shell shows you how quickly you are approaching your target
[03:21:08] <Boomtime> if those numbers are right, and you currently have ~37 million documents, your TTL has a long way to go to catch up after the change
[03:21:57] <Boomtime> back-of-the-envelope calcs suggest you are looking at around 5 hours before steady-state is achieved :(
[03:22:16] <culthero> i think I will be going down to 20m documents
[03:24:25] <Boomtime> it depends a lot on the deletion-rate too, which i can't really tell, but you could measure
[03:30:25] <culthero> in general, if I sharded this over four 20 dollar servers vs one 160 dollar one, am I potentially going to be able to keep a longer window of data?
[03:37:40] <Boomtime> @culthero, unless the sum of the hardware in those 4 servers is greater than the single server you had, then no, it will not improve anything
[04:14:33] <huleo> ("type" field vs. separate collection)
[07:18:43] <Zelest> if I have a replicaset of 3 nodes.. and shut down a secondary member.. will that affect the other nodes in any way? or will operations continue as before?
[07:18:52] <Zelest> (shutting it down for upgrade reasons)
[07:19:30] <joannac> Zelest: should be fine, you'll see more logging as the other 2 nodes report they can no longer see the one you shut down
[07:23:00] <joannac> also, you have a primary and secondary still up, right?
[07:23:00] <Zelest> mostly it's default.. where it's consistency dependent, it reads from the primary.. on others where performance is preferred, secondaries.
[07:25:16] <Zelest> some tiny bits of the company still uses MySQL...
[07:25:20] <Zelest> give me 6 months and they're gone ;)
[07:26:59] <Zelest> joannac, also, if I run a mongodb node on a server with 1 core vs a server with 2 cores.. will mongodb benefit from an extra core?
[07:43:44] <Zelest> aw, the MMS agent isn't available for FreeBSD? :(
[09:16:09] <sgo11> hi, if I use mongodb with nodebb, how fast will the full-text search be? thanks.
[09:30:24] <mccajm> Are there going to be any issues with running a replica set between two datacentres 80ms apart? Ideally the secondary datacentre would be a priority 1 slave.
[11:18:06] <EXetoC> can raw commands be sent using the C driver?
[11:18:39] <EXetoC> I might've stumbled upon something like that before, but it's not exactly documented, is it?
[11:26:57] <kali> EXetoC: commands are implemented as a find on a "magic" collection named "$cmd"
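In other words, db.runCommand() in the shell is roughly just sugar for a findOne() against that collection; using ping as a harmless example command:

    // these two are effectively equivalent
    db.runCommand({ ping: 1 })
    db.getCollection("$cmd").findOne({ ping: 1 })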
[11:37:01] <EXetoC> there it is. I had to grep like this: \\\$cmd
[11:37:20] <EXetoC> kali: thanks. hopefully it'll simplify wrapper development, but I might be wrong
[11:44:17] <EXetoC> actually, the body of mongoc_collection_aggregate for example is fairly large, but others are simple
[12:00:42] <dragoonis> What are the differences between doing map/reduce in mongodb and doing it in hadoop? The mongodb docs say you can send mongodb data to hadoop for map/reduce if you like.
[12:01:52] <kali> dragoonis: mongodb is designed mostly for small, fast queries. hadoop map/reduce is oriented towards hour-long batch processing
[12:02:08] <dragoonis> Roger that. Mine (atm) is quick, fast queries.
[12:02:18] <kali> dragoonis: map/reduce relevance in mongodb is arguable, imho
[12:02:33] <dragoonis> I'm trying to pipe lots of data into mongodb and then use $.aggregate() to pull out data based on things like "date range" and "area_id"
[12:02:49] <dragoonis> using MySQL for this is no longer feasible.. too damn slow
[12:03:38] <kali> dragoonis: aggregation is faster than m/r so anything that can be done that way should be that way
[12:04:12] <dragoonis> kali, what do you think about this? https://gist.github.com/dragoonis/3a6a4b2b9bcc1543698b
[12:05:35] <kali> dragoonis: well, it all depends on what your performance and scalability expectations are for this use case
[12:06:19] <kali> dragoonis: heavy usage of aggregation (or worse, m/r) on important chunks of data is expensive
[12:06:32] <dragoonis> kali, expensive with what measurement ?
[12:08:41] <kali> dragoonis: so when you hit this wall, you may have no choice but to redesign and refactor so that everything you need fast (because there is a user waiting for a page to load) is pre-computed
[12:10:36] <dragoonis> kali, we pre-compute everything on the "date range". the problem we want to solve is that we want them to choose an arbitrary date range on a date picker, which means we lose the opportunity to pre-process.
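For reference, the kind of pipeline being discussed looks roughly like this; the collection name and the fields area_id, created_at and value are assumptions, not taken from the gist:

    db.events.aggregate([
        // an index on { area_id: 1, created_at: 1 } lets this leading $match use the index
        { $match: {
            area_id: 42,
            created_at: { $gte: ISODate("2014-01-01"), $lt: ISODate("2014-02-01") }
        } },
        { $group: { _id: "$area_id", total: { $sum: "$value" }, count: { $sum: 1 } } }
    ])

One common compromise is to pre-compute per-day rollups and then aggregate those for arbitrary date ranges, so the interactive query never scans the raw collection.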
[13:43:20] <flok420> I got this back from mongo: { result: [ { _id: null, total: 90 } ], ok: 1.0 } how do I get the value for "total" using the c++ api from this mongo::BSONObj instance?
[15:25:47] <feathersanddown> Hi, if I add an object to an array, is this an update or save operation?
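On the array question: appending an object to an existing array field is normally done with an update using $push rather than re-saving the whole document. A sketch with hypothetical collection and field names:

    db.things.update(
        { _id: someId },
        { $push: { items: { label: "new entry", added: new Date() } } }
    )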
[15:27:51] <ejb> Can anyone recommend a way to store "open hours" for a business? I need to be able to query currently open businesses.
[16:16:59] <cheeser> i'm not sure UTC even matters here.
[16:17:11] <cheeser> you're talking hour of the day
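One common way to model this so that "currently open" is a single query: store hours as an array of { day, open, close } entries in minutes since midnight (in the business's local time), then $elemMatch against the current day and minute. A sketch with hypothetical names:

    db.businesses.insert({
        name: "Example Cafe",
        hours: [
            { day: 1, open: 9 * 60, close: 17 * 60 },   // Monday 09:00-17:00
            { day: 2, open: 9 * 60, close: 17 * 60 }
        ]
    })

    // e.g. "now" is Monday 10:30 -> day 1, minute 630
    db.businesses.find({
        hours: { $elemMatch: { day: 1, open: { $lte: 630 }, close: { $gt: 630 } } }
    })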
[18:55:31] <culthero> Howdy, will moving from a standalone to a sharded cluster of replica sets allow me to scale an application that has high locking percentages once it goes over a certain size, given a uniform distribution of data across shards?
[18:57:40] <culthero> with non-growing document sizes
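If sharding does turn out to be the answer, a hashed shard key is the usual way to get the uniform distribution mentioned above; a minimal sketch with hypothetical database and collection names (each shard is its own mongod, so write locks are taken per shard rather than on one server):

    sh.enableSharding("tweetdb")
    sh.shardCollection("tweetdb.tweets", { _id: "hashed" })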
[20:02:25] <testerbit> is there an issue when comparing strings, i.e. var states = doc.State ?
[20:59:09] <bob_11> hey guys, quick question: I am trying bulk updates of existing documents in a collection using UnorderedBulkOperation. It was my understanding this was the fast way to do bulk updates, and while the wrapper function returned quickly, the database is still handling the bulk update (~40k updates) half an hour later
[20:59:11] <bob_11> my question is: is it normal for UnorderedBulkOperation to continue in the database for a "long" time?
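For reference, the shell version of an unordered bulk update looks like the sketch below; the operations are only queued client-side until execute() is called, and execute() blocks until the server has acknowledged the batch under the write concern in use. The collection, the ids array and the $set field are assumptions:

    var bulk = db.items.initializeUnorderedBulkOp();
    ids.forEach(function (id) {
        bulk.find({ _id: id }).update({ $set: { processed: true } });
    });
    var result = bulk.execute();   // returns a BulkWriteResult once the batch is applied
    printjson(result);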
[22:57:38] <melvinrram> I'm using mongoid. If a collection has a field called 'emails' of type Array, how would I do a search for all documents that contain the value 'blah@blah.com' in the emails field? I'm trying to create a custom validator that ensures uniqueness of values going into the 'emails' field.
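The underlying MongoDB query is just an equality match against the array field (which is what a Mongoid where criteria on that field compiles down to); in shell terms, something like the following, where the people collection and currentId are placeholders:

    // matches any document whose emails array contains this exact value
    db.people.find({ emails: "blah@blah.com" })
    // for a uniqueness check, exclude the document being validated
    db.people.count({ emails: "blah@blah.com", _id: { $ne: currentId } })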
[23:23:58] <feathersanddown> How do I store a 256kb fixed-size binary file in mongodb? as a normal class attribute?
[23:24:57] <feathersanddown> but GridFS reserves 16MB anyway, right?
[23:26:00] <feathersanddown> file: { name: "my_file.hs", data: <binary_data>, owner: "a user" } <--- is something like this possible? or is there something else to take into consideration?
[23:28:45] <feathersanddown> because bson saves the whole "file" object as binary i think, is that correct??
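For what it's worth, GridFS does not reserve 16 MB per file; 16 MB is simply the BSON per-document limit, which is why larger files have to go through GridFS at all. A 256 KB blob fits comfortably inside a normal document as a binary field, along the lines of the sketch below (base64String is assumed to hold the payload):

    db.files.insert({
        name: "my_file.hs",
        data: new BinData(0, base64String),   // stored as a BSON binary value
        owner: "a user"
    })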