[01:42:09] <mrkirby153> Okay, so I'm having an issue with the java MongoDB library
[01:43:31] <mrkirby153> Every time I do something on a collection (e.g. Collection#findOneAndReplace()) it appears to be spawning a new connection to the DB and not closing it
[01:45:18] <cheeser> there's a connection pool inside MongoClient
[01:45:43] <cheeser> you don't want a connection opened and closed every time.
[01:46:59] <kurushiyama> mrkirby153: It is not unusual to have several hundred to even thousands of connections open to a MongoDB instance. Compared to other DBMS, connections are pretty cheap.
[01:48:04] <mrkirby153> Wanted to give MongoDB a spin
[01:48:39] <cheeser> multiple connections to the db is perfectly valid even for MySQL. You should just use a connection pool with the driver. in mongodb's case, we provide the CP as part of the client/driver
[01:48:53] <kurushiyama> mrkirby153: Well, one pretty simple piece of advice. MongoDB is __very__ well thought out. So if you encounter a problem, don't write off MongoDB, but seek help.
[01:49:10] <mrkirby153> Yeah, I may become a permanent resident of this channel XD
[01:49:28] <mrkirby153> I tried mongodb but then moved away from it because it was "Not MySQL"
[01:49:28] <Boomtime> @mrkirby153: in case it wasn't clear; keep the MongoClient object persistent - don't go creating/destroying these
[01:51:58] <cheeser> mrkirby153: you can configure it via whatever logging framework you're using. http://mongodb.github.io/mongo-java-driver/3.2/driver/reference/management/logging/
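A minimal sketch of what cheeser and Boomtime describe, using the 3.x Java driver: one long-lived MongoClient whose internal pool is reused for every operation. The class, database, collection names and pool size below are illustrative, not taken from the log.

    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientOptions;
    import com.mongodb.ServerAddress;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    public class MongoHolder {
        // Create the client once and keep it for the lifetime of the application;
        // it maintains the connection pool internally.
        private static final MongoClient CLIENT = new MongoClient(
                new ServerAddress("localhost", 27017),
                MongoClientOptions.builder().connectionsPerHost(100).build());

        public static Document replaceById(Object id, Document replacement) {
            MongoCollection<Document> coll = CLIENT.getDatabase("app").getCollection("things");
            // Borrows a pooled connection; no new connection is opened per call.
            return coll.findOneAndReplace(eq("_id", id), replacement);
        }
    }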
[03:00:07] <lotus> Hey guys, support for mongoose in here? I'm trying to get at Model.findById in a pre-hook and it's saying undefined...
[04:01:07] <kalx> Hi all. Is there any way to insert a record into mongo with a field containing the current db server date/time? I saw there is a $currentDate operator but it seems this is for updates only.
[07:55:29] <Zelest> brain is not working.. how can I find a document and, if it doesn't exist, create it?
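One common answer to Zelest's find-or-create question is an atomic upsert via findOneAndUpdate, which returns the existing document or inserts and returns a new one. A sketch with the Java driver; the filter and field names are made up, and `collection` is assumed to be a MongoCollection<Document>.

    import com.mongodb.client.model.FindOneAndUpdateOptions;
    import com.mongodb.client.model.ReturnDocument;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Updates.setOnInsert;

    // Returns the matching document, creating it first if it does not exist yet.
    Document doc = collection.findOneAndUpdate(
            eq("username", "zelest"),
            setOnInsert("createdAt", new java.util.Date()),
            new FindOneAndUpdateOptions()
                    .upsert(true)
                    .returnDocument(ReturnDocument.AFTER));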
[08:01:19] <ravenx> I'm kinda confused as to what upsert does in mongoimport
[08:42:07] <Folkol> Hello. I just stumbled upon the fact that I can get optimistic locking with MongoDB, which is great. But I also read that Mongo will retry on WriteConflict... This is not good enough, is it possible to disable the automatic retry? (Or supply custom merge-code in case of conflict?) I have tried looking this up in the doc, but failed.
[09:03:28] <tpayne> Is it possible to count on an aggregate?
[09:08:40] <lipiec> Hi! Is there some mailing list to which I could subscribe to be notified about new version releases?
[09:11:00] <Folkol> tpayne: I am not sure that I understand what you are asking for, but check out "accumulators".
[09:11:29] <tpayne> Folkol: just wondering if I can get a count instead of returning the results of an aggregate
[09:11:45] <tpayne> right now I have to count the documents in code, and I'd like to optimize that
[09:12:57] <Folkol> You can group the results of aggregate. There is an example here: https://docs.mongodb.org/manual/aggregation/
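For tpayne's case, the count can also be produced inside the pipeline itself by grouping all matches into a single bucket and summing, so only one small document comes back instead of the full result set. A hedged Java driver sketch; the match criteria and names are placeholders.

    import java.util.Arrays;
    import com.mongodb.client.model.Accumulators;
    import com.mongodb.client.model.Aggregates;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    // Pipeline: { $match: ... }, { $group: { _id: null, count: { $sum: 1 } } }
    Document result = collection.aggregate(Arrays.asList(
            Aggregates.match(eq("status", "active")),
            Aggregates.group(null, Accumulators.sum("count", 1))
    )).first();
    long count = (result == null) ? 0L : ((Number) result.get("count")).longValue();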
[09:39:15] <Ceunincksken> Hello all, I'm trying to setup a sharding environment, but in a proof of concept, adding a new shard simply slows down the insert performance. How is that possible?
[09:40:06] <ravenx> could be a poorly picked shard key
[09:40:27] <Ceunincksken> I've picked a hashed value of the '_id' key, so seems good to me.
[09:41:21] <Ceunincksken> I'm using 5 shards and the sharded collection takes roughly 2,000 documents per shard. This is correct since in total I'm importing 10,000 documents. So according to me, the shard key is correctly chosen, or should I pick something else?
[09:41:54] <ravenx> as in your chosen key** is good
[09:42:46] <Ceunincksken> So, what could be the problem then of my performance drop on inserts?
[09:43:25] <ravenx> journalling, writeconcerns or index creation maybe
[09:43:29] <ravenx> normally i create the index after i insert.
[09:44:09] <Ceunincksken> Sorry but I don't quite understand. I have an empty collection in which I import critical data, so journalling must be set to true to ensure that the data is being imported.
[09:44:39] <Ceunincksken> And I can't create the index after I insert because sharding does need an index. Any suggestions?
[09:54:36] <Ceunincksken> It's really driving me crazy. I thought that insert speed should go up when using sharding on Mongo, but instead it goes down :-(
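For reference, the hashed shard key Ceunincksken describes would be declared roughly like this via the Java driver against a mongos (the database and collection names here are hypothetical, and sharding has to be enabled on the database first). This only shows the key declaration, not a fix for the performance problem discussed above.

    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;

    MongoDatabase admin = mongoClient.getDatabase("admin");
    // Equivalent of sh.enableSharding("poc") and sh.shardCollection("poc.docs", { _id: "hashed" })
    admin.runCommand(new Document("enableSharding", "poc"));
    admin.runCommand(new Document("shardCollection", "poc.docs")
            .append("key", new Document("_id", "hashed")));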
[13:04:53] <Ceunincksken> I have set up 10 virtual servers on a single host. Each of these servers has 512 MB RAM, 1 CPU (1 core) and 4 GB of disk space (proof of concept).
[13:05:05] <kurushiyama> Tomasso: Sorry, I don't know poo about Ruby, except that I refused to learn it. Just plain deduction from my side ;)
[13:05:10] <Ceunincksken> I've created 3 configuration servers, 1 router, and 6 shards.
[13:05:23] <kurushiyama> Ceunincksken: So far, so good.
[13:05:38] <Ceunincksken> I have a collection which I'm sharding on a hashed value of the '_id' field.
[13:05:50] <kurushiyama> Ceunincksken: So far, so suboptimal.
[13:06:04] <Ceunincksken> The sharding key seems to be good. When I insert 60,000 documents, each shard roughly contains 10,000 documents.
[13:06:44] <Ceunincksken> And now the problem: if I remove, let's say, 4 shards (so working with 2 shards), I don't see a performance drop in Mongo. On the other hand, when I'm adding 5 more shards, bringing the total to 11, I don't see any performance gain.
[13:07:04] <Ceunincksken> In fact, using a single server (non-sharded) performs better than using a sharded environment (with 6 servers). How can that be the case?
[13:07:10] <kurushiyama> Ceunincksken: How should there be a gain?
[13:07:23] <kurushiyama> All shards share the same underlying storage
[13:08:10] <kurushiyama> Clearly making it the limiting factor of your installation.
[13:08:25] <Ceunincksken> But it's not a single disk.
[13:08:35] <kurushiyama> Ceunincksken: But the same controller.
[13:08:40] <Ceunincksken> The host has hundreds of TB available.
[13:08:46] <Ceunincksken> Probably the same controller yeah.
[13:08:54] <kurushiyama> Ceunincksken: we are talking about IOPS, not space
[13:09:14] <Ceunincksken> But is my assumption correct that I cannot get a performance gain in a virtual environment when the storage is on the same controller?
[13:09:19] <kurushiyama> And from what I hear, your cluster is not the only setup running on this host
[13:09:42] <Ceunincksken> That's correct. We do have a single host server on which various applications are running.
[13:09:48] <kurushiyama> Ceunincksken: You can not get a performance gain in any virtual environment unless you really know what you are doing.
[13:11:06] <Ceunincksken> One additional question to test if my setup is right. Would I see a performance gain when I'm installing Mongo on my local machine and adding 1 extra disk? 1 shard would then be on the same disk as my OS while the other shard would be on a separate disk.
[13:11:39] <kurushiyama> Ceunincksken: As a rule of thumb: have one mongod on one machine.
[13:12:12] <kurushiyama> Ceunincksken: RAM is an important factor. WT by default takes 50% of the available RAM just for caching.
[13:13:00] <Ceunincksken> How about installing 2 virtual servers on my local machine where each server gets its own physical hard disk? Would that give me a performance gain? (Again, this is just for testing purposes.)
[13:13:01] <kurushiyama> You don't want your instances battling for resources.
[13:13:24] <kurushiyama> Ceunincksken: In general, I suggest testing against a replset
[13:13:58] <Ceunincksken> Why test against a replica set to see if sharding gives a performance gain?
[13:14:37] <kurushiyama> Ceunincksken: That's a bit like testing whether MongoDB can store documents
[13:15:01] <kurushiyama> Ceunincksken: Yes, you do get a performance gain, if you actually provide more resources rather than sharing the existing ones.
[13:15:46] <Ceunincksken> Ok, but in a virtual environment, like our customer's, each mongod instance should have its own storage controller. Is that assumption correct?
[13:16:21] <kurushiyama> Ceunincksken: If IOPs are your limiting factor – yes.
[13:16:41] <Ceunincksken> Ok, but how can I find out if IOPS are the limiting factor?
[13:17:01] <kurushiyama> Or cloud manager, as it is called nowadays
[13:17:35] <kurushiyama> Ceunincksken: Or any other monitoring tool.
[13:18:01] <kurushiyama> Ceunincksken: you need to create a baseline, finding out the maximum IOPS your "disks" can sustain
[13:18:21] <Ceunincksken> Sorry but I'm quite new to this and I'm a bit confused.
[13:18:47] <kurushiyama> Ceunincksken: Now if your monitoring tool reports that you are constantly hitting this limit, you know your IOPs are the problem.
[13:19:21] <Ceunincksken> Would you mind helping me a bit with this?
[13:19:23] <kurushiyama> Ceunincksken: Ok, it is clear what to use for HW monitoring?
[13:22:00] <Ceunincksken> I know that this might not be the correct place, but it's quite important and I've been searching for 3 weeks to figure out why I'm seeing this behaviour.
[13:23:10] <kurushiyama> Ceunincksken: This is getting OT, let's take it to a private chat
[14:40:48] <Zelest> someone got bored of irccloud :o
[15:11:48] <kashike> is there a more in-depth upgrading guide for java driver 2 to 3? inserting/updating/querying is what's blocking me, specifically
[15:31:21] <Jester01> Hey folks, mongodb takes minutes to start up, burning cpu. Is this normal and can something be done about it? (v3.0.2, linux, mmapv1)
[15:34:51] <cheeser> kashike: ok, good. just making sure you'd seen that. what's missing for you?
[15:45:56] <kashike> cheeser: one moment, I'll re-open the project and branch and find the troubles I was having again
[15:50:39] <kashike> cheeser: hm, interesting - perhaps I was using an older 3.x release, but methods that were missing seem to be back
[15:50:51] <kashike> these were missing, along with others, before: http://api.mongodb.org/java/2.13/com/mongodb/DBCollection.html#save-com.mongodb.DBObject- http://api.mongodb.org/java/2.13/com/mongodb/DBCollection.html#update-com.mongodb.DBObject-com.mongodb.DBObject-
[15:57:25] <cheeser> kashike: it's possible we might've axed too much accidentally and had to course correct :)
[15:58:27] <kashike> looks like that might be the case - super glad to see these methods back, I ran into so many headaches with the bson Document stuff
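For anyone following kashike's upgrade path, the rough 3.x MongoCollection equivalents of the 2.x DBCollection save/update calls linked above look something like this. This is a sketch, not the official migration mapping; the filters and documents are placeholders.

    import com.mongodb.client.model.UpdateOptions;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    // 2.x collection.save(obj): replace the document by _id, inserting it if missing.
    collection.replaceOne(eq("_id", doc.get("_id")), doc, new UpdateOptions().upsert(true));

    // 2.x collection.update(query, update): use updateOne/updateMany with update operators.
    collection.updateOne(eq("name", "example"),
            new Document("$set", new Document("status", "ok")));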
[16:06:50] <cheeser> oh! to track versions of each value in different languages?
[16:07:18] <GothAlice> Correct, basically to pivot into a per-language pool of sub-documents efficiently.
[16:08:26] <GothAlice> Most ODMs I've encountered handle translated strings _very_ badly. I.e. turning the field into a mapping of {lang: "String"} or into arrays of per-language sub-documents, which… eliminates any possibility of efficiently pulling out a document in a specific language.
[16:10:29] <GothAlice> StephenLynx: Don't worry, though, mine isn't meant to be a layer between your app and pymongo, it's meant to augment your use of pymongo. ;)
[16:11:04] <StephenLynx> meh, I don't bother too much with what happens outside my neighborhood.
[16:13:24] <StephenLynx> I don't particularly like the idea of query building.
[16:13:33] <StephenLynx> that is, generic query building.
[16:13:58] <StephenLynx> if it's a specific query for a specific operation, then you might as well just take in the values that replace certain parameters of the query
[16:14:02] <GothAlice> Once I teach the Op and Ops classes how to evaluate themselves, I'll have a complete in-Python version of the MongoDB query format, allowing things like validation before sending to MongoDB, execution of aggregate pipelines in-Python, etc.
[16:14:04] <StephenLynx> like I started doing with sql based dbs.
[16:14:21] <StephenLynx> I also started distancing myself from OOP
[16:14:27] <StephenLynx> especially for web back-ends.
[16:14:38] <GothAlice> OOP is a structural choice, not much more.
[16:15:09] <GothAlice> In my case I need Python-side validation of Mongo queries (validation schema) to validate MongoDB update operations themselves. It's weird and meta.
[16:15:37] <GothAlice> (I'm attempting to export .find() and .update() to the web in a safe, schema-defined way.)
[16:47:33] <exonintrendo> I'm fairly new to mongo, I've used it once or twice but not necessarily to create the 'structure' for a new application. I'm working on building a program that does basic monitoring on scripts that have run. I have the idea of a Group, Item (items belong to a group), and an Alert (alert is specific to each item, alert or email user if the item's value passes a threshold). I'm not sure if these should all be their own documents or if they should all
[16:47:33] <exonintrendo> just be nested within each other in one Group document.
[16:48:34] <exonintrendo> Hopefully that makes sense.
[16:49:41] <exonintrendo> I'll read, but my initial thought (which may also be wrong) was to have a Group contain an array of Item IDs, and Items contain an array of Alert IDs. Would that be correct?
[16:50:10] <StephenLynx> that. embedding is a valid tool, but abusing it is worse than dealing with multiple collections.
[16:50:39] <exonintrendo> And is it fair to think of a 'collection' as a 'table' in a SQL context?
[16:51:32] <exonintrendo> I'll read up on this post, thanks for the quick info!
[16:51:41] <StephenLynx> it is your main data aggregation layer, after the db, like in sql.
[16:52:30] <StephenLynx> but thats pretty much the only similarity, afaik.
[16:53:38] <StephenLynx> data is not divided into rows that each occupy the same space and follow the same schema, you don't reference documents with foreign keys, and they can store complex data structures
[16:54:27] <kurushiyama> exonintrendo: After that, we need a little more info. What is an Item? Which fields would define an Item? Are there more than a couple of dozen items per Group? etc.
[16:55:51] <exonintrendo> kurushiyama: as I said, it's a basic monitoring service I'm writing for myself. Let's say you have a group of 'Sites' and you want a script to ping each item in the group. Let's say I have example.com as an item, and it has an alert of 300 seconds (that is: if the item's time doesn't get updated in the database after 5 minutes, it will send an alert).
[16:56:51] <exonintrendo> So a group is mainly a title, description, and a list of items. An Item is a title plus a type (the monitored value, which could be an age or a numerical value), and it has a set of alerts that trigger on that value. So maybe in the case of the ping, I email myself after 5 minutes, but a second alert triggers after 10 minutes that sends me a text message.
[16:59:03] <kurushiyama> In that case, I'd probably use a Group collection, an item collection and a Ping collection. Then finding an alert is pretty easy using aggregations.
[16:59:32] <kurushiyama> exonintrendo: Or even simple queries.
[17:01:22] <kurushiyama> db.pings.find({group:"fooGroup", date:{$gte:nowMinusThreshold}}). If the result set is empty, you have an alert. Note that this is just one way of doing it.
[17:03:53] <kurushiyama> exonintrendo: As per the query: of course you can use item instead of group, or a combination, or run aggregations and save all found alerts to an "alerts" collection...
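A small Java sketch of the alert check kurushiyama outlines: look for pings from a group that are newer than the threshold and treat an empty result as an alert. The collection and field names are the hypothetical ones from the discussion, and `pings` is assumed to be a MongoCollection<Document>.

    import java.util.Date;
    import static com.mongodb.client.model.Filters.and;
    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Filters.gte;

    long thresholdMillis = 300_000L; // the 5-minute alert threshold
    Date nowMinusThreshold = new Date(System.currentTimeMillis() - thresholdMillis);
    long recentPings = pings.count(and(eq("group", "fooGroup"), gte("date", nowMinusThreshold)));
    if (recentPings == 0) {
        // No ping within the threshold: fire the alert (email, text message, ...)
    }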
[17:16:00] <kalx> Hi all. Is there any way to insert a record into mongo with a field containing the current db server date/time? I saw there is a $currentDate operator but it seems this is for updates only.
[17:16:19] <StephenLynx> new Date() or the equivalent in your language
[17:18:06] <kalx> So the date must be generated on the caller's side? No mechanism for Mongo generating it? (similar to mysql's NOW() as an example)
[17:18:28] <cheeser> dates are stored in UTC so i doubt you'll have a problem.
[17:24:09] <kalx> thx. It's more just to guarantee accuracy in case a server's time is not in sync with other servers (even seconds can matter).
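One workaround for kalx's clock-skew concern is to write the new record as an upserted update, so $currentDate is evaluated with the server's clock rather than the caller's. A hedged Java sketch; the field names and the `newId`/`payloadDoc` variables are made up.

    import com.mongodb.client.model.UpdateOptions;
    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Updates.combine;
    import static com.mongodb.client.model.Updates.currentDate;
    import static com.mongodb.client.model.Updates.setOnInsert;

    // "Insert" via upsert so the server, not the caller, stamps createdAt.
    collection.updateOne(
            eq("_id", newId),
            combine(currentDate("createdAt"), setOnInsert("payload", payloadDoc)),
            new UpdateOptions().upsert(true));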
[19:49:24] <matthias__> Hi, how can I efficiently get a set of about 25 random items from a collection with ~500,000 items? (with optional search parameters)
[19:53:51] <DrPeeper> you want to find 25 random documents from a collection?
[19:55:49] <StephenLynx> what is the use case for that?
[19:55:55] <DrPeeper> i'd like to index that, but instead should I have a second collection?
[19:56:31] <StephenLynx> I think your query is slow because skips are slow.
[19:56:34] <DrPeeper> StephenLynx: random sampling is generally used for research when the sample is representative of the population
[19:57:01] <DrPeeper> but in the case of a database, you can do quantitative research on the entire population, probably faster than taking the initial sample...
[19:57:11] <matthias__> i have a large database of image-urls and i want to pick 25 randomly
[20:02:11] <DrPeeper> yeah, could have just put a sequential key in 0..n
[20:02:20] <starfly> you could have one app thread compile a list of random IDs and stick them in a collection, then another thread (the one you want to perform) perform the finds with a covering index
[20:03:55] <DrPeeper> yeah, the compute would be (n log (n))(log (n))
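Since MongoDB 3.2 there is a $sample aggregation stage that covers matthias__'s case directly: an optional $match for the search parameters, then $sample to pull ~25 random matches server-side, with no client-side skipping. A sketch with the Java driver; the filter and field names are placeholders.

    import java.util.Arrays;
    import com.mongodb.client.model.Aggregates;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    // Apply the optional search criteria first, then pick 25 random matching documents.
    for (Document doc : images.aggregate(Arrays.asList(
            Aggregates.match(eq("category", "landscape")),
            Aggregates.sample(25)))) {
        System.out.println(doc.getString("url"));
    }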
[20:11:40] <DrPeeper> so should I create that second collection, or should I index the element?
[20:12:15] <kurushiyama> I am not convinced skips are slow. I tried it on a 10M docs collection with the empty query, a skip of 5M and a limit of 1m. took just a few msecs
[20:13:07] <kurushiyama> DrPeeper: So it can be _one_ of 30 values, like an ENUM?
[20:14:17] <DrPeeper> kurushiyama: one of 30 three-char alpha .. e.g. RWT, CAE, BBQ
[20:14:29] <kurushiyama> DrPeeper: If it is a more or less fixed number of values, like an ENUM which rarely gets changed (if at all), I'd index said field.
[20:19:54] <kurushiyama> Well, the good thing about agencies and federal regulations is that even when they announce a change now, it is likely that we are going to be retired by the time they are implemented.
[20:22:33] <kurushiyama> DrPeeper: But to answer your question: Yes, use an enum (or the corresponding structure of the language you are using) for those values, simply write them to the collection, and have that field (compound-)indexed, if necessary.
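Indexing such a low-cardinality code field, as suggested, is a one-liner with the Java driver (the field name "code" is hypothetical):

    import com.mongodb.client.model.Indexes;

    // Single-field index on the three-char code; use Indexes.compoundIndex(...) if more fields are needed.
    collection.createIndex(Indexes.ascending("code"));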
[21:31:51] <micom> hello guys, it's really a beginner question. I use mongodb 3.2 and the default engine is WiredTiger. There is an in-memory engine, but it's in beta, and in my production code I cannot accept beta software. Is placing the database on a RAM disk a good idea with WiredTiger?
[21:32:07] <micom> i mean on a partition created in RAM
[21:33:46] <cheeser> the memory file store is not intended for production use at all
[21:34:17] <micom> yes, i mean to use WiredTiger but place the database files on a partition that is a Linux RAM partition
[22:12:54] <neybar> is there a way to tune mongo to have the id use a timestamp with microseconds? I'm doing api logging, and I'd rather not add an additional field since the id already has most of the info I want.
[22:34:43] <kurushiyama> I'll plan my world domination as a benevolent dictator based on that feature! Now I only have to find how to leverage it for taking over the world... ;)
[22:35:15] <cheeser> all you need is ... wait for it ... time!
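Regarding neybar's question: the timestamp embedded in an ObjectId only has one-second resolution, so microsecond precision would still need its own field. Reading the embedded time with the Java driver looks like this (the `doc` variable is illustrative):

    import java.util.Date;
    import org.bson.types.ObjectId;

    ObjectId id = doc.getObjectId("_id");
    Date createdAt = id.getDate();        // creation time, second precision only
    int epochSeconds = id.getTimestamp(); // the same value as raw Unix seconds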
[23:16:07] <silasx> So … I have one collection with an indexed (obviously) id field of 12 bytes. I have another one with an indexed binary field of 12 bytes. .skip() on the latter is much much slower despite it having the same .explain() strategy. What could I be doing wrong? There's no reason such iteration should be slower, right? Bytes are bytes. I've thought of just making the latter itself be an ObjectID field instead of binary, but was told that ObjectIds are thought
[23:16:08] <silasx> to have semantic information and thus are dangerous to use. So ...
[23:17:40] <StephenLynx> from what I know about skips
[23:27:56] <silasx> but that’s the thing, we had really slow skips for indexed attributes.
[23:28:23] <StephenLynx> were you sorting by the index?
[23:28:26] <silasx> Fields I mean. We thought it was because they were long strings (~80 char), but even when we turned them into 12-byte binary fields, they were just as slow
[23:29:03] <silasx> ObjectIds are 12 bytes too, but (as primary keys at least) miraculously fast
[23:33:55] <kurushiyama> StephenLynx: But the index is traversed, if present, which was basically my point. So it is not necessarily the documents which are traversed.
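A common way around slow large skips, whatever the key type, is range-based paging on the indexed field: remember the last value returned and continue with $gt instead of skipping from the start. A hedged Java sketch using _id; the variable names are illustrative.

    import com.mongodb.client.FindIterable;
    import org.bson.Document;
    import org.bson.types.ObjectId;
    import static com.mongodb.client.model.Filters.gt;
    import static com.mongodb.client.model.Sorts.ascending;

    ObjectId lastSeenId = previousPageLastId; // null on the first page
    FindIterable<Document> page = (lastSeenId == null)
            ? coll.find().sort(ascending("_id")).limit(25)
            : coll.find(gt("_id", lastSeenId)).sort(ascending("_id")).limit(25);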