#mongodb logs for Wednesday the 15th of April, 2015

[02:11:47] <Gevox_> can i talk?
[02:11:55] <Gevox_> OH i can, lol i was testing
[06:26:40] <al-damiri> Hi #mongodb.
[06:26:54] <al-damiri> I'm new to MongoDB. I type `mongo` in my terminal and it says connecting to test.
[06:26:58] <al-damiri> Now is test a database?
[06:27:05] <al-damiri> How do I see what other databases are available?
[06:27:20] <al-damiri> When I do `show databases`, it shows local and one other database but not test.
[06:30:13] <al-damiri> edxapp@precise64:~$ mongo
[06:30:13] <al-damiri> MongoDB shell version: 2.4.7
[06:30:13] <al-damiri> connecting to: test
[06:30:13] <al-damiri> > show databases;
[06:30:13] <al-damiri> edxapp 0.203125GB
[06:30:15] <al-damiri> local 0.078125GB
[06:32:21] <Boomtime> you have connected to the 'test' database, but it is ephemeral until you do something with it
[06:32:44] <Boomtime> once you insert a document, or create an index, or apply some other persistent action, it will show up
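A minimal shell sketch of what Boomtime describes; the collection name is just an illustration:

    > use test
    > db.example.insert({ hello: "world" })   // any persistent action materializes the database
    > show databases                          // "test" now appears in the listing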
[06:39:39] <al-damiri> Boomtime: Okay. Fine.
[06:40:13] <al-damiri> Another question: if I only type `mongodump`, will it back up all the databases (with users/roles/passwords etc)?
[06:43:38] <joannac> al-damiri: yes, as documented here http://docs.mongodb.org/manual/reference/program/mongodump/#cmdoption--dumpDbUsersAndRoles
[06:44:20] <al-damiri> joannac: Great, thanks.
[07:19:11] <al-damiri> Hi #mongodb.
[07:19:19] <al-damiri> When I do `show users` it doesn't show any users, why?
[07:19:35] <al-damiri> Even though I'm able to connect to mongo using `mongo`.
[07:19:59] <joannac> do you have users defined?
[07:20:29] <al-damiri> joannac: I haven't done any so. I thought there would be some default users?
[07:20:32] <al-damiri> No?
[07:20:33] <joannac> nope
[07:21:01] <al-damiri> joannac: Oh, I see. Thanks. Let me see then how can we create a user. Thanks :)
[08:39:41] <grepwood> hi everyone
[08:40:11] <grepwood> I wanted to write an automated setup for a mongodb shard
[08:40:36] <grepwood> so I need to know how to execute rs.conf() from bash shell on mongo
[08:41:02] <grepwood> I'm doing `mongo --host localhost --port 27017 --eval "rs.conf()"` and I don't get anything useful
[08:41:25] <grepwood> and I can't find anything helpful in the documentation apart from writing javascript files and then feeding them into mongo
[08:41:37] <grepwood> is there seriously no other way to do something like this?
[08:52:56] <grepwood> figured it out
[08:53:12] <grepwood> mongo --host $host --port $port <<< "co.mmand()"
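For context, a common reason `--eval "rs.conf()"` prints nothing useful is that the shell does not echo the result of --eval; wrapping the call in printjson() is the usual fix (host and port below are placeholders):

    mongo --host localhost --port 27017 --quiet --eval 'printjson(rs.conf())'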
[10:25:52] <fontanon> Hi everybody, is it possible to configure a cluster of 2? Imagine I have two hosts (server1 and server2), this is what I get when I try to launch mongos: "BadValue need either 1 or 3 configdbs" $ mongos --configdb server1:27019,server2:27019
[10:26:15] <Derick> no, you need 1 or 3 config servers
[10:28:30] <fontanon> Derick, that's what I understood ... but why?
[10:28:56] <Derick> that's just how it is.
[10:29:12] <Derick> with two nodes, it's difficult to say which one is "right"
[10:29:18] <Derick> with three, there can be consensus
[10:29:30] <Derick> with one, there is only "one truth" (but never do this in production)
[10:30:08] <fontanon> Derick, so three machines are mandatory for a mongodb cluster
[10:31:00] <Derick> fontanon: three config servers, yes
[10:31:50] <fontanon> Derick, but can I have 3 config servers and only 2 shards ?
[10:32:29] <joannac> sure
[10:33:50] <fontanon> so, in order to have two shards in separated servers, I would need an additional server to use as the third config server
[10:34:18] <Derick> well, in production, each shard should have two datanodes *and* either another data node or an arbiter
[10:34:26] <Derick> each shard should be a replicaset (in production)
[10:35:32] <fontanon> maybe I'm misunderstanding the difference between a shard and a datanode
[10:36:22] <Derick> http://www.kchodorow.com/blog/wp-content/uploads/2010/03/sharding.png
[10:37:13] <fontanon> Nice graph
[10:37:29] <Derick> mongod = data node
[10:37:37] <fontanon> Derick, why three mongod per shard?
[10:37:46] <Derick> because each shard is a replicaset
[10:37:53] <Derick> so that you have failover capabilities
[10:39:54] <fontanon> Derick, but I'm not trying to replicate data in different machines but setting a 2-tier schema with tag aware sharding: http://blog.mongodb.org/post/85721044164/tiered-storage-models-in-mongodb-optimizing
[10:40:18] <Derick> fontanon: where are you going to run this?
[10:40:30] <Derick> http://docs.mongodb.org/manual/core/replication-introduction/ has a good introduction about replica sets though
[10:41:00] <fontanon> Derick, I expected to run that on 2 machines, one with ssd for recent records, another with massive hdd for older records.
[10:41:19] <fontanon> I'll have a look at that link ..
[10:41:34] <Derick> and how would you use the tags for that?
[10:41:51] <Derick> and *where* are you running this, your own data centre, or?
[10:41:52] <fontanon> As the blog post says: date as tag
[10:42:00] <fontanon> My own data centre, yes
[10:42:19] <fontanon> on digital ocean
[10:42:23] <Derick> you really should have replication though for failover, because if a node goes down - so does your app
[10:42:32] <Derick> digital ocean, isn't that a cloud provider?
[10:43:31] <fontanon> Yes, on DO, it is a cloud provider; forget what I said about my own data centre (I didn't understand the question).
[10:44:04] <Derick> in production, you really should use a replicaset (one for each shard), unless your data isn't important
[10:44:15] <Derick> or you don't care about availability at all
[10:45:06] <fontanon> Understood. So 4 machines would be a good schema? 2 for the shards and another 2 for the replicaset ?
[10:45:19] <Derick> no no, each shard *is* a replicaset
[10:45:55] <Derick> so two shards, means you have two replicasets. Each replicaset should be 3 nodes (2 data + 1 arbiter, or 3 data)
[10:46:56] <fontanon> each node is a different machine ?
[10:47:00] <Derick> yes
[10:49:28] <fontanon> Derick, so 6 nodes (6 machines) are recommended for my sharding schema.
[10:49:46] <fontanon> For having the 2-tier sharding plus replication.
[10:51:29] <Derick> fontanon: and then 3 config servers
[10:52:10] <fontanon> should I run the 3 config servers in another 3 different machines ?
[10:52:36] <Derick> in a real world production environment, yes
[10:52:53] <Derick> they certainly should all be on a different physical machine
[10:53:19] <Derick> but you can probably get away by putting them on the same physical machine as an arbiter
[10:53:40] <fontanon> the three config servers into the same physical machine ?
[10:53:59] <Derick> no
[10:54:10] <Derick> that defeats the point of spreading them out
[10:54:24] <Derick> think about stuff breaking
[10:54:34] <Derick> if one config server goes down, you have two left
[10:54:36] <fontanon> then I didn't understand your last comment: "<Derick> but you can probably get away by putting them on the same physical machine as an arbiter"
[10:54:50] <Derick> fontanon: ah, on one node: 1 arbiter + 1 config server
[10:55:07] <fontanon> oh, ok!
[10:55:25] <fontanon> Is the arbiter the same as the mongo query router?
[10:55:36] <Derick> no
[10:55:46] <fontanon> the mongos
[10:55:48] <fontanon> I meant
[10:55:50] <Derick> that's a "mongos" node, which you typically put on the machine that runs your application code
[10:56:52] <fontanon> ok, the mongos on the same machine as my app, got it
[10:56:59] <fontanon> but what's the arbiter then?
[10:57:53] <Derick> a replicaset needs 3 nodes
[10:58:41] <Derick> you can either pick: 3 data nodes (mongod), or, 2 data nodes and one arbiter. An arbiter is a lightweight process, that does not store data - but it does help out in case a data carrying node goes down
[10:58:57] <Derick> they're basically a judge to say which of the two data nodes is the "primary" node
[10:59:45] <fontanon> which mongo command is related to the arbiter?
[11:00:31] <fontanon> never mind, I reached here: http://docs.mongodb.org/manual/tutorial/add-replica-set-arbiter/
[11:00:56] <Derick> you start it up as a normal mongod node, but add it to a replica set with addArb instead of add
[11:01:00] <Derick> right - that URL
[11:01:03] <fontanon> :)
[11:11:58] <KekSi> also start it with --smallfiles and --nojournal and the one you should connect to to use rs.addArb() is the PRIMARY of the rs
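A short sketch of that sequence, run from a shell connected to the PRIMARY; the hostname is hypothetical:

    rs.addArb("arbiter1.example.net:27017")   // add the lightweight arbiter to the set
    rs.status()                               // the new member should report as ARBITER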
[11:14:33] <fontanon> Derick, really thank you for your help. You've opened my eyes :)
[11:15:29] <Derick> fontanon: np
[12:06:50] <hernil> Hi! I'm quite new to nosql databases and I'm trying to let go of my relational reasoning when modeling my application. I'm building a system with some similarities to Stack Overflow and I'm thinking about tag handling. Should I just keep them in the "question" document or should I normalize it? My main problem with not normalizing is performance concerns regarding searching through all tags (suggestion of tags). Will it be considerable or am I overthinking it
[12:17:32] <StephenLynx> hernil I would keep a list of tags in the question.
[12:17:55] <StephenLynx> it's not hard to find all questions containing a certain tag in the list
[12:18:28] <StephenLynx> but then there's indexes and I am not sure how that could be optimized, searching for documents that contain an element in an array.
[12:18:52] <StephenLynx> now for suggestions for tags.
[12:18:57] <StephenLynx> I would pre-aggregate that.
[12:19:47] <StephenLynx> I don't think even SO looks for all posted questions in search of suggestions.
[12:24:45] <hernil> StephenLynx: How would that look? Is this something I would do myself or is this abstracted away by using mongoose as I am now? :)
[12:25:04] <deathanchor> you can also keep a tag collection (indexed by tag) and just store the object id of the question from a questions collection
[12:25:22] <StephenLynx> mongoose is cancerous, you don't want to use it.
[12:25:31] <deathanchor> but that's two lookups and you have to maintain both collections to keep them in sync
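A rough sketch of the embedded-tags approach, with illustrative collection and field names:

    db.questions.createIndex({ tags: 1 })      // multikey index over the embedded array
    db.questions.insert({ title: "How do I model tags?", tags: ["mongodb", "schema"] })
    db.questions.find({ tags: "mongodb" })     // matches any question whose tags array contains "mongodb"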
[12:40:28] <hernil> StephenLynx: What's wrong with mongoose?
[12:40:43] <StephenLynx> first: it is useless. it adds zero functionality.
[12:40:53] <StephenLynx> second: it is extremely badly designed.
[12:40:59] <StephenLynx> third: it eats up performance
[12:41:15] <StephenLynx> fourth: it goes way, waaaaay out of the regular mongo workflow
[12:42:39] <deathanchor> what op should I use if I want to do a find() where 'string' is matched at the beginning of the value?
[12:43:03] <deathanchor> regex? or is there something better? (this is an indexed field)
[12:43:42] <StephenLynx> only at the beginning?
[12:43:48] <deathanchor> yeah
[12:43:58] <StephenLynx> yes, I think $regex is the only thing that would work for that.
[12:44:01] <deathanchor> values would be 'stringwithsomemore'
[12:44:06] <StephenLynx> $text would look anywhere in the string.
[12:44:20] <deathanchor> is text or regex faster?
[12:44:25] <StephenLynx> don't know.
[12:44:34] <deathanchor> hmm.. I guess I'll try both
[12:44:37] <deathanchor> thx
[12:44:48] <StephenLynx> don't know how they handle indexes either, but the documentation should give you that information.
[12:53:55] <kylegferg> Anyone know how to quiet output using the ruby client?
[12:56:01] <deathanchor> using { key : { $in : [ /^string/, /^another/ ] } } works pretty fast
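For the record, a case-sensitive, left-anchored regex like that can use an ordinary index on the field, which is presumably why it runs fast; names below are illustrative:

    db.things.createIndex({ key: 1 })
    db.things.find({ key: /^string/ })                             // anchored prefix can walk the index
    db.things.find({ key: { $in: [ /^string/, /^another/ ] } })    // same idea across several prefixes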
[13:19:15] <kmtsun> Hello everyone
[13:41:43] <deathanchor> anyone know how to write the /^pattern/ equivalent for use with the Mongo driver for PHP?
[13:41:58] <Croves> Hello folks, why isn't this working? http://pastebin.com/9C32eLvm
[13:42:03] <Derick> deathanchor: new Regex( "/^pattern/" )
[13:42:07] <Croves> There's an error on my birthday field
[13:42:08] <Derick> sorry
[13:42:13] <Derick> deathanchor: new MongoRegex( "/^pattern/" )
[13:42:17] <deathanchor> Derick: thx!
[13:42:30] <Derick> Croves: what is the error?
[13:42:42] <Derick> Croves: and how / where are you running that?
[13:43:51] <Croves> Hi Derick, I'm running it in MongoLab, when I try to create a new collection
[13:44:09] <Croves> I don't know exactly what's the error, but when I remove 'birthday', it runs fine
[13:50:43] <Derick> Croves: sorry, what's MongoLab?
[13:50:53] <Derick> do you run this on the mongo shell? or in a script?
[14:12:19] <deathanchor> my php mongo query not returning results: help? https://gist.github.com/deathanchor/4b0e1bc5deb4d77e5ce1
[14:14:34] <deathanchor> but in mongo the equivalent query does return results
[14:23:00] <GothAlice> deathanchor: I don't PHP, but what exactly is $dayDate; when pasting code it's a good idea to include enough to reproduce. ;)
[14:26:14] <deathanchor> that part works, when I tested it with just the date
[14:26:25] <deathanchor> it's the key part that doesn't
[14:31:16] <GothAlice> deathanchor: Have you tried the longer-form $regex? (I'm not actually sure if the longer form is compatible with $in usage, though.)
[14:31:36] <GothAlice> deathanchor: A la the first example on http://docs.mongodb.org/manual/reference/operator/query/regex/ (vs. the second example)
[14:31:47] <GothAlice> ({$regex: 'pattern-as-string'})
[14:32:12] <deathanchor> yeah, but I have a long list I need to match... :(
[14:34:13] <deathanchor> I might hit up the php chan
[15:02:09] <fontanon> Hi everybody, I would like to change the ports in my shards configuration, i.e. shards: { "_id" : "shard0001", "host" : "server1:27017" } TO shards: { "_id" : "shard0001", "host" : "server1:27018" }. Do I have to remove the shards and add them again?
[15:04:45] <GothAlice> fontanon: That would be the "safest" approach, yes.
[15:05:29] <GothAlice> Since the set name isn't changing, and in theory that replica is already up-to-date, the catch-up process when it re-joins should be fast.
[15:05:53] <fontanon> GothAlice, thanks I'm removing the shards with db.runCommand({removeShard:"shard0001"}), is it ok?
[15:06:18] <GothAlice> Yup, that's the recommended approach as documented here: http://docs.mongodb.org/manual/tutorial/remove-shards-from-cluster/
[15:06:19] <fontanon> the first shard I removed with that command displays a "draining: true" status
[15:06:29] <GothAlice> Just remember to do one node at a time, if you're wanting to change multiple.
[15:07:02] <fontanon> yep, mongo complains if I try to remove another shard if the previous has not finished
[15:07:05] <GothAlice> (Remove A, wait, add A back with the right port, wait, remove B, wait, …) ;)
[15:07:16] <fontanon> got it!
[15:07:55] <GothAlice> What you're waiting for is for all of that data to get shuffled to other nodes to ensure your data is always accessible.
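A rough sketch of that remove/re-add cycle from a mongos shell, one shard at a time (shard name and host match the example above; re-adding assumes draining has completed):

    db.adminCommand({ removeShard: "shard0001" })   // re-run the command to poll; "state" eventually reports "completed"
    sh.addShard("server1:27018")                    // add the shard back on its new port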
[15:08:09] <fontanon> removing the first shard is taking a lot of time, is that ok too?
[15:08:25] <GothAlice> The amount of time it takes will depend on the amount of data that needs to be moved.
[15:08:32] <fontanon> the point is the shard I'm removing is on a node with no data
[15:08:59] <fontanon> I mean, I haven't partitioned any collections into that shard yet!
[15:09:03] <GothAlice> :/
[15:09:06] <GothAlice> Hmm.
[15:09:13] <fontanon> and it's still draining ...
[15:09:49] <GothAlice> Anything else showing up in the logs? And what's the output of db.currentOp()?
[15:10:04] <GothAlice> OneAngryDBA: Love the nick.
[15:10:23] <fontanon> mongos> db.currentOp()
[15:10:23] <fontanon> { "inprog" : [ ] }
[15:10:51] <fontanon> but it's still draining --> shards: { "_id" : "shard0000", "host" : "server1:27017", "draining" : true } ...
[15:11:58] <fontanon> GothAlice, I understand this un-clustering procedure is prod-env friendly, but just because I'm playing in a sandbox ... isn't there an easy way to destroy the cluster?
[15:12:20] <cheeser> shut down services. rm -r.
[15:12:30] <fontanon> cheeser, rm -r what
[15:13:08] <GothAlice> The contents of your mongod's data directory, typically /var/lib/mongodb/*, /var/lib/mongod/*, or /data/* (if not otherwise configured, I believe this is the default)
[15:13:22] <fontanon> alright
[15:13:28] <fontanon> lets try, thanks both !
[15:26:39] <deathanchor> I have a 4-member set, m4 has zero votes and zero priority. would a forced rs.reconfig() on m4 to make it a solo member of the set break it out of the other 3 cleanly (and not affect the 3)?
[15:26:40] <fontanon> GothAlice, cheeser it was a matter of removing the config servers' metadata directory to destroy the cluster; no need to remove the data on the nodes, thanks for your help
[15:29:07] <GothAlice> deathanchor: The process would be to remove that member from the set using the management commands. It will still think it's part of the set, though, and will need to be restarted after altering the configuration to remove the replSet configuration option. Only _then_ will it think it's standalone.
[15:29:49] <GothAlice> If it's a non-voting, non-electable member of the set, the set should be otherwise uninterrupted during removal. (You _might_ trigger an election, which will cascade down to the client drivers needing to reconnect, but the same primary should win.)
[15:30:20] <GothAlice> See: http://docs.mongodb.org/manual/tutorial/remove-replica-set-member/
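A minimal sketch of that removal, assuming m4 is the zero-vote, zero-priority member (hostname hypothetical):

    // connected to the PRIMARY of the set
    rs.remove("m4.example.net:27017")
    // then restart m4's mongod without the replSet option so it comes up standalone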
[15:45:09] <deathanchor> yeah, I'm trying to figure out a way to take prod data to a testing env, and bring it back again to resync from scratch when needed.
[15:45:56] <deathanchor> safest is to remove it
[16:17:57] <zunkree> Hi.
[16:19:35] <zunkree> I have a question. When I dump a database with mongodump it uses less space than the mongodb stats show. Is this ok?
[16:20:08] <deathanchor> yes
[16:20:26] <deathanchor> mongo pads the files when used in the active db
[16:20:30] <zunkree> But the difference is more than two times.
[16:21:04] <deathanchor> http://docs.mongodb.org/v2.4/core/record-padding/
[16:21:05] <zunkree> for example
[16:21:05] <zunkree> ~ # du -sh dump
[16:21:05] <zunkree> 4.6G dump
[16:21:13] <zunkree> "dataSize" : 6.481055587530136,
[16:21:14] <zunkree> "storageSize" : 8.350494384765625,
[16:21:26] <zunkree> data from mongo in GBytes
[16:21:38] <Croves> Is it possible to create two documents with the same ID?
[16:21:52] <Croves> oh, no
[16:21:57] <Croves> forget what I said
[16:22:45] <deathanchor> zunkree: I believe that oplog doesn't get dumped, which is in the db.
[16:23:00] <GothAlice> It only gets dumped if you explicitly request "point-in-time" backup from mongodump.
[16:23:53] <zunkree> deathanchor: this instance is not part of a replica set
[16:25:33] <GothAlice> zunkree: MongoDB on-disk data files are "sparse", i.e. they develop holes, and are over-allocated vs. the amount of actual data. (They also can become fragmented enough to warrant compaction.) Additionally, each record has some "head room" added to it to allow it to grow without needing to be moved—this helps reduce fragmentation. When you use mongodump it pulls out each record and packs them as tightly as possible in the resulting .bson
[16:25:33] <GothAlice> binary file. This is _substantially_ more efficient.
[16:25:42] <GothAlice> Thus the difference in apparent sizes.
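For comparing the two sizes, db.stats() accepts a scale factor, so the figures zunkree pasted can be reproduced directly in gigabytes:

    db.stats(1024 * 1024 * 1024)   // dataSize / storageSize reported in GB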
[16:26:31] <deathanchor> zunkree: I usually do a test import and check that the count is the same as when I did the dump
[16:27:17] <zunkree> GothAlice: Thank you. I was a little confused when dumping my data.
[16:27:28] <deathanchor> I have one DB where I rotate collections which is 400x larger than the actual data size
[16:28:25] <GothAlice> Aaaand that description of the difference going in the FAQ. ^_^
[16:29:09] <GothAlice> deathanchor: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework < this article describes several different ways of storing the same data and the impact on query performance and data storage size. Simple changes can have substantial effects on disk utilization.
[16:30:00] <GothAlice> (The naive approach, document per event in this example, stores 133mb of data and 94mb of indexes. The hybrid approach stores 39mb of data and 23mb of indexes… and it's the same data!)
[16:30:51] <deathanchor> GothAlice: my issue was we got lazy and the rotation stopped working. So 3 months of data got stored instead of 2 days worth
[16:31:01] <deathanchor> just need to do a repair :D
[16:31:03] <GothAlice> … ouch.
[16:31:05] <GothAlice> XP
[16:31:30] <deathanchor> we only noticed when the 300GB disks started getting full
[16:33:36] <deathanchor> in reality only about 3GB should be used
[16:38:30] <GothAlice> FAQ updated to include mongodump: https://gist.github.com/amcgregor/4fb7052ce3166e2612ab#manual-dump
[16:38:35] <Tinuviel> hi
[16:38:47] <Tinuviel> I have a question
[16:39:54] <GothAlice> We may or may not have answers, but before any can be provided the question must be posed, Tinuviel.
[16:40:02] <Axy> hi all -
[16:40:44] <Axy> I'm new to mongo, I save everything into my database like json objects
[16:40:58] <Axy> is it possible to perform a text search on the whole database? and get the item?
[16:41:11] <Tinuviel> I have MongoDB with two shards (three replica members each). The database is constantly getting some reads from a web app. I need to do some heavy writes to the DB from time to time. It will take maybe even a few days; the script will be reading some documents from the DB, calculating something and writing back to the DB.
[16:41:39] <Tinuviel> What is the best way of doing that so as not to interrupt production traffic (reads)?
[16:41:46] <GothAlice> Axy: http://docs.mongodb.org/manual/core/index-text/ < If you want to perform full text search, you're going to need to create a text index. However, these searches are restricted to a single collection at a time.
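A minimal sketch of such a text index, with hypothetical collection and field names; only the fields named in the index are searchable this way:

    db.posts.createIndex({ caption: "text" })
    db.posts.find({ $text: { $search: "kittens" } })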
[16:42:00] <Tinuviel> I was thinking about redirecting reads to the secondary member, and doing writes on the primary
[16:42:01] <Tinuviel> b
[16:42:15] <Tinuviel> *but maybe there are some smarter options
[16:43:01] <Axy> GothAlice, can I send you an example entry over dm? maybe there is a better way of doing this
[16:43:28] <Tinuviel> Priority here is to not interrupt production queries.
[16:43:43] <GothAlice> Tinuviel: Alas, all writes _must_ pass through a primary. Handing your reads to secondaries will help reduce the overall read load on the primaries, but your application should also be pulling data from the secondaries where possible. Your main trick will be to rate limit your updates. If you are performing those updates all in one thread, you might not need to due to the natural serial nature of an arrangement like that.
[16:44:33] <GothAlice> The only other concern for a long-running query like that is the need to handle the cursor timing out. I.e. retry from where you left off in the event of the cursor being closed.
[16:45:18] <Tinuviel> Thanks GothAlice
[16:45:33] <Tinuviel> and how can I limit my updates?
[16:45:49] <Tinuviel> is there any build-in mongo way of doing that?
[16:45:55] <GothAlice> Nope.
[16:46:16] <Tinuviel> Or does it require some sleeps inside the script?
[16:46:17] <Tinuviel> I see
[16:46:25] <GothAlice> Adding a small time delay (i.e. time.sleep(0.1) in Python) to the bottom of your loop would work, as one approach.
[16:46:33] <Tinuviel> got it
[16:46:37] <Axy> GothAlice -- when I do db.collection.findOne() this is a single entry that comes up http://pastebin.com/QxASwpVL
[16:47:00] <Tinuviel> One more thing.
[16:47:03] <GothAlice> Axy: And what are you wanting to search on?
[16:47:15] <Tinuviel> I know I can construct query to hit secondary instead of primary
[16:47:21] <Axy> part of the post_url
[16:47:35] <Axy> for .post
[16:47:59] <GothAlice> Tinuviel: Which sizes do you care about? (Original only, alt only, both?)
[16:48:04] <GothAlice> Er, sorry, Axy.
[16:48:46] <Axy> GothAlice, hm in what sense
[16:49:00] <Tinuviel> But can I somehow force that without adjusting the query? I'm asking because the query will be in PHP code, and if we need to change it to ask a secondary, it will require doing a release, which takes a long time.
[16:49:02] <GothAlice> Axy: In the sense of "what you want to search on". ;)
[16:49:23] <Axy> GothAlice, I can search for the whole post_url as well, if that is possible
[16:49:36] <GothAlice> Tinuviel: That's called the "read preference". You'll have to examine the driver docs for the implications to your code.
[16:49:41] <Axy> I mean I have a post url in my hands, and I need to find which entry it belongs to
[16:49:53] <Tinuviel> yes, I know.
[16:50:09] <Tinuviel> but can I do that without integration in the driver?
[16:50:15] <GothAlice> Nope.
[16:50:26] <Tinuviel> That's sad, but fully logical.
[16:50:34] <Tinuviel> I've just wanted to make sure.
[16:50:37] <GothAlice> In *theory* you can run a middleware server which transforms the data coming out of your app before it hits mongod, but, uh, that's a bit of overkill.
[16:51:35] <Tinuviel> Hmm, can I just leave php calling secondary all the time?
[16:51:42] <GothAlice> There are reasons to not do that.
[16:51:57] <Tinuviel> I see...
[16:52:21] <GothAlice> You could, but secondaries are potentially delayed. You may have situations where you .update() something, then .findOne() the updated copy… if the findOne() asks a secondary, you may get the document prior to the update.
[16:52:43] <Tinuviel> I will be updating once a day on a regular basis
[16:52:54] <Tinuviel> update will take around 10-15minutes
[16:52:57] <GothAlice> So anywhere you think you want to use a secondary read preference, consider: is it A-OK if this query gets old data?
[16:53:08] <Tinuviel> Thanks
[16:53:49] <GothAlice> http://docs.mongodb.org/manual/reference/read-preference/ goes into depth on the various caveats of each choice.
[16:56:38] <Axy> does find() return a limited number of results?
[16:56:48] <Axy> How can I view my database?
[16:56:59] <Axy> (all entries)
[16:57:10] <GothAlice> Axy: find() will return a cursor which has no particular limit; internally it requests records in batches. (Each new batch is referred to as a "getMore" operation.)
[16:57:25] <cheeser> in batches of 20
[16:58:02] <Axy> Hm I see
[16:58:07] <GothAlice> However iterating the cursor will keep that little fact from you, excluding a slightly increased delay every 20 records.
[16:58:12] <Axy> is it possible to get everything in return
[16:58:35] <GothAlice> Aye. In Python if you cast a cursor to a list (list(cursor)) it slurps all the data in.
[16:58:53] <cheeser> but that still results in multiple server calls.
[16:59:04] <cheeser> to get everything in one go, you could bump the batch size.
[16:59:04] <GothAlice> (This is generally not a good idea, though. It requires enough memory to store all data, and blocks until all the data is read.)
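Two common shell-side options, sketched with a placeholder collection; the first keeps iterating in larger batches, the second buffers everything in memory:

    db.collection.find().batchSize(1000).forEach(printjson)   // fewer getMore round trips
    db.collection.find().toArray()                            // slurp the whole result set into one array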
[16:59:37] <Axy> Ah, this is an action I'll do manually -- I just need to find a piece of text that's in the database; since I can't do a search for it I was just thinking of console logging the whole database and just manually performing a ctrl-f
[17:00:21] <Axy> my whole db is around 100mb for now
[17:00:25] <Axy> its not big
[17:00:37] <GothAlice> "Can't do a search for it" is… dubious.
[17:01:46] <GothAlice> db.foo.find({$or: [{"post.photos.alt_sizes.url": /bob/}, {"post.photos.original_size.url": /bob/}]}) < for example
[17:01:59] <GothAlice> Search for "bob" anywhere in the URLs (for any scale photo) on a post.
[17:02:40] <GothAlice> Though I'm still not sure if that's actually what you're trying to do. ;)
[17:02:59] <Axy> GothAlice, that might be! let me try!
[17:06:28] <Axy> thank you very much GothAlice that worked like a charm
[17:06:54] <Axy> what does $or: do there?
[17:07:30] <GothAlice> Because you're storing the original separate from the scales, you effectively need the combined set of two queries, thus the $or.
[17:07:35] <GothAlice> If either condition is true, the record matches.
[17:26:53] <fullstack> In Mongoose Populate (show nested), how do I show all without passing the specific column name? When I call mongoose.model...find().populate().exec(function(data){ console.log(data); }) I get an error
[17:27:21] <fullstack> And the documentation doesn't say
[17:27:50] <fullstack> I just want to populate everything
[18:06:45] <StephenLynx> don't use mongoose.
[18:06:49] <StephenLynx> fullstack
[18:07:07] <GothAlice> Practicality beats purity, StephenLynx. ;)
[18:07:35] <StephenLynx> and not using a steaming pile of radioactive trash beats using it.
[18:46:27] <MacWinner> StephenLynx, Why not use mongoose? sorry, just noticed the post... i actually just was playing around with it today.
[18:47:09] <StephenLynx> because it is badly designed, has slow performance, adds a failure layer, it completely changes the workflow, all while offering zero functionalities.
[18:47:45] <cheeser> not sure "zero" is fair...
[18:47:53] <cheeser> i know lots of folks use it...
[18:47:55] <MacWinner> StephenLynx, my main use for it is adding model methods... I'm open to using something else.. any suggestions?
[18:47:58] <GothAlice> MacWinner: It is the single most frequent source of support incidents I assist with, typically caused by Mongoose encouraging patterns of use MongoDB doesn't cope well with. A typical example is the naive approach of storing ObjectId instances as strings (more than doubling the storage requirement, breaking query/sort capability, etc.)
[18:48:12] <StephenLynx> the regular driver is more than enough.
[18:49:09] <StephenLynx> cheeser and I know lots of folks that think that windows is A-ok for a server.
[18:49:10] <MacWinner> StephenLynx, so just creating a plain old javascript object and having it reference mongodb directly?
[18:49:20] <StephenLynx> no, you use the driver.
[18:49:36] <StephenLynx> directly isn't worth it.
[18:50:08] <StephenLynx> specially if the driver for your platform has official support from 10gen.
[18:50:22] <GothAlice> "Badly designed": In what ways? "Slow performance": do you have benchmarks to back that claim up? "adds a failure layer": that's minor; often the benefits of a DAL outweigh the risk of an additional moving part. "Completely changes the workflow": Not completely; it's still CRUD. "Offering zero functionalities": patently false if you look at the documentation.
[18:51:17] <StephenLynx> you just mentioned how bad it is design regarding object ids.
[18:51:20] <MacWinner> could you point me to an example of using the native driver, but extending the document with a method like if it's an Animal object, give it a method to bark()
[18:51:21] <GothAlice> Those gross generalizations would, in theory, apply to any DAL. I use MongoEngine. I'd prefer to not work without it. (I make extensive use of its features, including automatic caching of values associated with references, signal handlers, cascading deletes, …)
[18:51:32] <StephenLynx> and of course it will degrade performance because it adds an abstraction layer.
[18:51:45] <GothAlice> Again, that's not a problem until you can measure an actual problem.
[18:51:46] <fullstack> StephenLynx, I needed de-normalization modeling and mongoose was the go-to
[18:51:55] <StephenLynx> and AFAIK it doesn't enable you to do anything that you couldn't already do with the driver.
[18:52:00] <fullstack> I mean I needed normalization modeling, whoopts
[18:52:09] <StephenLynx> what do you mean by that, fullstack?
[18:52:24] <StephenLynx> you needed it to work like a relational db?
[18:54:10] <fullstack> Yes
[18:54:50] <fullstack> they wanted to be able to change the name of a data item and have it reflect to all the other areas of the documents without updating the entire collection
[18:54:58] <fullstack> I mean, change the name column value
[18:55:12] <StephenLynx> ok, so, let me get this straight
[18:55:31] <GothAlice> fullstack: That's a pretty typical use of a ODM (object document mapper).
[18:55:36] <StephenLynx> you added a dependency to make your database work exactly the opposite way it should work instead of using a more appropriate database?
[18:55:46] <GothAlice> I do exactly that to store single-character fields in MongoDB, but have descriptive names at the application layer.
[18:55:48] <fullstack> GothAlice, isn't that what mongoose is?
[18:55:59] <GothAlice> fullstack: That's… a minor part of what Mongoose does.
[18:56:43] <fullstack> StephenLynx, they didn't want to write one off REST queries for each possible combination of changes
[18:56:51] <GothAlice> There's also "active record" (i.e. the document can save modifications to itself amongst other things), query transformation (build up queries using a more "native" approach than $ keys), etc.
[18:57:07] <GothAlice> (Schema validation…)
[18:57:39] <StephenLynx> active record is just a simple abstraction, isn't?
[18:57:55] <GothAlice> Depends on the DAL.
[18:58:05] <fullstack> So like, active record, I'd say "Hey this field changed. go everywhere in the collection and find where we used the same field where the data is duplicated and update it. Also let's pay employee Bob to do this fulltime"
[18:58:10] <StephenLynx> you have an object that holds a reference to the document and make it perform the query it is referenced to.
[18:58:15] <GothAlice> MongoEngine is capable of "dirty flag" tracking and can transform local modifications into efficient $set/$push/etc. atomic operations.
[18:59:15] <GothAlice> fullstack: A Document Object Mapper and Active Record system would let you change the name of the property in your application independent of the name of the field in the database.
[18:59:41] <GothAlice> There's no need to "go everywhere" (i.e. update the actual data) at all.
[18:59:50] <fullstack> Oh I just wanted to change the value of "name" and have it reflect everywhere else
[19:00:14] <fullstack> as in, name being a property in the document. not change the name of the property
[19:00:44] <GothAlice> fullstack: So, like, db.collection.update({_id: ObjectId(…)}, {$set: {name: "Bob Dole"}})
[19:00:56] <StephenLynx> in the end, it went to all collections and updated them all. you just didn't see it.
[19:01:09] <StephenLynx> transparent implementations are still implemented.
[19:01:11] <fullstack> GothAlice, yeah but now we need to change it everywhere else that I use that property.
[19:01:41] <fullstack> in other documents. Like reports, or etc. So DBRef/Mongoose was the logical step
[19:01:56] <StephenLynx> logical? to add it just because of something small like that?
[19:02:16] <GothAlice> fullstack: So, db.other_collection.udpate({'some_reference.id': ObjectId(…the thing updated before…)}, {$set: {'some_reference.name': "Bob Dole"}}, multi=True)
[19:02:18] <StephenLynx> it is performing the same operations with or without it.
[19:02:29] <fullstack> "name" was just one of them, First name last name, age, questions, answers, sections, teams,
[19:02:30] <GothAlice> s/udpate/update/
[19:02:53] <fullstack> coach name. If I change the coach name for a team, I'd have to go to each player and change their coach name property
[19:03:01] <GothAlice> Aye. You're basically looking for: http://docs.mongoengine.org/apireference.html#mongoengine.fields.CachedReferenceField
[19:03:03] <fullstack> for that team.
[19:03:11] <GothAlice> ;)
[19:04:23] <StephenLynx> the problem with these featureless abstraction tools is that they are still performing the tasks that you don't want to write yourself, but with much more complexity, because they are way more generic than you need.
[19:04:25] <fullstack> its not immediately obvious to me how that feature would help me, but thanks.
[19:04:32] <GothAlice> AFAIK Mongoose doesn't yet have support for this type of automatic caching, though in theory it is implementable using its "middleware" feature (trap "save" on referenced schemas, update references manually as described above).
[19:04:50] <GothAlice> StephenLynx: Again, you really need numbers to back up your claims.
[19:04:59] <fullstack> StephenLynx, No they don't See DBRef
[19:05:32] <cheeser> hard to square "featureless" and "complexity"
[19:06:30] <GothAlice> fullstack: CachedReferenceField('Coach', fields=['name']) — if this is defined in your schema that references a coach, as an example, MongoEngine will _automatically_ handle storing the name field of the referenced document, and automatically update the cached value in the reference if the target document is updated (via MongoEngine).
[19:07:22] <GothAlice> Additionally, the MongoEngine query planner allows for treating these cached values as if they were a join: Team.objects(coach__name="Bob Dole") < actually works
[19:07:54] <fullstack> GothAlice, thanks.
[19:10:30] <saml> with mongoimport (3.0) can I force int ?
[19:10:50] <saml> after upgrading mongod, mongoimport is upgraded.. and it sometimes inserts as long.. sometimes as int
[19:11:01] <saml> application expects int type (java and scala)
[19:11:14] <GothAlice> saml: http://docs.mongodb.org/manual/reference/mongodb-extended-json/#numberlong
[19:11:23] <saml> legacy system spits out json and we use mongoimport
[19:11:37] <saml> yah that forces as long.. and need to change java and scala apps
[19:11:43] <GothAlice> saml: If it's emitting JSON, then you're limited by the JSON spec.
[19:11:59] <GothAlice> That means all numbers are 64-bit IEEE double precision floating point, with 53 bits of integer accuracy.
[19:12:09] <saml> maybe i'll have to write mongoimport script myself
[19:12:32] <saml> yah with 2.x mongoimport json numbers were imported as int
[19:12:46] <saml> now they are imported as long, double, int
[19:12:47] <GothAlice> I suspect it'd attempt to find the most compact viable representation for any given number.
[19:13:08] <saml> should've written all apps in node.js for web scale
[19:13:12] <saml> java isn't web scale
[19:13:16] <GothAlice> Haaah.
[19:13:24] <GothAlice> Everything is "web scale" as the term is somewhat meaningless.
[19:13:42] <GothAlice> (And plenty of large organizations enjoy success with Java…)
[19:14:03] <saml> yah i wish i don't have to use node.js
[19:14:36] <cheeser> "web scale:" the joke for people stuck 4 years in the past.
[19:14:38] <cheeser> :)
[19:15:08] <GothAlice> Heh, like "big data" for one of my clients who has… 700MB of data.
[19:15:25] <saml> play.api.Application$$anon$1: Execution exception[[ClassCastException: java.lang.Double cannot be cast to java.lang.Integer]]
[19:15:32] <GothAlice> http://bsonspec.org/spec.html — int32 (type \x10), int64 (type \x12), double (type \x01) — these are the available number formats in BSON. Of these, JSON only directly supports doubles.
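In the mongo shell the BSON number type can be picked explicitly, which is one way around JSON's doubles-only behaviour; the collection and values here are illustrative:

    db.numbers.insert({ asInt: NumberInt(42), asLong: NumberLong(42), asDouble: 42 })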
[19:15:49] <GothAlice> saml: Round first?
[19:15:53] <GothAlice> Even though it's silly?
[19:18:18] <deathanchor> GothAlice, you would have liked Perl for the same reason
[19:18:45] <GothAlice> I coded Perl for many years.
[19:18:59] <GothAlice> Then I realized I don't enjoy write-only code. ;)
[19:19:06] <StephenLynx> your problem is not that java isn't optimized for web servers like other technologies
[19:19:10] <StephenLynx> saml
[19:19:33] <StephenLynx> you could have that problem, yes. but that is not your immediate problem.
[19:20:21] <StephenLynx> and if you plan to migrate to node, migrate to io.js instead.
[19:22:22] <MacWinner> i've started migrating all our PHP code to iojs
[19:22:45] <MacWinner> it's been a real pleasure because of generators and koajs
[19:22:59] <GothAlice> Yeah, the latest ECMAScript standard introduces many nice things.
[19:23:02] <MacWinner> i can write syncronous style code, but it's operating async
[19:23:18] <GothAlice> ECMAScript really wants me to like it. ;)
[19:23:33] <MacWinner> and has very nice try/catch semantics that work with promises
[19:23:59] <StephenLynx> I've just finished my forumhub using mongo and io. I just need to stop being lazy and get a token in my bank to be able to buy stuff with my credit card on the internet again so I can host it.
[19:24:01] <GothAlice> MacWinner: Could you run one line for me and tell me the result? typeof "foo" === typeof new String("bar")
[19:24:08] <MacWinner> GothAlice, I've become a huge fan of javascript through a reverse route.. It started when I began with angularjs development
[19:24:35] <StephenLynx> goth alice, it gives me false
[19:24:41] <GothAlice> StephenLynx: On io.js?
[19:24:44] <StephenLynx> yes
[19:25:02] <GothAlice> Welp, it _wants_ me to like it, but continues to do things that make sure I never will. ;)
[19:25:16] <StephenLynx> well, js is a dirty language.
[19:25:19] <StephenLynx> I don't deny that.
[19:25:20] <GothAlice> (When is a string not a string? When it's JS.)
[19:25:20] <MacWinner> GothAlice, returns false
[19:25:35] <StephenLynx> but is very easy to avoid its insanity using strict mode.
[19:25:53] <StephenLynx> didn't you say that practicality beats purity?
[19:25:54] <GothAlice> StephenLynx: Does one still need to memorize a page and a half of hierarchical typecasting rules?
[19:26:01] <StephenLynx> no
[19:26:06] <MacWinner> yeah.. use strict and use jshint, and you'll be fine
[19:26:15] <GothAlice> StephenLynx: Link to strict mode casting spec?
[19:26:25] <StephenLynx> dunno, never bothered with casting.
[19:26:34] <StephenLynx> I just cast to number when needed.
[19:26:41] <GothAlice> StephenLynx: Any time you use == instead of ===, you're hitting the casting rules. How do you cast to a number?
[19:26:41] <StephenLynx> which is very few cases.
[19:26:52] <StephenLynx> +variable.
[19:26:55] <GothAlice> …
[19:27:07] <MacWinner> GothAlice, take a look at koajs + co... it's really life changing :)
[19:27:13] <StephenLynx> and you should always use === in js.
[19:27:21] <StephenLynx> my lint warns me if I use ==
[19:27:26] <MacWinner> the co library is like 200 lines of code, but extremely elegant
[19:27:37] <StephenLynx> meh
[19:27:43] <StephenLynx> I bet its bloat.
[19:27:49] <StephenLynx> what does it do?
[19:28:06] <saml> yah it looks like i'm wrong. mongoimport always imports json number as Double
[19:28:18] <saml> it used to be Int in older version
[19:28:34] <MacWinner> StephenLynx, wraps generator functions and makes them extremely simple to do async code in a sync fashion.. no callbacks or .thens
[19:28:35] <saml> so.. i could downgrade mongoimport, rewrite mongoimport (read json and do mongo update)
[19:28:38] <GothAlice> saml: The older version was not conforming to the actual JSON specification, in that case.
[19:28:44] <StephenLynx> that sounds awful, MacWinner
[19:28:50] <saml> yah
[19:28:52] <StephenLynx> callbacks are great.
[19:29:04] <GothAlice> StephenLynx: Only in languages with real tracebacks.
[19:29:08] <saml> probably someone opened jira ticket about mongoimport and json number.. and it got fixed for him.. but broke for me
[19:29:17] <saml> i was relying on buggy behavior of mongoimport
[19:29:51] <GothAlice> saml: Yup. If you want something else, in this case, I'd recommend dumping to CSV. Then you can have explicit control. JSON just _does not do integers_.
[19:29:57] <StephenLynx> never had any issues with callback based code and debugging when I had to find where something is going on.
[19:29:59] <MacWinner> no promise chains needed.. try/catch scope nicenes.. tj hollowaychuck basically started this as a replacement for express..
[19:30:07] <saml> oh i see. that's another solution
[19:30:45] <GothAlice> saml: (The tears of thousands of crypto developers have left a bit of a stain on JavaScript, because of how it only does floats.)
[19:30:53] <MacWinner> StephenLynx, give it a look though.. I'm a newb with node, but after playing around with old ways of doing stuff and this ES6 way, it's really game changing for me
[19:31:10] <StephenLynx> whats its name again?
[19:31:17] <StephenLynx> koa
[19:31:26] <saml> today is international json day
[19:31:27] <MacWinner> koajs.. it's built on top of co library
[19:31:36] <StephenLynx> lololol
[19:31:39] <GothAlice> saml: Pity, msgpack is so much better. ;)
[19:31:42] <StephenLynx> yeah, nah.
[19:32:23] <StephenLynx> and it has to be built too?
[19:32:28] <StephenLynx> the thing has a makefile?
[19:33:05] <StephenLynx> hole sheeeeeit the thing has over twenty dependencies
[19:33:09] <MacWinner> what has makefile?
[19:33:12] <StephenLynx> oh my god
[19:33:13] <StephenLynx> koa
[19:33:18] <StephenLynx> https://github.com/koajs/koa
[19:33:25] <StephenLynx> this is an abomination.
[19:33:58] <StephenLynx> 20 dependencies for 220 LOC
[19:34:04] <StephenLynx> my sides
[19:34:21] <MacWinner> its philosophy is to keep the core extremely small and use plugins and packages
[19:34:39] <StephenLynx> yeah, great way to add vulnerabilities and trust in inexperienced coders.
[19:34:43] <StephenLynx> A++
[19:34:52] <StephenLynx> or break on updates
[19:35:25] <MacWinner> FUD
[19:35:27] <StephenLynx> at least it support io.js.
[19:35:41] <StephenLynx> so I can give it a tap in the back for that.
[19:35:55] <MacWinner> it requires node with harmony flag.. or iojs
[19:36:06] <MacWinner> since it depends on ES6 generators
[19:36:59] <StephenLynx> no way I am touching this monstrosity.
[19:37:17] <MacWinner> TJ Holowaychuk and Jonathan Ong are the 2 main authors.. they are moving to this from express.. i think there is something here. i have personally found it useful, but hard for me to articulate exactly why
[19:37:28] <MacWinner> but to each their own
[19:37:31] <GothAlice> StephenLynx: My web framework's core is 649 SLoC… small is good. Small is testable. Small is easy to document.
[19:37:43] <StephenLynx> yeah, but how many dependencies you have?
[19:37:54] <GothAlice> Depends on which use flags you set.
[19:38:00] <StephenLynx> my point is:
[19:38:07] <StephenLynx> dependency code is still there.
[19:38:07] <GothAlice> No use flags: four: marrow.package, WebOb, marrow.util, pyyaml.
[19:39:01] <StephenLynx> and express is awful and I have no idea who these two guys are, saying they are moving to koa from express means jack to me.
[19:39:07] <MacWinner> test suites and locking the versions in package.json should make it moot
[19:39:26] <GothAlice> Yup.
[19:40:08] <GothAlice> (My own packages record dependencies with "soft" version pinning, i.e. "less than 3.0" type stuff; I can do this on the marrow dependencies because I guarantee minor and rev bumps don't break things.)
[19:58:12] <saml> okay i changed the scala app
[20:10:11] <MacWinner> noticed this when i startup mongo 3.0.2.. WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. if I change it to the recommended 'never', what kind of impact could that have on other apps?
[20:10:27] <MacWinner> like any specific type of app that it would be particularly bad for
[20:10:32] <saml> mongoimport is Go
[20:12:50] <MacWinner> seems like a lot of DBs recommend the kernel setting
[20:17:47] <MacWinner> seems like transparent huge pages may be beneficial for desktop apps
[20:19:49] <Ramone> hey all... anyone know if there's a way to cycle pooled connections in the node.js driver?
[20:20:34] <Ramone> I'm having a production issue that redeployment seems to fix, and I'm wondering if it could be related to stale connections or something
[20:21:37] <[S^K]> Is it possible to use the $slice projection in a DESC order?
[20:22:56] <StephenLynx> I would use aggregate and slice at the end.
[20:23:17] <StephenLynx> wait, is that possible?
[20:23:32] <StephenLynx> I remember a limitation regarding slice and aggregate.
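For what it's worth, a negative $slice in a projection counts from the end of the array, which often covers the "descending" case; field names are illustrative:

    db.posts.find({}, { comments: { $slice: -5 } })   // the last five elements of the comments array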
[20:35:27] <pagios> hi all, trying the ruby binding and getting an argument error: result = client[:'userTable'].find({"token" => token}, :fields => ["arn"]) thats like in the doc https://github.com/mongodb/mongo-ruby-driver/wiki/Tutorial
[20:35:35] <pagios> any idea?
[20:47:36] <mrmccrac> anyone know if using mongo 3.0 + wiredtiger + snappy compression if there is a minimum storage size allocated for a single document?
[20:58:21] <GothAlice> mrmccrac: Around 15 bytes or so for an "empty" document, plus padding factor. (See the docs/configuration for this value; I don't seem to have it on hand.)
[20:59:36] <mrmccrac> trying to figure out why storage size estimates aren't matching up with a larger data set based upon what i saw in a smaller one
[21:00:34] <GothAlice> Always handy to have on hand when doing napkin calculations for size: http://bsonspec.org/spec.html
[21:04:21] <GothAlice> Compression, of course, makes the sizes highly variable. (Compressing the same bytes twice may not even result in the same compressed byte stream!)
[21:10:54] <saml> mrmccrac, did you underestimate or overestimate?
[21:11:02] <saml> and by how much? give us numbers
[21:11:15] <mrmccrac> i thought i'd be using a lot disk space than i am :)
[21:12:38] <mrmccrac> i had a dataset that was 1480G, imported into mongo w/ compression on one host and it ended up being 173G
[21:12:40] <GothAlice> mrmccrac: You seem to be missing a qualifying word between "lot" and "disk". ;)
[21:12:51] <mrmccrac> a lot more, sorry
[21:12:55] <GothAlice> ;)
[21:13:03] <mrmccrac> sigh i cant talk my brain is tired
[21:13:16] <GothAlice> So 173G (vs. 1480G) is "a lot more than expected"?
[21:13:27] <mrmccrac> well that was the 'small' data set i did first
[21:13:30] <GothAlice> That's a compressed size 11% the original…
[21:13:55] <GothAlice> That's a lot better than snappy was able to do on my at-work dataset, and the majority of that dataset is HTML generated by Word…
[21:14:08] <mrmccrac> the larger one im trying now is 2783G, but its looking like at the rate its going now it will end up being around 2400G
[21:14:22] <GothAlice> Compression rates depend entirely on what you are compressing.
[21:14:32] <mrmccrac> i thought it'd be around 334G :)
[21:14:47] <mrmccrac> the type of data between the two should be extremely similar
[21:14:53] <mrmccrac> i would agree with you
[21:15:00] <Torkable> how do you catch timeouts?
[21:15:12] <GothAlice> Torkable: Which timeout, specifically?
[21:15:39] <mrmccrac> its not even 10% imported yet and already using 177G
[21:15:45] <Torkable> if I make a request but the connection has dropped or hasn't been set up yet, it just sits
[21:15:55] <Torkable> seems like I would get an err
[21:16:11] <GothAlice> mrmccrac: Compression works by building up a "dictionary" of patterns; the more data, the better the compression will be, generally. Thus the first half will take more space than the second half.
[21:16:42] <GothAlice> Torkable: Client driver in use?
[21:16:54] <Torkable> node
[21:17:05] <mrmccrac> yeah ill give it more time i guess, this is my 2nd attempt, i changed things so there will be larger documents to avoid too much document overhead
[21:18:22] <GothAlice> Torkable: connectTimeoutMS, socketTimeoutMS are both network-related timeouts you have control over at the MongoClient level, defaulting to 30 seconds.
[21:18:32] <GothAlice> Torkable: As seen: https://github.com/mongodb/node-mongodb-native/blob/2.0/lib/mongo_client.js#L178-L184
[21:18:46] <GothAlice> mrmccrac: WiredTiger compression isn't document-aware.
[21:18:55] <GothAlice> It's done at the stripe block level, AFAIK.
[21:20:21] <GothAlice> Torkable: As a fun note, https://github.com/mongodb/node-mongodb-native/blob/2.0/lib/mongo_client.js#L334 seems to indicate "no default" (zero/wait forever) for socketTimeoutMS. >:(
[21:20:40] <Torkable> oh god lol
[21:20:43] <GothAlice> wat js
[21:20:45] <Torkable> maybe that's my problem
[21:20:57] <Torkable> 0.12.2
[21:21:01] <GothAlice> …
[21:21:06] <Torkable> ?
[21:21:29] <GothAlice> There isn't even a 0.12.2 tag in the repo…
[21:21:41] <Torkable> latest stable of node is v0.12.2
[21:21:49] <GothAlice> Ah, node, not the node Mongo driver.
[21:21:54] <Torkable> oh
[21:22:27] <Torkable> mongodb 2.0.25 in package
[21:22:52] <GothAlice> That's reasonably current, so shouldn't be the source of your issue.
[21:23:06] <Torkable> but I need to set timeout looks like
[21:23:09] <GothAlice> I'd definitely provide an explicit set of values for those timeouts, yeah.
[21:23:13] <Torkable> k
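A hedged sketch of passing those timeouts explicitly with the 2.0.x node driver; the URL and values are placeholders and the exact option nesting may differ between driver versions:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/test', {
      server: { socketOptions: { connectTimeoutMS: 30000, socketTimeoutMS: 30000 } }
    }, function (err, db) {
      if (err) { return console.error('connect failed or timed out', err); }
      // queries made through db should now error out instead of hanging forever
    });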
[21:23:38] <mrmccrac> gah this makes no sense, ill keep trying to figure out whats going on i guess
[21:24:30] <GothAlice> mrmccrac: I wish you luck with WT, BTW. I can see how for your dataset compression would be a very, very enticing feature, however in my own testing, and in the testing done by many others, there are some significant outstanding issues, some of which put data stored in WT at increased risk.
[21:25:10] <mrmccrac> i setup this larger data set import as a replica set member
[21:25:10] <mrmccrac> but journal is turned off
[21:26:49] <mrmccrac> so far so good i guess
[21:26:49] <mrmccrac> yeah compression is pretty much a must
[21:31:12] <Torkable> hmmm
[21:31:23] <Torkable> yea can't seem to capture the timeout
[21:31:32] <Torkable> timeout set to 30 seconds
[21:31:47] <Torkable> and the connection.on('error' is there and working
[21:32:07] <Torkable> problem is that it catches the failed connection and logs it, but server keeps running
[21:32:18] <Torkable> and then if a route tries to hit the db it just sits
[21:32:40] <Torkable> seems like the query would return an err for that
[21:45:15] <Torkable> ok added hacky connection check to one of the queries
[21:45:34] <Torkable> but how is there not a good way to deal with queries that don't have a connection
[21:45:36] <Torkable> ??
[21:45:48] <Torkable> (⊙.☉)7