[00:28:02] <zamnuts> need some help: uploading a ~900MB file via GridFS (node driver), the node process is maxed out on the core, i see 3636 of 3703 chunks in fs.chunks, but am not seeing the file yet in fs.files. what gives?
[01:12:54] <zamnuts> well for the record, this was happening when running node in debug mode, running normally it still takes some time to process (a few extra seconds, vs 15 minutes w/ still no end in sight)
[02:31:59] <hfp> GothAlice: I figured it out, thanks.
[02:52:00] <nobody18188181> since mongodb isn't ACID-compliant, how reliable is mongodb for working with financial transactions and direct customer data in the event of a catastrophic crash?
[02:54:26] <nobody18188181> i see; with journaling is that similar to the query log for mysql?
[02:56:45] <nobody18188181> let's say I have the perfect mongodb cluster built across the world; someone does a $1M transaction and everything goes wrong at the same time (meteors perfectly hit datacenters, etc). what is the worst that could go wrong with the data, assuming a complete cluster rebuild of mongodb?
[03:05:47] <Boomtime> nobody18188181: you only need to plan for disaster scenarios in which civilization survives, can you restate the question with this outcome?
[03:07:32] <nobody18188181> i'm thinking about using mongodb instead of mysql/galera for an ecommerce platform; what i've found is that this is "risky". I found a few articles of bitcoin issues with mongodb, so; I figured I'd ask here in the IRC to see what the thoughts are.
[03:08:28] <Boomtime> every database is risky if you don't deploy with any redundancy
[03:08:39] <Boomtime> mongodb uses replica-sets for redundancy
[03:09:04] <Boomtime> write concern is the application-level method of gaining assurance about the resiliency of your writes
[03:09:21] <nobody18188181> yea, but what i'm worried about is a crash mid-transaction that results in money being lost b/c it wasn't saved properly
[03:09:46] <Boomtime> that applies to every database too
[03:09:49] <nobody18188181> i would ofc deploy with replica sets, etc
[03:09:58] <Boomtime> you need to perform a two-stage commit
[03:12:28] <Boomtime> the specific (simple) example in that tutorial is a banking transaction
[03:17:36] <nobody18188181> it seems client side though?
[03:19:27] <Boomtime> yes, mongodb does not have multi-operation transactions - only single operations are atomic
[03:34:08] <scjoe1> If you use a write concern of majority and j:1, which means those servers have also performed a commit to disk, you would need a big meteor to lose data. This assumes your majority is spread across data centers, disk arrays and so forth. I would trust MongoDB just as much as SQL Server if set up correctly.
[03:35:10] <scjoe1> This is on top of using the pattern of the two phase commit shown by Boomtime
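For reference, the write-concern setting scjoe1 describes looks roughly like this from an application; a minimal pymongo sketch, assuming a three-member replica set and made-up host/db/collection names:

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")

    # Ask for acknowledgement from a majority of replica-set members, with the
    # write committed to the on-disk journal (j=True) before it is acknowledged.
    payments = client.shop.get_collection(
        "payments",
        write_concern=WriteConcern(w="majority", j=True),
    )

    payments.insert_one({"order_id": 42, "amount_cents": 100000000})

The two-phase commit pattern from the tutorial Boomtime mentions sits on top of writes like this one; the write concern only guarantees durability of each individual operation.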
[04:10:44] <JakeTheAfroPedob> hi there guys anyone here?
[04:11:07] <JakeTheAfroPedob> need a little help regarding mongodb
[04:11:53] <JakeTheAfroPedob> is there any website other than mongodb itself where i can get a quick tutorial on it? i have been trying to transfer the contents of a .txt file into mongodb but failed to do so
[04:11:55] <Boomtime> ask away, somebody might be able to help you
[04:12:59] <Boomtime> we'll need to know a lot more than that - are you familiar with JSON?
[04:13:03] <JakeTheAfroPedob> i ran a python script (.py) but i can't find the command to access the file
[04:13:16] <JakeTheAfroPedob> yes, i'm using JSON inside the text file
[04:13:56] <JakeTheAfroPedob> i have tried one or two snippets, playing around so that the contents of the .txt file (in JSON format) can be copied into mongodb
[04:14:01] <Boomtime> accessing a file on your hard-disk is not a mongodb problem - you need to figure out how to read a file yourself, you can then import the contents into mongodb using the pymongo driver
[04:14:06] <JakeTheAfroPedob> but i can't seem to access it in mongodb when i log in
[04:14:47] <Boomtime> you can also use mongoimport if you like
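A minimal pymongo sketch of that, assuming the .txt file holds one JSON document per line (the filename and the db/collection names here are made up):

    import json
    from pymongo import MongoClient

    client = MongoClient()                      # local mongod on the default port
    collection = client.test.search_results     # hypothetical db/collection names

    # Read the file ourselves, then hand the parsed documents to pymongo.
    with open("results.txt") as fh:
        docs = [json.loads(line) for line in fh if line.strip()]

    if docs:
        collection.insert_many(docs)

    # Verify from Python (or run db.search_results.find() in the mongo shell).
    print(collection.count_documents({}))

With the same one-document-per-line layout, mongoimport --db test --collection search_results --file results.txt would do the import without any Python at all.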
[04:14:57] <JakeTheAfroPedob> i have tried using db.collection.find
[04:22:08] <Boomtime> ok, so that should be fine, i don't know python but insert often returns the result of the op in other languages
[04:22:28] <Boomtime> you should check the return value from insert
[04:23:18] <JakeTheAfroPedob> well what i intend to do now is check whether the contents of the .txt file were actually copied into mongodb. Problem is i can't find the right command for it
[04:23:20] <Boomtime> do you know how to use the mongo shell?
[04:23:50] <JakeTheAfroPedob> i'm using that. you are referring to the command line, yeah?
[05:08:28] <joannac> you separate your single document into multiple documents
[05:12:34] <JakeTheAfroPedob> @joannac i have an example. The text i have is JSON (.txt files) retrieved from the net; for example, a Google search result as a JSON file
[05:13:21] <JakeTheAfroPedob> in that single JSON file there are multiple Google search results for a single query. Is it possible to separate them into their respective URLs, titles and so forth?
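Roughly what joannac is suggesting, sketched in Python; the "results" key and its "url"/"title" fields are guesses at the shape of the Google response, so adjust them to the actual JSON:

    import json
    from pymongo import MongoClient

    collection = MongoClient().test.search_results   # hypothetical db/collection names

    with open("google_results.txt") as fh:
        response = json.load(fh)                     # the whole response, one JSON document

    # Store each individual search result as its own document rather than
    # keeping the entire response in a single document.
    docs = [
        {"query": response.get("query"),
         "url": item.get("url"),
         "title": item.get("title")}
        for item in response.get("results", [])
    ]
    if docs:
        collection.insert_many(docs)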
[07:08:03] <zw> Hi. I'm new to mongodb. I'm looking for a way to get all collections, but "show collections" on mongo shell is nog giving me anything ?
[07:19:59] <joannac> zw that's the way to get all collections for the active database. is that what you want?
[07:21:08] <zw> joannac: yes, nevermind, I was looking at the wrong db
[09:34:19] <rskvazh> i have a collection sharded on {owner: "hashed"}, and many updates with q: {_id: a, owner: b}. The owner field is there for "shard hinting". But sometimes the targeted mongod uses the wrong index (very high nscanned), owner instead of _id, for a minute or two. I've tried a plan cache filter (planCacheSetFilter) for a day and all seems ok. But plan cache filters are not persistent... How can I fix this problem? I know about https://jira.mongodb.org/browse/SERVER-1599. Is this a bug in mongod?
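Since index filters are per-mongod and do not survive a restart, one workaround is to re-apply the filter from a startup script or from the application; a rough pymongo sketch, with the host, database, collection, and query shape as placeholders:

    from pymongo import MongoClient

    # Connect directly to the shard's mongod; index filters are per-mongod and
    # are lost on restart, so this would be re-run after every restart.
    db = MongoClient("mongodb://shard1-host:27018/").mydb   # hypothetical address/db

    db.command(
        "planCacheSetFilter",
        "mycollection",                   # hypothetical collection name
        query={"_id": 1, "owner": 1},     # a representative query shape; values don't matter
        indexes=[{"_id": 1}],             # only consider the _id index for this shape
    )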
[15:31:41] <nawadanp> GothAlice, Hi ! FYI, the gentoo bug (segfault in the mongo shell and mongod) has been resolved: https://bugs.gentoo.org/show_bug.cgi?id=526114
[15:37:59] <rskvazh> Hi! Could you help me? https://groups.google.com/forum/#!topic/mongodb-user/NwKHZhGqjqc
[15:39:56] <wsmoak> (ftr that code was absolutely fine. upgraded editor stopped doing “save when focus is lost” :/ )
[15:41:47] <GothAlice> nawadanp: Thanks for the heads up!
[15:44:32] <GothAlice> wsmoak: I tend to not rely on save-on-defocus due to the fact that my development environments auto-reload when files are modified, and not all modifications I wish to save immediately. (Or will allow the reload mechanism to even survive…)
[15:45:06] <GothAlice> I.e. accidentally saving an unintentional syntax error nukes dev from orbit.
[17:45:25] <wsmoak> now that I’ve gotten that inner array added… I’m having trouble appending to it. :/
[17:45:33] <wsmoak> trying to apply http://docs.mongodb.org/v2.2/applications/update/#add-an-element-to-an-array
[17:46:01] <wsmoak> but it’s not quite the same … https://gist.github.com/wsmoak/27e92ea2e5b5bd313155
[17:46:36] <wsmoak> I need to match on the weights.date to get into the right element of weights, and then append to data
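If each element of weights carries a date and its own data array, the positional operator can target the matched element; a sketch in pymongo, with the document shape and db/collection names assumed from the description above:

    from datetime import datetime
    from pymongo import MongoClient

    coll = MongoClient().fitness.log         # hypothetical db/collection names

    # Match the document whose `weights` array has an element with the right
    # date, then append to that element's `data` array via the positional $.
    coll.update_one(
        {"weights.date": datetime(2014, 11, 14)},
        {"$push": {"weights.$.data": {"time": datetime.utcnow(), "value": 72.5}}},
    )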
[19:47:38] <J-Gonzalez> I'm aggregating orders per month and can add the totals and averages easily
[19:47:49] <J-Gonzalez> however, I'm now trying to add the number of items purchased
[19:48:01] <J-Gonzalez> which is an array within the order object
[19:48:36] <J-Gonzalez> If I unwind based on the array, wouldn't I get double (or more) the amount for order totals etc?
[19:48:53] <J-Gonzalez> and is there a way I can do a sum on the arrays length?
[19:49:36] <J-Gonzalez> Note if you see gist: I can use either the number of items in the user_ticket_ref array (so user_ticket_ref.length) or the qty inside the ticket_items array - both values should be the same
[19:49:45] <J-Gonzalez> just I don't really know how I can aggregate this correctly
[19:57:06] <J-Gonzalez> hmm - unwinding DOESN'T give me double for totals, but it does give me slightly different numbers
[20:42:41] <ndb> I have 2 kinds of doc schema in the same collection, about 1 million records. I search by a key like action: CONFERENCE, and even for a small result set (~1k records) the find is pretty slow (4s)
[20:43:07] <ndb> is there some way to make things faster? Like an index on the 'action' key?
[20:43:20] <ndb> or a collection for each kind of schema?
[20:46:28] <wsmoak> what are the two different documents? GothAlice advised me to put different “things” in different collections. (no idea how much faster that would be though)
[20:47:20] <ndb> i have action: EVENTS documents, and a group of events is summarized in this last document
[20:47:48] <ndb> like a summary denormalization; normally we just get this last "summary" and use it
[20:48:05] <ndb> even so, we have tons of events, and it makes a find a little slow
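An index on the key being filtered is usually the first thing to try here; a pymongo sketch, with the db/collection names assumed:

    from pymongo import ASCENDING, MongoClient

    events = MongoClient().mydb.events       # hypothetical db/collection names

    # Let finds on {"action": "CONFERENCE"} walk an index instead of scanning
    # the whole collection.
    events.create_index([("action", ASCENDING)])

    print(events.find({"action": "CONFERENCE"}).explain())   # confirm the index is used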
[20:54:08] <rskvazh> Derick, hello! in the mongo-php driver, how can I do a MongoUpdateBatch with the upsert option? add(['q' => ..., 'u' => ..., 'upsert' => true])?
[20:55:29] <rskvazh> sorry, found it in the manual. RTFM :( http://php.net/manual/en/mongowritebatch.add.php
[20:57:01] <Derick> rskvazh: glad you found it :-)
[20:57:22] <rskvazh> Derick thank you for your work!
[21:01:04] <bazineta> kali Background flush on the GP SSD is indeed a thing of wonder and beauty
[21:19:30] <bazineta> kali same workload on 1K PIOPS, 1.5 to 2 seconds flush. On 500GB SSD, 1.5k to 3k burst, 400ms flush. With flush at the 60 second default, the volume is always in burst credit surplus. Half the price per month.
[21:23:53] <kali> bazineta: that's still weird. maybe we should expect a refactoring of the pricing grid in the weeks to come
[21:26:30] <kali> bazineta: thanks for the followup
[21:39:54] <speaker1234> I'm putting a history of activity as part of a record. The sequence is: change the state of the record and add a history entry. Is there any way I can copy the original record's state into the history entry in a findAndModify command?
[21:40:52] <GothAlice> speaker1234: No; pivot your actions a bit. Instead of wanting to move the old into the history when updating state, add the updated state to the history each time you change it. (That way the "current" value need not be moved.)
[21:43:17] <speaker1234> in other words, think of the history as a record of what changed you to that state (i.e. the trigger), not what state you were in before the trigger
[21:43:33] <quuxman> The way I would accomplish this on the first pass, is to have a history attribute of the record. The record is updated, you take that updated state (minus the history attribute) and prepend or append it to the history attr
[21:43:36] <GothAlice> I.e. {state: 'active', history: [{state: 'active', when: utcdate(…)}]} when changing state to "inactive" becomes: {state: 'inactive', history: [{state: 'active', when: utcdate(…)}, {state: 'inactive', when: utcdate(…)}]}
[21:43:39] <quuxman> so everything is copied with each state change
[21:48:03] <GothAlice> (There can't be race conditions if state transitions are fixed. I.e. to go to "inactive" it must first be "active" — you can include that in your query to make it into a update-if-not-modified, and the operations are atomic…)
[21:48:12] <quuxman> even if multiple workers were updating the history, simply adding a timestamp for when the change was made would eliminate conflict. Things may get inserted out of order, but your app layer could re-sort them
[21:48:48] <speaker1234> also, passing the _id, state, and the history field from the worker bee to the work manager process is not going to be that big a burden, since the history is only going to be about 5 to 7 entries long
[21:49:26] <GothAlice> Uhm… don't pass the full history.
[21:49:33] <GothAlice> For the love of all that is sacred, don't do that. ;)
[21:49:50] <speaker1234> believe me, nothing is sacred :-)
[21:50:01] <GothAlice> Does your worker actually need access to the history?
[21:50:06] <quuxman> yeah why not pass the whole history? if it's only 7 items long?
[21:50:23] <GothAlice> (My worker bees literally only get the _id of the job.)
[21:50:26] <speaker1234> what I started to say was that the worker bee only generates the state change and what triggered the state change (i.e. the history entry)
[21:51:39] <speaker1234> so when I pass information back to the hive, I can do a find-and-update to accomplish the same thing, adding to the history
[21:51:49] <speaker1234> as long as I pass the current state in addition to the new state
[21:52:18] <GothAlice> _id, old state (for race condition prevention), and new state, yes. Nothing else needs to be passed around in order to update state and add to the history.
[21:52:35] <speaker1234> I also need to pass the reason
[21:53:25] <speaker1234> so the document argument would look something like:
[21:53:30] <GothAlice> That can be a requirement. All of my states have very well-defined purposes and state transitions are explicit (you can't just hop around the states), so "reason" is inherent to the state in my case. (i.e. "invoiced" means someone actually invoiced)
[21:55:48] <speaker1234> unfortunately for me, multiple reasons can cause the same state change. This is not a problem because the recovery process is the same for all of the failures
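Putting those pieces together (_id, expected old state, new state, and a reason), the state change and the history append can be one atomic update; a sketch with assumed field and collection names:

    from datetime import datetime
    from pymongo import MongoClient

    jobs = MongoClient().hive.jobs           # hypothetical db/collection names

    def transition(job_id, old_state, new_state, reason):
        # Matching on the expected old state makes this update-if-not-modified:
        # if another worker already changed the state, nothing matches.
        result = jobs.update_one(
            {"_id": job_id, "state": old_state},
            {
                "$set": {"state": new_state},
                "$push": {"history": {
                    "state": new_state,
                    "reason": reason,
                    "when": datetime.utcnow(),
                }},
            },
        )
        return result.matched_count == 1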
[21:57:47] <wblu> Will an Ubuntu package be uploaded to Mongo's apt repo for 2.8.0-rc0?
[22:02:35] <GothAlice> speaker1234: I hope the object you push to History is an embedded document, not a string, but yeah, that's the approach.
[22:03:28] <GothAlice> speaker1234: As a note, because MongoDB doesn't enforce consistent schemas between documents in a collection, it can't optimize away the names of the fields. This means that the fields take up room in the document the same way as the data… long field names will add up substantially over time. How many records did you say you have?
[22:05:38] <GothAlice> 120,000,000 * ( len("Delivery_state") + 6 ) = ~2.4 GB (base 1000, not GiB base 1024) of storage just for the key "Delivery_state".
[22:05:58] <GothAlice> 2,000,000 * ( len("Delivery_state") + 6 ) = ~40 MB per day of growth just to store the name of that field.
[22:07:00] <GothAlice> Assuming two dozen fields in total, the storage of just the keys would be ~57.6 GB, with a growth of ~1 GB/day… just to store keys!
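The back-of-the-envelope math, spelled out (treating the +6 as a rough per-field BSON overhead and pretending every field name is as long as "Delivery_state"):

    records = 120000000        # total documents
    growth = 2000000           # new documents per day
    fields = 24                # assumed field count per document

    per_field = len("Delivery_state") + 6           # ~20 bytes per field per document
    print(records * per_field / 1e9, "GB")          # ~2.4 GB just for this one field name
    print(growth * per_field / 1e6, "MB per day")   # ~40 MB/day for this one field name
    print(records * per_field * fields / 1e9, "GB") # ~57.6 GB for all keys, growing ~1 GB/day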
[22:07:00] <speaker1234> okay so if I went to single character field identifiers, it would save a lot right?
[22:07:20] <speaker1234> I'm only half serious :-)
[22:07:30] <GothAlice> It would. I find this to be the biggest reason to use an ODM.
[22:08:56] <speaker1234> that's going to have to wait for rev two. I lost way too much in the past couple of weeks to Windows 8 hosing-me-over moments
[22:09:02] <GothAlice> speaker1234: https://gist.github.com/amcgregor/1ca13e5a74b2ac318017 here is an example of an aggregate query (and matching map/reduce) against some of my data. Notice the single-character keys everywhere.
[22:10:33] <GothAlice> Indeed; single-character-izing everything wasn't my first step. ;) Re-naming fields is a bit of a PITA, though, esp. with large datasets, so it's something to consider up-front. (Most ODMs allow you to change the in-database name used for the field, i.e. name = StringField(db_field='n').)
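A minimal MongoEngine sketch of that db_field trick: the application keeps readable attribute names while the stored documents use one-character keys. The Delivery document and its fields are made up for illustration:

    from mongoengine import DateTimeField, Document, StringField, connect

    connect("shipments")                     # hypothetical database name

    class Delivery(Document):
        # Readable attribute names in Python, single-character keys on disk.
        state = StringField(db_field="s")
        reason = StringField(db_field="r")
        created = DateTimeField(db_field="c")

    Delivery(state="active", reason="created").save()
    # Stored roughly as {"_id": ObjectId(...), "s": "active", "r": "created"}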
[22:10:56] <J-Gonzalez> This aggregate is killing me: https://gist.github.com/J-Gonzalez/675bddacbf6f546e2f5c - can't figure out how to sum values within an array while still keeping 'parent' grouped values accurate
[22:12:12] <speaker1234> I will cut the keys down to two- or three-letter acronyms and look into an ODM later
[22:12:45] <speaker1234> fortunately, the data does expire, and I always have the option of restarting with a new instance while the old one expires eventually
[22:13:24] <speaker1234> this data is like beer, you don't buy it, you only rent it
[22:13:30] <GothAlice> Nice! Not something most datasets can benefit from. :) (If you weren't already, I'd recommend using an expiring index to let MongoDB clean up your data for you.)
[22:14:00] <speaker1234> that's a rev 1.5. I've got to prove to customers that this works and then they will fund changes
[22:14:41] <GothAlice> speaker1234: http://docs.mongodb.org/manual/tutorial/expire-data/ It's a one-liner in most environments. ;)
[22:14:59] <GothAlice> I use this approach: http://docs.mongodb.org/manual/tutorial/expire-data/#expire-documents-at-a-certain-clock-time
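The "expire at a certain clock time" approach boils down to a TTL index with expireAfterSeconds=0 on a date field holding each document's expiry time; a pymongo sketch with assumed names:

    from datetime import datetime, timedelta
    from pymongo import MongoClient

    events = MongoClient().mydb.events       # hypothetical db/collection names

    # TTL index: mongod removes each document once its `expireAt` is in the past.
    events.create_index("expireAt", expireAfterSeconds=0)

    events.insert_one({
        "action": "CONFERENCE",
        "expireAt": datetime.utcnow() + timedelta(days=30),   # rented, not bought
    })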
[22:15:15] <speaker1234> does it come with its own rimshot? <ba-dump>
[22:15:47] <GothAlice> Alas, rimshot sold separately.
[22:16:50] <speaker1234> ok, back in the code base of death.
[22:17:19] <GothAlice> J-Gonzalez: You want $size.
[22:17:59] <GothAlice> I.e. totalTicketsSold: {$sum: {$size: '$payment'}}
[22:18:19] <GothAlice> Er, actually, scrap the $sum on that.
[22:18:52] <GothAlice> Or don't. Ugh. Reading is hard after a day filled with meetings. Yeah, you're grouping on time. You'll want the $sum there.
[22:19:53] <GothAlice> Also $ticket_items, not $payment.
[22:25:24] <J-Gonzalez> Thanks - GothAlice - I'll take a look at trying to get $size to work
[22:25:29] <J-Gonzalez> hadn't even thought of using $size
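Roughly what that pipeline could look like without an $unwind (which would multiply the per-order totals); the field names created, total, and ticket_items are guesses at the gist's schema:

    from pymongo import MongoClient

    orders = MongoClient().shop.orders       # hypothetical db/collection names

    pipeline = [
        {"$group": {
            "_id": {"year": {"$year": "$created"}, "month": {"$month": "$created"}},
            "orderCount": {"$sum": 1},
            "revenue": {"$sum": "$total"},
            "avgOrder": {"$avg": "$total"},
            # Sum the length of each order's array instead of unwinding,
            # so the per-order totals above are not inflated.
            "ticketsSold": {"$sum": {"$size": "$ticket_items"}},
        }},
        {"$sort": {"_id": 1}},
    ]
    for month in orders.aggregate(pipeline):
        print(month)

Note that the $size aggregation expression needs MongoDB 2.6 or newer.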
[22:41:18] <GothAlice> Now that I'm home… I'm going to see how much overhead would be added by using full field names on my Stupidly Large™ dataset…
[22:58:05] <GothAlice> As it is, field name storage currently takes 168 bytes per asset on average. ~200 million assets w/ an average compressed asset size of ~125MB… I've got some big things in there. Using full field names would add about 200 bytes per asset on average—each field is still only one word. That'd be overhead to the tune of 73.6 GB or ~0.3% total storage vs. current 33.6 GB or ~0.1%.
[22:58:15] <GothAlice> So for my dataset this optimization is minor. ^_^;
[23:06:18] <speaker1234> looks like the overhead really isn't that big an issue as long as we keep the names "reasonable"
[23:06:48] <speaker1234> by the way, I highly recommend frozen strawberries blended with a quarter to a half cup of red wine and a little bit of cinnamon or lemon
[23:06:52] <GothAlice> I'm storing large BLOBs in GridFS, which is a rather atypical pattern of use. (I really should only be measuring against the metadata collection data storage…)
[23:07:30] <speaker1234> any alcoholic beverage or just red wine?
[23:08:03] <GothAlice> Red wine and _certain_ other alcohols. I have a really nasty (and almost immediate) adverse reaction to something in it. Sulphates, maybe?
[23:08:59] <speaker1234> it's hard to tell. Sometimes it is the tannins
[23:09:15] <speaker1234> I'm not supposed to have alcohol at all because my triglycerides are "easily stimulated"
[23:09:32] <speaker1234> I'm floating between 750 and 850
[23:09:59] <GothAlice> Well, in theory nobody should consume alcohol. It *is* a poison, kills more than murder, and racks up long-term medical costs second only to tobacco. ;)
[23:11:04] <speaker1234> if I can find something that tastes like red wine (i.e. good red wine) without the alcohol, then I would gladly leave the alcohol out of my life. But considering that I consume at most 2 cups of wine a month, I'm at low risk for the other issues
[23:11:40] <speaker1234> also, if I drink my usual tiny amounts of alcohol, I don't drive for at least three hours.
[23:12:21] <speaker1234> The level I drink wouldn't impair most people but my body is "special"
[23:12:32] <GothAlice> Yeah, that's not bad at all. I've inherited terrible internal organs, so I have a fair number of digestion-related issues. (Lactose intolerance, potato starch intolerance, and I'm the only member of my family who still has a gall bladder… so anything greasy/oily/fatty I have to be very careful about. Any of these and I spend nearly 24h feeling like I'm dying.)
[23:13:20] <speaker1234> good Lord. We are two genetic peas in a pod.