PMXBOT Log file Viewer


#mongodb logs for Tuesday the 10th of March, 2015

[00:54:54] <GothAlice> zivix: Never before have I seen anyone with as many network transitions as you. Please accept these bytes as a token of my sympathy. (Either that or you've got a host fight on your hands. Both of these problems can be alleviated by running an "IRC bouncer" such as ZNC.)
[04:03:18] <Jonno_FTW> hi, when I use $near with min and max distance, I still get the document exactly on the query location
[04:04:48] <Jonno_FTW> my query and result looks like this: http://i.imgur.com/cX50K8D.png
[04:05:12] <Jonno_FTW> why is the first result included even though it's not within the range?
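For context, a minimal sketch of a $near query with both distance bounds (mongo shell; the collection and field names here are hypothetical, and a 2dsphere index on "location" is assumed):

    db.places.find({
        location: {
            $near: {
                $geometry: { type: "Point", coordinates: [ 151.20, -33.87 ] },
                $minDistance: 10,      // metres
                $maxDistance: 5000
            }
        }
    });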
[07:55:18] <amitprakash> Hi, with mongo I am getting "key '$set' must not start with '$'" at random for some update queries
[07:56:55] <amitprakash> db.items.update({'_id': ObjectId(...)}, {'$set': {'cnc': 'Heathrow'}}) for example throws this error
[07:59:02] <joannac> amitprakash: in the shell?
[07:59:07] <joannac> amitprakash: what version?
[07:59:15] <amitprakash> db.items.update({'_id': ObjectId(...)}, {'$set': {'cnc': 'Heathrow'}}, {multi:true, writeConcern: 2}) to be specific
[07:59:22] <amitprakash> joannac, actually no, this is via pymongo
[08:01:17] <amitprakash> joannac, mongo 2.6
[08:04:16] <amitprakash> db.items.update({'_id': ObjectId(...)},{'$set': {'cnc': 'Heathrow'}}, multi=True, w=2)
[08:04:22] <amitprakash> Any ideas?
[08:04:38] <amitprakash> The weird thing is, it happens on random queries
[08:06:42] <joannac> if you can find the exact update that makes it error, that would be useful
[08:06:50] <amitprakash> And I can execute the same query a second time for it to work :/
[08:06:54] <amitprakash> Sure, 1 sec
[08:11:51] <amitprakash> joannac, http://pastebin.com/qNMww8By
[08:12:38] <amitprakash> The exact error thrown is, key '$set' must not start with '$'
[08:16:01] <amitprakash> I can dig into the pymongo code where it is breaking and throw the variables at the erroneous function if needed?
[08:16:28] <joannac> sure, I really can't see how that update could trigger that warning
[08:25:06] <amitprakash> joannac, http://pastebin.com/C07yJjfG
[08:28:14] <joannac> wait, that's wrong
[08:28:21] <joannac> where's the query part of the document?
[08:28:26] <amitprakash> spec?
[08:29:25] <amitprakash> this is pymongo internal, so I am not sure how message.update works, but I am guessing spec is the query part of the doc
[08:29:51] <joannac> the query part you have is ign
[08:29:58] <amitprakash> or wbn
[08:30:19] <joannac> erm
[08:30:37] <amitprakash> oh, the first and second pastes are not actually of the same query
[08:30:42] <joannac> okay
[08:30:48] <amitprakash> similar but not same, first is with ign: value, and second is wbn: value
[08:30:56] <joannac> and the second paste also triggers the error?
[08:31:00] <amitprakash> yes
[08:39:35] <amitprakash> joannac, any ideas?
[08:40:11] <joannac> nope
[08:40:21] <joannac> especially weird that if you try it again, it works
[08:40:43] <amitprakash> :'(
[08:40:56] <amitprakash> filing a bug with mongo support then
[10:13:41] <yauh> is there a recommended way to get mongod stats into graphite/collectd/statsd?
[11:51:17] <BBin> Not entirely mongo-related but since it's similar: I have kind of the reverse situation of the regular use case. Instead of there being one query that should return all matching documents, I want to input a document and get all matching "queries" (match patterns, currently using the mongo syntax). Where should I look for info about this type of problem?
[11:52:52] <joannac> BBin: ...what
[11:53:09] <joannac> BBin: say you input the document {a:1}
[11:53:51] <BBin> You want me to explain further, or are you typing up a long answer?
[11:53:53] <joannac> queries that match this: {a: {$gt: -9999....9}}, {a: {$gt: -9999....8}}, etc
[11:54:15] <joannac> more queries that match this: {a: {$lt: 9999....9}}, etc
[11:54:36] <BBin> joannac: Oh! Now I get it. The queries are predetermined, it is a finite list
[11:55:03] <joannac> ?
[11:55:18] <joannac> I don't understand what that means
[11:55:20] <BBin> joannac: Let me explain why I want it as well, I understand my explanation wasn't good
[11:59:28] <BBin> joannac: So, I have an arbitrary list of queries. Say {a: {$gte: 5}, b: "hello"}, {a: {$gte: 5}}, {a: {$lte: 5}} (This list will be LONG). Now when inputing an object I want to find the rules from that list that match the object in the most optimal fashion. Obviously any object that matches the first one also matches the second one and it certainly does not match the third one, I want to exploit this type of relationship between the
[11:59:49] <BBin> * lt, not lte
[12:00:02] <joannac> you cut off at "between the"
[12:00:41] <BBin> joannac: last sentence, "I want to exploit this type of relationship between the queries"
[12:01:28] <joannac> okay
[12:01:51] <joannac> other than the fact you're using mongodb syntax, this is not a problem that can be solved with mongodb
[12:02:36] <BBin> yeah, I know. I just figured that database-inclined people would have a higher chance of knowing what I should look for
[12:02:46] <kexmex> what if two queries can accommodate?
[12:03:20] <BBin> kexmex: what do you mean by accommodate in this setting? I am not a native speaker
[12:03:35] <kexmex> if two different queries can return same resut
[12:03:36] <kexmex> result
[12:03:49] <kexmex> i mean, the result of two queries will contain what you are looking for
[12:05:04] <BBin> hmm, I'm still not sure what you are driving at
[12:05:26] <BBin> If two queries match the document, two queries are returned as the result
[12:05:52] <joannac> BBin: I still don't understand the use case
[12:06:17] <BBin> joannac: I'll explain the actual use case. It's pretty simple
[12:07:51] <BBin> I am building a real time publish/subscribe app. Clients can connect to the server as either publishers or subscribers. When a client subscribes it does that in the form of submitting a "query" (matching pattern). When someone publishes an object it is matched against all the subscribers' rules and if a subscriber rule matches, the object is delivered to that subscriber
[12:07:56] <BBin> The issue is scaling
[12:08:28] <kexmex> so you are doing wildcard subscriptions eh?
[12:09:01] <kexmex> you probably want a scheme for the topics
[12:09:10] <BBin> Obviously I could just check every rule against the object, but considering that clients will sometimes have identical or very similar subscriptions this should be possible to optimize pretty well I think
[12:09:15] <kexmex> something like /a/b/c/d/e so a wildcard would be /a/b/*/d/e or whatever
[12:09:37] <kexmex> you'll be able to do a graph search at least ;)
[12:14:55] <joannac> BBin: sounds a bit like this https://github.com/10gen-labs/socialite/blob/master/docs/feed.md
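As a reference point, a minimal sketch of the naive approach BBin mentions (checking every stored pattern against the published object), in plain JavaScript; it only handles equality and a few comparison operators, and is an illustration rather than the optimized structure being asked about:

    // Hypothetical matcher: supports equality and $gt/$gte/$lt/$lte only.
    function matchesPattern(doc, pattern) {
        return Object.keys(pattern).every(function (field) {
            var cond = pattern[field], value = doc[field];
            if (cond !== null && typeof cond === "object") {
                return Object.keys(cond).every(function (op) {
                    if (op === "$gt")  return value >  cond[op];
                    if (op === "$gte") return value >= cond[op];
                    if (op === "$lt")  return value <  cond[op];
                    if (op === "$lte") return value <= cond[op];
                    return false; // unsupported operator
                });
            }
            return value === cond;
        });
    }

    var subscriptions = [
        { a: { $gte: 5 }, b: "hello" },
        { a: { $gte: 5 } },
        { a: { $lt: 5 } }
    ];
    var published = { a: 7, b: "hello" };
    var matched = subscriptions.filter(function (q) { return matchesPattern(published, q); });
    // matched now holds the first two patterns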
[13:06:52] <yauh> any recommendations on how to integrate MongoDB stats into graphite with collectd or statsd?
[13:27:58] <the_drow> I accidentally dropped the system databases. How can I restore them?
[13:29:28] <cheeser> restore from backup
[13:29:47] <cheeser> so you essentially lost your indexes?
[13:29:53] <cheeser> and maybe users...
[13:36:28] <the_drow> yup
[13:36:37] <the_drow> cheeser: I got no backup, it's just a local instance
[13:36:59] <cheeser> you're probably out of luck then.
[13:37:13] <cheeser> if it's just indexes, you can redefine those easily enough.
[13:37:38] <ChALkeR> Hi all. Is there any way to limit wt memory usage below 1 GiB?
[13:38:28] <the_drow> I can't connect to the instance using robomongo
[13:39:07] <the_drow> cheeser: It says it can't load the database names
[13:40:33] <the_drow> cheeser: Should I just reinstall mongo?
[13:41:53] <the_drow> Nope, that did not help
[13:42:44] <ChALkeR> The total db dump takes 690M. Compressed storage — 375M.
[13:43:15] <ChALkeR> And the mongodb process could consume about 800M of ram.
[13:43:42] <ChALkeR> Can I limit the ram consumption with the wt backend?
[13:44:26] <ChALkeR> indexes (compressed) — 4.7M.
[13:44:35] <ChALkeR> What does it need 800M for?
[13:47:59] <cheeser> the_drow: reinstalling wouldn't fix anything if your data files are mangled.
[13:50:40] <ChALkeR> mmapv1 takes 1.6G on disk, but has a stable memory consumption (~220M) for me.
[13:54:57] <ChALkeR> Why is the memory limit calculated in whole GBs?
[13:57:20] <the_drow> cheeser: Well I removed /var/lib/mongodb/ and reinstalled again and it did help
[13:57:39] <the_drow> I have excluded the admin and local database from my script that clears everything in mongo
[13:59:01] <cheeser> right. you'd have to toss your data files and just restart mongod
[14:04:46] <fkt> i have an entry structured like this: { "_id" : ObjectId("54fc4db01b8562722b9e8c67"), "03082015" : { "test" : { "data1" : "490.56", "data2" : [ 6, 12, 21 ], "data3" : [ 5, 32, 74 ] }}} . each entry has a datestamp object which contains other items. how would i query mongo for that specific date - in this example - 03082015
[14:07:50] <cheeser> well, you could use $exists on that field name
[14:23:34] <ChALkeR> fkt: That would be slow.
[14:25:45] <cheeser> yeah. better to use an actual timestamp field so you can index the value
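A minimal sketch of the two options just mentioned (mongo shell; the collection name and the "ts" field are hypothetical):

    // Matches the structure above, but a per-date field name is hard to index sensibly:
    db.entries.find({ "03082015": { $exists: true } });
    // With a dedicated timestamp field, a single index on "ts" covers every date:
    db.entries.find({ ts: { $gte: ISODate("2015-03-08"), $lt: ISODate("2015-03-09") } });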
[14:25:56] <yauh> anyone using statsd or collectd to capture mongodb stats?
[14:34:29] <fkt> timestamp field ok ill research
[14:47:08] <ChALkeR> cheeser, fkt: The _id may be enough.
[14:48:00] <ChALkeR> If there is only one date per object and if that date corresponds to the ObjectId in _id.
[14:48:59] <cheeser> that's a lot of ifs. and magic.
[14:49:26] <cheeser> i tend to avoid clever queries like that because they're hard to unwind mentally and pretty limited.
[14:49:36] <ChALkeR> http://mongodb.github.io/node-mongodb-native/api-bson-generated/objectid.html#objectid-createfromtime
[14:49:49] <cheeser> i'm guessing she's wanting more than one data point per _id
[14:50:34] <ChALkeR> That's not magic, it's standard documented ObjectId functionality and has a corresponding API.
[14:51:00] <fkt> i simply want to make entries with a date stamp which i can query later
[14:51:09] <cheeser> yes. i understand that. but it's unclear what's going on to the less experienced.
[14:51:18] <fkt> yup
[14:51:25] <ChALkeR> And no ifs.
[14:51:28] <cheeser> and it's of limited value in most cases
[14:51:37] <ChALkeR> _id: {$gte: a, $lt: b}
[14:51:45] <fkt> so, i should make a date: datestamp field?
[14:51:53] <fkt> key/value*
[14:51:57] <cheeser> fkt: it depends on what you're modeling.
[14:53:09] <fkt> yeah i simply don't know enough about mongodb honestly, need to read more
[14:53:33] <ChALkeR> cheeser: Querying by _id saves an extra index. It has limited applications, though. If there is more than one timestamp per item, or if the item _id does not correspond to the timestamp it should represent — then you need a separate field (and index).
[14:53:34] <fkt> this isn't work related or anything, just trying to teach myself about something new vs using another sql type db for this which i've done countless times
[14:53:50] <fkt> each id will have unique timestamp
[14:53:54] <fkt> one per day, 365 a year
[14:54:02] <cheeser> ChALkeR: yes. i understand all that.
[14:54:27] <ChALkeR> cheeser: But if the model allows using _id instead of a separate index — why not?
[14:54:57] <ChALkeR> It's not some kind of «magic» and the syntax is very clear.
[14:55:14] <cheeser> i prefer being explicit about my data. and i prefer my id values to have no semantic value to my application.
[14:55:41] <ChALkeR> Ah.
[14:55:50] <cheeser> if i had billions of such records i might consider it for space reasons, though.
[14:55:59] <cheeser> to date (ha!) that hasn't been a problem.
[14:57:01] <ChALkeR> cheeser: I replaced ObjectId with manually-crafted ids in some collections, to save space and indexes. =) Because it was a problem there.
[14:57:44] <fkt> oh so i could make object id my datestamp?
[14:58:19] <cheeser> you can reuse _id (if it's an ObjectId and it matches your timestamp).
[14:59:08] <GothAlice> However, on collections where I need to perform date-based aggregate queries, I use a distinct creation field, since date projection operators don't work on ObjectId.
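A minimal sketch of the _id range approach described above (mongo shell; the collection name is hypothetical): build ObjectIds whose embedded timestamps bound the day, then range-query _id.

    function objectIdFromDate(d) {
        // The first 4 bytes of an ObjectId are a Unix timestamp in seconds.
        return ObjectId(Math.floor(d.getTime() / 1000).toString(16) + "0000000000000000");
    }
    db.entries.find({
        _id: { $gte: objectIdFromDate(new Date("2015-03-08")),
               $lt:  objectIdFromDate(new Date("2015-03-09")) }
    });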
[14:59:39] <ChALkeR> Has anyone used WT on low-ram machines?
[14:59:54] <cheeser> haven't switched to WT yet.
[15:00:10] <GothAlice> ChALkeR: I'm running a WT instance that's consuming maybe 230MB of RAM.
[15:00:53] <GothAlice> (It's got a 3.3 GiB virtual size, of course, but currently 230 MiB wired into it.)
[15:02:05] <ChALkeR> I guess that no one cares about virtual on 64bit =)
[15:02:32] <ChALkeR> GothAlice: What's the data size?
[15:03:17] <GothAlice> It's got a dataSize of 196 MiB. (This is MongoDB running in development on my laptop with a reduced dataset vs. production.)
[15:04:38] <ChALkeR> GothAlice: Have you tried WT in production?
[15:04:48] <ChALkeR> In development, everything was ok for me.
[15:05:11] <GothAlice> ChALkeR: Not yet. Now that I've got the app working in development, next step is to roll out a new 3.0.0 WT cluster on MMS for staging with a full copy of production data.
[15:05:48] <ChALkeR> Ah. My production is memory-limited a bit.
[15:06:02] <GothAlice> ChALkeR: Our production is the reverse of that. ;)
[15:07:10] <GothAlice> (We're running 60% data growth vs. the prior 90 days… so yeah. We've had to over-provision up front quite a bit.)
[15:07:41] <ChALkeR> On ~700M dump size, mmapv1 reports ~1.3G dataSize and ~11.5M index size. And consumes ~260M ram.
[15:08:54] <ChALkeR> On ~700M dump size, wt takes up on disk ~400M for data and ~4.7M for indexes. And consumes >800M ram.
[15:10:27] <ChALkeR> And I don't know how to limit that, because wiredTigerCacheSizeGB is calculated in whole GBs =).
[15:10:41] <ChALkeR> It doesn't understand 0.5 or 0,5.
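For reference, the cache cap being discussed is set at startup; a minimal sketch (as noted above, the option only accepted whole gibibytes at the time, so 1 is the smallest value it would take):

    mongod --storageEngine wiredTiger --wiredTigerCacheSizeGB 1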
[15:10:51] <GothAlice> ChALkeR: My test data here has 68 indexes totalling 9 MiB of storage in WT. The key difference, I suspect, is that mmapv1 does what it says: it uses memory mapped files, so much of the virtual size is dedicated to the on-disk files, without adding to the "allocated stack size" of the process. (I.e. mmapv1 is giving you a very large under-estimate of the memory usage.) WT, OTOH, manages memory itself more, using actual malloc()'d memory
[15:10:51] <GothAlice> blocks and thus more apparent memory usage.
[15:11:09] <GothAlice> ChALkeR: I hope your database host is dedicated to the purpose…
[15:12:28] <ChALkeR> It actually looks like caches.
[15:13:10] <ChALkeR> Because it doesn't consume that on startup, but keeping it running increases memory usage over time.
[15:13:13] <ChALkeR> Until a restart.
[15:13:40] <ChALkeR> And no, no complex queries are involved.
[15:14:30] <GothAlice> That's pretty normal. Again, as long as you have a host dedicated to your database, its growth is clearly bounded (to the limits of the machine). Have you benchmarked and identified a drop in performance when the memory usage hits the limit?
[15:14:39] <GothAlice> Or is this a case of pre-emptive worry? ;)
[15:15:22] <ChALkeR> The host is not dedicated yet. I could limit the memory usage by other means of course.
[15:16:28] <GothAlice> Step 1: isolate your DB. Mixing and matching services is a quick path to unknowable pain. (Unknowable as you'll have difficulty identifying the culprit if something terrible happens. OOM-killer is not your friend. ;)
[15:17:14] <ChALkeR> GothAlice: That would double the nodes count. I will probably do that in some time, but not now. Maybe in a few months =).
[15:17:37] <GothAlice> ChALkeR: I'll hazard a guess that you also aren't running a replica set.
[15:36:26] <GothAlice> Well that's a thing.
[15:40:47] <GothAlice> Half-way through a multi-megabyte-per-second streaming find/update pass mongod disappeared, the testing app caught the socket error, then the whole host went away. Anyone else running into panics with WT under high load?
[15:55:04] <NoOutlet> Is the kernel panic reproducible, GothAlice?
[15:56:39] <GothAlice> NoOutlet: Not in the least, unfortunately. I've repeated that ten minute test suite four times now without another crash. (And looking over the dom0 logs, it looks like I somehow reset the virtual SATA controller and the root filesystem went *pif*. I've never, ever seen that before.)
[15:58:01] <nosleep77> hi there, is there a simple command I can run from the shell to get a short status on the health of the db?
[15:58:18] <cheeser> db.stats()
[15:58:19] <cheeser> ?
[15:58:28] <cheeser> depends on what "health" means to you.
[15:58:41] <nosleep77> cheeser: for very quick monitoring. let me try that one
[15:58:43] <GothAlice> nosleep77: echo "db.stats()" | mongo <dbname> # A fast way to do it.
[15:59:11] <GothAlice> nosleep77: If you add '--quiet' to the mongo command, you'll get back just the JSON document.
[15:59:45] <nosleep77> thank you. let me check
[16:02:03] <GothAlice> nosleep77: For monitoring, you'd also want rs.status() if you have a replica set. See https://blog.serverdensity.com/mongodb-monitoring-rs-status/ for some in-depth monitoring advice. :)
[16:02:22] <nosleep77> thank you. that's awesome :)
[16:04:38] <nosleep77> which one can i use just to see if it's up or down. this gives me great information but I want to have that basic check
[16:05:24] <cheeser> if you can't connect, it's down
[16:05:51] <nosleep77> makes sense, so any of them should result in non-zero exit code
[16:10:00] <GothAlice> nosleep77: With the rs.status() command run on any member of the replica set, it'll let you know if any of the other members aren't available, or what state they are in. (See the "state" field.) In theory you only actually need to monitor one of the members.
[16:11:29] <nosleep77> thank you.
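A minimal sketch of one way to turn rs.status() into a quick health check from the shell (state codes 1, 2 and 7 correspond to PRIMARY, SECONDARY and ARBITER):

    rs.status().members.forEach(function (m) {
        if ([1, 2, 7].indexOf(m.state) === -1) {
            print(m.name + " is unhealthy: " + m.stateStr);
        }
    });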
[16:49:09] <oriceon> hello there
[16:49:30] <oriceon> i installed mongodb 3 on a vagrant env with ubuntu 14
[16:49:34] <oriceon> and cannot connect to it
[16:50:16] <oriceon> i created a test user db.createUser({ user: "test123", pwd: "password123", roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]});
[16:50:24] <oriceon> but cannot log in with it
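Assuming the user was created while the admin database was selected, authenticating from the shell would look roughly like this (a sketch of the auth step only, not a diagnosis of the connection problem above):

    db.getSiblingDB("admin").auth("test123", "password123");   // returns 1 on success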
[17:46:27] <inspiron> i have a mongo db, and i have a json dump of it. i need to write a mongoose schema for the collection in it. is there a way to auto create that schema in code?
[17:46:49] <inspiron> like actually auto create the code that i would have written for the schema
[17:55:29] <GothAlice> inspiron: MongoDB has no concept of a schema, so no.
[17:56:03] <latestbot> Stupid question, is there a way I can specify fields in a collection?
[17:57:03] <GothAlice> latestbot: If you mean "enforce a schema", then no.
[17:58:19] <latestbot> Oh
[17:58:42] <latestbot> So how will the relationships be defined? @GothAlice
[17:58:59] <GothAlice> latestbot: That's another item that MongoDB simply says "no" to. MongoDB has no relationships.
[17:59:23] <GothAlice> latestbot: See: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[17:59:47] <latestbot> After being used to SQL, it’s kinda weird to me
[18:00:02] <GothAlice> If you are used to SQL, the article I just linked should be helpful.
[18:01:12] <GothAlice> latestbot: See also: http://docs.mongodb.org/manual/reference/sql-comparison/ and http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
[18:01:42] <latestbot> Thanks @GothAlice
[18:32:16] <girb1> hi
[18:32:32] <girb1> http://pastebin.com/nRL0gcL2  — split is not happening, please may I know the reason?
[18:33:08] <attrib> hi all, I'm experiencing a weird issue. If I do a db.agg.findAndModify({query: {_id: {date: '2015-02-08', nid: 3898}}, update: {$inc: {session: 1}}, upsert: true}) - I get E11000 duplicate key error.
[18:34:01] <GothAlice> attrib: findAndModify — why are you using this particular operation instead of a standard update?
[18:34:01] <cheeser> run that query manually and see what you get.
[18:34:07] <attrib> checking db.agg.stats() - I get indexDetails of _id with a count of 328, but if I do db.agg.count() I only get 298
[18:35:05] <attrib> GothAlice: In my code (originally using node.js), I return the modified document
[18:35:13] <attrib> just broke everything now down to this
[18:35:30] <attrib> to continue working with this document
[18:37:08] <GothAlice> attrib: Unfortunately, that's not what that does. http://docs.mongodb.org/manual/reference/command/findAndModify/#output — if an existing record gets updated, you get back the value prior to the modification, not the current value. Better, in this instance, is to remember the _id you are using and simply continue to issue atomic updates as needed. (The "session" value won't be reliable as it stands. If you do need an accurate value,
[18:37:08] <GothAlice> .update() then .find() separately.)
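For reference, the shell's findAndModify helper also accepts a "new: true" option that returns the post-update document; a minimal sketch of the earlier call with it added:

    db.agg.findAndModify({
        query:  { _id: { date: '2015-02-08', nid: 3898 } },
        update: { $inc: { session: 1 } },
        upsert: true,
        new:    true    // return the modified document rather than the pre-update one
    });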
[18:37:18] <attrib> so having this issue now in plain mongo (wanted to be sure nothing interferes here)
[18:37:45] <GothAlice> attrib: However, cheeser's question would be helpful to diagnose. What happens if you .find() just the query portion of that findAndMofidy?
[18:37:58] <GothAlice> s/findAndMofidy/findAndModify/
[18:38:33] <attrib> GothAlice: it returns nothing
[18:39:33] <GothAlice> attrib: Try: db.agg.insert({_id: {date: '2015-02-08', nid: 3898}, session: 0})
[18:39:48] <GothAlice> This is the rough equivalent of the upsert, just without incrementing the value.
[18:40:17] <attrib> GothAlice: E11000 duplicate key error.
[18:40:18] <attrib> :P
[18:40:30] <GothAlice> Right-o. db.agg.getIndexes() — any indexes involving "session"?
[18:40:57] <attrib> only the standard _id index
[18:41:05] <GothAlice> session isn't in _id.
[18:41:22] <attrib> no, its outside
[18:41:41] <attrib> _id is just {date: 'string', nid: int}
[18:42:16] <attrib> like said before, the strange thing is looking at the stats
[18:42:26] <attrib> the id index has a count of 328
[18:42:38] <attrib> while the count over the collection returns only 298
[18:43:36] <GothAlice> attrib: http://showterm.io/cb59ca42bf74f73900589
[18:43:42] <attrib> I did a remove today [db.agg.remove({'_id.date': '2015-02-08'})]
[18:44:00] <attrib> I have the feeling the index still has the values
[18:44:31] <GothAlice> Yeah. There's something pretty funky with that collection. It will probably be worthwhile to transfer the contents to a new collection.
[18:45:21] <attrib> If I do a db.agg.reIndex() now
[18:45:27] <attrib> it works, till I do the remove again
[18:45:31] <GothAlice> (That terminal recording demonstrates inserting explicitly, upserting the existing record, then upserting a new record.)
[18:47:01] <GothAlice> attrib: Why collection.remove instead of collection.find().remove/removeOne?
[18:47:23] <attrib> Is there a difference?
[18:47:37] <GothAlice> Yeah, collection.remove isn't really a thing, it's a console macro.
[18:47:51] <GothAlice> (Type "db.agg.remove" and press enter in the shell to see the source for it.)
[18:48:04] <GothAlice> Basically boils down to db.agg.find(query).remove()
[18:48:30] <attrib> okay, but like I said - in the normal case this is handled by the mongodb node.js driver
[18:48:43] <attrib> was currently just testing if I can reproduce it on the console
[18:48:52] <attrib> to not spam here with node.js issues
[18:50:19] <GothAlice> Are you able to reproduce the issue on a completely clean collection? (I.e. re-name the existing one out of the way, do nothing to the indexes on the new collection.)
[18:50:37] <GothAlice> My terminal recording demonstrates that this (clean) should work.
[18:54:58] <GothAlice> attrib: http://showterm.io/39afe4940aff9b7deb9eb < demonstrates that a potentially conflicting upsert works after record removal, too.
[18:56:55] <attrib> seems like I broke something with the index there
[18:57:12] <attrib> not sure what, can't replicate right now on a fresh collection
[18:58:09] <inspiron> has anyone used mongo-connector with elasticsearch?
[18:58:18] <attrib> but inserting a value, doing a count, returns me 299; checking stats, tells me the size of the id index is 299
[18:58:45] <GothAlice> attrib: .count() results are "best effort" guesses.
[18:58:46] <attrib> deleting the entry again, count is now 298, stats index size is still 299
[18:58:59] <GothAlice> Yeah, looks like you've got index updates disabled somehow.
[18:59:25] <GothAlice> AFAIK, that's impossible, as index updates are synchronous with the applied atomic operations.
[19:00:06] <attrib> I'm getting the impossible possible today... the day gets better and better :P
[19:00:59] <GothAlice> However, a disparity between .count() results and the rest of the world can be expected. Doubly expected if you have are using sharding.
[19:01:09] <GothAlice> s/have are/are/
[19:01:29] <attrib> if I do a db.agg.reIndex() all the values are 298 again
[19:01:43] <attrib> so currently trusting count more than the indexes ;)
[19:01:45] <GothAlice> Indeed. MongoDB doesn't do much in the way of "online compaction".
[19:02:23] <attrib> I'm connected to a mongos, but only have one shard; I'm using this for easy access to the replication behind it
[19:03:19] <attrib> trying the same now on the primary
[19:03:26] <attrib> to test if mongos inbetween is the problem
[19:03:41] <GothAlice> Indexes are (using the mmapv1 back-end driver) stored as binary trees, with leaves representing ordered "buckets". Inserting records may allocate more buckets that removal won't clean up. When attempting to count() MongoDB estimates the number of records based on the number of populated buckets, and sharding makes this a hella non-trivial operation.
[19:04:01] <GothAlice> .reIndex() would throw away the existing buckets and create new, highly packed ones.
[19:04:52] <attrib> uhh, yeah. working directly on the primary it works...
[19:06:34] <attrib> so somehow I know now that the problem is with mongos
[19:11:15] <attrib> oO, now it works with mongos too
[19:11:18] <attrib> I'm confused
[19:11:25] <attrib> but big thumbs up GothAlice
[19:11:28] <attrib> GothAlice++
[19:11:34] <attrib> :)
[19:13:02] <GothAlice> :/
[19:13:11] <GothAlice> I'm not a fan of it spontaneously working...
[19:14:26] <cheeser> "things that start working by themselves tend to stop working by themselves." -- my dad
[19:17:14] <GothAlice> cheeser: In a related vein: “Do you have a backup?” means “I can’t fix this.” (Alice's Law #105: https://gist.github.com/amcgregor/9f5c0e7b30b7fc042d81) I'll totally need to add your father's quote in there, though.
[19:17:59] <GothAlice> cheeser: Your quote is now Law #145. ;)
[19:18:02] <cheeser> he also had a theory that electronics ran on smoke and once you let the smoke out you can't put it back in... so, you know, he was interesting. :D
[19:18:12] <GothAlice> Magic smoke.
[19:18:24] <GothAlice> Magic smoke is a pretty common engineering term. ;)
[19:19:56] <attrib> yeah, I also don't like this at all
[19:20:07] <attrib> but somehow I want to go home already :P
[19:20:27] <GothAlice> attrib: Step 1: until you actually are sharding, don't use a sharding router in front of your plain replica set.
[19:20:50] <cheeser> we used mongos for years at my last gig before finally sharding.
[19:20:54] <GothAlice> (See Alice's Laws #30 through 32.)
[19:21:15] <GothAlice> It's an extra point of failure. :/
[19:21:23] <GothAlice> With no benefit for use.
[19:21:36] <attrib> colleague told me he had problems at his last company with mongodb and replica sets without using mongos, so....
[19:21:50] <cheeser> yeah. it was a premature optimization by the ops guys since when we finally decided to shard, the cluster topology didn't have to change
[19:22:05] <cheeser> how that worked out in practice, i dunno. i left before they sharded.
[19:22:09] <GothAlice> lol
[19:29:49] <GothAlice> attrib: I only run one cluster (26 TiB dataSize) as a sharded replica set, for obvious reasons, but the other clusters I manage are simply bare replica sets. When their dataSize + indexSize grows > RAM, I'll consider sharding them. ;)
[19:42:43] <attrib> fun day. after now using the replica set directly without mongos, I get the error now with the primary too... I really just should go home and never come back :P
[19:44:43] <GothAlice> attrib: If you can reproduce reliably, open a JIRA ticket that includes the steps needed.
[19:45:01] <GothAlice> Indexes going out-of-sync are a Problem™ that deserves a look at by MongoDB staff.
[22:29:50] <aliasc> i find it hard to track and analyze my data in mongodb
[22:30:05] <aliasc> are there good tools for analysis
[22:33:58] <GothAlice> aliasc: At work we rolled our own "dynamic aggregate query reporting framework". I used http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework and http://docs.mongodb.org/ecosystem/use-cases/hierarchical-aggregation/ to assist in structuring my data in the most efficient way for how it needed to be queried.
[22:35:20] <GothAlice> aliasc: Things like our dashboard (http://cl.ly/image/142o1W3U2y0x) are simply a series of standard and aggregate queries.
[22:43:13] <GothAlice> aliasc: For data analysis links, see the following presentations: http://www.slideshare.net/crcsmnky/analyzing-data-in-mongodb and http://www.mongodb.com/presentations/high-throughput-data-analysis and this may be of use/interest: http://community.pentaho.com :)
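A minimal sketch of the kind of aggregate query such a dashboard might be built from (hypothetical collection and field names):

    db.events.aggregate([
        { $match: { created: { $gte: ISODate("2015-03-01") } } },
        { $group: { _id: "$type", total: { $sum: 1 } } },
        { $sort:  { total: -1 } }
    ]);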
[23:58:45] <_blizzy_> so I can think of NoSQL like this? https://gist.github.com/NotBlizzard/53ab334627d7fc42eb8d