PMXBOT Log file Viewer


#mongodb logs for Wednesday the 6th of April, 2016

[00:11:19] <greyTEO> GothAlice: any chance development will continue on https://github.com/marrow/task ?
[00:12:35] <GothAlice> greyTEO: Indeed; I'm currently polishing up the cinje template/DSL engine first (ugh, marketing sites are tedious to work on if you're a lib developer ;) then can turn my attention to divorcing task from MongoEngine prior to bundling up a release of that.
[00:12:55] <GothAlice> MongoEngine has… broken my heart a bit. :/
[00:14:03] <greyTEO> awesome! I'm currently re-implementing a queue from a while back and don't see marrow.task available on pip anymore..
[00:14:46] <greyTEO> not sure what version I was using way back in the day
[00:15:04] <GothAlice> m.task has never been available on the package index. :/
[00:15:29] <GothAlice> greyTEO: https://github.com/marrow/mongo/blob/develop/marrow/mongo/util/capped.py may be of interest.
[00:15:58] <GothAlice> marrow.mongo is going to be my collection of pymongo add-ons, vs. the MongoEngine approach of wrapping everything.
[00:16:48] <GothAlice> And, ah, see the documentation for Python's "warnings" module to disable that "A batgirl has died." warning if you try to patch in production environments. ;P
[00:16:58] <GothAlice> s/batgirl/catgirl/
[00:17:00] <GothAlice> Silly auto-correct.
[00:19:02] <greyTEO> ok, I thought it was. maybe it’s a documentation thing
[00:19:12] <greyTEO> I'll give that repo a look and see what's in there.
[00:19:28] <greyTEO> it is an interesting warning though…lol
[00:21:00] <GothAlice> "Monkeypatching" is generally frowned upon.
[00:21:34] <GothAlice> But the function was designed, in this iteration, to match the call semantics of the rest of the new pymongo API, i.e. designed to integrate as a method on Collection.
[01:46:25] <Forbidd3n> Quick question in regards to storing dataset in MongoDB? Would I have three pieces; Cruise Lines, Cruise Ships (linked to lines), Cruise Schedules (linked to ships) - Should I store this all in one collection or should I create three collections and link them by id?
[01:47:26] <Forbidd3n> I was thinking of one collection called Schedules (Line, Ship, Date, Port, Arrive, Depart)
[01:48:03] <Forbidd3n> The only issue with this is I will need to use Cruise Lines and Cruise Ships again and will have other tables using the same references
[01:51:53] <Forbidd3n> Or better yet I was thinking one collection called Schedules it will hold parent (lines -> object/array of ships per line -> object/array of schedules per ship
[01:51:58] <Forbidd3n> Anyone?
[01:54:39] <Forbidd3n> Anyone here that can help with schema design question?
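
Nobody in the channel picked this up, but for illustration, here is a rough sketch of the two layouts Forbidd3n describes; the collection and field names are invented for the example, not taken from the log:

    // Option 1: three collections linked by id
    db.lines.insertOne({ _id: "royal", name: "Royal Caribbean" })
    db.ships.insertOne({ _id: "oasis", line: "royal", name: "Oasis of the Seas" })
    db.schedules.insertOne({ ship: "oasis", date: ISODate("2016-04-06"),
                             port: "Nassau", arrive: "08:00", depart: "17:00" })

    // Option 2: one flat "schedules" collection repeating line/ship per document
    db.schedules.insertOne({ line: "Royal Caribbean", ship: "Oasis of the Seas",
                             date: ISODate("2016-04-06"), port: "Nassau",
                             arrive: "08:00", depart: "17:00" })
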
[08:16:41] <bo_> Hi all
[08:19:40] <bo_> Hi All, in a replica set the primary and arbiter nodes can't see the secondary node. Returns "lastHeartbeatMessage" : "Couldn't get a connection within the time limit". Please help
[08:47:19] <bo_> in a replica set the primary and arbiter nodes can't see the secondary node. Returns "lastHeartbeatMessage" : "Couldn't get a connection within the time limit". Please help
[09:22:43] <bo_> #mongodb
[10:15:59] <r3d0x> hey there. I'm having an issue here with an update, concerning the positional operator. Does anyone here have an idea if this is a bug? Thanks http://pastebin.com/h2NGhN8Z
[10:18:57] <Derick> r3d0x: not sure whether it is a *bug* - perhaps not something we support. Does it work if you switch the query clauses for scopes and tags.id around?
[10:19:48] <r3d0x> Derick: i've already tried that and it didn't work
[10:20:47] <Derick> it's a tricky one
[10:21:17] <Derick> I'd file a Jira ticket: https://jira.mongodb.org/browse/SERVER
[10:21:26] <r3d0x> Derick: will do :)
[10:21:30] <r3d0x> Derick: thanks
[10:22:14] <Derick> suggested new syntax? "$set" : { "tags.{$tags.id}.name" : "ghi"} ? :-)
[10:22:56] <Derick> or perhaps "$set" : { "tags.$1.name" : "ghi"} (with the scopes being $0)
[10:28:02] <r3d0x> :)
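
r3d0x's actual documents are only in the pastebin, so the following is a hypothetical reconstruction of the kind of update under discussion; the point is that the positional $ is bound to a single array matched by the query, and combining two array fields (scopes and tags) in one query is, as Derick says, not really supported:

    // hypothetical documents: { scopes: [...], tags: [ { id: ..., name: ... }, ... ] }
    db.items.update(
      { "scopes": "a", "tags.id": 2 },       // two array fields in the query
      { $set: { "tags.$.name": "ghi" } }     // "$" takes its position from only one of
    )                                        // the matched arrays, so the update may not
                                             // hit the intended tags element
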
[10:45:00] <kurushiyama> @Derick Deja vu: http://stackoverflow.com/questions/36448005/how-to-have-best-performance-with-nosql-database-and-million-documents
[10:45:10] <Derick> :-)
[10:45:26] <Derick> kurushiyama: yeah, close as duplicate
[10:46:14] <kurushiyama> @Derick Can't reference IRC, but it is OT on multiple levels, and I opted for too broad ;P
[10:46:26] <Derick> me too then
[10:49:24] <kurushiyama> But he is to be congratulated. Triple the docs/day compared to yesterday. The application seems to be getting massive momentum.
[10:49:48] <Derick> hehe
[11:20:54] <bo_> https://groups.google.com/forum/#!topic/mongodb-user/WDgJ8dI1BvY Dudes, please help
[11:23:34] <kurushiyama> bo_ **Check** the system time on the machines.
[11:24:15] <dddh__> kurushiyama: ISODate("1970-01-01T00:00:00Z") ?
[11:24:20] <kurushiyama> bo_ Next, ping each member from each other member by NAME
[11:24:36] <kurushiyama> dddh__ Can be one indication. I just want to make sure the basics are in place.
[11:25:14] <kurushiyama> dddh__ I chased flying sheep too often just to find out that it was the freaking grass.
[11:25:31] <dddh__> recently had something similar, had to remove ipv6 from /etc/hosts and reconfig force ;(
[11:26:44] <kurushiyama> dddh__ Well, order matters in /etc/hosts. First past the post.
[11:28:27] <kurushiyama> bo_ Hm, you have a primary, so where is the problem. You have (more or less) all the time in the world to debug.
[11:29:44] <kurushiyama> bo_ Are you there?
[11:32:34] <bo_> Yes I'm there
[11:33:04] <bo_> ping succeeds to every node
[11:36:11] <bo_> I'm comparing the date on primary, secondary and arbiter
[11:36:48] <bo_> arbiter has got another timezone, maybe the problem is in that
[11:37:04] <kurushiyama> The host called "primary" is not reachable, nothing much to compare there.
[11:38:49] <bo_> I run mongo --host primary --port 27017 from secondary
[11:38:57] <bo_> all is ok
[11:40:14] <kurushiyama> bo_ Have a look at the output you pasted.
[11:40:25] <kurushiyama> bo_ Try from arbiter, too.
[11:42:03] <kurushiyama> bo_ It is really weird, since you ran rs.status() on the secondary. Can you rerun it and use pastebin to show the result?
[11:42:40] <aiRness> Hello is there a way to take a mongodbdump outside of a docker container?
[11:43:14] <kurushiyama> aiRness Have you any volumes mounted?
[11:43:21] <aiRness> yes, the data
[11:44:31] <aiRness> can I point the directory of the data?
[11:44:48] <aiRness> with the mongodump that is
[11:45:32] <kurushiyama> aiRness Depends on the version you run. If <3.0, that is possible. However you should impose an fsynclock
[11:45:59] <aiRness> hmm I'm not familiar with that
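
For reference, the fsyncLock kurushiyama mentions is the shell helper that flushes pending writes and blocks new ones so a file-level copy of the data directory stays consistent; a minimal sketch of how it brackets such a copy (only relevant to the pre-3.0 / file-copy approach discussed above):

    db.fsyncLock()     // flush pending writes and block further writes
    // ... copy the dbPath files from the mounted volume at the OS level ...
    db.fsyncUnlock()   // resume normal writes
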
[11:46:21] <kurushiyama> aiRness Is it an option to stop the container?
[11:46:29] <aiRness> no :)
[11:46:36] <aiRness> I guess you mean to stop the writes to the db
[11:46:40] <kurushiyama> aiRness Underlying FS?
[11:46:53] <aiRness> ext4
[11:47:18] <kurushiyama> Hm, you should be able to connect to the mongod inside the docker container, anyway...
[11:47:33] <aiRness> yeah I can
[11:47:41] <kurushiyama> so do a dump against the server.
[11:47:45] <bo_> <kurushiyama> restarted the secondary node and got rs.status(): http://pastebin.com/W7KeUExx
[11:47:56] <aiRness> kurushiyama: I just need to automate this to run every N hours or so
[11:48:14] <kurushiyama> aiRness Bad backup strategy, imho
[11:48:20] <aiRness> yeah
[11:48:25] <kurushiyama> bo_ Ah, that looks better.
[11:48:26] <aiRness> I'm still thinking about it tbh
[11:48:41] <aiRness> what would you suggest?
[11:48:59] <kurushiyama> aiRness Use LVM, create snapshots. Or use cloud managers backup
[11:49:33] <aiRness> hmm that's not an option atm unfortunately
[11:49:50] <aiRness> but I thought I can reach with mongodump the data outside the container
[11:49:55] <aiRness> from inside the container the dump works ok
[11:50:51] <bo_> <kurushiyama> but what is it? Why does mongod have weird behaviour when sending heartbeats?
[11:51:29] <kurushiyama> aiRness a) mongodump is the last-resort backup solution for me, aside from config servers in a sharded cluster b) the bigger your data gets, the more inconsistent your backup will become using mongodump c) at least LVM should be possible. It is free and easy to use.
[11:51:59] <kurushiyama> bo_ It sends a heartbeat every few secs, iirc. That is not the cause of any problem.
[11:52:12] <aiRness> I understand, for now it should be fine though
[11:54:13] <bo_> I agree with you that the heartbeat is not the cause of the problem, it's a result of the problem
[11:54:22] <kurushiyama> aiRness Then simply connect to your mongodb via the exposed port and do your dump. Again, doing a mongodump against a running instance is a horrible idea for backup purposes, if we are not talking of cheap data. It _is_ possible, but you really have to put some effort into data modelling to make sure you will get a consistent backup.
[11:54:44] <kurushiyama> bo_ Maybe you should start from the beginning, describe your env and the problems you experience.
[11:55:04] <aiRness> kurushiyama: alright, thanks but regarding connecting to the exposed port I get an empty dump, I can see the data only from inside the container
[11:55:10] <aiRness> when I do a mongodump that is
[11:55:45] <kurushiyama> aiRness https://docs.mongodb.org/manual/reference/program/mongodump/#working-set
[11:56:02] <kurushiyama> aiRness You are probably doing something wrong... ;)
[11:56:29] <aiRness> kurushiyama: mongodump --host foo --port 27017 (no user or password)
[11:56:35] <aiRness> the commands works fine inside the container
[11:56:38] <aiRness> but not outside
[11:56:44] <kurushiyama> aiRness Uhm.
[11:56:46] <aiRness> I've read the manual
[11:57:17] <kurushiyama> aiRness In your _container_ your mongod surely listens on 27017. On the host? most likely not so much.
[11:57:42] <aiRness> yeah and on the host as well, 27017 is reachable
[11:57:50] <aiRness> otherwise the machine wouldn't be on production
[11:58:15] <aiRness> I've tested with a different port just to make sure and it won't connect
[11:58:17] <aiRness> it connects fine
[11:58:19] <kurushiyama> Well, how should I know the config of your prod env?
[11:58:29] <kurushiyama> aiRness Gimme a sec.
[11:58:33] <aiRness> yeah but I mean it's only port and host
[11:58:34] <aiRness> alright
[11:59:44] <kurushiyama> aiRness Can you run docker ps -a and paste the result on pastebin?
[11:59:55] <aiRness> sure
[12:00:37] <aiRness> ah right, the ports entry is empty
[12:00:56] <aiRness> http://pastebin.com/JRD0Pgtw
[12:02:04] <kurushiyama> Which leads to the question how your prod env accesses the data... You'd check that. Now.
[12:02:19] <bo_> <kurushiyama> thank you for help
[12:02:19] <kurushiyama> bo_ You still there?
[12:02:35] <kurushiyama> bo_ Uhm... did not do anything, really.
[12:03:09] <aiRness> well I didn't configure it but the host machine has multiple 27017 entries open
[12:03:34] <aiRness> and also I can see the checks of the cluster fine from icinga
[12:04:17] <kurushiyama> aiRness No. f... the tools for a moment. Do a manual test.
[12:04:58] <aiRness> kurushiyama: did already, everything works fine apart from the dump from the host
[12:05:01] <aiRness> I can connect
[12:05:03] <aiRness> call the mongo client
[12:05:15] <aiRness> ports are looking fine via netstat
[12:05:19] <kurushiyama> That is... strange
[12:05:21] <aiRness> prod app looks fine
[12:05:26] <aiRness> I mean we have millions of hits
[12:05:30] <aiRness> everything is good
[12:05:46] <kurushiyama> aiRness You have a replset or a sharded cluster then, I assume?
[12:06:02] <aiRness> kurushiyama: replset
[12:06:09] <kurushiyama> aiRness Hence.
[12:06:34] <kurushiyama> run docker ps -a on the other nodes of the replset. I assume they expose the port.
[12:07:00] <aiRness> I run it on the primary, ok give me a min
[12:07:29] <kurushiyama> aiRness If it is not reachable, it is unlikely to be still primary, no matter your designation ;)
[12:08:30] <aiRness> the only issue at the moment is that the mongodump is empty, it can reach the host/port fine and I can connect to the mongo client
[12:08:40] <aiRness> all the instances have the same setup
[12:10:04] <kurushiyama> aiRness Well, since you seem to have it all right, I can not help you. Your magical docker env, which does not expose a port probably exchanges the data using quantum fluctuation, which is beyond my understanding.
[12:10:46] <aiRness> well I didn't claim that I have it all right, it's weird to me
[12:10:49] <aiRness> but it's alright
[12:10:54] <aiRness> I'm not gettin into that conversation
[12:10:58] <aiRness> it seemed interesting to me
[12:11:00] <aiRness> np
[12:19:36] <kurushiyama> aiRness Well, we need to find out what's happening... So we need to collect facts. ;)
[12:21:19] <aiRness> sure, I think I know what is wrong, I will inform the channel if you like
[12:24:17] <aiRness> kurushiyama: it was just a mismatch of the client version
[12:25:20] <kurushiyama> aiRness Dang, I should have known that. But still I am curious about the quantum fluctuation data exchange.
[12:27:04] <aiRness> well me too but I guess I need to talk with my colleagues because I didn't set it up
[12:27:51] <kurushiyama> aiRness And still, using mongodump as means of backups is something I strongly discourage. You have been warned ;)
[12:33:36] <Torsten`> Hi, is it possible to search for a field by a part of a string?
[12:35:08] <aiRness> kurushiyama: I will warn our main dev :)
[12:35:54] <kurushiyama> aiRness Uhm, DevOps? Good luck.
[12:36:14] <kurushiyama> Torsten` Sure, multiple options. Either use text indices or regexes
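
A minimal sketch of both options kurushiyama lists; collection and field names are invented for the example:

    // regex match on part of a string (a prefix-anchored regex can use an index)
    db.articles.find({ title: { $regex: "mongo", $options: "i" } })

    // or a text index for word-level search
    db.articles.createIndex({ title: "text" })
    db.articles.find({ $text: { $search: "mongo" } })
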
[12:36:54] <aiRness> :)
[12:39:36] <kurushiyama> aiRness imvho, DevOps is just an expression of CXOs saying "I just have to c&p a few lines, and I have software X installed. How hard can that be?" Overlooking some tiny details.
[12:40:49] <dddh__> hidden replica set member with no indices > mongodump
[12:42:36] <kurushiyama> dddh__ The problem is when data gets big and the mongodump takes long. Hidden or not, you'd run into problems.
[12:43:37] <aiRness> kurushiyama: it will fade away like other stuff did, virtualization, agile, cloud way of doing things, etc.
[12:44:21] <aiRness> and it created even more confusion
[12:44:34] <aiRness> nobody understands what it actually means, everyone is misinterpreting each other
[12:44:37] <aiRness> and it's a mess
[12:44:47] <aiRness> we were working with developers 10 years ago and it was fine
[12:44:49] <aiRness> no need for manifests
[12:44:52] <kurushiyama> aiRness I doubt that. It "saves" money. Those things are going to persist. And I am not too sure agile fades. It is the reason for this devops nonsense in the first place, no?
[12:45:06] <aiRness> yeah
[12:45:19] <aiRness> it's the agile way saying it with different words
[12:45:23] <aiRness> and some cherries at the top
[12:45:40] <scruz> howdy
[12:45:58] <scruz> how do i implement a binning counter with the aggregation framework?
[12:46:48] <cheeser> group, count
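
cheeser's "group, count" in sketch form — a hypothetical collection of numeric measurements binned into buckets of width 10; the names and the bin width are assumptions, not from the log:

    db.measurements.aggregate([
      { $group: {
          _id: { $subtract: [ "$value", { $mod: [ "$value", 10 ] } ] },  // floor of the bin
          count: { $sum: 1 }
      } }
    ])
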
[12:48:29] <Spritzgebaeck> hello
[12:49:17] <kurushiyama> Spritzgebaeck Guten Tag auch! ;)
[12:50:41] <Spritzgebaeck> i have a question. we had an application on mongodb 2.6.11; today we upgraded to 3.2.1 and our aggregations with an $out stage are slower than before. if we remove the $out stage they perform nearly the same
[12:51:26] <kurushiyama> Spritzgebaeck Compression
[12:52:02] <kurushiyama> Spritzgebaeck Have a look at your CPU utilization.
[12:54:43] <scruz> hey kurushiyama
[12:55:30] <scruz> cheeser: thanks.
[12:55:44] <kurushiyama> scruz Yes?
[12:55:55] <scruz> kurushiyama: just saying hello
[12:56:12] <scruz> problem is i want to do it for several fields at the same time.
[12:56:22] <kurushiyama> scruz Sorry, jumping from here to there and around atm.
[12:56:28] <scruz> cheeser: ^
[12:56:36] <scruz> kurushiyama: no problem
[12:56:47] <Spritzgebaeck> okay, i had a disconnect, so i have to ask the same question again
[12:56:49] <Spritzgebaeck> i have a question. we had an application on mongodb 2.6.11; today we upgraded to 3.2.1 and our aggregations with an $out stage are slower than before. if we remove the $out stage they perform nearly the same
[12:57:06] <cheeser> 08:44 < kurushiyama> Spritzgebaeck Compression
[12:57:07] <cheeser> 08:44 < kurushiyama> Spritzgebaeck Have a look at your CPU utilization.
[12:57:49] <Spritzgebaeck> kurushiyama: is it possible to disable the compression?
[12:57:57] <Spritzgebaeck> thank you cheeser
[12:58:26] <cheeser> np
[12:58:41] <kurushiyama> Spritzgebaeck Well, sort of. Do you have a replset?
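
For context: 3.2 defaults to the WiredTiger engine, which compresses collection data with snappy by default, whereas the old MMAPv1 engine in 2.6 did not compress at all. Compression can be turned off per collection at creation time (or server-wide via storage.wiredTiger.collectionConfig.blockCompressor); whether this actually helps the $out slowdown here is untested, and the collection name below is invented:

    db.createCollection("aggregation_out", {
      storageEngine: { wiredTiger: { configString: "block_compressor=none" } }
    })
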
[12:59:10] <scruz> can’t do a $group within a $group, more’s the pity
[13:00:23] <scruz> i think i’ll $push 1 or 0 to an array then sum outside mongodb
[13:00:51] <kurushiyama> scruz How about pastebinning an input doc and the expected output?
[13:00:56] <Spritzgebaeck> kurushiyama: no, it is only one mongodb on a windows 8 system
[13:01:46] <kurushiyama> Spritzgebaeck Uhm. You are running a mongod on a windows 8 for production and performance is of concern?
[13:02:52] <kurushiyama> Spritzgebaeck Forgive me, but that is a bit like complaining that your VW Golf is getting slower when you load stones into it, all while trying to get it ready to compete in Formula 1.
[13:03:03] <Spritzgebaeck> it's a developer machine and after the upgrade our aggregates took longer.
[13:03:52] <kurushiyama> Spritzgebaeck Which is understandable. a) WT needs more RAM b) WT needs more CPU c) WT is more reliant on FS performance.
[13:05:38] <kurushiyama> Spritzgebaeck regarding a) Windows takes a lot of RAM by itself. b) Your CPU most likely did not change c) Both ReFS and NTFS aren't exactly known as performance monsters.
[13:08:18] <StephenLynx> i'd suggest using a VM to develop.
[13:08:31] <StephenLynx> running anything worthwhile on windows is masochism.
[13:08:46] <Spritzgebaeck> kurushiyama: okay, then we have an explanation for our problem. we were uncertain about it
[13:08:52] <Spritzgebaeck> thank you
[13:09:05] <kurushiyama> Spritzgebaeck Follow StephenLynx advice.
[13:09:43] <Spritzgebaeck> kurushiyama: our customers have mostly windows servers. soooo... :D
[13:09:48] <StephenLynx> HAHAHAHAHAHAHAHAHAHAHAA
[13:10:00] <StephenLynx> ebin
[13:10:02] <StephenLynx> :^)
[13:10:24] <kurushiyama> Spritzgebaeck I _strongly_ discourage running MongoDB on Windows in production. The FS alone is a bottleneck.
[13:10:37] <Spritzgebaeck> we know :'(
[13:10:46] <StephenLynx> his client is probably too retarded to listen to reason.
[13:10:46] <kurushiyama> Spritzgebaeck We are talking of performance loss of up to 40%
[13:10:56] <kurushiyama> That easily makes whole servers in a cluster.
[13:11:05] <StephenLynx> he probably thinks windows is better because its expensive.
[13:11:29] <kurushiyama> Well, is OUL still licensable?
[13:11:38] <Spritzgebaeck> clients... there are many of them, all with windows servers and no real admins
[13:11:38] <StephenLynx> wot
[13:11:49] <StephenLynx> OUL?
[13:12:05] <kurushiyama> Oracle "Unbreakable" Linux.
[13:12:20] <StephenLynx> they might as well pay for RHEL support
[13:12:33] <kurushiyama> StephenLynx IIRC, OUL was more expensive.
[13:12:45] <StephenLynx> yeah, but its full of oracle bullshit.
[13:12:53] <StephenLynx> but I get
[13:12:57] <StephenLynx> the joke v:
[13:14:09] <kurushiyama> Spritzgebaeck Be warned. Running MongoDB on Windows forces you to scale prematurely, most likely because of IO or RAM bottlenecks made artificially worse by the underlying OS. We can easily talk of hundreds to thousands per month.
[13:16:06] <kurushiyama> Wow, running an IT infrastructure with no admins. That is the epitome of devops.
[13:17:46] <StephenLynx> told you, his clients are beyond hope, probably.
[13:17:58] <kurushiyama> What's next? Getting rid of dev? Or sth like SalesMarkBackOfficeDev?
[13:18:00] <StephenLynx> the worst is that he is aware of it.
[13:18:11] <kurushiyama> That is sad, really.
[13:18:47] <StephenLynx> in times like these I am really glad I work in a company where the CEO/owner is a linux user v:
[13:19:04] <StephenLynx> and would die before running a windows server.
[13:20:44] <dreamreal> I was at a shop like that except they were all mac... and relied on azure/sharepoint for everything
[13:21:38] <StephenLynx> huehue
[13:22:34] <Spritzgebaeck> the clients think they bought something like ms office and can install it themselves, so no need for admins.
[13:23:23] <StephenLynx> hue
[13:24:41] <scruz> kurushiyama: i’m building the pipeline in python, so i get some extra tools to work with to create a monster of an aggregation pipeline
[13:31:06] <scruz> kurushiyama: https://bpaste.net/show/355792eb2d5b
[13:31:41] <scruz> that’s from an interactive session for thinking through
[13:31:50] <scruz> now have to translate that into solid code.
[15:21:34] <deshymers> I'm looking at using mongodb to store analytic data, and I was wondering if this https://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/ is still relevant and current?
[15:22:01] <deshymers> I found a few blog posts but they were almost 6 years old
[15:22:29] <echelon> hi, what should i do if mongo runs out of memory?
[15:22:47] <kurushiyama> echelon scale up or out.
[15:23:03] <echelon> in the interim
[15:23:05] <kurushiyama> echelon Adding more swap might help (very) short term
[15:23:14] <echelon> on aws? :/
[15:23:19] <kurushiyama> echelon Sure.
[15:23:23] <kurushiyama> echelon Loopback
[15:24:13] <echelon> i'm just wondering if it would add too much latency considering ebs is all networked storage
[15:25:22] <kurushiyama> echelon Better than being fragged by OOM killer.
[15:25:30] <echelon> alright, fair enough
[15:25:32] <echelon> thanks
[15:25:44] <kurushiyama> echelon You are welcome.
[15:25:47] <echelon> :)
[15:45:21] <echelon> kurushiyama: doesn't seem like mongo is using it
[15:45:39] <echelon> SwapTotal: 6291452 kB
[15:45:42] <echelon> SwapFree: 6291444 kB
[15:47:12] <echelon> "Nevertheless, systems running MongoDB do not need swap for routine operation. Database files are memory-mapped and should constitute most of your MongoDB memory use. Therefore, it is unlikely that mongod will ever use any swap space in normal operation. The operating system will release memory from the memory mapped files without needing swap and MongoDB can write data to the data files without needing the
[15:47:14] <echelon> swap system."
[15:47:56] <Derick> who here asked about v8 vs spidermonkey yesterday?
[15:48:11] <Derick> https://engineering.mongodb.com/post/code-generating-away-the-boilerplate-in-our-migration-back-to-spidermonkey/ has the story
[15:49:17] <kurushiyama> echelon Then I am pretty confused why OOM killer should be applicable.
[15:49:47] <kurushiyama> @Derick I did not ask the original question, but I was and am interested anyway.
[15:54:52] <kurushiyama> echelon Ok, one step back. You were saying that you are out of memory. What is your RAM utilization?
[15:55:07] <echelon> 92-93% by mongod
[15:56:15] <kurushiyama> echelon That _is_ a bit high, admittedly, it should be somewhere between 85 - 90%
[15:57:05] <kurushiyama> echelon But as long as physical mem is still available, I would not be concerned.
[15:57:16] <echelon> oh ok, thanks
[15:57:24] <echelon> just trying to figure out why the queries are slow
[15:57:44] <kurushiyama> echelon That may be a whole different story, especially with EBS
[15:57:48] <echelon> oh ok
[15:58:03] <kurushiyama> echelon You said yourself that EBS might add latency.
[15:58:25] <kurushiyama> echelon With a mem usage that high, it might well be that your queries touch documents not in the working set.
[15:58:49] <echelon> hmm
[15:58:50] <kurushiyama> echelon In which case they'd have to be loaded from "disk"
[15:59:44] <deshymers> I am working on storing analytic data in mongodb, we dont need to be very granular only weekly and monthly counts, so I was thinking that this structure would be good for that, https://gist.github.com/deshymers/d86d1d579c32a133e47d2ec227b39315
[15:59:52] <echelon> i see
[16:00:21] <deshymers> is there anything blatantly wrong with, or that could cause problems with that document schema?
[16:00:24] <kurushiyama> echelon And then there might be readahead settings which are not optimal, compression might interfere and whatnot.
[16:00:57] <kurushiyama> deshymers Overembedded, imho
[16:01:49] <Derick> i'd put each year in a separate document at least
[16:01:54] <deshymers> storing the years that way, or the weeks/months in the year?
[16:02:00] <kurushiyama> echelon Or, to put it the other way around: The best experience I have with MongoDB is on SSDs, enabling raw performance.
[16:02:24] <deshymers> ok
[16:02:34] <deshymers> that makes sense, then the documents don't get too bloated
[16:03:05] <kurushiyama> deshymers As for time series data, I'd always put it into a flat "schema" like {"date": someDate, "source":whatever, "value": foo}
[16:04:23] <deshymers> kurushiyama: so one document per entry then? and another for total counts?
[16:05:27] <kurushiyama> deshymers You do not need another one. You can calc the counts on demand or from time to time using aggregations, potentially writing those aggregations to a collection using an $out stage
[16:05:38] <deshymers> this is what I was following, https://docs.mongodb.org/ecosystem/use-cases/pre-aggregated-reports/#schema
[16:05:42] <deshymers> ahh right
[16:07:25] <deshymers> kurushiyama: you would probably get better write performance with a flatter document, correct?
[16:07:46] <deshymers> rather than updating an existing large document
[16:08:01] <kurushiyama> deshymers Exactly. Hence the suggestion.
[16:08:18] <kurushiyama> deshymers Plus, it scales much better.
[16:09:18] <deshymers> ok
[16:09:52] <kurushiyama> deshymers If you need real time data (in the strictest meaning of the word) however, it might not be optimal.
[16:10:21] <deshymers> oh no this isnt for realtime, we only want to graph weekly and monthly views
[16:10:44] <kurushiyama> deshymers Then I'd go with flat events, plus aggregations with an $out stage
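
A minimal sketch of what that flat layout plus a pre-aggregating $out stage could look like; the collection and field names are illustrative, not deshymers' actual schema:

    // flat event documents, one per hit, e.g.:
    // { date: ISODate("2016-04-06T12:00:00Z"), source: "homepage", value: 1 }
    db.events.aggregate([
      { $group: {
          _id: { source: "$source", year: { $year: "$date" }, week: { $week: "$date" } },
          count: { $sum: "$value" }
      } },
      { $out: "weekly_counts" }   // materialise the weekly report into its own collection
    ])
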
[16:10:52] <deshymers> I know this talks about down to the second, http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb and that is way over kill for my needs :P
[16:11:40] <deshymers> kurushiyama: and filtering by week on a date index won't be a big performance hit?
[16:12:38] <deshymers> I guess I could still store the data as an int if I need to
[16:12:46] <kurushiyama> deshymers depends on what we are talking about. But in general we are talking of a DBMS. It would not really stand up to the name if it was unable to answer questions.
[16:12:47] <deshymers> err store the week as an int
[16:14:14] <kurushiyama> deshymers Let me put it that way: I do something similar regularly for testing purposes on 10M records. On my 6 year old MBA...
[16:14:43] <cheeser> putting that business degree to work!
[16:14:47] <deshymers> ha that works for me
[16:14:59] <kurushiyama> @cheeser ;P
[16:15:33] <kurushiyama> One has to admit that holding an MBA degree at the age of 6 is impressive.
[16:16:31] <deshymers> awesome thanks guys
[16:16:47] <kurushiyama> deshymers You are welcome.
[16:22:14] <deshymers> so another question, the ObjectID contains a timestamp, is that indexable? or would I be better of having a separate timestamp to query against?
[16:25:36] <kurushiyama> deshymers short answer: Use a separate timestamp. Long answer: There are some rare use cases in which ObjectId is sufficient since it is monotonically increasing. But it may get really complicated with client-side generated ObjectIds and the like. All that just to save an index over a 64-bit int? I don't think so.
[16:25:59] <deshymers> ok
[16:27:36] <Derick> kurushiyama: most timestamps (all up to 2038) will fit in a 32bit int even :)
[16:27:45] <Derick> and drivers should pick that type if possible
[16:28:18] <kurushiyama> Derick Well, I rather calc with the worst and be positively surprised ;)
[16:28:25] <Derick> fair enough :)
[16:28:44] <kurushiyama> Although the ability to query oids by ts would be great, I have to admit.
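
Querying _ids by time is possible in the shell by building a boundary ObjectId from a Unix timestamp (the first 4 bytes of an ObjectId are the creation time in seconds); a sketch with an invented collection name, which still leaves all the caveats kurushiyama mentions about client-generated ids:

    var ts = Math.floor(new Date("2016-04-06T00:00:00Z").getTime() / 1000);
    var boundary = ObjectId(ts.toString(16) + "0000000000000000");  // 4-byte ts + zeroed rest
    db.events.find({ _id: { $gte: boundary } })   // documents created on or after that time
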
[16:29:20] <Derick> does A/F not have a function to extract the timestamp from them?
[16:30:29] <Derick> doesn't seem so
[16:31:21] <Derick> kurushiyama: could be a nice exercise to try to add that :)
[16:31:40] <kurushiyama> Given my c++ skills, it would be a lifetime achievement ;)
[16:31:55] <deshymers> Derick: as in Derick Rethans?
[16:32:01] <Derick> meh, it's not that hard. All you need is stackoverflow and copy and paste skills
[16:32:05] <Derick> deshymers: that's me
[16:32:16] <Derick> (well, with s/o/i)
[16:32:20] <deshymers> ha, if anyone knew about date/time functions I guess you'd be the guy :P
[16:32:26] <Derick> hehe
[16:34:27] <kurushiyama> Derick Well, since my c++ skills are around 0.1 on a scale of 1 - 100, I'd probably accidentally transcend MongoDB into the Flying Spaghetti Monster. ;)
[18:21:24] <diegoaguilar> what's better to use in order to get counts
[18:21:28] <diegoaguilar> .count or aggregations
[18:23:19] <StephenLynx> .count
[18:23:29] <kurushiyama> StephenLynx I beg to differ
[18:23:51] <kurushiyama> count in sharded environments counts orphaned docs.
[18:24:03] <StephenLynx> hm
[18:24:09] <StephenLynx> interesting.
[18:24:54] <kurushiyama> StephenLynx there is a bug somewhere. Ran into it, and while researching on it, I found out that aggregations are not susceptible to this problem.
[18:25:08] <StephenLynx> ah
[18:25:18] <StephenLynx> and is that bug still present on the latest version?
[18:26:09] <kurushiyama> Afaik, it wasn't fixed.
[18:26:09] <kurushiyama> Let me just check.
[18:27:46] <kurushiyama> StephenLynx https://jira.mongodb.org/browse/SERVER-3645
[18:29:03] <kurushiyama> diegoaguilar So personally, I would use an aggregation
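
For reference, the aggregation variant of a filtered count looks like this (collection and filter are hypothetical); it runs through the normal query path, which is why it avoids the orphaned-document issue kurushiyama links above:

    db.orders.aggregate([
      { $match: { status: "open" } },
      { $group: { _id: null, n: { $sum: 1 } } }
    ])
    // vs. db.orders.count({ status: "open" }), which on a sharded cluster
    // may also count orphaned documents (SERVER-3645)
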
[18:30:56] <uehtesham90_> hello, i had a question regarding using flask with mongoengine. i am adding a flask web app on an already existing project which consists of multiple databases. but in flask-mongoengine, we have to define a default database, so should we just provide one of the database names as the default database and internally access other databases through the mongo
[18:30:56] <uehtesham90_> client? i am not sure what the best approach is...would appreciate some help here :)
[18:32:03] <kurushiyama> uehtesham90_ Maybe you should ask on #flask and/or #mongoengine?
[18:33:22] <uehtesham90_> i am asking in #mongoengine but there was no response...so i thought i would give it a shot here
[18:34:10] <StephenLynx> I would just
[18:34:16] <kurushiyama> uehtesham90_ Well, personally, I do not use either, so my help will be limited.
[18:34:18] <StephenLynx> have a system setting to indicate the database name
[18:36:22] <uehtesham90_> StephenLynx: there is a system setting but we can only define one database...but i currently have many databases....so i wasn't sure if i should have a separate database instance referring to each database?
[18:36:40] <StephenLynx> add more settings.
[18:37:02] <StephenLynx> and you really need multiple databases?
[18:38:31] <kurushiyama> uehtesham90_ From what I can see, the problems might actually lie elsewhere.
[18:39:54] <uehtesham90_> StephenLynx: this is how my project is setup. it is for analysing student data for different online courses. so we stored each course data in separate databases (each with their own collections)
[18:40:22] <StephenLynx> ...................
[18:40:26] <StephenLynx> no
[18:40:28] <StephenLynx> just no
[18:40:38] <StephenLynx> that's dynamic database creation
[18:40:45] <StephenLynx> and it's a capital sin in database design.
[18:40:58] <uehtesham90_> really? i did not know that :(
[18:41:02] <StephenLynx> you will use a single database for all courses
[18:41:15] <StephenLynx> and have a field on each document indicating the course they belong to.
[18:41:41] <uehtesham90_> but what abt individual collections....each course has its own set of data like user info for a given course
[18:41:49] <StephenLynx> again
[18:41:57] <uehtesham90_> would i add them in a single collection and define what course they belong to?
[18:41:58] <StephenLynx> all documents for user info have the same structure
[18:42:00] <StephenLynx> yes
[18:42:04] <uehtesham90_> ah ok
[18:42:18] <StephenLynx> dynamic models are as bad as it gets.
[18:42:27] <uehtesham90_> but then each collection would become very large....what is the best way to handle that?
[18:42:33] <StephenLynx> you don't.
[18:42:35] <StephenLynx> that's intended.
[18:42:42] <uehtesham90_> makes sense
[18:43:05] <StephenLynx> it's much easier to manage a single large collection than the same collection on multiple databases.
[18:43:15] <StephenLynx> not to mention lower level stuff
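
A minimal sketch of the single-collection layout StephenLynx is describing, with invented collection and field names — every document carries the course it belongs to, and the course field leads the index so per-course queries stay cheap:

    db.user_info.insertOne({
      course: "stats-101",                        // which course this record belongs to
      user: "jdoe",
      enrolledAt: ISODate("2016-04-06T00:00:00Z")
    })
    db.user_info.createIndex({ course: 1, user: 1 })
    db.user_info.find({ course: "stats-101" })
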
[18:43:31] <uehtesham90_> thanks a lot StephenLynx .....that really helped :)
[18:44:03] <uehtesham90_> can you recommend any good sources to learn more abt good database design...i really want to improve on this
[18:44:32] <StephenLynx> nope
[18:44:42] <StephenLynx> I learned by experience on that subject.
[18:44:46] <StephenLynx> however
[18:45:00] <StephenLynx> code complete, second edition might give you some general software development insight.
[18:45:11] <uehtesham90_> nice! thanks
[18:45:53] <StephenLynx> addendum: dynamic collection creation and dynamic field creation are awful too.
[18:46:04] <uehtesham90_> just one more question....based on your experience, when should a separate database be used? like what one should look for when considering using a new database
[18:46:05] <StephenLynx> if you can't document your model, you should change your model.
[18:46:14] <StephenLynx> hm
[18:46:26] <StephenLynx> usually, you shouldn't.
[18:46:30] <StephenLynx> HOWEVER
[18:46:44] <StephenLynx> if you have some contrived use case that requires a second database
[18:46:52] <StephenLynx> then you use a second database.
[18:46:56] <StephenLynx> for example
[18:47:06] <StephenLynx> some legacy software
[18:47:30] <StephenLynx> you have to have this legacy database being used by your software and at the same time have a new database
[18:47:51] <StephenLynx> or you want to isolate the databases physically and it serves a different purpose
[18:48:22] <StephenLynx> usually a different database will serve a different system and you won't connect to the db directly and will instead communicate with the system that handles the second database.
[18:48:37] <StephenLynx> so tl,dr; only add a second database as a last resort.
[18:48:54] <StephenLynx> if there is absolutely no other option.
[18:49:06] <kurushiyama> Again, I beg to differ.
[18:49:51] <kurushiyama> Integration via a database is a Very Bad Idea™ , as we learned the hard way all through the 90s
[18:49:58] <kurushiyama> and 2000s
[18:50:09] <StephenLynx> I mention that, though.
[18:50:20] <StephenLynx> that usually a different db is behind a different system.
[18:50:49] <StephenLynx> and that usually you shouldn't.
[18:50:51] <kurushiyama> StephenLynx I was not weakening your arguments. I was just differing on the conslusion ;)
[18:52:15] <uehtesham90_> thank you so much StephenLynx and kurushiyama
[18:52:27] <kurushiyama> Imho, there is absolutely no reason to have two databases even when dealing with legacy systems. either migrate or integrate. ;)
[18:52:49] <StephenLynx> yeah
[18:53:06] <StephenLynx> you only do that under extremely contrived circumstances.
[18:53:38] <StephenLynx> and when any other option means everything that depends on the software gets thrown in the trash, including companies.
[18:54:43] <dman777_alter> with mongo node driver insertOne(doc, options, callback){Promise}
[18:54:46] <kurushiyama> StephenLynx Whereas it remains to be proven that keeping multiple databases makes more sense than doing a migration. No offense.
[18:54:47] <dman777_alter> sorry
[18:55:21] <dman777_alter> ah...nm
[18:55:42] <uehtesham90_> kurushiyama: by integration, i am assuming that 'old database' would then become tables belonging to the new database
[18:56:10] <StephenLynx> yeah, I agree thata migration makes more sense
[18:56:30] <StephenLynx> but haven knows that anything can happen in the real world.
[18:56:34] <StephenLynx> heaven*
[18:56:39] <starfly> never say never
[18:57:11] <StephenLynx> when I said last resort, I really meant last resort.
[18:57:35] <kurushiyama> uehtesham90_ Well... no, not really. Assume you have a legacy system in need of the old database, and you have a new system with some other. Integration would mean that you make those applications communicate with each other to exchange data, rather than to have them communicate by sharing data
[18:58:00] <uehtesham90_> ah ok...makes sense
[18:58:50] <kurushiyama> uehtesham90_ what you described would be a data migration. ETL, basically.
[19:01:19] <kurushiyama> uehtesham90_ The problem with communication via data sharing is that two applications become dependent on the same database, and changes in your data structure become really, really hard to be executed without affecting the other system
[19:28:27] <Kaetemi> getting this exception on the target system when doing rs.add, any idea http://pastebin.com/BYFN9fpm ?
[19:33:36] <FuzzySockets> When you create an index with mongoose, what do the 1 or -1 represent... the sort order?
[19:34:25] <FuzzySockets> nvm
[20:11:03] <uuanton> Kaetemi have you tried to google the exception? 9001 socket exception
[20:18:04] <FuzzySockets> Anyone know of an accurate json document containing city/state to postal code mappings, preferably for all countries?
[20:57:26] <saml> can you use aggregate to get last action? {timestamp: ISODate(..), action: either 0 or 1, name: thing to group by}
[20:58:49] <kurushiyama> saml why would you use aggregate on that?
[20:59:32] <saml> i log which user subscribed, unsubscribed over time. i want to get current state of a given user's subscription
[20:59:53] <saml> like, this user is subscribed to these lists (optionally, unsubscribed from these lists)
[21:00:19] <saml> so, if i subscribed to list1 , then unsubscribed more recently, i don't want list1 to be part of the result
[21:00:53] <kurushiyama> saml db.yourcoll.find({user:username}).sort({timestamp:-1}).limit(1)
[21:01:42] <saml> i have multiple lists. {timestamp, action, username, listname}
[21:02:20] <saml> so i need group by listname sort by timestamp and only use the first listname,action per group
[21:02:27] <kurushiyama> saml "i want to get current state of a given user's subscription" not too precise, then ;)
[21:02:45] <kurushiyama> Can you pastebin a sample doc?
[21:02:52] <saml> let me try first
[21:06:52] <saml> kurushiyama, https://gist.github.com/saml/1d2f050f0440e36cfbe152fa47083399 this is what i'm trying to do
[21:07:46] <kurushiyama> saml Gimme a few
[21:08:32] <saml> db.docs.aggregate([{$match:{user:'saml'}}, {$group:{_id:'$to', action: '$action'}}])
[21:09:38] <kurushiyama> saml I meant minutes ;)
[21:18:35] <kurushiyama> saml db.mailing.aggregate([{$match:{user:"saml"}},{$sort:{"date":1}},{$group:{_id:{user:"$user",list:"$to"},status:{$last:"$action"}}}])
[21:20:20] <kurushiyama> saml you can add a project stage to make it look like you want.
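
With a $project stage bolted on, the pipeline could look like this (field names as in kurushiyama's pipeline above; assuming action 1 means subscribed):

    db.mailing.aggregate([
      { $match: { user: "saml" } },
      { $sort: { date: 1 } },
      { $group: { _id: { user: "$user", list: "$to" }, status: { $last: "$action" } } },
      { $project: { _id: 0, user: "$_id.user", list: "$_id.list",
                    subscribed: { $eq: [ "$status", 1 ] } } }
    ])
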
[21:21:35] <saml> oh $last
[21:21:48] <kurushiyama> saml plus sort, that is important
[21:21:49] <saml> i was $push-ing to an array and trying to $max during $project
[21:21:53] <saml> yup thanks
[21:22:03] <kurushiyama> saml Np.
[22:36:16] <Stereo> Hi everyone
[22:37:07] <Stereo> I'm trying to replace 'http://www' with 'https://www' in any values in all of my documents, and I'm not sure how to do this.
[22:37:31] <Stereo> I've tried google, stack exchange, even the documentation ;). Stuck.
[22:59:02] <kurushiyama> Stereo Good luck ;)
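
MongoDB (as of the versions discussed here) has no server-side string replace, so the usual approach is a client-side loop; a minimal sketch assuming the URLs live in one known string field — here an invented "url" field in a "pages" collection. If the value can appear in arbitrary fields, each document's keys would have to be walked the same way:

    db.pages.find({ url: /^http:\/\/www/ }).forEach(function (doc) {
      db.pages.update(
        { _id: doc._id },
        { $set: { url: doc.url.replace("http://www", "https://www") } }
      );
    });
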
[23:20:40] <dimaj> hello all! have a weird question today... is there a way to access a mongodb through another mongodb? My setup is as follows: Target MongoDB (Network A) <--- Proxy MongoDB (Network B) <---- Application (Network B). I need to access Target MongoDB from my application