PMXBOT Log file Viewer

#mongodb logs for Tuesday the 7th of June, 2016

[02:06:29] <wwwd> I'm trying to use mongoexport to generate a csv file. I can get a csv file with all of my records but want to use a --query param to limit the records returned. I am using http://pastebin.com/knCY4gKq as my query. It is giving this error: "error validating settings: query ... is not valid JSON: invalid character 'd' looking for beginning of value", where ... is the original query. Can anyone tell me why this is not valid?
[02:14:46] <joannac> wwwd: have you read the docs?
[02:15:15] <joannac> the --query option takes just the query, which is what's inside the find()
[02:15:39] <joannac> https://docs.mongodb.com/master/reference/program/mongoexport/#cmdoption--query
[02:29:24] <wwwd> joannac: Can't even tell you how many times I read them...just kept doing what I thought instead of what I was told! Story of my life!
[02:31:48] <joannac> happens to all of us :)
[02:32:35] <wwwd> Thanks a bunch!
[02:34:07] <joannac> no probs
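A minimal sketch of what joannac describes, with placeholder database, collection, and field names; the --query value is only the filter document that would go inside find(), never the find() call itself:

    mongoexport --db mydb --collection records --type=csv --fields name,email \
        --query '{"status": "active"}' --out records.csv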
[03:11:23] <Jonno_FTW> hi, what's the best way to put an image (~100kb) into a document using pymongo?
[03:12:35] <Jonno_FTW> should I just use base64encode and store it as a string?
[03:12:56] <cheeser> gridfs
[03:18:33] <Jonno_FTW> cheeser: "Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing the file manually within a single document instead of using GridFS. You may use the BinData data type to store the binary data."
[03:18:49] <cheeser> there you go then.
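In shell syntax, a rough sketch of the single-document approach the docs describe (collection and field names are invented; pymongo stores the same bytes via bson.binary.Binary rather than a base64 string):

    // Subtype 0 is generic binary data; the shell's BinData takes base64 text.
    db.images.insert({
        filename: "photo.jpg",
        contentType: "image/jpeg",
        data: BinData(0, "<base64 of the ~100kb image>")
    })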
[03:48:57] <macwinner> what are some common MongoDB misconceptions or FUD that get spread? I'm doing a small presentation and I wanted to talk about them and why they are invalid points.
[03:53:30] <Boomtime> if you find some FUD on mongodb this channel might be an appropriate place to dispel it, but it's not really appropriate to relay it
[03:55:25] <wwwd> Though I would think discussion of pros & cons would be in order. It is impressive but not omnipotent!
[03:57:25] <wwwd> macwinner: I would say it takes a bit more effort to keep your data clean. But in exchange you get flexibility and maybe speed, depending on what you're doing.
[04:00:28] <macwinner> Boomtime: oki! I will find some FUD and be back :)
[04:40:06] <syrius> hey guys. i'm trying a fairly automated approach to searching a mongo collection. i'm using mongoengine (python) to define classes and using reflection to generate possible filters to populate jQuery QueryBuilder.. this just generates the appropriate mongo query for me (json) which I can pass as a raw query and it *does* work right now.. I'm curious if there's a way to only return items for an embedded
[04:40:11] <syrius> list that met the search criteria?
[04:42:06] <syrius> http://pastebin.com/LFhgSb1j here's a quick example of my schema (as python objects) but conceptually it shouldn't matter that it's python
[04:49:37] <Boomtime> @syrius: https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results/#project-specific-array-elements-in-the-returned-array
[04:50:29] <syrius> ahh okay yeah i've been looking at $elemMatch and $
[04:50:49] <syrius> thanks Boomtime - I might have to check my design. this is still poc so i haven't uploaded all the data yet (and the schema is fake.. just trying to evaluate how i want to do it)
[04:51:02] <syrius> i might change those embedded lists to reference fields
[04:51:22] <syrius> i just need to be able to search and get back only what matched from those nested lists
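For reference, the projection Boomtime links to looks roughly like this in shell syntax (collection and field names are invented, since the real schema is in the pastebin); note the positional projection returns at most the first matching element per document:

    db.machines.find(
        { installed_software: { $elemMatch: { name: "nginx" } } },
        { hostname: 1, "installed_software.$": 1 }
    )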
[05:16:23] <Lope> mongoDB acting funny. I'm running it manually. I've set the shell of mongodb to /bin/bash. And with su I can browse to the dbpath, make files etc (have permissions). But when I enter the console, "show dbs;" shows everything as 0.000GB. If I try to insert new test data with use foo; db.bar.insert({a:1}); it can't recall it when I try db.bar.find() afterwards.
[05:17:08] <Lope> I'm running it with this command. su -c '/usr/bin/mongod --replSet rs0 --oplogSize 50 --wiredTigerCacheSizeGB 1 --smallfiles --port 27017 --dbpath /mnt/mongodb --logpath /var/log/mongodb/test.log --logRotate rename --fork' mongodb
[05:45:44] <Jonno_FTW> macwinner: you could talk about how people use it for the wrong reasons (they need a relational db or ACID), then complain it doesn't meet their needs. People just don't know when mongo is the right tool for the job
[05:49:42] <Lope> ok i figured it out. I had a wrong path somewhere.
[05:50:09] <Lope> does docker always use the same offset for container UID's?
[05:50:38] <macwinner> Jonno_FTW: yeah, that's going to be one of my points
[05:51:23] <Lope> for example if I run a CT from an image named imgfoo. let's say it's UID 33 is 10033 on the host. Now if I destroy that CT, and create another one, will the UID mapping be the same, or different?
[07:39:15] <chris|> is there a way to call sh.* functions from the java client?
[07:47:35] <Boomtime> @chris|: by sh.* i think you mean the shell helpers - these are javascript functions defined in the javascript of the shell
[07:47:59] <Boomtime> if you want to emulate these in any driver, take a look at the implementation in the shell
[07:50:19] <chris|> Boomtime: yes, I was already looking at those and was hoping to save myself some time
[08:00:18] <Boomtime> many of the helpers don't have a generic means of implementation in a driver; consider sh.status() which outputs a formatted table-ish sort of arrangement to the console - what would you expect the java driver to do?
[08:01:24] <Boomtime> even if there were an analogue to this that took say a textstream or some such, would it really be so useful that any single driver would want to maintain it?
[08:02:13] <Boomtime> the range of possibilities is wide, but the shell has a narrow scope so it can implement these extra little helpers
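One shortcut for porting a helper: typing its name in the mongo shell without parentheses prints its JavaScript source, which shows the underlying commands and config-database queries to reproduce in the Java driver:

    // Prints the implementation instead of running it:
    sh.status
    sh.getBalancerState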
[08:05:45] <Lope> does anyone know how the /etc/subuid file works?
[08:26:28] <kurushiyama> Jonno_FTW: What language?
[09:11:47] <chris|> what is the default mode of the balancer? is it enabled by default or do I have to enable it explicitly?
[09:15:13] <kurushiyama> chris|: Enabled by default
[09:25:44] <chris|> so I assume isBalancerRunning is only true if the balancer is currently doing work?
[09:28:46] <kurushiyama> Yes, but there should be an enabled flag in sh.status()
[09:29:06] <kurushiyama> iirc, there should be a command for it, even.
[09:30:28] <chris|> yes, there is rs.getBalancerState()
[09:31:50] <kurushiyama> rs?
[09:33:07] <kurushiyama> chris|: Was more referring to sh.isBalancerRunning()
[09:40:13] <chris|> yes, it's sh.getBalancerState
[09:40:19] <chris|> typo
[09:41:02] <kurushiyama> chris|: Was this a general question or do you experience problems of some sort?
[09:41:52] <chris|> general question, I wanted to understand under which conditions running != state applies
[09:45:15] <kurushiyama> Well, the balancer only kicks in when balancing thresholds are met and balancing is enabled.
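The distinction in shell terms, using the two helpers discussed above:

    sh.getBalancerState()   // the configured state: true when balancing is enabled
    sh.isBalancerRunning()  // true only while a balancing round is actually in progress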
[11:30:08] <Zelest> yay! tonight i'll upgrade the last machine! \o/
[11:30:13] <Zelest> then it's 3.2 in the whole cluster!
[12:04:44] <kurushiyama> ping
[12:16:31] <Zelest> pong
[12:17:21] <kurushiyama> @Zelest Thank you. Changed back to textual and had some auth problems.
[12:18:07] <Zelest> ah :)
[12:55:52] <kees_> is it possible for the mongodb driver in debug mode to print milli-/nanoseconds instead of just full seconds?
[12:57:10] <cheeser> "the mongodb driver?"
[12:57:13] <Derick> which language driver?
[12:57:19] <kees_> sorry, php mongodb driver
[12:57:32] <Derick> the old or the new driver? and what sort of log?
[12:57:39] <kees_> which uses the mongodb C library
[12:58:02] <Derick> so the new one
[12:58:10] <Derick> how are you obtaining/creating the log?
[12:58:11] <kees_> new driver, version 1.1.7, running with php -d mongodb.debug=stderr
[12:58:34] <Derick> I think the C library generates these logs, but let me have a look for you.
[12:58:41] <kees_> and getting TRACE output, it is timestamped, but only with like: 2016-06-07T11:45:23+00:00
[12:58:51] <Derick> TRACE is done by the C driver
[12:59:29] <kees_> yep, those TRACE lines
[12:59:36] <Derick> gimme a few mins
[13:02:19] <Derick> kees_: apparently, not currently. But adding it wouldn't be hard.
[13:03:11] <kees_> i guess it's something in mongoc/mongc-log.c, i'll give it a try :)
[13:03:22] <kees_> *mongoc-log.c
[13:03:30] <rpap> hello, I am looking for recommendations. I am planning to store user web activity in mongodb. There can be hundreds of requests per second and the data payload will vary from page to page. Is mongodb the best choice?
[13:03:54] <Derick> kees_: line 156
[13:04:00] <rpap> The use case will be ad-hoc querying according to payload
[13:04:13] <Derick> kees_: it uses gettimeofday already, but uses the second resolution localtime/strftime for printing
[13:04:45] <kees_> aye, time to dust off some C skills :)
[13:05:30] <Derick> kees_: if you make it work, I'm sure they'd love a PR and Jira ticket: https://jira.mongodb.org/browse/CDRIVER
[13:05:48] <kees_> ok, will do :)
[13:21:12] <darkfrog> I am trying to diagnose a bug in my project using ReactiveMongo, and when I dump out the query it references "$oid" but when I attempt to use the query directly it says, "unknown operator: $oid". I presumed ReactiveMongo was converting "$oid" to "_id" and tried that and the query then worked, but it then doesn't return any results.
[13:38:55] <jordonbiondo> I'm looking for something like $ but updates all matching array elements, does that exist? {foo: [{a: 3}, {a: 1}, {a: 3}]} => {foo: [{a: 9}, {a: 1}, {a: 9}]}
[13:40:05] <jordonbiondo> I can use $ to update the first foo.a to 9 where foo.a == 3, but can't find a way to update all foo.a to 9 where foo.a == 3
[13:42:13] <kurushiyama> jordonbiondo Nope. And if you need to, it smells like overembedding.
[13:59:29] <jordonbiondo> kurushiyama: normally yes, but this was for a migration, I guess I'll just script it with mongoose, thanks.
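For a one-off migration like this, a client-side loop is one workaround, since the positional $ operator only updates the first match (arrayFilters, which handles this natively, only arrived later, in MongoDB 3.6). A sketch with an invented collection name, not safe against concurrent writers:

    db.coll.find({ "foo.a": 3 }).forEach(function(doc) {
        // Rewrite every matching element in memory, then save the document back.
        doc.foo.forEach(function(el) { if (el.a === 3) { el.a = 9; } });
        db.coll.save(doc);
    });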
[14:06:21] <saml> kurushiyama, what was the timestamp db you're using?
[14:06:35] <kurushiyama> saml Influx
[14:06:57] <saml> i need to implement GET /toppages?user={userid}&duration={lastday|lastweek|lastmonth}&limit=5
[14:07:47] <saml> each http request log contains {userid:string, url: string, timestamp: datetime}
[14:08:15] <saml> so, in mongo, I was thinking about creating three collections: db.countsLastDay countsLastWeek countsLastMonth
[14:08:35] <saml> and each http log will update all three collections, incrementing count by one
[14:12:52] <saml> db.countsLastDay.createIndex({ts:-1}, {expireAfterSeconds:24*60*60}); db.countsLastDay.createIndex({url:1, user:1}, {unique:true}); db.countsLastDay.update({url:url, user:user}, {$set:{ts:ts}, $inc:{count:1}});
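A slightly fuller version of that sketch, with an upsert so the first hit for a url/user pair creates the counter (url, user, and ts stand in for the request's values). One caveat: the TTL index expires a counter document 24h after its last update, so this approximates a sliding window rather than implementing one exactly:

    db.countsLastDay.createIndex({ ts: 1 }, { expireAfterSeconds: 24*60*60 });
    db.countsLastDay.createIndex({ url: 1, user: 1 }, { unique: true });
    db.countsLastDay.update(
        { url: url, user: user },
        { $set: { ts: new Date() }, $inc: { count: 1 } },
        { upsert: true }
    );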
[15:22:02] <kurushiyama> saml Sorry, was a bit busy.
[15:22:43] <saml> take a break
[15:23:02] <kurushiyama> saml Will do. In 30?
[15:23:33] <saml> okay
[15:27:18] <edrocks> kurushiyama: you see influx 1.0 beta just came out
[15:29:09] <kurushiyama> edrocks Nope. Too busy doing UML stuff ;) But thanks for informing! (Jay!)
[15:29:10] <kenalex> never realized the power of mongodb until I used it in my first app and realized how straightforward development is with it
[15:36:26] <Frenchiie> Hey guys
[15:36:47] <Frenchiie> i have mongod running on the default port and then i try to start a new mongod on port 27018 but i get an error
[15:36:57] <Frenchiie> Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'. 2016-06-07T11:25:33.690-0400 W - [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty. 2016-06-07T11:25:33.690-0400 I STORAGE [initandlisten] exception in initAndListen: 98 Unable to lock file: /data/db/mongod.lock errno:35 Resource temporarily unavailable. Is a mongod
[15:37:12] <Frenchiie> i tried mongod --repair but it tries to repair on the default port so that doesn't help
[15:37:28] <Frenchiie> i also tried mongod --port 27018 --repair but i get the error above
[15:37:40] <Frenchiie> anyone knows how to fix this?
[15:41:08] <Frenchiie> mkay...
[15:41:31] <Derick> it's the lock file
[15:41:36] <Derick> did the machine die or something?
[15:42:19] <Derick> you need to tell it to put the lock file somewhere else
[15:42:31] <Derick> or rather, you can't start two mongod's on the same machine with the same database path
[15:42:51] <Derick> so you need to specify --dbpath /data/db2 for the second server you run
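Putting Derick's pieces together, something like this should work (the log path is a placeholder; each mongod needs its own dbpath, and --fork requires a --logpath):

    mkdir -p /data/db2
    mongod --port 27018 --dbpath /data/db2 --logpath /var/log/mongodb/second.log --fork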
[15:45:22] <kenalex> hi guys
[15:45:47] <kenalex> whats a good windows gui tool for managing collections and other aspects of mongodb ?
[15:47:02] <Derick> Frenchiie: did you get that?
[15:49:20] <ily> hello there. i have a question. for the people that use the Mongoose API with nodejs. if i have an index on say first name like FirstName: 1
[15:49:46] <ily> and i use mongoose to do a search by first name. do i have to do a normal kind of search? or do i have to pass arguments to make it use the index?
[15:49:52] <StephenLynx> kenalex, none.
[15:49:57] <StephenLynx> ily, don't use mongoose.
[15:50:04] <ily> what is wrong with it?
[15:50:20] <StephenLynx> first, its performance is horrible. about 6x slower than the native driver.
[15:50:27] <StephenLynx> second, its date handling is borderline broken
[15:50:43] <StephenLynx> third, it doesn't handle _id's properly and does some asinine cast instead.
[15:50:53] <ily> StephenLynx, so which api would you recommend?
[15:51:02] <StephenLynx> fourth it forces you to use mongo in a way it wasn't designed for
[15:51:04] <StephenLynx> mongodb
[15:51:16] <StephenLynx> http://mongodb.github.io/node-mongodb-native/2.1/api/
[15:52:09] <ily> StephenLynx, ok alright. it will be a bit of work but i think im tired of mongoose enough to switch over. so for mongodb how does that one handle indexes?
[15:52:21] <StephenLynx> in which sense?
[15:52:44] <StephenLynx> usually its smart enough to use indexes where it can.
[15:52:45] <ily> StephenLynx, as in, is the use of an index transparent? as in if i do a search without index and one with, it will be the same command?
[15:53:00] <StephenLynx> explain will tell you what the db is doing to do what you asked for.
[15:53:06] <StephenLynx> however
[15:53:15] <StephenLynx> you can use hint to indicate an index to be used.
[15:53:40] <ily> ah alright thanks StephenLynx
[15:53:45] <StephenLynx> but before you start using hint, check with explain
[15:53:53] <StephenLynx> if the index isn't already being used.
[15:53:56] <ily> ah i see
[15:54:01] <ily> so in my test cases
[15:54:02] <ily> yeah
[15:54:13] <Derick> I've never needed $hint yet - you probably don't need it either
[15:54:14] <ily> i know what im going to do today :/
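In shell terms, the workflow StephenLynx and Derick describe, using the FirstName index from ily's question (collection name invented):

    // winningPlan shows IXSCAN when the { FirstName: 1 } index is used, COLLSCAN when it is not.
    db.users.find({ FirstName: "Ada" }).explain("executionStats")

    // hint() forces a particular index; as Derick notes, this is rarely needed.
    db.users.find({ FirstName: "Ada" }).hint({ FirstName: 1 })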
[15:54:41] <ily> https://docs.mongodb.com/ecosystem/drivers/node-js/
[15:54:51] <ily> only reason i used mongoose was because they used it here as an example
[15:54:55] <ily> so i thought it was legit, ya know
[15:55:01] <ily> maybe the developers should change it
[15:55:54] <StephenLynx> the thing with mongoose is
[15:56:00] <StephenLynx> the guy that develops it works for mongo
[15:56:14] <StephenLynx> so he got that little help from inside
[15:56:39] <Derick> really?
[15:56:41] <Derick> who's that
[15:57:01] <StephenLynx> i dunno, I heard from someone
[15:57:45] <Derick> Valeri no longer works for us
[15:57:49] <StephenLynx> aaaah
[15:57:52] <StephenLynx> but he used to?
[15:57:55] <Derick> yes
[15:58:03] <StephenLynx> told you.
[15:58:05] <Derick> I didn't know he was the one working on Mongoose though
[15:58:14] <cheeser> yep
[15:58:32] <StephenLynx> tbh, mongoose is not the only bad thing on that page, ily
[15:58:44] <StephenLynx> MEAN stack is a god-awful concept.
[15:58:54] <StephenLynx> "DURR LETS COUPLE FRONT-END CODE WITH THE DATABASE :::DDDD"
[15:59:09] <cheeser> well, they lost me at node. ;)
[15:59:12] <Derick> the page does say "https://docs.mongodb.com/ecosystem/drivers/node-js/#object-mappers"
[15:59:25] <ily> heh lol
[15:59:38] <ily> StephenLynx, what happens when front end code is same as the db code
[15:59:42] <ily> and the server code
[16:00:41] <StephenLynx> first of all
[16:00:44] <StephenLynx> the server code is C++
[16:00:46] <kurushiyama> ily The world as we know it implodes.
[16:00:50] <StephenLynx> second
[16:00:58] <StephenLynx> you don't know the db exists on the front-end
[16:01:05] <StephenLynx> you just know the back-end has an interface.
[16:01:26] <StephenLynx> it could be crippled monkeys carving runes on rocks as far as the FE is concerned.
[16:01:41] <StephenLynx> third, angular has NOTHING to do with express
[16:01:46] <StephenLynx> fourth, express is garbage.
[16:02:19] <StephenLynx> so no, that makes absolutely no sense.
[16:02:38] <StephenLynx> my main project currently is a node back-end that also uses a web front-end
[16:03:09] <StephenLynx> it currently has about 35k loc on the back-end alone and I can tell you, that is not a realistic scenario
[16:03:19] <ily> loc?
[16:03:23] <StephenLynx> lines of code
[16:03:25] <ily> oh ok
[16:03:41] <ily> why did you write it in node then?
[16:03:43] <ily> if you did not like it?
[16:03:46] <ily> cause of boss?
[16:03:52] <StephenLynx> because its good.
[16:03:54] <StephenLynx> I like node.
[16:03:57] <ily> oh ok
[16:04:09] <StephenLynx> I am just telling you that it doesn't share anything more than the language being used.
[16:04:19] <Frenchiie> Derick: hey sorry was away from computer. reading now
[16:04:27] <StephenLynx> which is a very superficial characteristic.
[16:04:36] <StephenLynx> the problem with mean is not mongo or node.
[16:04:45] <ily> what is the problem
[16:04:48] <StephenLynx> its express and coupling a front-end library with everything else.
[16:05:00] <ily> well angular is not needed tbh
[16:05:08] <ily> to run a node server
[16:05:09] <StephenLynx> its in the name.
[16:05:11] <ily> etc
[16:05:12] <ily> yeah
[16:05:15] <Frenchiie> Derick: so are you saying to do something like this? mongod --port 27018 --dbpath /data/db2 ?
[16:05:16] <ily> it could be jquery
[16:05:29] <StephenLynx> it could be anything
[16:05:36] <ily> Frenchiie, yeah that would work. also you can make config files
[16:06:06] <ily> and do mongod -f /path/to/configfile
[16:06:06] <StephenLynx> the only instance where the front-end matters to a back-end is when you have dynamic html
[16:06:22] <kurushiyama> ily So then why call it mean? Should I call it ReMGo just because I use React, mgo, Mongo and Go?
[16:06:22] <StephenLynx> otherwise you just serve json to be consumed by whatever.
[16:06:29] <ily> StephenLynx, i suppose the "MEAN" stack is a buzzword
[16:06:29] <StephenLynx> kek
[16:06:34] <ily> kurushiyama, because its a buzzword
[16:06:39] <ily> like LAMP stack used to be
[16:06:41] <StephenLynx> it is just stupidity.
[16:06:58] <StephenLynx> from people who shouldn't be allowed near a computer, let alone to develop software.
[16:07:03] <kurushiyama> ily Yeah, and they are equally potent to get you into serious problems.
[16:07:20] <ily> im not combatting your ideas
[16:07:28] <ily> just listening
[16:07:30] <ily> :D
[16:07:38] <Frenchiie> and how do i actually go about creating db2? i forgot :/
[16:07:43] <Frenchiie> the file
[16:07:47] <Frenchiie> is it a dir?
[16:07:56] <ily> if you point your dbpath to an empty folder
[16:08:05] <ily> mongod will create mongo db files in there for you
[16:08:09] <Frenchiie> oh i see
[16:08:28] <ily> yeah, there's no special create command. and if there's an existing one, mongod will just read those files. it won't delete anything
[16:35:03] <kurushiyama> edrocks Hey, you are back!
[16:38:05] <edrocks> kurushiyama: yea
[16:38:24] <edrocks> kurushiyama: do you need something? I have to go test a psu. It just came in
[16:38:52] <kurushiyama> edrocks Nah, wanted to discuss your time series thingy.
[16:39:27] <edrocks> kurushiyama: k I'll be back back soon have to go in other room
[16:39:39] <kurushiyama> Sure.
[16:39:45] <edrocks> kurushiyama: I'm working on my reporting stuff a little bit this week so we'll chat
[16:43:13] <darkfrog> I have a query I'm trying to optimize running against MongoDB 2.6.9 and I'm just trying to count results, but it has an $in array of like 500 ObjectIds. It takes roughly three minutes to count them all. Obviously this should be completely re-written, but I'm hoping for some low-hanging optimizations in the short-run if anyone has any suggestions?
[16:44:15] <kurushiyama> darkfrog Most do hope for that ;) can you pastebin the original query (preferably in shell syntax) and the indices of the collection in question?
[16:45:41] <darkfrog> kurushiyama: https://gist.github.com/darkfrog26/44f7e0d99d728a046f1110093ae49f3e
[16:46:14] <darkfrog> kurushiyama: "owner.laboratory" does have an index
[16:46:58] <kurushiyama> darkfrog Please add an `.explain()` to the query and pastebin the results.
[16:47:33] <darkfrog> kurushiyama: give me about 3 minutes to run it. :o
[16:47:50] <kurushiyama> darkfrog Debugging is debugging
[16:48:17] <darkfrog> most of the optimizations I've been able to find don't work on 2.6.x
[16:49:33] <kurushiyama> darkfrog From what I can see, your problem most likely stems from a data model not fitting your use cases.
[16:49:53] <kurushiyama> darkfrog I am tempted to put a bet on Mongoose driven data modelling.
[16:50:24] <darkfrog> kurushiyama: agreed...I just inherited this project and am about to start a complete re-write, but I have to get the performance to "limping" status before I can do so.
[16:51:01] <darkfrog> kurushiyama: I get: "TypeError: db.analysisresult.count(...).explain is not a function"
[16:51:12] <kurushiyama> Ah, count
[16:51:23] <darkfrog> should I change it to find?
[16:51:23] <kurushiyama> do a find instead, sorry.
[16:51:41] <darkfrog> three more minutes. ;)
[16:53:04] <kurushiyama> darkfrog What I can say is that the $and is unnecessary.
[16:53:48] <darkfrog> kurushiyama: no, it's not Mongoose driven, but I think it might be nearly as bad. It currently uses Casbah (Scala toolkit for MongoDB)
[16:54:53] <darkfrog> kurushiyama: if I remove all the ObjectIds from the $in it's perfectly fast
[16:55:09] <kurushiyama> darkfrog See
[16:55:18] <kurushiyama> darkfrog Data model problem.
[16:56:24] <darkfrog> kurushiyama: https://gist.github.com/darkfrog26/e491601332a414202e02df297786b423
[16:58:04] <kurushiyama> darkfrog Is that collection sharded?
[16:58:22] <darkfrog> I believe so
[16:58:47] <kurushiyama> run an sh.status() to check
[16:59:11] <KostyaSha> is it possible to somehow speed up an index build?
[16:59:24] <darkfrog> kurushiyama: hehe, my account isn't authorized to execute that.
[16:59:28] <kurushiyama> KostyaSha Yep. Do it in foreground.
[16:59:48] <kurushiyama> darkfrog Uh... Are you at some sort of hoster with that?
[16:59:51] <KostyaSha> fresh replica, grid.fs, 400gb of data
[17:00:25] <KostyaSha> it uses only 1 core and has been at it for ~24 hours already
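What "do it in foreground" looks like for the standard GridFS chunk index (collection name assumes the default fs prefix):

    // Foreground builds (the default here) are faster and more compact,
    // but block operations on the database while they run.
    db.fs.chunks.createIndex({ files_id: 1, n: 1 }, { unique: true })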
[17:00:29] <darkfrog> kurushiyama: yeah, they cloned the database on a cloud hosted MongoDB server so I could test.
[17:01:54] <darkfrog> I wasn't sure if it would be possible to do a String contains type of call instead of $in with oids?
[17:01:56] <kurushiyama> darkfrog Ah, ok. Well, you can remove the $and clause (not that this would help much, but it is simply unnecessary, since clauses are always inclusive). Aside from that... Remodel?
[17:02:07] <darkfrog> ...wasn't sure if that would be faster either, but I thought it might be worth a try.
[17:02:09] <kurushiyama> darkfrog Nah, that does not work.
[17:02:48] <darkfrog> there's no way to accomplish $in faster?
[17:03:08] <ily> thats what she said
[17:03:14] <darkfrog> ily: wow
[17:03:20] <ily> ahahahahha
[17:03:22] <ily> i had to
[17:05:56] <kurushiyama> darkfrog $in queries are notorious.
[17:07:09] <darkfrog> kurushiyama: okay, I have another code-related work-around, but was hoping someone here might have some suggestions before I make my code uglier. :-p
[17:08:16] <kurushiyama> darkfrog Have you tried aggregations?
[17:08:46] <KostyaSha> is there any other types of index generation rather then "bulk"?
[17:08:51] <darkfrog> kurushiyama: how do you mean?
[17:09:26] <darkfrog> I thought about trying a mapReduce scenario, but wasn't sure it would make any difference
[17:09:58] <kurushiyama> darkfrog https://docs.mongodb.com/manual/aggregation/
[17:10:31] <kurushiyama> darkfrog In general, the aggregation pipeline outperforms m/r. In this case maybe not, but I would try.
[17:10:34] <darkfrog> kurushiyama: how might I re-write that query to use aggregation where it wouldn't still be a massive $in?
[17:11:39] <kurushiyama> darkfrog well, yeah, but still might be faster. You have to try. My suggestion: Try aggregations first, and if that fails do an m/r (which tends not to be incredibly fast)
[17:14:02] <kurushiyama> darkfrog But best ofc would be to start the rewrite instead of putting a lot of effort into something which might be in vain.
[17:19:04] <cheeser> not really
[17:19:12] <kurushiyama> @cheeser ?
[17:19:13] <cheeser> oops. scrolled back :)
[17:20:09] <wwwd> I am trying to figure out how to return specific fields from an array in a sub-document based on the date. That is, I want to return fields from a document inside the array of registrations which is nested in pets. E.g. https://gist.github.com/johnhitz/cfec74eb7d33a20bce4715a045269bc1. Can anyone tell me how to do this? I have tried using $slice and including specific fields with projections. The closest I can get is to return the entire
[17:20:10] <wwwd> array with only the fields I want. As I said, ideally I would like only the fields for the specific registration.
[17:22:51] <darkfrog> kurushiyama: something like this? https://gist.github.com/darkfrog26/44f7e0d99d728a046f1110093ae49f3e
[17:24:44] <kurushiyama> darkfrog depends on your data model. would need to see a sample doc.
[17:29:31] <darkfrog> kurushiyama: well, aggregate slightly improved the performance, but not enough. Thanks anyway.
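For reference, the aggregation variant kurushiyama suggested would look roughly like this, with "oids" standing in for the ~500 ObjectIds from the gist:

    db.analysisresult.aggregate([
        { $match: { "owner.laboratory": { $in: oids } } },
        { $group: { _id: null, n: { $sum: 1 } } }
    ])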
[17:43:36] <poz2k4444> hi guys, does anybody here know how to use mongo-connector correctly?
[17:45:47] <wwwd> poz2k4444: I followed https://github.com/mongodb-labs/mongo-connector and found it to be very easy to get going with.
[17:46:21] <poz2k4444> wwwd: when you modify a document on a mongo collection, does it sync automatically with elasticsearch?
[17:48:46] <wwwd> poz2k4444: Magically!
[17:49:12] <wwwd> I was shocked at how easy it is to get setup!
[17:49:38] <poz2k4444> wwwd: that is the problem I'm facing, I can't get it to sync the updates, I've read and re-read the docs and everything seems fine, but the logs stay stuck trying to sync (apparently)
[17:52:28] <wwwd> poz2k4444: Do you have mongo running as a replica set?
[17:52:56] <poz2k4444> wwwd: yeah, it won't run unless you have a replica set
[17:53:01] <poz2k4444> (I've tried that
[17:53:22] <wwwd> poz2k4444: And have you saved anything to mongo since you started it?
[17:53:31] <poz2k4444> yeah of course
[17:53:49] <poz2k4444> wwwd: it actually syncs the data, creations and deletions, only the updates are missing
[17:54:10] <poz2k4444> and for my purposes, updates are more important
[17:54:47] <wwwd> poz2k4444: Sorry thats about all I got...it just worked when I set it up.
[17:55:26] <poz2k4444> wwwd: do you have any special configuration on elasticsearch or mongo that I might have missed?
[17:58:26] <wwwd> poz2k4444: Not that I can think of. As I said it just worked!
[17:58:46] <poz2k4444> wwwd: well, thanks anyway, you are the first one that answered me in days :(
[17:59:07] <wwwd> I know the feeling!
[17:59:21] <Derick> it's a question of timing
[18:00:01] <wwwd> In fact...anyone care to take a look at my previous question and https://gist.github.com/johnhitz/cfec74eb7d33a20bce4715a045269bc1 ;)
[18:00:21] <Derick> wwwd: it helps if you mention the subject here
[18:00:29] <wwwd> I have been struggling with this for hours!
[18:01:26] <wwwd> I am trying to figure out how to return specific fields from an array in a sub-document based on the date. That is, I want to return fields from a document inside the array of registrations which is nested in pets. E.g. https://gist.github.com/johnhitz/cfec74eb7d33a20bce4715a045269bc1. Can anyone tell me how to do this? I have tried using $slice and including specific fields with projections. The closest I can get is to return the entire array with only the fields I want. As I said, ideally I would like only the fields for the specific registration.
[18:03:35] <kurushiyama> wwwd Erm. $elemMatch might help. Or proper modelling. But I feel that is my personal mantra.
[18:04:43] <kurushiyama> I have no clue whatsoever what the parent document is supposed to represent.
[18:05:06] <wwwd> Derick: I'm kinda stuck with the model for now. And I tried $elemMatch in various ways!
[18:05:37] <kurushiyama> wwwd You are aware of the fact that you can do an $elemMatch in projections?
[18:06:05] <kurushiyama> wwwd Apparently you are.
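The shape of such a projection, with field names guessed from wwwd's description (the actual fix is in the gist kurushiyama posts below); like the positional operator, $elemMatch in a projection returns only the first matching element:

    db.pets.find(
        { "registrations.date": { $gte: ISODate("2016-06-01"), $lt: ISODate("2016-07-01") } },
        { registrations: { $elemMatch: { date: { $gte: ISODate("2016-06-01"), $lt: ISODate("2016-07-01") } } } }
    )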
[18:07:28] <wwwd> Kurushiyama: It is a pet with a list of registrations nested in it. The registrations reflect all the registration events in the pet's history. And aren't lines 77-83 the projection?
[18:08:05] <kurushiyama> wwwd So, you self reference the parent document in the subdocuments?
[18:08:16] <kurushiyama> wwwd Let me quickly check something
[18:09:30] <wwwd> Kurushiyama: And if so I did an $elemMatch on the date and it returned a list of _id's. And, yes...I did not set it up I just have to use it for the time being. Eventually I would like to flatten it so that registrations and vaccinations are collections with references by _id...but not today!
[18:11:33] <kurushiyama> Problem a) Your query part does not match the sample document
[18:11:39] <wwwd> I can get back the correct pets...i.e. just the pets that have registrations done in June. So maybe I just need to do post-processing to pull out what I need for now. I was just hoping to find a way to do it easily with mongo.
[18:13:44] <kurushiyama> wwwd Sorry, my bad, wrong collection
[18:14:04] <wwwd> Kurushiyama: Ok, pretty new to mongo and programming in general, but doesn't it return the pet based on 'registrations' that match the date >= june 1 and < june 30?
[18:14:17] <kurushiyama> Gimme a few
[18:14:46] <wwwd> kurushiyama...glad it was wrong! Thought my brain was gonna melt;)
[18:21:12] <kurushiyama> wwwd https://gist.github.com/mwmahlberg/3c8b73f500b83aca491652b625711e53#file-mongoexport-L81
[18:23:53] <wwwd> kurushiyama: Thank you very much! That is huge!!!
[18:26:27] <kurushiyama> Actually, it was just a small mistake; compare your line 80 to my line 81
[18:28:14] <wwwd> Yep! Saw it and it seems to be working perfectly!
[18:29:00] <kurushiyama> wwwd Well, I tried it before posting it ;)
[18:29:55] <kurushiyama> wwwd Extra points for style: Sample doc, code in shell syntax, result, expected result. If everybody followed that pattern, it would be much easier to help people.
[18:31:11] <wwwd> Lol! Thanks! I suspect I will be back!
[18:31:53] <kurushiyama> wwwd May I offer some well-meant advice? Not meant as patronizing, but something I noticed.
[18:32:10] <wwwd> Please!
[18:33:38] <kurushiyama> wwwd Your data model is horrible, to say the least. What language is your main one?
[18:34:48] <wwwd> Python. Problem is I have an ember front end that is set up to use it and can't change it now.
[18:35:15] <wwwd> And, frankly you have only seen a small number of the issues with the data!
[18:35:20] <kurushiyama> wwwd There are no excuses for technical debt.
[18:35:26] <kurushiyama> wwwd ;)
[18:35:31] <wwwd> True!
[18:36:38] <wwwd> I am hoping that very soon I will get it fixed. I inherited the data and can only do so much at one time!
[18:36:46] <kurushiyama> wwwd Please do not take this wrong, but I guess https://university.mongodb.com/courses/M101P/about might be a good idea.
[18:37:25] <kurushiyama> wwwd 2-3h per week. Every minute worth it.
[18:38:00] <wwwd> Was I correct when I said I want to flatten this out so that the registrations and vaccination are in their own collections and I use references by id?
[18:38:43] <wwwd> I will definitely look at it! I am really interested in learning. I do have the problem that my front end is using this model and I can't really change it now!
[18:39:38] <kurushiyama> wwwd Maybe. Depends on your use cases. First thing to learn. You do not model by ERM, you model to have the questions on your data for your most common use cases answered as efficiently as possible.
[18:40:13] <wwwd> ERM?
[18:40:30] <kurushiyama> wwwd Entity Relationship mode.
[18:40:34] <kurushiyama> s/mode/model/
[18:41:31] <kurushiyama> wwwd https://en.wikipedia.org/wiki/Entity–relationship_model
[18:42:09] <wwwd> Right! I will definitely check out this course! And thanks a lot!!!
[18:42:34] <kurushiyama> wwwd Most people coming from RDBMS consciously or unconsciously follow that or at least similar patterns.
[18:44:49] <wwwd> Do you do any consulting?
[18:46:32] <jayjo_> I have an s3 bucket with json files... I'm trying to import into my local mongodb. I'm using s3cmd and piping it to mongoimport, but I get this error: "Failed: error processing document #1: invalid character 'd' looking for beginning of value"
[18:47:23] <jayjo_> and then it reads "Imported 0 documents"
[18:55:13] <jayjo_> so my entire command currently is: s3cmd get --recursive s3://<bucket-name>/path/- | mongoimport -d <db> -c <collection> --type json
[19:07:20] <wwwd> kurushiyama: Can I use this in a mongoexport command? Something like https://gist.github.com/johnhitz/c9128e52871a482d13d9f2c7cf30548d
[19:07:52] <kurushiyama> wwwd I have _no_ clue.
[19:08:24] <wwwd> Ok! Thanks!
[19:08:27] <kurushiyama> wwwd I stay away from mongodump and mongoexport as far as I can. I do not say they are bad tools, but personally, I dislike them.
[19:16:57] <StephenLynx> what do you use then?
[19:17:12] <StephenLynx> you just copy and paste the database files?
[19:35:31] <saml> jayjo_, example output of s3cmd?
[19:59:32] <kurushiyama> StephenLynx Filesystem snapshots
[20:00:01] <kurushiyama> StephenLynx Or customized import/export tools.
[20:01:30] <StephenLynx> hm
[20:01:40] <StephenLynx> the first one kind of bites off more than you need, doesn't it?
[20:02:07] <StephenLynx> because its taking the whole fs? or it doesn't work that way?
[20:03:06] <kurushiyama> StephenLynx Actually not so much. I tend to mount the snapshots, pump them into a tar file which (if no compression was used) is pumped through snappy and then sent via nc or ssh to a backup server.
[20:03:41] <kurushiyama> StephenLynx But yes, in general you have a point in time copy of the complete data. Which is usually what I want.
[20:03:58] <kurushiyama> StephenLynx And if not, I usually want a migration.
[20:04:09] <StephenLynx> and is this method any faster?
[20:04:18] <kurushiyama> Downtime is almost 0
[20:04:32] <StephenLynx> you could use replica sets for that, couldn't you?
[20:04:36] <StephenLynx> then you would reach zero downtime
[20:04:57] <kurushiyama> StephenLynx Well, I do. But in a standard setup, you would lose a replication factor.
[20:05:18] <dino82> What would cause a node to stay in STARTUP2 and never become a secondary? optimeDate stays at 1970
[20:05:28] <jayjo_> saml: by itself it works and prints the output of the data, but not with mongoimport. Could it be that the files contain json objects separated by newlines? can I handle that in the code?
[20:05:37] <kurushiyama> dino82 I guess the log will tell you.
[20:06:00] <dino82> The log simply says it's connecting to the other replica set members
[20:06:34] <kurushiyama> StephenLynx And think of a sharded cluster. Doing a dump comes with its own intricacies, there.
[20:07:44] <jayjo_> Could I just pip it one more time?
[20:07:48] <jayjo_> *pipe it
[20:08:29] <saml> jayjo_, you mean you can do it in two steps? s3cmd > wow.jsons; mongoimport wow.jsons ?
[20:09:06] <jayjo_> I'm thinking it may be because the file 1.json has many json objects separated by newlines
[20:09:11] <dino82> It seems like it isn't consuming the oplog
[20:09:13] <jayjo_> can I take that out in a pipe so I don't have to write the json to disk
[20:09:18] <saml> isn't that the format mongoimport expects?
[20:12:01] <kurushiyama> jayjo_ You could simply pipe it through sed to remove all newlines between a closing and an opening bracket. Or all newlines, come to think of it, in case those are the problem.
[20:13:11] <kurushiyama> wait. Does a s3 cli command actually write the file to stdout?
[20:14:10] <jayjo_> think thats what the trailing - does
[20:14:36] <kurushiyama> jayjo_ Have you _checked_?
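On the mongoimport side, newline-separated documents are the default input format; only a single top-level JSON array needs a flag. A quick local check, keeping the placeholders from the command above:

    # one document per line is mongoimport's default:
    mongoimport -d <db> -c <collection> < 1.json
    # a file containing one big JSON array instead needs:
    mongoimport -d <db> -c <collection> --jsonArray < 1.json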
[20:26:22] <dino82> Isn't the _id index supposed to be unique by default?
[20:27:42] <kurushiyama> dino82 Aye
[20:28:14] <dino82> I thought so. It appears whoever initially set this up didn't make it so
[20:28:28] <kurushiyama> dino82 That is impossible.
[20:28:51] <kurushiyama> dino82 _id's are even more than unique. They are immutable.
[20:29:03] <kurushiyama> dino82 hardcoded.
[20:29:30] <kurushiyama> dino82 iirc, which, given my level of beer might be off.
[20:29:32] <dino82> I'm getting this warning: WARNING: the collection 'testdb.Db' lacks a unique index on _id. This index is needed for replication to function properly
[20:29:51] <kurushiyama> What the?
[20:29:58] <kurushiyama> Lemme check something
[20:30:03] <dino82> Might explain my replication problems
[20:30:17] <dino82> This database came from Parse, so maybe they did something funny to it
[20:32:14] <jayjo_> I don't understand what you're asking me to check. - writes it to stdout
[20:32:26] <kurushiyama> That would have to be really funny – actually, you can not even delete the _id index. And if you import documents, those without an _id are simply assigned one.
[20:33:16] <kurushiyama> jayjo_ Well, if it does so, then it is fine. "I _think_ thats what the trailing - does". Assumptions have bitten my lower back more than once.
[20:34:57] <dino82> Should db.id.getIndexes() return something other than [ ] ?
[20:39:22] <kurushiyama> db.id?
[20:40:17] <dino82> err db.testdb.getIndexes()
[20:42:16] <kurushiyama> Yes, definitely.
[20:43:48] <kurushiyama> dino82 I can only imagine that data files were copied, and some were missing. An _id index can not be deleted, and every collection gets one when it is created. o.O
[20:47:09] <kurushiyama> dino82 What you could do is to try to recreate it. Lemme check quickly.
[20:47:57] <dino82> Okay, thanks
[20:49:29] <kurushiyama> dino82 Wait.
[20:49:49] <kurushiyama> dino82 can you pastebin the output of "show collections"?
[20:50:00] <dino82> Sure one moment
[20:50:17] <kurushiyama> and "show databases", while you are at it.
[20:51:26] <dino82> http://pastebin.com/B3DuvkHL
[20:52:03] <dino82> show collections returns nothing, apparently
[20:54:05] <kurushiyama> please enter just "db"
[20:54:25] <kurushiyama> dino82 ^
[20:54:44] <dino82> testdb
[20:55:57] <kurushiyama> dino82 and "show collections" returns _nothing_?
[20:57:01] <kurushiyama> dino82 That is _weird_, to say the least.
[20:57:25] <dino82> Ah that was my mistake, I wasn't on the master. Yes, I see collections now
[20:58:28] <kurushiyama> dino82 ok, can you pastebin them?
[20:59:00] <dino82> http://pastebin.com/TCq2JYam
[20:59:17] <kurushiyama> omg
[21:02:13] <dino82> I have no idea how this got built as it is, I just know it's giving me headaches now
[21:02:27] <saml> nice schema
[21:03:16] <saml> mongodb is join on write
[21:03:42] <saml> which is proven to be wrong (by twitter)
[21:03:59] <saml> i don't know what i'm talking about
[21:04:46] <saml> i'm working with a system that writes a bunch of documents for "write", and reads a bunch of documents and combines them for "read"
[21:05:10] <saml> ~300 _id lookups to render an html page.
[21:05:41] <saml> denormalized mongodb at its finest
[21:08:05] <kurushiyama> dino82 Please run http://pastebin.com/dQKd6dvb
[21:08:17] <kurushiyama> dino82 Then, try to find your collection with the missing index.
[21:09:16] <kurushiyama> saml Hm, can I see a sample doc? and could you describe your use case a bit?
[21:10:12] <kurushiyama> dino82 I know it could be done programmatically, but I am no JS wiz and it is late.
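For the record, a small shell loop along those lines, listing collections in the current database that lack an _id index (a sketch only; as advised below, back up the data files before repairing anything):

    db.getCollectionNames().forEach(function(name) {
        var hasIdIndex = db.getCollection(name).getIndexes().some(function(ix) {
            return ix.name === "_id_";
        });
        if (!hasIdIndex) { print(name); }
    });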
[21:10:36] <dino82> kurushiyama: thanks for the help, no worries
[21:11:25] <kurushiyama> dino82 But a missing _id index is something I would take !*_VERY_*! seriously.
[21:11:47] <kurushiyama> dino82 I would probably stop my instance first and create a backup of the data files.
[21:11:57] <dino82> I did run a backup earlier using mongodump
[21:12:16] <kurushiyama> dino82 o.O It did not complain?
[21:12:55] <dino82> It did not
[21:13:07] <dino82> I think I'm getting in over my head :|
[21:13:38] <dino82> None of this data is ultra-mission-critical, thankfully
[21:13:57] <dino82> I'm heading home but will return
[21:21:48] <poz2k4444> hey guys, how can I replicate the oplog of one of my collections? apparently mongo-connector can only pick up updates through the oplog; I've tried with a new collection and everything works fine, but with a dumped collection the updates don't work, so after a while I realized that this has something to do with the oplog
[21:21:52] <kurushiyama> dino82 I will not. At least not today – it is 11pm
[23:38:06] <jayjo_> I'm still trying to use s3cmd and mongoimport to transfer data from s3 to mongodb. When I use s3cmd, if I put it to stdout then other bits of the s3cmd output are present and throw errors. If I use --quiet it suppresses output entirely. any ideas?
[23:41:03] <jayjo_> if I just run the s3cmd I get this output: http://pastie.org/10868544
[23:54:14] <sector_0> how integral are those online courses on the mongodb site?
[23:54:32] <sector_0> I'd sign up for them but they don't start until august
[23:54:57] <sector_0> I'm at the stage in my project where I'm ready to start building my database
[23:55:11] <sector_0> ...I kinda don't want to wait that long
[23:56:07] <sector_0> can I get that same info from reading the manual?