[02:06:29] <wwwd> I'm trying to use mongoexport to generate a csv file. I can get a csv file with all of my records but want to use a --query param to limit the records returned. I am using http://pastebin.com/knCY4gKq as my query. It is giving this error "error validating settings: query ... is not valid JSON: invalid character 'd' looking for beginning of value" where ... is the original query. Can anyone tell me why this is not valid?
[02:14:46] <joannac> wwwd: have you read the docs?
[02:15:15] <joannac> the --query option takes just the query, which is what's inside the find()
[02:29:24] <wwwd> joannac: Can't even tell you how many times I read them...just kept doing what I thought instead of what I was told! Story of my life!
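For the record, --query wants only the JSON document that would go inside find(), not a full shell expression. A minimal sketch, with database, collection, and field names assumed:

    mongoexport -d mydb -c pets --type=csv --fields name,status --query '{"status": "active"}' -o pets.csv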
[03:18:33] <Jonno_FTW> cheeser: "Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing the file manually within a single document instead of using GridFS. You may use the BinData data type to store the binary data."
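A minimal shell sketch of what the docs describe, with the collection name and payload made up (subtype 0 is generic binary, and the string is the base64-encoded file contents):

    db.files.insert({
        filename: "report.pdf",
        contentType: "application/pdf",
        data: BinData(0, "JVBERi0xLjQK")  // base64 of the raw bytes
    })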
[03:48:57] <macwinner> what are some common MongoDB misconceptions or FUD that is spread.. I'm doing a small presentation and I wanted to talk about them and why they are invalid points.
[03:53:30] <Boomtime> if you find some FUD on mongodb this channel might be an appropriate place to dispel it, but it's not really appropriate to relay it
[03:55:25] <wwwd> Though I would think discussion of pros & cons would be in order. It is impressive but not omnipotent!
[03:57:25] <wwwd> macwinner: I would say it takes a bit more effort to keep your data clean. But in exchange you get flexibility and maybe speed, depending on what you're doing.
[04:00:28] <macwinner> Boomtime: oki! I will find some FUD and be back :)
[04:40:06] <syrius> hey guys. i'm trying a fairly automated approach to searching a mongo collection. i'm using mongoengine (python) to define classes and using reflection to generate possible filters to populate jQuery QueryBuilder.. this just generates the appropriate mongo query for me (json) which I can pass as a raw query and it *does* work right now.. I'm curious if there's a way to only return items for an embedded
[04:40:11] <syrius> list that met the search criteria?
[04:42:06] <syrius> http://pastebin.com/LFhgSb1j here's a quick example of my schema (as python objects) but conceptually it shouldn't matter that it's python
[04:50:29] <syrius> ahh okay yeah i've been looking at $elemMatch and $
[04:50:49] <syrius> thanks Boomtime - I might have to check my design. this is still poc so i haven't uploaded all the data yet (and the schema is fake.. just trying to evaluate how i want to do it)
[04:51:02] <syrius> i might change those embedded lists to reference fields
[04:51:22] <syrius> i just need to be able to search and get back only what matched from those nested lists
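One possibility, sketched in shell syntax with collection and field names assumed from the pastebin: a positional projection returns only the first array element that matched the query; getting back every matching element needs the aggregation pipeline instead.

    db.things.find(
        { items: { $elemMatch: { status: "active" } } },
        { name: 1, "items.$": 1 }  // only the first matching items entry is returned
    )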
[05:16:23] <Lope> mongoDB is acting funny. I'm running it manually. I've set the shell of the mongodb user to /bin/bash, and with su I can browse to the dbpath, make files etc (I have permissions). But when I enter the console, "show dbs;" shows everything as 0.000GB. If I insert new test data (use foo; db.bar.insert({a:1});), nothing comes back when I run db.bar.find() afterwards.
[05:17:08] <Lope> I'm running it with this command. su -c '/usr/bin/mongod --replSet rs0 --oplogSize 50 --wiredTigerCacheSizeGB 1 --smallfiles --port 27017 --dbpath /mnt/mongodb --logpath /var/log/mongodb/test.log --logRotate rename --fork' mongodb
[05:45:44] <Jonno_FTW> macwinner: you could talk about how people use it for the wrong reasons (they need a relational db or ACID), then complain it doesn't meet their needs. People just don't know when mongo is the right tool for the job
[05:49:42] <Lope> ok i figured it out. I had a wrong path somewhere.
[05:50:09] <Lope> does docker always use the same offset for container UID's?
[05:50:38] <macwinner> Jonno_FTW: yeah, that's going to be one of my points
[05:51:23] <Lope> for example if I run a CT from an image named imgfoo. let's say it's UID 33 is 10033 on the host. Now if I destroy that CT, and create another one, will the UID mapping be the same, or different?
[07:39:15] <chris|> is there a way to call sh.* functions from the java client?
[07:47:35] <Boomtime> @chris|: by sh.* i think you mean the shell helpers - these are javascript functions defined in the javascript of the shell
[07:47:59] <Boomtime> if you want to emulate these in any driver, take a look at the implementation in the shell
[07:50:19] <chris|> Boomtime: yes, I was already looking at those and was hoping to save me some time
[08:00:18] <Boomtime> many of the helpers don't have a generic means of implementation in a driver; consider sh.status() which outputs a formatted table-ish sort of arrangement to the console - what would you expect the java driver to do?
[08:01:24] <Boomtime> even if there were an analogue to this that took say a textstream or some such, would it really be so useful that any single driver would want to maintain it?
[08:02:13] <Boomtime> the range of possibilities is wide, but the shell has a narrow scope so it can implement these extra little helpers
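One trick for studying them: in the mongo shell, typing a helper's name without parentheses prints its JavaScript source, which is a reasonable starting point for porting the logic to a driver.

    sh.setBalancerState   // no parentheses: prints the helper's implementation
    sh.status             // same trick works for any of the sh.* helpers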
[08:05:45] <Lope> does anyone know how the /etc/subuid file works?
[08:26:28] <kurushiyama> Jonno_FTW: What language?
[09:11:47] <chris|> what is the default mode of the balancer? is it enabled by default or do I have to enable it explicitly?
[09:15:13] <kurushiyama> chris|: Enabled by default
[09:25:44] <chris|> so I assume isBalancerRunning is only true if the balancer is currently doing work?
[09:28:46] <kurushiyama> Yes. but there should be an enabled flag in sh.status()
[09:29:06] <kurushiyama> iirc, there should be a command for it, even.
[09:30:28] <chris|> yes, there is sh.getBalancerState()
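The two shell helpers side by side:

    sh.getBalancerState()   // true if the balancer is enabled at all
    sh.isBalancerRunning()  // true only while a balancing round is actually in progress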
[13:03:30] <rpap> hello, I am looking for recommendations. I am planning to store user web activity in mongodb. There can be hundreds of requests per second and the data payload will vary from page to page. Is mongodb the best choice?
[13:21:12] <darkfrog> I am trying to diagnose a bug in my project using ReactiveMongo, and when I dump out the query it references "$oid" but when I attempt to use the query directly it says, "unknown operator: $oid". I presumed ReactiveMongo was converting "$oid" to "_id" and tried that and the query then worked, but it then doesn't return any results.
[13:38:55] <jordonbiondo> I'm looking for something like $ but updates all matching array elements, does that exist? {foo: [{a: 3}, {a: 1}, {a: 3}]} => {foo: [{a: 9}, {a: 1}, {a: 9}]}
[13:40:05] <jordonbiondo> I can use $ to update the first foo.a to 9 where foo.a == 3, but can't find a way to update all foo.a to 9 where foo.a == 3
[13:42:13] <kurushiyama> jordonbiondo Nope. And if you need to, it smells like overembedding.
[13:59:29] <jordonbiondo> kurushiyama: normally yes, but this was for a migration, I guess I'll just script it with mongoose, thanks.
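For a one-off migration, a shell loop does the job; a rough sketch using the collection name coll as a stand-in (servers 3.6 and newer can do this in a single update with the all-positional $[] operator and arrayFilters):

    db.coll.find({ "foo.a": 3 }).forEach(function(doc) {
        doc.foo.forEach(function(el) { if (el.a === 3) el.a = 9; });  // rewrite in memory
        db.coll.update({ _id: doc._id }, { $set: { foo: doc.foo } }); // write back the whole array
    });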
[14:06:21] <saml> kurushiyama, what was the timestamp db you're using?
[15:36:47] <Frenchiie> i have mongod running on the default port and then i try to start a new mongod on port 27018 but i get an error
[15:36:57] <Frenchiie> Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'. 2016-06-07T11:25:33.690-0400 W - [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty. 2016-06-07T11:25:33.690-0400 I STORAGE [initandlisten] exception in initAndListen: 98 Unable to lock file: /data/db/mongod.lock errno:35 Resource temporarily unavailable. Is a mongod instance already running?
[15:37:12] <Frenchiie> i tried mongod --repair but it tries to repair on the default port so that doesn't help
[15:37:28] <Frenchiie> i also tried mongod --port 27018 --repair but i get the error above
[15:37:40] <Frenchiie> anyone knows how to fix this?
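The lock error usually means the second mongod was pointed at the same /data/db as the first; each instance needs its own dbpath. A guess at the fix, with the new directory being an assumption:

    mkdir -p /data/db2
    mongod --port 27018 --dbpath /data/db2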
[15:49:20] <ily> hello there. i have a question. for the people that use the Mongoose API with nodejs. if i have an index on say first name like FirstName: 1
[15:49:46] <ily> and i use mongoose to do a search by first name. do i have to do a normal kind of search? or do i have to pass arguments to make it use the index?
[15:52:09] <ily> StephenLynx, ok alright. it will be a bit of work but i think im tired of mongoose enough to switch over. so for mongodb how does that one handle indexes?
[15:52:44] <StephenLynx> usually it's smart enough to use indexes where it can.
[15:52:45] <ily> StephenLynx, as in, is the use of an index transparent? as in if i do a search without index and one with, it will be the same command?
[15:53:00] <StephenLynx> explain will tell you what the db is doing to do what you asked for.
[16:39:45] <edrocks> kurushiyama: I'm working on my reporting stuff a little bit this week so we'll chat
[16:43:13] <darkfrog> I have a query I'm trying to optimize running against MongoDB 2.6.9 and I'm just trying to count results, but it has an $in array of like 500 ObjectIds. It takes roughly three minutes to count them all. Obviously this should be completely re-written, but I'm hoping for some low-hanging optimizations in the short-run if anyone has any suggestions?
[16:44:15] <kurushiyama> darkfrog Most do hope for that ;) can you pastebin the original query (preferably in shell syntax) and the indices of the collection in question?
[16:46:14] <darkfrog> kurushiyama: "owner.laboratory" does have an index
[16:46:58] <kurushiyama> darkfrog Please add an `.explain()` to the query and pastebin the results.
[16:47:33] <darkfrog> kurushiyama: give me about 3 minutes to run it. :o
[16:47:50] <kurushiyama> darkfrog Debugging is debugging
[16:48:17] <darkfrog> most of the optimizations I've been able to find don't work on 2.6.x
[16:49:33] <kurushiyama> darkfrog From what I can see, your problem most likely stems from a data model not fitting your use cases.
[16:49:53] <kurushiyama> darkfrog I am tempted to put a bet on Mongoose driven data modelling.
[16:50:24] <darkfrog> kurushiyama: agreed...I just inherited this project and am about to start a complete re-write, but I have to get the performance to "limping" status before I can do so.
[16:51:01] <darkfrog> kurushiyama: I get: "TypeError: db.analysisresult.count(...).explain is not a function"
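In the 2.6 shell, explain() hangs off a cursor rather than a count, so the same plan can be had by recasting the count as a find; the query shape here is assumed from the conversation:

    db.analysisresult.find({ "owner.laboratory": { $in: oids } }).explain()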
[16:53:04] <kurushiyama> darkfrog What I can say is that the $and is unnecessary.
[16:53:48] <darkfrog> kurushiyama: no, it's not Mongoose driven, but I think it might be nearly as bad. It currently uses Casbah (Scala toolkit for MongoDB)
[16:54:53] <darkfrog> kurushiyama: if I remove all the ObjectIds from the $in it's perfectly fast
[16:58:47] <kurushiyama> run an sh.status() to check
[16:59:11] <KostyaSha> is it possible to speedup somehow index build?
[16:59:24] <darkfrog> kurushiyama: hehe, my account isn't authorized to execute that.
[16:59:28] <kurushiyama> KostyaSha Yep. Do it in foreground.
[16:59:48] <kurushiyama> darkfrog Uh... Are you on some sort of hosting provider with that?
[16:59:51] <KostyaSha> fresh replica, grid.fs, 400gb of data
[17:00:25] <KostyaSha> it uses only 1 core and has been at it for ~24 hours already
[17:00:29] <darkfrog> kurushiyama: yeah, they cloned the database on a cloud hosted MongoDB server so I could test.
[17:01:54] <darkfrog> I wasn't sure if it would be possible to do a String contains type of call instead of $in with oids?
[17:01:56] <kurushiyama> darkfrog Ah, ok. Well, you can remove the $and clause (not that this would help much, but it is simply unnecessary, since top-level clauses are ANDed implicitly). Aside from that... Remodel?
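What the removal looks like; both forms match exactly the same documents:

    db.c.find({ $and: [ { a: 1 }, { b: 2 } ] })  // redundant wrapper
    db.c.find({ a: 1, b: 2 })                    // same result: top-level clauses are ANDed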
[17:02:07] <darkfrog> ...wasn't sure if that would be faster either, but I thought it might be worth a try.
[17:02:09] <kurushiyama> darkfrog Nah, that does not work.
[17:02:48] <darkfrog> there's no way to accomplish $in faster?
[17:05:56] <kurushiyama> darkfrog $in queries are notoriously slow with large argument lists.
[17:07:09] <darkfrog> kurushiyama: okay, I have another code-related work-around, but was hoping someone here might have some suggestions before I make my code uglier. :-p
[17:08:16] <kurushiyama> darkfrog Have you tried aggregations?
[17:08:46] <KostyaSha> are there any other types of index generation rather than "bulk"?
[17:08:51] <darkfrog> kurushiyama: how do you mean?
[17:09:26] <darkfrog> I thought about trying a mapReduce scenario, but wasn't sure it would make any difference
[17:10:31] <kurushiyama> darkfrog In general, the aggregation pipeline outperforms m/r. In this case maybe not, but I would try.
[17:10:34] <darkfrog> kurushiyama: how might I re-write that query to use aggregation where it wouldn't still be a massive $in?
[17:11:39] <kurushiyama> darkfrog well, yeah, but still might be faster. You have to try. My suggestion: Try aggregations first, and if that fails do an m/r (which tends not to be incredibly fast)
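A sketch of the counting aggregation, with the field name and the oids array carried over from earlier in the conversation:

    db.analysisresult.aggregate([
        { $match: { "owner.laboratory": { $in: oids } } },
        { $group: { _id: null, n: { $sum: 1 } } }  // n is the number of matching docs
    ])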
[17:14:02] <kurushiyama> darkfrog But best ofc would be to start the rewrite instead of putting a lot of effort into something which might be in vain.
[17:20:09] <wwwd> I am trying to figure out how to return specific fields from an array in a sub-document based on the date. That is, I want to return fields from a document inside the array of registrations which is nested in pets. E.g. https://gist.github.com/johnhitz/cfec74eb7d33a20bce4715a045269bc1. Can anyone tell me how to do this? I have tried using $slice and including specific fields with projections. The closest I can get is to return the entire
[17:20:10] <wwwd> array with only the fields I want. As I said, ideally I would like only the fields for the specific registration.
[17:22:51] <darkfrog> kurushiyama: something like this? https://gist.github.com/darkfrog26/44f7e0d99d728a046f1110093ae49f3e
[17:24:44] <kurushiyama> darkfrog depends on your data model. would need to see a sample doc.
[17:29:31] <darkfrog> kurushiyama: well, aggregate slightly improved the performance, but not enough. Thanks anyway.
[17:43:36] <poz2k4444> hi guys, does anybody here know how to use mongo-connector correctly?
[17:45:47] <wwwd> poz2k4444: I followed https://github.com/mongodb-labs/mongo-connector and found it to be very easy to get going with.
[17:46:21] <poz2k4444> wwwd: when you modify a document on a mongo collection, does it sync automatically with elasticsearch?
[17:49:12] <wwwd> I was shocked at how easy it is to get setup!
[17:49:38] <poz2k4444> wwwd: that is the problem I'm facing, I can't get it to sync the updates, I've read and re-read the docs and everything seems fine, but the log just stays at trying to sync (apparently)
[17:52:28] <wwwd> poz2k4444: Do you have mongo running as a replica set?
[17:52:56] <poz2k4444> wwwd: yeah, it won't run unless you have replicaset
[18:00:01] <wwwd> In fact...anyone care to take a look at my previous question and https://gist.github.com/johnhitz/cfec74eb7d33a20bce4715a045269bc1 ;)
[18:00:21] <Derick> wwwd: it helps if you mention the subject here
[18:00:29] <wwwd> I have been struggling with this for hours!
[18:01:26] <wwwd> I am trying to figure out how to return specific fields from an array
[18:01:26] <wwwd> in a sub-document based on the date. That is I want to return fields
[18:01:26] <wwwd> from a document inside the array of registrations which is nested in
[18:01:27] <wwwd> pets. E.g. https://gist.github.com/johnhitz/cfec74eb7d33a20bce4715a045269bc1. Can
[18:01:29] <wwwd> anyone tell me how to do this? I have tried using $slice and including
[18:01:34] <wwwd> specific fields with projections. The closest I can get is to return the entire array with only the fields I want.
[18:07:28] <wwwd> Kurushiyama: It is a pet with a list of registrations nested in it. The registrations reflect all the registration events in the pet's history. And isn't line 77 - 83 the projection?
[18:08:05] <kurushiyama> wwwd So, you self reference the parent document in the subdocuments?
[18:08:16] <kurushiyama> wwwd Let me quickly check something
[18:09:30] <wwwd> Kurushiyama: And if so I did an $elemMatch on the date and it returned a list of _id's. And, yes...I did not set it up I just have to use it for the time being. Eventually I would like to flatten it so that registrations and vaccinations are collections with references by _id...but not today!
[18:11:33] <kurushiyama> Problem a) Your query part does not match the sample document
[18:11:39] <wwwd> I can get back the correct pets...i.e. just the pets that have registrations done in June. So maybe I just need to do post processing to pull out what I need for now. I was just hoping to find a way to do it easily with mongo.
[18:13:44] <kurushiyama> wwwd Sorry, my bad, wrong collection
[18:14:04] <wwwd> Kurushiyama: Ok, pretty new to mongo and programming in general, but doesn't it return the pet based on 'registrations' that match the date >= June 1 and < June 30?
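A hedged sketch of that query with a positional projection, field names guessed from the description; note the $ projection hands back only the first matching registration per pet:

    db.pets.find(
        { registrations: { $elemMatch: { date: { $gte: ISODate("2016-06-01"), $lt: ISODate("2016-07-01") } } } },
        { name: 1, "registrations.$": 1 }
    )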
[18:23:53] <wwwd> kurushiyama: Thank you very much! That is huge!!!
[18:26:27] <kurushiyama> Actually, it was just a small mistake compare your line 80 to my line 81
[18:28:14] <wwwd> Yep! Saw it and it seems to be working perfectly!
[18:29:00] <kurushiyama> wwwd Well, I tried it before posting it ;)
[18:29:55] <kurushiyama> wwwd Extra points for style: Sample doc, code in shell syntax, result, expected result. If everybody followed that pattern, it would be much easier to help people.
[18:31:11] <wwwd> Lol! Thanks! I suspect I will be back!
[18:31:53] <kurushiyama> wwwd May I give you some well-meant advice? Not meant as patronizing, but something I noticed.
[18:36:38] <wwwd> I am hoping that very soon I will get it fixed. I inherited the data and can only do so much at one time!
[18:36:46] <kurushiyama> wwwd Please do not take this wrong, but I guess https://university.mongodb.com/courses/M101P/about might be a good idea.
[18:37:25] <kurushiyama> wwwd 2-3h per week. Every minute worth it.
[18:38:00] <wwwd> Was I correct when I said I want to flatten this out so that the registrations and vaccination are in their own collections and I use references by id?
[18:38:43] <wwwd> I will definitely look at it! I am really interested in learning. I do have the problem that my front end is using this model and I can't really change it now!
[18:39:38] <kurushiyama> wwwd Maybe. Depends on your use cases. First thing to learn. You do not model by ERM, you model to have the questions on your data for your most common use cases answered as efficiently as possible.
[18:46:32] <jayjo_> I have a s3 bucket with json files... I'm trying to import into my local mongodb. I'm using s3cmd and piping it to mongoimport, but I get this error: "Failed: error processing document #1: invalid character 'd' looking for beginning of value"
[18:47:23] <jayjo_> and then it reads "Imported 0 documents"
[18:55:13] <jayjo_> so my entire command currently is: s3cmd get --recursive s3://<bucket-name>/path/- | mongoimport -d <db> -c <collection> --type json
[19:07:20] <wwwd> kurushiyama: Can I use this in a mongoexport command? Something like https://gist.github.com/johnhitz/c9128e52871a482d13d9f2c7cf30548d
[19:08:27] <kurushiyama> wwwd I stay away from mongodump and mongoexport as far as I can. I do not say they are bad tools, but personally, I dislike them.
[20:01:40] <StephenLynx> the first one kind of bites more than you need, doesn't it?
[20:02:07] <StephenLynx> because it's taking the whole fs? or does it not work that way?
[20:03:06] <kurushiyama> StephenLynx Actually not so much. I tend to mount the snapshots, pump them into a tar file which (if no compression was used) is pumped through snappy and then sent via nc or ssh to a backup server.
[20:03:41] <kurushiyama> StephenLynx But yes, in general you have a point in time copy of the complete data. Which is usually what I want.
[20:03:58] <kurushiyama> StephenLynx And if not, I usually want a migration.
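Roughly what that pipeline looks like; volume names, mount points, and the snzip snappy CLI are all assumptions:

    lvcreate --snapshot --name mongo-snap --size 10G /dev/vg0/mongo  # point-in-time snapshot
    mount -o ro /dev/vg0/mongo-snap /mnt/snap
    tar -C /mnt/snap -cf - . | snzip | ssh backup-host 'cat > /backups/mongo-$(date +%F).tar.sz'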
[20:04:09] <StephenLynx> and is this method any faster?
[20:04:32] <StephenLynx> you could use replica sets for that, couldn't you?
[20:04:36] <StephenLynx> then you would reach zero downtime
[20:04:57] <kurushiyama> StephenLynx Well, I do. But in a standard setup, you would lose a replication factor.
[20:05:18] <dino82> What would cause a node to stay in STARTUP2 and never become a secondary? optimeDate stays at 1970
[20:05:28] <jayjo_> saml: by itself it works and prints the data, but not when piped to mongoimport. Could it be that the files are json objects separated by newlines? can i handle that in the command?
[20:05:37] <kurushiyama> dino82 I guess the log will tell you.
[20:06:00] <dino82> The log simply says it's connecting to the other replica set members
[20:06:34] <kurushiyama> StephenLynx And think of a sharded cluster. Doing a dump comes with its own intricacies, there.
[20:07:44] <jayjo_> Could I just pipe it one more time?
[20:08:29] <saml> jayjo_, you mean you can do it in two steps? s3cmd > wow.jsons; mongoimport wow.jsons ?
[20:09:06] <jayjo_> I'm thinking it may be because the file 1.json has many json objects separated by newlines
[20:09:11] <dino82> It seems like it isn't consuming the oplog
[20:09:13] <jayjo_> can I take that out in a pipe so I don't have to write the json to disk
[20:09:18] <saml> isn't that what mongoimport expects?
[20:12:01] <kurushiyama> jayjo_ You could simply pipe it through sed to remove all newlines between a closing and an opening bracket. Or all newlines, come to think of it, in case those are the problem.
[20:13:11] <kurushiyama> wait. Does a s3 cli command actually write the file to stdout?
[20:14:10] <jayjo_> I think that's what the trailing - does
[20:14:36] <kurushiyama> jayjo_ Have you _checked_?
[20:26:22] <dino82> Isn't the _id index supposed to be unique by default?
[20:29:30] <kurushiyama> dino82 iirc, which, given my level of beer might be off.
[20:29:32] <dino82> I'm getting this warning: WARNING: the collection 'testdb.Db' lacks a unique index on _id. This index is needed for replication to function properly
[20:30:03] <dino82> Might explain my replication problems
[20:30:17] <dino82> This database came from Parse, so maybe they did something funny to it
[20:32:14] <jayjo_> I don't understand what you're asking me to check. - writes it to stdout
[20:32:26] <kurushiyama> That would have to be really funny – actually, you cannot even delete the _id index. And if you import documents, those without an _id are simply assigned one.
[20:33:16] <kurushiyama> jayjo_ Well, if it does so, then it is fine. "I _think_ that's what the trailing - does." Assumptions have bitten my lower back more than once.
[20:34:57] <dino82> Should db.id.getIndexes() return something other than [ ] ?
[20:43:48] <kurushiyama> dino82 I can only imagine that data files were copied, and some were missing. An _id index cannot be deleted, and every collection gets one when it is created. o.O
[20:47:09] <kurushiyama> dino82 What you could do is to try to recreate it. Lemme check quickly.
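The attempt would look something like this; whether the server accepts it is version-dependent, so try it on a copy of the data first:

    db.Db.createIndex({ _id: 1 }, { unique: true })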
[21:13:07] <dino82> I think I'm getting in over my head :|
[21:13:38] <dino82> None of this data is ultra-mission-critical, thankfully
[21:13:57] <dino82> I'm heading home but will return
[21:21:48] <poz2k4444> hey guys, how can I replicate the oplog of one of my collections? apparently mongo-connector can update data because of this; I've tried with a new collection and everything works fine, but with a dumped collection the updates don't work, so after a while I realized this has something to do with the oplog
[21:21:52] <kurushiyama> dino82 I will not. At least not today – it is 11pm
[23:38:06] <jayjo_> I'm still trying to use s3cmd and mongoimport to transfer data from s3 to mongodb. When I have s3cmd write to stdout, other bits of s3cmd output are mixed in and throw errors. If I use --quiet it suppresses output entirely. any ideas?
[23:41:03] <jayjo_> if I just run the s3cmd I get this output: http://pastie.org/10868544
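If the status lines can't be silenced cleanly, a two-step fallback keeps them out of mongoimport entirely (paths are placeholders; mongoimport accepts newline-delimited JSON by default):

    s3cmd get --recursive s3://<bucket-name>/path/ ./s3dump/
    cat ./s3dump/*.json | mongoimport -d <db> -c <collection>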
[23:54:14] <sector_0> how integral are those online courses on the mongodb site?
[23:54:32] <sector_0> I'd sign up for them but they don't start until august
[23:54:57] <sector_0> I'm at the stage in my project where I'm ready to start building my database
[23:55:11] <sector_0> ...I kinda don't want to wait that long
[23:56:07] <sector_0> can I get that same info from reading the manual?