PMXBOT Log file Viewer


#mongodb logs for Friday the 16th of October, 2015

[00:41:31] <Boomtime> @fxmulder: I dropped out of the channel apparently, is your question about IO still relevant? FYI, mms.mongodb.com
[00:54:34] <sabrehagen> i'm trying to work out the best way to write a query. i have server logs stored in mongodb, and each log entry has a user property. i want to build a query that will show me users that were active each week for n weeks.
[00:54:39] <sabrehagen> e.g. for n=3, i want to know only users that were active this week, last week, and the week before.
[00:54:44] <sabrehagen> so far my idea is to have something like { $and : [ { $and : [ { user : 'username' }, { _id : { $gt : from1 } }, { _id : { $lt : to1 } } ] }, { same $and object as before but with from2, and to2 } ] }
[00:54:48] <sabrehagen> is that a terrible way to do the query? how mongodb-processing-intensive is that way? should i be using the aggregation framework?
[00:58:18] <fxmulder> Boomtime: it is, it's rather strange not sure why this machine is different from the other two, I've killed everything but mongodb and its still high IO and it isn't the primary
[00:59:43] <fxmulder> the write rate on it is quite low and the read rate is quite high
[01:41:42] <xack-work> Howdy all.
[01:42:16] <Boomtime> hi there
[01:42:49] <xack-work> I'm noob to mongodb. Is this a channel I'm free to ask random noob like questions? (While I do heavily research, some documentation isn't too great for me)
[01:44:17] <Boomtime> sure thing
[01:45:15] <Boomtime> you will get better answers whenever you can demonstrate you've already tried to learn something yourself of course
[01:46:03] <xack-work> I'm from a long term experience of MySQL junk and other RDBMS's, so..
[01:46:21] <xack-work> Have you messed with the shard/clustering of MongoDB much? I really enjoy the elegance and documentation
[01:46:40] <xack-work> will be implementing it soon. hoping it isn't a huge resource hog.
[01:54:53] <xack-work> most of my pains are things like: this doesn't work: { _id: { $in: [ ObjectId('561b920d5ac3a9e258178c67') ] }, worker.id: { $exists: false }
[01:54:55] <xack-work> but this does: { $and: [ { _id: ObjectId('561b920d5ac3a9e258178c67')}, { "worker.id": { $exists: false } } ] }
[01:55:43] <xack-work> well $in added back: { $and: [ { _id: { $in: [ ObjectId('561b920d5ac3a9e258178c67') ] } }, { "worker.id": { $exists: false } } ] }
[01:55:50] <xack-work> yeah.. queries get a bit ugly some times. Hehe.
[01:59:01] <Boomtime> you don't need $and in that query
[01:59:14] <Boomtime> the only other difference is that 'worker.id' is quoted
[01:59:20] <Boomtime> and it must be to be valid JSON
[02:00:06] <Boomtime> otherwise the two queries are identical - the only reason you ever need to use $and is when there is a conflict with names at the same level, which many JSON parsers don't like
[02:00:33] <Boomtime> perhaps there are some other use cases too, but generally you don't need $and
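Boomtime's point can be sketched as two filter documents. This is an illustrative sketch, not from the log: the ObjectId is written as a plain string placeholder so it runs outside the shell, and the `$or` example fields are made up.

```javascript
// Two filter documents that express the same condition. The ObjectId
// string is a placeholder so this snippet runs outside the mongo shell.
const withAnd = {
  $and: [
    { _id: { $in: ["561b920d5ac3a9e258178c67"] } },
    { "worker.id": { $exists: false } }
  ]
};

// Equivalent without $and: top-level fields are implicitly ANDed.
const withoutAnd = {
  _id: { $in: ["561b920d5ac3a9e258178c67"] },
  "worker.id": { $exists: false }
};

// $and is only required when two clauses would need the same key at the
// same level, which a JSON-like object cannot hold twice — e.g. two $or:
const needsAnd = {
  $and: [
    { $or: [{ a: 1 }, { b: 1 }] },
    { $or: [{ c: 1 }, { d: 1 }] }
  ]
};
```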
[02:01:10] <xack-work> that was my understanding
[02:01:39] <xack-work> { _id: { $in: [ ObjectId('561b920d5ac3a9e258178c67') ] } }, { "worker.id": { $exists: false } }
[02:01:46] <xack-work> returns unexpected token ", { "worke"
[02:02:29] <xack-work> so wasn't sure how to comma delimit multiple conditions without $and
[02:02:30] <cheeser> that } before the ,
[02:02:45] <cheeser> move it to the end
[02:03:25] <xack-work> so.. { _id: { $in: [ ObjectId('561b920d5ac3a9e258178c67') ] }, { "worker.id": { $exists: false } } }
[02:03:46] <xack-work> Error: unexpected token: "{ _id: { $"
[02:04:10] <cheeser> in the shell, yes?
[02:04:18] <cheeser> show your actual entry
[02:04:20] <xack-work> using MongoHub
[02:04:29] <xack-work> let's see at cli..
[02:07:06] <xack-work> db.user.find({ _id: { $in: [ ObjectId('561b920d5ac3a9e258178c67') ] }, { "worker.id": { $exists: false } } }).sort({"_id": 1}).skip(0).limit(30)
[02:07:07] <xack-work> 2015-10-15T19:02:47.518-0700 E QUERY SyntaxError: Unexpected token {
[02:07:59] <xack-work> and for reference, this query: > db.user.find({ _id: { $in: [ ObjectId('561b920d5ac3a9e258178c67') ] } }).sort({"_id": 1}).skip(0).limit(30)
[02:08:01] <xack-work> works fine, so..
[02:09:32] <xack-work> this is fine too.. db.user.find({ "worker.id": { $exists: false } }).sort({"_id": 1}).skip(0).limit(30)
[02:09:39] <cheeser> db.user.find({ _id: { $in: [ ObjectId('561b920d5ac3a9e258178c67') ] }, "worker.id": { $exists: false } }).sort({"_id": 1}).skip(0).limit(30)
[02:09:54] <cheeser> the worker.id clause shouldn't be in {}
[02:10:31] <xack-work> hmmmm
[02:10:59] <xack-work> Yeah, that worked fine. I see it now.
[02:11:18] <xack-work> due to the $in being wrapped, I messed up the worker.id part. Makes sense.
[02:12:17] <xack-work> now to convert it to go mgo.. wee.
[02:24:07] <xack-work> trying to make a findAndModify command, returns an error The dotted field 'worker.id' in 'worker.id' is not valid for storage.
[02:24:27] <xack-work> when you guys debug query issues, i assume you hit the cli and manually hit it that way?
[02:25:37] <xack-work> i turned on verbose mode (db.setProfilingLevel(2)) and see dumps of queries, but most that junk isn't easily copy pasted into cli.. getting myself acclimated to the new environment. :)
[02:28:05] <cheeser> field names can't have dots in them.
[02:29:11] <xack-work> my use case above is that i'm trying to place inside a document a worker field, and inside that an id field, hence worker.id
[02:30:18] <joannac> should be {worker: {id: ...}}
[02:30:25] <xack-work> oh?
[02:30:38] <xack-work> so during queries you can use dot notation, but during updates.. you wrap them ?
[02:31:18] <xack-work> that worked. THanks
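The distinction from this exchange, sketched with an illustrative value (42) in place of xack-work's real data: dotted keys are rejected inside a document being stored, but dot notation is valid as a path in filters and update operators.

```javascript
// Dotted keys are rejected when they appear inside a document being
// stored (an insert, or a replacement-style update). Embed instead:
const replacement = { worker: { id: 42 } };       // valid for storage

// Dot notation is fine as a *path* in query filters and update operators:
const filter = { "worker.id": { $exists: false } };
const update = { $set: { "worker.id": 42 } };     // sets the nested field
```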
[02:54:09] <sabrehagen> does anybody have any tips for my problem above?
[02:57:45] <xack-work> before i jump too heavily, findAndModify should only modify the field noted in the change, not REMOVE all other document data and only place the change data.. right? heh
[03:04:14] <joannac> xack-work: depends what the update part looks like
[03:07:21] <Boomtime> @sabrehagen: why don't you use a single date range?
[03:07:47] <Boomtime> also, you don't need any of those $and
[03:07:55] <Boomtime> predicates are 'and' by default
[03:08:43] <sabrehagen> i think i can't use a single date range because i need to know if their logs are in week 1, week 2, and week 3, not just anywhere between week 1 to week 3
[03:14:24] <sabrehagen> and on that premise, that's why i used the multiple $ands
[03:19:07] <Boomtime> so how are you going to tell whether they are in week1 and not week2?
[03:19:20] <Boomtime> you run a single query.. you get back a single contiguous stream of results
[03:20:04] <Boomtime> if you want them grouped by which week they fall into, you need to either use aggregation to generate a specific grouping, or run 3 queries
[03:20:32] <sabrehagen> no need to group them by week, just know if the conditions were met or not
[03:20:51] <Boomtime> yeah.. and when you run that query, and the 'conditions' are met?
[03:21:11] <Boomtime> the 'conditions' you have stipulated are "in week1 AND in week2" - you'll get back nothing right?
[03:21:48] <Boomtime> so even if you want multiple ranges for that date field, you'll need a $or
[03:22:42] <sabrehagen> i don't believe so. i have server logs where each has a user recorded against each log, so in week one and week two should be fine, both can hold true
[03:22:44] <Boomtime> but let's say you run a query with "in week1 OR in week2" -> now what? the results are indistinguishable from "in the fortnight X"
[03:23:48] <Boomtime> aren't you using _id as the date field?
[03:23:55] <Boomtime> can you re-paste your query here
[03:24:51] <sabrehagen> i wrote it in irc, so it's rough
[03:24:57] <sabrehagen> something like { $and : [ { $and : [ { user : 'username' }, { _id : { $gt : from1 } }, { _id : { $lt : to1 } } ] }, { same $and object as before but with from2, and to2 } ] }
[03:25:14] <sabrehagen> i agree i can drop the inner $and, something like...
[03:25:52] <Boomtime> you can drop all the $and, there is no reason for any of them, you'll get the same results with or without them
[03:26:09] <sabrehagen> { $and : [ { user : 'username', _id : { $gt : from1, $lt : to1 } }, { same $and object as before but with from2, and to2 } ] }
[03:26:12] <Boomtime> note as it is, the result won't be parseable because of duplicate _id
[03:26:14] <sabrehagen> is that legal?
[03:26:21] <Boomtime> but that can be re-formed to remove one of them anyway
[03:26:35] <Boomtime> { _id : { $gt : from1, $lt : to1 } }
[03:26:48] <sabrehagen> cool, so like what i posted after
[03:27:01] <Boomtime> keep doing that and you'll see that no $and is required here
[03:28:21] <sabrehagen> isn't this exactly the point where i need the $and? e.g. { $and : [ { user : 'username', _id : { $gt : from1, $lt : to1 }, { user : 'username', _id : { $gt : from2, $lt : to2 }, { user : 'username', _id : { $gt : from3, $lt : to3 } }
[03:28:41] <sabrehagen> * } ] } on the end
[03:29:13] <Boomtime> use $in for the username
[03:29:22] <Boomtime> wait wtf...
[03:29:29] <Boomtime> the same username for each predicate?
[03:29:36] <Boomtime> ok, so again...
[03:29:42] <sabrehagen> yes. looks like that could be extracted out...
[03:30:23] <Boomtime> are the three date ranges contiguous? or are these spread out?
[03:30:43] <Boomtime> like, on a calendar, if each range represents 1 week, are all 3 weeks in a row?
[03:32:02] <sabrehagen> they're contiguous
[03:32:50] <sabrehagen> also, how come you suggested $in for username? was that to avoid malicious input exploitation?
[03:32:58] <Pritchard> Does the update method / set support updating with the value of another field in the collection?
[03:33:07] <Boomtime> no, i thought you had different usernames
[03:33:14] <Pritchard> I want to essentially rename a field from "start" to "begin".
[03:33:19] <sabrehagen> okay cool
[03:33:39] <sabrehagen> Pritchard: see $rename
[03:33:52] <Pritchard> Awesome, thanks.
[03:33:56] <Pritchard> sabrehagen.
[03:34:04] <Boomtime> if your three date ranges are contiguous then the query you have is no different from querying for from1 through to to3 as a single match
[03:34:31] <Boomtime> i.e if to1 == from2 then you may as well coalesce them
[03:35:34] <sabrehagen> wouldn't that return results where one log entry was found in any of the three weeks, essentially a $or not a $and?
[03:36:19] <Boomtime> what do you think you'll get back right now?
[03:36:30] <Boomtime> i postulate that you cannot possibly get any results right now
[03:36:37] <sabrehagen> haha okay :)
[03:36:39] <Boomtime> the query is impossible to satisfy
[03:37:09] <sabrehagen> yes, i would have to agree with you
[03:37:13] <Boomtime> :-)
[03:37:40] <sabrehagen> i want the intersection of the three sets, where the sets are the three objects in the $and above
[03:38:22] <Boomtime> let me see if i can summarize:
[03:38:41] <Boomtime> you have lots of documents like this: { username: "Boomtime", _id: ISODate }
[03:38:47] <sabrehagen> yes
[03:39:35] <Boomtime> and you want to know which instances of 'username' where there are at least 3 documents, where each of those _id is a date that occurs in 3 particular weeks
[03:39:50] <Boomtime> one each of three particular weeks
[03:40:16] <sabrehagen> yes
[03:40:23] <Boomtime> you will need aggregation, and it might require some weirdness
[03:40:41] <Boomtime> is this an existing system or can you influence the schema?
[03:41:29] <sabrehagen> it's my system so i could influence it
[03:41:36] <sabrehagen> what changes do you suggest?
[03:41:55] <Boomtime> how many documents a week does a user end up inserting (at a guess)?
[03:42:14] <Boomtime> like, are there potentially many many matches for one user in a single given week?
[03:42:14] <sabrehagen> it could be huge, potentially in the thousands. this is a web server transaction log
[03:42:23] <Boomtime> ok, too many to use an array
[03:42:40] <Boomtime> it may simply be a lot easier to use 3 queries
[03:43:12] <Boomtime> and restrict them to 'count' not a plain query since you don't care about the data, you just want to know how many
[03:43:13] <sabrehagen> and then do the intersection in the software layer?
[03:43:21] <Boomtime> pretty much
[03:43:28] <sabrehagen> i'd like to know which users, not just how many :)
[03:43:43] <Boomtime> ah, then you will need aggregation
[03:44:10] <Boomtime> this is not a task for the faint hearted - and i'm not going to teach you aggregation here - i would suggest reading through the docs, and trying a few basic things first
[03:44:35] <Boomtime> things to familiarize yourself with are $group
[03:44:40] <Boomtime> $match
[03:44:58] <Boomtime> make sure there are indexes to use for the date ranges or you're dead
[03:46:09] <sabrehagen> haha yes i will
[03:46:09] <Boomtime> there's probably a couple different ways to do this actually, but if you play with $match and $group to start with you can ask here for further help once you understand those
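One way the $match/$group pipeline Boomtime gestures at could look. This is a hedged sketch, not from the log: the collection name (`logs`), the `user` field, and the week-boundary values `w0..w3` are assumptions; in a real run the boundaries would be ObjectId values built from the week-start dates, since `_id` doubles as the date here. Plain numbers stand in so the structure can be checked outside the shell.

```javascript
// Week boundaries (placeholders; in the shell these would be ObjectId
// bounds derived from the three consecutive week-start dates):
const w0 = 0, w1 = 1, w2 = 2, w3 = 3;

const pipeline = [
  // only log entries in the 3-week window (can use the _id index)
  { $match: { _id: { $gte: w0, $lt: w3 } } },
  // label each entry with the week it falls in
  { $project: {
      user: 1,
      week: { $cond: [{ $lt: ["$_id", w1] }, 0,
               { $cond: [{ $lt: ["$_id", w2] }, 1, 2] }] }
  } },
  // collapse to one document per (user, week) pair
  { $group: { _id: { user: "$user", week: "$week" } } },
  // count how many distinct weeks each user appears in
  { $group: { _id: "$_id.user", weeks: { $sum: 1 } } },
  // keep only users active in all three weeks
  { $match: { weeks: 3 } }
];
// db.logs.aggregate(pipeline)
```

The double $group is the key move: the first deduplicates thousands of entries per user per week down to at most one, the second counts distinct weeks.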
[03:46:16] <sabrehagen> thank you very much, your assistance has been great!
[03:46:53] <Boomtime> :-)
[12:06:39] <newbsduser> hello, how can i send "show dbs" command to mongodb shell... i tried mongo --port 27041 <<< "show dbs" ... but it didn't work
[12:40:59] <cheeser> echo "show dbs" | mongo --port 27041
[12:51:09] <newbsduser> cheeser,not working
[12:51:20] <newbsduser> omg it worked
[12:52:14] <deathanchor> newbsduser: use a script for more controls: var dbo = new Mongo().getDB('test'); var dbarray = dbo.adminCommand('listDatabases').databases;
[12:52:34] <cheeser> what does "not working" mean?
[12:55:04] <newbsduser> actually i tried to use javascript deathanchor but i dont want to parse json output... can i get another output rather than json with javascript command
[13:45:30] <deathanchor> newbsduser: learn some javascript, you can print out any format you want.
[13:45:56] <deathanchor> newbsduser: https://gist.github.com/deathanchor/461baaee48ac569445c4
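deathanchor's point is that once you are in a shell script, the output format is entirely yours. A minimal sketch: the sample array below stands in for what `new Mongo().getDB('admin').adminCommand('listDatabases').databases` returns (names and sizes are illustrative), so the formatting logic can run anywhere.

```javascript
// Sample of the shape returned by adminCommand('listDatabases').databases:
const databases = [
  { name: "test",  sizeOnDisk: 65536 },
  { name: "local", sizeOnDisk: 8192 }
];

// Emit plain tab-separated text instead of JSON:
const lines = databases.map(d => d.name + "\t" + d.sizeOnDisk);
console.log(lines.join("\n"));
```

In the mongo shell itself you would use `print()` per line; run such a script with `mongo --port 27041 script.js`.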
[15:04:40] <mawe> I imported a dump from v2.2.3 to v2.6.9 and now db.stats().dataSize has nearly doubled, is this because of the 255 vs 256kbyte size of the chunks collection?
[15:08:11] <cheeser> might be related to this: http://docs.mongodb.org/v2.6/core/storage/#power-of-2-sized-allocations
[15:13:02] <mawe> yes I found this, https://jira.mongodb.org/browse/SERVER-13331, I'm wondering if it will shrink if I store it again with the new version
[15:47:59] <StephenLynx> when I connect using ssl by adding ssl=true in the url, am I required to provide a certificate and a key to the server?
[15:48:10] <StephenLynx> mongodb://hostname:27017/test?ssl=true for example
[16:08:56] <steeze> when using the aggregation pipeline with $group and the $push accumulator
[16:09:14] <steeze> if i have two properties using $push, can i assume the indices in those arrays will match what records they came from?
[16:09:23] <steeze> like, they get stuffed in the same order?
[16:46:50] <jr3> can anyone think of why a multi update that drops a lot of data on a collection reports no data size change via collection.stats()
[18:52:45] <jr3> If I promote a primary to a secondary to do work on it, then promote it back to primary afterwards will the secondary sync back to primary once it's back up
[18:53:30] <n0arch> they will sync from eachother depending on which is the primary
[18:54:09] <jr3> my concern is I want any new records to go back up to the primary once it gets restored
[18:56:13] <n0arch> when you step the primary down, the new primary will accept your writes
[18:56:23] <n0arch> while its a secondary it will sync those writes
[18:56:33] <jr3> gotcha
[18:56:36] <jr3> thanks n0arch
[18:56:40] <n0arch> np
[18:59:20] <tejasmanohar> is there a query i can do to see if a field: string is one of the following ['string1', 'string2', 'string3']?
[18:59:32] <tejasmanohar> or do i have to use mongodb full text search with an index for that?
[18:59:44] <cheeser> $in
[19:00:44] <tejasmanohar> oh i thought $in didnt work with string equality, maybe i'm wrong
[19:01:58] <cheeser> i've never heard anything along those lines
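For the record, the $in filter cheeser is pointing at, with an illustrative field name (`status` is an assumption, not from the log):

```javascript
// Matches documents whose status equals any of the listed strings exactly;
// $in compares whole values, not substrings:
const filter = { status: { $in: ["string1", "string2", "string3"] } };
```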
[19:20:40] <honken> hey! started testing MongoDB today, and i like it alot, thou having some problem finding examples of different stuff, mainly php reference code, anyone have a good source?
[19:21:13] <StephenLynx> not using PHP is an option?
[19:21:40] <n0arch> ^
[19:21:55] <honken> it is, but felt like its the way i want to go with this DB :)
[19:22:01] <StephenLynx> not really.
[19:22:04] <n0arch> pymongo
[19:22:09] <StephenLynx> the most natural language to use with it is js.
[19:22:16] <StephenLynx> because of the dictionary-like objects.
[19:22:48] <honken> indeed, but honken's natural way is to use it with PHP :)
[19:23:09] <StephenLynx> well, honken will have a bad time, IMO.
[19:23:16] <honken> :)
[19:23:17] <StephenLynx> because the PHP community is stuck in 1995
[19:23:19] <deathanchor> It is not necessary to change. Survival is not mandatory.
[19:23:49] <StephenLynx> I wouldn't be surprised if most PHP devs have never heard of mongo.
[19:23:57] <honken> just looking now for a SORT/ORDER BY option.
[19:24:13] <honken> ohh never mind found one now :)
[19:34:24] <him> hi
[19:35:50] <him> hi
[19:46:08] <him> hi
[19:46:34] <him> having problem in aggregation pipline
[19:46:51] <him> its taking huge amount of time to process data
[19:49:17] <him> ??
[20:11:53] <steeze> tejasmanohar, $in doesnt work with partial text matching or regex
[20:12:02] <steeze> but does work with normal matching
[20:12:16] <tejasmanohar> steeze: perfect
[20:13:03] <steeze> there's a jira issue out there from someone calling for partial text matching with $in. boy howdy would that have helped me last month
[20:13:27] <steeze> ended up building a crazy looking query with regex and indexed fields, still not as powerful as it could have been with partial matching for $in though :/
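The regex-plus-indexed-fields approach steeze describes hinges on anchoring. A sketch with an illustrative field name:

```javascript
// A case-sensitive, left-anchored regex can be served by an ordinary
// index on the field, like a prefix scan:
const indexable = { name: /^joh/ };

// Unanchored or case-insensitive regexes force a full scan:
const fullScan = { name: /joh/i };
```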
[20:29:46] <tejasmanohar> lol
[20:29:48] <tejasmanohar> :)
[20:33:21] <honken> quick question using the console, i cant seem to be able to update anything
[20:33:23] <honken> http://pastebin.com/gMrk4JUX
[20:33:58] <honken> trying to set the field pos to 1, since i messed up when adding it so it got the NULL value
[20:34:03] <honken> but it aint letting me.
[20:34:23] <honken> should try team not name
[20:34:45] <honken> well still aint updating
[20:38:27] <amitprakash> Hi, wheneever I specify a readPreference=primaryPreferred/secondaryPreferred against MongoClient, I keep getting a ServerSelectionTimeoutError: No replica set members match selector "<function any_server_selector at 0x7f8de97a7bf8>" at random
[20:38:32] <amitprakash> How do I resolve this issue?
[20:54:57] <honken> figured it out, thanks for all the respons :)
[21:17:31] <mikeatgl> I've recently experienced Mongo crashing with a really simple API. Before the log ends it says there were over 3800 connections now open. That seems like way too much. I'm using db.close() on my callbacks. Is that a normal number of connections? The DB is probably only getting a hundred hits a week right now at most.
[21:18:56] <mikeatgl> It had been running with no problems for a couple weeks and all of a sudden won't stay up. No changes were made to the Node script or Mongo version as far as I know. The only thing I could see is that connection number kept going up and up.
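A steadily climbing connection count like mikeatgl's is the classic symptom of creating a new client per request in Node instead of sharing one; the driver pools connections itself, so one client for the process lifetime is usually right. A runnable sketch of the share-one-client pattern, with a stub standing in for the real `MongoClient.connect`:

```javascript
// Reuse one client for the process lifetime instead of connecting
// (and having to remember to close) on every request.
let client = null;
function getClient(connect) {
  if (client === null) client = connect(); // first caller connects
  return client;                           // everyone else reuses it
}

// Stub standing in for MongoClient.connect, to show the effect:
let connectionsOpened = 0;
const fakeConnect = () => ({ id: ++connectionsOpened });

getClient(fakeConnect);
getClient(fakeConnect);
getClient(fakeConnect);
// connectionsOpened is still 1: all requests share the pooled client
```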
[22:28:26] <amitprakash> Another thing I'm seeing is that MongoDB is taking a lot of time to return all the results (49s) via pymongo when I can fetch them directly almost instantaneously when querying the db using the mongo client
[22:32:39] <Boomtime> @amitprakash: 'all the results' is how many roughly?
[22:32:55] <amitprakash> Boomtime, 5k
[22:33:13] <Boomtime> the shell does not return them all - it returns a cursor
[22:33:28] <Boomtime> so does pymongo, but i'm betting you're measuring how long it takes to get them all in mython
[22:33:33] <Boomtime> *python
[22:33:33] <amitprakash> 5313 to be precise
[22:33:48] <amitprakash> Boomtime, and yes.. I tried using passing a stream of it
[22:33:55] <amitprakash> Boomtime, well it shouldn't take 49s
[22:34:02] <amitprakash> that's significantly longer
[22:34:14] <amitprakash> Boomtime, and yes, possibly
[22:34:16] <Boomtime> are you getting all the results in the shell?
[22:34:21] <amitprakash> Yes
[22:34:33] <amitprakash> pushing to a list
[22:34:42] <Boomtime> paste the code
[22:35:02] <Boomtime> pastebin, both python snippet and shell snippet
[22:35:09] <Boomtime> shell should be a one line, python might be too
[22:37:19] <amitprakash> Boomtime, https://bpaste.net/show/88199b699c20
[22:39:08] <amitprakash> If you mean mongo shell paste, thats just db.barr.edges.find({'active': true}) and then spamming it
[22:43:03] <atomicb0mb> Hello! Why i'm getting "callback is not a function" here: https://jsfiddle.net/qbxq356s/ ?
[22:44:11] <Boomtime> @amitprakash: db.barr.edges.find({'active': true}) <- doesn't get all the results, what do you mean 'spamming it'?
[22:45:41] <Boomtime> @amitprakash: please add .toArray() to the end of the find() in the shell
[22:47:36] <atomicb0mb> anybody?
[23:12:28] <amitprakash> Boomtime, spamming 'it'
[23:12:33] <amitprakash> on the mongo shell
[23:12:38] <amitprakash> gets the next batch of results
[23:16:01] <Boomtime> @amitprakash: and in your python code, how do you know you're measuring the driver and not how long it takes python to insert 5000 items into a dynamic array?
[23:21:56] <amitprakash> Boomtime, I don't know
[23:22:45] <Boomtime> remove the lst.append and just increment a counter - the driver is forced to retrieve the doc from the server, but python doesn't have to do anything with it
[23:24:29] <amitprakash> aight