#mongodb logs for Monday the 29th of July, 2013

[00:06:01] <tpayne> yeah this isn't working out very well, it does a find instead of an absolute heh
[00:06:23] <cheeser> instead of an absolute?
[00:22:49] <tpayne> cheeser: yeah, for example if the value is ["2", "3"] and I insert ["2"] it collides
[00:30:16] <tpayne> it would be great if i could tell it not to have duplicates at all,
[00:31:30] <cheeser> where are you inserting that 2?
[00:31:52] <tpayne> what do you mean?
[00:32:32] <tpayne> if i could set _id myself, then i can create a hash
[00:32:43] <cheeser> what's the insert command that's failing?
[00:32:55] <tpayne> nothing is failing, it's just adding duplicates
[00:32:58] <tpayne> i'm using save
[00:33:23] <cheeser> can you pastebin your indexes?
[00:33:40] <cheeser> and an example insert?
[00:36:10] <tpayne> nice that worked
[00:36:15] <tpayne> i just hacked _id
[00:36:34] <cheeser> how so?
[00:41:19] <tpayne> just created my own hash and set _id
[00:41:22] <tpayne> so it didn't set it for me
[00:41:34] <tpayne> and since _id is unique by default, it ignores duplicates
[00:41:43] <cheeser> you can set _id to anything you want. it can be a document even.
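A minimal sketch of the approach tpayne describes, assuming a collection named docs and an array field values (both hypothetical names): derive a deterministic hash from the content, use it as _id, and let the unique index on _id keep duplicates out.

    // hypothetical helper: derive a stable key from the array contents
    var values = ["2", "3"];
    var key = hex_md5(values.join(","));   // hex_md5() is a mongo shell built-in
    // save() with an explicit _id upserts, so re-saving the same content
    // overwrites the existing document instead of creating a duplicate
    db.docs.save({ _id: key, values: values });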
[00:50:20] <caitp> Thu Jan 01 1970 14:45:00 GMT-0500 (EST) = 71100000 ("1970-01-01T19:45:00.000Z")
[00:50:21] <caitp> startTime offset: 300 (53100)
[00:50:48] <caitp> I don't really get it
[00:51:40] <caitp> it translates to 2:45 pm
[00:51:45] <caitp> i mean why
[00:51:50] <caitp> how is this helpful ;-;
[00:52:09] <cheeser> not sure I follow
[00:52:50] <caitp> client's request has that start time URL encoded (coming from an HTML5 time input)
[00:53:18] <caitp> server gets it, deserializes it, and decides that 2:45 is the actual time
[00:53:25] <caitp> and not 7:45
[00:53:39] <caitp> it is such a bloody headache :u
[00:55:20] <caitp> I'm not sure why i'm pasting that in here, that's not really a mongo issue
[00:55:25] <caitp> but whatever!
[00:55:27] <cheeser> :)
[00:55:33] <cheeser> i dunno what to tell you on that one.
[00:55:43] <caitp> hey it's 3rd glass of wine, it happens
[00:56:04] <cheeser> oooh, i wish I had some.
[01:04:01] <cheeser> tpayne: but perhaps we should get in to that here rather than #scala
[01:04:56] <tpayne> yeah ok i'm not going to use an ObjectId
[01:04:56] <tpayne> thanks
[01:23:57] <cheeser> any time, tpayne
[03:12:49] <tpayne> from the command line, how do i groupBy a certain collection's column?
[03:15:50] <ron> http://docs.mongodb.org/manual/reference/aggregation/group/
[03:18:08] <tpayne> ron do you use scala casbah?
[03:18:16] <ron> no
[03:21:14] <tpayne> does anyone in here use casbah, and knows how to use its groupBy?
[03:22:20] <cheeser> doesn't its dsl basically follow the shell syntax?
[03:22:26] <cheeser> i've only glanced at it.
[03:30:10] <pngl> Do people use the projection part of queries a lot or do people get all the records back and sort things out in their language's driver?
[03:31:47] <cheeser> depends on the use case, i think.
[03:39:40] <tpayne> I don't know why all of the casbah examples are so bad
[03:39:54] <tpayne> they go from 0 to 10 instead of 0 to 1
[03:44:32] <cheeser> trying to remember who works on that...
[03:45:04] <cheeser> i'll see if i can't shake some trees tomorrow. you could probably file an issue or two on the docs/examples
[03:49:08] <tpayne> cheesed maybe i can ask you how to do something via command line, that way i'll know if it's possible or not
[03:49:13] <tpayne> cheeser: *
[03:50:15] <tpayne> I have a collection of a simple document. ["key","value"]
[03:50:29] <tpayne> I have 3 entries in this collection
[03:50:49] <tpayne> ["hello", "world"]
[03:50:52] <tpayne> ["hello", "tpayne"]
[03:51:06] <tpayne> ["goodbye": "buddy"]
[03:51:37] <tpayne> i want to query mongo to give me all of this data back, in a bucket. OR Map[String, List[String]]
[03:52:19] <tpayne> so that i have Map("hello" -> List("world", "tpayne"), "goodbye" -> List("buddy"))
[03:52:47] <tpayne> that's it. I was looking into using coll. aggregate to return something like this
[03:53:18] <tpayne> groupBy "key" and get back a Map[String, List[String]]
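In the shell, one way to get that shape (a sketch, assuming a collection named pairs with key/value fields as in the example above) is an aggregation that groups on key and pushes the values into an array:

    db.pairs.aggregate([
        { $group: { _id: "$key", values: { $push: "$value" } } }
    ])
    // yields documents like { _id: "hello", values: ["world", "tpayne"] }
    // and { _id: "goodbye", values: ["buddy"] }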
[06:00:31] <raar> Hi, I just removed all collections from a 800GB database and I want to re-claim (un-allocate) this space. I'm considering deleting the whole database, but I don't want a global write lock for more than a couple of seconds - any idea how long deleting this DB might take?
[06:03:00] <raar> any advice would be greatly appreciated
[07:04:47] <cmex> hi all
[07:05:00] <cmex> good morning
[07:05:26] <cmex> i'm planning to build a forum based on mongodb
[07:05:32] <cmex> and have a question
[07:05:45] <cmex> the structure of messages is like a tree
[07:06:11] <cmex> i'm thinking of using the nested object architecture to save messages
[07:06:45] <cmex> like : child message object is nested in father message object
[07:06:56] <cmex> and may be N objects in objects
[07:07:12] <cmex> is it best to do it like this or maybe to do it linked?
[07:07:44] <cmex> the flat document with array of child messages
[07:08:00] <cmex> i want to know which problems come with each of them
[07:08:04] <cmex> please
[07:09:25] <cmex> som1 ? :)
[07:22:05] <rspijker> cmex: ?
[07:27:07] <raar> any ideas on my issue? How to remove allocated space from my now empty database, which is 800gb?
[07:27:18] <raar> without having a long-time global lock
[07:34:46] <rspijker> raar: drop collections
[07:34:54] <cmex> rspijker: yes?
[07:35:04] <rspijker> cmex: what's the question?
[07:35:25] <cmex> the question is: is it right to store nested objects of the same type
[07:35:45] <cmex> in forum with tree structure replies
[07:35:57] <cmex> the number of replies to post may be N
[07:36:10] <cmex> or is it better to do it linked
[07:36:49] <rspijker> cmex: you can do it nested, just keep in mind the maximum size of a document is 16MB
[07:36:49] <[AD]Turbo> hi there
[07:36:56] <joannac> cmex: What's N?
[07:37:08] <cmex> N is unknown
[07:37:13] <raar> rspijker: they're dropped
[07:37:17] <joannac> You could hit the limit as mentioned by rspijker above
[07:37:29] <cmex> i think 16mb per doc its ok
[07:37:29] <rspijker> cmex: http://docs.mongodb.org/manual/use-cases/storing-comments/
[07:37:32] <rspijker> check that out
[07:37:34] <raar> the database is empty, but still takes up the originally allocated space (which is expected behaviour)
[07:37:59] <rspijker> raar: if it's actually empty, you can remove the files…
[07:38:07] <rspijker> otherwise your only option is repairDatabase()
[07:38:08] <cmex> my problem is how to work with data like this. i mean, find a document inside a document, update it..
[07:38:23] <cmex> is there any easy way to find a nested document?
[07:38:27] <rspijker> which will lock it, but if the amount of actual data in there is small, it will be fairly fast
[07:38:50] <cmex> when i say nested document im not meaning father->child
[07:39:15] <cmex> i mean father->child->child->->child and so on
[07:39:41] <rspijker> cmex: I presume it's not recursive nesting?
[07:39:49] <cmex> yep it is
[07:39:54] <rspijker> cmex: so more like page -> comment -> user -> name ...
[07:40:20] <raar> I don't have access to the filesystem at present (it's being hosted by mongohq) - repairDatabase() on 800GB is going to take a while I assume, no? I don't want a global lock of more than a couple of seconds
[07:40:23] <rspijker> it is? That's a bad idea, I think
[07:40:38] <rspijker> raar: the time it takes depends on the size of the *actual* data
[07:41:06] <rspijker> raar: the problem with repairDatabase is that it needs 800GB to start if your current storage size is 800GB
[07:41:08] <cmex> rspijker: not really got u:((
[07:41:21] <rspijker> cmex: give an example of the structure?
[07:41:21] <cmex> u mean to do it related ?
[07:41:32] <rspijker> I mean, is it recursive
[07:41:41] <rspijker> as in, Parent -> child -> child -> child
[07:41:44] <joannac> cmex: The link rspijker gave has a example on how to find a path through the thread to a comment
[07:41:44] <cmex> each message may be a father of N messages
[07:41:49] <rspijker> are the childs all the same kind of object
[07:41:53] <raar> rspijker: The actual data - well it's empty, so I guess the actual data is zero, no?
[07:41:55] <cmex> child is the same
[07:41:59] <rspijker> cmex: then put those messages in an array
[07:42:04] <cmex> but child can be a parent too
[07:42:17] <rspijker> cmex: look at the example page :)
[07:42:30] <cmex> which example page?
[07:42:44] <joannac> cmex: http://docs.mongodb.org/manual/use-cases/storing-comments/
[07:43:32] <cmex> thank you i'm reading it right now
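For reference, the pattern that page describes avoids unbounded recursive nesting; a sketch of one of its variants, using a flat comments collection with a materialized path (field names here are illustrative, not taken from the page verbatim):

    // one document per message; "path" records the chain of ancestor ids,
    // so an arbitrary reply depth never grows a single document
    db.messages.insert({ _id: "m42", topic: "t1", parent: "m7", path: "m1/m7", text: "..." });
    // fetch a whole reply subtree with an anchored regex on the path
    db.messages.find({ path: /^m1\/m7/ }).sort({ path: 1 });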
[07:48:57] <rspijker> raar: if it's hosted at mongohq, talk to them
[07:49:14] <rspijker> raar: if the database is really empty, I don't see the problem with downtime
[07:49:41] <raar> the problem is that we have a whole bunch of other databases that are production-critical on the same setup
[07:49:50] <raar> so having a global write lock would be very bad
[07:50:14] <rspijker> I'm not 100% sure, but it might just take a db write lock...
[07:50:17] <rspijker> let me check though
[07:50:40] <joannac> Warning This command obtains a global write lock and will block other operations until it has completed.
[07:51:07] <rspijker> hmmm, yes
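For completeness, the two commands being weighed here (the caveats above apply: repairDatabase needs free disk roughly equal to the current data files, and on this version it takes a global write lock):

    use mydb                  // the (hypothetical) 800GB database
    db.dropDatabase()         // option 1: remove the data files entirely, freeing the space
    // -- or --
    db.repairDatabase()       // option 2: rebuild/compact the existing files in place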
[08:53:15] <_Heisenberg_> Hi! I installed mongodb on ubuntu 12.04 which automatically installs startscripts in /etc/init and /etc/init.d ... If I want to automatically start a configserver instead of a normal mongod instance, do I have to change the startscripts or just a config file?
[08:56:31] <rspijker> _Heisenberg_: if you want to start both, you need to add another start script for that
[08:56:47] <rspijker> if you want to start config _instead_ of mongos, then you only need to modify the config file it uses
[08:57:26] <Nodex> config files
[08:58:42] <Nodex> http://docs.mongodb.org/manual/reference/configuration-options/#configsvr <---
[09:02:23] <_Heisenberg_> Nodex: by default it does not use a config file? so I need to change the init scripts to tell it to use a config file no?
[09:02:55] <rspijker> the init script doesn't tell it to use a cfg file? :s
[09:03:00] <rspijker> I highly doubt that
[09:03:35] <_Heisenberg_> which init script is used? the one in /etc/init.d or /etc/init ? or both?
[09:04:23] <rspijker> /etc/init.d normally
[09:04:54] <rspijker> just cat that one and see what it says
[09:08:53] <_Heisenberg_> seems like i fucked up the installation. can't even start the mongod by using sudo service mongodb start, ps ax returns no running mongod instance
[09:09:29] <Nodex> by default mongodb uses an init / upstart file on ubuntu
[09:09:39] <Nodex> (provided you installed it with apt or w/e)
[09:09:53] <_Heisenberg_> yes, with apt
[09:12:05] <Nodex> then a stop/start script should be in /etc/init.d
[09:12:46] <_Heisenberg_> yes, there is. but also in /etc/init/ . here there is a symlink to /lib/init/upstart-job and a conf file
[09:13:24] <Nodex> `service` doesn't look in /etc/init
[09:17:36] <_Heisenberg_> can't figure out the config file from the init script --> http://pastebin.com/mKEvDW4j
[09:20:49] <Nodex> it resides in /etc/mongodb.conf .... sometimes in /etc/mongodb/mongodb.conf
[09:21:00] <Nodex> (just like everything else you install with apt)
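The relevant option from the page Nodex linked, sketched as it might appear in the Ubuntu package's /etc/mongodb.conf (the paths and port below are just the common defaults, not taken from this setup):

    # /etc/mongodb.conf
    dbpath = /var/lib/mongodb
    logpath = /var/log/mongodb/mongodb.log
    configsvr = true      # run this mongod as a config server (defaults to port 27019)
    port = 27019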
[09:26:39] <puppeh> can I do '$addToSet' for a nested field
[09:27:56] <puppeh> ex. http://pastie.org/private/qjhgph3uknhkyjtl7bmrta
[09:29:40] <rspijker> that should be fine, yeah
[09:30:01] <puppeh> strange, cause it only adds one value
[09:30:05] <puppeh> not the 2nd or 3rd
[09:30:44] <rspijker> puppeh: are you using 1 $addToSet to add multiple items? If so, are you using the $each command?
[09:30:56] <puppeh> no I'm using separate addtosets
[09:31:00] <puppeh> not 1
[09:31:28] <Nodex> your pastebin doesn't give more than 1 value
[09:31:29] <rspijker> what is the query you are using?
[09:32:33] <Nodex> if you want to add multiples to it then you must mix with $each
[09:32:49] <Nodex> http://docs.mongodb.org/manual/reference/operator/addToSet/#addtoset <--- docs
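A sketch of the difference being described, assuming a document with a nested array field stats.tags (hypothetical names):

    // one value per update works as expected
    db.items.update({ _id: 1 }, { $addToSet: { "stats.tags": "a" } });
    // to add several values in a single update, wrap them in $each,
    // otherwise the whole array would be treated as one set member
    db.items.update({ _id: 1 }, { $addToSet: { "stats.tags": { $each: ["b", "c", "d"] } } });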
[10:36:47] <aandy> hi, i'm running 2.4.3, and have a weird situation i don't understand. i have a collection, empty, then import a json file with mongoimport which reports ~1k imported objects. from the shell, db.find()/db.findOne() still says null. i then try to insert a doc manually from shell, and it can see only that doc. i export the whole (supposedly 1 object) collection and end up with a ~1k json file again. the only setup for the collection i did was to add db.ensur
[10:37:14] <aandy> am i the only one who sees some sort of black magic going on?
[10:37:37] <Zelest> are you switching database when using the shell?
[10:37:49] <Zelest> (stupid question, but just making sure :-)
[10:37:50] <aandy> it's part of a 1 master, 2 slaves setup if that helps shed some light on it. all actions done to the primary
[10:38:06] <aandy> not stupid at all, and from what i can tell, i'm not
[10:38:18] <Zelest> just "use <databasename>"
[10:38:23] <Zelest> and try that find again :-)
[10:38:25] <aandy> i did change to the admin db to enable text indices, but i've reverted back since
[10:38:29] <aandy> yeah good idea, one sec
[10:39:15] <aandy> argh, you're right. thanks!
[10:39:23] <Zelest> no worries :D
[10:39:28] <aandy> i was using the wrong db on the shell, gah monday mornings
[10:39:51] <aandy> hehe, saved me a headache ;)
[10:39:51] <Zelest> hehe
[11:10:48] <_Heisenberg_> I set up a replicaset in order to deploy a sharded testcluster. am I on the correct track? what makes me suspicious is that all replicas seem to be "primaries" ---> http://pastebin.com/z6q4yTRB
[11:35:31] <hard_day> hi all
[12:10:21] <hard_day> i have a problem with a cursor, i'm reading the docs and the google group but the problem continues and i have no idea http://pastie.org/8186478
[12:10:38] <amitprakash> Hi, is it possible to pass the document name as a variable? i.e. instead of saying db.Document.find, I want to call db.find(documentName, params)
[12:12:24] <kali> amitprakash: you mean the collection name, right ? if not, you're seriously confused
[12:12:45] <kali> hard_day: is it a long-running query ?
[12:13:11] <amitprakash> kali, yes, the collectionname
[12:13:31] <kali> amitprakash: db[collectionName].find(params) then
[12:14:31] <hard_day> Hi kali, no it's very short, it's like this: db.2013070100.find()
[12:20:33] <kali> hard_day: http://docs.mongodb.org/manual/reference/limits/#Restriction on Collection Names
[12:23:39] <hard_day> kali the problem is the collection name?
[12:24:03] <kali> hard_day: yes.
[12:25:07] <jimbosimo> hi, question if anyone knows, what will happen if I issue a multi-document update instruction and server crashes before all documents are updated? will it redo the last operation when it goes up again ?
[12:25:07] <kali> hard_day: (well, i'm pretty sure)
[12:26:19] <hard_day> kali i'll try
[12:26:25] <hard_day> thanks
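Since the collection name here starts with digits, the dot-notation helper can't be used from the shell; assuming the collection really was created with that name by a driver, the usual workaround is getCollection():

    db.getCollection("2013070100").find()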
[12:28:56] <Nodex> jimbosimo : no
[12:30:18] <jimbosimo> Nodex: thanks
[12:43:48] <dotpot> hello there .)
[12:44:11] <dotpot> aggregation related question: http://pastie.org/private/yxr90lqcxaz9csuuzmpaoa
[13:08:07] <_Heisenberg_> if I try adding a replica as a shard I keep getting "couldn't connect to new shard ReplicaSetMonitor no master found for set: rs0" even if the replica set has one primary and two slaves in it
[13:09:40] <kali> _Heisenberg_: what does your addShard command look like ?
[13:28:06] <_Heisenberg_> kali: sorry just dropped out. my command looks like this: mongos> sh.addShard("rs0/10.1.1.49:27017")
[13:33:39] <_Heisenberg_> kali: see here with response: http://pastebin.com/wPpTYD7S
[13:35:08] <_Heisenberg_> rs.status(): http://pastebin.com/uGfAEqY2
[13:48:04] <_Heisenberg_> ok looks like defining all hosts in /etc/hosts solved the problem...
[13:50:19] <cheeser> yeah, i think you have to pick with the host name or the IP and be consistent with that usage everywhere.
[13:52:59] <kali> _Heisenberg_: great.
[13:55:08] <double_p> cheeser: indeed. otherwise hell breaks loose :>
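A sketch of what that ends up looking like once names resolve consistently, listing the set members explicitly (hostnames here are placeholders):

    // from the mongos shell; every host in the seed list must be reachable
    // under exactly this name from the mongos and from each replica set member
    sh.addShard("rs0/node1.example.net:27017,node2.example.net:27017,node3.example.net:27017")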
[14:10:58] <_Heisenberg_> anyone have a tip on where to get sample data sets? a product catalogue would be great :>
[14:12:19] <cheeser> that *would* be handy
[14:17:08] <ron> there are ways to generate json quickly for creating such data sets.
[14:17:24] <ron> http://www.json-generator.com/ for example.
[14:21:33] <remonvv> \o
[14:21:38] <ron> o/
[14:23:13] <_Heisenberg_> ron: nice one, let's see if it can generate 10^6 records :D
[14:23:29] <ron> _Heisenberg_: give it a go. there are others out there as well.
[14:23:59] <cheeser> i need that for my javaone presentation actually. i need a ton of "products" in my db to play with.
[14:24:20] <ron> well, go ahead and use it.
[14:25:18] <cheeser> oh, i will!
[14:25:24] <cheeser> probably. maybe.
[14:25:32] <cheeser> i mean, i have a smaller set i've already built.
[14:25:40] <cheeser> maybe i won't use it after all.
[14:25:47] <cheeser> yeah, i'll just leave it. i'm good. all set.
[14:25:52] <cheeser> :D
[14:27:05] <ron> dude, you're just embarrassed admitting I actually gave a good advice. it's okay, I understand.
[14:27:36] <Nodex> ron doesn't know anything except how to make tea and coffee
[14:27:44] <cheeser> it's not so much embarrassed as amazed. i've never seen it happen before. ever. not sure how to cope.
[14:27:45] <Nodex> you should never listen to her
[14:28:22] <ron> cheeser: you love me.
[14:28:24] <remonvv> To be fair I just spit out some tea when someone mentioned ron said something useful.
[14:28:46] <remonvv> I'm so proud. It's...it's hard to put into words really. I need a moment to gather myself.
[14:29:12] <ron> Nodex: cheeser knows me for far longer than you do. and trust me, once he learns you adore php, he would have as little respect towards you as me. probably even moreso.
[14:29:27] <remonvv> Ohhh...he makes a valid point.
[14:31:02] <Nodex> yet ron i am still better than you in spite of these things so that really doesn't say a lot about you does it :)
[14:31:23] <ron> Nodex: I love how delusional you are. it makes you so... special.
[14:31:30] <Nodex> I know righ
[14:31:33] <Nodex> right|*
[14:32:00] <Nodex> in all seriousness, apart from making tea and being a fluffer what do you do in your job?
[14:32:13] <remonvv> I only make fun of ron because I know he's somewhat capable
[14:32:30] <cheeser> somewhat *handy*capable
[14:32:57] <Nodex> he knows his redis stuff I'll give him that
[14:33:03] <Nodex> the jury is still out on anything else
[14:33:34] <ron> omg, a compliment from Nodex. shocking.
[14:33:42] <ron> too bad I can't give him any in return.
[14:33:45] <remonvv> Knowing about Redis is rather like knowing DOS
[14:33:51] <ron> I lie, he knows his Solr queries.
[14:33:56] <cheeser> does it really count coming from a php guy?
[14:34:04] <ron> cheeser: no.
[14:34:04] <Nodex> ron, you don't need to compliment me, I have a mirror and a successful business
[14:34:11] <cheeser> ah, snap!
[14:34:17] <remonvv> Indeed. It's rather like a blind guy complimenting you on your painting skills.
[14:34:46] <remonvv> I sense you people veered OT a bit.
[14:34:53] <cheeser> so. any morphia users?
[14:34:57] <remonvv> lol
[14:34:58] <remonvv> no
[14:35:24] <cheeser> i've just wrapped up everything i'd tagged for 0.102 and was about to cut a release.
[14:35:46] <remonvv> I wrote my own mapping layer after Morphia had some issues. But this was way back when.
[14:35:57] <ron> 'some issues'.
[14:36:00] <cheeser> i'm trying to fix those issues. :)
[14:36:06] <cheeser> what were they do you recall?
[14:36:18] <ron> the project was dead. I call that an issue.
[14:36:32] <cheeser> not dead. it was just resting.
[14:36:39] <cheeser> morphia prefers keeping on its back.
[14:36:53] <remonvv> cheeser: It wasn't super active, I think only scott was maintaining it. And I don't like the API. And there were quite a few bugs at the time but I can't comment on that now.
[14:37:24] <cheeser> the API will likely shift quite a bit with the switch to the 3.x driver underneath
[14:37:40] <ron> dude, it was dead. dead, dead, dead.
[14:37:52] <remonvv> cheeser: Doubtful.
[14:38:06] <cheeser> doubtful?
[14:38:47] <remonvv> cheeser: 3.0 will change details, maybe. But the general approach of using builder patterns for query composition and whatnot will stay the same.
[14:38:52] <ron> remonvv: cheeser took over the development of Morphia as part of the 10gen team.
[14:39:15] <remonvv> ron: Don't think that changes that much ;)
[14:39:26] <cheeser> the 3.x driver does a lot more than it used so morphia will trim down a bit, i expect.
[14:39:27] <remonvv> It's unwieldy and verbose
[14:39:36] <ron> remonvv: just wanted you to know :)
[14:40:43] <remonvv> cheeser: Okay, but that's moving things around. The gap between JSON/shell syntax and Morphia/driver code is too big in my opinion, and much more verbose.
[14:41:16] <cheeser> you want your java code to look like the mongo shell?
[14:41:22] <ron> jongo!
[14:41:29] <cheeser> casbah does that in scala. seems ... an ugly way to do that.
[14:41:39] <Nodex> if all code or representation in the driver did it would be very helpful
[14:41:50] <cheeser> jongo is pretty limited
[14:41:50] <remonvv> cheeser: Wouldn't go that far but our developers pretty much universally wanted to get rid of Morphia in favor of a more..hm...JPA-ish style.
[14:42:04] <ron> Nodex: sssh... let the grownups talk.
[14:42:05] <cheeser> what does that mean?
[14:42:08] <Nodex> ron: die
[14:42:11] <cheeser> heh
[14:42:14] <ron> Nodex: right after you.
[14:42:25] <Nodex> I'm importal
[14:42:41] <ron> you're infantile.
[14:42:53] <Nodex> cool story
[14:43:49] <ron> I wish it were only a story.
[14:44:05] <Nodex> only in your head ;)
[14:45:03] <remonvv> cheeser: It's hard for me to explain in detail, not a native english speaker ;) Our code basically looks something like this : User u = find("{userId: ?1}").one(); List<User> users = find("{someState: ?state}").all()
[14:45:27] <ron> remonvv: that's your excuse? not a native english speaker? pfft.
[14:45:34] <cheeser> oh, right. so you can do a .set("state", someValue), yeah?
[14:45:44] <ron> Nodex *is* a native english speaker and he can't speak it properly.
[14:46:00] <Nodex> ron there is a difference between speaking and typing
[14:46:08] <ron> You don't say.
[14:46:13] <Nodex> I do
[14:46:49] <remonvv> cheeser: Well we have a few variations to set parameter values, but yeah
[14:47:22] <cheeser> right.
[14:47:49] <cheeser> so basically you want a mongo version of JPAQL
[14:48:38] <ron> when cheeser asks things that way, I'm never sure if he's asking sarcastically or not.
[14:48:39] <remonvv> cheeser: Well, JPAQL is a new query language spec, I rather think mongodb's query language should be mirrored as closely as possible in APIs
[14:48:42] <ron> or states rather
[14:49:34] <remonvv> cheeser: I've had this discussion with scott as well and I think it basically boils down to type safety and whatnot. Our stuff has to do runtime type checking and whatnot.
[14:49:39] <cheeser> remonvv: right. i've played with doing that in ophelia. it's a bit tricky but I've learned a way to help that out. it just means a server round trip in the easy solution.
[14:49:46] <cheeser> right.
[14:50:17] <cheeser> basically morphia is JPA's criteria API for mongo when you want something more like its QL
[14:51:19] <remonvv> cheeser: Right, we need maintainable/readable code and Morphia/driver code doesn't give you that in our experience.
[14:52:04] <remonvv> cheeser: It becomes cluttered, verbose and so forth. This may improve in 3.0/Morphia next gen but at that time we had to decide.
[14:52:05] <cheeser> i wrote critter to help with the maintainability but it's kinda going in the other direction.
[14:52:52] <_Heisenberg_> ron: I dont get it maybe you can help me, what does the two parameters of "repeat" mean in the http://www.json-generator.com/ thing?
[14:52:56] <remonvv> cheeser: And right now it's a 3year old layer that runs on production clusters that routinely host 100,000+ concurrent users so for us switching back to something else that we don't have to maintain is only an option if the functional gap is relatively limited and the robustness equal or better.
[14:53:04] <remonvv> cheeser: What's critter?
[14:53:29] <cheeser> https://github.com/evanchooly/critter
[14:54:06] <ron> _Heisenberg_: no idea, sorry.
[14:54:11] <cheeser> it's mildly experimental. i've used it in a couple of projects and it works well enough so far.
[14:54:50] <remonvv> cheeser: Another thing is that we had to add a ton of other features that were then (and sometimes still) not supported. For example, hash indexes and automatic hashing of shard key values, validation of cluster topology metadata versus our document POJOs (uses similar model to Morphia but with slightly different annotations and sometimes different behaviour).
[14:54:55] <remonvv> cheeser: Let me have a look.
[14:55:34] <remonvv> cheeser: Ah okay, codegen?
[14:55:42] <cheeser> i'm adding support for text indexing and queries and also aggregation in the next few weeks.
[14:55:47] <cheeser> yeah. codegen++
[14:55:54] <cheeser> i'm a sucker for that
[14:56:01] <remonvv> cheeser: Masochist.
[14:56:06] <cheeser> basically
[14:56:17] <remonvv> cheeser: If you can auto generate code it shouldn't be needed ;)
[14:56:25] <cheeser> hrm?
[14:56:46] <remonvv> cheeser: But yeah, we went the opposite route.
[14:57:47] <cheeser> i dislike the impossibility of validating queries
[14:58:03] <ron> ~interesting
[14:58:17] <cheeser> ~mock ron
[14:58:18] <cheeser> :D
[14:58:24] <ron> ;)
[14:58:39] <Nodex> _Heisenberg_ : the parameters mean to repeat it to an array
[14:58:46] <remonvv> cheeser: It's not impossible, but it's hard. And dislike it too. It just doesn't outweigh the benefits.
[15:00:34] <remonvv> cheeser: So do you work for 10gen now or are you a contributor to Morphia?
[15:00:56] <_Heisenberg_> Nodex: so why two parameters? o.O
[15:01:42] <Nodex> it's supposed to be min/max iirc but I don't think it works
[15:01:53] <cheeser> i work for 10gen, yeah.
[15:02:09] <cheeser> morphia, java driver, probably the hadoop connector.
[15:02:12] <Derick> oh hello then cheeser
[15:02:29] <remonvv> cheeser: Awesome. Good luck.
[15:02:41] <remonvv> cheeser: Especially with the driver code :)
[15:02:53] <cheeser> 3.x is looking nice from what I've seen of it.
[15:02:58] <ron> cheeser: Derick is the php guy. so it's kind of a dilemma. you have to be nice to him since he's a 10gen employee, but he does php.
[15:03:18] <Derick> ron: I really ought to ban you for this constant trolling :P
[15:03:19] <remonvv> cheeser: Okay, I'm only aware of the 2.x and that isn't nice.
[15:03:31] <ron> Derick: dude, you love me and you *know* it.
[15:03:36] <remonvv> Derick, put it up for a vote?
[15:03:38] <Derick> no, not really
[15:03:41] <ron> no, really.
[15:03:51] <cheeser> oh, you're in the UK, Derick ?
[15:03:58] <Derick> yes
[15:04:04] <cheeser> oh, that's too bad. :)
[15:04:11] <cheeser> hi, anyway. :)
[15:04:13] <Derick> it's pretty nice here
[15:04:33] <Nodex> we banned ron from entering the country so the UK is the only safe place left
[15:04:37] <cheeser> it's too bad you're not in the office to meet in person. i'd love to be in the UK for a while.
[15:05:09] <remonvv> So is Morphia going to be revamped or...evolved?
[15:05:14] <cheeser> yes
[15:05:15] <cheeser> :)
[15:05:48] <cheeser> i'm working the backlog of issues trying to get it cleaned up.
[15:05:56] <cheeser> there's ... a lot to do.
[15:06:03] <remonvv> I'm aware.
[15:06:36] <remonvv> And the driver is going to do mapping and whatnot?
[15:07:27] <cheeser> 3.x has encoders/decoders which do a lot of the work shuttling data that morphia's been doing. morphia will use/build on that functionality.
[15:08:46] <remonvv> Hm, suppose that's not a bad thing.
[15:09:25] <cheeser> i think it'll be good overall. i'm hoping much of morphia can be trimmed down.
[15:09:39] <Derick> cheeser: did you not see my PM? :-)
[15:10:21] <remonvv> Yeah as long as the driver itself doesn't get too feature-creepy. It should be a driver with a low level interface. People (or 10gen) can build separately packaged magic on top of that.
[15:10:35] <cheeser> Derick: sorry. i had a whitelist running. i need to update my config to not load it by default anymore. try again. :)
[15:10:45] <cheeser> remonvv: i think that's the plan.
[15:11:20] <remonvv> cheeser: Cool.
[15:23:51] <flaper87> Hey guys, Can you help me figuring out why this query is not using _id's index for sorting documents? http://bpaste.net/show/OX41rlImcg2eHZYe6V4c/
[15:25:08] <Derick> flaper87: MongoDB can only use one index per query
[15:25:25] <Derick> and it uses "active" already for the match, and it can't use that for the _id sort
[15:25:50] <flaper87> Derick: yup, I know that but, the _id is part of 'active'
[15:25:56] <cheeser> i was gonna suggest dropping the sort and seeing if it uses the index or not.
[15:25:57] <flaper87> shouldn't it use it ?
[15:26:07] <Derick> flaper87: yes, but it's after a range query on k
[15:26:17] <Derick> cheeser: it already uses the "active" index
[15:26:48] <cheeser> that's true. just noticed that cursor
[15:34:35] <remonvv> At some point someone should write a good blog about composite indexes and sorting and such.
[15:35:00] <Nodex> get on it remonvv
[15:35:04] <remonvv> Someone not me.
[15:35:08] <remonvv> Because effort.
[15:35:18] <remonvv> ;)
[15:35:33] <flaper87> Derick: (facepalm) right, the range! Thanks for pointing that out!
[15:36:09] <Derick> remonvv: http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/
[15:36:36] <remonvv> Derick: Actually never saw that, awesome.
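The gist of that post, as a sketch (collection and field names below are illustrative, not flaper87's actual schema): order the compound index as equality field, then sort field, then range field, so one index can serve both the match and the sort.

    // equality -> sort -> range ordering
    db.items.ensureIndex({ active: 1, _id: 1, k: 1 });
    db.items.find({ active: true, k: { $gt: 10, $lt: 100 } }).sort({ _id: 1 });
    // with an index like { active: 1, k: 1 } alone, the range on k forces
    // the _id sort to happen in memory instead of coming from the index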
[16:01:30] <ashley_w_> i'm trying to do a query where the timestamp is newer than another timestamp, but i get "$err" : "error on invocation of $where function:\nJS Error: ReferenceError: o is not defined nofile_b:0" http://pastebin.com/RaAf7XfT
[16:02:42] <Nodex> you cannot reference "this" unless you're map/reducing
[16:03:14] <Nodex> my mistake, it appears you can
[16:04:19] <Nodex> http://docs.mongodb.org/manual/reference/operator/where/ <-- look at the table on the bottom for supported commands to "this"
[16:10:11] <ashley_w_> i know this.foo works
[16:11:52] <Nodex> http://docs.mongodb.org/manual/reference/operator/where/ <-- look at the table on the bottom for supported commands to "this"
[16:13:38] <ashley_w_> yes
[16:14:03] <ashley_w_> this includes ObjectId(), which getTimestamp() is a part of
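For what it's worth, a minimal $where sketch along those lines (assuming getTimestamp() is available in the server-side JavaScript environment, as discussed above; $where runs JavaScript per document, so it is slow on anything large):

    db.events.find({ $where: "this._id.getTimestamp() > new Date('2013-07-01T00:00:00Z')" })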
[17:40:47] <rogerthealien> hi, i have a question and was hoping someone here could help... I'm getting "TypeError: if no direction is specified, key_or_list must be an instance of list" when trying to call collection.ensure_index({'attributes.id':1}) through pymongo, and the pymongo documentation doesn't seem to be very helpful on this
[17:42:48] <idank> I have two collections: companies and employees. When choosing between embedding employees or putting it in a separate collection with a company reference, how does one deal with high read throughput (embedding is superior) and thousands of employees per company, surpassing the 16MB limit per document (GridFS sounds like overkill)?
[17:48:03] <DanWilson> idank: I'd say put a reference to the companyID on the employee
[17:49:29] <idank> DanWilson: but that kills read performance when reading companies and getting their employees
[17:51:14] <ashley_w_> you can set the max nssize
[17:51:32] <idank> ashley_w_: what's that?
[17:51:44] <ashley_w_> er, nevermind me.
[17:52:25] <DanWilson> still, there is a limit to the document size
[17:52:47] <DanWilson> are you worried about the disk seeking? or the total amount of data brought back?
[17:52:54] <bcows> can you embed company in employee ?
[17:53:55] <idank> DanWilson: worried about seeking. I've done the reference thing once and it killed performance
[17:54:08] <DanWilson> are you properly indexed for your access pattern?
[17:54:23] <idank> bcows: how would that help?
[17:54:30] <idank> DanWilson: yeah
[17:55:06] <idank> DanWilson: when I started embedding, iowait wasn't as bad and reads/sec jumped
[17:55:40] <bcows> if each employee only has 1 company then it would be a smaller document then embedding all company employees in a company document
[17:56:00] <Nodex> do the employees change often?
[17:56:05] <DanWilson> do your indexes fit in memory?
[17:56:23] <idank> bcows: yeah but iterating the employees grouped by company will still involve a massive amount of seeks
[17:56:54] <idank> DanWilson: the employees collection has 30 million docs with an int _id, companies is around 3 million
[17:57:02] <idank> Nodex: no
[17:57:03] <Nodex> idank : I think the main issue you're missing is embedd what you need for 1 read and store the rest in a separate collection
[17:57:41] <ashley_w_> who are you? ADP?
[17:57:44] <idank> Nodex: for each company I need all its employees, what would go in a separate collection?
[17:58:15] <DanWilson> so, a quick bit of math, the average employees per company is 10?
[17:58:16] <Nodex> ashley_w_ : are you talking to me?
[17:58:33] <DanWilson> is that pushing you up to the document limit?
[17:58:57] <idank> DanWilson: that's misleading, a few thousand have more than 2000
[17:59:05] <idank> others have <10
[17:59:14] <Nodex> idank : I am talking about for a quick view..... for example if you only need the name of each employee on a result of a company then drill down further then embedd each name and a reference to the _id of said employee in the other collection
[17:59:49] <idank> Nodex: I'm not keeping a lot per employee, just a few fields
[17:59:53] <idank> and I do need all of them
[18:00:25] <Nodex> so store them in thier own collection and embedd them in the employer.employee array also
[18:00:39] <Nodex> (if you need to access them in one read)
[18:01:10] <idank> so I'm still embedding the same data?
[18:01:20] <Nodex> yup
[18:01:33] <Nodex> if you dont need the data until someone click and drills down then there is no point
[18:01:34] <idank> companies with 2000 employees will still exceed the 16MB limit
[18:02:15] <Nodex> 2000 array members of "not a lot of data" is not 16mb
[18:02:39] <idank> maybe it's 4000, I can't remember
[18:02:53] <Nodex> still won't be 16mb but i understand the point
[18:03:40] <Nodex> if you're constrained then you have no choice but to reference, in which case you should embed the _id of the employee and do an $in - which may or may not be faster
[18:04:09] <idank> that's what I used to do in a similar collection, it was terrible
[18:04:27] <idank> but there the documents referenced were bigger, that may have contributed
[18:04:46] <Nodex> you might have to chunk the query
[18:04:53] <ashley_w_> throw more hardware at it
[18:05:07] <Nodex> not really the answer ashe
[18:05:11] <Nodex> ashley_w_*
[18:05:20] <idank> ashley_w_: I'm already with 5 xlarge EC2 instances, RAID inside.. 1500 IOPS..
[18:05:47] <idank> cost of mongo per month is getting close to my income :)
[18:06:06] <idank> Nodex: what do you mean chunk?
[18:06:11] <Nodex> split the array
[18:06:19] <idank> yeah, that's an option
[18:06:24] <idank> but feels like a hack
[18:06:31] <Nodex> how many employers are there?
[18:06:37] <idank> 3 million
[18:06:37] <Nodex> and how many employees in total
[18:06:41] <idank> 30 million
[18:06:50] <Nodex> and how long is the query taking?
[18:06:58] <Nodex> sorry, I came back half way thru
[18:07:22] <idank> I haven't tried it yet, trying to think upfront so I don't repeat past mistakes
[18:07:55] <Nodex> well I suggest you use replica sets for the read and reference the _id's if you;'re worried about size
[18:08:16] <Nodex> * replica sets to give you read scalability
[18:08:42] <idank> I'm already sharded, not replicated yet though
[18:08:55] <Nodex> 3m docs is nothing, not sure how much power an xlarge instance is though
[18:09:06] <ashley_w_> hence my "throw more hardware at it" comment. vague, but i had no idea of current setup
[18:09:07] <peterjohnson> hey there, is this a good place to ask questions about mongo?
[18:09:18] <idank> it's not the 3m docs that was killing perf
[18:09:31] <idank> it was that for each doc in there, you had to seek 20 times
[18:09:57] <Nodex> idank : use $in
[18:10:06] <idank> I am
[18:10:07] <Nodex> dont foreach the employees array LOL
[18:10:13] <Nodex> that's not 20 seeks
[18:10:16] <Nodex> that's 1 :)
[18:10:29] <idank> why? each item in the array is a doc in a different collection
[18:10:35] <idank> they're not adjacent in the other collection
[18:10:47] <Nodex> it might be 20 disk seeks but not 20 queries
[18:10:54] <idank> sure, 1 query
[18:11:01] <idank> 20 seeks are more painful :)
[18:11:32] <peterjohnson> I'm trying to search a message database for any messages with body containing website links (starts with www. , http or contains .com) but for some reason when I try db.messages.find( { type : 0 , body : { $regex: '\.com', $options: 's' } } the period isn't included in the search, so any result with "com" comes back
[18:11:35] <Nodex> not a lot you can do about it tbh, you can't control when an employee is added and thus cannot control the seek
[18:12:07] <idank> well if they're embedded, there are no seeks w.r.t to employees
[18:12:21] <Nodex> your limit nulls that so it's not an option
[18:13:02] <idank> right, well splitting the array could work when embedding
[18:13:09] <idank> or compressing the employees array
[18:13:35] <ashley_w_> peterjohnson: try '\\.com'
[18:14:48] <Nodex> I didn't mean splitting when embedding and it still doesn't get you round your 16mb problem
[18:15:23] <peterjohnson> thanks ashley
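The escaping issue in one line: inside a normal string literal the backslash is consumed by the language before it ever reaches the regex engine, so either double it or use a regex literal (query shape taken from the question above):

    // double the backslash so the regex engine actually sees \.
    db.messages.find({ type: 0, body: { $regex: '\\.com', $options: 's' } })
    // or use a regex literal, where no string-escaping layer is involved
    db.messages.find({ type: 0, body: /\.com/ })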
[18:15:36] <Nodex> how many req/s do you think will kill your app ? is it CRM and if so how large scale is it?
[18:15:53] <idank> why not? I'll insert each company multiple times and concat the employee arrays
[18:16:19] <Nodex> LOL and you thought the other way was a hack?
[18:16:26] <idank> :)
[18:16:48] <idank> my access pattern is read all data for processing
[18:17:46] <idank> I never need a certain company
[18:17:54] <idank> and never a single employee
[18:18:11] <idank> it's always process a company with all of its employees
[18:18:30] <Nodex> here are some specs... our CRM has roughly 25M docs with perhaps 50M associated to the docs - contacts/notes/documents (pdf's etc) and I can fire well over 1000 req/s at it without it blinking
[18:18:50] <Nodex> * 50M docs associated
[18:19:02] <Nodex> I think you're probabl worrying to much
[18:19:06] <Nodex> probably*
[18:20:15] <idank> but I was in a similar position a month ago, only the two collections were for scraped websites
[18:20:30] <idank> one collection had a start url with an array of urls
[18:20:38] <idank> the other had a url and its html
[18:20:53] <idank> I used $in to get a website with all its html
[18:21:19] <idank> that thing had terrible performance, once I started embedding things worked much better
[18:24:22] <Nodex> in that state I wouldve made sure my pages were in order on the disc tbh
[18:24:34] <idank> how?
[18:24:40] <Nodex> i/e processed the scraping elsewhere and then inserted to mongo
[18:24:58] <Nodex> mongo adds a certain amount of padding per document incase it grows a little but not a lot
[18:25:33] <Nodex> I understand if you scrape again and find another page you missed that it will be at the end (fragmentation) but that's just the way it is
[18:26:06] <idank> you're talking about what will happen when embedding
[18:26:23] <Nodex> no, reerencing
[18:26:47] <idank> I don't follow
[18:26:49] <Nodex> scraping is a bad idea to embedd, you don't control the data or it's size so it's not predictable performance
[18:26:57] <Nodex> referencing *
[18:27:08] <idank> how do you make sure your docs are in order on disc?
[18:27:25] <Nodex> by inserting them in order
[18:27:44] <idank> oh, well that's hard
[18:27:48] <idank> everything's distributed
[18:28:07] <Nodex> bring it all back to a queue and insert from the queue
[18:28:12] <idank> and yeah, if new pages are found it's an issue
[18:29:03] <idank> yeah size is an issue but it's minor for me atm
[18:29:12] <idank> only a few reach 16MB when compressed
[18:30:26] <Nodex> unfortunately if even 1 does you can't embed
[18:30:33] <Nodex> brb
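For contrast, a sketch of the plain referencing layout discussed above (illustrative names; whether the extra seeks are acceptable mostly depends on the working set fitting in RAM):

    // one small document per employee, indexed by the owning company
    db.employees.ensureIndex({ companyId: 1 });
    db.employees.insert({ _id: 12345, companyId: 777, name: "..." });
    // one round trip per company and no 16MB ceiling, but the matching
    // employee documents may be scattered across the data files
    db.employees.find({ companyId: 777 });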
[18:42:13] <bluefoxxx> Can MongoDB replicate through a one-way firewall?
[18:42:47] <cheeser> doubtful
[18:43:15] <bluefoxxx> Bah.
[18:43:31] <bluefoxxx> I have a node on a secure network that I would like to connect to a replica set as a non-voting, hidden member. Ideally it would connect to the MongoDB replica set in the DMZ and keep a running replica of all the data.
[18:43:49] <bluefoxxx> the idea being that this one is on our secure network, and is pulling data for backup.
[18:44:40] <bluefoxxx> we have Symantec back-up software, so the client has to connect to the server rather than a back-up agent supplying a pull service (it does, but they have to have two-way connection initiation--client must connect to the server, server must open connections to the client)
[18:45:09] <remonvv> \o
[18:45:28] <slowest> I saw that there's group by and count/sum functions for mongodb. Will this work properly if you have a large data set spread on several servers? I guess it would be quite slow if it had to fetch from all servers? Even if it's properly indexed it would still have to talk to all servers?
[18:45:53] <cheeser> by spread, you mean sharded?
[18:46:16] <cheeser> because unless you're sharding, all of a collection can be found on each machine in the repl set
[18:46:36] <mfletcher> Hi, anyone here with experience in load balancing mongodb servers? I'd like to use a load balancer to balance connections between a bunch of mongodb servers that would be in a replica set. Are there any side-effects to load balancing all the nodes in the replica set including the primary node?
[18:47:11] <cheeser> you could just set your read preference to allow for secondaries.
[18:47:22] <jblack> What is this? replication&sharding hour?
[18:47:25] <cheeser> but i don't know that a load balancer would really buy you anything.
[18:47:33] <remonvv> mfletcher: Load balancing doesn't get you anything
[18:47:37] <remonvv> yeah, what he said
[18:48:17] <remonvv> slowest: The AF is sharding compatible but it's..hm...not all pipelines scale
[18:48:48] <mfletcher> I was just thinking from the point of view that if I have web servers pointed at the mongodb servers, and one of them goes down, or is out for maintenance, I'd still like that web server to serve traffic
[18:48:52] <remonvv> (AF = Aggregation Framework)
[18:49:06] <cheeser> mfletcher: use mongos
[18:49:17] <remonvv> mfletcher: Use local mongos processes, use appropriate read preferences to balance reads
[18:49:28] <remonvv> Again, what he said.
[18:49:31] <remonvv> I need to type faster.
[18:49:33] <mfletcher> ok I'll go read that up. Thanks again!
[18:49:48] <slowest> cheeser: oh, I thought it was sharded automatically, or either way I'll have to shard it because of size.
[18:49:50] <cheeser> remonvv: i would say I should probably just get to work but i suppose this is technically work. :)
[18:49:54] <slowest> cheeser: so yes
[18:50:05] <cheeser> shards are not implicit, no.
[18:50:13] <remonvv> mfletcher: What language? Java has a different implementation of read preferences than all the other drivers
[18:50:24] <cheeser> does it?
[18:50:27] <remonvv> secondaryPreferred is currently "broken"
[18:50:31] <remonvv> cheeser: Yeah.
[18:50:45] <cheeser> is there an issue open on that?
[18:50:53] <cheeser> https://jira.mongodb.org/browse/JAVA-889 ?
[18:50:56] <remonvv> cheeser: Yeah, I opened one a while back
[18:51:58] <cheeser> have that number handy?
[18:52:00] <remonvv> cheeser: No not that one. It's a two-part. I think currently read secondary is conceptually broken in mongos due to a weird sticky connection rule and there's a special Java only implementation of that read preference that makes it different for that environment.
[18:52:03] <remonvv> cheeser: Looking
[18:52:17] <cheeser> ok. i should probably deal with 889 at some point.
[18:52:44] <mfletcher> We're using python with pymongo. I'm not familiar with the implementation, one of my devs tells me he just maintains a list of the mongo servers in a config file, but I figured there might be a better way to do it, since we'd be scaling up and down the number of mongo servers in the farm
[18:53:18] <remonvv> cheeser: https://jira.mongodb.org/browse/SERVER-9788, the Java note is Randolph's comment "Actually, the Java driver is a special case where the pinning behavior must be explicitly requested by the user."
[18:54:33] <remonvv> cheeser: I don't recognize 889. I've done tests with that setup and it works fine.
[18:54:50] <remonvv> cheeser: The issue there is most likely the "closest first" system in the current driver
[18:55:30] <cheeser> ok, thanks.
[18:56:38] <remonvv> cheeser: Not sure if they/you are redoing the io/network thing but another weird issue is that you can improve throughput by having multiple Mongo(Client) instances. Seems to be a locking issue somewhere.
[18:58:27] <remonvv> mfletcher: Is the setup actually a repset?
[19:00:06] <remonvv> mfletcher: Either way, the better way to do it is put mongos in front of it and let the config servers deal with cluster topology data. Your app should only have a configurable endpoint for mongos. If you want to connect directly to a repset you will need all mongod addresses.
[19:00:18] <remonvv> mfletcher: And for that reason it's not very common anymore afaik.
[19:11:29] <mfletcher> Thanks remonvv, I'll pass this on.
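One concrete piece of the advice above, sketched from the shell (drivers expose the same read preference modes): reads can be spread over the set without a separate load balancer.

    db.getMongo().setReadPref("secondaryPreferred");   // route reads to secondaries when available
    db.events.find();                                   // hypothetical collection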
[19:12:27] <remonvv> mfletcher: yw, the only downside to mongos is that it is relatively cpu hungry so it will affect net throughput of your app per box slightly if it's pressed
[19:12:34] <remonvv> but then that should never really happen.
[19:12:47] <remonvv> (downside of a LOCAL mongos)
[19:32:19] <Ontological> I'm having a hard time using simple regex on pymongo. Here is an example of my problem: http://pastie.org/private/1rzmozkankutqc7thbcjng
[19:34:37] <remonvv> Ontological: You need to use $regex to tell MongoDB it has to match a regex
[19:35:16] <remonvv> Ontological: Also, if at all possible try to reproduce your problem in the shell first
[19:36:17] <Ontological> Well, the shell does it correctly because the shell doesn't encapsulate ''. That's my problem, is that it's being treated as a string
[19:36:23] <Ontological> I'll just force $regex, thanks
[19:56:18] <konr`> So, dear folks, is there a utility to import data from an .sql file to a collection in mongo?
[19:57:24] <jblack> konr`: Not that I'm aware of, but you should be able to use a csv gem and mongo to do it easily enough.
[19:58:00] <jblack> It'll end up being maybe 20 lines of code, including the "db" setups
[19:58:22] <jblack> the actual select/insert will be about 3 lines of code.
[19:58:59] <jblack> http://ruby-doc.org/stdlib-1.9.2/libdoc/csv/rdoc/CSV.html : CSV.foreach("path/to/file.csv") do |row| Mongodb insert row; end
[20:00:45] <pwelch> anyone seen this error when trying to connect to a replicaset with ruby driver: Mongo::ConnectionFailure: Failed to connect to any given member.
[20:02:16] <ranman> pwelch: all the members in the seed list are down?
[20:02:56] <pwelch> ranman: can you expand on that? If I use the MongoClient.new it connects to a single instance fine
[20:03:54] <ranman> pwelch: try with a MongoReplicaSetClient?
[20:04:05] <ranman> https://github.com/mongodb/mongo-ruby-driver/wiki/Replica-Sets
[20:08:40] <ranman> pwelch: fixed?
[20:09:03] <pwelch> ranman: reading over it. going to get a co-worker to get a second pair of eyes to see if it's PEBKAC
[20:09:21] <ranman> ok, :P, let me know
[20:36:24] <cheeser> i just pushed morphia 0.102 for those interested. https://groups.google.com/forum/#!topic/morphia/TNrA-y5RS8g
[20:38:43] <remonvv> cheeser: You just singlehandedly increased project activity by 13,931% ;)
[20:39:47] <cheeser> :)
[20:40:10] <cheeser> just fixed inline mapreduce support. about to start on aggregation.
[20:40:11] <caitp> okay, this time filtering thing is just not working :\
[20:40:36] <caitp> there has to be some way to do this
[20:40:56] <remonvv> caitp: Some way to do what?
[20:41:33] <remonvv> cheeser: Ugh. I still haven't figured out a good API for the AF in Java
[20:41:38] <caitp> for a document with a start date and end date, I need to be able to query for documents where the start date has a time which starts at a certain hour
[20:41:41] <caitp> but ignore the date components
[20:42:59] <remonvv> caitp: Ignoring $where which you shouldn't use; that's not really possible. The standard way of doing that sort of thing is to store the exact data you want to query on. In this case if you update the start date also write a field for the start data hour of the day.
[20:43:08] <remonvv> And query (and index) on that.
[20:43:41] <caitp> I am currently storing auxiliary fields which are UTCHours() * 3600 + UTCMinutes() * 60
[20:43:44] <caitp> but I mean
[20:43:46] <caitp> it doesn't really work
[20:43:52] <remonvv> How so?
[20:44:13] <remonvv> I'm assuming UTCHours() * 3600 is a typo btw
[20:44:14] <caitp> because if I need a query like "startTime >= 6:00pm", the UTC seconds is like 0
[20:44:49] <caitp> using local seconds is the same sort of problem, I don't get reasonable values
[20:44:54] <remonvv> startTime > 18 you mean
[20:45:04] <remonvv> If you just query on hours then store hours
[20:45:27] <caitp> it's not a query on hours, it's a query on time
[20:45:35] <caitp> so, hours and minutes, ignoring seconds and millis
[20:45:59] <ranman_away> cheeser I sit behind you
[20:46:17] <remonvv> ranman_away: creep mode engaged
[20:46:29] <caitp> the aux field thing doesn't really work, because I don't get reasonable values from dates
[20:46:35] <caitp> like, midnight should really have a value of 0
[20:46:39] <caitp> but it in fact doesn't
[20:46:45] <remonvv> caitp: According to what?
[20:46:56] <ron> remonvv: dude, it's late for you. why are you on irc? you have a life, not like me.
[20:46:59] <caitp> what do you mean by according to what?
[20:47:05] <remonvv> ron: Working on hobby projects
[20:47:12] <ron> eew
[20:47:13] <cheeser> ranman: :)
[20:47:16] <ron> dude, you're not a geek.
[20:47:27] <ranman> quick look
[20:47:30] <remonvv> caitp: What makes you unable to get 0 for midnight?
[20:47:45] <caitp> even for local dates, Date.getHours() does not return 0 at midnight
[20:47:51] <caitp> it returns 0 at 1am, for whatever reason
[20:48:11] <caitp> and then converting to UTC to try and get around the timezone issue screws it up a fair bit more
[20:48:11] <remonvv> caitp: Java?
[20:48:19] <caitp> javascript, yeah
[20:48:28] <remonvv> so, no
[20:48:53] <caitp> basically, converting the times to an integer does not meet the requirements
[20:49:07] <remonvv> And your times have to be UTC?
[20:49:13] <remonvv> Or local?
[20:49:34] <remonvv> I'm going to assume you're not understanding your timezones here ;)
[20:50:02] <caitp> uh
[20:50:14] <caitp> if I'm not understanding timezones, please explain
[20:51:47] <remonvv> caitp: You're doing everything in UTC no?
[20:52:13] <cheeser> (if not, you should be)
[20:52:23] <remonvv> yeah just making sure
[20:52:32] <remonvv> everything is UTC until you show a date to an end user
[20:52:52] <caitp> I'm not sure why this doesn't make sense to you, I think I'm communicating clearly :l
[20:53:24] <remonvv> caitp: I'm not disagreeing. But if you're getting 1am UTC out of a simple Date.getTime() manipulation for a UTC Date that's midnight something's going wrong no?
[20:53:46] <caitp> so saying that demonstrates that you don't understand anything I've said :[
[20:54:25] <remonvv> caitp: That's certainly one of the possibilities.
[20:54:47] <cheeser> so try again using different words :)
[20:55:37] <remonvv> caitp: secondsInDay = new Date("2013-01-01 00:00:00Z").getTime() % (24 * 60 * 60 * 1000) / 1000
[20:55:50] <caitp> if I say "var date = new Date(); console.log(date.getHours())" at 1am, it will return '0'
[20:55:55] <caitp> but if I do that at midnight, it will not
[20:56:00] <caitp> even though, according to the docs, it is supposed to
[20:56:12] <caitp> so, something is completely screwed up there
[20:56:16] <caitp> and I'm not sure why
[20:56:27] <caitp> but ignoring that, I can't really use integers for this anyways
[20:56:30] <caitp> because it just doesn't work
[20:57:21] <caitp> the strategy of storing times as an integer of seconds since midnight in the database, in UTC time, simply is not working
[20:57:27] <caitp> so there has to be a better way
[20:58:11] <caitp> because otherwise, I have to tell someone that a feature they want is not feasible
[20:58:32] <caitp> even though it really should not be a difficult problem to solve
[20:58:55] <remonvv> caitp: Well, if you scroll up it's sort of solved.
[20:59:08] <caitp> what's solved?
[20:59:09] <remonvv> caitp: Replace your broken getHours() code with a getTime() based version.
[20:59:22] <caitp> getTime doesn't do what is necessary?
[20:59:29] <caitp> getTime == seconds since epoch
[21:01:58] <caitp> what I need is literally just hours/minutes, I can't take the date into account. eg, "Show me the events that happen after midnight", not "Show me the events that happen after midnight of august 1st"
[21:02:21] <caitp> but of course, "after midnight" doesn't really give you anything meaningful, that would give you everything
[21:02:28] <caitp> so there's really no good way to do this
[21:02:31] <caitp> time is stupid
[21:02:32] <caitp> :[
[21:06:52] <remonvv> caitp: getTime() returns the number of ms since epoch for the given Date. If you strip the date and millisecond components you will have a seconds in day value.
[21:07:35] <caitp> this was the first strategy that I tried
[21:07:42] <caitp> aside from not dealing with the timezone issue, it doesn't work anyways
[21:11:32] <remonvv> caitp: Dude. There is no timezone issue. It's UTC. And it does work.
[21:12:01] <caitp> it in fact does not work
[21:12:26] <remonvv> Show your code that goes from a Date instance d to seconds-in-day
[21:13:02] <remonvv> Print current UTC date, then calc those seconds, convert them to hours and minute and show me that's a different value than current UTC hours:minutes
[21:13:36] <caitp> maybe you don't understand what I mean by "doesn't work"
[21:15:07] <remonvv> Possibly. Do share.
[21:15:16] <caitp> so, using a UTC timestamp isn't going to be helpful here, because UTC midnight is 0, and >= 0 is going to return everything, and <= 0 is going to return almost nothing -- Local 6:00 pm might be UTC midnight, for instance
[21:15:28] <caitp> it's kind of not helpful
[21:16:01] <caitp> unless I pad it with some arbitrary amount
[21:17:12] <caitp> which is problematic in other ways
[21:20:51] <remonvv> Right, so separate the date component from the hours (or seconds, whatever) in your doc, both UTC. Do timezone conversion to UTC, you'll get one or two candidate matches (two if your range runs across two UTC days even if it's a single TZ local day) and combine app-side as needed.
[21:21:33] <remonvv> It just means you have to query for two documents sometimes if your range intersects UTC midnight. It's not exactly rocket science.
[21:22:09] <caitp> it's kinda rocket science
[21:23:32] <remonvv> Not really. Our stuff (amongst other things) tracks daily and hourly leaderboards for gaming services that do roughly that.
[21:23:34] <caitp> "You'll get one or two candidate matches (two if your range runs across two UTC days even if it's asingle TZ local day)" -- I don't follow your logic, one or two candidate matches for what?
[21:24:12] <remonvv> Was your original problem that you need to finds documents that match an certain "hours of the day" range?
[21:24:37] <remonvv> e.g. "Get all foos that were created between start time and end time of day T in timezone Z"
[21:24:43] <remonvv> time as in time of day
[21:25:38] <caitp> the service is an event management thing, so events have startdates/times and enddates/times -- the UI is supposed to enable you to search for dates/times starting at or after X or ending at or before X
[21:26:01] <remonvv> yes but X as in absolute date or X as in time of the day (so date component removed)
[21:26:07] <caitp> dates is easy, date+time is easy, time alone is significantly less easy
[21:26:25] <remonvv> right
[21:26:25] <remonvv> so
[21:28:02] <remonvv> Say you store your times in your event as {utcDateInDays:221323, utcTimeOfDayInSeconds:8731}
[21:28:26] <remonvv> That format would make it super easy for a UTC based range check but you have timezones
[21:28:46] <remonvv> so in cases where start.getUTCDay() != end.getUTCDay() you need to create two seperate queries
[21:29:28] <remonvv> One that grabs the events from your local TZ start to UTC midnight and one from UTC midnight to your TZ end (your TZ values converted to UTC)
[21:30:26] <remonvv> Just remove the TZ conversion from the problem and create code that allows range queries on UTC start to end where end - start < 24
[21:30:29] <remonvv> <=
[21:33:41] <remonvv> So say your local TZ range is from 15:00 to 23:00; that might convert to, say, UTC 19:00 to 03:00 (next day). Then you'll grab all events that are 19:00-00:00 on utcDateInDays = X and all events 00:00-03:00 on utcDateInDays = X + 1
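A mongo-shell sketch of that worked example, using the utcDateInDays / utcTimeOfDayInSeconds fields from the format above (the events collection name and the day number are assumptions; local 15:00-23:00 at UTC-4 becomes UTC 19:00 on day X through 03:00 on day X + 1):

    var X = 221323;  // some UTC day number, purely illustrative
    db.events.find({ $or: [
        { utcDateInDays: X,     utcTimeOfDayInSeconds: { $gte: 19 * 3600, $lt: 24 * 3600 } },
        { utcDateInDays: X + 1, utcTimeOfDayInSeconds: { $gte: 0,         $lt:  3 * 3600 } }
    ] });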
[21:34:25] <caitp> but
[21:34:26] <remonvv> or if you need all of them rather than for a specific day ignore utcDateInDays altogether
[21:34:29] <caitp> and here's the thing
[21:34:44] <caitp> this makes date a factor, when it can not be a factor
[21:35:19] <remonvv> So your UI shows all events between 01:00 and 02:00 for all days in the past or something?
[21:35:38] <remonvv> date is only a factor if you need a date range
[21:35:51] <remonvv> If not just remove the utcDateInDays condition.
[21:36:23] <caitp> filtering by _just_ time cannot use date as a factor (so I can't say, check >= utcTodayX AND <= utcTomorrowY)
[21:36:53] <remonvv> Okay, but then you don't.
[21:36:53] <remonvv> The above doesn't break if you don't filter on days
[21:36:54] <remonvv> It's just that if you do have to be date aware you need the +1 stuff
[21:36:54] <caitp> but it does break if utcTodayX == 0
[21:37:18] <caitp> >= 0 -- return everything, <= 0 -- return nothing
[21:37:21] <caitp> or almost nothing
[21:37:54] <remonvv> You never have >=0
[21:38:13] <remonvv> The range queries would be $gt on the left side and $lt on the right side
[21:38:22] <remonvv> inclusive and exclusive respectively
[21:38:33] <remonvv> >=0 unbounded is noop-ish
[21:39:05] <caitp> >= 64800 is not noop-ish
[21:39:14] <caitp> but it might look like >= 0 (or near 0) in UTC
[21:39:39] <remonvv> yes but it's not going to be unbounded is it.
[21:39:56] <caitp> well it's pretty unbounded if the UTC time is zero-ish :[
[21:40:48] <remonvv> it's never unbounded if you have a start and an end.
[21:41:10] <caitp> if you don't have an end time, you don't have an end time
[21:41:13] <remonvv> If time T in your local time converts to UTC = 0 then you're assured the end of the range is <24
[21:41:24] <caitp> you may or may not have an end time
[21:41:25] <caitp> or start time
[21:41:39] <remonvv> In what scenario would you not have an end time if your requirement is "of the day"?
[21:41:51] <remonvv> if you say give me all events after 3pm your upper bound is 12am
[21:41:53] <caitp> Here's how it's supposed to work
[21:42:04] <caitp> "Show me events that happen after X:45 pm"
[21:42:09] <remonvv> Right.
[21:42:15] <caitp> or "Show me events that end before X:30 am"
[21:42:16] <remonvv> in a local timezone I assume.
[21:42:23] <caitp> yes, the user would see it in local time
[21:43:17] <caitp> so there may be a start time, or an end time, or both, or neither
[21:43:58] <remonvv> Okay, so in that last case.
[21:44:43] <remonvv> Wait, X as in I want events that ended after the start of the hour and before the half hour mark?
[21:45:06] <caitp> X as in "pick a number between 0 and 23, doesn't matter"
[21:45:10] <remonvv> Oh right
[21:45:25] <remonvv> Okay, let's just make X = 2
[21:45:27] <remonvv> Easier to talk
[21:46:17] <remonvv> So "after 02:30" = "between 02:30 and 00:00"
[21:46:37] <remonvv> bounded
[21:46:49] <caitp> yes
[21:46:51] <remonvv> say GMT-2
[21:47:48] <remonvv> so in UTC "between 04:30 - 02:00(+1)" = "between 04:30 - 00:00" OR "between 00:00 - 02:00"
[21:48:47] <caitp> but doesn't that still sort of catch all?
[21:48:52] <remonvv> No.
[21:49:04] <caitp> I'm not sure I see how
[21:49:09] <remonvv> Your original range is 21.5 hours
[21:49:14] <remonvv> This combined range is also 21.5 hours
[21:49:22] <caitp> hm
[21:49:40] <remonvv> You're simply shifting the window, and you have the complication of UTC midnight as a range separator
[21:49:50] <remonvv> That complication you have to catch in your query code.
[21:49:53] <remonvv> Hence the "two results"
[21:50:34] <remonvv> so if your UTC-converted range is still within the same UTC day you can simply TZ convert; if it spans two UTC days you have to use an $or query to grab both ranges.
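For the time-of-day-only version of the "after 02:30 local at GMT-2" example above (ignoring utcDateInDays), the $or might look roughly like this; the events collection and a timeOfDayInMinutes field holding minutes since UTC midnight are assumptions:

    // local 02:30-24:00 shifts to UTC 04:30-02:00(+1), i.e. two sub-ranges around UTC midnight
    db.events.find({ $or: [
        { timeOfDayInMinutes: { $gte: 4 * 60 + 30, $lte: 23 * 60 + 59 } },  // 04:30-23:59 UTC
        { timeOfDayInMinutes: { $gte: 0,           $lt:  2 * 60 } }         // 00:00-02:00 UTC
    ] });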
[21:51:11] <remonvv> If you need date ranging it's slightly more complicated but still doable.
[21:52:19] <remonvv> caitp : Test it in a spreadsheet or something if you can't wrap your head around it.
[21:52:30] <caitp> I think I get what you're saying
[21:52:40] <remonvv> I feel a "but" coming
[21:52:40] <caitp> maybe
[21:52:45] <caitp> close :p
[21:53:45] <remonvv> What do you think is the problem?
[21:54:09] <caitp> Just trying to wrap head around the queries I need to write, so
[21:54:55] <caitp> in a case where I am only specifying a start time, I need to say "show me events that start between <UTC startTime> and UTC midnight"
[21:55:12] <remonvv> well, the only small footnote is that the right side of your range has to be *inclusive* on the highest possible value (23 if you do hours) rather than the 00:00 I mentioned to clarify the approach
[21:55:34] <remonvv> right, both UTC start and UTC midnight in your time unit, hours I think
[21:55:41] <caitp> so like, between 14:45 and 23:59
[21:55:50] <remonvv> Right
[21:56:07] <remonvv> All events between 23:00 and 00:00 would fall in hour 23
[21:56:10] <remonvv> so that's accurate.
[21:56:31] <remonvv> Sounds like you actually want ranges on minutes now, but that's the same logic
[21:57:15] <remonvv> inclusive
[21:57:46] <remonvv> so 14:45 <= event.timeOfDayInMinutes <= 23:59
[21:58:20] <remonvv> 14:45 -> 14 * 60 + 45
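As a tiny illustration of that conversion (the helper name is made up):

    function toMinutesOfDay(hours, minutes) {
        return hours * 60 + minutes;
    }
    print(toMinutesOfDay(14, 45));  // 885
    print(toMinutesOfDay(23, 59));  // 1439, the inclusive upper bound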
[21:59:49] <remonvv> caitp: Also, minor sidetrack but the AF has some date functionality as well
[22:00:03] <caitp> the AF?
[22:00:09] <remonvv> aggregation framework
[22:00:12] <caitp> ohhhh
[22:00:12] <caitp> right
[22:00:21] <caitp> yeah, I looked into using that
[22:00:25] <remonvv> Don't think it's a good match here but you can check it out
[22:01:10] <caitp> yeah it didn't seem like it was going to make things simpler :( although this looks like it could turn into a potentially complicated bad query too
[22:01:22] <remonvv> Not really
[22:02:26] <remonvv> easy case is {end:{$gte:<S>, $lte:<E>}} and worst case is $or:[{end:{$gte:<S>, $lte:"23:59"}}, {end:{$gte:"00:00", $lte:<E>}}]
[22:02:45] <remonvv> S and E values adjusted accordingly
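Spelled out with illustrative numbers (the events collection and an end field in minutes since UTC midnight are assumptions), the two shapes look roughly like:

    // easy case: the UTC-converted range stays inside one day, e.g. 14:45-20:00
    db.events.find({ end: { $gte: 14 * 60 + 45, $lte: 20 * 60 } });

    // worst case: the range crosses UTC midnight, e.g. 22:00-03:00, so the two pieces are OR'd
    db.events.find({ $or: [
        { end: { $gte: 22 * 60, $lte: 23 * 60 + 59 } },
        { end: { $gte: 0,       $lte: 3 * 60 } }
    ] });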
[22:03:14] <remonvv> I'd use a bit of query composition code to make that code readable.
[22:03:17] <remonvv> readible?
[22:03:18] <remonvv> readable
[22:03:30] <remonvv> English is a silly language :s
[22:03:50] <caitp> well eh
[22:03:59] <caitp> I'll give an example of my interpretation of what you're saying
[22:04:01] <caitp> {$and: [{$gt: {'startTime': startTime}}, {$lte: {'startTime': utcMidnightMinus1}}]},
[22:04:08] <remonvv> No
[22:04:14] <remonvv> AND is implied
[22:04:20] <remonvv> you don't need the separate blocks here
[22:04:27] <remonvv> {$gt:x, $lt:y} is ok
[22:04:33] <caitp> hm, ok
[22:04:46] <caitp> thats a good point
[22:05:07] <Chammmmm> Hi guys, I keep getting this error message: balancer move failed: { ok: 0.0, errmsg: "" } ... I got this after adding a shard.. everything was going smoothly and now it cannot move chunks.. and I get an empty error message.. (running mongo 2.4.2) I tried to restart all the mongos.. and restart the config servers.. but nothing
[22:05:10] <Chammmmm> any ideas?
[22:05:25] <caitp> so {$gt: {'startTime': startTime}, $lte: {'startTime': utcMidnightMinus1}},
[22:05:26] <caitp> heh :p
[22:05:29] <remonvv> Without the closing
[22:05:37] <remonvv> and $gt != $gte
[22:05:48] <caitp> yeah, true
[22:05:57] <remonvv> format is {v:{$gte:<start>, $lte:<end>}}
[22:06:34] <remonvv> Not the other way around
[22:07:30] <remonvv> Your way I think executes but does nothing ;)
[22:07:44] <remonvv> v:{$gt instead of $gt:{v is what I'm trying to get to
[22:08:35] <remonvv> Don't ask me why mongo doesn't error out on the reverse
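In other words (the field name and values are illustrative), the field wraps the operators, not the other way around:

    db.events.find({ startTime: { $gte: 885, $lte: 1439 } });  // correct: field: { $op: value }
    // db.events.find({ $gte: { startTime: 885 } });           // flipped form: runs, but matches nothing useful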
[22:09:10] <remonvv> caitp: I have to go and get some sleep. Hope you're on the right track.
[22:09:27] <caitp> constructing the query looks like this http://pastebin.mozilla.org/2739992 -- whether it will work or not, i'll have to see
[22:09:51] <remonvv> that wont work, see comments above
[22:09:58] <remonvv> you have your operator and your value flipped
[22:10:14] <caitp> ah okay
[22:10:38] <remonvv> And that code isn't right
[22:11:19] <remonvv> It's a bit more complicated than that.
[22:11:38] <remonvv> Remember that even with just an endTime you can have two separate clauses in your $or
[22:12:11] <remonvv> What you do is this :
[22:13:17] <remonvv> actualStartTime = startTime == null ? 00:00 : startTime; actualEndTime = endTime == null ? 23:59 : endTime
[22:13:22] <remonvv> then convert both to UTC
[22:13:40] <remonvv> see if there's a UTC midnight in between; if not, single clause, if yes, $or the two ranges that result
[22:15:13] <remonvv> validation steps are actualStartTime < actualEndTime
[22:15:36] <remonvv> midnight cross check = actualStartTime.toUTC().getDay() != actualEndTime.toUTC().getDay()
[22:15:47] <remonvv> find language equiv of toUTC() and getDay() or make them
[22:16:56] <caitp> er, so with time there's no meaningful concept of day. I think it's just "if endTime < startTime -- assume midnight crossing"
[22:17:15] <remonvv> Ah, sorry, yep
[22:17:21] <remonvv> Assuming range is max 24h
[22:18:09] <remonvv> and then if(midnightCrossed) doOrStuffWeDiscussed else simpleRangeQuery
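A plain-JavaScript sketch of that whole recipe (buildTimeQuery, the startTime field name, and the toUtcMinutes stand-in for the app's existing timezone conversion are all assumptions); it returns a filter you could pass to db.events.find():

    function buildTimeQuery(startTime, endTime, toUtcMinutes) {
        var actualStart = startTime == null ? 0 : startTime;           // default 00:00
        var actualEnd   = endTime   == null ? 23 * 60 + 59 : endTime;  // default 23:59
        var s = toUtcMinutes(actualStart);
        var e = toUtcMinutes(actualEnd);
        if (s <= e) {
            // no UTC midnight in between: single range
            return { startTime: { $gte: s, $lte: e } };
        }
        // end < start after conversion means the range crosses UTC midnight
        return { $or: [
            { startTime: { $gte: s, $lte: 23 * 60 + 59 } },
            { startTime: { $gte: 0, $lte: e } }
        ] };
    }

    // e.g. "starts at or after 14:45 local" at UTC-5:
    // buildTimeQuery(14 * 60 + 45, null, function (m) { return (m + 5 * 60) % (24 * 60); })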
[22:18:20] <remonvv> That's about all I have time for.
[22:18:21] <remonvv> Good luck.
[22:18:28] <caitp> ciao, thanks
[22:18:36] <remonvv> No problem.
[22:18:50] <remonvv> It works, we use it.
[22:19:19] <remonvv> (although at some point we converted leaderboards to have timezone metadata because we rarely had cross TZ leaderboards and such ;) )
[22:19:30] <remonvv> night
[22:19:33] <caitp> nite