#mongodb logs for Friday the 22nd of November, 2013

[00:00:00] <retran> i would treat aggregation as a separate issue
[00:00:06] <retran> (counting is a type of aggregation)
[00:00:16] <retran> why is the number of comments important
[00:00:34] <pzuraq> Want to display it in search results
[00:00:37] <retran> it's not really important for pagination
[00:00:47] <retran> why is it important to display precise number of results
[00:00:53] <retran> in your use case?
[00:01:18] <pzuraq> hmm, actually this is going to get even more complicated...
[00:01:26] <retran> it probably isn't
[00:01:29] <pzuraq> it's just a UI thing, display the number of comments
[00:01:41] <retran> why is it important to tell the user the exact number of comments
[00:01:44] <pzuraq> no, just realized I need to display the number of unread comments as well
[00:02:01] <pzuraq> which will be different per-user
[00:02:17] <retran> counting is expensive, you should justify the need
[00:02:22] <qsd> unread comments =~ notifications
[00:03:36] <pzuraq> qsd: They would see the number of unread for each issue... having a separate model may work
[00:03:39] <retran> i've removed displaying exact number of (fill in blank) in many UIs, and no complaints from users
[00:04:18] <retran> i've concluded that it's probably rarely important
[00:04:36] <pzuraq> retran: Good point, I could just display "new comments" or some such
[00:06:24] <retran> business reporting is one place that exact counts seem to be important, i guess. but that's kind of a different use case than our discussion
[00:06:52] <retran> but those can be done as batch processes, not making a user wait real-time
[00:07:29] <pzuraq> true. I was mostly modeling the app off github issues since it's similar to what I want to accomplish.
[00:07:56] <pzuraq> and they display a comment count
[00:08:05] <pzuraq> so that's where it came from I suppose
[00:08:17] <retran> i'm guessing it's not a real-time count, but aggregated in batch
[00:09:01] <pzuraq> so meaning you would have a comment_count property on the Issue model
[00:09:12] <arek_deepinit> is there a way to build the shared library without building the whole source? :(
[00:09:12] <retran> .. yeah something like that
[00:09:35] <retran> it could even be accurate within last 5 min or something
[00:09:47] <pzuraq> it's instant, just checked
[00:09:49] <retran> just know counting can get expensive
[00:10:06] <retran> ok
[00:10:11] <retran> they probably aggregate it on the fly then
[00:10:16] <retran> like you say with a field
[00:10:26] <retran> a +1
[00:10:31] <retran> once a user adds a comment
[00:10:42] <retran> instead of counting each time how many comments fit X criteria
[00:10:54] <pzuraq> mm
[00:11:09] <retran> mongo has a +1
[00:11:52] <pzuraq> I still would like to show the number of unread
[00:12:03] <retran> you could keep that aggregated too, similarly
[00:12:19] <pzuraq> but that is different per user
[00:12:24] <pzuraq> hmm
[00:12:31] <pzuraq> ah I see how I could do that
[00:12:35] <pzuraq> cool :)
[00:13:02] <arek_deepinit> does anyone use a shared client library?
[00:13:24] <pzuraq> ok last question, would it be faster to select a collection of records by their ids or by the value of a field?
[00:14:01] <retran> $inc: {unreadCount: 1}
[00:14:08] <retran> (in your update)
[00:14:13] <pzuraq> yeah
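A minimal mongo shell sketch of the counter approach retran is describing; the issues and userIssueState collections and all field names are hypothetical:

    var issueId = ObjectId(), watcherId = "alice";   // stand-ins for real values

    // when a comment lands, bump the denormalized total on the issue doc
    db.issues.update({_id: issueId}, {$inc: {commentCount: 1}})

    // per-user unread counts live in their own small docs; the upsert
    // creates the doc the first time a user/issue pair is touched
    db.userIssueState.update(
        {userId: watcherId, issueId: issueId},
        {$inc: {unreadCount: 1}},
        {upsert: true}
    )

    // when that user reads the issue, zero their counter
    db.userIssueState.update(
        {userId: watcherId, issueId: issueId},
        {$set: {unreadCount: 0}}
    )

Each update touches a single document, so no real-time counting query is ever needed.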
[00:14:32] <retran> depends if that value is indexed or not
[00:14:41] <retran> the id you know is going to always be indexed
[00:15:02] <retran> but if you run ensureIndex on the criteria of the value, should be similar speed
[00:15:37] <pzuraq> mmk
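A quick shell illustration of retran's answer, with a hypothetical records collection and status field:

    // _id lookups always use the automatic _id index
    var ids = [ObjectId(), ObjectId()];   // stand-ins for real ids
    db.records.find({_id: {$in: ids}})

    // a lookup on another field scans unless that field is indexed
    db.records.ensureIndex({status: 1})
    db.records.find({status: "open"})     // now comparable to the _id path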
[00:23:02] <qsd> hm.. the java driver has a findAndModify method and an update one, weird
[00:30:49] <qsd> oh well, the 1st is atomic
[00:31:53] <qsd> guys, maybe it's stupid, but is there a way to insert with a condition? couldn't find it in the docs, or do I simply do the find before
[00:33:03] <qsd> just count() actually
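qsd's "insert with a condition" maps onto an upsert, which atomically inserts only when nothing matches the query. A sketch with a hypothetical jobs collection (note $setOnInsert requires MongoDB 2.4+):

    db.jobs.update(
        {name: "nightly-report"},                 // the condition
        {$setOnInsert: {created: new Date()}},    // applied only if this becomes an insert
        {upsert: true}
    )

If a matching doc already exists the update is a no-op; otherwise a doc is inserted containing the equality fields from the query plus the $setOnInsert fields.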
[01:19:21] <silne30> When I create a mongodb document, there is a default record that doesn't contain any of the fields I have in the collection. Can I drop it?
[01:25:27] <retran> there's no such thing, silne30
[01:25:37] <retran> no such thing as a "default record"
[01:49:16] <Fieldy> hello, i'm running 2.4.6. I changed my dbpath and now I get a permissions error. I've gone as far as to chmod 777 every directory down the path, including the one dbpath is in, and I still get the error. the directory is owned by the user and in the group for mongo that my distro (fedora 19) set up. i'd post the log line but it's an offline system.
[01:49:37] <Fieldy> i've searched and found people with the same issue but haven't found a resolution. have you run into this?
[01:49:52] <harttho> Have you tried chown'ing it?
[01:50:55] <Fieldy> harttho: yes, it's owned by the user the distro set up for mongo, and in the group as well
[01:51:11] <Fieldy> i also disabled selinux and tried again to make sure that wasn't it, no change
[01:51:44] <Fieldy> the error specifically says it doesn't have permissions to the exact dbpath dir i gave, but it does (including the directories dbpath is in)
[01:52:06] <Fieldy> confusing :)
[01:52:49] <harttho> Pathing using symlinks at all?
[01:54:06] <harttho> The dbpath must exist before you start mongod. If it does not exist, create the directory and the permissions so that mongod can read and write data to this path.
[01:59:30] <Fieldy> harttho: not a symlink. it existed before I started it, all I did was a mv from the original to the new, then updated the config dbpath variable
[02:00:36] <harttho> Fieldy: try mv'ing it out, start mongod. Stop mongod and mv it back
[02:01:39] <retran> i've moved around the mongodb data dir contents before
[02:01:46] <laphlaw> this might be more of a git question, but i'm trying to clone the project, and it seems like it's 'stuck' at 'remote: Compressing objects'
[02:02:10] <retran> one use case: i used a more powerful system to run a ton of indexes on a set of "fixed" data
[02:02:14] <Fieldy> harttho: okay, so move it out, don't create a (now empty) dir that it can access, and try to start it?
[02:02:32] <harttho> Create the dir so it can access an empty one
[02:02:36] <Fieldy> roger
[02:02:37] <retran> but it's expensive to have powerful systems attached with great internet connectivity
[02:02:50] <Fieldy> brb... very annoying hopping between this online and that offline system
[02:02:56] <retran> so i moved the data dir to a cheap VPS after the indexing was done
[02:06:20] <harttho> Fieldy: And just double checking, user permissions are good for the dir? chown mongodb /data/db
[02:11:36] <Fieldy> harttho: correct. it's owned by mongodb user, in the mongodb group, which is what it's to be using. dir perms are 770. I moved the old one out of the way, created a new one, set up the perms the same, and I still get the error.
[02:11:49] <Fieldy> it's not selinux, i disabled that to test. i also see nothing in dmesg.
[02:12:26] <harttho> Fieldy: Hmm, if it didn't work on a fresh dir not sure beyond that
[02:13:04] <harttho> Fieldy: Other than su mongodb and try to make a file in the dir from home dir or something
[02:20:32] <Fieldy> harttho: yeah I don't either. i found other people having the same issue with no clear path to resolution. i've been at it hours now, i have about 15 years of admin experience and this has me stumped.... thanks for your time
[02:20:51] <harttho> Fieldy: Np, good luck
[02:21:23] <Fieldy> work time too... folks getting impatient. might have to move on as annoyed as that makes me :/
[02:43:08] <Fieldy> harttho: got it figured out, though i'm not entirely sure why. i removed mongo entirely using the package manager, removed the dbpath dir, re-installed, and changed dbpath before starting it the first time...
[02:43:23] <Fieldy> must have been some kind of first-time init thing, is my guess
[02:45:04] <retran> seems like you're unfamiliar with how your system's package management works
[04:04:08] <Fieldy> i've read a lot of documentation today. it's good stuff. however I also feel a little buried. how can I begin learning about how to have mongo look at gzip compressed text log files, with a predictable field order and type, for use in queries which will then output the name of the file to hand off to another app for further in-depth querying?
[04:04:25] <retran> huhhhhhhhhh
[04:04:47] <retran> mongo doesn't natively look at text files
[04:05:00] <retran> you'll need to design an application to do that
[04:05:56] <retran> i'm curious, how did you come to the idea that Mongo will look at your "gzip compressed text log files"
[04:12:35] <Fieldy> retran: i didn't really, it was poorly worded. i'm not a coder, or script writer, or anything like that. is there something canned out there i can start with and try to figure out for my own purposes?
[04:20:06] <retran> oh
[04:20:16] <retran> Mongo is not like Filemaker Pro or Access
[04:20:35] <retran> so no, nothing you can do unless you're interested in application development / programming
[04:20:45] <retran> or hiring someone to do so
[07:37:11] <proximity> hey guys
[07:38:01] <proximity> is there a way to pass a collection to find? For example, i have documents for cars and i pass a collection with models of cars, and i want mongo to find me all cars matching models in the collection
[07:42:54] <proximity> anyone
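proximity's question goes unanswered in the log, but it maps directly onto the $in operator; a sketch with hypothetical data:

    var models = ["civic", "corolla", "golf"];   // the collection of models to match
    db.cars.find({model: {$in: models}})         // all cars whose model is in the list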
[10:04:48] <bui> Hi ! I'm pretty new to mongodb and thinking about it for a project. I will very often need to know what % of the total documents in a collection are returned by my find(). Sometimes according to criteria (i.e. sharing a common property / value). Is there a more elegant way than having several find() with count() to get statistics ? thanks :)
[10:06:22] <Nodex> issuing count() is quite costly
[10:06:51] <Nodex> +1
[10:06:54] <Nodex> hahahaha
[10:06:59] <Zelest> :D
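For bui's percentage question, the straightforward approach is two counts per criterion; a sketch with a hypothetical collection and criterion, keeping Nodex's caveat in mind that counts are not free:

    var total   = db.items.count();                  // whole collection
    var matched = db.items.count({color: "red"});    // docs meeting the criterion
    var pct     = total ? 100 * matched / total : 0;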
[12:01:29] <Repox> Hi. Could anyone please share some ideas on how I can solve the issue with my level-of-interests fields in this question? http://stackoverflow.com/questions/19989197/dynamic-fields-and-slow-queries - Any input would be gladly appreciated.
[12:03:54] <kali> Repox: http://blog.mongodb.org/post/59757486344/faceted-search-with-mongodb
[12:11:18] <Repox> kali: Thank you - I will take a look :)
[12:17:24] <alen> dear masters i have a simple question
[12:17:46] <alen> i have a 3 node mongo cluster with mongos and confserver
[12:18:06] <alen> i want to take one node offline to do HW maintenance
[12:18:18] <alen> what is the best way to close a node?
[12:19:23] <dongodb> Hey guys, trying to figure this one out https://gist.github.com/anonymous/a03f332bedcd22fb4078
[13:43:58] <tl> Hi! Can anybody here help with Node.js streaming from MongoDB to Hadoop?
[13:45:24] <ron> possibly, if you ask an actual question.
[13:46:50] <tl> I asked on the MongoDB google groups forum but got no response. https://groups.google.com/forum/#!topic/mongodb-user/bAg-drbEnHw I would be glad if somebody could look at my command - if I did something wrong
[13:47:54] <ron> I'd try bumping the question.. if that doesn't help, SO to the rescue.
[13:48:44] <tl> "bumping" meaning post a reply myself so it gets on top again?
[13:49:35] <ron> yeah, but do it nicely. Something like - Sorry to bump the question, but there were no replies for a few days. Any suggestions would be appreciated.
[13:49:40] <ron> Something along those lines.
[13:50:13] <tl> Okay :) Thanks!
[13:55:16] <scrdhrt> How do I show the last queries by time?
[14:04:35] <tl> My question about mapReduce streaming through Node.js was so unspecific because I actually wonder if anybody is using it at all. There are barely any questions, let alone answers, about it in the forum or on stack overflow.
[14:13:52] <Nodex> best to just ask, if someone can help they will
[14:23:01] <tl> It's just that I'm starting to wonder if there's something I should know :)
[14:24:04] <scrdhrt> As a newbie myself, I've found that as soon as you stray from the basic stuff, it's really hard to get an answer
[14:24:39] <scrdhrt> Dunno if it's because people don't know, or if they don't want to share the answer
[14:25:07] <Nodex> if you asked a question that made sense perhaps you would get an answer
[14:25:19] <Nodex> [13:53:52] <scrdhrt> How do I show the last queries by time? <------------- Doesn't make sense
[14:26:08] <tl> harder questions are of course harder to answer too
[14:26:16] <scrdhrt> Really?
[14:26:25] <tl> nodex: you think it doesn't make sense?
[14:26:47] <Nodex> in what possible reality would that question make sense without ANY context ?
[14:27:12] <Nodex> it's exactly the same as me asking "what time was it yesterday"
[14:27:51] <scrdhrt> Yeah sure, you're right about that. But how hard is it to ask for a clarification? No diss, I'm just reflecting on the subject
[14:28:24] <Nodex> why should we ask you to clarify, you're the one who needs the help
[14:28:39] <scrdhrt> ofc questions should be clear, but even then I get answers like "look it up yourself"
[14:28:43] <Nodex> it's pretty much implied that you should explain a question in as much detail as possible to get the help you need
[14:29:05] <scrdhrt> Yes I agree, that question was subpar
[14:29:16] <Nodex> ok so can you explain it in a better way ?
[14:30:08] <tl> well I didn't realize that my question is unclear. writing too much can lead to tl;dr
[14:30:53] <scrdhrt> How can I look up the latest queries made to the db, using system.profile.find(), sorting by time and showing only the latest ones, like tail /some/file
[14:31:08] <Nodex> read queries or write operations?
[14:31:16] <Nodex> tl: I was not talking to you lol
[14:31:16] <scrdhrt> Both, if possible
[14:31:24] <Nodex> you haven't even asked a question, you have asked to ask one
[14:32:04] <Nodex> scrdhrt : I don't think that information is available, it will certainly be in the journal so you would need to query that
[14:32:16] <Nodex> *available in a collection
[14:32:43] <scrdhrt> Allright, thanks
[14:33:12] <Nodex> you can easily get all inserts to the DB by sorting on _id
[14:33:22] <tl> but I still don't see what the problem with my question is: I'm posting my command and asking if somebody can spot a problem. I post the error, which basically says "permission denied", and remark that the permissions AFAICT are alright.
[14:33:49] <Nodex> for me I just append a "_lu" (last updated) field to each doc so I can see what was inserted/updated
[14:34:03] <Nodex> tl, I don't see the question sorry
[14:34:07] <tl> what's wrong about that question? The project is on GitHub but I doubt that somebody would want to read through it
[14:34:45] <scrdhrt> Nodex: I'll look into that
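scrdhrt's own system.profile idea does work, but only once profiling is turned on; a sketch (collection name in the last line is hypothetical):

    db.setProfilingLevel(2)   // 2 = profile all operations, 1 = slow ops only

    // newest profiled operations first, tail-style
    db.system.profile.find().sort({ts: -1}).limit(10)

    // Nodex's alternative for inserts: ObjectIds embed a timestamp,
    // so sorting on _id descending yields the most recent inserts
    db.mycoll.find().sort({_id: -1}).limit(10)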
[14:35:14] <tl> Nodex: the question is "why do I get an error?" and what could I possibly do to find the real issue (because the error seems misleading to me)
[14:36:30] <[AD]Turbo> when setting mongod to use syslog, is it possible to specify the syslog facility (in order to redirect logging to a specific file) ?
[14:40:02] <Nodex> tl : I still don't see your question
[14:41:11] <tl> Nodex: then I'm afraid I can't help you.
[14:41:28] <Nodex> I don't need your help though
[14:41:32] <Nodex> now I'm really confused
[14:41:41] <Nodex> is that your question on the google group?
[14:41:58] <tl> Yep
[14:42:37] <Nodex> I don't know a lot about java but I would say that you should make sure the owner of the file matches the owner of the java process
[14:54:45] <tl> Nodex: Thanks! I set the scripts to the (quite unreasonable) -rwxrwxrwx@ 1 777 (everybody can do anything with them) and now I don't get any more permission issues. I do get more errors but I'll examine them now
[15:08:46] <singh_abhinav> how good is it to store payment or revenue info in mongodb .. I have one use case .. I am storing revenue which each user should get in mongodb ... sample schema is like this http://paste.ubuntu.com/6458942/ ... There will be a cron job running say every day which will transfer the amount specified in the "currentrevenue" field via paypal, set currentrevenue to 0, and add a record in payment history ... can there be any problem if I use mongodb for this?
[15:10:22] <Nodex> mongodb doesn't have rollbacks
[15:13:39] <singh_abhinav> Nodex: yeah I understand that .. but in this use case I will not need rollback .. i will update the mongodb document of user x (to decrease currentrevenue) only if the paypal response says the money was transferred to user x
[15:17:58] <singh_abhinav> Nodex: what do you suggest
[15:25:13] <Nodex> then there is no problem
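A sketch of the update singh_abhinav describes, run only after PayPal confirms the transfer; userId, amount, and the paymenthistory field shape are hypothetical, only currentrevenue comes from his schema:

    var userId = ObjectId(), amount = 42.50;   // stand-ins for real values

    db.users.update(
        {_id: userId},
        {
            $inc:  {currentrevenue: -amount},   // settle the paid-out balance
            $push: {paymenthistory: {amount: amount, paidAt: new Date()}}
        }
    )

Both modifiers hit one document, and single-document updates are atomic, which is why the lack of multi-document rollbacks is tolerable here.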
[16:10:47] <richthegeek> hey, does anyone have a quick idea of what happens if I do a $set and then an $inc (or any two atomic modifiers) on the same field?
[16:11:54] <richthegeek> ah, "Field name duplication not allowed with modifiers"
[16:11:56] <richthegeek> nice and easy
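The error richthegeek hit can be reproduced in one line; both modifiers target the same field (collection and field names hypothetical):

    db.stats.update({_id: 1}, {$set: {n: 5}, $inc: {n: 1}})
    // => "Field name duplication not allowed with modifiers"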
[17:05:45] <revoohc> Does anyone know how count works? Does it just use the index or does it actually scan the table? (i.e. db.coll.find().count() )
[17:31:00] <jmar777> revoohc: it used to scan. I believe this was resolved in 2.3.2
[17:31:14] <jmar777> revoohc: (for simple counts without match expressions, that is)
[17:31:51] <papajuans> is there a node-mongodb irc channel?
[17:32:21] <jmar777> revoohc: looks like it can be optimized as well if you match against an indexed field. see https://jira.mongodb.org/browse/SERVER-1752
[17:33:23] <jmar777> papajuans: don't think so. a lot of mongodb users in #node.js, though (and you may have some luck in #mongoosejs, which uses node-mongodb as the driver)
[17:55:17] <paulfryz_> there seems to be something off with the latest node-mongodb-native release (V1.3.20)
[17:58:14] <paulfryzel> as a supposed patch release (1.3.19 -> 1.3.20) it's breaking packages that depend on it (mongojs), and the release SHA itself seems to point to an orphaned commit (maybe the 1.4 branch was force-pushed over)
[18:46:39] <harttho> Fieldy: Good to hear it worked. I've moved the dbpath after installation so it sounds like you may have found a small bug or something
[19:03:00] <jmar777> paulfryzel: sorry, had to run off earlier. did you ever get the node-mongodb-native thing sorted?
[19:07:30] <disorder> hi
[19:07:36] <disorder> right now I'm using this query
[19:07:36] <disorder> find({$query: {'expiry_date': {$gt: new Date()}},$orderby:{'create_date':-1},$maxScan: 10})
[19:08:06] <disorder> however, it doesn't work as I expect
[19:08:44] <disorder> since maxscan is computed for the unfiltered elements
[19:08:51] <disorder> so that, e.g., I get only 1 row
[19:08:58] <disorder> while I expect to get 10 rows
[19:11:43] <jmar777> disorder: you're ordering by create_date, but filtering on expiry_date. Do you really want to use limit instead?
[19:12:28] <jmar777> disorder: you're telling the query engine to only look at a max of 10 documents, which in theory could result in 0-10 matches before the scan terminates
[19:12:54] <paulfryzel> jmar777: nah, nothing so far. i just put in an issue on the mongojs project though: https://github.com/mafintosh/mongojs/issues/105
[19:12:59] <disorder> yeah, I want always 10 documents
[19:13:29] <paulfryzel> for now, we're just shrinkwrapping (locking to V1.3.19) any packages that use these modules
[19:14:59] <jmar777> disorder: try: `.find({ expiry_date: { $gt: new Date() } }).sort({ create_date: -1 }).limit(10)`
[19:15:18] <disorder> mmh ok
[19:15:50] <jmar777> disorder: (i'm assuming you're running in the mongo shell, there)
[19:16:08] <disorder> yeah I didn't use this before, because I was concerned about performance
[19:16:24] <disorder> i.e. I didn't want mongo to search all documents and then limit
[19:16:44] <disorder> but it is necessary
[19:17:44] <jmar777> disorder: limit() will be applied within the query engine itself, so it won't return all matching documents first
[19:17:57] <disorder> ok
[19:18:41] <disorder> btw, for large dbs this should be quite inefficient right? the sort forces the db to compute all the documents to sort them
[19:18:46] <disorder> is it correct?
[19:21:07] <disorder> maybe I need to split the collection and maintain only newer documents (which satisfy the filter)
[19:44:37] <Goopyo> how are ints stored in mongodb? Would 100 occupy the same space as say a timestamp: 1356998400?
[19:47:45] <Fieldy> harttho: thanks. it could be a distribution packaging issue too, my guess is it was trying to be helpful at first start, but unknowingly caused issues. though I still don't understand why, but it works now. thanks for your time
[19:51:35] <algernon> Goopyo: yes, they would. unless you store the timestamp as a 64-bit int.
[19:52:37] <Goopyo> Interesting
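The shell's Object.bsonsize helper makes algernon's point concrete (the field name n is arbitrary):

    Object.bsonsize({n: NumberInt(100)})          // 12 bytes: int32 payload is 4 bytes
    Object.bsonsize({n: NumberInt(1356998400)})   // 12 bytes: same int32 width
    Object.bsonsize({n: NumberLong(1356998400)})  // 16 bytes: int64 payload is 8 bytes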
[20:31:24] <jmar777> disorder: hey, sorry. i'm in and out.
[20:31:32] <jmar777> disorder: if you don't have any indexes, then yes - that's a pretty expensive query
[20:32:02] <jmar777> disorder: since you want to filter by expiry_date and sort by create_date, you'd want the following index to make it efficient:
[20:32:30] <jmar777> .ensureIndex( { expiry_date: 1, create_date: -1 } )
[20:33:26] <jmar777> disorder: the most important part there is that in the compound index, you specify the field you want to sort by *last*
[20:33:55] <jmar777> disorder: and, of course, that you indicate the correct sort order (-1 in this case)
[20:34:20] <jmar777> disorder: with that index in place, that query should be very fast
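jmar777's two snippets, consolidated into one runnable sequence (collection name hypothetical):

    // compound index: filter field first, sort field last, matching sort direction
    db.events.ensureIndex({expiry_date: 1, create_date: -1})

    // the filtered, sorted, limited query from above
    db.events.find({expiry_date: {$gt: new Date()}})
             .sort({create_date: -1})
             .limit(10)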
[20:42:07] <x0nic_> can anyone tell me why my query does not work as I expect?
[20:49:24] <x0nic_> The sort seems to have no effect. http://pastie.org/8501955
[20:50:43] <Goopyo> try query.sort(date, 1)
[20:50:58] <Goopyo> .sort({'date' : 1})
[20:51:38] <Goopyo> x0nic_: also, that's a terrible way to store that kind of data
[20:52:13] <x0nic_> Goopyo I am up for suggestions
[20:52:55] <Goopyo> your data model or sorting?
[20:53:22] <x0nic_> the data model
[20:53:54] <x0nic_> i'm just trying your suggestion on the sort right now
[20:53:56] <Goopyo> is it a lot of data points? I've only dealt with that kind of data at higher frequencies
[20:54:31] <x0nic_> yeah i think that collection has around 1 million records
[20:55:16] <x0nic_> i started with the history embedded, but my server fell over after a few stocks were added
[20:55:22] <Goopyo> well honestly that kind of data isn't ideal for mongo anyway, but for one I'd trim all the keys to o,h,l,c,d,t
[20:56:24] <Goopyo> you're duplicating those keys 1 million times in the db memory
[20:56:37] <x0nic_> yeah good point
[20:56:40] <Goopyo> Also, if you're going to store these as documents, go with ranges
[20:57:18] <Goopyo> finally skip datetimes and go with timestamps
[20:57:37] <Goopyo> most programming languages convert them really quickly so the data savings will be worth it
[20:57:48] <x0nic_> so a range would be, say, all of one day
[20:57:56] <Goopyo> well go with month
[20:58:01] <Goopyo> a day might be too small
[20:58:28] <x0nic_> oh i got ya
[20:58:37] <Goopyo> something like {'t' : appl , start_dt, end_dt, timestamps : [], o : [], h : [], l :[] … }
[20:59:04] <Goopyo> and then every write do a push query to all the keys so $push : { o : 123, h: 1262, timestamps : now … }
[20:59:57] <x0nic_> hrm, never even thought of that. thanks goopyo
[21:00:04] <Goopyo> sure thing
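A sketch of the monthly bucket layout Goopyo outlines (collection and field names hypothetical, values made up):

    // one doc per ticker per month, with parallel arrays for each series
    db.bars.insert({t: "AAPL", month: "2013-11",
                    ts: [], o: [], h: [], l: [], c: []})

    // each new bar is one update: $push appends to every array at once
    db.bars.update(
        {t: "AAPL", month: "2013-11"},
        {$push: {ts: 1385128800, o: 519.1, h: 519.4, l: 518.9, c: 519.2}}
    )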
[21:03:43] <x0nic_> goopyo, so the .sort() has the same problem
[21:23:15] <joeroot> hey guys, sorry to be annoying, but does anyone know how to batch inserts in casbah
[21:23:42] <joeroot> i'm using the .insert method with a List
[21:24:08] <joeroot> however couldn't find it doc'd anywhere that this handles batch
[21:32:30] <drjeats> Goopyo: what do you mean by timestamps? ISODates are stored as 64bit int millisecond timestamps
[21:32:50] <ruffyen> i'm trying to figure out how the data is stored
[21:33:08] <ruffyen> how is one large doc better than 10 small ones
[21:33:17] <ruffyen> i have tried both in the past
[21:33:23] <ruffyen> storing fantasy football data
[21:33:32] <ruffyen> where every week was a new entry in the doc
[21:33:43] <ruffyen> and the data was very difficult to work with
[21:35:37] <ruffyen> also you run into an issue where the document size can only be 16MB
[21:38:24] <x0nic_> so I narrowed down my problem to an index. If I have an index on a field, the sort is broken
[21:39:19] <x0nic_> am i doing something wrong? the docs say that mongo can still sort no matter the direction of the index
[21:49:32] <joannac> x0nic_: going to need more information here. you've changed your schema since the last pastie?
[22:17:02] <x0nic_> @joannac no i havent changed anything. But i do have an index
[22:17:13] <x0nic_> always did, sorry
[22:33:54] <Goopyo> django: are you sure?
[22:34:25] <Goopyo> what I got: http://pastebin.com/h4tCM38E
[22:34:40] <Goopyo> a year worth of minute timestamps vs datetimes
[22:45:06] <drjeats> Goopyo: well I can't argue with that, now im off to go read the spec/docs/impl and understand why
[22:46:05] <Goopyo> oops, pinged the wrong person. Let me know when you do, I'm going to make sure these are the same length
[22:51:41] <Fieldy> can I get some pointers for apps that will use mongodb where my use case is several TB of gzip compressed proxy log files? I'd like to stick a few key fields in mongodb, including the filename it came from, so we can search for terms in fields and then know which filename to search, versus searching all of them as we do now.
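A sketch of the kind of summary document Fieldy describes, fed in by an external parser (mongo itself won't read the gzip files, as retran noted earlier); every name below is hypothetical:

    // one small doc per log line, holding only the searchable keys
    db.loglines.insert({
        file: "proxy-2013-11-21.log.gz",            // which archive to deep-search later
        ts:   new Date("2013-11-21T13:45:00Z"),
        src:  "10.0.0.12",
        host: "example.com"
    })
    db.loglines.ensureIndex({host: 1})

    // the handoff to the next tool is just the distinct filenames that matched
    db.loglines.distinct("file", {host: "example.com"})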
[22:52:57] <drjeats> Goopyo: this is what I got: http://pastebin.com/AepjTtFp
[22:53:28] <joannac> x0nic_: works for me (the sorting)
[22:53:46] <joannac> x0nic_: which index do you have?
[22:54:39] <Goopyo> drjeats: your ts is a float so that's probably why
[22:54:55] <drjeats> Goopyo: what's yours?
[22:55:08] <Goopyo> int
[22:55:11] <drjeats> int32?
[22:55:32] <Goopyo> I believe so
[22:55:54] <drjeats> well then it's a matter of what sort of precision you're looking for
[22:56:54] <Goopyo> In [16]: len(BSON.encode({'ts': 1356998400})) => 13
[22:57:09] <Goopyo> well not really you can do microseconds in int too
[22:57:33] <Goopyo> oh nevermind youre right
[22:58:39] <Goopyo> one digit larger it's 17. So I guess for second granularity a timestamp is a more compact store
[22:59:22] <drjeats> yeah. at least until 2038 haha
[22:59:49] <Goopyo> future proofing!
[23:00:16] <drjeats> woo!
[23:25:43] <Goopyo> drjeats: wait wouldn't __sizeof__ be more accurate than len?
[23:26:38] <drjeats> Goopyo: isn't __sizeof__ the size of the python object in memory, not the number of bytes that get sent over the wire?
[23:28:08] <Goopyo> oh so it's the size of the string object not the contents
[23:30:46] <drjeats> yeah, it's also a BSON object (which does inherit from bytes/str) so there's probably other metadata stuffed in it
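A mongo shell counterpart to the pymongo measurements above, using Object.bsonsize (the 13 and 17 match the numbers quoted in the chat):

    Object.bsonsize({ts: NumberInt(1356998400)})    // 13 bytes: seconds as int32
    Object.bsonsize({ts: new Date(1356998400000)})  // 17 bytes: dates are int64 millis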