[01:39:19] <veesahni> Using ruby/mongoid, anybody know how to get the BSON-serialized representation of an object? I'm trying to stash an object to disk if I have a mongo connectivity failure (to unserialize and save! in the future)
[01:54:29] <fbjork> is it possible to search arrays of arrays in mongodb?
[01:55:54] <ralphholzmann> If I want to upgrade my database from 1.8, can I simply mongodump, install the new version, and restore?
[01:56:00] <ralphholzmann> will mongodumps for 1.8 work in 2.2?
[03:18:15] <michaeltwofish> Pleasure. I'm happy and shocked that I'm able to answer a question here :)
[03:20:14] <michaeltwofish> I have a question that isn't mongo specific ... I'm trying to raise ulimit -n on a CentOS box, as mongo is refusing connections, but it's not sticking.
[03:20:34] <michaeltwofish> Is anyone willing and able to help me troubleshoot that?
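(ed. note: a common reason ulimit -n doesn't stick on CentOS is that /etc/security/limits.conf only applies to PAM login sessions, not to daemons launched from init scripts. A hedged sketch, assuming mongod runs as the mongod user:)

    # persist the limit for login sessions
    cat >> /etc/security/limits.conf <<'EOF'
    mongod soft nofile 64000
    mongod hard nofile 64000
    EOF
    # for the init-started daemon, raise it in the init script itself,
    # just before mongod is launched
    ulimit -n 64000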
[03:22:13] <svm_invictvs> This may seem like a dumb question, but what functions are available when I'm running a mapreduce job?
[03:22:21] <svm_invictvs> Am I able to fetch objects from the collection?
[03:53:23] <cmatta> Does anyone know if there's a way to just get the sec value of a PHP MongoDate from inside an aggregate function? So far all I've been able to get it to output is MongoDate objects
[03:53:42] <cmatta> this is using the PHP mongo module
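(ed. note: one hedged shell workaround, assuming a server whose $subtract accepts two dates and returns their difference in milliseconds: subtract the epoch, then divide to get seconds. Collection and field names here are illustrative:)

    db.events.aggregate([
      { $project: {
          sec: { $divide: [ { $subtract: [ "$created", new Date(0) ] }, 1000 ] }
      } }
    ])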
[07:13:29] <saby> wereHamster this query executes fine when run in the shell, but when I create the same query using the java driver, the db.eval part gets enclosed in double quotes, making it a string instead of executing it
[08:43:34] <pgcd> failing that, I have a (possibly quick) question: if I have an embedded document, is there any way of retrieving the container document without a find()?
[09:05:01] <saby> I have a doubt regarding a query in the aggregation framework
[09:05:32] <NodeX> pgcd: I imagine it's something you would have to re-create as in a new model for this specific type of query
[09:05:35] <saby> i'm making an aggregation framework query which takes values from each row, performs calculations on them, and projects that value for each row
[09:05:49] <NodeX> I detest MVC coding in any language so I could be totally off the mark
[09:06:02] <saby> is it possible to have a check condition, so that if the value of a row is this, it multiplies the value by some x
[09:06:14] <NodeX> some python users are normally around before long, perhaps hang around till then?
[09:07:00] <pgcd> NodeX: thanks, I will hang around - i'm sure there's lots to be learned =) (and I'll have to really understand how/when to use map/reduce)
[09:17:01] <saby> the first $multiply is multiplying the result of all the calculations being performed by the return of the if condition
[09:17:20] <saby> so if $priority=6, multiply the score by 6, else multiply the score by 1
[09:17:27] <saby> kind of like boosting my results
[09:18:18] <ZacS1234> anyone know what you need to do to get the mongodb node.js driver/mongoose to auto reconnect?
[09:18:41] <NodeX> saby, not sure it can be done then
[09:18:51] <NodeX> I didn't read your question properly sorry
[09:20:20] <saby> NodeX so basically, in short, I have a variable called $score; if $priority=6 then multiply $score by 6, else multiply $score by 1
[09:20:34] <saby> hmmm, there might be someway to do it
[09:21:04] <NodeX> not with aggregation framework unless you multiply by a preset number
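(ed. note: for what it's worth, the 2.2 aggregation framework does have a $cond operator that can express this kind of conditional multiplier. A minimal shell sketch, assuming fields named score and priority:)

    db.items.aggregate([
      { $project: {
          boosted: { $multiply: [
            "$score",
            { $cond: [ { $eq: [ "$priority", 6 ] }, 6, 1 ] }  // x6 if priority is 6, else x1
          ] }
      } }
    ])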
[09:33:40] <ron> NodeX: how big is your dataset? I know it's a complex answer, but I'd appreciate a general idea. I'm trying to get a sense of the aggregation performance mongodb gives, since its map/reduce is less than optimal.
[09:34:38] <NodeX> I aggregate around 5M docs a day
[09:35:12] <NodeX> I do roughly 150k at a time to keep it fast
[09:35:20] <ron> what's their average size? do you do it in batch or online?
[09:35:41] <NodeX> online and they vary in size, let me paste a sample doc
[09:37:40] <ron> my boss wants to compare that solution and amazon's m/r but we're short on time and I think going with mongo for the initial stage should be more than enough.
[09:38:06] <NodeX> single machine 16gb ram, quad core xeon, live sites on the box too
[09:38:16] <ron> okay, that looks fair. I imagine our docs are of a similar size.
[09:38:22] <NodeX> fast = about 3 seconds for 150k rows mebbe
[09:40:22] <NodeX> basically in that 3 or so seconds I get everything I need to know about my app - i/e every single thing that's been clicked or visited and also counts (totals) for them
[09:40:33] <NodeX> it's very different but a similar concept
[09:42:20] <NodeX> well in terms of performance I don't worry because I aggregate every ten mins or so and only aggregate today's data
[09:42:59] <NodeX> when I finish a day's aggregation I export data older than 3 days ($gt : 3 days) to json and store it in Amazon Glacier
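(ed. note: a hedged sketch of that kind of export with mongoexport; the -q query selects which documents get dumped as JSON. Database, collection, and field names are illustrative:)

    # export documents older than a cutoff to a JSON file
    mongoexport -d mydb -c events -q '{ day: { $lt: 20121013 } }' -o events-archive.json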
[09:43:12] <jordiOS> hello! I have a document where I store the content per language and key, as $model->content[$language]['title']. I am looking to use ensureIndex on title for each language but I can't seem to find how. Any ideas or hints?? Thanks!
[09:43:20] <NodeX> so my collections stay smallish and that keeps performance nice and manageable
[09:43:48] <NodeX> jordiOS : you'll have to loop it
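(ed. note: a hedged sketch of that loop in the shell, assuming the set of languages is known up front; the index key has to be built as a string since it varies per language:)

    ["en", "de", "es"].forEach(function (lang) {
      var spec = {};
      spec["content." + lang + ".title"] = 1;  // e.g. content.en.title
      db.pages.ensureIndex(spec);
    });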
[09:51:59] <NodeX> if I then need to get results over time I can aggregate (or loop) the daily counters
[09:52:06] <ron> and I imagine that using Hadoop it would be the same.. aggregate on going data and not by demand.
[09:52:59] <ron> you just need to make sure that the data resolution is fine enough. I imagine that yours is daily since you don't need 'by hour' aggregations.
[09:53:29] <ron> do you need to aggregate several times for different views? for example, per day, per user, per url?
[10:01:29] <NodeX> I traded off realtime for efficiency
[10:01:47] <saby> NodeX can we have collection join in mongo ?
[10:01:53] <NodeX> it's near enough realtime for my app, other apps I would do it once a minute
[10:01:58] <ron> of course. I'm not even looking for a long run solution. if I go with your method, I think it could hold us for at least a few months efficiently.
[10:03:44] <saby> NodeX in the previous scenario, where I was trying to use if conditions: instead of that, I would insert another field like priorityMultiplier whose value points to an id number, and the value for that id number would live in another collection
[10:03:46] <martinrue> kali: does this mean the mongodb-native driver in JS is likely adjusting my local time dates to UTC before storing them?
[10:05:26] <saby> so for that computation, I cannot fetch the value from another collection ?
[10:06:17] <saby> i would be storing the id number of the other collections document in $priorityMultiplier
[10:09:11] <NodeX> it's not possible to join against or query another collection or document from within a single query
[11:16:25] <fotoflo> hello all. I have a collection playlists with a property videos. some videos have a property comments -- how do i select those comments?
[11:46:11] <remonvv> Why on earth can't you paste an example document? The odds of someone running away with a schema that currently doesn't even work are slim at best.
[11:54:53] <Bartzy> remonvv: Yes, after documents deletion
[11:55:19] <remonvv> If you mean what I think you mean, then no. MongoDB has not become more efficient at reusing disk space that is no longer used since 2.0+, as far as I'm aware.
[11:55:46] <remonvv> There are many other valid reasons to upgrade though.
[12:00:14] <NodeX> the ones I like the most are "Mysql can do it so why can't mongo" or "Apache can do it so why not nginx" ... my answer is "Go use Mysql / Apache then" LOL
[12:15:56] <remonvv> NodeX, yes. I had a reply similar to that on SO, it's quite popular ;)
[12:58:40] <omie__> suppose I have some data with keys/columns name,surname,city. If I try a query like: {'$or': [ {'name': {'$ne': 'xyz'}}, {'city': {'$ne': 'newyork'}} ]}
[12:59:19] <omie__> it doesn't work right. I mean, what I expect it to do is exclude all the records where either name is xyz or city is newyork
[12:59:46] <omie__> however, it only excludes records where name is xyz AND city is newyork
[13:01:54] <ppetermann> is that your complete find?
[13:03:59] <omie__> i mean if I do {'name': {'$ne': 'xyz'}, 'city': {'$ne': 'newyork'}}
[13:04:38] <omie__> it works as if it were OR; it takes out all the records where either of the conditions is true
[13:05:50] <omie__> so far I know only this. you want me to investigate more ? how should I do that ? (I am new to mongodb, also non relational databases)
[13:07:17] <omie__> and the behaviour is the same in both 2.0.7 and 2.2
[13:46:36] <omie__> actually I am working with django-nonrel. Initially I thought the problem was with django's filter() and exclude(), but later I checked the queries generated by django and tried them in the mongo shell
[13:46:51] <NodeX> no, it's doing this... (SQL) SELECT * FROM foo WHERE name !='john' OR city !='newyork'
[13:47:30] <NodeX> from your documents which are you trying to select and which do you want left out...
[13:50:11] <NodeX> The $or operator lets you use a boolean or expression to do queries. You give $or a list of expressions, any of which can satisfy the query.
[13:50:23] <NodeX> (from the docs) which means that anything can satisfy it
[14:14:50] <omie__> see, the data I am showing you now I made up for simplicity. I am making a simple log-viewer, and there are really very simple filters I wish to apply
[14:15:33] <NodeX> that's great but you still have not told me what data you want back from the queries and why what they currently return is wrong
[14:16:22] <NodeX> "The $or operator lets you use a boolean or expression to do queries. You give $or a list of expressions, any of which can satisfy the query."
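(ed. note: what omie__ wants is NOT(name = xyz OR city = newyork), which by De Morgan is name != xyz AND city != newyork. In the shell that is either the implicit AND of two $ne clauses, or equivalently $nor:)

    // implicit AND of two $ne conditions
    db.people.find({ name: { $ne: "xyz" }, city: { $ne: "newyork" } })
    // same thing with $nor: reject a doc if any listed clause matches it
    db.people.find({ $nor: [ { name: "xyz" }, { city: "newyork" } ] })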
[14:51:24] <ThomasJ__> If I have a shard that's extremely low on disk space, what can I do to make sure it doesn't completely run out of space?
[14:55:16] <ThomasJ__> It's only this one shard that's full. Most of my other shards have lots of space
[14:58:40] <tncardoso> ThomasJ__: probably your shard key is not dividing content in a balanced way
[14:59:03] <bgilbert> Hey guys, I'm seeing a strange issue using the java driver......I keep running into the following exception: java.lang.IllegalArgumentException: response too long: 1634610484, when the driver tries to create a response from mongo
[15:00:01] <bgilbert> this seems to be happening at random, and happens in cases where mongo shouldn't be returning any results from the query......and the size reported is inconsistent and normally way larger than it should or could be
[15:00:45] <bgilbert> I'm guessing some form of corruption with the response.....has anyone seen an issue like this or have any suggestions on where I should look next to further debug this issue?
[15:00:52] <ThomasJ__> I tried draining that overloaded shard but I get "Can't have more than one draining shard at a time" (I am draining another one as well). Is this a hard limit on mongo?
[15:49:04] <Industrial> At what scope should I be opening connections to MongoDB though node-mongodb-native? application scope (1 connection per app) or request scope?
[15:58:23] <Mmike> I'm upgrading mongo in my cluster - i upgraded one secondary and I see 'syncing to' in its web interface
[15:58:34] <Mmike> is there a way I can see how long that will last?
[15:59:00] <Industrial> what I'm really trying to avoid is wrapping my whole application in a db.open(cb). Is there a way to do this once? (I'm cleaning up with db.close on process.exit/SIGINT)
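(ed. note: a minimal sketch of the application-scope pattern with the node-mongodb-native API of that era: open once at startup, stash the handle, and reuse it for every request. Host, port, and database names are illustrative:)

    var mongodb = require('mongodb');
    var server = new mongodb.Server('localhost', 27017, { auto_reconnect: true });
    var db = new mongodb.Db('mydb', server, { safe: true });

    db.open(function (err, openDb) {
      if (err) throw err;
      // share this single handle across the app instead of
      // wrapping everything in db.open(cb)
      module.exports.db = openDb;
    });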
[16:04:57] <Mmike> Also, on the server I just upgraded I see it's syncing to another secondary. But on that (other) secondary, and on the primary, I don't see a 'syncing to' status on any box
[16:26:29] <Bartzy> Is it possible (through the drivers or otherwise) to try to insert a document, but if it exists (i.e. a unique index with that value already exists in the collection), just return the _id ?
[16:28:01] <algernon> Bartzy: but you can try an insert with safe mode, and if that returns an error, do a query on the unique index and return whatever you want.
[16:28:38] <rybnik> Hi there fellows, the mongoosejs channel is rather idle, so perhaps someone here can give me some advice on a "mongoose schema design", I've created a gist describing the problem, anyone care to give it a try ? https://gist.github.com/3900332
[16:28:46] <Bartzy> algernon: Why not try to query that unique index, and if nothing returns - insert?
[16:29:27] <algernon> Bartzy: because between the query and the insert, something may insert it
[16:29:57] <Bartzy> That is extremely unlikely, but correct :)
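(ed. note: a hedged alternative that avoids the race in a single round trip: findAndModify with upsert returns the matched-or-created document, _id included. The collection and unique key are illustrative:)

    var doc = db.users.findAndModify({
      query:  { email: "a@example.com" },            // the unique key
      update: { $set: { email: "a@example.com" } },
      upsert: true,
      new:    true                                   // return the doc after the upsert
    });
    // doc._id is the existing or freshly created id either way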
[17:01:13] <twrivera> What is the best practice for storing dates in mongodb? epoch values?
[17:04:01] <kali> mongo and bson have a date type, which is a millisecond offset from the epoch
[17:04:47] <kali> twrivera: drivers map this natively to the language time class
[17:04:53] <kali> so it's probably your best option
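(ed. note: in the shell that just means inserting a native Date; it is stored as a BSON date and each language's driver hands it back as its native time class. A minimal sketch:)

    db.events.insert({ name: "signup", createdAt: new Date() })
    // comes back as a real date (an ISODate in the shell), not a string
    db.events.findOne({ name: "signup" }).createdAt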
[17:10:52] <thewanderer1> hi. let's say I have a movie database, each movie has an "actors" array. now, I want to obtain a list of actors and movies each has played in. how would I do that?
[17:11:25] <thewanderer1> it's easy to do if I want the movies in which "Tom Hanks" has played, but what if I need this for all actors? looping through them all and issuing queries like mad doesn't sound wise.
[17:12:19] <kali> thewanderer1: you can look at the aggregation framework if it is an occasional query
[17:13:08] <thewanderer1> aha, so essentially map-reduce?
[17:13:56] <kali> thewanderer1: mmm... well, the implementation is actually very different. m/r relies on a JS interpreter, whereas the AF is a native implementation
[17:14:20] <thewanderer1> the most straightforward way is to create <actor,movie> pairs and then reduce that to actor documents, I imagine... would the AF do that?
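(ed. note: that is essentially what the aggregation framework does here: $unwind produces the <actor,movie> pairs and $group folds them back up per actor. A minimal sketch, assuming movie docs with title and actors fields:)

    db.movies.aggregate([
      { $unwind: "$actors" },                        // one doc per <actor, movie> pair
      { $group: { _id: "$actors",                    // regroup the pairs per actor
                  movies: { $push: "$title" } } }
    ])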
[17:14:39] <kali> thewanderer1: that said, if the DB is too big, the AF may not be enough
[17:14:44] <thewanderer1> (and it doesn't really sound very efficient, maybe I'm not using the right tool for the job... SQL comes to mind as "righter")
[17:15:14] <kali> nope, it's not efficient, you're basically pulling the entire db
[17:15:35] <kali> but SQL would not be much more efficient
[17:15:44] <twrivera> kali: i'm getting the date values in json format for ex: /Date(1250121600000)/ from an API and I want to insert that value into mongo docs
[17:15:46] <kali> the "ugly" join is just more visible in mongo
[17:16:25] <kali> twrivera: what language are you pulling the data from ?
[17:16:29] <thewanderer1> hopefully SQL would have a way to optimize the join
[17:18:03] <kali> thewanderer1: not really, you're really hitting an algorithmic problem. in both cases, you're down to a sort of one of the collections, so you're O(n*log(n))
[17:18:15] <kali> thewanderer1: sql does not do miracles
[17:18:31] <kali> thewanderer1: it just hides the actual ugliness of the world from you :)
[17:19:28] <thewanderer1> well, yes, the datasets (actors, movies) would need to be sorted in both SQL and Mongo
[17:20:11] <kali> twrivera: i can't help with python, all i remember about manipulating dates in python is a lot of pain
[17:21:22] <thewanderer1> actually, it's not that bad... I only wonder about the space needed (disk, RAM). would Mongo or SQL have any advantage in this regard?
[17:22:00] <Mmike> where do I change the number of connections for mongodb?
[17:22:08] <Mmike> i have maxconn in my conf file, but I still get this:
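(ed. note: on 2.x the config option is spelled maxConns in the ini-style config file, and the effective ceiling is also bounded by the process's open-file ulimit, so both may need raising. The value is illustrative:)

    # /etc/mongodb.conf
    maxConns = 2048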
[17:22:11] <twrivera> kali: I'm learning that the hard way. but how would you store the dates in the db? as a string or a timestamp object? I figured out how to convert the epoch to a string and I can convert from there
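(ed. note: a hedged shell sketch for that payload: pull the millisecond value out of the "/Date(...)/" wrapper and insert it as a real BSON date rather than a string, so range queries and sorts keep working. Collection and field names are illustrative:)

    var raw = "/Date(1250121600000)/";
    var millis = parseInt(raw.match(/\d+/)[0], 10);
    db.events.insert({ fetchedAt: new Date(millis) })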
[17:23:25] <kali> thewanderer1: to be honest, the thing that strikes me is... mongo and *sql are more designed for "small" queries that impact only a small part of your dataset
[17:26:55] <thewanderer1> kali, I'm asking all this because I'm writing a task assignment system (human resources, all that stuff), and need to show who's doing what, etc... and aggregation plays a key role here
[17:27:31] <thewanderer1> and Mongo seems really great because of document storage (rich information), only aggregation so far is troublesome
[17:28:01] <kali> thewanderer1: sure. you need to have a look at the aggregation framework. it's much better than map reduce, but will explode if the dataset is too big
[17:28:53] <thewanderer1> kali, define too big? O.o
[17:29:43] <kali> thewanderer1: "If any single aggregation operation consumes more than 10 percent of system RAM the operation will produce an error."
[17:35:05] <thewanderer1> so it does seem that AF is the go-to technology in this case... now I only need to get Mongo >=2.1 (Debian stable is still at 1.4) and I'm all set. thanks!
[17:36:03] <kali> thewanderer1: yes. AF is designed to avoid 99% of m/r use cases, and for good reasons
[17:46:55] <Mmike> How do I know when replica is in sync with its primary?
[17:47:09] <Mmike> I always seem to see 'syncing to' on the _replSet status page
[17:49:32] <ralphholzmann> why is a three member replica set recommended?
[17:52:57] <kali> ralphholzmann: the problem is that when one of the servers can't talk to the others, it has no way to tell the difference between a network split and a host failure
[17:53:15] <kali> ralphholzmann: so in order to avoid the dreadful split-brain, it has to go down too
[17:53:16] <Mmike> kali, where do I get those? I'm checking the _replSet web page on each server, but they don't seem to give me the same data. On srvA I see that srvB is syncing to srvC, but I don't see that on srvB or srvC
[17:53:42] <Mmike> _m, i get something like this: source: ded778:27017\n syncedTo: Tue Oct 16 2012 13:44:23 GMT-0400 (EDT)\n = 340 secs ago (0.09hrs)
[17:53:43] <kali> Mmike: in the array at the top ?
[17:53:45] <Mmike> does that mean it is still syncing, or that it is synced?
[17:57:01] <kali> ralphholzmann: i'm sure you need an arbiter or a third server. as for datacenter, it comes with its bundle of costs and issues. i won't answer that question for you
[17:58:04] <kali> Mmike: what state is your node in? RECOVERING? or SECONDARY?
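(ed. note: a quick hedged way to answer both questions from the shell: rs.status() lists every member's state and last applied op time. A member is caught up when it is SECONDARY and its optimeDate tracks the primary's:)

    rs.status().members.forEach(function (m) {
      print(m.name + "  " + m.stateStr + "  " + m.optimeDate);
    });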
[17:58:10] <ralphholzmann> i appreciate your help kali
[18:38:09] <Mmike> thnx for the info, learned a lot today :)
[18:38:21] <jmar777> any chance that someone has some input on http://stackoverflow.com/questions/12917943/casbah-java-mongodb-driver-java-lang-illegalargumentexception-response-too-l ?
[19:13:41] <_m> uroborus_labs: Have you perused the docs on storing documents in mongo? You'll probably want a field to represent a 'user id' and fields for whatever appointment data you care to store.
[19:14:32] <uroborus_labs> I think my main question would be, how much data to actually store in the document itself and how much to infer from the application itself
[19:15:04] <uroborus_labs> For example, if a doctor has availability from 9am to 5pm
[19:15:29] <uroborus_labs> Would you actually store that there is an empty slot?
[19:16:44] <uroborus_labs> Or would you store availability and what appointments have been made and then show what slots are open from that data on the application side
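(ed. note: a hedged sketch of that second approach: persist only the availability window and the booked appointments, and derive the open slots in the application at read time. All field names are illustrative:)

    {
      doctorId: ObjectId("507f191e810c19729de860ea"),
      date: ISODate("2012-10-16T00:00:00Z"),
      availability: { start: "09:00", end: "17:00" },
      appointments: [
        { patient: "jdoe", start: "10:00", end: "10:30" }
      ]
      // open slots = availability minus appointments, computed on read
    }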
[19:47:30] <pgcd> if anybody's using django-mongodb: is there any (API-based) way of finding an object in a list of embedded objects? I mean, if I have the usual Post with comments, can I find a specific comment by ID?
[20:24:50] <ashley_w> i have some strangeness in my program since i added some code yesterday: http://pastie.org/5069647
[20:39:57] <jmar777> i know there used to be some issues with using indexes on a count() operation. is that still the case?
[21:47:55] <Bilge> Soooooooooooooooooo... I accidentally just rm -rf'd my mongodb dir
[21:48:02] <Bilge> But it seems that it still has everything cached
[21:48:11] <Bilge> Is there some way I can flush it back to disk before it truly is all gone?
[21:49:19] <mgriffin> Bilge: do not stop the service!
[21:53:00] <rybnik> Bilge: just a quick question, I'm rather interested in this…. if you perform a db.coll.find() will this have any negative impact on the cache ?
[21:57:45] <mgriffin> probably I would also want to do something like mongodump, or something that does a logical export of the data (I don't know how to back up mongo well, someone else comment)
[22:01:08] <mgriffin> yeah, it seems that mongodump is a really good idea probably, since it seems to: connect over socket, request data from running instance
[22:01:38] <Bilge> Well I managed to copy my database back from the file descriptors like in the article
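(ed. note: the /proc trick being referred to, sketched for Linux: files deleted while a process still holds them open stay reachable through that process's file descriptors. The fd number and target path are illustrative:)

    pid=$(pidof mongod)
    ls -l /proc/$pid/fd | grep deleted     # find fds pointing at deleted db files
    cp /proc/$pid/fd/12 /data/db/mydb.0    # copy the data back out while mongod still runs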
[22:03:03] <Bilge> As it happens this is just dev data anyway so it's all expendable, but it would still be a pain to lose it. I'm really impressed by your knowledge mgriffin :)
[22:11:43] <Bilge> No, it's just that Unix variants without procfs really are inferior
[22:14:08] <mgriffin> Bilge: they probably have superior filesystems with undelete ;)
[23:06:22] <cjhanks> So the C++ doxygen is entirely out of date and incomplete, is there any better source?
[23:19:32] <cjhanks> And a = Date_t( ); a == Date_t(a.toTimeT()) --> False. When querying. Ech...
[23:44:27] <kenyabob> new guy here. I've used php to create a list of json objects I want to import into mongodb; not sure of the simplest way to just feed in this comma-delimited list of objects.
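(ed. note: mongoimport handles exactly this: wrap the comma-separated objects in [ ] so the file is one JSON array and pass --jsonArray. Database and collection names are illustrative:)

    mongoimport -d mydb -c things --jsonArray --file objects.json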