[01:44:43] <smith_> I have a dataset (logs) which grows by a few gigabytes per day. What should my scaling strategy be? Those logs will be routinely analysed.
[01:52:20] <phup> Do you need to keep the logs around forever (continuous data growth), or are you planning to keep only a several-month window (in which case you'll hit a db size plateau)?
[01:56:46] <smith_> This decision is up to the management and they are still unsure.
[01:57:03] <smith_> The problem is I need to get something working asap
[01:58:44] <phup> There's a lot written about sharding
[01:58:55] <phup> which is how you'd want to scale to many machines
[01:59:04] <phup> tbh I mostly use mongo for smaller personal projects
[01:59:13] <phup> we use hbase at work for the bigger stuff
[01:59:26] <smith_> For regulatory purposes they need to store at least a year's worth of logs, but storing more has some additional benefits, so they are thinking..
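A minimal mongo shell sketch of the two options phup describes, assuming a hypothetical logs.entries collection whose documents have a timestamp field: shard the collection to scale across machines, and add a TTL index if only a retention window has to be kept.

    // assumption: database "logs", collection "entries", each document has a "timestamp" Date field
    sh.enableSharding("logs")
    // a hashed shard key avoids funnelling all new writes onto one "hot" shard
    sh.shardCollection("logs.entries", { _id: "hashed" })

    // if management settles on a fixed window, a TTL index expires old entries automatically
    // (roughly one year here, matching the regulatory minimum mentioned above)
    db.entries.ensureIndex({ timestamp: 1 }, { expireAfterSeconds: 60 * 60 * 24 * 365 })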
[06:49:42] <arvraepe> Does anyone know why some values are disappearing from my mapreduce result?
[06:50:22] <arvraepe> using these functions: map = function () { emit(this.item, 1); } and reduce = function (key, values) { return values.length; }
[06:51:10] <arvraepe> I have noticed that some of the values are disappearing from the generated list...
[07:12:48] <sirpengi> arvraepe: can you be more specific about what is disappearing?
[07:12:58] <sirpengi> from here the map-reduce functions you have look standard
[07:15:17] <arvraepe> Well, I think I'm not quite getting the map-reduce thing completely. I just tried this code to test (I am new to this). This code should count all occurrences of the item property, i.e. count how many documents contain the .item property... I figured that emitting 1 would generate [1,1,1,1..1], a 1 for every occurrence
[07:15:39] <arvraepe> There was an item, let's call it 'a', that occurred 45 times
[07:16:09] <arvraepe> and when I checked again this morning it only had 2.. (I sort on count: -1, so I thought it disappeared)
[07:26:36] <arvraepe> sirpengi: I changed my map to function () { emit(this.item, {count:1}); } and reduce to: function (key, values) { var sum = 0; values.forEach(function (value) {sum += value.count;}); return {count:sum}; } and this works like it should... any idea what the difference is between them?
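The short answer is that reduce can be re-run on its own output (a "re-reduce"), so it must be associative and idempotent; returning values.length discards counts that were already accumulated, while summing the per-key values does not. A sketch of the working version, with the collection and output names being illustrative:

    // if item "a" was partially reduced to 40 and then re-reduced alongside two new
    // emits, reduce receives [40, 1, 1]: values.length returns 3, but the sum returns 42
    var map = function () { emit(this.item, 1); };
    var reduce = function (key, values) { return Array.sum(values); };
    db.mycollection.mapReduce(map, reduce, { out: "item_counts" });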
[07:55:57] <synchrone> So I was wondering, what happens if in my .NET application I try to use MongoDBFileStream after IDisposable RequestStartResult object was disposed ?
[09:58:23] <jwilliams_> a stupid question. i write a large amount of data with write concern safe. when will that data appear if another user tries to query it at the same time?
[10:01:06] <jwilliams_> for example, when writing around 60k records, which is supposed to take 10 to 20 mins; if at around the 10 min mark a user queries with the mongo shell, will the data that has already been acked (i.e. past the save call) appear in the shell?
[10:08:10] <jwilliams_> my understanding is that with write concern set to safe and journaling on, the data should basically be pretty safe (no need to worry about losing data unless e.g. the disk crashes.)
[10:09:03] <jwilliams_> so I just want to check, generally, when I can expect that data to be visible if another user searches with e.g. the mongo shell.
[10:09:03] <_aegis_> why not use a second server anyway?
[10:09:35] <jwilliams_> we already have shards up and running.
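For what it's worth, a sketch of how that behaves from the mongo shell (the writeConcern option shown is the newer spelling; older shells call getLastError after the insert instead); the collection name is illustrative:

    // an acknowledged ("safe") write returns once the primary has accepted it, so each
    // record is visible to other readers of the primary as soon as its save/insert call
    // comes back - readers do not have to wait for the whole 60k-record batch to finish
    db.records.insert({ _id: 1, payload: "..." }, { writeConcern: { w: 1, j: true } })

    // from another shell, against the same primary:
    db.records.find({ _id: 1 })   // returns the document once the insert above was acked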
[11:09:26] <nb-ben> Hello, I am new to mongo :) coming from relational databases.. I am looking for a reliable way to find the ID of a newly inserted document using mongodb.MongoClient for node.js
[11:10:21] <nb-ben> it appears from some examples that db.collection.insert takes a callback which returns the inserted document but it does not appear to be documented in the reference on the official website
[11:15:38] <kali> nb-ben: i'm not sure how it translates in node.js, but one way is to generate the id yourself ( BSON::ObjectId.new() in ruby )
[11:17:30] <nb-ben> yeah it appears that this is the approach since the IDs are generated on the client side rather than on the server side.. which is a bit weird to me because I don't yet understand how collisions are prevented :P
[11:18:08] <nb-ben> hmm it appears that db.collection.save() will update the object to contain _id
[11:18:24] <kali> nb-ben: the definition of ObjectIds prevents collisions
[11:21:20] <nb-ben> hmm smart, so for every second you get 16777216 unique IDs and that's even ignoring the machine/process IDs
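A sketch of both routes to the _id with the node.js driver (connection string and collection name are illustrative):

    var MongoClient = require('mongodb').MongoClient,
        ObjectID    = require('mongodb').ObjectID;

    MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
      if (err) throw err;
      var users = db.collection('users');

      // the driver assigns _id on the client before sending, so the document
      // you passed in already carries it by the time the callback fires
      var doc = { name: 'ben' };
      users.insert(doc, function (err) {
        if (err) throw err;
        console.log('driver-assigned _id:', doc._id);

        // or generate the id yourself up front, as kali suggests
        var explicit = { _id: new ObjectID(), name: 'dan' };
        users.insert(explicit, function (err) {
          if (err) throw err;
          console.log('pre-generated _id:', explicit._id);
          db.close();
        });
      });
    });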
[12:32:59] <benpro> Hello, I have an issue with a mongodb replica set: 2 secondaries are lagging due to many updates, but I don't know why there are so many updates... http://www.dump-it.fr/mongolagpng/ca9b0a517d307feb38474cd64879f8bb.png.html Does someone have an idea?
[12:45:36] <synchrone> I was wondering, what happens if in my .NET application I try to use MongoDBFileStream after IDisposable RequestStartResult object was disposed ?
[13:24:26] <Fodi69> hello, I'm using Node.js with mongodb. How can I compare 2 ObjectID values? userID == user.id returns false
[15:49:44] <rickibalboa> Is it possible to, say, get all the documents before a specific id? Or a limited number of those documents?
[16:01:29] <wereHamster> rickibalboa: _id: {$lt: <the id before which you want the docs>}
[16:02:05] <rickibalboa> wereHamster, that's the one, thanks a lot!
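A sketch of that query, paging backwards from a known _id (the collection name and hex string are illustrative):

    var someId = ObjectId("5288dd7a031e4f1d2b000001")   // the id before which you want docs
    // documents strictly "before" someId, newest first, capped at 20
    db.posts.find({ _id: { $lt: someId } }).sort({ _id: -1 }).limit(20)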
[16:02:40] <wereHamster> Fodi69: mongodb docs don't have an 'id' property, they have '_id'
[16:02:50] <wereHamster> so I don't know what user.id returns
[16:02:59] <wereHamster> also, you can't compare ObjectIds with ==
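A sketch with the node.js driver's ObjectID showing why == fails and what to use instead (the hex string is just an example):

    var ObjectID = require('mongodb').ObjectID;

    var a = new ObjectID('5288dd7a031e4f1d2b000001');
    var b = new ObjectID('5288dd7a031e4f1d2b000001');

    console.log(a == b);                    // false: == compares object references
    console.log(a.equals(b));               // true:  driver-provided value comparison
    console.log(String(a) === String(b));   // true:  compare the hex string forms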
[18:14:00] <zenmaster> So I have this giant database. And it is massive at least to me. 120GB.
[18:14:13] <zenmaster> I was referred to try using MongoDB instead of MySQL.
[18:14:29] <zenmaster> The gentlemen in CentOS say MongoDB has database corruption issues.
[18:15:24] <zenmaster> Let me give you a little bit of background. The database is one large flat file: 383 columns, 184 million rows. I tried breaking that down into a relational database using 21 tables in SQL... But some queries were fine, others took hours.
[18:15:56] <zenmaster> The application at hand here is so people can pull consumer data via drop-down menus in php. I need to be as close to real time as I can possibly get.
[18:16:41] <zenmaster> Is MongoDB the holy grail I am looking for? Or do I need to read more books? Or do I need to fire my php developer? Thanks in advance. :D
[18:26:38] <StephenDebs> i would research why your queries are taking so long before jumping the mysql ship
[18:34:14] <joe_p> zenmaster: you need to isolate what is taking so long and understand why. might want to try talking to the folks in #mysql as they are very helpful.
[18:41:47] <StephenDebs> I have an api result I am trying to save to a DB (using pymongo). I get an error saying it cannot save an object of type set. Is this a limitation of mongodb? haven't seen any documentation on it...
[20:03:14] <lucian> zenmaster: mysql has explain and a good profiler, try them. your query is likely not quite optimal. also, if you wish to switch to something else, postgresql would be easier than mongo to switch to
[20:15:12] <zenmaster> lucian: I just want to know what would be the most efficient database to use for something this large on a single server.
[20:15:51] <lucian> zenmaster: pretty much anything, 120gb isn't particularly big. you just need good queries and perhaps indexes/views
[20:16:00] <lucian> zenmaster: if you're already using mysql, stick to it
[20:18:03] <zastern_> So, I just took over control (as admin) of a mongo replica set. it's just 1x R/W, 1x read-only, 1x arbiter
[20:18:10] <zastern_> but I've noticed that if one goes down they all go down
[20:18:17] <zastern_> e.g. the read-only one is unreadable if the R/W one is down
[20:18:22] <zastern_> is that the normal behavior?
[20:20:38] <zenmaster> lucian: You really think so? Ok, what storage engine would you say is better, innodb or myisam? This is a database that, once the data is loaded, will rarely if ever get written to. :)
[20:21:31] <lucian> zenmaster: innodb is the sane one, with proper transactions and such. myisam may be slightly faster. again, your query and indexes are likely to matter much more. and ask in #mysql