#mongodb logs for Thursday the 27th of December, 2012

[01:44:43] <smith_> I have a dataset (logs) which grows a few gigabytes per day. What is my strategy for scaling? Those logs will be routinely analysed.
[01:52:20] <phup> Do you need to keep the logs around forever (continuous data growth), or are you planning to only keep a rolling window of several months (you'll hit a db size plateau)?
[01:56:46] <smith_> This decision is up to the management and they are still unsure.
[01:57:03] <smith_> The problem is I need to get working asap
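
A quick aside on the retention question above: if only a rolling window of logs is kept, MongoDB (2.2 and later) can expire old documents with a TTL index. A minimal mongo shell sketch; the collection name logs and the created_at field are illustrative, not from the conversation:

    // Documents whose created_at date is older than expireAfterSeconds are
    // removed by a background task (the field must hold a BSON date).
    db.logs.ensureIndex(
        { created_at: 1 },
        { expireAfterSeconds: 60 * 60 * 24 * 365 }   // keep roughly one year
    );

    // Each log document then just needs that date field:
    db.logs.insert({ created_at: new Date(), level: "info", msg: "example entry" });
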
[01:57:27] <phup> Hah
[01:57:28] <phup> Well
[01:58:44] <phup> There's a lot written about sharding
[01:58:55] <phup> which is how you'd want to scale to many machines
[01:59:04] <phup> tbh I mostly use mongo for smaller personal projects
[01:59:13] <phup> we use hbase at work for the bigger stuff
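
For reference, the sharding phup mentions is driven from the mongo shell through a mongos. A minimal sketch under assumed names (database logdb, collection logs, and a hypothetical { day: 1, source: 1 } shard key; picking a real shard key deserves more thought than this):

    // Run against a mongos of an already-built sharded cluster.
    sh.enableSharding("logdb");

    // The shard key needs an index on the collection.
    db.getSiblingDB("logdb").logs.ensureIndex({ day: 1, source: 1 });
    sh.shardCollection("logdb.logs", { day: 1, source: 1 });

    // See how chunks are spread across shards.
    sh.status();
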
[01:59:26] <smith_> For regulatory purposes they need to store at least a year's worth of logs, but storing more has some additional benefits, so they are thinking about it..
[01:59:43] <smith_> Is hbase up to the task?
[01:59:50] <phup> hbase is a huge pain in the ass
[01:59:53] <phup> but, it's a beast
[02:00:18] <smith_> oh dear
[02:00:42] <phup> mongo is much much easier to work with
[02:01:04] <phup> but query performance can degrade when your working set is > available memory
[02:01:11] <smith_> yep, that is exactly the reason we wanted to use it.
[02:02:36] <phup> yeah, well, to try and future-proof yourself when using mongo you can write all your analysis tools as map and reduce functions
[02:03:07] <phup> but, I have no personal experience using very large data sets w/ mongo, sorry I can't be more helpful
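
To illustrate phup's suggestion, an analysis written as map/reduce runs through the collection's mapReduce helper and keeps working if the collection is later sharded. A minimal mongo shell sketch, assuming a logs collection with a level field (both names are illustrative):

    // Count log entries per level.
    var map = function () { emit(this.level, 1); };
    var reduce = function (key, values) { return Array.sum(values); };

    db.logs.mapReduce(map, reduce, { out: { replace: "log_level_counts" } });
    db.log_level_counts.find();
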
[02:03:31] <smith_> you are very helpful already
[02:04:00] <phup> cool, glad to hear it
[02:04:59] <smith_> one more question though: what is the size of the biggest known mongo database?
[02:05:54] <phup> not sure, but this might be down the right track: http://www.10gen.com/video/mongosf2011/craigslist
[02:06:00] <phup> apparently CL uses it
[02:06:14] <phup> and they have billions of records
[02:06:28] <phup> er
[02:06:29] <phup> http://www.mongodb.org/display/DOCS/Production+Deployments
[02:06:31] <phup> that link's dead
[02:08:22] <smith_> nice
[02:08:30] <UForgotten> hbase via cloudera is really easy to set up. but mongo is also fairly simple.
[03:23:44] <pingupingu> hello guys
[03:24:05] <pingupingu> how can I have client side offline storage with mongodb?
[03:24:35] <pingupingu> can I use pouch for local offline storage and then sync with mongodb when online?
[04:12:42] <StephenDebs> evening
[06:49:42] <arvraepe> Does anyone know why some values are disappearing from my mapreduce result?
[06:50:22] <arvraepe> using these functions: map = function () { emit(this.item, 1); } and reduce = function (key, values) { return values.length; }
[06:51:10] <arvraepe> I have noticed that some of the values are disappearing from the generated list...
[07:12:48] <sirpengi> arvraepe: can you be more specific about what is disappearing?
[07:12:58] <sirpengi> from here the map-reduce functions you have look standard
[07:15:17] <arvraepe> Well, I think I'm not quite getting the map reduce thing completely. I just tried this code to test (I am new to this). This code should count all occurrences of the item property, i.e. count how many documents contain the .item property... I figured that emitting 1 would generate [1,1,1,1..1], a 1 for every occurrence
[07:15:39] <arvraepe> There was an item, let's call it "a", that occurred 45 times
[07:16:09] <arvraepe> and when I checked again this morning it only had 2 .. (I sort on count : -1, so I thought it disappeared)
[07:26:36] <arvraepe> sirpengi: I changed my map to function () { emit(this.item, {count:1}); } and reduce to: function (key, values) { var sum = 0; values.forEach(function (value) {sum += value.count;}); return {count:sum}; } and this works like it should... any idea what the difference is between them?
[07:28:09] <sirpengi> oh
[07:28:11] <sirpengi> yeah, that makes sense
[07:28:45] <sirpengi> the reason using values.length is being weird there is because reduce can be called multiple times
[07:29:25] <sirpengi> like, on one call, it might have received only 2 emits (so "a", [1,1])
[07:29:53] <sirpengi> and the values.length way would reduce to "a", 2
[07:30:14] <sirpengi> but the result of that can be sent into reduce again, so another call might get "a", [2,1,1,1]
[07:31:04] <wereHamster> if length(X) != sumOfElementsIn(X) then some elements in X have count != 1 (presumably >1)
[07:31:24] <arvraepe> I see! then it indeed makes sense that the values.length fails. It was a nice try though :p
[07:31:30] <wereHamster> how can that happen? Well, the output of the reduce function is fed into the reduce function again.
[07:32:17] <arvraepe> well sirpengi, I thank you for your time :-)
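
To restate the fix in one place: reduce must be written so that its own output can be fed back into it (re-reduce), which is why returning values.length breaks while summing counts works. arvraepe's working pair, reformatted with comments (the collection name items is assumed):

    var map = function () {
        // One emit per document to be counted.
        emit(this.item, { count: 1 });
    };

    var reduce = function (key, values) {
        // values may contain earlier reduce outputs such as { count: 45 },
        // so sum the counts instead of taking values.length.
        var sum = 0;
        values.forEach(function (value) { sum += value.count; });
        return { count: sum };
    };

    db.items.mapReduce(map, reduce, { out: { inline: 1 } });
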
[07:54:27] <synchrone> Hi everyone
[07:55:57] <synchrone> So I was wondering, what happens if in my .NET application I try to use MongoDBFileStream after the IDisposable RequestStartResult object was disposed?
[09:58:23] <jwilliams_> a stupid question. I write a large amount of data with write concern safe. when will that data appear if another user tries to query it at the same time?
[10:01:06] <jwilliams_> for example, when writing around 60k records, which is supposed to take 10 to 20 mins; if a user queries with the mongo shell at around the 10 min mark, will the data that has already been acked (i.e. has passed the save call) appear in the shell?
[10:01:51] <_aegis_> you're inserting 60k records?
[10:04:28] <jwilliams_> yeah
[10:04:52] <jwilliams_> with several machines concurrently.
[10:05:13] <_aegis_> so you're just using mongo normally
[10:05:36] <jwilliams_> yes
[10:05:41] <_aegis_> if you insert one, would you expect it to appear pretty quickly?
[10:06:47] <jwilliams_> no. I just want to check: when will that data be reflected in the mongo shell?
[10:07:01] <IAD> bulk insert is good for it
[10:08:10] <jwilliams_> my understanding is that with write concern set to safe and journaling on, the data should basically be pretty safe (no need to worry about losing data unless e.g. the disk crashes.)
[10:09:03] <jwilliams_> so I just want to check, generally, when I can expect that data to be visible if another user searches with e.g. the mongo shell.
[10:09:03] <_aegis_> why not use a second server anyway?
[10:09:35] <jwilliams_> we already have shards up and running.
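
For what it's worth on jwilliams_'s question: with a safe (acknowledged) write, a document is visible to other clients querying the same primary as soon as the write call returns; there is no wait for the whole 60k batch. A mongo shell sketch of checking the acknowledgement explicitly (the collection name records is illustrative):

    db.records.insert({ batch: 42, payload: "example" });

    // "Safe" mode is the driver asking the server for the outcome of the last
    // write; j: true additionally waits for the journal commit.
    var gle = db.runCommand({ getLastError: 1, j: true });
    printjson(gle);   // err: null means the insert was acknowledged

    // Another shell connected to the same primary can already see it:
    db.records.find({ batch: 42 }).count();
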
[11:09:26] <nb-ben> Hello, I am new to mongo :) coming from relational databases.. I am looking for a reliable way to find the ID of a newly inserted document using mongodb.MongoClient for node.js
[11:10:21] <nb-ben> it appears from some examples that db.collection.insert takes a callback which returns the inserted document but it does not appear to be documented in the reference on the official website
[11:15:38] <kali> nb-ben: i'm not sure how it translates in node.js, but one way is to generate the id yourself ( BSON::ObjectId.new() in ruby )
[11:16:45] <nb-ben> I see
[11:17:30] <nb-ben> yeah it appears that this is the approach since the IDs are generated on the client side rather than on the server side.. which is a bit weird to me because I don't yet understand how collisions are prevented :P
[11:18:08] <nb-ben> hmm it appears that db.collection.save() will update the object to contain _id
[11:18:24] <kali> nb-ben: the definition of objectids prevents collisions
[11:18:51] <kali> nb-ben: http://docs.mongodb.org/manual/core/object-id/
[11:21:20] <nb-ben> hmm smart, so for every second you get 16777216 unique IDs and that's even ignoring the machine/process IDs
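
A sketch of what kali describes, translated to node.js with the official driver of that era (the 1.x mongodb package); the connection string and collection names are made up:

    var mongodb = require('mongodb');

    mongodb.MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
        if (err) throw err;
        var users = db.collection('users');

        // Option 1: with an acknowledged write, the 1.x driver's callback
        // returns the inserted documents, already carrying the _id it
        // generated on the client.
        users.insert({ name: 'ben' }, { w: 1 }, function (err, docs) {
            if (err) throw err;
            console.log('inserted _id:', docs[0]._id);
        });

        // Option 2: generate the ObjectID yourself, the node equivalent of
        // BSON::ObjectId.new() in ruby.
        var id = new mongodb.ObjectID();
        users.insert({ _id: id, name: 'dan' }, { w: 1 }, function (err) {
            if (err) throw err;
            console.log('used _id:', id.toHexString());
        });
    });
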
[12:32:59] <benpro> Hello, I have an issue with a mongodb replica set: 2 secondaries are lagging due to many updates, but I don't know why there are so many updates... http://www.dump-it.fr/mongolagpng/ca9b0a517d307feb38474cd64879f8bb.png.html does someone have an idea?
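
benpro's question goes unanswered here, but one standard way to see which updates are hitting the secondaries is to look at the tail of the oplog on the primary. A mongo shell sketch (the op: "u" filter simply selects update entries):

    var oplog = db.getSiblingDB("local").oplog.rs;

    // Most recent update operations, newest first.
    oplog.find({ op: "u" }).sort({ $natural: -1 }).limit(5).forEach(printjson);

    // Total update entries still held in the oplog window.
    oplog.count({ op: "u" });
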
[12:45:36] <synchrone> I was wondering, what happens if in my .NET application I try to use MongoDBFileStream after the IDisposable RequestStartResult object was disposed?
[13:24:26] <Fodi69> hello, I'm using Node.js with mongodb. How can I compare 2 ObjectID values? userID == user.id returns false
[15:49:44] <rickibalboa> Is it possible to, say, get all the documents before a specific id? Or a limited number of those documents?
[16:01:29] <wereHamster> rickibalboa: _id: {$lt: <the id before which you want the docs>}
[16:02:05] <rickibalboa> wereHamster, thats the one, thanks a lot!
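
Spelled out, wereHamster's suggestion looks like this in the mongo shell; the collection name posts, the limit, and the example id are all illustrative:

    var beforeId = ObjectId("50dc4563a9f4c7a81e000001");  // the id before which you want the docs

    // Documents with a smaller _id, the newest of those first, capped at 20.
    db.posts.find({ _id: { $lt: beforeId } })
            .sort({ _id: -1 })
            .limit(20);
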
[16:02:40] <wereHamster> Fodi69: mongodb docs don't have an 'id' property, they have '_id'
[16:02:50] <wereHamster> so I don't know what user.id returns
[16:02:59] <wereHamster> also, you can't compare ObjectIds with ==
[16:06:28] <Fodi69> yes, I meant _id
[16:06:47] <Fodi69> working now with: http://mongodb.github.com/node-mongodb-native/api-bson-generated/objectid.html#equals
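
For completeness, the comparison Fodi69 ended up with looks roughly like this in node.js (the ids shown are placeholders):

    var ObjectID = require('mongodb').ObjectID;

    var userID = new ObjectID('50dc4563a9f4c7a81e000001');
    var user = { _id: new ObjectID('50dc4563a9f4c7a81e000001') };

    console.log(userID == user._id);                    // false: two distinct objects
    console.log(userID.equals(user._id));               // true: compares the underlying ids
    console.log(String(userID) === String(user._id));   // also true, via the hex strings
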
[18:13:48] <zenmaster> Hi guys!
[18:14:00] <zenmaster> So I have this giant database. And it is massive at least to me. 120GB.
[18:14:13] <zenmaster> I was referred to try using MongoDB instead of MySQL.
[18:14:29] <zenmaster> The gentlemen in the CentOS channel say MongoDB has database corruption issues.
[18:15:24] <zenmaster> Let me give you a little bit of background. The database is one large flat file: 383 columns, 184 million rows. I tried breaking that down into a relational database using 21 tables in SQL... But some queries were fine, others took hours.
[18:15:56] <zenmaster> The application at hand here is so people can pull consumer data via drop down menus in php. I need to be as close to real time as I can possibly get.
[18:16:41] <zenmaster> Is MongoDB the holy grail I am looking for? Or do I need to read more books? Or do I need to fire my php developer? Thanks in advance. :D
[18:26:38] <StephenDebs> i would research why your queries are taking so long before jumping the mysql ship
[18:34:14] <joe_p> zenmaster: you need to isolate what is taking so long and understand why. might want to try talking to the folks in #mysql as they are very helpful.
[18:41:47] <StephenDebs> I have an api result I am trying to save to a DB (using pymongo). I get an error saying it cannot save an object of type set. Is this a limitation of mongodb? haven't seen any documentation on it...
[18:41:55] <StephenDebs> db.nyt12262012.save({story["adx_keywords"]})
[18:45:35] <kali> StephenDebs: mongodb supports hashes and arrays, not sets
[18:49:36] <StephenDebs> thanks kali!
[20:03:14] <lucian> zenmaster: mysql has explain and a good profiler, try them. your query is likely not quite optimal. also, if you wish to switch to something else, postgresql would be easier than mongo to switch to
[20:15:12] <zenmaster> lucian: I just want to know what would be the most efficient database to use for something this large on a single server.
[20:15:51] <lucian> zenmaster: pretty much anything, 120gb isn't particularly big. you just need good queries and perhaps indexes/views
[20:16:00] <lucian> zenmaster: if you're already using mysql, stick to it
[20:18:03] <zastern_> So, I just took over control (as admin) of a mongo replica set. it's just 1x R/W, 1x read-only, 1x arbiter
[20:18:10] <zastern_> but I've noticed that if one goes down they all go down
[20:18:17] <zastern_> e.g. the read-only one is unreadable if the R/W one is down
[20:18:22] <zastern_> is that the normal behavior?
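
zastern_'s question isn't answered in this log, but that isn't the expected behavior for a healthy primary/secondary/arbiter set: with the arbiter up, the surviving data node can normally still be read (and, unless its priority is 0, elected primary), provided the connection opts into secondary reads. A mongo shell sketch of the usual first checks (nothing here is specific to this deployment):

    rs.status();   // member states: who is PRIMARY, who is lagging or DOWN
    rs.conf();     // configured members, priorities and votes

    // When connected to a secondary, allow reads on this shell connection.
    rs.slaveOk();
    db.somecollection.find().limit(1);
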
[20:20:38] <zenmaster> lucian: You really think so? Ok, what storage engine would you say is better, innodb or myisam? This is a database that, once the data is loaded, will rarely if ever get written to. :)
[20:21:31] <lucian> zenmaster: innodb is the sane one, with proper transactions and such. myisam may be slightly faster. again, your query and indexes are likely to matter much more. and ask in #mysql
[20:22:27] <zenmaster> Roger.