#mongodb logs for Friday the 24th of January, 2014

[01:18:00] <bmw0679> I've run into a write lock problem because of lack of disk space. This prevents me from running db.dropDatabase() which would solve my disk space problem. If I remove the database files manually from the file system would this cause a problem?
[01:55:21] <dvpalex_> Hi
[02:13:13] <Semor> Is mangodb a ram db?
[02:25:16] <ranman> mongodb?
[02:26:58] <Semor> yes
[02:27:03] <Semor> mongodb
[02:28:59] <darius93> ram database Semor? I doubt it
[02:29:22] <darius93> if you want to store the db in a ramdisk you can easily, but you must make backups to prevent data loss
[02:29:45] <Semor> In my application, I save my data by executing SQL statements. Is mongodb more advanced than this?
[02:31:18] <Semor> I save my data in ram in c++ structs, then another db thread executes sql transactions for persistence
[02:33:26] <darius93> no. It's more of a json-like format, but it has mostly similar functions.
[02:33:27] <darius93> http://docs.mongodb.org/manual/reference/sql-comparison/
[02:33:53] <darius93> i wouldnt recommend storing the data in ram for a long period of time
[02:34:12] <darius93> possibly dump it to disk, clean it out of memory, and call on it when needed
[02:34:29] <Semor> darius93: my db thread saves at a 1 second interval
[02:35:02] <darius93> hmm
[02:35:19] <Semor> darius93: if I don't use a ram cache, my application may be somewhat slower
[02:35:20] <darius93> well you can go on and try mongo out if you want. check out that link to see the comparison between the two
[02:35:37] <darius93> you can store the data in memory and save it to the db
[02:35:49] <darius93> but i wouldnt recommend saving it every second unless your server is that fast
[02:35:58] <darius93> and a native application being slow?
[02:36:29] <Semor> darius93: yes, but if I use mongodb, what is the operation procedure?
[02:37:02] <Semor> darius93: I do not know whether mongodb implements a db cache in ram
[02:37:44] <Semor> And does a single mongodb operation go directly to the db on disk?
[02:41:53] <darius93> well i haven't used mongo in C/C++ yet, but overall e.g. INSERT INTO example (section) VALUES ('data'); would be like doing db.example.insert({section:'data'});
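A minimal shell sketch of the comparison darius93 gives, using the same made-up "example" collection and "section" field:

    // SQL: INSERT INTO example (section) VALUES ('data');
    db.example.insert({ section: "data" });
    // SQL: SELECT * FROM example WHERE section = 'data';
    db.example.find({ section: "data" });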
[02:44:22] <darius93> Semor: if you dont mind me asking, what type of app are you building?
[03:03:36] <Semor> darius93: I am building a network server, receiving clients' operations and calculating results, then saving to the db
[03:04:08] <darius93> tcp server?
[03:04:56] <Semor> darius93: Yes, and my server has its matching client
[03:05:59] <Semor> they communicate with each other over tcp
[03:06:07] <darius93> cool
[03:06:48] <Semor> I am using mysql now
[03:07:31] <darius93> Ah ok. I would suggest MariaDB over MySQL, but if you need to, use mongodb
[03:07:42] <darius93> all depends on your needs
[03:07:50] <Semor> MariaDB?
[03:08:08] <darius93> Yep. It's basically a fork of MySQL
[03:08:22] <darius93> but it has different things that mysql doesn't
[03:08:38] <darius93> it works out of the box and is compatible with mysql dbs
[03:09:07] <Semor> Do I need to write sql statements to save data?
[03:10:34] <darius93> you can leave the statements you currently have
[03:11:20] <Semor> darius93: why do you not recommend mongodb?
[03:11:36] <darius93> I would recommend it
[03:11:46] <darius93> but it all depends on your needs
[03:12:05] <darius93> mongodb is a document db, while mysql/mariadb is a relational db
[03:12:42] <darius93> NoSQL is scalable and MariaDB can be scalable too
[03:12:57] <darius93> but it depends on how you use and take advantage of it
[03:13:20] <Semor> darius93: my requirements are fast performance and ease of use
[03:14:58] <darius93> Any database can be fast, it's just how you use it that makes it perform better. I think MongoDB may be worth a try then, or MariaDB
[03:16:34] <Semor> darius93: I want to look at some examples of using mongodb from c++
[03:17:00] <darius93> http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-cpp-driver/
[03:17:05] <darius93> that should help you out Semor
[03:19:14] <Semor> darius93: thanks!
[04:50:11] <SlexAxton> hi, if I have a sparse index on two properties, and one is nil, and the other exists, does it still get indexed?
[04:50:54] <SlexAxton> (sparse, unique index)
[04:51:41] <SlexAxton> and if it does get picked up, does nil count towards the uniqueness?
[04:52:03] <SlexAxton> [nil, x] can only occur once
[04:52:13] <SlexAxton> thanks!
[04:58:02] <SlexAxton> i think this might be related
[04:58:02] <SlexAxton> https://jira.mongodb.org/browse/SERVER-785
[04:58:04] <SlexAxton> and is open
[04:58:10] <SlexAxton> so probably not in stable
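For reference, the index SlexAxton describes would be created roughly like this (collection and field names are placeholders); given that SERVER-785 is still open, the safest course is to test the null/missing-field behaviour against your own MongoDB version rather than rely on it:

    // sparse + unique compound index on two fields
    db.things.ensureIndex({ a: 1, b: 1 }, { sparse: true, unique: true });
    // probe the behaviour: is a document with a missing "a" indexed,
    // and does a second one trip the unique constraint?
    db.things.insert({ b: "x" });
    db.things.insert({ b: "x" });
    db.things.getIndexes();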
[06:42:53] <b0ss_> so mongodb does not implement many-to-many relationships with something similar to JOIN tables, right? It just places the other object's ID on the referencing object?
[06:46:41] <b0ss_> However, managing these relationships is the difficult part. Say, if you delete one side, you have to manually update the other side. Right?
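A rough sketch of the pattern b0ss_ is describing, with made-up "students" and "courses" collections: the relationship is just arrays of ids, the "join" is done client-side, and cleaning up references after a delete is indeed a manual step:

    db.students.insert({ _id: 1, name: "ann", course_ids: [10, 11] });
    db.courses.insert({ _id: 10, title: "algebra" });

    // client-side "join": two queries
    var s = db.students.findOne({ _id: 1 });
    db.courses.find({ _id: { $in: s.course_ids } });

    // deleting a course means manually pulling its id out of every student
    db.courses.remove({ _id: 10 });
    db.students.update({}, { $pull: { course_ids: 10 } }, { multi: true });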
[07:39:52] <omid8bimo> i need some guidance on something. i have 3 mongodb servers in my replica set: 1 master mongodb, 1 slave mongodb and one as arbiter. my slave server crashed yesterday and after lots of file system repair the server became ok, but mongodb couldn't start, so i removed /var/lib/mongodb, created an empty directory for it and started the service to get the data back from the master, but only half of the data got
[07:39:53] <omid8bimo> replicated. is that normal?
[08:42:22] <Gargoyle> omid8bimo: Nope. Sounds like the server is still fubar.
[08:48:07] <omid8bimo> Gargoyle: could you explain more?
[08:48:40] <omid8bimo> first of all, how can i know how much data has been saved in the oplog? so that i can understand when it will start overwriting
[08:49:16] <omid8bimo> second of all, why mongodb would not replicate the whole dataset when a fresh mongo instance joined the replicaset?
[08:55:49] <Gargoyle> omid8bimo: I can't think of any reason why under a normal configuration the whole data would not be sync'd. Are there any errors in the logs?
[08:56:10] <Gargoyle> Dunno about checking the oplog size, but don't think that would make any difference either.
[08:57:20] <omid8bimo> Gargoyle: no errors. db.printSlaveReplicationInfo() on master said secondary is 20 hours behind
[08:57:39] <Gargoyle> Is it still syncing?
[08:57:43] <omid8bimo> and db.printReplicationInfo() says:
[08:57:48] <omid8bimo> configured oplog size: 102400MB
[08:57:50] <omid8bimo> log length start to end: 141318secs (39.26hrs)
[08:58:14] <omid8bimo> does this mean my oplog can keep around 40 hours of data?
[08:59:34] <omid8bimo> Gargoyle: master db db.printSlaveReplicationInfo() says syncedTo: 1 seconds ago, but data on secondary that came up like 1 hour ago is still 30 GB behind master!
[09:04:41] <Gargoyle> What does printSlaveReplication() on the slave say?
[09:07:24] <omid8bimo> says its not a function!
[09:07:34] <omid8bimo> do you mean printSlaveReplicationInfo()?
[09:07:41] <Gargoyle> yeah
[09:08:15] <omid8bimo> p_set:SECONDARY> db.printSlaveReplicationInfo()
[09:08:16] <omid8bimo> source: kookoja-web:27017
[09:08:18] <omid8bimo> no replication info, yet. State: ARBITER
[09:08:20] <omid8bimo> source: kookoja-db01r1s0:27017
[09:08:22] <omid8bimo> syncedTo: Fri Jan 24 2014 12:35:30 GMT+0330 (IRST)
[09:08:24] <omid8bimo> = 0 secs ago (0hrs)
[09:10:36] <kali> omid8bimo: what makes you think only half the data got there ?
[09:10:58] <kali> omid8bimo: have you checked the database from inside, or just look at the file size ?
[09:12:26] <omid8bimo> because of two factors: one is that the total size of /var/lib/mongodb is different on both servers (secondary is around 20GB less), and also, based on my collections, i have kcollection.48 on master but only kcollection.43 on secondary
[09:13:27] <Gargoyle> omid8bimo: Don't think you can go by disk space size. Your master will probably have "holes" in its data files; your secondary will be nice and neat to start with.
[09:14:16] <omid8bimo> Gargoyle: so to be sure that my data on secondary is up to date, can i do this?
[09:14:23] <omid8bimo> remove the /var/lib/mongodb
[09:14:35] <omid8bimo> create an empty folder
[09:15:16] <omid8bimo> start mongod on the secondary with the replicaSet config and wait until it syncs the entire /var/lib/mongodb from the master, which is 180 GB
[09:15:44] <kali> omid8bimo: yes. just don't expect the size on disk to be similar, it has no reason to be
[09:16:19] <omid8bimo> kali: ok but collection names or the quantity must be the same, correct?
[09:16:43] <kali> omid8bimo: yes. except stuff from the local db
[09:17:04] <omid8bimo> like if i have up to kcollection.48 on master, i must have kcollection.48 on secondary as well?
[09:17:11] <kali> yeah
[09:17:21] <kali> ha !
[09:17:24] <kali> no !
[09:17:29] <omid8bimo> ?
[09:17:30] <kali> that's a file name
[09:17:37] <kali> right ?
[09:17:53] <omid8bimo> yup
[09:18:06] <kali> "kcollection" is a database name, it needs 48 data "segments" on your primary
[09:18:28] <kali> but it will very likely require less when synced on your secondary
[09:18:37] <Gargoyle> omid8bimo: Perhaps you should stop looking at the files and use the shell to check things and leave mongo to "do its thing" with your files!
[09:18:40] <kali> same problem as size on disk
[09:18:47] <kali> Gargoyle: +1
[09:21:16] <omid8bimo> all right. so tell me a cmd i can use to compare data on both
[09:21:16] <omid8bimo> something like "show dbs" is good enough?
[09:21:30] <Gargoyle> rs.status() ?
[09:21:56] <kali> rs.status() should show the secondary in state secondary (not recovering)
[09:22:32] <kali> you can also look at the rsSync lines in the secondary logs
[09:23:17] <kali> but bottom line is, the procedure you followed is really robust (as long as the oplog is deep enough, and it sounds like it is)
[09:23:43] <kali> stop messing with your server, it's fine :)
[09:23:55] <omid8bimo> kali: ok, here is my rs.status() output on secondary http://paste.debian.net/78117/
[09:24:16] <kali> yep, purring like a kitten
[09:24:35] <kali> leave it alone
[09:24:49] <omid8bimo> kali: ok :) so its up to date and working well enough
[09:25:41] <omid8bimo> one more thing, approximately how long does it take to do an initial sync over a LAN connection for 180 GB of data?
[09:26:00] <omid8bimo> if i ever wanted to do an initial sync?
[09:26:27] <Gargoyle> omid8bimo: define LAN ?
[09:27:37] <kali> it depends on about a dozen factors...
[09:27:40] <omid8bimo> Gargoyle: normal 1000GB connection
[09:27:55] <omid8bimo> oops! i meat 1GB/ps
[09:27:58] <Gargoyle> omid8bimo: 100GB is not normal! :P
[09:27:59] <omid8bimo> *meant
[09:28:45] <Gargoyle> omid8bimo: Data throughput on your ethernet will be about 75% of its speed.
[09:29:28] <omid8bimo> so im guessing several days to do an initial sync of 180GB of data from master to a new secondary
[09:31:29] <Gargoyle> So: 180GB * 8 = 1,440 Gbits; * 1024 = 1,474,560 Mbits; / 750 Mbps = 1,966 seconds = about 30 mins.
[09:31:58] <kali> more in the ballpark of one hour, yeah
[09:33:07] <omid8bimo> Gargoyle: just 30 mins?
[09:33:42] <Nodex> max throughput would be 30 mins
[09:33:50] <Nodex> in reality it's closer to an hour
[09:35:22] <Gargoyle> Other factors will make a difference: disk speed, other network traffic, etc. But theoretically, your two machines should be able to shove 180GB down the wire in about 30 mins (theoretical max).
[09:36:41] <omid8bimo> ok i want to do something as a test. i just stopped the secondary, renamed /var/lib/mongodb, created an empty folder with permissions set, started it, and now it's syncing data from the master
[09:36:56] <omid8bimo> but i dont see any recovering state in rs.status()
[09:37:24] <kali> it might be "initial sync" or something like that
[09:38:39] <omid8bimo> i even tried grep on mongodb.log for the ip address of the master, and im getting like 10 connections per 5 mins
[09:38:48] <balboah> also building indexes will pause replication
[09:39:05] <Nodex> indexes get built after the sync
[09:39:06] <balboah> so not all about bandwidth
[09:39:14] <balboah> well before it becomes ready at least :)
[09:39:54] <Nodex> if the replication has high traffic there will be a lock while indexes are built, stopping new data arriving, which will resync once the index has been built
[09:41:40] <omid8bimo> ok fair enough. i have the MMS service configured on my servers. which graph on the secondary host should i pay attention to for the progress of replication/sync?
[10:29:09] <tiller> Hi!
[10:29:59] <tiller> Is there a way to check if -a field- has a specific value? A field being: "it doesn't matter which one".
[10:31:05] <tiller> For example, if I have: {_id: 1, subDocumentToTest: [ {a: "test"}, {b: "thing"}, {a: "hello", c: "test"} ] }
[10:31:29] <tiller> I want to have a query like : { "subDocumentToTest.?": "test" }
[10:35:47] <Nodex> no
[10:36:30] <tiller> Ok. I guess I'll just do a -big- OR
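For the record, the "big OR" tiller mentions would look something like this for the example document above, since there is no wildcard field name to query on; each candidate field has to be listed explicitly (the collection name "coll" is a placeholder):

    db.coll.find({ $or: [
        { "subDocumentToTest.a": "test" },
        { "subDocumentToTest.b": "test" },
        { "subDocumentToTest.c": "test" }
    ] });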
[10:36:50] <tiller> Thanks
[12:15:23] <byalaga> How do I know when oplog rotation happens on a primary? any help?
[12:19:26] <kali> byalaga: what ?
[12:21:19] <byalaga> kali: Is there any way to estimate when an oplog is rotated?
[12:22:00] <kali> byalaga: what do you mean "rotated" ?
[12:22:29] <kali> oplog is sliding, not rotating
[12:23:43] <byalaga> I mean, we need to take a backup from a secondary node and we want to ensure the oplog on the primary node is not rotated/slid
[12:23:44] <byalaga> so that whenever we bring this secondary node back, it should catch up with the primary
[12:25:02] <kali> byalaga: db.printReplicationInfo() on the primary will tell you the first and last events of your oplog
[12:25:20] <kali> byalaga: and the length of the time span covered
[12:26:24] <gnagno> hello all
[12:26:34] <kali> byalaga: apart from that i don't understand what you mean
[12:26:51] <gnagno> is it possible to create a replicaset using different mongodb versions? say I have a 1.8.1 and a 2.4.x?
[12:27:59] <byalaga> kali: http://pastie.org/8663455
[12:28:23] <byalaga> does that mean my oplog on this server can hold 62hrs of oplog?
[12:28:24] <kali> gnagno: heterogeneous replica sets are meant for version migration only. they will work for some combinations but it is not recommended, and not supported as far as i know
[12:28:48] <kali> gnagno: with such a wide range (1.8 to 2.4) i would not even think about it
[12:29:03] <kali> byalaga: yes
[12:30:29] <byalaga> how can we say that?? the host may have 5 operations in 1 sec sometimes.. and 50 operations sometimes..
[12:30:53] <gnagno> kali, thank you, actually version migration is what I need, so you suggest me to migrate first from 1.8 to 2.0 then from 2.0 to 2.4 ?
[12:31:55] <kali> gnagno: look at the release notes for the release branch (2.0, 2.2, and 2.4). they will tell you for sure which one can be skipped. Don't forget 2.2
[12:32:26] <byalaga> kali: how can we say that exactly?? the host may have 5 operations in 1 sec sometimes.. and 50 operations sometimes..
[12:32:26] <byalaga> and the latest operation on my secondary may or may not be present on the primary
[12:34:47] <kali> byalaga: it's just based on what happened in the past. it holds 62 hours of backlog, but if your load changes, it will obviously cover less time
[12:36:43] <byalaga> kali: my oplog size on all nodes is 2048MB. how long can i keep my secondary node down for backups? so that.. if i bring it back it will be able to catch up with the primary?
[12:38:32] <kali> byalaga: well, assuming your load does not change suddenly, about 62 hours
[12:39:57] <byalaga> "oplog first event time" on primary is not changing.. does that mean.. oplog is not sliding now?
[12:41:45] <kali> that's a bit weird... unless it is actually the beginning of the existence of the primary and every write since is fitting in the oplog
[12:42:24] <byalaga> exactly, so it is not yet filled and not sliding?
[12:42:37] <kali> sounds like it
[12:42:57] <byalaga> is there a way to check when it will start sliding?
[12:44:48] <_pash> hello, how long would it take to find a match among 12000 entries in a db and what's the best way to do that?
[12:45:06] <kali> byalaga: mmmm maybe try: use local; db.oplog.rs.stats();
[12:46:14] <byalaga> "count" : 5361108,
[12:46:14] <byalaga> "size" : 2061707724,
[12:47:13] <byalaga> "storageSize" : 2147487728,
[12:47:59] <byalaga> kali: any idea? how to calculate it from the above results?
[12:48:08] <kali> count is the count of documents, size and storageSize are quite close... so i'm not sure
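One hedged way to read those numbers: the oplog is a capped collection, so it only starts sliding once it has filled its allocated space. Comparing the current size to the storage size gives a rough sense of how close it is (field names as reported by stats() on this era of MongoDB):

    use local
    var s = db.oplog.rs.stats();
    // size = bytes currently used, storageSize = the cap allocated for the oplog
    print("oplog roughly " + Math.round(100 * s.size / s.storageSize) + "% full");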
[12:48:29] <gnagno> quick question: assume that I have a mongo server with a lot of data in it already running, if I create a replica set will it automatically migrate all the data also?
[12:48:49] <byalaga> ok thanks for you time
[12:48:55] <byalaga> kali: ^^
[12:49:04] <kali> gnagno: you want to migrate from standalone to replicated ?
[12:49:43] <gnagno> kali, yes
[12:49:58] <kali> gnagno: it's a relatively painless and well documented procedure: http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
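The linked procedure boils down to roughly the following (the replica set name and hostname are placeholders; the tutorial covers the details, and new members copy the existing data via an initial sync, which answers gnagno's earlier question):

    // after restarting the existing mongod with --replSet rs0
    rs.initiate();               // this node becomes the first member (and primary)
    rs.status();                 // wait until it reports PRIMARY
    rs.add("otherhost:27017");   // added members initial-sync the data automatically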
[12:53:33] <_pash> how long would it take to run a search through 12000 entries in mongo and what's the best way of finding a match?
[12:57:26] <gnagno> thank you very much kali
[13:00:39] <Nodex> _pash : that's relative
[13:00:56] <Nodex> normally about the same length as a piece of string
[13:01:57] <_pash> Nodex: it would literally be like 12000 entries of usernames that have not yet been registered, and as soon as the user types a username, i need to check whether it is registered or not
[13:03:25] <kali> less than a tenth of a millisecond if you have an index on the username and it fits in RAM
[13:03:25] <Nodex> sometimes it can be shorter than the string but sometimes it's the same
[13:04:21] <_pash> kali: how do i make sure that its in ram?
[13:04:32] <_pash> what do i read up on? kali
[13:04:34] <kali> you buy enough of the stuff
[13:09:36] <_pash> kali: ensureIndex() ?
[13:11:28] <kali> come on, have a look at the doc
[13:11:40] <_pash> which part
[13:11:41] <kali> (yes. ensureIndex)
[13:11:49] <_pash> ok thanks
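A minimal sketch of what kali is pointing at, assuming a "users" collection with a "username" field (names are made up):

    db.users.ensureIndex({ username: 1 }, { unique: true });  // unique is optional
    db.users.find({ username: "somename" }).explain();        // should report an indexed plan (a BtreeCursor on this era of MongoDB)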
[15:42:20] <trueneu> Hi. Is it possible that PRIMARY and SECONDARY servers in the same replica set have different collections in the same db?
[15:42:52] <trueneu> No replication lag whatsoever. I did an initial sync for the SECONDARY server and it doesn't seem to be synced.
[15:47:42] <joannac> Unlikely, but anything's possible
[15:48:35] <cheeser> that'd be a pretty colossal bug...
[15:49:54] <joannac> I would say more likely you hit some problem, possibly corruption, or maybe you're not really up to date
[15:50:42] <trueneu> Right now it's 2.0.something version, probably there _was_ such a bug?
[15:51:39] <trueneu> I actually have 3 servers in this repl set, PRIMARY and one SECONDARY are 2.0.smth, and the second SECONDARY is 2.4.smth. Both secondaries have the same data sets
[15:52:00] <trueneu> That differ from the set that is on PRIMARY
[15:53:03] <trueneu> And rs.status() doesn't show anything abnormal
[16:01:08] <tiller> Hey, I've some difficulties with an upsert, can someone have a look? :/
[16:01:11] <tiller> http://pastebin.com/qHhECkjP
[16:13:51] <joannac> tiller: could you do it with elemMatch and the positional operator?
[16:15:21] <tiller> joannac> You mean within my $pull? If so, I've got the same error
[16:15:48] <joannac> No, without using a pull
[16:15:55] <joannac> just an update with elemMatch
[16:16:13] <tiller> ah
[16:16:17] <tiller> Lemme try :)
[16:17:31] <joannac> trueneu: erm, both secondaries have the same data and both were synced from scratch from the primary?
[16:17:47] <trueneu> Not sure about the second secondary
[16:17:52] <joannac> trueneu: secondary has a strict subset of the primary's data?
[16:18:25] <trueneu> I've checked the other secondary, the one I did initial sync for -- it synced from the second secondary :)
[16:18:39] <trueneu> joannac, yes, a strict one.
[16:19:12] <tiller> joannac> I think it won't work. Because if a subscription already exists for the (user, entity) but without the new tag I want to insert, the search query will fail, and I'll upsert a whole new thing. Don't I?
[16:19:23] <trueneu> I still do not understand why the replication works, but I guess I will just bring down both secondaries and do 2 initial syncs.
[16:21:58] <trueneu> Thanks for your help.
[17:12:39] <harrymoore> hi, all. can someone tell me if there is a way to use $pull to remove a single matching array element? it looks like it pulls all matching elements
[17:22:42] <Joeskyyy> harrymoore: Could probably use the $ positional operator, assuming you're wanting to remove the first occurrence
[17:24:06] <harrymoore> @Joeskyyy: yes i want to remove the first (or any really) matching value
[17:29:18] <Joeskyyy> even with that though, it'd be setting it to a new value :\ there's no way to pop it out from the looks of it
[17:29:22] <Joeskyyy> http://docs.mongodb.org/manual/reference/operator/update/positional/
[17:29:32] <Joeskyyy> But that's what I was referencing, in case you're interested.
[17:30:05] <Joeskyyy> Otherwise you may want to have an array where each element has something like an id of sorts, plus the content.
[17:30:15] <Joeskyyy> That way you can use pull on the id field
[17:30:48] <Joeskyyy> i.e. if it were an array of comments, each new comment has the comment field, and the id field (incremented by one)
[17:35:06] <harrymoore> thx @joeskyyy! i will just have to handle it in app code (java) instead: read the doc, remove the element in question and $set the entire array
[17:36:28] <Joeskyyy> no prob
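For completeness, the positional-operator route Joeskyyy alludes to is usually done as a two-step workaround rather than a single atomic update (worth testing on your own version; "docs" and "tags" are placeholder names): $unset the first matching element, then $pull the null it leaves behind.

    // 1) blank out the first array element equal to "x" (leaves a null in its place)
    db.docs.update({ tags: "x" }, { $unset: { "tags.$": 1 } });
    // 2) remove the null placeholder
    db.docs.update({ tags: null }, { $pull: { tags: null } });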
[17:44:44] <NyB> Hi, does anyone have any tips for speeding up the Java driver? It seems that it has become the bottleneck in my application...
[17:48:28] <NyB> it just doesn't seem to be able to feed my application fast enough :-/
[17:49:16] <Joeskyyy> maybe harrymoore has something on that :D he was just talking about the java driver
[17:50:27] <NyB> Joeskyyy: :-)
[17:52:32] <NyB> I am thinking about using two or more driver threads in parallel, but I was hoping to avoid the complexity involved if there is a more standard way.
[17:53:23] <harrymoore> @NyB: you can distribute your read requests if you have a replica set. the java driver is smart enough to read from the fastest responding node
[17:53:39] <harrymoore> you may have to set slave ok or something
[17:53:40] <NyB> as it is, my DB input thread seems to be spending 70% of its time in BasicBSONDecoder.decode() :-/
[17:54:37] <NyB> harrymoore: the server is not the problem - the limiting factor seems to be the fact that I am reading a collection from a single thread
[17:56:33] <NyB> I believe I cannot use the result of find() safely from parallel threads, right? I could not find any relevant synchronization calls in the source code...
[17:56:34] <harrymoore> i haven't done any multi-threaded work. assume the driver is ok with it though
[17:57:27] <kali> NyB: i would rather run several find()s in parallel than share a cursor
[17:57:42] <kali> NyB: can you shard your find() somehow? some kind of natural pagination?
[17:58:27] <kali> i assume you're fetching a significant amount of data from a collection, right ?
[17:59:05] <NyB> kali: about 1.5 million documents sorted by a counter field.
[17:59:40] <NyB> kali: for my tests that is, the real thing will fetch quite a bit more
[18:00:07] <NyB> each document is a single object with ~40 fields
[18:00:16] <kali> you need all the fields ?
[18:00:42] <NyB> kali: unfortunately yes...
[18:01:09] <kali> there is one thing to be aware of, also. if your database is accessed by real time stuff (like a web app), at some point you'll manage to move the bottleneck to mongodb, and your web app will die
[18:01:51] <kali> sounds tricky
[18:01:57] <kali> you need the documents in order ?
[18:02:24] <NyB> kali: yes, the order is important
[18:02:55] <kali> then you'll need to reorder them once decoded before pushing them through the main thing
[18:04:00] <NyB> kali: hmm... are you suggesting fetching them out-of order from several threads and then sorting them?
[18:06:05] <kali> well whatever you do, if you decode them in parallel, you'll get them out of order, at least locally
[18:08:43] <NyB> kali: I guess part of my problem is that the bottleneck does not lie on the MongoDB server - if it did I would have a couple of ways to handle it... as it is mongod consumes about 30% of a single CPU core and the I/O is nowhere near saturation yet...
[18:09:26] <NyB> kali: it seems that I will have to split the collection somehow...
[18:10:15] <NyB> kali: thanks for the tips (and for being a sounding board) :-)
[18:17:47] <NyB> kali: is there a way to have the server add an "order" field automatically after sorting? something that I could use with $mod to split the output?
[18:23:24] <kali> NyB: i don't think so
[18:27:42] <NyB> kali: yeah, I did not really believe that such a thing would exist...
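Since there is no server-side row number to $mod on, one sketch of kali's pagination idea is to split the find() by ranges of the (indexed) counter field and hand each range to its own reader thread; the field name and boundaries here are made up, and each range comes back internally ordered, so the ranges just need to be concatenated in order:

    // reader 1
    db.events.find({ counter: { $gte: 0,      $lt: 500000  } }).sort({ counter: 1 });
    // reader 2
    db.events.find({ counter: { $gte: 500000, $lt: 1000000 } }).sort({ counter: 1 });
    // ...and so on, one range per thread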
[19:28:04] <ekristen> looking for some advice for hosting mongodb in aws, I’ve got in development 400-500k documents right now, looking to be in the 10s of millions in a few months
[19:28:38] <ekristen> looking for advice on aws instance sizes to start out on
[19:31:52] <harrymoore> @NyB if you're still there here is a gist showing threaded access to the driver. I don't write threaded code so take it for what it's worth: https://gist.github.com/harrymoore/8604212#file-app-java you could add a synchronized data structure for defining which thread will process which records
[19:32:33] <cheeser> what's all that for?
[19:50:43] <ekristen> quiet in here
[19:55:54] <cheeser> we're all busy refreshing gmail trying to get it to load
[20:00:39] <whaley> I missed something... what's wrong with just setting connectionsPerHost on MongoClient?
[20:01:18] <whaley> NyB: ^
[20:46:04] <module000> ekristen: late reply...but i host mongo in AWS. you need to look at the size of your data sets, and pick accordingly. for example, i have a 60GB data set, so i host it on an m2.4xlarge, which has 68GB of memory. if performance is less important, scale down your instance to include less RAM
[20:46:59] <ekristen> module000: my data set is currently only like 6gb on disk, that's roughly 1GB per 100k documents we are storing at the moment
[20:47:16] <ekristen> module000: sounds like you aren’t using a replicaset either?
[20:47:45] <module000> ekristen: it's sharded on a m1.large, but that shard only exists to mitigate an outage if the primary shard fails
[20:48:21] <module000> ekristen: if you only have 6gb, you could use an m3.large (7.5GB memory), and make a much smaller instance to shard it with, so if you have an outage your tiny node keeps you from having an outage (albeit you move slower)
[20:50:32] <ekristen> maybe I misunderstand the term shard, as what you are saying doesn’t line up with my understanding of replicasets and sharding
[20:51:06] <module000> ekristen: i'm not using replica sets, i'm using a sharded cluster deployment
[20:51:28] <module000> ekristen: you could do the same with replica sets though, just elect a smaller instance to serve as your secondary
[20:52:45] <ekristen> ah ok
[21:06:00] <luimemee> hello
[21:06:45] <ekristen> so module000 how are you running 3 config servers? and mongos?
[21:06:55] <luimemee> if i have a collection and each element has a tags list, how can i get a list of all the tags on all the elements please?
[21:09:02] <kali> luimemee: look for "distinct" in the documentation
[21:09:43] <luimemee> kali, thanks !
[21:11:38] <luimemee> kali, this is magic !
[21:13:23] <cheeser> https://www.youtube.com/watch?v=e7mmrF-4rUE
[21:31:58] <beardage> hi, how can I tell mongodb in python not to index the data i'm importing? the thing is that it OOMs while importing millions of records.
[21:46:17] <luimemee> this should work ? db.runCommand ( { distinct: "mycollection", key: "tags", query: {'tags':{$regex:'*toto*'}}} )
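Nobody picked this up, but two likely snags in that attempt: '*toto*' is not a valid regular expression (a leading '*' has nothing to repeat), and the query only filters which documents are considered, while distinct still returns every tag value from those documents. A corrected form would look roughly like:

    // shell helper form
    db.mycollection.distinct("tags", { tags: /toto/ });
    // or the runCommand equivalent
    db.runCommand({ distinct: "mycollection", key: "tags", query: { tags: { $regex: "toto" } } });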
[22:27:22] <the-erm> I need some advice, is there a better way to do this? http://pastie.org/8664985 I'm trying to store the hours of business so it can be looked up, but the challenge is hours of operation are from 12pm-2am the next day.
[22:31:16] <the-erm> I was thinking I could possibly store the values 'Monday': [ [0,1], [1200,2359] ] but I'm not sure how to write a query for that.
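One common way to model this (a sketch, not a verdict on the pastie, and the collection/field names are made up): split any span that crosses midnight into two same-day ranges, store minutes since midnight, and query with $elemMatch so both bounds apply to the same range.

    // a noon-to-2am place: Friday 12:00-24:00 plus Saturday 00:00-02:00
    db.places.insert({ name: "bar", hours: {
        fri: [ { open: 720, close: 1440 } ],
        sat: [ { open: 0, close: 120 }, { open: 720, close: 1440 } ]
    }});
    // "is it open Friday at 22:30?"  (22:30 = 1350 minutes)
    db.places.find({ "hours.fri": { $elemMatch: { open: { $lte: 1350 }, close: { $gt: 1350 } } } });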
[23:16:49] <unholycrab> anyone use rockmongo? i want to know how to remove the "drop database" link
[23:41:40] <JeffC_NN> unholycrab: If you're comfortable editing the PHP, edit app/controllers/db.php and remove the links/capabilities
[23:43:57] <JeffC_NN> (for very easy edits, just search the file for "drop" and add a simple "return;" at the beginning of functions named things like doDropDatabase() (line 488) and doDropDbCollections() (line 451) or whatever you want to disable.) Pretty easy.
[23:47:32] <unholycrab> thanks JeffC_NN