[01:18:00] <bmw0679> I've run into a write lock problem because of lack of disk space. This prevents me from running db.dropDatabase() which would solve my disk space problem. If I remove the database files manually from the file system would this cause a problem?
[02:35:19] <Semor> darius93: if I don't use a RAM cache, my application may be somewhat slower
[02:35:20] <darius93> well you can go on and try mongo out if you want. check out that link to see the comparison between the two
[02:35:37] <darius93> you can store the data in memory and save it to the db
[02:35:49] <darius93> but i wouldnt recommend saving it every second unless your server is that fast
[02:35:58] <darius93> and a native application being slow?
[02:36:29] <Semor> darius93: yes, but if I use mongodb, what would the operation procedure look like?
[02:37:02] <Semor> darius93: I do not know whether mongodb implements a db cache in RAM
[02:37:44] <Semor> And does a single mongodb operation go directly to the db on disk?
[02:41:53] <darius93> well i havent used mongo in C/C++ yet but overall, e.g. INSERT INTO example (section) VALUES ('data'); would be like doing db.example.insert({section:'data'});
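A minimal sketch of the mapping darius93 is describing, in the mongo shell; the collection name `example` and the follow-up find() are illustrative, not from the conversation:

    // SQL: INSERT INTO example (section) VALUES ('data');
    db.example.insert({section: 'data'})

    // SQL: SELECT * FROM example WHERE section = 'data';
    db.example.find({section: 'data'})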
[02:44:22] <darius93> Semor: if you dont mind me asking, what type of app are you building?
[03:03:36] <Semor> darius93: I am building a network server that receives clients' operations, calculates results, and then saves them to the db
[06:42:53] <b0ss_> so mongodb does not implement a many-to-many relationship with something similar to JOIN tables, right? It just places the other object's ID onto the referencing object?
[06:46:41] <b0ss_> However, managing these relationships is the difficult part. Say, if you delete one side, you have to manually update the other side. Right?
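A rough sketch of what b0ss_ is describing, using hypothetical `students` and `courses` collections; there are no joins here, so references live in arrays of ids and both sides have to be maintained by the application:

    // store references on both sides of the many-to-many
    db.students.insert({_id: 1, name: "Ann", courseIds: [10, 20]})
    db.courses.insert({_id: 10, title: "Algebra", studentIds: [1]})

    // deleting a course means manually pulling its id out of every student
    db.courses.remove({_id: 10})
    db.students.update({courseIds: 10}, {$pull: {courseIds: 10}}, false, true)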
[07:39:52] <omid8bimo> i need some guidance on something. i have 3 mongodb servers in my replica set, 1 master mongodb, 1 slave mongodb and one as arbiter; my slave server crashed yesterday and after lots of file system repair the server became ok, but mongodb couldnt start, so i removed /var/lib/mongodb, created an empty directory for it and started the service to get the data back from master, but only half of the data got
[07:39:53] <omid8bimo> replicated. is that normal?
[08:42:22] <Gargoyle> omid8bimo: Nope. Sounds like the server is still fubar.
[08:48:07] <omid8bimo> Gargoyle: could you explain more?
[08:48:40] <omid8bimo> first of all, how can i know how much data is held in the oplog? so that i can understand when it will start overwriting
[08:49:16] <omid8bimo> second of all, why would mongodb not replicate the whole dataset when a fresh mongo instance joins the replica set?
[08:55:49] <Gargoyle> omid8bimo: I can't think of any reason why under a normal configuration the whole data would not be sync'd. Are there any errors in the logs?
[08:56:10] <Gargoyle> Dunno about checking the oplog size, but don't think that would make any difference either.
[08:57:20] <omid8bimo> Gargoyle: no errors. db.printSlaveReplicationInfo() on master said secondary is 20 hours behind
[08:57:50] <omid8bimo> log length start to end: 141318secs (39.26hrs)
[08:58:14] <omid8bimo> does this mean my oplog can keep around 40 hours of data?
[08:59:34] <omid8bimo> Gargoyle: master db db.printSlaveReplicationInfo() says syncedTo: 1 seconds ago, but data on secondary that came up like 1 hour ago is still 30 GB behind master!
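For omid8bimo's earlier question about how much the oplog holds, a sketch of the relevant shell helpers; the output lines below are illustrative, not taken from this server:

    // on the member whose oplog you care about (usually the primary):
    db.printReplicationInfo()
    // configured oplog size:   2048MB
    // log length start to end: 141318secs (39.26hrs)

    // how far behind each secondary is:
    db.printSlaveReplicationInfo()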
[09:04:41] <Gargoyle> What does db.printSlaveReplicationInfo() on the slave say?
[09:10:36] <kali> omid8bimo: what makes you think only half the data got there ?
[09:10:58] <kali> omid8bimo: have you checked the database from inside, or just looked at the file size?
[09:12:26] <omid8bimo> because of two factors, one is that the total size of /var/lib/mongodb is different on both servers (secondary is around 20GB less), and also, based on my collections, i have kcollection.48 on master but only kcollection.43 on secondary
[09:13:27] <Gargoyle> omid8bimo: Don't think you can count on disk space size. Your master will probably have "holes" in its data files, your secondary will be nice and neat to start with.
[09:14:16] <omid8bimo> Gargoyle: so to be sure that my data on secondary is up to date, can i do this?
[09:14:23] <omid8bimo> remove the /var/lib/mongodb
[09:18:06] <kali> "kcollection" is a database name, it needs 48 data "segments" on your primary
[09:18:28] <kali> but it will very likely require less when synced on your secondary
[09:18:37] <Gargoyle> omid8bimo: Perhaps you should stop looking at the files and use the shell to check things and leave mongo to "do its thing" with your files!
[09:33:42] <Nodex> max throughput would be 30 mins
[09:33:50] <Nodex> in reality it's closer to an hour
[09:35:22] <Gargoyle> Other factors will make a difference, disk speed, other network traffic, etc. But theoretically, your two machines should be able to shove 180GB down the wire in about 30 mins (theoretical max).
[09:36:41] <omid8bimo> ok i wanna do something as a test. i just stopped the secondary, renamed /var/lib/mongodb, created an empty folder with permissions set, started it, and now it's syncing data from master
[09:36:56] <omid8bimo> but i dont see any recovering state in rs.status()
[09:37:24] <kali> it might be "initial sync" or something like that
[09:38:39] <omid8bimo> i even tried grep on mongodb.log for the ip address of the master, and im getting like 10 connections per 5 mins
[09:38:48] <balboah> also building indexes will pause replication
[09:39:05] <Nodex> indexes get built after the sync
[09:39:14] <balboah> well before it becomes ready at least :)
[09:39:54] <Nodex> if the replication has high traffic there will be a lock while indexes are built, stopping new data arriving; it will resync once the index has been built
[09:41:40] <omid8bimo> ok fair enough. i have the MMS service configured on my servers. which graph on the secondary host should i pay attention to for the progress of replication/sync?
[12:28:23] <byalaga> does that mean the oplog on this server can hold 62hrs of operations?
[12:28:24] <kali> gnagno: heterogeneous replica sets are meant for version migration only. they will work for some combinations but it is not recommended, and not supported as far as i know
[12:28:48] <kali> gnagno: with such a wide range (1.8 to 2.4) i would not even think about it
[12:30:29] <byalaga> how can we say that?? the host may have 5 operations in 1 sec sometimes.. and 50 operations sometimes..
[12:30:53] <gnagno> kali, thank you, actually version migration is what I need, so you suggest I migrate first from 1.8 to 2.0 and then from 2.0 to 2.4?
[12:31:55] <kali> gnagno: look at the release notes for the release branch (2.0, 2.2, and 2.4). they will tell you for sure which one can be skipped. Don't forget 2.2
[12:32:26] <byalaga> kali: how can we say that exactly?? the host may have 5 operations in 1 sec sometimes.. and 50 operations sometimes..
[12:32:26] <byalaga> if the latest operation on my secondary may/may not be present on the primary
[12:34:47] <kali> byalaga: it's just based on what happened in the past. it holds 62 hours of backlog, but if your load changes, it will obviously cover less time
[12:36:43] <byalaga> kali: my oplog size on all nodes is 2048MB. for how long can i keep my secondary node down for backups, so that when i bring it back it will still be able to catch up with the primary?
[12:38:32] <kali> byalaga: well, assuming your load does not change suddenly, about 62 hours
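If byalaga wants those numbers programmatically rather than as printed text, a hedged sketch using db.getReplicationInfo() (field names as I recall them; verify on your version):

    var info = db.getReplicationInfo()
    info.logSizeMB      // configured oplog size, e.g. 2048
    info.timeDiffHours  // hours of writes the oplog currently spans, e.g. ~62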
[12:39:57] <byalaga> "oplog first event time" on primary is not changing.. does that mean.. oplog is not sliding now?
[12:41:45] <kali> that's a bit weird... unless it is actually the beginning of the existence of the primary and every write since then still fits in the oplog
[12:42:24] <byalaga> exactly, so it is not yet filled and not sliding?
[12:47:59] <byalaga> kali: any idea? how to calculate it from the above results?
[12:48:08] <kali> count is the count of documents, size and storageSize are quite close... so i'm not sure
[12:48:29] <gnagno> quick question: assume I have a mongo server that is already running with a lot of data in it; if I create a replica set, will it automatically migrate all the data too?
[12:49:58] <kali> gnagno: it's a relatively painless and well documented procedure: http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
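A condensed sketch of the procedure the linked tutorial describes (hostnames are placeholders):

    // restart the existing mongod with a replica set name, e.g.
    //   mongod --replSet rs0 ...
    // then, in the shell on that server:
    rs.initiate()
    rs.add("newhost1.example.net:27017")
    rs.add("newhost2.example.net:27017")
    rs.status()   // watch the new members come up and perform their initial sync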
[12:53:33] <_pash> how long would it take to run a search through 12000 entries in mongo, and whats the best way of finding a match?
[13:00:56] <Nodex> normally about the same length as a piece of string
[13:01:57] <_pash> Nodex: it would literally be like 12000 entries of usernames that still have not registered, and as soon as the user types a username, i need to check whether it is registered or not
[13:03:25] <kali> less than a tenth of a millisecond if you have an index on the username and it fits in RAM
[13:03:25] <Nodex> sometimes it can be shorter than the string but sometimes it's the same
[13:04:21] <_pash> kali: how do i make sure that it's in ram?
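Roughly what kali means, in the mongo shell; `users` and `username` are illustrative names. You can't pin an index in RAM, but you can make sure it exists and check that its size comfortably fits in memory:

    // create the index (ensureIndex was the pre-3.0 name)
    db.users.ensureIndex({username: 1})

    // compare index sizes against available RAM
    db.users.stats().indexSizes
    db.users.totalIndexSize()

    // an indexed equality lookup over 12000 docs is then effectively instant
    db.users.find({username: "some_name"}).limit(1).explain()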
[15:42:20] <trueneu> Hi. Is it possible that PRIMARY and SECONDARY servers in the same replica set have different collections in the same db?
[15:42:52] <trueneu> No replication lag whatsoever. I did an initial sync for the SECONDARY server and it doesn't seem to be synced.
[15:47:42] <joannac> Unlikely, but anything's possible
[15:48:35] <cheeser> that'd be a pretty colossal bug...
[15:49:54] <joannac> I would say more likely you hit some problem, possibly corruption, or maybe you're not really up to date
[15:50:42] <trueneu> Right now it's 2.0.something version, probably there _was_ such a bug?
[15:51:39] <trueneu> I actually have 3 servers in this repl set, PRIMARY and one SECONDARY are 2.0.smth, and the second SECONDARY is 2.4.smth. Both secondaries have the same data sets
[15:52:00] <trueneu> That differ from the set that is on PRIMARY
[15:53:03] <trueneu> And rs.status() doesn't show anything abnormal
[16:01:08] <tiller> Hey, I've some difficulties with an upsert, can someone have a look? :/
[16:19:12] <tiller> joannac> I think it won't work. Because if a subscription already exists for the (user, entity) but without the new tag I want to insert, the search query will fail, and I'll upsert a whole new thing. Don't I?
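One way around the problem tiller describes (a sketch; field and variable names are illustrative): query only on the (user, entity) pair and let $addToSet add the tag, so an existing subscription is updated in place and a missing one is upserted instead of creating a duplicate:

    db.subscriptions.update(
        {user: userId, entity: entityId},   // match regardless of existing tags
        {$addToSet: {tags: newTag}},        // add the tag only if it's not already there
        true                                // upsert
    )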
[16:19:23] <trueneu> I still do not understand why the replication works, but I guess I will just bring down both secondaries and do 2 initial syncs.
[17:12:39] <harrymoore> hi, all. can someone tell me if there is way to use $pull to remove a single matching array element? it looks like it pulls all matching elements
[17:22:42] <Joeskyyy> harrymoore: Could probably use the $ positional operator, assuming you're wanting to remove the first occurrence
[17:24:06] <harrymoore> @joeskyy: yes i want to remove the first (or any really) matching value
[17:29:18] <Joeskyyy> even with that though, it'd be setting it to a new value :\ there's no way to pop it out from the looks of it
[17:29:32] <Joeskyyy> But that's what I was referencing, in case you're interested.
[17:30:05] <Joeskyyy> Otherwise you may want to have an array where each element has something like an id of sorts, and the content.
[17:30:15] <Joeskyyy> That way you can use pull on the id field
[17:30:48] <Joeskyyy> i.e. if it were an array of comments, each new comment has the comment field, and the id field (incremented by one)
[17:35:06] <harrymoore> thx @joeskyyy! i will just have to handle it in app code (java) instead: read the doc, remove the element in question and $set the entire array
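A sketch of the two behaviours discussed; $pull does remove every matching element, so giving each element its own id (Joeskyyy's suggestion) lets you remove exactly one. Collection and field names are illustrative:

    // $pull removes ALL elements equal to "foo"
    db.things.update({_id: 1}, {$pull: {items: "foo"}})

    // store {id, value} pairs so a single element is addressable
    db.things.insert({_id: 2, comments: [{cid: 1, text: "hi"}, {cid: 2, text: "hi"}]})
    db.things.update({_id: 2}, {$pull: {comments: {cid: 2}}})   // removes only cid 2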
[17:52:32] <NyB> I am thinking about using two or more driver threads in parallel, but I was hoping to avoid the complexity involved if there is a more standard way.
[17:53:23] <harrymoore> @NyB: you can distribute your read requests if you have a replica set. the java driver is smart enough to read from the fastest responding node
[17:53:39] <harrymoore> you may have to set slave ok or something
[17:53:40] <NyB> as it is, my DB input thread seems to be spending 70% of its time in BasicBSONDecoder.decode() :-/
[17:54:37] <NyB> harrymoore: the server is not the problem - the limiting factor seems to be the fact that I am reading a collection from a single thread
[17:56:33] <NyB> I believe I cannot use the result of find() safely from parallel threads, right? I could not find any relevant synchronization calls in the source code...
[17:56:34] <harrymoore> i haven't done any multi-threaded work. assume the driver is ok with it though
[17:57:27] <kali> NyB: i would rather run several find() in parallel than sharing a cursor
[17:57:42] <kali> NyB: can you shard your find() somehow? some kind of natural pagination?
[17:58:27] <kali> i assume you're fetching a significant amount of data from a collection, right ?
[17:59:05] <NyB> kali: about 1.5 million documents sorted by a counter field.
[17:59:40] <NyB> kali: for my tests that is, the real thing will fetch quite a bit more
[18:00:07] <NyB> each document is a single object with ~40 fields
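What kali's "natural pagination" might look like for NyB's data, assuming the counter field is called `counter` and is indexed; each worker thread would run one range query and the application re-merges by counter afterwards (a sketch, not NyB's actual code):

    // worker 1
    db.docs.find({counter: {$gte: 0,      $lt: 500000}}).sort({counter: 1})
    // worker 2
    db.docs.find({counter: {$gte: 500000, $lt: 1000000}}).sort({counter: 1})
    // worker 3
    db.docs.find({counter: {$gte: 1000000}}).sort({counter: 1})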
[18:01:09] <kali> there is one thing to be aware of, also. if your database is accessed by real time stuff (like a web app), at some point you'll manage to move the bottleneck to mongodb, and your web app will die
[18:01:57] <kali> you need the documents in order ?
[18:02:24] <NyB> kali: yes, the order is important
[18:02:55] <kali> then you'll need to reorder them once decoded before pushing them through the main thing
[18:04:00] <NyB> kali: hmm... are you suggesting fetching them out-of order from several threads and then sorting them?
[18:06:05] <kali> well whatever you do, if you decode them in parallel, you'll get them out of order, at least locally
[18:08:43] <NyB> kali: I guess part of my problem is that the bottleneck does not lie on the MongoDB server - if it did I would have a couple of ways to handle it... as it is mongod consumes about 30% of a single CPU core and the I/O is nowhere near saturation yet...
[18:09:26] <NyB> kali: it seems that I will have to split the collection somehow...
[18:10:15] <NyB> kali: thanks for the tips (and for being a sounding board) :-)
[18:17:47] <NyB> kali: is there a way to have the server add an "order" field automatically after sorting? something that I could use with $mod to split the output?
[18:27:42] <NyB> kali: yeah, I did not really believe that such a thing would exist...
[19:28:04] <ekristen> looking for some advice for hosting mongodb in aws, I’ve got 400-500k documents in development right now, looking to be in the 10s of millions in a few months
[19:28:38] <ekristen> looking for advice on aws instance sizes to start out on
[19:31:52] <harrymoore> @NyB if you're still there here is a gist showing threaded access to the driver. I don't write threaded code so take it for what it's worth: https://gist.github.com/harrymoore/8604212#file-app-java you could add a synchronized data structure for defining which thread will process which records
[20:46:04] <module000> ekristen: late reply...but i host in AWS with mongo. you need to look at the size of your data sets, and pick accordingly. for example, i have a 60GB data set, so i host it on a m2.4xlarge, which has 68GB of memory. if performance is less important, scale down your instance to include less RAM
[20:46:59] <ekristen> module000: my data set is currently only like 6gb on disk, thats roughly 1GB per 100k documents we are storing at the moment
[20:47:16] <ekristen> module000: sounds like you aren’t using a replicaset either?
[20:47:45] <module000> ekristen: it's sharded on a m1.large, but that shard only exists to mitigate an outage if the primary shard fails
[20:48:21] <module000> ekristen: if you only have 6gb, you could use an m3.large (7.5GB memory), and make a much smaller instance to shard it with, so if the primary fails your tiny node keeps you from having a full outage (albeit you move slower)
[20:50:32] <ekristen> maybe I misunderstand the term shard, as what you are saying doesn’t line up with my understanding of replicasets and sharding
[20:51:06] <module000> ekristen: i'm not using replica sets, i'm using a sharded cluster deployment
[20:51:28] <module000> ekristen: you could do the same with replica sets though, just elect a smaller instance to serve as your secondary
[21:31:58] <beardage> hi, how can I tell mongodb in python not to index the data i'm importing? the thing is that it's OOMing while importing millions of records.
[21:46:17] <luimemee> this should work ? db.runCommand ( { distinct: "mycollection", key: "tags", query: {'tags':{$regex:'*toto*'}}} )
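Probably not as written: '*toto*' is not a valid regular expression (a leading * has nothing to repeat). Assuming a substring match on the tag is what's wanted, something along these lines should work:

    db.runCommand({distinct: "mycollection", key: "tags", query: {tags: {$regex: "toto"}}})
    // or with a regex literal:
    db.runCommand({distinct: "mycollection", key: "tags", query: {tags: /toto/}})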
[22:27:22] <the-erm> I need some advice, is there a better way to do this? http://pastie.org/8664985 I'm trying to store the hours of business so they can be looked up, but the challenge is that the hours of operation run from 12pm to 2am the next day.
[22:31:16] <the-erm> I was thinking I could possibly store the values 'Monday': [ [0,1], [1200,2359] ] but I'm not sure how to write a query for that.
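One hedged way to make the-erm's idea queryable, with the document restructured slightly: store each open interval as a {day, open, close} subdocument (times in HHMM, with the after-midnight portion attributed to the next day) and use $elemMatch to ask whether a given time falls inside any interval. Field names are illustrative:

    db.places.insert({name: "Bar", hours: [
        {day: "Monday", open: 1200, close: 2359},
        {day: "Tuesday", open: 0, close: 200}      // the 12am-2am spillover
    ]})

    // is the place open on Tuesday at 01:30 (stored as 130)?
    db.places.find({hours: {$elemMatch: {day: "Tuesday", open: {$lte: 130}, close: {$gte: 130}}}})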
[23:16:49] <unholycrab> anyone use rockmongo? i want to know how to remove the "drop database" link
[23:41:40] <JeffC_NN> unholycrab: If you're comfortable editing the PHP, edit app/controllers/db.php and remove the links/capabilities
[23:43:57] <JeffC_NN> (for very easy edits, just search the file for "drop" and add a simple "return;" at the beginning of functions named things like doDropDatabase() (line 488) and doDropDbCollections() (line 451), or whatever you want to disable.) Pretty easy.