PMXBOT Log file Viewer


#mongodb logs for Thursday the 1st of January, 2015

[00:04:43] <pbryan> On 2.6.6, I seem to be successfully using an index with multiple keys, two of which contain arrays of strings. This seems to contradict the documentation. The query plan (explain) shows I am using the index. How is this working?
[00:06:50] <morenoh149> mordof: the docs are pretty good. I think you store your geo data as normal and then use a 2d/sphere index to make looking them up faster
[00:09:15] <Boomtime> pbryan: can you provide a query with explain as a gist/pastebin?
[00:09:23] <pbryan> Boomtime: Sure!
[00:09:42] <Boomtime> it is possible to use each field as an array individually, just not both in the same document
[00:10:10] <pbryan> Boomtime: That's what the doc says...
[00:11:03] <Boomtime> also, can you provide the query, and output showing the document which violates this constraint
[00:11:46] <pbryan> Query: http://pastie.org/private/xxtzydfmaq5lmgm0r9qw
[00:12:28] <pbryan> Explain: http://pastie.org/private/o1rxn2q5gjozndzn92ag
[00:13:44] <mordof> morenoh149: hmm. i just need to learn the proper way to store geo data then, heh
[00:16:31] <Boomtime> pbryan: your "hashtags" match is exact, it's not an array query
[00:16:52] <pbryan> Okay, query:
[00:16:54] <pbryan> http://pastie.org/private/db0alip7trjsndxk4srta#21
[00:17:09] <pbryan> Explain:
[00:17:26] <Boomtime> yep, the query is fine either way, you'll never have trouble with that part
[00:17:30] <pbryan> http://pastie.org/private/dvlfelrd9puveggrj59ta
[00:17:40] <pbryan> Result:
[00:17:50] <pbryan> { "_id" : 2, "text" : "This is a test of the emergency broadcast system. #fail #zoolander #yahoo", "keywords" : [ "test", "emergency", "broadcast", "system", "fail" ], "hashtags" : [ "fail", "zoolander", "yahoo" ], "language" : "en", "geojson" : { "type" : "Point", "coordinates" : [ -123.14489, 49.30452 ] }, "place" : "Stanley Park", "time" : ISODate("2014-12-30T12:00:00Z") }
[00:17:58] <pbryan> So, it's clearly using the index.
[00:18:04] <pbryan> And there are two arrays in the query.
[00:19:08] <mordof> i'm seeing a lot of lat/lon used with the more recent mongodb indexes - i don't want earth lat/lon coordinates, is it still possible for me to use newer indexing?
[00:21:48] <pbryan> According to the docs, with an index on two fields, you can't even insert if both fields have arrays.
[00:22:04] <pbryan> If you attempt to insert such a document, MongoDB will reject the insertion, and produce an error that says cannot index parallel arrays.
[00:22:15] <pbryan> (last line is a quote from http://docs.mongodb.org/manual/core/index-multikey/)
[00:22:37] <Boomtime> yes, that is normally true, not sure yet what is going on in your case
[00:22:50] <pbryan> It's magic!
[00:22:52] <pbryan> :-)
[00:25:38] <Boomtime> holy cow.. the 2dsphere seems to cause it to go blind to stopping parallel array indexing
[00:25:43] <pbryan> lol
[00:25:55] <pbryan> So, if I didn't have the 2dsphere index, it would reject the insert...
[00:25:59] <pbryan> Let me try that. :-)
[00:26:33] <Boomtime> if you index without the 2dsphere the index is rejected if the document exists already, and the insertion is rejected if the collection was empty first
[00:26:57] <pbryan> Yep.
[00:27:14] <pbryan> Cannot index parallel arrays.
[00:27:26] <Boomtime> you should raise a server ticket
[00:27:32] <pbryan> Okay, so a bug (feature?!) in MongoDB w.r.t. 2dsphere index!
[00:27:34] <Boomtime> that does not look right to me
[00:28:03] <Boomtime> the number of index buckets would be exploding.. check your index size
[00:28:47] <pbryan> I would imagine the index would get crazy big, Cartesian product.
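
For reference, a minimal pymongo sketch of the rejection path discussed above; the database and collection names are assumed, and the field names are taken from pbryan's document. On a plain compound index over two array fields the insert fails with "cannot index parallel arrays"; the 2dsphere case that slipped past this check on 2.6.6 is what became SERVER-16704.

    # Sketch only: "test"/"tweets" are assumed names; field names come from
    # pbryan's document above. A plain compound index on two array fields
    # rejects the insert with "cannot index parallel arrays".
    from pymongo import MongoClient, ASCENDING
    from pymongo.errors import OperationFailure

    col = MongoClient()["test"]["tweets"]
    col.drop()
    col.create_index([("keywords", ASCENDING), ("hashtags", ASCENDING)])

    doc = {"keywords": ["test", "fail"], "hashtags": ["fail", "zoolander"]}
    try:
        col.insert_one(doc)
    except OperationFailure as exc:
        print("rejected:", exc)   # expected: cannot index parallel arrays
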
[00:31:47] <pbryan> Thanks for the help Boomtime.
[00:33:25] <mordof> trying to find out if i can use the newer mongo 2dsphere / geojson indexes with coordinates that aren't lat/lon, and shouldn't be calculated based on it being on earth... or .. whatever it does
[00:33:32] <mordof> or if i even need to
[00:34:02] <mordof> the "legacy" 2d indexes would seem to be what i actually want, but some articles have suggested those might be getting removed since they're being labelled as legacy
[00:34:03] <Boomtime> pbryan: will you raise a server ticket? alternatively, do you mind if i use your data with minor changes to raise one?
[00:34:52] <Boomtime> p.s the time field is irrelevant
[00:36:48] <pbryan> Boomtime: I'll be happy to open a ticket.
[00:37:13] <pbryan> Yeah, I experimented with dropping the time field as well.
[00:39:42] <mordof> hmm
[00:39:58] <mordof> seems like i'm just gonna have to deal with my coordinates like regular integers instead
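
A hedged pymongo sketch of the "legacy" 2d option mordof mentions: a flat 2d index accepts custom min/max bounds, so the coordinates need not be Earth lat/lon. The collection and field names below are assumptions.

    # Sketch only: "test"/"game_objects"/"pos" are assumed names.
    from pymongo import MongoClient, GEO2D

    grid = MongoClient()["test"]["game_objects"]
    grid.drop()
    # custom bounds instead of the default -180..180 used for lon/lat
    grid.create_index([("pos", GEO2D)], min=-1000, max=1000)
    grid.insert_one({"name": "base", "pos": [12, 34]})

    # $near on a 2d index uses planar (flat) distance, not spherical geometry
    for doc in grid.find({"pos": {"$near": [10, 30]}}).limit(3):
        print(doc)
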
[00:48:06] <pbryan> Boomtime: https://jira.mongodb.org/browse/SERVER-16704
[00:50:54] <Boomtime> thanks
[00:52:03] <Boomtime> i fixed the formatting.. though i see you added a comment about it too :p
[00:54:19] <pbryan> Heh, how did you edit it?
[00:54:23] <pbryan> I couldn't find a way.
[00:54:34] <pbryan> I'm not exactly a JIRA expert though. ;-)
[00:57:56] <Boomtime> um.. you can't.. i can
[00:58:13] <pbryan> lol
[00:58:22] <pbryan> Ah, the force is strong with you. :-)
[00:58:34] <pbryan> Thanks again.
[00:58:52] <Boomtime> also, the query must contain the geo part or the index won't be used, no matter how otherwise applicable it might be
[01:02:40] <pbryan> Ah.
[01:03:23] <pbryan> I'm assuming it's smart enough to use an index if high order keys are relevant.
[01:03:36] <pbryan> In other words, if I used the geo without the other fields, it would prefer the index.
[03:15:46] <jlolofie> I'm importing a large dataset; mongoimport doesn't report any errors and keeps printing the progress as if things are going fine
[03:15:59] <jtal> but the number of docs in the collection has stopped growing
[03:16:02] <jtal> a little over 1 million
[03:16:38] <jtal> anyone know why it stopped inserting?
[03:22:59] <jtal> the collection is 1.5 GB at this point
[03:23:04] <jtal> and it is not growing in size
[03:23:14] <jtal> despite mongoimport continuing to report that it is doing work
[10:33:32] <jtal> happy new year
[10:45:30] <jukee> how can I save my complete db info and move it to another server with MongoDB?
[10:58:41] <jukee> hmmM?
[11:00:42] <jukee> fuck this channel , shame on you 2015
[11:01:34] <BurtyB> *yawn*
[11:59:00] <dimon222> happy new code
[12:06:17] <gundas> Hi all, I have one node js application which is generating random data and putting it into a capped collection and an uncapped collection. I also have another node js application which is reading from the same database, tailing the capped collection, and then sending data through sockets to the ui. I'm using mongoose but am confused about recreating the schema; any ideas on the best way to set this up?
[12:07:14] <dimon222> two arrays maybe
[12:08:11] <Tailor> hi
[12:08:13] <Tailor> Social-networking: Hadoop, HBase, Spark over MongoDB or Postgres? - http://stackoverflow.com/q/27730628
[12:09:46] <gundas> dimon222: what do you mean two arrays?
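
For the capped-collection side of gundas's setup, here is a hedged sketch of tailing a capped collection. It uses pymongo rather than mongoose, since the underlying mechanism (a tailable cursor on a capped collection) is driver-agnostic; the collection name and size are assumptions.

    # Sketch only: "test"/"events" are assumed names.
    import time
    from pymongo import MongoClient, CursorType

    db = MongoClient()["test"]
    if "events" not in db.list_collection_names():
        db.create_collection("events", capped=True, size=1024 * 1024)

    cursor = db["events"].find(cursor_type=CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for doc in cursor:
            print(doc)        # hand each new document to the websocket layer
        time.sleep(1)         # cursor temporarily exhausted; wait and retry
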
[15:38:10] <capleton> Hey, does anyone here have experience using "subdocuments?"
[15:50:36] <cheeser> ask your question and see if anyone knows.
[19:25:15] <Industrial> Hi. Using the NodeJS driver for mongodb, can I get an update event when something in selected collections updates?
[19:25:42] <Industrial> I would like to create a mechanism for a browser to subscribe to a collection or model for updates, and get this streamed through a websocket.
[19:26:10] <Industrial> Is this feasible?
[19:31:28] <Industrial> or would that be totally counter-intuitive to scaling?
[19:59:14] <capleton> I'd like to create new subdocuments using mongo, but when I try to use insert() with $set, I can only manage to overwrite the existing subdocument. Can someone point me in the right direction?
[20:21:51] <polhemic> I have a random and possibly noobish question about journal files. Why do mine persist for long (hours) periods of time?
[20:22:23] <polhemic> I thought they only needed to exist until the changes were applied to the shared view and remapped every minute or so.
[20:23:38] <polhemic> But for me it keeps growing - currently the journals are twice the size of the sum of the *.[012] files
[20:46:05] <edrocks> polhemic: mongodb allocates twice the amount of your current data size
[20:46:25] <edrocks> polhemic: and every time you hit the amount it allocated it doubles again
[20:46:52] <polhemic> even in the journal files? They appear to be slowly increasing in size
[20:47:30] <polhemic> but I was expecting them to hit a timeout or threshold and then drop to zero again when the entries were committed to the database files
[20:47:48] <polhemic> Current sizes are :
[20:47:49] <polhemic> 4 /var/lib/mongodb/_tmp
[20:47:49] <polhemic> 65540 /var/lib/mongodb/eve.0
[20:47:49] <polhemic> 131076 /var/lib/mongodb/eve.1
[20:47:49] <polhemic> 262148 /var/lib/mongodb/eve.2
[20:47:52] <polhemic> 16384 /var/lib/mongodb/eve.ns
[20:47:54] <polhemic> 789188 /var/lib/mongodb/journal
[20:47:57] <polhemic> 65540 /var/lib/mongodb/local.0
[20:47:59] <polhemic> 16384 /var/lib/mongodb/local.ns
[20:48:02] <polhemic> 4 /var/lib/mongodb/mongod.lock
[20:48:04] <polhemic> in kB
[20:48:11] <polhemic> The journal directory started at 0 and has been climbing slowly
[20:48:25] <edrocks> not exactly an expert on this but i will see if i can find the right docs about the files doubling i think they should have more info
[20:50:00] <edrocks> polhemic: try http://docs.mongodb.org/manual/faq/storage/#faq-disk-size
[20:55:46] <polhemic> thanks, edrocks, I'll look at it now
[21:05:30] <polhemic> There's nothing very helpful on that page. Following through to the journaling page, there's the sentence "Once MongoDB applies all the write operations in a particular journal file to the database data files, it deletes the file, as it is no longer needed for recovery purposes."
[21:05:46] <polhemic> Which, to me indicates that the journal file shouldn't be there
[21:05:53] <polhemic> for very long, at least.
[21:11:14] <polhemic> So, my concern is that something I'm doing client side is stopping the journal getting processed and removed after the write operation is complete
[21:18:05] <Boomtime> polhemic: what is the problem you are observing? (i did not see the start of the conversation)
[21:19:43] <polhemic> no problem as such, just concerned about the size of my journal directory continuously climbing.
[21:20:32] <polhemic> I'm running a write heavy app, and the journal just keeps growing - it's at almost 1GB after 3 hours of runtime
[21:21:23] <polhemic> Once the writes have gone through to the database files, I was expecting it to drop in size - containing only the very latest changes that haven't gone into the shared model yet
[21:26:20] <Boomtime> polhemic: afaik journal files are constantly recycled while mongodb is running
[21:26:39] <polhemic> My current db directory looks like this:
[21:26:40] <polhemic> 4 /var/lib/mongodb/_tmp
[21:26:40] <polhemic> 65540 /var/lib/mongodb/eve.0
[21:26:40] <polhemic> 131076 /var/lib/mongodb/eve.1
[21:26:40] <polhemic> 262148 /var/lib/mongodb/eve.2
[21:26:43] <polhemic> 524292 /var/lib/mongodb/eve.3
[21:26:45] <polhemic> 16384 /var/lib/mongodb/eve.ns
[21:26:48] <polhemic> 1015988 /var/lib/mongodb/journal
[21:26:50] <polhemic> 65540 /var/lib/mongodb/local.0
[21:26:53] <polhemic> 16384 /var/lib/mongodb/local.ns
[21:26:55] <polhemic> 4 /var/lib/mongodb/mongod.lock
[21:27:40] <polhemic> journal just keeps going up and, because I've got an update-heavy dataflow, the db size will asymptote to a final value, but the journal will just keep growing and growing
[21:27:41] <Boomtime> yeah, that all looks normal
[21:28:01] <Boomtime> ok, it continues to climb now?
[21:28:21] <polhemic> yup
[21:28:30] <Boomtime> as in.. if you pause writes for a minute or two, then start writing again, it allocates more journal?
[21:28:47] <polhemic> -rw------- 1 mongodb nogroup 1058013184 Jan 1 21:27 j._0
[21:29:02] <polhemic> -rw------- 1 mongodb nogroup 1059569664 Jan 1 21:27 j._0
[21:29:21] <polhemic> just those few seconds and it's grown by 1.5MB
[21:29:34] <polhemic> (ish)
[21:29:36] <Boomtime> wait, that is a single file growing in size? not new files?
[21:29:44] <polhemic> yup
[21:30:32] <polhemic> Is there something I should be doing client side to periodically close and reopen or flush my connections?
[21:30:48] <Boomtime> no
[21:31:29] <polhemic> As it stands, I'm creating a persistent pymongo.MongoClient(), then I just keep calling update() on a collection (with upsert true)
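
A minimal sketch of the write pattern polhemic describes: one persistent MongoClient and repeated upserts. The "eve" database name comes from his file listing; the collection and field names are assumptions, and update_one is used as the current pymongo spelling of update(..., upsert=True).

    # Sketch only: collection and field names are assumed.
    from pymongo import MongoClient

    client = MongoClient()                 # kept open for the life of the app
    prices = client["eve"]["market"]

    def record(type_id, price):
        # update the document if it exists, insert it otherwise (upsert)
        prices.update_one({"_id": type_id}, {"$set": {"price": price}}, upsert=True)

    record(34, 5.27)
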
[21:31:49] <Boomtime> is this a replica-set?
[21:32:06] <polhemic> It's just hit a limit and started a second journal file.
[21:32:11] <polhemic> -rw------- 1 mongodb nogroup 1073782784 Jan 1 21:28 j._0
[21:32:11] <polhemic> -rw------- 1 mongodb nogroup 11067392 Jan 1 21:30 j._1
[21:32:54] <Boomtime> i expected it to do that much sooner
[21:33:46] <polhemic> I've just tried closing my client and the journals haven't been removed. I'm just going to restart mongod to check they get removed on shutdown.
[21:34:12] <Boomtime> also, do a permissions list of the main db files
[21:35:11] <polhemic> Every 2.0s: sudo ls -lR /var/lib/mongodb/ Thu Jan 1 21:33:25 2015
[21:35:14] <polhemic> /var/lib/mongodb/:
[21:35:16] <polhemic> total 1081372
[21:35:19] <polhemic> -rw------- 1 mongodb nogroup 67108864 Jan 1 21:31 eve.0
[21:35:21] <polhemic> -rw------- 1 mongodb nogroup 134217728 Jan 1 21:31 eve.1
[21:35:24] <polhemic> -rw------- 1 mongodb nogroup 268435456 Jan 1 21:31 eve.2
[21:35:26] <polhemic> -rw------- 1 mongodb nogroup 536870912 Jan 1 21:31 eve.3
[21:35:29] <polhemic> -rw------- 1 mongodb nogroup 16777216 Jan 1 21:30 eve.ns
[21:35:31] <polhemic> drwxr-xr-x 2 mongodb mongodb 4096 Jan 1 21:32 journal
[21:35:34] <polhemic> -rw------- 1 mongodb nogroup 67108864 Jan 1 21:32 local.0
[21:35:36] <polhemic> -rw------- 1 mongodb nogroup 16777216 Jan 1 21:32 local.ns
[21:35:39] <polhemic> -rwxr-xr-x 1 mongodb mongodb 6 Jan 1 21:32 mongod.lock
[21:35:41] <polhemic> /var/lib/mongodb/journal:
[21:35:44] <polhemic> total 16
[21:35:46] <Zelest> pastie.org
[21:35:46] <polhemic> -rw------- 1 mongodb nogroup 16384 Jan 1 21:32 j._0
[21:35:49] <polhemic> That's with the server restarted before restarting the ingest app
[21:37:35] <polhemic> I've restarted the app and the journal is off and running again - increasing by a few hundred kBytes a second
[21:38:11] <Boomtime> what version btw?
[21:39:09] <opmast> how do I remove 10 specific _id's from the database? I have only seen remove with specific criteria or for all documents ..
[21:39:33] <polhemic> 2.6.6
[21:40:36] <Boomtime> huh, 1GB is the limit to a journal file: http://docs.mongodb.org/manual/core/journaling/
[21:40:40] <Boomtime> i thought it was less than that
[21:40:51] <polhemic> ubuntu packages from the mongodb-org repo
[21:41:01] <polhemic> running on LTS12.04
[21:41:35] <Boomtime> opmast: use $in
[21:42:11] <polhemic> The docs talk about running a write flush every 60 seconds, so you have to be shifting some serious data to hit 1GB with 60s of writes. Unless you're me.
[21:42:28] <Boomtime> polhemic: no, the journal is write only
[21:42:45] <opmast> Boomtime: that's only for arrays, no? is _id an array?
[21:42:46] <Boomtime> it is a write-ahead
[21:43:17] <Boomtime> opmast: $in specifies an array of items to match, you have 10 items to match, thus $in is perfect
[21:45:45] <Boomtime> polhemic: the journal is not read except at start-up (crash recovery), so it always writes ahead - wait for a minute or two after a new file is allocated and see if the old one disappears then
[21:46:23] <Boomtime> note it may take more than a minute after the new file is created, due to overlap - wait at least two minutes (or a bit longer) to be sure
[21:46:43] <polhemic> It's still there and it's still growing. Up to 45MB now since the restart
[21:48:15] <Boomtime> yeah, that amount is normal - i have just learned that the limit is 1GB - i thought it was less than that
[21:48:51] <polhemic> But at 1GB it just creates a new journal file. The only time it ever removed the journals is when the server is rebooted.
[21:49:42] <Boomtime> can you try leaving it?
[21:49:43] <polhemic> This means that either my server has to have a periodic restart to clean out used journals, or I've got to turn journaling off (which I really don't want to do)
[21:49:59] <polhemic> I'll leave it running overnight, no problem.
[21:51:27] <polhemic> VPS has 26GB free, so that's around 75 hours before the drive fills up
[21:53:56] <opmast> Boomtime: how do I delete by _id? when I use _id : 54a57bd848177e240d8b4567 it does not work
[21:57:02] <Boomtime> ObjectId("54a57bd848177e240d8b4567")
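
Putting Boomtime's two answers together, a hedged pymongo sketch of removing a specific set of documents by _id with $in; the ObjectId value is the one opmast quoted, and the collection name is an assumption.

    # Sketch only: "test"/"things" are assumed names.
    from bson import ObjectId
    from pymongo import MongoClient

    col = MongoClient()["test"]["things"]
    ids = [ObjectId("54a57bd848177e240d8b4567")]   # plus the other nine _ids
    result = col.delete_many({"_id": {"$in": ids}})
    print(result.deleted_count)
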
[22:19:06] <polhemic> Boomtime: I've done a bit more digging, and this seems to be a common issue - journals filling discs.
[22:19:33] <polhemic> The workaround is to db.fsyncLock() and db.fsyncUnlock() the database, which cleans up the journal.
[22:20:07] <polhemic> Probably a prime candidate for a cronjob.
[22:20:40] <polhemic> http://superuser.com/questions/657197/why-journaling-use-so-much-space-in-mongodb
[22:21:18] <Boomtime> polhemic: journaling uses 2GB of disk
[22:21:24] <Boomtime> it caps out there
[22:21:54] <Boomtime> if you want it to use less, you may consider the smallFiles option
[22:21:58] <polhemic> Is that a hard limit? If so, then there's no worries.
[22:22:13] <Boomtime> please do not use db.fsyncLock that will just cause you grief in other fun ways
[22:22:19] <polhemic> I thought smallfiles used just as much space - more files but smaller
[22:22:59] <Boomtime> polhemic: it may end up using the same amount if that is what is required, but it will also cause the files to be available for recycling sooner
[22:23:46] <Boomtime> what is happening is that the previous journal file is not being made available for recycling until the current one is full
[22:24:17] <Boomtime> thus it sits around taking up space until a new one is needed, then it is reset and recycled for use as the new journal file
[22:24:48] <Boomtime> thus, 2GB is not a hard limit - but it's unlikely you'll be able to push it above that
[22:26:31] <polhemic> good to know, thanks
[22:26:38] <polhemic> I'll let it run tonight and see what happens
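
For reference, a sketch of where the smallFiles option Boomtime mentions would go in a 2.6-era YAML config; the path and surrounding values are illustrative, not taken from polhemic's setup.

    # /etc/mongod.conf (MongoDB 2.6, YAML format) - illustrative values
    storage:
      dbPath: /var/lib/mongodb
      smallFiles: true        # smaller data and journal files, recycled sooner
      journal:
        enabled: true
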
[22:40:21] <Nitax> does Mongoose/MongoDB persist the order of an Array SchemaType?
[22:43:51] <Boomtime> Nitax: mongodb persists the order of arrays, but i do not know about mongoose
[22:44:30] <Nitax> Hmm. I would assume that Mongoose wouldn't change that but it would be nice to be 100% sure before I assume I don't need to keep track of it myself
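
A quick pymongo check of what Boomtime describes, round-tripping an array to confirm the server preserves element order; the collection name is an assumption.

    # Sketch only: "test"/"order_check" are assumed names.
    from pymongo import MongoClient

    col = MongoClient()["test"]["order_check"]
    col.delete_many({})
    col.insert_one({"_id": 1, "items": ["c", "a", "b"]})
    print(col.find_one({"_id": 1})["items"])   # ['c', 'a', 'b'] - order kept
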