[00:01:12] <GothAlice> The Chinese is definitely multi-byte. No hope of using $slice on those.
[00:01:36] <GothAlice> This is how I normalize and strip accents in Python, FYI: ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))
[00:01:52] <bmillham> I get what you're suggesting. Take the first char (unicode or not), save it in a separate field, and use that for an aggregation
[00:02:40] <GothAlice> Well, no. Make *sure* it's the native unicode representation, that way you iterate literal characters, not literal bytes.
[00:03:28] <GothAlice> Even then, you'll need a loop like my example, which you'll then pull the first character out of, due to the stripping of category Mn (nonspacing mark) characters.
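A runnable sketch of the approach being described here, normalizing, stripping nonspacing marks, then taking the first character (the function name is illustrative):

    import unicodedata

    def first_letter(s):
        # NFD decomposition turns accented letters into base letter +
        # combining marks; dropping category 'Mn' removes the marks.
        stripped = ''.join(
            c for c in unicodedata.normalize('NFD', s)
            if unicodedata.category(c) != 'Mn'
        )
        # Python str iterates characters, not bytes, so multi-byte
        # scripts like Chinese yield a whole character here as well.
        return stripped[:1].upper()

    print(first_letter('Éponine'))   # E
    print(first_letter('中文歌曲'))  # 中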
[00:03:35] <bmillham> I can grab the data as I'm importing it from the mysql database, because it can properly do a substr on unicode
[00:04:16] <bmillham> That's how I display them on the current web page
[00:04:23] <GothAlice> http://grimoire.ca/mysql/choose-something-else — search for UTF on this. ;)
[00:04:40] <GothAlice> It's UTF-8, Jim, but not as we know it.
[00:05:58] <GothAlice> (The 4-byte thing will matter for you as you're storing CJK characters.)
[00:18:18] <bmillham> Reading through the mysql posting that you linked gave me a good LOL with the "It's easy to hire MCSEs" comment.
[00:18:43] <bmillham> I refused to get an MCSE when I saw how useless most MCSEs are
[00:19:04] <bmillham> They passed a test, it doesn't mean they know anything.
[00:19:24] <GothAlice> At the same time, the last three relational projects I was on were Postgres. Higher than normal density of intelligent developers here in Montréal…
[00:20:00] <GothAlice> OTOH, there's all sorts of reasons one might get stuck with MySQL. The fact that it exists and is popular means it's a minefield that regularly needs to be traversed.
[00:22:52] <bmillham> Personally, I've never had any real problems with it. The only reason for switching to MongoDB is just because I want to experiment.
[00:23:16] <bmillham> In my few days of playing with it, I like it enough to go forward with it.
[00:23:51] <bmillham> Still trying to wrap my head around some of the concepts, and figure out the best way to set up my structure
[00:23:54] <GothAlice> MySQL (on AWS) chewed on my data badly enough to require reverse engineering of the on-disk structures that took 36 hours straight… immediately prior to dental surgery to extract 8 teeth (wisdom and impacted). I'm bitter. ;)
[00:24:29] <bmillham> lol (but sorry to hear about the teeth)
[00:24:29] <GothAlice> bmillham: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html may be a useful (short) article.
[00:41:10] <bmillham> Thanks for the suggestions GothAlice. Very helpful.
[00:42:47] <GothAlice> bmillham: It never hurts to help. :)
[00:43:36] <bmillham> I'm thinking for my schema that I will still have to have separate linked documents for album/artist/track
[00:44:27] <bmillham> But since it's easy to write a python script to convert the existing mysql database into mongodb, I'll experiment with different ways
[00:45:15] <GothAlice> bmillham: In my Exocortex project (which is a metadata filesystem; contains all my music, movies, and TV shows, etc.) I do have entities to represent the artist, album, etc, but some of that metadata is also "duplicated" in the records that use them. (I.e. my references cache certain values needed to efficiently display the record, without forcing additional queries to find "related" data.)
[00:45:52] <GothAlice> I.e. instead of {artist: ObjectId(…), album: ObjectId(…), title: "Madness"} I have: {artist: {ref: ObjectId(…), name: "Chester See"}, …}
[00:46:48] <GothAlice> Updates to the artist entity do require multi-updates across the whole collection to keep those caching references fresh, though…
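A hedged pymongo sketch of that cache-refresh pass (database, collection, and variable names are hypothetical; update_many is the pymongo 3+ spelling, older drivers used update(..., multi=True)):

    from bson import ObjectId
    from pymongo import MongoClient

    db = MongoClient().media                           # hypothetical database
    artist_id = ObjectId('5480f3a1e4b0c1a2b3c4d5e6')   # hypothetical id
    new_name = 'Chester See'

    # Refresh the cached name on every track whose embedded artist
    # reference points at this artist.
    db.tracks.update_many(
        {'artist.ref': artist_id},
        {'$set': {'artist.name': new_name}},
    )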
[00:47:40] <bmillham> Updates don't worry me, because they only happen when I find an incorrect entry.
[00:47:57] <bmillham> So extra overhead is no worry there.
[01:03:26] <meandev> Hello! Anyone got an idea of the best approach for deleting documents from a secondary collection when a delete happens on a primary collection. Something like a trigger, maybe?
[01:04:43] <meandev> I understand that triggers are not a feature of mongo and I'm fine with that but would like to know a good suggested alternative people have found
[01:05:38] <meandev> Should I simply manage all the deletions manually in my application, offloading the task from Mongo to entirely within the application?
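In the absence of triggers, one common pattern is exactly that: the application cascades the delete itself. A minimal pymongo sketch, with all names hypothetical:

    from pymongo import MongoClient

    db = MongoClient().mydb  # hypothetical database

    def delete_with_children(primary_id):
        # Remove dependents first: a crash midway then leaves a parent
        # with no children (easy to find and retry) rather than
        # orphaned children of an already-deleted parent.
        db.secondary.delete_many({'parent_id': primary_id})
        db.primary.delete_one({'_id': primary_id})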
[02:24:42] <harttho> adminCommand listDatabases and dbStats are averaging 600ms-1s during the higher load
[02:33:10] <harttho> config servers are also hit harder during these spikes
[02:35:26] <joannac> harttho: could try increasing log level and seeing what happens every 15 mins, if you have the disk space
[02:38:26] <harttho> So much log spam, I can't really decipher it =/
[02:40:19] <davo> can i ask a pymongo question? i have a cursor that contains a list of dictionary entries, is it possible to change the datatype of fields from long to string?
[02:40:51] <GothAlice> davo: Run str() across the variable holding the value you wish to convert.
[02:41:22] <GothAlice> davo: Or, are you wishing to modify, MongoDB-side, all the records yielded by that cursor?
[02:42:00] <davo> GothAlice, no just the first case you mentioned is enough
[02:42:33] <GothAlice> davo: For general Python questions, I can highly recommend ##python-friendly. :)
[02:43:00] <davo> great, i was there, and got a similar suggestion for here. thanks! :)
[02:45:23] <harttho> Also, here's the memory usage
[02:54:31] <CipherChen> joannac: I know 2.8 is a release candidate, just trying to figure out how to upgrade from 2.4 to 2.8. Should I upgrade the WHOLE cluster to 2.6 first, or just upgrade a SINGLE node to 2.6 then 2.8?
[02:56:13] <joannac> CipherChen: um, mixed version is never supported (except in the process of upgrading)
[02:58:47] <Boomtime> CipherChen: you should upgrade the whole cluster to 2.6 first
[03:00:16] <GothAlice> And verify continued operation of your application; there may be incompatible changes or changes to defaults. (I.e. default write concern: http://docs.mongodb.org/manual/release-notes/drivers-write-concern/)
[03:00:50] <GothAlice> (Check release notes to quickly find these types of changes.)
[03:01:25] <davo> GothAlice, reading the mongo documentation, i noticed a db.update() function, might i be able to use this to modify/convert the datatype of fields?
[03:01:57] <GothAlice> Link to where you found that?
[03:02:03] <davo> or would you still recommend a python implementation
[03:02:35] <davo> i'm thinking something like http://docs.mongodb.org/manual/reference/method/db.collection.update/
[03:02:52] <davo> or http://docs.mongodb.org/manual/tutorial/modify-documents/
[03:03:19] <GothAlice> Yeah, those will modify the data in the database, so you'd use those if you wanted to change the types on those fields permanently.
[03:03:34] <Cipher> GothAlice: what? sorry offline just now
[03:03:46] <GothAlice> davo: If you only wanted to do it say, for display purposes (i.e. to render padding, decimals, etc.) you'd do str() in Python.
[03:04:19] <GothAlice> Cipher: From here: http://irclogger.com/.mongodb/2014-12-04#1417748343
[03:04:23] <davo> GothAlice, ah I see, yes, I think modifying the original data in the database would be best
[03:04:43] <GothAlice> Do you ever do number-like things to that field?
[03:04:57] <GothAlice> (Perform math, sort, or search ranges of values?)
[03:06:46] <davo> I'm not exactly sure, perhaps I am performing searches on the data, which is causing problems using the data as-is
[03:07:26] <GothAlice> Well, a better solution if you're allowing searching (exact or range like $gt or $lt) on those, is to convert the input data to numbers instead. Then it'll work as intended. :)
[03:07:32] <davo> specifically, javascript is pulling data from the database, but is searching for a string instead of a long
[03:08:05] <GothAlice> Yeah, int() that in the face. :)
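That is, coerce at the input boundary so the query value matches the stored type; a tiny sketch (names and values hypothetical):

    from pymongo import MongoClient

    coll = MongoClient().mydb.users     # hypothetical collection
    user_input = '123456789012345671'   # e.g. from a web form

    # Compare numbers to numbers: int() the query value so exact and
    # range matches ($gt/$lt) behave as intended against long fields.
    match = coll.find_one({'_id': int(user_input)})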
[03:12:50] <davo> would you still recommend i investigate using update() on the cursor?
[03:14:39] <GothAlice> Nonono, I was recommending the db.collection.update() approach to update the records as you read them in if you reeeeally wanted them in MongoDB as strings. MongoDB has 64 bits of integer accuracy; if your numbers are larger storing them as integers isn't an option, of course.
[03:15:53] <davo> indeed, i think i do really want the records in MongoDB as strings
[03:16:16] <davo> that particular field of the data, that is
[03:16:53] <davo> that's acceptable, as the data are user IDs
[03:18:52] <bpap> I saw this in my mongod 2.6 log: "insert bp.fs.chunks ninserted:1 keyUpdates:0 locks(micros) w:24238310 24237ms". Is locks(micros) time WAITING for locks, time HOLDING locks, or total COMBINED?
[03:21:21] <GothAlice> AFAIK time spent waiting, in microseconds.
[03:22:31] <bpap> The system.profile collection breaks it down into two fields: acquiring and held. i was unable to find any docs on the system log format.
[03:23:25] <GothAlice> Ah, yes, the docs only mention the meaning behind that one. According to one stackoverflow comment (not answer) it's time spent holding. http://stackoverflow.com/questions/20338802/what-is-locksmicros-w16035-and-locksmicros-r10051-in-the-mongodb-log#comment30363960_20338802
[03:24:22] <bpap> source code link. very useful. thanks!
[03:29:42] <davo> GothAlice, if it's not too much to ask, might you be willing to suggest how i could perform the update? the data is in the form of `[{u'username': u'user1', u'_id': 123456789012345671L}, {u'username': u'user2', u'_id': 123456789012345672L}]`, etc...
[03:32:14] <davo> maybe something along these lines? http://docs.mongodb.org/manual/tutorial/modify-documents/#use-update-operators-to-change-field-values
[03:35:15] <GothAlice> With that out of the way, you would need to use a query to run through the documents in the collection, str() those integers, then update the original document.
[03:35:43] <davo> oh sorry, my example was misleading in the sense that the IDs are not sequential or in any order
[03:36:03] <GothAlice> That's why I give both answers. :)
[03:36:07] <davo> ah yes, your second step sounds like all i need to do :)
[03:40:57] <davo> the atomic update would need to access the MongoDB data directly and perform the persistent update, yes?
[03:41:51] <GothAlice> Well, your client application looping through the data and issuing these updates simply sends the request to the server, which then applies the change in "a reasonable amount of time" (using default "write concerns").
[03:41:53] <davo> it's about 1,526 data entries, so i imagine for now this might be an acceptable quick-fix
[03:43:45] <davo> btw, pardon my naivety, but in your update command, what does the `doc` term refer to?
[03:44:10] <GothAlice> The value representing the current document being "updated" in Python.
[03:45:01] <GothAlice> I.e. for doc in collection.find({}, ['_id']): collection.update({'_id': doc['_id']}, {'$set': {…}})
[03:46:20] <davo> Ah, wow, very nice. I might need to switch to that approach from how i'm doing it currently
[03:46:58] <davo> seems i have a bit of logic refactoring to do first .. :) thanks
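One caveat worth flagging for this particular case: MongoDB's _id is immutable, so converting _id from long to string can't be done with $set; it takes an insert of a copy plus a delete of the original. A hedged sketch (pymongo 3+ method names; collection name hypothetical):

    from pymongo import MongoClient

    coll = MongoClient().mydb.users  # hypothetical collection

    # BSON type 18 is the 64-bit integer (long).
    for doc in coll.find({'_id': {'$type': 18}}):
        replacement = dict(doc, _id=str(doc['_id']))
        coll.insert_one(replacement)           # copy with string _id
        coll.delete_one({'_id': doc['_id']})   # drop the long-keyed original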
[03:49:58] <GothAlice> davo: Might I suggest looking into http://mongoengine.org? (Full disclosure: I contribute to this project.) It can… simplify the logic behind certain operations.
[03:50:41] <GothAlice> For example, that would be written instead: for doc in User.objects.only('id'): doc.update(set__id=str(doc.id))
[03:51:14] <GothAlice> (Makes things more natural-language-like.)
[03:52:03] <davo> Hm interesting, I'll need to look into it further to fully appreciate your example :)
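For reference, a sketch of the kind of Document that one-liner assumes (field names hypothetical; primary_key=True maps the field to _id, and it stays reachable as doc.pk / doc.id):

    import mongoengine as me

    me.connect('mydb')  # hypothetical database

    class User(me.Document):
        # Becomes the document's _id, stored as a string.
        user_id = me.StringField(primary_key=True)
        username = me.StringField()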
[07:14:52] <ss22> if i limit a query by time does it make it less efficient?
[07:15:01] <ss22> meaning if i want documents before a certain date
[07:15:11] <ss22> assuming i have an index on the timestamp
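With an index on the timestamp, a date-bounded query walks only the matching index range; explain() is one way to confirm the chosen plan (names hypothetical):

    from datetime import datetime
    from pymongo import MongoClient

    coll = MongoClient().mydb.events  # hypothetical collection
    cutoff = datetime(2014, 12, 1)

    # Should report an index scan on 'ts' rather than a full
    # collection scan.
    print(coll.find({'ts': {'$lt': cutoff}}).explain())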
[07:26:01] <aksh> i am building a python flask app using mongoengine as the ORM. I want to use ObjectId as the id for my data model. How can i do that? I tried to use ObjectIdField; it's not working
[09:24:52] <deviantony> hey there, I got a question about MMS
[09:25:23] <deviantony> is it possible to install MMS inside our datacenter?
[09:48:49] <deviantony> cause I just found that: http://mms.mongodb.com/help-hosted/v1.5/core/deployments/
[09:49:00] <deviantony> seems that you can deploy your MMS server instance
[09:50:31] <GothAlice> Indeed, http://www.mongodb.com/subscription/downloads/mms is a thing.
[09:52:21] <GothAlice> Are you operating a cluster worthy of self-hosting a complicated automation suite for? (I.e. them-hosted MMS gives 8 free hosts in a provisioned cluster, free backups below certain bandwidth restrictions, etc.)
[10:00:44] <amk> Can someone give me a hint where to look for causes of multiple lingering connections (ex. 810 open connections on a low low low traffic app db)? Is the event of a connection not being closed always caused by the client?
[11:13:25] <stefuNz> Hello, i'm getting an error when updating a document with $push to add an element to an array: "fields stored in the db can't start with '$' (Bad Key: '$push')" This is the code i'm using: http://pastebin.com/pyhzBPGr -- What am i doing wrong?
[11:14:52] <amk> stefuNz, I've never seen $push before but the docs say it should be like "{ $push: { <field1>: <value1>, ... } }"
[11:15:01] <amk> and you seem to be sending push as if it were a string, no?
[11:15:13] <kali> stefuNz: you can't mix "fields" and "$operators" in the update
[11:15:54] <stefuNz> amk: kali: so how would i set "lc" and "l" and push an element to an inner array at the same time?
[11:15:55] <kali> stefuNz: if you want to set "l" and "lc" in the same findAndModify as the $push, you need to put them in a "$set"
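A hedged pymongo sketch combining both in a single atomic update (the array field name and the placeholder values are hypothetical; find_one_and_update is the pymongo 3+ spelling of findAndModify):

    from pymongo import MongoClient

    coll = MongoClient().mydb.things            # hypothetical collection
    doc_id, new_l, new_lc = 1, 'x', 2           # placeholder values
    new_entry = {'added': 'now'}                # placeholder array element

    # Operators may be combined in one update document, but plain
    # field assignments must live inside $set, not at top level.
    coll.find_one_and_update(
        {'_id': doc_id},
        {
            '$set': {'l': new_l, 'lc': new_lc},
            '$push': {'entries': new_entry},    # 'entries' is hypothetical
        },
    )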
[15:13:16] <Alittlemurkling> Hello #mongodb. I'm fairly new to this database, so forgive me if this is a stupid question. I have a dataset with 4.4 million entries, but each implementation of my application will only require a subset, based on geographic location. I have thought about partitioning the data by USA state, using top-level documents in my collection that will embed the data. However, I have had trouble finding ways to search across only specified documents, excluding the others. Is this possible, or even wise?
[15:14:09] <cheeser> not all documents in a collection have to look the same. so the docs that need those fields can have them and those that don't can skip them.
[15:14:55] <Alittlemurkling> cheeser: Is this in reference to my question?
[15:19:28] <Alittlemurkling> I don't quite understand your second statement. I know there is no required schema.
[15:25:25] <Alittlemurkling> Say one of my implementations requires data from California and Oregon. I have a collection that contains a list of States. Each State contains the data located inside it. How do I only query the "OR" and "CA" documents? Or is the speed up I would gain from this not worth it?
[15:25:59] <Alittlemurkling> (Given 4.4 million entries)
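An alternative worth considering, hedged: with 4.4 million entries, one giant embedded document per state risks MongoDB's 16MB per-document cap, whereas one document per entry with an indexed state field lets a plain $in restrict the query to just those states (names hypothetical):

    from pymongo import MongoClient

    coll = MongoClient().geo.entries  # hypothetical collection

    coll.create_index('state')

    # Only index entries for CA and OR are examined, regardless of
    # how many documents the other states contribute.
    cursor = coll.find({'state': {'$in': ['CA', 'OR']}})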
[16:45:26] <grkblood13> if I do a find query that returns 100 results and I only want to display X of them after the Nth result, how would I do that?
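The usual answer is skip/limit on the cursor; a minimal sketch (names and values hypothetical):

    from pymongo import MongoClient

    coll = MongoClient().mydb.things  # hypothetical collection
    N, X = 10, 5

    # Sort for deterministic paging, skip the first N matches,
    # then return at most X of the remainder.
    cursor = coll.find({}).sort('_id', 1).skip(N).limit(X)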
[16:56:27] <boutell> Hi. What’s the minimum storage cost of adding another mongodb database? As in, the cost of doing “use foo” where foo has never been seen before. I’m trying to measure it but of course disk space is constantly fluctuating for other reasons
[18:24:33] <natefinch> Anyone on here knowledgeable about replicasets? I have a question re: replSetGetStatus and knowing when mongo is ready to go after initiating
[19:41:02] <sheikpunk> Does anyone know how I can optimize MongoDB count with conditions?
[20:21:32] <salty-horse> hi. If I resize a document (that is, save()ing a document where one of the fields is much smaller), does mongodb use less space to store it? if so, when does it update the collection's stats() ?
[20:27:07] <hi_resolution> I am having some issues with mixed version replica sets
[20:27:36] <hi_resolution> I was wondering if anybody had encountered user not found errors when adding 2.6 members to a pool of 2.4 servers?
[20:28:04] <hi_resolution> The replica set is secure and is using a key for authentication, yet it does not seem capable of adding the new members
[20:28:27] <hi_resolution> any help is greatly appreciated
[21:39:43] <blizzow> If I shard an already existing large collection, does the preexisting data get moved to the other shards or does only new data get sharded?
[21:44:22] <cheeser> your data will get balanced across the shards.
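For reference, a hedged sketch of sharding an existing collection through a mongos (database, collection, and shard key names are hypothetical); the balancer then migrates the preexisting chunks across shards in the background:

    from pymongo import MongoClient

    client = MongoClient('mongos-host', 27017)  # hypothetical mongos address

    client.admin.command('enableSharding', 'mydb')
    client.admin.command('shardCollection', 'mydb.bigcoll',
                         key={'user_id': 1})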