[00:36:44] <sinisa> what do you mean by "do it in code"?
[02:54:15] <hdm> I consistently get a corrupted database after ~150m records. I tried 2.0.6, 2.1.x, and two different storage destinations (SSD & R0 spinning disk), with errors like: "$err" : "Invalid BSONObj size: 1801675112 (0x6861636B) first element: */\\n}\\n\\n#content {\\nwidth: 700px;\\npadding: 10px;\\nmargin-t
[02:54:23] <hdm> any idea what to check next? RAM possibly?
[02:54:36] <hdm> (it's ECC, registered; basic tests come back fine)
[02:55:03] <hdm> "$err" : "Invalid BSONObj size: 1801675112 (0x6861636B) first element.. or error: { "$err" : "BSONElement: bad type 64", "code" : 10320 } or similar keep popping up
[02:58:10] <crudson> hdm: one error looks like you are trying to insert a document way over the max size limit
[02:58:27] <hdm> those are errors doing an unrelated count() query later
[02:58:49] <hdm> now all counts immediately fail since something corrupted is hit right away
[02:59:07] <hdm> I'm about to blame Linux's RAID-0 implementation or something
[02:59:26] <hdm> about four days of chasing this around; the data load takes 12-48 hours depending on the set
[03:02:24] <crudson> hdm: ah ok. So you are sure the size and "bad type 64" are just db corruption
[03:16:07] <hdm> crudson: yup, random other BSON errors if I keep trying to insert
[08:39:57] <Bartzy> I mean, when there's no perfect index to use and there's a choice of more than one index, the query optimizer chooses one of the indexes to use (or none at all) by running query plans for all of them together and checking nscanned versus n
[08:40:07] <Bartzy> But doing that means actually running the queries, right?
[08:40:24] <Bartzy> and then just killing the ones that haven't finished yet when one of them finishes?
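For reference, the plan racing Bartzy is describing can be observed from the shell: in 2.x, passing true to explain() includes an allPlans array with the nscanned figure for each candidate plan the optimizer tried. A minimal sketch, assuming a hypothetical things collection and query shape:

    // Show every candidate plan the optimizer raced, not just the winner.
    // "things", "a" and "b" are placeholders for illustration.
    db.things.find({ a: 1, b: { $gt: 5 } }).explain(true).allPlans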
[09:08:50] <Oggu> I have a hash in a field, say A, which gives A.a, A.b, A.c, A.d… I want to find documents where A.a is equal to condition a or doesn't exist. And the same for A.b, A.c…
[09:34:34] <PDani> how should I run those mongod instances which are only arbiters in a sharded, replicated environment? is it good practice to start these with --noprealloc and --nojournal because they won't hold any data?
[09:35:44] <NodeX> Oggu : $elemMatch has to match both elements
[09:36:03] <NodeX> so A.a='string' or exists:false would always fail
[09:37:18] <NodeX> using $or but it's not efficient
[09:37:59] <NodeX> or change your schema slightly to always have either a value or a null value for A.a ... then use ... A.a: {$in: ['stringA', null]}
[09:38:52] <NodeX> or do 2 queries and merge the results - not sure if that's faster than an $or ... I doubt it very much
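One detail worth noting about NodeX's $in suggestion: in MongoDB an equality (or $in) match against null also matches documents where the field is missing entirely, so the query may work even without the schema change. A sketch, with a placeholder collection name:

    // null in the $in list matches documents where A.a is null
    // *or* where A.a does not exist at all.
    db.coll.find({ 'A.a': { $in: ['stringA', null] } })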
[09:39:12] <Oggu> Speed isn't very important here. Only run once a day
[09:39:16] <zenista> hi guys, my client is on Windows XP 32-bit... I'm trying MongoDB for the first time with a new version of an app... but I learned that 32-bit MongoDB is only for testing and eval... and plus there are lots of posts about data corruption with MongoDB...
[09:40:43] <Oggu> I don't see how I can do it with $or. $or: [{'A.a': 'value'}, {'A.a': {'$exists': false}}, {'A.b': 'value'}, {'A.b': {'$exists': false}}] would match too much. With a flat $or, it would be enough for A.b not to exist for the document to match
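One way to keep the per-field pairing Oggu needs is to give each field its own $or clause and combine them under a top-level $and, so a document must satisfy the value-or-missing test for every field independently. A sketch with placeholder values:

    db.coll.find({
      $and: [
        { $or: [ { 'A.a': 'valueA' }, { 'A.a': { $exists: false } } ] },
        { $or: [ { 'A.b': 'valueB' }, { 'A.b': { $exists: false } } ] }
        // ...and likewise for A.c, A.d
      ]
    })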
[09:40:56] <zenista> I'm trying Node.js also... and MongoDB goes quite well with Node.js
[09:41:00] <NodeX> I would strongly recommend your client upgrade to a stable operating system that is designed for large data
[10:03:15] <PDani> how should I run those mongod instances which are only arbiters in a sharded, replicated environment? is it good practice to start these with --noprealloc and --nojournal because they won't hold any data?
[10:17:34] <schlitzer|freihe> or will I have to figure out myself how many members are in a replica set and set the right number in the "w" parameter?
[10:18:55] <schlitzer|freihe> because I know that at least the Java driver has these WriteConcern features where you can say "majority" instead of "2" if you have 3 members in a shard.
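The same "majority" shorthand the Java driver's WriteConcern exposes is available at the getLastError level, so there is no need to count replica-set members by hand. A sketch from the shell (the wtimeout value is an arbitrary placeholder):

    // Wait until a majority of the replica set has acknowledged
    // the last write, whatever the current member count is.
    db.runCommand({ getlasterror: 1, w: "majority", wtimeout: 5000 })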
[10:36:17] <mephju> Hello guys. I want to do something in one query that is easier to do in two queries, but I think it would be more efficient in one. So here is what I want to do: I want the results of an $and query and also the results of an $and query, both at the same time. Is it possible with some operation I might not know yet?
[10:37:03] <mephju> I want the results of an $and query and also the results of an $or query.
[10:37:49] <mephju> e.g. something like this { '$or': [ {'$and':array}, {'$or':array} ] }
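For what it's worth, that nesting is legal: each clause of a top-level $or is a full query document and may itself contain $and or $or. A sketch of the combined query, with made-up field names:

    db.coll.find({
      $or: [
        { $and: [ { status: 'active' }, { age: { $gte: 18 } } ] },
        { $or:  [ { vip: true }, { credits: { $gt: 100 } } ] }
      ]
    })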
[10:56:27] <Oggu> NodeX: Efficiency isn't very important in this case. Should I get all documents and filter application-side? Or do $or cases for all the different combinations?
[11:14:37] <harryhcs> can you copy an existing mongo db to replace another one? would that work?
[11:14:48] <harryhcs> it's for a Django app, and I moved it to a new server
[11:15:11] <Derick> yes, as long as mongod isn't running on either side
[11:15:32] <mephju> when I have an $or query with two conditions, will both conditions be evaluated in any case? Or will the evaluation stop once the first condition is met?
[11:23:19] <harryhcs> Derick: hmm, I did copy it while it was running
[11:23:28] <harryhcs> should I stop and do it again?
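If stopping mongod on the source box is inconvenient, one alternative (a suggestion, not something Derick endorsed here) is the shell's copyDatabase helper, which clones a database from a live server over the wire. Run against the destination, with placeholder names:

    // Pull "mydb" from the old host into the new mongod.
    db.copyDatabase("mydb", "mydb", "old-server.example.com")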
[12:07:16] <Bartzy> NodeX: And it is considered a range query ?
[12:07:17] <NodeX> a: {$in: [1,2]} - matches a=1 or a=2
[12:07:31] <NodeX> no, like SQL's IN, it is not a range query
[12:08:06] <Bartzy> NodeX: I'm asking because I'm doing: db.things.find({uid: {$in: [1,2,3,4.....]}}).sort({_id: -1}), and I have this index: {uid: 1, _id: 1}, but I still get scanAndOrder for the query...
[12:16:32] <Bartzy> so the reason I need uid: 1, _id: -1 is that once the index has used up the uid part, it can't do the sort by _id in reverse?
[12:16:50] <NodeX> correct, because you set up the index for asc, not desc
[12:17:05] <Bartzy> NodeX: I still get: "scanAndOrder" : true,
[12:17:18] <NodeX> you'll probably have to hint the index now
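A hinted version of Bartzy's query might look like the sketch below; the value list is abbreviated and the index is built descending on _id to match the sort:

    // Compound index with _id descending, matching the sort direction.
    db.things.ensureIndex({ uid: 1, _id: -1 })
    // Force that index and check whether scanAndOrder is still set.
    db.things.find({ uid: { $in: [1, 2, 3, 4] } })
             .sort({ _id: -1 })
             .hint({ uid: 1, _id: -1 })
             .explain()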
[12:26:15] <NodeX> it could be a bug maybe, that _id is choking the parser or something; I should wait for a 10gen employee to confirm you can actually index and sort on _id: -1
[12:26:31] <Bartzy> I think it's because of the query cache or something - because now it takes 8ms for the regular, non-hinted query?
[14:18:50] <_simmons_> I'm looking at mongodb.log and there are many errors like this: "Assertion failure false ./util/../db/../bson/../util/hex.h 29"
[14:19:40] <_simmons_> And at the end of the stack trace: /usr/lib/libboost_thread.so.1.42.0(thread_proxy+0x60) [0x7f0250c0b200] /lib/libpthread.so.0(+0x68ba) [0x7f02513328ba] /lib/libc.so.6(clone+0x6d) [0x7f025006802d]
[14:47:34] <wereHamster> no. It's an assertion failure in the mongodb code
[14:47:46] <wereHamster> The first line is relevant, not the last one
[15:17:36] <markgm> I'm currently working with embedded documents, and from the above it looks like they are saved in two places: inside the parent document itself and in the collection which holds the embedded documents. Is this in fact the case?
[15:18:03] <wereHamster> markgm: they are saved wherever you save them.
[15:18:30] <wereHamster> and only there, and nowhere else.
[15:19:35] <markgm> My understanding of embedded documents is that they are a fully initialized reference to a document and that changes in one correspond to changes in the other
[15:20:59] <wereHamster> you either embed the document or you use a reference to one. You can not have both.
[15:21:29] <markgm> I'm using Doctrine and embedding many of the same document type in a parent doc, so when I look at the parent doc, it contains the correct documents within it
[15:21:51] <wereHamster> no idea what doctrine is or what it does or how it does it
[15:22:06] <markgm> and then there is a separate collection which holds these embedded documents
[15:22:32] <wereHamster> I recommend using the shell to look at what is actually stored in the database
[15:22:34] <markgm> I'm wondering if they are the same document or a copy saved in two different places
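Following wereHamster's suggestion, a quick shell check settles it: if the children appear inline in the parent document and again in a separate collection, the ODM is writing copies in two places. A sketch with placeholder collection names:

    // Embedded children show up inline in the raw parent document.
    db.parents.findOne()
    // If this separate collection holds the same documents as well,
    // they are independent copies, not shared references.
    db.children.find().limit(5)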
[15:53:19] <ramsey> Derick: Does the latest release of the mongo extension for PHP fix the replica set issues we discussed on that thread a while back?
[16:28:37] <_simmons_> could it be caused by: Wed Jul 18 01:00:02 [conn13270] assertion 13297 db already exists with different case other: [utf-8] me [UTF-8] ns:UTF-8.* query:{} ???
[16:58:50] <diegok> Hi! Is it possible to retrieve the last element of an array using dot notation?
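Plain dot notation can't address the last element (negative indexes aren't supported), but a $slice projection of -1 returns just the final array element. A sketch with placeholder names:

    // Return only the last element of the "tags" array per document.
    db.coll.find({}, { tags: { $slice: -1 } })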
[17:51:07] <soumya> i want it return the person if 2 keys in one of the dictionaries i nthe list match
[17:52:00] <soumya> I want it to return the person if 2 keys in one of the dictionaries in the list match (Note: the dictionary has a total of 4 keys)
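What soumya describes sounds like the case $elemMatch exists for: all listed conditions must be satisfied by the same array element rather than spread across different ones. A sketch with hypothetical field names:

    // Match a person where a single entry in "dicts" has both values.
    db.people.find({ dicts: { $elemMatch: { k1: 'x', k2: 'y' } } })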
[18:15:20] <greenberet123> I want to retrieve all records in which a key matches any of 20000 values… what is the best way to run this query from a program (I already have an index on the key)?
[18:27:16] <crudson> greenberet123: depends what you want to do with it.
[18:27:48] <greenberet123> crudson: I want to retrieve all documents where the key is IN these 20000 values from a program..
[18:28:21] <crudson> greenberet123: I found splitting into smaller chunks was more efficient, but depends whether you have the logical freedom to do that.
[18:29:16] <greenberet123> crudson: splitting the query? so… eventid IN [1,2,3,4,5,6] becomes eventid IN [1,2,3] and eventid IN [4,5,6]?
[18:30:32] <crudson> greenberet123: yeah, for very large numbers, 10s of thousands - at least that's what I experienced when having to process and move many documents from one db to another depending on a list of values
[18:31:11] <crudson> greenberet123: but I'd run your own tests and see; it could be a factor of how your documents are structured, indexed and queried
[18:31:36] <greenberet123> crudson: But is specifying a list the only way? The query would become really HUGE…
[18:32:03] <greenberet123> crudson: it seems a little inelegant to me…
[18:34:44] <crudson> greenberet123: unless you can find some other way to identify those documents. Perhaps look at how the list of eventids is generated and see if the documents can be selected similarly.
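The chunking crudson describes might look like this sketch in the shell; the collection, field name and chunk size are placeholders:

    var ids = [ /* the 20000 values */ ];
    var chunkSize = 1000, results = [];
    // Run one $in query per batch and union the results client-side.
    for (var i = 0; i < ids.length; i += chunkSize) {
      db.events.find({ eventid: { $in: ids.slice(i, i + chunkSize) } })
               .forEach(function (doc) { results.push(doc); });
    }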
[19:12:49] <j0shua> hey; I want to do an aggregate with $group to create a sum, but I only want the top 5 results. How can I do that without sorting the entire result set in my app?
[19:14:54] <j0shua> meaning I want to reduce {name: 'joe', likes: 5}, {name: 'joe', likes: 2}, {name: 'betty', likes: 4} to [{name: 'joe', likes: 7}, {name: 'betty', likes: 4}] but I ONLY want to get the winner
[19:47:30] <crudson> j0shua: actually this will be more efficient, and it renames _id back to name db.users.aggregate({$group:{_id:'$name',likes:{$sum:'$likes'}}}, {$sort:{likes:-1}}, {$limit:1}, {$project:{name:'$_id',likes:1,_id:0}} ).result
[19:53:41] <crudson> j0shua: if you wanted "top 5" then just change the $limit accordingly
[19:58:55] <j0shua> @crudson I can't chain $limit to $group
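To be clear about crudson's point: $limit isn't a cursor method chained onto $group, it's simply another stage in the pipeline. The same pipeline with the top 5 kept, in the 2.2 shell style used above:

    db.users.aggregate(
      { $group:   { _id: '$name', likes: { $sum: '$likes' } } },
      { $sort:    { likes: -1 } },
      { $limit:   5 },
      { $project: { name: '$_id', likes: 1, _id: 0 } }
    ).result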