[01:13:44] <Init--WithStyle-> Hey guys... I would like to set up a collection of data on my server before hitting the database... but have no idea how to set the collection up correctly.
[01:13:59] <Init--WithStyle-> Initially I tried just doing an insert on the database for every piece of my collection but... it's too big
[01:14:04] <Init--WithStyle-> It's a 2000 x 2000 array
[01:14:15] <Init--WithStyle-> too many database hits
[01:15:07] <_johnny> Init--WithStyle-: you can use mongoimport directly on the data (if you're able to have the mongod shut down while you do it)
[01:15:31] <Init--WithStyle-> _johnny: could you point me towards an example/some literature?
[01:15:37] <Init--WithStyle-> I'm not sure what mongoimport is
[01:16:48] <_johnny> yes. it's part of mongodb, as an import util, for json/mongo/csv/tsv data: http://www.mongodb.org/display/DOCS/Import+Export+Tools
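For reference, a minimal mongoimport invocation along the lines _johnny describes; database, collection, and file names here are made up:

    # one JSON document per line in data.json
    mongoimport --db mydb --collection grid --type json --file data.json
    # CSV wants a header line (or explicit --fields)
    mongoimport --db mydb --collection grid --type csv --headerline --file data.csv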
[01:17:02] <Init--WithStyle-> The main thing is programmatically prepping this array for sending to MongoDB where it can be unpackaged into my collection..
[01:17:36] <_johnny> right, parsing is usually the intensive part
[01:18:06] <_johnny> i was prepping some xml to json which took me 4 hours. the import of json to mongo took 30 minutes :)
[01:19:24] <crudson1> Init--WithStyle-: what form is the data in currently?
[01:19:38] <Init--WithStyle-> just a 2d array created via javascript
[01:19:56] <Init--WithStyle-> right now i'm parsing through every part of the array in a for loop and doing a mongo insert
[01:20:11] <Init--WithStyle-> seems i'm getting cut off for some reason at ~ line 546 of the 2d array..
[01:20:15] <Init--WithStyle-> maybe i'm hitting it too intensively
[01:21:31] <Init--WithStyle-> i'm using nodejitsu.. if there was some way I could just push the whole array over and then have it unpack itself... maybe that would work better?
[01:22:11] <Init--WithStyle-> Am i approaching this completely wrong?
[01:22:14] <crudson1> Init--WithStyle-: so it's being generated programmatically. If performing inserts in realtime is slow (or getting slower over time, which could be for a number of reasons) then you could output the json for each document to a file and import that afterwards (as _johnny suggested)
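A sketch of that approach in Node.js, assuming the 2000 x 2000 array from above is called grid and one document per cell is the desired shape; the resulting file is what mongoimport consumes by default (one JSON object per line):

    var fs = require('fs');
    var out = fs.createWriteStream('cells.json');
    for (var x = 0; x < grid.length; x++) {
      for (var y = 0; y < grid[x].length; y++) {
        // one document per line - mongoimport's default json format
        out.write(JSON.stringify({x: x, y: y, value: grid[x][y]}) + '\n');
      }
    }
    out.end();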
[01:22:51] <Init--WithStyle-> crudson1: is doing an insert for each part of the array the correct way to go here?
[01:23:10] <Init--WithStyle-> this is for my initial population of the collection
[01:24:55] <Init--WithStyle-> for some reason things just stop when I get to line 576 of my 2d array :/
[01:25:00] <crudson1> Init--WithStyle-: it depends whether you've decided on the best document structure for this data. Have you decided how it will be queried or analyzed, as how you are representing it should be a consideration at this stage.
[01:30:40] <Init--WithStyle-> could be that jitsu buffered the inserts and is sequentially pushing them in
[01:37:16] <Init--WithStyle-> Strange.... it seems at a certain point my inserts take forever to complete....
[01:39:17] <Init--WithStyle-> Do I need to have multiple primary keys or something?
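One likely culprit for inserts that stall partway through is doing 4,000,000 round trips (2000 x 2000) one document at a time; batching is the usual fix. A rough sketch with the Node.js driver of the era, where collection and the batch size of 1000 are illustrative:

    var batch = [];
    for (var x = 0; x < grid.length; x++) {
      for (var y = 0; y < grid[x].length; y++) {
        batch.push({x: x, y: y, value: grid[x][y]});
        if (batch.length === 1000) {
          // insert 1000 documents in a single call instead of 1000 calls
          collection.insert(batch, {safe: true}, function (err) { if (err) throw err; });
          batch = [];
        }
      }
    }
    if (batch.length) collection.insert(batch, {safe: true}, function (err) { if (err) throw err; });

No multiple primary keys are needed for this; _id remains the single unique key per document.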
[02:59:15] <Glace> Any good experiences with rep set on ebs raid 0?
[03:19:38] <Glace> Is there any reason to not use raid 0 if you have rep sets+ snapshots?
[03:20:15] <Init--WithStyle-> I wish i knew what you were talking about Glace :D
[03:21:29] <Glace> Hmm.. I see that all examples of mongodb with replica sets on ec2 use raid10 on ebs volumes. I was wondering why not just raid0, since the data is replicated + journaling and taking ebs snapshots
[04:07:52] <circlicious> if i am doing a mapReduce and there are 1000 requests made at once, is it going to be performant if i keep creating a tmp collection for each operation and then drop them?
[04:14:00] <circlicious> can i filter over the resultset returned by mapreduce when using inline:true ?
[04:57:37] <ravana> is there an equivalent to mysql sql_calc_found_rows in Mongo?
[04:59:10] <IAD> $cursor->count() and $cursor->count(true) for PHP
[05:02:30] <ravana> if i put there $collection->find()->limit(1), can i expect the same behavior as mysql does?
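In the mongo shell the same distinction looks like this (collection and query are made up); count() ignores limit/skip, which is the SQL_CALC_FOUND_ROWS-style total, while size() applies them:

    var cur = db.items.find({active: true}).limit(1);
    cur.size();   // at most 1 - respects the limit, like the rows actually fetched
    cur.count();  // total number of matching documents, ignoring the limit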
[07:50:22] <NodeX> The trouble with that is now we're going to get a lot of idiots who don't have a clue how to write efficient queries and programming code bringing the overall speed of mongo down and giving it a bad name :/
[07:51:26] <NodeX> oh well, I suppose take the good with the bad
[08:18:48] <kali> NodeX: they're already here anyway, the post is actually 4 months old
[09:00:14] <yatiohi> Hello, I want to "break" a replica set and switch to a single instance. Do I have to just restart the server without the --replset parameter?
[09:06:49] <_johnny> jQuy: i "expose" a db with rockmongo, and i never use php in my stack. it's very lightweight, and just an instance of php-cgi. personally i find most UI's, both app and web, rather limiting, but for basic stuff either of them seems to do
[09:07:06] <_johnny> and besides, like NodeX said, there's MongoDB Rest which is node based :p
[09:07:45] <_johnny> NodeX: reminds me of a chat i saw on nodejs yesterday. "i'd never use mongo". i got curious, so i asked why. he wanted a rest interface. lol
[09:09:02] <jQuy> _johnny: I'll google for MongoDB Rest
[09:09:56] <jQuy> _johnny: yesterday I was advised to use Mongoose
[09:10:16] <_johnny> mongoose seems to be a popular one as well, yes
[09:15:57] <jQuy> MongoDB Rest server doesn't start up!
[09:16:57] <jQuy> I installed it via npm and followed the instructions there: https://github.com/tdegrunt/mongodb-rest
[09:33:59] <jQuy> _johnny: mongodb-rest command still doesn't work. " 'mongodb-rest' is not recognized as an internal or external command, operable program or batch file."
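That error usually just means npm's global bin directory is not on the Windows PATH; a global install is what puts the mongodb-rest command there in the first place, so a first thing to try (assuming the package's README-style usage):

    npm install -g mongodb-rest
    mongodb-rest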
[09:39:13] <jQuy> I might use Mongoose. It works better.
[09:56:47] <jQuy> Is it wise to create my own js file for data modelling?
[09:59:20] <jwilliams> is there any place a mongo admin can check for slow updates?
[10:29:20] <NodeX> 19 keys, split the array by 19 and you'll have zero based array members that will match your keys (as long as the key order is correct), then loop the keys and assign each part of the chunk to a key:value array, then add the whole lot back together and insert
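NodeX's recipe, sketched in JavaScript; the 19 key names and the flat input array are placeholders:

    var keys = ['k1', 'k2', /* ... */ 'k19'];      // your 19 field names, in order
    var docs = [];
    for (var i = 0; i < flat.length; i += keys.length) {
      var chunk = flat.slice(i, i + keys.length);  // zero-based members line up with keys
      var doc = {};
      for (var j = 0; j < keys.length; j++) doc[keys[j]] = chunk[j];
      docs.push(doc);
    }
    // then insert docs with your driver's batch insert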
[10:29:44] <lizzin> i need to explore the json libraries more. would be extremely helpful if there was a jsonArray to List method. then i could just zip the two
[11:44:28] <Vile> Hi! I still need an idea. I have a hierarchically arranged collection (using materialized path). I need to do a m/r on it, but… for proper processing of each document i need all of its parents
[11:45:43] <NodeX> perhaps ask 10gen for some professional consulting
[12:13:10] <Vile> tree structure with materialized path.
[12:13:41] <Vile> probably for the nested set the processing like i want can be done
[12:14:15] <NodeX> and what do you need to do with it again
[12:17:07] <Vile> NodeX: final purpose is to get the hierarchical search
[12:18:03] <NodeX> I don't understand what that is, sorry
[12:19:21] <Vile> i.e. i have tree like: {path:"/a", data:"hello"}, {path:"/a/b", data:"world"}. Then search for the terms "hello", "world" should return node "/a/b"
[12:20:08] <NodeX> and this cannot be done appside?
[12:21:00] <Vile> NodeX: sorry for being unclear. No, it can not => nodes are in the database and there is quite a large number of them
[12:30:28] <remonvv> you just said you need to search on content right? Put your content elements in a flat collection, search that, fetch the _ids and use those _ids to query on the tree node data to get whatever it is you need that for.
[12:32:12] <Vile> (or more). If one of them appears higher on the hierarchy and another is deeper - the deepest level which has all the search terms matched is considered a match
[12:32:26] <remonvv> You need to write out what you're trying to do functionally somewhere. That might be easier.
[12:32:59] <remonvv> Right, but that doesn't require storing it hierarchically at all. The content elements then simply need to know how deep they are rather than what their exact path is.
[12:33:07] <remonvv> In which case you can simply sort them
[12:33:13] <Vile> i.e. search matches if this item and all of its parent items on the hierarchy match all the search terms
[12:34:47] <Vile> remonvv: parent has "hello" somewhere in the data, child has "world" (but does not have "hello"). search terms are hello && world. only child item matches (because hello is contained in the parent)
[12:49:39] <Vile> remonvv: should be find({data:{$all:["hello", "world"]}}) but in the collection where each node contains all the data from its parents
[12:50:27] <Vile> (but i can not use such a collection, because objects could be large and updates will be a nightmare)
[12:57:13] <circlicious> can i use mongodb on one server from another server?
[13:21:42] <remonvv> Anyone attending the munich event here?
[14:13:09] <doxavore> Pro tip: under no circumstance should one use ext3 with MongoDB. Yuck. It's bringing everything down on every new file allocation.
[14:13:25] <doxavore> Is there a way to check and see how close MongoDB is to thinking it needs to allocate a new file?
[14:22:26] <doxavore> Or even a way to pre-allocate a few files at a time, so I can control and work around the server coming to a stand-still?
[14:26:56] <NodeX> it pre-allocates in chunks, doesn't it
[14:28:10] <doxavore> NodeX: yeah... I'm using it for GridFS and have to continue running on ext3 for at least a few more days. I'd just like it to not keep bringing everything down. :-/
[14:34:23] <NodeX> (just trying to work out where the bottleneck is)
[14:35:57] <doxavore> nothing very big, we hover around 5-10 inserts/sec, GridFS increasing around 300-400MB/hour
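On doxavore's earlier question about seeing an allocation coming: db.stats() exposes storageSize (space handed out to collections) and fileSize (space already allocated on disk), and the gap between them is a rough measure of headroom before the next preallocation, assuming nothing else shares the files:

    var s = db.stats();
    print((s.fileSize - s.storageSize) + " bytes of headroom before the next file allocation");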
[14:38:05] <thewanderer1> hi. let's say I use Mongo as a data warehouse and have daily data migrations from various sources to one collection. How do I ensure the collection integrity, i.e. it contains the old dataset, or the new dataset, but not partial data?
[14:38:31] <thewanderer1> filesystem analogy: write the new file first, then rename old file to new file
[14:38:50] <thewanderer1> err, other way round (but you get the idea)
[14:49:10] <remonvv> thewanderer1, I believe db.old.drop() -> db.imported.renameCollection("old", true) should do the trick
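Spelled out in the shell, the swap remonvv describes: build the import into a staging collection, then rename it over the live one. Passing true as the second argument drops the existing target as part of the rename (which also makes the explicit drop() unnecessary), so readers see either the old dataset or the new one:

    // after today's import has landed fully in db.imported
    db.imported.renameCollection("old", true);   // true = drop the existing "old" target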
[14:59:53] <thewanderer1> remonvv, hmm, can't I do an in-place rename?
[15:00:06] <jmar777> Anyone got some recent benchmarks with the aggregation framework? testing right now against some eventing/olap use cases. getting ~1sec to aggregate across ~100k events. am i seeing roughly the best I can expect?
[15:38:52] <brahmana> Hi all. (again.. got disconnected earlier)...
[15:39:18] <brahmana> So, anyone know what's causing this assertion: http://pastebin.com/EVExxKAg ?
[15:48:11] <circlicious> I get "Could not connect to a primary node for replica set"
[15:48:39] <circlicious> tried to set the connection string to ip:port from the other server (mongoid ruby library). that's what i get, what should be done?
[15:55:13] <estebistec> for a read-heavy app, what's a good lower bound on journal interval? I'd like to crank it down at least somewhat from the 100ms default, but don't want to go crazy
[15:55:46] <estebistec> Is it cheap enough when there are no writes such that 10ms j-interval wouldn't cost me much cpu or other resource contention for the DB?
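The knob in question is journalCommitInterval, which (if memory serves) accepts values in the 2-300 ms range:

    mongod --journal --journalCommitInterval 10
    # or in the config file:  journalCommitInterval = 10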
[16:53:21] <Lujeni> Hello - is it possible to specify a query for the mongo_connector tool? if i want to only store documents older than 90 days, for example. thx
[17:26:13] <Almindor> what's the correct date format for the JSON import using mongoimport?
[17:29:25] <crudson1> Almindor: see the section "Dates" http://www.mongodb.org/display/DOCS/mongoexport
[17:30:51] <crudson1> Almindor: try exporting one document that contains a date and examine that
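For the era's strict-JSON export format, a date comes out as a $date wrapper holding milliseconds since the epoch, so an importable line should look roughly like this (field names made up):

    { "name" : "example", "created" : { "$date" : 1338508800000 } }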
[17:33:50] <eka> hi all... anyone knows if ensure_index from pymongo behaves as ensureIndex from the shell? I mean, I don't want the index to be recreated
[18:24:39] <LesTR> does someone have an idea about this? Please : )
[19:27:16] <ninjai> can someone help me with authentication? I ran the command db.auth('admin', 'password') and it says "1" after. I try to use the init script to start graylog2 and I get an auth fail from mongo. Why?
[19:32:08] <linsys> ninjai: did you restart with --auth?
[19:33:26] <ninjai> linsys, no, because I cannot find out how. I use an init script in /etc/init.d, and there is not even a mention of the mongo command in it
[19:33:37] <ninjai> is there some other way I should be adding it in?
[19:35:11] <ninjai> when I try to stop the init script/service, and use "mongod --auth", I get this error: exception in initAndListen: 10296 dbpath (/data/db/) does not exist, terminating
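That error is mongod falling back to the default dbpath (/data/db) because it was started bare; the init script normally points it at a config file instead. Reusing that config keeps the real dbpath, and auth can be turned on there permanently (paths assume a typical Debian/Ubuntu package layout):

    mongod --config /etc/mongodb.conf --auth
    # or persistently, add to /etc/mongodb.conf:
    #   auth = true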
[20:21:13] <quuxman> oops, s/one/is1/ (just renamed that, as it conflicts with Python's 'in')
[20:24:16] <quuxman> I would figure I'm recreating something out there, but I haven't found it
[20:42:55] <geoffeg> findAndModify cannot operate as a cursor, right? if i wanted to use findAndModify's semantics with thousands of documents, i would have to run findAndModify thousands of times in a loop?
[20:43:11] <quuxman> does anybody even use pymongo here?
[20:59:21] <jgornick> Hey guys, with Mongo 2.0.x, does this issue still exist? http://stackoverflow.com/questions/6743849/mongodb-unique-index-on-array-elements-property
[21:22:37] <crudson1> jgornick: I don't think there is a planned feature for this at the index level. Has to be enforced at the application level.
[21:23:07] <jgornick> crudson1: Ok :( Thanks for taking a look at that!
[21:23:40] <crudson1> It comes up a fair bit - I've been searching the jira for matching issues, but I don't see any planned feature.
[21:26:30] <crudson1> jgornick: you may be able to use a feature of the language you use (e.g. Set vs Array), but you may not get automatic serialization.
[21:30:04] <ninjai> how do i grant my admin user r/w to the admin database
[21:33:13] <crudson1> jgornick: don't know the php api sorry, but "uniqueness in an array" should be fairly standard material. Just find the most sensible part of your application or object model to manage the constraint.
[21:33:47] <jgornick> crudson1: Yes, pretty simple to implement. I'll check it out. Thanks for the insight.
[21:41:44] <quuxman> pretty curious about which APIs people here use for Mongo
[21:49:48] <eka> hi, is there any operation that, based on a query, will insert if there is no doc and do nothing if there is? I don't have an _id, just a query
[21:52:02] <ninjai> when I do a db.auth("user","pass"), what do the 1 or 0 mean that follow it?
[22:24:56] <nexion> hey guys, is it possible to sort the result of a .find by a value that's computed from two other columns, all within mongodb?
[22:26:50] <nexion> like if I have a list of {a: 3, b: 5}, {a: 1, b: 7}, and I call .find(/* a + b > 7 */) or similar for getting the results in order by (a+b)?
[22:32:02] <crudson1> nexion: sure, depending on how big the query result size is
[22:32:58] <crudson1> nexion: for in-memory result sets, you can do db.col.aggregate({$project:{c:{$add:['$a','$b']}}}, {$sort:{c:-1}})
[22:34:52] <crudson1> nexion: oh I missed the > 7 bit
[22:36:37] <crudson1> but that will do such ordering for you, for the find bit you can use $where, but the two can't be used together
[22:37:38] <crudson1> actually you could do it all with .aggregate(), sorry am doing multiple things currently
[22:40:22] <crudson1> nexion: like aggregate({$project:{a:true,b:true,c:{$add:['$a','$b']}}}, {$match:{c:{$gt:7}}}, {$sort:{c:-1}})
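Against the two sample documents from earlier, that pipeline comes back roughly like this (both sums are 8, so both pass the $gt:7 match; _id omitted for brevity):

    db.col.insert({a: 3, b: 5});
    db.col.insert({a: 1, b: 7});
    db.col.aggregate({$project: {a: true, b: true, c: {$add: ['$a', '$b']}}},
                     {$match: {c: {$gt: 7}}},
                     {$sort: {c: -1}});
    // => { "result" : [ { "a" : 3, "b" : 5, "c" : 8 }, { "a" : 1, "b" : 7, "c" : 8 } ], "ok" : 1 }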
[22:44:57] <MikeFair> Is there anything in MongoDB that's the equivalent to Couch Apps?
[22:46:18] <MikeFair> I'd like to make a small mobile app for the Android platform. I'd like the database for this app to synchronize with a server whenever it's available
[22:46:59] <MikeFair> The database is pretty simple, it's basically a list of contact groups