pmxbot IRC Log Viewer

[00:00:48] <ianblenke1> progolferyo: create a new shard as a replica set, and then migrate the chunks off of the non-replica shard?

[00:01:15] <ianblenke1> just thinking here, I can't find anything in the documentation on your case.

[00:01:26] <progolferyo> ianblenke1: yah, if it wasn't so big, I would just do that

[00:11:07] <progolferyo> ianblenke1: but the cluster is just so big that we are trying to avoid that if at all possible, we just forgot to make the server a replica set

[00:19:09] <caleb_> Hope it's okay to ask a Flask-PyMongo question here. Does anybody know the right way to access pymongo.errors.DuplicateKeyError without 'import pymongo'? It seems like there should be a way to access it through flask.ext.pymongo.

[00:52:41] <wting> How do I login to a mongo db using user / password? I'm using `mongo server.com:27017 -u user` but I don't get a password prompt.

[00:58:45] <wting> Never mind, it's `mongo server.com -u user -p` to get a prompt. However despite putting in my password I'm still getting "need to login" errors.

[02:24:18] <wting> I've tried db.auth("user", "pass") but nothing works.

[02:41:12] <jardineworks> hey guys -- Can someone help me troubleshoot why my mongoexport is not working?

[02:41:27] <jardineworks> I'm a total newb to mongo and I can't see to get this sorted.

[02:42:06] <jardineworks> I am running the command: mongoexport --dbpath data/db --port 27017 -c tags -o test.json

[02:42:27] <jardineworks> but for a result I am getting --

[02:42:29] <jardineworks> exported 0 records

[02:42:29] <jardineworks> Fri Oct 19 22:37:13 dbexit:

[02:42:29] <jardineworks> Fri Oct 19 22:37:13 [tools] shutdown: going to close listening sockets...

[02:42:29] <jardineworks> Fri Oct 19 22:37:13 [tools] shutdown: going to flush diaglog...

[02:42:29] <jardineworks> Fri Oct 19 22:37:13 [tools] shutdown: going to close sockets...

[02:42:30] <jardineworks> Fri Oct 19 22:37:13 [tools] shutdown: waiting for fs preallocator...

[02:42:32] <jardineworks> Fri Oct 19 22:37:13 [tools] shutdown: closing all files...

[02:42:34] <jardineworks> Fri Oct 19 22:37:13 [tools] closeAllFiles() finished

[02:42:36] <jardineworks> Fri Oct 19 22:37:13 [tools] shutdown: removing fs lock...

[02:42:38] <jardineworks> Fri Oct 19 22:37:13 dbexit: really exiting now

[03:58:48] <mrpro> hi

[03:58:55] <mrpro> MongoDB.Bson is for .net 2.0?

[14:39:50] <aster1sk> Ugh I'm at the 'back to the drawing board' stage.

[14:40:16] <aster1sk> Originally our documents were great for writes, sucked for reads but everything was ok.

[14:40:42] <aster1sk> Now boss has changed requirements and unfortunately cannot adapt the existing model.

[14:51:12] <kali> aster1sk: depending what your app does, it might be the right call

[14:52:09] <aster1sk> Yeah, it's basically event tracking. I've been aggregating heavily however the current model wont allow tagging sub documents to geographic / device types.

[14:53:21] <aster1sk> So devices and geo locations will probably be broken into either a different document model or perhaps different collection.

[14:53:47] <aster1sk> There will be minimal redundancy (date , mysql id) so I'm not too upset about that.

[14:54:25] <aster1sk> I am however a little dissapointed I've gone this far with a model that wont provide what they need.

[14:55:16] <aster1sk> I'd be happy to pay for consulting.

[14:56:30] <aster1sk> kali: not seeing any Toronto meetups in the future, do you have insight here?

[14:58:22] <kali> aster1sk: no idea, i'm in paris

[14:59:22] <aster1sk> Oh sorry, there's another 'k' name here who attends the meetups, I'm sorry.

[15:00:05] <kali> nopb

[15:05:07] <aster1sk> Anyone know where best to go for consulting?

[15:13:48] <Goopyo> is a local mongodb write as fast as a local sqlite write?

[15:16:31] <richthegeek> Hi. I'm building a system that has user-configurable sorting on data and the sort-order has to persist across multiple collections of the same data (their are good reasons for this, it's just difficult to explain the whole system). To make this easier, we're basically copying the sorted data into an _o property that's an array, and sorting by {_o.0: 1, _o.1: 1, _o.2: 1 ... }

[15:16:59] <richthegeek> A few questions about this method: if we index on just _o, will that provide a good index for the sorting?

[15:17:32] <richthegeek> and if the _o.5 field, for example, is null will this cause it to be sorted before or after other (non-null) _o.5 rows

[15:19:07] <richthegeek> this seems to answer the first one: http://www.mongodb.org/display/DOCS/Multikeys

[15:19:40] <Goopyo> richthegeek: well indexes are sorted but you define how the sort when you create the index

[15:19:52] <Goopyo> you want to define it in the most common way you access it

[15:20:17] <Goopyo> From the docs: "When creating an index, the number associated with a key specifies the direction of the index, so it should always be 1 (ascending) or -1 (descending). Direction doesn't matter for single key indexes or for random access retrieval but is important if you are doing sorts or range queries on compound indexes."

[15:20:32] <richthegeek> Goopyo: we're transforming desc searches on numeric fields into ascending values (0 - val) so we can just index/sort ascending without issue

[15:20:55] <Goopyo> ok so whats the question exactly?

[15:21:06] <richthegeek> well I hadn't found the multikeys page before i asked ;)

[15:21:23] <Goopyo> oh alright ;)

[15:21:47] <richthegeek> so we can index on {_o: 1} and sort by {_o.0:1, _o.1: 1, _o.2: 1 ... } without real issue

[15:22:24] <Goopyo> I wouldn't expect fake issues either

[15:22:31] <richthegeek> the other way would be creating a compound string that's sortable but that involves doing string slicing and padding and transforming numbers into zero-padded binary so that's an inferior soltuion

[15:22:33] <richthegeek> ha

[15:23:38] <Goopyo> btw why don't you just sort one way and resort in your programming langauge for the user? Seems more intuitive.

[15:23:42] <richthegeek> just a final question on this, and I suspect the answer is "of course not, silly goose", but is there any way to change the default/natural sort on a collection?

[15:24:03] <Goopyo> richthegeek: yeah the ID field can be overwritten

[15:24:07] <kali> richthegeek: i think what you want is an index on { _o.0:1, _o.1:1, _o.2:1 }, not an index on _o

[15:24:12] <Goopyo> with any custom value. Just they must be unique

[15:24:50] <richthegeek> kall: are you sure? the multikeys page seemed to say it would index all values of the array?

[15:24:58] <kali> richthegeek: an index on _o will just be helpful for finding a value in an any position in the array

[15:25:05] <richthegeek> ah right ok

[15:25:23] <Goopyo> How bigs the datasets?

[15:25:48] <richthegeek> in a theoretical sense, terabytes

[15:25:51] <Goopyo> i.e why dont you retrieve them unsorted and resort in the language of choice? should be a millisecond operation

[15:25:54] <Goopyo> oh

[15:26:57] <richthegeek> I mean, the reason I'm jumping through hoops with sorting is because the way the data is manipulated is user-provided and the operations are done in a streaming fashion (data is processed as it's added, with the minimum amount of interaction with the database)

[15:27:13] <richthegeek> each stage of the processing is cached in separate collections, so the sorting needs to persist across collections

[15:28:11] <richthegeek> and because we can't know the "sort position" of the data without doing a binary sort + writing back to the database on multiple records (as many as the entire set) we can't overwrite the _id field easily (because that field doesn't support arrays, afaict?)

[15:28:43] <kali> mmmm not sure, _id can be object, so maybe array works too

[15:29:14] <richthegeek> odd... might just be a "feature" of Mongolian Deadbeef but when I tried to write even a string to the _id field it got converted to hex

[15:29:20] <richthegeek> an ObjectID even

[15:29:40] <Goopyo> I'm not really sure about your architecutre/plans but you should perhaps look into using something like redis to be used as the index/sort tracker

[15:29:53] <kali> richthegeek: this is not mongodb. i use string _id all over the place

[15:30:17] <richthegeek> we are already using Redis for a few other things in the system, but we need to have the sort persist across shutdowns

[15:30:24] <richthegeek> cloud architecture and all that

[15:30:57] <richthegeek> redis can do persistence of course, but it's not really a good solution here

[15:31:07] <richthegeek> the compound _o field seems to cover all use cases

[15:32:04] <kali> richthegeek:have you checked it's working ? i mean index { _o.0:1, _o.1:1, etc } ?

[15:32:12] <kali> richthegeek: because i'm not sure it does

[15:32:53] <kali> richthegeek: there is also the matter of the size. indexed values are limited to a few hundreds of bytes (about 800 i think) so if your fields are big, you can have problems

[15:33:22] <richthegeek> we're limiting it to a maximum of six sort columns

[15:33:42] <richthegeek> and we can limit string lengths to 800/6 without causing any problems that I can think of

[15:34:06] <richthegeek> I had no problem creating an index with rockmongo, btw

[15:35:34] <richthegeek> numbers as well, I suppose - 132 bytes is a massive number

[15:35:42] <Goopyo> kali: can you run two mongo processes on one database file?

[15:35:51] <kali> Goopyo: nope

[15:36:10] <richthegeek> sharding?

[15:36:36] <Goopyo> so a replica would always require a dataset dupe?

[15:37:21] <kali> Goopyo: well, that's the point of a replica, i guess

[15:37:35] <kali> Goopyo: what problem are you trying to solve ? :)

[15:38:25] <Goopyo> number of write operations expected for mongodb to be able to typically process per second

[15:38:33] <Goopyo> and the scalability of that figure

[15:39:00] <kali> Goopyo: 10ks of insert per second for a process are common

[15:39:08] <Goopyo> oh shit nvm

[15:39:25] <kali> Goopyo: but if you need to scale write, the answer is harding

[15:39:28] <kali> sharding

[15:40:35] <Goopyo> oh alright. Also does mongodb have a memory overhead on non-indexed things?

[15:41:11] <kali> i don't understand the question. but mongodb, as most databases, will be memory greedy

[15:41:27] <Goopyo> I want to feed in 500 datapoints of stock data every say .1s, if I only index the ticker and dt would the rest of the data have memory overhead or just the ticker and dt?

[15:43:23] <Goopyo> And also would you store that in 500 different collections named by ticker or store them in one collection with ticker indexed?

[15:43:31] <Goopyo> if you're only querying one ticker at a time

[15:43:48] <kali> Goopyo: it does not work that way. mongodb puts everything, documents and indexes in memory-mapped files. the kernel decides what pages go to disk or are allowed to stay in memory

[15:45:18] <kali> and honestly for this kind of time series data, you should consider using specialized databases like rrd or whisper

[15:45:53] <Goopyo> yeah, I was consdering something else… kinda wanted to minimized the architectural fragmentation since I'm already using mongodb

[15:50:58] <richthegeek> 500 tickers at a resolution of 100ms is "only" 5000 ops/s - Mongo should be sufficient for that surely?

[15:51:30] <kali> richthegeek: yes. but the document model induces an terrible overhead here

[15:51:57] <kali> it will waste memory and disk space like crazy

[15:52:17] <richthegeek> I guess so - if it's not persistent then RRD looks like a great fit

[15:54:10] <Goopyo> kali: yeah thats what I'm worried about. It is persistant though

[15:54:40] <Goopyo> anyway to make mongo not do memory mapping on the documents for a specific db/collection?

[15:55:05] <richthegeek> run a separate instance of mongo for this dataset and reduce it's max memory to 0?

[15:55:20] <Goopyo> I think then the indexes would be fucked

[15:55:34] <Goopyo> on dt and ticker that is

[15:55:48] <richthegeek> yeah i guess so..

[15:55:58] <kali> Goopyo: no. memory mapping is structural in mongodb

[15:56:48] <richthegeek> presumably the issue here is that you have data coming in that you need to read out very shortly after, so the cache usage is all over the place and memory IO is way too much

[15:56:55] <kali> what's your problem with persistence and rrd ?

[15:58:36] <Goopyo> Still looking into RRD. Not sure what value a timeseries db adds over a regular db...

[15:59:35] <kali> Goopyo: it's as compact a storage as possible

[16:00:54] <kali> for a given point in mongodb, you'll store a timestamp, the value itself, both fields names and various bson overhead

[16:00:57] <kali> plus the indexing

[16:01:17] <Goopyo> yeah my tests on it looked shitty

[16:01:19] <kali> in rrd files, you will just have the numbers

[16:01:48] <Goopyo> sqlite handled fine but no server, so I'll look into postgres and rrd

[16:02:02] <Goopyo> any other TS db's you know of that are popular?

[16:02:05] <kali> yes, sql would do a better job than mongodb here

[16:02:15] <kali> Goopyo: whisper, part of the graphite project

[16:02:38] <kali> Goopyo: it has some benefits, like being able to write at an arbitrary time position, something difficult with rrd

[16:03:36] <kali> Goopyo: i am not sure about how it will behaves with sub second intervals, now that i think of it

[16:56:24] <mrpro> hi

[16:56:37] <mids> ho

[16:56:38] <mrpro> MongBD.Driver.Bson references .NET 2.0?

[17:12:19] <jardineworks> hey guys -- when I run this command: mongoexport -v -d xperscore -c links ... I get as a response: Sat Oct 20 13:09:19 creating new connection to:127.0.0.1

[17:12:20] <jardineworks> Sat Oct 20 13:09:19 BackgroundJob starting: ConnectBG

[17:12:20] <jardineworks> Sat Oct 20 13:09:19 connected connection!

[17:12:20] <jardineworks> connected to: 127.0.0.1

[17:12:20] <jardineworks> exported 0 records

[17:12:34] <jardineworks> can someone tell me where I can look for more information to find out why 0 records were exported?

[17:14:04] <kali> jardineworks: the collection or database names are wrong ? the collection is empty ?

[17:16:45] <mrpro> hi

[17:16:53] <mrpro> if a replica becomes unreachable, what does master do

[17:18:56] <kali> mrpro: if it still sees a majority, nothing, it stays master

[17:19:13] <kali> mrpro: if not, it steps down

[17:22:35] <jardineworks> kali, negative on both... I can use the shell to connect to the database and run a query to find all the records in my collection

[17:24:13] <jardineworks> kali -- if I specify the dbpath though I get more details (I just realized that not specifying the path means it is looking in /data/db, which is not where my store is).

[17:24:44] <jardineworks> I get this instead --

[17:24:45] <jardineworks> Sat Oct 20 13:22:08 [tools] User Assertion: 10310:Unable to acquire lock for lockfilepath: data/db/mongod.lock

[17:24:45] <jardineworks> If you are running a mongod on the same path you should connect to that instead of direct data file access

[17:24:45] <jardineworks> Sat Oct 20 13:22:08 dbexit:

[17:24:45] <jardineworks> Sat Oct 20 13:22:08 [tools] shutdown: going to close listening sockets...

[17:24:46] <jardineworks> Sat Oct 20 13:22:08 [tools] shutdown: going to flush diaglog...

[17:24:46] <mrpro> well the question is

[17:24:47] <kali> jardineworks: if your mongod is running, you should not use the dbpath

[17:24:48] <jardineworks> Sat Oct 20 13:22:08 [tools] shutdown: going to close sockets...

[17:24:50] <jardineworks> Sat Oct 20 13:22:08 [tools] shutdown: waiting for fs preallocator...

[17:24:52] <jardineworks> Sat Oct 20 13:22:08 [tools] shutdown: closing all files...

[17:24:54] <jardineworks> Sat Oct 20 13:22:08 [tools] closeAllFiles() finished

[17:24:56] <kali> jardineworks: dont paste more than on line on irc

[17:24:58] <jardineworks> Sat Oct 20 13:22:08 dbexit: really exiting now

[17:25:07] <kali> use pastie.org

[17:25:20] <jardineworks> kali, sorry. I knew that ... just being lazy lol :)

[17:25:37] <kali> it's not lazy, it's rude.

[17:25:40] <mrpro> question is… will it buffer and overflow

[17:26:12] <mrpro> and how can my client know that it is happening so it can start thorttling in order not to kill mongo

[17:26:15] <kali> jardineworks: mongo 127.0.0.1/xperscore --eval "db.links.count()" this works and gives you the collection size ?

[17:26:23] <jardineworks> kali, ok, mongod is running, and I can see statements indicating connections are accepted, but the next line shows just an end connection

[17:27:27] <jardineworks> that says the count is 0

[17:27:39] <jardineworks> but when I run ./mongo

[17:27:48] <jardineworks> and do a find() on the collection, I get results.

[17:27:59] <kali> jardineworks: show me (not here) your mongo session

[17:28:41] <jardineworks> kali

[17:28:57] <jardineworks> kali, I think I am getting my lines crossed here. one sec.

[17:29:25] <kali> mrpro: i don't understand. if it can stay primary, it just works and the client do not have to worry about it

[17:29:40] <mrpro> oh right

[17:29:48] <mrpro> it wont try to sync up until soemthign reconnec

[17:29:51] <mrpro> reconnects?

[17:29:52] <kali> mrpro: the only think that can happen is a journal overflow, but even that will not "break" the client application

[17:29:56] <mrpro> and when it reconnects, that'll be done in bulk?

[17:30:20] <mrpro> i am just wondering when i do SafeMode.W

[17:30:22] <kali> when the secondary comes back, it will fetch the journal

[17:30:26] <kali> and replay the writes

[17:30:34] <mrpro> to make sure it goes to all replicas when i write

[17:30:43] <kali> well, it's not the journal, it's the replication log

[17:30:56] <mrpro> so replicalog can overfill?

[17:31:08] <mrpro> i dont want this thing to cascade and bring down the primary

[17:31:16] <jardineworks> kali, I am new to this so please bare with me. If I start mongod wieth a specific dbpath, then start the mongo shell, anything i run in the shell will happen against the runnig mongod instance right?

[17:31:36] <kali> mrpro: ha ! use "majority" in your writeconcern spec

[17:31:46] <mrpro> yea umm

[17:31:58] <mrpro> hm

[17:32:11] <kali> jardineworks: yes, unless you specify an alternative host and port

[17:34:08] <jardineworks> ok I have not... just mongod --dbpath data/db ... which means that it should be running on 127.0.0.1:27017

[17:34:14] <jardineworks> but my collections aren't showing

[17:34:25] <jardineworks> let me try importing them again.

[17:36:24] <Aartsie> when i do db.stats the datasize is in bytes ?

[17:37:11] <kali> Aartsie: http://docs.mongodb.org/manual/reference/database-statistics/#fields yes

[17:40:18] <jardineworks> kali, I guess murphy's law prevails... that and I obviously screwed soemthing up :)

[17:40:48] <jardineworks> kali, works now... thanks for the ear.

[17:45:39] <mrpro> is there a mongostat -> graphite daemon?

[18:40:50] <TheFuzzball> Does anyone have experience with mongoose? I want to know how best to approach creating a singleton collection?

[19:46:33] <mrpro> is there a mongostat -> graphite daemon?

[19:59:56] <hillct> Good afternoon all. I've run into a geospatial 2d index use error where it reports the index isn't available but it is, according to getIndexes() as seen here: http://pastebin.com/REQmDyiX Can anyone provide some guidance as to how to address this? Is it possible to use additional conditions in a geospatial search?

[20:09:46] <hillct> I've updated my code to eliminate the _id $lt query condition and still, it throws an error about the 2d index http://pastebin.com/hAcJKy5X Now even more confused

[20:14:14] <the-gibson> Hello everyone

[20:15:10] <the-gibson> quick question, am I understanding the doc correctly that --keyfile is only for internal use betweem shards? as in it cannot be used to connect from a mongo client to a mongod host

[20:15:45] <the-gibson> s/shards/replica sets and shards/

Log file Viewer

Help | Karma | Search:

#mongodb logs for Saturday the 20th of October, 2012