Backward compatible
Stuck SSH session

When logging in from my laptop to remote SSH servers I had a strange problem. Whenever a big chunk of text needed to be returned, my SSH session would get stuck and completely stop working. It would not disconnect, but just sit there doing nothing, and I would have to log in again. By “big chunk” I mean something like 20+ lines: the output of “ps ax”, for example.

This made me so mad, because I would work on the server for a few minutes, making sure to pipe every command through “head” or “tail” to reduce the output, and then I would forget that some command might output more. Using “vi” or “mcedit”, for example, was completely impossible.

My Internet connection goes through PPPoE. Websites work fine, HTTP works really well, but SSH… no go. The server on the other side is behind a firewall, so tunneling and port forwarding are involved.

I searched around and found that TCP/IP packet size might be the problem, so I tried different MTU values for my PPPoE connection, but without much luck. I was able to get a little bit more output before it would get stuck again.

And then I landed on this Debian bug report from 2005:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=296811

Apparently still valid. It looks like it only relates to some D-Link routers, although I have no clue what’s at the other end where the server is connected. The solution is to reduce MTU server-side. Luckily, I can still run a one-liner command, and so I did:

/sbin/ifconfig eth0 mtu 1000

Everything runs fine now. I just wonder whether this will decrease the server’s throughput on the LAN where it runs.
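
If you would rather measure than guess a value like 1000, one option is to send pings with the “don’t fragment” flag set and see which sizes get through. This is only a rough sketch: it assumes the Linux iputils ping (the -M do option) and IPv4, the function names and the host name are made up for illustration, and a firewall that drops ICMP will make every size look like a failure.

```shell
#!/bin/sh
# Convert an MTU into the matching ICMP payload size for IPv4:
# 20 bytes of IP header + 8 bytes of ICMP header = 28 bytes of overhead.
mtu_to_payload() {
    echo $(( $1 - 28 ))
}

# Try a few candidate MTUs against a host, largest first, and print the
# first one whose ping gets through with fragmentation forbidden.
# Assumes iputils ping: -M do sets "don't fragment", -W is the timeout in seconds.
probe_mtu() {
    host=$1
    for mtu in 1500 1492 1400 1300 1200 1100 1000; do
        if ping -c 1 -W 2 -M do -s "$(mtu_to_payload "$mtu")" "$host" >/dev/null 2>&1; then
            echo "$mtu"
            return 0
        fi
    done
    return 1
}

# Example (server.example.com is a placeholder):
# probe_mtu server.example.com
```

The candidate list is arbitrary; 1492 is there because it is the usual MTU on PPPoE links.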

Reducing dentry (slab) usage on machines with a lot of RAM

Recently I switched my main website from a 2-core AMD machine with 4GB of RAM to an 8-core Intel i7 with 16GB of RAM. I also switched from CentOS 5 to CentOS 6. I set up everything the same, but suddenly the system was using much more RAM than before. And I’m not talking about filesystem cache here. I thought that increasing RAM would only increase filesystem cache, but something else was occupying RAM like crazy. Looking at the output of “free”, “top” and “ps” I simply could not determine what was eating the RAM, because the running processes were fine.

So, I googled a little bit, and found that the problem was in the dentry cache used by the Linux kernel. You can see the kernel memory usage with the “slabtop” command, and my dentry was crazy, something like 5GB and growing. Googling even more, I found horror stories about servers going down, the OOM killer taking out vital processes like Apache or MySQL, etc. So I wanted to stop this.

A quick fix is to clear the cache manually. Writing 2 to drop_caches (as root) frees reclaimable slab objects such as dentries and inodes. Some people even “solved” this problem by adding the command to a cron job.

echo 2 > /proc/sys/vm/drop_caches

On the MRTG screenshot you can see the dentry cache size in megabytes marked as a blue line. 4000 means 4GB of cache. I have 16GB, remember. When you run the drop_caches command above, you get the effect marked by the red arrow.
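
A number like the one on the graph can be pulled out of /proc/slabinfo. This is just a sketch of one way to do it, not necessarily how my MRTG is fed: it assumes the slabinfo 2.x column layout (name, active objects, total objects, object size in bytes), and the function name dentry_mb is made up. It multiplies total objects by object size, so it slightly underestimates the real slab usage.

```shell
#!/bin/sh
# Read slabinfo-formatted text on stdin and print the approximate size of
# the dentry cache in whole megabytes.
# Column 3 is the total number of objects, column 4 the object size in bytes.
dentry_mb() {
    awk '$1 == "dentry" { printf "%d\n", ($3 * $4) / (1024 * 1024) }'
}

# Example (reading /proc/slabinfo usually requires root):
# dentry_mb < /proc/slabinfo
```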

I did not like the approach of adding this to crontab, so I investigated further, asked on mailing lists, and learned that Linus himself says that “unused memory is dead memory” and that’s why the kernel is hungry. Still, I decided to reduce the hunger and added this to /etc/sysctl.conf:

vm.vfs_cache_pressure=10000

That did slow it down, but it was still growing. You can run sysctl -p to apply changes to the running kernel without restarting. Next I added these as well:

vm.overcommit_ratio=2
vm.dirty_background_ratio=5
vm.dirty_ratio=20

However, it was still growing, and I decided to leave it be and see what happens. Would my server crash, become unavailable, or something? 24 hours later, dentry was again going up like crazy and suddenly it dropped. By itself. See the blue arrow in the screenshot. It seems the kernel figured out that RAM was going to be exhausted, that the filesystem cache would have to be reduced, and so on. After this point, everything went back to normal.

I tried this experiment again, about a week later, with the same results: a steep rise, a drop, and things going back to normal. So, if you’re worried your dentry cache is growing like crazy, don’t. Just tweak those settings in sysctl and wait for at least 48 hours before drawing any conclusions.

Disabling alerts stops JavaScript execution in #Firefox

Today I learned about an interesting issue with newer versions of Firefox (I use FF7). It has a nice web developer-friendly feature to disable alerts. This is really useful when you place alert() by mistake in some loop and you can’t get out because as soon as you click OK, you get another one.

New Firefox has a checkbox to disable future alerts. And this is great. So, what’s the problem? Once you disable alerts, any JavaScript code that tries to display one does not keep running, but rather throws an exception. This does not look like correct behavior to me.

Imagine a web application that alerts the user about something and then keeps running to finish the job. If the user disabled alerts because he was in a hurry and clicked quickly through different message boxes, the script would not keep going but stop. And there is no way to revert that short of reloading the page (yikes!).

I found a workaround: I created a function called tryalert that wraps the alert in a try..catch block. It looks like this:

function tryalert(message) 
{
    try { alert(message); } catch(e) {}
}

This is a fine workaround. Now instead of alert() I call tryalert() and although the alert is not displayed anymore, the code keeps going as if the user had been alerted.

The problem is introducing tryalert to ALL the applications I’ve written so far. It’s impossible. I hope the Firefox team changes this.

Unfortunately, Quicken Home Inventory does not work on Windows 7, and you might have a hard time switching to another program because QHI does not have an option to export the data.

However, there’s a way to work around this. A program called Attic Manager can import the data directly from the Quicken database, even if you don’t have Quicken installed. It even works on 64-bit Windows. You just need to have your QHI.MDF database backup file.

Once the data is in Attic Manager, you can export it to CSV format, which can be imported into Excel, OpenOffice and almost all the other Home Inventory software. Or maybe, once you try it, you will stick to using Attic Manager.

How to use Quicken Home Inventory on Windows 7 [SOLVED]

If you are looking for a way to use all the data you have already entered on a Windows 7 box, you came to the right place. Although the short answer is: “you really can’t do that with QHI”, there is an easy solution to this problem…

There is a nice inexpensive replacement called Attic Manager, which is able to load data from QHI even on a Windows 7 computer without a Quicken installation.

It can load locations, categories, items and images (photos) of items.

Most importantly, it runs on all modern operating systems including Windows 7 and various Linux distributions.

If you don’t have access to your old copy of QHI or Quicken Classic, it does not really matter, because Attic Manager can load the data directly from QHI database.

P.S. If you decide to buy it, use the coupon code CNVRT4 to get a 40% discount.

Building wxWidgets 2.8.12 on old MinGW with GCC 3.2

I had an application using wxWidgets 2.8.0 and then 2.8.8 in production. There were some bugs in earlier wxWidgets versions on Linux, so printing was not working properly. I decided to upgrade wx and that fixed it. Now I wanted to use the same version for the Windows version of my application. I originally used some (now old) MinGW version and just wanted to rebuild and be done. But I got build errors instead. I don’t really remember the last time wxWidgets failed to build, so I asked on the mailing list and finally dug into the source code myself.

It looks like the wx code is all fine, but there are problems in the MinGW headers. In particular, you need to edit the file C:\MinGW\include\winspool.h and change the DocumentPropertiesW function’s signature from:

LONG WINAPI DocumentPropertiesW(HWND,HANDLE,LPWSTR,PDEVMODEA,PDEVMODEA,DWORD);

to:

LONG WINAPI DocumentPropertiesW(HWND,HANDLE,LPWSTR,PDEVMODEW,PDEVMODEW,DWORD);

It seems to be already fixed in newer MinGW versions.

nginx hogs cpu when proxying large files

I have a server where nginx is used as frontend for Apache. nginx serves static content and Apache serves PHP pages. This is a common setup.

Today I migrated stuff to a new server and needed to copy a 7GB database file to another server. I figured HTTP would be the fastest way to do it. Unfortunately, the DNS change had already gone through, so I could not serve the file on the static domain nginx was configured for.

I thought “nevermind”, placed the file under one of the domains that nginx proxies to Apache and started the download. It was going fine at 11MB/s for some time. However, soon it started to crawl at 850KB/s. I suspected network problems, but everything else was running fine. I looked at the process list and whoa, nginx was using 99% of CPU. Because of this single download, the server was brought to its knees and no other client could even get a simple “Hello World” page.

I stopped the download at the client side and nginx soon recovered (no restart needed). Then I edited /etc/hosts, put in the old IP address of the static domain and continued the download (wget -c). It finished a few minutes later with an 11MB/s average.

Merging a huge git conflict

My colleague and I both worked separately on the same git tree while being offline for a couple of days. Result: following “git pull” I got a huge conflict spanning about 100 files.

This meant that manual resolution was out of the question. Enter “git mergetool” and “kdiff3”. I installed kdiff3 from linuxpackages.net (the version is for old Slackware 11.0, and I had to symlink /opt/kde/kdiff3 to /usr/bin/kdiff3 so that git finds it).

git mergetool calls kdiff3 for each file; you merge and save. Job done very quickly.
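
As an alternative to the symlink, git can be told which tool to use and where it lives. A small sketch, assuming the standard merge.tool and mergetool.kdiff3.path settings; the wrapper function use_kdiff3 is a made-up name, and the path in the example is simply the one from my Slackware install.

```shell
#!/bin/sh
# Make kdiff3 the default merge tool for the current user and record the
# location of its binary, so it does not have to be on the PATH.
use_kdiff3() {
    kdiff3_path=$1
    git config --global merge.tool kdiff3
    git config --global mergetool.kdiff3.path "$kdiff3_path"
}

# Example:
# use_kdiff3 /opt/kde/kdiff3
# git mergetool    # walks through the conflicted files one by one
```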

Google Apps problems

Seems to be a fine day at Google today; perhaps the engineers are pulling their hair out.

This morning, I was looking at a spreadsheet in Google Docs and suddenly some 20 values simply vanished right before my eyes. I wasn’t even working anywhere near that part of the sheet. I was inserting new values at the bottom and somewhere in the top-right corner the values were gone. I tried undo and scrolling around (big sheet), and only when I switched to another sheet and came back did the values show up again. Phew. From now on I’m doing an export and download to my computer every time I finish editing.

A few hours later, a new issue. Looking at a spreadsheet I selected Save from the menu. It said that it’s ok. I made some changes, clicked Save, got no error but the screen read “Last saved 2 minutes ago”. Ok, maybe it’s just a minor glitch. 15 minutes later I tried to save again. Once again, no errors, but it still said “saved 17 minutes ago”. At this point I was confused whether saving was not possible or the message was simply wrong. I exported the document to xls format, checked it in OpenOffice and then closed the browser tab.

Three strikes and the blog post is out. I just had another issue, now with GMail, so I guess it’s time to make all this public. I wrote an e-mail message and it said “Your connection to GMail has expired. Please log in again.” Ok, it’s not like I haven’t seen that one before, but it’s been almost a month. I thought they had it fixed. I logged out, logged back in and… it still does not allow me to send an e-mail. I can read messages fine, but as soon as I try to send, I get a warning that “Your connection to GMail has expired. Please log in again.”

Oh well, I guess we get as much as we paid for ;)

Stackoverflow.com scaling problem

I have been a stackoverflow user almost from the very start of the website. I recall reading some of Jeff Atwood’s blog posts and thinking how naive he is. He has a classic case of Microsoft fanboy-ism. He swears by .net and MSSQL server and spits on Linux, PHP and… well… the entire LAMP stack.

When stack became popular the website started to get a lot of traffic and Jeff was all like “Oh we don’t need all the scaling technology that all the web companies have developed since web 1.0 till today. We’re smarter, we use the all-powerful Microsoft stack, we’ll just buy more RAM, more CPU and keep it all on single machine. Machines are so powerful these days and cost almost nothing”. How little did he know.

As more and more people use the website, it seems that they have reached the limit of what is possible. The stack website has now been inaccessible for days. By inaccessible, I don’t mean that the site does not open. It just opens waaaay too slowly to be usable. I sometimes wait 5 minutes to get the home page.

What I really regret is all those dumb readers on Jeff’s codinghorror blog, and all those fanboys on stack website. Some people tried to tell Jeff that this would happen, but he would not listen. He was very arrogant and dismissed all that as LAMP-crap. All his followers blindly followed his thoughts as if they really wanted that to be true. Psychology of a herd, I’d say.

Oh well, too bad that public access to such a valuable resource is now limited because of stubborn owners. Maybe it’s time for a real competitor to step up, with a simple slogan: “just like stackoverflow, except that it really works”.