We’ll be performing some minor maintenance to the site at 00:00 on 04/04/09. Expected outage is approximately an hour and a half.
We made some minor modifications to the live site today to fix some categorisation issues that people had reported. Pages may occasionally have been slow to load while the new releases were rolled out.
Category: Status and Updates
We are currently seeing occasional image upload issues. We believe these are due to problems at our upstream image hosting provider. We’re contacting our media hosting provider and will continue to monitor the situation.
Category: Status and Updates
At approx 16:40 one of the servers in our cluster stopped serving webpages. The webserver was restarted and the service was made available again. The outage for customers using this server lasted approximately 15 minutes.
We seem to be experiencing recurring issues with this server, so we’ll investigate the cause next week if we experience another outage.
Category: Status and Updates
We just experienced a webserver hang on one of the servers in the cluster, which had to be manually restarted. The automated service monitor failed to restart the affected server. Users connecting to this server would not have been able to load Folksy. Outage time was about 20 minutes.
Category: Status and Updates
We just experienced a temporary loss of connectivity to our server cluster. Connectivity has been restored and the service is back to normal. Total service downtime was about 40 minutes from about 16:45.
I’ve opened a support ticket with our hosting provider to determine the cause of the connectivity loss, which was apparently due to a firewall failure.
Category: Status and Updates
The next scheduled maintenance period will be 21st March at 00:00. Expected outage is an hour and we’ll be aiming to roll out international shipping, providing testing goes to plan.
*Update* : Scheduled maintenance was completed successfully, although it took longer than planned. Total service downtime was two and a half hours. International shipping is now live.
*Update2* : It looks like one of the servers in the cluster didn’t restart properly after the new code was deployed. Any customers hitting this server would have been served an error page. Unfortunately, the service monitoring was disabled during the new rollout, which is why this wasn’t detected sooner. Outage for this server was from around 2:30am until 11am.
Category: Status and Updates
We’ll be taking the service down at 00:00 on the 17th of March for approximately an hour to roll out some minor changes and, all things going well, international shipping.
*Update:* The maintenance was completed successfully and required no outage. We decided against rolling out international shipping, due to some unfinished testing requirements. We’re aiming to roll out international shipping as part of the next scheduled maintenance period.
Category: Status and Updates
One of the servers in our cluster just stopped serving webpages. Customers accessing this server would have been unable to access the site (blog, forum, and the main site): the webserver refused to serve any pages, so they would have seen no page returned at all.
Our support team couldn’t find any obvious issue and restarted the service. Total outage of service was about 30 mins and only affected customers accessing that particular server; customers accessing another server were unaffected.
We suspect this issue is related to moving across to the new platform, so we’ll try and collate more information about the cause of failure if we see it happen again.
*UPDATE* : We’ve added some extra service monitoring so that the webserver will automatically restart if this condition persists for more than a few minutes.
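For anyone curious, the idea behind this kind of monitoring can be sketched roughly as below. This is an illustration only, not the monitoring we actually run: the `RestartWatchdog` class, the `check` and `restart` callables and the 180-second grace period are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, Optional


class RestartWatchdog:
    """Restart a service once its health check has failed continuously
    for at least `grace_seconds` (hypothetical default: 3 minutes)."""

    def __init__(
        self,
        check: Callable[[], bool],      # returns True when the webserver responds
        restart: Callable[[], None],    # performs the actual restart
        grace_seconds: float = 180.0,   # assumed value for "a few minutes"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.check = check
        self.restart = restart
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.failing_since: Optional[float] = None

    def poll(self) -> bool:
        """Run one health check; return True if a restart was triggered."""
        if self.check():
            # Healthy again, so forget any earlier failure.
            self.failing_since = None
            return False

        now = self.clock()
        if self.failing_since is None:
            # First failed check: start timing, but do not restart yet.
            self.failing_since = now
            return False

        if now - self.failing_since >= self.grace_seconds:
            # The failure has persisted for the whole grace period.
            self.restart()
            self.failing_since = None
            return True
        return False
```

Calling `poll()` on a fixed schedule (say, once a minute) means a brief blip is ignored, while a hang that persists past the grace period triggers a restart without anyone having to step in.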
Category: Status and Updates
From around midday users may have experienced problems with certain pages on the site, receiving the ‘broken sofa’ error page. During routine maintenance work, the search index was corrupted, which caused these error pages and led to issues such as sellers being unable to create shops or items and buyers being unable to check out. The error was spotted a couple of hours later (via a Twitter report) and was rectified.
We’ve added further monitoring of the site (site search to be specific) which should allow the support team to respond to these kinds of issues much more quickly in the future.
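As a rough illustration of what a site search check can look like (a sketch under assumptions, not our actual monitoring), the function below runs a known "canary" query and reports the search as unhealthy if it raises an error or returns too few results. The `search` callable, the canary query and the result threshold are all hypothetical.

```python
from typing import Callable, Sequence


def search_is_healthy(
    search: Callable[[str], Sequence[object]],  # hypothetical: runs a site search
    canary_query: str = "card",                 # assumed query that should always match
    min_results: int = 1,                       # assumed minimum number of hits
) -> bool:
    """Return True if the canary query succeeds and returns enough results."""
    try:
        results = search(canary_query)
    except Exception:
        # Any error from the search backend counts as unhealthy.
        return False
    return len(results) >= min_results
```

A corrupted or empty index would typically show up here as either an exception or an empty result list, so a check like this could alert the support team before customers report the problem.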
We will be regenerating the search index at midnight tonight, which will require the site to be unavailable for approximately two hours.
*UPDATE:* This index regeneration was completed successfully. The service was unavailable for approximately 1.5 hours.
Category: Status and Updates
