Archive for the 'Operations' Category

[RESOLVED] Multiple Known Issues

Friday, March 21st, 2008 by: Joppa Linden

[RESOLVED Mar 21 2008 5:09 PDT] - The issues should all be resolved now. Please contact support if you experience any related problems.

[UPDATED Mar 21 2008 3:03 PDT] - Our teams are continuing to address these issues; however, we have no new information to provide at this time. Please continue to watch this blog for further updates.

We are aware of multiple issues affecting the whole of Second Life and are working to resolve them as quickly as possible.

Please refrain from uploading textures or making any transactions for the time being. You may notice problems logging into Second Life, grey textures or avatars, an inability to teleport, and failing transactions, among other things.

We will update this blog as soon as we have more information.

[Update 12:26 PM on 3/05/08] Scheduled Maintenance is complete! Thank you for your patience.

Second Life will undergo scheduled maintenance for Asset Server work from 7:00 AM to 11:00 AM PST on 3/05/08. We do not expect to take Second Life down during this time, and there should be no effects visible to Residents. Because of the nature of this work, there may be some downtime if something should go wrong. We will post here if any problems arise during this scheduled maintenance.

Linden Lab Production Operations has an open position for an Australia-based Production Operations Engineer. The Production Operations team is responsible for ensuring that the Second Life grid, the world’s largest collaborative real-time development environment, is up and running.

Linden Lab Operations is a Debian Linux shop. We rely extensively on OSS, and our in-house systems are usually written in Python or PHP. Our team is made up of folks who have been involved in large-scale grid management and site operations for years.

We’re looking for people who can rapidly pinpoint and diagnose network failures, deployment issues, and performance bottlenecks, and who can also create tools that improve grid stability. Production Operations works extensively with the Concierge, Governance, I-world, and Development teams to triage and respond to grid problems; therefore, the ability to communicate effectively with techies and non-techies is critical. The successful candidate will have substantial *nix experience and script-fu, familiarity with managing large system installations, and no fear of complex, dynamic systems.

If this sounds like you, please click here.

Don’t you want to save the world?

[UPDATE 6:00AM PST 2008-03-01] The final batch of regions is now undergoing IP address renumbering. This should last only a couple of hours, after which there should be no more noticeable disruptions to residents as the final loose ends are tied up behind the scenes. Thank you all for your patience and understanding through this long process. Thanks to the Lindens who did the renumbering. Prospero, Neuro… muah! We love you! Who am I forgetting? Thanks to… *Oscar music swells and Chiyo is pushed offstage* -Chiyo

[UPDATE 8:00AM PST 2008-02-26] The next batch of regions will be undergoing IP address renumbering and is commencing now. Thank you for your patience while we work to improve the Second Life experience.

[UPDATE 12:29AM PST 2008-02-26] Another batch of regions has been successfully renumbered. More regions will be renumbered in due course. Thanks for supporting us with this change. We’ll try to make it as painless as we can :) - Matthew

[UPDATE 11:05PM 2008-02-25] IP address renumbering for the next batch of regions is starting now. Downtime for regions in this batch should be limited to a normal restart. Watch this post for further updates as events warrant. - Twilight

[UPDATE 9:38PM 2008-02-24] The IP address renumbering for today’s batch of regions has been completed. Please refer back to this post for information on the next batch of regions. - Twilight

[UPDATE 08:19AM 2008-02-24] So far things are running smoothly. Since this process will be ongoing for most of the day, we will post the next update when we’re finished for today. –Lotte

[UPDATE 06:45AM 2008-02-24] We are now tackling the next batch of regions, most of which should not take more than a few minutes to come back up. In some cases, it might take up to 30 minutes for the region to be picked up by a new host. If you see a longer downtime than that, please contact our support team to let us know. –Lotte

[UPDATE 4:02PM 2008-02-23] The first batch was particularly troublesome, but should be starting to come back online as we speak. Subsequent batches should go much more smoothly. -Chiyo

[UPDATE 3:05PM 2008-02-23] Please note that this process is different from a normal server update, where the region usually comes back online in a few minutes. This process is much more involved and can take perhaps as long as 30 minutes (possibly longer for the first few groups), but we are looking into ways to reduce this time as much as possible for future batches. -Chiyo

Starting on Thursday, February 21st (revised from Wednesday, February 20th), we will begin moving about a third of the simulator nodes onto a new IP subnet. We will be restarting affected regions in small batches over the next two weeks as part of this process. By spreading the restarts out, we hope to minimize the impact of this change. Prior to a region restarting, you will receive an inworld notification similar to the one sent for a rolling restart. Due to the way regions map to simulator nodes, we will not be able to provide a schedule of when each region will be restarted.


Second Life Scheduled Maintenance, 2/29/08

Thursday, February 28th, 2008 by: rheyalinden

Second Life will undergo scheduled maintenance for Asset Server work from 7:00 AM to 11:00 AM PST on 2/29/08. We do not expect to take Second Life down during this time, and there should be no effects visible to Residents. Because of the nature of this work, there may be some downtime if something should go wrong. We will post here if any problems arise during this scheduled maintenance. Thank you for your patience.

[25 Feb 08, 1:14 pm PST]

Operations is currently pushing a fix to all regions which will repair summary statistics for classified ad clickthroughs.

This patch will not require simulator restarts.

[EDIT] This is a fix to VWR-3200.

Second Life scheduled maintenance for 2/15/08 is complete. Thank you for your patience.

Intermittent Issues with In World Voice

Tuesday, February 12th, 2008 by: Teeple Linden

[ 12 Feb 08, 3:00 p.m. PST --teeple]

The Operations Team is tracing a leak in the voice chat system.

Over time, backbone services for various regions can become bloated and cause failures in group voice chats. Operations is activating debugging code to better assess the bug. Until there’s a fix, backbone segments will have to be rebooted periodically, which will cause about 4-5 minutes of presence issues for avatars who happen to be associated with one another through those segments. Mainly, residents may find themselves temporarily unable to see their friends’ online status correctly.

We apologize for the inconvenience. We have multiple developers working to resolve this issue.

On January 15th we started noticing a small number of Regions that lost many or all of their objects after the Region crashed. This required us to roll back those Regions in order to recover their assets. The number affected ranged from about 6 to 15 Regions per day. This coincided with a spike in Resident support tickets regarding large numbers of objects vanishing from their Regions. After an intensive effort we tracked the problem down to a networking hardware fault at one of our service providers, on the Internet link between our data centers. The problem was reported to them and promptly fixed on Saturday, February 2nd. After heavy testing we have confirmed that the problem has been fixed.

[Update 12:40 AM PST] RESOLVED. All regions have now been restarted. - Matthew