Rolling Restart Tue-Thu 07/15-17 to deploy server version 1.23.1
Tuesday, July 15th, 2008 at 7:44 AM by: Prospero Linden
Update 2008-07-16 05:40am : We have reverted the ~1000 hosts on 1.23.1 to 1.22.4.
Update 2008-07-15 05:28pm : An issue with names showing up as “(???) (???)” in estate ban lists is showing up on the regions which have been updated to 1.23.1. We tentatively plan to revert those regions back to 1.22 by tomorrow morning, and will probably slip the 1.23 roll-out by another day. We will also be analyzing server crash data from this pilot roll to look for other issues not previously identified, before making a firm decision. – Joshua Linden
Update 2008-07-15 02:22pm : The pilot roll to 1174 regions is complete. However, because of an error Prospero made when starting the roll, there are about 300 regions that will remain down for another 10-20 minutes. For this, he apologizes.
We have identified and fixed the memory leak that was in server version 1.23.0. As such, we will be rolling out server version 1.23.1 to Second Life this week. This includes all of the fixes from 1.23.0 (see the 1.23.0 blog post for a full list of changes), as well as a fix for the object text newline bug (SVC-2633) and the fix for the memory leak.
The server will be rolled out according to the following schedule:
- Tuesday, sometime during the day : a pilot roll to 1000 regions. We are going to do a larger-than-usual pilot roll to have a large enough sample to verify that there are no other memory leaks beyond the one we’ve discovered and fixed.
- Wednesday morning, 5AM-10AM : we will deploy server version 1.23.1 to half of Second Life.
- Thursday morning, 5AM-10AM : we will deploy server version 1.23.1 to the rest of Second Life.
As usual with rolling restarts, this is a change on the server side; there will be no required client updates associated with this rolling restart. Regions will receive warnings starting five minutes before they are restarted. There is no way to delay the restart of a given region. Regions should restart within 10 minutes of going down. If your region stays down for more than 20 or 30 minuets, please contact support.


July 15th, 2008 at 8:17 AM
/me crosses her fingers. Good luck, Prospero!
Is it odd-numbered hosts first? (is it always odds first?)
Does this push the MONO plan back any?
July 15th, 2008 at 8:40 AM
Hopes for the best and that the world does not collapse
I guess mobit and Turbo will be, as usual, in the last wave of the deploy.
Looking forward to a more stable version.
July 15th, 2008 at 8:44 AM
The hosts for Tuesday will be randomly chosen.
I’m not going to tell you odd vs. even for Wednesday, but I will tell you why I’m not telling you. The reason is, it would give the false impression that I’ve given you useful information. Regions are *not* locked to hosts, except during rolling restarts. Each time a region restarts, it starts up on the next available host, which could be any host of the same class.
Once the rolling restart begins, I will announce whether I’m doing the odd or even hosts, because regions *are* locked to restart on the same host during a rolling restart.
Re: MONO : the default plan is that, if all goes well, Mono will go out the second week after the 1.23 deploy. That would mean Server version 1.24 (including Mono) would be deployed the week of July 28. This will slip if more serious bugs are discovered in 1.23 (requiring either a delay of the 1.23 roll or a patch roll), or if it takes longer than expected to get Mono (and some other patches) merged into the server code and stabilized.
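The host assignment described in the reply above can be sketched roughly as follows. This is only an illustration in Python, not Linden Lab's code: the integer host IDs and the names `split_by_parity` and `pick_host` are hypothetical, and "next available host" is modeled as simply the first entry in a list of free hosts of the same class.

```python
def split_by_parity(host_ids):
    """Split (hypothetical) integer host IDs into odd and even waves."""
    odd = [h for h in host_ids if h % 2 == 1]
    even = [h for h in host_ids if h % 2 == 0]
    return odd, even


def pick_host(previous_host, free_hosts, rolling_restart):
    """Choose the host a region comes back up on.

    During a rolling restart the region is locked to the host it was on.
    Otherwise it takes the next available host, modeled here as the
    first entry in free_hosts (an assumption made for illustration).
    """
    if rolling_restart:
        return previous_host
    return free_hosts[0]
```

With these assumptions, a region that was on host 7 stays on host 7 during a rolling restart, while an ordinary restart can land it on any free host. That is why knowing odd versus even ahead of time says little about which regions end up in which wave.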
July 15th, 2008 at 8:47 AM
A couple of thoughts:
1) there is a huge event going on this week in SL called Relay For Life. A heck of a time to be doing rolling restarts with all the events planned, and have been in the planning stage for months.
2) Maybe for the initial “testing” rollout you could have people sign up sims for those, just like you did for the havok 4 testing. This list would always be used for the first phase of the rollout and maybe could be tested for a week before full rollout, this would give you a better test bed. then you can rollout to the rest of the grid the following tuesday and wednesday. I would be one to sign up a couple of sims for this, both full and openspace.
July 15th, 2008 at 8:50 AM
Yuk. no solid info. No solid releases . double restarts last week on regions . Weekly restarts now. And even a gamble on whether or not this release will work .
lol Tuesday, sometime during the day : a pilot roll to 1000.
Why no time? not ready yet? going to just throw it to the dogs and see?
July 15th, 2008 at 8:52 AM
Good luck with the roll Prospero! I’m really looking forward to the fixes in 1.23.1 especially the Group IM bug fix! Keep up the good work!
July 15th, 2008 at 8:54 AM
Spritely, re: (2)… the plan is to develop a list of regions designated for the initial rollout to use in the future, including those who want to opt in to it. That still won’t be for a full week though. However, we are putting major upgrades on the Preview Grid a week before it gets deployed to Second Life, and are encouraging people to test it. (The reason this week’s deploy wasn’t on the Preview Grid for a full week is that it contains just two very small, scoped changes to the 1.23 code which has now been on the Preview Grid for nearly two weeks.)
July 15th, 2008 at 9:07 AM
I’ll sit here quietly and wait for the MONO roll out to go horribly wrong and the subsequent rollback to 1.23
July 15th, 2008 at 9:09 AM
number 3 no gambling allow !!!!!
July 15th, 2008 at 9:09 AM
Hope it goes well for your team Prospero. Nobody likes a rollback especially the people that have been busting hard to get it done in the first place.
July 15th, 2008 at 9:12 AM
Ok. Thanks for the info!
I do understand about sims coming up on different hosts but, since H4, we don’t get sim restarts very often any more!
I still think it’s sorta-useful to know odd/even ahead of time but that’s mostly because I’ve been drooling over SVC-2485 ever since Simon said he had a fix about a month ago.. :\ No worries if you think it’ll do more harm than good to let us know before the roll starts - I can wait a little more.
July 15th, 2008 at 9:16 AM
I have high hopes for these next few releases… Mono will be a huge win. I never thought I’d say it, but I AM seeing better stability on the asset servers too! I’m amazed!
I am seeing an increased level of region handoff problems, and teleports seem to be a lot slower the past week and are failing a little more frequently. Not sure if this is a networking issue or what…
July 15th, 2008 at 9:42 AM
greetings, sorry off topic post - only sl web page i can get to. is SL down can’t login to world, sl homepage, support, grid status. is anyone else having these probs (past 12 hours)?
best of luck with ver 1.23
thanks
July 15th, 2008 at 9:47 AM
I just wanted to say that the SL environment has been more stable than I have ever known. I am sure a lot has to do with the latest Release Candidate. It almost refuses to be broken. Somehow, someone, somewhere in LL is getting it right. My thanks.
July 15th, 2008 at 10:02 AM
SL has been stable, I’ve only had a few drops from region crossing in the past few weeks, and only a few cases where sculpt textures haven’t loaded until I’ve selected an object. But I’ve been seeing longer “grey times” where I’m waiting for objects, textures, and avatars to rez in after teleporting than I’ve seen for a long time.
I can certainly live with that, if that’s the cost of a more stable SL!
However, if you implemented SVC-2413 then the “grey time” would be less of a problem, because things closer to the avatar would rez first, so the process would be less noticeable.
July 15th, 2008 at 10:27 AM
Re: things being more stable, it’s not your imagination. We have two central systems which have been the source of most of our problems in the last month. There’s the central database cluster, and the asset server. (They are two different things.) We have modified some code to take load off of the central database server, *and* upgraded it to a beefier machine, so it’s in fairly good shape at the moment. (It’s not perfect, and problems elsewhere– as we’ve had once or twice with some code on other central servers– can still cause too much load to be directed to the central database machine. But it’s better than it had been a couple of months ago.) The asset server was in better shape after we swapped from the one in SFO as primary to the one in Dallas as primary. (This allowed us to get the techs from the solution provider in to do some work on the SFO cluster.)
Hopefully we’ll be able to keep our heads above water.
Re: things staying grey longer, I’ve noticed that too, and have heard it from some others. At this point, though, alas, it’s at the stage of several of us noticing that “it seems to be taking longer for things to rez”. Problems with region transitions could well be related to this. We haven’t yet nailed this down to even know where or what the issue is.
July 15th, 2008 at 10:32 AM
Dagnabbit, I can’t think of a way to test the change in rezzing without a time machine.
Still, improving the initial interest list order would be a win… it’s not like initial rezzing was *ever* fast.
July 15th, 2008 at 10:35 AM
Good Luck Prospero.
“We who are about to die salute you….”
LOL
*runs for the fallout shelter
July 15th, 2008 at 10:41 AM
LOL@18, We have been through worse and it has ALWAYS gotten better given time! Good Luck Prospero and CHEERS!
July 15th, 2008 at 11:18 AM
Yep … stabler … a lot, in spite of over 60K logons, especially on the weekend and while being in a fully packed sim. Only ‘crashed’ ( process termination ) once or twice. Visited and listened to a live gig without hassle or crashing. Encouraging to see and actually experience stability improve .
One oddity … At 11:00 am my time today ( GMT +1:00) i got the message ‘region full’ when requesting friends to be TPd to me. The sim i live upon had to be restarted to fix that. Dunno if that’s related to this roll-out.. and now my objects keep jumping up and forth whilst editing ..extremely annoying .. anywho..
Oh.. and does the memory leak fix mean my FPS will not drop from 18 to 4 after a certain amount of time ? I sure hope so.
Kudos.
July 15th, 2008 at 11:23 AM
Hi Prospero.
Hello everyone.
I am having a good feeling about it… and we will have a better SL in version 1.24 with Mono.
I am using the viewer 1.19 that is loading things faster than the latest Release Candidate and it is really more stable than before.
The unique bug I have is when I TP home that it is over 500m and the floor become phantom. I didn’t find any ticket regarding this, but I am sure it will be fixed soon
GOOD LUCKY
hugs
July 15th, 2008 at 11:25 AM
@16
There’s the central database cluster, and the asset server. (They are two different things.) We have modified some code to take load off of the central database server,
Could this possibly be the reason for longer load times of textures causing more grey time? Just guessing here though.
July 15th, 2008 at 11:26 AM
@20.
If you are still using the standard client 1.19, there is a bad memory leak in that; it has been cured in the RC version, which is amazingly stable now
July 15th, 2008 at 11:33 AM
@20 This is a memory leak for the server thats fixed. So it wont help your computer, it will help the computer the server is running on ^_^
July 15th, 2008 at 11:40 AM
something is not fixed again same as the other day it keeps making me lose my damn internet connection i do not know what you guys have done but its not good again
July 15th, 2008 at 11:44 AM
@24.
There is nothing that LL can do to make you lose your internet connection, I suggest you contact your ISP, or check your modem connections
July 15th, 2008 at 11:45 AM
Sorry to be off topic, but is anyone having problems logging into sl itself and the website?
July 15th, 2008 at 11:48 AM
@26.
Ska, it all seems ok to me, have you changed anything since your last loggin, firewall maybe?
July 15th, 2008 at 11:51 AM
Yes, I am having issues too Ska - Since today, without having changed anything.
July 15th, 2008 at 12:01 PM
@25 oh yes there is i have been on SL a long time and many times i have had my ISP test it while SL is doing upgrades and it will lose connection now please don’t tell me it cannot happen… last week this same issue happened now they have started rolling restarts again and im having disconnects again it is ONLY while on SL and it happens to my PC and my laptop same
July 15th, 2008 at 12:28 PM
@Taff: Hiya, and thanks for your reply. I didnt change anything, checked with the modem, turned off firewall just in case, reinstalled sl… finally to find out my alt logs in, my main just doesnt.
@Sandling: Sorry to hear that Sandling, but I am glad I am not the only one. I just tried logging in with my alt though, and that worked just fine. It’s just my main that refuses to log in. Waaaah!
July 15th, 2008 at 12:30 PM
Disgusted, perhaps SL is your only continuous connection? Most Internet connections are self-healing and you wouldn’t notice a disconnect even when browsing. I only notice it on occasion because the laptop has to toss up a large splash screen when it reconnects. The other machines aren’t so profusive.
I’d love to see an explanation sometime of how the various servers work together. I assume each application server isn’t a single machine, but a cluster for each (if so, is load balancing one of the issues?). Also curious how SL sees the SL servers interconnecting with open sims in the future - how SL residents are going to move beyond SL borders. I know there are no details and perhaps no roadmap yet, but surely someone has some percolating ideas to share…
July 15th, 2008 at 12:31 PM
Ska: Apparently a DNS issue or something. I’m with the Xcess4All provider in The Netherlands. I connected wireless and am able to connect with my laptop through another ISP, so yeah.. perhaps some colocator issue again or a DNS failure (Sadly not very technical here, but I’ve seen this happen before)
July 15th, 2008 at 12:36 PM
Sandling, I am also with Xcess4all in The Netherlands… but whats funny is that my alt connects no problem while my main gets stuck in logging in forever, then moves up to connecting to region and gets stuck there too. I am less technical than you so I have no idea what you mean with colocator :S
July 15th, 2008 at 12:37 PM
err.. ok
@:22 Running Second Life 1.20.13 (91658) RC.
@:23 .. touchez..
July 15th, 2008 at 12:40 PM
I was losing my connection when SL now and then also,finally traced it to my router. Which over a period of 3 or 4 years had become weak but only when both the wife and I were in SL using a lot of bandwidth would it act up. New router problem gone.
July 15th, 2008 at 12:41 PM
Curtis@31: see http://wiki.secondlife.com/wiki/AWG
July 15th, 2008 at 12:47 PM
Ska; Exact same issue! - I can log in with my alts too, though I have not tried to stay logged in with my alts for a longer time yet. What a weird issue.
July 15th, 2008 at 12:52 PM
@37 Sandling, It might possibly be that one avie is on the SF server and the other is on the one in Texas I believe. Its letting you connect to one but not the other. Don’t have a clue how to fix it but has run into that in the past myself.
July 15th, 2008 at 12:54 PM
Sandling, very weird indeed… I’ll keep you posted if anything changes. Good luck!
And sorry to the rest for being off topic, dont hit me *smiles sweetly*
July 15th, 2008 at 1:00 PM
@31 it is definitely related to SL it only happens when i use SL and they have rolling restarts going on or back end issues every dang time never fails same thing last week at the time they were doing restarts canceled them and back tracked had same disconnects and they resolved when sl resolved but when u tell them in live help it always blamed on my ISP and my ISP has checked it many times with me and its SL causing it
July 15th, 2008 at 1:07 PM
@40,
I am afraid SL cannot cause your ISP to disconnect you, BUT , it can show up a poor connection to you ISP.
That is probably what is happening, you need a rock solid connection to run SL, while you can surf the web with a flaky one without problems.
July 15th, 2008 at 1:10 PM
Sadly, I too have had my dsl internet connection broken MANY MANY times -when downloading inventory after cache clearing. Been happening for months. And at least 30 kernel panic full computer crashes in last week. I see big blocks on screen and the gray rezzing. My computer and dsl line check out just fine. Only when in SL does this happen.
July 15th, 2008 at 1:15 PM
i pray that it works i have crashed so much all my bones are broken my poor avi should on life support.
July 15th, 2008 at 1:22 PM
Linden Lab really needs to do this test, where they announce that they worked on something and restarted a service and then watch people go crazy like “now it fails, it worked before!” or “everytime you guys change anything my connection breaks!”. And then they reveal that they didn’t change anything at all …
This is really one of the biggest psychological playground ever created
July 15th, 2008 at 1:27 PM
I’m hoping this new server release will address the #1 problem that I’m experiencing, which is periodic freezing that happens in any client newer than 1.19.0.5 (anything with Windlight). I’ve been trying the 1.20 RC series, and the problem never goes away. I’m running a generation 1 Mac Pro with 10 GB of ram(!) and a new Nvidia 8800GT video card. Have the very latest version of OS X (10.5.4) yet keep experiencing this lockup issue.
Sometimes when I run the newest RC (1.20.13) it will run just fine for hours, with no lockups at all, or sometimes just a couple of lockups early in the session, and then it clears up. Other times I’ll be running it for a while, and then it starts locking up, and gets worse and worse, until I have no choice but to log out because it’s locking up continually (freezes for about ten seconds or so, then unfreezes for maybe a couple of seconds, then locks up again, then unfreezes for a couple of seconds, and just continues like that until I log out.) I’ve tried going away from the computer for a while and then coming back to see if it clears up, but once it gets into that mode it never recovers, and I’m forced to log out.
It is my fond hope that this issue will be fixed soon. I’d really like to be able to run a Windlight based viewer.
July 15th, 2008 at 1:39 PM
I noticed a new behavior since a week or about:
Avatars do not rezz immediately, you first see like a kind of white haze then they start rezzing quite slowly…
In fact, I wonder if its possible to keep this haze in some cases as an avie?
It’s also more common when it’s bots (nothing to do with complaining about bots on sl, I do it in other posts :).
Did anyone else notice this?
July 15th, 2008 at 1:51 PM
@ 46, this is the new Ruth.
July 15th, 2008 at 1:55 PM
@Dekka
Thanks Dekka!
Well, I love it. Since I don’t use “much” of my avie, I tried to be “haze-ruthed” definitely but couldn’t…Is there a way?
July 15th, 2008 at 1:58 PM
“That’s right, you know what time it is, keep on rollin’, baby”
July 15th, 2008 at 2:02 PM
@45
or my isp looses connect.
hey, ive had bad freezes in the past (viewer 1.19.1.4 imac intel c2d 2 Ghz 1GbRam/128MbVram ati x1600 osx 10.4.11)
computer freezes completely no force quit possible - have had to power off, …
however i have set my “viewerpreferences”
Graphics/HardwareOptions/Texture Memory to -> 1/4 in my case 32Mb
no freezes the viewer runs fast and smooth as butter now, i can stay on 24/7 exept when we have a rolling restart
July 15th, 2008 at 2:12 PM
Prospero and the team - Thanks for hammering away at the code to make it more stable. Also, thanks for having the guts to admit when things aren’t right and to revert them. Keep up the good work.
And as there seems to be a “let’s blame LL for random things” theme here:
I notice my car uses more fuel after I’ve been on SL the day before. What are LL doing about that? Does the new simulator code have a fuel leak? I shall be writing to my local politician forthwith
July 15th, 2008 at 2:17 PM
Question: Since you’re so proud of all the big shiny masses of profit you’re making out of we, the residents. Why don’t you send a Happy Happy Linden out with a shopping list that has the words ‘MUCH BETTER SERVERS’ imprinted in bold text upon it? HMMM?
July 15th, 2008 at 2:30 PM
Eliott : just a reminder, this is a **server** release, and will not change anything in the client.
July 15th, 2008 at 2:31 PM
@51.
Its probably from driving fast cars here, your foot tends to drift downwards on the gas pedal, hence uses more fuel, SLOW DOWN :-))
July 15th, 2008 at 2:36 PM
After a week or so I tried SL again… it stays a mess;
unplayable.
Don’t know what is wrong…
- SL’s website doesn’t trigger bandwidth; loading takes minutes.
All other websites load perfectly and fast, except SL’s.
- Inworld crash every other 15 minutes.
- Inworld the bandwidth drops every other minute to 0, making my
avie get stuck or drift.
- Masses of deserted places inworld.
C u all next week or so…
But maybe I really quit now…
July 15th, 2008 at 3:00 PM
Prospero, what was the evil thing you did wrong with the restart?
Just kidding! I appreciate your hard work. (Although I’m still trying to figure out how to measure time in minuets…a minuet is several minutes long…)
*loves funny typos*
July 15th, 2008 at 3:07 PM
@51,54 I think Havok 4 has improved my fuel efficiency, pre Havok 4 my car was driving INTO the ground, using much more fuel than needed, now it seems to sit on top and use far less! My RL car seems to be running much better now thanks to this, I think.
@55 Linda I hope things get better soon for you, are you on a wireless network by any chance? IM me in world and i’ll help you trouble-shoot your problems. SL is worth persisting with
July 15th, 2008 at 3:23 PM
Hey! I have been timing my region for weeks preparing for an event. The script load was consistently around 0.4ms. Now after this restart, the same script load shows as 0.8ms. What was done to slow down scripts by half?
July 15th, 2008 at 3:39 PM
Prospero: You said “Eliott : just a reminder, this is a **server** release, and will not change anything in the client.”
I realize this, but I also think that there is at least a chance that the problems I’m experiencing may have something to do with the server side code. Maybe I’m totally wrong on this, but I can’t help but think that the server code is part of this problem. Regardless, it sure would be nice to see this bug fixed.
July 15th, 2008 at 3:39 PM
Say, you forced my product out of sales four months ago when you implemented havok4; are you going to fix buoyancy soon ?
July 15th, 2008 at 3:49 PM
#60, Yuki, I think buoyancy is as it is now. Almost everyone else redid their scripts. There’s a long thread on this problem in the jira report.
BTW, on the performance issue, bug SVC-2649 filed, I now see time dilation of 0.96-1.0 when before this rolling restart it was a solid 1.0. Wings that used to appear/disappear nearly instantly upon hover/descend now take 3-4 seconds to react.
July 15th, 2008 at 4:05 PM
rolling restart again? was wondering why revenues were a bit sluggish
erm, floating text bug back? alll on 1 line for me. resetting the scripts works, but alot of our floating text objects have the scripts deleted once we put them out :/
@stability - assets do indeed seem to be alot more stable lately, good job that
July 15th, 2008 at 4:35 PM
@ Tanya.
Blame Teagan.
____________
No really, thanks LL, this is one step in the right direction! : D
July 15th, 2008 at 4:44 PM
Linda (@55), I wonder if your ISP is interfering with SL traffic? Remember how a number of ISPs were throttling or even tampering with P2P connections a few months ago? What if your provider sees Second Life traffic as too demanding and is deliberately interfering with the connection? Don’t expect them to admit it though, and it’d take a detailed packet analysis to reveal.
Typically, I’m able to stay online for hours at a time before memory leaks overwhelm my poor, solitary gigabyte of RAM. I haven’t had major stability issues for quite some time (running the release candidates), although I find the top quality graphics settings now cause me to crash, so I run with just the basics and have to give up SL photography for a while. And I do all this over a wireless connection using a three year old laptop.
July 15th, 2008 at 4:46 PM
@61 : The thing is that my product relies on small ApplyImpulse’s, if I implement a vertical movelock, these won’t work anymore.
Also, for the test script that I posted yesterday in SVC-1792, the only thing that seems to counter the sinking is a vertical MoveToTarget, which really scatters the avatar x-y moving.
July 15th, 2008 at 4:53 PM
If only you could use the vehicle code in attachments.
July 15th, 2008 at 5:09 PM
Believe me, I have tried countless times to fix my product since havok4 release, but each time the fixes had side effects that made me want to quit using it. I believe there are enough votes and watchers on SVC-1792 and SVC-2013 for Linden Labs to do something else than just assign a DEV number to it and make the bugs sit in the process for months!
If I had access to the server source code, working on a patch would be the first thing I’d try to do, even before playing the game itself.
July 15th, 2008 at 5:19 PM
Will this sim restart help with the mainland search as most sims seem to be 10mins lag on search results and yet if you tp to another sim you get a realtime result from search basicly each sim is displaying diffrent results driving me mad lol… have a tinker with the land search server while your at it.
July 15th, 2008 at 5:25 PM
Has Prosperos Inferno torched the survey that didnt load when i tried to login as well??? And Login failed 3x in a row so ive given up trying. Looks like SL as usual! Good thing I havent logged in for the last week or so as nothing seems to have changed from the last yr that I was a premium account holder. So im just happy I reverted back to basic so at least im not paying for this grief now.
July 15th, 2008 at 6:30 PM
I have to hand it to you guys, the difference is palpable, and we all know why (not that it is a slight against those involved). M has got the grid on-track, unquestionably. I mean, it was not long ago that the grid would just be completely down with no warning, and now we get a warning, a small quality run of the update, and a reasonable schedule that disperses the downed regions over a time-slot that reduces the inconvenience to residents. Good job.
Now all we need to work on is the, uh, quality of the avatar mesh, and I’m a happy resident (Make Human would be good people to talk to there).
July 15th, 2008 at 6:39 PM
@24: are you on a wireless? If so, and using a linksys router (WRT54G in particular), turn off any cordless phones in the area. I had this same issue. The router and the phone were in each other’s radio space. I switched to a netgear and haven’t had a problem since.
SL itself cannot bring down your connection unless the ISP is seeing the traffic and freaking out. (*extremely unlikely)
I STILL haven’t actually installed the newest RC. Maybe I’ll do that right now and go looking for a 1.23 sim.
July 15th, 2008 at 7:24 PM
I appreciate the integrity of the guy who will say whose fault something is, when the fault is his. I’d rather have a guy like that on the job than a guy that apparently never makes mistakes. Good form, Prospero. 20 minutes extra down time. No one dies. It’s cool.
July 15th, 2008 at 7:27 PM
yes avatars no longer rez properly. the attachments sit there in air till you zoom into each one. Just witnessed a parcel *auto return* for a bunch of stuff properly set to group. Apparently there are numerous things in the mix right now. Hold on for the ride lol!
July 15th, 2008 at 9:08 PM
greetings all,
sorry off topic a bit, but i haven’t been able to logon to sl for a day now, also can not login to secondlife.com websites. this site works, blog.secondlife, grid websites work, linden lab.com works, just strange and haven’t changed any of my settings either.
well best of luck with the rollout-hope to login one day again, lol
using time-warner cable, anyone else having probs on time warner?
thanks
July 15th, 2008 at 9:31 PM
justa quick, follow up tried pinging servers here are the results
C:\>ping blog.secondlife.com
Pinging lindenlab.wordpress.com [72.233.2.56] with 32 bytes of data:
Reply from 72.233.2.56: bytes=32 time=53ms TTL=50
Reply from 72.233.2.56: bytes=32 time=52ms TTL=50
Reply from 72.233.2.56: bytes=32 time=51ms TTL=50
Reply from 72.233.2.56: bytes=32 time=52ms TTL=50
Ping statistics for 72.233.2.56:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 51ms, Maximum = 53ms, Average = 52ms
C:\>ping google.com
Pinging google.com [64.233.187.99] with 32 bytes of data:
Reply from 64.233.187.99: bytes=32 time=46ms TTL=240
Reply from 64.233.187.99: bytes=32 time=44ms TTL=240
Reply from 64.233.187.99: bytes=32 time=42ms TTL=240
Reply from 64.233.187.99: bytes=32 time=51ms TTL=240
Ping statistics for 64.233.187.99:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 42ms, Maximum = 51ms, Average = 45ms
C:\>ping secondlife.com
Pinging secondlife.com [8.4.128.238] with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Ping statistics for 8.4.128.238:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss)
July 15th, 2008 at 9:47 PM
“All” Search not working for me now. Other categories search is ok.
Is it just me? - Second Life 1.20.13 (91658) Jul 8 2008 16:01:06 (Second Life Release Candidate)
July 15th, 2008 at 9:52 PM
@75 Xingyun
Try pinging (and do trace route) to “data.agni.lindenlab.com”.
@ Prospero
Good luck with the roll
July 15th, 2008 at 9:57 PM
Thank you for the post update.
July 16th, 2008 at 1:26 AM
Please update this blog, my region was updated without warning and after I did a log back I did a return all, have 15000 prims available, but 1000s of ghost prims that I am deleting manually. And this all for a deploy of 1.22.4 not 1.23.1!
July 16th, 2008 at 1:36 AM
I wonder if for those who like it wouldn’t be a cool feature in one of the next server software to have a ‘random destination’ button in the map as it may save a lot of load on my external server and also on your servers as Hopping became kind of popular?
July 16th, 2008 at 2:52 AM
“However, because of an error Prospero made when starting the roll, there are about 300 regions that will remain down for another 10-20 minutes. For this, he apologizes.”
What did he do? Type “init 0” rather than “init 6”, so that someone needs to go down there and hit the power buttons to turn the servers back on?
July 16th, 2008 at 3:16 AM
“We tentatively plan to revert those regions back to 1.22”
*Whacks you with rolled up newspaper*
July 16th, 2008 at 3:24 AM
/me nominates Prospero for the second annual Sidewinder Linden good communications award.
Great job Prospero, and we know that rollbacks are painful, but we also know you guys care about us.
July 16th, 2008 at 4:07 AM
LOL @ Soap #81
I’d have to agree with Spritely right at the top of this blog entry re: using sims that have signed up for testing.
Using the preview grid and relying on users good will to test it patently isnt working Prospero.
I have the hugest respect for you and your team out of all the teams at SL, its always seemed to have its tasks like this well planned and executed, however lately there have been more rollbacks of server code because of bugs that were not spotted in testing. Although I have to say the rollback mechanisms you have developed seem to work extremely well.
Spritelys idea is excellent and would allow a thorough testing to be carried out concurrently on the live grid instead of piecemeal as it is currently on the preview grid.
But all that said you get a *BIG* thank you for all the efforts from your good self and your team. its much appreciated
July 16th, 2008 at 4:11 AM
** YOUR AD COULD HAVE BEEN HERE **
July 16th, 2008 at 4:26 AM
What is more prolific here, is the level of information given about current and future updates/upgrades. For the first time in several years, we are not being treated like rats in the forever moving maze.
It has and will breed more contentment as LL are seen transparently to be focusing on upgrading the platform to keep up with the influx of newcomers and layers of business, giving us stability and more reliability. This has been the main focus of all those that clamber to make posts on the blog, at last the cries have been heard.
What I see, is LL coming back into the community, from appearing to shrink away for the past year or more. Excellent.
July 16th, 2008 at 5:04 AM
Come on guys !
Same old song over and over…
Roll out, roll back, roll out again, delayed ! Endless story.
And all that without any respect for the hard work of all the event builders who suffer (or die…) from that.
I’m sure you guys work hard too, and I totally appreciate what you do for us.
But I cannot support HOW you do it. It’s really time to use professional testing and updating procedures, and stop this daddy-like way of doing things.
The only quiet period we have had these past months has been the god blessed two-week-long SL5B period !!! Surprisingly, no roll out during that period.
Hey guys, it’s time to grow up !!!
July 16th, 2008 at 6:01 AM
What did he do? Type “init 0” rather than “init 6”, so that someone needs to go down there and hit the power buttons to turn the servers back on?
There is a system called the “region conductor” which is what is in charge of handing out regions to simulators to run. The region conductor has a throttle that limits the number of region starts per second, so as to avoid choking the central database by starting up too many regions at once. The default value of the number of starts per second is VERY conservative… we did this because we wanted to start at a safe value. It turns out that we can start regions a factor of 10 faster still without stressing the database. Most of the time, it doesn’t matter, because regions aren’t going down and restarting fast enough. However, during a rolling restart I open up the throttle on the region conductor so that regions will restart at the rate necessary to keep up with the rolling restart. I forgot to open up the throttle until the pilot roll was nearly done.
Sorry about that :/
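The effect of that throttle can be illustrated with a minimal sketch. This is not the actual region conductor code: the class name `StartThrottle`, the function `seconds_to_drain`, and the rates used below are all hypothetical, chosen only to show how a starts-per-second cap determines how long a backlog of downed regions takes to come back.

```python
from collections import deque


class StartThrottle:
    """Hypothetical throttle: releases at most N region starts per second."""

    def __init__(self, starts_per_second):
        if starts_per_second < 1:
            raise ValueError("starts_per_second must be at least 1")
        self.starts_per_second = starts_per_second

    def drain(self, pending_regions):
        """Return (second, region) pairs showing when each region is started.

        Time is simulated in whole seconds, so no real sleeping happens.
        """
        queue = deque(pending_regions)
        schedule = []
        second = 0
        while queue:
            # Hand out no more than the allowed number of starts this second.
            for _ in range(min(self.starts_per_second, len(queue))):
                schedule.append((second, queue.popleft()))
            second += 1
        return schedule


def seconds_to_drain(region_count, starts_per_second):
    """Simulated seconds needed to restart a backlog of regions."""
    schedule = StartThrottle(starts_per_second).drain(range(region_count))
    return schedule[-1][0] + 1 if schedule else 0


if __name__ == "__main__":
    backlog = 300  # roughly the number of regions left down after the pilot roll
    for rate in (1, 10):  # illustrative rates only, not the real settings
        print(f"{rate} starts/sec: {seconds_to_drain(backlog, rate)} simulated seconds")
```

With a conservative cap the backlog drains slowly no matter how many simulators are free; raising the cap by a factor of 10 cuts the drain time by the same factor, which is why the throttle needs to be opened up before a rolling restart begins.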
July 16th, 2008 at 6:02 AM
Mercia — what you describe may not be related to the rolling restart. If you have issues like that in your region, please contact support. (If this is a private island you own, you have access to concierge support.)
July 16th, 2008 at 6:22 AM
whee! rollback, sounds like a WalMart thing. Oh it’s a rollback that fixes the changes that broke the fixes…..
*me head spins*
Any more room in that fallout shelter?
July 16th, 2008 at 6:43 AM
“I forgot to open up the throttle until the pilot roll was nearly done.”
Not a funny whoops, then… Yeah, things like that happen, we are all human. Lots of information in your post; even though I don’t think it’s actually useful for us (we can’t touch those things anyway), I’m sure the new knowledge will be appreciated by those of us curious about the inner workings of SL.
Is it just me, or did LL really open up the information flow lately? Seems like they cut out the distortion filter also known as marketing drones.
July 16th, 2008 at 8:15 AM
Since 24 is my lucky number, let’s just skip 1.23 and go directly 1.24
July 16th, 2008 at 8:26 AM
RE Networking issues (somewhat OT)
The way MOST ISPs work is hot-potato. Your local ISP will probably try to hand off data to LL’s ISP (Level 3) ASAP and at the nearest point.
In kind, LL’s ISP will try to dump data back off to YOUR ISP at the nearest return path.
This can lead to WILDLY different performance / reliability for upstream versus downstream data, as well as very different paths.
One thing is certain in all this. LL’s Level3 service has had lower reliability than my home DSL service… And THAT is pathetic. Unfortunately, I’m pretty sure that LL’s contract with Level3 doesn’t give LL much of an out. I know, I’ve been there, and have dealt with contracts with several data centers. Of course I got burned once about 10 years ago by a colo with crappy service, so now all my contracts are TIGHT and are not so one-sided. Along with that, none of the data centers I use have as many issues as Level3 either. I’m sure LL has learned from this, and I sure as hell hope they will renegotiate when the time comes up, or move to a better (carrier neutral) facility.
July 16th, 2008 at 9:38 AM
All I can say is all hell has broken loose on our sim which was part of this restart and subsequent rollback. I have ducklings flying like supersonic jets thru the sky and off world, a donkey that basically fell apart and an incredible amount of lag making the sim impossible to be in. Fingers crossed that things will improve.
July 16th, 2008 at 10:03 AM
Atze : if you have problems and can detail a clean reproduction, please file an issue in our public issue tracker with as much information as possible so that a Linden developer or QA person can reproduce the issue. https://jira.secondlife.com
July 16th, 2008 at 10:18 AM
lol, Zi!
/me cannot wait for Mono on the maingrid, I really hope all goes well.
Regarding the textures not loading: This was very noticeable today.
Cheers,
Torrid
July 16th, 2008 at 10:46 AM
This is not directly related to a server-discussion, but since we’re getting a lot of feedback here:
In the Advanced - Rendering submenu is an option “Run Multiple Threads”. What exactly is being threaded, if that option is active at all?
July 16th, 2008 at 11:07 AM
Does anyone know how to fix the huge triangles that show up on the screen and block your view?
I gave a friend of mine a 1 yr old computer with 2 gigs of ram, an ATI 256 video card and an AMD 3500+ processor.
This computer was working fine and I only gave it away so we could keep in touch after he moves to another state using SL.
Is this the issue with WindLight everyone is talking about? I had great performance in SL on this computer and only bought another one to increase my ram and found I had a better selection of video cards that have the PCI express slot.
So this computer is not an old boat anchor.
July 16th, 2008 at 11:59 AM
@98,
The Run Multiple threads is active. It runs the texture decode in a separate thread if enabled and if your system can run multiple threads of course. Just how much of a benefit it is can be debated. It’s currently an experimental option and not guaranteed to work flawlessly.
July 16th, 2008 at 12:02 PM
Re Spritely’s idea about signing up sims to try new server releases - how about the Linden mainland sims?
July 16th, 2008 at 2:05 PM
@Disgusted I for one would like to say that this IS a more professional way to manage the rollout. They chose a larger initial group of servers in order to identify a wider variety of issues. Once these issues were identified and a rollback deemed necessary they did what they had to do. You can’t test a widely distributed platform completely without the feedback of the user base on the live grid.
I would opt for a longer period of testing between the initial rollout and the gridwide rollouts, to provide enough time for more of these issues to be identified than the usual 24 hours.
I would also opt for a less frequent rollout than every other week, to minimize the effect on the economy. Having once been told that there should be no effect on the economy I have carefully correlated my income to the rollout schedule, and except for a few other dates where large service interruptions have occurred unrelated to a rollout, my low performing dates correspond with the rollouts quite well. Add to that the fact that a rollout has never proceeded without a corresponding lull in sales performance and I think I am quite right about the effects on the economy. I have asked a number of friends about their experiences, notably while a rollout was in progress… “what kind of day are you having” and they have expressed the same experience.
Of course, that’s anecdotal evidence; LL could/should have a closer look themselves, since they have the data. Not sure they want to know
July 16th, 2008 at 3:46 PM
So are you going to do the rolling restart at 5am tomorrow (Thursday the 17th)?
July 16th, 2008 at 8:01 PM
@102: I can’t help but wonder if the in-world economy sags during rolling restarts *not* because people are having technical issues/crashes/transaction failures/what-have-you *but* maybe because they simply figure “Ohwell, it’s another rolling restart” and don’t even bother to log in, opting to “wait it out” instead? That’s only speculation of course (based on, as you call it, anecdotal evidence) but… well, Zi Ree summed it up perfectly when she said SL is the biggest psychological testing ground ever
July 16th, 2008 at 10:56 PM
@104 I’ve considered that, and the possibility that some people are just fickle enough to postpone a shopping trip when the server they are on restarts - get up and walk away and not come back (you can do that???) but the effect is the same regardless of the reason. It seems reasonable that it’s a mix of these issues: 1000 servers restarting, requesting assets, announcing themselves to each other and the grid, is bound to stress resources, timing out transactions, etc. Yesterday was “a bit sluggish” but not horrible.
Rolling out half the grid aggressively would magnify that impact on the infrastructure and I can measure the impact of that in double digit USD losses per day. I even have it graphed, with rolling restarts mapped into the data.
The point is, whatever the cause, a rolling restart affects the in-world economy in a significant way (about 30-35% of sales across the 3-day period in my case).
If that were taken seriously then a measured rollout schedule, once a month perhaps, would cut the negative impact in half and reduce the number of rollbacks by giving them more time to test those limited rollouts and get it right before a gridwide deployment.
July 16th, 2008 at 11:37 PM
Prospero… I very much appreciate the way you tackle the issues and the way you interact in the blogs. It gives me a certain feeling of “transparency”.
As far as I am concerned, I believe you are an enrichment to SL (was that English?). I hope you guys can fix everything swiftly, so that