I have taken the company I work for down the VMware path. They are an Oracle programming house that has been in business for over 30 years. They hold several government contracts and have recently moved into the private sector. When I came on board they had 26 physical servers, and they tasked me with providing redundancy against any disaster. Being Oracle based, they wanted to use Oracle's virtual server, but I convinced management to go with VMware. I have now converted and moved 90% of the servers to a host running ESXi Essentials, and I am trying to upgrade to the Plus versions.
I want to know:
1. If I purchase the Plus versions, do I just install a key to unlock the features, or is it a reinstall?
2. I need a redundant active/passive SAN. I have a Supermicro running Open-E now. It works great, with both reads and writes at 1 ms or less. I need a second unit, but it does not have to be as robust. Any thoughts?
3. And lastly, I am seeing an occasional bad block reported by Oracle. This is interesting because the block is good and the data is good, but it is simply not what Oracle was expecting to read, so Oracle treats it as a bad block. The error clearly shows that the block is not physically bad, just not what is expected to be there. Anything that helps me go down the right path would be appreciated. I am trying thick provisioning instead of thin, and I am checking whether the backups are interrupting the database while they run. Does anyone have any ideas about this bad block reported by Oracle?
The bad block issue was indeed the SAN. After chasing mysterious corruption that looked as if a bad query was being run, I finally received one hardware error from the LSI RAID card in the SAN. After researching the almost unreadable hex code, I found a chart that defined it. "Drive not ready slot 9" is what the controller was telling me. To my surprise the drive was good, and the fault turned out to be the slot itself. My unbreakable SAN with redundant everything had a bad backplane, which randomly caused the drive in slot 9 to fail and write wrong data to the array.

Once this occurred, the domino effect was massive. Even after we eliminated the slot, the corruption grew from server to server. We shut down and cleaned the individual guest OS filesystems and validated the databases, but the corruption persisted. Finally, needing a stable and safe environment, I chose to move all servers to local storage. Each host had a RAID 5 with 2.7 TB, so we divided up the servers to balance the disk I/O, moved them, and repaired any corruption. We saw no performance hit and had stability again.

Needing to understand what was causing this global corruption, and with everything off the SAN, I retraced the events and the steps taken. I found the root cause, and I found why the problem continued after so much work. I had missed one step in my effort to resolve the issue. I had cleaned VMware and the guest OS of every server install, and I had validated every database, but I did not clean the filesystem of the SAN itself. Consistency checks of the volumes do not look at the filesystem, only at whether the blocks are intact. The bad slot had corrupted my SAN's filesystem just enough for it to start writing data to the wrong block. Since application servers and front-end servers are static for the most part, they were the last to see corruption. Only the ever-changing database servers were affected at first.
This made troubleshooting even more difficult, giving the illusion of a stable SAN when in fact it was not. The variables were many and constantly changing. We now have a repaired SAN, and after analysis and discovery of the issues we are reworking the array: we are going to add a RAID 10 of SSDs and reinitialize the RAID 6. The database servers will go on the RAID 10 for speed and the rest on the cost-effective RAID 6. No matter how much you may read or sell or install, you never truly own and know a technology until you have a very odd issue and you successfully wrestle it to the ground.
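For anyone trying to picture the failure mode above, here is a minimal Python sketch of why a block can pass an integrity check and still be "bad" to the database. It is only an illustration, not Oracle's real block format or the SAN's actual logic: the `Block` layout, `make_block`, and `verify` are hypothetical names, and the idea is simply that each block records the address it was meant for, so a misplaced write is caught even though the block's contents are intact.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    address: int    # address the writer intended (hypothetical header field)
    payload: bytes  # the data itself
    checksum: int   # CRC32 over the intended address plus the payload


def make_block(address: int, payload: bytes) -> Block:
    """Build a block whose checksum covers its own intended address."""
    body = address.to_bytes(8, "big") + payload
    return Block(address, payload, zlib.crc32(body))


def verify(block: Block, read_from: int) -> str:
    """Classify a block that was read from address `read_from`."""
    body = block.address.to_bytes(8, "big") + block.payload
    if zlib.crc32(body) != block.checksum:
        return "corrupt"    # the contents themselves are damaged
    if block.address != read_from:
        return "misplaced"  # intact block, but stored at the wrong address
    return "ok"


# Simulated storage: address -> block.
storage = {7: make_block(7, b"row data for 7")}
print(verify(storage[7], read_from=7))

# A faulty layer writes the block meant for address 9 into address 7.
storage[7] = make_block(9, b"row data for 9")
print(verify(storage[7], read_from=7))
```

A check that only looks at whether each block is internally intact would pass the second case, which matches what I saw: the volume checks were clean while the database kept finding blocks that were not what it expected.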
Dan Gillman
I assume you are migrating your Oracle data. Can you confirm? If you are, what migration tool are you using? The wrong migration tool will result in bad-block-related events.
1. Just apply the new VMware Enterprise Plus license. That's all.
By the way, my suggestion is to assign a dedicated physical management server, workstation, or desktop for vCenter. Do not install it in a VM. Otherwise you'll be in trouble when a failure occurs.
2. I need to understand the current design of your virtual environment and of your SAN as well. Could you please share your design with me? It would make it easier for me to help you in this regard. In any case, after purchasing VMware Enterprise Plus you'll get Storage vMotion, Storage I/O Control, Storage DRS, and the Storage APIs for Data Protection. Additionally, I suggest you consider buying VMware Virtual SAN as well. It will help you consolidate your storage, whether that is your SAN, your DAS, or both.
3. I'm not sure, but my instinct says you should check your Open-E storage OS for compatibility issues with Oracle.
Thanks.
We are using ESXi 5.1 Enterprise. Whenever the license expires, vCenter drops to standard license mode with some features disabled until the new key is entered. Once it is, the additional Enterprise features become available again without a reinstall.
As for your other problem with the bad block, did it start after you moved to virtualization? If so, check for features such as High Availability that provide fault tolerance in case of a virtual disk failure.
You do not have to reinstall to apply a license. License management is in the vCenter console.
I cannot give you any input on the other questions.