How to check a disk for errors in ESXi - ErrorsMaster.ru - a large encyclopedia of errors and their solutions


Use the vmkfstools command to check or repair a virtual disk if it gets corrupted.

-x|--fix [check|repair]

For example,

vmkfstools -x check /vmfs/volumes/my_datastore/my_disk.vmdk
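Because the command takes one disk at a time, checking a whole datastore means looping over its descriptor files. The following is only a sketch: the helper names `list_descriptor_vmdks` and `check_all_vmdks` are made up for illustration, and it assumes that extent and tracking files follow the usual `-flat.vmdk`, `-delta.vmdk`, `-ctk.vmdk`, `-sesparse.vmdk`, `-rdm.vmdk` and `-rdmp.vmdk` suffixes, so that everything else ending in `.vmdk` is a descriptor.

```shell
# Hypothetical helper: print the descriptor .vmdk files under a directory,
# skipping extent/tracking files (assumed to carry the suffixes listed below).
list_descriptor_vmdks() {
    find "$1" -name '*.vmdk' 2>/dev/null |
        grep -v -E -e '-(flat|delta|ctk|sesparse|rdm|rdmp)\.vmdk$' |
        sort
}

# Hypothetical helper: run the documented check on every descriptor found.
check_all_vmdks() {
    list_descriptor_vmdks "$1" | while IFS= read -r disk; do
        echo "Checking $disk"
        vmkfstools -x check "$disk"
    done
}

# Usage on the ESXi host (datastore path taken from the example above):
#   check_all_vmdks /vmfs/volumes/my_datastore
```

The loop only runs `check`; running `repair` across a whole datastore is deliberately left as a manual, per-disk decision.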

Source

Hard disk failures on physical servers are common, and when one happens you need to identify exactly which disk has failed. This is easy to check with hardware management tools such as HP System Management or HP iLO, or on the Hardware Status tab of the ESXi host in the vSphere Client. This post covers checking for disk failures from the ESXi command line on HP hardware. It walks step by step through verifying disk status on an ESXi host with the HPSSACLI utility, which is part of the HP ESXi Utilities Offline Bundle for VMware ESXi 5.x.

The HP ESXi Utilities Offline Bundle for VMware ESXi 5.x is included in the HP customized ESXi installer image. If the host was not installed from an HP customized image, you need to download and install the bundle yourself. The ZIP file contains three utilities for remote online configuration of servers: HPONCFG, HPBOOTCFG and HPSSACLI.

HPONCFG: command-line utility used for obtaining and setting ProLiant iLO configurations.
HPBOOTCFG: command-line utility used for configuring ProLiant server boot order.
HPSSACLI: command-line utility used for configuration and diagnostics of ProLiant server Smart Arrays.

After downloading the HP ESXi Utilities Offline Bundle for ESXi 5.x and copying it to the host, install it with the command below:

esxcli software vib install -f -d /tmp/hp-esxi5.5uX-bundle-1.7-13.zip

You can also download the HPSSACLI utility on its own, upload the VIB file to your ESXi host, and run the command below to install it.

esxcli software vib install -f -v /tmp/hpssacli-1.60.17.0-5.5.0.vib

Once it is installed, browse to the directory /opt/hp/hpssacli/bin to verify the installation.

Check the Disk Failure Status:

Run the command below to check the status of the disks in your ESXi host. It displays the status of every physical disk in all arrays under the controller in slot 0.

/opt/hp/hpssacli/bin/hpssacli controller slot=0 physicaldrive all show
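On a host with many drives it can help to print only the drives that are not healthy. The filter below is a rough sketch, not part of the HP tooling: the function name `failed_drives` is made up, and it assumes each drive is reported on one line of the form `physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)`, with the status as the last item inside the parentheses.

```shell
# Hypothetical helper: read "physicaldrive all show" output on stdin and
# print only drives whose status (assumed to be the last field in the
# parentheses) is something other than OK.
failed_drives() {
    awk '{ gsub(/^[ \t]+|[ \t\r]+$/, "") }          # trim indentation and line ends
         /^physicaldrive/ && !/, OK\)$/ { print }'  # keep non-OK drive lines
}

# Usage on the ESXi host:
#   /opt/hp/hpssacli/bin/hpssacli controller slot=0 physicaldrive all show | failed_drives
```

An empty result means every listed drive reported OK. If your hpssacli version formats the lines differently, the pattern needs adjusting.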

That’s it: we have identified the failed disk. You may need to generate an HP ADU (Array Diagnostics Utility) report to raise a support case with the hardware vendor. Please refer to my blog post “How to Generate HP ADU Disk Report in ESXi host” for a step-by-step guide to generating the ADU report from the ESXi command line. I hope this is informative for you. Thanks for reading!

Source

17 Replies

Is this on a SAN/NAS or a local disk on the ESXi Server?

Jaguar wrote:

What’s running this ESXi Host?

Josh@Acts360 wrote:

This is on a local RAID array on the ESXi server.

Josh@Acts360 wrote:

ESXi is running on a Dell PowerEdge 2950 server.

Replicate your data and replace your array. As far as I know, VMFS does its own housekeeping and there is no way to force a disk check at the VMFS level. An NTFS chkdsk will only be so effective because, as you said, it’s sitting on top of VMFS.

Whenever you suspect a bad block on disk in a production environment, it’s always better to replace first and ask questions later.

And I would also advise staying away from RAID 5 if that is what you are using currently:

RAID 5 vs RAID 10

Josh@Acts360 wrote:

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Is the hardware under any type of warranty? If so, you can probably get it replaced on that error by talking to a support person. I’ve done it — as far as seeing which one is bad, you will need to go into the RAID controller software.

Josh@Acts360 wrote:

Yes, the server is under warranty. Ok, I’ll see if Dell will replace the drive. Thanks

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Doesn’t the server tell you where the error is when it tells you that there is an error?

Scott does make a point. Can you not see from the health status in vCenter which disk? You still may need the OMSA to rebuild your array and it’s nice to have available.

Jaguar wrote:

Scott Alan Miller wrote:

Josh@Acts360 wrote:

Wow….that seems very drastic…can I just replace a drive? Is there a way to tell which drive in the array has the bad block?

Doesn’t the server tell you where the error is when it tells you that there is an error?

ESXi should be able to tell you (though I’ve got the "Dell customized" version of ESXi installed; you can get it from VMware’s site).

[Attachment: vcenterstorage.PNG]

Yeah, Jaguar nailed it. You should, at minimum, be able to see the state of your storage in vCenter. At that point you can identify the drive. Having OMSA on your host just makes it easier to perform some functions, like storage configuration changes, without having to reboot and go through the BIOS to get to them.

Josh@Acts360 wrote:

No vCenter; all I have is the free version. I can see some information listed. It actually shows that there is no problem with the system… I also have an iSCSI device connected, so maybe that device is the one throwing the errors.

Oh, looks like that is it… I just looked at the event logs and I see the hard disk is disk 1, which points to the iSCSI disk. I updated the firmware on this device (a Synology DiskStation 1010+) and this fixed the issue.

Log Name:      System
Source:        disk
Date:          9/17/2010 9:11:31 AM
Event ID:      51
Task Category: None
Level:         Warning
Keywords:      Classic
User:          N/A
Computer:      FBLDC.fbdomain.local
Description:
An error was detected on device \Device\Harddisk1\DR4 during a paging operation.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="disk" />
    <EventID Qualifiers="32772">51</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2010-09-17T13:11:31.421Z" />
    <EventRecordID>177451</EventRecordID>
    <Channel>System</Channel>
    <Computer>FBLDC.fbdomain.local</Computer>
    <Security />
</System>
<EventData>
    <Data>\Device\Harddisk1\DR4</Data>
    <Binary>030080000100000000000000330004802D0100000E0000C0000000000000000000000000000000006262170000000000FFFFFFFF010000005800002100000000BB20101242032040001000003C0000000000000000000000789BF70C80FAFFFF0000000000000000909B010A80FAFFFF0000000000000000E807640000000000880000000000006407E8000000080000000000000000000000000000000000000000000000000000</Binary>
</EventData>
</Event>

[Attachment: VMWare.png]

Josh@Acts360 wrote:

Turns out that if I had just looked at the event log more closely, I would have noticed that the drive the event referred to was my iSCSI disk, which was offline…

Jaguar wrote:

should have said it was an iSCSI

Glad you got it fixed.


Source

ESXi includes the esxcli command.

List the hard disks:

esxcli storage core device list
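The listing prints a long block of properties for every device, while only the identifier is needed for the SMART commands. As a small sketch (the function name `device_ids` is hypothetical, and it assumes the usual layout in which the identifier starts at the beginning of a line and its properties are indented beneath it), the identifiers can be filtered out like this:

```shell
# Hypothetical helper: read "esxcli storage core device list" output on stdin
# and print only the device identifiers, assumed to be the unindented lines.
device_ids() {
    awk '/^[^ \t]/ { print $1 }'
}

# Usage on the ESXi host:
#   esxcli storage core device list | device_ids
```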

View SMART information for a disk:

esxcli storage core device smart get -d <disk>

Installing the community-supported smartctl

First, download the software:
http://pfoo.unscdf.org/esxi/smartctl-6.6-4433.x86_64.vib
Upload it to the ESXi host with WinSCP
Set the acceptance level to allow community-supported software
Install the software
Use the smartctl command

/opt/smartmontools/smartctl -d sat -a /vmfs/devices/disks/<disk>

The disk name can be obtained with the device list command above.

You can also do all of this from an SSH session:

cd /vmfs/volumes/datastore1
wget http://pfoo.unscdf.org/esxi/smartctl-6.6-4433.x86_64.vib
esxcli software acceptance set --level=CommunitySupported
esxcli software vib install -v /vmfs/volumes/datastore1/smartctl-6.6-4433.x86_64.vib

Partial reference: https://wiki.csnu.org/index.php/ESXi_smart_/_smartctl

Source

Hello!
I saw errors in the logs:

Device t10.ATA_TOSHIBA_DT01ACA200_X3FBTSLKS performance has deteriorated. I/O latency increased from average value of 2410 microseconds to 78012 microseconds.

That made me wonder about the state of the hard disks, but the Health Status section does not show the HDDs.
As far as I know, this is because there is no driver for this controller.
I read somewhere that it has to be downloaded from the vendor and installed.
But my search was unsuccessful.
ESXi 5.5.0, 1331820
Motherboard: P8B-M with an Intel C204 controller.
Are there perhaps other ways to view SMART data?

P.S. I know that ESXi does not work with software RAID, but I have no need for it anyway; the controller runs in SATA mode.

P.P.S.
I connected over SSH and got this with a script:

/usr/lib/vmware/vm-support/bin # ./smartinfo.sh
SMART Information for disks.

Device:  t10.ATA_____TOSHIBA_DT01ACA200_________________________________X3FBTSLKS
Parameter                     Value  Threshold  Worst
-----------------------------------------------------
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              100    16         100
Power-on Hours                99     0          99
Power Cycle Count             100    0          100
Reallocated Sector Count      100    5          100
Raw Read Error Rate           100    16         100
Drive Temperature             142    0          142
Driver Rated Max Temperature  N/A    N/A        N/A
Write Sectors TOT Count       200    0          200
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A


Device:  t10.ATA_____TOSHIBA_DT01ACA200_________________________________739NE17KS
Parameter                     Value  Threshold  Worst
-----------------------------------------------------
Health Status                 OK     N/A        N/A
Media Wearout Indicator       N/A    N/A        N/A
Write Error Count             N/A    N/A        N/A
Read Error Count              100    16         100
Power-on Hours                99     0          99
Power Cycle Count             100    0          100
Reallocated Sector Count      100    5          100
Raw Read Error Rate           100    16         100
Drive Temperature             142    0          142
Driver Rated Max Temperature  N/A    N/A        N/A
Write Sectors TOT Count       200    0          200
Read Sectors TOT Count        N/A    N/A        N/A
Initial Bad Block Count       N/A    N/A        N/A
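
Since SMART shows nothing alarming here, the latency warning quoted at the start of this post is the more useful signal, and it can be summarized per device. This is only a sketch: the function name `latency_warnings` is made up, it relies on the exact message wording quoted above, and the log location in the usage line (/var/log/vmkernel.log) is an assumption about where the host writes these warnings.

```shell
# Hypothetical helper: read log lines on stdin and, for every
# "performance has deteriorated" warning, print the device, the average and
# current latency in microseconds, and the increase factor.
latency_warnings() {
    awk '/performance has deteriorated/ {
        dev = ""; old = ""; cur = ""
        for (i = 1; i < NF; i++) {
            if ($i == "Device") dev = $(i + 1)
            if ($i == "of" && $(i + 2) == "microseconds") old = $(i + 1)
            if ($i == "to" && $(i + 2) ~ /^microseconds/) cur = $(i + 1)
        }
        if (old + 0 > 0)
            printf "%s %d -> %d us (x%.1f)\n", dev, old, cur, cur / old
    }'
}

# Usage on the ESXi host (log path is an assumption):
#   latency_warnings < /var/log/vmkernel.log
```

For the warning quoted above, the reported factor works out to roughly 32 times the average latency.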

Source