Creating a Cent OS7 template for VMware Guest Customization and Deployment

To start off with, Download the latest version of CentOs7 from https://www.centos.org/download/

At the time of writing, this is: CentOS-7 (1804)

Lets start by creating a new Virtual machine.  I will select Esxi 6.5 for backward compatibility  with other host.

Mount the ISO,

Power on the VM and begin installing of the OS and begin the installation


Note: During the install we will enable the default nic interface and set this to DHCP

Note: Since the ISO used was from a most recent release, openvm-tools is auto installed along with the linux installer. if you are using an older version of the cent os installer iso, you must install open-vm-tools with the below command (will need the VM connected to the internet).

yum install open-vm-tools

I would recommend updating the tools to the latest release.

then followed by installing pearl (pre-requisites for guest customization)

yum install perl

Once done, power down the VM and convert it to a template.

Test the template by deploying a VM with guest customization

When the VM boots up you should see the host name set to the name of the VM (the spec that I used to customize uses the name as of vsphere inventory as the the virtual machine name)

Looking at the VM that was just deployed, we see the host name has changed as per the specification.

Troubleshooting:
Log file for guest cust:

/var/log/vmware-imc/toolsDeployPkg.log

Generate a memory dump for a frozen VMware, windows virtual machine.

Enable complete memory dump feature by changing following registry keys:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl
CrashDumpEnabled    REG_DWORD    0x1

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\Memory Management
PagingFiles    REG_MULTI_SZ    c:\pagefile.sys 13312 13312


Enable keyboard crash dump feature by adding following registry keys:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters
Value Name: CrashOnCtrlScroll
Data Type:    REG_DWORD
Value:    1

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\kbdhid\Parameters
Value Name: CrashOnCtrlScroll
Data Type:    REG_DWORD
Value:    1

Enable NMI crash dump feature by adding following key:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl
Value Name: NMICrashDump
Data Type:  REG_DWORD
Value:   1

  • Restart Server to take effects.
  • Do a memory dump test with below steps:

Capture a kernel memory dump in following ways:

o    Send NMI to Guest OS Link:How to send NMI to Guest OS on ESXi 6.x (2149185)

Or

o   On VM console, press Right Ctrl + Scroll Lock button 2 times.

Machine crashes into blue screen and save a memory dump, restart automatically once dump generation reaches 100%. You should be able to see 12GB (memory allocated to the VM) here: C:\Windows\MEMORY.DMP file.

Statement diagnostic data from driver is XX002:0:7:ERROR: index “pk_index_name” contains corrupted page at block xxxx;

on the vpxd.log:
--> Error while executing the query
2018-09-25T21:17:09.351Z error vpxd[7F3FC958B800] [Originator@6876 sub=Default] [VdbStatement] Connection diagnostic data from driver is HY000:0:110:
2018-09-25T21:17:09.351Z error vpxd[7F3FC958B800] [Originator@6876 sub=Default] [VdbStatement] Bind parameters:
2018-09-25T21:17:09.351Z error vpxd[7F3FC958B800] [Originator@6876 sub=Default] [VdbStatement] [0]datatype: 1, size: 4, arraySize: 0
2018-09-25T21:17:09.351Z error vpxd[7F3FC958B800] [Originator@6876 sub=Default] [VdbStatement] value = 172237
2018-09-25T21:17:09.351Z error vpxd[7F3FC958B800] [Originator@6876 sub=Default] [Vdb::IsRecoverableErrorCode] Unable to recover from XX002:7
2018-09-25T21:17:09.351Z error vpxd[7F3FC958B800] [Originator@6876 sub=Default] [Vdb::IsRecoverableErrorCode] Unable to recover from HY000:110
2018-09-25T21:17:09.351Z error vpxd[7F3FC958B800] [Originator@6876 sub=Default] [VdbStatement] SQLError was thrown: "ODBC error: (XX002) - ERROR: index "pk_vpx_alarm_runtime" contains corrupted page at block 5877;
--> Error while executing the query" is returned when executing SQL statement "DELETE FROM VPX_ALARM_RUNTIME WHERE ENTITY_ID=?"
2018-09-25T21:17:09.355Z error vpxd[7F3FC958B800] [Originator@6876 sub=Daemon] Unhandled exception: Error[VdbODBCError] (-1) "ODBC error: (XX002) - ERROR: index "pk_vpx_alarm_runtime" contains corrupted page at block 5877;
--> Error while executing the query" is returned when executing SQL statement "DELETE FROM VPX_ALARM_RUNTIME WHERE ENTITY_ID=?"
2018-09-25T21:17:09.355Z info vpxd[7F3FC958B800] [Originator@6876 sub=SupportMgr] Wrote uptime information
(END)

Resolution:

Take a snapshot of the vcsa
Export affected table:

VCDB=# copy (select * from vpx_alarm_runtime) to '/tmp/vpx_alarm_runtime_select_.csv' with CSV DELIMITER ',';
COPY 3854
Drop affected constrain.
VCDB=# alter table vpx_alarm_runtime drop CONSTRAINT PK_VPX_ALARM_RUNTIME;

Recreate constrain

alter table vpx_alarm_runtime add constraint PK_VPX_ALARM_RUNTIME primary key (ENTITY_ID, ALARM_ID,
EXPRESSION_NAME)
recreate Table: VPX_ALARM_RUNTIME
/*==============================================================*/
/* Table: VPX_ALARM_RUNTIME */
/*==============================================================*/
create table VPX_ALARM_RUNTIME (
ENTITY_ID INTEGER not null,
ALARM_ID INTEGER not null,
ENTITY_TYPE INTEGER,
EXPRESSION_NAME VARCHAR(440) not null,
STATE_VALUE VARCHAR(255),
METRIC_VALUE INTEGER,
CREATED_TIME TIMESTAMP,
STATUS_VALUE VARCHAR(50),
EVENT_KEY INTEGER null,
constraint PK_VPX_ALARM_RUNTIME primary key (ENTITY_ID, ALARM_ID,
EXPRESSION_NAME)
);
 

remount Esxi boot bank

In certain scenario when Esxi’s local/alt boot bank/ boot volume goes offline (symlinks broken),if the boot device is available, it needs to be remounted via CLI

To remount bootbank, run the below command.

localcli --plugin-dir=/usr/lib/vmware/esxcli/int/ boot system restore --bootbanks

To determine if the boot device is available, run through the below:

Determining the boot volume:

[root@AMD-fx:~] ls -ltrh /
total 1161

lrwxrwxrwx    1 root     root          49 Mar 24 05:44 store -> /vmfs/volumes/5c963b7e-a90346aa-0102-0007e9b5fb18
lrwxrwxrwx    1 root     root          49 Mar 24 05:44 bootbank -> /vmfs/volumes/fbff1b71-b140971e-160d-5c5f543035b8     <------------
lrwxrwxrwx    1 root     root          49 Mar 24 05:44 altbootbank -> /vmfs/volumes/29249a7e-c37cdaf0-dbe5-30b1bb5afdd9  <------------
lrwxrwxrwx    1 root     root          49 Mar 24 05:44 scratch -> /vmfs/volumes/5c963b87-d1260d20-cc4a-0007e9b5fb18      <-------------
lrwxrwxrwx    1 root     root          29 Mar 24 05:44 productLocker -> /locker/packages/vmtoolsRepo/
lrwxrwxrwx    1 root     root           6 Mar 24 05:44 locker -> /store

If the symlnks to the UUID are not created, look at the /var/log/boot.gz to determine why the device was not detected (most likely bad/missing drivers or passed through USB

Determine boot device (use the UUID of bootbank/altbotbank)

[root@AMD-fx:~] vmkfstools -P /vmfs/volumes/fbff1b71-b140971e-160d-5c5f543035b8
vfat-0.04 (Raw Major Version: 0) file system spanning 1 partitions.
File system label (if any):
Mode: private
Capacity 261853184 (63929 file blocks * 4096), 108040192 (26377 blocks) avail, max supported file size 0
Disk Block Size: 512/0/0
UUID: fbff1b71-b140971e-160d-5c5f543035b8
Partitions spanned (on "disks"):
        t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L:5
Is Native Snapshot Capable: NO

Determining if the boot devise is available:

[root@AMD-fx:~] esxcli storage core device list -d t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L
t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L
   Display Name: Local ATA Disk (t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L)
   Has Settable Display Name: true
   Size: 95396
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L
   Vendor: ATA
   Model: HTS721010G9SA00
   Revision: C10H
   SCSI Level: 5
   Is Pseudo: false
   Status: on                         
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: false
   Is VVOL PE: false
   Is Offline: false                   <-------------------------------------------------
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.01000000002020202020204d50435a4e375930475a3435324c485453373231
   Is Shared Clusterwide: false
   Is SAS: false
   Is USB: false
   Is Boot Device: true
   Device Max Queue Depth: 31
   No of outstanding IOs with competing worlds: 31
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false

adding VCSA to domain renames the dns suffix to the domain.

The other day, I had a customer having all management applications on a different DNS suffux as that of the domain.

Ie: Domain : ntitta.in

Management host’s: on mgmt.local

on the customer’s setup, the VCSA was deployed with an FQDN VCSA.MGMT.local However, when the appliance was added to domain ntitta.in, the VCSA renames itself to VCSA.ntitta.in

Apparently the likewise scripts on VCSA is set to rename the appliance to the domain suffix. This might cause all sort of strange behaviour/PNID mismatches on normal functionality.

In order to sort this/set this right, we  wanna invoke the domain join script ignoring  the hostname.

 /opt/likewise/bin/domainjoin-cli join --disable hostname domain_name domain_user
Example: 
root@vcsa [ ~ ]# /opt/likewise/bin/domainjoin-cli join --disable hostname ntitta.in nik
Joining to AD Domain:   ntitta.in
With Computer DNS Name: vcsa.mgmt.local

Note that the script acknowledges that it is going to join to join AD with the computer name vcsa.mgmt.local. this is precisely what we want.

VCSA/VCHA 6.x network configuration file

Below is created by VCHA configuration and is sometimes set to DHCP when VCHA is broken.

cat /etc/systemd/network/99-dhcp-en.network
[Match]
Name=e*
[Network]
DHCP=yes
[DHCP]
UseDNS=false
and
10-eth0.network
[Match]
Name=eth0
[Network]
Gateway=172.30.1.101
Address=172.30.1.28/16
DHCP=no
Domains=abc.de
[DHCP]
UseDNS=false

The file can be removed safely and restart the appliance before re-configuring VCHA.

Esxi Root password lock out/ Determining source of last failed ssh login on Esxi

Generally.  Should the root account be locked out, SSH and UI/client access to the host fails. In order to work this around

  • Bring up a Console session to the Host and enable Esxi Shell (under troubleshooting options)
  • on the console session, press ALT+F1,
  • log in as  root and password:
  • In order to unlock the root account and determine the last log on failure, type the below:
    • /sbin/pam_tally2 -r -u root

The root account should now be unlocked. Review the IP listed there to prevent logon(scripted or 3’rd party monitoring)

migrate option grayed out for VM’s on the vCenter view

the migrate option is normal grayed out when there is an ongoing task (clone, backup, snapshot take/consolidate/reconfigure etc) running against the VM)

In certain rare cases, an orphaned DB record could also cause this. From the vCenter server database, Look at the table VPX_DISABLED_METHODS

Select * from VPX_DISABLED_METHODS;

Result:
Select * from VPX_DISABLED_METHODS;
entity_mo_id_val | method_name | source_id_val | reason_id_val
------------------+-------------+---------------+---------------
(0 rows)

IF there are no such task’s and should you find this to be an orphaned entery, the contents of the table may be cleared

Delete from VPX_DISABLED_METHODS where entity_mo_id_val =x;

VMware Power CLI: Could not establish trust relationship for the SSL/TLS secure channel with authority

With the Newer version of Power CLI, the Connect-ViServer fails with message:

Connect-VIServer : 28-04-2018 11:41:42 Connect-VIServer Error: Invalid server certificate. Use
Set-PowerCLIConfiguration to set the value for the InvalidCertificateAction option to Prompt if you'd like to connect
once or to add a permanent exception for this server.
Additional Information: Could not establish trust relationship for the SSL/TLS secure channel with authority
'vc.ntitta.in'.
At line:1 char:1
+ Connect-VIServer vc.ntitta.in
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : SecurityError: (:) [Connect-VIServer], ViSecurityNegotiationException
+ FullyQualifiedErrorId : Client20_ConnectivityServiceImpl_Reconnect_CertificateError,VMware.VimAutomation.ViCore.
Cmdlets.Commands.ConnectVIServer

This can easily be worked around by

  • Importing The VMCA trusted root Certificate
  • Use Set-PowerCLIConfiguration to ignore certs

Importing The VMCA trusted root Certificate to windows Trusted Root Store

  • Extract the ZIP file and import the certificate to windows trusted root store

Use Set-PowerCLIConfiguration to ignore certs

  • on an elevated power CLI, Run the below

Set-PowerCLIConfiguration -Confirm:$false -Scope AllUsers -InvalidCertificateAction Ignore -DefaultVIServerModeSingle

connect to vcenter: