Remount ESXi boot bank

In certain scenarios, ESXi's bootbank/altbootbank boot volumes go offline (symlinks broken). If the boot device is still available, the boot banks can be remounted via the CLI.

To remount the boot banks, run the below command.

localcli --plugin-dir=/usr/lib/vmware/esxcli/int/ boot system restore --bootbanks

To determine if the boot device is available, run through the below:

Determining the boot volume:

[root@AMD-fx:~] ls -ltrh /
total 1161

lrwxrwxrwx    1 root     root          49 Mar 24 05:44 store -> /vmfs/volumes/5c963b7e-a90346aa-0102-0007e9b5fb18
lrwxrwxrwx    1 root     root          49 Mar 24 05:44 bootbank -> /vmfs/volumes/fbff1b71-b140971e-160d-5c5f543035b8     <------------
lrwxrwxrwx    1 root     root          49 Mar 24 05:44 altbootbank -> /vmfs/volumes/29249a7e-c37cdaf0-dbe5-30b1bb5afdd9  <------------
lrwxrwxrwx    1 root     root          49 Mar 24 05:44 scratch -> /vmfs/volumes/5c963b87-d1260d20-cc4a-0007e9b5fb18      <-------------
lrwxrwxrwx    1 root     root          29 Mar 24 05:44 productLocker -> /locker/packages/vmtoolsRepo/
lrwxrwxrwx    1 root     root           6 Mar 24 05:44 locker -> /store

If the symlinks to the UUIDs are not created, look at /var/log/boot.gz to determine why the device was not detected (most likely bad/missing drivers, or a passed-through USB device).
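
Since boot.gz is gzip-compressed, a quick way to read it on the host (a sketch; zcat ships in the ESXi busybox):

zcat /var/log/boot.gz | less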

Determine the boot device (use the UUID of bootbank/altbootbank):

[root@AMD-fx:~] vmkfstools -P /vmfs/volumes/fbff1b71-b140971e-160d-5c5f543035b8
vfat-0.04 (Raw Major Version: 0) file system spanning 1 partitions.
File system label (if any):
Mode: private
Capacity 261853184 (63929 file blocks * 4096), 108040192 (26377 blocks) avail, max supported file size 0
Disk Block Size: 512/0/0
UUID: fbff1b71-b140971e-160d-5c5f543035b8
Partitions spanned (on "disks"):
        t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L:5
Is Native Snapshot Capable: NO

Determining if the boot device is available:

[root@AMD-fx:~] esxcli storage core device list -d t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L
t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L
   Display Name: Local ATA Disk (t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L)
   Has Settable Display Name: true
   Size: 95396
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/t10.ATA_____HTS721010G9SA00_______________________________MPCZN7Y0GZ452L
   Vendor: ATA
   Model: HTS721010G9SA00
   Revision: C10H
   SCSI Level: 5
   Is Pseudo: false
   Status: on                         
   Is RDM Capable: false
   Is Local: true
   Is Removable: false
   Is SSD: false
   Is VVOL PE: false
   Is Offline: false                   <-------------------------------------------------
   Is Perennially Reserved: false
   Queue Full Sample Size: 0
   Queue Full Threshold: 0
   Thin Provisioning Status: unknown
   Attached Filters:
   VAAI Status: unsupported
   Other UIDs: vml.01000000002020202020204d50435a4e375930475a3435324c485453373231
   Is Shared Clusterwide: false
   Is SAS: false
   Is USB: false
   Is Boot Device: true
   Device Max Queue Depth: 31
   No of outstanding IOs with competing worlds: 31
   Drive Type: unknown
   RAID Level: unknown
   Number of Physical Drives: unknown
   Protection Enabled: false
   PI Activated: false
   PI Type: 0
   PI Protection Mask: NO PROTECTION
   Supported Guard Types: NO GUARD SUPPORT
   DIX Enabled: false
   DIX Guard Type: NO GUARD SUPPORT
   Emulated DIX/DIF Enabled: false
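
With the device showing Is Offline: false and Is Boot Device: true, the remount command from the top of this post should succeed; a quick sanity check afterwards (a sketch) is to confirm the symlinks resolve again:

ls -ld /bootbank /altbootbank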

Adding VCSA to a domain renames the DNS suffix to that of the domain

The other day, I had a customer with all management applications on a different DNS suffix than that of the domain.

i.e., Domain: ntitta.in

Management hosts: mgmt.local

On the customer's setup, the VCSA was deployed with the FQDN VCSA.MGMT.local. However, when the appliance was added to the domain ntitta.in, the VCSA renamed itself to VCSA.ntitta.in.

Apparently, the Likewise scripts on the VCSA are set to rename the appliance to the domain suffix. This can cause all sorts of strange behaviour/PNID mismatches in normal functionality.

In order to set this right, we want to invoke the domain join script so that it ignores the hostname:

 /opt/likewise/bin/domainjoin-cli join --disable hostname domain_name domain_user
Example: 
root@vcsa [ ~ ]# /opt/likewise/bin/domainjoin-cli join --disable hostname ntitta.in nik
Joining to AD Domain:   ntitta.in
With Computer DNS Name: vcsa.mgmt.local

Note that the script acknowledges that it is going to join AD with the computer DNS name vcsa.mgmt.local; this is precisely what we want.
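
To double-check the join afterwards, the Likewise tooling can report the current state (a sketch; the query sub-command prints the joined domain and computer name):

/opt/likewise/bin/domainjoin-cli query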

VCSA/VCHA 6.x network configuration file

The file below is created by the VCHA configuration and is sometimes left set to DHCP when VCHA is broken.

cat /etc/systemd/network/99-dhcp-en.network
[Match]
Name=e*
[Network]
DHCP=yes
[DHCP]
UseDNS=false
and
10-eth0.network
[Match]
Name=eth0
[Network]
Gateway=172.30.1.101
Address=172.30.1.28/16
DHCP=no
Domains=abc.de
[DHCP]
UseDNS=false

The 99-dhcp-en.network file can be removed safely; restart the appliance before re-configuring VCHA.
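
A minimal sketch of the cleanup, assuming the appliance file paths shown above (back up the file first so it can be restored if needed):

cp /etc/systemd/network/99-dhcp-en.network /tmp/99-dhcp-en.network.bak
rm /etc/systemd/network/99-dhcp-en.network
reboot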

ESXi root password lockout / Determining the source of the last failed SSH login on ESXi

Generally, should the root account be locked out, SSH and UI/client access to the host fails. In order to work around this:

  • Bring up a console session to the host and enable the ESXi Shell (under Troubleshooting Options)
  • On the console session, press ALT+F1
  • Log in as root with the root password
  • In order to unlock the root account and determine the last logon failure, type the below:
    • /sbin/pam_tally2 -r -u root

The root account should now be unlocked. Review the IP listed in the output to track down and stop the source of the failed logons (scripted or 3rd-party monitoring).
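
To inspect the failure count and source without resetting the counter, omit the -r flag:

/sbin/pam_tally2 -u root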

Migrate option grayed out for VMs in the vCenter view

The migrate option is normally grayed out when there is an ongoing task (clone, backup, snapshot take/consolidate, reconfigure, etc.) running against the VM.

In certain rare cases, an orphaned DB record could also cause this. From the vCenter Server database, look at the table VPX_DISABLED_METHODS:

Select * from VPX_DISABLED_METHODS;

Result:
Select * from VPX_DISABLED_METHODS;
entity_mo_id_val | method_name | source_id_val | reason_id_val
------------------+-------------+---------------+---------------
(0 rows)

If there are no such tasks and you find this to be an orphaned entry, the contents of the table may be cleared:

Delete from VPX_DISABLED_METHODS where entity_mo_id_val =x;
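
After clearing the orphaned rows, restart the vpxd service so vCenter picks up the change (a sketch for the 6.x appliance; service names can differ by version):

service-control --stop vmware-vpxd
service-control --start vmware-vpxd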

VMware PowerCLI: Could not establish trust relationship for the SSL/TLS secure channel with authority

With the newer versions of PowerCLI, Connect-VIServer fails with the message:

Connect-VIServer : 28-04-2018 11:41:42 Connect-VIServer Error: Invalid server certificate. Use
Set-PowerCLIConfiguration to set the value for the InvalidCertificateAction option to Prompt if you'd like to connect
once or to add a permanent exception for this server.
Additional Information: Could not establish trust relationship for the SSL/TLS secure channel with authority
'vc.ntitta.in'.
At line:1 char:1
+ Connect-VIServer vc.ntitta.in
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : SecurityError: (:) [Connect-VIServer], ViSecurityNegotiationException
+ FullyQualifiedErrorId : Client20_ConnectivityServiceImpl_Reconnect_CertificateError,VMware.VimAutomation.ViCore.
Cmdlets.Commands.ConnectVIServer

This can easily be worked around by

  • Importing the VMCA trusted root certificate
  • Using Set-PowerCLIConfiguration to ignore certs

Importing the VMCA trusted root certificate to the Windows trusted root store

  • Download the trusted root CA bundle from the vCenter landing page (https://<vcenter>/certs/download.zip), extract the ZIP file, and import the certificate to the Windows trusted root store

Use Set-PowerCLIConfiguration to ignore certs

  • On an elevated PowerCLI session, run the below

Set-PowerCLIConfiguration -Confirm:$false -Scope AllUsers -InvalidCertificateAction Ignore -DefaultVIServerMode Single

Connect to vCenter:
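
For example, re-running the connection from the error above:

Connect-VIServer vc.ntitta.in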

Replacing vmdir certificates on vCenter 6.0

vmdir is a vCenter component that listens on ports 389 and 636 (LDAP/LDAPS).

We will start by creating a new configuration file called vmdir.cfg with the below content (replace the contents under v3_req with the fields appropriate to your environment):

	[ req ]
	distinguished_name = req_distinguished_name
	encrypt_key = no
	prompt = no
	string_mask = nombstr
	req_extensions = v3_req
	[ v3_req ]
	basicConstraints = CA:false
	keyUsage = nonRepudiation, digitalSignature, keyEncipherment
	subjectAltName = DNS:psc1.domain.com, DNS:psc1, IP: x.x.x.x
	[ req_distinguished_name ]
	countryName = US
	stateOrProvinceName = State
	localityName = City
	0.organizationName = Company
	organizationalUnitName = Department
	commonName = psc1.domain.com

Using openssl, create a new CSR file with the above configuration:

"%VMWARE_OPENSSL_BIN%" req -new -out c:\cert\vmdir.csr -newkey rsa:2048 -keyout c:\cert\vmdir.key -config c:\cert\vmdir.cfg

If the solution user certificates are signed with a CA certificate, sign the CSR with the same issuing CA;
else, sign it using the VMCA with the instructions below.

Signing the CSR with the VMCA certificate.

  • Copy root.cer and privatekey.pem from C:\ProgramData\VMware\vCenterServer\data\vmca
    (appliance: /var/lib/vmware/vmca/) to c:\cert\

Run the below command to sign the certificate:

"%VMWARE_OPENSSL_BIN%" x509 -req -days 3650 -in c:\cert\vmdir.csr -out c:\cert\vmdir_signed.crt -CA c:\cert\root.cer -CAkey c:\cert\privatekey.pem -extensions v3_req -CAcreateserial -extfile c:\cert\vmdir.cfg

Now we have a certificate that can be used to replace the existing vmdir certificates. To proceed with the certificate replacement, stop all vCenter services:

service-control --stop --all

Note: On Windows, you must run this from the path "C:\Program Files\VMware\vCenter Server\bin".

  • Go to the path C:\ProgramData\VMware\vCenterServer\cfg\vmdird (appliance: /usr/lib/vmware-vmdir/share/config/)
  • Back up the original certificates vmdircert.pem and vmdirkey.pem to a temp directory
  • Copy vmdir_signed.crt to vmdircert.pem and vmdir.key to vmdirkey.pem in the above directory

Start all services

service-control --start --all

Note: If the services fail to start (most likely the Inventory Service), it means that the wrong root certificate was used when signing the certificate. Restore the original files in the directory and restart the services to roll back to the previous configuration.
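
Once the services are back up, a quick way to verify that the new certificate is being served on the LDAPS port (a sketch, assuming openssl is available on a client machine):

openssl s_client -connect psc1.domain.com:636 </dev/null | openssl x509 -noout -subject -dates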

Web client service crashes: java.lang.OutOfMemoryError: PermGen space and java.lang.OutOfMemoryError

The vSphere Web Client refuses to start, with memory errors.
Log locations:

Windows: c:\programdata\VMware\vCenter\Logs\vsphere-client
Appliance: /var/log/vmware/vsphere-client
wrapper.log
	INFO | jvm 1 | 2018/04/03 15:34:25 | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "org.springframework.scheduling.timer.TimerFactoryBean#0"
	INFO | jvm 1 | 2018/04/03 15:34:33 |
	INFO | jvm 1 | 2018/04/03 15:34:33 | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "http-bio-9443-exec-10"
	INFO | jvm 1 | 2018/04/03 15:35:12 |
	INFO | jvm 1 | 2018/04/03 15:35:12 | Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "http-bio-9443-exec-6"
	vsphere_client_virgo.log
	[2018-04-03T15:28:45.855-04:00] [ERROR] http-bio-9090-exec-2 com.vmware.vise.util.concurrent.WorkerThread http-bio-9090-exec-2 terminated with exception: java.lang.OutOfMemoryError: PermGen space
	[2018-04-03T15:28:46.773-04:00] [ERROR] http-bio-9090-exec-5 com.vmware.vise.util.concurrent.WorkerThread http-bio-9090-exec-5 terminated with exception: java.lang.OutOfMemoryError: PermGen space

Cause: insufficient web client heap size, insufficient PermGen space, or a configuration change.

Scenario 1:  Heap-size

Resolution:

  • Ensure there is sufficient free memory on the vCenter
free -m
  • Review and increase (double) the heap size of the web client

Appliance:

cloudvm-ram-size -l vsphere-client

Windows:

C:\Program Files\VMware\vCenter Server\visl-integration\usr\sbin\cloudvm-ram-size.bat -l vsphere-client

To increase, use the below (size in MB; see the example after this list):

cloudvm-ram-size.bat -C XXX vsphere-client


  • Start the vsphere-client service and observe if it still crashes.
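
As an example, to roughly double the default for the web client service on the appliance (1024 is an illustrative value in MB, not a recommendation):

cloudvm-ram-size -C 1024 vsphere-client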

Scenario 2: PermGen

  • Take a copy of the file service-layout.mfx as service-layout.mfx.bak
    Appliance path: /etc/vmware/
    Windows path: C:\ProgramData\VMware\vCenterServer\cfg\
  • Edit service-layout.mfx with a text editor
  • Change the MaxPermMB value from 256 to 512 for the vspherewebclientsvc row (increase accordingly depending on the number of plugins configured with the vCenter web client)
  • Start the vsphere-client service

Scenario 3: The problem persists even after increasing/maxing out scenarios 1 and 2.

  • Back up the configuration file before you proceed:
  cp /usr/lib/vmware-vsphere-client/server/wrapper/bin/vsphere-client /usr/lib/vmware-vsphere-client/server/wrapper/bin/vsphere-client.bak
  • Edit the file using a text editor:
vi /usr/lib/vmware-vsphere-client/server/wrapper/bin/vsphere-client
  • Look for the line "RUN_AS_USER=vsphere-client" and comment it out, as shown below
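
After the edit, the line should look like the below (the leading # stops the wrapper from dropping privileges to the vsphere-client user):

#RUN_AS_USER=vsphere-client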

Start vsphere-client service.

Connecting to VMware appliance postgres/PSQL instance from an external computer/pgadmin

By default, the postgres instances on vCenter/vSphere Replication etc. are configured not to accept connections from other computers on the network. In this post, I will show you how to re-configure this to allow connections from an external box, for tools like pgAdmin.

Note: Depending on the appliance, the postgres configuration files/paths might be different. In this post, we will search for the configuration and then change the respective file.

Start by SSH-ing into the appliance.

Type the below command to search for the configuration file: postgresql.conf

find / -iname postgresql.conf

Take a copy of the configuration:

cp /storage/db/vpostgres/postgresql.conf /storage/db/vpostgres/postgresql.conf.backup

Edit the file

vi /storage/db/vpostgres/postgresql.conf

Look for the line that says "listen_addresses = 'localhost'".
In some cases, this will be commented out; remove the hash and replace localhost with * so the line reads as below.
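
The resulting line, for reference:

listen_addresses = '*'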

Save the configuration file (press "Esc", then ":", then type in "wq!").

Search for the postgres client authentication file, pg_hba.conf:

find / -iname pg_hba.conf

Back up the configuration file:

cp /storage/db/vpostgres/pg_hba.conf /storage/db/vpostgres/pg_hba.conf.bak

Edit the file

vi /storage/db/vpostgres/pg_hba.conf

Look for the host entries and add/adjust the below line for your IP subnet:

host all all 192.168.1.0/24 trust   <----------------------------------- here, I am on a 192.168.1.x subnet

The auth method is set to trust (not recommended) as I did not want to log into the DB with a password.

Save the configuration file (press "Esc", then ":", then type in "wq!").

Restart the vmware-vpostgres service:

service vmware-vpostgres restart

For vCenter Server 6.5:

service-control --stop vmware-vpostgres
service-control --start vmware-vpostgres

Confirm the postgres port number and that it is listening (the vSphere Replication appliance listens on a different port! It is best to know which port you need to connect to when accessing from an external box):

netstat -anop | grep postgres

From the above, we know the port is 5432

Launch pgAdmin and add a new server.

Also note that in most cases, the DB credentials are stored in certain configuration files, like:

  • VCDB.properties
find / -iname vcdb.properties
cat /etc/vmware-vpx/vcdb.properties
  • or the .pgpass file in the home directory
ls -ltha ~/

cat ~/.pgpass
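
With the port and credentials in hand, a quick connection test from the external box (a sketch, assuming psql is installed there; the user and database names come from vcdb.properties, VCDB being the default vCenter database):

psql -h <appliance-fqdn> -p 5432 -U vc VCDB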

vCenter Pre-upgrade fails

Error: Internal error occurred during execution of upgrade process.

Resolution: Send upgrade log files to the VMware technical support team for further assistance.

Upgrade logs say:

	less /var/log/vmware/upgrade/bootstrap.log
	2018-03-23T20:14:34.11Z ERROR transport.guestops Invalid command: "/bin/bash" --login -c '/opt/vmware/share/vami/vami_get_network eth0 1>/tmp/vmware-root/exec-vmware47-
	stdout 2>/tmp/vmware-root/exec-vmware235-stderr'
	None
	2018-03-23T20:14:34.12Z ERROR upgrade_commands Unable to execute pre-upgrade checks on host 10.1.0.209
	Traceback (most recent call last):
	File "/usr/lib/vmware/cis_upgrade_runner/bootstrap_scripts/upgrade_commands.py", line 2199, in execute
	preupgradeResult = self._executePreupgradeChecks()
	File "/usr/lib/vmware/cis_upgrade_runner/bootstrap_scripts/upgrade_commands.py", line 2655, in _executePreupgradeChecks
	srcIpv4Address, srcIpv4SubnetMask, srcIpv6Address, srcIpv6Prefix = retrieveNetworkingConfiguration(self.opsManager)
	File "/usr/lib/vmware/cis_upgrade_runner/bootstrap_scripts/transfer_network.py", line 1309, in retrieveNetworkingConfiguration
	interface)
	File "/usr/lib/vmware/cis_upgrade_runner/bootstrap_scripts/apply_networking.py", line 188, in _retrieveNetworkIdentity
	networkConfig = vamiGetNetwork(processManager, interface)
	File "/usr/lib/vmware/cis_upgrade_runner/bootstrap_scripts/apply_networking.py", line 144, in vamiGetNetwork
	output = _execNetworkConfigCommand(processManager, [VAMI_GET_NETWORK_CMD, interface])
	File "/usr/lib/vmware/cis_upgrade_runner/bootstrap_scripts/apply_networking.py", line 66, in _execNetworkConfigCommand
	cr = transport.executeCommand(processManager, cmd)
	File "/usr/lib/vmware/cis_upgrade_runner/libs/sdk/transport/__init__.py", line 122, in executeCommand
	return processManager.pollProcess(processUid, True)
	File "/usr/lib/vmware/cis_upgrade_runner/libs/sdk/proxy.py", line 81, in __call__
	ret = self.func(*args, **kwargs)
	File "/usr/lib/vmware/cis_upgrade_runner/libs/sdk/transport/guestops.py", line 1184, in pollProcess
	self._checkInvalidCommandError(processInfo, stderr)
	File "/usr/lib/vmware/cis_upgrade_runner/libs/sdk/transport/guestops.py", line 1123, in _checkInvalidCommandError
	raise ExecutionException(error, ErrorCode.INVALID_REQUEST)
	ExecutionException: ('Invalid command: "/bin/bash" --login -c \'/opt/vmware/share/vami/vami_get_network eth0 1>/tmp/vmware-root/exec-vmware47-stdout 2>/tmp/vmware-root/
	exec-vmware235-stderr\'', 1)
	2018-03-23T20:14:39.442Z ERROR __main__ ERROR: Fatal error during upgrade REQUIREMENTS. For more details take a look at: /var/log/vmware/upgrade/requirements-upgrade-runner.log
	 

Now look at the source appliance.

	VMware VirtualCenter 6.0.0 build-3339084
	vCenter:~ # ifconfig
	eth0 Link encap:Ethernet HWaddr 00:50:56:AC:53:FD
	inet addr:x.x.x.x Bcast:x.x.x.x Mask:255.255.252.0
	UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
	RX packets:45028984 errors:0 dropped:28266 overruns:0 frame:0
	TX packets:16476384 errors:0 dropped:0 overruns:0 carrier:0
	collisions:0 txqueuelen:1000
	RX bytes:74680502042 (71220.8 Mb) TX bytes:7187692049 (6854.7 Mb)
	lo Link encap:Local Loopback
	inet addr:127.0.0.1 Mask:255.0.0.0
	inet6 addr: ::1/128 Scope:Host
	UP LOOPBACK RUNNING MTU:16436 Metric:1
	RX packets:147809637 errors:0 dropped:0 overruns:0 frame:0
	TX packets:147809637 errors:0 dropped:0 overruns:0 carrier:0
	collisions:0 txqueuelen:0
	RX bytes:93984509789 (89630.6 Mb) TX bytes:93984509789 (89630.6 Mb)

Running /opt/vmware/share/vami/vami_get_network piped to less returns a dependency error:

vCenter:~ # /opt/vmware/share/vami/vami_get_network eth0 1 | less
	/opt/vmware/share/vami/vami_get_network: error while loading shared libraries: libvami-common.so: cannot open shared object file: No such file or directory

To resolve this, restore the library path by appending the VAMI library directory to LD_LIBRARY_PATH in /etc/profile with the below commands, then log out and back in (or run 'source /etc/profile') so the shell picks up the change.

echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/vmware/lib/vami/" >> /etc/profile
echo 'export LD_LIBRARY_PATH' >> /etc/profile

Re-run the command to confirm that it returns the IP details:

/opt/vmware/share/vami/vami_get_network
	vCenter55:~ # /opt/vmware/share/vami/vami_get_network
	interface: eth0
	config_present: true
	config_flags: STATICV4
	config_ipv4addr: 10.1.0.209
	config_netmask: 255.255.252.0
	config_broadcast: 10.1.3.255
	config_gatewayv4:
	config_ipv6addr:
	config_prefix:
	config_gatewayv6: 10
	autoipv6:
	active_ipv4addr: 10.1.0.209
	active_netmask: 255.255.252.0
	active_broadcast: 10.1.3.255
	active_ipv6addr:
	active_prefix:
	active_gatewayv4: 10.1.0.61
	active_gatewayv6:
	hasdhcpv6: 1
	Traceback (most recent call last):
	File "/opt/vmware/share/vami/vami_ovf_process", line 25, in <module>
	import libxml2
	File "/usr/lib64/python2.6/site-packages/libxml2.py", line 1, in <module>
	ImportError: No module named libxml2mod
	managed:

The vami_ovf_process/libxml2 traceback at the end can be ignored.
Re-run the upgrade/migration.