Content library hack (DB)

VMware Content Library is a convenient way to make VM templates and ISOs available across multiple vCenters. However, it does not handle datacenter segregation very well when the content is stored on NFS/VMFS.

A subscribed library would need dedicated storage space and would be pointless in my setup, as I had already presented the NFS volume to all hosts.

So what seems to be the problem?

I have an NFS volume presented to all hosts across several datacenters. Although the NFS UUID and the datastore name (ISO) are the same everywhere, the vCenter database stores the datastore once per datacenter, each with a different datacenter ID, because the entries are segregated by the datacenter object.


  id  |      name      | datacenter_id
 -----+----------------+---------------
  157 | Template       |            77
  158 | SlowBro_400    |            77
  159 | ISO            |            77
  160 | SharedLUN      |            77
  161 | 10.154         |            77
  156 | template       |            77
   13 | SlowBro_legasy |             2
   14 | 10.128         |             2
   91 | ISO            |             2
   16 | Template       |             2
   92 | is-tse-d129_1  |             2
   12 | SlowBro_400    |             2

VCDB=# select * from vpx_entity where id=77;
  id |     name     | type_id | parent_id
 ----+--------------+---------+-----------
  77 | BLR          |       8 |         1
 (1 row)


VCDB=# select * from vpx_entity where id=2;
  id |    name    | type_id | parent_id
 ----+------------+---------+-----------
   2 | HYD        |       8 |         1
 (1 row)

With this configuration, when I created a content library, the DB shows it referencing the datastore of only one site, i.e. we only see ISO (id 159) from the BLR (id 77) datacenter.

What this means is that the content library objects can only be deployed to one datacenter, rather than to all datacenters on the vCenter.

VCDB=# select * from cl_storage;
                  id                  |                          storageuri                          |   type
--------------------------------------+--------------------------------------------------------------+-----------
 a0018db8-f630-4c04-b0f1-30c900ad691c | Datastore:datastore-159:481639ff-d88d-4622-8872-ec6856e6b157  | Datastore
 bf8e8dcb-5b28-4b03-863e-89308bc8c501 | Datastore:datastore-157:481639ff-d88d-4622-8872-ec6856e6b157 | Datastore


The above table is referenced by cl_library_storage:
VCDB=# select * from cl_library_storage;
              library_id              |              storage_id
--------------------------------------+--------------------------------------
 3336e2ad-8166-4e6a-850d-a9d81c41ba01 | a0018db8-f630-4c04-b0f1-30c900ad691c
 946cea65-5cd0-41e0-83ab-17259f690ce1 | bf8e8dcb-5b28-4b03-863e-89308bc8c501

So, logically, if I add the other datastore ID for ISO to the cl_storage table and create a referencing record in cl_library_storage, then we should be able to use the same content library across all datacenters.

The new ID must be unique and must match across the two tables. I added the records below (I derived the new ID by incrementing the last digit of an existing one).

After the change:
VCDB=# select * from cl_storage;
                  id                  |                          storageuri                          |   type
--------------------------------------+--------------------------------------------------------------+-----------
 a0018db8-f630-4c04-b0f1-30c900ad691c | Datastore:datastore-159:481639ff-d88d-4622-8872-ec6856e6b157  | Datastore
 bf8e8dcb-5b28-4b03-863e-89308bc8c501 | Datastore:datastore-157:481639ff-d88d-4622-8872-ec6856e6b157 | Datastore
 bf8e8dcb-5b28-4b03-863e-89308bc8c502 | Datastore:datastore-11:481639ff-d88d-4622-8872-ec6856e6b157  | Datastore

VCDB=# select * from cl_library_storage;
              library_id              |              storage_id
--------------------------------------+--------------------------------------
 3336e2ad-8166-4e6a-850d-a9d81c41ba01 | a0018db8-f630-4c04-b0f1-30c900ad691c
 946cea65-5cd0-41e0-83ab-17259f690ce1 | bf8e8dcb-5b28-4b03-863e-89308bc8c501
 946cea65-5cd0-41e0-83ab-17259f690ce1 | bf8e8dcb-5b28-4b03-863e-89308bc8c502

After adding the above records, I am now able to deploy VMs from the content library across datacenters.

The content library DB schema can be found here:
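The post shows the tables before and after, but not the statements themselves. Here is a minimal Python sketch that generates the two INSERTs. It assumes the column names are exactly those shown in the select output above (id, storageuri, type and library_id, storage_id). The function and parameter names are my own, and uri_suffix is simply the trailing UUID copied from the existing storageuri rows. Treat it as an illustration, not a supported procedure, and back up the VCDB first.

```python
import re
import uuid


def build_storage_inserts(library_id, datastore_moid, uri_suffix, storage_id=None):
    """Return (storage_id, [sql, sql]) that link one more datastore to a library.

    All names here are illustrative; the column names are assumed from the
    'select *' output of cl_storage and cl_library_storage.
    """
    if not re.fullmatch(r"datastore-\d+", datastore_moid):
        raise ValueError("expected a managed object ID such as 'datastore-11'")
    # uuid.UUID() rejects anything that is not a UUID, which also keeps
    # stray quotes out of the generated SQL.
    library_id = str(uuid.UUID(library_id))
    uri_suffix = str(uuid.UUID(uri_suffix))
    storage_id = str(uuid.UUID(storage_id)) if storage_id else str(uuid.uuid4())

    storage_uri = f"Datastore:{datastore_moid}:{uri_suffix}"
    statements = [
        "INSERT INTO cl_storage (id, storageuri, type) "
        f"VALUES ('{storage_id}', '{storage_uri}', 'Datastore');",
        "INSERT INTO cl_library_storage (library_id, storage_id) "
        f"VALUES ('{library_id}', '{storage_id}');",
    ]
    return storage_id, statements


if __name__ == "__main__":
    # Values taken from the tables in this post.
    _, sql = build_storage_inserts(
        "946cea65-5cd0-41e0-83ab-17259f690ce1",
        "datastore-11",
        "481639ff-d88d-4622-8872-ec6856e6b157",
    )
    print("\n".join(sql))
```

Leaving storage_id unset generates a random UUID, which avoids having to hand-increment an existing ID to keep it unique.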

/usr/lib/vmware-content-library/support/scripts/db/PostgreSQL/cls_unified/cls60.sql

vCenter REST API returns 404 com.vmware.vapi.rest.httpNotFound

The other day someone reached out to me about a vCenter REST API issue. Apparently, any REST API call to a vCenter component, whether from the API explorer or the CLI, would return the error:

com.vmware.vapi.rest.httpNotFound

CLI: generate a session from a bash shell (or simply grab the complete command from the API explorer):

**edit the below**
 VC_ADDRESS=vcenter.domain.local
 VC_USER='<sso-user@sso-domain>'
 VC_PASSWORD=password

**do not change the below**
curl -u "$VC_USER:$VC_PASSWORD" \
    -X POST \
    -k --cookie-jar cookies.txt \
    "https://$VC_ADDRESS/rest/com/vmware/cis/session"

**The cookies file should now contain a session ID; use this session in the upcoming commands.**

Test vAPI using bash:

root@nvcsa-01 [ /tmp ]# curl -X GET --header 'Accept: application/json' --header 'vmware-api-session-id: 71b32ba6c59bc4bc284757b2a0d6e525' 'https://vcsa/rest/vcenter/cluster'
{"name":"com.vmware.vapi.rest.httpNotFound","localizableMessages":[{"defaultMessage":"Not found.","id":"com.vmware.vapi.rest.httpNotFound"}],"majorErrorCode":404}
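For scripting the same two calls, here is a rough Python equivalent using only the standard library. The endpoint path and the vmware-api-session-id header are the ones used in the curl commands above. The function names are mine, and verify_tls=False mirrors curl's -k, so only use it against a lab or a vCenter you trust.

```python
import base64
import json
import ssl
import urllib.request


def session_request(vc_address, user, password):
    """Build the POST that creates a session (same call as the curl above)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{vc_address}/rest/com/vmware/cis/session",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def api_request(vc_address, session_id, path):
    """Build a GET against /rest<path> that carries the session ID header."""
    return urllib.request.Request(
        f"https://{vc_address}/rest{path}",
        headers={
            "Accept": "application/json",
            "vmware-api-session-id": session_id,
        },
    )


def send(request, verify_tls=False):
    """Send a request and decode the JSON body; verify_tls=False mirrors curl -k."""
    context = ssl.create_default_context()
    if not verify_tls:
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


# Usage (host name and credentials are placeholders):
#   session_id = send(session_request("vcenter.domain.local", user, password))["value"]
#   clusters = send(api_request("vcenter.domain.local", session_id, "/vcenter/cluster"))
```

On the broken vCenter the second call raises urllib.error.HTTPError with code 404, which is the same httpNotFound response shown above.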

Clearly, something is wrong at this point. The URL in question is under https://vcsa_URL/rest/vcenter

Looking at the rhttpproxy configuration:

root@VCSA [ /etc/vmware-rhttpproxy/endpoints.conf.d ]# grep rest *.conf
vapi-endpoint.conf:/rest local 12346 redirect allow
vapi-endpoint.conf:/site/rest local 12346 redirect allow

Looking at the process:

root@vcsa [ /etc/vmware-rhttpproxy/endpoints.conf.d ]# netstat -anop | grep -i listen | grep 12346
tcp6       0      0 ::1:12346               :::*                    LISTEN      11956/vmware-vapi-e off (0.00/0/0)
tcp6       0      0 127.0.0.1:12346         :::*                    LISTEN      11956/vmware-vapi-e off (0.00/0/0)
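If netstat is not handy, the same check can be done by simply trying to connect. A small sketch, where the helper name is my own and 12346 is the port from the rhttpproxy config above:

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and an unavailable address family.
        return False


if __name__ == "__main__":
    # Check both loopback addresses the vAPI endpoint is expected to bind to.
    for host in ("127.0.0.1", "::1"):
        print(host, is_listening(host, 12346))
```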

So it is clear that the service responsible for REST is vAPI, and that the service is running.
Looking at the logs did not reveal anything out of the ordinary. In fact, they reported nothing aside from the below:

2019-11-27T02:45:19.623+08:00 | INFO  | state-manager1            | DefaultStateManager            | Invoking http-server
2019-11-27T02:45:19.624+08:00 | INFO  | state-manager1            | BaseServerBuilder              | Creating endpoint with name 'default' on address(es): 127.0.0.1, ::1 with port: 12346
2019-11-27T02:45:19.682+08:00 | WARN  | state-manager1            | BaseServerBuilder              | Failed to bind /0:0:0:0:0:0:0:1:12346 while testing the endpoint validity
java.net.SocketException: Protocol family unavailable
        at java.net.PlainSocketImpl.socketBind(Native Method)
        at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
        at java.net.Socket.bind(Socket.java:644)
        at com.vmware.vapi.endpoint.http.BaseServerBuilder.isPortAccessible(BaseServerBuilder.java:172)
        at com.vmware.vapi.endpoint.http.BaseServerBuilder.trimInvalidEndpoints(BaseServerBuilder.java:147)
        at com.vmware.vapi.endpoint.http.BaseServerBuilder.populateEndpointSettings(BaseServerBuilder.java:183)
        at com.vmware.vapi.endpoint.http.BaseServerBuilder.createServer(BaseServerBuilder.java:233)
        at com.vmware.vapi.endpoint.http.BaseServerBuilder.buildInitial(BaseServerBuilder.java:75)
        at com.vmware.vapi.state.impl.DefaultStateManager.build(DefaultStateManager.java:354)
        at com.vmware.vapi.state.impl.DefaultStateManager$1.doInitialConfig(DefaultStateManager.java:168)
        at com.vmware.vapi.state.impl.DefaultStateManager$1.run(DefaultStateManager.java:151)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

This made it clear to me that the endpoint “default” had problems with a port bind (bind /0:0:0:0:0:0:0:1:12346), so I took a look at the configuration file /etc/vmware-vapi/endpoint.properties and removed ::1 from the config. (To be honest, I don’t think this was the problem, since all the other web apps load and are accessible via REST.)

So I bumped up the vAPI logging (instructions here). With trivia logging enabled, I found the below when the REST calls failed:

2019-11-27T04:21:10.398+08:00 | DEBUG | vAPI-I/O dispatcher-1     | JsonServerConnection           | Sending JSON response of size 50
2019-11-27T04:21:21.213+08:00 | DEBUG | jetty-default-35          | RequestDispatcher              | method=GET, uriInfo=/vcenter/datacenter
2019-11-27T04:21:21.214+08:00 | DEBUG | jetty-default-35          | UriLocatorImpl                 | Matched uriTemplates are not found for requesturi = /vcenter/datacenter
2019-11-27T04:21:21.215+08:00 | DEBUG | jetty-default-35          | RestMainServlet                | Failed to process request.
RestException [majorErrorCode=404, messageId=com.vmware.vapi.rest.MessageId@1721d534, params=[], message=null]
        at com.vmware.vapi.rest.RequestDispatcher.dispatch(RequestDispatcher.java:64)
        at com.vmware.vapi.endpoint.servlet.rest.RestMainServlet.service(RestMainServlet.java:44)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at com.vmware.vapi.endpoint.common.ProxyServlet.service(ProxyServlet.java:50)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
        at org.eclipse.jetty.servlets.DoSFilter.doFilterChain(DoSFilter.java:471)
        at org.eclipse.jetty.servlets.DoSFilter.doFilter(DoSFilter.java:323)
        at org.eclipse.jetty.servlets.DoSFilter.doFilter(DoSFilter.java:293)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at com.vmware.vapi.endpoint.http.RequestSizeFilter.doFilter(RequestSizeFilter.java:59)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:258)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:748)

Okay, so now we are getting somewhere. “Matched uriTemplates are not found for requesturi = /vcenter/datacenter”
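With debug logging on, the log gets noisy fast. A short sketch for pulling out just the request URIs that failed to match a template. The message text is the one from the log line above; the function name is mine, and you can pass it any iterable of lines (for example an open log file).

```python
import re
from collections import Counter

# Message emitted by UriLocatorImpl when no REST template matches the request.
UNMATCHED = re.compile(r"Matched uriTemplates are not found for requesturi = (\S+)")


def unmatched_uris(lines):
    """Count how often each request URI failed to match a REST template."""
    counts = Counter()
    for line in lines:
        match = UNMATCHED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

If every URI under /vcenter shows up here while other prefixes never do, that points at a missing registration rather than a bad request.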

I quickly installed JXplorer and reviewed the endpoints for vAPI (configuration>site>%site_name%>lookupService>ServiceRegistrations).
Specifically, I was looking for the endpoints registered as serviceType.product=com.vmware.cis serviceType.type=cs.vapi

Under the vCenter’s vAPI instance, you want to find an endpoint with .protocol=vapi.json.http and ID com.vmware.vapi.vcenter. That endpoint was missing on the problematic vCenter.

Note that the endpoint name does not really matter. On the problematic vCenter, endpoint9 existed, but it was registered for a different endpoint ID.

So what is the difference?

I exported the vAPI service registration of both the working and the non-working instance as a spec using the command below (replace the ID with the service ID on your instance):

/usr/lib/vmidentity/tools/scripts/lstool.py get --url "https://localhost/lookupservice/sdk" --id "f8ab1f69-c73a-4d94-9ac8-7bf85308954e" --no-check-cert --as-spec > /tmp/spec.txt

I observed that the below endpoint was missing on the broken vCenter:

endpoint9.type.id=com.vmware.vapi.vcenter  <----------------------------
endpoint9.url=http://localhost:12346/vcenter
endpoint9.ssltrust0=
endpoint9.data0.key=com.vmware.vapi.metadata.authentication.remote
endpoint9.data0.value=http://localhost:12346/vcenter
endpoint9.data1.key=com.vmware.vapi.metadata.metamodel.remote
endpoint9.data1.value=http://localhost:12346/vcenter
endpoint9.data2.key=com.vmware.vapi.metadata.cli.remote
endpoint9.data2.value=http://localhost:12346/vcenter
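Comparing two specs by eye is error-prone because the endpoint numbers differ between instances. A rough sketch that compares by endpoint type ID instead. It assumes the spec uses endpointN.type.id=... lines as in the excerpt above; the function names are mine.

```python
import re

# Matches lines such as: endpoint9.type.id=com.vmware.vapi.vcenter
ENDPOINT_TYPE = re.compile(r"^endpoint(\d+)\.type\.id=(\S+)")


def endpoint_types(spec_text):
    """Map endpoint type ID -> endpoint number for one exported spec."""
    types = {}
    for line in spec_text.splitlines():
        match = ENDPOINT_TYPE.match(line.strip())
        if match:
            types[match.group(2)] = int(match.group(1))
    return types


def missing_types(working_spec, broken_spec):
    """Type IDs registered in the working spec but absent from the broken one."""
    return sorted(set(endpoint_types(working_spec)) - set(endpoint_types(broken_spec)))
```

Feed it the contents of the two exported spec files; because it keys on the type ID, it does not matter that endpoint9 means something different on each vCenter.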

In the broken vCenter's vAPI endpoint spec, I added the above as endpoint12 (11 was already in use), saved the file and re-imported it using the command below:

/usr/lib/vmidentity/tools/scripts/lstool.py reregister --spec /tmp/spec.txt --url https://localhost/lookupservice/sdk --user "<sso-admin-user>" --password "Admin!23" --id "f8ab1f69-c73a-4d94-9ac8-7bf85308954e" --no-check-cert

Note that the empty “endpoint9.ssltrust0=” value must be filled in with the corresponding contents found in the spec file.
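Renumbering the copied block by hand (endpoint9 to endpoint12 in my case) is easy to get wrong, so here is a small sketch for that step. Both helpers are my own names and only assume the endpointN. key prefix seen in the spec excerpt.

```python
import re


def next_free_index(spec_text):
    """Return the lowest endpoint number above every number already in the spec."""
    used = {int(n) for n in re.findall(r"^endpoint(\d+)\.", spec_text, re.MULTILINE)}
    return max(used, default=-1) + 1


def renumber_endpoint(block, old_index, new_index):
    """Rewrite the 'endpoint<old>.' key prefix on every line of a copied block."""
    prefix = re.compile(rf"^endpoint{old_index}\.", re.MULTILINE)
    return prefix.sub(f"endpoint{new_index}.", block)
```

Only the key prefix at the start of each line is rewritten; URLs and values are left untouched.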

I restarted the vAPI service and voilà! The REST API for vCenter started to work.

Looking for a specific record on an MSSQL DB

Create a stored procedure using the below:

CREATE PROCEDURE FindMyData_String
    @DataToFind NVARCHAR(4000),
    @ExactMatch BIT = 0
AS
SET NOCOUNT ON

DECLARE @Temp TABLE(RowId INT IDENTITY(1,1), SchemaName sysname, TableName sysname, ColumnName SysName, DataType VARCHAR(100), DataFound BIT)

    INSERT  INTO @Temp(TableName,SchemaName, ColumnName, DataType)
    SELECT  C.Table_Name,C.TABLE_SCHEMA, C.Column_Name, C.Data_Type
    FROM    Information_Schema.Columns AS C
            INNER Join Information_Schema.Tables AS T
                ON C.Table_Name = T.Table_Name
        AND C.TABLE_SCHEMA = T.TABLE_SCHEMA
    WHERE   Table_Type = 'Base Table'
            And Data_Type In ('ntext','text','nvarchar','nchar','varchar','char')


DECLARE @i INT
DECLARE @MAX INT
DECLARE @TableName sysname
DECLARE @ColumnName sysname
DECLARE @SchemaName sysname
DECLARE @SQL NVARCHAR(4000)
DECLARE @PARAMETERS NVARCHAR(4000)
DECLARE @DataExists BIT
DECLARE @SQLTemplate NVARCHAR(4000)

-- Double any single quotes in @DataToFind so the search term cannot break the dynamic SQL.
SELECT  @SQLTemplate = CASE WHEN @ExactMatch = 1
                            THEN 'If Exists(Select *
                                          From   ReplaceTableName
                                          Where  Convert(nVarChar(4000), [ReplaceColumnName])
                                                       = ''' + REPLACE(@DataToFind, '''', '''''') + '''
                                          )
                                     Set @DataExists = 1
                                 Else
                                     Set @DataExists = 0'
                            ELSE 'If Exists(Select *
                                          From   ReplaceTableName
                                          Where  Convert(nVarChar(4000), [ReplaceColumnName])
                                                       Like ''%' + REPLACE(@DataToFind, '''', '''''') + '%''
                                          )
                                     Set @DataExists = 1
                                 Else
                                     Set @DataExists = 0'
                            END,
        @PARAMETERS = '@DataExists Bit OUTPUT',
        @i = 1

SELECT @i = 1, @MAX = MAX(RowId)
FROM   @Temp

WHILE @i <= @MAX
    BEGIN
        SELECT  @SQL = REPLACE(REPLACE(@SQLTemplate, 'ReplaceTableName', QUOTENAME(SchemaName) + '.' + QUOTENAME(TableName)), 'ReplaceColumnName', ColumnName)
        FROM    @Temp
        WHERE   RowId = @i


        PRINT @SQL
        EXEC SP_EXECUTESQL @SQL, @PARAMETERS, @DataExists = @DataExists OUTPUT

        IF @DataExists =1
            UPDATE @Temp SET DataFound = 1 WHERE RowId = @i

        SET @i = @i + 1
    END

SELECT  SchemaName,TableName, ColumnName
FROM    @Temp
WHERE   DataFound = 1
GO

Now execute the stored procedure with the string you would like to search the DB for (the second argument is the exact-match flag):

exec FindMyData_string 'FindME', 0
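To make the loop easier to follow, here is a Python sketch of the statement the procedure builds for each text column. It is only an illustration of the template substitution, with single quotes in the search term doubled and identifiers bracket-quoted; the schema, table and column in the usage are made up.

```python
def quote_name(identifier):
    """Bracket-quote an identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def build_probe(schema, table, column, needle, exact=False):
    """Return the per-column existence check that sets @DataExists."""
    literal = needle.replace("'", "''")  # escape quotes inside the string literal
    predicate = f"= N'{literal}'" if exact else f"LIKE N'%{literal}%'"
    return (
        f"IF EXISTS (SELECT * FROM {quote_name(schema)}.{quote_name(table)} "
        f"WHERE CONVERT(NVARCHAR(4000), {quote_name(column)}) {predicate}) "
        "SET @DataExists = 1 ELSE SET @DataExists = 0"
    )


if __name__ == "__main__":
    # Hypothetical schema/table/column, just to show the generated text.
    print(build_probe("dbo", "ExampleTable", "ExampleColumn", "FindME"))
```

One such statement runs per text column in the database, so expect the search to take a while on a large DB.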

Ubuntu 18.04: getting VMware guest customization to work

Ubuntu 18.x ships with cloud-init/netplan by default, which breaks guest customization when a VM is deployed with a vCenter customization spec. In this post, I’ll show you how to get the customization to work with vCenter.

On a fresh install of Ubuntu 18.04, create a bash script with the below contents (mine was set up using DHCP):

cleanup.sh

sudo cloud-init clean --logs
sudo touch /etc/cloud/cloud-init.disabled
sudo rm -rf /etc/netplan/50-cloud-init.yaml
sudo apt purge cloud-init -y
sudo apt autoremove -y


# Don't clear /tmp
sudo sed -i 's/D \/tmp 1777 root root -/#D \/tmp 1777 root root -/g' /usr/lib/tmpfiles.d/tmp.conf

# Remove cloud-init and rely on dbus for open-vm-tools
sudo sed -i 's/Before=cloud-init-local.service/After=dbus.service/g' /lib/systemd/system/open-vm-tools.service



# cleanup current ssh keys so templated VMs get fresh key
# sudo rm -f /etc/ssh/ssh_host_*

# add check for ssh keys on reboot...regenerate if neccessary
sudo tee /etc/rc.local >/dev/null <<EOL
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#

# By default this script does nothing.
# test -f /etc/ssh/ssh_host_dsa_key || dpkg-reconfigure openssh-server
# exit 0
EOL

# make the script executable
sudo chmod +x /etc/rc.local

# cleanup apt
sudo apt clean

# reset the machine-id (DHCP leases in 18.04 are generated based on this... not MAC...)
echo "" | sudo tee /etc/machine-id >/dev/null

# disable swap for K8s
sudo swapoff --all
sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab

# cleanup shell history and shutdown for templating
history -c
history -w
sudo shutdown -h now
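The sed one-liner that disables swap is the least obvious line in the script, so here is the same transformation sketched in Python purely to show what it does: any fstab line whose fields contain swap gets exactly one leading #, and lines that are already commented stay as they are. The sample entries in the usage are illustrative.

```python
import re


def comment_out_swap(fstab_text):
    """Mimic: sed -r '/\\sswap\\s/s/^#?/#/' on the given fstab contents."""
    result = []
    for line in fstab_text.splitlines():
        if re.search(r"\sswap\s", line):
            # Replace an optional leading '#' with a single '#', so the
            # operation is safe to run more than once.
            line = re.sub(r"^#?", "#", line, count=1)
        result.append(line)
    return "\n".join(result)


if __name__ == "__main__":
    sample = "UUID=abc / ext4 defaults 0 1\n/swap.img none swap sw 0 0"
    print(comment_out_swap(sample))
```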

Note: copy-paste can sometimes change the special characters. Should that be the case, please use this link to download the file:

Once the script has run, the VM should power off automatically. Convert the VM to a template, then test by deploying it with a guest customization spec.

Note: do not run the commands directly from a PuTTY/shell session. In some cases I’ve noticed the VM's network configuration goes blank while netplan is being removed, taking the VM off the network.

Always invoke the above via the bash script, local to the guest OS.

Any host/VM task performed on vCenter fails with “A general system error occurred: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections”

“A general system error occurred: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections”

logs

Vpxd logs

2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] [UpdateValuesInt] Updating stored value for property at index 2
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.cancelable.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.error.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.state.
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=vpxLro opID=27cffcd2] [VpxLRO] -- FINISH task-101413
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=Default opID=27cffcd2] [VpxLRO] -- ERROR task-101413 -- vm-1889 -- vim.VirtualMachine.powerOn: vmodl.fault.SystemError:
--> Result:
--> (vmodl.fault.SystemError) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    reason = "Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections."
-->    msg = ""
--> }
--> Args:
-->
--> Arg host:
-->
---------
---------
---------
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature Reference URI: `#_cae765cb-f129-42d3-9387-423e307ed6f2' ; is-valid: true
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Missing reference count: 0
2019-08-28T16:19:51.239-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature SignedInfo: is-valid: true
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] Successfully acquired token: SamlToken [subject={Name: vpxd-5b47a55c-75af-455c-979f-83eb915e7a61; Domain:vsphere.local}, groups=[{Name: Users; Domain:vsphere.local}, {Name: SolutionUsers; Domain:vsphere.local}, {Name: SystemConfiguration.Administrators; Domain:vsphere.local}, {Name: ComponentManager.Administrators; Domain:vsphere.local}, {Name: LicenseService.Administrators; Domain:vsphere.local}, {Name: Everyone; Domain:vsphere.local}], delegationChain=[], startTime=2019-08-28 23:19:51.204, expirationTime=2019-08-29 07:19:51.204, renewable=false, delegable=false, isSolution=true,confirmationType=1]
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000001 opID=27cffcd2-01] [PopPendingConnection] No pending connections to <cs p:00007ff888079eb0, SsoCustomConnectionSpec:vcenter-hp.vsphere.local:443>
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] END operation SecurityTokenServiceImpl::AcquireTokenByCertificate
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=[SSO][SsoWrapperImpl] opID=27cffcd2-01] [AcquireToken] Token acquired successfully.
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000211 opID=27cffcd2-01] [IncConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> incremented to 1
2019-08-28T16:19:51.239-07:00 warning vpxd[05398] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007ff828296790, h:86, <TCP '127.0.0.1 : 55994'>, <TCP '127.0.0.1 : 8190'>>, e: 111(Connection refused)
2019-08-28T16:19:51.239-07:00 trivia vpxd[05398] [Originator@6876 sub=Default] Setting error in state 1 : N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 trivia vpxd[05398] [Originator@6876 sub=HttpConnectionPool-000211] [DecConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> dec to 0
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=pbm opID=27cffcd2-01] [ConnectLocked] Failed to login to service: N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Get exception while executing action vpx.vmprov.CheckCompatibility: N7Vmacore9ExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.241-07:00 info vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Workflow context:
--> (vpx.vmprov.MigrateContext) {
-->    cbData = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "workflow.startTime",
-->          value = 5013023961
-->       },
-->       (vmodl.KeyAnyValue) {
-->          key = "pbmPreCheckSkipped",
-->          value = true

From the above snippet, it appears the connection to vCenter port 8190 was being refused. As per the VMware docs, port 8190 is used by Profile-Driven Storage, so we take a look at the Profile-Driven Storage log:
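The line that gives the port away is easy to miss in a trivia-level vpxd log. A sketch that pulls every refused connection target out of the log, based on the "Failed to connect socket" line format seen above. The function name is mine; pass it any iterable of log lines.

```python
import re
from collections import Counter

# The last <TCP 'host : port'> before '>>' is the remote side of the socket.
REFUSED = re.compile(
    r"Failed to connect socket;.*<TCP '(\S+) : (\d+)'>>, e: \d+\((.+?)\)"
)


def refused_targets(lines):
    """Count failed socket connects per (host, port, reason)."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            host, port, reason = match.groups()
            counts[(host, int(port), reason)] += 1
    return counts
```

A single port dominating the output tells you which local service to look at next.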


Sps.log

2019-08-28T16:25:31.402-07:00 [main] INFO  opId=sps-Main-34727-852 com.vmware.vim.storage.common.util.PropertiesWrapper - Ignoring missing property file sps-ext.properties
2019-08-28T16:25:31.402-07:00 [main] ERROR opId=sps-Main-34727-852 com.vmware.sps.util.SpsConfiguration - Error reading the configuration file: java.lang.NumberFormatException: null

At this stage, the service refused to start, pointing to an invalid entry in the configuration file. I took a look at sps.properties and it contained only 2 lines, unlike the file on a working setup.

To resolve the service startup issue, I copied sps.properties over from a working box (no changes made). I have listed the contents of this file below:

sps.properties

# IMPORTANT: To edit an entry in this file, create sps-ext.properties and specify the required key/value details.
#
# sps server port configuration
#
sps.http.port = 21000
sps.https.port = 21100
# sps server instance GUID
sps.serverGuid = ##SPS_SERVER_GUID##
# Service extension key registered with VC
sps.extensionKey = com.vmware.vim.sps
# Re-connect config to VC
# If true, SPS will retry connection to VC until success
sps.vcConnection.infiniteAttempt = false
# If infiniteAttempt is false, SPS will try to connect to VC until the number specified by attemptNumber
sps.vcConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.vcConnection.sleepInterval = 60
# Re-connect config to QS
# If true, SPS will retry connection to QS until success
sps.qsConnection.infiniteAttempt = true
# If infiniteAttempt is false, SPS will try to connect to QS until the number specified by attemptNumber
sps.qsConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.qsConnection.sleepInterval = 60
sps.queryFile = sps-xqueries.xml
sps.overWriteQsData = false
# Time in seconds to wait for the internal compliance tasks.
sps.compliance.complianceTaskWaitTime = 300
# Time in milliseconds to check for task completion for each policy blob.
sps.compliance.complianceTaskCheckInterval = 100
# VC Server GUID
vpxd.vcGuid = C89B6A4D-489E-435E-97C6-847E892F254F
# number of retries when connecting to kv service (Set -1 for infinite attempts)
sps.connectionRetryAttempts = -1
# retry intervals when connecting to kv service in seconds
sps.connectionRetryInterval = 10
# Time in seconds to wait before retrying sync policy.
sps.syncPolicy.retryWaitTime = 60
# Thread pool queue size for all sps tasks
spbm.threadpool.queueSize = 100
# Thread pool keepAlive timeout in seconds for all sps tasks
spbm.threadpool.keepAlive = 10
# Thread pool config for profile
spbm.profile.threadpool.corePoolSize = 5
spbm.profile.threadpool.maxPoolSize = 32
# Thread pool config for policy blob
spbm.policyBlob.threadpool.corePoolSize = 10
spbm.policyBlob.threadpool.maxPoolSize = 32
# Thread pool config for vendor provider
spbm.vendorProvider.threadpool.corePoolSize = 10
spbm.vendorProvider.threadpool.maxPoolSize = 32
# Thread pool config for vcquery related tasks
spbm.vcquery.threadpool.corePoolSize = 10
spbm.vcquery.threadpool.maxPoolSize = 32
# Thread pool config for VLSI thread pool
# There are two modes, auto which is computed and assigned during runtime
# and manual which can be assigned manually by setting in sps-ext.properties
spbm.vlsi.threadpool.config = auto
spbm.vlsi.threadpool.corePoolSize.manual = 10
spbm.vlsi.threadpool.corePoolSize.auto = 10
spbm.vlsi.threadpool.maxPoolSize = 50 
spbm.vlsi.threadpool.queueSize = 50
# Thread pool config for generic SPS
spbm.generic.threadpool.corePoolSize = 5
spbm.generic.threadpool.maxPoolSize = 32 
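A NumberFormatException: null usually means a numeric property could not be read at all, so the quickest check is to see which keys the broken file lacks compared to a good one. A minimal sketch, assuming the simple key = value format shown above; the function names are my own.

```python
def property_keys(text):
    """Return the set of keys in a simple 'key = value' properties file."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(working_text, broken_text):
    """Keys present in the working file but absent from the broken one."""
    return sorted(property_keys(working_text) - property_keys(broken_text))
```

Run it against the contents of both files before overwriting the broken one. Note that vpxd.vcGuid in the listing looks instance-specific, so it is worth checking that value after copying the file across.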

Enable TFTP on VCSA

Start TFTP service

service atftpd start

Allow TFTP port on the VCSA firewall

iptables -A port_filter -m state --state NEW -i eth0 -p udp --dport 69 -j ACCEPT

Confirm if the port is allowed on the firewall

iptables -nL | grep 69


Make the firewall rules persistent:

Export the iptables rules:

iptables-save > /etc/iptables.rules

Create a startup script at path: /etc/init.d/startftp.sh with the below contents:

#! /bin/sh
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd

### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO

service atftpd start
iptables-restore -c < /etc/iptables.rules

Change the permissions of the script:

chmod +x /etc/init.d/startftp.sh

Set the script to run during startup:

chkconfig --add /etc/init.d/startftp.sh

Copy the TFTP contents from autodeploy_zip to /var/lib/tftpboot.
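To confirm from another machine that the service and the firewall rule both work, you can send a TFTP read request and see whether anything answers. A sketch based on the RFC 1350 packet layout (opcode 1 = RRQ, 3 = DATA, 5 = ERROR). The function names are mine, and the file name in the usage is an assumption about what your tftpboot directory contains.

```python
import socket
import struct


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return struct.pack("!H", 1) + filename.encode("ascii") + b"\0" + mode.encode("ascii") + b"\0"


def tftp_probe(host, filename, port=69, timeout=3.0):
    """Send one RRQ and report what kind of reply (if any) comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (host, port))
            data, _ = sock.recvfrom(516)  # 4-byte header + 512-byte block
        except OSError:
            return "no reply"  # timeout or network error
    if len(data) < 2:
        return "malformed reply"
    opcode = struct.unpack("!H", data[:2])[0]
    return {3: "data", 5: "error"}.get(opcode, f"opcode {opcode}")


# Usage (host and file name are placeholders for your environment):
#   print(tftp_probe("vcsa.domain.local", "undionly.kpxe.vmw-hardwired"))
```

Both "data" and "error" (for example, file not found) prove the daemon is reachable through the firewall; "no reply" means the service or the rule still needs attention.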

ESXi: inodes full

Use the commands below to list, and then delete, the stale (dangling) symlinks that are consuming inodes:

for f in $(find /var/run/vmware -type l); do if [ ! -e "$f" ]; then echo "$f"; fi; done > /tmp/suspect

 find /var/run/vmware -type l | while read f; do if [ ! -e "$f" ]; then rm -f "$f"; fi; done
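The same check as a Python sketch, in case you prefer to review the list before deleting anything. It only reports; the function name is mine, and /var/run/vmware is the path from the commands above.

```python
import os


def dangling_symlinks(root):
    """Return symlinks under root whose targets no longer exist."""
    found = []
    # os.walk does not follow symlinks by default, so links to directories
    # are reported in dirnames and dangling ones in filenames.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for path in dangling_symlinks("/var/run/vmware"):
        print(path)
```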