Any host/VM task performed on vCenter fails with “A general system error occurred: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections”

“A general system error occurred: Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections”

Logs

Vpxd logs

19-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] [UpdateValuesInt] Updating stored value for property at index 2
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.cancelable.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.error.
2019-08-28T16:19:51.247-07:00 trivia vpxd[05386] [Originator@6876 sub=PropertyProvider opID=27cffcd2] RecordOpInt called for info.state.
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=vpxLro opID=27cffcd2] [VpxLRO] -- FINISH task-101413
2019-08-28T16:19:51.247-07:00 info vpxd[05386] [Originator@6876 sub=Default opID=27cffcd2] [VpxLRO] -- ERROR task-101413 -- vm-1889 -- vim.VirtualMachine.powerOn: vmodl.fault.SystemError:
--> Result:
--> (vmodl.fault.SystemError) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>,
-->    reason = "Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections."
-->    msg = ""
--> }
--> Args:
-->
--> Arg host:
-->
---------
---------
---------
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature Reference URI: `#_cae765cb-f129-42d3-9387-423e307ed6f2' ; is-valid: true
2019-08-28T16:19:51.238-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Missing reference count: 0
2019-08-28T16:19:51.239-07:00 verbose vpxd[05386] [Originator@6876 sub=Vmacore::Xml::Security opID=27cffcd2-01] Verification of signature SignedInfo: is-valid: true
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] Successfully acquired token: SamlToken [subject={Name: vpxd-5b47a55c-75af-455c-979f-83eb915e7a61; Domain:vsphere.local}, groups=[{Name: Users; Domain:vsphere.local}, {Name: SolutionUsers; Domain:vsphere.local}, {Name: SystemConfiguration.Administrators; Domain:vsphere.local}, {Name: ComponentManager.Administrators; Domain:vsphere.local}, {Name: LicenseService.Administrators; Domain:vsphere.local}, {Name: Everyone; Domain:vsphere.local}], delegationChain=[], startTime=2019-08-28 23:19:51.204, expirationTime=2019-08-29 07:19:51.204, renewable=false, delegable=false, isSolution=true,confirmationType=1]
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000001 opID=27cffcd2-01] [PopPendingConnection] No pending connections to <cs p:00007ff888079eb0, SsoCustomConnectionSpec:vcenter-hp.vsphere.local:443>
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=SsoClient opID=27cffcd2-01] END operation SecurityTokenServiceImpl::AcquireTokenByCertificate
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=[SSO][SsoWrapperImpl] opID=27cffcd2-01] [AcquireToken] Token acquired successfully.
2019-08-28T16:19:51.239-07:00 trivia vpxd[05386] [Originator@6876 sub=HttpConnectionPool-000211 opID=27cffcd2-01] [IncConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> incremented to 1
2019-08-28T16:19:51.239-07:00 warning vpxd[05398] [Originator@6876 sub=Default] Failed to connect socket; <io_obj p:0x00007ff828296790, h:86, <TCP '127.0.0.1 : 55994'>, <TCP '127.0.0.1 : 8190'>>, e: 111(Connection refused)
2019-08-28T16:19:51.239-07:00 trivia vpxd[05398] [Originator@6876 sub=Default] Setting error in state 1 : N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 trivia vpxd[05398] [Originator@6876 sub=HttpConnectionPool-000211] [DecConnectionCount] Number of connections to <cs p:00007ff8442f8a10, TCP:localhost:8190> dec to 0
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=pbm opID=27cffcd2-01] [ConnectLocked] Failed to login to service: N7Vmacore15SystemExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.240-07:00 error vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Get exception while executing action vpx.vmprov.CheckCompatibility: N7Vmacore9ExceptionE(Connection refused: The remote service is not running, OR is overloaded, OR a firewall is rejecting connections.)
--> [context]zKq7AVECAAAAAEk92wAOdnB4ZAAA4AArbGlidm1hY29yZS5zbwAAWCUbAP6dGADHOCMADN0lAFTlJQDqASYAsQsmADmiIwBxbyMAOnIjAJ1WKwHUcwBsaWJwdGhyZWFkLnNvLjAAAt2ODmxpYmMuc28uNgA=[/context]
2019-08-28T16:19:51.241-07:00 info vpxd[05386] [Originator@6876 sub=VmProv opID=27cffcd2-01] Workflow context:
--> (vpx.vmprov.MigrateContext) {
-->    cbData = (vmodl.KeyAnyValue) [
-->       (vmodl.KeyAnyValue) {
-->          key = "workflow.startTime",
-->          value = 5013023961
-->       },
-->       (vmodl.KeyAnyValue) {
-->          key = "pbmPreCheckSkipped",
-->          value = true
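
The refused endpoint can be picked out of a vpxd log like the one above without reading every line. The following is a minimal sketch, not a VMware tool: refused_targets is a hypothetical helper name, and it assumes the "Failed to connect socket" line format seen in this log.

```shell
#!/bin/sh
# Hypothetical helper: list the destination endpoints of refused connections.
# Reads vpxd log text on stdin and prints each distinct "address:port" once.
refused_targets() {
    grep 'Failed to connect socket' |
        grep 'Connection refused' |
        # The destination is the last <TCP '...'> pair, closed by ">>".
        sed -n "s/.*<TCP '\([0-9.]*\) : \([0-9]*\)'>>.*/\1:\2/p" |
        sort -u
}
```

For example, refused_targets < /var/log/vmware/vpxd/vpxd.log (the log path is an assumption for a default VCSA install) would print 127.0.0.1:8190 for the warning entry above.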

From the vpxd log above, it appears that the connection to port 8190 on the vCenter Server (localhost) was being refused. As per the VMware docs, port 8190 is used by the Profile-Driven Storage service, so we take a look at the profile-driven storage log:


sps.log

2019-08-28T16:25:31.402-07:00 [main] INFO  opId=sps-Main-34727-852 com.vmware.vim.storage.common.util.PropertiesWrapper - Ignoring missing property file sps-ext.properties
2019-08-28T16:25:31.402-07:00 [main] ERROR opId=sps-Main-34727-852 com.vmware.sps.util.SpsConfiguration - Error reading the configuration file: java.lang.NumberFormatException: null
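
On a busy appliance the startup error can be buried among INFO entries. As a rough sketch, assuming the sps.log line layout shown above (timestamp, thread, level, opId, logger class, message), the ERROR entries can be reduced to timestamp and message. sps_errors is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print only ERROR-level entries from sps.log text on
# stdin, reduced to "timestamp | message" (thread, opId and class stripped).
sps_errors() {
    sed -n 's/^\([^ ]*\) \[[^]]*\] ERROR  *opId=[^ ]* [^ ]* - \(.*\)$/\1 | \2/p'
}
```

Fed the two lines above, it prints only the NumberFormatException entry.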

At this stage, the service refused to start, and the log pointed to an invalid entry in the configuration file. I took a look at sps.properties and it appeared to contain only 2 lines, unlike the file on a working setup.

To resolve the service startup issue, I copied the sps.properties from a working box (no changes done). I have listed the contents of this file below:

sps.properties

# IMPORTANT: To edit an entry in this file, create sps-ext.properties and specify the required key/value details.
#
# sps server port configuration
#
sps.http.port = 21000
sps.https.port = 21100
# sps server instance GUID
sps.serverGuid = ##SPS_SERVER_GUID##
# Service extension key registered with VC
sps.extensionKey = com.vmware.vim.sps
# Re-connect config to VC
# If true, SPS will retry connection to VC until success
sps.vcConnection.infiniteAttempt = false
# If infiniteAttempt is false, SPS will try to connect to VC until the number specified by attemptNumber
sps.vcConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.vcConnection.sleepInterval = 60
# Re-connect config to QS
# If true, SPS will retry connection to QS until success
sps.qsConnection.infiniteAttempt = true
# If infiniteAttempt is false, SPS will try to connect to QS until the number specified by attemptNumber
sps.qsConnection.attemptNumber = 10
# Wait time for next retry connection, the unit is seconds
sps.qsConnection.sleepInterval = 60
sps.queryFile = sps-xqueries.xml
sps.overWriteQsData = false
# Time in seconds to wait for the internal compliance tasks.
sps.compliance.complianceTaskWaitTime = 300
# Time in milliseconds to check for task completion for each policy blob.
sps.compliance.complianceTaskCheckInterval = 100
# VC Server GUID
vpxd.vcGuid = C89B6A4D-489E-435E-97C6-847E892F254F
# number of retries when connecting to kv service (Set -1 for infinite attempts)
sps.connectionRetryAttempts = -1
# retry intervals when connecting to kv service in seconds
sps.connectionRetryInterval = 10
# Time in seconds to wait before retrying sync policy.
sps.syncPolicy.retryWaitTime = 60
# Thread pool queue size for all sps tasks
spbm.threadpool.queueSize = 100
# Thread pool keepAlive timeout in seconds for all sps tasks
spbm.threadpool.keepAlive = 10
# Thread pool config for profile
spbm.profile.threadpool.corePoolSize = 5
spbm.profile.threadpool.maxPoolSize = 32
# Thread pool config for policy blob
spbm.policyBlob.threadpool.corePoolSize = 10
spbm.policyBlob.threadpool.maxPoolSize = 32
# Thread pool config for vendor provider
spbm.vendorProvider.threadpool.corePoolSize = 10
spbm.vendorProvider.threadpool.maxPoolSize = 32
# Thread pool config for vcquery related tasks
spbm.vcquery.threadpool.corePoolSize = 10
spbm.vcquery.threadpool.maxPoolSize = 32
# Thread pool config for VLSI thread pool
# There are two modes, auto which is computed and assigned during runtime
# and manual which can be assigned manually by setting in sps-ext.properties
spbm.vlsi.threadpool.config = auto
spbm.vlsi.threadpool.corePoolSize.manual = 10
spbm.vlsi.threadpool.corePoolSize.auto = 10
spbm.vlsi.threadpool.maxPoolSize = 50 
spbm.vlsi.threadpool.queueSize = 50
# Thread pool config for generic SPS
spbm.generic.threadpool.corePoolSize = 5
spbm.generic.threadpool.maxPoolSize = 32 
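
Before overwriting a suspect sps.properties, it can help to see which keys it lacks compared with a known-good copy. The sketch below is an illustration only: prop_keys and missing_keys are hypothetical helper names, both file paths are supplied by the caller, and it handles simple "key = value" lines like those in the listing above rather than the full Java properties syntax.

```shell
#!/bin/sh
# Hypothetical helper: print the keys defined in a simple "key = value"
# properties file, sorted and de-duplicated. Comment lines are skipped.
prop_keys() {
    grep -v '^[[:space:]]*#' "$1" |
        sed -n 's/^[[:space:]]*\([^=[:space:]][^=[:space:]]*\)[[:space:]]*=.*/\1/p' |
        sort -u
}

# Hypothetical helper: print keys present in the reference file ($1)
# but missing from the suspect file ($2).
missing_keys() {
    ref=$(mktemp) && cur=$(mktemp) || return 1
    prop_keys "$1" > "$ref"
    prop_keys "$2" > "$cur"
    comm -23 "$ref" "$cur"   # lines unique to the reference list
    rm -f "$ref" "$cur"
}
```

Usage would look like missing_keys good-sps.properties broken-sps.properties, where both file names are placeholders. A numeric key that shows up as missing is a plausible candidate for the NumberFormatException: null seen in sps.log, though the log itself does not name the key.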

Enable TFTP on VCSA

Start TFTP service

service atftpd start

Allow TFTP port on the VCSA firewall

iptables -A port_filter -m state --state NEW -i eth0 -p udp --dport 69 -j ACCEPT

Confirm if the port is allowed on the firewall

iptables -nL | grep 69
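
Note that a plain grep 69 also matches any other line that happens to contain "69", such as an address or another port number. A stricter check, sketched here under the assumption that the rules are printed in iptables-save syntax, looks for an ACCEPT rule on UDP port 69 specifically. has_tftp_rule is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the rule text on stdin contains
# an ACCEPT rule for UDP destination port 69, in iptables-save syntax.
has_tftp_rule() {
    grep -Eq -e '-p udp .*--dport 69 .*-j ACCEPT'
}
```

It could be used as: iptables-save | has_tftp_rule && echo "TFTP port allowed"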


Make the firewall rules persistent:

Export the iptables rules

iptables-save > /etc/iptables.rules

Create a startup script at /etc/init.d/startftp.sh with the following contents:

#! /bin/sh
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd

### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO

service atftpd start
iptables-restore -c < /etc/iptables.rules

Make the script executable

chmod +x /etc/init.d/startftp.sh

Set the script to run during startup:

chkconfig --add /etc/init.d/startftp.sh

Copy the TFTP contents from autodeploy_zip to /var/lib/tftpboot