How to diagnose issues if something fails

1. Health Check Endpoint

Before diagnosing issues, it is important to verify the health status of the containers. The Agriconnect platform provides an endpoint to check the health of its services:

Health Check URL: https://agriconnect.akvotest.org/api/health-check 

Purpose

The health check endpoint returns the status of the containers, indicating whether they are up and running. A successful health check implies that the services are operational.

Expected Response

When all containers are healthy, the endpoint returns a JSON response similar to the following:

{
  "status": "healthy",
  "services": {
    "rabbitmq": true,
    "chromadb": true,
    "database": true,
    "assistant": true,
    "eppo-librarian": true,
    "backend": true
  }
}

Response Details

  • status: Indicates the overall health of the containers. If all services are operational, the status will be "healthy".

  • services: An object that maps each service name to its health status. A value of true means the service is running, while false indicates a problem with that service.
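As a rough illustration of how the endpoint could be polled from a script, the sketch below fetches the health check URL and applies the rule described above. The helper names fetch_health and is_healthy are hypothetical, and the sketch assumes the endpoint answers with HTTP 200 and a JSON body even when a service is down; if it responds with an error status instead, urlopen raises an HTTPError that the caller would need to handle.

```python
import json
import urllib.request

HEALTH_URL = "https://agriconnect.akvotest.org/api/health-check"


def fetch_health(url=HEALTH_URL, timeout=10, opener=urllib.request.urlopen):
    """Return the decoded JSON body of the health check endpoint.

    `opener` is injectable so the function can be exercised without network access.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(payload):
    """True only if status is "healthy" and every listed service reports true."""
    services = payload.get("services", {})
    return payload.get("status") == "healthy" and all(
        flag is True for flag in services.values()
    )
```

Calling is_healthy(fetch_health()) returns True only when the overall status and every individual service flag agree that the platform is up.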

2. Diagnosing Issues

If the health check indicates that one or more services are not operational (i.e., the status of a service is false), follow these steps to diagnose and resolve the issue:

a. Identify the Affected Service(s)

Review the JSON response from the health check to pinpoint which services have a false status.
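For responses with many services, this review can be scripted. The following is a minimal sketch that takes the raw JSON text and lists the services whose flag is not true. The function name failing_services is hypothetical, and the "unhealthy" status value in the sample is an assumption, since only the "healthy" value is documented here.

```python
import json


def failing_services(response_text):
    """Return the sorted names of services whose health flag is not true."""
    payload = json.loads(response_text)
    services = payload.get("services", {})
    return sorted(name for name, ok in services.items() if ok is not True)


# Sample response; the "unhealthy" status string is assumed for illustration.
sample = """{
  "status": "unhealthy",
  "services": {
    "rabbitmq": true,
    "chromadb": false,
    "database": true,
    "assistant": true,
    "eppo-librarian": true,
    "backend": false
  }
}"""

print(failing_services(sample))  # ['backend', 'chromadb']
```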

b. Check Logs

Access the logs for the affected service(s) to identify any error messages or anomalies that may indicate the cause of the issue. Since the platform runs on Google Cloud, the logs are available through the Google Cloud Console:

  • Go to Google Cloud Console.

  • Navigate to Kubernetes Engine > Workloads.

  • Filter the workloads:

    • Use the Cluster and Namespace filters to narrow the list.

    • Select the test cluster and the agriconnect-namespace namespace to view the Agriconnect workloads.

Within the workloads, identify the container with an unhealthy status, select it, and open the "Logs" tab to view its logs. Look for error messages or recurring patterns that point to the cause. A common issue might be related to an unstable Socket.IO connection, but the logs will help pinpoint the specific problem. Use this information to begin debugging and addressing the issue.
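The same logs can also be narrowed down with a query in the Logs Explorer. The sketch below builds such a filter string from the standard k8s_container resource labels. It assumes that the cluster is literally named "test" and that the container name matches the service name shown in the health check response; neither is confirmed here, so adjust both to the actual values. The function name container_log_filter is hypothetical.

```python
def container_log_filter(
    container,
    cluster="test",                     # assumed cluster name
    namespace="agriconnect-namespace",
    min_severity="ERROR",
):
    """Build a Cloud Logging filter for one container's logs.

    Clauses on separate lines are combined with an implicit AND.
    """
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.cluster_name="{cluster}"',
        f'resource.labels.namespace_name="{namespace}"',
        f'resource.labels.container_name="{container}"',
        f"severity>={min_severity}",
    ]
    return "\n".join(clauses)


# "backend" is used as an example container name.
print(container_log_filter("backend"))
```

Restricting the query to ERROR and above keeps the first pass short; lowering min_severity to "INFO" shows the surrounding context once a suspicious time window has been found.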

c. Restart the Service

If the issue is not apparent from the logs, restarting the affected service(s) may resolve the problem. This can be done by deleting the pods associated with the unhealthy container. 

In the Google Cloud Console, navigate to the "Manage Pods" section for the specific container that is reported as unhealthy. Within this section, you can access the "Pods Detail" view. Deleting the pod from this view causes the container to restart automatically, which may help restore it to a healthy state. This process is often effective in clearing transient issues that cause service disruptions.
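For those who prefer the command line to the console, the same restart can be sketched with kubectl. This is an alternative to the console steps above, not the documented procedure: it assumes kubectl is installed and already authenticated against the cluster, and the helper names and the pod name in the comment are hypothetical.

```python
import subprocess

NAMESPACE = "agriconnect-namespace"


def delete_pod_command(pod_name, namespace=NAMESPACE):
    """Return the kubectl argument list that deletes a single pod."""
    return ["kubectl", "delete", "pod", pod_name, "--namespace", namespace]


def delete_pod(pod_name, namespace=NAMESPACE):
    """Delete the pod; raises CalledProcessError if kubectl exits non-zero."""
    subprocess.run(delete_pod_command(pod_name, namespace), check=True)


# Example (hypothetical pod name, not executed here):
# delete_pod("backend-abc123")
```

Keeping the command construction separate from its execution makes it easy to print and review the exact command before running it against the cluster.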

3. Most Common Issues

Below are examples of some common issues that may occur on the Agriconnect platform, along with their related containers and explanations.