
Monitoring workflows

scheduler interface

Check the run interface:

http://localhost:3000/runs

If a run failed, click its run ID, then look at the run log.

portainer status

Gleaner and Nabu run as services whose names are prefixed with sch_. The naming pattern is:

sch_PROJECT_step
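When scanning a long container list, the naming pattern can be applied mechanically. The following is a minimal sketch, assuming project names contain no underscore; the helper names and the example names (`eco`, `gleaner`, `nabu`) are hypothetical and only illustrate the pattern.

```python
import re

# Pattern taken from the text: sch_PROJECT_step.
# Assumption: PROJECT itself contains no underscore.
NAME_RE = re.compile(r"^sch_(?P<project>[^_]+)_(?P<step>.+)$")


def service_name(project: str, step: str) -> str:
    """Build the expected service name for a project and step."""
    return f"sch_{project}_{step}"


def parse_service_name(name: str):
    """Return (project, step), or None if the name does not follow the pattern."""
    m = NAME_RE.match(name)
    return (m.group("project"), m.group("step")) if m else None


def matching_containers(names, project: str):
    """Filter a list of container names down to those belonging to one project."""
    prefix = f"sch_{project}_"
    return [n for n in names if n.startswith(prefix)]
```

A prefix match is used for filtering because container names often carry a suffix after the service name.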

If something does not appear to be working, find the container whose name starts with sch_PROJECT_step, then open a terminal in that container.

You may need to use /bin/sh as the shell.

cd logs
ls -l 
tail <log file name>

Issues

---
title: Some Debugging Logic
---
flowchart TB
    Failed[Failed]
    409_error[HTTP 409 error - dupe container]
    source_not_found[Missing source in a gleaner config two for now]
    logs[logs in minio]
    s3[ did data make it to s3]

why did it fail

Server:
* is the disk full? [find in minio logs]

Basic Source issues:
* does the sitemap exist? [check the url]
* do the URLs in the sitemap have JSON-LD? [from the sitemap, check some URLs on validator.schema.org]
* are the JSON-LD types ones we handle? [@type Dataset, DataCatalog]
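The last two source checks can be approximated offline once a page from the sitemap has been saved. This is a minimal sketch using only the standard library, not Gleaner's own logic. It assumes the JSON-LD sits in `script` tags of type `application/ld+json`, compares type names case-insensitively, and ignores `@graph` nesting and prefixed names such as `schema:Dataset`. All function and class names are hypothetical.

```python
import json
from html.parser import HTMLParser

# Types named in the checklist above, lower-cased for comparison.
HANDLED_TYPES = {"dataset", "datacatalog"}


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def jsonld_types(html: str) -> set:
    """Return the top-level @type values found in the page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    types = set()
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD counts as "no usable markup"
        for node in doc if isinstance(doc, list) else [doc]:
            if not isinstance(node, dict):
                continue
            t = node.get("@type", [])
            if isinstance(t, str):
                t = [t]
            if isinstance(t, list):
                types.update(x for x in t if isinstance(x, str))
    return types


def has_handled_type(html: str) -> bool:
    """True if at least one JSON-LD block declares a handled type."""
    return any(t.lower() in HANDLED_TYPES for t in jsonld_types(html))
```

A page that returns `False` here is worth pasting into validator.schema.org to see what markup it actually carries.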

Basic Failure in gleaner:
* Nonexistent partition keys [add the source to the gleaner configs, and run the s3 job manually if needed]
* 409_error [HTTP 409 error: duplicate container running in portainer, remove the old container]
* source_not_found [Missing source in a gleaner config two for now]
* logs [logs in minio]
* s3 [did data make it to s3]

release:
* is there data in s3
* is there a release file in s3

summarize:
* is there data in s3
* is this data a dataset?

Sitemap issue

{"file":"/home/runner/work/gleaner/gleaner/internal/summoner/acquire/resources.go:75","func":"github.com/gleanerio/gleaner/internal/summoner/acquire.ResourceURLs","level":"error","msg":"Error getting sitemap urls for: wodbXML syntax error on line 1800: element \u003clink\u003e closed by \u003c/head\u003e","time":"2024-11-21T17:44:43Z"}

logs/gleaner-runstats-2024-11-21-17-44-43.log
RunStats:
  Start: 2024-11-21 17:44:41.875835094 +0000 UTC m=+0.627571534
  Reason: Complete
  Soruce:
  - name: wodb
    Start: 2024-11-21 17:44:43.81414505 +0000 UTC m=+2.565881545
    End: 2024-11-21 17:44:43.874671482 +0000 UTC m=+2.626407982
    SitemapCount: 0
    SitemapHttpError: 0
    SitemapIssues: 0
    SitemapSummoned: 0

logs/repo-wodb-loaded-2024-11-21-17-44-43.log
level=info msg="Queuing URLs for wodb"
level=info msg="URL Count 0"

logs/repo-wodb-stats-2024-11-21-17-44-43.log
SourceStats:
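The error above says the sitemap for wodb could not be parsed as XML, and the run then queued 0 URLs. A `<link>` element closed by `</head>` suggests the URL may have returned an HTML page rather than a sitemap, though that is an inference from the message. The sketch below checks whether a downloaded sitemap is well-formed and lists its `loc` entries. It uses Python's standard XML parser, not Gleaner's, so line numbers and messages can differ from the log. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def sitemap_problem(xml_text: str):
    """Return None if the text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return (line, column, str(err))
    return None


def sitemap_locs(xml_text: str):
    """Return the text of every <loc> element, ignoring the XML namespace."""
    root = ET.fromstring(xml_text)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
```

If `sitemap_problem` reports an error, open the sitemap URL in a browser and look at the reported line before changing anything on the Gleaner side.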

409_error

This appears in the error message of the run in Dagster.

source_not_found

This appears in the Gleaner logs on MinIO:

{"file":"/home/runner/work/gleaner/gleaner/cmd/gleaner/main.go:125","func":"main.main","level":"error","msg":"CAUTION: no matching source, did your -source VALUE match a sources.name VALUE in your config file?","time":"2024-11-21T17:46:22Z"}
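The message asks whether the `-source` value matches a `sources.name` value in the config file. That comparison can be sketched as below, assuming the config has already been parsed into a dictionary (for example with a YAML loader) and has a top-level `sources` list whose entries carry a `name` key. The function names are hypothetical, and the match is exact and case-sensitive as an assumption.

```python
def source_names(config: dict) -> list:
    """Return the name of every entry in the config's sources list."""
    sources = config.get("sources") or []
    return [s.get("name") for s in sources if isinstance(s, dict)]


def check_source(config: dict, wanted: str):
    """Return None if the source is configured, else a short explanation."""
    names = source_names(config)
    if wanted in names:
        return None
    known = sorted(n for n in names if n)
    return f"no source named {wanted!r}; config defines: {known}"
```

For example, `check_source(cfg, "wodb")` returns `None` only when an entry with exactly that name exists.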

summarize: is there data

No data was returned from a summary query. The error in Dagster is:

Loading Summary graph failed. argument of type 'numpy.float64' is not iterable
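An error of this shape is raised when a membership test (`x in value`) is applied to a floating-point number. One plausible reading, which is an assumption and not confirmed by the source, is that an empty query result left a missing value (NaN) where a string or list was expected. The sketch below reproduces the failure with a plain Python float and shows a guarded alternative. `numpy.float64` is a subclass of `float`, so the same guard covers it. The function names are hypothetical.

```python
import math


def naive_mentions(value, needle: str) -> bool:
    """Unprotected membership test; raises TypeError when value is a float."""
    return needle in value


def mentions(value, needle: str) -> bool:
    """Membership test that treats a missing value (None or NaN) as 'no match'."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return needle in value
```

Guarding only hides the symptom. The underlying check is still whether the summarize step had any data in s3 to work with.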