# Development

The description in `doc/README.md` is probably better; look there first.
There are two ways to run the scheduler for development:

- `dagster dev` - Dagster runs the UI in development mode
- Container based - uses Docker and locally deployed containers
!!! note
    The Dagster container and the code containers need to match.

    For local development, images are named `dagster-local:latest`,
    code containers are named `dagster-gleanerio-local:latest`,
    and both are built in `compose_local.yaml`.

    For production:

    * the Dagster image is named `nsfearthcube/dagster-gleanerio:${CONTAINER_DAGSTER_TAG:-latest}`
    * code containers are named `nsfearthcube/dagster-gleanerio-workflow:${CONTAINER_TAG:-latest}`
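The `:-latest` suffix in the production image names is standard shell/compose default expansion. A quick sketch of how the tag resolves (the `v1.2.0` tag below is hypothetical, for illustration only):

```shell
# ${VAR:-latest} expands to $VAR when it is set and non-empty,
# otherwise to the literal "latest".
unset CONTAINER_DAGSTER_TAG
echo "nsfearthcube/dagster-gleanerio:${CONTAINER_DAGSTER_TAG:-latest}"
# -> nsfearthcube/dagster-gleanerio:latest

CONTAINER_DAGSTER_TAG=v1.2.0   # hypothetical tag, for illustration
echo "nsfearthcube/dagster-gleanerio:${CONTAINER_DAGSTER_TAG:-latest}"
# -> nsfearthcube/dagster-gleanerio:v1.2.0
```

So leaving `CONTAINER_DAGSTER_TAG` unset in `.env` deploys `latest`.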
## Dagster dev

At the top level (`dagster/implnets`) you can run:

```shell
dagster dev
```

You need to set the environment based on `dagster/implnets/deployment/envFile.env`.
For local development, these environment variables need to be set (the paths below are examples from a developer machine):

```shell
DAGSTER_HOME=dagster/dagster_home
DAGSTER_LOCAL_ARTIFACT_STORAGE_DIR=/Users/valentin/development/dev_earthcube/scheduler/dagster/dagster_home/
```
It should run the `workflows/tasks/tasks` code location, as defined in `pyproject.toml`:

```toml
[tool.dagster]
module_name = "workflows.tasks.tasks"
```
### Setting up in PyCharm

You can add run configurations in PyCharm. You should add the EnvFile plugin so that env files can be loaded into run configurations.
### Testing tasks

```shell
cd dagster/implnets/workflows/tasks
```

You need to set the environment based on `dagster/implnets/deployment/envFile.env`:

```shell
export $(sed '/^[ \t]*#/d' ../../deployment/.env | sed '/^$/d' | xargs)
dagster dev
```

This will run just the tasks code location, and (it seems) in editable form.
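The `export $(sed ...)` one-liner above drops comment and blank lines from the env file and exports the remaining `KEY=VALUE` pairs. A self-contained demonstration (the file path here is a throwaway, not the real `deployment/.env`):

```shell
# Create a throwaway env file with a comment and a blank line.
cat > /tmp/demo.env <<'EOF'
# comment lines like this are dropped
PROJECT=test

DAGSTER_HOME=/tmp/dagster_home
EOF

# Same pipeline as above: drop comment lines, drop blank lines,
# flatten to KEY=VALUE pairs, and export them.
export $(sed '/^[ \t]*#/d' /tmp/demo.env | sed '/^$/d' | xargs)

echo "$PROJECT"         # -> test
echo "$DAGSTER_HOME"    # -> /tmp/dagster_home
```

Note that this simple approach only handles values without spaces or quotes, which is fine for the variables in `envFile.env`.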
### Testing by materializing assets

Environment variables (paths are examples):

```shell
DAGSTER_LOCAL_ARTIFACT_STORAGE_DIR=/Users/valentin/development/dev_earthcube/scheduler/dagster/dagster_home/
GLEANERIO_GLEANER_CONFIG_PATH=/Users/valentin/development/dev_earthcube/scheduler/dagster/implnets/configs/eco/gleanerconfig.yaml
PROJECT=test
```
To materialize an asset from the command line, you will probably need to materialize the assets it depends on first, at least the first time (might need `test_task...`):

```shell
python -m dagster asset materialize -m tasks --select task/task_tenant_sources,task/loadstatsCommunity --partition dev
python -m dagster asset materialize -m tasks --select task/source_list,task/loadstatsHistory
```
### Jobs

See <https://docs.dagster.io/concepts/ops-jobs-graphs/job-execution#dagster-ui>.

```shell
python -m dagster job list
dagster job execute eco_summon_and_release_job --partition geocodes_demo_data
```
## Testing containers

Containers are a well tested approach. We deploy these containers to production, so they are a good way to test. There is a set of required files:

- an env variables file
- gleaner/nabu configuration files, without any passwords or servers (those are handled in the env variables)
- a docker compose file
- docker networks and volumes for the compose files
- three files uploaded to docker as configs
- `workspace.yaml` -- dagster
### Portainer API key

A note on how to create one is still needed. (In Portainer, access tokens can typically be created under *My account → Access tokens*; they are passed to the API in the `X-API-Key` header.)
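A minimal sketch of how such a key is used once you have one. The host name and token are placeholders; the `X-API-Key` header and `/api/...` paths come from the Portainer API:

```shell
# Placeholder -- substitute your own Portainer host.
PORTAINER_HOST="${PORTAINER_HOST:-portainer.example.com}"

# Helper to build a Portainer API URL for a given resource path.
api_url() {
  printf 'https://%s/api/%s' "$PORTAINER_HOST" "$1"
}

echo "$(api_url stacks)"
# -> https://portainer.example.com/api/stacks

# The access token goes in the X-API-Key header, e.g.:
#   curl -s -H "X-API-Key: $PORTAINER_API_KEY" "$(api_url stacks)"
```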
### Start

For production environments, the script `dagster_setup_docker.sh` should create the networks and volumes and upload the configuration files.

1. Set up a project in the configs directory, if one does not exist.
2. Add gleanerconfig.yaml, nabuconfig.yaml~~, and workspace.yaml~~ (NOTE: NEED A TEMPLATE FOR THIS).
3. Copy envFile.env to .env, and edit it.
4. Run `./dagster_localrun.sh`.
5. Go to https://localhost:3000/.
6. Run a small test dataset.
```shell
cd dagster/implnets/deployment
cp envFile.env .env
# configure environment in .env
./dagster_localrun.sh
```
If you look in `dagster_localrun.sh`, you can see that the `$PROJECT` variable is used to decide which files to use and to set up a separate 'namespace' in traefik labels.

If you look in `compose_local_eco_override.yaml`, you can see that additional mounts are added to the containers. These can be customized in a `compose_local_PROJECT_override.yaml` for local development.
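Assuming the naming convention above, `$PROJECT` simply selects the per-project override file:

```shell
# $PROJECT is substituted into the override file name
# (naming convention taken from the text above).
PROJECT=eco
echo "compose_local_${PROJECT}_override.yaml"
# -> compose_local_eco_override.yaml
```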
### Customizing the configs

For local development there are three configs:

- `configs/PROJECT/gleanerconfig.yaml` -- gleaner/nabu
- `configs/PROJECT/nabuconfig.yaml` -- gleaner/nabu
- `configs/local/workspace.yaml` -- dagster
### Editing/testing code

If you change code generated with pygen, then you need to regenerate it; the Makefile or a PyCharm run configuration is the best way.
## Moving to production

You need to deploy a `compose_project.yaml` and a `compose_project_ingest.yaml`. If a Dagster scheduler is already running, you can deploy just the `compose_project_ingest.yaml`. In the future we hope to run multiple `compose_project_ingest.yaml` stacks.

After creating a `compose_project_ingest.yaml` stack:
- Clone and edit a `workspace.yaml` in docker configs (this example is for an eco project):

```yaml
load_from:
  - grpc_server:
      host: dagster-code-eco-tasks
      port: 4000
      location_name: "eco-tasks"
  - grpc_server:
      host: dagster-code-eco-ingest
      port: 4000
      location_name: "eco-ingest"
```

- Change the env variable `GLEANERIO_DOCKER_WORKSPACE_CONFIG` to point to that config.
- Push the 'save' button, then push the 'pull and redeploy' button.
### Command line deploy

```shell
docker compose --env-file .env -f compose_project.yaml up
docker compose --env-file .env -f compose_project_ingest.yaml up
```