Is it possible to fetch the latest snapshot of an input dataset in the middle of a Python script?

jws383 Member Posts: 4

I want to access the latest snapshot of a Domino dataset programmatically from within a Python script and then load its files. Is this possible using the Domino API?


Comments

  • dan.stern Member, Moderator, Domino Posts: 25

    Hi jws383,

    The files in the latest snapshot of a dataset attached to your project should simply be available to your Python script under the /domino/datasets/ folder. Let me know if I'm misunderstanding the workflow you're asking about.
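    As a rough sketch of what that looks like from Python: assuming the dataset is mounted in a subdirectory of /domino/datasets/ (the name `my-dataset` below is hypothetical, so check the actual path in your session), the mount can be read like any other directory:

```python
from __future__ import annotations

from pathlib import Path

# Assumption: the attached dataset is mounted under /domino/datasets/.
# "my-dataset" is a hypothetical name; use the path you see in your session.
DATASET_ROOT = Path("/domino/datasets/my-dataset")


def list_dataset_files(root: Path = DATASET_ROOT) -> list[Path]:
    """Return every regular file under the mounted dataset, sorted by path."""
    return sorted(p for p in root.rglob("*") if p.is_file())


# Only run the demo if the (hypothetical) mount actually exists.
if __name__ == "__main__" and DATASET_ROOT.is_dir():
    for path in list_dataset_files():
        # Each entry is an ordinary file path, so it can be opened directly.
        print(path, path.stat().st_size)
```

    Because these are ordinary files, the returned paths can be passed straight to `open()` or to any library that reads from disk.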

    Dan.

  • jws383 Member Posts: 4

    Hi Dan,

    I realize I didn't explain the context very well. In script A, I want to refresh the dataset with a function call so that it picks up the most recent snapshot. Another script, B, will be adding files to that same dataset in real time, so before doing more processing, script A needs to check whether B has added any new files.

    Jacob

  • melanie.veale Member, Domino Posts: 14

    Hi Jacob,

    It's not currently possible to refresh a Dataset in a running session (or to mount an existing Snapshot after the session starts). Only the Snapshots specified when you launch the session are available. Because Snapshots are read-only once created, and their contents aren't accessible from other sessions until the Snapshot has been created, they aren't really suited to truly streaming data workflows. There are a few things you might try, though:

    If you are operating within interactive Workspaces you might find the scratch space useful (https://docs.dominodatalab.com/en/4.2/reference/data/datasets/Datasets_Scratch_Spaces.html).
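    If script B writes into a location that script A can also see while both are running (for example, a scratch space in the same Workspace; the path below is a placeholder), script A can detect new files by periodically listing the directory and comparing against the names it has already seen. This is a minimal polling sketch under that assumption, not a Domino-specific API:

```python
from __future__ import annotations

import time
from pathlib import Path

# Hypothetical shared location that script B writes into (e.g. a scratch
# space mounted in the same Workspace). Adjust to wherever B actually writes.
WATCH_DIR = Path("/path/to/shared/scratch")


def snapshot_names(directory: Path) -> set[str]:
    """Names of the regular files currently in `directory` (non-recursive)."""
    return {p.name for p in directory.iterdir() if p.is_file()}


def new_files(seen: set[str], directory: Path) -> tuple[set[str], set[str]]:
    """Return (files added since `seen`, updated set of all current files)."""
    current = snapshot_names(directory)
    return current - seen, current


def watch(directory: Path, interval_s: float = 30.0, max_polls: int | None = None):
    """Poll `directory` and yield each batch of newly added file names, sorted.

    The first poll reports every file already present. Polls forever when
    `max_polls` is None.
    """
    seen: set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        added, seen = new_files(seen, directory)
        if added:
            yield sorted(added)
        polls += 1
        # Skip the sleep after the final poll.
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)


# Only start watching if the (hypothetical) directory actually exists.
if __name__ == "__main__" and WATCH_DIR.is_dir():
    for batch in watch(WATCH_DIR):
        print("new files:", batch)
```

    If B might still be writing a file when A lists the directory, one common convention is for B to write under a temporary name and rename the file once it's complete, so A only ever sees finished files.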

    Or, if new files are not actually added very often, but you want a long-running script A to "watch" for them and do some processing, you might find the Domino API useful (https://docs.dominodatalab.com/en/4.2/api/Domino_API.html). Using the API, you can check the list of Dataset Snapshots and, if there is a new one, launch a separate Job via an API call from your code to process the new Snapshot.
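    A watcher along those lines could be sketched as follows. The two callables are hypothetical placeholders, not part of any Domino client library: `list_snapshot_ids` should wrap whatever Domino API request lists the Snapshots of your Dataset, and `launch_job` should wrap the request that starts a Job (see the API docs above for the actual endpoints). The processing happens in the newly launched Job, because the watcher's own session can't mount Snapshots created after it started.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable

# Hypothetical hooks: wire these to the real Domino API calls for listing a
# Dataset's Snapshots and for starting a Job. Names and signatures here are
# placeholders chosen for this sketch.
ListSnapshots = Callable[[], Iterable[str]]
LaunchJob = Callable[[str], None]


def watch_snapshots(list_snapshot_ids: ListSnapshots, launch_job: LaunchJob,
                    interval_s: float = 300.0,
                    max_polls: int | None = None) -> list[str]:
    """Poll for new Snapshot IDs and launch one processing Job per new Snapshot.

    Snapshots that exist when the watcher starts are treated as already
    processed. Returns the IDs that triggered a Job (relevant when
    `max_polls` is set; with None the loop runs forever).
    """
    seen = set(list_snapshot_ids())
    launched: list[str] = []
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval_s)
        for snapshot_id in list_snapshot_ids():
            if snapshot_id not in seen:
                seen.add(snapshot_id)
                # e.g. pass the Snapshot ID to the Job as a command argument
                launch_job(snapshot_id)
                launched.append(snapshot_id)
        polls += 1
    return launched
```

    A long poll interval keeps the number of API calls modest, which fits the case described here where new files arrive only occasionally rather than as a continuous stream.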

    Let us know if that answers things for you!

    Melanie
