This was a communication that I sent out to key customer contacts earlier today to address some customers' questions around Domino Support and business continuity given the COVID-19 pandemic. We wanted to cross-post it here to give visibility to all customers. (If you didn't receive the email but feel you should have, please let your Customer Success Manager know and we will update our contact list.)
The growing coronavirus (COVID-19) outbreak continues to impact the health and well-being of individuals, communities, and businesses around the world, and we appreciate that you are likely facing similar challenges in your own organization. As Domino is a critical vendor for many of our customers, we would like to share the steps we are taking to provide a safe and healthy environment that allows our employees to ensure business continuity, service, and support for our customers. We also want to encourage any customers who are encountering issues using Domino while working remotely to reach out to our support team at [email protected] for help.
The health of our employees is our top priority. We are closely monitoring guidance from government agencies and health authorities to ensure that our policies and practices keep our teams healthy and productive. As of today (March 13th), there is no direct impact on our workforce or our ability to provide our normal level of service to our customers, and we are happy to report that no Domino employees have tested positive for coronavirus.
Domino’s Business Continuity Plan is up-to-date and includes pandemic response procedures, as well as secondary communication methods. In the event any employees with critical job functions become affected, we have designated alternates who have appropriate access and can assume responsibility for those critical functions.
While our offices remain open at this time, the vast majority of our employees are now performing their work duties from home, and we have already halted all optional business-related travel to minimize exposure. We will evaluate the developing situation on a daily basis and will transition to mandatory fully remote work if and when appropriate. Given our geographically diverse workforce, we do not anticipate a material impact on our ability to support our customers.
Thank you for your support and understanding during this critical time. If you have any additional questions for us, please reach out to your Customer Success Manager.
Chief Customer Officer
Domino Data Lab
I have been trying to publish a Python Streamlit app as an App, but I just get an empty 'Please wait...' screen rather than the app content.
I have a config.toml file and a credentials.toml preconfigured to host on 0.0.0.0 and port 8888, which are moved to ~/.streamlit/ as part of a custom Post Setup script. Those were necessary to get the app sort of working and the logs to stop throwing errors, but there still appear to be issues.
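In case it helps anyone reproduce this, my Post Setup script effectively does the following (I've written the files inline here rather than moving pre-built copies; the server keys are just the ones I've pieced together from the Streamlit docs, so treat them as my best guess rather than anything Domino-official):

mkdir -p ~/.streamlit

# config.toml – bind to all interfaces on port 8888, run without a browser
cat > ~/.streamlit/config.toml <<'EOF'
[server]
address = "0.0.0.0"
port = 8888
headless = true
enableCORS = false
EOF

# credentials.toml – blank email so Streamlit skips its first-run prompt
cat > ~/.streamlit/credentials.toml <<'EOF'
[general]
email = ""
EOF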
Anyone else been able to get Streamlit (or any other Tornado app) to work and host in Domino?
When installing Pyspark with pip or Anaconda, or downloading it from https://pypi.org/project/pyspark/, you will find that it only works with certain versions of Hadoop (for example, Hadoop 2.7.2 for Pyspark 2.4.3). If you are using a different version of Hadoop and try to access something like S3 from Spark, you will receive errors such as:
Py4JJavaError: An error occurred while calling o69.parquet: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.fs.s3a.S3AFileSystem at java.lang.Class.forName0(Native Method)
This is despite the fact that you seem to have the correct JARs in place. The reason is that Pyspark is built against a specific version of Hadoop by default. To work around this, you need to install the "no Hadoop" version of Spark, build the Pyspark installation bundle from that, install it, then install the Hadoop core libraries you need and point Pyspark at those libraries.
Below is a Dockerfile that does just this, using Spark 2.4.3 and Hadoop 2.8.5:
#
# Download Spark 2.4.3 WITHOUT Hadoop.
# Unpack and move to /usr/local and make symlink from /usr/local/spark to specific Spark version
# Build and install Pyspark from this download
#
RUN wget http://archive.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-without-hadoop.tgz && \
    tar xzvf spark-2.4.3-bin-without-hadoop.tgz && \
    mv spark-2.4.3-bin-without-hadoop /usr/local/spark-2.4.3 && \
    ln -s /usr/local/spark-2.4.3 /usr/local/spark && \
    cd /usr/local/spark/python && \
    python setup.py sdist && \
    pip install dist/pyspark-2.4.3.tar.gz

#
# Download core Hadoop 2.8.5
# Unpack and move to /usr/local and make symlink from /usr/local/hadoop to specific Hadoop version
#
RUN wget https://www.apache.org/dist/hadoop/core/hadoop-2.8.5/hadoop-2.8.5.tar.gz && \
    tar xzvf hadoop-2.8.5.tar.gz && \
    mv hadoop-2.8.5 /usr/local && \
    ln -s /usr/local/hadoop-2.8.5 /usr/local/hadoop

#
# Download correct AWS Java SDK for S3 for Hadoop version being used
# and put in Hadoop install location
#
RUN wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar && \
    mv aws-java-sdk-s3-1.10.6.jar /usr/local/hadoop/share/hadoop/tools/lib/

#
# Set SPARK_DIST_CLASSPATH in $SPARK_HOME/conf/spark-env.sh to point to Hadoop install
#
RUN echo 'export SPARK_DIST_CLASSPATH="/usr/local/hadoop/etc/hadoop:'\
'/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:'\
'/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:'\
'/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:'\
'/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:'\
'/usr/local/hadoop/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:'\
'/usr/local/hadoop/share/hadoop/tools/lib/*"' >> /usr/local/spark/conf/spark-env.sh
The key parts of this Dockerfile are where the Pyspark package is built and installed:
cd /usr/local/spark/python && \
    python setup.py sdist && \
    pip install dist/pyspark-2.4.3.tar.gz
and ensuring you have the correct AWS S3 SDK JAR to match your Hadoop version:
RUN wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar && \
    mv aws-java-sdk-s3-1.10.6.jar /usr/local/hadoop/share/hadoop/tools/lib/
To find the correct AWS S3 SDK JAR version, go to the Maven Repository page for the hadoop-aws JAR matching the Hadoop version you're installing (in the example above this is hadoop-aws-2.8.5.jar, so the page is https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.8.5) and look at the AWS S3 SDK JAR version listed in the Compile Dependencies section. In this case it is aws-java-sdk-s3-1.10.6.jar.
SPARK_DIST_CLASSPATH is set to your HADOOP_CLASSPATH as output by the hadoop classpath command.
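As an aside, if the Hadoop install is available inside the image, that classpath can be derived rather than hardcoded; the Spark docs for Hadoop-free builds suggest exactly this pattern. A rough sketch of an equivalent spark-env.sh line, assuming the /usr/local/hadoop symlink from the Dockerfile above:

# Derive SPARK_DIST_CLASSPATH from the Hadoop install instead of hardcoding it.
# Note: `hadoop classpath` does not include share/hadoop/tools/lib by default,
# and that directory holds the S3A and AWS SDK JARs, so append it explicitly.
export SPARK_DIST_CLASSPATH="$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/hadoop/share/hadoop/tools/lib/*"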
Please see this guide for adding git repositories to Domino.
If you want to add a repo from a git server with SSH configured on a non-standard port (in this example, 12345), you will need to use the format below:
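The git ssh:// URL syntax supports an explicit port; the hostname and repository path below are placeholders for your own server and repo:

ssh://git@git.example.com:12345/my-team/my-repo.git

Note that the scp-style shorthand (git@host:path/repo.git) does not accept a port number, which is why the explicit ssh:// form is needed here.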
If you have any issues setting this up, please let us know!
Note: Don't forget to add your git credentials for the domain in question.