
Learn to write unit tests for simulation engines, use Excel Power Query for joins (including left joins) and group bys, and set up PyCharm testing with debugging and virtual environments.
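Power Query's "Merge Queries" (Left Outer) and "Group By" steps map directly onto pandas. The sketch below uses small made-up tables (`trades`, `desks`, and their columns are hypothetical) to build the same left join and aggregation you would build in Excel, so the two results can be compared side by side.

```python
import pandas as pd

# Hypothetical input: one row per trade.
trades = pd.DataFrame(
    {
        "trade_id": [1, 2, 3, 4],
        "desk_id": ["A", "A", "B", "C"],
        "notional": [100.0, 250.0, 75.0, 40.0],
    }
)

# Hypothetical reference table; desk "C" is deliberately missing.
desks = pd.DataFrame({"desk_id": ["A", "B"], "region": ["EMEA", "APAC"]})

# Left join: keep every trade, like Power Query "Merge Queries" with Left Outer.
# indicator=True adds a _merge column showing whether each row found a match.
joined = trades.merge(desks, on="desk_id", how="left", indicator=True)

# Rows with no match on the right side are the ones worth checking in Excel.
unmatched = joined[joined["_merge"] == "left_only"]

# Group by region, like Power Query "Group By" with Sum and Count Rows.
# dropna=False keeps the NaN region produced by the unmatched desk.
summary = (
    joined.groupby("region", dropna=False)
    .agg(total_notional=("notional", "sum"), trade_count=("trade_id", "count"))
    .reset_index()
)
print(summary)
```

Keeping `dropna=False` matters for validation: without it, the unmatched rows silently disappear from the totals, whereas Power Query shows them as a `null` group.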
Apply group by and join operations to small data to validate results in Python or Spark SQL, and debug Spark sessions while managing Hadoop and Hive views in a local virtual environment.
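One way to validate a result on small data is to compute it twice, once in SQL and once in pandas, and compare. Spark is not available everywhere, so this sketch swaps in Python's built-in `sqlite3` as a stand-in SQL engine; the `LEFT JOIN ... GROUP BY` query is plain SQL of the kind you could pass to `spark.sql()` against temp views. The tables, columns, and helper names are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical small tables; customer 20 has no match in customers.
orders = pd.DataFrame(
    {"order_id": [1, 2, 3], "customer_id": [10, 10, 20], "amount": [5.0, 7.5, 2.0]}
)
customers = pd.DataFrame({"customer_id": [10, 30], "segment": ["retail", "corp"]})

# Plain SQL; the same text could be passed to spark.sql() once temp views exist.
QUERY = """
SELECT c.segment AS segment, SUM(o.amount) AS total_amount
FROM orders AS o
LEFT JOIN customers AS c ON o.customer_id = c.customer_id
GROUP BY c.segment
"""


def sql_result(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    # In-memory SQLite database used as a stand-in for the Spark SQL engine.
    conn = sqlite3.connect(":memory:")
    try:
        orders.to_sql("orders", conn, index=False)
        customers.to_sql("customers", conn, index=False)
        return pd.read_sql_query(QUERY, conn)
    finally:
        conn.close()


def pandas_result(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    # Same logic in pandas: left join, then group by segment, keeping missing keys.
    joined = orders.merge(customers, on="customer_id", how="left")
    totals = joined.groupby("segment", dropna=False)["amount"].sum()
    return totals.rename("total_amount").reset_index()


def normalise(df: pd.DataFrame) -> pd.DataFrame:
    # Row order is not guaranteed, and SQL NULL vs pandas NaN should compare equal,
    # so label missing keys and sort before comparing.
    out = df.fillna({"segment": "<missing>"})
    return out.sort_values("segment").reset_index(drop=True)


# Raises AssertionError if the two implementations disagree.
pd.testing.assert_frame_equal(
    normalise(sql_result(orders, customers)),
    normalise(pandas_result(orders, customers)),
    check_dtype=False,
)
```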
Configure PyCharm for Python testing with pytest, set up source and test folders, exclude the virtual environment folder, and manage installs via pip, setup.py, or setup.sh, including Git Bash on Windows.
Explore different types of code (back end, full stack, UI-based, and functional) and unit test legacy code in the full stack framework, covering both task and workflow code.
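When old code has no tests, a common first step is a characterization (golden-master) test: run the existing function on a few fixed inputs and pin whatever it returns today, so later refactoring or conversion cannot silently change behaviour. `legacy_fee` below is a hypothetical stand-in for an old function in the framework, and the expected values are assumed to have been recorded from its current output.

```python
def legacy_fee(notional: float, tier: str) -> float:
    # Hypothetical legacy logic, kept exactly as found; the test documents it, it does not fix it.
    if tier == "gold":
        rate = 0.001
    elif tier == "silver":
        rate = 0.002
    else:
        rate = 0.005
    return round(max(notional * rate, 1.0), 2)


# Expected values recorded by running the current implementation, not taken from a spec.
GOLDEN_CASES = [
    (10_000.0, "gold", 10.0),
    (10_000.0, "silver", 20.0),
    (100.0, "bronze", 1.0),  # the 1.0 minimum fee applies
    (0.0, "gold", 1.0),
]


def test_legacy_fee_unchanged():
    # pytest collects this automatically when it lives in a tests/test_*.py file.
    for notional, tier, expected in GOLDEN_CASES:
        assert legacy_fee(notional, tier) == expected, (notional, tier)
```

The point of this style is that the test passes on day one; it only starts failing when someone changes behaviour, which is exactly what you want to catch during a code conversion.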
Learn task-based unit testing, where workflows orchestrate input and reference tasks: keep the Spark session inside functions, convert CSV files to pandas and then to Spark DataFrames, and compare the resulting frames.
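The shape of a task-based test can be sketched under a few assumptions: each task is a plain function that returns a DataFrame (`load_input`, `load_reference`, and `enrich_workflow` are hypothetical names), the CSV inputs are inlined with `io.StringIO`, and pandas stands in for Spark so the example runs anywhere. In a Spark version, the session would be created or passed in inside these functions rather than at import time, and each pandas frame converted with `spark.createDataFrame(pdf)`. Because Spark does not guarantee row order, both frames are sorted and re-indexed before comparison.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

INPUT_CSV = """id,product,qty
1,apple,3
2,pear,5
3,apple,2
"""

REFERENCE_CSV = """product,unit_price
apple,0.5
pear,0.25
"""


def load_input(csv_text: str) -> pd.DataFrame:
    # Input task: read the raw rows.
    return pd.read_csv(io.StringIO(csv_text))


def load_reference(csv_text: str) -> pd.DataFrame:
    # Reference task: read the lookup data.
    return pd.read_csv(io.StringIO(csv_text))


def enrich_workflow(input_csv: str, reference_csv: str) -> pd.DataFrame:
    # Workflow: orchestrate the two tasks, then join and derive a column.
    rows = load_input(input_csv)
    ref = load_reference(reference_csv)
    out = rows.merge(ref, on="product", how="left")
    out["value"] = out["qty"] * out["unit_price"]
    return out[["id", "product", "value"]]


def assert_frames_match(actual: pd.DataFrame, expected: pd.DataFrame, sort_by: str) -> None:
    # Sort and reset the index so row order does not affect the comparison.
    def norm(df: pd.DataFrame) -> pd.DataFrame:
        return df.sort_values(sort_by).reset_index(drop=True)

    assert_frame_equal(norm(actual), norm(expected), check_like=True)


def test_enrich_workflow():
    expected = pd.DataFrame(
        {"id": [3, 1, 2], "product": ["apple", "apple", "pear"], "value": [1.0, 1.5, 1.25]}
    )
    assert_frames_match(enrich_workflow(INPUT_CSV, REFERENCE_CSV), expected, sort_by="id")
```

Testing the workflow end to end like this complements smaller tests of each task on its own; when a workflow test fails, the task-level tests help narrow down which step changed.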
Explore the Windows local environment variables, such as HADOOP_HOME, JAVA_HOME, and SPARK_HOME, that the full stack framework and the Hive metastore depend on, and see how switching the backend affects test caches.
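A quick sanity check before running Spark or Hive-backed tests locally is to confirm that the expected environment variables are set and point at real directories. `check_spark_env` is a hypothetical helper; `JAVA_HOME`, `HADOOP_HOME`, and `SPARK_HOME` are the conventional variable names. On Windows, Hadoop also commonly expects `winutils.exe` under `%HADOOP_HOME%\bin`, which this sketch does not check.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")


def check_spark_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a problem description per variable; an empty dict means all look fine."""
    problems: Dict[str, str] = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = f"directory not found: {value}"
    return problems


if __name__ == "__main__":
    # Print one line per missing or broken variable.
    for var, issue in check_spark_env().items():
        print(f"{var}: {issue}")
```

Passing the environment in as a mapping keeps the helper itself unit-testable with a plain dict instead of the real machine settings.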
Learn to translate Python test workflows into Excel by mimicking the input handling, CSV joins, and group by operations, so that non-programmers can test the code using Excel and Power BI.
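When the same join and group by are rebuilt in Excel or Power BI, a CSV export is a convenient hand-off for comparing against the Python output. The sketch assumes the Excel result is saved as CSV with the same column names (the data and `matches_excel` helper are hypothetical). Excel exports often come back with slightly different numeric formatting, so the comparison ignores dtypes and uses a relative tolerance.

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical CSV exported from an Excel Power Query "Group By" step.
EXCEL_EXPORT = """region,total
APAC,75
EMEA,350.0000001
"""


def python_result() -> pd.DataFrame:
    # Stand-in for the output of the Python code under test.
    return pd.DataFrame({"region": ["EMEA", "APAC"], "total": [350.0, 75.0]})


def matches_excel(python_df: pd.DataFrame, excel_csv: str, key: str, rtol: float = 1e-6) -> bool:
    excel_df = pd.read_csv(io.StringIO(excel_csv))
    # Sort both sides on the key so row order in Excel does not matter.
    left = python_df.sort_values(key).reset_index(drop=True)
    right = excel_df.sort_values(key).reset_index(drop=True)
    try:
        assert_frame_equal(left, right, check_dtype=False, check_exact=False, rtol=rtol)
    except AssertionError:
        return False
    return True
```

Going the other way, `python_result().to_csv("python_output.csv", index=False)` produces a file that can be loaded into Excel or Power BI for a side-by-side check.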
Use a pandas notebook to mimic PySpark code and recreate its outputs, enabling independent verification and testing of SimEng Python code conversions for big data workflows.
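A pandas notebook can mirror PySpark logic step by step. This sketch pairs a hypothetical PySpark snippet (shown in comments only, not executed) with a pandas equivalent: `F.when(...).otherwise(...)` becomes `numpy.where`, and a `row_number()` window partitioned by one column and ordered descending by another becomes a grouped `rank(method="first")`. One caveat: Spark's `row_number` breaks ties arbitrarily unless the ordering is unique, while `rank(method="first")` uses the current row order, so tied data can legitimately differ.

```python
import numpy as np
import pandas as pd

# Hypothetical PySpark code being mimicked (not executed here):
#   from pyspark.sql import functions as F, Window
#   w = Window.partitionBy("desk").orderBy(F.desc("pnl"))
#   out = (df.withColumn("flag", F.when(F.col("pnl") > 0, "gain").otherwise("loss"))
#            .withColumn("rn", F.row_number().over(w))
#            .filter(F.col("rn") == 1))


def mimic_in_pandas(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # F.when(...).otherwise(...) -> np.where
    out["flag"] = np.where(out["pnl"] > 0, "gain", "loss")
    # row_number() over (partition by desk order by pnl desc) -> grouped rank
    out["rn"] = out.groupby("desk")["pnl"].rank(method="first", ascending=False).astype(int)
    # filter(rn == 1): keep the top row per desk
    return out[out["rn"] == 1].reset_index(drop=True)


df = pd.DataFrame({"desk": ["A", "A", "B", "B"], "pnl": [5.0, -2.0, -1.0, -3.0]})
print(mimic_in_pandas(df))
```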
Explore related courses that share the same framework, covering full stack back end engines, Monte Carlo simulation, independent remote work, comprehensive testing, and Python big data, to help you prepare for interviews and roles.
Interview Prep: Writing Tests for SimEng - Code conversion concepts
Write unit tests for existing code in Python, PySpark, and Spark SQL, and configure the tests in PyCharm
What you will learn:
How to write unit tests for existing code in Python, PySpark, and Spark SQL
How to use Excel Power Query
How to set up PyCharm, a venv, unit testing, and test coverage
How to write test code for small and large units: functions, classes, tasks, and workflows
How to set up the correct venv locally
Introduction to the code release process
Setting up local environment variables
Topics:
In Excel Power Query, we mostly use group by and join on small data to check our results and share them with senior colleagues.
We can create a simpler notebook that implements the same logic as the code.
Many of these errors are caused by local Spark sessions, which can be created in many different ways; however, if we want to create Spark SQL views that use Hive, we have to download and install Hadoop and Spark.
Debugging: setting breakpoints, then using Step Into / Step Over to generate the output and match it against Excel or save it for Excel.
Making sure we mimic the Spark session setup used in older tests, because otherwise we can get errors in the Jenkins build.
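To keep local runs and the Jenkins build consistent, one option is a single session-scoped pytest fixture that builds the Spark session the same way the older tests do. The configuration values below are assumptions to be replaced with whatever the existing tests actually use, and `build_test_spark` is a hypothetical helper. `enableHiveSupport()` is only needed for Hive-backed views and requires a local Hadoop and Spark install. PySpark is imported inside the function so the module still loads where Spark is not installed.

```python
import pytest

# Assumed settings; copy the exact values from the older tests / Jenkins configuration.
SPARK_TEST_CONF = {
    "spark.sql.shuffle.partitions": "1",  # small test data: avoid the default 200 partitions
    "spark.ui.enabled": "false",  # no web UI during tests
    "spark.sql.session.timeZone": "UTC",  # stable timestamps across machines
}


def build_test_spark(app_name: str = "simeng-tests", hive: bool = False):
    # Imported lazily so this module can be imported without PySpark.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("local[1]").appName(app_name)
    for key, value in SPARK_TEST_CONF.items():
        builder = builder.config(key, value)
    if hive:
        builder = builder.enableHiveSupport()
    return builder.getOrCreate()


@pytest.fixture(scope="session")
def spark():
    # One session shared by all tests in the run, stopped at the end.
    session = build_test_spark()
    yield session
    session.stop()
```

Placed in `conftest.py`, the fixture lets each test simply declare a `spark` argument, e.g. `def test_join(spark): ...`, so every test gets the same session settings locally and on Jenkins.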