Data Engineering with Python
- Release Date: 3rd July 2025
- Last Updated: 3rd July 2025
Course Overview
This intensive course focuses on the principles and practices of data engineering using Python, enabling participants to build, maintain, and optimize data pipelines for analytics and business intelligence applications. It blends theoretical concepts with hands-on programming to prepare learners for roles in data management, ETL processes, and big data environments.
Students will gain proficiency in Python programming geared towards data manipulation, automation, and workflow orchestration. The curriculum covers database interaction, API integration, cloud storage, and distributed data processing frameworks such as Apache Spark. Emphasis is placed on writing efficient, scalable, and maintainable code to handle large volumes of structured and unstructured data.
By the program’s end, participants will be able to design and implement robust data architectures that serve analytical and operational needs in data-driven organizations.
Prerequisites
- Basic Python programming knowledge
- Familiarity with databases and SQL
- Understanding of data formats like JSON and CSV
Program Outcomes
- Develop data extraction, transformation, and loading (ETL) pipelines
- Work with relational and NoSQL databases programmatically
- Automate workflows and schedule data jobs
- Utilize cloud platforms and big data tools for scalable processing
- Ensure data quality, integrity, and security in engineering tasks
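As an illustration of the first outcome, a minimal ETL pipeline might look like the sketch below. It uses only the Python standard library. The sample data, the `orders` table, and the function names are hypothetical and are not taken from the course material.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real pipeline would read from a file, API, or cloud bucket.
RAW_CSV = """order_id,customer,amount
1,alice,19.99
2,bob,
3,carol,5.50
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert field types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # simple data-quality rule: skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load, then return (row count, total amount)."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()


if __name__ == "__main__":
    # An in-memory database keeps the example free of side effects.
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keeping extract, transform, and load as separate functions is one common way to make each stage testable on its own. Production pipelines typically add logging, error handling, and scheduling on top of this shape.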
Program Coverage
Duration: 08 Hours
Python program structure, data types, expressions, logic application
Duration: 08 Hours
Modules, packages, data structures, problem-solving
Duration: 08 Hours
Built-in libraries, OOP concepts, encapsulation, abstraction, inheritance, polymorphism
Duration: 04 Hours
Data observation, pre-processing methods, data cleaning
Duration: 02 Hours
Schema, tables, relations, keys, stored procedures, functions
Duration: 04 Hours
Data sources, metadata, data integration, scalability
Duration: 02 Hours
Data flow, data management blueprint
Duration: 02 Hours
Data storage, data retrieval, data mart scope
Duration: 04 Hours
Fact table, dimension table, in-memory processing, fault tolerance
Duration: 04 Hours
Spark architecture, PySpark functionality
Duration: 04 Hours
Data analytics, patterns, trends
Duration: 08 Hours
Visualization tools, dashboard creation, automation
Duration: 02 Hours
GDPR principles, data categorization, data validation
VILT (Virtual Instructor-Led Training)
SDL (Self-Directed Learning)
This course includes:
- 13 Modules
- 60 Hours
- HCL Certification
Investment:
- LKR 24,000.00