The following is a revised edition.
We added a summarised version below for those who prefer the written word, made easy for you to skim and record top insights! 📝
Additional note from community moderators: We’re presenting the insights as-is and do not promote any specific tool, platform, or brand. This is to simply share raw experiences and opinions from actual voices in the analytics space to further discussions.
Prefer watching over listening? Watch the Full Episode here ⚡️
Abhinav Singh is a recognised Data Engineer with a strong community presence of over 40K engineers and analysts following his lead. He has over five years of experience in end-to-end ETL processes, pipeline development, reporting, and analysis across the real estate, telecom, and insurance industries. Abhinav has consistently collaborated with Fortune 500 companies, helping them solve complex business problems and enhance their data infrastructure. We highly appreciate him joining the MD101 initiative and sharing his much-valued insights with us!
We’ve covered a RANGE of topics with Abhinav. Dive in! 🤿
Before diving in, sign up to get notified when Episode 5 goes LIVE! ⏺️
I bring over five years of expertise in data engineering and analytics, working for Fortune 500 companies across diverse domains such as telecom, real estate, and insurance. Currently, I am building a robust data infrastructure and reporting setup for a U.S.-based insurance client focused on casualty coverage for truck drivers. My technical toolkit includes cloud platforms, Spark, SQL, Python, and PySpark, which are integral to delivering scalable and efficient solutions.
Cloud is undeniably the future of data engineering, and it’s essential for both freshers and experienced professionals to prioritize mastering it. The beauty of cloud platforms lies in their ability to handle administrative tasks like auto-scaling and infrastructure management, allowing engineers to focus on development, which is the core of impactful projects. Industries are rapidly adopting cloud-based solutions, with a few exceptions such as finance and banking, so it’s clear that most projects will soon migrate to the cloud. Embracing this shift is key to staying relevant in the field.
First of all, when working with on-prem data infrastructure, you must handle administrative tasks, manage logging properties, and oversee the entire infrastructure. In contrast, cloud infrastructure simplifies this significantly. You only need to focus on learning specific services or technologies relevant to your domain—for example, in data engineering on Azure, you’d work with tools like Azure Data Lake Storage (ADLS) for storage, Azure Data Factory for ETL/ELT, and Synapse Analytics for processing.
These services are easy to learn, provided you have strong fundamentals in programming and database concepts. Once you grasp one cloud platform, transitioning to others becomes straightforward, as their services are similar, differing mainly in infrastructure and UI.
Security is a prime factor when dealing with sensitive data, and data masking mechanisms play a crucial role in addressing it. Traditionally, industries like banking and finance were hesitant to move to the cloud due to security concerns. However, with major cloud providers now supporting compliance with regulations like HIPAA and CCPA, clients are becoming more aware and confident, making cloud adoption increasingly sensible.
The approach to data engineering differs between large enterprises and startups, primarily in scope and learning opportunities. In large enterprises, like Fortune 500 companies, you’re often part of a big team, working on a specific aspect of a project for months, which might limit exposure to end-to-end architecture and development. In contrast, startups typically offer opportunities to build projects from scratch, providing a different and more holistic learning experience. Each has its pros and cons, and the choice depends on the work culture and learning path you want at that point in your career.
If I were starting my data engineering journey today, the first skill I'd focus on is SQL. It's the foundation for querying databases and understanding how data flows. From there, I'd prioritize learning Python, as it's versatile and widely used in the industry. Next, I'd explore big data technologies like Apache Hadoop or Spark to understand how to process large-scale data. Finally, I'd dive into cloud platforms like AWS, Azure, or GCP because cloud services are the future of data engineering, solving challenges like security and infrastructure management. While on-premise projects can help you grasp fundamentals, transitioning to cloud projects early is essential for staying ahead.
To become a data engineer, the path starts with mastering SQL, progresses through learning Python, and then moves to big data technologies like Apache Spark before concluding with cloud services.
Each step builds on the previous one, creating a comprehensive skill set for tackling data engineering challenges effectively.
The core data engineering projects I've worked on have been significant learning experiences. My expertise lies primarily in building data pipelines on Azure Cloud, though I’ve also explored AWS through POCs. For ELT ingestion, I’ve used Azure Data Factory to bring data from multiple sources into a Data Lake, layered with a medallion architecture. My toolkit includes Databricks, Spark, and Azure DevOps for CI/CD processes. On the fundamentals side, I frequently use Python, have some experience with Scala, and work with PySpark and SQL. These technologies form the foundation of my work.
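For readers new to the medallion architecture mentioned above, here is a minimal sketch of the idea in plain Python. It is not Abhinav's actual pipeline: the record fields, the `claims_api` source tag, and the function names are all hypothetical, and a real setup would typically use Spark tables in a data lake instead of in-memory lists. It only illustrates how data gets progressively cleaner from the bronze (raw) layer to the silver (cleaned) layer to the gold (reporting) layer.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source system.
RAW_CLAIMS = [
    {"claim_id": "C1", "state": " tx ", "amount": "1200.50"},
    {"claim_id": "C2", "state": "CA", "amount": "not-a-number"},
    {"claim_id": "C1", "state": " tx ", "amount": "1200.50"},  # duplicate row
    {"claim_id": "C3", "state": "ca", "amount": "300"},
]


def to_bronze(raw_rows):
    """Bronze: keep rows exactly as ingested, only tagging their source."""
    return [{**row, "_source": "claims_api"} for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: deduplicate by claim_id, normalise text, cast types,
    and drop rows whose amount cannot be parsed."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row["claim_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # unparseable amount: excluded from the silver layer
        seen.add(row["claim_id"])
        silver.append(
            {
                "claim_id": row["claim_id"],
                "state": row["state"].strip().upper(),
                "amount": amount,
            }
        )
    return silver


def to_gold(silver_rows):
    """Gold: aggregate into a reporting-ready shape (total amount per state)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["state"]] += row["amount"]
    return dict(totals)


bronze = to_bronze(RAW_CLAIMS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping the bronze layer untouched means the silver and gold layers can always be rebuilt if a cleaning rule changes later.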
To grow and expand your knowledge effectively, you must become like a sponge—absorbing insights from peers, observing their approaches, and learning from their expertise. Engage with colleagues from diverse backgrounds, examine their code, documentation, error handling, and even how they communicate through blogging. Apply these observations in your own style. For example, I once worked with a team lead with 12 years of experience who taught me how to interact with clients, gather requirements, and pitch the right technologies. By interacting, observing, and adapting, you stay ahead in a rapidly evolving tech landscape.
To excel in data engineering, focus on three key areas:
It’s difficult to enter the data domain without mastering these areas, especially the technical documentation, which often feels cumbersome. As someone who was once a beginner, I try to simplify concepts to make them accessible to others, aiming to be the mentor I once needed.
Data engineering is evolving towards data lakehouse architectures and unified platforms. The trend is moving away from using separate tools for different tasks (data ingestion, visualization, etc.) towards integrated solutions. Companies are now offering unified platforms, with products like Databricks, Snowflake, and Microsoft Fabric leading the charge. These platforms combine multiple capabilities into one, streamlining processes. Databricks, in particular, is a key technology to watch in the next four to five years, as it is expected to become a dominant tool in the field.
To ensure data privacy, particularly in sectors like banking and finance, data masking is essential. For sensitive information such as credit card details, dates of birth, and addresses, tools like Databricks offer native functionalities to mask data in your code. It’s a must-do practice to handle sensitive data responsibly.
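To make the idea of masking concrete, here is a small sketch in plain Python. It does not use Databricks' native masking features; the field names (`card_number`, `date_of_birth`, `address`) and the masking rules are assumptions chosen for illustration, and a real project would follow its own compliance requirements.

```python
import re
from datetime import date


def mask_card_number(card_number: str) -> str:
    """Keep only the last four digits; replace the others with '*'."""
    digits = re.sub(r"\D", "", card_number)  # strip spaces, dashes, etc.
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + digits[-4:]


def mask_date_of_birth(dob: date) -> str:
    """Expose only the birth year."""
    return f"{dob.year}-**-**"


def mask_address(address: str) -> str:
    """Hide the address entirely."""
    return "[REDACTED]"


# Hypothetical mapping from sensitive field name to its masking rule.
MASKERS = {
    "card_number": mask_card_number,
    "date_of_birth": mask_date_of_birth,
    "address": mask_address,
}


def mask_record(record: dict) -> dict:
    """Return a copy of the record with sensitive fields masked."""
    return {
        key: MASKERS[key](value) if key in MASKERS else value
        for key, value in record.items()
    }


customer = {
    "name": "Sample Customer",
    "card_number": "4111 1111 1111 1111",
    "date_of_birth": date(1990, 5, 17),
    "address": "12 Example Street",
}
masked = mask_record(customer)
print(masked)
```

Driving the masking from one mapping keeps the rules in a single place, so adding a new sensitive field means adding one entry instead of touching every pipeline step.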
A data product is a tool, service, or platform that utilizes a company's data to provide insights or streamline processes. For example, in my current scenario, a platform that handles reporting for different cases of a client could be considered a data product.
To apply product thinking to data, the first step is to understand the business use case—what problem the business is trying to solve. For beginners, this is crucial. Once you grasp the business problem, you can determine the approach to solve it. This might involve coding or selecting the right technology. For example, if tasked with a project that has the purpose X, focus on how to reach X, break it down into steps, and choose the technology best suited to build a solution for that path.
As a data engineer, it's essential to understand the business use cases. For instance, if the goal is to create a reporting platform, you need to know the KPIs that will be displayed. Once you understand the KPIs, you can design the data model to support them and build the reporting infrastructure. After that, develop a scalable data pipeline that supports the problem statement and ensures the reporting system can handle changing data over time. The pipeline should be flexible and adaptable to evolving data needs.
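The KPI-first flow described above can be sketched roughly as follows. This is only an illustration under assumptions: the two KPIs (open claim count and average claim amount), the `ClaimFact` model, and the sample rows are all hypothetical. The point is the order of work: start from the KPIs, derive a data model that supports them, then write a loader that tolerates changes in the incoming data.

```python
from dataclasses import dataclass, fields


# Step 1 (assumed KPIs): open claim count and average claim amount.
# Step 2: a data model that holds just what those KPIs need.
@dataclass
class ClaimFact:
    claim_id: str
    status: str = "unknown"  # default used when the source omits the field
    amount: float = 0.0


# Step 3: a loader that tolerates changing data. Columns the model does not
# know about are ignored, and missing optional columns fall back to defaults.
def load_claims(rows):
    known = {f.name for f in fields(ClaimFact)}
    return [
        ClaimFact(**{key: value for key, value in row.items() if key in known})
        for row in rows
    ]


def open_claim_count(facts):
    return sum(1 for fact in facts if fact.status == "open")


def average_claim_amount(facts):
    return sum(fact.amount for fact in facts) / len(facts) if facts else 0.0


# Hypothetical source rows: the second has an extra column, the third lacks one.
rows = [
    {"claim_id": "C1", "status": "open", "amount": 100.0},
    {"claim_id": "C2", "status": "closed", "amount": 300.0, "adjuster": "A7"},
    {"claim_id": "C3", "amount": 200.0},
]
facts = load_claims(rows)
print(open_claim_count(facts), average_claim_amount(facts))
```

Because the loader filters on the model's own fields, a new source column does not break the reporting, and promoting it to a KPI later only requires adding a field to `ClaimFact`.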
A data product will provide analytics tailored to your specific company or data project. If built correctly and aligned with its intended purpose, it will offer valuable insights. For example, metrics like employee count or other KPIs can be used to develop further data products or solutions. The interconnectivity of data products is also crucial, as various teams, personas, and stakeholders are involved in decision-making. When these components communicate and work together, decision-making improves significantly.
I’m super into fitness right now—it’s like my whole day and nutrition revolve around it. Honestly, it’s been my anchor over the past year. Besides that, I’m all about personal development—whether it’s health or finance, I love working on myself.
Speaking of good vibes, I had an amazing 10-day trip to Meghalaya recently. It’s such a stunning place—the hills, the waterfalls, just nature everywhere. It’s like the perfect recharge spot.
If I could pick a superpower, it’d definitely be mind-reading. I know it’s kind of cliché, but it’d make life so much easier—both personally and professionally.
And yeah, I really enjoyed this conversation and hope what I shared helps people looking to transition into data roles!
📝 Note from Editor
The above insights are summarised versions of Abhinav Singh’s actual dialogue. Feel free to refer to the transcript or play the audio/video to capture the true essence and details of his as-is insights. There’s also a lot more information and hidden bytes of wonder in the interview, so listen in for a treat!
Connect with me on LinkedIn 🙌🏻