The Future is Data-Driven: Your Pathway to AI Proficiency
- Tat Yuen
- Mar 4
- 6 min read
I was motivated to dive deep into machine learning after taking Andrew Ng's Coursera course titled "AI for Everyone". Launched in 2019, the course is a non-technical introduction to AI, what it can and cannot do, and the societal impacts it may have.
It's a must-take for any leader, especially if you are planning your first AI use case. You'll learn about restructuring your organisation and workflows to execute with managed risk.
I have taken other courses related to AI including from the University of Michigan and IBM to get different perspectives as to how the subject is taught.
And here's some of what I've learned. AI and data are a broad topic and must be taken in context.


Being broad in scope also means that there are many roles involved requiring different skills and varying levels of proficiency.

Diagram Courtesy of Ravit Jain
And here is a typical "Data Pipeline" minus the discovery interaction with stakeholders.

Note that it has characteristics of design thinking and is an iterative process. When building models and comparing the accuracy and confidence levels of their outputs, hyperparameters can be tuned to improve results. Sometimes the marginal improvement in accuracy is not worth the extra computational expense.
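As a minimal sketch of what hyperparameter tuning can look like, here is a grid search with scikit-learn. The dataset, model, and parameter grid are illustrative assumptions, not from a real pipeline:

```python
# Hypothetical example: tuning a random forest with cross-validated grid search.
# The dataset (iris) and the parameter grid are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# A small grid: a larger grid searches more thoroughly but costs more compute.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best combination found by the search
print(search.best_score_)   # its mean cross-validated accuracy
```

Doubling the grid roughly doubles the fits, which is exactly the accuracy-versus-compute trade-off mentioned above.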
Levels of Abstraction and the Human-Machine Interface
It may seem daunting, but not to worry. Since I first started following trends in AI about five years ago, software solutions have advanced tremendously, making every aspect of data science easier and more accessible for those who are not developers but can read code. AI is helping developers build better tools with levels of abstraction that are changing the human-machine interface (HMI), making data science accessible to the layman.
There are many AutoML (Automated Machine Learning) tools available today that make the development and deployment of machine learning models more efficient, accessible, and scalable. But you will still need a good grasp of AI principles for the drag-and-drop user interfaces to be useful.
Python libraries are being developed at a rapid pace. There's a Python plugin for Excel, and the dataprep library makes EDA (exploratory data analysis) much easier. There are even drag-and-drop apps for building models. JupyterLab is a big improvement over Jupyter notebooks, and it's available in the cloud. This notebook style of working is the way to go if you want to get into data science. So what knowledge and skills do you need to get started? Let's start from the basics.
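To give a flavour of notebook-style EDA, here is a minimal sketch with pandas. The tiny DataFrame is a made-up example:

```python
# A made-up toy dataset to illustrate quick exploratory data analysis.
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38],
    "income": [40000, 52000, 81000, 76000, 60000],
})

print(df.describe())    # summary statistics per column
print(df.isna().sum())  # missing values per column
print(df.corr())        # pairwise correlations between numeric columns
```

In a notebook, each of these would typically sit in its own cell so you can inspect one result at a time.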
Computer Literacy
You will need to learn how to use a computer. Choose what you like and what you can afford: Unix, Windows, or Mac. It's your learning tool, so get very familiar with it, and that includes using the browser (yes, I have to say that). I use an M1 Mac. No GPU, no problem. For heavy lifting there's Google Colab, and you can use the free tier or pay as you need. I find Macs easier to use, but it's a personal preference (macOS comes with Python installed, but be mindful of the version).
Data Literacy
Data literacy means being able to read, analyze, and communicate data effectively. You need to have the skills and knowledge to get meaningful insights from data and use that information to make decisions and solve problems.
Understanding data: What does the data represent? How is it collected? What are its limitations and biases? This includes understanding data types, formats, and structures like databases, spreadsheets, XML, JSON, etc.
Interpreting data: You will need the ability to critically analyze and interpret data through techniques like statistical analysis, data visualization, and identifying patterns and trends using BI tools like BigQuery, Power BI, and Tableau.
Deriving insights: Use analytical skills to draw meaningful conclusions and insights from data that can inform decision-making processes. This skill is both art and science and often requires validation from stakeholders and subject matter experts.
Communicating findings: Presenting data and findings clearly and effectively through reports, visualizations, or other means, tailoring the communication to the intended audience. Data storytelling is an emerging skill but an important one where cognitive science meets the three-part drama structure.
Thinking critically: Applying critical thinking skills to evaluate the quality, relevance, and limitations of data, as well as the implications of the findings. Learn about metacognition and be aware of your own biases (confirmation bias). Be empathetic to understand who the data impacts.
Using data tools: Having familiarity with data tools and technologies, such as spreadsheets, databases, data visualization software, and statistical programming languages. There are many tools to choose from and when getting started, use one that you already know or one that has a large online community where you can get help. Go open source when you can and pay only if you have to.
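The data formats mentioned above can be explored with nothing but Python's standard library. A toy sketch, with invented records, showing the same two rows read from JSON and from CSV:

```python
# Reading the same small record set from JSON and CSV text.
# The records here are invented purely for illustration.
import csv
import io
import json

json_text = '[{"name": "Ada", "score": 91}, {"name": "Alan", "score": 85}]'
records = json.loads(json_text)  # JSON -> list of dicts, with typed values

csv_text = "name,score\nAda,91\nAlan,85\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))  # CSV -> list of dicts

print(records[0]["score"])  # an int: JSON preserves number types
print(rows[0]["score"])     # a str: CSV fields arrive as text
```

The difference in the last two lines is the kind of format limitation data literacy is about: CSV carries no type information, so everything comes back as a string.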
Data Fluency
Data fluency goes beyond just data literacy by emphasizing the application and use of data skills.
Understanding Data Concepts: Have a conceptual grasp of statistical methods, data structures, data quality issues, and analytical techniques.
Interpreting Data: Explore, analyze, and find meaningful patterns in data through techniques like visualization, statistical modeling, and data mining.
Applying Domain Knowledge: Combine data skills with subject matter expertise to ask relevant questions and apply insights appropriately within a specific context.
Data Storytelling: Communicating findings from data clearly and compellingly through reports, visualizations, and narratives tailored to the audience. Learn to profile your audience for maximum effect.
Data Ethics: Recognizing ethical issues around data privacy, security, bias, and the appropriate use of data.
Using Data Tools: Proficiency with spreadsheets, query languages, statistical software, and other data tools to work with and manipulate data effectively. SQL, Python, and R are all good tools.
Data-Driven Decision Making: Use data-based evidence and insights to develop informed strategies, streamline processes, and make better business decisions.
Beyond Data Fluency
After mastering data fluency, you can take it to even greater heights and aim for mastery and leadership:
Advanced analytic methods using machine learning and deep learning
Data strategy and governance to ensure data quality, security and compliance
Product development to package tools to create accessible, usable, useful, and reliable applications
Data architecture and engineering to build scalable robust data infrastructures to support apps and workloads
Domain expertise across adjacent sectors as well as upstream and downstream sectors
Data innovation and research to help reduce computation costs and new algorithms such as liquid neural networks for temporal data processing and algorithms for non-linear systems
Data leadership to drive organisational and cultural change in the adoption of AI and data science-related technologies
Learning to Code
Python has become one of the most popular programming languages for AI and machine learning due to its simplicity, readability, and vast ecosystem of libraries and frameworks. Other languages like R, Java, C++, and more recently Julia, are also used for AI and machine learning applications. Python and R are interpreted languages, while Java and C++ code needs to be compiled before running.
Interpreted code is executed line by line without having to compile it to machine language, making for a more interactive experience. And it's so easy to write and read code in a Jupyter notebook. I'm not a developer, and I've programmed in many languages including Assembly, Pascal, Fortran, C#, C, C++, JavaScript, and Python. The thinking behind these languages is very similar (my applications don't require me to manage memory manually), so I like to keep it simple if speed is not an issue. Python is easy to learn and easy to read.
And there's no need to be of developer calibre. You just need to know how to think like a programmer, so don't memorize all that syntax; AI can do that for you. But you should know how to read code. Can you spot a lambda function or a list comprehension? Remember, AI-generated code is far from perfect, but it's getting better every day. And besides, the code you write for machine learning tasks in a Jupyter notebook is short, often less than twenty lines. Don't let the fear of coding stop you from learning how to build models.
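For example, here are those two idioms side by side; the values are arbitrary, chosen only to make the output easy to check by eye:

```python
# A lambda function: a small anonymous function, here bound to a name.
square = lambda x: x * x

# A list comprehension: build the squares of the even numbers under 10
# in a single readable line.
squares_of_evens = [square(n) for n in range(10) if n % 2 == 0]

print(squares_of_evens)  # -> [0, 4, 16, 36, 64]
```

If you can read those two lines, you can follow most of the code in a typical machine learning notebook.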
Summary
We all have to start from somewhere, and AI research and the apps that use it are advancing rapidly, thanks in large part to AI itself. The growth now looks exponential, almost like a near right-angle hockey stick. And who knows what disruptive breakthroughs it'll make in life science, mathematics, material sciences, energy, and quantum computing. We owe it to ourselves to learn as much about AI as we are able and to learn more about ourselves at the same time. AI poses a personal existential crisis for each of us, and we have to deal with what's ours in our own way. Continue to learn, and teach if you can. I plan on doing both.
Links
AI for Everyone, by Andrew Ng on Coursera (free)
Things to consider when choosing your first AI project
Anaconda for Data Science (youtube)
Jupyterlab is easy to use and easy to share
Data Data Everywhere (Google course on Coursera)
Gets you up to speed on data literacy and partway to data fluency
Python for Everybody (Python for the absolute beginner)
Python will be the most used tool in your data science toolbox