Advanced Certificate in Applied Data Analytics Module 4: Text Mining with Web Scraping and Natural Language Processing Techniques
- Analytics & Tech
- Innovation & Business Improvement
This programme is conducted online.
3 days
Weeknights (6.30pm - 10.45pm)
Saturday (9am - 6.15pm)
Who Should Attend
- Data analysts and engineers who want to gain insight into end-to-end data pipelines
- Software developers and engineers looking to expand their knowledge of data processing workflows
- Cloud practitioners planning to use Cloud services for data pipelines and analytics
- Project managers or team leads who need an understanding of data pipelines to better communicate with technical teams and stakeholders
- Aspiring data professionals seeking a foundational understanding of modern data pipelines and cloud technologies
PREREQUISITES
- Experience in Python programming (equivalent to that attained in Professional Certificate in Python Programming programme)
SYSTEM REQUIREMENTS
- Functional laptop with (1) a CPU of at least Intel Core i3, (2) an integrated graphics card and (3) at least 4GB of RAM
Overview
Participants will learn web scraping, a versatile technique that e-commerce firms famously rely on to trawl data from their competitors in search of competitive advantages. As much of this data is presented as unstructured text, a rudimentary grasp of modern text processing techniques is essential for making sense of the scraped data.
This course first presents HTML as a format for structuring and embedding semantic metadata into text documents, which can be exploited to yield additional data insights when analysing webpages. Participants will learn how to use the Beautiful Soup 4 package to perform web scraping to collect raw text data, and then preprocess the scraped data and perform textual analysis through one of the best text processing facilities of any modern programming language—the Python NLTK package. Finally, participants will be introduced to the fundamentals of Large Language Models (LLMs)—their architecture, key differences from traditional NLP models, and real-world applications. Through hands-on exercises, participants will learn to use pre-trained LLMs for tasks like text generation and summarisation, while also exploring ethical considerations, limitations, and fine-tuning techniques.
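As a rough illustration of how HTML structure can be exploited during scraping, the sketch below uses Beautiful Soup 4 to pull product names and prices out of a small page. The HTML snippet, the `product` and `price` class names, and the `extract_products` helper are all hypothetical and chosen for illustration only; a real scraper would fetch the page over HTTP and adapt the selectors to the target site.

```python
from bs4 import BeautifulSoup

# Hypothetical product listing (assumption): a real scraper would download
# this HTML from a website instead of hard-coding it.
HTML = """
<html><body>
  <div class="product"><h2>Kettle</h2><span class="price">$25.90</span></div>
  <div class="product"><h2>Toaster</h2><span class="price">$39.00</span></div>
</body></html>
"""


def extract_products(html):
    """Return a list of (name, price) tuples found in the page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # Each product sits in its own <div class="product"> element.
    for card in soup.find_all("div", class_="product"):
        name = card.find("h2").get_text(strip=True)
        price_text = card.find("span", class_="price").get_text(strip=True)
        # Strip the currency symbol before converting to a number.
        products.append((name, float(price_text.lstrip("$"))))
    return products


products = extract_products(HTML)
for name, price in products:
    print(name, price)
```

The tags and class attributes act as the semantic metadata here: they tell the scraper which text is a product name and which is a price.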
This module is part of a sequential programme and is not available on a standalone basis.
Learning Objectives
- Extract and process data from webpages and APIs using libraries like Beautiful Soup, requests, and Pandas
- Preprocess and analyse text data using Natural Language Processing (NLP) techniques with tools like NLTK
- Organise and process large text corpora to develop predictive models for extracting insights and making predictions
- Understand the fundamentals of Large Language Models (LLMs) and apply them to tasks like text generation, summarisation, and fine-tuning for specific use cases
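To give a feel for the preprocessing objective above, here is a minimal sketch using NLTK: it lowercases a sentence, tokenises it, removes stopwords, stems the remaining words and counts them. The sample sentence, the tiny hand-written `STOPWORDS` set and the `preprocess` helper are assumptions made for illustration. NLTK ships a fuller stopword corpus, but it requires a separate download, so it is not used here.

```python
from nltk.probability import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Small illustrative stopword set (assumption); NLTK's own stopword corpus
# is larger but must be downloaded separately.
STOPWORDS = {"the", "is", "and", "a", "it"}


def preprocess(text):
    """Lowercase, tokenise, drop stopwords and stem the remaining tokens."""
    tokenizer = RegexpTokenizer(r"[a-z]+")  # keep alphabetic runs only
    stemmer = PorterStemmer()
    tokens = tokenizer.tokenize(text.lower())
    return [stemmer.stem(token) for token in tokens if token not in STOPWORDS]


review = "The kettle boils quickly and the kettle is quiet. Boiling water is easy."
tokens = preprocess(review)

# Count stemmed tokens; "boils" and "boiling" are both reduced to "boil".
freq = FreqDist(tokens)
print(freq.most_common(3))
```

Stemming merges inflected forms of a word, so the frequency counts reflect concepts rather than surface spellings.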
Topic/Structure
- Introduction to HTML and Web Scraping
- Data Collection using APIs
- Natural Language Processing with the NLTK package
- Introduction to LLMs
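For the API topic, the following sketch shows one way a JSON response might be flattened into a Pandas table for analysis. The payload, its field names and the `payload_to_frame` helper are hypothetical. In practice the JSON would typically come from an HTTP call, for example via the requests library, rather than from a hard-coded string.

```python
import json

import pandas as pd

# Hypothetical API payload (assumption): a real one would be returned by
# an HTTP request to the service being queried.
PAYLOAD = """
{"items": [
  {"id": 1, "title": "Kettle review", "author": {"name": "Ana"}, "rating": 4},
  {"id": 2, "title": "Toaster review", "author": {"name": "Ben"}, "rating": 5}
]}
"""


def payload_to_frame(raw):
    """Parse a JSON string and flatten its 'items' list into a DataFrame."""
    data = json.loads(raw)
    # json_normalize expands nested objects into dotted column names,
    # so author -> name becomes the column "author.name".
    return pd.json_normalize(data["items"])


df = payload_to_frame(PAYLOAD)
mean_rating = df["rating"].mean()
print(df)
print("Mean rating:", mean_rating)
```

Once the data is in a DataFrame, the usual Pandas tools for filtering, grouping and aggregating apply.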
Assessment
- Individual assessment
CERTIFICATION
Upon meeting the attendance and assessment criteria, participants will be awarded a digital certificate for participating in each module. Please refer to our course policies to view the attendance and assessment criteria.
Upon completion of all modules required for this programme within a maximum duration of 3 years, participants will be awarded a digital certificate.
Fee Table
| PARTICIPANT PROFILE | SELF-SPONSORED | COMPANY-SPONSORED (SME) | COMPANY-SPONSORED (NON-SME) |
| --- | --- | --- | --- |
| Singapore Citizen < 40 years old, Permanent Resident, LTVP+ | $784.80 (After SSG Funding 70%) | $304.80 (After SSG Funding 70% | $784.80 (After SSG Funding 70%) |
| Singapore Citizen ≥ 40 years old | $304.80 (After SSG Funding 70% | $304.80 (After SSG Funding 70% | $304.80 (After SSG Funding 70% |
| International Participant | $2,616 (No Funding) | $2,616 (No Funding) | $2,616 (No Funding) |
All prices include 9% GST
Please note that the programme fees are subject to change without prior notice.
Post Secondary Education Account (PSEA)
PSEA can be utilised for subsidised programmes eligible for SkillsFuture Credit support. Click here to find out more.
Self Sponsored
SkillsFuture Credit
Singapore Citizens aged 25 and above may use their SkillsFuture Credits to pay for the course fees. The credits may be used on top of existing course fee funding.
This is only applicable to self-sponsored participants. Application to utilise SkillsFuture Credits can be submitted when making payment for the course via the SMU Academy TMS Portal, and can only be made within 60 days of course start date.
Please click here for more information on the SkillsFuture Credit. For help in submitting an SFC claim, you may wish to refer to our step-by-step guide on claiming SkillsFuture Credits (Individual).
Workfare Skills Support Scheme
From 1 July 2023, the Workfare Skills Support (WSS) scheme has been enhanced. Please click here for more details.
Company Sponsored
Enhanced Training Support for SMEs (ETSS)
- Organisation must be registered or incorporated in Singapore
- Employment size of not more than 200 or with annual sales turnover of not more than $100 million
- Trainees must be hired in accordance with the Employment Act and fully sponsored by their employers for the course
- Trainees must be Singapore Citizens or Singapore Permanent Residents
- Trainees must not be full-time national servicemen
- Trainees are eligible for ETSS funding only if their company's SME status is approved prior to the course commencement date. To verify your SME's status, please click here.
Please click here for more information on ETSS.
Absentee Payroll
Companies that sponsor their employees for the course may apply for Absentee Payroll here. For more information, please refer to:
AP Guide (Non-SME Companies)
Declaration Guide (SME Companies)