1. Executive Summary:
In this case study, we explore how we used web scraping techniques to extract job listings and company data from Indeed.com.
Our primary objective was to gain a competitive edge by comprehensively understanding market trends and generating high-quality leads.
Our team developed a web scraping system, data normalisation pipelines, and a company information enrichment pipeline.
Since its launch in May 2023, our system has maintained uptime of 99.9%.
This reliability has enabled us to consistently identify emerging job market trends and pinpoint companies actively hiring within specific niches.
2. Introduction:
At Web Scraping Solutions, we practice what we preach.
Our journey began with a simple realization: we needed real-time insights into companies hiring software developers in our niche.
This case study describes how scraping Indeed.com became a vital tool for our internal lead generation and market monitoring.
3. Challenges:
Global Reach: Indeed.com is available in numerous countries and languages.
The challenge was to efficiently collect data from all markets that we could reach.
Anti-Bot Measures: Indeed.com’s robust anti-bot system posed a significant obstacle, requiring our know-how to circumvent its security measures.
Limited Search Results: The search engine on Indeed.com doesn’t show all available jobs at once. Our developers had to design search queries and scraping strategies that cover all available openings.
Uptime Priority: Maintaining high uptime for our web scraping system was essential to ensure an uninterrupted flow of data.
By combining our internal expertise with our production-ready infrastructure, we were able to use web scraping to gain a competitive advantage.
This allowed us to address these challenges effectively and gain the job market insights we needed for informed decision-making.
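One common way to work around a capped result list is to split a broad search into narrower ones until each fits under the cap. The sketch below illustrates that idea only and is not necessarily the strategy used in this project. The cap value, the facet names (`l`, `jt`), and the `count_results` callback are hypothetical placeholders.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical cap on how many results a single search exposes.
RESULT_CAP = 1000

# A query is an ordered tuple of (parameter, value) pairs,
# e.g. (("q", "python"), ("l", "Texas")).
Query = Tuple[Tuple[str, str], ...]


def partition_queries(
    base: Query,
    facets: List[Tuple[str, List[str]]],
    count_results: Callable[[Query], int],
    cap: int = RESULT_CAP,
) -> Iterator[Query]:
    """Yield queries whose result counts fit under the cap.

    A query that is too broad is split by the next facet. If no facets
    remain, it is yielded anyway so the caller can log the shortfall.
    """
    total = count_results(base)
    if total == 0:
        return  # nothing to scrape for this combination
    if total <= cap or not facets:
        yield base
        return
    (name, values), rest = facets[0], facets[1:]
    for value in values:
        narrowed = base + ((name, value),)
        yield from partition_queries(narrowed, rest, count_results, cap)
```

Note that splitting by a facet only covers every opening if each posting matches at least one of the listed facet values; postings with no value for a facet would need a separate catch-all query.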
4. Objectives:
Our key objectives were as follows:
Identify which companies were actively hiring on Indeed.com.
Continually monitor and analyze trends in specific niches relevant to our business, enabling us to stay informed.
Ensure high uptime to provide consistent access to real-time data for uninterrupted operations.
Store historical data, allowing us to analyze trends over time and make data-driven decisions based on past information.
5. Our Tools:
Python: Our primary programming language, chosen for its versatility.
Scrapy: Utilized to streamline and optimize the web scraping process.
Redis: Employed for efficient data queue management, enhancing our data retrieval speed.
PostgreSQL: Our reliable database system, ensuring secure data storage and management.
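To illustrate how a Redis-backed queue can sit between a discovery step and a scraping step, here is a minimal sketch of a deduplicating URL queue. It is an assumption about how such a queue might be built, not a description of our production code. The class name and key names are hypothetical; the only Redis commands relied on are `SADD`, `RPUSH`, and `LPOP`, which redis-py exposes as `sadd`, `rpush`, and `lpop`.

```python
from typing import Optional


class JobUrlQueue:
    """FIFO queue of job URLs that ignores job IDs it has already seen.

    `client` is any object offering redis-py's `sadd`, `rpush`, and `lpop`
    methods, so a real Redis connection or a test double both work.
    """

    def __init__(self, client, queue_key: str = "jobs:pending",
                 seen_key: str = "jobs:seen") -> None:
        self.client = client
        self.queue_key = queue_key  # hypothetical key names
        self.seen_key = seen_key

    def push(self, job_id: str, url: str) -> bool:
        """Enqueue the URL unless this job ID was pushed before."""
        # SADD returns the number of members newly added to the set,
        # so 0 means the job ID was already recorded.
        if self.client.sadd(self.seen_key, job_id) == 0:
            return False
        self.client.rpush(self.queue_key, url)
        return True

    def pop(self) -> Optional[str]:
        """Return the oldest queued URL, or None when the queue is empty."""
        return self.client.lpop(self.queue_key)
```

With redis-py installed and a server running, the queue could be created as `JobUrlQueue(redis.Redis(decode_responses=True))`; without `decode_responses=True`, `pop` would return bytes rather than strings.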
6. Our Resources:
Production-Ready Infrastructure: Our existing environment provided all necessary components for scraper deployment.
Skilled Scrapy Developer: With an in-house expert, we had everything required for project development and maintenance.
Internal Anti-Bot Bypass Solution: Our in-house solution guaranteed uninterrupted data collection.
This combination empowered us to effectively scrape Indeed.com and seize a competitive advantage within the job market.
7. Implementation:
Our implementation process was straightforward:
Indeed.com Search Script: Developed a spider to discover new job postings on all localized versions of Indeed.com.
Job Posting Spider: Collected and structured job data systematically.
Company Profile Scraper: Extracted insights from company profiles on the website.
Example job posting record:
{
  "job_id": "c4638d7aa2ebd99b",
  "job_url": "https://www.indeed.com/rc/clk?jk=c4638d7aa2ebd99b&fccid=99304d35c1f1faeb&vjs=3",
  "valid_through": 1699775916375,
  "job_title_raw": "AI Data Scientist",
  "description": "\n About 11:59\n We are 11:59, a professional services firm that helps forward-thinking public and commercial sector clients unlock the full potential of their transformation and modernization efforts. With a team led by former Big 4 Consulting Executives, we bring the Art of the Possible to life, delivering …",
  "description_html": [
    "<div id=\"jobDescriptionText\" class=\"jobsearch-jobDescriptionText jobsearch-JobComponent-description css-1x2lix0 eu4oa1w0\"><div>\n <p><b>About 11:59</b></p>\n <p> We are 11:59, a professional services firm that helps forward-thinking public and commercial sector clients unlock the full potential of their transformation and modernization efforts. With a team led by former Big 4 Consulting Executives, we bring the Art of the Possible to life, delivering unbiased strategies that are independent from resource constraints, while generating true value and advantage based on what is possible today so that our clients are ready for tomorrow.</p>…"
  ],
  "hiring_organization_name": "11:59",
  "hiring_organization_indeed_url": "https://www.indeed.com/cmp/11:59",
  "job_type": [
    "REMOTE"
  ],
  "salary": {
    "raw": "$95,000 – $150,000 a year",
    "value": [
      "95000",
      "150000"
    ],
    "type": "YEAR",
    "currency": "USD"
  },
  "job_location": "CA CA US",
  "date_posted": 1697163095000
}
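The `salary` object in the record above pairs the raw string with parsed values, and the two date fields are epoch timestamps in milliseconds. As a rough illustration of what a normalisation step for these fields might look like, here is a minimal sketch. The function names and lookup tables are hypothetical and not taken from our pipeline, and mapping `$` straight to USD is a simplification that would be wrong for other dollar currencies.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical lookup tables; a real pipeline would need per-market rules.
CURRENCY_BY_SYMBOL = {"$": "USD", "£": "GBP", "€": "EUR"}
PERIOD_BY_WORD = {"hour": "HOUR", "day": "DAY", "week": "WEEK",
                  "month": "MONTH", "year": "YEAR"}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalise_salary(raw: str) -> dict:
    """Split a raw salary string into amounts, pay period, and currency."""
    # Grab each number (with optional thousands separators), then drop commas.
    amounts = [m.replace(",", "")
               for m in re.findall(r"\d[\d,]*(?:\.\d+)?", raw)]
    currency = next(
        (code for symbol, code in CURRENCY_BY_SYMBOL.items() if symbol in raw),
        None,
    )
    period = next(
        (label for word, label in PERIOD_BY_WORD.items()
         if re.search(rf"\b{word}\b", raw, re.IGNORECASE)),
        None,
    )
    return {"raw": raw, "value": amounts, "type": period, "currency": currency}


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert an epoch timestamp in milliseconds to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)
```

Calling `normalise_salary` on the sample's raw salary string yields the same `value`, `type`, and `currency` fields shown in the record, and `epoch_ms_to_utc` can be applied to `date_posted` and `valid_through` before storing them.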