Job Summary:
We are seeking a skilled Web Scraping Developer to design and develop scalable, efficient web scraping applications that extract data from a variety of websites. The ideal candidate is proficient in PHP or Python, web scraping frameworks (Scrapy, Beautiful Soup), MySQL, HTML/CSS, and JavaScript. This role requires strong analytical thinking, problem-solving abilities, and a proactive approach to tackling challenges.
Key Responsibilities:
1. Requirement Analysis:
● Review and understand the requirements provided by the Project Manager via the Project Management Tool.
● Analyze tasks and discuss them with the team.
● Record all details in the Project Management Tool, including the estimated time for completion.
2. Data Extraction:
● Research and identify websites, APIs, and data sources for extraction.
● Analyze website structures (HTML, CSS, JavaScript) to determine the best extraction approach.
● Develop scripts using PHP, Python, or JavaScript to extract data.
● Implement techniques to bypass anti-scraping measures (e.g., CAPTCHAs, rate limiting, IP blocking).
● Store extracted data in structured formats (JSON, CSV).
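As an illustration of the kind of extraction work described above, here is a minimal sketch using only the Python standard library. The sample markup, the `product`, `name`, and `price` class names, and the helper functions are hypothetical and chosen purely for the example; in practice the page would be fetched over HTTP and a framework such as Scrapy or Beautiful Soup would likely replace the hand-written parser.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical sample markup; a real page would be fetched over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span><span class="price">9.99</span></li>
  <li class="product"><span class="name">Gadget</span><span class="price">24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects the text of the name and price spans inside each product item."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "li" and cls == "product":
            self.records.append({})  # start a new record
        elif tag == "span" and cls in ("name", "price") and self.records:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.records[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def extract_products(html):
    """Parse the markup and return a list of {'name', 'price'} dicts."""
    parser = ProductParser()
    parser.feed(html)
    return parser.records


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "price"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


records = extract_products(SAMPLE_HTML)
```

Calling `to_json(records)` or `to_csv(records)` then produces the structured output mentioned in the last bullet.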
3. Data Processing & Storage:
● Clean and preprocess extracted data to ensure accuracy and completeness.
● Transform data into required formats (JSON, CSV).
● Handle missing data using imputation techniques or data cleaning methods.
● Design a database schema to store extracted data securely.
● Implement data validation and error-handling mechanisms.
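The processing and storage duties above could look roughly like the following sketch. It is an illustration under stated assumptions, not a prescribed approach: the sample rows, the `products` table, and the choice of median imputation are all hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so the example runs without a database server.

```python
import math
import sqlite3
from statistics import median

# Hypothetical raw rows as they might arrive from a scraper.
RAW = [
    {"name": " Widget ", "price": "9.99"},
    {"name": "Gadget", "price": ""},        # missing price: imputed
    {"name": "", "price": "5.00"},          # missing name: rejected
    {"name": "Gizmo", "price": "24.50"},
    {"name": "Doohickey", "price": "abc"},  # malformed price: treated as missing
]


def parse_price(text):
    """Return a finite, non-negative float, or None if the text is unusable."""
    try:
        value = float(text)
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) and value >= 0 else None


def clean(rows):
    """Trim names, reject nameless rows, and impute missing prices with the median."""
    cleaned, rejected = [], []
    for row in rows:
        name = (row.get("name") or "").strip()
        if not name:
            rejected.append(row)
            continue
        cleaned.append({"name": name, "price": parse_price(row.get("price"))})
    known = [r["price"] for r in cleaned if r["price"] is not None]
    fill = median(known) if known else None
    for r in cleaned:
        if r["price"] is None:
            r["price"] = fill
    return cleaned, rejected


def store(conn, rows):
    """Create the table if needed and insert all rows in a single transaction."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL UNIQUE,
               price REAL NOT NULL CHECK (price >= 0)
           )"""
    )
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO products (name, price) VALUES (:name, :price)", rows
        )


conn = sqlite3.connect(":memory:")
cleaned, rejected = clean(RAW)
store(conn, cleaned)
```

The schema constraints (`NOT NULL`, `UNIQUE`, `CHECK`) act as a second line of validation: if a bad row slips past `clean`, the insert raises an error and the transaction is rolled back rather than leaving partial data behind.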
4. Troubleshooting:
● Identify and resolve issues related to web scraping applications.
● Ensure data extraction remains functional despite website changes.