
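Hi, I need help from someone experienced with web scraping, as I'm new to programming. My task is to extract the "About the Client" section from job links. My script extracts "About the Client" for only one link; for the other links it fails and raises an error. The job links come from an XML file, and when a link is opened the HTML is rendered by JavaScript, which is why I'm using Selenium. I have tried everything I could think of but haven't found a solution.

`import time

import numpy as np
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# driver is assumed to be a Selenium WebDriver instance created elsewhere in the script.

def extract_client_info(job_url):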
    # Default result; the same key is used on success and on failure
    client_info = {'About the Client': np.nan}
    if job_url and job_url != "N/A":
        try:
            # Open the job URL
            driver.get(job_url)
            # Wait until the JavaScript-rendered section is present in the DOM
            WebDriverWait(driver, 30).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, '.cfe-about-client-v2'))
            )
            # Extract specific details
            about_client_section = driver.find_element(By.CSS_SELECTOR, '.cfe-about-client-v2')
            client_location = about_client_section.find_element(By.CSS_SELECTOR, '[data-qa="client-location"]').text.strip()
            # Job posting stats are optional, so fall back to "N/A" when absent
            stats_elements = about_client_section.find_elements(By.CSS_SELECTOR, '[data-qa="client-job-posting-stats"]')
            client_job_posting_stats = stats_elements[0].text.strip() if stats_elements else "N/A"
            client_company_profile = about_client_section.find_element(By.CSS_SELECTOR, '[data-qa="client-company-profile"]').text.strip()
            # Combine extracted information
            client_info['About the Client'] = (
                f"Location: {client_location}\n"
                f"Job Posting Stats: {client_job_posting_stats}\n"
                f"Company Profile: {client_company_profile}"
            )
        except Exception as e:
            print(f"Failed to get 'About the Client' for {job_url}: {e}")
            client_info['About the Client'] = np.nan
        finally:
            # Wait for 10 seconds before making the next request
            time.sleep(10)
    return client_info`
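
One possible cause, and this is only a guess until the printed exception confirms it, is that some job pages lack the location or company profile element, so `find_element` raises and the whole lookup fails. The script already guards the job posting stats with `find_elements`; a minimal sketch of applying that same guard to every field could look like the following. The names `text_or_default` and `summarize_client` are hypothetical helpers, not part of Selenium, and the string `"css selector"` is the value of Selenium's `By.CSS_SELECTOR`, written out so the sketch has no imports.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def text_or_default(parent, css_selector, default="N/A"):
    """Return the stripped text of the first match under parent, or default.

    parent is anything with a Selenium-style find_elements(by, value) method,
    for example the WebElement for the '.cfe-about-client-v2' section.
    """
    matches = parent.find_elements(CSS_SELECTOR, css_selector)
    if not matches:
        # find_elements returns an empty list instead of raising
        return default
    return matches[0].text.strip()


def summarize_client(section):
    """Build the combined 'About the Client' string, tolerating missing fields."""
    location = text_or_default(section, '[data-qa="client-location"]')
    stats = text_or_default(section, '[data-qa="client-job-posting-stats"]')
    profile = text_or_default(section, '[data-qa="client-company-profile"]')
    return (
        f"Location: {location}\n"
        f"Job Posting Stats: {stats}\n"
        f"Company Profile: {profile}"
    )
```

Inside `extract_client_info`, the three separate lookups could then be replaced by `client_info['About the Client'] = summarize_client(about_client_section)`. If the error turns out to be a timeout while waiting for `.cfe-about-client-v2`, the cause lies elsewhere (for example, the page not rendering that section at all) and this guard will not help.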