Automating Daily arXiv Paper Summaries and Slack Notifications

碧海醫心
Published: 2025-02-15 19:06:32

This Python script automates a daily workflow: it fetches recent papers from arXiv, summarizes each abstract with generative AI (specifically, Google Gemini), and posts the summaries to a Slack channel.

I. Python Code:

<code class="python">import datetime
import logging
import os
import time

import arxiv
import google.generativeai as genai
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

# Configuration (best practice to use environment variables for sensitive data)
PAPER_TYPES = ["cs.AI", "cs.CY", "cs.MA"]  # arXiv category identifiers
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
GEMINI_MODEL = "gemini-2.0-flash"
SLACK_BOT_TOKEN = os.environ.get("SLACK_BOT_TOKEN")
SLACK_CHANNEL = os.environ.get("SLACK_CHANNEL")
MAX_RESULTS = 30

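# Logging setup (highly recommended for debugging and monitoring)
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
# On AWS Lambda the root logger is already configured, so basicConfig may have
# no effect there; set the level on this logger explicitly.
logger.setLevel(logging.INFO)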


def fetch_arxiv_papers(max_results: int = MAX_RESULTS) -> list:
    """Fetches the newest arXiv papers, keeping those within 24 hours of the latest one."""
    query = " OR ".join([f"cat:{paper_type}" for paper_type in PAPER_TYPES])
    client = arxiv.Client()
    # The query is described by arxiv.Search; the client only executes it via results().
    search = arxiv.Search(
        query=query,
        max_results=max_results,
        sort_by=arxiv.SortCriterion.SubmittedDate,
        sort_order=arxiv.SortOrder.Descending,
    )
    papers = list(client.results(search))

    if not papers:
        logger.warning("No papers found.")
        return []

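    # The window is anchored to the newest result, not the current time,
    # so at least one paper always passes the filter.
    latest_published = papers[0].published
    threshold = latest_published - datetime.timedelta(hours=24)
    filtered_papers = [paper for paper in papers if paper.published >= threshold]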

    return [
        {
            "title": paper.title,
            "summary": paper.summary,
            "pdf_url": paper.pdf_url,
            "published": paper.published,
        } for paper in filtered_papers
    ]


def summarize_paper(abstract_text: str) -> str:
    """Generates a summary of the paper abstract using Google Gemini."""
    try:
        genai.configure(api_key=GEMINI_API_KEY)
        model = genai.GenerativeModel(GEMINI_MODEL)
        prompt = (
            "Summarize the following paper abstract concisely (under 300 characters) for beginners, "
            "including significance and results.  Output only the summary.\n---\n\n"
            f"{abstract_text}"
        )
        response = model.generate_content(prompt)
        return response.text.strip()
    except Exception as e:
        logger.error(f"Error summarizing paper: {e}")
        return "Error generating summary."


def post_to_slack(papers: list) -> None:
    """Posts the paper summaries to the specified Slack channel."""
    if not papers:
        return

    client = WebClient(token=SLACK_BOT_TOKEN)
    messages = []
    for i, paper in enumerate(papers, 1):
        summary = summarize_paper(paper["summary"])  # One Gemini call per paper
        time.sleep(1)  # Brief pause between Gemini calls to ease rate limits
        message = (
            f"{i}. *{paper['title']}*\n\n"
            f"{summary}\n\n"
            f"PDF: {paper['pdf_url']}\n"
            f"Published: {paper['published']}\n"
            f"────────────────────────"
        )
        messages.append(message)

    all_messages = "\n".join(messages)

    try:
        result = client.chat_postMessage(channel=SLACK_CHANNEL, text=all_messages)
        logger.info(f"Slack message sent successfully: {result}")
    except SlackApiError as e:
        logger.error(f"Error posting to Slack: {e}")


def lambda_handler(event, context):
    """AWS Lambda handler function."""
    papers = fetch_arxiv_papers()
    post_to_slack(papers)
    return {
        'statusCode': 200,
        'body': "Successfully processed arXiv papers and posted to Slack."
    }
</code>
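To make the filtering step in fetch_arxiv_papers concrete, here is a small sketch of the same 24-hour window logic, run on stand-in objects rather than real arXiv results. The helper name filter_recent and the sample timestamps are illustrative assumptions and are not part of the script above.

```python
import datetime
from types import SimpleNamespace


def filter_recent(papers, hours=24):
    """Keep papers published within `hours` of the newest one.

    Assumes `papers` is sorted newest-first, as in fetch_arxiv_papers.
    """
    if not papers:
        return []
    threshold = papers[0].published - datetime.timedelta(hours=hours)
    return [p for p in papers if p.published >= threshold]


# Stand-in objects carrying only the attributes used here (hypothetical data).
utc = datetime.timezone.utc
sample = [
    SimpleNamespace(title="A", published=datetime.datetime(2025, 2, 14, 18, 0, tzinfo=utc)),
    SimpleNamespace(title="B", published=datetime.datetime(2025, 2, 14, 2, 0, tzinfo=utc)),
    SimpleNamespace(title="C", published=datetime.datetime(2025, 2, 13, 17, 0, tzinfo=utc)),
]

recent_titles = [p.title for p in filter_recent(sample)]
print(recent_titles)  # ['A', 'B']: C is more than 24 hours older than A
```

Because the threshold is measured from the newest paper, the result depends only on the fetched list and not on when the function runs.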

II. Local Setup and Deployment to AWS Lambda:

  1. Environment Setup: Use pyenv to manage Python versions. Install Python 3.12.
  2. Install Libraries: Create a folder (e.g., lambda_dependencies) and install the required libraries into a python subfolder, which is the directory layout Lambda expects for Python layers:
    <code class="bash">pip install arxiv google-generativeai slack_sdk -t lambda_dependencies/python</code>
  3. Create Zip File: Zip the python folder so that it sits at the root of the archive:
    <code class="bash">cd lambda_dependencies && zip -r ../lambda_layer.zip python</code>
  4. Create AWS Lambda Layer: Upload lambda_layer.zip as a new layer in AWS Lambda. Set architecture to x86_64 and runtime to Python 3.12.
  5. Create AWS Lambda Function: Upload the Python code (above) to a new Lambda function. Configure the function to use the created layer. Set environment variables (GEMINI_API_KEY, SLACK_BOT_TOKEN, SLACK_CHANNEL). Increase the function timeout beyond the 3-second default, because summarizing up to 30 papers takes longer than that.
  6. Schedule with AWS EventBridge: Create an EventBridge rule with a cron expression (e.g., cron(30 6 * * ? *) for 6:30 AM UTC daily) and set the Lambda function as the target.
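The six fields of an EventBridge cron expression are easy to misread, so the sketch below splits one into named parts. The helper describe_eventbridge_cron is a hypothetical illustration, not an AWS API; it only labels the fields and does not validate their values.

```python
# Field order used by EventBridge cron expressions.
FIELD_NAMES = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def describe_eventbridge_cron(expression: str) -> dict:
    """Split 'cron(...)' into its six named fields (values are not validated)."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError("expected an expression of the form cron(...)")
    fields = expression[len("cron("):-1].split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


schedule = describe_eventbridge_cron("cron(30 6 * * ? *)")
for name, value in schedule.items():
    print(f"{name}: {value}")
```

In the example expression, the ? in the day-of-week position is required because day-of-month is already given as *; EventBridge does not accept a value or * in both of those fields at once.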

III. Important Considerations:

  • Error Handling: The Gemini and Slack calls are wrapped in try...except blocks that log failures instead of raising. The arXiv fetch is not wrapped, so a failure there still surfaces as a Lambda error.
  • Rate Limiting: Be mindful of API rate limits for both arXiv and Gemini. A short fixed delay between Gemini calls (for example, time.sleep(1)) is a starting point, but heavier use may need more sophisticated strategies such as retries with backoff.
  • Security: Never hardcode API keys directly in your code. Always use environment variables.
  • Logging: Comprehensive logging is essential for debugging and monitoring the function's execution.
  • Testing: Thoroughly test your code locally before deploying it to AWS Lambda.
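As one possible take on the "more sophisticated" rate-limiting mentioned above, the sketch below retries a call with exponentially growing delays. The helper call_with_backoff and its parameters are hypothetical and not part of the original script; the sleep function is injectable so the demo and tests do not actually wait.

```python
import random
import time


def call_with_backoff(func, *args, max_attempts=4, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call func, retrying on any exception with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


# Demo with a function that fails twice before succeeding (no real waiting).
attempts = {"count": 0}
recorded_delays = []


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


result = call_with_backoff(flaky_call, sleep=recorded_delays.append)
print(result, attempts["count"], len(recorded_delays))
```

Note that summarize_paper catches exceptions itself and returns an error string, so a wrapper like this would need to go around the model.generate_content call inside it rather than around summarize_paper.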

Before deploying, set the GEMINI_API_KEY, SLACK_BOT_TOKEN, and SLACK_CHANNEL environment variables to your actual API keys and Slack channel ID.
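Because os.environ.get returns None for an unset variable, a missing key only shows up later as a failed API call. A minimal sketch of a fail-fast check follows; load_config is a hypothetical helper that is not part of the original script, and the values in the example are dummies.

```python
import os

REQUIRED_VARS = ("GEMINI_API_KEY", "SLACK_BOT_TOKEN", "SLACK_CHANNEL")


def load_config(environ=os.environ) -> dict:
    """Return the required settings, or raise if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_VARS}


# Example with a plain dict standing in for os.environ (dummy values).
example_env = {
    "GEMINI_API_KEY": "dummy-key",
    "SLACK_BOT_TOKEN": "dummy-token",
    "SLACK_CHANNEL": "C0000000",
}
config = load_config(example_env)
print(sorted(config))
```

Calling load_config() at the top of lambda_handler would turn a misconfigured deployment into an immediate, clearly worded error.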
