rss.sh, A RSS Reader
Introduction
https://github.com/jli05/rss.sh is a simple RSS reader that works with RSS 2.0 and Atom feed format.
The user pipes in a list of RSS feed URLs, one URL on each line to rss.sh
, which for each URL, calls rss_items.sh
to fetch the content and extract the RSS items therein.
When all RSS entries are extracted, the output is synced to AWS S3 if an S3 bucket is specified, and published via AWS SNS if the SNS topic ARN is specified.
How it is run
Rather than running as a Docker for AWS Lambda, we put the code on an EC2 instance. Every morning the EC2 instance is started by an AWS Lambda function (upon being evoked by an AWS EventBridge event) and a cron job is run on the instance. The crontab spec on the EC2 instance:
0-30 06 * * * flock -n /tmp/rss -c rss.sh/cron.sh
The AWS EventBridge event is set to delivered at 06:00 UTC+0 – usually it gets delivered on time, but for <1% of time it may take >10 mins delay to get delivered. Hence during the 30-minute period we try to run cron.sh
. flock
locks a file prior to running a command to ensure that at any time only one session of cron.sh
is running.
The cron.sh
simply runs rss.sh
, and shuts down the EC2 instance when done.
#!/bin/sh
# the time two days ago
DATE=$(python3 -c "from pandas import Timestamp, Timedelta; print((Timestamp.now() - Timedelta('2D')).strftime('%Y-%m-%dT%H:%M:%S'))")
#S3_URI=""
#SNS_TOPIC=""
PROJDIR="$(dirname $0)"
cat "$PROJDIR"/urls.txt | NPROC=$(nproc) WGET="wget -q -O -" SINCE="$DATE" S3_URI="$S3_URI" SNS_TOPIC="$SNS_TOPIC" "$PROJDIR"/rss.sh
doas poweroff
The DevOp
IAM Policies
In the cloud, we need to launch an EC2 instance, which needs to be able to sync to S3 and publish to SNS;
We also need to set up an AWS Lambda function, which upon receiving an EventBridge event, is able to start the EC2 instance and write to the AWS CloudWatch Logs.
So the EC2 instance will assume IAM Role “rss-ec2-instance”, which incorporates three IAM Policies “s3-rss-read”, “s3-rss-write” and “sns-publish-rss”;
the Lambda function will assume IAM Role “start-rss-ec2-instance”, which incorporates IAM Policies “start-rss-ec2-instance” and “logs-write”.
Below are the JSON definitions of the abovementioned IAM Policies.
s3-rss-read:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::<bucket>"
},
{
"Sid": "VisualEditor2",
"Effect": "Allow",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::<bucket>/*"
}
]
}
s3-rss-write:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor1",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:DeleteObject"
],
"Resource": "arn:aws:s3:::<bucket>/*"
}
]
}
sns-publish-rss:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPublishToRSS",
"Effect": "Allow",
"Action": "sns:Publish",
"Resource": "<topic_arn>"
}
]
}
start-rss-ec2-instance:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:StartInstances"
],
"Resource": "*",
"Condition": {
"StringLike": {
"aws:ResourceTag/Name": "<name of the EC2 instance>"
}
}
}
]
}
logs-write:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:*:*:*"
}
]
}
The Lambda function
The Lambda function “start-rss”, bearing IAM Role “start-rss-ec2-instance”, is defined as:
import boto3
region = 'eu-west-1'
instances = ['i-xxxxxxxxxxxxxxxxx']
ec2 = boto3.client('ec2', region_name=region)
import json
def lambda_handler(event, context):
ec2.start_instances(InstanceIds=instances)
return {
'statusCode': 200,
'body': json.dumps('started your instances: '
+ str(instances))
}
It is triggered by EventBridge event delivered every 06:00 UTC+0 according to cron expression “cron(0 06 * * ? *)”.