Check availability of external links in your web pages
Original post from linux.xvx.cz
When you create your web pages, in most cases you use images, external links, and videos that are not a static part of the page itself but are stored externally.
At the time you wrote your shiny page you probably checked all these external dependencies to be sure they work and your readers stay happy, because nobody likes running into broken-link errors.
Right now the page works fine with all its external dependencies because I checked it properly - but what about in a few months / years / …?
Web pages, images, and videos may disappear from the Internet, especially when you do not control them, so it is handy to check your pages from time to time to see whether all the external links are still alive.
There are many tools you can install on your PC to check the "validity" of your web pages instead of clicking through the links manually.
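For a quick one-off check you do not even need a dedicated tool - a minimal sketch using curl, with https://example.com as a placeholder URL:

# Print only the HTTP status code; 200 means the link is alive, 404 means it is gone
curl -s -o /dev/null -I -L -w '%{http_code}\n' https://example.com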
Manual checks do not scale, though, so I would like to share how I periodically check my documents / pages using GitHub Actions.
Here is the GitHub Action I wrote for this purpose: My Broken Link Checker
In short, you can simply create a Git repository on GitHub and store in it a workflow file defining which URLs should be checked / verified:
git clone git@github.com:ruzickap/check_urls.git
cd check_urls || true
mkdir -p .github/workflows
cat > .github/workflows/periodic-broken-link-checks.yml << \EOF
name: periodic-broken-link-checks

on:
  schedule:
    - cron: '0 0 * * *'
  pull_request:
    types: [opened, synchronize]
    paths:
      - .github/workflows/periodic-broken-link-checks.yml
  push:
    branches:
      - master
    paths:
      - .github/workflows/periodic-broken-link-checks.yml

jobs:
  broken-link-checker:
    runs-on: ubuntu-latest
    steps:
      - name: Broken link checker
        env:
          INPUT_URL: https://google.com
          EXCLUDE: |
            linkedin.com
            localhost
            myexample.dev
            mylabs.dev
        run: |
          export INPUT_CMD_PARAMS="--one-page-only --verbose --buffer-size=8192 --concurrency=10 --exclude=($( echo ${EXCLUDE} | tr ' ' '|' ))"
          wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v1/entrypoint.sh | bash
EOF
git add .
git commit -m "Add periodic-broken-link-checks"
git push
The code above stores the GitHub Actions workflow file in the repository and starts checking https://google.com every day at midnight (UTC).
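If you want to try the check locally before pushing, the workflow suggests the entrypoint only needs the INPUT_URL and INPUT_CMD_PARAMS variables, so a sketch under that assumption is:

# Run the same check on your own machine that the workflow runs in CI
export INPUT_URL="https://google.com"
export INPUT_CMD_PARAMS="--one-page-only --verbose"
wget -qO- https://raw.githubusercontent.com/ruzickap/action-my-broken-link-checker/v1/entrypoint.sh | bash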
The URL checker is based on muffet, and you can tune its parameters by changing the INPUT_CMD_PARAMS variable.
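As a side note, the --exclude parameter in the workflow is assembled from the EXCLUDE block scalar: the unquoted echo collapses its newlines into spaces, and tr rewrites the spaces as regex alternation. You can verify the expansion in any shell:

EXCLUDE="linkedin.com
localhost
myexample.dev
mylabs.dev"
# Unquoted ${EXCLUDE} collapses newlines to spaces; tr turns them into '|'
echo "--exclude=($( echo ${EXCLUDE} | tr ' ' '|' ))"
# Prints: --exclude=(linkedin.com|localhost|myexample.dev|mylabs.dev)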
Feel free to look at more details here: https://github.com/ruzickap/action-my-broken-link-checker
I hope this helps you keep up the quality of your web pages by finding broken external links quickly.
Enjoy :-)
