Published Nov 16, 2020 by
Intro
This quick article shows how to search the text of a web page for a word and count how many times it appears.
I was recently watching Kyle Robinson Young's YouTube video on How To Make Chrome Extensions and found this useful gem.
Code + Explanation
Here's the full code:
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="X-UA-Compatible" content="ie=edge" />
    <title>Static Template</title>
    <script>
      document.addEventListener("DOMContentLoaded", function () {
        const re = new RegExp("ideas", "gi");
        const matched_word_count = document.documentElement.innerText.match(re);
        console.log(matched_word_count.length); // 2
      });
    </script>
  </head>
  <body>
    <h1>What am I up to?</h1>
    <p>
      I have some very ambitious product ideas. User experience and making a
      meaningful difference in our users' lives are at the core of all these
      product ideas. I can't wait to see them come to life!
    </p>
    <p>
      I started my journey in product development as a designer initially. Then
      my interest split between user experience and web development. What I
      love about programming is the vastness of this ocean. There is so much to
      explore, discover, learn, and contribute!
    </p>
  </body>
</html>
Explanation:
The line to focus on is line 11. It introduces a constant, matched_word_count, that stores the array returned by innerText.match(re). Here, re is the regex holding the pattern we want to match, and document.documentElement.innerText is the rendered text of the whole web page as a single string.
//line 11
const matched_word_count = document.documentElement.innerText.match(re);
The array stored in matched_word_count contains every piece of text matched by the regex defined on line 10.
//line 10
const re = new RegExp("ideas", "gi");
About the regex:
- It's essentially looking for the string "ideas"
- The gi modifier is used to do a case-insensitive search of all occurrences of a regular expression in a string - source: w3schools.com
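To see what the two flags change, here is a small sketch run against a plain string rather than a live page. The sample string and variable names are made up for illustration, and /ideas/gi is simply the literal form of new RegExp("ideas", "gi").

```javascript
// Hypothetical sample text, not taken from a real page.
const sampleText =
  "Ideas are cheap. Good ideas need work, and bad IDEAS need editing.";

// No flags: match() stops at the first case-sensitive hit.
const firstOnly = sampleText.match(new RegExp("ideas"));

// "g" finds every occurrence, "i" ignores case.
const allMatches = sampleText.match(new RegExp("ideas", "gi"));

// The regex literal /ideas/gi is equivalent to the constructor call above.
const literalMatches = sampleText.match(/ideas/gi);

console.log(firstOnly[0]);          // "ideas"
console.log(allMatches);            // [ 'Ideas', 'ideas', 'IDEAS' ]
console.log(literalMatches.length); // 3
```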
We then look at the array's length on line 12 to count the number of words found.
console.log(matched_word_count.length); // 2
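One thing to watch for: match() returns null, not an empty array, when nothing matches, so reading .length on the result would throw. Below is a minimal sketch of a null-safe version. countMatches is a hypothetical helper name, and it takes the text as a plain string so it can be tried outside the browser.

```javascript
// Hypothetical helper: counts case-insensitive occurrences of `pattern` in `text`.
// `pattern` is treated as regex source, like the first argument to new RegExp().
function countMatches(text, pattern) {
  const re = new RegExp(pattern, "gi");
  // match() returns null when there are no matches, so fall back to an empty array.
  const matches = text.match(re) || [];
  return matches.length;
}

console.log(countMatches("Product ideas and more product ideas", "ideas")); // 2
console.log(countMatches("Nothing to see here", "ideas"));                  // 0
```

On a real page you would pass document.documentElement.innerText as the text argument.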
I'll be covering more about new RegExp in a future post.
A newbie mistake I made when running this code: I didn't wrap it in the DOMContentLoaded event listener. Because the script sits in the head, it ran before the rest of the DOM was parsed, so no match was returned. Make sure to wrap your code with document.addEventListener("DOMContentLoaded", function () { ... }); to avoid this mistake.
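A slightly more defensive variant checks document.readyState first, so the callback still runs if the script happens to load after the DOM has already been parsed. This is only a sketch and not the approach from the video. whenDomReady is a hypothetical name, and the document is passed in as a parameter so the function can be exercised with a stand-in object.

```javascript
// Hypothetical helper: run `callback` once the DOM of `doc` has been parsed.
function whenDomReady(doc, callback) {
  if (doc.readyState === "loading") {
    // Still parsing: wait for the event, as in the code above.
    doc.addEventListener("DOMContentLoaded", callback);
  } else {
    // "interactive" or "complete": the DOM is already available.
    callback();
  }
}

// Browser usage; the guard keeps the snippet harmless outside a browser.
if (typeof document !== "undefined") {
  whenDomReady(document, function () {
    const matches = document.documentElement.innerText.match(/ideas/gi) || [];
    console.log(matches.length);
  });
}
```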
Some Use Case Ideas:
- Counting how many times a word appears in a given web page
- Simple search to check if a particular word appears on the page
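For the second use case a yes/no answer is enough, so RegExp.prototype.test() can stand in for match(). The sketch below assumes the search word comes from user input, so it escapes regex special characters and adds \b word boundaries to avoid hits inside longer words. pageContainsWord and escapeRegExp are hypothetical helper names.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `word` appears in `text` as a whole word.
// Note: \b only behaves as expected when `word` starts and ends with a
// letter, digit, or underscore.
function pageContainsWord(text, word) {
  // No "g" flag: test() on a global regex keeps state in lastIndex between calls.
  const re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
  return re.test(text);
}

console.log(pageContainsWord("I have some product ideas.", "Ideas")); // true
console.log(pageContainsWord("An idea is not a plan.", "ideas"));     // false
console.log(pageContainsWord("Version 1x0 shipped.", "1.0"));         // false
```

In the browser, the text argument would again be document.documentElement.innerText.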
Up next:
- I'll do a write-up exploring new RegExp()