Creating a tool to increase web development productivity at work: FixHTML
I work in the digital marketing department of a large company. To create new web pages, we use a content operations management tool called Gather Content. The content writers create their work in this system, and my team then transfers it into a content management system (dotCMS) for publication to the website.
Within Gather Content, there is a button that exports an HTML version of the content to the clipboard. Unfortunately, this exported code is full of extraneous markup, such as <p> tags, meaningless inline styles, and HTML comments. My team members and I spent considerable time removing this markup so that the code on the final web pages met our established best-practice standards. This drastically slowed down the process of publishing web content.
To solve this problem, I wrote a function called processText. This function takes two arguments: the HTML code as copied from Gather Content (with all of the extra tags) and the domain name of the site that is being built. The domain is needed so that full URLs can be replaced with relative URLs.
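A minimal sketch of how the URL step could work, assuming the tool is written in JavaScript. The helper name toRelativeUrls, the optional "www." prefix, and the rule of only rewriting URLs that are followed by a slash are all assumptions made for illustration, not details of the real tool.

```javascript
// Hypothetical helper: removes the scheme and domain from absolute URLs
// so that links to the site being built become root-relative.
function toRelativeUrls(html, domain) {
  // Escape regex metacharacters in the domain (for example, the dots).
  const escaped = domain.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Match http:// or https://, an optional "www.", then the domain,
  // but only when a "/" follows, so look-alike hosts such as
  // "example.com.other.org" are left untouched.
  const pattern = new RegExp("https?://(?:www\\.)?" + escaped + "(?=/)", "gi");
  return html.replace(pattern, "");
}
```

With this sketch, a link to https://www.example.com/about becomes /about when the domain is example.com, while links to other sites are left as they are. A bare https://example.com with no trailing slash is also left alone, which a fuller version would probably want to handle.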
Within this function, an array of the tags to be removed is created, with each tag represented as a string or a regular expression. Regular expressions provide a way to remove not just the tags, but also all of the text between the tags (as is necessary for HTML comments and inline styles).
The function then iterates through this array. For each element, it creates a new regular expression object with the g (global) flag, which indicates that every instance of the expression should be matched. Finally, the replace function replaces each match with an empty string, thereby deleting it from the original text.
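The loop described above can be sketched roughly as follows, assuming JavaScript. The pattern list and the names PATTERNS and stripPatterns are hypothetical, since the real list is not reproduced here.

```javascript
// Hypothetical list of things to remove, mixing strings and regular expressions.
const PATTERNS = [
  "<br\\s*/?>",        // string form: matches <br>, <br/> and <br />
  /<!--[\s\S]*?-->/,   // HTML comments, including the text between the markers
  /\s+style="[^"]*"/,  // inline style attributes, including their values
];

function stripPatterns(html, patterns) {
  let result = html;
  for (const p of patterns) {
    // Accept either a string or a RegExp, and rebuild it with the g flag
    // so that every occurrence is matched, not just the first.
    const source = p instanceof RegExp ? p.source : p;
    const re = new RegExp(source, "g");
    // Replacing each match with an empty string deletes it.
    result = result.replace(re, "");
  }
  return result;
}
```

For example, passing '<p style="color: red">Hi<br><!-- note -->there<br /></p>' through stripPatterns with this list leaves '<p>Hithere</p>'. Without the g flag, replace would stop after the first match of each pattern.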
I embedded this function in a simple HTML page, with text areas in which to paste the input text, a button to execute the function, and simple instructions so that I could share the project with my team (and not forget them myself). It ended up vastly decreasing the amount of time I needed to spend on building web pages. It also freed up mental space so that I could focus on any other edits that were required.
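Wiring a function like this to a page takes very little code. The following is only a sketch: the wireUp helper, its parameter names, the element ids in the comment, and the idea of a separate field for the domain are all assumptions rather than a description of the actual page.

```javascript
// Connects a button to an input field, a domain field and an output field.
// `processFn` stands in for processText; all names here are hypothetical.
function wireUp(button, inputArea, domainField, outputArea, processFn) {
  button.addEventListener("click", () => {
    // Read the pasted HTML and the domain, then show the cleaned result.
    outputArea.value = processFn(inputArea.value, domainField.value);
  });
}

// In a browser, this might be called as follows (element ids are assumptions):
// wireUp(
//   document.getElementById("run"),
//   document.getElementById("input"),
//   document.getElementById("domain"),
//   document.getElementById("output"),
//   processText
// );
```

Passing the elements and the processing function in as arguments keeps the helper independent of any particular page layout.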
My solution deletes all <br> tags, because there are so many unnecessary instances of these tags in the code exported from Gather Content. However, occasionally these line breaks are intentional. In these situations, I must restore them manually.
Also, because this is primarily used by me, I have not taken the time to properly design this tool. Instead, it is a “quick and dirty” solution that has a single use case. It could be interesting to update the styling and also to add more functionality, such as letting users select which tags to delete.
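As a rough idea of what selectable tags could look like, here is a sketch, again assuming JavaScript. The REMOVABLE catalogue, its key names, and stripSelected are hypothetical and do not exist in the current tool.

```javascript
// Hypothetical catalogue of removable items, keyed by a name
// that a user could tick in a checkbox list.
const REMOVABLE = new Map([
  ["lineBreaks", "<br\\s*/?>"],
  ["comments", "<!--[\\s\\S]*?-->"],
  ["inlineStyles", "\\s+style=\"[^\"]*\""],
]);

function stripSelected(html, selectedKeys) {
  return selectedKeys.reduce((text, key) => {
    const source = REMOVABLE.get(key);
    // Ignore keys that are not in the catalogue.
    if (source === undefined) return text;
    return text.replace(new RegExp(source, "g"), "");
  }, html);
}
```

With something like this, leaving "lineBreaks" unselected would keep intentional <br> tags in place, which would address the limitation described above.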