I recently noticed that a website was copying my blog content, so I started looking into it. I realized how easy it is to copy other people’s blog or website content using WordPress plugins. It’s completely automatic! Although I was really annoyed by this, I am also curious about how it works. Knowing how auto-blogging software works might also improve my chances of stopping it.
Please remember that this project is purely for study and research. Do not use it for illegal activity!
I wrote a post about how to add a post credit at the end of the content when someone tries to copy and paste it. It’s the simplest approach, and it might make copying more annoying for a human, but it can’t stop robots from copying your content.
This is how I am going to design the auto-blogging robot script:
- sources of webpages
- grab webpage HTML
- find out where the desired content is by analysing the HTML tags
- (optional) auto-rewrite post
- post to WordPress
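To make the flow concrete, here is a minimal Python sketch of those steps using only the standard library. It is a sketch under assumptions, not a finished tool: the names `Post`, `ContentExtractor`, `extract_post` and `run_pipeline` are hypothetical, and I am assuming the title sits in an `<h1>` and the article body in a `<div class="entry-content">`, which will differ from site to site. Fetching, rewriting and publishing are passed in as functions so the skeleton runs without touching the network or a real WordPress install.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Post:
    title: str
    body: str


class ContentExtractor(HTMLParser):
    """Collect text from <h1> and from the <div> carrying content_class.

    Assumption: the page keeps its title in <h1> and its article body in
    <div class="entry-content">. Real pages vary, so adjust per site.
    """

    def __init__(self, content_class="entry-content"):
        super().__init__()
        self.content_class = content_class
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._div_depth = 0  # greater than 0 while inside the content <div>

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_title = True
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self._div_depth > 0:
                self._div_depth += 1  # nested <div> inside the content
            elif self.content_class in classes:
                self._div_depth = 1

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_title = False
        elif tag == "div" and self._div_depth > 0:
            self._div_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._div_depth > 0:
            self.body_parts.append(data)


def _join(parts):
    # Join the text fragments and collapse runs of whitespace.
    return " ".join(" ".join(parts).split())


def extract_post(html, content_class="entry-content"):
    parser = ContentExtractor(content_class)
    parser.feed(html)
    parser.close()
    return Post(_join(parser.title_parts), _join(parser.body_parts))


def run_pipeline(urls, fetch, publish, rewrite=None):
    """Sources -> grab HTML -> find content -> (optional) rewrite -> post."""
    for url in urls:
        post = extract_post(fetch(url))
        if rewrite is not None:
            post = rewrite(post)
        publish(post)


SAMPLE_HTML = """
<html><body>
  <h1>Hello world</h1>
  <div class="entry-content">
    <p>First paragraph.</p>
    <div>Nested block.</div>
    <p>Last paragraph.</p>
  </div>
  <div class="sidebar">Unrelated links</div>
</body></html>
"""

published = []
run_pipeline(
    ["https://example.com/post-1"],  # placeholder URL
    fetch=lambda url: SAMPLE_HTML,   # stand-in for a real HTTP request
    publish=published.append,        # stand-in for posting to WordPress
)
print(published[0])
```

Keeping `fetch` and `publish` as plain callables means each stage can be swapped or tested on its own, which is handy when you are trying to understand which stage a real auto-blogging plugin depends on.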
Sources of webpages
I can think of three main ways to find the webpages we want to copy:
- RSS feeds of websites we define in advance
- URLs of the specific webpages we want to copy
- a third-party database of webpages categorized by keyword
I will go for the first two, RSS feeds and direct URLs, because they are easier and more accurate.
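As a rough illustration of combining those two sources, the sketch below reads post links out of an RSS 2.0 document with Python's built-in `xml.etree.ElementTree` and merges them with a hand-written URL list. The function names `entries_from_rss` and `collect_sources` and the sample feed are made up for this example. It assumes plain RSS 2.0 (`<item>` elements with `<title>` and `<link>` children); Atom feeds use different, namespaced elements and would need separate handling. The feed is an inline string here, so nothing is downloaded.

```python
import xml.etree.ElementTree as ET


def entries_from_rss(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items without a usable link
            entries.append((title, link))
    return entries


def collect_sources(feed_documents, extra_urls):
    """Merge links from RSS documents with manually listed URLs, keeping order."""
    seen = set()
    urls = []
    for xml_text in feed_documents:
        for _title, link in entries_from_rss(xml_text):
            if link not in seen:
                seen.add(link)
                urls.append(link)
    for url in extra_urls:
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up feed for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second-post</link>
    </item>
  </channel>
</rss>
"""

sources = collect_sources(
    [SAMPLE_FEED],
    ["https://example.com/second-post", "https://example.com/about"],
)
print(sources)
```

Deduplicating on the link keeps a page from being processed twice when it appears both in a feed and in the manual list.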