Friday, 18 October 2013

Web Content Filtering with DansGuardian

 "Learn how to set up a parental blocker to protect yourself and your network from potentially harmful sites."

Requirements:

  • A Linux distro
  • Internet Access
  • Administrative Privileges*

*If you are not logged in as root, you may need to type "sudo" before any command - this runs the command as the administrator. You will need to provide a password the first time it's used.

DansGuardian:

DansGuardian is software that does "smart" web content filtering. It scans all of the text in a web page and assigns "weights" to specific words. For example, take "Breast Cancer": the word "Breast" may have a weight of (+5) and "Cancer" a weight of (-10), so combined the weight is (-5). If the total weight of the page exceeds a particular limit, the page is blocked. Together with the proxy server Squid, we can direct ALL HTTP traffic to DansGuardian for filtering. Normally it would be easy for a user to just reconfigure the proxy settings in their web browser to work around it, but we'll show you how to make sure that has no effect on our filter.
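To make the weighting idea concrete, here is a minimal shell sketch of the arithmetic. The two weights are the illustrative ones from the example above and the function name "score_page" is made up for this post. DansGuardian's real phrase lists and matching are more involved than this, so treat it as a toy.

```shell
#!/bin/sh
# Toy weighted-phrase scorer. The weights are the made-up values from the
# "Breast Cancer" example, not DansGuardian's real phrase lists.
score_page() {
    total=0
    for word in $1; do
        case "$word" in
            breast) total=$((total + 5)) ;;
            cancer) total=$((total - 10)) ;;
        esac
    done
    echo "$total"
}

score_page "breast cancer"      # prints -5
```

A page would only be blocked once its total climbs above whatever limit is configured, so the negative weight on "cancer" is what keeps a medical page from being caught.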

Step 1: Install "dansguardian"

You can use your package manager to find, download, and install "dansguardian". Using "apt-get" or "yum" would work for most distros. DansGuardian should be in most major repositories (Ubuntu contains it for sure).
If you would like to download the package yourself, you can download it from the DansGuardian download page.

Step 2: Install "squid"

Just as in the previous step, look for "squid" in your package manager and download/install it. It should definitely be in your repositories. Otherwise you can download it from their download page.

Step 3: Configure Squid

Next we need to make sure squid is configured properly. We want squid to run "transparently" in the background, so we need to modify its configuration file. To do so, run the following command in the terminal:

 username@localhost:$ sudo gedit /etc/squid/squid.conf

Now search for the line that has "http_port" in it. There will be a few in the commented sections, but we're looking for the active one that lists the default port. It should be something like "http_port 3128".
Once you have found this line, add the word "transparent" right after it (Squid 3.1 and later call this option "intercept" instead):
"http_port 3128 transparent"
Make sure you save, then exit. This is all we need to do to configure Squid.
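If you would rather make this change from the terminal than in gedit, it can be sketched with "sed". The sketch below works on a scratch copy with made-up sample contents so that it runs without root. On a real system you would point it at "/etc/squid/squid.conf" with sudo, and I'm assuming the active line reads exactly "http_port 3128".

```shell
#!/bin/sh
# Scratch copy standing in for /etc/squid/squid.conf (sample contents are made up).
conf=$(mktemp)
printf '%s\n' '# http_port 3128 (example in a comment)' 'http_port 3128' > "$conf"

# Append "transparent" only to the active (uncommented) http_port line.
sed 's/^http_port 3128$/http_port 3128 transparent/' "$conf" > "$conf.new"
mv "$conf.new" "$conf"

# Show the active line to confirm the change.
grep '^http_port' "$conf"
```

The commented-out example line is left alone because the pattern only matches a line that starts with "http_port".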

Step 4: Configure DansGuardian

DansGuardian lets us fine-tune almost every detail of its behavior through its configuration file. Here I will take you through the simplest settings needed just to get it running. Later, I'll show you how to change the "Access Denial" page to customize it in any way you wish (optional, of course). But for now, we need to open up the configuration file by typing the following command:

username@localhost:$ sudo gedit /etc/dansguardian/dansguardian.conf

The first thing we want to do is find the line that tells DansGuardian which port to use to reach the Squid proxy server that we just set up. We need to find the line that says "proxyport = ". By default, it may already have the proper port "3128". Make sure it does, and also make sure our "filterport = " is set to "8080":

"proxyport = 3128"
"filterport = 8080"

The last line we need to edit is the "UNCONFIGURED" line. This line is up near the top of the file, and says something like: "UNCONFIGURED - Please remove this line after configuration". This is DansGuardian's way of knowing that we have set up the configuration file and are ready for it to start using our settings. You can either remove this line entirely, or comment it out by placing a "#" in front of it.

Once completed, save the file and exit. 
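The three edits from this step can also be sketched as a single "sed" pass. Again this runs against a scratch copy, and the starting values in the sample file are made up so that you can see them change. The real file is "/etc/dansguardian/dansguardian.conf" and needs sudo.

```shell
#!/bin/sh
# Scratch copy standing in for /etc/dansguardian/dansguardian.conf.
# The starting port values are invented for the demo.
conf=$(mktemp)
cat > "$conf" <<'EOF'
UNCONFIGURED - Please remove this line after configuration
filterport = 9090
proxyport = 8888
EOF

# Comment out the UNCONFIGURED marker and set both ports.
sed -e 's/^UNCONFIGURED/#UNCONFIGURED/' \
    -e 's/^filterport *=.*/filterport = 8080/' \
    -e 's/^proxyport *=.*/proxyport = 3128/' "$conf" > "$conf.new"
mv "$conf.new" "$conf"

cat "$conf"
```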

Step 5: Restart Squid and DansGuardian

You can restart Squid and DansGuardian by rebooting your computer, or typing the following lines, in order:

 username@localhost:$ sudo /etc/init.d/squid restart

username@localhost:$ sudo dansguardian -q

username@localhost:$ sudo dansguardian

Now, if you try to go to our "BAD" test page, you should still get through. This is because we still need to set up our web browser to use the proxy server.

Step 6: Setup Web Browser's Proxy Server Settings

Each web browser keeps its settings in a different location - but most are within a "Preferences" or "Options" menu. Locate the settings window and change the settings to the following:

Manual Proxy Settings
HTTP Proxy - "localhost:8080"

This will tell the browser to use the proxy server instead of directly connecting to the internet. Once completed, close out and try accessing this page again. At this point, you should see a denial page from DansGuardian. The question is, if it's that easy to change proxy settings, why can't they just change it back? That's what the next step will take care of...

Step 7: Direct All HTTP Traffic Through Squid

We really want to make sure that all of the HTTP (port 80) traffic is sent through DansGuardian, and therefore through Squid. To do this, we can run the following command:

username@localhost:$ sudo iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner proxy --dport 80 -j REDIRECT --to-port 8080


This tells the system that ANY outgoing HTTP traffic (anything headed for port "80") should be redirected to port "8080", which is where DansGuardian listens before handing requests on to Squid. Traffic sent by the "proxy" user (the account Squid runs as on Ubuntu) is left alone, otherwise Squid's own requests would be looped back into the filter. This way, whether the browsers are set to use the proxy or are told to connect directly to the internet, their traffic will ALWAYS be redirected to our filter. This is technically modifying iptables, so if you have a separate firewall installed, you may need to configure it as well. For Ubuntu users on a fresh install (no firewalls added), this will work just fine as is.
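Since that one-liner packs a lot in, here is a dry-run sketch that builds the same rule piece by piece and just prints it instead of running it (running iptables for real needs root). The variable names are my own, and "proxy" as Squid's account is the Ubuntu/Debian convention, so adjust it if your distro differs.

```shell
#!/bin/sh
# Dry run: assemble the redirect rule step by step and print it.
FILTER_PORT=8080     # port DansGuardian listens on (filterport)
SQUID_USER=proxy     # account Squid runs as on Ubuntu/Debian (assumption)

# Append to the OUTPUT chain of the NAT table (locally generated traffic), TCP only.
rule="-t nat -A OUTPUT -p tcp"
# Skip packets sent by Squid itself so its requests are not looped back.
rule="$rule -m owner ! --uid-owner $SQUID_USER"
# Only match traffic headed for the standard HTTP port.
rule="$rule --dport 80"
# Redirect whatever matched to DansGuardian on this machine.
rule="$rule -j REDIRECT --to-port $FILTER_PORT"

echo "iptables $rule"
```

Once the real rule is in place, "sudo iptables -t nat -L OUTPUT -n" should list it.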

Now you can reconfigure your browser to NOT use the proxy (connect directly to the internet) - and try accessing this page again. You should still see the rejection page.

This command will stay in effect until the system is rebooted. So to make it always run upon startup, we need to place an executable script in "/etc/init.d/" with that command in it. Let's call it "tproxy":

username@localhost:$ sudo gedit /etc/init.d/tproxy

This will create that file and bring up gedit. Put "#!/bin/sh" on the first line so the system knows to run the file as a shell script, then paste the above command ("iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner proxy --dport 80 -j REDIRECT --to-port 8080") on the line below it. Save and exit gedit.
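If you want something a little more robust than the bare command, here is one possible shape for the file. The start/stop handling is my own addition rather than anything the steps here require. The sketch writes the script to a scratch file and only syntax-checks it, so nothing touches your firewall.

```shell
#!/bin/sh
# Write a candidate /etc/init.d/tproxy to a scratch file for inspection.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
# tproxy: redirect local HTTP traffic to DansGuardian on port 8080.
RULE="OUTPUT -p tcp -m owner ! --uid-owner proxy --dport 80 -j REDIRECT --to-port 8080"
case "$1" in
    start) iptables -t nat -A $RULE ;;   # add the redirect
    stop)  iptables -t nat -D $RULE ;;   # remove it again
    *)     echo "Usage: $0 {start|stop}" ;;
esac
EOF

# Parse the script without executing any of it.
sh -n "$script" && echo "syntax ok"
```

The "stop" branch uses "-D" with the same rule text, which deletes the matching rule, so the redirect is not added a second time when the system calls the script with "stop" at shutdown.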

Now we need to make it executable. You can do that by running this command:

username@localhost:$ sudo chmod a+x /etc/init.d/tproxy

Lastly, we need to tell the system to run this script at startup. Do this by running the command:

username@localhost:$ sudo update-rc.d tproxy defaults

Now the iptables redirection will occur at every startup. Because Squid and DansGuardian also run at startup, you will constantly have web filtering on your machine, regardless of what other users may attempt to do. Of course, this is all based on the assumption that the other users don't have the root password!

Step 8: Customizing Rejection Page (Optional).

You can easily customize the rejection page of DansGuardian simply by replacing the default one. The default location of this file is the following (assuming the English version was installed):

"/etc/dansguardian/languages/ukenglish/template.html"

You can replace it with something like my example page.


One neat thing about DansGuardian is you can place variable names within your HTML page, and when DansGuardian retrieves the HTML page, it replaces those variables with actual text. Let's take a look at what variables we have:

       

-URL-               gives the URL the user was trying to access
-REASONGIVEN-       gives the "nice" reason (i.e. not quoting the banned phrase)
-REASONLOGGED-      gives the reason that gets logged including full details
-USER-              gives the username (if known)
-IP-                gives the originating IP address
-HOST-              gives the originating host name (if known)
-RAWFILTERGROUP-    gives the group number
-FILTERGROUP-       gives the group name
-SERVERIP-          gives the IP address on which the filter is running
-BYPASS-            gives a URL which allows temporary bypass of denied page
-CATEGORIES-        gives the categories assigned to the banned content

Note that all of these "variables" have dashes on either side of them, like "-variable-". This tells DansGuardian that it's a variable and not plain text.
With these in hand, you can whip up a pretty slick-looking denial page if you know a little HTML. Otherwise you can use the default page, or my page, and save it as "template.html" in the language directory.
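Here is a bare-bones sketch of what such a page could look like. It writes a tiny template to a scratch file and then uses "sed" to mimic the substitution with made-up values, purely so you can preview the result. DansGuardian does the real replacement itself when it serves the page.

```shell
#!/bin/sh
# Minimal template using three of the variables listed above.
tmpl=$(mktemp)
cat > "$tmpl" <<'EOF'
<html>
<head><title>Access Denied</title></head>
<body>
<h1>Access Denied</h1>
<p>URL: -URL-</p>
<p>Reason: -REASONGIVEN-</p>
<p>Requested from: -IP-</p>
</body>
</html>
EOF

# Preview only: fill in the variables with invented values.
preview=$(sed -e 's|-URL-|http://example.com/|' \
              -e 's|-REASONGIVEN-|Banned site|' \
              -e 's|-IP-|127.0.0.1|' "$tmpl")
echo "$preview"
```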

Step 9: View Log of Denials (Optional)

So you have all of this set up, but how do we see who was denied what and when? Of course DansGuardian logs everything, and does a pretty good job of it too. And of course, you can specify where it writes the logs to in the configuration file.
By default, it keeps the log file here:

"/var/log/dansguardian/access.log"

To change this, open up the "dansguardian.conf" file as root, find where it says "loglocation = ", and specify where you want it. You even have different options for the log file format! You can leave it as default, or you can change it (search for "logfileformat = ") to something like option 4 (tab delimited). You can also change WHAT it logs. By default, it logs everything. This can be space-consuming, and makes it harder to see what denials have occurred. I changed my setting ("loglevel = ") to 1, which is "just denied".
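For anyone who prefers the terminal, those two changes can be sketched with a small helper. "set_conf" is a name I made up, and the sample file and its starting values are invented for the demo. Point it at the real "dansguardian.conf" (with sudo) once you are happy with it.

```shell
#!/bin/sh
# set_conf KEY VALUE FILE: rewrite an existing "KEY = ..." line (hypothetical helper).
set_conf() {
    sed "s|^$1 *=.*|$1 = $2|" "$3" > "$3.new" && mv "$3.new" "$3"
}

# Scratch file standing in for dansguardian.conf; starting values are made up.
conf=$(mktemp)
printf '%s\n' 'loglevel = 3' 'logfileformat = 1' > "$conf"

set_conf loglevel 1 "$conf"        # log only denied requests
set_conf logfileformat 4 "$conf"   # tab delimited

cat "$conf"
```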

Once we have this updated, we need to reload the configuration files. We can do this by running the following command:

username@localhost:$ sudo dansguardian -r

 

Step 10: Allowing Blocked Sites/Denying Allowed Sites (Optional)

Ok, so DansGuardian does a really good job of blocking sites, sometimes TOO good. We may want to access sites that are blocked for reasons that aren't relevant to us.

DansGuardian has a bunch of various lists that we can use. Depending on what we want to do, there will be a list for it. Keep in mind that in order to modify all lists, we need to have root access (makes sense, huh?)

Here are the lists and what they are used for:

Banned Lists (/etc/dansguardian/lists/)

bannedextensionlist       denies any file with an extension in this list
bannediplist              denies web access to any client IP address in this list
bannedmimetypelist        denies access to certain MIME types
bannedphraselist          denies access to a page that contains any phrase in this list
bannedregexpheaderlist    bans outgoing HTTP headers that match regular expressions in this list
bannedregexpurllist       bans URLs that match regular expressions in this list
bannedsitelist            denies access to particular websites - includes some in blacklists folder
bannedurllist             denies access to certain pages of a website - such as
                          "example.com/badpart/", where the rest of
                          "example.com" would still be allowed

Exception Lists (/etc/dansguardian/lists/) 

exceptionextensionlist       allows any file with an extension in this list
exceptioniplist              lets any client IP address in this list bypass filtering
exceptionmimetypelist        allows access to certain MIME types
exceptionphraselist          allows access to a page that contains any phrase in this list
exceptionregexpheaderlist    allows outgoing HTTP headers that match regular expressions in this list
exceptionregexpurllist       allows URLs that match regular expressions in this list
exceptionsitelist            allows access to particular websites
exceptionurllist             allows access to certain pages of a website - such as
                             "bad.com/goodpart/", where the rest of
                             "bad.com" would still be blocked
                  

There are a handful of other lists in this directory that you may explore for yourself.
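As a quick sketch of how you might unblock a site, the helper below appends a domain to a list only if it isn't already there. "allow_site" is a made-up name, and the scratch file stands in for "/etc/dansguardian/lists/exceptionsitelist", which you would need sudo to edit for real.

```shell
#!/bin/sh
# allow_site DOMAIN LISTFILE: add DOMAIN on its own line unless already listed.
allow_site() {
    grep -qxF "$1" "$2" || echo "$1" >> "$2"
}

# Scratch file standing in for /etc/dansguardian/lists/exceptionsitelist.
list=$(mktemp)

allow_site example.com "$list"
allow_site example.com "$list"   # second call changes nothing

cat "$list"
```

After changing a real list, reload DansGuardian with "sudo dansguardian -r" so it picks up the change.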