YaCy is a free software peer-to-peer distributed search engine written in Java. You can run it on your desktop or on a server. It crawls websites and builds a searchable index which is shared with other peers. Searches are done by looking at the local index while also querying other peers in the network for results. In theory and in principle it is exactly what the Internet lacks and should have: a censorship-resistant, unbiased, all-finding, all-knowing distributed search engine. In practice it is totally unusable as a general-purpose search engine because it presents few relevant search results, in seemingly random order. However, it can be useful for researchers and more specialized use-cases.
|Developed by||Michael Peter Christen|
|Latest release||1.9 / 2016|
|OS||Java (platform independent: GNU/Linux, Windows, Mac OS X, etc.)|
Features and usability
YaCy acts as a web server which provides a page which looks like a typical search-engine. It will by default be listening at 127.0.0.1:8090 and you can go there and immediately do your searches once it's installed.
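Besides the search page, the same local instance can be queried from a script. The following is a minimal sketch, assuming YaCy's JSON search endpoint is reachable at /yacysearch.json on the default address and takes a "query" parameter. The function names and the YACY_URL and YACY_TIMEOUT variables are made up for this example.

```shell
# Minimal sketch: query a local YaCy instance from the command line.
# Assumptions (not from this page): the endpoint path /yacysearch.json and
# its "query" parameter. YACY_URL and YACY_TIMEOUT are hypothetical knobs.

# Print the search endpoint of the configured instance.
yacy_endpoint() {
  printf '%s/yacysearch.json\n' "${YACY_URL:-http://127.0.0.1:8090}"
}

# yacy_search QUERY: send QUERY to the instance and print the raw response.
# curl URL-encodes the query; --max-time caps how long we wait for an answer.
yacy_search() {
  curl --silent --fail --get \
       --max-time "${YACY_TIMEOUT:-10}" \
       --data-urlencode "query=$1" \
       "$(yacy_endpoint)"
}
```

With an instance running, yacy_search "stirling engine" prints whatever the instance returns. The time cap means a query that hangs simply fails instead of blocking the script.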
You will very quickly notice some of its deal-breaking flaws if you configure it to be your browser's default search engine and use it for all your searches.
The search results are lacking and they appear to be ordered by /dev/urandom. It simply fails at providing useful results. That's a big problem for a search engine.
Further, YaCy doesn't even provide any results half the time. It loves to just time out. This is very typical, and almost the rule rather than the exception, when a search query has more than two words. It may manage to produce some random results for a simple query for "Twice" (none of which will be about that k-pop group) but a query for "who is the most talented Twice member and why is it Nayeon" will make its search thread crash and the search result page will be blank.
Resource-use is also a problem. Leave YaCy running for a while and sooner or later it will decide to peg one CPU core at 100% permanently. It will also, over time, use up all storage it has access to until there is no more unless you specifically configure it to not grow above a given size.
This is not the kind of review we like to write. The first revision of this page was written in 2007. We pointed out its performance problems and the fact that "it doesn't do a very good job at sorting the pages according to relevance". In 2019, 12 years later, we have no choice but to conclude that YaCy is a completely useless piece of software. And that's just sad, because there is a rapidly increasing need for an uncensored, unbiased peer-to-peer search engine.
Specialized use-cases
"I do internet research in many areas. Primarily Alternative Energy, Organic Agriculture and Esoterica. What I love about YaCy is what you apparently consider one of its many flaws.
If I want to see something on one of my favorite subjects, Stirling Engines, Google always returns the same websites in the same order, so I have to navigate through several pages of results, and in actuality I haven't seen anything new on the subject from Google in some years. It's just too time consuming to bother with.
With YaCy, because results are random, or hand picked by other peers, I immediately found several new links chock full of interesting information about Stirling Engines on the first page of results. Another search sometime later had the same results. More new stuff I had never seen before searching page after page of the same old prioritized, commercial oriented Google results such as where to buy some cheap knock off Stirling engines on Amazon or Ebay.
With YaCy I found more of the kind of information I want to find: projects and ideas from other Stirling Engine enthusiasts and Stirling Engine model builders. Further, I could spider entire domains about Stirling Engines and quickly locate dozens of other good recommended links, and all my favorite sites would be saved locally, making them easy to find with a subsequent search without bookmarking.
Doing research I'm generally looking for new information. With YaCy, finding some new quality information is easier and faster."
The technology
YaCy uses a distributed hash table (DHT) to share a reverse word index (RWI) among peers in the network.
The concept and the technology itself are very appealing and interesting.
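To make the idea concrete, here is a toy sketch of a reverse word index whose entries are assigned to peers by hashing the word. It is an illustration under loud assumptions, not YaCy's implementation: the function names (index_doc, lookup, peer_for) are made up, cksum merely stands in for whatever hash a real DHT uses, and the URLs are placeholders.

```shell
# Toy reverse word index (RWI): maps each word to the documents containing it.
# Hypothetical sketch only; YaCy's real data structures and hashing differ.

RWI=""   # one posting per line: "word url"

# index_doc URL TEXT: record URL under every distinct lowercase word in TEXT.
index_doc() {
  url=$1
  for word in $(printf '%s\n' "$2" | tr '[:upper:]' '[:lower:]' \
                  | tr -cs '[:alnum:]' '\n' | sort -u); do
    RWI="${RWI}${word} ${url}
"
  done
}

# lookup WORD: print the URLs indexed under WORD, one per line.
lookup() {
  word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  printf '%s' "$RWI" | awk -v w="$word" '$1 == w { print $2 }'
}

# peer_for WORD PEERS: choose which of PEERS peers would store WORD's
# postings, using a checksum of the word as a stand-in for a DHT hash.
peer_for() {
  sum=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( sum % $2 ))
}

index_doc "http://a.example/" "Stirling engines are heat engines"
index_doc "http://b.example/" "Building a model Stirling engine"
lookup stirling       # prints both URLs, one per line
peer_for stirling 4   # prints a peer number between 0 and 3
```

The point of hashing the word rather than the document is that any peer can work out, from the query word alone, which peer to ask for the matching documents.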
Searx+YaCy: A Huge Disappointment
Searx is a metasearch engine program written in Python which actually works. It does lack an index of its own, which makes it prone to censorship and blacklisting by search engines due to too many queries from one source. So why not add a full-featured distributed search engine like YaCy to Searx's sources? That sounds like the perfect combination.
In practice it's just not worth having a local YaCy instance as a Searx source. YaCy will be the slowest responding source, and most of the time it won't respond at all, which results in Searx waiting the maximum time YaCy is allowed before producing results.
Searx is almost a better experience without YaCy as a source. The few results YaCy adds to the mix, the few times it does produce something, are not useful, and the added response time from Searx when YaCy is used as a source is noticeable. The warning "Engines cannot retrieve results: yacy", which is shown most of the time, is also annoying.
One trick to combining Searx with YaCy is to limit the timeout for YaCy in the Searx configuration to timeout : 3.0, which may seem counter-intuitive. YaCy should, in theory, work better if it gets 30 seconds instead of 3 seconds to produce results. In practice YaCy tends to either produce results quickly or time out, regardless of whether it is given 10, 20 or 30 seconds. Thus the only difference between a timeout of 30 seconds and one of 3 seconds is that Searx produces results quicker.
Do note that while we don't recommend using YaCy as a Searx source since YaCy is quite useless we actually do use it as a Searx source. Because.. it's a free search engine and a cool technology. It is not very practically beneficial or profitable; but it's cool.
Search and usage tips
YaCy supports searching by language and a lot of other metrics too. It uses its own special /switches. A search can be limited to English by adding /language/en to it. Searches can be limited to a given site with site:example.tld, and searches can be restricted by file type as well.
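As a small illustration, the helper below assembles a query string from search terms plus the optional language switch and site filter. It is only a sketch: yacy_query is a made-up name, and it assumes the two restrictions can simply be appended to the same query.

```shell
# yacy_query TERMS [LANG] [SITE]: build a YaCy query string.
# Hypothetical helper; assumes /language/ and site: can be combined.
yacy_query() {
  q=$1
  # Append the language switch only if a language code was given.
  [ -n "${2:-}" ] && q="$q /language/$2"
  # Append the site filter only if a site was given.
  [ -n "${3:-}" ] && q="$q site:$3"
  printf '%s\n' "$q"
}

yacy_query "stirling engine" en example.tld
# prints: stirling engine /language/en site:example.tld
```

The resulting string can be pasted into the search box as-is.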
HOWTO install YaCy on Fedora
First, install Java using dnf as root:
dnf -y install java-latest-openjdk-headless
You can find out what the latest released version is by visiting https://yacy.net/en/ - as of now that's a version from 2016. There is also a secret folder with snapshots available at luccioman's GitHub page, and that's a better choice.
First, log in to your server, become root and create a user for YaCy with its home folder in /opt/yacy:
adduser --system yacy -m -d /opt/yacy
Now it's time to install lynx if you do not already have that text-based browser.
Switch to the yacy user you created,
su - yacy
You should now be in the yacy user's home folder, /opt/yacy; verify the current directory before continuing.
Now it's time to download and unpack it. Get either the stable version from 2016:
wget https://yacy.net/release/yacy_v1.92_20161226_9000.tar.gz
tar xfvz yacy_v1.92_20161226_9000.tar.gz -C ..
Or a newer development version (recommended):
wget https://github.com/luccioman/yacy_search_server/releases/download/Release_1.921.9828-dev/yacy_v1.921_20181121_9828.tar.gz
tar xfvz yacy_v1.921_20181121_9828.tar.gz -C ..
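Why -C ..? The tarball contains a folder called yacy/. Asking tar to extract one folder up, into the parent of the yacy user's home folder, results in the contents ending up directly in that home folder.
You can now start your server by running the start script in YaCy's root folder.
You should get a message saying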
>> YaCy started as daemon process. Administration at http://localhost:8090 <<
- Open the administration address from that message in lynx and press Enter on "(BUTTON) Administration"
- Scroll down and choose "Use Case & Account"
- Next, scroll down and choose "Accounts" under "Use Case & Account"
- Check the box next to Access only with qualified account!
- Scroll down a bit further to User Administration and Admin Account and set a password for Admin. You could change username too.
- Select "Define Administrator" and press Enter
Press q to quit lynx.
Note: Make sure "Access only with qualified account!" is checked, and revisit the Accounts page if it is not.
Now it's time to create a systemd file for YaCy. Type exit to leave the yacy user and create a systemd service file:
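A service file for this setup could look roughly like the one written by the function below. Treat it as a sketch under assumptions rather than the definitive unit: the unit name yacy.service matches the commands used on this page, but the Description text, Type=forking, the use of startYACY.sh as the start script, and the write_yacy_unit helper itself are our guesses for an install in /opt/yacy running as the yacy user.

```shell
# write_yacy_unit [PATH]: write a sketch of a systemd unit for YaCy to PATH
# (default: /etc/systemd/system/yacy.service). Hypothetical helper; the
# unit contents are assumptions for an install in /opt/yacy.
write_yacy_unit() {
  cat > "${1:-/etc/systemd/system/yacy.service}" <<'EOF'
[Unit]
Description=YaCy peer-to-peer search engine
After=network.target

[Service]
# The start script launches YaCy as a daemon and returns.
Type=forking
User=yacy
WorkingDirectory=/opt/yacy
ExecStart=/opt/yacy/startYACY.sh
ExecStop=/opt/yacy/stopYACY.sh

[Install]
WantedBy=multi-user.target
EOF
}
```

Running write_yacy_unit as root with no argument writes the file to /etc/systemd/system/yacy.service. Passing another path lets you inspect the result first.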
Reload the systemd service definitions and start YaCy:
systemctl --user start yacy.service
systemctl start yacy.service
Verify that it's running with:
tail -n 50 /opt/yacy/DATA/LOG/yacy00.log
To be able to stop YaCy using systemd you have to add a line with the password to the top of stopYACY.sh in YaCy's root folder.
This isn't all that safe, but the alternative is to set YaCy to allow unrestricted access from localhost. You could do that, but you will have a problem if you combine it with something like Searx or Tor, which would be talking to YaCy from localhost.
Lastly, you may want to point a browser at 127.0.0.1:8090 or the machine's IP:8090 and play around with some of the settings. Good luck.
Essential configuration
We strongly recommend changing some of the default "delay" values if you plan to use YaCy for web crawling.
Go to the administration panel and choose System Administration and then the Performance Settings of Busy Queues tab. For both Local Crawl and Remote Crawl Job, change "Delay between idle loops" to 10000 and "Delay between busy loops" to 3000; the values are in milliseconds, so 3000 equals 3 seconds. Common crawler etiquette is to wait about 3 seconds between requests to the same site. YaCy doesn't do that by default, which is why a lot of sites outright ban it.
YaCy has some "special" configuration options available by going to "System Administration" and clicking "Advanced Properties". This will present a list of text strings similar to the about:config interface in Mozilla Firefox. Changing postprocessing.maximum_load to something like the number of cores on your system minus one is likely a great idea. The reason is that the default value of 2.5 results in no postprocessing ever being done when YaCy is running on a multi-core system, presumably because the system load easily exceeds that threshold. If you have a dual-core then 2.5 is probably fine. If you have 6 cores and 12 threads on the machine YaCy is running on you'll want a higher value. YaCy's results will be (a lot) better when postprocessing is done.
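The rule of thumb above can be turned into a tiny helper. This is only a sketch of that rule: suggest_max_load is a made-up name, and nproc counts processing units (threads), so on a machine with hyper-threading it reports more than the number of physical cores.

```shell
# suggest_max_load CORES: print a candidate postprocessing.maximum_load.
# Hypothetical helper implementing the "cores minus one" rule of thumb.
suggest_max_load() {
  if [ "$1" -le 2 ]; then
    echo 2.5             # keep the default on single- and dual-core machines
  else
    echo $(( $1 - 1 ))   # number of cores minus one
  fi
}

# nproc (GNU coreutils) reports the number of available processing units.
suggest_max_load "$(nproc)"
```

The printed value still has to be entered by hand under "Advanced Properties".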
Do note that a lot of the options there do absolutely nothing. As an example, changing crawler.userAgent.string does not actually change anything at all. Ideally you would set a user-agent which points to a page informing webmasters why you are crawling. YaCy will instead send a string containing the specific kernel version you are using, which is a huge security risk as well as a privacy risk.
Entries per word
Under Index Administration there is a tab named Reverse Word Index where "Limitation of number of references per word" is set to 100 by default. Increasing it to 1000 is a good idea.
Multiple YaCy instances used by Searx
Under System Administration, the link Debug/Analysis Settings opens a settings page where it is possible to configure Search data sources. It may be wise to uncheck everything but Local Solr index if you are configuring a YaCy instance which will be one of multiple YaCy instances used by a front-end like Searx.
- YaCy's homepage is at https://yacy.net/
- The version listed on the homepage is from 2016
- There are newer "snapshot" releases at https://github.com/luccioman/yacy_search_server/releases/
We are currently still running a YaCy+Searx instance at searx.everdot.org. Any search there will include results from YaCy. It is also possible to search only the YaCy back-end by using the !yacy prefix in searches (for example, !yacy wjsn save you save me to search for WJSN's wonderful hit single "Save You, Save Me").
While it's kind of pointless since YaCy is kind of useless, it is the best peer-to-peer search software simply because it's the only peer-to-peer search engine solution. It is quite sad that it's as bad as it is.