Searx/LinuxReviews

Searx search plugin for linuxreviews.org.

File: searx/engines/linuxreviews.py

"""
 @website     LinuxReviews (https://linuxreviews.org/)
 @provide-api yes (https://linuxreviews.org/w/api.php)

 @using-api   yes
 @results     JSON
 @stable      yes
 @parse       url, title

 @todo        content
"""

from json import loads
from string import Formatter
from searx.url_utils import urlencode, quote
from searx.utils import html_to_text

# engine dependent config
categories = ['general', 'news']
language_support = True
paging = True
number_of_results = 5
search_type = 'title'  # possible values: title, text, nearmatch

# search-url
base_url = 'https://linuxreviews.org/'
search_postfix = 'w/api.php?action=query'\
    '&list=search'\
    '&{query}'\
    '&format=json'\
    '&sroffset={offset}'\
    '&srlimit={limit}'

# get first meaningful paragraph
def extract_first_paragraph(content):
    # split on wiki-link brackets and return the first piece
    # that is at least 60 characters long
    failed_attempts = 0
    for wparagraph in content.split(']'):
        for paragraph in wparagraph.split('['):
            if len(paragraph) >= 60:
                return paragraph

        # give up after too many pieces without a match
        failed_attempts += 1
        if failed_attempts > 5:
            return None
    return None


# do search-request
def request(query, params):
    offset = (params['pageno'] - 1) * number_of_results

    string_args = dict(query=urlencode({'srsearch': query}),
                       offset=offset,
                       limit=number_of_results,
                       searchtype=search_type)

    # treat 'all' and 'en' as 'en-US'; requests for any other
    # language are skipped
    lang = params['language']
    if lang in ('all', 'en'):
        lang = 'en-US'

    if lang != 'en-US':
        return None

    search_url = base_url + search_postfix

    params['url'] = search_url.format(**string_args)

    return params


# get response from search-request
def response(resp):
    results = []

    search_results = loads(resp.text)

    # return empty array if there are no results
    if not search_results.get('query', {}).get('search'):
        return []

    # parse results
    for result in search_results['query']['search']:
        # skip redirect pages
        if result.get('snippet', '').startswith('#REDIRECT'):
            continue
        url = base_url + quote(result['title'].replace(' ', '_').encode('utf-8'))

        # the snippet is HTML: convert it to plain text, then keep
        # the first sufficiently long piece as the summary
        extract = result['snippet']
        exttext = html_to_text(extract)
        summary = extract_first_paragraph(exttext)

        # append result
        results.append({'url': url,
                        'title': result['title'],
                        'content': summary})

    # return results
    return results

This plugin can probably be improved. Feel free to change it for the better.
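
To see what the engine sends and what it keeps from the answer without running searx, the URL building and result filtering can be reproduced with the standard library alone. This is a minimal sketch, not part of the plugin: build_search_url and page_urls are hypothetical helper names, urllib.parse stands in for searx.url_utils, and the sample response is hand-written for illustration rather than real API output.

```python
import json
from urllib.parse import quote, urlencode

# Values copied from the engine listing
BASE_URL = 'https://linuxreviews.org/'
SEARCH_POSTFIX = ('w/api.php?action=query'
                  '&list=search'
                  '&{query}'
                  '&format=json'
                  '&sroffset={offset}'
                  '&srlimit={limit}')
NUMBER_OF_RESULTS = 5


def build_search_url(query, pageno):
    """Build the API URL the same way request() does (hypothetical helper)."""
    offset = (pageno - 1) * NUMBER_OF_RESULTS
    return (BASE_URL + SEARCH_POSTFIX).format(
        query=urlencode({'srsearch': query}),
        offset=offset,
        limit=NUMBER_OF_RESULTS)


def page_urls(api_json):
    """Turn a search response into page URLs, skipping redirects."""
    data = json.loads(api_json)
    urls = []
    for result in data.get('query', {}).get('search', []):
        if result.get('snippet', '').startswith('#REDIRECT'):
            continue
        urls.append(BASE_URL + quote(result['title'].replace(' ', '_')))
    return urls


# Hand-written sample response, for illustration only
SAMPLE = json.dumps({'query': {'search': [
    {'title': 'Example page', 'snippet': 'Some text'},
    {'title': 'Old name', 'snippet': '#REDIRECT [[Example page]]'},
]}})

print(build_search_url('wayland compositor', 2))
print(page_urls(SAMPLE))
```

Page 2 with five results per page gives sroffset=5, and the redirect entry is dropped from the result list.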

The plugin can use configuration such as:

File: searx/settings.yml

  - name : linuxreviews.org
    shortcut : lr
    engine : linuxreviews
    base_url : https://linuxreviews.org/
    weight : 2
    number_of_results : 5
    categories: general,news,it
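
The settings entry works because its keys end up overriding the module-level defaults in the engine file (number_of_results, base_url, categories). The sketch below illustrates that idea under the assumption that searx copies each key onto the engine module and splits the comma-separated categories string into a list. It is a simplified stand-in, not searx's actual loader: apply_settings is a hypothetical name, and the engine key is left out because it selects the module rather than setting an attribute.

```python
from types import SimpleNamespace

# Stand-in for the engine module with its built-in defaults
engine = SimpleNamespace(categories=['general', 'news'],
                         number_of_results=5,
                         base_url='https://linuxreviews.org/')

# The settings entry as it would look once parsed from YAML
# (the 'engine' key is omitted, see the note above)
entry = {
    'name': 'linuxreviews.org',
    'shortcut': 'lr',
    'base_url': 'https://linuxreviews.org/',
    'weight': 2,
    'number_of_results': 5,
    'categories': 'general,news,it',
}


def apply_settings(engine, entry):
    """Copy settings onto the engine (assumed, simplified behaviour)."""
    for key, value in entry.items():
        if key == 'categories':
            # a comma-separated string becomes a list of category names
            value = [name.strip() for name in value.split(',')]
        setattr(engine, key, value)
    return engine


apply_settings(engine, entry)
print(engine.categories)
print(engine.shortcut, engine.weight)
```

With this entry the engine would also be listed under the it category, in addition to the general and news defaults from the engine file.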
