pypolibox Package

database Module

The database module is responsible for parsing the user’s requirements (both from command line options, as well as interactively from the Python interpreter), transforming these requirements into an SQL query, querying the sqlite database and returning the results.

class pypolibox.database.Book(db_item, db_columns, query_args)[source]

a Book instance represents one book from a database query

get_number_of_book_matches()[source]

calculates the number of query parameters that a book matches

Return type:int
class pypolibox.database.Books(results)[source]

a Books instance stores all books that were found by a database query as a list of Book instances in self.books

get_book_ranks(possible_matches)[source]

ranks ‘OR query’ results according to the number of query parameters they match.

Parameters:possible_matches (int) – the number of (meaningful) parameters of the query.
Returns:book_ranks – a list of tuples, where each tuple consists of the score of a book and its index in self.books
Return type:list of (float, int) tuples
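The scoring scheme described above can be sketched as follows (a minimal illustration with hypothetical helper names, not the actual pypolibox code): each book's score is the fraction of meaningful query parameters it matches, and books are ranked best-first.

```python
def rank_books(match_counts, possible_matches):
    """match_counts: one int per book (number of matched query parameters).
    Returns a list of (score, index) tuples, best-matching book first."""
    book_ranks = [
        (matches / float(possible_matches), index)
        for index, matches in enumerate(match_counts)
    ]
    # sort by score, descending
    return sorted(book_ranks, reverse=True)

# three books matching 3, 1 and 2 of 3 possible parameters
print(rank_books([3, 1, 2], 3))  # best first: (1.0, 0), then book 2, then book 1
```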
class pypolibox.database.Query(argv)[source]

a Query instance represents one user query to the database

Queries can be made from the command line, as well as from the Python interpreter. From the command line, queries can be made using either abbreviated or long parameters. The following examples both query the database for books that contain code examples and deal with both semantics and parsing:

python pypolibox.py -k semantics, parsing -c 1
python pypolibox.py --keywords semantics, parsing --codeexamples 1

When calling pypolibox.py from within the Python interpreter, the same query can be made using the following command:

Query(["-k", "semantics", "parsing", "-c", "1"])

If you print the Query instance, it will show the SQL query that was constructed from the user input:

SELECT * FROM books WHERE keywords like '%semantics%' AND keywords
like '%parsing%' AND examples = 1

TODO: This module talks directly to the database. To make it easier to adapt pypolibox to a different domain, an SQL abstraction layer (e.g. SQLAlchemy) should be used.
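For illustration, a keyword/code-examples query like the one above could also be built with parameterized SQL (a hedged sketch with a hypothetical `build_query` helper; the actual pypolibox code assembles the SQL string directly, and only the table and column names are taken from the example above):

```python
def build_query(keywords, codeexamples=None):
    """Build a parameterized SQL query over the books table."""
    conditions = ["keywords like ?"] * len(keywords)
    params = ["%{0}%".format(kw) for kw in keywords]
    if codeexamples is not None:
        conditions.append("examples = ?")
        params.append(codeexamples)
    sql = "SELECT * FROM books WHERE " + " AND ".join(conditions)
    return sql, params

sql, params = build_query(["semantics", "parsing"], codeexamples=1)
print(sql)
# SELECT * FROM books WHERE keywords like ? AND keywords like ? AND examples = ?
```

The resulting pair could then be passed to `cursor.execute(sql, params)` on an sqlite3 connection, avoiding string interpolation of user input.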

class pypolibox.database.Results(query)[source]

A Results instance sends queries to the database, retrieves and stores the results.

get_number_of_possible_matches()[source]

Counts the number of query parameters that could be matched by books from the results set. The actual scoring of books takes place in Books.get_book_ranks().

For example, if a query contains the parameters:

keywords = pragmatics, keywords = semantics, language = German

it means that a book could possibly match 3 parameters (possible_matches = 3).

Returns:the number of possible matches
Return type:int
get_table_header(table_name)[source]

get the column names (e.g. title, year, authors) and their index from the books table of the db and return them as a dictionary.

Parameters:table_name (str) – name of a database table, e.g. ‘books’
Returns:a dictionary that contains the names of the table columns as keys and their index as values
Return type:dict, with str keys and int values

pypolibox.database.get_column(column_name)[source]

debugging: primitive db query that returns all the values stored in a column, e.g. get_column(“title”) will return all book titles stored in the database

Return type:list of str

debug Module

The debug module contains a number of functions that can be used to test the behaviour of pypolibox’s classes, test its error handling, or simply provide shortcuts for generating frequently needed data.

pypolibox.debug.abbreviate_textplan(textplan)[source]

recursive helper function that prints only the skeleton of a textplan (message types and RST relations, but not the actual message content)

Parameters:textplan (TextPlan or ConstituentSet or Message) – a text plan, a constituent set or a message
Returns:a message (without the attribute value pairs stored in it)
Return type:Message
pypolibox.debug.apply_rule(messages, rule_name)[source]

debugging: take a rule and apply it to your list of messages.

The resulting ConstituentSet will be added to the list, while the messages involved in its construction will be removed. Repeat this step until you’ve found an erroneous/missing rule.

pypolibox.debug.compare_hlds_variants()[source]

TODO: kill bugs

BUG1: sentence001-original-test contains 2(!) <item> sentences.

pypolibox.debug.compare_textplans()[source]

helps to find out how many different text plan structures there are.

pypolibox.debug.enumprint(obj)[source]

prints every item of an iterable on its own line, preceded by its index

pypolibox.debug.find_applicable_rules(messages)[source]

debugging: find out which rules are directly (i.e. without forming ConstituentSets first) applicable to your messages

pypolibox.debug.findrule(ruletype='', attribute='', value='')[source]

debugging: find rules that have a certain ruleType and some attribute-value pair

Example: findrule(“Concession”, “nucleus”, “usermodel_match”) finds rules of type ‘Concession’ where rule.nucleus == ‘usermodel_match’.

pypolibox.debug.gen_all_messages_of_type(msg_type)[source]

generate all messages for all books from all testqueries, but return only those which match the given message type, e.g. ‘id’ or ‘extra’.

pypolibox.debug.gen_all_textplans()[source]

generates all text plans for each query in the predefined list of test queries.

Return type:list of ``TextPlan``s or ``str``s
pypolibox.debug.gen_textplans(query)[source]

debug function: generates all text plans for a query.

Parameters:query (int or list of str) – can be the index of a test query (e.g. 4) OR a list of query parameters (e.g. [“-k”, “phonology”, “-l”, “German”])

Return type:TextPlans
Returns:a TextPlans instance, containing a number of text plans
pypolibox.debug.genallmessages(query)[source]

debug function: generates all messages for a query.

Parameters:query (int or list of str) – can be the index of a test query (e.g. 4) OR a list of query parameters (e.g. [“-k”, “phonology”, “-l”, “German”])

Return type:AllMessages
Returns:all messages that could be generated for the query
pypolibox.debug.genmessages(booknumber=0, querynumber=10)[source]

generates all messages for a book regarding a specific database query.

Parameters:
  • booknumber (int) – the index of the book from the results list (“0” would be the first book, i.e. the one with the highest score)
  • querynumber (int) – the index of a query from the predefined list of test queries (named ‘testqueries’)

Return type:list of ``Message``s
pypolibox.debug.genprops(querynumber=10)[source]

generates all propositions for all books in the database concerning a specific query.

Parameters:querynumber (int) – the index of a query from the predefined list of test queries (named ‘testqueries’)

Return type:AllPropositions
pypolibox.debug.msgtypes(messages)[source]

print message types / rst relation types, no matter which data structure is used to represent them

pypolibox.debug.printeach(obj)[source]

prints every item of an iterable on its own line

pypolibox.debug.test_cli(query_arguments=[[], ['-k', 'pragmatics'], ['-k', 'pragmatics', '-r', '4'], ['-k', 'pragmatics', 'semantics'], ['-k', 'pragmatics', 'semantics', '-r', '7'], ['-l', 'German'], ['-l', 'German', '-p', 'Lisp'], ['-l', 'German', '-p', 'Lisp', '-k', 'parsing'], ['-l', 'English', '-s', '0', '-c', '1'], ['-l', 'English', '-s', '0', '-e', '1', '-k', 'discourse'], ['-k', 'syntax', 'parsing', '-l', 'German', '-p', 'Prolog', 'Lisp', '-s', '2', '-t', '0', '-e', '1', '-c', '1', '-r', '7']])[source]

run several complex queries and print their results to stdout

facts Module

The facts module takes the information stored in Book instances and converts them into attribute value matrices (Facts). Furthermore, the module compares each book with its predecessor (e.g. book A is newer than book B and has code examples, while B is shorter and targets beginners ...). The insights gathered from these comparisons are also stored in Facts instances.

class pypolibox.facts.AllFacts(b)[source]

Simply speaking, an AllFacts instance contains all facts about all books that were returned by a database query. More formally, it contains a Facts instance for each Book in a Books instance.

In a Books instance, all books returned by a database query are sorted by the number of query parameters they match (‘user model match’) in descending order. This means that AllFacts will contain facts about the best-matching book, followed by facts about the second-best matching book (including a comparison to the best-matching one), followed by facts about the third-best matching book (including a comparison to the second one), etc.
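The iteration pattern described above can be sketched like this (hypothetical helper names; the real class builds Facts instances rather than plain tuples): walk the score-sorted books and hand each Facts object its predecessor for comparison.

```python
def collect_facts(ranked_books, make_facts):
    """ranked_books: list of (book, score) pairs, best match first.
    make_facts(book, score, index, preceding_book) builds one facts record."""
    all_facts = []
    preceding_book = None  # the best-matching book has no predecessor
    for index, (book, score) in enumerate(ranked_books):
        all_facts.append(make_facts(book, score, index, preceding_book))
        preceding_book = book  # next book is compared against this one
    return all_facts

facts = collect_facts([('A', 1.0), ('B', 0.5)],
                      lambda book, score, index, prev: (book, index, prev))
print(facts)  # [('A', 0, None), ('B', 1, 'A')]
```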

class pypolibox.facts.Facts(book, book_score, index=0, preceding_book=False)[source]

A Facts instance represents facts about a single book, but also contains a comparison of that particular book with its predecessor.

generate_extra_facts(index, book)[source]

generates extra_facts, if the current book is very new/old or very short/long.

Parameters:
  • index (int) – the index of the book in the Books list of books
  • book (Book) – a Book instance
Returns:a dictionary that contains information about the recency and length of a book
Return type:dict

generate_id_facts(index, book)[source]

generates a dictionary of id facts about the current book which will be stored in self.facts["id_facts"]. In contrast to other facts, id_facts are those kinds of facts that can be directly retrieved from the database (i.e. there is no comparison between books or reasoning involved). The id_facts dictionary contains the following keys:

id_facts keys       database book table columns

'authors'
'codeexamples'      'examples'
'exercises'
'keywords'
'language'          'lang'
'pages'
'proglang'          'plang'
'target'
'title'
'year'

The key names should be self-explanatory. In those cases where they do not exactly match their counterparts in the database, the corresponding database table column name is given in the table above.

Parameters:
  • index (int) – the index of the book in the Books list of books
  • book (Book) – a Book instance
Returns:

a dictionary with the keys described above

Return type:

dict
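The key renaming in the table above only concerns the three differing names, so it can be sketched with a small mapping (a hypothetical, simplified helper that works on a plain column-name → value dict rather than a Book instance):

```python
# id_facts keys whose names differ from their database columns
DB_COLUMN_FOR_KEY = {
    'codeexamples': 'examples',
    'language': 'lang',
    'proglang': 'plang',
}

ID_FACT_KEYS = ['authors', 'codeexamples', 'exercises', 'keywords',
                'language', 'pages', 'proglang', 'target', 'title', 'year']

def generate_id_facts(db_row):
    """db_row: dict mapping database column names to values."""
    return {key: db_row[DB_COLUMN_FOR_KEY.get(key, key)]
            for key in ID_FACT_KEYS}
```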

generate_lastbook_facts(index, book, preceding_book)[source]

generates facts that compare the current book with the preceding one. A typical example of a lastbook_facts dictionary would look like this:

lastbook_facts:
    lastbook_nomatch:
        {'language': 'German',
        'keywords_preceding_book_only':
            set(['pragmatics', 'chart parsing']),
        'keywords_current_book_only':
            set([' ', 'grammar', 'language hierarchy', 'corpora',
                'syntax', 'morphology', 'left associative
                grammar']),
        'codeexamples': 0,
        'proglang': set(['Lisp']),
        'newer': 11,
        'keywords':
            set([' ', 'grammar', 'language hierarchy', 'corpora',
            'syntax', 'left associative grammar', 'morphology',
            'chart parsing', 'pragmatics']),
        'proglang_preceding_book_only':
            set(['Lisp'])}
    lastbook_match:
        {'exercises': 1, 'keywords': set(['semantics',
        'parsing']), 'target': 0, 'pagerange': 1}

This method calculates whether the book is newer/older/shorter/longer than its predecessor (if so, it stores the difference as an integer). For keys that have sets as their values (keywords and proglang), the resulting dictionary lists which values differed and which were only present in either the preceding or the current book.

Parameters:
  • index (int) – the index of the book in the Books list of books
  • book (Book) – a Book instance
  • preceding_book (bool) – if True, there is a book preceding this one and both books will be compared
Returns:a dictionary with two keys, lastbook_match and lastbook_nomatch, which in turn are dictionaries themselves and contain facts that are shared between the two books (lastbook_match) or that differ between them (lastbook_nomatch)
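The two comparison modes described above (integer differences for scalar facts, set differences for set-valued facts) can be sketched as follows. This is a hypothetical, simplified helper covering only 'year' and 'keywords'; the real method additionally handles pages, proglang, and the other facts:

```python
def compare_with_predecessor(book, preceding):
    """book, preceding: dicts with 'year' (int) and 'keywords' (set) keys."""
    match, nomatch = {}, {}
    # scalar fact: identical values match; otherwise store the difference
    if book['year'] == preceding['year']:
        match['year'] = book['year']
    elif book['year'] > preceding['year']:
        nomatch['newer'] = book['year'] - preceding['year']
    else:
        nomatch['older'] = preceding['year'] - book['year']
    # set-valued fact: shared values vs. values unique to either book
    shared = book['keywords'] & preceding['keywords']
    if shared:
        match['keywords'] = shared
    nomatch['keywords_current_book_only'] = book['keywords'] - preceding['keywords']
    nomatch['keywords_preceding_book_only'] = preceding['keywords'] - book['keywords']
    return {'lastbook_match': match, 'lastbook_nomatch': nomatch}
```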

generate_query_facts(index, book, book_score)[source]

generates facts that describe whether a book matches (parts of) the query (a.k.a. the user model). A typical query_facts dictionary looks like this:

query_facts:
    usermodel_nomatch: {'codeexamples': 0}
    usermodel_match: {'exercises': 1, 'keywords':
                     set(['semantics', 'parsing']), 'language':
                     'German'}
    book_score: 0.8

The book described in this example matches 80% of the user requirements (it contains exercises, deals with semantics and parsing, and is written in German) but does not contain code examples (as was asked for by the user).

Parameters:
  • index (int) – the index of the book in the Books list of books
  • book (Book) – a Book instance
  • book_score (float) – the score of the book that was calculated in Books.get_book_ranks()
Returns:a dictionary that contains three keys: the book_score, the usermodel_match and the usermodel_nomatch. ‘usermodel_match’ contains all the features that were requested by the user and are present in the book. ‘usermodel_nomatch’ contains all features that were requested but are missing from the book.
Return type:dict
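The split into usermodel_match and usermodel_nomatch can be sketched as a simple partition over the queried features (a hypothetical helper working on flat dicts; book_score is taken as computed by Books.get_book_ranks()):

```python
def generate_query_facts(query_params, book_facts, book_score):
    """Partition queried features into matched and unmatched ones."""
    match, nomatch = {}, {}
    for feature, wanted in query_params.items():
        if book_facts.get(feature) == wanted:
            match[feature] = wanted      # requested and present in the book
        else:
            nomatch[feature] = wanted    # requested but missing from the book
    return {'usermodel_match': match,
            'usermodel_nomatch': nomatch,
            'book_score': book_score}
```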

hlds Module

HLDS (Hybrid Logic Dependency Semantics) is the format used internally by the OpenCCG realizer. This module allows the conversion between HLDS-XML files and NLTK feature structures. In addition, it can be used from the command line to convert HLDS-XML files into printable versions of ``nltk.FeatStruct``s. The following command produces a LaTeX file that can be compiled into a PDF:

python hlds.py --format latex --outfile output.tex input1.xml input2.xml

Alternatively, you can also produce ‘ASCII art’ with this command:

python hlds.py --format nltk --outfile output.txt input1.xml input2.xml

This way, the phrase ‘das Buch’ can be transformed from this HLDS-XML representation:

<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <lf>
    <satop nom="b1:artefaktum">
      <prop name="Buch"/>
      <diamond mode="NUM">
        <prop name="sing"/>
      </diamond>
      <diamond mode="ART">
        <nom name="d1:sem-obj"/>
        <prop name="def"/>
      </diamond>
    </satop>
  </lf>
  <target>das Buch</target>
</xml>

To this attribute-value matrix (LaTeX):

\begin{avm}
    \[ $*$nom$*$  & `b1:artefaktum' \\
       $*$prop$*$ & `Buch' \\
       $*$text$*$ & `das Buch' \\
       NUM        & \[ prop & `sing' \] \\
       ART        & \[ nom  & `d1:sem-obj' \\
                       prop & `def' \] \\
    \]
\end{avm}

or this one (plain text):

[ *root_nom*        = 'b1:artefaktum'           ]
[ *root_prop*       = 'Buch'                    ]
[ *text*            = 'das Buch'                ]
[                                               ]
[ 00__NUM           = [ *mode* = 'NUM'  ]       ]
[                     [ prop   = 'sing' ]       ]
[                                               ]
[                     [ *mode* = 'ART'        ] ]
[ 01__ART           = [ nom    = 'd1:sem-obj' ] ]
[                     [ prop   = 'def'        ] ]
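The HLDS-XML → feature structure conversion shown above can be sketched with the standard library alone. The module itself builds nltk.FeatStruct objects; this hypothetical version returns plain dicts purely to show the recursion over nested <diamond> elements:

```python
import xml.etree.ElementTree as ET

HLDS = """<xml><lf><satop nom="b1:artefaktum">
  <prop name="Buch"/>
  <diamond mode="NUM"><prop name="sing"/></diamond>
  <diamond mode="ART"><nom name="d1:sem-obj"/><prop name="def"/></diamond>
</satop></lf><target>das Buch</target></xml>"""

def diamond2dict(elem):
    """Recursively convert a <diamond> element into a nested dict."""
    d = {'mode': elem.get('mode')}
    for child in elem:
        if child.tag == 'diamond':
            d.setdefault('diamonds', []).append(diamond2dict(child))
        else:  # <nom> or <prop>
            d[child.tag] = child.get('name')
    return d

root = ET.fromstring(HLDS)
satop = root.find('lf/satop')
sentence = {'nom': satop.get('nom'),
            'prop': satop.find('prop').get('name'),
            'text': root.findtext('target'),
            'diamonds': [diamond2dict(d) for d in satop.findall('diamond')]}
print(sentence['diamonds'][1])  # {'mode': 'ART', 'nom': 'd1:sem-obj', 'prop': 'def'}
```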
class pypolibox.hlds.Diamond(features=None, **morefeatures)[source]

Bases: nltk.featstruct.FeatDict

A Diamond represents an HLDS diamond in form of a (nested) feature structure containing the elements nom? prop? diamond*, e.g.:

<diamond mode="AGENS">
    <nom name="s1:addition"/>
    <prop name="sowohl"/>
    <diamond mode="NP1">
        <nom name="h1:nachname"/>
        <prop name="Hausser"/>
    </diamond>
    ...
</diamond>
append_subdiamond(subdiamond, mode=None)[source]

appends a subdiamond structure to an existing diamond structure, while allowing to change the mode of the subdiamond

Parameters:mode (str or NoneType) – the mode that the subdiamond shall have. This will also be used to determine the subdiamond’s identifier: if the diamond already has two subdiamonds (e.g. “00__AGENS” and “01__PATIENS”) and we append a third subdiamond with mode “TEMP”, its identifier will be “02__TEMP”. If mode is None, the subdiamond’s mode will be left untouched.

change_mode(mode)[source]

changes the mode of a Diamond, which is sometimes needed when embedding it into another Diamond or Sentence.

insert_subdiamond(index, subdiamond_to_insert, mode=None)[source]

insert a Diamond into this one before the index, while allowing to change the mode of the subdiamond.

Parameters:mode (str or NoneType) – the mode that the subdiamond shall have. This will also be used to determine the subdiamond’s identifier: if the diamond already has two subdiamonds (e.g. “00__AGENS” and “01__PATIENS”) and we insert a third subdiamond at index 1 with mode “TEMP”, its identifier will be “01__TEMP”, while the remaining subdiamond identifiers will be changed accordingly, e.g. “00__AGENS” and “02__PATIENS”. If mode is None, the subdiamond’s mode will be left untouched.

prepend_subdiamond(subdiamond_to_prepend, mode=None)[source]

prepends a subdiamond structure to an existing diamond structure, while allowing to change the mode of the subdiamond

Parameters:mode (str or NoneType) – the mode that the subdiamond shall have. This will also be used to determine the subdiamond’s identifier: if the diamond already has two subdiamonds (e.g. “00__AGENS” and “01__PATIENS”) and we prepend a third subdiamond with mode “TEMP”, its identifier will be “00__TEMP”, while the remaining two subdiamond identifiers will be incremented by 1, e.g. “01__AGENS” and “02__PATIENS”. If mode is None, the subdiamond’s mode will be left untouched.

class pypolibox.hlds.HLDSReader(hlds, input_format='file')[source]

represents a list of sentences (as NLTK feature structures) parsed from an HLDS XML testbed file.

parse_sentence(sentence, single_sent=True)[source]
parse_sentences(tree)[source]

Parses all sentences (represented as HLDS XML structures) into feature structures. These structures are saved as a list of ``Sentence``s in self.sentences.

If there is only one sentence in a file, its root element is <xml>. If there is more than one, each <xml> sentence “root” is wrapped in an <item>...</item> element (and <regression> becomes the root tag of the document).

Parameters:tree (etree._ElementTree) – an etree tree element
class pypolibox.hlds.Sentence(features=None, **morefeatures)[source]

Bases: nltk.featstruct.FeatDict

represents an HLDS sentence as an NLTK feature structure.

create_sentence(sent_str, expected_parses, root_nom, root_prop, diamonds)[source]

wraps all ``Diamond``s that were already constructed by HLDSReader.parse_sentences() plus some meta data (root verb etc.) into a NLTK feature structure that represents a complete sentence.

Parameters:
  • sent_str (str) – the text that should be generated
  • expected_parses (int) – the expected number of parses
  • root_prop (str) – the root element of that text (in case we’re

actually generating a sentence: the main verb)

Parameters:
  • root_nom (str) – the root (element/verb) category, e.g. “b1:handlung”
  • diamonds (list of ``Diamond``s) – a list of the diamonds that are contained in the

sentence

pypolibox.hlds.add_mode_suffix(diamond, mode='N')[source]
pypolibox.hlds.add_nom_prefixes(diamond)[source]

Adds a prefix/index to the name attribute of every <nom> tag of a Diamond or Sentence structure. Without this, ccg-realize will only produce gibberish.

Every <nom> tag has a ‘name’ attribute, which contains a category/type-like description of the corresponding <prop> tag’s name attribute, e.g.:

<diamond mode="PRÄP">
    <nom name="v1:zugehörigkeit"/>
    <prop name="von"/>
</diamond>

Here ‘zugehörigkeit’ is the name of a category that the preposition ‘von’ belongs to. Usually, the nom prefix is the first character of the prop name attribute with an added index. Index iteration is done by a depth-first walk through all diamonds contained in the given feature structure. In this example, ‘v1:zugehörigkeit’ means that “von” is the first diamond in the structure that starts with ‘v’ and belongs to the category ‘zugehörigkeit’.
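The prefix scheme just described can be sketched as a per-letter counter over a depth-first list of diamonds (a hypothetical, flattened helper; the real function walks the nested feature structure in place):

```python
from collections import defaultdict

def add_nom_prefixes(diamonds):
    """diamonds: depth-first list of {'nom': category, 'prop': word} dicts.
    Prefixes each nom with the prop's first letter plus a 1-based counter
    per initial letter, e.g. 'von'/'zugehörigkeit' -> 'v1:zugehörigkeit'."""
    counter = defaultdict(int)
    for diamond in diamonds:
        first_char = diamond['prop'][0].lower()
        counter[first_char] += 1
        diamond['nom'] = '{0}{1}:{2}'.format(first_char, counter[first_char],
                                             diamond['nom'])
    return diamonds
```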

pypolibox.hlds.convert_diamond_xml2fs(etree)[source]

transforms a HLDS XML <diamond>...</diamond> structure (that was parsed into an etree element) into an NLTK feature structure.

Parameters:etree (etree._Element) – a diamond etree element
Return type:Diamond
pypolibox.hlds.create_diamond(mode, nom, prop, nested_diamonds_list)[source]

creates an HLDS feature structure from scratch (in contrast to convert_diamond_xml2fs, which converts an HLDS XML structure into its corresponding feature structure representation)

NOTE: I’d like to simply put this into __init__, but I don’t know how to subclass FeatDict properly. FeatDict.__new__ complains about Diamond.__init__(self, mode, nom, prop, nested_diamonds_list) having too many arguments.

pypolibox.hlds.create_hlds_file(sent_or_sent_list, mode='test', output='etree')[source]

this function transforms ``Sentence``s into a valid HLDS XML testbed file

Parameters:
  • sent_or_sent_list (Sentence or list of ``Sentence``s) – a Sentence or a list of ``Sentence``s
  • mode (str) – “test”, if the sentence will be part of a (regression) testbed file (ccg-test); “realize”, if the sentence will be put in a file on its own (ccg-realize)
  • output (str) – “etree” (etree element) or “xml” (formatted, valid XML document as a string)

Return type:str
pypolibox.hlds.diamond2sentence(diamond)[source]

Converts a Diamond feature structure into a Sentence feature structure. This becomes necessary whenever we want to realize a short utterance, e.g. “die Autoren” or “die Themen Syntax und Pragmatik”.

Note: OpenCCG does not really distinguish between a sentence and smaller units of meaning. It simply assigns the <sentence> tag to every HLDS structure it realizes, whereas each substructure of this “sentence” (no matter how complex) is labelled as a <diamond>.

Return type:Sentence
pypolibox.hlds.etreeprint(element, debug=True, raw=False)[source]

pretty print function for etree trees or elements

Parameters:
  • debug – if True, not only return the XML string but also print it to stdout; if False, only return the XML string
  • raw – if True, just transform the etree (element) into a string without adding or prettifying anything; if False, add an XML declaration and use pretty printing to make the output more readable for humans

pypolibox.hlds.featstruct2avm(featstruct, mode='non-recursive')[source]

converts an NLTK feature structure into an attribute-value matrix that can be printed with LaTeX’s avm environment.

Return type:str
pypolibox.hlds.hlds2xml(featstruct)[source]

debug function that returns the string representation of a feature structure (Diamond or Sentence) and its HLDS XML equivalent.

Return type:str
pypolibox.hlds.last_diamond_index(featstruct)[source]

Returns the highest index currently used within a given Diamond or Sentence. E.g., if this structure contains three diamonds (“00__ART”, “01__NUM” and “02__TEMP”), the return value will be 2. The return value is -1 if the feature structure doesn’t contain any ``Diamond``s.

Return type:int
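Given the "NN__MODE" key convention described above, the index lookup can be sketched over the feature structure's keys (a hypothetical helper; the real function inspects a Diamond/Sentence FeatDict):

```python
import re

def last_diamond_index(featstruct_keys):
    """Return the highest diamond index among keys like '02__TEMP', or -1."""
    indices = [int(m.group(1))
               for key in featstruct_keys
               for m in [re.match(r'(\d+)__', key)] if m]
    return max(indices) if indices else -1

print(last_diamond_index(['00__ART', '01__NUM', '02__TEMP', 'prop']))  # 2
print(last_diamond_index(['prop', 'nom']))                             # -1
```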
pypolibox.hlds.main()[source]

parse command line args and do the conversions

pypolibox.hlds.remove_nom_prefixes(diamond)[source]
pypolibox.hlds.test_conversion()[source]

Tests HLDS XML <-> NLTK feature structure conversions: converts an HLDS XML testbed file into a list of sentences in NLTK feature structure notation, picks one of these sentences at random and converts it back to HLDS XML, and prints both versions of this sentence. Returns an HLDSReader instance (containing a list of ``Sentence``s in NLTK feature structure notation) and an HLDS XML testbed file (as a string) created from those feature structures.

Return type:tuple of (HLDSReader, str)
Returns:a tuple containing an HLDSReader instance and a string representation of an HLDS XML testbed file

lexicalization Module

This module shall convert ``TextPlan``s into HLDS XML structures which can be utilized by the OpenCCG surface realizer to produce natural language text.

pypolibox.lexicalization.gen_abstract_autor(num_of_authors)[source]

given an integer (number of authors), returns a Diamond instance which generates “der Autor” or “die Autoren”.

Parameters:num_of_authors (int) – the number of authors of a book
Return type:Diamond
pypolibox.lexicalization.gen_abstract_keywords(num_of_keywords)[source]

generates a Diamond for ‘das Thema’ vs. ‘die Themen’

pypolibox.lexicalization.gen_abstract_title(number_of_books)[source]

given an integer representing a number of books returns a Diamond, which can be realized as either “das Buch” or “die Bücher”

Return type:Diamond
pypolibox.lexicalization.gen_art(article_type='def')[source]

generates a Diamond describing an article

pypolibox.lexicalization.gen_complete_name(name)[source]

takes a name as a string and returns a corresponding nested HLDS diamond structure.

Return type:Diamond
pypolibox.lexicalization.gen_enumeration(diamonds_list, mode='')[source]

Takes a list of Diamond instances and combines them into a nested Diamond. This nested Diamond can be used to generate an enumeration, such as:

A
A und B
A, B und C
A, B, C und D
...
Return type:Diamond
Returns:a Diamond instance (containing zero or more nested Diamond instances)
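At the string level, the enumeration pattern shown above follows one rule: commas between all items except the last pair, which is joined with ‘und’. The real function builds nested Diamond structures; this hypothetical sketch just shows the rule:

```python
def enumeration(items):
    """Join items as 'A', 'A und B', 'A, B und C', ..."""
    if not items:
        return ''
    if len(items) == 1:
        return items[0]
    return '{0} und {1}'.format(', '.join(items[:-1]), items[-1])

print(enumeration(['A']))                 # A
print(enumeration(['A', 'B']))            # A und B
print(enumeration(['A', 'B', 'C', 'D']))  # A, B, C und D
```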

pypolibox.lexicalization.gen_gender(genus='mask')[source]

generates a Diamond representing masculine, feminine or neuter

pypolibox.lexicalization.gen_keywords(keywords, mode='N')[source]

takes a list of keyword (strings) and converts them into a nested Diamond structure and prepends “das Thema” or “die Themen”

Return type:Diamond
pypolibox.lexicalization.gen_komma_enumeration(diamonds_list, mode='')[source]

This function will be called by gen_enumeration() and takes a list of Diamond instances and combines them into a nested Diamond, expressing comma separated items, e.g.:

Manning, Chomsky
Manning, Chomsky, Allen
...
Return type:Diamond
Returns:a Diamond instance (containing zero or more nested Diamond instances)

pypolibox.lexicalization.gen_komp(modality='komp')[source]

generates a Diamond expressing adjective modality, i.e. ‘positiv’, ‘komperativ’ or ‘superlativ’.

pypolibox.lexicalization.gen_lastname_only(name)[source]

given an authors name (“Christopher D. Manning”), the function returns a Diamond instance which can be used to realize the author’s last name.

NOTE: This does not work with last names that include whitespace, e.g. “du Bois” or “von Neumann”.

Return type:Diamond
pypolibox.lexicalization.gen_length_lastbook_nomatch(length, lexicalized_title, lexicalized_lastbooktitle)[source]
Parameters:length (Diamond) –

a feature structure that compares the length of two books:

[ direction = '+'                  ]
[                                  ]
[ magnitude = [ number = 122     ] ]
[             [ unit   = 'pages' ] ]
[                                  ]
[ rating    = 'neutral'            ]
[ type      = 'RelativeVariation'  ]
pypolibox.lexicalization.gen_mod(modifier, modifier_type='kardinal')[source]

generates a Diamond representing a modifier

pypolibox.lexicalization.gen_nested_given_names(given_names)[source]

given names are represented as nested (diamond) structures in HLDS (instead of using indices to specify the first given name, second given name etc.), where the last given name is the outermost structural element and the first given name is the innermost one.

Return type:empty list or Diamond
Returns:an empty list if given_names is empty, otherwise a Diamond (which might contain other diamonds)
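The nesting order described above (last given name outermost, first given name innermost) can be sketched with plain dicts instead of Diamonds (hypothetical keys 'prop' and 'inner'):

```python
def nest_given_names(given_names):
    """Nest given names: first name innermost, last name outermost."""
    if not given_names:
        return []
    nested = {'prop': given_names[0]}             # first name: innermost
    for name in given_names[1:]:
        nested = {'prop': name, 'inner': nested}  # wrap later names around it
    return nested

print(nest_given_names(['Christopher', 'D.']))
# {'prop': 'D.', 'inner': {'prop': 'Christopher'}}
```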

pypolibox.lexicalization.gen_num(numerus=1)[source]

generates a Diamond representing singular or plural

Parameters:numerus (str or int) – either a string representing singular or plural (“sing”, “plur”), or an integer
Return type:Diamond

pypolibox.lexicalization.gen_pages_extra(length_description, lexicalized_title)[source]

das Buch ist etwas kurz
das Buch ist sehr umfangreich

Parameters:length_description (str) – “very long”, “very short”
pypolibox.lexicalization.gen_pages_id(pages_int, lexicalized_title, lexeme='random')[source]
Parameters:pages_int (int) – number of pages of a book
pypolibox.lexicalization.gen_pers(person)[source]

generates a Diamond representing 1st, 2nd or 3rd person

pypolibox.lexicalization.gen_personal_pronoun(count, gender, person, mode='')[source]
Parameters:
  • count (int) – 1 for ‘singular’; > 1 for ‘plural’
  • gender (str) – ‘masc’, ‘fem’ or ‘neut’
  • person (int) – 1 for 1st person, 2 for 2nd person ...
pypolibox.lexicalization.gen_prep(preposition, preposition_type='zugeh\xc3\xb6rigkeit')[source]

generates a Diamond representing a preposition

pypolibox.lexicalization.gen_proglang(proglang, mode='')[source]

generates a Diamond representing programming languages, e.g. ‘die Programmiersprache X’, ‘die Programmiersprachen X und Y’ or ‘keine Programmiersprache’.

Parameters:
  • proglang (tuple of (frozenset, str)) – a tuple consisting of a set of programming languages (as strings) and a rating (string)
  • mode (str) – sets the mode attribute of the resulting Diamond
Return type:Diamond
pypolibox.lexicalization.gen_pronoun(person, pronoun_type, gender, numerus, mode='')[source]

generates any kind of pronoun.

Parameters:
  • person (int) – 1 for 1st person, 2 for 2nd person ...
  • pronoun_type (str) – type of the pronoun, e.g. “reflpro” or “perspro”
  • gender (str) – ‘masc’, ‘fem’ or ‘neut’
  • numerus – “sing” or “plur”
  • mode (str) – the mode string the resulting diamond should have
Return type:

Diamond

pypolibox.lexicalization.gen_recency_extra(recency_description, lexicalized_title)[source]

das Buch ist besonders neu
das Buch ist sehr alt

Parameters:recency_description (str) – “recent”, “old”
pypolibox.lexicalization.gen_recency_lastbook_nomatch(recency, lexicalized_title, lexicalized_lastbooktitle)[source]

dieses Buch ist 23 Jahre älter/neuer als das erste Buch.

pypolibox.lexicalization.gen_spez(specifier, specifier_type)[source]

generates a Diamond which expresses a specifier, e.g. ‘sehr’

pypolibox.lexicalization.gen_tempus(tense='pr\xc3\xa4s')[source]

generates a Diamond representing a tense form

pypolibox.lexicalization.gen_title(book_title)[source]

Converts a book title (string) into its corresponding HLDS diamond structure. Since book titles are hard coded into the grammar, the OpenCCG output will differ somewhat, e.g.:

'Computational Linguistics' --> '„ Computational Linguistics “'
Return type:Diamond
pypolibox.lexicalization.lexicalize_authors(authors_tuple, realize='abstract')[source]

converts a list of authors into several possible HLDS diamond structures, which can be used for text generation.

Parameters:
  • authors_tuple – a tuple containing a set of names, e.g. ([“Ronald Hausser”, “Christopher D. Manning”]), and a rating, i.e. “neutral”
  • realize (str) – “abstract”, “lastnames” or “complete”. “abstract” realizes ‘das Buch’ / ‘die Bücher’, “lastnames” realizes only the last names of the authors, while “complete” realizes their given and last names.

Return type:Diamond
Returns:a Diamond instance, which generates “der Autor”/“die Autoren”, the authors’ last names, or the complete names of the authors

realize one author abstractly:

>>> openccg.realize(lexicalize_authors((['author1'], ""), realize='abstract'))
['dem Autoren', 'den Autoren', 'der Autor', 'des Autors']

realize two authors abstractly:

>>> openccg.realize(lexicalize_authors((['author1', 'author2'], "neutral"), realize='abstract'))
['den Autoren', 'der Autoren', 'die Autoren']

realize two authors, only using their last names:

>>> openccg.realize(lexicalize_authors((['Christopher D. Manning', 'Detlef Peter Zaun'], ""), realize='lastnames'))
['Manning und Zaun', 'Mannings und Zauns']

realize two authors using their full names:

>>> openccg.realize(lexicalize_authors((['Christopher D. Manning', 'Detlef Peter Zaun'], ""), realize='complete'))
['Christopher D. Manning und Detlef Peter Zaun', 'Christopher D. Mannings und Detlef Peter Zauns']

pypolibox.lexicalization.lexicalize_codeexamples(examples, lexicalized_title, lexicalized_proglang=None, lexeme='random')[source]

das Buch enthält (keine) Code-Beispiele (in der Programmiersprache X). das Buch beinhaltet (keine) Code-Beispiele.

das Buch enthält Code-Beispiele in den Programmiersprachen A und B. „On Syntax“ beinhaltet keine Code-Beispiele.

Parameters:examples (tuple of (int, str)) – a tuple, e.g. (0, ‘neutral’), describing if a book uses code examples (1) or not (0)

Parameters:lexeme (str) – “beinhalten”, “enthalten” or “random”.

realize “das Buch enthält Code-Beispiele”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_codeexamples((1, ""), lexicalized_title=title, lexeme="enthalten"))
['das Buch Code-Beispiele enth\xc3\xa4lt', 'das Buch enth\xc3\xa4lt Code-Beispiele', 'enth\xc3\xa4lt das Buch Code-Beispiele']

realize “das Buch enthält keine Code-Beispiele”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_codeexamples((0, ""), lexicalized_title=title, lexeme="enthalten"))
['das Buch enth\xc3\xa4lt keine Code-Beispiele', 'das Buch keine Code-Beispiele enth\xc3\xa4lt', 'enth\xc3\xa4lt das Buch keine Code-Beispiele']

realize “das Buch enthält Code-Beispiele in den Programmiersprachen A + B”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> plang = lexicalize_proglang((["Ada","Scheme"], ""), realize="embedded")
>>> openccg.realize(lexicalize_codeexamples((1, ""), lexicalized_title=title, lexicalized_proglang=plang, lexeme="enthalten"))
['das Buch Code-Beispiele in den Programmiersprachen Ada und Scheme enth\xc3\xa4lt', 'das Buch enth\xc3\xa4lt Code-Beispiele in den Programmiersprachen Ada und Scheme', 'enth\xc3\xa4lt das Buch Code-Beispiele in den Programmiersprachen Ada und Scheme']

realize “das Buch von X und Y beinhaltet Code-Beispiele in den Programmiersprachen A und B”:

>>> authors = lexicalize_authors((["Alan Kay", "John Hopcroft"], ""), realize="lastnames")
>>> title = lexicalize_title(("foo", ""), lexicalized_authors=authors, realize="abstract", authors_realize="preposition")
>>> plang = lexicalize_proglang((["Ada","Scheme"], ""), realize="embedded")
>>> openccg.realize(lexicalize_codeexamples((1, ""), lexicalized_title=title, lexicalized_proglang=plang, lexeme="beinhalten"))
['beinhaltet das Buch von Kay und Hopcroft Code-Beispiele in den Programmiersprachen Ada und Scheme', 'das Buch von Kay und Hopcroft Code-Beispiele in den Programmiersprachen Ada und Scheme beinhaltet', 'das Buch von Kay und Hopcroft beinhaltet Code-Beispiele in den Programmiersprachen Ada und Scheme']
pypolibox.lexicalization.lexicalize_exercises(exercises, lexicalized_title, lexeme='random')[source]

das Buch enthält/beinhaltet (keine) Übungen.

Parameters:exercises (tuple of (int, str)) – a tuple stating if a book contains exercises (1, “neutral”) or not (0, “neutral”).

realize “das Buch enthält Übungen”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_exercises((1, ""), title, lexeme="enthalten"))
['das Buch enth\xc3\xa4lt \xc3\x9cbungen', 'das Buch \xc3\x9cbungen enth\xc3\xa4lt', 'enth\xc3\xa4lt das Buch \xc3\x9cbungen']

realize “das Buch beinhaltet keine Übungen”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_exercises((0, ""), title, lexeme="beinhalten"))
['beinhaltet das Buch keine \xc3\x9cbungen', 'das Buch beinhaltet keine \xc3\x9cbungen', 'das Buch keine \xc3\x9cbungen beinhaltet']
pypolibox.lexicalization.lexicalize_keywords(keywords_tuple, lexicalized_title=None, lexicalized_authors=None, realize='complete', lexeme='random')[source]
Parameters:
  • keywords_tuple (tuple of (frozenset of str, str)) – e.g. (frozenset([‘generation’, ‘discourse’, ‘semantics’, ‘parsing’]), ‘neutral’)
  • realize (str) – “abstract”, “complete”.

“abstract” realizes ‘das Thema’ / ‘die Themen’. “complete” realizes an enumeration of those keywords.

realize one keyword abstractly, using an abstract author and the lexeme behandeln:

>>> author = lexicalize_authors((["author1"], ""), realize="abstract")
>>> openccg.realize(lexicalize_keywords((frozenset(["keyword1"]), ""), lexicalized_authors=author, realize="abstract", lexeme="behandeln"))
['behandelt der Autor das Thema', 'der Autor behandelt das Thema', 'der Autor das Thema behandelt']

realize one keyword concretely, using two concrete authors and the lexeme beschreiben:

>>> authors = lexicalize_authors((["John E. Hopcroft","Jeffrey D. Ullman"], ""), realize="complete")
>>> openccg.realize(lexicalize_keywords((frozenset(["parsing", "formal languages"]), ""), lexicalized_authors=authors, realize="complete", lexeme="beschreiben"))
['John E. Hopcroft und Jeffrey D. Ullman beschreiben die Themen formal_languages und parsing', 'John E. Hopcroft und Jeffrey D. Ullman die Themen formal_languages und parsing beschreiben', 'beschreiben John E. Hopcroft und Jeffrey D. Ullman die Themen formal_languages und parsing']

realize 4 keywords, using 1 author’s last name and the lexeme eingehen:

>>> author = lexicalize_authors((["Ralph Grishman"], ""), realize="lastnames")
>>> openccg.realize( lexicalize_keywords((frozenset(["parsing","semantics","discourse","generation"]), ""), lexicalized_authors=author, realize="complete", lexeme="eingehen"))
['Grishman geht auf den Themen discourse , generation , parsing und semantics ein', 'Grishman geht auf die Themen discourse , generation , parsing und semantics ein', 'geht Grishman auf den Themen discourse , generation , parsing und semantics ein', 'geht Grishman auf die Themen discourse , generation , parsing und semantics ein']

TODO: “___ geht auf den Themen ein” is not OK

realize 1 keyword, using an abstract book title and the lexeme aufgreifen:

>>> title = lexicalize_title(("book1", ""), realize="abstract")
>>> openccg.realize(lexicalize_keywords((frozenset(["regular expressions"]), ""), lexicalized_title=title, realize="complete", lexeme="aufgreifen"))
['das Buch greift das Thema regular_expressions auf', 'greift das Buch das Thema regular_expressions auf']

realize 2 keywords, using a concrete book title and the lexeme beschreiben:

>>> title = lexicalize_title(("Grundlagen der Computerlinguistik", ""), realize="complete")
>>> openccg.realize(lexicalize_keywords((frozenset(["grammar", "corpora"]), ""), lexicalized_title=title, realize="complete", lexeme="beschreiben"))
['beschreibt \xe2\x80\x9e Grundlagen_der_Computerlinguistik \xe2\x80\x9c die Themen corpora und grammar', '\xe2\x80\x9e Grundlagen_der_Computerlinguistik \xe2\x80\x9c beschreibt die Themen corpora und grammar', '\xe2\x80\x9e Grundlagen_der_Computerlinguistik \xe2\x80\x9c die Themen corpora und grammar beschreibt']
pypolibox.lexicalization.lexicalize_language(language, lexicalized_title, realize='random')[source]

das Buch ist Deutsch. das Buch ist in deutscher Sprache.

Parameters:language (tuple of (str, str)) – (“English”, “neutral”) or (“German”, “neutral”).

NOTE: negation isn’t possible w/ the current grammar (“nicht auf Deutsch”)

realize “das Buch ist in englischer Sprache”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_language(("English", ""), title, realize="adjective"))
['das Buch in englischer Sprache ist', 'das Buch ist in englischer Sprache', 'ist das Buch in englischer Sprache']

realize “das Buch ist auf Deutsch”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_language(("German", ""), title, realize="noun"))
['das Buch auf Deutsch ist', 'das Buch ist auf Deutsch', 'ist das Buch auf Deutsch']
pypolibox.lexicalization.lexicalize_length(length, lexicalized_title, lexicalized_lastbooktitle=None)[source]
Parameters:message_block (str) – “lastbook_nomatch” or “extra”

realize “$thisbook ist 122 Seiten länger als $lastbook”:

>>> length_lastbook_nomatch = FeatDict(direction='+', rating='neutral', type='RelativeVariation', magnitude=FeatDict(number=122, unit="pages"))
>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> lasttitle = lexicalize_title(("Natural Language Processing", ""), realize="complete")
>>> openccg.realize(lexicalize_length(length_lastbook_nomatch, title, lasttitle))
['das Buch 122 Seiten l\xc3\xa4nger als \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c ist', 'das Buch ist 122 Seiten l\xc3\xa4nger als \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c', 'ist das Buch 122 Seiten l\xc3\xa4nger als \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c']

realize “es ist 14 Seiten kürzer als $lastbook”:

>>> length_lastbook_nomatch = FeatDict(direction='-', rating='neutral', type='RelativeVariation', magnitude=FeatDict(number=14, unit="pages"))
>>> title = lexicalize_title(("foo", ""), realize="pronoun")
>>> lasttitle = lexicalize_title(("Angewandte Computerlinguistik", ""), realize="complete")
>>> openccg.realize(lexicalize_length(length_lastbook_nomatch, title, lasttitle))
['es 14 Seiten k\xc3\xbcrzer als \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c ist', 'es ist 14 Seiten k\xc3\xbcrzer als \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c', 'ist es 14 Seiten k\xc3\xbcrzer als \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c']
pypolibox.lexicalization.lexicalize_pages(pages, lexicalized_title, lexeme='random')[source]

___ hat einen Umfang von 546 Seiten
___ umfasst 546 Seiten
___ ist 546 Seiten lang

Parameters:pages (tuple of (int, str)) – a tuple stating how many pages a book contains, e.g. (546, “neutral”)
Return type:Diamond

realize “$title hat einen Umfang von $pages Seiten”:

>>> title = lexicalize_title(("Natural Language Processing", ""), realize="complete")
>>> openccg.realize(lexicalize_pages((600, ""), lexicalized_title=title, lexeme="umfang"))
['hat \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c einen Umfang von 600 Seiten', '\xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c einen Umfang von 600 Seiten hat', '\xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c hat einen Umfang von 600 Seiten']

realize “$abstracttitle umfasst $pages Seiten”:

>>> title = lexicalize_title(("title1", ""), realize="abstract")
>>> openccg.realize(lexicalize_pages((600, ""), lexicalized_title=title, lexeme="umfassen"))
['das Buch 600 Seiten umfasst', 'das Buch umfasst 600 Seiten', 'umfasst das Buch 600 Seiten']

TODO: generates ungrammatical phrases, e.g. “ist das Buch lange 600 Seiten”

realize “$abstracttitle ist $pages Seiten lang”:

>>> title = lexicalize_title(("title1", ""), realize="abstract")
>>> openccg.realize(lexicalize_pages((600, ""), lexicalized_title=title, lexeme="länge"))
['das Buch 600 Seiten lang ist', 'das Buch ist 600 Seiten lang', 'das Buch ist lange 600 Seiten', 'das Buch lange 600 Seiten ist', 'ist das Buch 600 Seiten lang', 'ist das Buch lange 600 Seiten']

realize within an extra message - “das Buch ist sehr umfangreich”:

>>> length = ("very long", "neutral")
>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_pages(length, title))
['das Buch ist sehr umfangreich', 'das Buch sehr umfangreich ist', 'ist das Buch sehr umfangreich']

realize within an extra message - “es ist etwas kurz”:

>>> length = ("very short", "neutral")
>>> title = lexicalize_title(("foo", ""), realize="pronoun")
>>> openccg.realize(lexicalize_pages(length, title))
['es etwas kurz ist', 'es ist etwas kurz', 'ist es etwas kurz']
pypolibox.lexicalization.lexicalize_proglang(proglang, lexicalized_title=None, lexicalized_authors=None, realize='embedded')[source]
Parameters:proglang (tuple of (frozenset, str)) – a tuple consisting of a set of programming languages (as strings) and a rating (string)

Parameters:realize (str) – “embedded” or “complete”.

If “embedded”, the function will just generate a noun phrase, e.g. “die Programmiersprache Perl”. If “complete”, it will generate a sentence, e.g. “das Buch verwendet die Programmiersprache(n) X (und Y)” or “der Autor / die Autoren verwenden die Programmiersprache(n) X (und Y)”.

realize “keine Programmiersprache”:

>>> openccg.realize(lexicalize_proglang((frozenset([]), ""), realize="embedded"))
['keine Programmiersprache', 'keiner Programmiersprache']

realize “die Programmiersprachen A, B und C”:

>>> openccg.realize(lexicalize_proglang((frozenset(["Python" ,"Lisp", "C++"]), ""), realize="embedded"))
['den Programmiersprachen Python , Lisp und C++', 'der Programmiersprachen Python , Lisp und C++', 'die Programmiersprachen Python , Lisp und C++']

realize two authors who use several programming languages:

>>> authors = lexicalize_authors((["Horst Lohnstein", "Ralf Klabunde"], ""), realize="lastnames")
>>> openccg.realize(lexicalize_proglang((frozenset(["Python" ,"Lisp", "C++"]), ""), lexicalized_authors=authors, realize="complete"))
['Lohnstein und Klabunde die Programmiersprachen Python , Lisp und C++ verwenden', 'Lohnstein und Klabunde verwenden die Programmiersprachen Python , Lisp und C++', 'verwenden Lohnstein und Klabunde die Programmiersprachen Python , Lisp und C++']

realize a book title with several programming languages:

>>> title = lexicalize_title(("Natural Language Processing", ""), realize="complete")
>>> openccg.realize(lexicalize_proglang((frozenset(["Python" ,"Lisp", "C++"]), ""), lexicalized_title=title, realize="complete"))
['verwendet \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c die Programmiersprachen Python , Lisp und C++', '\xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c die Programmiersprachen Python , Lisp und C++ verwendet', '\xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c verwendet die Programmiersprachen Python , Lisp und C++']
pypolibox.lexicalization.lexicalize_recency(recency, lexicalized_title, lexicalized_lastbooktitle=None)[source]

realize “es ist 7 Jahre neuer als $lastbook”:

>>> recency_lastbook_nomatch = FeatDict(direction='+', rating='neutral', type='RelativeVariation', magnitude=FeatDict(number=7, unit="years"))
>>> title = lexicalize_title(("foo", ""), realize="pronoun")
>>> lastbooktitle = lexicalize_title(("Angewandte Computerlinguistik", ""), realize="complete")
>>> openccg.realize(lexicalize_recency(recency_lastbook_nomatch, title, lastbooktitle))
['es 7 Jahre neuer als \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c ist', 'es ist 7 Jahre neuer als \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c', 'ist es 7 Jahre neuer als \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c']

realize “das Buch ist 23 Jahre älter als $lastbook”:

>>> recency_lastbook_nomatch = FeatDict(direction='-', rating='neutral', type='RelativeVariation', magnitude=FeatDict(number=23, unit="years"))
>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> lastbooktitle = lexicalize_title(("Natural Language Processing", ""), realize="complete")
>>> openccg.realize(lexicalize_recency(recency_lastbook_nomatch, title, lastbooktitle))
['das Buch 23 Jahre \xc3\xa4lter als \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c ist', 'das Buch ist 23 Jahre \xc3\xa4lter als \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c', 'ist das Buch 23 Jahre \xc3\xa4lter als \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c']

realize “das Buch ist sehr alt”:

>>> recency_extra = FeatDict(description="old", rating="negative")
>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_recency(recency_extra, title))
['das Buch ist sehr alt', 'das Buch sehr alt ist', 'ist das Buch sehr alt']

realize “das Buch ist besonders neu”:

>>> recency_extra = FeatDict(description="recent", rating="positive")
>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_recency(recency_extra, title))
['das Buch besonders neu ist', 'das Buch ist besonders neu', 'ist das Buch besonders neu']
pypolibox.lexicalization.lexicalize_target(target, lexicalized_title)[source]
das Buch richtet sich an Anfänger / an Einsteiger mit Grundkenntnissen / an Fortgeschrittene / an Experten

NOTE: we could add these to the grammar:
  • das Buch setzt keine Kenntnisse voraus
  • das Buch richtet sich an ein fortgeschrittenes Publikum

Parameters:target (tuple of (int, str)) – a tuple, e.g. (0, “neutral”), stating that the book is targeted towards beginners.

realize ”... richtet sich an Anfänger”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_target((0, ""), title))
['das Buch richtet sich an Anf\xc3\xa4nger', 'richtet sich das Buch an Anf\xc3\xa4nger', 'sich das Buch an Anf\xc3\xa4nger richtet']

realize ”... richtet sich an Einsteiger mit Grundkenntnissen”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_target((1, ""), title))
['das Buch richtet sich an Einsteiger mit Grundkenntnissen', 'richtet sich das Buch an Einsteiger mit Grundkenntnissen', 'sich das Buch an Einsteiger mit Grundkenntnissen richtet']

realize ”... richtet sich an Fortgeschrittene”:

>>> title = lexicalize_title(("foo", ""), realize="abstract")
>>> openccg.realize(lexicalize_target((2, ""), title))
['das Buch richtet sich an Fortgeschrittene', 'richtet sich das Buch an Fortgeschrittene', 'sich das Buch an Fortgeschrittene richtet']

realize ”... richtet sich an Experten”:

>>> target = lexicalize_target((3, ""), title)
>>> openccg.realize(target)
['das Buch richtet sich an Experten', 'richtet sich das Buch an Experten', 'sich das Buch an Experten richtet']

pypolibox.lexicalization.lexicalize_title(title_tuple, lexicalized_authors=None, realize='complete', authors_realize=None)[source]
Parameters:
  • title_tuple (tuple of (str, str)) – tuple containing a book title and a rating (neutral)
  • lexicalized_authors – an optional Diamond containing a lexicalized authors message

Parameters:realize (str) – “abstract”, “complete”, “pronoun” or “authors+title”
  • “abstract” realizes ‘das Buch’
  • “pronoun” realizes ‘es’
  • “complete” realizes book titles in the format specified in the OpenCCG grammar, e.g. „ Computational Linguistics. An Introduction “
Parameters:authors_realize (str or NoneType) – None, “possessive”, “preposition”, “random”.
  • “possessive” realizes ‘Xs Buch’
  • “preposition” realizes ‘das Buch von X (und Y)’
  • “random” chooses between “possessive” and “preposition”
  • None just realizes the book title, e.g. “das Buch” or “NLP in Lisp”

realize one book title abstractly (“das Buch”):

>>> openccg.realize(lexicalize_title( ("book", "neutral"), realize="abstract"))
['das Buch', 'dem Buch', 'des Buches']
>>> openccg.realize(lexicalize_title(("book title", ""), realize="pronoun"))
['es', 'ihm', 'seiner']

realize “das Buch von X und Y”:

>>> authors = lexicalize_authors((["Alan Kay", "John Hopcroft"], ""), realize="lastnames")
>>> openccg.realize(lexicalize_title(("title", "neutral"), lexicalized_authors=authors, realize="abstract", authors_realize="preposition"))
['das Buch von Kay und Hopcroft', 'dem Buch von Kay und Hopcroft', 'des Buches von Kay und Hopcroft']

realize “Xs Buch”:

>>> author = lexicalize_authors((["Alan Kay"], ""), realize="complete")
>>> openccg.realize(lexicalize_title(("Natural Language Processing", "neutral"), lexicalized_authors=author, realize="complete", authors_realize="possessive"))
['Alan Kays \xe2\x80\x9e Natural_Language_Processing \xe2\x80\x9c']

we can’t realize a book title as a pronoun with an author, e.g. “Chomskys es” or “es von Noam Chomsky”:

>>> authors = lexicalize_authors((["Kay", "Manning"], ""), realize="lastnames")
>>> lexicalize_title(("a book","") , lexicalized_authors=authors, realize="pronoun")
Traceback (most recent call last):
AssertionError: can't realize title as pronoun with an author, e.g. 'Chomskys es'

we can’t realize “A und Bs Buch” properly, due to restrictions in the current grammar. Instead, the ‘prepositional’ realization will be chosen:

>>> authors = lexicalize_authors((["Kay", "Manning"], ""), realize="lastnames")
>>> openccg.realize(lexicalize_title(("random", ""), lexicalized_authors=authors, realize="abstract", authors_realize="possessive"))
['das Buch von Kay und Manning', 'dem Buch von Kay und Manning', 'des Buches von Kay und Manning']
pypolibox.lexicalization.lexicalize_title_description(title_tuple, authors_tuple, year_tuple=None)[source]
realizes a title description as an independent sentence, e.g.:
  • „ Angewandte Computerlinguistik “ ist ein Buch von Ludwig Hitzenberger
  • „ Angewandte Computerlinguistik “ von Ludwig Hitzenberger ist im Jahr 1995 erschienen
>>> title = ("Angewandte Computerlinguistik", "")
>>> authors = (set(["Ludwig Hitzenberger"]),"")
>>> openccg.realize(lexicalize_title_description(title, authors))
['ist \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c ein Buch von Ludwig Hitzenberger', '\xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c ein Buch von Ludwig Hitzenberger ist', '\xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c ist ein Buch von Ludwig Hitzenberger']
>>> year = ("1995", "")
>>> openccg.realize(lexicalize_title_description(title, authors, year))
['ist \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c von Ludwig Hitzenberger im Jahr 1995 erschienen', '\xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c von Ludwig Hitzenberger im Jahr 1995 erschienen ist', '\xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c von Ludwig Hitzenberger ist im Jahr 1995 erschienen']
pypolibox.lexicalization.lexicalize_year(year, lexicalized_title)[source]

___ ist/sind 1986 erschienen.

Parameters:
  • year (int or str)
  • lexicalized_title (Diamond) – Diamond containing a title description
Return type:Diamond

realize a book’s year of publishing:

>>> title = lexicalize_title(("a book", ""), realize="abstract")
>>> openccg.realize(lexicalize_year(1986, lexicalized_title=title))
['das Buch 1986 erschienen ist', 'das Buch ist 1986 erschienen', 'ist das Buch 1986 erschienen']

#~ a more complex example: two unnamed books by the same author were
#~ published in $year:
#~
#~ >>> author = lexicalize_authors((["Alan Kay"], ""), realize="complete")
#~ >>> title = lexicalize_title(("a book", "book 2", ""), lexicalized_authors=author, realize="abstract", authors_realize="preposition")
#~ >>> openccg.realize(lexicalize_year(1986, lexicalized_title=title))
#~ ['die B\xc3\xbccher von Alan Kay 1986 erschienen sind', 'die B\xc3\xbccher von Alan Kay sind 1986 erschienen', 'sind die B\xc3\xbccher von Alan Kay 1986 erschienen']
pypolibox.lexicalization.phrase2sentence(diamond)[source]

turns the lexicalization of a phrase (e.g. “das Buch ist neu”) into the lexicalization of a sentence, e.g. “das Buch ist neu .” (the initial letter of a sentence will not be written in uppercase, as the grammar cannot cope with implicit upper/lowercase distinctions).
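A string-level analogue of this behaviour (only an illustrative sketch; the real function operates on Diamond structures, not strings):

```python
def phrase_to_sentence(phrase):
    """Hypothetical sketch of phrase2sentence(): append the sentence-final
    period as a separate token and deliberately keep the initial letter
    lowercase, since the grammar cannot cope with implicit upper/lowercase
    distinctions."""
    return phrase + ' .'
```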

lexicalize_messageblocks Module

The lexicalize_messageblocks module realizes message blocks which consist of one or more messages.

pypolibox.lexicalize_messageblocks.lexicalize_authors_variations(authors)[source]

lexicalize all the possible variations of author descriptions and put them in a dictionary, so other functions can easily choose one of them.

Parameters:authors – tuple containing a set of names, e.g. ([“Ronald Hausser”, “Christopher D. Manning”]) and a rating, i.e. “neutral”

Return type:dict of str, Diamond key-value pairs
Returns:“abstract” lexicalizes “der Autor” or “die Autoren”, “complete” realizes a list of author names (incl. surname), and “lastnames” realizes a list of author last names.

>>> authors_variations = lexicalize_authors_variations((frozenset(["Christopher Manning", "Alan Kay"]),""))
>>> openccg.realize(authors_variations["complete"])
['Christopher Manning und Alan Kay', 'Christopher Mannings und Alan Kays']
>>> openccg.realize(authors_variations["lastnames"])
['Manning und Kay', 'Mannings und Kays']
>>> openccg.realize(authors_variations["abstract"])
['den Autoren', 'der Autoren', 'die Autoren']
pypolibox.lexicalize_messageblocks.lexicalize_extra(extra_message_block)[source]

lexicalize all the messages contained in an extra message block (aka Message)

Type:Message
Param:a message (of type “extra”)
Return type:list of Diamonds
Returns:a list of lexicalized phrases, which can be realized with tccg directly or turned into sentences beforehand with lexicalization.phrase2sentence to remove ambiguity

NOTE: “außerdem” works only in a limited number of contexts, e.g. ‘das Buch ist neu, außerdem ist es auf Deutsch’ but not ‘das Buch ist neu, außerdem ist das Buch auf Deutsch’. Therefore, no connective is used here so far.

pypolibox.lexicalize_messageblocks.lexicalize_id(id_message_block)[source]

lexicalize all the messages contained in an id message block (aka Message)

Type:Message
Param:a message (of type “id”)
Return type:list of Diamonds
Returns:a list of lexicalized phrases, which can be realized with tccg directly or turned into sentences beforehand with lexicalization.phrase2sentence to remove ambiguity

pypolibox.lexicalize_messageblocks.lexicalize_lastbook_match(lastbook_match_message_block)[source]

lexicalize all the messages contained in a lastbook_match message block (aka Message)

Type:Message
Param:a message (of type “lastbook_match”)
Return type:list of Diamonds
Returns:a list of lexicalized phrases, which can be realized with tccg directly or turned into sentences beforehand with lexicalization.phrase2sentence to remove ambiguity

possible: “sowohl X als auch Y” / “beide Bücher”; implemented: “beide Bücher”

TODO: implement lexicalize_pagerange

pypolibox.lexicalize_messageblocks.lexicalize_lastbook_nomatch(lastbook_nomatch_message_block)[source]

Im Gegensatz zum ersten / vorhergehenden / anderen Buch ____

pypolibox.lexicalize_messageblocks.lexicalize_message_block(messageblock)[source]
pypolibox.lexicalize_messageblocks.lexicalize_title_variations(title, authors)[source]

generates several book title lexicalizations and stores them in a dict.

Parameters:title (tuple of (str, str)) – tuple containing a book title and a rating (neutral)
Return type:dict of str, Diamond key-value pairs

generate several different lexicalizations of book title / author combinations:

>>> title_variations = lexicalize_title_variations(("Angewandte Computerlinguistik", ""), (set(["David Cole"]), ""))

realize “das Buch”:

>>> openccg.realize(title_variations["abstract"])
['das Buch', 'dem Buch', 'des Buches']

realize “es” (in the meaning of “it” / “the book”):

>>> openccg.realize(title_variations["pronoun"])
['es', 'ihm', 'seiner']

realize “$booktitle”:

>>> openccg.realize(title_variations["complete"])
['\xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c']

realize “$fullname’s $booktitle”:

>>> openccg.realize(title_variations["title+complete-names-possessive"])
['David Coles \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c']

realize “$lastname’s $booktitle”:

>>> openccg.realize(title_variations["title+lastnames-possessive"])
['Coles \xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c']

realize “$booktitle von $fullname”:

>>> openccg.realize(title_variations["title+complete-names-preposition"])
['\xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c von David Cole']

realize “$booktitle von $lastname”:

>>> openccg.realize(title_variations["title+lastnames-preposition"])
['\xe2\x80\x9e Angewandte_Computerlinguistik \xe2\x80\x9c von Cole']

realize “das Buch von $fullname”:

>>> openccg.realize(title_variations["abstract-title+complete-names-preposition"])
['das Buch von David Cole', 'dem Buch von David Cole', 'des Buches von David Cole']

realize “das Buch von $lastname”:

>>> openccg.realize(title_variations["abstract-title+lastnames-preposition"])
['das Buch von Cole', 'dem Buch von Cole', 'des Buches von Cole']

pypolibox.lexicalize_messageblocks.lexicalize_usermodel_match(usermodel_match_message_block)[source]

erfüllt Anforderungen / entspricht ihren Wünschen

pypolibox.lexicalize_messageblocks.lexicalize_usermodel_nomatch(usermodel_nomatch_message_block)[source]

erfüllt (leider) Anforderungen nicht / entspricht nicht ihren Wünschen

pypolibox.lexicalize_messageblocks.random_variation(lexicalization_dictionary)[source]
Parameters:lexicalization_dictionary (dict) – a dictionary, where each key holds the name of a message and the value holds the corresponding Message
Return type:a randomly chosen value from the given dictionary
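The documented behaviour amounts to picking one value at random, roughly:

```python
import random

def random_variation(lexicalization_dictionary):
    # Sketch of the documented behaviour: return a randomly chosen
    # value (a lexicalization variant) from the given dictionary.
    return random.choice(list(lexicalization_dictionary.values()))
```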

messages Module

The messages module contains the Message class and related classes.

Messages contain propositions about books. The text planner applies Rules to these Messages to form ConstituentSets. Rules will also be applied to ConstituentSets, ultimately forming one TextPlan that contains all the information to be realized.
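The planning idea can be sketched with plain Python values. The names here are illustrative stand-ins, not the real Message, Rule, ConstituentSet or TextPlan classes:

```python
def sequence_rule(first, second):
    # Toy 'Rule': combine two planning units into a constituent set.
    return ('Sequence', first, second)

def build_plan(messages, rule):
    # Repeatedly apply the rule until a single text plan remains.
    plan = messages[0]
    for message in messages[1:]:
        plan = rule(plan, message)
    return plan
```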

class pypolibox.messages.AllMessages(allpropositions)[source]

represents all Messages generated from AllPropositions about all Books() that were returned by a query

class pypolibox.messages.Message(msgType=None)[source]

Bases: nltk.featstruct.FeatDict

A Message combines and stores knowledge about an object (here: books) in a logical structure. Messages are constructed during content selection (taking the user’s requirements, querying a database and processing its results), which precedes text planning.

Each Message has a msgType which describes the kind of information it includes. For example, the msgType ‘id’ specifies information that is needed to distinguish a book from other books:

[ *msgType*    = 'id'                                ]
[ authors      = frozenset(['Roland Hausser'])       ]
[ codeexamples = 0                                   ]
[ language     = 'German'                            ]
[ pages        = 572                                 ]
[ proglang     = frozenset([])                       ]
[ target       = 0                                   ]
[ title        = 'Grundlagen der Computerlinguistik' ]
[ year         = 2000                                ]
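As a plain-dict approximation of the FeatDict shown above (the real Message subclasses nltk.featstruct.FeatDict):

```python
# Plain-dict sketch of the 'id' message above.
id_message = {
    'msgType': 'id',
    'authors': frozenset(['Roland Hausser']),
    'codeexamples': 0,
    'language': 'German',
    'pages': 572,
    'proglang': frozenset([]),
    'target': 0,
    'title': 'Grundlagen der Computerlinguistik',
    'year': 2000,
}
```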
class pypolibox.messages.Messages(propositions)[source]

represents all Message instances generated from Propositions about a Book.

add_identification_to_message(message)[source]

Adds special ‘reference_title’ and ‘reference_authors’ attributes to messages other than the id_message.

In contrast to the id_message, other messages will not be used to produce sentences that contain their content (i.e. no statement of the ‘author X wrote book Y in 1979’ generated from an ‘extra_message’ or a ‘lastbook_nomatch’ message). Nevertheless, they will need to make reference to the title and the authors of the book (e.g. ‘Y is a rather short book’). As an example, look at this ‘usermodel_match’ message:

[ *msgType*           = 'usermodel_match'                     ]
[ *reference_authors* = frozenset(['Ulrich Schmitz'])         ]
[ *reference_title*   = 'Computerlinguistik. Eine Einführung' ]
[ language            = 'German'                              ]
[ proglang            = frozenset(['Lisp'])                   ]

The message contains two bits of information (the language and programming language used), which both have regular strings as keys. The ‘referential’ keys, on the other hand, are nltk.Feature instances, not strings. This distinction should be regarded as a syntactic trick used to emphasize a semantic difference (READ: if you have a better solution, please change it).
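The trick can be sketched with a stand-in Feature class (a simplified substitute for nltk.Feature, used here only for illustration):

```python
class Feature(object):
    """Stand-in for nltk.Feature: a non-string key type that marks
    referential (meta) attributes of a message."""
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return hash(('Feature', self.name))
    def __eq__(self, other):
        return isinstance(other, Feature) and self.name == other.name

# Sketch of the 'usermodel_match' message shown above: content keys are
# plain strings, referential/meta keys are Feature instances.
message = {
    Feature('msgType'): 'usermodel_match',
    Feature('reference_authors'): frozenset(['Ulrich Schmitz']),
    Feature('reference_title'): 'Computerlinguistik. Eine Einf\u00fchrung',
    'language': 'German',
    'proglang': frozenset(['Lisp']),
}

# The key type alone tells book content apart from reference information.
content = {k: v for k, v in message.items() if isinstance(k, str)}
referential = sorted(k.name for k in message if isinstance(k, Feature))
```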

generate_extra_message(proposition_dict)[source]

generates a Message from an ‘extra’ Proposition. Extra propositions only exist if a book is remarkably new / old or very short / long.

generate_lastbook_nomatch_message(proposition_dict)[source]

generates a Message from a ‘lastbook_nomatch’ Proposition. A lastbook_nomatch propositions states which differences exist between two books.

generate_message(proposition_type)[source]

generates a Message from a ‘simple’ Proposition. Simple propositions are those kinds of propositions that only give information about one item (i.e. describe one book) but don’t compare two items (e.g. book A is 12 years older than book B).

propositions Module

The propositions module evaluates the facts generated by the pypolibox.facts module and stores its results as nested dictionaries.

class pypolibox.propositions.AllPropositions(allfacts)[source]

contains propositions about ALL the books that were listed in a query result

class pypolibox.propositions.Propositions(facts)[source]

represents propositions (positive/negative/neutral ratings) of a single book. Propositions() are generated from Facts() about a Book().

pypolibox Module

The pypolibox module is the ‘main’ module of the pypolibox package. It’s the module you’d usually call from the command line or load into your Python interpreter. It just imports all the important modules and runs some demo code in case it is run from the command line without any arguments.

pypolibox.pypolibox.check_and_realize_textplan(openccg, textplan)[source]

realizes a text plan and warns about message blocks that cannot be realized due to current restrictions in the OpenCCG grammar.

Parameters:
  • openccg (OpenCCG) – a running OpenCCG instance
  • textplan (TextPlan) – text plan to be realized
pypolibox.pypolibox.generate_textplans(query)[source]

generates all text plans for a database query

pypolibox.pypolibox.initialize_openccg()[source]

starts OpenCCG’s tccg realizer as a server in the background (ca. 20s).

pypolibox.pypolibox.main()[source]

This is the pypolibox commandline interface. It allows you to query the database and generate book recommendations, which will either be handed to OpenCCG for generating sentences or printed to stdout in an XML format representing the text plans.

pypolibox.pypolibox.test()[source]

test and realize all text plans for all test queries

realization Module

The realization module takes HLDS XML structures, realizes them with the OpenCCG surface realizer and parses its output string.

class pypolibox.realization.OpenCCG(grammar_dir='/home/docs/checkouts/readthedocs.org/user_builds/pypolibox/envs/latest/local/lib/python2.7/site-packages/pypolibox-1.0.2-py2.7.egg/pypolibox/grammar')[source]

Bases: object

command-line interaction with OpenCCG’s tccg parser/generator, which can either be run as a JSON-RPC server or simply imported as a Python module.

parse(text, verbose=True, raw_output=True)[source]

This is the core interaction with the parser.

It returns a Python data-structure, while the parse() function returns a JSON object

Returns:if raw_output=True, the raw response string from the server will be returned; otherwise, a list of dictionaries will be returned (one for each input sentence)
Return type:str or list of dict

realize(featstruct, raw_output=True)[source]

converts a Diamond or Sentence feature structure into HLDS-XML, writes it to a temporary file, realizes this file with tccg and parses the output it returns.

realize_hlds(hlds_xml_filename)[source]
terminate()[source]
pypolibox.realization.parse_tccg_generator_output(tccg_output)[source]

parses the output string returned from tccg’s interactive generator shell.

rules Module

The rules module contains rules, which are used by the text planner to combine messages into constituent sets and ultimately form one TextPlan.

class pypolibox.rules.ConstituentSet(relType=None, nucleus=None, satellite=None)[source]

Bases: nltk.featstruct.FeatDict

ConstituentSet is the construction built up by applying Rules to a set of ConstituentSets and Messages. Each ConstituentSet is of a specific relType, and has two constituents, one of which is designated the nucleus and one the aux. These ConstituentSets can then be combined with other ConstituentSets or Messages.

ConstituentSet is based on nltk.featstruct.FeatDict.
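As a rough sketch (the real class subclasses nltk.featstruct.FeatDict; the helper function and field names below are illustrative), a ConstituentSet is just a mapping with a relation type and two constituents:

```python
def constituent_set(rel_type, nucleus, satellite):
    """Build a minimal ConstituentSet-like mapping (illustrative sketch)."""
    return {'relType': rel_type, 'nucleus': nucleus, 'satellite': satellite}

# combine two messages under a Concession relation; the result can
# itself serve as the nucleus or satellite of a larger constituent set
cs = constituent_set('Concession',
                     {'msgType': 'usermodel_match'},
                     {'msgType': 'usermodel_nomatch'})
```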

class pypolibox.rules.Rule(name, ruleType, nucleus, satellite, conditions, heuristic)[source]

Bases: object

Rules are the elements which specify relationships which hold between elements of the document. These elements can be Messages or ConstituentSets.

Each Rule specifies a list of inputs, each of which is a minimal specification of a Message or ConstituentSet. To be a valid input to this Rule, a given Message or ConstituentSet must subsume one of the specified inputs.

Each Rule can also specify a set of conditions which must be met in order for the Rule to hold between the inputs.

Each Rule specifies a heuristic, which will be evaluated to provide a score by which to rank the order in which rules should be applied.

Each Rule specifies which of the inputs will be the nucleus and which will be the aux of the output ConstituentSet.

find_message_candidates(messages, message_prototype)[source]

takes a list of messages and returns only those with the right message type (as specified in Rule.inputs)

Parameters:
  • messages (list of Message) – a list of Message objects, each containing one message about a book
  • message_prototype (tuple of (string, Message or ConstituentSet)) – a tuple consisting of a message name and a Message or ConstituentSet
Return type:list of (string, Message) tuples
Returns:a list containing all (name, message) tuples which are subsumed by the input message type (self.nucleus or self.satellite). If a rule should only be applied to UserModelMatch and UserModelNoMatch messages, the return value contains a list of messages with these types.

get_conditions(group)[source]

applies __name_eval to all conditions a Rule has, i.e. checks if a group meets all conditions

Parameters:group (list of (string, Message or ConstituentSet) tuples) – a list of message tuples of the form (message name, message)
Return type:list of bool
Returns:a list of truth values, each of which tells if a group met all conditions specified in self.conditions

get_options(messages)[source]

this is the main method used for document planning

From the list of Messages, get_options selects all possible ways the Rule could be applied.

The planner can then select with the textplan.__bottom_up_search function one of these possible applications of the Rule to use.

non_empty_message_combinations is a list of combinations, where each combination is a (nucleus, satellite)-tuple. both the nucleus and the satellite each consist of a (name, message) tuple.

The method returns an empty list if get_options can’t find a way to apply the Rule.

Parameters:messages (list of Message objects) – a list of Message objects, each containing one message about a book
Return type:an empty list, or a list containing one (int, ConstituentSet, list) tuple, where the list consists of Message or ConstituentSet objects
Returns:a list containing one 3-tuple (score, ConstituentSet, inputs) where:

  • score is the evaluated heuristic score for this application of the Rule
  • ConstituentSet is the new ConstituentSet instance returned by the application of the Rule
  • inputs is the list of inputs (Messages or ConstituentSets) used in this application of the Rule
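The enumeration of candidate Rule applications can be sketched as follows. This is a hedged simplification (the function name and the toy candidate lists are illustrative, not from the package): every nucleus candidate is paired with every satellite candidate, and pairs that reuse the same message are dropped:

```python
from itertools import product

def candidate_combinations(nucleus_candidates, satellite_candidates):
    """Yield (nucleus, satellite) pairs of (name, message) tuples."""
    for nuc, sat in product(nucleus_candidates, satellite_candidates):
        if nuc is not sat:  # one message can't fill both roles at once
            yield (nuc, sat)

# toy candidates: one possible nucleus, two possible satellites
nuclei = [('id', 'id-message')]
satellites = [('extra', 'extra-message'), ('lastbook_match', 'lb-message')]
combos = list(candidate_combinations(nuclei, satellites))
# two candidate applications of the rule
```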

get_satisfactory_groups(groups)[source]

Parameters:groups (list of lists of (string, Message or ConstituentSet) tuples) – a list of group elements; each group contains a list of one or more message tuples of the form (message name, message)
Return type:list of lists of (str, Message or ConstituentSet) tuples
Returns:a list of group elements, containing only those groups which meet all the conditions specified in self.conditions

class pypolibox.rules.Rules[source]

creates Rule() instances

Each rule of the form Rule(ruleType, inputs, conditions, nucleus, aux, heuristic) is generated by its own method. Important note: these methods have to adhere to a naming convention, i.e. begin with ‘genrule_’; otherwise, self.__init__ will fail!

genrule_book_differences()[source]

Contrast({id, id_extra_sequence}, lastbook_nomatch)

Meaning: id/id_extra_sequence. In contrast to book X, this book is in German, targets advanced users and ... Condition: There are differences between the two books

genrule_book_similarities()[source]

Elaboration(id_usermodelmatch, lastbook_match)

Meaning: ‘id_usermodelmatch’ mentions that the book matches ALL requirements. In addition, the book shares many features with its predecessor. Condition: There are both differences and commonalities (>=50%) between the two books.

genrule_compare_eval()[source]

Sequence(concession_books, {pos_eval, neg_eval, usermodel_match, usermodel_nomatch})

Meaning: ‘concession_books’ describes common and diverging features of the books. ‘pos_eval/neg_eval/usermodel_match/usermodel_nomatch’ explains how many user requirements they meet

genrule_concession_book_differences_usermodelmatch()[source]

Concession(book_differences, usermodel_match)

Meaning: ‘book_differences’ explains the differences between both books. Nevertheless, this book meets ALL your requirements ... Condition: All user requirements are met.

genrule_concession_books()[source]

Concession(book_differences, lastbook_match)

Meaning: After ‘book_differences’ explains the differences between both books, their common features are explained.

genrule_contrast_books_posneg_eval()[source]

Sequence(book_differences, {pos_eval, neg_eval})

Meaning: book_differences mentions the differences between the books, pos_eval/neg_eval explains how many user requirements they meet Conditions: matches some of the requirements

genrule_id_extra_sequence()[source]

Sequence(id_complete, extra), if ‘extra’ exists:

adds an additional “sentence” about extra facts after the id messages

genrule_id_usermodelmatch()[source]

Elaboration({id, id_extra_sequence}, usermodel_match), if there’s no usermodel_nomatch

Meaning: This book fulfills ALL your requirements. It was written in ..., contains these features ... and ... etc

genrule_neg_eval()[source]

Concession(usermodel_nomatch, usermodel_match)

Meaning: Although this book fulfills some of your requirements, it doesn’t match most of them. Therefore, this book might not be the best choice.

genrule_no_similarities_concession()[source]

Concession({id, id_extra_sequence}, lastbook_nomatch)

Meaning: Book X has these features BUT shares none of them with its predecessor. Condition: There is a predecessor to this book, but they don’t share ANY features.

genrule_pos_eval()[source]

Concession(usermodel_match, usermodel_nomatch)

Meaning: Book matches many (>= 50%) of the requirements, but not all of them

genrule_single_book_complete()[source]

Sequence({id, id_extra_sequence}, {pos_eval, neg_eval})

Meaning: The nucleus mentions all the (remaining) facts (that aren’t mentioned in the evaluation), while the satellite evaluates the book (in terms of usermodel matches)

genrule_single_book_complete_usermodelmatch()[source]

Sequence({id, id_extra_sequence}, usermodel_match)

Meaning: The satellite states that the book matches ALL the user’s requirements. The nucleus mentions the remaining facts about the book. Condition: there’s no preceding book and there are only usermodel matches.

genrule_single_book_complete_usermodelnomatch()[source]

Sequence({id, id_extra_sequence}, usermodel_nomatch)

Meaning: The satellite states that the book matches NONE of the user’s requirements. The nucleus mentions the remaining facts about the book. Condition: there’s no preceding book and there are no usermodel matches.

textplan Module

The textplan module is based on Nicholas FitzGerald’s py_docplanner [1], in particular on his idea to represent RST trees as attribute value matrices by using the nltk.featstruct data structure.

textplan converts Proposition instances into Messages (using attribute value notation). Via a set of Rules, these messages are combined into ConstituentSets. Rules are applied bottom-up, via a recursive best-first search (cf. __bottom_up_search).

Not only messages, but also constituent sets can be combined via rules. If all messages present can be combined into one large ConstituentSet, this constituent set is called a TextPlan. A TextPlan represents a complete text plan in form of an attribute value matrix.

[1] Fitzgerald, Nicholas (2009). Open-Source Implementation of Document Structuring Algorithm for NLTK.

class pypolibox.textplan.TextPlan(book_score=None, dtype='TextPlan', text=None, children=None)[source]

Bases: nltk.featstruct.FeatDict

TextPlan is the output of Document Planning. A TextPlan consists of an optional title and text, and a child ConstituentSet.

TODO: append __str__ method: should describe verbally if a TP is
describing one book or comparing two books
class pypolibox.textplan.TextPlans(allmessages, debug=False)[source]

Bases: object

generates all TextPlans for an AllMessages instance, i.e. one DocumentPlan for each book that is returned as a result of the user’s database query

pypolibox.textplan.generate_textplan(messages, rules=<list of 14 default Rule instances>, book_score=None, dtype='TextPlan', text='')[source]

The main method implementing the Bottom-Up document structuring algorithm from “Building Natural Language Generation Systems” figure 4.17, p. 108.

The method takes a list of Messages and a set of Rules and creates a document plan by repeatedly applying the highest-scoring Rule application (according to the Rule’s heuristic score) until a full tree is created. This is returned as a TextPlan with the tree set as children.

If no plan is reached using bottom-up, None is returned.

Parameters:
  • messages (list of Message) – a list of Messages which have been selected during content selection for inclusion in the TextPlan
  • rules (list of Rule) – a list of Rules specifying relationships which can hold between the messages
  • dtype (string) – an optional type for the document
  • text (string) – an optional text string describing the document
Returns:a document plan; if no plan could be created, None
Return type:TextPlan or NoneType
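The bottom-up structuring loop can be sketched in a few lines. This is a hedged, greedy simplification of the algorithm described above (the real implementation uses a recursive best-first search); rules here are hypothetical callables that return (score, combined, inputs) options:

```python
def bottom_up_plan(messages, rules):
    """Greedy sketch of bottom-up document structuring: repeatedly apply
    the highest-scoring rule application until one element remains."""
    elements = list(messages)
    while len(elements) > 1:
        options = []
        for rule in rules:
            options.extend(rule(elements))   # (score, combined, inputs)
        if not options:
            return None                      # no full tree can be built
        score, combined, used = max(options, key=lambda o: o[0])
        elements = [e for e in elements if e not in used] + [combined]
    return elements[0]

# toy rule: combine any two elements into a tuple, constant score 1
def pair_rule(elements):
    return [(1, (a, b), [a, b])
            for i, a in enumerate(elements) for b in elements[i + 1:]]

plan = bottom_up_plan(['id', 'extra', 'match'], [pair_rule])
```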

pypolibox.textplan.linearize_textplan(textplan)[source]

takes a text plan (an RST tree represented as an nltk.featstruct data structure) and returns an ordered list of Messages for surface generation.

Return type:list of Messages
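Linearization amounts to flattening the tree into a message sequence. In this hedged sketch, inner nodes are modeled as hypothetical (nucleus, satellite) tuples and leaves as messages; the real ordering additionally depends on the rhetorical relation, which is omitted here:

```python
def linearize(node):
    """In-order walk of a toy RST tree: nucleus before satellite."""
    if not isinstance(node, tuple):        # leaf: a single message
        return [node]
    nucleus, satellite = node
    return linearize(nucleus) + linearize(satellite)

plan = (('id', 'extra'), 'usermodel_match')
linearize(plan)  # -> ['id', 'extra', 'usermodel_match']
```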
pypolibox.textplan.test_textplan2xml_conversion()[source]

test text plan to XML conversion with all the text plans that were generated for all test queries with debug.gen_all_textplans().

pypolibox.textplan.textplan2xml(textplan)[source]

converts one TextPlan into an XML structure representing it.

Return type:etree._ElementTree
pypolibox.textplan.textplans2xml(textplans)[source]

converts several TextPlans into an XML structure representing these text plans.

Return type:etree._ElementTree

util Module

The util module contains a number of ‘bread and butter’ functions that are needed to run pypolibox, but are not particularly interesting (e.g. format converters, existence checks etc.).

There shouldn’t be any code in this module that requires loading other modules from pypolibox!

pypolibox.util.ensure_unicode(string_or_int)[source]

ensures that a string is unicode rather than UTF8-encoded. converts integer input to a unicode string.

pypolibox.util.ensure_utf8(string_or_int)[source]

ensures that a string is UTF8-encoded rather than unicode. converts integer input to a string.
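The package targets Python 2, where these two helpers normalize between unicode text and UTF8 byte strings. In Python 3 terms the pair amounts to the following hedged analogue (function names here are illustrative, not the package’s API):

```python
def ensure_text(value):
    """Return a text string: decode UTF-8 bytes, stringify integers."""
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return str(value)

def ensure_utf8_bytes(value):
    """Return UTF-8 bytes: encode text, stringify integers first."""
    if isinstance(value, bytes):
        return value
    return str(value).encode('utf-8')
```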

pypolibox.util.exists(thing, namespace)[source]

checks if a variable/object/instance exists in the given namespace

Return type:bool
pypolibox.util.flatten(nested_list)[source]

flattens a list, where each list element is itself a list

Parameters:nested_list (list) – the nested list
Returns:flattened list
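Since each element of the input is itself a list (one level of nesting), the behavior can be sketched as a single comprehension:

```python
def flatten(nested_list):
    """Flatten a list whose elements are themselves lists."""
    return [item for sublist in nested_list for item in sublist]
```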
pypolibox.util.freeze_all_messages(message_list)[source]

makes all messages (FeatDicts) immutable, which is necessary for turning them into sets
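The reason freezing matters: mutable mappings aren’t hashable, so they can’t be put into sets. A hedged sketch of the idea (the real code freezes nltk FeatDicts in place; freeze here is an illustrative stand-in):

```python
def freeze(message):
    """Turn a dict-like message into a hashable, immutable value."""
    return frozenset(message.items())

# two equal (mutable) messages collapse to one set member once frozen
msgs = [{'msgType': 'extra'}, {'msgType': 'extra'}]
unique = {freeze(m) for m in msgs}
```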

pypolibox.util.msgs_instance_to_list_of_msgs(messages_instance)[source]

converts a Messages instance into a list of Message instances

pypolibox.util.sql_array_to_list(sql_array)[source]

converts SQL string “arrays” into a list of strings

Our book database uses ‘[‘ and ‘]’ to handle attributes w/ more than one value: e.g. authors = ‘[Noam Chomsky][Alan Turing]’. This function turns those multi-value strings into a list of separate values.

Parameters:sql_array (str) – a string from the database that represents one or more items delimited by ‘[‘ and ‘]’, e.g. “[Noam Chomsky]” or “[Noam Chomsky][Alan Turing]”
Return type:list of str
Returns:a list of strings, where each string represents one item from the database, e.g. [“Noam Chomsky”, “Alan Turing”]
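The conversion described above can be sketched with a single regular expression that captures each bracket-delimited value (a hedged re-implementation, not the package’s actual code):

```python
import re

def sql_array_to_list(sql_array):
    """'[Noam Chomsky][Alan Turing]' -> ['Noam Chomsky', 'Alan Turing']"""
    return re.findall(r'\[([^\]]*)\]', sql_array)
```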

pypolibox.util.sql_array_to_set(sql_array)[source]

converts SQL string “arrays” into a set of strings

Our book database uses ‘[‘ and ‘]’ to handle attributes w/ more than one value: e.g. authors = ‘[Noam Chomsky][Alan Turing]’. This function turns those multi-value strings into a set with separate values.

Parameters:sql_array (str) – a string from the database that represents one or more items delimited by ‘[‘ and ‘]’, e.g. “[Noam Chomsky]” or “[Noam Chomsky][Alan Turing]”
Return type:set of str
Returns:a set of strings, where each string represents one item from the database, e.g. set([“Noam Chomsky”, “Alan Turing”])

pypolibox.util.write_to_file(str_or_obj, file_path)[source]

takes a string and writes it to a file or takes any other object, pickles it and writes it to a file
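The dual behavior described can be sketched as follows (a hedged re-implementation of the idea: text is written verbatim, anything else is pickled):

```python
import pickle

def write_to_file(str_or_obj, file_path):
    """Write a string to a file as-is; pickle any other object to it."""
    if isinstance(str_or_obj, str):
        with open(file_path, 'w') as f:
            f.write(str_or_obj)
    else:
        with open(file_path, 'wb') as f:
            pickle.dump(str_or_obj, f)
```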