// Code derived from Stroustrup's PPP2 book
// § 6.4.1 A detour: English grammar
// -and beginning on p 193
//------------------------------------------------------------------------------
/*
via pp 193, 194                     Parsing the sentence: birbs fly but fish swim
                                    (read from bottom up):
                                                    Sentence
                                                        ^
A simple sentence grammar                               |
                                           -------------+-------------
                                           |            |            |
Sentence:                              Sentence         |        Sentence
    Noun Verb                              ^            |            ^
    Sentence Conjunction Sentence          |            |            |
                                           |            |            |
Conjunction:                               |       Conjunction       |
    "and"                                  |            ^            |
    "or"                                   |            |            |
    "but"                             -----+-----       |       -----+-----
                                      |         |       |       |         |
Noun:                               Noun        |       |     Noun        |
    "birbs"                           ^         |       |       ^         |
    "fish"                            |         |       |       |         |
    "C++"                             |         |       |       |         |
                                      |         |       |       |         |
Verb:                                 |       Verb      |       |       Verb
    "rules"                           |         ^       |       |         ^
    "fly"                             |         |       |       |         |
    "swim"                            |         |       |       |         |
                                    birbs      fly     but    fish      swim
*/
//------------------------------------------------------------------------------
#include <iostream>
#include <map>
#include <set>
#include <string>
using std::cerr;
using std::cin;
using std::cout;
using std::string;
//------------------------------------------------------------------------------
// Note: we're introducing a map<> for words + their types
// word strings classified by types (n, v, c, p)
const std::map<std::string, std::set<char>> all_words = {
    // noun
    {"birbs", {'n'}},
    {"fish", {'n'}},
    {"C++", {'n'}},
    // verb
    {"rules", {'v'}},
    {"fly", {'v'}},
    {"swim", {'v'}},
    // conjunction
    {"and", {'c'}},
    {"or", {'c'}},
    {"but", {'c'}},
    // punctuation
    {".", {'p'}},
    {"?", {'p'}},
    {"!", {'p'}},
    {";", {'p'}}};
// returns the type code of word (n, v, c, p), or 'K' if the word is unknown
char type_of(const std::string& word)
{
    if (auto it = all_words.find(word); it != all_words.end()) {
        return *it->second.begin(); // the first type code for this word
    } else {
        std::cerr << word << " word not found\n";
        return 'K'; // invalid
    }
}
//------------------------------------------------------------------------------
// a simple user-defined type
class Token {
public:
    Token(char ch) : kind{ch} {}
    Token(char ch, std::string word) : kind{ch}, value{word} {}

    char kind = '0';   // 'n', 'v', 'c', 'p' codes
    string value = ""; // the word itself
};
//------------------------------------------------------------------------------
class Token_stream {
public:
    Token_stream();        // make a Token_stream that reads from cin
    Token get();           // get a Token
    void putback(Token t); // put a Token back
private:
    bool full;    // is there a Token in the buffer?
    Token buffer; // here is where we keep a Token put back using putback()
};
// The constructor just sets full to indicate that the buffer is empty:
Token_stream::Token_stream() : full{false}, buffer{0} // no Token in buffer
{
}
Token Token_stream::get()
{
    if (full) { // do we already have a Token ready?
        // remove token from buffer:
        full = false;
        return buffer;
    }

    string word;
    if (!(cin >> word)) return Token{'K', "K"}; // end of input or read error: invalid

    char ch = type_of(word);
    // clang-format off
    switch (ch) {
    case 'n': case 'v': case 'c': case 'p': { // words & punctuation
        return Token{ch, word};
    }
    default:
        return Token{'K', "K"}; // invalid
    }
    // clang-format on
}
// The putback() member function puts its argument back into the Token_stream's
// buffer:
void Token_stream::putback(Token t)
{
    buffer = t;  // copy t to buffer
    full = true; // buffer is now full
}
//------------------------------------------------------------------------------
std::string sentence();
std::string conjunct();
std::string noun();
std::string verb();
Token_stream ts; // provides get() and putback()
//------------------------------------------------------------------------------
// Note: this example demonstrates simplistic parsing of an expression made up
// of these six tokens:
//
//      birbs fly but fish swim ;
//
// -the idea for now is just to use the functions to print out the type of the
//  words according to the grammar given above
// -but this basic system could be extended into a more powerful framework later
int main()
{
    cout << "enter: birbs fly but fish swim ; (leave a space before ;)\n";
    cout << sentence() << '\n';
}
std::string sentence()
{
    string left = "";
    Token t = ts.get(); // get the first Token from the Token stream
    while (true) {
        switch (t.kind) { // see which kind of token it is
        case 'c':
            ts.putback(t);
            left += conjunct();
            break;
        case 'n':
            ts.putback(t);
            left += noun();
            break;
        case 'v':
            ts.putback(t);
            left += verb();
            break;
        case 'p':
            return "sentence: " + left + t.value;
        default:
            std::cerr << "unknown type/word not found\n";
            return left;
        }
        t = ts.get(); // get the next Token from the Token stream
    }
}
std::string conjunct()
{
    Token t = ts.get();
    cout << "c " << t.value << '\n';
    return t.value + ' ';
}

std::string noun()
{
    Token t = ts.get();
    cout << "n " << t.value << '\n';
    return t.value + ' ';
}

std::string verb()
{
    Token t = ts.get();
    cout << "v " << t.value << '\n';
    return t.value + ' ';
}
build & run:
g++ -std=c++20 -O2 -Wall -pedantic ./ch_06/main_p193.cpp && ./a.out
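
possible extension (sketch):
The listing above only prints the type of each word; it does not check that the words follow the grammar. The fragment below is a minimal sketch of how that check might look. It is not taken from the book. It assumes the left-recursive rule "Sentence: Sentence Conjunction Sentence" can be rewritten in the equivalent iterative form "Noun Verb { Conjunction Noun Verb }", and it reads from any std::istream instead of cin so it can be tried on a string. The names word_types, kind_of, parse_clause, parse_sentence and is_sentence are hypothetical and do not appear in the original code.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper table: word -> type code
// ('n' noun, 'v' verb, 'c' conjunction), same vocabulary as the listing.
const std::map<std::string, char> word_types = {
    {"birbs", 'n'}, {"fish", 'n'}, {"C++", 'n'},
    {"rules", 'v'}, {"fly", 'v'},  {"swim", 'v'},
    {"and", 'c'},   {"or", 'c'},   {"but", 'c'}};

// returns the type code of word, or 'K' when the word is unknown
char kind_of(const std::string& word)
{
    auto it = word_types.find(word);
    return it != word_types.end() ? it->second : 'K';
}

// Clause: Noun Verb
bool parse_clause(std::istream& in)
{
    std::string noun, verb;
    if (!(in >> noun >> verb)) return false; // ran out of words
    return kind_of(noun) == 'n' && kind_of(verb) == 'v';
}

// Sentence: Clause { Conjunction Clause }
// (assumed iterative form of the left-recursive rule in the grammar)
// Parsing stops at ";" or at the end of the input.
bool parse_sentence(std::istream& in)
{
    if (!parse_clause(in)) return false;
    std::string word;
    while (in >> word) {
        if (word == ";") return true;           // terminator reached
        if (kind_of(word) != 'c') return false; // expected a conjunction
        if (!parse_clause(in)) return false;    // conjunction needs a clause
    }
    return true; // end of input after a complete clause
}

// convenience wrapper: check a whole string
bool is_sentence(const std::string& text)
{
    std::istringstream in{text};
    return parse_sentence(in);
}
```

With these definitions, is_sentence("birbs fly but fish swim ;") accepts the example sentence, while inputs such as "birbs but fly" or "birbs fly but" are rejected. Checking the structure in one function per grammar rule keeps the code close to the grammar text, which is the same idea the chapter develops further for expressions.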