| | 207 | //Under Construction -- need to enumerate the options and check what the spreadsheet |
| | 208 | importer is doing.// |
| | 209 | |
| | 210 | There are two main categories of representation: |
| | 211 | |
| | 212 | - Formatting, such as which of the file layouts is used, what the separator character |
| | 213 | is, how the text is escaped, which cells are structured... This is the "parsing" |
| | 214 | aspect of the representation. |
| | 215 | |
| | 216 | - The actual mapping of the source schema to our schema, that is, once we have their |
| | 217 | structured objects read in, how do we create our objects out of theirs? |
| | 218 | |
| | 219 | We should distinguish the external specification that a user would submit
| | 220 | with their files, or produce via a UI, from the importer's internal representation.
| | 221 | We want the external specification to be easy for a person to construct rather than |
| | 222 | easy for the importer to use. The importer can produce from that an internal |
| | 223 | representation that is convenient for running the data conversion. |
| | 224 | |
| 209 | | - If the data uses a format we specify, we don't need a schema mapping -- we just need |
| 210 | | to be told it's our formatting. |
| 211 | | |
| 212 | | - If the source has a schema that does not match ours, a means of mapping from the |
| 213 | | source's schema to ours will be needed. |
| 214 | | For an existing major source, it is likely that we would write the schema mapping. |
| 215 | | (But for a source we draw on regularly, there may be better means of pulling data |
| 216 | | than CSV files...) |
| | 227 | - The file format (the options described above) seems to be largely independent of the |
| | 228 | schema mapping. Let's try specifying them separately. |
| | 229 | |
| | 230 | - If the data uses a format and schema we specify, we don't need a format or mapping |
| | 231 | supplied -- we just need to be told it's our native format and schema. |
| | 232 | |
| | 233 | - For an existing major source, it is likely that we would write the schema mapping. |
| | 234 | But for a source we draw on regularly, there may be better means of pulling data |
| | 235 | than CSV files... |
| 234 | | - In any case, by the time the back end is called, we should have a schema mapping. |
| 235 | | |
| 236 | | ==== Options for format and schema mapping representations: ==== |
| 237 | | |
| 238 | | //Under Construction -- need to enumerate the options and check what the spreadsheet |
| 239 | | importer is doing.// |
| 240 | | |
| 241 | | We want the representation to be easy for a person to construct rather than easy for |
| 242 | | the importer to use. The importer can always produce an internal representation that |
| 243 | | is convenient for running the data conversion. |
| 244 | | |
| 245 | | There are two main categories of representation: |
| 246 | | |
| 247 | | - Formatting, such as which of the file layouts is used, what the separator character |
| 248 | | is, how the text is escaped, which cells are structured... This is the "parsing" |
| 249 | | aspect of the representation. |
| 250 | | |
| 251 | | - The actual mapping of the source schema to our schema, that is, once we have their |
| 252 | | structured objects read in, how do we create our objects out of theirs? |
| | 254 | - If the user specification and internal specification differ, the conversion can be |
| | 255 | done as a preliminary step. For prominent sources, we might save either or both of |
| | 256 | the user and internal representations. The user and internal specifications may
| | 257 | change due either to a change in the source schema for the file format they use,
| | 258 | or to a change in our schema.
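
As a rough illustration of the points above, here is a minimal Python sketch of a user-facing specification that keeps the file format and the schema mapping in separate sections, plus a preliminary step that converts it into an internal representation. Every name in it (''USER_SPEC'', ''InternalSpec'', ''compile_spec'', ''convert'', and the column and field names) is hypothetical and not taken from the importer; it only assumes a delimited text file with a header row.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical user-facing specification: format and schema mapping are
# kept in separate sections so that either can change independently.
USER_SPEC = {
    "format": {"separator": ";", "quote": '"', "header_row": True},
    "mapping": {"Nom": "name", "Ville": "city"},  # source column -> our field
}


@dataclass(frozen=True)
class InternalSpec:
    """Hypothetical internal representation, shaped for running the conversion."""
    reader_kwargs: dict   # keyword arguments passed straight to csv.DictReader
    field_pairs: tuple    # ((source_column, our_field), ...)


def compile_spec(user_spec):
    """Preliminary step: turn the user specification into the internal one."""
    fmt = user_spec["format"]
    if not fmt.get("header_row", True):
        raise ValueError("this sketch only handles files with a header row")
    return InternalSpec(
        reader_kwargs={"delimiter": fmt["separator"], "quotechar": fmt["quote"]},
        field_pairs=tuple(user_spec["mapping"].items()),
    )


def convert(text, spec):
    """Parse the source text (format), then build our records (schema mapping)."""
    rows = csv.DictReader(io.StringIO(text), **spec.reader_kwargs)
    return [{ours: row[theirs] for theirs, ours in spec.field_pairs} for row in rows]


spec = compile_spec(USER_SPEC)
records = convert('Nom;Ville\n"Dupont; Jean";Paris\n', spec)
print(records)
```

Either specification could be saved for a prominent source; if the source schema or our schema changes, only the ''mapping'' section and the compiled result would need to be regenerated in this sketch.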
| | 259 | |
| | 260 | ==== File format specification: ==== |
| | 261 | |
| | 262 | ==== Schema mapping specification: ==== |